https://bz.apache.org/bugzilla/show_bug.cgi?id=64418
j-lawyer.org changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|NEEDINFO                    |NEW

--- Comment #6 from j-lawyer.org ---
Well, I would love to get rid of the expensive XML handling - however, I do not see how I could avoid it given POI's API. Is there an alternative approach for "getting all text content of text fields / text boxes"?

Even Apache Tika seems to use the exact same approach in their XWPFWordExtractorDecorator.java:

    // Also extract any paragraphs embedded in text boxes
    // Note "w:txbxContent//"...must look for all descendant paragraphs
    // not just the immediate children of txbxContent -- TIKA-2807
    if (config.getIncludeShapeBasedContent()) {
        for (XmlObject embeddedParagraph : paragraph.getCTP().selectPath(
                "declare namespace w='http://schemas.openxmlformats.org/wordprocessingml/2006/main' declare namespace wps='http://schemas.microsoft.com/office/word/2010/wordprocessingShape' .//*/wps:txbx/w:txbxContent//w:p")) {
            extractParagraph(
                    new XWPFParagraph(CTP.Factory.parse(embeddedParagraph.xmlText()), paragraph.getBody()),
                    listManager, xhtml);
        }
    }

Am I missing something?

Thanks,
Jens
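
For readers who want to see what the path expression in the Tika snippet selects, here is a minimal sketch that evaluates the same XPath with the JDK's own `javax.xml.xpath` API instead of XmlBeans' `selectPath`. The class name `TextBoxParagraphs`, the method `textBoxParagraphs`, and the `SAMPLE` fragment are hypothetical and hand-written for illustration; real documents wrap text boxes in more markup. It only illustrates the "all descendant paragraphs" behaviour of `w:txbxContent//w:p` and does not answer whether POI offers a cheaper route.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import javax.xml.XMLConstants;
import javax.xml.namespace.NamespaceContext;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

public class TextBoxParagraphs {
    static final String W = "http://schemas.openxmlformats.org/wordprocessingml/2006/main";
    static final String WPS = "http://schemas.microsoft.com/office/word/2010/wordprocessingShape";

    // Simplified, hand-written paragraph: one run of body text plus a text box
    // holding one direct paragraph and one paragraph nested deeper (inside w:sdt).
    public static final String SAMPLE =
        "<w:p xmlns:w='" + W + "' xmlns:wps='" + WPS + "'>"
        + "<w:r><w:t>body</w:t></w:r>"
        + "<w:r><wps:wsp><wps:txbx><w:txbxContent>"
        + "<w:p><w:r><w:t>direct</w:t></w:r></w:p>"
        + "<w:sdt><w:sdtContent><w:p><w:r><w:t>nested</w:t></w:r></w:p></w:sdtContent></w:sdt>"
        + "</w:txbxContent></wps:txbx></wps:wsp></w:r>"
        + "</w:p>";

    /** Returns the text of every w:p found anywhere below a wps:txbx/w:txbxContent. */
    public static List<String> textBoxParagraphs(String paragraphXml) {
        try {
            DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
            dbf.setNamespaceAware(true); // prefixed XPath steps only match namespace-aware DOMs
            Document doc = dbf.newDocumentBuilder()
                    .parse(new InputSource(new StringReader(paragraphXml)));

            XPath xpath = XPathFactory.newInstance().newXPath();
            // Plays the role of the "declare namespace" prologue in the selectPath string.
            xpath.setNamespaceContext(new NamespaceContext() {
                @Override public String getNamespaceURI(String prefix) {
                    if ("w".equals(prefix)) return W;
                    if ("wps".equals(prefix)) return WPS;
                    return XMLConstants.NULL_NS_URI;
                }
                @Override public String getPrefix(String uri) { return null; }
                @Override public Iterator<String> getPrefixes(String uri) { return null; }
            });

            NodeList hits = (NodeList) xpath.evaluate(
                    ".//*/wps:txbx/w:txbxContent//w:p",
                    doc.getDocumentElement(), XPathConstants.NODESET);

            List<String> texts = new ArrayList<>();
            for (int i = 0; i < hits.getLength(); i++) {
                texts.add(hits.item(i).getTextContent());
            }
            return texts;
        } catch (Exception e) {
            throw new IllegalStateException("Could not evaluate text box XPath", e);
        }
    }

    public static void main(String[] args) {
        System.out.println(textBoxParagraphs(SAMPLE));
    }
}
```

With a single slash (`w:txbxContent/w:p`) the paragraph nested inside `w:sdt` would not be matched, which is the case the TIKA-2807 comment warns about.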
