[ https://issues.apache.org/jira/browse/TIKA-1005?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13474250#comment-13474250 ]
Michael McCandless commented on TIKA-1005:
------------------------------------------

Could you attach an example showing the problem? Thanks.

> In Microsoft Office Word 2010 documents, text inside a textbox is not extracted/parsed out.
> -------------------------------------------------------------------------------------------
>
>                 Key: TIKA-1005
>                 URL: https://issues.apache.org/jira/browse/TIKA-1005
>             Project: Tika
>          Issue Type: Bug
>          Components: parser
>    Affects Versions: 1.2
>         Environment: Windows 7, Windows Server 2008, Windows Server 2008 R2 (32bit and 64bit each)
>            Reporter: David A. Patterson
>
> Text inside a textbox, which itself can be in the body, the header or the footer, is not extracted using any type of parser (including AutoDetectParser) in combination with any type of ContentHandler. This is NOT a duplicate of TIKA-904. This specifically concerns the .docx file format.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators.
For more information on JIRA, see: http://www.atlassian.com/software/jira