[ https://issues.apache.org/jira/browse/TIKA-1005?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13474250#comment-13474250 ]
Michael McCandless commented on TIKA-1005:
------------------------------------------
Could you attach an example showing the problem? Thanks.
> In Microsoft Office Word 2010 documents, text inside a textbox is not
> extracted/parsed out.
> -------------------------------------------------------------------------------------------
>
> Key: TIKA-1005
> URL: https://issues.apache.org/jira/browse/TIKA-1005
> Project: Tika
> Issue Type: Bug
> Components: parser
> Affects Versions: 1.2
> Environment: Windows 7, Windows Server 2008, Windows Server 2008 R2
> (32bit and 64bit each)
> Reporter: David A. Patterson
>
> Text inside a textbox, which itself can be in the body, the header or the
> footer, is not extracted using any type of parser (including
> AutoDetectParser) in combination with any type of ContentHandler. This is
> NOT a duplicate of TIKA-904. This specifically concerns the .docx file
> format.
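
When preparing an example file for a report like this, it can help to first confirm that the textbox text is actually stored in the .docx. A minimal sketch, assuming the text sits in `w:txbxContent` elements of `word/document.xml` (the WordprocessingML container for textbox content). The class name `TextBoxProbe` and the method `textBoxRuns` are hypothetical, and the code uses only the JDK, not Tika itself:

```java
import java.io.InputStream;
import java.util.ArrayList;
import java.util.List;
import java.util.zip.ZipEntry;
import java.util.zip.ZipFile;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

// Hypothetical helper: lists the text runs stored inside textboxes of a .docx part.
public class TextBoxProbe {
    // Main WordprocessingML namespace (the "w:" prefix).
    static final String W_NS =
        "http://schemas.openxmlformats.org/wordprocessingml/2006/main";

    // Returns the content of every w:t element nested under a w:txbxContent element.
    static List<String> textBoxRuns(InputStream xml) throws Exception {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        factory.setNamespaceAware(true);
        Document doc = factory.newDocumentBuilder().parse(xml);

        List<String> runs = new ArrayList<>();
        NodeList boxes = doc.getElementsByTagNameNS(W_NS, "txbxContent");
        for (int i = 0; i < boxes.getLength(); i++) {
            NodeList texts = ((Element) boxes.item(i)).getElementsByTagNameNS(W_NS, "t");
            for (int j = 0; j < texts.getLength(); j++) {
                runs.add(texts.item(j).getTextContent());
            }
        }
        return runs;
    }

    public static void main(String[] args) throws Exception {
        if (args.length == 0) {
            System.err.println("usage: java TextBoxProbe file.docx");
            return;
        }
        // A .docx is a ZIP archive; the body lives in word/document.xml.
        try (ZipFile zip = new ZipFile(args[0])) {
            ZipEntry entry = zip.getEntry("word/document.xml");
            if (entry == null) {
                System.err.println("word/document.xml not found");
                return;
            }
            try (InputStream in = zip.getInputStream(entry)) {
                textBoxRuns(in).forEach(System.out::println);
            }
        }
    }
}
```

If this prints the textbox text while the Tika output for the same file lacks it, the file is a suitable attachment. The same text may be listed more than once when a file stores both a primary and a fallback copy of a textbox. Headers and footers live in separate parts (for example `word/header1.xml`), which this sketch does not read.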