[ https://issues.apache.org/jira/browse/TIKA-526?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13162703#comment-13162703 ]
Fabian Lange commented on TIKA-526:
-----------------------------------

I fixed this by improving org.apache.poi.xwpf.usermodel.XWPFParagraph, which currently ignores all smart tags instead of processing them. I will propose the fix to POI, using your example document as a test. If it is accepted, I already have a test case for Tika at hand, so we can add it here as well.

> OOXMLParser fails to extract text from within smart tags
> --------------------------------------------------------
>
>                 Key: TIKA-526
>                 URL: https://issues.apache.org/jira/browse/TIKA-526
>             Project: Tika
>          Issue Type: Bug
>          Components: parser
>    Affects Versions: 0.7
>            Reporter: Geoff Jarrad
>         Attachments: smarttag-snippet.docx
>
>
> Documents in the .docx format may contain smart-tags (of element type
> w:smartTag). Such a smart-tag will surround the tagged text (found in element
> w:r).
> The OOXMLParser does not extract the text contained within smart-tags.
> [Example document to follow]
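
To illustrate the structure described in the issue, here is a minimal sketch that collects paragraph text while descending into w:smartTag wrappers. It is not the POI patch and does not use the POI or Tika APIs: it walks the raw WordprocessingML with the JDK's DOM parser. The class and method names (SmartTagText, collect, paragraphText) and the sample paragraph are made up for illustration.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;

public class SmartTagText {
    // Namespace of the WordprocessingML "w:" prefix.
    static final String W_NS =
        "http://schemas.openxmlformats.org/wordprocessingml/2006/main";

    // Appends the content of every w:t found under node, following w:r
    // and w:smartTag children. Other elements are skipped.
    static void collect(Node node, StringBuilder out) {
        for (Node c = node.getFirstChild(); c != null; c = c.getNextSibling()) {
            if (c.getNodeType() != Node.ELEMENT_NODE
                    || !W_NS.equals(c.getNamespaceURI())) {
                continue;
            }
            String name = c.getLocalName();
            if ("t".equals(name)) {
                out.append(c.getTextContent());
            } else if ("r".equals(name) || "smartTag".equals(name)) {
                // A smart tag wraps runs (and possibly other smart tags),
                // so recurse instead of ignoring it.
                collect(c, out);
            }
        }
    }

    // Parses a single w:p element given as a string and returns its text.
    static String paragraphText(String xml) throws Exception {
        DocumentBuilderFactory f = DocumentBuilderFactory.newInstance();
        f.setNamespaceAware(true);
        Document doc = f.newDocumentBuilder().parse(
            new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
        StringBuilder out = new StringBuilder();
        collect(doc.getDocumentElement(), out);
        return out.toString();
    }

    public static void main(String[] args) throws Exception {
        // Hypothetical sample paragraph: one run inside a smart tag.
        String xml = "<w:p xmlns:w=\"" + W_NS + "\">"
            + "<w:r><w:t xml:space=\"preserve\">Meet me in </w:t></w:r>"
            + "<w:smartTag w:uri=\"urn:schemas-microsoft-com:office:smarttags\""
            + " w:element=\"City\">"
            + "<w:r><w:t>Adelaide</w:t></w:r>"
            + "</w:smartTag>"
            + "<w:r><w:t>.</w:t></w:r>"
            + "</w:p>";
        System.out.println(paragraphText(xml)); // prints "Meet me in Adelaide."
    }
}
```

A walker that only looked at the direct w:r children of the paragraph would return "Meet me in ." for this input, which is the symptom reported here.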