[ https://issues.apache.org/jira/browse/TIKA-819?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13172825#comment-13172825 ]
Nick Burch commented on TIKA-819:
---------------------------------

You have to explicitly ask for embedded files to be parsed, by supplying a Parser in the ParseContext object.

If you don't want recursion, don't supply the parser!

> Make Option to Exclude Embedded Files' Text for Text Content
> ------------------------------------------------------------
>
>                 Key: TIKA-819
>                 URL: https://issues.apache.org/jira/browse/TIKA-819
>             Project: Tika
>          Issue Type: New Feature
>          Components: general
>    Affects Versions: 1.0
>         Environment: Windows-7 + JDK 1.6 u26
>            Reporter: Albert L.
>             Fix For: 1.1
>
>
> It would be nice to be able to disable text content from embedded files.
> For example, if I have a DOCX with an embedded PPTX, then I would like the
> option to disable text from the PPTX from showing up when asking for the text
> content from DOCX. In other words, it would be nice to have the option to
> get text content *only* from the DOCX instead of the DOCX+PPTX.
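
The mechanism described in the comment (embedded files are parsed only when a Parser has been placed in the ParseContext) can be illustrated with a small, self-contained toy model. This is only a sketch: `ToyParser`, `ToyContext`, `ToyDocument`, `ContainerParser`, and `extract` are hypothetical stand-ins invented for illustration, not Tika's real classes. In Tika itself the corresponding step is registering a parser with `context.set(Parser.class, parser)` on an `org.apache.tika.parser.ParseContext`. The behaviour modelled here is the one described in this thread for Tika 1.x, and other versions may differ.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Toy model only: every type below is a hypothetical stand-in, not a Tika class.
class EmbeddedTextSketch {

    /** Stand-in for a parser: turns a document into plain text. */
    interface ToyParser {
        String parse(ToyDocument doc, ToyContext context);
    }

    /** Stand-in for a parse context: a type-keyed bag of optional helpers. */
    static final class ToyContext {
        private final Map<Class<?>, Object> entries = new HashMap<>();

        <T> void set(Class<T> key, T value) {
            entries.put(key, value);
        }

        /** Returns the registered helper, or null if none was supplied. */
        <T> T get(Class<T> key) {
            return key.cast(entries.get(key));
        }
    }

    /** A document with its own text plus any embedded documents. */
    static final class ToyDocument {
        final String text;
        final List<ToyDocument> embedded = new ArrayList<>();

        ToyDocument(String text) {
            this.text = text;
        }

        ToyDocument embed(ToyDocument child) {
            embedded.add(child);
            return this;
        }
    }

    /** Emits the container's own text; recurses only if the context holds a parser. */
    static final class ContainerParser implements ToyParser {
        @Override
        public String parse(ToyDocument doc, ToyContext context) {
            StringBuilder out = new StringBuilder(doc.text);
            ToyParser delegate = context.get(ToyParser.class);
            if (delegate != null) {
                // A parser was supplied, so embedded documents are parsed too.
                for (ToyDocument child : doc.embedded) {
                    out.append('\n').append(delegate.parse(child, context));
                }
            }
            return out.toString();
        }
    }

    /** Extracts text, supplying the parser in the context only when recursion is wanted. */
    static String extract(ToyDocument doc, boolean includeEmbedded) {
        ToyParser parser = new ContainerParser();
        ToyContext context = new ToyContext();
        if (includeEmbedded) {
            context.set(ToyParser.class, parser);
        }
        return parser.parse(doc, context);
    }

    public static void main(String[] args) {
        ToyDocument docx = new ToyDocument("DOCX body")
                .embed(new ToyDocument("PPTX slide text"));
        System.out.println(extract(docx, false)); // container text only
        System.out.println("---");
        System.out.println(extract(docx, true));  // container text plus embedded text
    }
}
```

With `includeEmbedded` set to false, no parser is registered, so the container parser has nothing to delegate to and the embedded PPTX text is left out, which is the behaviour the issue asks for.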