[ https://issues.apache.org/jira/browse/TIKA-819?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13174121#comment-13174121 ]
Albert L. commented on TIKA-819:
--------------------------------

I think that, by default, text content retrieval should be recursive and deep. An optional command-line argument would switch Tika to a "cursory" text content retrieval. Hence, I suggest the following:

    -c  or  --cursory      Output cursory content (does not recursively
                           retrieve content from embedded/attached files)

Thanks!

> Make Option to Exclude Embedded Files' Text for Text Content
> ------------------------------------------------------------
>
>                 Key: TIKA-819
>                 URL: https://issues.apache.org/jira/browse/TIKA-819
>             Project: Tika
>          Issue Type: New Feature
>          Components: general
>    Affects Versions: 1.0
>         Environment: Windows-7 + JDK 1.6 u26
>            Reporter: Albert L.
>             Fix For: 1.1
>
>
> It would be nice to be able to disable text content from embedded files.
> For example, if I have a DOCX with an embedded PPTX, then I would like the
> option to disable text from the PPTX from showing up when asking for the text
> content from DOCX. In other words, it would be nice to have the option to
> get text content *only* from the DOCX instead of the DOCX+PPTX.
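
To make the proposed behaviour concrete, here is a minimal, self-contained sketch of the difference between deep (default) and cursory retrieval for the DOCX-with-embedded-PPTX case. It does not use Tika's API: the `Doc` class, `isCursory`, and `extractText` are hypothetical names invented for illustration, and `-c`/`--cursory` is only the flag suggested in this comment, not an existing option.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class CursoryDemo {
    /** Toy stand-in for a parsed file: its own text plus any embedded files. */
    static final class Doc {
        final String text;
        final List<Doc> embedded;

        Doc(String text, Doc... embedded) {
            this.text = text;
            this.embedded = Arrays.asList(embedded);
        }
    }

    /** Returns true when the proposed -c / --cursory flag is present. */
    static boolean isCursory(String[] args) {
        for (String arg : args) {
            if (arg.equals("-c") || arg.equals("--cursory")) {
                return true;
            }
        }
        return false;
    }

    /** Collects the container's text; descends into embedded files unless cursory. */
    static List<String> extractText(Doc doc, boolean cursory) {
        List<String> out = new ArrayList<>();
        out.add(doc.text);
        if (!cursory) {
            for (Doc child : doc.embedded) {
                out.addAll(extractText(child, false));
            }
        }
        return out;
    }

    public static void main(String[] args) {
        // A DOCX with one embedded PPTX, as in the issue description.
        Doc docx = new Doc("docx text", new Doc("pptx text"));
        System.out.println(extractText(docx, isCursory(args)));
    }
}
```

Run without arguments, the sketch collects text from both the container and the embedded file; run with `-c` or `--cursory`, it collects the container's text only.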