[ 
https://issues.apache.org/jira/browse/TIKA-819?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13174121#comment-13174121
 ] 

Albert L. commented on TIKA-819:
--------------------------------

I think that, by default, text content retrieval should be recursive and 
deep.  An optional command-line argument would switch Tika to "cursory" text 
content retrieval.  Hence, I suggest the following:

-c  or  --cursory       Output cursory content (do not recursively retrieve
                        content from embedded/attached files)
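
To make the intended semantics concrete, here is a rough sketch.  It is not 
Tika code and does not use the Tika API: the class CursorySketch, the Doc 
type, and the methods isRecursive and extractText are all made-up names for 
illustration.  It only assumes that a document can be modelled as its own 
text plus a list of embedded documents, and that recursion is the default 
unless the proposed flag is given.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class CursorySketch {

    /** Hypothetical stand-in for a parsed document; not a Tika class. */
    static final class Doc {
        final String text;
        final List<Doc> embedded;

        Doc(String text, Doc... embedded) {
            this.text = text;
            this.embedded = Arrays.asList(embedded);
        }
    }

    /** Recursive retrieval is the default; -c or --cursory turns it off. */
    static boolean isRecursive(String[] args) {
        for (String arg : args) {
            if (arg.equals("-c") || arg.equals("--cursory")) {
                return false;
            }
        }
        return true;
    }

    /** Collects the document's own text, descending into embedded
     *  documents only when recursive is true. */
    static List<String> extractText(Doc doc, boolean recursive) {
        List<String> out = new ArrayList<>();
        out.add(doc.text);
        if (recursive) {
            for (Doc child : doc.embedded) {
                out.addAll(extractText(child, true));
            }
        }
        return out;
    }

    public static void main(String[] args) {
        // A DOCX with one embedded PPTX, as in the issue description.
        Doc docx = new Doc("DOCX text", new Doc("PPTX text"));
        System.out.println(extractText(docx, isRecursive(args)));
    }
}
```

Run with no arguments, the sketch returns the text of both the DOCX and the 
embedded PPTX; run with -c or --cursory, it returns only the DOCX text.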


Thanks!
                
> Make Option to Exclude Embedded Files' Text for Text Content
> ------------------------------------------------------------
>
>                 Key: TIKA-819
>                 URL: https://issues.apache.org/jira/browse/TIKA-819
>             Project: Tika
>          Issue Type: New Feature
>          Components: general
>    Affects Versions: 1.0
>         Environment: Windows-7 + JDK 1.6 u26
>            Reporter: Albert L.
>             Fix For: 1.1
>
>
> It would be nice to be able to disable text content from embedded files.
> For example, if I have a DOCX with an embedded PPTX, then I would like the 
> option to disable text from the PPTX from showing up when asking for the text 
> content from DOCX.  In other words, it would be nice to have the option to 
> get text content *only* from the DOCX instead of the DOCX+PPTX.


        
