Tim Allison created TIKA-1294:
---------------------------------

             Summary: Add ability to turn off extraction of PDXObjectImages (TIKA-1268) from PDFs
                 Key: TIKA-1294
                 URL: https://issues.apache.org/jira/browse/TIKA-1294
             Project: Tika
          Issue Type: Improvement
            Reporter: Tim Allison
            Priority: Trivial


TIKA-1268 added the capability to extract images embedded in PDFs as regular 
embedded resources...a great feature!

However, for some use cases, it might not be desirable to extract those 
images.  I see two ways of allowing the client to choose whether or not to 
extract them:

1) Set a value in the metadata for the extracted images that identifies them as 
embedded PDXObjectImages vs. regular image attachments.  The client can then 
choose not to process embedded resources with a given metadata value.

2) Allow the client to set a parameter in the PDFConfig object.

My initial proposal is to go with option 2, and I'll attach a patch shortly.
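
To make the difference between the two options concrete, here is a rough, self-contained sketch. Nothing in it is actual Tika code: SketchPdfConfig, SketchEmbeddedResource, SketchPdfExtractor, the "embeddedResourceType" metadata key and the setExtractXObjectImages name are all hypothetical stand-ins, and the eventual patch may look quite different.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for the PDF config object (not a real Tika class).
class SketchPdfConfig {
    // Option 2: a client-settable flag. Default true keeps the TIKA-1268 behavior.
    private boolean extractXObjectImages = true;

    boolean getExtractXObjectImages() { return extractXObjectImages; }

    void setExtractXObjectImages(boolean extract) { this.extractXObjectImages = extract; }
}

// Hypothetical stand-in for an embedded resource handed to the client.
class SketchEmbeddedResource {
    final String name;
    final Map<String, String> metadata = new HashMap<>();

    SketchEmbeddedResource(String name, String type) {
        this.name = name;
        // Option 1: tag each resource so the client can tell the kinds apart.
        metadata.put(SketchPdfExtractor.TYPE_KEY, type);
    }
}

// Hypothetical extractor that pretends every PDF holds one attachment and one image.
class SketchPdfExtractor {
    static final String TYPE_KEY = "embeddedResourceType"; // assumed key name
    static final String XOBJECT_IMAGE = "pdxobject-image"; // assumed value
    static final String ATTACHMENT = "attachment";         // assumed value

    // Option 2: the parser itself skips images when the config says so.
    static List<SketchEmbeddedResource> extract(SketchPdfConfig config) {
        List<SketchEmbeddedResource> out = new ArrayList<>();
        out.add(new SketchEmbeddedResource("report.xls", ATTACHMENT));
        if (config.getExtractXObjectImages()) {
            out.add(new SketchEmbeddedResource("image0.png", XOBJECT_IMAGE));
        }
        return out;
    }

    // Option 1: the parser extracts everything; the client filters on metadata.
    static List<SketchEmbeddedResource> withoutXObjectImages(List<SketchEmbeddedResource> all) {
        List<SketchEmbeddedResource> kept = new ArrayList<>();
        for (SketchEmbeddedResource r : all) {
            if (!XOBJECT_IMAGE.equals(r.metadata.get(TYPE_KEY))) {
                kept.add(r);
            }
        }
        return kept;
    }
}
```

With option 2 the images are never extracted in the first place, so the client pays no extraction cost for them.  With option 1 the images are still extracted and the client discards them afterwards, which is more flexible but does the work regardless.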



--
This message was sent by Atlassian JIRA
(v6.2#6252)
