[ https://issues.apache.org/jira/browse/TIKA-1294?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Tim Allison updated TIKA-1294: ------------------------------ Attachment: TIKA-1294.patch All feedback welcome. > Add ability to turn off extraction of PDXObjectImages (TIKA-1268) from PDFs > --------------------------------------------------------------------------- > > Key: TIKA-1294 > URL: https://issues.apache.org/jira/browse/TIKA-1294 > Project: Tika > Issue Type: Improvement > Reporter: Tim Allison > Priority: Trivial > Attachments: TIKA-1294.patch > > > TIKA-1268 added the capability to extract embedded images as regular embedded > resources...a great feature! > However, for some use cases, it might not be desirable to extract those types > of embedded resources. I see two ways of allowing the client to choose > whether or not to extract those images: > 1) set a value in the metadata for the extracted images that identifies them > as embedded PDXObjectImages vs regular image attachments. The client can > then choose not to process embedded resources with a given metadata value. > 2) allow the client to set a parameter in the PDFConfig object. > My initial proposal is to go with option 2, and I'll attach a patch shortly. -- This message was sent by Atlassian JIRA (v6.2#6252)