[ https://issues.apache.org/jira/browse/TIKA-1294?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14010531#comment-14010531 ]
Hudson commented on TIKA-1294:
------------------------------
SUCCESS: Integrated in tika-trunk-jdk1.7 #5 (See [https://builds.apache.org/job/tika-trunk-jdk1.7/5/])
TIKA-1294 add ability to turn off image extraction from PDFs (tallison: http://svn.apache.org/viewvc/tika/trunk/?view=rev&rev=1597856)
* /tika/trunk/CHANGES.txt
* /tika/trunk/tika-core/src/main/java/org/apache/tika/metadata/TikaCoreProperties.java
* /tika/trunk/tika-core/src/main/java/org/apache/tika/metadata/TikaMetadataKeys.java
* /tika/trunk/tika-parsers/src/main/java/org/apache/tika/parser/pdf/PDF2XHTML.java
* /tika/trunk/tika-parsers/src/main/java/org/apache/tika/parser/pdf/PDFParser.java
* /tika/trunk/tika-parsers/src/main/java/org/apache/tika/parser/pdf/PDFParserConfig.java
* /tika/trunk/tika-parsers/src/main/resources/org/apache/tika/parser/pdf/PDFParser.properties
* /tika/trunk/tika-parsers/src/test/java/org/apache/tika/TikaTest.java
* /tika/trunk/tika-parsers/src/test/java/org/apache/tika/parser/pdf/PDFParserTest.java
> Add ability to turn off extraction of PDXObjectImages (TIKA-1268) from PDFs
> ---------------------------------------------------------------------------
>
> Key: TIKA-1294
> URL: https://issues.apache.org/jira/browse/TIKA-1294
> Project: Tika
> Issue Type: Improvement
> Reporter: Tim Allison
> Assignee: Tim Allison
> Priority: Trivial
> Fix For: 1.6
>
> Attachments: TIKA-1294.patch, TIKA-1294v1.patch
>
>
> TIKA-1268 added the capability to extract embedded images as regular embedded
> resources... a great feature!
> However, for some use cases, it might not be desirable to extract those types
> of embedded resources. I see two ways of allowing the client to choose
> whether or not to extract those images:
> 1) set a value in the metadata for the extracted images that identifies them
> as embedded PDXObjectImages vs regular image attachments. The client can
> then choose not to process embedded resources with a given metadata value.
> 2) allow the client to set a parameter in the PDFParserConfig object.
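A minimal sketch of option 1, under stated assumptions: the parser tags each image it pulls out of page content with a metadata marker, and the client skips resources carrying that marker. The key `embedded-resource-origin`, the value `inline-image`, and the `Resource` type are hypothetical stand-ins invented for illustration; none of them is defined by Tika.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

public class MetadataMarkerSketch {

    // Hypothetical metadata key/value; NOT part of Tika's metadata keys.
    static final String ORIGIN_KEY = "embedded-resource-origin";
    static final String INLINE_IMAGE = "inline-image";

    /** Hypothetical stand-in for an embedded resource: a name plus its metadata. */
    record Resource(String name, Map<String, String> metadata) {}

    /** Client-side filter: keep every resource except those marked as inline images. */
    static List<String> selectForProcessing(List<Resource> resources) {
        List<String> selected = new ArrayList<>();
        for (Resource r : resources) {
            // A missing key yields null, so unmarked resources are kept.
            if (!INLINE_IMAGE.equals(r.metadata().get(ORIGIN_KEY))) {
                selected.add(r.name());
            }
        }
        return selected;
    }

    public static void main(String[] args) {
        List<Resource> resources = List.of(
                new Resource("attachment.png", Map.of()),
                new Resource("image0.jpg", Map.of(ORIGIN_KEY, INLINE_IMAGE)));
        System.out.println(selectForProcessing(resources)); // [attachment.png]
    }
}
```

In this design the parser still does the work of extracting every image; the client only decides what to do with each one afterwards.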
> My initial proposal is to go with option 2, and I'll attach a patch shortly.
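A minimal sketch of option 2, under stated assumptions: a boolean on the parser's configuration object decides whether images found in page content are emitted as embedded resources at all. `HypotheticalPdfConfig`, its `extractInlineImages` field, and `emitEmbedded` are illustrative names, not the committed code. The flag is passed to the constructor so the sketch makes no claim about the real default.

```java
import java.util.ArrayList;
import java.util.List;

public class ConfigFlagSketch {

    /** Hypothetical stand-in for a parser config object such as PDFParserConfig. */
    static class HypotheticalPdfConfig {
        private boolean extractInlineImages;

        HypotheticalPdfConfig(boolean extractInlineImages) {
            this.extractInlineImages = extractInlineImages;
        }

        void setExtractInlineImages(boolean value) { this.extractInlineImages = value; }

        boolean getExtractInlineImages() { return extractInlineImages; }
    }

    /**
     * Simulates the parser's embedded-resource pass: regular attachments are
     * always emitted, images from page content only when the flag is on.
     */
    static List<String> emitEmbedded(List<String> attachments, List<String> inlineImages,
                                     HypotheticalPdfConfig config) {
        List<String> emitted = new ArrayList<>(attachments);
        if (config.getExtractInlineImages()) {
            emitted.addAll(inlineImages);
        }
        return emitted;
    }

    public static void main(String[] args) {
        HypotheticalPdfConfig config = new HypotheticalPdfConfig(false);
        List<String> attachments = List.of("attachment.png");
        List<String> images = List.of("image0.jpg", "image1.jpg");
        System.out.println(emitEmbedded(attachments, images, config)); // [attachment.png]
        config.setExtractInlineImages(true);
        System.out.println(emitEmbedded(attachments, images, config)); // [attachment.png, image0.jpg, image1.jpg]
    }
}
```

Unlike option 1, skipped images are never handed to the client. In released Tika versions the corresponding setting appears to be `PDFParserConfig#setExtractInlineImages(boolean)`, with the config passed to the parser through `ParseContext`; check the Javadoc for the version in use.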