[ https://issues.apache.org/jira/browse/TIKA-1232?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13892149#comment-13892149 ]
Johan van der Knijff commented on TIKA-1232:
--------------------------------------------
One thing to watch out for is that PDF has two places where the version can be
defined: the file header and, from PDF 1.4 onward, the Version entry of the
catalog dictionary (referenced from the trailer). The two can differ, in which
case the catalog entry takes precedence if it is later than the header version.
See p. 39 of ISO 32000:
http://www.adobe.com/devnet/acrobat/pdfs/PDF32000_2008.pdf
On top of that, PDF 1.7 also adds Extension Levels (p. 108); maybe those should
be included as well?
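
To illustrate the precedence rule, here is a minimal Java sketch. It is not
taken from the patch, Tika, or PDFBox: the class and method names are
hypothetical, and it assumes the catalog version string has already been
extracted by whatever PDF library is in use. It also assumes my reading of
ISO 32000 is right, namely that the catalog Version only applies when it is
later than the header version.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper for illustration only; not part of Tika or PDFBox.
final class PdfVersionResolver {
    // Matches a header comment such as "%PDF-1.4".
    private static final Pattern HEADER = Pattern.compile("%PDF-(\\d+)\\.(\\d+)");

    /** Returns "major.minor" from a header line, or null if no header is found. */
    static String headerVersion(String firstLine) {
        Matcher m = HEADER.matcher(firstLine);
        return m.find() ? m.group(1) + "." + m.group(2) : null;
    }

    /**
     * Picks the version the document should be treated as conforming to.
     * Assumption: the catalog value wins only if it is later than the header.
     */
    static String effectiveVersion(String headerVersion, String catalogVersion) {
        if (catalogVersion == null) {
            return headerVersion;
        }
        if (headerVersion == null) {
            return catalogVersion;
        }
        return compare(catalogVersion, headerVersion) > 0 ? catalogVersion : headerVersion;
    }

    /** Compares two "major.minor" strings numerically, so "1.10" sorts after "1.9". */
    static int compare(String a, String b) {
        String[] pa = a.split("\\.");
        String[] pb = b.split("\\.");
        int major = Integer.compare(Integer.parseInt(pa[0]), Integer.parseInt(pb[0]));
        if (major != 0) {
            return major;
        }
        return Integer.compare(Integer.parseInt(pa[1]), Integer.parseInt(pb[1]));
    }
}
```

With this sketch, a file whose header says 1.4 and whose catalog says 1.6 would
be reported as 1.6, while a catalog value older than the header would be
ignored. Extension Levels are not handled here and would need a separate field.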
> Add PDF version to PDFParser output
> -----------------------------------
>
> Key: TIKA-1232
> URL: https://issues.apache.org/jira/browse/TIKA-1232
> Project: Tika
> Issue Type: Improvement
> Components: parser
> Affects Versions: 1.5
> Environment: JDK6
> Reporter: William Palmer
> Assignee: Tim Allison
> Priority: Minor
> Attachments: pdfversion.patch
>
>
> I'd like to identify the PDF version of files. This is not currently reported
> by the PDFParser, although the information is available via PDFBox. I have
> attached a patch that adds the format version to the Metadata object.
> However, I am not familiar enough with the Tika source to know if an
> alternative metadata key should be used, or this new one added.
> Comments welcome.