[ https://issues.apache.org/jira/browse/TIKA-1232?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Tyler Palsulich updated TIKA-1232:
----------------------------------
    Attachment: testComment.pdf

I just tried to remove the testMetadataEquality workaround for [PDFBOX-1922|https://issues.apache.org/jira/browse/PDFBOX-1922] in PDFParserTest#testSequentialParser, but testComment.pdf (attached and in test-docs) is returning different PDF versions for the default and nonSequential parsers. I made a comment on the PDFBOX issue.

For now, I think we should reopen this issue. I'm not sure what exactly is causing the problem (header/trailer thing, or something else?).

> Add PDF version to PDFParser output
> -----------------------------------
>
>                 Key: TIKA-1232
>                 URL: https://issues.apache.org/jira/browse/TIKA-1232
>             Project: Tika
>          Issue Type: Improvement
>          Components: parser
>    Affects Versions: 1.5
>         Environment: JDK6
>            Reporter: William Palmer
>            Assignee: Tim Allison
>            Priority: Minor
>         Attachments: Sample 10.x.pdf, Sample 11.x PDFA-1b.pdf, Sample 4.x.pdf,
>                      Sample 5.x.pdf, Sample 6.x.pdf, Sample 7.x.pdf, Sample 8.x.pdf,
>                      Sample 9.x.pdf, TIKA-1232v1.patch, TIKA-1232v2.patch,
>                      pdfversion.patch, testComment.pdf
>
>
> I'd like to identify the PDF version of files, this is not currently reported
> by the PDFParser although the information is available via PDFBox. I have
> attached a patch that adds the format version to the Metadata object.
> However, I am not familiar enough with the Tika source to know if an
> alternative metadata key should be used, or this new one added.
> Comments welcome.

--
This message was sent by Atlassian JIRA
(v6.2#6252)
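
On the "header/trailer thing" question in the comment: a PDF can declare its version in two places, the "%PDF-x.y" file header and, from PDF 1.4 onward, an optional /Version entry in the document catalog. Two parsers that consult different sources could therefore report different versions for the same file. Whether that is what happens with testComment.pdf is not established here. The sketch below is a rough stdlib-only check, not Tika or PDFBox code; the class and method names are hypothetical, and the plain byte scan is an assumption that will miss a catalog stored inside a compressed object stream.

```java
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not part of Tika or PDFBox.
class PdfVersionSketch {
    // Header form, e.g. "%PDF-1.4", expected near the start of the file.
    private static final Pattern HEADER = Pattern.compile("%PDF-(\\d\\.\\d)");
    // Catalog form, e.g. "/Version /1.6" (the value is a PDF name object).
    private static final Pattern CATALOG = Pattern.compile("/Version\\s*/(\\d\\.\\d)");

    // Version declared in the file header, searched in the first 1024 bytes.
    static Optional<String> headerVersion(byte[] pdf) {
        int len = Math.min(pdf.length, 1024);
        String head = new String(pdf, 0, len, StandardCharsets.ISO_8859_1);
        Matcher m = HEADER.matcher(head);
        return m.find() ? Optional.of(m.group(1)) : Optional.empty();
    }

    // Last uncompressed "/Version /x.y" entry found anywhere in the file.
    // Taking the last match is a naive stand-in for following incremental updates.
    static Optional<String> catalogVersion(byte[] pdf) {
        String body = new String(pdf, StandardCharsets.ISO_8859_1);
        Matcher m = CATALOG.matcher(body);
        String last = null;
        while (m.find()) {
            last = m.group(1);
        }
        return Optional.ofNullable(last);
    }

    public static void main(String[] args) throws Exception {
        if (args.length != 1) {
            System.out.println("usage: java PdfVersionSketch <file.pdf>");
            return;
        }
        byte[] pdf = Files.readAllBytes(Paths.get(args[0]));
        System.out.println("header version:  " + headerVersion(pdf).orElse("(none found)"));
        System.out.println("catalog version: " + catalogVersion(pdf).orElse("(none found)"));
    }
}
```

Running this against testComment.pdf and seeing two different values would point toward the header versus catalog explanation; matching values, or no catalog entry, would suggest the difference comes from somewhere else.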