[
https://issues.apache.org/jira/browse/PDFBOX-4521?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16822623#comment-16822623
]
Oliver Mannion commented on PDFBOX-4521:
----------------------------------------
Thank you [~tilman] for a super fast fix!
I've verified org.apache.pdfbox:pdfbox:2.0.16-20190420.123304-37 from
[https://repository.apache.org/content/groups/snapshots/] can parse
[^Editathon_cheat_sheet_(EN)_MetaDefender.pdf]. We'll have to upgrade to that
version from pdfbox 2.0.6 (which we are still using because of
org.apache.tika:tika-parsers:1.16) or backport the fix if we can't upgrade.
I'll raise this with our vendor. I only had a quick look at the PDF spec: does
it consider a missing Info value in the file trailer to make the PDF invalid?
I'm still able to open it in a PDF viewer.
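For a backport, a minimal sketch of the sort of type check that avoids this cast is below. It is not the actual PDFBox fix and does not use PDFBox's real classes: CosObject, CosName, CosDictionary and both lookup methods are hypothetical stand-ins, and it assumes the parser ends up pairing /Info with the following name /ID, which is what the exception message suggests.

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class TrailerInfoGuard {
    // Hypothetical stand-ins for PDFBox's COS types (NOT the real classes).
    interface CosObject {}

    static final class CosName implements CosObject {
        final String name;
        CosName(String name) { this.name = name; }
    }

    static final class CosDictionary implements CosObject {
        final Map<String, CosObject> items = new LinkedHashMap<>();
    }

    // Unchecked lookup: casts whatever is stored under "Info", so a
    // non-dictionary value fails with a ClassCastException.
    static CosDictionary infoUnchecked(CosDictionary trailer) {
        return (CosDictionary) trailer.items.get("Info");
    }

    // Guarded lookup: returns the Info dictionary, or null when the entry
    // is missing or is not a dictionary.
    static CosDictionary infoGuarded(CosDictionary trailer) {
        CosObject value = trailer.items.get("Info");
        return (value instanceof CosDictionary) ? (CosDictionary) value : null;
    }

    // Models a trailer like << /Size 50/Root 8 0 R/Info /ID >>, under the
    // assumption that /Info ends up with the name /ID as its value.
    static CosDictionary brokenTrailer() {
        CosDictionary trailer = new CosDictionary();
        trailer.items.put("Info", new CosName("ID"));
        return trailer;
    }

    public static void main(String[] args) {
        CosDictionary trailer = brokenTrailer();
        try {
            infoUnchecked(trailer);
            System.out.println("unchecked: ok");
        } catch (ClassCastException e) {
            System.out.println("unchecked: ClassCastException");
        }
        System.out.println("guarded: "
                + (infoGuarded(trailer) == null ? "no Info dictionary" : "found"));
    }
}
```

With the guarded lookup, a caller can treat a malformed Info entry the same way as an absent one and carry on without document metadata.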
> Missing Info value from file trailer: org.apache.pdfbox.cos.COSName cannot be
> cast to org.apache.pdfbox.cos.COSDictionary
> -------------------------------------------------------------------------------------------------------------------------
>
> Key: PDFBOX-4521
> URL: https://issues.apache.org/jira/browse/PDFBOX-4521
> Project: PDFBox
> Issue Type: Bug
> Components: Parsing
> Affects Versions: 2.0.15
> Reporter: Oliver Mannion
> Assignee: Tilman Hausherr
> Priority: Major
> Fix For: 2.0.16, 3.0.0 PDFBox
>
> Attachments: Editathon_cheat_sheet_(EN).pdf,
> Editathon_cheat_sheet_(EN)_MetaDefender.pdf
>
> Original Estimate: 24h
> Remaining Estimate: 24h
>
> The following exception
> {code:java}
> Cause: java.lang.ClassCastException: org.apache.pdfbox.cos.COSName cannot be cast to org.apache.pdfbox.cos.COSDictionary
>   at org.apache.pdfbox.pdmodel.PDDocument.getDocumentInformation(PDDocument.java:740)
>   at org.apache.tika.parser.pdf.PDFParser.extractMetadata(PDFParser.java:242)
>   at org.apache.tika.parser.pdf.PDFParser.parse(PDFParser.java:154)
>   at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:280)
>   at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:280)
>   at org.apache.tika.parser.AutoDetectParser.parse(AutoDetectParser.java:135)
> {code}
> is thrown for PDF documents that have no value in the file trailer for the
> {{Info}} key, e.g.:
> {code:java}
> << /Size 50/Root 8 0 R/Info /ID >>
> {code}
> According to the [PDF spec|http://wwwimages.adobe.com/www.adobe.com/content/dam/acom/en/devnet/pdf/pdfs/pdf_reference_1-7.pdf],
> the {{Info}} key is optional. PDFBox correctly handles the case where the
> {{Info}} key is absent altogether, but in this case the key is present
> without a value.