[jira] Commented: (PDFBOX-813) ClassCastException: COSInteger cannot be cast to COSDictionary

Adam Nichols (JIRA) Mon, 27 Sep 2010 15:02:03 -0700

    [ https://issues.apache.org/jira/browse/PDFBOX-813?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12915503#action_12915503 ]


Adam Nichols commented on PDFBOX-813:
-------------------------------------

Well, I can tell you the reason it can't be parsed: it's not a valid PDF.  If 
you open it and look at the bottom, you'll find that the trailer looks like 
this:
trailer
<<
/Size 41
/Root 2

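There's not even a newline or carriage return after that last "2".  The 
trailer dictionary is never closed with ">>", and /Root is left as a bare "2" 
where the specification requires an indirect reference (such as "2 0 R"), 
which is presumably where the COSInteger in your ClassCastException comes 
from.  Since this does not conform to Adobe's PDF specification, the way it 
should be handled is undefined, so throwing an exception is not unreasonable.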
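
For anyone who wants to screen files for this kind of damage before handing 
them to a parser, here is a rough sketch of the check described above.  It is 
not PDFBox code; the class name TrailerCheck and the method findProblem are 
made up for illustration.  It assumes the file uses a classic "trailer" 
keyword (PDFs that use cross-reference streams have none), and it does not 
handle nested dictionaries inside the trailer.

```java
import java.util.regex.Pattern;

// Hypothetical helper, not part of PDFBox: sanity-checks the tail of a PDF.
class TrailerCheck {
    // /Root must be an indirect reference such as "2 0 R", not a bare integer.
    private static final Pattern ROOT_REF =
            Pattern.compile("/Root\\s+\\d+\\s+\\d+\\s+R\\b");

    /** Returns a description of the first problem found, or null if none. */
    static String findProblem(String tail) {
        int start = tail.lastIndexOf("trailer");
        if (start < 0) {
            return "no trailer keyword";
        }
        String trailer = tail.substring(start);
        int open = trailer.indexOf("<<");
        if (open < 0) {
            return "trailer dictionary never opens";
        }
        if (trailer.indexOf(">>", open) < 0) {
            return "trailer dictionary never closes";
        }
        if (!ROOT_REF.matcher(trailer).find()) {
            return "/Root is not an indirect reference";
        }
        if (!trailer.contains("%%EOF")) {
            return "missing %%EOF marker";
        }
        return null;
    }

    public static void main(String[] args) {
        // The truncated trailer quoted in this comment.
        System.out.println(findProblem("trailer\n<<\n/Size 41\n/Root 2"));
        // prints "trailer dictionary never closes"
    }
}
```

In practice the "tail" argument could be the last kilobyte or so of the file 
decoded as ISO-8859-1; that choice is an assumption of this sketch, not 
something the parser does.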

However, what is interesting is that if you replace PDDocument.load(inputpath, 
true); with PDDocument.load(inputpath); or PDDocument.load(inputpath, false); 
the exception is not thrown!  I find this most interesting because force is 
only passed into the parser object, where it's used just once, and there it 
seems to be used to prevent an exception from being thrown.

I looked into this a little further and found that if forceParsing is false, 
the exception your PDF throws is an IOException, and it's caught and basically 
ignored by code which handles invalid PDFs that have random data after the EOF 
marker.  If you are blindly loading a document (i.e. forcing the load), and 
that document is corrupt, you can't expect that enough information was read to 
use it properly.

My suggestion would be to load documents without the force option, accept that 
there are some non-conforming PDFs which cannot be parsed, and have your code 
handle that accordingly.  This message will hit the developers mailing list 
and we will discuss the possibility of deprecating the force option on the 
load() method.  While it may have been accurate when it was first introduced, 
I feel that it's misleading now that we handle so many different things which 
are out-of-spec.
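
To illustrate what "handle that accordingly" could look like in calling code, 
here is a minimal sketch.  To keep it self-contained it does not depend on 
PDFBox: DocumentLoader is a hypothetical stand-in for a call such as 
PDDocument.load(path) without the force argument, and SafeLoad and tryLoad are 
likewise invented names.  It also catches RuntimeException, on the assumption 
that a damaged file may surface as something like the ClassCastException in 
this report rather than as an IOException.

```java
import java.io.IOException;
import java.util.Optional;

// Hypothetical wrapper, not part of PDFBox.
class SafeLoad {
    /** Stand-in for a loader such as PDDocument.load(path). */
    interface DocumentLoader<T> {
        T load(String path) throws IOException;
    }

    /** Returns the loaded document, or empty if the file could not be parsed. */
    static <T> Optional<T> tryLoad(DocumentLoader<T> loader, String path) {
        try {
            return Optional.ofNullable(loader.load(path));
        } catch (IOException | RuntimeException e) {
            // Non-conforming PDF: report it and let the caller move on.
            System.err.println("Skipping unreadable PDF " + path + ": " + e);
            return Optional.empty();
        }
    }

    public static void main(String[] args) {
        // Simulate a loader that rejects a corrupt file.
        Optional<String> doc = SafeLoad.<String>tryLoad(path -> {
            throw new IOException("Error: Expected an integer type, actual='bj'");
        }, "broken.pdf");
        System.out.println(doc.isPresent() ? "loaded" : "skipped");
        // prints "skipped"
    }
}
```

The point of returning an empty result instead of rethrowing is that a batch 
job can log the bad file and continue with the next one.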

> ClassCastException: COSInteger cannot be cast to COSDictionary
> --------------------------------------------------------------
>
>                 Key: PDFBOX-813
>                 URL: https://issues.apache.org/jira/browse/PDFBOX-813
>             Project: PDFBox
>          Issue Type: Bug
>          Components: Parsing
>    Affects Versions: 1.2.1, 1.3.0
>         Environment: Windows XP
> java version "1.6.0_12"
> Java(TM) SE Runtime Environment (build 1.6.0_12-b04)
> Java HotSpot(TM) Client VM (build 11.2-b01, mixed mode, sharing)
>            Reporter: CP
>            Priority: Critical
>         Attachments: CancerSummReport_34914.pdf, PDFBoxBug.java
>
>
> I get the below exceptions when calling 
> pdfDoc.getDocumentCatalog().getAllPages(). The code continues after the first 
> exception because I've called 
> PDDocument.load("C:/CancerSummReport_34914.pdf", true)  setting the load 
> "force" param to true. The second exception causes the code to abort.
> (I will try uploading the PDF that causes this problem)
> 2010-09-02 16:47:47,521 [main] WARN  (PDFParser.java:189) - Parsing Error, 
> Skipping Object
> java.io.IOException: Error: Expected an integer type, actual='bj'
>       at org.apache.pdfbox.pdfparser.BaseParser.readInt(BaseParser.java:1310)
>       at org.apache.pdfbox.pdfparser.PDFParser.parseObject(PDFParser.java:497)
>       at org.apache.pdfbox.pdfparser.PDFParser.parse(PDFParser.java:179)
>       at org.apache.pdfbox.pdmodel.PDDocument.load(PDDocument.java:878)
>       at org.apache.pdfbox.pdmodel.PDDocument.load(PDDocument.java:843)
>       at org.apache.pdfbox.pdmodel.PDDocument.load(PDDocument.java:768)
>       at com.xyz.framework.functionalTests.PDFBoxBug.main(PDFBoxBug.java:16)
> 2010-09-02 16:47:47,552 [main] WARN  (BaseParser.java:215) - Invalid 
> dictionary, found:? but expected:''
> Exception in thread "main" java.lang.ClassCastException: 
> org.apache.pdfbox.cos.COSInteger cannot be cast to 
> org.apache.pdfbox.cos.COSDictionary
>       at 
> org.apache.pdfbox.pdmodel.PDDocument.getDocumentCatalog(PDDocument.java:414)
>       at com.xyz.framework.functionalTests.PDFBoxBug.main(PDFBoxBug.java:18)

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
