[
https://issues.apache.org/jira/browse/PDFBOX-504?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12745437#action_12745437
]
Chris Bowditch commented on PDFBOX-504:
---------------------------------------
Thanks Jeremias. Your explanation makes sense. I knew there must have been a
better way to fix this. I tried just about every encoding other than US-ASCII :)
> Can't Parse any PDF using IBM JDK
> ---------------------------------
>
> Key: PDFBOX-504
> URL: https://issues.apache.org/jira/browse/PDFBOX-504
> Project: PDFBox
> Issue Type: Bug
> Components: Parsing
> Affects Versions: 0.8.0-incubator
> Environment: RedHat Linux IBM JDK
> Reporter: Chris Bowditch
> Priority: Critical
> Attachments: ibm-parse-bug.patch, IBMJDKParseFix.diff, readable.pdf
>
>
> All PDFs that I have tried fail to parse using IBM JDK 1.5 on RedHat Linux.
> The error you receive is:
> Exception in thread "main" java.io.IOException: Error: Expected an integer type, actual='ãÏÓ'
> at org.apache.pdfbox.pdfparser.BaseParser.readInt(BaseParser.java:1220)
> at org.apache.pdfbox.pdfparser.PDFParser.parseObject(PDFParser.java:493)
> at org.apache.pdfbox.pdfparser.PDFParser.parse(PDFParser.java:172)
> at org.apache.pdfbox.pdmodel.PDDocument.load(PDDocument.java:736)
> at org.apache.pdfbox.pdmodel.PDDocument.load(PDDocument.java:704)
> at org.apache.pdfbox.PDFReader.parseDocument(PDFReader.java:323)
> at org.apache.pdfbox.PDFReader.openPDFFile(PDFReader.java:286)
> at org.apache.pdfbox.PDFReader.main(PDFReader.java:271)
> However, debugging shows that the underlying error is hidden:
> java.io.IOException: Error: Expected an integer type, actual='ãÏÓ'
> at org.apache.pdfbox.pdfparser.BaseParser.readInt(BaseParser.java:1220)
> at org.apache.pdfbox.pdfparser.PDFParser.parseObject(PDFParser.java:483)
> at org.apache.pdfbox.pdfparser.PDFParser.parse(PDFParser.java:172)
> at org.apache.pdfbox.pdmodel.PDDocument.load(PDDocument.java:736)
> at org.apache.pdfbox.pdmodel.PDDocument.load(PDDocument.java:704)
> at org.apache.pdfbox.PDFReader.parseDocument(PDFReader.java:323)
> at org.apache.pdfbox.PDFReader.openPDFFile(PDFReader.java:286)
> at org.apache.pdfbox.PDFReader.main(PDFReader.java:271)
> The characters shown in the hidden message occur at the start of most PDF
> files that I have checked:
> %PDF-1.4
> %âãÏÓ
> 6 0 obj
> <</Filter /FlateDecode
> /Length 489
> >>
> stream
> Tracing the code, I can see the problem lies in the skipToNextObject()
> method of the PDFParser class. This method is new since v0.7.4.
> The code converts an array of 16 bytes to a String. The bytes for âãÏÓ are
> negative numbers in both the Sun and IBM JDKs (Java bytes are signed), but
> whereas on the Sun JDK the String created from the byte array contains these
> characters, on the IBM JDK they are missing from the String. Because the
> String is then shorter than the 16 bytes that were read, the stream offset
> derived from it is incorrect.
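The byte/character mismatch described in the report can be reproduced in isolation. This is a minimal sketch, not PDFBox code: the class and method names are hypothetical, Java 7+ is assumed for StandardCharsets, and UTF-8 decoding with CodingErrorAction.IGNORE merely stands in for the IBM JDK's default-charset behaviour (the report does not say which charset was in effect there). It shows that the header marker bytes are negative as Java bytes, that a decoder which drops undecodable bytes returns fewer characters than bytes read, and that ISO-8859-1 always returns exactly one char per byte.

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.CharsetDecoder;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

public class ByteOffsetDemo {

    // Second line of the sample file, "%âãÏÓ" plus newline, assuming each
    // marker character is stored as one ISO-8859-1 byte (E2 E3 CF D3).
    static final byte[] HEADER_LINE = {
        '%', (byte) 0xE2, (byte) 0xE3, (byte) 0xCF, (byte) 0xD3, '\n'
    };

    // Java bytes are signed, so values above 0x7F print as negatives.
    static String signedValues(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            if (sb.length() > 0) {
                sb.append(' ');
            }
            sb.append(b); // widened to int, e.g. (byte) 0xE2 -> -30
        }
        return sb.toString();
    }

    // Hypothetical stand-in for a platform decoder that silently drops bytes
    // it cannot decode (UTF-8 + IGNORE is an assumption, not IBM JDK code).
    static String decodeDroppingMalformed(byte[] bytes) {
        CharsetDecoder decoder = StandardCharsets.UTF_8.newDecoder()
                .onMalformedInput(CodingErrorAction.IGNORE)
                .onUnmappableCharacter(CodingErrorAction.IGNORE);
        try {
            return decoder.decode(ByteBuffer.wrap(bytes)).toString();
        } catch (CharacterCodingException e) {
            // Not expected: both error actions are IGNORE.
            throw new IllegalStateException(e);
        }
    }

    // ISO-8859-1 maps each byte 0x00-0xFF to exactly one char, so the
    // String length always equals the number of bytes that were read.
    static String decodeOneCharPerByte(byte[] bytes) {
        return new String(bytes, StandardCharsets.ISO_8859_1);
    }

    public static void main(String[] args) {
        System.out.println(signedValues(HEADER_LINE));
        // prints: 37 -30 -29 -49 -45 10

        String lossy = decodeDroppingMalformed(HEADER_LINE);
        String exact = decodeOneCharPerByte(HEADER_LINE);
        System.out.println("bytes=" + HEADER_LINE.length
                + " lossy=" + lossy.length()
                + " exact=" + exact.length());
        // prints: bytes=6 lossy=2 exact=6

        // Advancing the stream by the lossy String's length undershoots by 4.
        System.out.println("offset error=" + (HEADER_LINE.length - lossy.length()));
    }
}
```

The property that matters is one char per byte: any decoding whose String length can differ from the number of bytes read makes an offset computed from that String unreliable.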