[ https://issues.apache.org/jira/browse/PDFBOX-5025?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17908676#comment-17908676 ]
ASF subversion and git services commented on PDFBOX-5025:
---------------------------------------------------------
Commit 1922732 from Tilman Hausherr in branch 'pdfbox/trunk'
[ https://svn.apache.org/r1922732 ]
PDFBOX-5025: Unread trailing e in numbers, by Cody Wayne Holmes; closes #91
> BaseParser fails when a number is followed by a string starting with 'e'
> ------------------------------------------------------------------------
>
> Key: PDFBOX-5025
> URL: https://issues.apache.org/jira/browse/PDFBOX-5025
> Project: PDFBox
> Issue Type: Bug
> Components: Parsing
> Affects Versions: 2.0.21, 2.0.32, 3.0.3 PDFBox
> Reporter: Cody Wayne Holmes
> Assignee: Tilman Hausherr
> Priority: Major
> Attachments: issue2931.pdf, issue3323.pdf
>
>
> In the latest version of PDFBox, parsing fails in BaseParser when
> `parseDirObject` parses a number and the text immediately following it
> starts with an 'e'.
>
> The number parser accepts 'e' and 'E' so that it can read numbers written in
> scientific notation. When a number is followed directly by the `endobj`
> keyword, the leading 'e' of the keyword is consumed as part of the number.
> Such PDFs are invalid because they lack a newline between the number and
> `endobj`, but the failure can be prevented.
>
> One fix that seems to work is to check whether the last character of the
> number string that was read is an 'e' or 'E'. If it is, removing it from the
> string and unreading it back to the source allows parsing to complete as
> expected.
>
> {code:java}
> private COSNumber parseCOSNumber() throws IOException
> {
>     ...
>     // A trailing 'e' or 'E' is not part of the number: drop it from the
>     // buffer and unread it so that the next token (e.g. endobj) stays intact.
>     char lastc = buf.charAt(buf.length() - 1);
>     if (lastc == 'e' || lastc == 'E')
>     {
>         buf.deleteCharAt(buf.length() - 1);
>         seqSource.unread(lastc);
>     }
>     return COSNumber.get(buf.toString());
> }
> {code}
>
> An example of this error can be seen in PDF.js issue3323.
> [https://github.com/mozilla/pdf.js/blob/4ba28de2608866dcb10d627d77dc19ff3d017c17/test/pdfs/issue3323.pdf]
> More PDFs that show the problem are attached to this issue.
>
>
> [https://github.com/mozilla/pdf.js/commit/26f5b1b2d37c7b74a073dee75d66fcc04fae10e8]
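
The "unread the trailing e" idea from the description can be tried outside PDFBox. The following is a minimal sketch only, not the committed fix: it uses java.io.PushbackReader in place of PDFBox's source class, and the names TrailingExponentDemo, isNumberChar, readNumberToken and readRest are hypothetical. It also assumes that the number reader accepts digits, signs, '.', 'e' and 'E', which may not match BaseParser exactly.

```java
import java.io.IOException;
import java.io.PushbackReader;
import java.io.StringReader;

// Hypothetical stand-alone demo of the idea; none of these names come from PDFBox.
class TrailingExponentDemo
{
    // Characters a lenient number reader accepts (an assumption for this sketch).
    static boolean isNumberChar(int c)
    {
        return (c >= '0' && c <= '9') || c == '+' || c == '-' || c == '.'
                || c == 'e' || c == 'E';
    }

    // Reads a run of number characters, then gives back a dangling 'e' or 'E'.
    static String readNumberToken(PushbackReader in) throws IOException
    {
        StringBuilder buf = new StringBuilder();
        int c;
        while ((c = in.read()) != -1 && isNumberChar(c))
        {
            buf.append((char) c);
        }
        if (c != -1)
        {
            // First character that does not belong to the number.
            in.unread(c);
        }
        if (buf.length() > 0)
        {
            char last = buf.charAt(buf.length() - 1);
            if (last == 'e' || last == 'E')
            {
                // The 'e' starts the next token (e.g. "endobj"), so hand it back.
                buf.deleteCharAt(buf.length() - 1);
                in.unread(last);
            }
        }
        return buf.toString();
    }

    // Returns everything that is still unread, to show what the next token sees.
    static String readRest(PushbackReader in) throws IOException
    {
        StringBuilder rest = new StringBuilder();
        int c;
        while ((c = in.read()) != -1)
        {
            rest.append((char) c);
        }
        return rest.toString();
    }

    public static void main(String[] args) throws IOException
    {
        // Pushback capacity 2: one terminating character plus the trailing 'e'.
        PushbackReader in = new PushbackReader(new StringReader("12endobj"), 2);
        System.out.println(readNumberToken(in)); // prints "12"
        System.out.println(readRest(in));        // prints "endobj"
    }
}
```

A well-formed exponent such as "1.5e3" is left alone because its last character is a digit. Only an 'e' or 'E' in the final position is handed back, so this sketch does not cover input where a sign follows the stray 'e'.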