[ https://issues.apache.org/jira/browse/PDFBOX-503?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12861966#action_12861966 ]
Adam Taft commented on PDFBOX-503:
----------------------------------
Could the IOException here carry a better message? This bug (and patch)
addresses the case where a non-PDF document is sent to the BaseParser. I ran
into exactly that: I was sending a non-PDF document to the parser (whoops),
and it dutifully reported the above error. I couldn't figure out what the
problem was, because I believed the input was a valid PDF.

Could the IOException message read something like: "Error: Input was EOF or
not a valid PDF type."? Hopefully this would help the end programmer
identify their mistake earlier.
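
To make the suggestion concrete, here is a minimal sketch of an end-of-file guard that carries the proposed wording. It is not the PDFBox code: the class EofMessageSketch and the methods requireMoreInput() and describe() are hypothetical names, and a java.io.PushbackInputStream stands in for pdfSource.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.PushbackInputStream;

class EofMessageSketch {
    // Wording proposed in the comment above.
    static final String MESSAGE = "Error: Input was EOF or not a valid PDF type.";

    // Hypothetical guard: peek one byte and fail with the descriptive message at EOF.
    static void requireMoreInput(PushbackInputStream in) throws IOException {
        int next = in.read();
        if (next == -1) {
            throw new IOException(MESSAGE);
        }
        in.unread(next); // put the byte back so the caller can still read it
    }

    // Returns "ok" if input remains, otherwise the message of the IOException.
    static String describe(byte[] data) {
        try (PushbackInputStream in =
                 new PushbackInputStream(new ByteArrayInputStream(data))) {
            requireMoreInput(in);
            return "ok";
        } catch (IOException e) {
            return e.getMessage();
        }
    }

    public static void main(String[] args) {
        System.out.println(describe(new byte[0]));
        System.out.println(describe(new byte[] {'%'}));
    }
}
```

A caller that sees this message has a direct hint to check the type of the input, not only whether the stream was truncated.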
> PDF loader causes infinite loop on non-PDF inputs
> -------------------------------------------------
>
> Key: PDFBOX-503
> URL: https://issues.apache.org/jira/browse/PDFBOX-503
> Project: PDFBox
> Issue Type: Bug
> Components: Parsing
> Affects Versions: 0.8.0-incubator
> Reporter: Dave Engberg
> Fix For: 0.8.0-incubator
>
>
> The current SVN head for the pdfbox incubator enters an infinite loop in
> PDFParser.parseHeader() if you feed any non-PDF document to the parser. To
> find the PDF header within the document, parseHeader() skips over any
> non-matching lines which don't start with a numeric digit. It relies on
> BaseParser.readLine(), which returns an empty string once the stream is at
> end-of-file, so parseHeader() loops forever on these empty lines.
>
> I've patched this in our system by throwing an IOException from
> BaseParser.readLine() if the stream is already at end-of-file at the
> beginning of that call.
> Index: src/main/java/org/apache/pdfbox/pdfparser/BaseParser.java
> ===================================================================
> --- src/main/java/org/apache/pdfbox/pdfparser/BaseParser.java (revision 802578)
> +++ src/main/java/org/apache/pdfbox/pdfparser/BaseParser.java (working copy)
> @@ -1088,6 +1088,11 @@
>      {
>          StringBuffer buffer = new StringBuffer( 11 );
>
> +        if (pdfSource.isEOF())
> +        {
> +            throw new IOException( "Error: End-of-File, expected line");
> +        }
> +
>          int c;
>          while ((c = pdfSource.read()) != -1)
>          {
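
The failure mode and the effect of the guard can be reproduced outside PDFBox with a simplified sketch. Everything below is an assumption made for illustration: HeaderScanSketch, readLine(), and scan() are hypothetical names, a PushbackInputStream replaces pdfSource, the header test is reduced to a "%PDF-" prefix check, and a line cap stands in for the loop that would otherwise never end.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.PushbackInputStream;
import java.nio.charset.StandardCharsets;

class HeaderScanSketch {
    // Reads one line. With guardEof set, fails at end-of-file (as in the patch)
    // instead of returning an empty string.
    static String readLine(PushbackInputStream in, boolean guardEof) throws IOException {
        if (guardEof) {
            int peek = in.read();
            if (peek == -1) {
                throw new IOException("Error: End-of-File, expected line");
            }
            in.unread(peek);
        }
        StringBuilder buffer = new StringBuilder(11);
        int c;
        while ((c = in.read()) != -1 && c != '\n') {
            if (c != '\r') {
                buffer.append((char) c);
            }
        }
        return buffer.toString();
    }

    // Skips lines until one starts with "%PDF-". The maxLines cap exists only so
    // the unguarded case terminates in this demonstration.
    static String scan(String text, boolean guardEof, int maxLines) {
        byte[] data = text.getBytes(StandardCharsets.ISO_8859_1);
        PushbackInputStream in = new PushbackInputStream(new ByteArrayInputStream(data));
        try {
            for (int line = 1; line <= maxLines; line++) {
                if (readLine(in, guardEof).startsWith("%PDF-")) {
                    return "header at line " + line;
                }
            }
            return "gave up after " + maxLines + " lines";
        } catch (IOException e) {
            return "IOException: " + e.getMessage();
        }
    }

    public static void main(String[] args) {
        System.out.println(scan("junk\n%PDF-1.4\n", true, 1000));
        System.out.println(scan("not a pdf\n", false, 1000));
        System.out.println(scan("not a pdf\n", true, 1000));
    }
}
```

Without the guard, the scan keeps receiving empty strings after the last real line and only stops because of the cap. With the guard, the second readLine() call on the non-PDF input throws immediately.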