PDF loader causes infinite loop on non-PDF inputs
-------------------------------------------------
Key: PDFBOX-503
URL: https://issues.apache.org/jira/browse/PDFBOX-503
Project: PDFBox
Issue Type: Bug
Components: Parsing
Affects Versions: 0.8.0-incubator
Reporter: Dave Engberg

The current SVN head of the PDFBox incubator enters an infinite loop in
PDFParser.parseHeader() when the parser is fed any non-PDF document.
parseHeader() looks for the PDF header within the document by skipping over
any non-matching lines that don't start with a numeric digit. It relies on the
readLine() function from BaseParser.java, which returns an empty string when
the stream is at end-of-file. Once the input is exhausted, every call returns
another empty line, so parseHeader() loops on these empty lines forever.
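
The failure mode can be reproduced outside PDFBox with a small, simplified sketch. Everything below is a hypothetical stand-in rather than PDFBox code: EofLoopDemo, its readLine() and searchHeader() are illustrative names, the header test is reduced to looking for "%PDF-", and the maxReads cap exists only so that the demo terminates.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;

// Hypothetical stand-in for the behaviour described in the report; not PDFBox code.
class EofLoopDemo
{
    // Mimics a readLine() that returns "" once the stream is exhausted.
    static String readLine( InputStream in ) throws IOException
    {
        StringBuilder buffer = new StringBuilder();
        int c;
        while ((c = in.read()) != -1)
        {
            if (c == '\n')
            {
                break;
            }
            buffer.append( (char) c );
        }
        return buffer.toString();
    }

    // Simplified header search. Returns -1 if a header line was found,
    // otherwise the number of reads performed before hitting the cap.
    // The cap (maxReads) is an assumption added so this demo terminates.
    static int searchHeader( String input, int maxReads ) throws IOException
    {
        InputStream in = new ByteArrayInputStream(
            input.getBytes( StandardCharsets.ISO_8859_1 ) );
        int reads = 0;
        while (reads < maxReads)
        {
            String line = readLine( in );
            reads++;
            if (line.contains( "%PDF-" ))
            {
                return -1;
            }
            // At end-of-file, line is "" on every call, so without the cap
            // nothing would ever end this loop.
        }
        return reads;
    }

    public static void main( String[] args ) throws IOException
    {
        // Two real lines, then empty strings until the cap is reached.
        System.out.println( searchHeader( "plain text\nno header here\n", 1000 ) );
    }
}
```

With a non-PDF input the search uses up its whole read budget; without the cap it would never return.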

I've patched this in our system by throwing an IOException from
BaseParser.readLine() if the stream is already at the end-of-file at the
beginning of that call.

Index: src/main/java/org/apache/pdfbox/pdfparser/BaseParser.java
===================================================================
--- src/main/java/org/apache/pdfbox/pdfparser/BaseParser.java (revision 802578)
+++ src/main/java/org/apache/pdfbox/pdfparser/BaseParser.java (working copy)
@@ -1088,6 +1088,11 @@
     {
         StringBuffer buffer = new StringBuffer( 11 );
 
+        if (pdfSource.isEOF())
+        {
+            throw new IOException( "Error: End-of-File, expected line");
+        }
+
         int c;
         while ((c = pdfSource.read()) != -1)
         {
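
The effect of the guard can be sketched in isolation. This is a minimal illustration under stated assumptions, not the PDFBox implementation: EofGuardDemo, its isEOF() helper (built on java.io.PushbackInputStream as a stand-in for pdfSource.isEOF()) and findHeader() are hypothetical names, and the header test is again reduced to looking for "%PDF-".

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.PushbackInputStream;
import java.nio.charset.StandardCharsets;

// Hypothetical sketch of the patched behaviour; not PDFBox code.
class EofGuardDemo
{
    // Stand-in for pdfSource.isEOF(): peek one byte and push it back.
    static boolean isEOF( PushbackInputStream in ) throws IOException
    {
        int c = in.read();
        if (c == -1)
        {
            return true;
        }
        in.unread( c );
        return false;
    }

    // readLine() with the guard from the patch: fail fast at end-of-file
    // instead of returning an empty string.
    static String readLine( PushbackInputStream in ) throws IOException
    {
        if (isEOF( in ))
        {
            throw new IOException( "Error: End-of-File, expected line" );
        }
        StringBuilder buffer = new StringBuilder();
        int c;
        while ((c = in.read()) != -1)
        {
            if (c == '\n')
            {
                break;
            }
            buffer.append( (char) c );
        }
        return buffer.toString();
    }

    // Simplified header search with no iteration cap. It returns true when a
    // header line is found and otherwise ends with the IOException thrown by
    // readLine() once the input runs out.
    static boolean findHeader( String input ) throws IOException
    {
        PushbackInputStream in = new PushbackInputStream(
            new ByteArrayInputStream( input.getBytes( StandardCharsets.ISO_8859_1 ) ) );
        while (true)
        {
            if (readLine( in ).contains( "%PDF-" ))
            {
                return true;
            }
        }
    }

    public static void main( String[] args )
    {
        try
        {
            findHeader( "not a pdf\n" );
        }
        catch (IOException e)
        {
            System.out.println( e.getMessage() );
        }
    }
}
```

Because the exception propagates out of the search loop, a non-PDF input now produces an error instead of a hang. Callers of readLine() that previously relied on an empty string at end-of-file would need to handle the exception.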