[ 
https://issues.apache.org/jira/browse/PDFBOX-1541?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13604439#comment-13604439
 ] 

Andreas Lehmkühler commented on PDFBOX-1541:
--------------------------------------------

The problem is the very last stream in the PDF, object number 29. The actual 
length of the stream doesn't match its declared length, which leads to an error 
whichever parser is used. All (?) PDFBox versions <= 1.6.0 work because they 
read until "endstream" instead of using the declared length of the stream. So 
should we bring back the old readuntilendstream method as a fallback for 
broken streams?
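
To make the proposed fallback concrete, here is a rough sketch in plain Java. 
It is not the PDFBox code: the class name LenientStreamReader and the method 
readStreamBody are made up for illustration, and it works on a byte array 
rather than on the parser's input stream. It trusts the declared length only 
if "endstream" really follows, and otherwise scans for the keyword.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

/** Hypothetical helper, not part of PDFBox: illustrates the fallback idea only. */
public class LenientStreamReader {
    private static final byte[] ENDSTREAM =
            "endstream".getBytes(StandardCharsets.ISO_8859_1);

    /**
     * Returns the stream body that starts at offset 'start' in 'data'.
     * Uses the declared length when it is plausible, otherwise reads
     * until the "endstream" keyword.
     */
    public static byte[] readStreamBody(byte[] data, int start, int declaredLength) {
        // Trust the declared length only if it stays inside the data
        // and the "endstream" keyword follows the body.
        if (declaredLength >= 0 && declaredLength <= data.length - start
                && endstreamFollows(data, start + declaredLength)) {
            return Arrays.copyOfRange(data, start, start + declaredLength);
        }
        // Fallback for broken streams: scan for the keyword.
        int idx = indexOf(data, ENDSTREAM, start);
        if (idx < 0) {
            throw new IllegalArgumentException("no 'endstream' keyword found");
        }
        // Drop a single end-of-line marker (LF, CR or CRLF) before the keyword.
        int bodyEnd = idx;
        if (bodyEnd > start && data[bodyEnd - 1] == '\n') {
            bodyEnd--;
        }
        if (bodyEnd > start && data[bodyEnd - 1] == '\r') {
            bodyEnd--;
        }
        return Arrays.copyOfRange(data, start, bodyEnd);
    }

    /** True if, after optional whitespace, "endstream" starts at 'pos'. */
    private static boolean endstreamFollows(byte[] data, int pos) {
        while (pos < data.length
                && (data[pos] == '\r' || data[pos] == '\n' || data[pos] == ' ')) {
            pos++;
        }
        return matchesAt(data, ENDSTREAM, pos);
    }

    /** Index of the first occurrence of 'pattern' at or after 'from', or -1. */
    private static int indexOf(byte[] data, byte[] pattern, int from) {
        for (int i = Math.max(from, 0); i <= data.length - pattern.length; i++) {
            if (matchesAt(data, pattern, i)) {
                return i;
            }
        }
        return -1;
    }

    private static boolean matchesAt(byte[] data, byte[] pattern, int pos) {
        if (pos < 0 || pos > data.length - pattern.length) {
            return false;
        }
        for (int j = 0; j < pattern.length; j++) {
            if (data[pos + j] != pattern[j]) {
                return false;
            }
        }
        return true;
    }
}
```

One caveat with this approach: binary stream data may itself contain the bytes 
"endstream", so the scan can stop too early. That is why the sketch uses it 
only when the declared length turns out to be wrong.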
                
> expected='endstream' actual='' failure to parse
> -----------------------------------------------
>
>                 Key: PDFBOX-1541
>                 URL: https://issues.apache.org/jira/browse/PDFBOX-1541
>             Project: PDFBox
>          Issue Type: Bug
>          Components: Text extraction
>    Affects Versions: 1.7.1
>         Environment: Ubuntu 12.04, JDK 1.7
>            Reporter: Jinder Aujla
>         Attachments: exporeal09_flyer_email3.pdf
>
>
> Following exception thrown when parsing attached PDF:
> Caused by: java.io.IOException: expected='endstream' actual='' org.apache.pdfbox.io.PushBackInputStream@2a789924
>       at org.apache.pdfbox.pdfparser.BaseParser.parseCOSStream(BaseParser.java:597)
>       at org.apache.pdfbox.pdfparser.PDFParser.parseObject(PDFParser.java:575)
>       at org.apache.pdfbox.pdfparser.PDFParser.parse(PDFParser.java:187)

