[ https://issues.apache.org/jira/browse/PDFBOX-1541?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13604439#comment-13604439 ]
Andreas Lehmkühler commented on PDFBOX-1541:
--------------------------------------------

The problem is the very last stream in the PDF, the one with object number 29. That stream's data does not have the length declared in its /Length entry, which leads to an error whichever parser is used. All (?) PDFBox versions <= 1.6.0 work, because they read until "endstream" instead of relying on the length of the stream.

So should we reintroduce the old readuntilendstream method as a fallback for broken streams?

> expected='endstream' actual='' failure to parse
> -----------------------------------------------
>
>                 Key: PDFBOX-1541
>                 URL: https://issues.apache.org/jira/browse/PDFBOX-1541
>             Project: PDFBox
>          Issue Type: Bug
>          Components: Text extraction
>    Affects Versions: 1.7.1
>         Environment: Ubuntu 12.04, JDK 1.7
>            Reporter: Jinder Aujla
>         Attachments: exporeal09_flyer_email3.pdf
>
>
> The following exception is thrown when parsing the attached PDF:
>
> Caused by: java.io.IOException: expected='endstream' actual='' org.apache.pdfbox.io.PushBackInputStream@2a789924
> 	at org.apache.pdfbox.pdfparser.BaseParser.parseCOSStream(BaseParser.java:597)
> 	at org.apache.pdfbox.pdfparser.PDFParser.parseObject(PDFParser.java:575)
> 	at org.apache.pdfbox.pdfparser.PDFParser.parse(PDFParser.java:187)
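To illustrate the proposed fallback, here is a minimal sketch that works on a plain byte array rather than PDFBox's own stream classes. It is not PDFBox code: the class `StreamDataReader` and its helpers are hypothetical names. It trusts /Length only when the keyword "endstream" actually follows the declared end of the data (after optional whitespace); otherwise it scans forward for "endstream", as the pre-1.7 parsers did, and trims the single EOL marker that the PDF format places before the keyword.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

/** Hypothetical helper, not part of PDFBox: extracts raw stream data with an "endstream" fallback. */
public final class StreamDataReader {

    private static final byte[] END_STREAM = "endstream".getBytes(StandardCharsets.US_ASCII);

    private StreamDataReader() {}

    /**
     * @param pdf            the whole file contents
     * @param start          offset of the first data byte (just after "stream" and its EOL)
     * @param declaredLength the value of the stream's /Length entry
     */
    public static byte[] readStreamData(byte[] pdf, int start, int declaredLength) {
        // 1. Trust /Length only if "endstream" really follows the declared end of the data.
        long end = (long) start + declaredLength;
        if (declaredLength >= 0 && end <= pdf.length && endstreamFollows(pdf, (int) end)) {
            return Arrays.copyOfRange(pdf, start, (int) end);
        }
        // 2. Broken /Length: scan forward for the keyword instead (heuristic; the
        //    byte sequence "endstream" could in principle occur inside binary data).
        int idx = indexOf(pdf, END_STREAM, start);
        if (idx < 0) {
            throw new IllegalArgumentException("no 'endstream' found after offset " + start);
        }
        // 3. Drop one EOL marker (CRLF, LF or CR) directly before "endstream".
        int dataEnd = idx;
        if (dataEnd > start && pdf[dataEnd - 1] == '\n') dataEnd--;
        if (dataEnd > start && pdf[dataEnd - 1] == '\r') dataEnd--;
        return Arrays.copyOfRange(pdf, start, dataEnd);
    }

    /** True if, after optional PDF whitespace, the bytes at pos spell "endstream". */
    private static boolean endstreamFollows(byte[] pdf, int pos) {
        while (pos < pdf.length && isPdfWhitespace(pdf[pos])) pos++;
        return matchesAt(pdf, END_STREAM, pos);
    }

    private static boolean matchesAt(byte[] data, byte[] pattern, int pos) {
        if (pos < 0 || pos + pattern.length > data.length) return false;
        for (int i = 0; i < pattern.length; i++) {
            if (data[pos + i] != pattern[i]) return false;
        }
        return true;
    }

    private static int indexOf(byte[] data, byte[] pattern, int from) {
        for (int i = Math.max(from, 0); i + pattern.length <= data.length; i++) {
            if (matchesAt(data, pattern, i)) return i;
        }
        return -1;
    }

    /** PDF whitespace: NUL, TAB, LF, FF, CR, SPACE. */
    private static boolean isPdfWhitespace(byte b) {
        return b == 0x00 || b == 0x09 || b == 0x0A || b == 0x0C || b == 0x0D || b == 0x20;
    }
}
```

For the input `stream\nHello\nendstream` with the data starting at offset 7, a correct /Length of 5 takes the fast path, while a wrong value such as 3 or 42 falls back to the scan; both yield the five bytes `Hello`. Checking for "endstream" before trusting /Length is what lets a too-short or too-long length be detected instead of surfacing as `expected='endstream' actual=''`.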