[ https://issues.apache.org/jira/browse/PDFBOX-3189?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15094640#comment-15094640 ]
Yauheni Salopiy commented on PDFBOX-3189:
-----------------------------------------

Hi [~tilman],

Thank you for your investigation. Yes, I see that at character offset 2515690 there are a lot of *NUL* bytes. Can we consider the document invalid? Either way, it would be nice if PDFBox were "lenient" in such cases: I can open this document without any issues in all the PDF readers I have (Adobe, the built-in Windows 8 reader, etc.).

Best regards,
Yauheni Salopiy

> java.io.IOException is thrown from both NonSequentialPDFParser and PDFParser
> ----------------------------------------------------------------------------
>
>                 Key: PDFBOX-3189
>                 URL: https://issues.apache.org/jira/browse/PDFBOX-3189
>             Project: PDFBox
>          Issue Type: Bug
>          Components: Text extraction
>    Affects Versions: 1.8.10
>            Reporter: Yauheni Salopiy
>         Attachments: PDFBOX-3189_StackTrace.txt, obannual35_2015.pdf
>
>
> When parsing a complex PDF document, both NonSequentialPDFParser and PDFParser
> throw java.io.IOException (with different causes).
> *NonSequentialPDFParser:*
> Caused by: java.io.*IOException*
>     at org.apache.pdfbox.filter.FlateFilter.decode(FlateFilter.java:109)
>     at org.apache.pdfbox.cos.COSStream.doDecode(COSStream.java:379)
>     at org.apache.pdfbox.cos.COSStream.doDecode(COSStream.java:291)
>     at org.apache.pdfbox.cos.COSStream.getUnfilteredStream(COSStream.java:225)
>     at org.apache.pdfbox.pdfparser.PDFObjectStreamParser.<init>(PDFObjectStreamParser.java:64)
>     at org.apache.pdfbox.pdfparser.NonSequentialPDFParser.parseObjectDynamically(NonSequentialPDFParser.java:1493)
> *PDFParser:*
> Caused by: java.io.*IOException*: Error: Expected a long type at offset 465,
> instead got
'163111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111' > at org.apache.pdfbox.pdfparser.BaseParser.readLong(BaseParser.java:1695) > at > org.apache.pdfbox.pdfparser.BaseParser.readObjectNumber(BaseParser.java:1623) > at > org.apache.pdfbox.pdfparser.PDFObjectStreamParser.parse(PDFObjectStreamParser.java:99) > at > 
org.apache.pdfbox.cos.COSDocument.dereferenceObjectStreams(COSDocument.java:683)
>     at org.apache.pdfbox.pdfparser.PDFParser.parse(PDFParser.java:255)
>
> Please see the attachments for the full stack traces of both cases and for the failing
> document.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
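To confirm the run of NUL bytes at offset 2515690 independently of PDFBox, a small standalone scan can help. The sketch below is a hypothetical diagnostic, not part of PDFBox or of this issue: the class name `NulRunScanner` and the 16-byte minimum run length are assumptions, and it assumes the reported "character offset" is a byte offset into obannual35_2015.pdf.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;

// Hypothetical diagnostic helper; not part of PDFBox.
class NulRunScanner {

    /**
     * Returns {start, length} for every run of at least minLength
     * consecutive 0x00 bytes in data, in file order.
     */
    static List<int[]> findNulRuns(byte[] data, int minLength) {
        List<int[]> runs = new ArrayList<>();
        int i = 0;
        while (i < data.length) {
            if (data[i] != 0) {
                i++;
                continue;
            }
            int start = i;
            // Advance to the first non-NUL byte (or the end of the data).
            while (i < data.length && data[i] == 0) {
                i++;
            }
            if (i - start >= minLength) {
                runs.add(new int[] {start, i - start});
            }
        }
        return runs;
    }

    public static void main(String[] args) throws IOException {
        // args[0]: path to the PDF to inspect, e.g. obannual35_2015.pdf
        byte[] data = Files.readAllBytes(Paths.get(args[0]));
        // Assumption: 16 consecutive NUL bytes counts as "a lot"; tune as needed.
        for (int[] run : findNulRuns(data, 16)) {
            System.out.printf("NUL run at byte offset %d, length %d%n", run[0], run[1]);
        }
    }
}
```

Run it as `java NulRunScanner obannual35_2015.pdf`; if the file matches the description above, one of the reported runs should cover offset 2515690. Reading the whole file into memory keeps the sketch short and is fine for a document of a few megabytes.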