Harun Reşit Zafer created PDFBOX-3887:
-----------------------------------------
Summary: Getting a "DataFormatException: invalid distance too far back" exception for the attached file
Key: PDFBOX-3887
URL: https://issues.apache.org/jira/browse/PDFBOX-3887
Project: PDFBox
Issue Type: Bug
Components: Text extraction
Affects Versions: 2.0.7
Environment: Windows 10 64-bit, Ubuntu 14.04 64-bit.
java version "1.8.0_141"
Java(TM) SE Runtime Environment (build 1.8.0_141-b15)
Java HotSpot(TM) 64-Bit Server VM (build 25.141-b15, mixed mode)
Reporter: Harun Reşit Zafer
Attachments: non-contract_00025.pdf
PDFBox throws the following exception when loading the attached file:
{code:java}
Caused by: java.io.IOException: java.util.zip.DataFormatException: invalid distance too far back
    at org.apache.pdfbox.filter.FlateFilter.decode(FlateFilter.java:82)
    at org.apache.pdfbox.cos.COSInputStream.create(COSInputStream.java:69)
    at org.apache.pdfbox.cos.COSStream.createInputStream(COSStream.java:162)
    at org.apache.pdfbox.pdfparser.PDFObjectStreamParser.<init>(PDFObjectStreamParser.java:55)
    at org.apache.pdfbox.pdfparser.COSParser.parseObjectStream(COSParser.java:847)
    at org.apache.pdfbox.pdfparser.COSParser.parseObjectDynamically(COSParser.java:753)
    at org.apache.pdfbox.pdfparser.COSParser.parseObjectDynamically(COSParser.java:678)
    at org.apache.pdfbox.pdfparser.COSParser.parseDictObjects(COSParser.java:638)
    at org.apache.pdfbox.pdfparser.PDFParser.initialParse(PDFParser.java:236)
    at org.apache.pdfbox.pdfparser.PDFParser.parse(PDFParser.java:271)
    at org.apache.pdfbox.pdmodel.PDDocument.load(PDDocument.java:984)
    at org.apache.pdfbox.pdmodel.PDDocument.load(PDDocument.java:940)
    at org.apache.pdfbox.pdmodel.PDDocument.load(PDDocument.java:888)
    at com.diligen.parser.pdf.PdfBoxHelper.getDocumentWithLineSegments(PdfBoxHelper.java:131)
    ... 7 more
Caused by: java.util.zip.DataFormatException: invalid distance too far back
    at java.util.zip.Inflater.inflateBytes(Native Method)
    at java.util.zip.Inflater.inflate(Inflater.java:259)
    at java.util.zip.Inflater.inflate(Inflater.java:280)
    at org.apache.pdfbox.filter.FlateFilter.decompress(FlateFilter.java:107)
    at org.apache.pdfbox.filter.FlateFilter.decode(FlateFilter.java:73)
    ... 20 more
{code}
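For anyone without the attachment, the inner exception comes from java.util.zip.Inflater and can probably be reproduced without PDFBox. The sketch below is not PDFBox code: InflateErrorDemo and decode are hypothetical names, and the two-byte input is a hand-built raw deflate stream whose first back-reference points before the start of the output, which the inflater should reject. Raw mode (nowrap = true) is used only to keep that input short. Wrapping the DataFormatException in an IOException gives the same nested shape as the trace above.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.util.zip.DataFormatException;
import java.util.zip.Inflater;

public class InflateErrorDemo {

    // Hypothetical helper, not PDFBox's FlateFilter: inflates a raw deflate
    // stream and wraps any DataFormatException in an IOException.
    public static byte[] decode(byte[] compressed) throws IOException {
        Inflater inflater = new Inflater(true); // true = raw deflate, no zlib header
        inflater.setInput(compressed);
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buffer = new byte[2048];
        try {
            while (!inflater.finished()) {
                int n = inflater.inflate(buffer);
                if (n == 0 && (inflater.needsInput() || inflater.needsDictionary())) {
                    break; // truncated input: stop instead of looping forever
                }
                out.write(buffer, 0, n);
            }
        } catch (DataFormatException e) {
            // IOException(Throwable) uses the cause's toString() as its message
            throw new IOException(e);
        } finally {
            inflater.end();
        }
        return out.toByteArray();
    }

    public static void main(String[] args) {
        // Fixed-Huffman block: a length-3 match at distance 1 with no prior output.
        byte[] corrupt = {0x03, 0x02};
        try {
            decode(corrupt);
            System.out.println("decoded without error");
        } catch (IOException e) {
            System.out.println(e.getMessage());
        }
    }
}
```

If this behaves as expected, the failure is in the stream data itself (a damaged or truncated Flate stream in the object stream) rather than in the text extraction step, since the trace shows it is raised during PDDocument.load.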
--
This message was sent by Atlassian JIRA
(v6.4.14#64029)