Harun Reşit Zafer created PDFBOX-3887:
-----------------------------------------

             Summary: Getting a "DataFormatException: invalid distance too far back" exception for the attached file
                 Key: PDFBOX-3887
                 URL: https://issues.apache.org/jira/browse/PDFBOX-3887
             Project: PDFBox
          Issue Type: Bug
          Components: Text extraction
    Affects Versions: 2.0.7
         Environment: Windows 10 64-bit, Ubuntu 14.04 64-bit. 

java version "1.8.0_141" 
Java(TM) SE Runtime Environment (build 1.8.0_141-b15) 
Java HotSpot(TM) 64-Bit Server VM (build 25.141-b15, mixed mode)
            Reporter: Harun Reşit Zafer
         Attachments: non-contract_00025.pdf

PDFBox throws the following exception while loading the attached file:

{code:java}
Caused by: java.io.IOException: java.util.zip.DataFormatException: invalid distance too far back
        at org.apache.pdfbox.filter.FlateFilter.decode(FlateFilter.java:82)
        at org.apache.pdfbox.cos.COSInputStream.create(COSInputStream.java:69)
        at org.apache.pdfbox.cos.COSStream.createInputStream(COSStream.java:162)
        at org.apache.pdfbox.pdfparser.PDFObjectStreamParser.<init>(PDFObjectStreamParser.java:55)
        at org.apache.pdfbox.pdfparser.COSParser.parseObjectStream(COSParser.java:847)
        at org.apache.pdfbox.pdfparser.COSParser.parseObjectDynamically(COSParser.java:753)
        at org.apache.pdfbox.pdfparser.COSParser.parseObjectDynamically(COSParser.java:678)
        at org.apache.pdfbox.pdfparser.COSParser.parseDictObjects(COSParser.java:638)
        at org.apache.pdfbox.pdfparser.PDFParser.initialParse(PDFParser.java:236)
        at org.apache.pdfbox.pdfparser.PDFParser.parse(PDFParser.java:271)
        at org.apache.pdfbox.pdmodel.PDDocument.load(PDDocument.java:984)
        at org.apache.pdfbox.pdmodel.PDDocument.load(PDDocument.java:940)
        at org.apache.pdfbox.pdmodel.PDDocument.load(PDDocument.java:888)
        at com.diligen.parser.pdf.PdfBoxHelper.getDocumentWithLineSegments(PdfBoxHelper.java:131)
        ... 7 more
Caused by: java.util.zip.DataFormatException: invalid distance too far back
        at java.util.zip.Inflater.inflateBytes(Native Method)
        at java.util.zip.Inflater.inflate(Inflater.java:259)
        at java.util.zip.Inflater.inflate(Inflater.java:280)
        at org.apache.pdfbox.filter.FlateFilter.decompress(FlateFilter.java:107)
        at org.apache.pdfbox.filter.FlateFilter.decode(FlateFilter.java:73)
        ... 20 more
{code}
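
The innermost frames of the trace are in java.util.zip.Inflater, so the failure can be illustrated without PDFBox. The following is a minimal sketch using only the JDK; it is not PDFBox's FlateFilter code. The class name InflateSketch, its methods, and the three-byte stream are assumptions made for illustration: the stream is a hand-built raw deflate block whose first symbol is a back-reference (length 3, distance 1) before any output exists, and it is not taken from the attached PDF. The strict method propagates the DataFormatException; the lenient one keeps whatever earlier inflate calls returned and stops at the error.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;
import java.util.zip.DataFormatException;
import java.util.zip.Deflater;
import java.util.zip.Inflater;

// Hypothetical helper class, for illustration only.
final class InflateSketch {

    // Raw deflate data: final block, fixed Huffman codes, then a match of
    // length 3 at distance 1 although nothing has been output yet.
    static final byte[] BAD_RAW_DEFLATE = {0x03, 0x02, 0x00};

    // Inflates into 'out'; bytes from successful inflate() calls stay in
    // 'out' even if a later call throws.
    private static void inflateInto(byte[] data, boolean nowrap, ByteArrayOutputStream out)
            throws DataFormatException {
        Inflater inflater = new Inflater(nowrap);
        try {
            inflater.setInput(data);
            byte[] buf = new byte[4096];
            while (!inflater.finished()) {
                int n = inflater.inflate(buf);
                if (n == 0) {
                    break; // input exhausted (truncated) or dictionary needed
                }
                out.write(buf, 0, n);
            }
        } finally {
            inflater.end(); // release native zlib resources
        }
    }

    // Strict variant: a corrupt stream surfaces as DataFormatException.
    static byte[] inflateStrict(byte[] data, boolean nowrap) throws DataFormatException {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        inflateInto(data, nowrap, out);
        return out.toByteArray();
    }

    // Lenient variant: returns the bytes decoded before the error, if any.
    static byte[] inflateLenient(byte[] data, boolean nowrap) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        try {
            inflateInto(data, nowrap, out);
        } catch (DataFormatException e) {
            // Stop at the corrupt position and keep the partial result.
        }
        return out.toByteArray();
    }

    // Compresses with the default zlib wrapper, used for a round-trip check.
    static byte[] deflate(byte[] data) {
        Deflater deflater = new Deflater();
        deflater.setInput(data);
        deflater.finish();
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buf = new byte[4096];
        while (!deflater.finished()) {
            out.write(buf, 0, deflater.deflate(buf));
        }
        deflater.end();
        return out.toByteArray();
    }

    public static void main(String[] args) throws Exception {
        byte[] text = "hello hello hello".getBytes(StandardCharsets.UTF_8);
        byte[] back = inflateStrict(deflate(text), false);
        System.out.println(new String(back, StandardCharsets.UTF_8));
        try {
            inflateStrict(BAD_RAW_DEFLATE, true);
        } catch (DataFormatException e) {
            System.out.println("strict: " + e.getMessage());
        }
        System.out.println("lenient: " + inflateLenient(BAD_RAW_DEFLATE, true).length + " bytes");
    }
}
```

On a stock JDK the strict call is expected to fail with the same kind of message as in the trace above. Whether a lenient fallback is acceptable depends on the caller: for an object stream, as in this trace, partially decoded data may still be unusable.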




