[
https://issues.apache.org/jira/browse/PDFBOX-3887?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Tilman Hausherr resolved PDFBOX-3887.
-------------------------------------
Resolution: Fixed
Assignee: Tilman Hausherr
Fix Version/s: 2.0.8, 3.0.0
> Getting a "DataFormatException: invalid distance too far back" exception for
> the attached file
> ----------------------------------------------------------------------------------------------
>
> Key: PDFBOX-3887
> URL: https://issues.apache.org/jira/browse/PDFBOX-3887
> Project: PDFBox
> Issue Type: Bug
> Components: Text extraction
> Affects Versions: 2.0.7
> Environment: Windows 10 64-bit, Ubuntu 14.04 64-bit.
> java version "1.8.0_141"
> Java(TM) SE Runtime Environment (build 1.8.0_141-b15)
> Java HotSpot(TM) 64-Bit Server VM (build 25.141-b15, mixed mode)
> Reporter: Harun Reşit Zafer
> Assignee: Tilman Hausherr
> Labels: extraction, parsing
> Fix For: 2.0.8, 3.0.0
>
> Attachments: non-contract_00025.pdf
>
>
> PdfBox throws the following exception:
> {code:java}
> Caused by: java.io.IOException: java.util.zip.DataFormatException: invalid distance too far back
> at org.apache.pdfbox.filter.FlateFilter.decode(FlateFilter.java:82)
> at org.apache.pdfbox.cos.COSInputStream.create(COSInputStream.java:69)
> at org.apache.pdfbox.cos.COSStream.createInputStream(COSStream.java:162)
> at org.apache.pdfbox.pdfparser.PDFObjectStreamParser.<init>(PDFObjectStreamParser.java:55)
> at org.apache.pdfbox.pdfparser.COSParser.parseObjectStream(COSParser.java:847)
> at org.apache.pdfbox.pdfparser.COSParser.parseObjectDynamically(COSParser.java:753)
> at org.apache.pdfbox.pdfparser.COSParser.parseObjectDynamically(COSParser.java:678)
> at org.apache.pdfbox.pdfparser.COSParser.parseDictObjects(COSParser.java:638)
> at org.apache.pdfbox.pdfparser.PDFParser.initialParse(PDFParser.java:236)
> at org.apache.pdfbox.pdfparser.PDFParser.parse(PDFParser.java:271)
> at org.apache.pdfbox.pdmodel.PDDocument.load(PDDocument.java:984)
> at org.apache.pdfbox.pdmodel.PDDocument.load(PDDocument.java:940)
> at org.apache.pdfbox.pdmodel.PDDocument.load(PDDocument.java:888)
> at com.diligen.parser.pdf.PdfBoxHelper.getDocumentWithLineSegments(PdfBoxHelper.java:131)
> ... 7 more
> Caused by: java.util.zip.DataFormatException: invalid distance too far back
> at java.util.zip.Inflater.inflateBytes(Native Method)
> at java.util.zip.Inflater.inflate(Inflater.java:259)
> at java.util.zip.Inflater.inflate(Inflater.java:280)
> at org.apache.pdfbox.filter.FlateFilter.decompress(FlateFilter.java:107)
> at org.apache.pdfbox.filter.FlateFilter.decode(FlateFilter.java:73)
> ... 20 more
> {code}
> If there is no quick solution for this bug, is there a workaround? Can I
> somehow catch the exception and take some action?
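
On the question of catching the exception: the trace above shows the DataFormatException arriving wrapped in an IOException, so one possible approach is to catch the IOException around PDDocument.load and walk its cause chain. The following is only a sketch using JDK classes; the class name FlateErrorCheck and the method causedByBadFlateData are hypothetical names chosen for illustration and are not part of PDFBox.

```java
import java.io.IOException;
import java.util.zip.DataFormatException;

public class FlateErrorCheck {

    /**
     * Returns true if the given throwable, or any throwable in its cause
     * chain, is a java.util.zip.DataFormatException.
     * (Hypothetical helper, not a PDFBox API.)
     */
    public static boolean causedByBadFlateData(Throwable t) {
        for (Throwable c = t; c != null; c = c.getCause()) {
            if (c instanceof DataFormatException) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        // Simulate the exception shape from the stack trace above:
        // an IOException wrapping a DataFormatException.
        IOException wrapped = new IOException(
                new DataFormatException("invalid distance too far back"));
        RuntimeException outer = new RuntimeException("load failed", wrapped);

        System.out.println(causedByBadFlateData(outer));                    // true
        System.out.println(causedByBadFlateData(new IOException("other"))); // false
    }
}
```

In calling code, such a check would sit in the catch block around PDDocument.load, for example to log and skip the affected file instead of aborting a batch. This only detects the failure; it does not recover the damaged stream. Per the resolution above, the fix itself is in 2.0.8 and 3.0.0.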
--
This message was sent by Atlassian JIRA
(v6.4.14#64029)