[ https://issues.apache.org/jira/browse/PDFBOX-4243?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16507452#comment-16507452 ]
Tilman Hausherr commented on PDFBOX-4243:
-----------------------------------------
I tried the validator at
[https://www.pdf-online.com/osa/validate.aspx]. Its output:
Validating file "IN_THE_UNITED_STATES_DISTRICT_COURT_(78).pdf" for conformance
level pdf1.3
Error in Flate stream: data error.
The document does not conform to the requested standard.
The document doesn't conform to the PDF reference (missing required entries,
wrong value types, etc.).
The document does not conform to the PDF 1.3 standard.
Done.
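
For readers unfamiliar with the message in the issue title: "invalid stored block lengths" is a zlib error. In a deflate "stored" (uncompressed) block, the length field LEN must be followed by its one's complement NLEN, and the inflater rejects the block when the two do not match. The following is a minimal sketch using only java.util.zip. The byte sequence is hand-crafted for illustration and is not taken from the attached PDF, and the class and method names (StoredBlockDemo, inflateError) are hypothetical, not PDFBox code.

```java
import java.util.zip.DataFormatException;
import java.util.zip.Inflater;

public class StoredBlockDemo {

    // Hand-crafted zlib stream (illustrative bytes, not from the attached PDF).
    static final byte[] CORRUPT = {
        0x78, (byte) 0x9C,  // valid zlib header
        0x01,               // final block, block type 00 = stored
        0x05, 0x00,         // LEN  = 5 (little-endian)
        0x00, 0x00          // NLEN = 0, but it must be ~LEN (0xFFFA)
    };

    /** Returns the inflater's error message, or null if the data decodes cleanly. */
    static String inflateError(byte[] data) {
        Inflater inflater = new Inflater();
        try {
            inflater.setInput(data);
            byte[] buf = new byte[64];
            while (!inflater.finished()) {
                int n = inflater.inflate(buf);
                // Stop if no progress is possible (truncated input or preset dictionary).
                if (n == 0 && (inflater.needsInput() || inflater.needsDictionary())) {
                    break;
                }
            }
            return null;
        } catch (DataFormatException e) {
            return e.getMessage();
        } finally {
            inflater.end();
        }
    }

    public static void main(String[] args) {
        // Prints the zlib error message for the malformed stored block.
        System.out.println(inflateError(CORRUPT));
    }
}
```

This only shows the kind of stream defect behind the exception. Whether the attached file has exactly this defect, or some other corruption that the inflater happens to interpret as a stored block, cannot be told from the message alone.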
> DataFormatException: "invalid stored block lengths" in FlateFilter
> ------------------------------------------------------------------
>
> Key: PDFBOX-4243
> URL: https://issues.apache.org/jira/browse/PDFBOX-4243
> Project: PDFBox
> Issue Type: Bug
> Components: Text extraction
> Affects Versions: 2.0.8, 2.0.9
> Environment: Java 8 update 172
> Tika parsers 1.17 (and 1.18)
> Windows 7 (and server 2012)
> Reporter: Isabelle Giguere
> Priority: Major
> Attachments: IN_THE_UNITED_STATES_DISTRICT_COURT_(78).pdf
>
>
> The attached PDF document causes this exception. It is similar to PDFBOX-3546,
> but probably does not have the same root cause.
> Observed with Tika 1.17 + PDFBox 2.0.8, and with Tika 1.18 + PDFBox 2.0.9.
> {noformat}
> org.apache.tika.exception.TikaException: Unable to extract PDF content
> at org.apache.tika.parser.pdf.PDF2XHTML.process(PDF2XHTML.java:139)
> at org.apache.tika.parser.pdf.PDFParser.parse(PDFParser.java:171)
> at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:280)
> at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:280)
> at org.apache.tika.parser.AutoDetectParser.parse(AutoDetectParser.java:143)
> at test.conversion.tika.Conversion.parse(Conversion.java:56)
> at test.conversion.tika.Conversion.main(Conversion.java:40)
> Caused by: java.io.IOException: java.util.zip.DataFormatException: invalid stored block lengths
> at org.apache.pdfbox.filter.FlateFilter.decode(FlateFilter.java:83)
> at org.apache.pdfbox.filter.Filter.decode(Filter.java:87)
> at org.apache.pdfbox.cos.COSInputStream.create(COSInputStream.java:77)
> at org.apache.pdfbox.cos.COSStream.createInputStream(COSStream.java:175)
> at org.apache.pdfbox.cos.COSStream.createInputStream(COSStream.java:163)
> at org.apache.pdfbox.pdmodel.PDPage.getContents(PDPage.java:157)
> at org.apache.pdfbox.pdfparser.PDFStreamParser.<init>(PDFStreamParser.java:91)
> at org.apache.pdfbox.contentstream.PDFStreamEngine.processStreamOperators(PDFStreamEngine.java:493)
> at org.apache.pdfbox.contentstream.PDFStreamEngine.processStream(PDFStreamEngine.java:477)
> at org.apache.pdfbox.contentstream.PDFStreamEngine.processPage(PDFStreamEngine.java:150)
> at org.apache.pdfbox.text.LegacyPDFStreamEngine.processPage(LegacyPDFStreamEngine.java:139)
> at org.apache.pdfbox.text.PDFTextStripper.processPage(PDFTextStripper.java:391)
> at org.apache.tika.parser.pdf.PDF2XHTML.processPage(PDF2XHTML.java:147)
> at org.apache.pdfbox.text.PDFTextStripper.processPages(PDFTextStripper.java:319)
> at org.apache.pdfbox.text.PDFTextStripper.writeText(PDFTextStripper.java:266)
> at org.apache.tika.parser.pdf.PDF2XHTML.process(PDF2XHTML.java:117)
> ... 6 more
> Caused by: java.util.zip.DataFormatException: invalid stored block lengths
> at java.util.zip.Inflater.inflateBytes(Native Method)
> at java.util.zip.Inflater.inflate(Inflater.java:259)
> at java.util.zip.Inflater.inflate(Inflater.java:280)
> at org.apache.pdfbox.filter.FlateFilter.decompress(FlateFilter.java:108)
> at org.apache.pdfbox.filter.FlateFilter.decode(FlateFilter.java:74)
> ... 21 more
> {noformat}
> Thank you for looking into this.