Tim Allison created PDFBOX-5153:
-----------------------------------

             Summary: New flatefilter exception on Tika unit test files with 3.0.0-RC1
                 Key: PDFBOX-5153
                 URL: https://issues.apache.org/jira/browse/PDFBOX-5153
             Project: PDFBox
          Issue Type: Task
            Reporter: Tim Allison

On TIKA-3347, we're integrating PDFBox 3.0.0-RC1. We're getting new flate filter exceptions on a set of files that I _think_ I created with PDFBox a while ago. Looks like we're also getting xref exceptions.

I would not be surprised in the least to learn that I did something wrong in the creation of these files and that they are corrupt!

I can replicate this issue with {{java -jar pdfbox-app-3.0.0-RC1.jar export:text}}

{noformat}
SEVERE: FlateFilter: stop reading corrupt stream due to a DataFormatException
Error extracting text for document [IOException]: java.util.zip.DataFormatException: invalid block type
{noformat}

One of the files: https://github.com/apache/tika/blob/main/tika-parsers/tika-parsers-classic/tika-parsers-classic-modules/tika-parser-pdf-module/src/test/resources/test-documents/testPDF_no_extract_yes_accessibility_owner_user.pdf
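
For context on the error text in this report: {{java.util.zip.DataFormatException}} is raised by the JDK's {{Inflater}} when the bytes it is given are not valid deflate data. The sketch below is not PDFBox code and says nothing about the root cause here. It only shows, under the assumption of a hand-built three-byte input, how the JDK reacts to a stream whose first block header uses the reserved block type. The class name {{InflateCorruptDemo}} and the helper {{inflateError}} are hypothetical names chosen for illustration.

```java
import java.util.zip.DataFormatException;
import java.util.zip.Inflater;

public class InflateCorruptDemo {

    /**
     * Tries to inflate the given bytes once.
     * Returns the exception message if the data is rejected, or null otherwise.
     */
    static String inflateError(byte[] data) {
        Inflater inflater = new Inflater(); // default mode expects a zlib header
        inflater.setInput(data);
        byte[] out = new byte[64];
        try {
            inflater.inflate(out);
            return null;
        } catch (DataFormatException e) {
            return e.getMessage();
        } finally {
            inflater.end(); // release native resources
        }
    }

    public static void main(String[] args) {
        // Assumption for illustration: 0x78 0x9C is a valid zlib header, and
        // 0x07 sets the final-block bit plus the reserved block type (binary 11).
        byte[] corrupt = {0x78, (byte) 0x9C, 0x07};
        System.out.println("DataFormatException: " + inflateError(corrupt));
    }
}
```

Any bytes that reach the inflater in a damaged state, whatever the reason, can produce an exception of this kind, so the message alone does not say whether the file or the reader is at fault.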