Tim Allison created PDFBOX-5153:
-----------------------------------

             Summary: New flatefilter exception on Tika unit test files with 3.0.0-RC1
                 Key: PDFBOX-5153
                 URL: https://issues.apache.org/jira/browse/PDFBOX-5153
             Project: PDFBox
          Issue Type: Task
            Reporter: Tim Allison


On TIKA-3347, we're integrating PDFBox 3.0.0-RC1.  We're getting new flate 
filter exceptions on a set of files that I _think_ I created with PDFBox a 
while ago.

Looks like we're also getting xref exceptions.

I would not be surprised in the least to learn that I did something wrong in 
the creation of these files and that they are corrupt!

I can replicate this issue with {{java -jar pdfbox-app-3.0.0-RC1.jar export:text}}

{noformat}
SEVERE: FlateFilter: stop reading corrupt stream due to a DataFormatException
Error extracting text for document [IOException]: 
java.util.zip.DataFormatException: invalid block type
{noformat}
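
For context, the "invalid block type" text comes from the JDK's zlib inflater rather than from PDFBox itself. The following is a minimal sketch, not PDFBox's FlateFilter code, of how {{java.util.zip.Inflater}} reports a corrupt stream. The class name {{FlateProbe}}, the method {{inflateError}}, and the three sample bytes are made up for illustration, and it assumes zlib-wrapped input, as PDF Flate streams normally are.

```java
import java.util.zip.DataFormatException;
import java.util.zip.Inflater;

// Illustrative only: FlateProbe and inflateError are hypothetical names, not PDFBox code.
class FlateProbe {

    /** Returns the DataFormatException message raised while inflating, or null if none. */
    static String inflateError(byte[] zlibData) {
        Inflater inflater = new Inflater(); // default mode expects a zlib header
        inflater.setInput(zlibData);
        byte[] buffer = new byte[1024];
        try {
            while (!inflater.finished()) {
                int n = inflater.inflate(buffer);
                if (n == 0 && (inflater.needsInput() || inflater.needsDictionary())) {
                    break; // truncated input or preset dictionary: nothing more to decode
                }
            }
            return null;
        } catch (DataFormatException e) {
            return e.getMessage();
        } finally {
            inflater.end(); // release native zlib resources
        }
    }

    public static void main(String[] args) {
        // Valid zlib header (0x78 0x9C) followed by a block header whose
        // BTYPE bits are 11, a value the DEFLATE format reserves.
        byte[] corrupt = {0x78, (byte) 0x9C, 0x07};
        System.out.println(inflateError(corrupt));
    }
}
```

Running a probe like this over the raw bytes of a suspect stream may help tell genuinely corrupt deflate data apart from a decoding regression, since the JDK inflater behaves the same regardless of the PDFBox version.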

One of the files: 
https://github.com/apache/tika/blob/main/tika-parsers/tika-parsers-classic/tika-parsers-classic-modules/tika-parser-pdf-module/src/test/resources/test-documents/testPDF_no_extract_yes_accessibility_owner_user.pdf
 



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

