[ https://issues.apache.org/jira/browse/TIKA-617?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Tyler Palsulich closed TIKA-617.
--------------------------------
    Resolution: Won't Fix

The underlying exception is
{code}
Caused by: java.util.zip.DataFormatException: invalid distance too far back
    at java.util.zip.Inflater.inflateBytes(Native Method)
{code}

So, I'm closing this as Won't Fix. If anyone objects, please reopen.

> Series of exceptions from PDFBox
> --------------------------------
>
>                 Key: TIKA-617
>                 URL: https://issues.apache.org/jira/browse/TIKA-617
>             Project: Tika
>          Issue Type: Bug
>          Components: parser
>    Affects Versions: 0.10
>            Reporter: Erik Hetzner
>
> Hi,
> I am getting the following exception from PDFBox. Thank you!
> (If I should file these upstream at PDFBox first, please let me know.)
> {noformat}
> $ java -jar tika-app-1.0-SNAPSHOT.jar http://www.arb.ca.gov/research/apr/past/01-340.pdf > /dev/null
> ERROR - Stop reading corrupt stream
> INFO - unsupported/disabled operation: f24.481
> INFO - unsupported/disabled operation: ree)n.
> WARN - java.lang.ClassCastException: org.apache.pdfbox.cos.COSInteger cannot be cast to org.apache.pdfbox.cos.COSArray
> java.lang.ClassCastException: org.apache.pdfbox.cos.COSInteger cannot be cast to org.apache.pdfbox.cos.COSArray
>     at org.apache.pdfbox.util.operator.ShowTextGlyph.process(ShowTextGlyph.java:44)
>     at org.apache.pdfbox.util.PDFStreamEngine.processOperator(PDFStreamEngine.java:551)
>     at org.apache.pdfbox.util.PDFStreamEngine.processSubStream(PDFStreamEngine.java:274)
>     at org.apache.pdfbox.util.PDFStreamEngine.processSubStream(PDFStreamEngine.java:251)
>     at org.apache.pdfbox.util.PDFStreamEngine.processStream(PDFStreamEngine.java:225)
>     at org.apache.pdfbox.util.PDFTextStripper.processPage(PDFTextStripper.java:442)
>     at org.apache.pdfbox.util.PDFTextStripper.processPages(PDFTextStripper.java:366)
>     at org.apache.pdfbox.util.PDFTextStripper.writeText(PDFTextStripper.java:322)
>     at org.apache.tika.parser.pdf.PDF2XHTML.process(PDF2XHTML.java:56)
>     at org.apache.tika.parser.pdf.PDFParser.parse(PDFParser.java:89)
>     at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:197)
>     at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:197)
>     at org.apache.tika.parser.AutoDetectParser.parse(AutoDetectParser.java:135)
>     at org.apache.tika.cli.TikaCLI$OutputType.process(TikaCLI.java:107)
>     at org.apache.tika.cli.TikaCLI.process(TikaCLI.java:302)
>     at org.apache.tika.cli.TikaCLI.main(TikaCLI.java:91)
> INFO - unsupported/disabled operation: i-
> INFO - unsupported/disabled operation: R4%
> INFO - unsupported/disabled operation: )
> INFO - unsupported/disabled operation: Re.8
> INFO - unsupported/disabled operation: e.
> INFO - unsupported/disabled operation: FE)-
> WARN - java.lang.ClassCastException: org.apache.pdfbox.cos.COSInteger cannot be cast to org.apache.pdfbox.cos.COSArray
> java.lang.ClassCastException: org.apache.pdfbox.cos.COSInteger cannot be cast to org.apache.pdfbox.cos.COSArray
>     at org.apache.pdfbox.util.operator.ShowTextGlyph.process(ShowTextGlyph.java:44)
>     at org.apache.pdfbox.util.PDFStreamEngine.processOperator(PDFStreamEngine.java:551)
>     at org.apache.pdfbox.util.PDFStreamEngine.processSubStream(PDFStreamEngine.java:274)
>     at org.apache.pdfbox.util.PDFStreamEngine.processSubStream(PDFStreamEngine.java:251)
>     at org.apache.pdfbox.util.PDFStreamEngine.processStream(PDFStreamEngine.java:225)
>     at org.apache.pdfbox.util.PDFTextStripper.processPage(PDFTextStripper.java:442)
>     at org.apache.pdfbox.util.PDFTextStripper.processPages(PDFTextStripper.java:366)
>     at org.apache.pdfbox.util.PDFTextStripper.writeText(PDFTextStripper.java:322)
>     at org.apache.tika.parser.pdf.PDF2XHTML.process(PDF2XHTML.java:56)
>     at org.apache.tika.parser.pdf.PDFParser.parse(PDFParser.java:89)
>     at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:197)
>     at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:197)
>     at org.apache.tika.parser.AutoDetectParser.parse(AutoDetectParser.java:135)
>     at org.apache.tika.cli.TikaCLI$OutputType.process(TikaCLI.java:107)
>     at org.apache.tika.cli.TikaCLI.process(TikaCLI.java:302)
>     at org.apache.tika.cli.TikaCLI.main(TikaCLI.java:91)
> INFO - unsupported/disabled operation: R3%
> INFO - unsupported/disabled operation: T
> Exception in thread "main" org.apache.tika.exception.TikaException: Unexpected RuntimeException from org.apache.tika.parser.pdf.PDFParser@5809fdee
>     at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:199)
>     at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:197)
>     at org.apache.tika.parser.AutoDetectParser.parse(AutoDetectParser.java:135)
>     at org.apache.tika.cli.TikaCLI$OutputType.process(TikaCLI.java:107)
>     at org.apache.tika.cli.TikaCLI.process(TikaCLI.java:302)
>     at org.apache.tika.cli.TikaCLI.main(TikaCLI.java:91)
> Caused by: java.lang.RuntimeException: java.io.IOException: Error: Expected operator 'ID' actual='I8'
>     at org.apache.pdfbox.pdfparser.PDFStreamParser$1.tryNext(PDFStreamParser.java:178)
>     at org.apache.pdfbox.pdfparser.PDFStreamParser$1.hasNext(PDFStreamParser.java:187)
>     at org.apache.pdfbox.util.PDFStreamEngine.processSubStream(PDFStreamEngine.java:266)
>     at org.apache.pdfbox.util.PDFStreamEngine.processSubStream(PDFStreamEngine.java:251)
>     at org.apache.pdfbox.util.PDFStreamEngine.processStream(PDFStreamEngine.java:225)
>     at org.apache.pdfbox.util.PDFTextStripper.processPage(PDFTextStripper.java:442)
>     at org.apache.pdfbox.util.PDFTextStripper.processPages(PDFTextStripper.java:366)
>     at org.apache.pdfbox.util.PDFTextStripper.writeText(PDFTextStripper.java:322)
>     at org.apache.tika.parser.pdf.PDF2XHTML.process(PDF2XHTML.java:56)
>     at org.apache.tika.parser.pdf.PDFParser.parse(PDFParser.java:89)
>     at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:197)
>     ... 5 more
> Caused by: java.io.IOException: Error: Expected operator 'ID' actual='I8'
>     at org.apache.pdfbox.pdfparser.PDFStreamParser.parseNextToken(PDFStreamParser.java:382)
>     at org.apache.pdfbox.pdfparser.PDFStreamParser.access$000(PDFStreamParser.java:46)
>     at org.apache.pdfbox.pdfparser.PDFStreamParser$1.tryNext(PDFStreamParser.java:175)
>     ... 15 more
> {noformat}

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)