[ https://issues.apache.org/jira/browse/TIKA-2869?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Edans Sandes updated TIKA-2869:
-------------------------------
    Description:
I could convert the attached pdf using tika-app-1.19.1.jar, but now, in version tika-app-1.20.jar, it stopped working.

{{java -jar {color:#ff0000}tika-app-1.20.jar{color} 0001.127_342_5_7955.pdf}}
mai 10, 2019 11:36:23 AM org.apache.tika.config.InitializableProblemHandler$3 handleInitializableProblem
ADVERTÊNCIA: J2KImageReader not loaded. JPEG2000 files will not be processed.
See https://pdfbox.apache.org/2.0/dependencies.html#jai-image-io
for optional dependencies.
mai 10, 2019 11:36:23 AM org.apache.tika.config.InitializableProblemHandler$3 handleInitializableProblem
ADVERTÊNCIA: org.xerial's sqlite-jdbc is not loaded.
Please provide the jar on your classpath to parse sqlite files.
See tika-parsers/pom.xml for the correct version.
Exception in thread "main" org.apache.tika.exception.TikaException: TIKA-198: Illegal IOException from org.apache.tika.parser.crypto.Pkcs7Parser@1c43f4e
    at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:286)
    at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:280)
    at org.apache.tika.parser.AutoDetectParser.parse(AutoDetectParser.java:143)
    at org.apache.tika.cli.TikaCLI$OutputType.process(TikaCLI.java:209)
    at org.apache.tika.cli.TikaCLI.process(TikaCLI.java:496)
    at org.apache.tika.cli.TikaCLI.main(TikaCLI.java:149)
Caused by: org.apache.tika.io.TaggedIOException: DEF length 465542 object truncated by 465479
    at org.apache.tika.io.TaggedInputStream.handleIOException(TaggedInputStream.java:133)
    at org.apache.tika.io.ProxyInputStream.read(ProxyInputStream.java:63)
    at org.apache.tika.parser.microsoft.POIFSContainerDetector.detect(POIFSContainerDetector.java:437)
    at org.apache.tika.detect.CompositeDetector.detect(CompositeDetector.java:84)
    at org.apache.tika.parser.AutoDetectParser.parse(AutoDetectParser.java:116)
    at org.apache.tika.parser.crypto.Pkcs7Parser.parse(Pkcs7Parser.java:86)
    at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:280)
    ... 5 more
Caused by: java.io.EOFException: DEF length 465542 object truncated by 465479
    at org.bouncycastle.asn1.DefiniteLengthInputStream.read(Unknown Source)
    at org.bouncycastle.asn1.DefiniteLengthInputStream.read(Unknown Source)
    at org.bouncycastle.asn1.DefiniteLengthInputStream.read(Unknown Source)
    at org.bouncycastle.asn1.DefiniteLengthInputStream.read(Unknown Source)
    at org.bouncycastle.asn1.DefiniteLengthInputStream.read(Unknown Source)
    at org.bouncycastle.asn1.DefiniteLengthInputStream.read(Unknown Source)
    at java.io.BufferedInputStream.read1(Unknown Source)
    at java.io.BufferedInputStream.read(Unknown Source)
    at org.bouncycastle.util.io.Streams.readFully(Unknown Source)
    at org.bouncycastle.cms.CMSTypedStream$FullReaderStream.read(Unknown Source)
    at java.io.BufferedInputStream.fill(Unknown Source)
    at java.io.BufferedInputStream.read(Unknown Source)
    at java.io.FilterInputStream.read(Unknown Source)
    at org.apache.tika.io.ProxyInputStream.read(ProxyInputStream.java:59)
    ... 10 more

{{java -jar {color:#14892c}tika-app-1.19.1.jar{color} 0001.127_342_5_7955.pdf}}
{{mai 10, 2019 11:26:28 AM org.apache.tika.config.InitializableProblemHandler$3 handleInitializableProblem}}
{{ADVERTÊNCIA: J2KImageReader not loaded. JPEG2000 files will not be processed.}}
{{See [https://pdfbox.apache.org/2.0/dependencies.html#jai-image-io]}}
{{for optional dependencies.}}
{{mai 10, 2019 11:26:28 AM org.apache.tika.config.InitializableProblemHandler$3 handleInitializableProblem}}
{{ADVERTÊNCIA: org.xerial's sqlite-jdbc is not loaded.}}
{{Please provide the jar on your classpath to parse sqlite files.}}
{{See tika-parsers/pom.xml for the correct version.}}
{{<?xml version="1.0" encoding="UTF-8"?><html xmlns="http://www.w3.org/1999/xhtml">}}
{{<head>}}
{{<meta name="date" content="2019-03-15T12:36:08Z"/>}}
{{...CORRECT XML OUTPUT...}}

  was:
I could convert the attached pdf using tika-app-1.19.1.jar, but now, in version tika-app-1.20.jar, it stopped working.

{{java -jar {color:#ff0000}tika-app-1.20.jar{color} 0001.127_342_5_7955.pdf}}
{{mai 10, 2019 11:36:23 AM org.apache.tika.config.InitializableProblemHandler$3 handleInitializableProblem}}
{{ADVERTÊNCIA: J2KImageReader not loaded. JPEG2000 files will not be processed.}}
{{See https://pdfbox.apache.org/2.0/dependencies.html#jai-image-io}}
{{for optional dependencies.}}
{{mai 10, 2019 11:36:23 AM org.apache.tika.config.InitializableProblemHandler$3 handleInitializableProblem}}
{{ADVERTÊNCIA: org.xerial's sqlite-jdbc is not loaded.}}
{{Please provide the jar on your classpath to parse sqlite files.}}
{{See tika-parsers/pom.xml for the correct version.}}
{{Exception in thread "main" org.apache.tika.exception.TikaException: TIKA-198: Illegal IOException from org.apache.tika.parser.crypto.Pkcs7Parser@1c43f4e}}
{{ at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:286)}}
{{ at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:280)}}
{{ at org.apache.tika.parser.AutoDetectParser.parse(AutoDetectParser.java:143)}}
{{ at org.apache.tika.cli.TikaCLI$OutputType.process(TikaCLI.java:209)}}
{{ at org.apache.tika.cli.TikaCLI.process(TikaCLI.java:496)}}
{{ at org.apache.tika.cli.TikaCLI.main(TikaCLI.java:149)}}
{{Caused by: org.apache.tika.io.TaggedIOException: DEF length 465542 object truncated by 465479}}
{{ at org.apache.tika.io.TaggedInputStream.handleIOException(TaggedInputStream.java:133)}}
{{ at org.apache.tika.io.ProxyInputStream.read(ProxyInputStream.java:63)}}
{{ at org.apache.tika.parser.microsoft.POIFSContainerDetector.detect(POIFSContainerDetector.java:437)}}
{{ at org.apache.tika.detect.CompositeDetector.detect(CompositeDetector.java:84)}}
{{ at org.apache.tika.parser.AutoDetectParser.parse(AutoDetectParser.java:116)}}
{{ at org.apache.tika.parser.crypto.Pkcs7Parser.parse(Pkcs7Parser.java:86)}}
{{ at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:280)}}
{{ ... 5 more}}
{{Caused by: java.io.EOFException: DEF length 465542 object truncated by 465479}}
{{ at org.bouncycastle.asn1.DefiniteLengthInputStream.read(Unknown Source)}}
{{ at org.bouncycastle.asn1.DefiniteLengthInputStream.read(Unknown Source)}}
{{ at org.bouncycastle.asn1.DefiniteLengthInputStream.read(Unknown Source)}}
{{ at org.bouncycastle.asn1.DefiniteLengthInputStream.read(Unknown Source)}}
{{ at org.bouncycastle.asn1.DefiniteLengthInputStream.read(Unknown Source)}}
{{ at org.bouncycastle.asn1.DefiniteLengthInputStream.read(Unknown Source)}}
{{ at java.io.BufferedInputStream.read1(Unknown Source)}}
{{ at java.io.BufferedInputStream.read(Unknown Source)}}
{{ at org.bouncycastle.util.io.Streams.readFully(Unknown Source)}}
{{ at org.bouncycastle.cms.CMSTypedStream$FullReaderStream.read(Unknown Source)}}
{{ at java.io.BufferedInputStream.fill(Unknown Source)}}
{{ at java.io.BufferedInputStream.read(Unknown Source)}}
{{ at java.io.FilterInputStream.read(Unknown Source)}}
{{ at org.apache.tika.io.ProxyInputStream.read(ProxyInputStream.java:59)}}
{{ ... 10 more}}

{{java -jar {color:#14892c}tika-app-1.19.1.jar{color} 0001.127_342_5_7955.pdf}}
{{mai 10, 2019 11:26:28 AM org.apache.tika.config.InitializableProblemHandler$3 handleInitializableProblem}}
{{ADVERTÊNCIA: J2KImageReader not loaded. JPEG2000 files will not be processed.}}
{{See [https://pdfbox.apache.org/2.0/dependencies.html#jai-image-io]}}
{{for optional dependencies.}}
{{mai 10, 2019 11:26:28 AM org.apache.tika.config.InitializableProblemHandler$3 handleInitializableProblem}}
{{ADVERTÊNCIA: org.xerial's sqlite-jdbc is not loaded.}}
{{Please provide the jar on your classpath to parse sqlite files.}}
{{See tika-parsers/pom.xml for the correct version.}}
{{<?xml version="1.0" encoding="UTF-8"?><html xmlns="http://www.w3.org/1999/xhtml">}}
{{<head>}}
{{<meta name="date" content="2019-03-15T12:36:08Z"/>}}
{{...CORRECT XML OUTPUT...}}


> Can't parse pdf in version 1.20 - Pkcs7Parser (DEF length 465542 object truncated by 465479)
> --------------------------------------------------------------------------------------------
>
> Key: TIKA-2869
> URL: https://issues.apache.org/jira/browse/TIKA-2869
> Project: Tika
> Issue Type: Bug
> Components: app, cli, parser
> Affects Versions: 1.20
> Environment: Windows 10 (1809 - 17763.437)
> Java(TM) SE Runtime Environment (build 1.8.0_121-b13)
> Java HotSpot(TM) Client VM (build 25.121-b13, mixed mode)
> Reporter: Edans Sandes
> Priority: Major
> Attachments: 0001.127_342_5_7955.pdf
>
>
> I could convert the attached pdf using tika-app-1.19.1.jar, but now, in version tika-app-1.20.jar, it stopped working.
> {{java -jar {color:#ff0000}tika-app-1.20.jar{color} 0001.127_342_5_7955.pdf}}
> mai 10, 2019 11:36:23 AM org.apache.tika.config.InitializableProblemHandler$3 handleInitializableProblem
> ADVERTÊNCIA: J2KImageReader not loaded. JPEG2000 files will not be processed.
> See https://pdfbox.apache.org/2.0/dependencies.html#jai-image-io
> for optional dependencies.
> mai 10, 2019 11:36:23 AM org.apache.tika.config.InitializableProblemHandler$3 handleInitializableProblem
> ADVERTÊNCIA: org.xerial's sqlite-jdbc is not loaded.
> Please provide the jar on your classpath to parse sqlite files.
> See tika-parsers/pom.xml for the correct version.
> Exception in thread "main" org.apache.tika.exception.TikaException: TIKA-198: Illegal IOException from org.apache.tika.parser.crypto.Pkcs7Parser@1c43f4e
>     at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:286)
>     at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:280)
>     at org.apache.tika.parser.AutoDetectParser.parse(AutoDetectParser.java:143)
>     at org.apache.tika.cli.TikaCLI$OutputType.process(TikaCLI.java:209)
>     at org.apache.tika.cli.TikaCLI.process(TikaCLI.java:496)
>     at org.apache.tika.cli.TikaCLI.main(TikaCLI.java:149)
> Caused by: org.apache.tika.io.TaggedIOException: DEF length 465542 object truncated by 465479
>     at org.apache.tika.io.TaggedInputStream.handleIOException(TaggedInputStream.java:133)
>     at org.apache.tika.io.ProxyInputStream.read(ProxyInputStream.java:63)
>     at org.apache.tika.parser.microsoft.POIFSContainerDetector.detect(POIFSContainerDetector.java:437)
>     at org.apache.tika.detect.CompositeDetector.detect(CompositeDetector.java:84)
>     at org.apache.tika.parser.AutoDetectParser.parse(AutoDetectParser.java:116)
>     at org.apache.tika.parser.crypto.Pkcs7Parser.parse(Pkcs7Parser.java:86)
>     at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:280)
>     ... 5 more
> Caused by: java.io.EOFException: DEF length 465542 object truncated by 465479
>     at org.bouncycastle.asn1.DefiniteLengthInputStream.read(Unknown Source)
>     at org.bouncycastle.asn1.DefiniteLengthInputStream.read(Unknown Source)
>     at org.bouncycastle.asn1.DefiniteLengthInputStream.read(Unknown Source)
>     at org.bouncycastle.asn1.DefiniteLengthInputStream.read(Unknown Source)
>     at org.bouncycastle.asn1.DefiniteLengthInputStream.read(Unknown Source)
>     at org.bouncycastle.asn1.DefiniteLengthInputStream.read(Unknown Source)
>     at java.io.BufferedInputStream.read1(Unknown Source)
>     at java.io.BufferedInputStream.read(Unknown Source)
>     at org.bouncycastle.util.io.Streams.readFully(Unknown Source)
>     at org.bouncycastle.cms.CMSTypedStream$FullReaderStream.read(Unknown Source)
>     at java.io.BufferedInputStream.fill(Unknown Source)
>     at java.io.BufferedInputStream.read(Unknown Source)
>     at java.io.FilterInputStream.read(Unknown Source)
>     at org.apache.tika.io.ProxyInputStream.read(ProxyInputStream.java:59)
>     ... 10 more
>
>
> {{java -jar {color:#14892c}tika-app-1.19.1.jar{color} 0001.127_342_5_7955.pdf}}
> {{mai 10, 2019 11:26:28 AM org.apache.tika.config.InitializableProblemHandler$3 handleInitializableProblem}}
> {{ADVERTÊNCIA: J2KImageReader not loaded. JPEG2000 files will not be processed.}}
> {{See [https://pdfbox.apache.org/2.0/dependencies.html#jai-image-io]}}
> {{for optional dependencies.}}
> {{mai 10, 2019 11:26:28 AM org.apache.tika.config.InitializableProblemHandler$3 handleInitializableProblem}}
> {{ADVERTÊNCIA: org.xerial's sqlite-jdbc is not loaded.}}
> {{Please provide the jar on your classpath to parse sqlite files.}}
> {{See tika-parsers/pom.xml for the correct version.}}
> {{<?xml version="1.0" encoding="UTF-8"?><html xmlns="http://www.w3.org/1999/xhtml">}}
> {{<head>}}
> {{<meta name="date" content="2019-03-15T12:36:08Z"/>}}
> {{...CORRECT XML OUTPUT...}}

--
This message was sent by Atlassian JIRA
(v7.6.3#76005)