[ 
https://issues.apache.org/jira/browse/TIKA-2869?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Edans Sandes updated TIKA-2869:
-------------------------------
    Description: 
I could convert the attached PDF with tika-app-1.19.1.jar, but with tika-app-1.20.jar the same file now fails with the exception below.

{{java -jar {color:#ff0000}tika-app-1.20.jar{color} 0001.127_342_5_7955.pdf}}

May 10, 2019 11:36:23 AM org.apache.tika.config.InitializableProblemHandler$3 handleInitializableProblem
WARNING: J2KImageReader not loaded. JPEG2000 files will not be processed.
See https://pdfbox.apache.org/2.0/dependencies.html#jai-image-io
for optional dependencies.

May 10, 2019 11:36:23 AM org.apache.tika.config.InitializableProblemHandler$3 handleInitializableProblem
WARNING: org.xerial's sqlite-jdbc is not loaded.
Please provide the jar on your classpath to parse sqlite files.
See tika-parsers/pom.xml for the correct version.
Exception in thread "main" org.apache.tika.exception.TikaException: TIKA-198: Illegal IOException from org.apache.tika.parser.crypto.Pkcs7Parser@1c43f4e
 at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:286)
 at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:280)
 at org.apache.tika.parser.AutoDetectParser.parse(AutoDetectParser.java:143)
 at org.apache.tika.cli.TikaCLI$OutputType.process(TikaCLI.java:209)
 at org.apache.tika.cli.TikaCLI.process(TikaCLI.java:496)
 at org.apache.tika.cli.TikaCLI.main(TikaCLI.java:149)
Caused by: org.apache.tika.io.TaggedIOException: DEF length 465542 object truncated by 465479
 at org.apache.tika.io.TaggedInputStream.handleIOException(TaggedInputStream.java:133)
 at org.apache.tika.io.ProxyInputStream.read(ProxyInputStream.java:63)
 at org.apache.tika.parser.microsoft.POIFSContainerDetector.detect(POIFSContainerDetector.java:437)
 at org.apache.tika.detect.CompositeDetector.detect(CompositeDetector.java:84)
 at org.apache.tika.parser.AutoDetectParser.parse(AutoDetectParser.java:116)
 at org.apache.tika.parser.crypto.Pkcs7Parser.parse(Pkcs7Parser.java:86)
 at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:280)
 ... 5 more
Caused by: java.io.EOFException: DEF length 465542 object truncated by 465479
 at org.bouncycastle.asn1.DefiniteLengthInputStream.read(Unknown Source)
 at org.bouncycastle.asn1.DefiniteLengthInputStream.read(Unknown Source)
 at org.bouncycastle.asn1.DefiniteLengthInputStream.read(Unknown Source)
 at org.bouncycastle.asn1.DefiniteLengthInputStream.read(Unknown Source)
 at org.bouncycastle.asn1.DefiniteLengthInputStream.read(Unknown Source)
 at org.bouncycastle.asn1.DefiniteLengthInputStream.read(Unknown Source)
 at java.io.BufferedInputStream.read1(Unknown Source)
 at java.io.BufferedInputStream.read(Unknown Source)
 at org.bouncycastle.util.io.Streams.readFully(Unknown Source)
 at org.bouncycastle.cms.CMSTypedStream$FullReaderStream.read(Unknown Source)
 at java.io.BufferedInputStream.fill(Unknown Source)
 at java.io.BufferedInputStream.read(Unknown Source)
 at java.io.FilterInputStream.read(Unknown Source)
 at org.apache.tika.io.ProxyInputStream.read(ProxyInputStream.java:59)
 ... 10 more
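Reading the innermost cause literally: BouncyCastle's DefiniteLengthInputStream was reading an ASN.1 object whose header declared a definite length of 465542 content bytes, and the underlying stream reached end-of-file with 465479 bytes still outstanding, so only 465542 - 465479 = 63 content bytes were delivered. Per the trace, that read was issued by POIFSContainerDetector while Pkcs7Parser was running AutoDetectParser on the embedded content. The sketch below is not BouncyCastle's implementation; it is a minimal stand-alone DER tag/length reader (the class DefiniteLengthDemo and its methods are hypothetical) that fails with a message in the same format when an object's content is cut short.

```java
import java.io.ByteArrayInputStream;
import java.io.EOFException;
import java.io.IOException;
import java.io.InputStream;

// Hypothetical demo class: a minimal DER tag/length/value reader, NOT BouncyCastle's code.
public class DefiniteLengthDemo {

    // Decodes a DER length field: short form (< 128) or long form (0x80 | n, then n bytes).
    static int readLength(InputStream in) throws IOException {
        int first = in.read();
        if (first < 0) {
            throw new EOFException("EOF while reading length");
        }
        if ((first & 0x80) == 0) {
            return first; // short form: the byte itself is the length
        }
        int numBytes = first & 0x7F;
        if (numBytes == 0 || numBytes > 4) {
            // 0x80 is BER indefinite length, which this sketch does not handle.
            throw new IOException("unsupported length encoding: " + numBytes);
        }
        int length = 0;
        for (int i = 0; i < numBytes; i++) {
            int b = in.read();
            if (b < 0) {
                throw new EOFException("EOF while reading length");
            }
            length = (length << 8) | b; // big-endian accumulation
        }
        if (length < 0) {
            throw new IOException("length does not fit in an int");
        }
        return length;
    }

    // Reads one single-byte tag, its length, and its content; the tag value is ignored.
    // Fails if the stream ends before all 'declared' content bytes have arrived.
    static byte[] readObject(InputStream in) throws IOException {
        int tag = in.read();
        if (tag < 0) {
            throw new EOFException("EOF while reading tag");
        }
        int declared = readLength(in);
        byte[] content = new byte[declared];
        int remaining = declared;
        while (remaining > 0) {
            int n = in.read(content, declared - remaining, remaining);
            if (n < 0) {
                throw new EOFException("DEF length " + declared
                        + " object truncated by " + remaining);
            }
            remaining -= n;
        }
        return content;
    }

    public static void main(String[] args) {
        // SEQUENCE (0x30), long-form length with 3 bytes (0x83): 0x071A86 = 465542.
        byte[] header = {0x30, (byte) 0x83, 0x07, 0x1A, (byte) 0x86};
        // Supply only 63 content bytes after the header.
        byte[] data = new byte[header.length + 63];
        System.arraycopy(header, 0, data, 0, header.length);
        try {
            readObject(new ByteArrayInputStream(data));
            System.out.println("object complete");
        } catch (IOException e) {
            System.out.println(e.getMessage());
        }
    }
}
```

Running main prints DEF length 465542 object truncated by 465479: the header 30 83 07 1A 86 declares 465542 (0x071A86) bytes, but only 63 follow before the stream ends.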

 

 

{{java -jar {color:#14892c}tika-app-1.19.1.jar{color} 0001.127_342_5_7955.pdf}}

May 10, 2019 11:26:28 AM org.apache.tika.config.InitializableProblemHandler$3 handleInitializableProblem
WARNING: J2KImageReader not loaded. JPEG2000 files will not be processed.
See https://pdfbox.apache.org/2.0/dependencies.html#jai-image-io
for optional dependencies.

May 10, 2019 11:26:28 AM org.apache.tika.config.InitializableProblemHandler$3 handleInitializableProblem
WARNING: org.xerial's sqlite-jdbc is not loaded.
Please provide the jar on your classpath to parse sqlite files.
See tika-parsers/pom.xml for the correct version.
<?xml version="1.0" encoding="UTF-8"?><html xmlns="http://www.w3.org/1999/xhtml">
<head>
<meta name="date" content="2019-03-15T12:36:08Z"/>
...CORRECT XML OUTPUT...
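The 1.20 trace also shows that the file never reached the PDF parser: AutoDetectParser handed it to Pkcs7Parser, so in 1.20 the attachment was treated as a PKCS#7/CMS container despite its .pdf name. One quick way to see what the leading bytes actually are is sketched below. It is a hypothetical diagnostic (HeaderSniffer is not a Tika class) that only checks two signatures: the %PDF- header of a plain PDF, and a DER SEQUENCE whose first element is an OID under 1.2.840.113549.1.7 (the PKCS#7 content types, e.g. signedData = 1.2.840.113549.1.7.2). It does not reproduce Tika's own detection logic.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Arrays;

// Hypothetical diagnostic, not part of Tika: classifies a file by its leading bytes.
public class HeaderSniffer {

    // Every plain PDF begins with "%PDF-".
    private static final byte[] PDF_MAGIC = {'%', 'P', 'D', 'F', '-'};

    // DER OID tag (0x06) and length (9), then the arcs 1.2.840.113549.1.7.
    // The ninth content byte (not checked here) selects the type, e.g. 0x02 = signedData.
    private static final byte[] PKCS7_OID_PREFIX = {
        0x06, 0x09, 0x2A, (byte) 0x86, 0x48, (byte) 0x86, (byte) 0xF7, 0x0D, 0x01, 0x07
    };

    static String classify(byte[] head) {
        if (startsWith(head, 0, PDF_MAGIC)) {
            return "pdf";
        }
        // ContentInfo ::= SEQUENCE { contentType OID, ... }: tag 0x30, a length field, then the OID.
        if (head.length > 2 && (head[0] & 0xFF) == 0x30) {
            int lenByte = head[1] & 0xFF;
            // Long form (0x81 and up) is followed by (lenByte & 0x7F) length bytes;
            // short form and BER indefinite length (0x80) put the OID right at offset 2.
            int oidOffset = 2 + ((lenByte & 0x80) != 0 ? (lenByte & 0x7F) : 0);
            if (startsWith(head, oidOffset, PKCS7_OID_PREFIX)) {
                return "pkcs7";
            }
        }
        return "unknown";
    }

    private static boolean startsWith(byte[] data, int offset, byte[] prefix) {
        if (data.length - offset < prefix.length) {
            return false;
        }
        for (int i = 0; i < prefix.length; i++) {
            if (data[offset + i] != prefix[i]) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 1) {
            System.err.println("usage: java HeaderSniffer <file>");
            return;
        }
        Path file = Paths.get(args[0]);
        byte[] all = Files.readAllBytes(file); // Java 8 compatible
        // 64 bytes is plenty for both signatures.
        byte[] head = Arrays.copyOf(all, Math.min(64, all.length));
        System.out.println(file + ": " + classify(head));
    }
}
```

Run it as {{java HeaderSniffer 0001.127_342_5_7955.pdf}}; it prints the file name followed by pdf, pkcs7 or unknown.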



> Can't parse pdf in version 1.20 - Pkcs7Parser (DEF length 465542 object 
> truncated by 465479)
> --------------------------------------------------------------------------------------------
>
>                 Key: TIKA-2869
>                 URL: https://issues.apache.org/jira/browse/TIKA-2869
>             Project: Tika
>          Issue Type: Bug
>          Components: app, cli, parser
>    Affects Versions: 1.20
>         Environment: Windows 10 (1809 - 17763.437)
> Java(TM) SE Runtime Environment (build 1.8.0_121-b13)
> Java HotSpot(TM) Client VM (build 25.121-b13, mixed mode)
>            Reporter: Edans Sandes
>            Priority: Major
>         Attachments: 0001.127_342_5_7955.pdf



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
