[ 
https://issues.apache.org/jira/browse/TIKA-4459?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Manish S N updated TIKA-4459:
-----------------------------
    Description: 
When passing inputstream of protected odf format file to tika we get a 
ZipException instead of a EncryptedDocumentException.

This works well and correctly throws EncryptedDocumentException if you create 
TikaInputStream with Path or call TikaInputStream.getPath() as it will write to 
a temporary file in memory.

But when working with InputStreams we get the following zip exception:

 
org.apache.tika.exception.TikaException: TIKA-198: Illegal IOException from 
org.apache.tika.parser.odf.OpenDocumentParser@bae47a0
at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:304)
at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:298)
at org.apache.tika.parser.AutoDetectParser.parse(AutoDetectParser.java:204)
at org.apache.tika.Tika.parseToString(Tika.java:525)
at org.apache.tika.Tika.parseToString(Tika.java:495)
at org.manish.AttachmentParser.parse(AttachmentParser.java:21)
at org.manish.AttachmentParser.lambda$testParse$1(AttachmentParser.java:72)
at 
java.base/java.util.stream.ForEachOps$ForEachOp$OfRef.accept(ForEachOps.java:183)
at 
java.base/java.util.stream.ReferencePipeline$2$1.accept(ReferencePipeline.java:177)
at 
java.base/java.util.stream.ReferencePipeline$3$1.accept(ReferencePipeline.java:195)
at java.base/java.util.Iterator.forEachRemaining(Iterator.java:133)
at 
java.base/java.util.Spliterators$IteratorSpliterator.forEachRemaining(Spliterators.java:1801)
at 
java.base/java.util.stream.AbstractPipeline.copyInto(AbstractPipeline.java:484)
at 
java.base/java.util.stream.AbstractPipeline.wrapAndCopyInto(AbstractPipeline.java:474)
at 
java.base/java.util.stream.ForEachOps$ForEachOp.evaluateSequential(ForEachOps.java:150)
at 
java.base/java.util.stream.ForEachOps$ForEachOp$OfRef.evaluateSequential(ForEachOps.java:173)
at 
java.base/java.util.stream.AbstractPipeline.evaluate(AbstractPipeline.java:234)
at 
java.base/java.util.stream.ReferencePipeline.forEach(ReferencePipeline.java:497)
at org.manish.AttachmentParser.testParse(AttachmentParser.java:64)
at org.manish.AttachmentParser.main(AttachmentParser.java:57)
Caused by: java.util.zip.ZipException: only DEFLATED entries can have EXT 
descriptor
at java.base/java.util.zip.ZipInputStream.readLOC(ZipInputStream.java:313)
at java.base/java.util.zip.ZipInputStream.getNextEntry(ZipInputStream.java:125)
at 
org.apache.tika.parser.odf.OpenDocumentParser.handleZipStream(OpenDocumentParser.java:218)
at 
org.apache.tika.parser.odf.OpenDocumentParser.parse(OpenDocumentParser.java:169)
at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:298)
... 19 more
 
(We use tika to detect encrypted docs)

  was:
When passing inputstream of protected odf format file to tika we get a 
ZipException instead of a EncryptedDocumentException.

This works well and correctly throws EncryptedDocumentException if you create 
TikaInputStream with Path or call TikaInputStream.getPath() as it will spool 
the file to memory.

But when working with InputStreams we get the following zip exception:

 
org.apache.tika.exception.TikaException: TIKA-198: Illegal IOException from 
org.apache.tika.parser.odf.OpenDocumentParser@bae47a0
at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:304)
at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:298)
at org.apache.tika.parser.AutoDetectParser.parse(AutoDetectParser.java:204)
at org.apache.tika.Tika.parseToString(Tika.java:525)
at org.apache.tika.Tika.parseToString(Tika.java:495)
at org.manish.AttachmentParser.parse(AttachmentParser.java:21)
at org.manish.AttachmentParser.lambda$testParse$1(AttachmentParser.java:72)
at 
java.base/java.util.stream.ForEachOps$ForEachOp$OfRef.accept(ForEachOps.java:183)
at 
java.base/java.util.stream.ReferencePipeline$2$1.accept(ReferencePipeline.java:177)
at 
java.base/java.util.stream.ReferencePipeline$3$1.accept(ReferencePipeline.java:195)
at java.base/java.util.Iterator.forEachRemaining(Iterator.java:133)
at 
java.base/java.util.Spliterators$IteratorSpliterator.forEachRemaining(Spliterators.java:1801)
at 
java.base/java.util.stream.AbstractPipeline.copyInto(AbstractPipeline.java:484)
at 
java.base/java.util.stream.AbstractPipeline.wrapAndCopyInto(AbstractPipeline.java:474)
at 
java.base/java.util.stream.ForEachOps$ForEachOp.evaluateSequential(ForEachOps.java:150)
at 
java.base/java.util.stream.ForEachOps$ForEachOp$OfRef.evaluateSequential(ForEachOps.java:173)
at 
java.base/java.util.stream.AbstractPipeline.evaluate(AbstractPipeline.java:234)
at 
java.base/java.util.stream.ReferencePipeline.forEach(ReferencePipeline.java:497)
at org.manish.AttachmentParser.testParse(AttachmentParser.java:64)
at org.manish.AttachmentParser.main(AttachmentParser.java:57)
Caused by: java.util.zip.ZipException: only DEFLATED entries can have EXT 
descriptor
at java.base/java.util.zip.ZipInputStream.readLOC(ZipInputStream.java:313)
at java.base/java.util.zip.ZipInputStream.getNextEntry(ZipInputStream.java:125)
at 
org.apache.tika.parser.odf.OpenDocumentParser.handleZipStream(OpenDocumentParser.java:218)
at 
org.apache.tika.parser.odf.OpenDocumentParser.parse(OpenDocumentParser.java:169)
at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:298)
... 19 more
 
(We use tika to detect encrypted docs)


> protected ODF encryption detection fail
> ---------------------------------------
>
>                 Key: TIKA-4459
>                 URL: https://issues.apache.org/jira/browse/TIKA-4459
>             Project: Tika
>          Issue Type: Bug
>          Components: parser
>    Affects Versions: 3.2.1
>         Environment: Ubuntu 24.04.2 LTS x86_64 
>            Reporter: Manish S N
>            Priority: Minor
>              Labels: encryption, odf, open-document-format, protected, zip
>
> When passing inputstream of protected odf format file to tika we get a 
> ZipException instead of a EncryptedDocumentException.
> This works well and correctly throws EncryptedDocumentException if you create 
> TikaInputStream with Path or call TikaInputStream.getPath() as it will write 
> to a temporary file in memory.
> But when working with InputStreams we get the following zip exception:
>  
> org.apache.tika.exception.TikaException: TIKA-198: Illegal IOException from 
> org.apache.tika.parser.odf.OpenDocumentParser@bae47a0
> at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:304)
> at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:298)
> at org.apache.tika.parser.AutoDetectParser.parse(AutoDetectParser.java:204)
> at org.apache.tika.Tika.parseToString(Tika.java:525)
> at org.apache.tika.Tika.parseToString(Tika.java:495)
> at org.manish.AttachmentParser.parse(AttachmentParser.java:21)
> at org.manish.AttachmentParser.lambda$testParse$1(AttachmentParser.java:72)
> at 
> java.base/java.util.stream.ForEachOps$ForEachOp$OfRef.accept(ForEachOps.java:183)
> at 
> java.base/java.util.stream.ReferencePipeline$2$1.accept(ReferencePipeline.java:177)
> at 
> java.base/java.util.stream.ReferencePipeline$3$1.accept(ReferencePipeline.java:195)
> at java.base/java.util.Iterator.forEachRemaining(Iterator.java:133)
> at 
> java.base/java.util.Spliterators$IteratorSpliterator.forEachRemaining(Spliterators.java:1801)
> at 
> java.base/java.util.stream.AbstractPipeline.copyInto(AbstractPipeline.java:484)
> at 
> java.base/java.util.stream.AbstractPipeline.wrapAndCopyInto(AbstractPipeline.java:474)
> at 
> java.base/java.util.stream.ForEachOps$ForEachOp.evaluateSequential(ForEachOps.java:150)
> at 
> java.base/java.util.stream.ForEachOps$ForEachOp$OfRef.evaluateSequential(ForEachOps.java:173)
> at 
> java.base/java.util.stream.AbstractPipeline.evaluate(AbstractPipeline.java:234)
> at 
> java.base/java.util.stream.ReferencePipeline.forEach(ReferencePipeline.java:497)
> at org.manish.AttachmentParser.testParse(AttachmentParser.java:64)
> at org.manish.AttachmentParser.main(AttachmentParser.java:57)
> Caused by: java.util.zip.ZipException: only DEFLATED entries can have EXT 
> descriptor
> at java.base/java.util.zip.ZipInputStream.readLOC(ZipInputStream.java:313)
> at 
> java.base/java.util.zip.ZipInputStream.getNextEntry(ZipInputStream.java:125)
> at 
> org.apache.tika.parser.odf.OpenDocumentParser.handleZipStream(OpenDocumentParser.java:218)
> at 
> org.apache.tika.parser.odf.OpenDocumentParser.parse(OpenDocumentParser.java:169)
> at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:298)
> ... 19 more
>  
> (We use tika to detect encrypted docs)



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

Reply via email to