Manish S N created TIKA-4459:
--------------------------------

             Summary: protected ODF encryption detection fail
                 Key: TIKA-4459
                 URL: https://issues.apache.org/jira/browse/TIKA-4459
             Project: Tika
          Issue Type: Bug
          Components: parser
    Affects Versions: 3.2.1
         Environment: Ubuntu 24.04.2 LTS x86_64
            Reporter: Manish S N

When passing an InputStream of a password-protected ODF file to Tika, we get a ZipException instead of an EncryptedDocumentException.

Detection works correctly and throws EncryptedDocumentException if you create the TikaInputStream from a Path, or call TikaInputStream.getPath(), since that spools the stream to a temporary file.

But when working with plain InputStreams we get the following ZipException:

org.apache.tika.exception.TikaException: TIKA-198: Illegal IOException from org.apache.tika.parser.odf.OpenDocumentParser@bae47a0
    at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:304)
    at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:298)
    at org.apache.tika.parser.AutoDetectParser.parse(AutoDetectParser.java:204)
    at org.apache.tika.Tika.parseToString(Tika.java:525)
    at org.apache.tika.Tika.parseToString(Tika.java:495)
    at org.manish.AttachmentParser.parse(AttachmentParser.java:21)
    at org.manish.AttachmentParser.lambda$testParse$1(AttachmentParser.java:72)
    at java.base/java.util.stream.ForEachOps$ForEachOp$OfRef.accept(ForEachOps.java:183)
    at java.base/java.util.stream.ReferencePipeline$2$1.accept(ReferencePipeline.java:177)
    at java.base/java.util.stream.ReferencePipeline$3$1.accept(ReferencePipeline.java:195)
    at java.base/java.util.Iterator.forEachRemaining(Iterator.java:133)
    at java.base/java.util.Spliterators$IteratorSpliterator.forEachRemaining(Spliterators.java:1801)
    at java.base/java.util.stream.AbstractPipeline.copyInto(AbstractPipeline.java:484)
    at java.base/java.util.stream.AbstractPipeline.wrapAndCopyInto(AbstractPipeline.java:474)
    at java.base/java.util.stream.ForEachOps$ForEachOp.evaluateSequential(ForEachOps.java:150)
    at java.base/java.util.stream.ForEachOps$ForEachOp$OfRef.evaluateSequential(ForEachOps.java:173)
    at java.base/java.util.stream.AbstractPipeline.evaluate(AbstractPipeline.java:234)
    at java.base/java.util.stream.ReferencePipeline.forEach(ReferencePipeline.java:497)
    at org.manish.AttachmentParser.testParse(AttachmentParser.java:64)
    at org.manish.AttachmentParser.main(AttachmentParser.java:57)
Caused by: java.util.zip.ZipException: only DEFLATED entries can have EXT descriptor
    at java.base/java.util.zip.ZipInputStream.readLOC(ZipInputStream.java:313)
    at java.base/java.util.zip.ZipInputStream.getNextEntry(ZipInputStream.java:125)
    at org.apache.tika.parser.odf.OpenDocumentParser.handleZipStream(OpenDocumentParser.java:218)
    at org.apache.tika.parser.odf.OpenDocumentParser.parse(OpenDocumentParser.java:169)
    at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:298)
    ... 19 more

(We use Tika to detect encrypted docs)
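The "Caused by" frames point at java.util.zip.ZipInputStream rather than at Tika itself, so the message can probably be reproduced without Tika. Below is a minimal sketch using only the JDK. It assumes (this is not confirmed in the report) that the protected ODF package contains an entry written as STORED with the data-descriptor flag (general purpose bit 3) set. The class name StoredEntryWithDescriptorDemo and its helper methods are made up for illustration, and the bytes are a hand-built local file header, not a real ODF file.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.nio.charset.StandardCharsets;
import java.util.zip.ZipException;
import java.util.zip.ZipInputStream;

public class StoredEntryWithDescriptorDemo {

    /**
     * Builds a bare ZIP local file header (no data, no central directory)
     * for a STORED entry whose flags say a data descriptor follows.
     */
    public static byte[] localHeader(String name) {
        byte[] nameBytes = name.getBytes(StandardCharsets.UTF_8);
        ByteBuffer buf = ByteBuffer.allocate(30 + nameBytes.length)
                .order(ByteOrder.LITTLE_ENDIAN);
        buf.putInt(0x04034b50);                  // local file header signature
        buf.putShort((short) 20);                // version needed to extract
        buf.putShort((short) 0x0008);            // flags: bit 3 = data descriptor present
        buf.putShort((short) 0);                 // compression method 0 = STORED
        buf.putInt(0);                           // DOS time and date
        buf.putInt(0);                           // CRC-32 (deferred to the descriptor)
        buf.putInt(0);                           // compressed size (deferred)
        buf.putInt(0);                           // uncompressed size (deferred)
        buf.putShort((short) nameBytes.length);  // file name length
        buf.putShort((short) 0);                 // extra field length
        buf.put(nameBytes);
        return buf.array();
    }

    /** Tries to read the first entry in streaming mode and reports what happened. */
    public static String readFirstEntry(byte[] zipBytes) {
        try (ZipInputStream zin = new ZipInputStream(new ByteArrayInputStream(zipBytes))) {
            zin.getNextEntry();
            return "no exception";
        } catch (ZipException e) {
            return "ZipException: " + e.getMessage();
        } catch (IOException e) {
            return "IOException: " + e.getMessage();
        }
    }

    public static void main(String[] args) {
        System.out.println(readFirstEntry(localHeader("content.xml")));
    }
}
```

If the assumption holds, this would explain why the Path-based code path behaves differently: a reader that works from a file can use the central directory for entry sizes, while a streaming reader only sees the local header and cannot tell where a STORED entry ends without them.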