Manish S N created TIKA-4459:
--------------------------------
Summary: protected ODF encryption detection fail
Key: TIKA-4459
URL: https://issues.apache.org/jira/browse/TIKA-4459
Project: Tika
Issue Type: Bug
Components: parser
Affects Versions: 3.2.1
Environment: Ubuntu 24.04.2 LTS x86_64
Reporter: Manish S N
When passing an InputStream of a password-protected ODF file to Tika, we get a
ZipException instead of an EncryptedDocumentException.
Detection works correctly and throws EncryptedDocumentException if you create
the TikaInputStream from a Path, or call TikaInputStream.getPath() first, since
that spools the stream to a temporary file.
With a plain InputStream, however, we get the following ZipException:
org.apache.tika.exception.TikaException: TIKA-198: Illegal IOException from org.apache.tika.parser.odf.OpenDocumentParser@bae47a0
at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:304)
at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:298)
at org.apache.tika.parser.AutoDetectParser.parse(AutoDetectParser.java:204)
at org.apache.tika.Tika.parseToString(Tika.java:525)
at org.apache.tika.Tika.parseToString(Tika.java:495)
at org.manish.AttachmentParser.parse(AttachmentParser.java:21)
at org.manish.AttachmentParser.lambda$testParse$1(AttachmentParser.java:72)
at java.base/java.util.stream.ForEachOps$ForEachOp$OfRef.accept(ForEachOps.java:183)
at java.base/java.util.stream.ReferencePipeline$2$1.accept(ReferencePipeline.java:177)
at java.base/java.util.stream.ReferencePipeline$3$1.accept(ReferencePipeline.java:195)
at java.base/java.util.Iterator.forEachRemaining(Iterator.java:133)
at java.base/java.util.Spliterators$IteratorSpliterator.forEachRemaining(Spliterators.java:1801)
at java.base/java.util.stream.AbstractPipeline.copyInto(AbstractPipeline.java:484)
at java.base/java.util.stream.AbstractPipeline.wrapAndCopyInto(AbstractPipeline.java:474)
at java.base/java.util.stream.ForEachOps$ForEachOp.evaluateSequential(ForEachOps.java:150)
at java.base/java.util.stream.ForEachOps$ForEachOp$OfRef.evaluateSequential(ForEachOps.java:173)
at java.base/java.util.stream.AbstractPipeline.evaluate(AbstractPipeline.java:234)
at java.base/java.util.stream.ReferencePipeline.forEach(ReferencePipeline.java:497)
at org.manish.AttachmentParser.testParse(AttachmentParser.java:64)
at org.manish.AttachmentParser.main(AttachmentParser.java:57)
Caused by: java.util.zip.ZipException: only DEFLATED entries can have EXT descriptor
at java.base/java.util.zip.ZipInputStream.readLOC(ZipInputStream.java:313)
at java.base/java.util.zip.ZipInputStream.getNextEntry(ZipInputStream.java:125)
at org.apache.tika.parser.odf.OpenDocumentParser.handleZipStream(OpenDocumentParser.java:218)
at org.apache.tika.parser.odf.OpenDocumentParser.parse(OpenDocumentParser.java:169)
at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:298)
... 19 more
(We use Tika to detect encrypted documents.)
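
The "Caused by" frame points at java.util.zip.ZipInputStream rather than at Tika itself, so the difference between the two code paths can probably be reproduced with the JDK alone. The sketch below is only an illustration under assumptions: it guesses that the protected ODF file contains a STORED entry flagged as having a data descriptor, and it fakes such an archive by patching one flag bit, which is not how a real ODF writer produces it. The class name, entry name, and payload are hypothetical.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.zip.CRC32;
import java.util.zip.ZipEntry;
import java.util.zip.ZipException;
import java.util.zip.ZipFile;
import java.util.zip.ZipInputStream;
import java.util.zip.ZipOutputStream;

class StoredEntryDescriptorDemo {

    // Hypothetical entry name and payload, for illustration only.
    static final String ENTRY_NAME = "content.xml";
    static final String PAYLOAD = "hello";

    /** Builds a one-entry ZIP, then marks its STORED entry as having a data descriptor. */
    static byte[] buildZip() {
        byte[] data = PAYLOAD.getBytes(StandardCharsets.UTF_8);
        CRC32 crc = new CRC32();
        crc.update(data);

        // STORED entries must declare size and CRC up front.
        ZipEntry entry = new ZipEntry(ENTRY_NAME);
        entry.setMethod(ZipEntry.STORED);
        entry.setSize(data.length);
        entry.setCompressedSize(data.length);
        entry.setCrc(crc.getValue());

        ByteArrayOutputStream out = new ByteArrayOutputStream();
        try (ZipOutputStream zip = new ZipOutputStream(out)) {
            zip.putNextEntry(entry);
            zip.write(data);
            zip.closeEntry();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        byte[] bytes = out.toByteArray();
        // The first local file header starts at offset 0; its general-purpose
        // flag is at offset 6, and bit 3 means "data descriptor present".
        bytes[6] |= 0x08;
        return bytes;
    }

    /** One-pass read: returns the ZipException message, or null if the read succeeded. */
    static String streamingFailure(byte[] zipBytes) {
        try (ZipInputStream in = new ZipInputStream(new ByteArrayInputStream(zipBytes))) {
            in.getNextEntry();
            return null;
        } catch (ZipException e) {
            return e.getMessage();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Spools the bytes to a temporary file and reads the entry through ZipFile. */
    static String readViaTempFile(byte[] zipBytes) {
        Path tmp = null;
        try {
            tmp = Files.createTempFile("stored-descriptor-demo", ".zip");
            Files.write(tmp, zipBytes);
            try (ZipFile zf = new ZipFile(tmp.toFile());
                 InputStream in = zf.getInputStream(zf.getEntry(ENTRY_NAME))) {
                return new String(in.readAllBytes(), StandardCharsets.UTF_8);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } finally {
            if (tmp != null) {
                try { Files.deleteIfExists(tmp); } catch (IOException ignored) { }
            }
        }
    }

    public static void main(String[] args) {
        byte[] zip = buildZip();
        System.out.println("streaming: " + streamingFailure(zip));
        System.out.println("via file:  " + readViaTempFile(zip));
    }
}
```

If this guess is right, it would explain why the Path and TikaInputStream.getPath() routes behave differently: a file-backed archive can be read through its central directory, while a one-pass stream reader has to trust each local header and rejects this combination before the parser gets a chance to look for encryption metadata.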