[ https://issues.apache.org/jira/browse/TIKA-4459?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=18010244#comment-18010244 ]
Manish S N edited comment on TIKA-4459 at 7/28/25 4:45 AM:
-----------------------------------------------------------

It happens with any password-protected OpenDocument file (odt, ods, odp) that is passed to Tika as a stream rather than a file and parsed by OpenDocumentParser.

Judging from the decompiled code, OpenDocumentParser parses a document in one of two ways: as a ZipFile (when the stream is a TikaInputStream and its hasFile() method returns true), or otherwise as a stream read through java.util.zip.ZipInputStream (handleZipStream). The problem occurs in the ZipInputStream logic, i.e. when we give Tika the InputStream and not the path

> protected ODF encryption detection fail
> ---------------------------------------
>
>                 Key: TIKA-4459
>                 URL: https://issues.apache.org/jira/browse/TIKA-4459
>             Project: Tika
>          Issue Type: Bug
>          Components: parser
>    Affects Versions: 3.2.1
>         Environment: Ubuntu 24.04.2 LTS x86_64
>            Reporter: Manish S N
>            Priority: Minor
>              Labels: encryption, odf, open-document-format, protected, zip
>         Attachments: protected.odt
>
>
> When we pass an InputStream of a protected ODF file to Tika, we get a
> ZipException (wrapped in a TikaException) instead of an EncryptedDocumentException.
> It works correctly and throws EncryptedDocumentException if you create the
> TikaInputStream from a Path, or call TikaInputStream.getPath(), because that
> writes the stream to a temporary file.
> But when working with plain InputStreams, we get the following ZipException:
>
> org.apache.tika.exception.TikaException: TIKA-198: Illegal IOException from org.apache.tika.parser.odf.OpenDocumentParser@bae47a0
>     at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:304)
>     at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:298)
>     at org.apache.tika.parser.AutoDetectParser.parse(AutoDetectParser.java:204)
>     at org.apache.tika.Tika.parseToString(Tika.java:525)
>     at org.apache.tika.Tika.parseToString(Tika.java:495)
>     at org.manish.AttachmentParser.parse(AttachmentParser.java:21)
>     at org.manish.AttachmentParser.lambda$testParse$1(AttachmentParser.java:72)
>     at java.base/java.util.stream.ForEachOps$ForEachOp$OfRef.accept(ForEachOps.java:183)
>     at java.base/java.util.stream.ReferencePipeline$2$1.accept(ReferencePipeline.java:177)
>     at java.base/java.util.stream.ReferencePipeline$3$1.accept(ReferencePipeline.java:195)
>     at java.base/java.util.Iterator.forEachRemaining(Iterator.java:133)
>     at java.base/java.util.Spliterators$IteratorSpliterator.forEachRemaining(Spliterators.java:1801)
>     at java.base/java.util.stream.AbstractPipeline.copyInto(AbstractPipeline.java:484)
>     at java.base/java.util.stream.AbstractPipeline.wrapAndCopyInto(AbstractPipeline.java:474)
>     at java.base/java.util.stream.ForEachOps$ForEachOp.evaluateSequential(ForEachOps.java:150)
>     at java.base/java.util.stream.ForEachOps$ForEachOp$OfRef.evaluateSequential(ForEachOps.java:173)
>     at java.base/java.util.stream.AbstractPipeline.evaluate(AbstractPipeline.java:234)
>     at java.base/java.util.stream.ReferencePipeline.forEach(ReferencePipeline.java:497)
>     at org.manish.AttachmentParser.testParse(AttachmentParser.java:64)
>     at org.manish.AttachmentParser.main(AttachmentParser.java:57)
> Caused by: java.util.zip.ZipException: only DEFLATED entries can have EXT descriptor
>     at java.base/java.util.zip.ZipInputStream.readLOC(ZipInputStream.java:313)
>     at java.base/java.util.zip.ZipInputStream.getNextEntry(ZipInputStream.java:125)
>     at org.apache.tika.parser.odf.OpenDocumentParser.handleZipStream(OpenDocumentParser.java:218)
>     at org.apache.tika.parser.odf.OpenDocumentParser.parse(OpenDocumentParser.java:169)
>     at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:298)
>     ... 19 more
>
> (We use Tika to detect encrypted documents.)

--
This message was sent by Atlassian Jira
(v8.20.10#820010)
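The following is a minimal, JDK-only sketch of why the two code paths behave differently. It rests on one assumption, drawn only from the "only DEFLATED entries can have EXT descriptor" message. The assumption is that the failing entry in the protected ODF package is STORED (method 0) with general-purpose flag bit 3 set, so its CRC and sizes are deferred to a data descriptor that follows the data.

java.util.zip.ZipInputStream reads local headers in order. It cannot tell where such an entry ends, so readLOC rejects it. java.util.zip.ZipFile reads the central directory instead, which records the real sizes, so the same bytes open once they are on disk.

StoredEntryDemo, its methods, and the entry name content.xml are hypothetical. readEntryViaTempFile only imitates the file-backed route described for TikaInputStream.getPath(). It is not Tika's code.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.util.zip.CRC32;
import java.util.zip.ZipEntry;
import java.util.zip.ZipException;
import java.util.zip.ZipFile;
import java.util.zip.ZipInputStream;

// Hypothetical demo: one STORED entry with a data descriptor, read both ways.
class StoredEntryDemo {

    // ZIP header fields are little-endian.
    static void le16(ByteArrayOutputStream out, int v) {
        out.write(v & 0xFF);
        out.write((v >>> 8) & 0xFF);
    }

    static void le32(ByteArrayOutputStream out, long v) {
        le16(out, (int) (v & 0xFFFF));
        le16(out, (int) ((v >>> 16) & 0xFFFF));
    }

    // One-entry ZIP: method 0 (STORED) with general-purpose flag bit 3 (data descriptor) set.
    static byte[] buildStoredWithDataDescriptor(String name, byte[] data) {
        byte[] nameBytes = name.getBytes(StandardCharsets.UTF_8);
        CRC32 crc = new CRC32();
        crc.update(data);
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        // Local file header: CRC and sizes stay 0 because they are deferred to the descriptor.
        le32(out, 0x04034b50L);
        le16(out, 20); le16(out, 0x0008); le16(out, 0);  // version needed, flags, method
        le16(out, 0); le16(out, 0x0021);                 // time, date (1980-01-01)
        le32(out, 0); le32(out, 0); le32(out, 0);        // CRC, compressed size, size
        le16(out, nameBytes.length); le16(out, 0);       // name length, extra length
        out.write(nameBytes, 0, nameBytes.length);
        out.write(data, 0, data.length);
        // Data descriptor (with its optional signature) carries the real CRC and sizes.
        le32(out, 0x08074b50L);
        le32(out, crc.getValue()); le32(out, data.length); le32(out, data.length);
        // Central directory header repeats the real values; ZipFile reads these.
        int cenOffset = out.size();
        le32(out, 0x02014b50L);
        le16(out, 20); le16(out, 20); le16(out, 0x0008); le16(out, 0); // made by, needed, flags, method
        le16(out, 0); le16(out, 0x0021);
        le32(out, crc.getValue()); le32(out, data.length); le32(out, data.length);
        le16(out, nameBytes.length); le16(out, 0); le16(out, 0);      // name, extra, comment lengths
        le16(out, 0); le16(out, 0); le32(out, 0);                     // disk start, internal, external attrs
        le32(out, 0);                                                 // local header offset
        out.write(nameBytes, 0, nameBytes.length);
        int cenSize = out.size() - cenOffset;
        // End of central directory record.
        le32(out, 0x06054b50L);
        le16(out, 0); le16(out, 0); le16(out, 1); le16(out, 1);       // disk numbers, entry counts
        le32(out, cenSize); le32(out, cenOffset); le16(out, 0);       // CD size, CD offset, comment length
        return out.toByteArray();
    }

    // Streaming path: returns the ZipException message, or null if every entry was read.
    static String zipStreamError(byte[] zipBytes) throws IOException {
        try (ZipInputStream zin = new ZipInputStream(new ByteArrayInputStream(zipBytes))) {
            while (zin.getNextEntry() != null) { /* getNextEntry() skips the previous entry's data */ }
            return null;
        } catch (ZipException e) {
            return e.getMessage();
        }
    }

    // File-backed path: spool the stream to a temp file, then read the entry through ZipFile.
    static String readEntryViaTempFile(InputStream in, String name) throws IOException {
        Path tmp = Files.createTempFile("spooled-", ".zip");
        try {
            Files.copy(in, tmp, StandardCopyOption.REPLACE_EXISTING);
            try (ZipFile zip = new ZipFile(tmp.toFile())) {
                ZipEntry entry = zip.getEntry(name);
                if (entry == null) return null;
                try (InputStream is = zip.getInputStream(entry)) {
                    return new String(is.readAllBytes(), StandardCharsets.UTF_8);
                }
            }
        } finally {
            Files.deleteIfExists(tmp);
        }
    }

    public static void main(String[] args) throws IOException {
        byte[] zip = buildStoredWithDataDescriptor("content.xml", "hello".getBytes(StandardCharsets.UTF_8));
        System.out.println("ZipInputStream: " + zipStreamError(zip));
        System.out.println("ZipFile: " + readEntryViaTempFile(new ByteArrayInputStream(zip), "content.xml"));
    }
}
```

Running main should show the streaming call failing with the same message as the Caused by line in the stack trace, while the ZipFile call returns the stored text. For Tika itself, the report already names the workaround: create the TikaInputStream from a Path, or call getPath() before parsing. Either way hasFile() becomes true, so OpenDocumentParser presumably takes its ZipFile branch.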