[ 
https://issues.apache.org/jira/browse/TIKA-4459?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=18010556#comment-18010556
 ] 

Manish S N edited comment on TIKA-4459 at 7/29/25 5:00 AM:
-----------------------------------------------------------

(y) Yes, that's what we must do: spool the InputStream to a temporary file in 
OpenDocumentParser every time, if that's what's required for correct parsing.

OpenDocumentParser must do what it ought to do.

With modern SSDs the write overhead is negligible.

There are also benefits to having the content on the file system: parsers get 
random access to it, and it is easy on RAM.
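
To make the spooling idea concrete, here is a minimal JDK-only sketch, not the actual OpenDocumentParser code. The class and method names (SpoolSketch, entryNamesViaTempFile) are made up for illustration. It copies the stream to a temporary file and opens it with java.util.zip.ZipFile, which reads entries through the central directory instead of walking local headers the way ZipInputStream does.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.util.ArrayList;
import java.util.Enumeration;
import java.util.List;
import java.util.zip.ZipEntry;
import java.util.zip.ZipFile;

// Hypothetical helper for illustration only; not part of Tika.
class SpoolSketch {

    /** Spools the stream to a temp file, lists the ZIP entry names, then deletes the file. */
    static List<String> entryNamesViaTempFile(InputStream in) throws IOException {
        Path tmp = Files.createTempFile("odf-spool-", ".zip");
        try {
            // The temp file already exists, so the copy must be allowed to overwrite it.
            Files.copy(in, tmp, StandardCopyOption.REPLACE_EXISTING);
            // ZipFile gives random access via the central directory.
            try (ZipFile zip = new ZipFile(tmp.toFile())) {
                List<String> names = new ArrayList<>();
                Enumeration<? extends ZipEntry> entries = zip.entries();
                while (entries.hasMoreElements()) {
                    names.add(entries.nextElement().getName());
                }
                return names;
            }
        } finally {
            // Always clean up the spooled copy.
            Files.deleteIfExists(tmp);
        }
    }
}
```

Inside Tika itself the equivalent step would presumably go through TikaInputStream, which the issue description already mentions, rather than a hand-rolled temp file.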


> protected ODF encryption detection fail
> ---------------------------------------
>
>                 Key: TIKA-4459
>                 URL: https://issues.apache.org/jira/browse/TIKA-4459
>             Project: Tika
>          Issue Type: Bug
>          Components: parser
>    Affects Versions: 3.2.1
>         Environment: Ubuntu 24.04.2 LTS x86_64 
>            Reporter: Manish S N
>            Priority: Minor
>              Labels: encryption, odf, open-document-format, protected, 
> regression, zip
>             Fix For: 4.0.0, 3.2.2
>
>         Attachments: protected.odt, testProtected.odp
>
>
> When passing an InputStream of a protected ODF file to Tika, we get a 
> ZipException instead of an EncryptedDocumentException.
> Parsing works and correctly throws EncryptedDocumentException if you create 
> the TikaInputStream from a Path or call TikaInputStream.getPath(), as that 
> spools the stream to a temporary file.
> But when working with plain InputStreams we get the following ZipException:
>  
> org.apache.tika.exception.TikaException: TIKA-198: Illegal IOException from org.apache.tika.parser.odf.OpenDocumentParser@bae47a0
> at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:304)
> at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:298)
> at org.apache.tika.parser.AutoDetectParser.parse(AutoDetectParser.java:204)
> at org.apache.tika.Tika.parseToString(Tika.java:525)
> at org.apache.tika.Tika.parseToString(Tika.java:495)
> at org.manish.AttachmentParser.parse(AttachmentParser.java:21)
> at org.manish.AttachmentParser.lambda$testParse$1(AttachmentParser.java:72)
> at java.base/java.util.stream.ForEachOps$ForEachOp$OfRef.accept(ForEachOps.java:183)
> at java.base/java.util.stream.ReferencePipeline$2$1.accept(ReferencePipeline.java:177)
> at java.base/java.util.stream.ReferencePipeline$3$1.accept(ReferencePipeline.java:195)
> at java.base/java.util.Iterator.forEachRemaining(Iterator.java:133)
> at java.base/java.util.Spliterators$IteratorSpliterator.forEachRemaining(Spliterators.java:1801)
> at java.base/java.util.stream.AbstractPipeline.copyInto(AbstractPipeline.java:484)
> at java.base/java.util.stream.AbstractPipeline.wrapAndCopyInto(AbstractPipeline.java:474)
> at java.base/java.util.stream.ForEachOps$ForEachOp.evaluateSequential(ForEachOps.java:150)
> at java.base/java.util.stream.ForEachOps$ForEachOp$OfRef.evaluateSequential(ForEachOps.java:173)
> at java.base/java.util.stream.AbstractPipeline.evaluate(AbstractPipeline.java:234)
> at java.base/java.util.stream.ReferencePipeline.forEach(ReferencePipeline.java:497)
> at org.manish.AttachmentParser.testParse(AttachmentParser.java:64)
> at org.manish.AttachmentParser.main(AttachmentParser.java:57)
> Caused by: java.util.zip.ZipException: only DEFLATED entries can have EXT descriptor
> at java.base/java.util.zip.ZipInputStream.readLOC(ZipInputStream.java:313)
> at java.base/java.util.zip.ZipInputStream.getNextEntry(ZipInputStream.java:125)
> at org.apache.tika.parser.odf.OpenDocumentParser.handleZipStream(OpenDocumentParser.java:218)
> at org.apache.tika.parser.odf.OpenDocumentParser.parse(OpenDocumentParser.java:169)
> at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:298)
> ... 19 more
>  
> (We use Tika to detect encrypted documents.)
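
For context on the "only DEFLATED entries can have EXT descriptor" message in the trace above: java.util.zip.ZipInputStream throws it when a local file header sets the data-descriptor flag (general-purpose bit 3) on an entry whose compression method is not DEFLATED. That the attached protected files have such entries is only inferred from the trace, not verified here. The sketch below checks the first local header for that combination. The class and method names are made up for illustration and are not Tika API.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

// Hypothetical helper for illustration only; not part of Tika.
class LocHeaderCheck {
    static final int LOC_SIGNATURE = 0x04034b50;      // "PK\3\4" read as a little-endian int
    static final int FLAG_DATA_DESCRIPTOR = 1 << 3;   // general-purpose bit 3
    static final int METHOD_DEFLATED = 8;

    /**
     * Returns true if the first local file header uses a data descriptor
     * on a non-DEFLATED entry, the case ZipInputStream rejects.
     * Expects at least the first 10 bytes of the archive.
     */
    static boolean firstEntryNeedsRandomAccess(byte[] head) {
        if (head.length < 10) {
            return false;
        }
        ByteBuffer buf = ByteBuffer.wrap(head).order(ByteOrder.LITTLE_ENDIAN);
        if (buf.getInt(0) != LOC_SIGNATURE) {
            return false; // not a local file header
        }
        int flags = buf.getShort(6) & 0xFFFF;   // offset 6: general-purpose flags
        int method = buf.getShort(8) & 0xFFFF;  // offset 8: compression method
        return (flags & FLAG_DATA_DESCRIPTOR) != 0 && method != METHOD_DEFLATED;
    }
}
```

A check like this only looks at the first entry, so it is a diagnostic aid rather than a replacement for spooling to a file and reading through the central directory.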



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
