[ https://issues.apache.org/jira/browse/TIKA-4459?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=18010390#comment-18010390 ]
Manish S N edited comment on TIKA-4459 at 7/28/25 1:04 PM:
-----------------------------------------------------------
-So would ditching Java's built-in ZipInputStream and moving to ZipArchiveInputStream from Apache Commons Compress solve the issue?-
I guess not, as the following code
{code:java}
InputStream is = URI.create("https://issues.apache.org/jira/secure/attachment/13077746/protected.odt").toURL().openStream();
// new AutoDetectParser().parse(is, new DefaultHandler(), new Metadata(), new ParseContext());
ZipArchiveInputStream zis = new ZipArchiveInputStream(is);
do
{
    // getNextEntry() returns null once the stream has no further entries
    ZipEntry entry = zis.getNextEntry();
    if (entry == null)
    {
        System.out.println("No more entries in the zip file.");
        break;
    }
    System.out.println(entry + " " + entry.getName() + " " + entry.getSize() + " " + entry.isDirectory());
}
while (true);
{code}
gives
{code:java}
mimetype mimetype 39 false
Configurations2/menubar/ Configurations2/menubar/ 0 true
Configurations2/progressbar/ Configurations2/progressbar/ 0 true
Configurations2/popupmenu/ Configurations2/popupmenu/ 0 true
Configurations2/floater/ Configurations2/floater/ 0 true
Configurations2/statusbar/ Configurations2/statusbar/ 0 true
Configurations2/toolbar/ Configurations2/toolbar/ 0 true
Configurations2/toolpanel/ Configurations2/toolpanel/ 0 true
Configurations2/images/Bitmaps/ Configurations2/images/Bitmaps/ 0 true
Configurations2/accelerator/ Configurations2/accelerator/ 0 true
styles.xml styles.xml -1 false
Exception in thread "main" org.apache.commons.compress.archivers.zip.UnsupportedZipFeatureException: Unsupported feature data descriptor used in entry styles.xml
at org.apache.commons.compress.archivers.zip.ZipArchiveInputStream.read(ZipArchiveInputStream.java:919)
at org.apache.commons.compress.archivers.zip.ZipArchiveInputStream.skip(ZipArchiveInputStream.java:1285)
at org.apache.commons.compress.archivers.zip.ZipArchiveInputStream.closeEntry(ZipArchiveInputStream.java:480)
at org.apache.commons.compress.archivers.zip.ZipArchiveInputStream.getNextZipEntry(ZipArchiveInputStream.java:651)
at org.apache.commons.compress.archivers.zip.ZipArchiveInputStream.getNextEntry(ZipArchiveInputStream.java:632)
at org.manish.AttachmentParser.tilmanTest(AttachmentParser.java:79)
at org.manish.AttachmentParser.main(AttachmentParser.java:69)
{code}
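
For comparison, the JDK-side failure quoted in the issue description below can probably be reproduced without Tika or the attachment. The following is only a minimal sketch under stated assumptions: StoredDescriptorDemo and its method names are hypothetical, the archive is synthetic, and only the local file header flag is patched (no real data descriptor is appended, and the central directory is left untouched). It builds a STORED entry, sets general purpose bit 3 in the local header, and then reads the archive once as a stream and once from a temporary file. On the JDKs I am aware of, the streaming read should fail with the same "only DEFLATED entries can have EXT descriptor" message, while java.util.zip.ZipFile, which takes sizes from the central directory, should still return the content.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.zip.CRC32;
import java.util.zip.ZipEntry;
import java.util.zip.ZipException;
import java.util.zip.ZipFile;
import java.util.zip.ZipInputStream;
import java.util.zip.ZipOutputStream;

// Hypothetical demo class; not part of Tika or Commons Compress.
class StoredDescriptorDemo {

    // Builds a one-entry archive with a STORED entry, then sets general
    // purpose bit 3 ("data descriptor follows") in the local file header.
    static byte[] buildZip() {
        byte[] data = "hello".getBytes(StandardCharsets.US_ASCII);
        CRC32 crc = new CRC32();
        crc.update(data);
        ZipEntry entry = new ZipEntry("styles.xml");
        entry.setMethod(ZipEntry.STORED);
        entry.setSize(data.length);
        entry.setCompressedSize(data.length);
        entry.setCrc(crc.getValue());
        ByteArrayOutputStream bos = new ByteArrayOutputStream();
        try (ZipOutputStream zos = new ZipOutputStream(bos)) {
            zos.putNextEntry(entry);
            zos.write(data);
            zos.closeEntry();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        byte[] zip = bos.toByteArray();
        zip[6] |= 0x08; // the flag field starts at offset 6 of the local header
        return zip;
    }

    // Streaming read: returns the ZipException message, or null if none occurs.
    static String streamError(byte[] zip) {
        try (ZipInputStream zis = new ZipInputStream(new ByteArrayInputStream(zip))) {
            while (zis.getNextEntry() != null) {
                // only walking the local headers
            }
            return null;
        } catch (ZipException e) {
            return e.getMessage();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // File-based read: spools the bytes to a temp file and reads via ZipFile.
    static String readViaZipFile(byte[] zip) {
        Path tmp = null;
        try {
            tmp = Files.createTempFile("stored-descriptor", ".zip");
            Files.write(tmp, zip);
            try (ZipFile zf = new ZipFile(tmp.toFile());
                 InputStream in = zf.getInputStream(zf.getEntry("styles.xml"))) {
                return new String(in.readAllBytes(), StandardCharsets.US_ASCII);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } finally {
            if (tmp != null) {
                tmp.toFile().delete();
            }
        }
    }

    public static void main(String[] args) {
        byte[] zip = buildZip();
        System.out.println("ZipInputStream: " + streamError(zip));
        System.out.println("ZipFile: " + readViaZipFile(zip));
    }
}
```

If this behaves as expected, it would be consistent with the report that a Path-backed TikaInputStream works while a plain InputStream does not. It says nothing about how the real protected.odt is laid out beyond what the traces show.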
was (Author: JIRAUSER306563):
So would ditching java's inbuilt ZipInputStream and moving to
ZipArchiveInputStream of apache commons compress solve the issue?
> protected ODF encryption detection fail
> ---------------------------------------
>
> Key: TIKA-4459
> URL: https://issues.apache.org/jira/browse/TIKA-4459
> Project: Tika
> Issue Type: Bug
> Components: parser
> Affects Versions: 3.2.1
> Environment: Ubuntu 24.04.2 LTS x86_64
> Reporter: Manish S N
> Priority: Minor
> Labels: encryption, odf, open-document-format, protected,
> regression, zip
> Fix For: 4.0.0, 3.2.2
>
> Attachments: protected.odt
>
>
> When passing an InputStream of a protected ODF file to Tika, we get a
> ZipException instead of an EncryptedDocumentException.
> This works well and correctly throws EncryptedDocumentException if you create
> a TikaInputStream from a Path or call TikaInputStream.getPath(), as that
> writes the stream to a temporary file.
> But when working with plain InputStreams we get the following ZipException:
>
> org.apache.tika.exception.TikaException: TIKA-198: Illegal IOException from org.apache.tika.parser.odf.OpenDocumentParser@bae47a0
> at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:304)
> at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:298)
> at org.apache.tika.parser.AutoDetectParser.parse(AutoDetectParser.java:204)
> at org.apache.tika.Tika.parseToString(Tika.java:525)
> at org.apache.tika.Tika.parseToString(Tika.java:495)
> at org.manish.AttachmentParser.parse(AttachmentParser.java:21)
> at org.manish.AttachmentParser.lambda$testParse$1(AttachmentParser.java:72)
> at java.base/java.util.stream.ForEachOps$ForEachOp$OfRef.accept(ForEachOps.java:183)
> at java.base/java.util.stream.ReferencePipeline$2$1.accept(ReferencePipeline.java:177)
> at java.base/java.util.stream.ReferencePipeline$3$1.accept(ReferencePipeline.java:195)
> at java.base/java.util.Iterator.forEachRemaining(Iterator.java:133)
> at java.base/java.util.Spliterators$IteratorSpliterator.forEachRemaining(Spliterators.java:1801)
> at java.base/java.util.stream.AbstractPipeline.copyInto(AbstractPipeline.java:484)
> at java.base/java.util.stream.AbstractPipeline.wrapAndCopyInto(AbstractPipeline.java:474)
> at java.base/java.util.stream.ForEachOps$ForEachOp.evaluateSequential(ForEachOps.java:150)
> at java.base/java.util.stream.ForEachOps$ForEachOp$OfRef.evaluateSequential(ForEachOps.java:173)
> at java.base/java.util.stream.AbstractPipeline.evaluate(AbstractPipeline.java:234)
> at java.base/java.util.stream.ReferencePipeline.forEach(ReferencePipeline.java:497)
> at org.manish.AttachmentParser.testParse(AttachmentParser.java:64)
> at org.manish.AttachmentParser.main(AttachmentParser.java:57)
> Caused by: java.util.zip.ZipException: only DEFLATED entries can have EXT descriptor
> at java.base/java.util.zip.ZipInputStream.readLOC(ZipInputStream.java:313)
> at java.base/java.util.zip.ZipInputStream.getNextEntry(ZipInputStream.java:125)
> at org.apache.tika.parser.odf.OpenDocumentParser.handleZipStream(OpenDocumentParser.java:218)
> at org.apache.tika.parser.odf.OpenDocumentParser.parse(OpenDocumentParser.java:169)
> at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:298)
> ... 19 more
>
> (We use tika to detect encrypted docs)
--
This message was sent by Atlassian Jira
(v8.20.10#820010)