[
https://issues.apache.org/jira/browse/TIKA-3556?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17415533#comment-17415533
]
Tim Allison commented on TIKA-3556:
-----------------------------------
And then I see a TODO: {{//TODO: OPCBased needs to be last!!!}}... Ugh.
> DefaultZipContainerDetector returns application/zip for .odt files when
> OPCPackageDetector is on the classpath
> --------------------------------------------------------------------------------------------------------------
>
> Key: TIKA-3556
> URL: https://issues.apache.org/jira/browse/TIKA-3556
> Project: Tika
> Issue Type: Bug
> Components: detector
> Affects Versions: 2.1.0
> Reporter: Simon Gaeremynck
> Priority: Major
>
> This is happening because the OPCPackageDetector.detect method will [fail and
> close the underlying zip
> stream|https://github.com/apache/tika/blob/2.1.0-rc2/tika-parsers/tika-parsers-standard/tika-parsers-standard-modules/tika-parser-microsoft-module/src/main/java/org/apache/tika/detect/microsoft/ooxml/OPCPackageDetector.java#L257].
> When the next detector runs (e.g. OpenDocumentDetector), the stream it
> receives has been closed and it won't be able to detect anything.
> After all detectors have effectively no-oped, [the
> DefaultZipContainerDetector falls back to
> application/zip|https://github.com/apache/tika/blob/2.1.0-rc2/tika-parsers/tika-parsers-standard/tika-parsers-standard-modules/tika-parser-zip-commons/src/main/java/org/apache/tika/detect/zip/DefaultZipContainerDetector.java#L209].
> Now, when running with the default CompositeDetector, the next detector is
> usually the MimeTypes detector. This returns the proper
> application/vnd.oasis.opendocument.text, but the [CompositeDetector will
> ignore|https://github.com/apache/tika/blob/main/tika-core/src/main/java/org/apache/tika/detect/CompositeDetector.java#L86]
> it as that mime type isn't marked up as a subclass of application/zip in
> [the
> registry|https://github.com/apache/tika/blob/2.1.0-rc2/tika-core/src/main/resources/org/apache/tika/mime/tika-mimetypes.xml#L2327].
>
> In short, I think there are two bugs here potentially:
> # The OPCPackageDetector either shouldn't close the zip while detecting or
> the DefaultZipContainerDetector should re-open it if necessary?
> # The registry should be updated to mark up
> application/vnd.oasis.opendocument.text as a subclass of application/zip?