[ 
https://issues.apache.org/jira/browse/TIKA-3556?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17415566#comment-17415566
 ] 

Tim Allison commented on TIKA-3556:
-----------------------------------

Yes, I agree with the above.  One unfortunate bit is that POI, deep down, 
closes the ZipFile if there's an exception while loading the OPCPackage.  So 
we'll have to figure out how to deal with that.

> DefaultZipContainerDetector returns application/zip for .odt files when 
> OPCPackageDetector is on the classpath
> --------------------------------------------------------------------------------------------------------------
>
>                 Key: TIKA-3556
>                 URL: https://issues.apache.org/jira/browse/TIKA-3556
>             Project: Tika
>          Issue Type: Bug
>          Components: detector
>    Affects Versions: 2.1.0
>            Reporter: Simon Gaeremynck
>            Priority: Major
>
> This is happening because the OPCPackageDetector.detect method will [fail and 
> close the underlying zip 
> stream|https://github.com/apache/tika/blob/2.1.0-rc2/tika-parsers/tika-parsers-standard/tika-parsers-standard-modules/tika-parser-microsoft-module/src/main/java/org/apache/tika/detect/microsoft/ooxml/OPCPackageDetector.java#L257].
>  When the next detector runs (e.g. OpenDocumentDetector), the stream it 
> receives has been closed and it won't be able to detect anything.
> After all detectors have effectively no-oped, [the 
> DefaultZipContainerDetector falls back to 
> application/zip|https://github.com/apache/tika/blob/2.1.0-rc2/tika-parsers/tika-parsers-standard/tika-parsers-standard-modules/tika-parser-zip-commons/src/main/java/org/apache/tika/detect/zip/DefaultZipContainerDetector.java#L209].
> Now, when running with the default CompositeDetector, the next detector is 
> usually the MimeTypes detector. This returns the proper 
> application/vnd.oasis.opendocument.text, but the [CompositeDetector will 
> ignore|https://github.com/apache/tika/blob/main/tika-core/src/main/java/org/apache/tika/detect/CompositeDetector.java#L86]
>  it as that mime type isn't marked up as a subclass of application/zip in 
> [the 
> registry|https://github.com/apache/tika/blob/2.1.0-rc2/tika-core/src/main/resources/org/apache/tika/mime/tika-mimetypes.xml#L2327].
>  
> In short, I think there are potentially two bugs here:
>  # The OPCPackageDetector either shouldn't close the zip while detecting or 
> the DefaultZipContainerDetector should re-open if necessary?
>  # The registry should be updated to mark up 
> application/vnd.oasis.opendocument.text as a subclass of application/zip ?
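
To make the first option concrete, here is a rough sketch of the "re-open if necessary" idea using only java.util.zip. It is not Tika or POI code: ZipReopenSketch, ZipDetector, CLOSING_DETECTOR, MIMETYPE_ENTRY_DETECTOR and the returned OOXML label are all hypothetical stand-ins, and it assumes the detector chain knows the file path so it can open a fresh ZipFile. It relies on the fact that a closed ZipFile throws IllegalStateException on access.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;
import java.util.zip.ZipEntry;
import java.util.zip.ZipFile;
import java.util.zip.ZipOutputStream;

public class ZipReopenSketch {

    /** Hypothetical stand-in for a zip-based detector; returns null when it cannot decide. */
    interface ZipDetector {
        String detect(ZipFile zip) throws IOException;
    }

    /** Mimics the reported behaviour: closes the shared ZipFile when it does not recognise the package. */
    static final ZipDetector CLOSING_DETECTOR = zip -> {
        if (zip.getEntry("[Content_Types].xml") == null) {
            zip.close(); // side effect that breaks the detectors that run afterwards
            return null;
        }
        return "application/x-tika-ooxml"; // illustrative label only
    };

    /** Reads the "mimetype" entry that OpenDocument packages carry. */
    static final ZipDetector MIMETYPE_ENTRY_DETECTOR = zip -> {
        ZipEntry entry = zip.getEntry("mimetype");
        if (entry == null) {
            return null;
        }
        try (InputStream in = zip.getInputStream(entry)) {
            return new String(in.readAllBytes(), StandardCharsets.US_ASCII).trim();
        }
    };

    /** Runs the detectors in order, re-opening the zip if an earlier detector closed it. */
    static String detect(Path path, List<ZipDetector> detectors) throws IOException {
        ZipFile zip = new ZipFile(path.toFile());
        try {
            for (ZipDetector detector : detectors) {
                String type;
                try {
                    type = detector.detect(zip);
                } catch (IllegalStateException closed) {
                    // A closed ZipFile throws IllegalStateException: open a fresh one and retry once.
                    zip = new ZipFile(path.toFile());
                    type = detector.detect(zip);
                }
                if (type != null) {
                    return type;
                }
            }
            return "application/zip"; // fallback when no detector recognises the content
        } finally {
            zip.close(); // closing an already closed ZipFile is harmless
        }
    }

    /** Builds a tiny zip with an ODT "mimetype" entry and runs both detectors on it. */
    static String demo() {
        try {
            Path path = Files.createTempFile("sketch", ".odt");
            try {
                try (ZipOutputStream out = new ZipOutputStream(Files.newOutputStream(path))) {
                    out.putNextEntry(new ZipEntry("mimetype"));
                    out.write("application/vnd.oasis.opendocument.text"
                            .getBytes(StandardCharsets.US_ASCII));
                    out.closeEntry();
                }
                return detect(path, List.of(CLOSING_DETECTOR, MIMETYPE_ENTRY_DETECTOR));
            } finally {
                Files.deleteIfExists(path);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(demo());
    }
}
```

Catching the exception and re-opening keeps the misbehaving detector untouched, at the cost of opening the file a second time. The alternative from the first bullet, not closing the zip during detection at all, would avoid that but needs a change in the code that currently does the closing.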



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
