[ https://issues.apache.org/jira/browse/TIKA-2294?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17018947#comment-17018947 ]
Nick Burch commented on TIKA-2294:
----------------------------------
For fully accurate OOXML (and other zip-subtype) detection, you need to have
the Tika Parsers jar on your classpath, along with its dependencies. That's
because Tika needs to look inside the zip, and potentially check some of the
files in there, to be sure of the type.
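
To illustrate why magic bytes alone are not enough, here is a minimal sketch
that peeks at entry names inside a zip using only the JDK. It is not Tika's
actual detector: the class name, the helper names and the reliance on the
conventional part names (word/document.xml, xl/workbook.xml,
ppt/presentation.xml) are assumptions made for the example, and a real
detector would also inspect the package's content types and relationships.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.HashSet;
import java.util.Set;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;
import java.util.zip.ZipOutputStream;

public class ZipSubtypeSketch {

    /**
     * Best-effort guess from entry names only. Assumes the caller already
     * knows (e.g. from the "PK" magic) that the stream is a zip container.
     */
    static String guessType(InputStream in) throws IOException {
        Set<String> names = new HashSet<>();
        try (ZipInputStream zip = new ZipInputStream(in)) {
            ZipEntry entry;
            while ((entry = zip.getNextEntry()) != null) {
                names.add(entry.getName());
            }
        }
        // Every OOXML package carries this part; without it, treat as plain zip.
        if (!names.contains("[Content_Types].xml")) {
            return "application/zip";
        }
        // Conventional main-part locations (an assumption, not a guarantee).
        if (names.contains("word/document.xml")) {
            return "application/vnd.openxmlformats-officedocument.wordprocessingml.document";
        }
        if (names.contains("xl/workbook.xml")) {
            return "application/vnd.openxmlformats-officedocument.spreadsheetml.sheet";
        }
        if (names.contains("ppt/presentation.xml")) {
            return "application/vnd.openxmlformats-officedocument.presentationml.presentation";
        }
        // Looks like OOXML, but the kind is unknown: Tika's generic OOXML label.
        return "application/x-tika-ooxml";
    }

    /** Hypothetical helper: builds a tiny in-memory zip with the given entry names. */
    static byte[] zipOf(String... entryNames) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ZipOutputStream zip = new ZipOutputStream(bytes)) {
            for (String name : entryNames) {
                zip.putNextEntry(new ZipEntry(name));
                zip.write("x".getBytes(StandardCharsets.UTF_8));
                zip.closeEntry();
            }
        }
        return bytes.toByteArray();
    }

    public static void main(String[] args) throws IOException {
        byte[] fakeDocx = zipOf("[Content_Types].xml", "word/document.xml");
        System.out.println(guessType(new ByteArrayInputStream(fakeDocx)));
    }
}
```

Both inputs in this sketch start with the same zip magic, so only the walk
over the entries tells them apart. That container-level inspection is the part
that is missing when only Tika Core is on the classpath.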
If you want best-guess detection, which would probably be fine for this case,
the mime-magic in Tika Core plus a filename hint should do you. IIRC, calling
detect with a File object will do that for you. If you are detecting on a
stream, you will need to set the filename as a hint on the Metadata object
passed to detection.
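
A minimal sketch of both calls, assuming the Tika facade class
(org.apache.tika.Tika) from tika-core is on the classpath. The class name
FilenameHintSketch and the method names are made up for the example. As far as
I know, the detect(InputStream, String) overload records the name as the
resource-name hint in a Metadata object before running detection, so it should
be equivalent to setting the hint yourself.

```java
import java.io.BufferedInputStream;
import java.io.File;
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStream;

import org.apache.tika.Tika;

public class FilenameHintSketch {

    /** Detection on a File: the facade can use the file's own name as a hint. */
    static String detectFile(Tika tika, File file) throws IOException {
        return tika.detect(file);
    }

    /** Detection on a stream: the filename has to be supplied explicitly. */
    static String detectStream(Tika tika, InputStream stream, String fileName)
            throws IOException {
        return tika.detect(stream, fileName);
    }

    public static void main(String[] args) throws IOException {
        Tika tika = new Tika();
        File file = new File(args[0]); // path to the document to check

        System.out.println("file:   " + detectFile(tika, file));

        // Buffered so the detector can mark/reset while reading the magic bytes.
        try (InputStream in = new BufferedInputStream(new FileInputStream(file))) {
            System.out.println("stream: " + detectStream(tika, in, file.getName()));
        }
    }
}
```

Calling detect on a bare stream with no name leaves only the magic bytes to go
on, which for any OOXML file is just the zip signature.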
> Tika inconsistently detects ooxml files as zip file sometimes
> -------------------------------------------------------------
>
> Key: TIKA-2294
> URL: https://issues.apache.org/jira/browse/TIKA-2294
> Project: Tika
> Issue Type: Bug
> Components: mime
> Affects Versions: 1.11
> Environment: linux
> Reporter: chanchal
> Assignee: Tim Allison
> Priority: Major
> Attachments: google_doc.docx
>
>
> Tika sometimes incorrectly detects an OOXML file as zip, and sometimes
> correctly detects it as docx/pptx/xlsx.
> Is it possible for this to happen, and if so, how?
> I cannot share the file as it has sensitive content.