[
https://issues.apache.org/jira/browse/TIKA-484?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12906555#action_12906555
]
Nick Burch commented on TIKA-484:
---------------------------------
I've just tried this file with Tika-App (which passes the filename into the
detector), and it gets the content type correct:
Content-Type: application/vnd.openxmlformats-officedocument.spreadsheetml.sheet
When working with container-based files such as .xlsx, you either need to pass
in the file name, or use the ContainerAwareDetector. If you ask the normal
mime-magic detector without a filename hint, all it can see is the zip
signature at the start of the file, so it won't be able to figure out which
kind of zip-based document it is.
Could you please confirm what steps you're taking that cause it to not work for
you, and ensure you are passing in the filename?
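To illustrate why the filename hint or a container-aware detector matters, here
is a minimal sketch using only java.util.zip. It is not Tika's implementation
and does not use the Tika API: the class ContainerSniffer and its methods are
hypothetical names made up for this example, and the check for an
"xl/workbook.xml" entry is a simplifying assumption about what identifies a
spreadsheet package.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.Locale;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;
import java.util.zip.ZipOutputStream;

// Hypothetical illustration only; this is NOT how Tika is implemented.
class ContainerSniffer {
    static final String UNKNOWN = "application/octet-stream";
    static final String ZIP = "application/zip";
    static final String XLSX =
        "application/vnd.openxmlformats-officedocument.spreadsheetml.sheet";

    // Magic-only detection: a non-empty zip archive starts with "PK\003\004",
    // and so does every zip-based document format, including .xlsx.
    static String detectByMagic(byte[] data) {
        boolean zipMagic = data.length >= 4
            && data[0] == 'P' && data[1] == 'K' && data[2] == 3 && data[3] == 4;
        return zipMagic ? ZIP : UNKNOWN;
    }

    // Magic plus filename hint: the extension narrows "zip" down to xlsx.
    static String detectWithName(byte[] data, String fileName) {
        String type = detectByMagic(data);
        if (type.equals(ZIP) && fileName != null
                && fileName.toLowerCase(Locale.ROOT).endsWith(".xlsx")) {
            return XLSX;
        }
        return type;
    }

    // Container-aware detection: open the archive and look at its entries.
    // Assumption for this sketch: "xl/workbook.xml" marks a spreadsheet.
    static String detectContainerAware(byte[] data) {
        String type = detectByMagic(data);
        if (!type.equals(ZIP)) {
            return type;
        }
        try (ZipInputStream zip =
                 new ZipInputStream(new ByteArrayInputStream(data))) {
            for (ZipEntry e = zip.getNextEntry(); e != null;
                     e = zip.getNextEntry()) {
                if (e.getName().equals("xl/workbook.xml")) {
                    return XLSX;
                }
            }
        } catch (IOException ex) {
            throw new UncheckedIOException(ex);
        }
        return ZIP;
    }

    // Test helper: builds an in-memory zip with the given entry names.
    static byte[] zipWith(String... entryNames) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ZipOutputStream zip = new ZipOutputStream(bytes)) {
            for (String name : entryNames) {
                zip.putNextEntry(new ZipEntry(name));
                zip.write("<x/>".getBytes("UTF-8"));
                zip.closeEntry();
            }
        } catch (IOException ex) {
            throw new UncheckedIOException(ex);
        }
        return bytes.toByteArray();
    }
}
```

With an archive that contains an "xl/workbook.xml" entry, detectByMagic only
reports application/zip, while detectWithName (given a name ending in .xlsx)
and detectContainerAware both report the spreadsheet type. That mirrors the
behaviour described in this issue.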
> xlsx files created with open office are detected as application/zip
> -------------------------------------------------------------------
>
> Key: TIKA-484
> URL: https://issues.apache.org/jira/browse/TIKA-484
> Project: Tika
> Issue Type: Bug
> Components: parser
> Affects Versions: 0.7
> Environment: Ubuntu
> Reporter: Victor Kazakov
> Priority: Minor
> Attachments: openofficexlsxfile.xlsx
>
>
> Create an xlsx file in open office.
> Parse the file using a org.apache.tika.parser.AutoDetectParser
> It gets recognized as a zip file.
> Note: I have only tried this with open office running on ubuntu.