[ https://issues.apache.org/jira/browse/TIKA-2311?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15967876#comment-15967876 ]
Tim Allison commented on TIKA-2311:
-----------------------------------

Ha, Nifi overrides our def and just calls it {{.tar}}: [here|https://github.com/apache/nifi/blob/master/nifi-nar-bundles/nifi-standard-bundle/nifi-standard-processors/src/main/resources/org/apache/tika/mime/custom-mimetypes.xml#L86]

Should we do the same?

> Create x-tika-ooxml-unk mime type (?)
> -------------------------------------
>
>                 Key: TIKA-2311
>                 URL: https://issues.apache.org/jira/browse/TIKA-2311
>             Project: Tika
>          Issue Type: Bug
>            Reporter: Tim Allison
>
> The following is an unintended consequence of TIKA-2212.
> The OOXML parser used to handle {{x-tika-ooxml}}. We have some truncated
> ooxml files in our regression corpus. The previous behavior was:
> 1) ZipPackage detector caught the zip truncation exception and returned
> "application/zip"
> 2) The mime detector recognized magic and returned {{x-tika-ooxml}}
> 3) The file was then routed to the OOXML parser which didn't wind up doing
> much with the content because it hit the zip exception early on, but the
> final mime type was {{x-tika-ooxml}}.
> The current behavior:
> 1) Same detection steps
> 2) However, because the OOXML parser no longer handles {{x-tika-ooxml}}, the
> file is handled by the Package Parser, which overwrites the magic-determined
> mime type, and the new mime type is {{application/zip}}.
> 3) Some content is extracted because the Package parser handles the zip
> entries in order and only throws the exception once it hits the last entry in
> the zip file.
> Ideally, I'd like to keep the magic-determined mime detection. Once we can
> chain parsers, the user should be able to back off to the PackageParser, but I
> don't think this should be the default behavior.
> One solution would be to create a new mime type that is not the parent of the
> other ooxml subtypes, but is itself a leaf subtype, something like:
> {{x-tika-ooxml-unk}}.
> Any objections/other recommendations?
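
For anyone following the routing change described in the issue, here is a toy Java model of it. It is only a sketch and does not use Tika's classes: the class name, the two maps, and the `registrations` flag are all made up for illustration, and it assumes that a type with no registered parser falls back to the parser of its supertype (with {{x-tika-ooxml}} treated as a subtype of {{application/zip}}). It models only which parser gets the file, not the later overwrite of the mime type.

```java
import java.util.HashMap;
import java.util.Map;

// Toy model only: these maps stand in for a media type registry and for
// parser registrations. None of the names below are Tika's API.
final class ParserRoutingSketch {
    static final String OOXML = "application/x-tika-ooxml";
    static final String ZIP = "application/zip";

    // Child type -> parent type (assumption: x-tika-ooxml is a kind of zip).
    static final Map<String, String> SUPERTYPE = new HashMap<>();
    static {
        SUPERTYPE.put(OOXML, ZIP);
    }

    // Walk up the supertype chain until some parser claims the type.
    static String route(String type, Map<String, String> parsers) {
        for (String t = type; t != null; t = SUPERTYPE.get(t)) {
            String parser = parsers.get(t);
            if (parser != null) {
                return parser;
            }
        }
        return "none";
    }

    // Build the hypothetical registrations for the "previous" (true) and
    // "current" (false) behavior described in the issue.
    static Map<String, String> registrations(boolean ooxmlHandlesGenericType) {
        Map<String, String> parsers = new HashMap<>();
        parsers.put(ZIP, "PackageParser");
        if (ooxmlHandlesGenericType) {
            parsers.put(OOXML, "OOXMLParser");
        }
        return parsers;
    }

    public static void main(String[] args) {
        // Previous behavior: the OOXML parser claims x-tika-ooxml directly.
        System.out.println(route(OOXML, registrations(true)));
        // Current behavior: nobody claims it, so the zip parser picks it up.
        System.out.println(route(OOXML, registrations(false)));
    }
}
```

In this model the first call prints `OOXMLParser` and the second prints `PackageParser`, which mirrors the before/after routing in the description.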