[
https://issues.apache.org/jira/browse/TIKA-4464?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=18013718#comment-18013718
]
Tim Allison commented on TIKA-4464:
-----------------------------------
As a simple, if imprecise, first step, we could fall back to trusting the file extension.
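A minimal sketch of what that fallback might look like, written in plain Java with no Tika dependency. Everything here is an assumption for illustration: the class name `IWorkExtensionFallback`, the `refine` method, and the three target type strings, which are guessed by analogy with the "application/vnd.apple.unknown.13" value from the report and would need to be checked against the types Tika actually registers.

```java
import java.util.Locale;
import java.util.Map;

// Hypothetical helper for illustration only; not part of Tika.
final class IWorkExtensionFallback {

    // Type reported in the issue when the main entry cannot be classified.
    static final String UNKNOWN_13 = "application/vnd.apple.unknown.13";

    // Assumed extension-to-type mapping; the type strings are guesses
    // modelled on the "unknown.13" name and must be verified.
    private static final Map<String, String> BY_EXTENSION = Map.of(
            "pages", "application/vnd.apple.pages.13",
            "numbers", "application/vnd.apple.numbers.13",
            "key", "application/vnd.apple.keynote.13");

    private IWorkExtensionFallback() {
    }

    // Returns a more specific type only when detection gave up with
    // UNKNOWN_13 and the file name carries a known iWork extension.
    // In every other case the detected type is returned unchanged.
    static String refine(String detectedType, String fileName) {
        if (!UNKNOWN_13.equals(detectedType) || fileName == null) {
            return detectedType;
        }
        int dot = fileName.lastIndexOf('.');
        if (dot < 0 || dot == fileName.length() - 1) {
            return detectedType; // no extension to trust
        }
        String ext = fileName.substring(dot + 1).toLowerCase(Locale.ROOT);
        return BY_EXTENSION.getOrDefault(ext, detectedType);
    }
}
```

The extension is consulted only after content-based detection has already returned the unknown type, so a misnamed file of some other format is never overridden. It stays imprecise in the other direction: a renamed iWork file would still get the wrong type.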
> Parsing IWork files results in unknown mimetype
> -----------------------------------------------
>
> Key: TIKA-4464
> URL: https://issues.apache.org/jira/browse/TIKA-4464
> Project: Tika
> Issue Type: Bug
> Components: detector, parser
> Affects Versions: 3.2.1
> Reporter: Gregor Lang
> Priority: Minor
> Attachments: sample-2.pages, sample.key, sample.numbers, sample.pages
>
>
> When parsing *.pages or *.numbers files, the resulting MIME type is always
> "application/vnd.apple.unknown.13"
>
> There seems to be a TODO in *IWork13PackageParser* at line 319, which is
> probably related.
> {code:java}
> // Is it the main document?
> if (name.equals(IWORK13_MAIN_ENTRY)) {
>     // TODO Decode the snappy stream, and check for the Message Type
>     //  = 2 (TN::SheetArchive), it is a numbers file;
>     //  = 10000 (TP::DocumentArchive), that's a pages file
>     return null;
> }
> {code}