[ https://issues.apache.org/jira/browse/TIKA-3634?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17472053#comment-17472053 ]
Tim Allison commented on TIKA-3634:
-----------------------------------

Thank you for submitting the bug and sharing triggering files.

A couple of items unrelated to the problem:
* AppleSingleFileParser does not handle iWork files. It is for a completely unrelated file format: [https://en.wikipedia.org/wiki/AppleSingle_and_AppleDouble_formats]
* You shouldn't need to add tika-parser-zip-commons or tika-parser-apple-module. These should be included in tika-parsers-standard-package. If they're not, that's a serious problem. Please open a different ticket.

I regret that I'm still not clear on what we need to fix.

With Tika 1.28, I get {{application/vnd.apple.unknown.13}} for the *.numbers file and *.pages file; I get {{application/vnd.apple.keynote.13}} for the .key file. No attachments or text are extracted from any of those.

With Tika 2.2.1, I get {{application/vnd.apple.unknown.13}} for all three (*.pages, *.key, *.numbers files), but then the PackageParser parses all embedded files that Tika supports.

What is the desired behavior?

> Failed to Parser Apple related files
> ------------------------------------
>
>                 Key: TIKA-3634
>                 URL: https://issues.apache.org/jira/browse/TIKA-3634
>             Project: Tika
>          Issue Type: Bug
>          Components: parser
>    Affects Versions: 2.2.1
>            Reporter: Tika User
>            Assignee: Tim Allison
>            Priority: Blocker
>         Attachments: brochure.pages, keynotecreated.key, mortgagecalculator.numbers
>
>
> Unable to parse '.numbers', '.key', and '.pages' files using the class below in the XML file (org.apache.tika.parser.apple.AppleSingleFileParser).
> Getting unknown mimetype: application/vnd.apple.unknown.13
> Using all these modules:
> tika-core,tika-parsers-standard-package,tika-parser-microsoft-module,tika-parser-sqlite3-package,tika-parser-scientific-module,tika-parser-zip-commons,tika-parser-apple-module

--
This message was sent by Atlassian Jira
(v8.20.1#820001)