[jira] [Commented] (TIKA-3634) Failed to Parser Apple related files
[ https://issues.apache.org/jira/browse/TIKA-3634?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17475454#comment-17475454 ]

Tim Allison commented on TIKA-3634:
-----------------------------------

That was a bad commit message. Mea culpa. That was for TIKA-3642.

> Failed to Parser Apple related files
> ------------------------------------
>
>                 Key: TIKA-3634
>                 URL: https://issues.apache.org/jira/browse/TIKA-3634
>             Project: Tika
>          Issue Type: Bug
>          Components: parser
>    Affects Versions: 2.2.1
>            Reporter: Tika User
>            Assignee: Tim Allison
>            Priority: Major
>         Attachments: brochure.pages, keynotecreated.key, mortgagecalculator.numbers
>
> Unable to parse '.numbers', '.key', and '.pages' files using the class below in the XML file (org.apache.tika.parser.apple.AppleSingleFileParser).
> Getting unknown MIME type: application/vnd.apple.unknown.13
> Using all these modules:
> tika-core,tika-parsers-standard-package,tika-parser-microsoft-module,tika-parser-sqlite3-package,tika-parser-scientific-module,tika-parser-zip-commons,tika-parser-apple-module

--
This message was sent by Atlassian Jira
(v8.20.1#820001)
[jira] [Commented] (TIKA-3634) Failed to Parser Apple related files
[ https://issues.apache.org/jira/browse/TIKA-3634?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17475393#comment-17475393 ]

Hudson commented on TIKA-3634:
------------------------------

SUCCESS: Integrated in Jenkins build Tika » tika-main-jdk8 #415 (See [https://ci-builds.apache.org/job/Tika/job/tika-main-jdk8/415/])
TIKA-3634 -- improve detection of iworks 13 files and extraction of thumbnails and attachments (tallison: [https://github.com/apache/tika/commit/0a8da94c5fc49e706a77245da323088115cc22c9])
* (edit) CHANGES.txt
* (edit) tika-parsers/tika-parsers-standard/tika-parsers-standard-modules/tika-parser-pdf-module/src/main/java/org/apache/tika/parser/pdf/PDFParserConfig.java
[jira] [Commented] (TIKA-3634) Failed to Parser Apple related files
[ https://issues.apache.org/jira/browse/TIKA-3634?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17472243#comment-17472243 ]

Tim Allison commented on TIKA-3634:
-----------------------------------

pdfbox is not in core; it is brought in via tika-parsers-standard-package. You'd have to build your own tika-parsers-standard-package to downgrade PDFBox. If there's a regression, you should open a ticket with PDFBox, obviously. What breaks for you with the latest PDFBox?

Yes, this will be out with the next release. I have no idea when that will be.
[jira] [Commented] (TIKA-3634) Failed to Parser Apple related files
[ https://issues.apache.org/jira/browse/TIKA-3634?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17472237#comment-17472237 ]

Hudson commented on TIKA-3634:
------------------------------

UNSTABLE: Integrated in Jenkins build Tika » tika-main-jdk8 #412 (See [https://ci-builds.apache.org/job/Tika/job/tika-main-jdk8/412/])
TIKA-3634 -- improve detection of iworks 13 files and extraction of thumbnails and attachments (tallison: [https://github.com/apache/tika/commit/7e4edfce6722cbf0be0deab9b9e3f23073406dd4])
* (edit) CHANGES.txt
* (edit) tika-parsers/tika-parsers-standard/tika-parsers-standard-modules/tika-parser-apple-module/src/test/java/org/apache/tika/parser/iwork/iwana/IWork13ParserTest.java
* (edit) tika-parsers/tika-parsers-standard/tika-parsers-standard-modules/tika-parser-apple-module/src/main/java/org/apache/tika/parser/iwork/iwana/IWork13PackageParser.java
[jira] [Commented] (TIKA-3634) Failed to Parser Apple related files
[ https://issues.apache.org/jira/browse/TIKA-3634?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17472195#comment-17472195 ]

Tika User commented on TIKA-3634:
---------------------------------

Will the fix be available in the next release? For now I have handled this in our code: I set the MIME type based on the file extension.

Also, thanks for letting me know about the zip module; I will have a look at that.

Also, can you confirm that the apache.pdf tool is included in core? Is there a way to downgrade it instead of using the latest version?
[jira] [Commented] (TIKA-3634) Failed to Parser Apple related files
[ https://issues.apache.org/jira/browse/TIKA-3634?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17472189#comment-17472189 ]

Tim Allison commented on TIKA-3634:
-----------------------------------

I pushed a fix that takes into account the file names for disambiguation between numbers and pages if the user sends in the file name. I've also added better parsing for the thumbnail image file, and I now have the iworkspackage parser handle metadata plist files and any file that doesn't end in .iwa (for iworks 13 files).

Not sure if/when we should add in the iworks18 parser that does minimal detection for those files.

Note that none of the above actually extracts content from the iworks13 and iworks 18 files. We still do not have a parser for those files. We also still have no way of disambiguating numbers and pages without file names (e.g. if the user sends in only a stream without a file name). :(
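
The name-based disambiguation described in this comment can be sketched roughly as follows. This is only an illustration, not the code from the commit: the class IWorkNameHint and its refine method are hypothetical names, and the pages.13 and numbers.13 type strings are assumed by analogy with application/vnd.apple.keynote.13, which is the only specific iWork 13 type quoted in this thread. The sketch keeps the generic type whenever no usable file name is supplied, which mirrors the limitation mentioned above for streams sent without a name.

```java
import java.util.Locale;
import java.util.Map;

/**
 * Hypothetical helper, not part of Tika: refines the generic iWork 13 type
 * using the file extension when the caller supplies a file name.
 */
final class IWorkNameHint {

    static final String UNKNOWN_13 = "application/vnd.apple.unknown.13";

    // keynote.13 is quoted in the thread; pages.13 and numbers.13 are
    // assumptions made by analogy with it.
    private static final Map<String, String> BY_EXTENSION = Map.of(
            "key", "application/vnd.apple.keynote.13",
            "pages", "application/vnd.apple.pages.13",
            "numbers", "application/vnd.apple.numbers.13");

    /** Returns a more specific type if the name allows it, else the input type. */
    static String refine(String detectedType, String fileName) {
        // Only the generic iWork 13 result is refined, and only with a name.
        if (!UNKNOWN_13.equals(detectedType) || fileName == null) {
            return detectedType;
        }
        int dot = fileName.lastIndexOf('.');
        if (dot < 0 || dot == fileName.length() - 1) {
            return detectedType; // no extension to work with
        }
        String extension = fileName.substring(dot + 1).toLowerCase(Locale.ROOT);
        return BY_EXTENSION.getOrDefault(extension, detectedType);
    }

    public static void main(String[] args) {
        System.out.println(refine(UNKNOWN_13, "brochure.pages"));
        System.out.println(refine(UNKNOWN_13, null));
    }
}
```

A caller holding only a stream would pass null for the name and get the generic type back unchanged. The same shape would also fit an extension-based workaround applied on the client side after detection.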
[jira] [Commented] (TIKA-3634) Failed to Parser Apple related files
[ https://issues.apache.org/jira/browse/TIKA-3634?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17472053#comment-17472053 ]

Tim Allison commented on TIKA-3634:
-----------------------------------

Thank you for submitting the bug and sharing triggering files.

A couple of items unrelated to the problem:
* AppleSingleFileParser does not handle iworks files. That is for a completely unrelated file format: [https://en.wikipedia.org/wiki/AppleSingle_and_AppleDouble_formats]
* You shouldn't need to add: tika-parser-zip-commons,tika-parser-apple-module. These should be included in tika-parsers-standard-package. If they're not, that's a serious problem. Please open a different ticket.

I regret I'm still not clear on what we need to fix.

With Tika 1.28, I get {{application/vnd.apple.unknown.13}} for the *.numbers file and *.pages file; I get {{application/vnd.apple.keynote.13}} for the .key file. No attachments or text are extracted from any of those.

With Tika 2.2.1, I get {{application/vnd.apple.unknown.13}} for all three (*.pages, *.key, *.numbers files), but then the packageparser parses all embedded files that Tika supports.

What is the desired behavior?

> Failed to Parser Apple related files
> ------------------------------------
>
>                 Key: TIKA-3634
>                 URL: https://issues.apache.org/jira/browse/TIKA-3634
>             Project: Tika
>          Issue Type: Bug
>          Components: parser
>    Affects Versions: 2.2.1
>            Reporter: Tika User
>            Assignee: Tim Allison
>            Priority: Blocker
>         Attachments: brochure.pages, keynotecreated.key, mortgagecalculator.numbers
[jira] [Commented] (TIKA-3634) Failed to Parser Apple related files
[ https://issues.apache.org/jira/browse/TIKA-3634?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17466374#comment-17466374 ]

Tika User commented on TIKA-3634:
---------------------------------

With version 2.0.0, the MIME type of the .key file is correct, but attachments fail to extract. With 2.1.1, all the attachments are extracted, but the MIME type is reported as unknown (application/vnd.apple.unknown.13).

> Failed to Parser Apple related files
> ------------------------------------
>
>                 Key: TIKA-3634
>                 URL: https://issues.apache.org/jira/browse/TIKA-3634
>             Project: Tika
>          Issue Type: Bug
>          Components: parser
>    Affects Versions: 2.2.1
>            Reporter: Tika User
>            Priority: Blocker
>         Attachments: brochure.pages, keynotecreated.key, mortgagecalculator.numbers
[jira] [Commented] (TIKA-3634) Failed to Parser Apple related files
[ https://issues.apache.org/jira/browse/TIKA-3634?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17466098#comment-17466098 ]

Tilman Hausherr commented on TIKA-3634:
---------------------------------------

Only the {{keynotecreated.key}} file is detected differently in 2.0.0. This is related to a small change in {{IWork13PackageParser.java}} done in TIKA-3517. There is a TODO there which I haven't understood. Maybe related to the format of {{Document.iwa}}.
[jira] [Commented] (TIKA-3634) Failed to Parser Apple related files
[ https://issues.apache.org/jira/browse/TIKA-3634?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17466053#comment-17466053 ]

Tika User commented on TIKA-3634:
---------------------------------

FYI, this works fine in version 2.0.0. I am able to get the correct file type, but text is unavailable because iWork parsing is not implemented (I saw this comment in one of the issues).
[jira] [Commented] (TIKA-3634) Failed to Parser Apple related files
[ https://issues.apache.org/jira/browse/TIKA-3634?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17465835#comment-17465835 ]

Tilman Hausherr commented on TIKA-3634:
---------------------------------------

I have no idea; at this time no dev has picked up the problem. You submitted this 8 hours ago. This is a volunteer project, and it's winter holiday season.
[jira] [Commented] (TIKA-3634) Failed to Parser Apple related files
[ https://issues.apache.org/jira/browse/TIKA-3634?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17465723#comment-17465723 ]

Tilman Hausherr commented on TIKA-3634:
---------------------------------------

Thanks for uploading the files. Please don't set the "fix" version; this is the future version in which a Tika dev expects the issue to be fixed, or in which it has been fixed. It is not the version where it worked. (Is 1.27 the last version where it worked? If yes, then this is very valuable information.)
[jira] [Commented] (TIKA-3634) Failed to Parser Apple related files
[ https://issues.apache.org/jira/browse/TIKA-3634?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17465677#comment-17465677 ]

Tilman Hausherr commented on TIKA-3634:
---------------------------------------

Please don't ping people unless needed; we watch the dev list for new issues, and some dev will or will not work on your issue if possible. Please attach a file that fails.

> Failed to Parser Apple related files
> ------------------------------------
>
>                 Key: TIKA-3634
>                 URL: https://issues.apache.org/jira/browse/TIKA-3634
>             Project: Tika
>          Issue Type: Bug
>          Components: parser
>    Affects Versions: 2.2.1
>            Reporter: Tika User
>            Priority: Blocker
>             Fix For: 2.2.0
[jira] [Commented] (TIKA-3634) Failed to Parser Apple related files
[ https://issues.apache.org/jira/browse/TIKA-3634?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17465675#comment-17465675 ]

Tika User commented on TIKA-3634:
---------------------------------

[~tallison]