[jira] [Commented] (TIKA-3634) Failed to Parser Apple related files

2022-01-13 Thread Tim Allison (Jira)


[ 
https://issues.apache.org/jira/browse/TIKA-3634?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17475454#comment-17475454
 ] 

Tim Allison commented on TIKA-3634:
---

That was a bad commit message.  Mea culpa.  That commit was for TIKA-3642.

> Failed to Parser Apple related files
> 
>
> Key: TIKA-3634
> URL: https://issues.apache.org/jira/browse/TIKA-3634
> Project: Tika
> Issue Type: Bug
> Components: parser
> Affects Versions: 2.2.1
> Reporter: Tika User
> Assignee: Tim Allison
> Priority: Major
> Attachments: brochure.pages, keynotecreated.key, mortgagecalculator.numbers
>
>
> Unable to parse '.numbers', '.key', and '.pages' files using the class below in the XML 
> file (org.apache.tika.parser.apple.AppleSingleFileParser).
> Getting unknown MIME type: application/vnd.apple.unknown.13
> Using all these modules:
> tika-core, tika-parsers-standard-package, tika-parser-microsoft-module, tika-parser-sqlite3-package, tika-parser-scientific-module, tika-parser-zip-commons, tika-parser-apple-module



--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Commented] (TIKA-3634) Failed to Parser Apple related files

2022-01-13 Thread Hudson (Jira)


[ 
https://issues.apache.org/jira/browse/TIKA-3634?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17475393#comment-17475393
 ] 

Hudson commented on TIKA-3634:
--

SUCCESS: Integrated in Jenkins build Tika » tika-main-jdk8 #415 (See 
[https://ci-builds.apache.org/job/Tika/job/tika-main-jdk8/415/])
TIKA-3634 -- improve detection of iworks 13 files and extraction of thumbnails 
and attachments (tallison: 
[https://github.com/apache/tika/commit/0a8da94c5fc49e706a77245da323088115cc22c9])
* (edit) CHANGES.txt
* (edit) 
tika-parsers/tika-parsers-standard/tika-parsers-standard-modules/tika-parser-pdf-module/src/main/java/org/apache/tika/parser/pdf/PDFParserConfig.java




[jira] [Commented] (TIKA-3634) Failed to Parser Apple related files

2022-01-10 Thread Tim Allison (Jira)


[ 
https://issues.apache.org/jira/browse/TIKA-3634?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17472243#comment-17472243
 ] 

Tim Allison commented on TIKA-3634:
---

PDFBox is not in core; it is brought in via tika-parsers-standard-package.
You'd have to build your own tika-parsers-standard-package to downgrade PDFBox.
If there's a regression, you should open a ticket with PDFBox, obviously.
What breaks for you with the latest PDFBox?

 

Yes, this will be out with the next release.  I have no idea when that will be.



[jira] [Commented] (TIKA-3634) Failed to Parser Apple related files

2022-01-10 Thread Hudson (Jira)


[ 
https://issues.apache.org/jira/browse/TIKA-3634?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17472237#comment-17472237
 ] 

Hudson commented on TIKA-3634:
--

UNSTABLE: Integrated in Jenkins build Tika » tika-main-jdk8 #412 (See 
[https://ci-builds.apache.org/job/Tika/job/tika-main-jdk8/412/])
TIKA-3634 -- improve detection of iworks 13 files and extraction of thumbnails 
and attachments (tallison: 
[https://github.com/apache/tika/commit/7e4edfce6722cbf0be0deab9b9e3f23073406dd4])
* (edit) CHANGES.txt
* (edit) 
tika-parsers/tika-parsers-standard/tika-parsers-standard-modules/tika-parser-apple-module/src/test/java/org/apache/tika/parser/iwork/iwana/IWork13ParserTest.java
* (edit) 
tika-parsers/tika-parsers-standard/tika-parsers-standard-modules/tika-parser-apple-module/src/main/java/org/apache/tika/parser/iwork/iwana/IWork13PackageParser.java




[jira] [Commented] (TIKA-3634) Failed to Parser Apple related files

2022-01-10 Thread Tika User (Jira)


[ 
https://issues.apache.org/jira/browse/TIKA-3634?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17472195#comment-17472195
 ] 

Tika User commented on TIKA-3634:
-

Will the fix be available in the next release? For now I have handled this in 
our code by setting the MIME type based on the file extension. Thanks also for 
letting me know about the zip module; I will have a look at that. Also, can you 
confirm whether Apache PDFBox is included in core? Is there a way to downgrade 
it rather than use the latest version?



[jira] [Commented] (TIKA-3634) Failed to Parser Apple related files

2022-01-10 Thread Tim Allison (Jira)


[ 
https://issues.apache.org/jira/browse/TIKA-3634?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17472189#comment-17472189
 ] 

Tim Allison commented on TIKA-3634:
---

I pushed a fix that takes into account the file names for disambiguation 
between numbers and pages if the user sends in the file name.  I've also added 
better parsing for the thumbnail image file, and I now have the iworkspackage 
parser handle metadata plist files and any file that doesn't end in .iwa (for 
iworks 13 files).

 

Not sure if/when we should add in the iworks 18 parser that does minimal 
detection for those files.

 

Note that none of the above actually extracts content from the iworks 13 and 
iworks 18 files.  We still do not have a parser for those files.  We also still 
have no way of disambiguating numbers and pages without file names (e.g. if the 
user sends in only a stream without a file name). :(
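
As a rough illustration of the file-name-based disambiguation described above, here is a minimal sketch in plain Java of how a caller could refine the generic type from the file extension. It is not Tika's implementation: the class `IWorkMimeFallback` and the method `refine` are hypothetical, and the `pages.13` and `numbers.13` type names are assumed by analogy with the `keynote.13` name that appears in this thread.

```java
import java.util.Locale;

// Hypothetical helper, not part of Tika: refines the generic iWork 13 type
// using the file extension when a file name is available.
class IWorkMimeFallback {
    static final String UNKNOWN_13 = "application/vnd.apple.unknown.13";

    static String refine(String detected, String fileName) {
        // Only touch the generic "unknown" type, and only if we have a name.
        if (!UNKNOWN_13.equals(detected) || fileName == null) {
            return detected;
        }
        int dot = fileName.lastIndexOf('.');
        if (dot < 0 || dot == fileName.length() - 1) {
            return detected; // no usable extension
        }
        String ext = fileName.substring(dot + 1).toLowerCase(Locale.ROOT);
        switch (ext) {
            case "key":
                return "application/vnd.apple.keynote.13";
            case "pages":
                return "application/vnd.apple.pages.13";   // assumed name
            case "numbers":
                return "application/vnd.apple.numbers.13"; // assumed name
            default:
                return detected;
        }
    }

    public static void main(String[] args) {
        System.out.println(refine(UNKNOWN_13, "brochure.pages"));
        System.out.println(refine(UNKNOWN_13, null));
    }
}
```

Without a file name the sketch returns the detected type unchanged, which mirrors the limitation noted above for streams sent in without a name.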



[jira] [Commented] (TIKA-3634) Failed to Parser Apple related files

2022-01-10 Thread Tim Allison (Jira)


[ 
https://issues.apache.org/jira/browse/TIKA-3634?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17472053#comment-17472053
 ] 

Tim Allison commented on TIKA-3634:
---

Thank you for submitting the bug and sharing triggering files.

A couple of items unrelated to the problem:
 * AppleSingleFileParser does not handle iworks files.  That is for a 
completely unrelated file format: 
[https://en.wikipedia.org/wiki/AppleSingle_and_AppleDouble_formats]
 * You shouldn't need to add: tika-parser-zip-commons,tika-parser-apple-module. 
 These should be included in tika-parsers-standard-package.  If they're not, 
that's a serious problem.  Please open a different ticket.
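
To make the first point concrete, here is a minimal sketch in plain Java that tells an AppleSingle/AppleDouble file apart from a ZIP-based package by its first four bytes. It assumes the attached iWork files are ZIP containers, as their handling by the package parser suggests; the class `ContainerSniffer` and the method `sniff` are hypothetical names, and this is not how Tika itself performs detection.

```java
import java.nio.ByteBuffer;

// Hypothetical helper, not part of Tika: classifies a file by its magic number.
class ContainerSniffer {
    static final int APPLE_SINGLE_MAGIC = 0x00051600;
    static final int APPLE_DOUBLE_MAGIC = 0x00051607;
    static final int ZIP_LOCAL_HEADER_MAGIC = 0x504B0304; // "PK\003\004"

    static String sniff(byte[] header) {
        if (header == null || header.length < 4) {
            return "unknown";
        }
        // ByteBuffer reads big-endian by default, matching the on-disk order.
        int magic = ByteBuffer.wrap(header, 0, 4).getInt();
        switch (magic) {
            case APPLE_SINGLE_MAGIC:
                return "applesingle";
            case APPLE_DOUBLE_MAGIC:
                return "appledouble";
            case ZIP_LOCAL_HEADER_MAGIC:
                return "zip";
            default:
                return "unknown";
        }
    }

    public static void main(String[] args) {
        System.out.println(sniff(new byte[] {0x50, 0x4B, 0x03, 0x04}));
    }
}
```

A file that starts with the ZIP signature is not an AppleSingle file, so a parser for the AppleSingle format would not be expected to handle it.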

I regret I'm still not clear on what we need to fix.

With Tika 1.28, I get {{application/vnd.apple.unknown.13}} for the *.numbers 
file and *.pages file; I get {{application/vnd.apple.keynote.13}} for the .key 
file.  No attachments or text are extracted from any of those.

 

With Tika 2.2.1, I get {{application/vnd.apple.unknown.13}} for all three files 
(*.pages, *.key, *.numbers), but then the packageparser parses all embedded 
files that Tika supports.

 

What is the desired behavior?

> Failed to Parser Apple related files
> 
>
> Key: TIKA-3634
> URL: https://issues.apache.org/jira/browse/TIKA-3634
> Project: Tika
> Issue Type: Bug
> Components: parser
> Affects Versions: 2.2.1
> Reporter: Tika User
> Assignee: Tim Allison
> Priority: Blocker
> Attachments: brochure.pages, keynotecreated.key, mortgagecalculator.numbers
>
>
> Unable to parse '.numbers', '.key', and '.pages' files using the class below in the XML 
> file (org.apache.tika.parser.apple.AppleSingleFileParser).
> Getting unknown MIME type: application/vnd.apple.unknown.13
> Using all these modules:
> tika-core, tika-parsers-standard-package, tika-parser-microsoft-module, tika-parser-sqlite3-package, tika-parser-scientific-module, tika-parser-zip-commons, tika-parser-apple-module





[jira] [Commented] (TIKA-3634) Failed to Parser Apple related files

2021-12-29 Thread Tika User (Jira)


[ 
https://issues.apache.org/jira/browse/TIKA-3634?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17466374#comment-17466374
 ] 

Tika User commented on TIKA-3634:
-

In version 2.0.0, the MIME type for .key files is correct, but attachments are 
not extracted. In 2.1.1, all attachments are extracted, but the MIME type is 
reported as unknown (application/vnd.apple.unknown.13).

> Failed to Parser Apple related files
> 
>
> Key: TIKA-3634
> URL: https://issues.apache.org/jira/browse/TIKA-3634
> Project: Tika
> Issue Type: Bug
> Components: parser
> Affects Versions: 2.2.1
> Reporter: Tika User
> Priority: Blocker
> Attachments: brochure.pages, keynotecreated.key, mortgagecalculator.numbers
>
>
> Unable to parse '.numbers', '.key', and '.pages' files using the class below in the XML 
> file (org.apache.tika.parser.apple.AppleSingleFileParser).
> Getting unknown MIME type: application/vnd.apple.unknown.13
> Using all these modules:
> tika-core, tika-parsers-standard-package, tika-parser-microsoft-module, tika-parser-sqlite3-package, tika-parser-scientific-module, tika-parser-zip-commons, tika-parser-apple-module





[jira] [Commented] (TIKA-3634) Failed to Parser Apple related files

2021-12-28 Thread Tilman Hausherr (Jira)


[ 
https://issues.apache.org/jira/browse/TIKA-3634?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17466098#comment-17466098
 ] 

Tilman Hausherr commented on TIKA-3634:
---

Only the {{keynotecreated.key}} file is detected differently in 2.0.0. This is 
related to a small change in {{IWork13PackageParser.java}} done in TIKA-3517. 
There is a TODO there which I haven't understood. Maybe related to the format 
of {{Document.iwa}}.



[jira] [Commented] (TIKA-3634) Failed to Parser Apple related files

2021-12-28 Thread Tika User (Jira)


[ 
https://issues.apache.org/jira/browse/TIKA-3634?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17466053#comment-17466053
 ] 

Tika User commented on TIKA-3634:
-

FYI: this works fine in version 2.0.0. I am able to get the correct file type, 
but text is unavailable because iWork parsing is not implemented (I saw this 
noted in a comment on another issue).



[jira] [Commented] (TIKA-3634) Failed to Parser Apple related files

2021-12-27 Thread Tilman Hausherr (Jira)


[ 
https://issues.apache.org/jira/browse/TIKA-3634?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17465835#comment-17465835
 ] 

Tilman Hausherr commented on TIKA-3634:
---

I have no idea; at this time no dev has picked up the problem. You submitted 
this 8 hours ago. This is a volunteer project, and it's winter holiday season.



[jira] [Commented] (TIKA-3634) Failed to Parser Apple related files

2021-12-27 Thread Tilman Hausherr (Jira)


[ 
https://issues.apache.org/jira/browse/TIKA-3634?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17465723#comment-17465723
 ] 

Tilman Hausherr commented on TIKA-3634:
---

Thanks for uploading the files. Please don't set the "fix" version; this is the 
future version in which a Tika dev expects the issue to be fixed, or in which it 
has been fixed. It is not the version where it worked. (Is 1.27 the last version 
where it worked? If yes, then this is very valuable information.)



[jira] [Commented] (TIKA-3634) Failed to Parser Apple related files

2021-12-27 Thread Tilman Hausherr (Jira)


[ 
https://issues.apache.org/jira/browse/TIKA-3634?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17465677#comment-17465677
 ] 

Tilman Hausherr commented on TIKA-3634:
---

Please don't ping people unless needed; we watch the dev list for new issues, 
and some dev will or will not work on your issue if possible.

Please attach a file that fails.

> Failed to Parser Apple related files
> 
>
> Key: TIKA-3634
> URL: https://issues.apache.org/jira/browse/TIKA-3634
> Project: Tika
> Issue Type: Bug
> Components: parser
> Affects Versions: 2.2.1
> Reporter: Tika User
> Priority: Blocker
> Fix For: 2.2.0
>
>
> Unable to parse '.numbers', '.key', and '.pages' files using the class below in the XML 
> file (org.apache.tika.parser.apple.AppleSingleFileParser).
> Getting unknown MIME type: application/vnd.apple.unknown.13
> Using all these modules:
> tika-core, tika-parsers-standard-package, tika-parser-microsoft-module, tika-parser-sqlite3-package, tika-parser-scientific-module, tika-parser-zip-commons, tika-parser-apple-module





[jira] [Commented] (TIKA-3634) Failed to Parser Apple related files

2021-12-27 Thread Tika User (Jira)


[ 
https://issues.apache.org/jira/browse/TIKA-3634?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17465675#comment-17465675
 ] 

Tika User commented on TIKA-3634:
-

[~tallison] 
