[ 
https://issues.apache.org/jira/browse/TIKA-1285?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14943339#comment-14943339
 ] 

Tim Allison edited comment on TIKA-1285 at 10/5/15 1:14 PM:
------------------------------------------------------------

Thank you, [~b...@benmccann.com]!  The more eyes we have on this the better for 
both projects.

An updated working wrapper is available 
[here|https://github.com/tballison/tika/tree/pdfbox2_0].  Some cleanup 
remains...

[~arkadyzalko] and [~jayesh_ag], would you be willing to run this on your 
batches of docs and let us know what you find?  Extra points if you can compare 
memory usage and time to parse vs. PDFBox 1.8.10! :)

Also extra points for running this with the extract embedded images parameter 
turned on.


was (Author: talli...@mitre.org):
Thank you, [~b...@benmccann.com]!  The more eyes we have on this the better for 
both projects.

Updated working wrapper is available 
[here|https://github.com/tballison/tika/tree/pdfbox2_0].  Some clean up 
remains...

[~arkadyzalko], would you be willing to run this on your batch of docs and let 
us know what you find?

> Upgrade to PDFBox 2.0.0 when available
> --------------------------------------
>
>                 Key: TIKA-1285
>                 URL: https://issues.apache.org/jira/browse/TIKA-1285
>             Project: Tika
>          Issue Type: Improvement
>          Components: parser
>    Affects Versions: 1.6
>            Reporter: Jeremy Anderson
>            Priority: Minor
>         Attachments: TIKA-1285.patch, TIKA-1285_rev1641423.patch, 
> TIKA-1285v3.patch, pdfbox_reports_2_0_0_20150709.zip, 
> testPDF_childAttachments.pdf
>
>
> This issue is to track fixes required when upgrading the PDFBox dependency to 
> 2.0.0 Final once it's available, and using PDFBox's daily build before then.
> See TIKA-1268 comment.
> Relates to PDFBOX-1893



