[ 
https://issues.apache.org/jira/browse/TIKA-1575?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14371565#comment-14371565
 ] 

Tim Allison edited comment on TIKA-1575 at 3/20/15 4:33 PM:
------------------------------------------------------------

Hi Tilman,

  The .json files were the ones created during the multi-threaded batch run.  

  When I just reran the exact same tika-app.jars that I ran in batch mode, on 
the same OS, I got no difference between PDFBox 1.8.8 and 1.8.9 for 524276, 
as you found.  However, I'm still seeing 3 copies of the footer with 1.8.8 but 
only one copy with 1.8.9 for 719128.

  PDFBox app's ExtractText isn't pulling as much text (AcroForm data?) as 
Tika is, and I agree that, for 719128, there is no difference between the 
text that ExtractText extracts with 1.8.8 and with 1.8.9.



> Upgrade to PDFBox 1.8.9 when available
> --------------------------------------
>
>                 Key: TIKA-1575
>                 URL: https://issues.apache.org/jira/browse/TIKA-1575
>             Project: Tika
>          Issue Type: Improvement
>            Reporter: Tim Allison
>            Priority: Minor
>         Attachments: 005937.pdf.json, 005937_1_8_9-SNAPSHOT.pdf.json, 
> 10-814_Appendix B_v3.pdf, 524276_719128_diffs.zip, 
> PDFBox_1_8_8VPDFBox_1_8_9-SNAPSHOT.xlsx, 
> PDFBox_1_8_8VPDFBox_1_8_9-SNAPSHOT_reports.zip, 
> PDFBox_1_8_8Vs1_8_9_20150316.zip, content_diffs_20150316.xlsx
>
>
> The PDFBox community is about to release 1.8.9.  Let's use this issue to 
> track discussions before the release and to track Tika's upgrade to PDFBox 
> 1.8.9.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
