[ 
https://issues.apache.org/jira/browse/TIKA-1442?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14170919#comment-14170919
 ] 

Tim Allison commented on TIKA-1442:
-----------------------------------

dirname/filename.pdf would be great, but I'm happy to munge whatever you 
contribute.

Your recommended criteria are exactly what I was thinking.  There have been a 
handful of cases where I've been able to get good text via PDFBox but not via 
Adobe Reader.  Still, this is just a flag for triaging potential changes, not 
an absolute "remove file from corpus" rule.

> Upgrade to PDFBox 1.8.8
> -----------------------
>
>                 Key: TIKA-1442
>                 URL: https://issues.apache.org/jira/browse/TIKA-1442
>             Project: Tika
>          Issue Type: Improvement
>            Reporter: Tim Allison
>            Assignee: Tim Allison
>             Fix For: 1.7
>
>
> Given the regressions we identified in PDFBox 1.8.7, we should upgrade to 
> 1.8.8 as soon as it is ready.  I'm tempted to call this a blocker on Tika 
> 1.7.  Let's use this issue to carry on the discussion of regression testing 
> (if any further discussion is necessary) or any other prep that needs to 
> happen before 1.8.8's release.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
