[ https://issues.apache.org/jira/browse/TIKA-1442?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14173983#comment-14173983 ]
Tilman Hausherr commented on TIKA-1442:
---------------------------------------

After some more research, I was able to decode 5 more files (the cause was not the LZW filter, see ).

However, 7 other files are really corrupt; portions of the files are blank when shown in AR:

115/115269.pdf
211/211876.pdf
268/268346.pdf
389/389474.pdf
443/443752.pdf
698/698813.pdf
846/846759.pdf

> Upgrade to PDFBox 1.8.8
> -----------------------
>
>                 Key: TIKA-1442
>                 URL: https://issues.apache.org/jira/browse/TIKA-1442
>             Project: Tika
>          Issue Type: Improvement
>            Reporter: Tim Allison
>            Assignee: Tim Allison
>             Fix For: 1.7
>
>
> Given the regressions we identified in PDFBox 1.8.7, we should upgrade to
> 1.8.8 as soon as it is ready. I'm tempted to call this a blocker on Tika
> 1.7. Let's use this issue to carry on the discussion of regression testing
> (if any further discussion is necessary) or any other prep that needs to
> happen before 1.8.8's release.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)