Tim Allison created TIKA-1948:
---------------------------------

             Summary: Catch exceptions per page in PDFParser
                 Key: TIKA-1948
                 URL: https://issues.apache.org/jira/browse/TIKA-1948
             Project: Tika
          Issue Type: Improvement
            Reporter: Tim Allison
            Assignee: Tim Allison
            Priority: Minor


In a discussion with [~tilman] (I don't remember where), I think he observed 
that we weren't wrapping each page in its own try/catch.  If an exception is 
thrown on an early page, it might still be possible to extract text from later 
pages of a problematic PDF.

With very minimal modifications we could add a try/catch per page, store the 
caught exceptions, and then throw the first caught exception after the parse 
finishes.
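
A minimal sketch of that idea, not the actual PDFParser change. It assumes a 
hypothetical PageExtractor interface standing in for whatever extracts one 
page's text; the names PerPageParseSketch, Result, extractAllPages and 
throwFirstIfAny are made up for illustration and are not part of Tika or 
PDFBox.

```java
import java.util.ArrayList;
import java.util.List;

public class PerPageParseSketch {

    /** Hypothetical stand-in for the code that extracts text from one page. */
    interface PageExtractor {
        String extract(int pageIndex) throws Exception;
    }

    /** Text recovered so far, plus every exception caught along the way. */
    static class Result {
        final StringBuilder text = new StringBuilder();
        final List<Exception> exceptions = new ArrayList<>();
    }

    /** Tries every page; a failure on one page does not stop the later ones. */
    static Result extractAllPages(int pageCount, PageExtractor extractor) {
        Result result = new Result();
        for (int i = 0; i < pageCount; i++) {
            try {
                result.text.append(extractor.extract(i)).append('\n');
            } catch (Exception e) {
                // Store the exception and move on to the next page.
                result.exceptions.add(e);
            }
        }
        return result;
    }

    /** After the parse finishes, rethrow the first stored exception, if any. */
    static void throwFirstIfAny(Result result) throws Exception {
        if (!result.exceptions.isEmpty()) {
            throw result.exceptions.get(0);
        }
    }

    public static void main(String[] args) {
        // Simulated document: the first page is broken, the other two are fine.
        PageExtractor extractor = page -> {
            if (page == 0) {
                throw new IllegalStateException("bad page " + page);
            }
            return "text of page " + page;
        };

        Result result = extractAllPages(3, extractor);
        System.out.print(result.text);
        try {
            throwFirstIfAny(result);
        } catch (Exception e) {
            System.out.println("first exception: " + e.getMessage());
        }
    }
}
```

The caller still sees a failure for the problematic PDF, but only after the 
text from the readable pages has been collected.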



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
