[ 
https://issues.apache.org/jira/browse/TIKA-815?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13175403#comment-13175403
 ] 

Jerome Lacoste commented on TIKA-815:
-------------------------------------

> For people with strong stability requirements, we provide the ForkParser

To get the ForkParser to work, I've had to make 6 patches... And I haven't yet 
stress-tested it. That makes me wary of using it in production!

Please fix TIKA-808, TIKA-827 (optional), TIKA-828, TIKA-829, TIKA-830, 
TIKA-831 in that order.

0001-TIKA-808-tika-doesn-t-parse-PDF-file.-The-issue-is-c.patch
0002-TIKA-827-try-to-report-something-if-the-exception-is.patch  (optional)
0003-TIKA-828-make-sure-the-exceptions-thrown-by-TaggedIn.patch
0004-TIKA-829-make-sure-tika-identifies-invalid-arguments.patch
0005-TIKA-830-Tike.parseToString-caused-ForkParser-to-try.patch
0006-TIKA-830-refactor-tests-for-clarity.patch
0007-TIKA-831-fix-for-errors-not-being-reported-properly-.patch

Thanks

                
> Tika parsers should handle failures more gracefully
> ---------------------------------------------------
>
>                 Key: TIKA-815
>                 URL: https://issues.apache.org/jira/browse/TIKA-815
>             Project: Tika
>          Issue Type: Test
>          Components: parser
>    Affects Versions: 1.0
>            Reporter: Jerome Lacoste
>
> We encountered an OOM while parsing a Word document. We will report the 
> failure to POI.
> This raises the question about the general robustness of the parsers.
> We've written a little test tool that reproduces the aforementioned OOM and 
> other potential issues that will be reported to the individual parsers. It's 
> the responsibility of the parsers to handle those failures gracefully.
> Yet it's easy to write generic tools at the Tika level to run this kind of 
> test.
> So we are also submitting this issue here to start a discussion on what role 
> Tika should have when it comes to validating its parsers.
> Code here: https://github.com/lacostej/tika-hardener

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        
