[ https://issues.apache.org/jira/browse/TIKA-815?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13175403#comment-13175403 ]
Jerome Lacoste commented on TIKA-815:
-------------------------------------

> For people with strong stability requirements, we provide the ForkParser

To get the ForkParser to work, I've had to make 6 patches... And I haven't yet stress-tested it. That makes me wary of using it in production!

Please fix TIKA-808, TIKA-827 (optional), TIKA-828, TIKA-829, TIKA-830, TIKA-831 in that order.

0001-TIKA-808-tika-doesn-t-parse-PDF-file.-The-issue-is-c.patch
0002-TIKA-827-try-to-report-something-if-the-exception-is.patch (optional)
0003-TIKA-828-make-sure-the-exceptions-thrown-by-TaggedIn.patch
0004-TIKA-829-make-sure-tika-identifies-invalid-arguments.patch
0005-TIKA-830-Tike.parseToString-caused-ForkParser-to-try.patch
0006-TIKA-830-refactor-tests-for-clarity.patch
0007-TIKA-831-fix-for-errors-not-being-reported-properly-.patch

Thanks

> Tika parsers should handle failures more gracefully
> ---------------------------------------------------
>
>                 Key: TIKA-815
>                 URL: https://issues.apache.org/jira/browse/TIKA-815
>             Project: Tika
>          Issue Type: Test
>          Components: parser
>    Affects Versions: 1.0
>            Reporter: Jerome Lacoste
>
> We encountered an OOM while parsing a Word document. We will report the
> failure to POI.
> This raises the question of the general robustness of the parsers.
> We've written a little test tool that reproduces the aforementioned OOM and
> other potential issues that will be reported to the individual parsers. It's
> the responsibility of the parsers to handle those failures gracefully.
> Yet it's easy to write generic tools at the Tika level to run this kind of
> test.
> So we also submit this issue here to start a discussion on what role Tika
> should have when it comes to validating its parsers.
> Code here: https://github.com/lacostej/tika-hardener

--
This message is automatically generated by JIRA.