[ https://issues.apache.org/jira/browse/NUTCH-696?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12885260#action_12885260 ]
Julien Nioche edited comment on NUTCH-696 at 7/5/10 11:13 AM:
--------------------------------------------------------------

+1 : this is definitely useful. Hopefully the underlying parsers in Tika are constantly improved to prevent loops and crashes but having the parser timeout on top would be great.

I suggest we mark it for 2.0 and 1.2

      was (Author: jnioche):
    +1 : this is definitely useful. Hopefully the underlying parsers in Tika are constantly improved to prevent loops and crashes but having the parser timeout on top would be great.

> Timeout for Parser
> ------------------
>
>                 Key: NUTCH-696
>                 URL: https://issues.apache.org/jira/browse/NUTCH-696
>             Project: Nutch
>          Issue Type: Wish
>          Components: fetcher
>            Reporter: Julien Nioche
>            Priority: Minor
>         Attachments: timeout.patch
>
>
> I found that the parsing sometimes crashes due to a problem on a specific
> document, which is a bit of a shame as this blocks the rest of the segment
> and Hadoop ends up finding that the node does not respond. I was wondering
> about whether it would make sense to have a timeout mechanism for the parsing
> so that if a document is not parsed after a time t, it is simply treated as
> an exception and we can get on with the rest of the process.
> Does that make sense? Where do you think we should implement that, in
> ParseUtil?
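
For illustration only, here is a minimal sketch of the kind of timeout wrapper the issue describes: the parse runs on a worker thread, and if it has not finished after a time t the caller gets an exception and can move on to the next document. This is not the attached timeout.patch and it does not use any Nutch API. The names ParseTimeoutSketch, parseWithTimeout and ParseFailedException are hypothetical, and the Callable stands in for whatever call actually invokes the parser.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

// Hypothetical helper, not part of Nutch or Tika.
class ParseTimeoutSketch {

    /** Hypothetical exception used to report a failed or timed-out parse. */
    public static class ParseFailedException extends Exception {
        public ParseFailedException(String message, Throwable cause) {
            super(message, cause);
        }
    }

    /**
     * Runs parseTask on a worker thread and waits at most timeoutMillis
     * for its result. A timeout is reported as an ordinary exception so
     * the caller can skip the document and continue with the segment.
     */
    public static <T> T parseWithTimeout(Callable<T> parseTask, long timeoutMillis)
            throws ParseFailedException {
        // Daemon thread: a parser stuck in a loop must not keep the JVM alive.
        ExecutorService executor = Executors.newSingleThreadExecutor(r -> {
            Thread t = new Thread(r, "parse-worker");
            t.setDaemon(true);
            return t;
        });
        Future<T> future = executor.submit(parseTask);
        try {
            return future.get(timeoutMillis, TimeUnit.MILLISECONDS);
        } catch (TimeoutException e) {
            // cancel(true) only interrupts the worker; a parser that never
            // checks for interruption will keep running in the background.
            future.cancel(true);
            throw new ParseFailedException(
                "parse timed out after " + timeoutMillis + " ms", e);
        } catch (ExecutionException e) {
            // The parser itself threw; surface its original cause.
            throw new ParseFailedException("parser threw an exception", e.getCause());
        } catch (InterruptedException e) {
            future.cancel(true);
            Thread.currentThread().interrupt();
            throw new ParseFailedException("interrupted while waiting for parser", e);
        } finally {
            executor.shutdownNow();
        }
    }
}
```

A caller would wrap the real parse call in the Callable and treat ParseFailedException like any other parse error. Creating one executor per document keeps the sketch short; a shared pool would likely be preferable in a real job. Note that the timeout only frees the calling thread: a parser caught in a tight loop that ignores interrupts still consumes CPU until the JVM exits.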