[ https://issues.apache.org/jira/browse/NUTCH-712?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12679582#action_12679582 ]
Andrzej Bialecki commented on NUTCH-712:
-----------------------------------------

I'm not sure that ignoring this exception is the right thing to do ... if we fail to normalize the url, we also fail to filter it. This means that if we proceed as if nothing happened (which your patch does) we could end up with many unfiltered junk urls. I think a better alternative is to return, i.e. to skip this record without further processing.

> ParseOutputFormat should catch java.net.MalformedURLException coming from normalizers
> -------------------------------------------------------------------------------------
>
>                 Key: NUTCH-712
>                 URL: https://issues.apache.org/jira/browse/NUTCH-712
>             Project: Nutch
>          Issue Type: Improvement
>    Affects Versions: 1.0.0
>            Reporter: Julien Nioche
>         Attachments: ParseOutputFormat-NUTCH712.patch
>
>
> ParseOutputFormat should catch java.net.MalformedURLException coming from normalizers otherwise the whole parsing step crashes instead of simply ignoring dodgy outlinks

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
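
The alternative suggested in the comment above (skip the record when normalization fails, rather than carry on with an unnormalized and unfiltered url) could look roughly like the sketch below. This is not the actual ParseOutputFormat code or the attached patch: the class `OutlinkSketch` and the helpers `normalize`, `filter` and `collectOutlinks` are hypothetical stand-ins for the Nutch normalizers and filters, and their rules are assumptions made only so the example runs on its own.

```java
import java.net.MalformedURLException;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Locale;

class OutlinkSketch {

    // Hypothetical stand-in for a normalizer: lower-cases the scheme and
    // throws MalformedURLException when the input has no scheme at all.
    static String normalize(String url) throws MalformedURLException {
        String trimmed = url.trim();
        int schemeEnd = trimmed.indexOf("://");
        if (schemeEnd <= 0) {
            throw new MalformedURLException("no scheme: " + url);
        }
        return trimmed.substring(0, schemeEnd).toLowerCase(Locale.ROOT)
                + trimmed.substring(schemeEnd);
    }

    // Hypothetical stand-in for a filter: returns null to reject a url.
    static String filter(String url) {
        boolean accepted = url.startsWith("http://") || url.startsWith("https://");
        return accepted ? url : null;
    }

    // Normalizes, then filters, each candidate outlink. A candidate that
    // cannot be normalized is skipped entirely, so it never bypasses the filter.
    static List<String> collectOutlinks(List<String> candidates) {
        List<String> kept = new ArrayList<>();
        for (String candidate : candidates) {
            String url;
            try {
                url = normalize(candidate);
            } catch (MalformedURLException e) {
                // Skip this record without further processing.
                continue;
            }
            url = filter(url);
            if (url != null) {
                kept.add(url);
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        List<String> candidates = Arrays.asList(
                "HTTP://example.org/a",   // normalized and accepted
                "not a url",              // normalization fails, skipped
                "ftp://example.org/f");   // normalized, rejected by the filter
        System.out.println(collectOutlinks(candidates)); // prints [http://example.org/a]
    }
}
```

Catching the exception and then continuing with the original string would instead add "not a url" to the output without it ever passing through `filter`, which is the junk-url concern raised in the comment.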