[ https://issues.apache.org/jira/browse/NUTCH-1106?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13116767#comment-13116767 ]
Sebastian Nagel edited comment on NUTCH-1106 at 9/28/11 8:28 PM:
-----------------------------------------------------------------

Why not just add a regex-urlfilter rule:
{code}
# skip URLs longer than 120 characters
-^.{121,}$
{code}

> Options to skip url's based on length
> -------------------------------------
>
>                 Key: NUTCH-1106
>                 URL: https://issues.apache.org/jira/browse/NUTCH-1106
>             Project: Nutch
>          Issue Type: Improvement
>          Components: linkdb
>    Affects Versions: 1.3
>            Reporter: Markus Jelsma
>            Assignee: Markus Jelsma
>             Fix For: 1.5
>
>         Attachments: NUTCH-1106-1.4-1.patch
>
>
> Adds an option to skip URLs exceeding a certain length. At first we used a regex
> to impose this limit, but having this option configurable is more convenient.
> Comments?
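
The boundary behaviour of the suggested rule can be checked outside Nutch with a small sketch. This is only an illustration in Python, not Nutch code: the helper names `parse_rule` and `is_rejected_by_length` are hypothetical, and it assumes that a leading `-` marks a reject rule and that the pattern is searched against the whole URL string.

```python
import re

# The rule from the comment: "-" means reject, the rest is the pattern.
RULE = "-^.{121,}$"


def parse_rule(rule):
    """Split a rule line into (accept, compiled_pattern).

    Hypothetical helper: assumes the first character is '+' (accept)
    or '-' (reject) and the remainder is a regular expression.
    """
    sign, pattern = rule[0], rule[1:]
    if sign not in ("+", "-"):
        raise ValueError("rule must start with '+' or '-'")
    return sign == "+", re.compile(pattern)


def is_rejected_by_length(url, rule=RULE):
    """Return True if the reject rule matches, i.e. the URL is too long."""
    accept, pattern = parse_rule(rule)
    return (not accept) and pattern.search(url) is not None


if __name__ == "__main__":
    base = "http://example.com/"
    url_120 = base + "a" * (120 - len(base))  # exactly 120 characters
    url_121 = url_120 + "a"                   # one character over the limit
    print(len(url_120), is_rejected_by_length(url_120))
    print(len(url_121), is_rejected_by_length(url_121))
```

Under these assumptions, `.{121,}` anchored with `^` and `$` matches only strings of 121 characters or more, so a URL of exactly 120 characters passes and anything longer is skipped.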