[ https://issues.apache.org/jira/browse/NUTCH-2993?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Markus Jelsma updated NUTCH-2993:
---------------------------------
    Attachment:     (was: NUTCH-2993-1.15.patch)

> ScoringDepth plugin to skip depth check based on URL Pattern
> ------------------------------------------------------------
>
>                 Key: NUTCH-2993
>                 URL: https://issues.apache.org/jira/browse/NUTCH-2993
>             Project: Nutch
>          Issue Type: Improvement
>            Reporter: Markus Jelsma
>            Assignee: Markus Jelsma
>            Priority: Minor
>             Fix For: 1.20
>
>         Attachments: NUTCH-2993-1.15-1.patch
>
>
> We do not want some crawls to go deep and broad, but instead to focus on a
> narrow section of sites. This patch skips the depth check if the current URL
> matches a configured regular expression.
>
> Initially we tried to set a custom maxDepth based on a Pattern match, but
> this did not work. The crawler still managed to creep too deep because there
> are links everywhere.

--
This message was sent by Atlassian Jira
(v8.20.10#820010)
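A minimal Java sketch of the idea described in the issue, not the attached patch: a page's outlinks are kept regardless of depth when the page URL matches a configured regular expression, and otherwise only while the page's depth is below maxDepth. The class DepthCheckSketch, its methods, and the example URLs are hypothetical. The depth comparison assumes outlinks are dropped once a page reaches maxDepth, and Matcher.find() is used so the pattern may match any part of the URL.

```java
import java.util.regex.Pattern;

/**
 * Hypothetical sketch (not the NUTCH-2993 patch): decides whether the
 * outlinks of a page may be followed, bypassing the max-depth check when
 * the page URL matches a configured pattern.
 */
public class DepthCheckSketch {

  private final int maxDepth;
  // Null means no URL is exempt from the depth check.
  private final Pattern skipDepthPattern;

  public DepthCheckSketch(int maxDepth, String skipDepthRegex) {
    this.maxDepth = maxDepth;
    this.skipDepthPattern =
        skipDepthRegex == null ? null : Pattern.compile(skipDepthRegex);
  }

  /** True if the depth check should be skipped for this URL. */
  public boolean skipDepthCheck(String url) {
    // find() matches anywhere in the URL; anchor the regex with ^ if needed.
    return skipDepthPattern != null && skipDepthPattern.matcher(url).find();
  }

  /** True if outlinks of a page at the given depth should be kept. */
  public boolean followOutlinks(String url, int depth) {
    if (skipDepthCheck(url)) {
      return true; // exempt section: ignore maxDepth entirely
    }
    return depth < maxDepth; // assumed: drop outlinks once maxDepth is reached
  }

  public static void main(String[] args) {
    DepthCheckSketch check =
        new DepthCheckSketch(2, "^https?://example\\.org/docs/");
    System.out.println(check.followOutlinks("https://example.org/docs/a/b/c", 5)); // true
    System.out.println(check.followOutlinks("https://example.org/blog/post", 5));  // false
    System.out.println(check.followOutlinks("https://example.org/blog/post", 1));  // true
  }
}
```

With a low maxDepth, the crawl stays shallow everywhere except inside the URL section selected by the pattern, which matches the goal stated in the issue.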