[ https://issues.apache.org/jira/browse/NUTCH-2221?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Markus Jelsma updated NUTCH-2221:
---------------------------------
    Description: 
FetcherThread has support for db.ignore.external.links. In config you can find db.ignore.internal.links as well, but it only operates on LinkDB, which is confusing. This patch will introduce db.ignore.internal.links to FetcherThread, similar to db.ignore.external.links. With both parameters set to true you can limit the crawl to the injected seed list.

  was:
FetcherThread has support for db.ignore.external.links. In config you can find ce db.ignore.internal.links as well, but it only operates on LinkDB. This patch will introduce db.ignore.internal.links to FetcherThread, similar to db.ignore.external.links.

> Introduce db.ignore.internal.links to FetcherThread
> ---------------------------------------------------
>
>                 Key: NUTCH-2221
>                 URL: https://issues.apache.org/jira/browse/NUTCH-2221
>             Project: Nutch
>          Issue Type: Improvement
>          Components: fetcher
>    Affects Versions: 1.11
>            Reporter: Markus Jelsma
>             Fix For: 1.12
>
>
> FetcherThread has support for db.ignore.external.links. In config you can
> find db.ignore.internal.links as well, but it only operates on LinkDB, which
> is confusing. This patch will introduce db.ignore.internal.links to
> FetcherThread, similar to db.ignore.external.links. With both parameters set
> to true you can limit the crawl to the injected seed list.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
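
For illustration, the combination described in the issue could be configured roughly as follows. This is a sketch only: it assumes the overrides are placed in conf/nutch-site.xml, and that a build containing this patch (Fix For: 1.12) is in use, since on 1.11 db.ignore.internal.links is only honoured by LinkDB. The two property names are the ones named in the issue; nothing else here is taken from the patch itself.

```xml
<!-- conf/nutch-site.xml (assumed location for local overrides) -->
<configuration>
  <!-- Drop outlinks that point to a different host than the source page. -->
  <property>
    <name>db.ignore.external.links</name>
    <value>true</value>
  </property>
  <!-- Drop outlinks that point to the same host as the source page.
       Per NUTCH-2221, FetcherThread would honour this once patched. -->
  <property>
    <name>db.ignore.internal.links</name>
    <value>true</value>
  </property>
</configuration>
```

With both values set to true, neither internal nor external outlinks should be followed, so the crawl would stay on the injected seed URLs, as the description states.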