[ 
https://issues.apache.org/jira/browse/NUTCH-2221?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Markus Jelsma updated NUTCH-2221:
---------------------------------
    Description: 
FetcherThread has support for db.ignore.external.links. In config you can find 
db.ignore.internal.links as well, but it only operates on LinkDB, which is 
confusing. This patch will introduce db.ignore.internal.links to FetcherThread, 
similar to db.ignore.external.links. With both parameters set to true, you can 
limit the crawl to the injected seed list.


  was:FetcherThread has support for db.ignore.external.links. In config you can 
find ce db.ignore.internal.links as well, but it only operates on LinkDB. This 
patch will introduce db.ignore.internal.links to FetcherThread, similar to 
db.ignore.external.links.


> Introduce db.ignore.internal.links to FetcherThread
> ---------------------------------------------------
>
>                 Key: NUTCH-2221
>                 URL: https://issues.apache.org/jira/browse/NUTCH-2221
>             Project: Nutch
>          Issue Type: Improvement
>          Components: fetcher
>    Affects Versions: 1.11
>            Reporter: Markus Jelsma
>             Fix For: 1.12
>
>
> FetcherThread has support for db.ignore.external.links. In config you can 
> find db.ignore.internal.links as well, but it only operates on LinkDB, which 
> is confusing. This patch will introduce db.ignore.internal.links to 
> FetcherThread, similar to db.ignore.external.links. With both parameters set 
> to true, you can limit the crawl to the injected seed list.
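
For illustration, the configuration described above might look like the 
following override in conf/nutch-site.xml. This is only a sketch: the two 
property names come from the issue description, while the file location and 
the assumption that FetcherThread honors db.ignore.internal.links apply only 
once this patch is in place. Before the patch, that property affects LinkDB 
only.

```xml
<?xml version="1.0"?>
<configuration>
  <!-- Already supported by FetcherThread: drop outlinks that point to a
       different host than the page they were found on. -->
  <property>
    <name>db.ignore.external.links</name>
    <value>true</value>
  </property>
  <!-- Proposed in NUTCH-2221: also drop outlinks that point to the same
       host. Assumed to take effect in FetcherThread only with the patch. -->
  <property>
    <name>db.ignore.internal.links</name>
    <value>true</value>
  </property>
</configuration>
```

With both values set to true, no outlinks would be followed, so the crawl 
would stay on the injected seed URLs.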



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
