[ https://issues.apache.org/jira/browse/NUTCH-2069?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15012193#comment-15012193 ]

Markus Jelsma commented on NUTCH-2069:
--------------------------------------

Hi J - I agree with the mode! Have it defaulted so that it never breaks older 
instances and doesn't allow excluding both. Your follow-up patch is probably 
spot on; have you got one? It can still go into 1.11!
M.

> Ignore external links based on domain
> -------------------------------------
>
>                 Key: NUTCH-2069
>                 URL: https://issues.apache.org/jira/browse/NUTCH-2069
>             Project: Nutch
>          Issue Type: Improvement
>          Components: fetcher, parser
>    Affects Versions: 1.10
>            Reporter: Julien Nioche
>         Attachments: NUTCH-2069.patch
>
>
> We currently have `db.ignore.external.links` which is a nice way of 
> restricting the crawl based on the hostname. This adds a new parameter 
> `db.ignore.external.links.domain` to do the same based on the domain.
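
To illustrate the difference between host-based and domain-based filtering
described in the issue above, here is a minimal Python sketch. It is not Nutch
code: the function names and the `by_domain` flag are hypothetical, and the
domain is approximated as the last two labels of the hostname, which is an
assumption that gives wrong answers for suffixes such as co.uk.

```python
from urllib.parse import urlsplit


def host_of(url: str) -> str:
    """Return the lower-cased hostname of a URL, or '' if it has none."""
    return (urlsplit(url).hostname or "").lower()


def naive_domain_of(url: str) -> str:
    """Approximate the domain as the last two labels of the hostname.

    Assumption for illustration only: a real crawler would consult a
    public-suffix list instead of counting labels.
    """
    labels = host_of(url).split(".")
    return ".".join(labels[-2:])


def is_external(from_url: str, to_url: str, by_domain: bool = False) -> bool:
    """Decide whether a link leaves the site it was found on.

    by_domain=False compares full hostnames (host-based filtering);
    by_domain=True compares the approximated domains instead.
    """
    if by_domain:
        return naive_domain_of(from_url) != naive_domain_of(to_url)
    return host_of(from_url) != host_of(to_url)
```

With host-based comparison, a link from www.example.com to blog.example.com
counts as external; with domain-based comparison it counts as internal, so
subdomains of the same site stay inside the crawl.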



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
