[ https://issues.apache.org/jira/browse/NUTCH-2069?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15013545#comment-15013545 ]
Julien Nioche commented on NUTCH-2069:
--------------------------------------

> I propose the modes to be named just 'host' and 'domain'. As they are
> elsewhere.

Not really, see fetcher.queue.mode and partition.url.mode
[https://github.com/apache/nutch/blob/trunk/conf/nutch-default.xml#L723]

This issue is not about fixing existing discrepancies; those should be addressed separately.

As for mixing bydomain and byDomain, we do that only when comparing the strings:

{code}
if ("bydomain".equalsIgnoreCase(ignoreExternalLinksMode))
{code}

Changing it to "byDomain" won't make any difference, but feel free to change it if you feel strongly about it.

> Ignore external links based on domain
> -------------------------------------
>
>                 Key: NUTCH-2069
>                 URL: https://issues.apache.org/jira/browse/NUTCH-2069
>             Project: Nutch
>          Issue Type: Improvement
>          Components: fetcher, parser
>    Affects Versions: 1.10
>            Reporter: Julien Nioche
>         Attachments: NUTCH-2069.patch, NUTCH-2069.v2.patch
>
>
> We currently have `db.ignore.external.links` which is a nice way of
> restricting the crawl based on the hostname. This adds a new parameter
> 'db.ignore.external.links.domain' to do the same based on the domain.
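To illustrate the point in the comment about "bydomain" versus "byDomain", here is a minimal stand-alone sketch of the quoted comparison. The class name, the `isByDomain` helper and the "byHost" sample value are assumptions made for illustration only; this is not the code from the attached patches.

```java
// Hypothetical sketch of the comparison quoted in the comment; not Nutch source.
class IgnoreExternalLinksModeSketch {

    // Constant-first equalsIgnoreCase: ignores letter case and returns
    // false (rather than throwing) when the configured mode is null.
    static boolean isByDomain(String ignoreExternalLinksMode) {
        return "bydomain".equalsIgnoreCase(ignoreExternalLinksMode);
    }

    public static void main(String[] args) {
        // "byHost" is an assumed sample value, used only as a non-matching case.
        String[] modes = {"bydomain", "byDomain", "BYDOMAIN", "byHost", null};
        for (String mode : modes) {
            System.out.println(mode + " -> " + isByDomain(mode));
        }
    }
}
```

Because the comparison ignores case, the spelling of the literal on the left-hand side only affects readability, which is why renaming it would not change behaviour.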