You can use the domain-urlfilter to limit the crawl to specific domains. Limiting the search to specific domains would take one of the following:

1) A plugin that changes the search query to include "and (site:xxx or site:xxx)". I don't recommend this approach; I'm including it only for completeness.

2) Another field in your index, say sitegroup, with value x, using the same value for all 150 sites. Then search for "query and sitegroup:x".
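
For the crawl-side restriction mentioned at the top, a rough sketch of enabling the domain-urlfilter in nutch-site.xml might look like the following. The plugin id urlfilter-domain, the urlfilter.domain.file property, and the domain-urlfilter.txt file name are assumptions based on the plugin as I remember it, and the plugin.includes value is only illustrative; check them against your Nutch version and merge them into your existing settings rather than copying them verbatim.

```xml
<!-- Sketch only: the plugin list below is illustrative. Add urlfilter-domain
     to whatever plugin.includes value your installation already uses. -->
<property>
 <name>plugin.includes</name>
 <value>protocol-http|urlfilter-(regex|domain)|parse-(text|html)|index-basic|query-(basic|site|url)</value>
</property>

<!-- Assumed property name: points the filter at a file in the conf directory
     that lists one allowed domain or host per line, e.g. a.com or www.b.com. -->
<property>
 <name>urlfilter.domain.file</name>
 <value>domain-urlfilter.txt</value>
</property>
```

This only affects which URLs are fetched during the crawl; it does not restrict searches against an index that has already been built.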

Dennis

Ian.huang wrote:
Dmitry,

Thanks. What I want is to restrict search results to a limited set of domains after the whole index has been built.

Say I built an index for 150 sites, but want to search only 5 of them. It is like searching by: site:www.a.com or site:www.b.com or site:www.c.com

Ian

--------------------------------------------------
From: "Dmitry Lihachev" <[email protected]>
Sent: Wednesday, April 22, 2009 7:45 AM
To: <[email protected]>
Subject: Re: how to restrict search result  in defined domains?

Hi Ian.
hi, everyone

I used Nutch to crawl about 150 websites, and the result is quite good.

If I only want to search results within a list of defined domains, how can I
do that?

Thanks!

Ian

Put the lines below in your nutch-site.xml:

<property>
 <name>db.ignore.external.links</name>
 <value>true</value>
 <description>If true, outlinks leading from a page to external hosts
 will be ignored. This is an effective way to limit the crawl to include
 only initially injected hosts, without creating complex URLFilters.
 </description>
</property>

--
Regards,
Dmitry Lihachev
