You can use the domain URL filter (the urlfilter-domain plugin) to limit the crawl to specific domains.
Limiting the search to specific domains would take one of the following:
1) A plugin that changes the search query to include "and (site:xxx or
site:yyy)". I don't recommend this approach; I'm just including it for completeness.
2) Another field in your index, say sitegroup, with value x. Same value
for all 150 sites. Then do query and sitegroup:x.
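
To make the two options concrete, here is a minimal sketch of the query strings each one would produce. It only does string building and is not Nutch code: the function names are made up for illustration, and the "and"/"or" notation simply follows the wording above. Whether the stock query parser accepts that syntax is an assumption, which is why option 1 would need a plugin.

```python
def restrict_to_sites(query, sites):
    """Option 1: append an explicit site clause to the user's query.

    Hypothetical helper; the and/or syntax mirrors the thread's notation.
    """
    if not sites:
        return query
    clause = " or ".join(f"site:{host}" for host in sites)
    return f"{query} and ({clause})"


def restrict_to_group(query, group):
    """Option 2: append one clause on an extra index field.

    Assumes a field named sitegroup was added at indexing time.
    """
    return f"{query} and sitegroup:{group}"


print(restrict_to_sites("nutch", ["www.a.com", "www.b.com", "www.c.com"]))
print(restrict_to_group("nutch", "x"))
```

The clause for option 1 grows with every site added, while option 2 stays a single term no matter how many sites share the group value.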
Dennis
Ian.huang wrote:
Dmitry,
Thanks. What I want is to restrict search results to a limited set of domains
after the whole index has been built.
Say I built an index for 150 sites, but want to search only 5 of
them. It is like searching by: site:www.a.com or site:www.b.com or
site:www.c.com
Ian
--------------------------------------------------
From: "Dmitry Lihachev" <[email protected]>
Sent: Wednesday, April 22, 2009 7:45 AM
To: <[email protected]>
Subject: Re: how to restrict search result in defined domains?
Hi Ian.
Hi, everyone
I used Nutch to crawl about 150 websites, and the result is quite good.
If I only want to search within a list of defined domains, how can I
do it?
Thanks!
Ian
Put the lines below in your nutch-site.xml:
<property>
  <name>db.ignore.external.links</name>
  <value>true</value>
  <description>If true, outlinks leading from a page to external hosts
  will be ignored. This is an effective way to limit the crawl to include
  only initially injected hosts, without creating complex URLFilters.
  </description>
</property>
--
Regards,
Dmitry Lihachev