Hello Victor,

  If I understand correctly, you want to use a seed list that contains
some sites, and then search only the pages that belong to those sites.
In that case, it's best not to crawl pages from other sites at all. You
can do this by setting db.ignore.external.links to true in your
nutch-site.xml. Outlinks that point to a host other than the one they
were found on are then ignored, so the crawl stays limited to pages from
the initially injected hosts.
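For reference, the property is db.ignore.external.links with the value true in nutch-site.xml. The Java sketch below only illustrates the idea behind it, assuming that "external" means "a different host name": an outlink is kept only if its host matches the host of the page it was found on. The class ExternalLinkFilter and its methods are hypothetical and not part of Nutch's API; Nutch's own implementation may differ in details.

```java
import java.net.URI;
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

// Hypothetical illustration, not Nutch code: drops outlinks that leave
// the host of the page they were found on.
public class ExternalLinkFilter {

    /** Lower-cased host of a URL, or null if the URL cannot be parsed. */
    static String host(String url) {
        try {
            String h = new URI(url).getHost();
            return h == null ? null : h.toLowerCase(Locale.ROOT);
        } catch (Exception e) {
            return null;
        }
    }

    /** Keeps only outlinks whose host equals the host of fromUrl. */
    static List<String> keepInternal(String fromUrl, List<String> outlinks) {
        String fromHost = host(fromUrl);
        List<String> kept = new ArrayList<>();
        for (String link : outlinks) {
            String toHost = host(link);
            if (fromHost != null && fromHost.equals(toHost)) {
                kept.add(link);
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        List<String> outlinks = List.of(
            "http://www.bcc.com/about.html",
            "http://www.ccc.com/hello.html");
        // Prints [http://www.bcc.com/about.html]
        System.out.println(keepInternal("http://www.bcc.com/", outlinks));
    }
}
```

With this rule, a link from bcc.com to ccc.com is dropped, so ccc.com is never fetched and cannot appear in search results. Note that an exact host comparison treats www.abc.com and abc.com as different hosts.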

Regards,

-vishal.

-----Original Message-----
From: victor_emailbox [mailto:[EMAIL PROTECTED] 
Sent: Thursday, August 31, 2006 10:51 AM
To: [email protected]
Subject: Re: How to Make Nutch Return Search Results Belonged to the Crawl URL Li


No, I mean: suppose the crawl URL list contains http://www.abc.com and
http://www.bcc.com, and both pages contain the term "hello". bcc.com
also has a link to ccc.com, which also contains the term "hello" but is
not in the crawl URL list.

So when I search for "hello", will Nutch return abc.com, bcc.com and
ccc.com by default? If so, how do I force Nutch to return only abc.com
and bcc.com, without ccc.com?

Thanks.
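If pages from other hosts are already in the index, one alternative is to filter hits afterwards against the hosts in the crawl URL list. The sketch below assumes the hit URLs are available as plain strings; SeedHostResultFilter and its methods are hypothetical names, not a Nutch API, and the filtering happens outside Nutch's search code.

```java
import java.net.URI;
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

// Hypothetical helper, not part of Nutch: keeps only hits whose host
// appears in the crawl URL (seed) list.
public class SeedHostResultFilter {

    /** Lower-cased host of a URL, or null if the URL cannot be parsed. */
    static String host(String url) {
        try {
            String h = new URI(url).getHost();
            return h == null ? null : h.toLowerCase(Locale.ROOT);
        } catch (Exception e) {
            return null;
        }
    }

    /** Collects the hosts of the seed URLs. */
    static Set<String> seedHosts(List<String> seeds) {
        Set<String> hosts = new HashSet<>();
        for (String seed : seeds) {
            String h = host(seed);
            if (h != null) {
                hosts.add(h);
            }
        }
        return hosts;
    }

    /** Drops hit URLs whose host is not one of the allowed seed hosts. */
    static List<String> filterHits(List<String> hits, Set<String> allowed) {
        List<String> kept = new ArrayList<>();
        for (String url : hits) {
            String h = host(url);
            if (h != null && allowed.contains(h)) {
                kept.add(url);
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        Set<String> allowed = seedHosts(List.of("http://www.abc.com", "http://www.bcc.com"));
        List<String> hits = List.of(
            "http://www.abc.com/hello.html",
            "http://www.ccc.com/hello.html",
            "http://www.bcc.com/index.html");
        // Prints [http://www.abc.com/hello.html, http://www.bcc.com/index.html]
        System.out.println(filterHits(hits, allowed));
    }
}
```

Filtering at crawl time is usually cheaper, since ccc.com is never fetched or indexed; a post-filter like this one only hides those hits, and it can leave fewer results on a page than were requested.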


Zaheed Haque wrote:
> 
> Hi
> 
> You mean show results from the site http://abc.com only? If so, you
> need to turn on the index-more and query-more plugins in
> nutch-site.xml, and then use a query like  site:http://abc.com +query
> term, or url: ... I think it's site:, but I'm not sure.
> 
> Cheers
> 
> On 8/31/06, victor_emailbox <[EMAIL PROTECTED]> wrote:
>>
>> Hi,
>>   I enter 10 URLs in the crawl URL list. Nutch does its thing to
>> fetch and index them. How do I force Nutch to return only search
>> results that belong to the URL list? E.g., if the crawl URL list has
>> only http://www.abc.com and http://www.bcc.com, then all search
>> results should be under either abc.com or bcc.com, not ccc.com, even
>> if bcc.com contains links referring to ccc.com.
>>
>> Many thanks.
>> --
>> View this message in context:
>> http://www.nabble.com/How-to-Make-Nutch-Return-Search-Results-Belonged-to-the-Crawl-URL-List--tf2194391.html#a6072986
>> Sent from the Nutch - User forum at Nabble.com.
>>
>>
> 
> 

-- 
View this message in context:
http://www.nabble.com/How-to-Make-Nutch-Return-Search-Results-Belonged-to-the-Crawl-URL-List--tf2194391.html#a6073242
Sent from the Nutch - User forum at Nabble.com.


_______________________________________________
Nutch-general mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/nutch-general
