Hello,

What I am really trying to do is understand Nutch, and specifically the crawler.
As an exercise, I have selected a couple of web sites to crawl, rather than the
whole web.

One of the issues I am struggling with is re-crawling. My understanding from 
browsing the mailing list is that a re-crawl fetches every page again, whether 
the page has changed or not (see 
http://www.mail-archive.com/[email protected]/msg01108.html).

However, I have not yet followed the steps for the 'whole web crawl' and 
refetching.

Isabelle

[EMAIL PROTECTED]
Ph: 651 687 3424