Hello,

What I am really trying to do is understand Nutch, and specifically the crawler. As an exercise, I have selected a couple of web sites to crawl, rather than the whole web.
One of the issues I am struggling with is re-crawling. My understanding from browsing the mailing list is that a re-crawl fetches every page again, whether the page has changed or not (see http://www.mail-archive.com/[email protected]/msg01108.html). However, I have not yet followed the steps for 'whole web crawl' and refetching.

Isabelle
[EMAIL PROTECTED]
Ph: 651 687 3424
