I am using the tutorial linked below (with Nutch 0.9) to crawl the web. I went through the steps: downloaded the dmoz dump, ran the parser, and so on.

  bin/nutch inject crawl/crawldb dmoz
  etc, etc.
  bin/nutch fetch $s1

Once I get to this step, is there a way to actually "crawl" the sites in the dmoz URL list? It seems we are only fetching the URLs that come straight out of the dmoz list. Let's say I want to crawl outward from those sites to a particular depth; how would I do that?

http://lucene.apache.org/nutch/tutorial8.html

--
Berlin Brown
http://www.newspiritcompany.com - newspirit technologies
