Can someone point me to a good explanation of why, when I set up a
Hadoop cluster, my entire site isn't crawled? It doesn't make sense
that I should have to tweak the number of Hadoop map and reduce tasks
just to ensure that everything gets indexed.
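(For context, the task counts I'm referring to are the ones the
tutorial has you set in hadoop-site.xml; the values below are only
placeholders, not my actual settings.)

    <property>
      <name>mapred.map.tasks</name>
      <value>8</value>    <!-- example value only -->
    </property>
    <property>
      <name>mapred.reduce.tasks</name>
      <value>4</value>    <!-- example value only -->
    </property>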
I followed the tutorial here:
http://wiki.apache.org/nutch/NutchHadoopTutorial and found that only a
small portion of my site was indexed. Short of explicitly listing every
URL on the site, what should I do to ensure that my Hadoop cluster (of
only 4 machines) builds a full index?
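In case it matters, I kicked off the crawl roughly like this (the job
file name, seed directory, and -depth/-topN values are approximate,
not my exact command):

    # one-shot crawl on the cluster; values below are illustrative
    # -depth = number of generate/fetch/update rounds
    # -topN  = max URLs fetched per round
    bin/hadoop jar nutch-0.9.job org.apache.nutch.crawl.Crawl urls \
        -dir crawled -depth 3 -topN 1000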
Thanks for the help.
Jeff