Is there a good explanation someone can point me to as to why, when I
set up a Hadoop cluster, my entire site isn't crawled? It doesn't make
sense that I should have to tweak the number of Hadoop map and reduce
tasks in order to ensure that everything gets indexed.
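(For reference, the task settings I'm referring to are the ones in my
hadoop-site.xml; the values below are just my own guesses for a 4-node
cluster, not anything the tutorial prescribes:

  <property>
    <name>mapred.map.tasks</name>
    <value>8</value>
  </property>
  <property>
    <name>mapred.reduce.tasks</name>
    <value>4</value>
  </property>
)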
I followed the tutorial here:
http://wiki.apache.org/nutch/NutchHadoopTutorial and found that only a
small portion of my site was indexed. Besides explicitly listing every
URL on the site, what should I do to ensure that my Hadoop cluster (of
only 4 machines) manages to create a full index?
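In case it matters, the crawl I'm kicking off looks roughly like this
(the depth and topN values are just ones I picked while experimenting,
not something the tutorial mandates):

  bin/nutch crawl urls -dir crawl -depth 3 -topN 50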
Thanks for the help.
Jeff