Your cluster is too small to benefit from distributed mode. Ten or more nodes may work better, though I'm not sure of the exact number. Distributed Nutch has higher throughput but also higher latency, because of the communication overhead between the master and the slaves.
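To make the throughput versus latency point concrete, here is a rough back-of-the-envelope model in Python. It is only a sketch: the function names, the fixed per-round overhead, and all the numbers are my own assumptions for illustration, not anything measured on Nutch or Hadoop.

```python
def crawl_time(num_urls, secs_per_url, workers=1, overhead_secs=0.0):
    """Estimated wall-clock seconds for one crawl round in a toy model.

    Assumes the work splits evenly across workers and that the cluster
    adds a fixed coordination cost per round (hypothetical parameter).
    """
    return overhead_secs + (num_urls * secs_per_url) / workers


def break_even_urls(secs_per_url, workers, overhead_secs):
    """Batch size at which the cluster stops being slower than local mode.

    Local mode costs n * t; the cluster costs o + n * t / w.
    Setting them equal gives n = o / (t * (1 - 1 / w)).
    """
    if workers <= 1:
        # A single worker never recovers the overhead in this model.
        return float("inf")
    return overhead_secs / (secs_per_url * (1 - 1 / workers))


if __name__ == "__main__":
    # Made-up figures: 2 s per URL, 60 s of coordination overhead per round.
    print(crawl_time(1, 2.0))                                # local, one URL
    print(crawl_time(1, 2.0, workers=1, overhead_secs=60.0)) # one worker node
    print(break_even_urls(2.0, workers=10, overhead_secs=60.0))
```

Under these assumptions, a setup with a single worker node pays the overhead on every round and never catches up with local mode, which matches what you are seeing. With more nodes and larger batches per round, the overhead is spread over more work.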
On Thu, Jul 16, 2009 at 3:35 AM, Rodrigo Reyes C. <[email protected]> wrote:
> Hi all. We are using nutch for open web crawling one URL at a time until
> depth 5 and we were able to configure a cluster with one master and one
> hadoop node. Still, in our case, it seems that distributed mode is a lot
> slower than local mode. Any reason why not to run nutch for crawling in
> local mode in production?
>
> Thanks in advance.
>
> Rodrigo
