Yitao Duan wrote:
I see Nutch is moving towards using MapReduce for many things and
there is already a branch that uses MapReduce for parsing and
updatedb. I was wondering whether there are any benchmarks/tests validating
the benefit of using the Nutch implementation of MapReduce, especially
at large scale in a distributed setting? What tests have been done,
and at what scale, for this MapReduce branch? I am doing my own
testing but it is good to know what others have experienced.

I have not yet performed many benchmarks or large-scale experiments. Once a critical mass of Nutch code has been ported to MapReduce, I will begin performance tuning and scale testing. Over the course of the summer I hope to demonstrate that Nutch can scale to billions of pages using tens of machines.

Doug
