On Sep 13, 2007, at 2:20 AM, Taeho Kang wrote:

I ran the WordCount example included in the 0.14.1 release on a 1-node Hadoop
cluster (Pentium D with 2GB of RAM).

Thanks for running the benchmark. I'm afraid that with such a small cluster and data size, you are mostly measuring start-up costs. I have not done enough benchmarking of the C++ bindings yet.

There were 2 input files (one 4.5MB file + one 36MB file).
I also took the Combiner out of the Java WordCount MapReduce job, since no
Combiner was used in the C++ version.

Actually, the wordcount-part.cc example does have a combiner. However, you would want to remove the partitioner from that example, which forces every key to partition 0. *smile* In hindsight, the bad partitioner wasn't a good idea for an example; I should move it to a test case.
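For reference, the partitioner being discussed can be sketched roughly like this. The class below is written without the real Hadoop Pipes headers so it stands alone; in actual Pipes code it would subclass HadoopPipes::Partitioner and be registered through the task factory, so treat the names here as illustrative:

```cpp
#include <string>

// Sketch (from memory) of the "bad" partitioner in wordcount-part.cc.
// Sending every key to reducer 0 serializes the reduce phase, which is
// why it is only useful for testing, not as an example to copy.
class BadPartitioner {
public:
  int partition(const std::string& /*key*/, int /*numOfReduces*/) {
    return 0;  // all keys collapse onto a single reduce partition
  }
};
```

With this in place, only one reducer ever receives data, regardless of how many reduce tasks the job configures.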

The result is, as many of you may have guessed, that the Java version won the race
big time: it was about 4 times quicker.

I'll write a sort benchmark for C++ so that we can run a reasonably large program. Note that for simple programs, the C++ version is by definition slower, since Pipes runs the C++ code as a subprocess underneath a Java mapper and reducer.

-- Owen
