On Sep 13, 2007, at 2:20 AM, Taeho Kang wrote:
> I ran the WordCount example included in the 0.14.1 release on a 1-node
> Hadoop cluster (Pentium D with 2GB of RAM).
Thanks for running the benchmark. I'm afraid that with such a small
cluster and data size, you are getting swamped by the start-up costs. I
have not done enough benchmarking of the C++ bindings.
> There were 2 input files (one 4.5MB file + one 36MB file).
> I also took the Combiner out of the Java version of WordCount, as there
> was no Combiner used in the C++ version.
Actually, the wordcount-part.cc example does have a combiner. You would,
however, want to remove the partitioner from that example, since it
forces every key to partition 0. *smile* In hindsight, using the bad
partitioner in an example wasn't a good idea; I should move it to a
test case.
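
For the archives, the relevant parts of wordcount-part.cc look roughly
like the sketch below. This is from memory of the current Pipes API, so
treat the exact signatures as approximate rather than gospel:

#include <string>
#include <vector>
#include "hadoop/Pipes.hh"
#include "hadoop/TemplateFactory.hh"
#include "hadoop/StringUtils.hh"

class WordCountMap: public HadoopPipes::Mapper {
public:
  WordCountMap(HadoopPipes::TaskContext& context) {}
  void map(HadoopPipes::MapContext& context) {
    // Split the input line on spaces and emit each word with a count of 1.
    std::vector<std::string> words =
        HadoopUtils::splitString(context.getInputValue(), " ");
    for (unsigned int i = 0; i < words.size(); ++i) {
      context.emit(words[i], "1");
    }
  }
};

class WordCountReduce: public HadoopPipes::Reducer {
public:
  WordCountReduce(HadoopPipes::TaskContext& context) {}
  void reduce(HadoopPipes::ReduceContext& context) {
    // Sum all of the counts emitted for this key.
    int sum = 0;
    while (context.nextValue()) {
      sum += HadoopUtils::toInt(context.getInputValue());
    }
    context.emit(context.getInputKey(), HadoopUtils::toString(sum));
  }
};

// The partitioner to delete before benchmarking: it forces every key
// into partition 0, so a single reduce does all of the work.
class WordCountPartitioner: public HadoopPipes::Partitioner {
public:
  WordCountPartitioner(HadoopPipes::TaskContext& context) {}
  virtual int partition(const std::string& key, int numOfReduces) {
    return 0;
  }
};

int main(int argc, char *argv[]) {
  // The fourth template argument plugs WordCountReduce in as the
  // combiner, which is why the C++ job does combine after all.
  return HadoopPipes::runTask(
      HadoopPipes::TemplateFactory<WordCountMap, WordCountReduce,
                                   WordCountPartitioner,
                                   WordCountReduce>());
}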
> The result is... as many of you have guessed, the Java version won the
> race big time. The Java version was about 4 times quicker.
I'll write a sort benchmark for C++ so that we can run a reasonably
large program. Note that for simple programs, the C++ version is
necessarily slower, since Pipes runs the C++ code as a subprocess
underneath a Java mapper and reducer, and every record has to cross
that process boundary.
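
For anyone trying to reproduce the comparison, a Pipes job is submitted
roughly as follows (the paths are hypothetical, and the exact -jobconf
keys may vary between releases); the Java framework handles the record
reading and writing while the binary named by -program runs as the
child process:

bin/hadoop fs -put wordcount bin/wordcount
bin/hadoop pipes \
    -jobconf hadoop.pipes.java.recordreader=true \
    -jobconf hadoop.pipes.java.recordwriter=true \
    -program bin/wordcount \
    -input input-dir \
    -output output-dir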
-- Owen