Thanks for your answers and clarifications. I will try to do some more benchmark testing with more nodes and keep you guys posted.
On 9/14/07, Owen O'Malley <[EMAIL PROTECTED]> wrote:
>
> On Sep 13, 2007, at 2:20 AM, Taeho Kang wrote:
>
> > I did run WordCount included in the 0.14.1 release version on a 1-node
> > Hadoop cluster (Pentium D with 2GB of RAM).
>
> Thanks for running the benchmark. I'm afraid that with such a small
> cluster and data size you are getting swamped in the start-up costs.
> I have not done enough benchmarking of the C++ bindings.
>
> > There were 2 input files (one 4.5MB file + one 36MB file).
> > I also took the Combiner out of the Java version of WordCount MapReduce,
> > as there was no Combiner used for the C++ version.
>
> Actually, the wordcount-part.cc example does have a combiner. You
> would want to remove the partitioner from that example, which forces
> every key to partition 0, however. *smile* Actually, as an example,
> the bad partitioner wasn't a good idea. I should move the bad
> partitioner to a test case.
>
> > The result is... as many of you have guessed, the Java version won the
> > race big time. The Java version was about 4 times quicker.
>
> I'll write a sort benchmark for C++ so that we can run a reasonably
> large program. Note that for simple programs, the C++ version is by
> definition slower, since Pipes runs the C++ code as a subprocess
> underneath a Java mapper and reducer.
>
> -- Owen

--
Taeho Kang [tkang.blogspot.com]
Software Engineer, NHN Corporation, Korea
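
P.S. For anyone following along, the partitioner Owen mentions sends every key to reduce 0, which serializes the whole reduce phase onto one task. A minimal sketch of the contrast below — this is not the actual Pipes API or the wordcount-part.cc code, just illustrative free functions with the same partition(key, numReduces) shape:

```cpp
#include <cassert>
#include <functional>
#include <string>

// The "bad" partitioner pattern: every key maps to reduce 0, so with
// N reduce tasks, N-1 of them receive no input and sit idle.
int badPartition(const std::string& /*key*/, int /*numReduces*/) {
  return 0;
}

// A typical hash partitioner: spread keys across all reduce tasks
// (the same idea as Hadoop's default hash partitioning).
int hashPartition(const std::string& key, int numReduces) {
  std::size_t h = std::hash<std::string>{}(key);
  return static_cast<int>(h % static_cast<std::size_t>(numReduces));
}
```

The hash version keeps the contract a partitioner needs: deterministic (the same key always lands on the same reduce, so all values for a key meet in one place) while distributing distinct keys across the available reduces.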