Benchmarking pipelined MapReduce jobs

2011-02-22 Thread David Saile
Hello everybody,

I am trying to benchmark a Hadoop cluster with regard to the throughput of 
pipelined MapReduce jobs.
Looking for benchmarks, I found the "Gridmix" benchmark that ships with 
Hadoop. Its README file says that part of this benchmark is a "Three 
stage map/reduce job".

As this seems to match my needs, I was wondering whether it is possible to 
configure "Gridmix" to run only this job (without the rest of the "Gridmix" 
benchmark).
Or do I have to build my own benchmark? If so, which classes are 
used by this "Three stage map/reduce job"?

Thanks for any help!

David

 

Re: Benchmarking pipelined MapReduce jobs

2011-02-22 Thread Shrinivas Joshi
I am not sure about this but you might want to take a look at the GridMix
config file. FWIU, it lets you define the # of jobs for different workloads
and categories.
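
IIRC the entries in gridmix2/gridmix_config.xml look something like the
following (property names from memory, so please double-check against the
file in your Hadoop distribution). Each workload has per-size job counts,
and setting a count to 0 should effectively disable that workload:

```xml
<!-- Hypothetical sketch of gridmix_config.xml entries; the exact property
     names may differ across Hadoop versions. A workload with all of its
     numOfJobs counts set to 0 is not run. -->
<property>
  <name>javaSort.smallJobs.numOfJobs</name>
  <value>0</value>
</property>
<property>
  <name>monsterQuery.smallJobs.numOfJobs</name>
  <value>5</value>
</property>
```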

HTH,
-Shrinivas

On Tue, Feb 22, 2011 at 10:46 AM, David Saile  wrote:

> Hello everybody,
>
> I am trying to benchmark a Hadoop cluster with regard to the throughput of
> pipelined MapReduce jobs.
> Looking for benchmarks, I found the "Gridmix" benchmark that ships with
> Hadoop. Its README file says that part of this benchmark is a
> "Three stage map/reduce job".
>
> As this seems to match my needs, I was wondering whether it is possible to
> configure "Gridmix" to run only this job (without the rest of the
> "Gridmix" benchmark).
> Or do I have to build my own benchmark? If so, which classes
> are used by this "Three stage map/reduce job"?
>
> Thanks for any help!
>
> David
>
>


Re: Benchmarking pipelined MapReduce jobs

2011-02-24 Thread David Saile
Thanks for your help! 

I had a look at the gridmix_config.xml file in the gridmix2 directory. However, 
I'm having difficulty mapping the descriptions of the simulated jobs from the 
README file
1) Three stage map/reduce job
2) Large sort of variable key/value size
3) Reference select
4) API text sort (java, streaming)
5) Jobs with combiner (word count jobs)

to the jobs names in gridmix_config.xml: 
-streamSort
-javaSort
-combiner
-monsterQuery
-webdataScan
-webdataSort
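
For reference, the workload entries in gridmix_config.xml look roughly like
this (copied from memory, so the exact property names may differ in other
Hadoop versions). As far as I can tell, a workload is disabled by setting
its job counts to 0:

```xml
<!-- Hypothetical sketch: disable one workload, keep another enabled.
     Verify the property names against your own gridmix_config.xml. -->
<property>
  <name>streamSort.smallJobs.numOfJobs</name>
  <value>0</value>
</property>
<property>
  <name>monsterQuery.smallJobs.numOfJobs</name>
  <value>5</value>
</property>
```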

I would really appreciate any help getting the configuration right! Which job 
do I have to enable to simulate a pipelined execution as described in "1) Three 
stage map/reduce job"?

Thanks
David 

On 23.02.2011, at 04:01, Shrinivas Joshi wrote:

> I am not sure about this but you might want to take a look at the GridMix 
> config file. FWIU, it lets you define the # of jobs for different workloads 
> and categories.
> 
> HTH,
> -Shrinivas
> 
> On Tue, Feb 22, 2011 at 10:46 AM, David Saile  wrote:
> Hello everybody,
> 
> I am trying to benchmark a Hadoop cluster with regard to the throughput of 
> pipelined MapReduce jobs.
> Looking for benchmarks, I found the "Gridmix" benchmark that ships with 
> Hadoop. Its README file says that part of this benchmark is a "Three 
> stage map/reduce job".
> 
> As this seems to match my needs, I was wondering whether it is possible to 
> configure "Gridmix" to run only this job (without the rest of the "Gridmix" 
> benchmark)?
> Or do I have to build my own benchmark? If so, which classes 
> are used by this "Three stage map/reduce job"?
> 
> Thanks for any help!
> 
> David
> 
>  
>