Re: Using a combiner

2012-03-15 Thread John Armstrong

Another important note: the combiner runs can stack.

Let's say Prashant is right that the default number of spills that triggers
the combiner is 3, and that we have a mapper that generates 9 spills.
These spills will generate 3 combiner runs, which meets the threshold
again, so we get *another* combiner run on the outputs of the first
round of combiners.
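
To make the stacking concrete, here is a toy model of the scenario above in plain Java (no Hadoop classes; `CombinerStacking`, `THRESHOLD` and the helper methods are hypothetical names). It follows the description literally: outputs are combined in groups of 3, and rounds continue while at least 3 outputs remain, so 9 spills take two rounds (9 → 3 → 1). It is a sketch of the idea, not of Hadoop's actual spill and merge scheduling.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Toy model of stacked combiner runs in plain Java (no Hadoop classes).
public class CombinerStacking {
    // Hypothetical threshold standing in for the "3 spills" figure in the thread.
    static final int THRESHOLD = 3;

    // Word-count style combiner: sums the counts for each key. Its output type
    // matches its input type, so a later round can consume it directly.
    static Map<String, Integer> combine(List<Map<String, Integer>> inputs) {
        Map<String, Integer> out = new TreeMap<>();
        for (Map<String, Integer> in : inputs) {
            for (Map.Entry<String, Integer> e : in.entrySet()) {
                out.merge(e.getKey(), e.getValue(), Integer::sum);
            }
        }
        return out;
    }

    // One round: combine the current outputs in groups of THRESHOLD.
    static List<Map<String, Integer>> combineRound(List<Map<String, Integer>> outputs) {
        List<Map<String, Integer>> next = new ArrayList<>();
        for (int i = 0; i < outputs.size(); i += THRESHOLD) {
            int end = Math.min(i + THRESHOLD, outputs.size());
            next.add(combine(outputs.subList(i, end)));
        }
        return next;
    }

    // Keeps running rounds while at least THRESHOLD outputs remain.
    static List<Map<String, Integer>> stack(List<Map<String, Integer>> spills) {
        List<Map<String, Integer>> current = spills;
        while (current.size() >= THRESHOLD) {
            current = combineRound(current);
        }
        return current;
    }

    // Number of rounds the loop in stack() performs for a given spill count.
    static int countRounds(int numSpills) {
        int rounds = 0;
        for (int n = numSpills; n >= THRESHOLD; n = (n + THRESHOLD - 1) / THRESHOLD) {
            rounds++;
        }
        return rounds;
    }

    // Builds n spills, each holding one occurrence of "a" and one of "b".
    static List<Map<String, Integer>> uniformSpills(int n) {
        List<Map<String, Integer>> spills = new ArrayList<>();
        for (int i = 0; i < n; i++) {
            Map<String, Integer> spill = new TreeMap<>();
            spill.put("a", 1);
            spill.put("b", 1);
            spills.add(spill);
        }
        return spills;
    }

    public static void main(String[] args) {
        List<Map<String, Integer>> result = stack(uniformSpills(9));
        // Expect 2 rounds (9 -> 3 -> 1) and a single output holding the full totals.
        System.out.println(countRounds(9) + " rounds -> " + result);
    }
}
```

Because the summing combiner's output looks exactly like a spill, the second round can consume it, and the totals come out the same as combining all 9 spills at once.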


The upshot is that a Combiner *must* emit the same key and value classes
it consumes, since the output of one combiner run may well be fed into
another.
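
One common way to see why the classes have to match is computing a mean. In the sketch below (plain Java; `Combiner`, `SumCount` and `MEAN_COMBINER` are hypothetical names), the value type carries a partial sum and count rather than a finished average, so the combiner's output is the same kind of thing as its input and can go through another round.

```java
import java.util.Arrays;
import java.util.List;

public class SameTypeCombiner {
    // A combiner whose input and output value types are both V, so whatever it
    // emits can be handed straight to another combine call.
    interface Combiner<K, V> {
        V combine(K key, List<V> values);
    }

    // Hypothetical value class for a mean: it carries a partial sum and count
    // rather than a finished average, so partial results can be merged again.
    static final class SumCount {
        final long sum;
        final long count;

        SumCount(long sum, long count) {
            this.sum = sum;
            this.count = count;
        }

        double mean() {
            return (double) sum / count;
        }
    }

    // Adds up partial sums and counts. Addition is associative and commutative,
    // so it does not matter how many times, or in what grouping, this runs.
    static final Combiner<String, SumCount> MEAN_COMBINER = (key, values) -> {
        long sum = 0;
        long count = 0;
        for (SumCount v : values) {
            sum += v.sum;
            count += v.count;
        }
        return new SumCount(sum, count);
    };

    public static void main(String[] args) {
        // Map outputs for one key: the values 1..5, each emitted as (value, 1).
        List<SumCount> firstSpill = Arrays.asList(
            new SumCount(1, 1), new SumCount(2, 1), new SumCount(3, 1));
        List<SumCount> secondSpill = Arrays.asList(
            new SumCount(4, 1), new SumCount(5, 1));

        // First round of combiners, then a second round over their outputs.
        SumCount a = MEAN_COMBINER.combine("k", firstSpill);
        SumCount b = MEAN_COMBINER.combine("k", secondSpill);
        SumCount stacked = MEAN_COMBINER.combine("k", Arrays.asList(a, b));

        System.out.println(stacked.mean()); // prints 3.0, the mean of 1..5
    }
}
```

If the combiner instead emitted a bare average, a second round would be averaging averages: (2.0 + 4.5) / 2 = 3.25 instead of 3.0 (and with Integer inputs and Double outputs it could not legally consume its own output at all). In Hadoop terms, this corresponds to a combiner declared as `Reducer<K, V, K, V>`, where K and V are the map output key and value classes.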


hth


On 03/14/2012 06:32 PM, Prashant Kommireddi wrote:

It is a function of the number of spills on the map side, and I believe
the default is 3. So the combiner is run once for every 3 times data is
spilled. This number is configurable.

Sent from my iPhone

On Mar 14, 2012, at 3:26 PM, Gayatri Rao rgayat...@gmail.com wrote:


Hi all,

I have a quick query on using a combiner in an MR job. Is it true that the
framework decides whether or not the combiner gets called?
Can anyone please give more information on how this is done?

Thanks,
Gayatri

Re: Using a combiner

2012-03-14 Thread Prashant Kommireddi
It is a function of the number of spills on the map side, and I believe
the default is 3. So the combiner is run once for every 3 times data is
spilled. This number is configurable.
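
As far as I can tell from Hadoop's MapTask source, the threshold mentioned here is the property `mapreduce.map.combine.minspills` (`min.num.spills.for.combine` in older releases), default 3. The combiner is applied as each spill is sorted and written; the threshold decides whether it is applied once more while the spill files are merged into the final map output. The sketch below mirrors that merge-time check in plain Java, with `java.util.Properties` standing in for Hadoop's `Configuration`; `MinSpillsRule` and its methods are hypothetical names.

```java
import java.util.Properties;

public class MinSpillsRule {
    // Assumed property name (read by Hadoop 2.x MapTask; 1.x read "min.num.spills.for.combine").
    static final String MIN_SPILLS_KEY = "mapreduce.map.combine.minspills";
    static final int DEFAULT_MIN_SPILLS = 3;

    // Reads the threshold from a Properties object standing in for Hadoop's Configuration.
    static int minSpills(Properties conf) {
        return Integer.parseInt(
            conf.getProperty(MIN_SPILLS_KEY, String.valueOf(DEFAULT_MIN_SPILLS)));
    }

    // Mirrors the merge-time check: run the combiner again over the merged spills
    // only if a combiner is configured and there were at least minSpills spills.
    static boolean combineDuringMerge(boolean hasCombiner, int numSpills, int minSpills) {
        return hasCombiner && numSpills >= minSpills;
    }

    public static void main(String[] args) {
        Properties conf = new Properties();
        int threshold = minSpills(conf); // 3 when the property is unset
        for (int spills = 1; spills <= 4; spills++) {
            System.out.println(spills + " spill(s): combine during merge = "
                + combineDuringMerge(true, spills, threshold));
        }

        // Raising the threshold makes the merge-time combiner run less often.
        conf.setProperty(MIN_SPILLS_KEY, "5");
        System.out.println("threshold after override = " + minSpills(conf));
    }
}
```

In a real job the equivalent override would be something like `job.getConfiguration().setInt("mapreduce.map.combine.minspills", 5)`, assuming the property name above matches your Hadoop version.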

Sent from my iPhone

On Mar 14, 2012, at 3:26 PM, Gayatri Rao rgayat...@gmail.com wrote:

 Hi all,

 I have a quick query on using a combiner in an MR job. Is it true that the
 framework decides whether or not the combiner gets called?
 Can anyone please give more information on how this is done?

 Thanks,
 Gayatri