Another important note: the combiner runs can stack.
Let's say Prashant is right that the default number of spills that
triggers the combiner is 3, and that we have a mapper that generates 9
spills. Those spills will produce 3 combiner runs, whose outputs meet
the threshold again, so we get *another* combiner run on the outputs of
the first round of combiners.
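To make the counting concrete, here is a toy Python simulation of the scenario above. It is a simplification of the model described in this email, not Hadoop's actual spill/merge code, and `combiner_rounds` is a hypothetical helper. It assumes each group of `threshold` pending outputs triggers one combiner run that yields one output, and that leftovers carry forward. (In Hadoop the threshold is, I believe, the `min.num.spills.for.combine` property, renamed `mapreduce.map.combine.minspills` in later releases.)

```python
def combiner_rounds(num_spills: int, threshold: int = 3) -> list[int]:
    """Return the number of combiner runs in each round under the
    simplified model above (hypothetical; not Hadoop's implementation)."""
    if threshold < 2:
        raise ValueError("threshold must be at least 2")
    rounds = []
    pending = num_spills
    while pending >= threshold:
        runs, leftover = divmod(pending, threshold)
        rounds.append(runs)
        # Each run emits one output; leftover spills carry forward uncombined.
        pending = runs + leftover
    return rounds


print(combiner_rounds(9))  # [3, 1]: 3 runs, then 1 more on their outputs
```

With 9 spills the first round makes 3 runs, and those 3 outputs hit the threshold again, giving a second round of 1 run.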
The upshot is that you *must* make the input and output key and value
types of a Combiner the same classes, since the output of one combiner
run may well be fed into the input of another.
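A small Python sketch of why this matters, using a hypothetical word-count `sum_combiner` (an illustration, not Hadoop's API). Because its output pairs have the same `(str, int)` type as its input pairs, the combiner can be fed its own output, and running it once or in two stacked rounds gives the same result. In Hadoop's Java API the equivalent is, as far as I know, a combiner written as a `Reducer` whose output key/value classes equal its input ones.

```python
from collections import defaultdict
from typing import Iterable

Pair = tuple[str, int]  # (key, value): same type on input and output


def sum_combiner(pairs: Iterable[Pair]) -> list[Pair]:
    """Hypothetical word-count combiner: sums values per key.
    Output type matches input type, so it can be re-applied safely."""
    totals: dict[str, int] = defaultdict(int)
    for key, value in pairs:
        totals[key] += value
    return sorted(totals.items())


spill_a = [("apple", 1), ("pear", 1), ("apple", 1)]
spill_b = [("apple", 1), ("pear", 1)]

# One combiner run over everything vs. two stacked rounds.
once = sum_combiner(spill_a + spill_b)
twice = sum_combiner(sum_combiner(spill_a) + sum_combiner(spill_b))
print(once)           # [('apple', 3), ('pear', 2)]
print(once == twice)  # True
```

A combiner that changed the value type (say, emitting formatted strings) could not be fed back into itself this way.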
hth
On 03/14/2012 06:32 PM, Prashant Kommireddi wrote:
It is a function of the number of spills on the map side, and I believe
the default is 3. So for every 3 times data is spilled, the combiner is
run. This number is configurable.
On Mar 14, 2012, at 3:26 PM, Gayatri Raorgayat...@gmail.com wrote:
Hi all,
I have a quick query on using a combiner in an MR job. Is it true that
the framework decides whether or not the combiner gets called?
Can anyone please give more information on how this is done?
Thanks,
Gayatri