Yes; a combiner that emits a key that should go to a different
partition is incorrect. If this were legal, then the combiner output
would also need to be buffered, sorted, spilled, etc., effectively
requiring another map phase. The combiner's purpose is to decrease the
volume of data that needs to be shuffled or spilled (wordcount is the
perfect example). It should not be thought of as a stage of
computation.
-C
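
To make the constraint concrete, here is a minimal sketch in plain Python, not the Hadoop API. Everything in it is an assumption made for illustration: toy_partition is a stand-in for a real partitioner, spill only loosely mimics the sort-then-combine step described in this thread, and sum_combiner and rekeying_combiner are hypothetical names. It shows that a wordcount-style combiner leaves every key in its partition, while a combiner that rewrites keys leaves records in a partition they do not belong to.

```python
from collections import defaultdict

NUM_PARTITIONS = 4  # hypothetical number of reduce partitions


def toy_partition(key, num_partitions=NUM_PARTITIONS):
    # Toy stand-in for a hash partitioner: sum of the key's bytes, modulo
    # the partition count. Chosen only because it is easy to reason about.
    return sum(key.encode("utf-8")) % num_partitions


def spill(records, combiner=None):
    """Sort buffered (key, value) records by (partition, key), then run the
    optional combiner once per key group inside each partition. Combiner
    output stays in the partition of its input; it is not re-partitioned."""
    by_partition = defaultdict(list)
    ordered = sorted(records, key=lambda kv: (toy_partition(kv[0]), kv[0]))
    for key, value in ordered:
        by_partition[toy_partition(key)].append((key, value))

    spilled = {}
    for part, recs in by_partition.items():
        if combiner is None:
            spilled[part] = recs
            continue
        groups = defaultdict(list)
        for key, value in recs:
            groups[key].append(value)
        out = []
        for key in sorted(groups):
            out.extend(combiner(key, groups[key]))
        spilled[part] = out
    return spilled


def sum_combiner(key, values):
    # Wordcount-style combiner: same key out, fewer records.
    yield key, sum(values)


def rekeying_combiner(key, values):
    # Incorrect combiner: appending "!" (byte value 33) shifts the toy
    # partition by 1 when there are 4 partitions, so the emitted key
    # belongs to a different partition than the one it is written to.
    yield key + "!", sum(values)


def misplaced_keys(spilled):
    # (partition, key) pairs whose key does not hash to that partition.
    return [(part, key)
            for part, recs in spilled.items()
            for key, _ in recs
            if toy_partition(key) != part]


records = [("a", 1), ("b", 1), ("a", 1), ("c", 1), ("b", 1), ("a", 1)]
good = spill(records, sum_combiner)
bad = spill(records, rekeying_combiner)
print(misplaced_keys(good))
print(misplaced_keys(bad))
```

With the summing combiner the list of misplaced keys is empty. With the rekeying combiner every emitted record is misplaced, and fixing that would mean partitioning and sorting the combiner output again, which is the extra map phase mentioned above.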
On Jul 14, 2008, at 4:46 PM, Keliang Zhao wrote:
Hi there,
I read the code a bit, though I am not sure I understood it correctly. It
appears that when the mapper's memory buffer is full, its contents are
sorted by partition id and then by key, and spilled. If a combiner is
defined, it runs on each partition. However, it seems that the outputs
of a combiner are written to the same partition as its inputs, which
means that the keys emitted by a combiner have to belong to the same
partition as its inputs. Is this the case?
Best,
-Kevin