Yes. A combiner that emits a key belonging to a different partition is incorrect. If this were legal, then the combiner output would also need to be buffered, sorted, spilled, etc., effectively requiring another map phase. The combiner's purpose is to decrease the volume of data that needs to be shuffled or spilled (wordcount is the perfect example); it should not be thought of as a stage of computation.

-C
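The following is a toy, in-memory model of the constraint described above. It is not Hadoop's actual spill code or combiner API. It assumes a hypothetical `Combiner` interface and helper names (`misplacedOutputs`, `SUM`, `TOTAL`). Records are grouped by partition, sorted by key within each partition, and the combiner runs once per key group. Any output whose key would hash to a different partition is counted as misplaced. The partition formula mirrors the one used by Hadoop's `HashPartitioner`.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/** Toy model of a map-side spill plus combiner (hypothetical, not Hadoop's code). */
public class CombinerPartitionCheck {

    /** Same formula as Hadoop's HashPartitioner: non-negative hash mod partitions. */
    static int partitionFor(String key, int numPartitions) {
        return (key.hashCode() & Integer.MAX_VALUE) % numPartitions;
    }

    /** Hypothetical combiner signature for this sketch: (key, values) -> outputs. */
    interface Combiner {
        List<Map.Entry<String, Integer>> combine(String key, List<Integer> values);
    }

    /** Wordcount-style combiner: sums the counts and keeps the key unchanged. */
    static final Combiner SUM = (key, values) -> {
        int sum = 0;
        for (int v : values) sum += v;
        return List.of(Map.entry(key, sum));
    };

    /** Illegal combiner: re-keys every group to one global key. */
    static final Combiner TOTAL = (key, values) -> {
        int sum = 0;
        for (int v : values) sum += v;
        return List.of(Map.entry("*total*", sum));
    };

    /**
     * Spills the words (one count each), runs the combiner per partition, and
     * returns how many combiner outputs belong to a different partition.
     */
    static int misplacedOutputs(List<String> words, int numPartitions, Combiner combiner) {
        // partition id -> (key -> values); the inner TreeMap sorts keys within a partition
        Map<Integer, TreeMap<String, List<Integer>>> spill = new TreeMap<>();
        for (String w : words) {
            spill.computeIfAbsent(partitionFor(w, numPartitions), p -> new TreeMap<>())
                 .computeIfAbsent(w, k -> new ArrayList<>())
                 .add(1);
        }
        int misplaced = 0;
        for (Map.Entry<Integer, TreeMap<String, List<Integer>>> part : spill.entrySet()) {
            for (Map.Entry<String, List<Integer>> group : part.getValue().entrySet()) {
                for (Map.Entry<String, Integer> out : combiner.combine(group.getKey(), group.getValue())) {
                    // Output is written back into the partition being combined, so a
                    // key that hashes elsewhere would reach the wrong reducer.
                    if (partitionFor(out.getKey(), numPartitions) != part.getKey()) {
                        misplaced++;
                    }
                }
            }
        }
        return misplaced;
    }

    public static void main(String[] args) {
        List<String> words = List.of("the", "quick", "the", "fox", "the", "dog", "fox");
        System.out.println("SUM misplaced:   " + misplacedOutputs(words, 4, SUM));
        System.out.println("TOTAL misplaced: " + misplacedOutputs(words, 4, TOTAL));
    }
}
```

The summing combiner never moves a key, so it can only shrink the spill. The constant-key combiner places its output for "the" and "fox" under a key that can hash to only one partition. Because those two words fall in different partitions, at least one of the outputs lands in the wrong place.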

On Jul 14, 2008, at 4:46 PM, Keliang Zhao wrote:

Hi there,

I read the code a bit, though I am not sure I understood it correctly. It
appears to me that when the mapper's memory buffer is full, it spills and
the records get sorted by partition id and then by key. Then, if a combiner
is defined, it runs on each partition. However, it seems that the
outputs of a combiner are put in the same partition as its input, which means
that the keys emitted by a combiner have to be in the same partition as
the inputs to it. Is this the case?

Best,
-Kevin
