Hello,
Let's say we have an MR job that uses ChainMapper and ChainReducer, as in the 
following diagram:
Input->Map1->Map2->Reduce->Map3->Output

The input is split and distributed to the nodes of the cluster before being 
processed by the Map1 phase.
Likewise, before the Reduce phase the key/value pairs are distributed to the 
Reducers according to the partitions assigned by the Partitioner.
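
To make clear what I mean by "distribution" before the Reduce phase: as far as 
I understand, the default HashPartitioner chooses the Reducer for a key by 
masking the sign bit of the key's hash code and taking it modulo the number of 
reduce tasks. Below is a minimal plain-Java sketch of that rule, with no Hadoop 
dependency. The class name PartitionSketch and the method partitionFor are my 
own, and String.hashCode() stands in for the hash code of the real Writable 
key, so the actual partition numbers in a job would differ.

```java
public class PartitionSketch {

    // Mask the sign bit so the result is never negative, then take the
    // remainder by the number of reduce tasks. String.hashCode() is only a
    // stand-in for the hash code of a real Hadoop key type.
    public static int partitionFor(String key, int numReduceTasks) {
        return (key.hashCode() & Integer.MAX_VALUE) % numReduceTasks;
    }

    public static void main(String[] args) {
        int numReduceTasks = 3; // hypothetical number of Reducers
        String[] keys = {"apple", "banana", "cherry", "apple"};
        for (String key : keys) {
            // Equal keys always map to the same partition, which is why all
            // values for one key end up at the same Reducer.
            System.out.println(key + " -> reducer " + partitionFor(key, numReduceTasks));
        }
    }
}
```

My question is whether anything comparable to this step happens between the 
chained Map phases.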

I expected that the same thing (distribution of the keys) would happen before 
the Map2 and Map3 phases, but after reading the book "Pro Hadoop" I strongly 
doubt it.

I would like to ask whether the key/value pairs emitted by the Map1 phase (or 
those emitted by the Reduce phase) are distributed to the nodes of the cluster 
before being processed by the next Map phase,
or whether the output of the Map1 phase (or Reduce phase) is passed directly to 
the Map2 phase (or Map3 phase) on the same node, without any distribution.

Thank you in advance!
Panagiotis Antonopoulos
                                          
