"I was always wondering after mapping, how each reduce task get its input. It is said in google's paper and hadoop's documentation that a sort is done to aggregate the same key of the map output. But there is no detailed explanation of how it is implemented and my intuition is that perhaps a global hashing will work better than sorting. So I really want to know the details and see whether my intuition is right. If I can find out that in the source code, where should I start with?"
I saw this question online, and no one had replied to it. Does anyone know where I should look to study the source code for the shuffle and sort? -sean
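
P.S. To make the comparison in the quoted question concrete, here is a toy sketch of the two grouping strategies it contrasts: grouping map output by sorting on the key versus grouping it with a hash table. This is not Hadoop's code. The names `partition`, `group_by_sort`, and `group_by_hash` are made up for illustration, and everything runs in memory on a tiny list.

```python
from collections import defaultdict
from itertools import groupby
from operator import itemgetter


def partition(key, num_reducers):
    # Hypothetical partitioner: picks the reduce task for a key.
    # A byte sum is used instead of hash() so the result is the same on every run.
    return sum(key.encode("utf-8")) % num_reducers


def group_by_sort(pairs):
    # Sort-based grouping: sort by key, then collect runs of equal keys.
    # sorted() is stable, so values keep their original order within a key.
    ordered = sorted(pairs, key=itemgetter(0))
    return [(k, [v for _, v in run]) for k, run in groupby(ordered, key=itemgetter(0))]


def group_by_hash(pairs):
    # Hash-based grouping: one pass, no ordering of keys in the result.
    table = defaultdict(list)
    for k, v in pairs:
        table[k].append(v)
    return dict(table)


# Made-up map output for a word-count style job.
map_output = [("b", 1), ("a", 1), ("b", 1), ("c", 1), ("a", 1)]

# Step 1: route each (key, value) pair to a reduce task.
num_reducers = 2
buckets = defaultdict(list)
for k, v in map_output:
    buckets[partition(k, num_reducers)].append((k, v))

# Step 2: inside each reduce task, group the values by key either way.
for reducer_id in sorted(buckets):
    print(reducer_id, group_by_sort(buckets[reducer_id]), group_by_hash(buckets[reducer_id]))
```

Both functions produce the same groups. The visible difference in this sketch is that the sorted version also hands the keys to the reducer in order, while the hashed version does not.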