"I was always wondering after mapping, how each reduce task get its input. It 
is said in Google's paper and Hadoop's documentation that a sort is done to
aggregate map output records that share the same key. But there is no detailed
explanation of how it is implemented, and my intuition is that a global hash
might work better than sorting. So I really want to know the details and see
whether my intuition is right. If I can find this in the source code, where
should I start?"

I saw this question online and no one had replied to it. Does anyone know where
I should go to study the source code for the shuffle and sort?
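
To make the comparison in the quoted question concrete, here is a minimal
Python sketch of the two ways of grouping map output by key: sorting and then
scanning runs of equal keys, versus building a hash table. This is only an
illustration, not Hadoop's code; the function names and the sample data are
made up, and everything is assumed to fit in memory.

```python
from collections import defaultdict
from itertools import groupby
from operator import itemgetter


def group_by_sorting(pairs):
    """Sort (key, value) pairs by key, then collect each run of equal keys."""
    ordered = sorted(pairs, key=itemgetter(0))  # stable sort on the key only
    return [
        (key, [value for _, value in run])
        for key, run in groupby(ordered, key=itemgetter(0))
    ]


def group_by_hashing(pairs):
    """Collect values per key in a hash table; keys come out unordered."""
    table = defaultdict(list)
    for key, value in pairs:
        table[key].append(value)
    return dict(table)


# Hypothetical map output: (key, value) pairs in arbitrary order.
map_output = [("b", 1), ("a", 2), ("b", 3), ("c", 4), ("a", 5)]

sorted_groups = group_by_sorting(map_output)
hashed_groups = group_by_hashing(map_output)
```

Both functions produce the same groups. The practical difference is that the
sorted version hands keys to the reducer in order and can be extended to data
larger than memory by merging sorted runs from disk, while the hash version
needs the whole table in memory unless it adds its own spilling scheme.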

-sean
