Hello, I am curious about what happens after the map function emits a key-value pair to the collector. I know that something called spilling, along with a sort and a merge, takes place, but I don't have a clear picture of it. My understanding is that a partitioner divides the map output key-value pairs into several "groups" (partitions), and each group is sent to a particular reducer. Within each group, the MapTask sorts the key-value pairs by key (why?) and materializes them on local disk. I don't know where the merge comes in or why a merge is needed at all.
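The map-side flow described above can be sketched as a toy, single-process Python simulation. It is not Hadoop's code. The record-count `buffer_limit` is a hypothetical stand-in for the byte-sized in-memory sort buffer, and spills are kept as in-memory lists instead of local files. Only the partition formula is taken from Hadoop's default `HashPartitioner` (`(key.hashCode() & Integer.MAX_VALUE) % numReduceTasks`), emulated here for string keys. The sketch suggests one answer to both "why" questions. When map output does not fit in memory, it is written out as several spills. Because each spill is already sorted, the spills can be merged into one sorted, partitioned output in a single streaming pass.

```python
import heapq
from collections import defaultdict


def java_string_hash(s):
    """Emulate Java's String.hashCode() (as an unsigned 32-bit value) for BMP text."""
    h = 0
    for ch in s:
        h = (31 * h + ord(ch)) & 0xFFFFFFFF
    return h


def partition(key, num_reducers):
    # Same formula as Hadoop's default HashPartitioner:
    # (key.hashCode() & Integer.MAX_VALUE) % numReduceTasks
    return (java_string_hash(key) & 0x7FFFFFFF) % num_reducers


def map_side(records, num_reducers, buffer_limit):
    """Simulate collect -> sort -> spill -> merge for ONE map task.

    records: iterable of (key, value) pairs emitted by map().
    buffer_limit: hypothetical number of records the in-memory buffer
        holds before a spill is triggered.
    Returns {partition: [(key, value), ...] sorted by key}, standing in
    for the single merged output file a map task leaves on local disk.
    """
    buffer, spills = [], []

    def spill():
        # Sort the buffered records by (partition, key) and keep them as one
        # sorted run; in Hadoop this run would be a spill file on local disk.
        buffer.sort(key=lambda r: (r[0], r[1]))
        spills.append(list(buffer))
        buffer.clear()

    for key, value in records:
        buffer.append((partition(key, num_reducers), key, value))
        if len(buffer) >= buffer_limit:
            spill()
    if buffer:
        spill()

    # Merge: several sorted spills become one sorted output. Because each
    # spill is already sorted, heapq.merge streams through them once
    # instead of re-sorting everything.
    merged = defaultdict(list)
    for part, key, value in heapq.merge(*spills, key=lambda r: (r[0], r[1])):
        merged[part].append((key, value))
    return dict(merged)
```

For example, `map_side([(w, 1) for w in "the cat sat on the mat".split()], num_reducers=2, buffer_limit=4)` produces two spills (four records, then two), which the final merge combines into one key-sorted list per partition.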
On the reduce side, there is also a sort and merge step. Why is that necessary? Thanks for helping me. -- Allen
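The reduce side can be sketched the same way, again as a hypothetical simulation rather than Hadoop's implementation. Each map task contributes one segment for this reducer, already sorted by key, so the reduce-side "sort" amounts to merging those sorted segments. Once merged, all values for a key sit next to each other, and a single linear pass can hand them to `reduce_fn`. The segment contents and the word-count style `reduce_fn` in the usage note are made up for illustration.

```python
import heapq
from itertools import groupby
from operator import itemgetter


def reduce_side(map_segments, reduce_fn):
    """Simulate the merge and grouping for ONE reducer.

    map_segments: one list per map task, each holding the (key, value)
        pairs for this reducer's partition, already sorted by key.
    reduce_fn: hypothetical user function, called once per distinct key
        with (key, list_of_values).
    """
    # Merge the sorted segments from all map tasks into one sorted stream.
    merged = heapq.merge(*map_segments, key=itemgetter(0))
    # The stream is sorted, so equal keys are adjacent and groupby can
    # collect each key's values in one pass.
    results = []
    for key, group in groupby(merged, key=itemgetter(0)):
        results.append(reduce_fn(key, [v for _, v in group]))
    return results
```

With `[("cat", 1), ("the", 1), ("the", 1)]` from one map task and `[("cat", 1), ("mat", 1), ("the", 1)]` from another, `reduce_side(segments, lambda k, vs: (k, sum(vs)))` returns `[("cat", 2), ("mat", 1), ("the", 3)]`.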