Elton,
On May 2, 2011, at 11:30 PM, elton sky wrote:
In the shuffle phase, the reduce copies output from the maps. In parallel,
the InMemoryMerger and OnDiskMerger merge the copied files if there are
too many. But on the map, mergeParts() happens only after collect() has
finished. Why don't we run the spill merging in parallel with
collect()/sort&spill on the map?
Certainly feasible; please feel free to open a JIRA for the enhancement.
However, typically, the map's merge is much less intensive than the
reduce's merge. As a result, this might just bloat the code for little
gain, except in the most extreme cases.
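
For illustration only, here is a minimal sketch of what a final merge over sorted spills amounts to: a single k-way merge pass driven by a heap. This is not the MapTask code. The class SpillMergeSketch, its merge() method and the in-memory integer lists standing in for spill files are all hypothetical, and real spills are partitioned key/value segments on disk.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;
import java.util.PriorityQueue;

// Hypothetical sketch, not Hadoop code: merges already-sorted "spills"
// (plain integer lists here) in one k-way pass.
final class SpillMergeSketch {

    // One heap entry: the current head record of a spill plus the rest of it.
    private static final class Head {
        final int value;
        final Iterator<Integer> rest;

        Head(int value, Iterator<Integer> rest) {
            this.value = value;
            this.rest = rest;
        }
    }

    // Merges the sorted spills into one sorted list.
    static List<Integer> merge(List<List<Integer>> sortedSpills) {
        PriorityQueue<Head> heap =
            new PriorityQueue<>((a, b) -> Integer.compare(a.value, b.value));

        // Seed the heap with the first record of every non-empty spill.
        for (List<Integer> spill : sortedSpills) {
            Iterator<Integer> it = spill.iterator();
            if (it.hasNext()) {
                heap.add(new Head(it.next(), it));
            }
        }

        // Repeatedly emit the smallest head and refill from the same spill.
        List<Integer> out = new ArrayList<>();
        while (!heap.isEmpty()) {
            Head h = heap.poll();
            out.add(h.value);
            if (h.rest.hasNext()) {
                heap.add(new Head(h.rest.next(), h.rest));
            }
        }
        return out;
    }

    public static void main(String[] args) {
        List<List<Integer>> spills = Arrays.asList(
            Arrays.asList(1, 4, 7),
            Arrays.asList(2, 5, 8),
            Arrays.asList(3, 6, 9));
        System.out.println(merge(spills));
    }
}
```

In this sketch each spill is read once, front to back, and the heap never holds more than one entry per spill.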
Arun