Elton,

On May 2, 2011, at 11:30 PM, elton sky wrote:

In the shuffle phase, the reduce copies output from the maps. In parallel, InMemoryMerger and OnDiskMerger merge the copied files if there are too many. But on the map side,
mergeParts() happens only after collect() has finished. Why don't we
run spill merging in parallel with collect()/sort&spill on the map?

Certainly feasible, please feel free to open a jira for the enhancement.

However, typically, the map's merge is much less intensive than the reduce's merge. As a result, this might just bloat the code for little gain, except in the most extreme cases.
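
For illustration only, here is a toy Java sketch of the idea being discussed: a background thread merges sorted spills while the collecting thread keeps producing them, and a final merge runs once collection ends. This is not Hadoop code. SpillMergeSketch, MERGE_FACTOR, mergeRuns and collectAndMerge are hypothetical names, and spills are modelled as in-memory sorted lists rather than files on disk.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.PriorityQueue;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

class SpillMergeSketch {
    // Hypothetical threshold: merge in the background once this many runs are pending.
    static final int MERGE_FACTOR = 3;
    // Sentinel telling the merger that no more spills will arrive (compared by identity).
    static final List<Integer> DONE = new ArrayList<>();

    // Plain k-way merge of already-sorted runs using a min-heap of {value, run, index}.
    static List<Integer> mergeRuns(List<List<Integer>> runs) {
        PriorityQueue<int[]> heap =
            new PriorityQueue<>((a, b) -> Integer.compare(a[0], b[0]));
        for (int r = 0; r < runs.size(); r++) {
            if (!runs.get(r).isEmpty()) {
                heap.add(new int[] {runs.get(r).get(0), r, 0});
            }
        }
        List<Integer> out = new ArrayList<>();
        while (!heap.isEmpty()) {
            int[] top = heap.poll();
            out.add(top[0]);
            List<Integer> run = runs.get(top[1]);
            int next = top[2] + 1;
            if (next < run.size()) {
                heap.add(new int[] {run.get(next), top[1], next});
            }
        }
        return out;
    }

    // Stand-in for collect()/sort&spill running concurrently with spill merging.
    static List<Integer> collectAndMerge(List<Integer> records, int spillSize) {
        BlockingQueue<List<Integer>> spills = new LinkedBlockingQueue<>();
        // Touched only by the merger thread until join() returns.
        List<List<Integer>> pending = new ArrayList<>();

        Thread merger = new Thread(() -> {
            try {
                while (true) {
                    List<Integer> spill = spills.take();
                    if (spill == DONE) {
                        break;
                    }
                    pending.add(spill);
                    if (pending.size() >= MERGE_FACTOR) {
                        // Intermediate merge while collection is still going on.
                        List<Integer> merged = mergeRuns(pending);
                        pending.clear();
                        pending.add(merged);
                    }
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
        merger.start();

        // "collect": buffer records, sort and spill whenever the buffer fills up.
        List<Integer> buffer = new ArrayList<>();
        for (int rec : records) {
            buffer.add(rec);
            if (buffer.size() == spillSize) {
                Collections.sort(buffer);
                spills.add(buffer);
                buffer = new ArrayList<>();
            }
        }
        if (!buffer.isEmpty()) {
            Collections.sort(buffer);
            spills.add(buffer);
        }
        spills.add(DONE);

        try {
            merger.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting for merger", e);
        }
        // Final merge of whatever runs are left, analogous to the merge after collect().
        return mergeRuns(pending);
    }
}
```

In this sketch the final merge only has to handle the few runs left over by the background thread. Whether that saves enough time on a real map task to justify the extra code is the open question raised above.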

Arun

