[ 
https://issues.apache.org/jira/browse/HADOOP-939?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12467693
 ] 

Owen O'Malley commented on HADOOP-939:
--------------------------------------

I think the complexity of the general case makes this problematic. I wouldn't 
want to see a config option to do this, because it will be easy for users to 
get it wrong. 

There are some more specific cases that might be interesting:
  1. After the spill of the map outputs, it would make sense to continue 
appending to the spill as long as the outputs from the map are sorted. Note 
that the partition is the primary key for that sort.
  2. The reduces should be scheduled near the map output. That would help in 
the case where each reduce is getting inputs from a small number of maps.
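
As a rough illustration of case 1, here is a minimal Python sketch, not Hadoop code. The function name split_into_sorted_runs and the (partition, key, value) record layout are assumptions made for this example. It keeps appending to the current spill while records arrive in (partition, key) order and starts a new spill only when that order breaks.

```python
def split_into_sorted_runs(records):
    """Group (partition, key, value) records into runs sorted by (partition, key).

    Hypothetical helper for illustration only; it is not part of Hadoop.
    """
    runs = []
    last = None
    for partition, key, value in records:
        sort_key = (partition, key)  # the partition is the primary sort key
        if last is None or sort_key < last:
            runs.append([])  # order broke (or first record): start a new spill
        runs[-1].append((partition, key, value))
        last = sort_key
    return runs
```

If the map output is fully sorted, this yields a single run, so no separate sort of the spill would be needed.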

Note that even if the map outputs are sorted, the reduce still needs to do a 
merge sort, because the map outputs are fetched in a fairly random order.
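
To make the merge step concrete, here is a minimal Python sketch, again not Hadoop code. The name merge_map_outputs and the (key, value) pair layout are assumptions for illustration. Each fetched map output is assumed to be already sorted by key, and the standard library's heapq.merge combines them without re-sorting any individual output.

```python
import heapq


def merge_map_outputs(fetched_outputs):
    """Merge already-sorted lists of (key, value) pairs into one sorted list.

    Hypothetical helper for illustration; fetched_outputs may be in any
    order, but each individual list must itself be sorted by key.
    """
    return list(heapq.merge(*fetched_outputs, key=lambda kv: kv[0]))
```

The order in which the outputs were fetched does not affect the key order of the result, which is why a merge is sufficient here and a full sort is not.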

> No-sort optimization
> --------------------
>
>                 Key: HADOOP-939
>                 URL: https://issues.apache.org/jira/browse/HADOOP-939
>             Project: Hadoop
>          Issue Type: New Feature
>          Components: mapred
>         Environment: all
>            Reporter: Doug Judd
>
> There should be a way to tell the mapred framework that the output of the 
> map() phase will already be sorted.  The Reduce phase can just merge the 
> intermediate files together without sorting.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
