[ https://issues.apache.org/jira/browse/HADOOP-939?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12467728 ]

Doug Cutting commented on HADOOP-939:
-------------------------------------

> 14 (more than half) are unavoidable. 

Make that 15: those associated with the input and output.  So the remaining 12
are associated with sorting and reducing.  Nine of those could be eliminated
when the input is largely pre-sorted and reduce tasks can be placed on the same
rack as the vast majority of their input, cutting the sort/reduce overhead from
12 out of 27 to 3 out of 18.
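The counts in the comment above work out as follows. Only the arithmetic is shown here; the unit being counted is whatever the earlier comment in the thread referred to.

```latex
\begin{aligned}
15 + 12 &= 27 && \text{(input/output + sort/reduce = total)} \\
12 - 9 &= 3 && \text{(sort/reduce remaining after eliminating 9)} \\
27 - 9 &= 18 && \text{(new total)}
\end{aligned}
```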

> No-sort optimization
> --------------------
>
>                 Key: HADOOP-939
>                 URL: https://issues.apache.org/jira/browse/HADOOP-939
>             Project: Hadoop
>          Issue Type: New Feature
>          Components: mapred
>         Environment: all
>            Reporter: Doug Judd
>
> There should be a way to tell the mapred framework that the output of the 
> map() phase will already be sorted.  The Reduce phase can just merge the 
> intermediate files together without sorting.
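The merge-only reduce input requested in the issue is essentially a k-way merge of already-sorted runs. Below is a minimal sketch, assuming each map task's output is an in-memory list of String keys already sorted in the same order; real intermediate files, key/value pairs, and Hadoop's own mapred classes are not modeled. The class and method names (NoSortMerge, mergeSorted) are hypothetical, and List.of requires Java 9 or later.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.PriorityQueue;

/**
 * Hypothetical sketch: merge already-sorted map outputs without re-sorting.
 * Each "run" stands in for one map task's sorted intermediate output.
 */
public class NoSortMerge {

    // Pairs the current head key of a run with the iterator it came from.
    private static final class Head implements Comparable<Head> {
        final String key;
        final Iterator<String> rest;

        Head(String key, Iterator<String> rest) {
            this.key = key;
            this.rest = rest;
        }

        @Override
        public int compareTo(Head other) {
            return key.compareTo(other.key);
        }
    }

    /** K-way merge: O(n log k) for n total keys spread across k sorted runs. */
    public static List<String> mergeSorted(List<List<String>> runs) {
        PriorityQueue<Head> heap = new PriorityQueue<>();
        // Seed the heap with the first key of every non-empty run.
        for (List<String> run : runs) {
            Iterator<String> it = run.iterator();
            if (it.hasNext()) {
                heap.add(new Head(it.next(), it));
            }
        }
        List<String> merged = new ArrayList<>();
        while (!heap.isEmpty()) {
            Head smallest = heap.poll();
            merged.add(smallest.key);
            // Refill from the same run so each run is consumed in order.
            if (smallest.rest.hasNext()) {
                heap.add(new Head(smallest.rest.next(), smallest.rest));
            }
        }
        return merged;
    }

    public static void main(String[] args) {
        List<List<String>> runs = List.of(
                List.of("apple", "kiwi", "plum"),
                List.of("banana", "cherry"),
                List.of("fig", "lemon", "mango"));
        // Prints [apple, banana, cherry, fig, kiwi, lemon, mango, plum]
        System.out.println(mergeSorted(runs));
    }
}
```

Because each run is only ever read front to back, the reduce side never has to hold or re-sort a whole run; only the k current heads live in the heap.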

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
