[jira] Commented: (HADOOP-1043) Optimize the shuffle phase (increase the parallelism)

Hadoop QA (JIRA) Tue, 27 Feb 2007 20:59:27 -0800

    [ 
https://issues.apache.org/jira/browse/HADOOP-1043?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12476514
 ]


Hadoop QA commented on HADOOP-1043:
-----------------------------------

+1, because 
http://issues.apache.org/jira/secure/attachment/12352089/1043.patch</a>) 
against trunk revision <a href= applied and successfully tested against trunk 
revision http://svn.apache.org/repos/asf/lucene/hadoop/trunk/512499. Results 
are at http://lucene.zones.apache.org:8080/hudson/job/Hadoop-Patch

> Optimize the shuffle phase (increase the parallelism)
> -----------------------------------------------------
>
>                 Key: HADOOP-1043
>                 URL: https://issues.apache.org/jira/browse/HADOOP-1043
>             Project: Hadoop
>          Issue Type: Improvement
>          Components: mapred
>            Reporter: Devaraj Das
>         Assigned To: Devaraj Das
>         Attachments: 1043.patch
>
>
> In the current shuffle code, only one map output location node is accessed 
> from any Reduce at any given point of time. For example, if a particular 
> node, say machine1.foo.com ran 300 maps, the reducer would fetch just one 
> output from there at a time. machine1.foo.com will be inserted into a Set 
> datastructure (uniqueHosts) and until it gets removed from there, no other 
> map output will be fetched from that machine. The fact that only one map 
> output is fetched at a time from any particular host seems fine, but the 
> logic for removing a node from uniqueHosts is such that there could be a lot 
> of delay before a node gets deleted from the Set datastructure (even after 
> the map output has been fetched from that node). This probably leads to 
> suboptimal performance since it reduces the parallelism in fetching.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

[jira] Commented: (HADOOP-1043) Optimize the shuffle phase (increase the parallelism)

Reply via email to