[jira] Commented: (HADOOP-1338) Improve the shuffle phase by using the "connection: keep-alive" and doing batch transfers of files

Devaraj Das (JIRA) Thu, 05 Feb 2009 05:52:28 -0800

    [ 
https://issues.apache.org/jira/browse/HADOOP-1338?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12670751#action_12670751
 ]


Devaraj Das commented on HADOOP-1338:
-------------------------------------

Continuing on ReduceTask.java:
1) Change the notifyAll back to notify (as it was earlier).
2) I think retryFetches can be removed; on an error, just leave 
knownOutputs unchanged.
3) I think knownOutputs can serve all the purposes for which the new maps 
are currently defined (provided knownOutputs is not updated on an 
error).
4) The CopyResult object need not take a MapOutputLocation. Instead it can 
just take a {mapID, host, size} combination. That will simplify the code 
for (3) above.
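
To illustrate points (2) to (4), here is a minimal sketch, not the code in 
the patch. The class name CopyResultSketch, the String types for the map ID 
and host, the "negative size means failure" convention, and the shape of 
knownOutputs (host -> pending map IDs) are all assumptions made for the 
example. The point is only that a CopyResult carrying plain values is 
enough to update knownOutputs on success and leave it alone on error.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch; names and types are illustrative, not ReduceTask.java.
class CopyResultSketch {

    /** Outcome of one fetch, identified by plain values instead of a MapOutputLocation. */
    static final class CopyResult {
        final String mapId;
        final String host;
        final long size; // bytes copied; a negative value marks a failed fetch (assumption)

        CopyResult(String mapId, String host, long size) {
            this.mapId = mapId;
            this.host = host;
            this.size = size;
        }

        boolean isSuccess() {
            return size >= 0;
        }
    }

    /** Stand-in for knownOutputs: host -> map IDs that still have to be fetched. */
    private final Map<String, List<String>> knownOutputs = new HashMap<>();

    void addKnownOutput(String host, String mapId) {
        knownOutputs.computeIfAbsent(host, h -> new ArrayList<>()).add(mapId);
    }

    /**
     * Removes the map output from knownOutputs only when the copy succeeded.
     * On an error nothing is changed, so the output is simply fetched again
     * later and no separate retry list is needed.
     */
    void handle(CopyResult result) {
        if (!result.isSuccess()) {
            return;
        }
        List<String> pending = knownOutputs.get(result.host);
        if (pending != null) {
            pending.remove(result.mapId);
            if (pending.isEmpty()) {
                knownOutputs.remove(result.host);
            }
        }
    }

    /** Total number of map outputs still waiting to be fetched. */
    int pendingCount() {
        int count = 0;
        for (List<String> ids : knownOutputs.values()) {
            count += ids.size();
        }
        return count;
    }
}
```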



> Improve the shuffle phase by using the "connection: keep-alive" and doing 
> batch transfers of files
> --------------------------------------------------------------------------------------------------
>
>                 Key: HADOOP-1338
>                 URL: https://issues.apache.org/jira/browse/HADOOP-1338
>             Project: Hadoop Core
>          Issue Type: Improvement
>          Components: mapred
>            Reporter: Devaraj Das
>            Assignee: Jothi Padmanabhan
>         Attachments: hadoop-1338-v1.patch
>
>
> We should do transfers of map outputs at the granularity of 
> *total-bytes-transferred* rather than the current way of transferring a 
> single file and then closing the connection to the server. A single 
> TaskTracker might have several map output files for a given reduce, and 
> we should transfer more than one of them (up to a certain total size) over 
> a single connection to the TaskTracker. Using HTTP/1.1's keep-alive 
> connection would help since it would keep the connection open for more than 
> one file transfer. We should limit the transfers to a certain size so that 
> we don't hold up a jetty thread indefinitely (and cause timeouts for other 
> clients).
> Overall, this should give us improved performance.
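
The size-capped batching described in the issue could look roughly like 
the sketch below. This is an illustration under assumptions, not the 
approach taken in the patch: ShuffleBatchSketch, MapOutput, nextBatch and 
the maxBytes parameter are hypothetical names, and the sketch assumes the 
size of each map output is known before the transfer starts. It only 
selects which files to request on one kept-alive connection; the HTTP 
transfer itself is left out.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of choosing a batch of map outputs for one connection.
class ShuffleBatchSketch {

    /** One map output file on a TaskTracker; the size is assumed to be known up front. */
    static final class MapOutput {
        final String mapId;
        final long size;

        MapOutput(String mapId, long size) {
            this.mapId = mapId;
            this.size = size;
        }
    }

    /**
     * Takes outputs in order until adding the next one would push the total
     * past maxBytes. The first output is always taken, even if it alone
     * exceeds the cap, so that an oversized file still gets transferred.
     */
    static List<MapOutput> nextBatch(List<MapOutput> pending, long maxBytes) {
        List<MapOutput> batch = new ArrayList<>();
        long total = 0;
        for (MapOutput output : pending) {
            if (!batch.isEmpty() && total + output.size > maxBytes) {
                break;
            }
            batch.add(output);
            total += output.size;
        }
        return batch;
    }
}
```

Capping each batch bounds how long one connection occupies a server 
thread, which is the concern raised about holding up a jetty thread.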

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
