[ https://issues.apache.org/jira/browse/HADOOP-1338?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12661538#action_12661538 ]
Jothi Padmanabhan commented on HADOOP-1338:
-------------------------------------------
I tested a patch where
* Each copier thread is assigned one host
* Each copier thread would pull 'n' map outputs from its assigned host (until a
specific size threshold had been pulled) before moving on to the next host
* Each fetch would be one map request/response (as it exists in the trunk)
With the above patch, I did not observe any improvement at all (across a
variety of map sizes with the loadgen program). The underlying assumption was
that, since each thread holds on to a single host, HTTP keep-alive (handled
transparently by the JVM's java.net layer) would kick in and turn most of the
per-map connection setups into no-ops, as these are connections to the same
host/port. However, keep-alive does not appear to be kicking in, and we see
similar shuffle times with and without this patch.
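For reference, the behavior we were counting on is the JVM's transparent
keep-alive pooling: a socket goes back into the per-host:port pool (and gets
reused by the next request) only if the previous response body was fully
drained and the stream closed. A minimal sketch of a copier loop written
against that behavior, with an illustrative mapOutput URL rather than the
exact servlet parameters:

{code:java}
import java.io.InputStream;
import java.net.HttpURLConnection;
import java.net.URL;

// Sketch: fetch several map outputs from one host, relying on the JVM's
// keep-alive cache (enabled by default via the http.keepAlive property).
// The underlying socket is reused only when each response body is fully
// consumed and the stream is closed; otherwise every fetch opens a fresh
// TCP connection. The URL parameters below are illustrative.
public class KeepAliveFetchSketch {
  static void fetchFromHost(String host, int port, String[] mapIds,
      int reduce) throws Exception {
    byte[] buf = new byte[64 * 1024];
    for (String mapId : mapIds) {
      URL url = new URL("http://" + host + ":" + port
          + "/mapOutput?map=" + mapId + "&reduce=" + reduce);
      HttpURLConnection conn = (HttpURLConnection) url.openConnection();
      InputStream in = conn.getInputStream();
      try {
        // Drain the body completely so the socket can go back to the pool.
        while (in.read(buf) >= 0) {
          // hand the bytes to the in-memory/on-disk shuffle buffer here
        }
      } finally {
        in.close(); // returns the connection to the keep-alive cache
      }
    }
  }
}
{code}

If a fetch aborts mid-stream, the socket is discarded rather than pooled, and
the next request pays the full connection-setup cost again.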
We did another test where the code was hacked so that the copier fetches a
configurable number of maps at a time and the TT replies to this request by
clubbing the requested map outputs together into a single response. The
received map outputs were simply discarded at the reducer (written neither to
disk nor to memory) so that we measured only the network performance. The
results:
||Number of Maps Per Fetch||Average Shuffle Time (mm:ss)||Worst-case Shuffle Time (mm:ss)||
|1|1:27|4:20|
|2|1:11|2:11|
|4|0:47|1:11|
|8|0:29|0:41|
From this it does appear that we would benefit from modifying the fetch
protocol to fetch several maps in one shot, using the same connection.
Thoughts?
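To make the proposed protocol change concrete, here is a rough client-side
sketch of such a batched fetch. The maps= query parameter and the
[mapId][length][payload] framing are assumptions for illustration, not the
exact protocol of the hack above:

{code:java}
import java.io.DataInputStream;
import java.io.EOFException;
import java.net.HttpURLConnection;
import java.net.URL;

// Hypothetical batched fetch: one request names several maps, and the TT
// streams them back as [UTF mapId][long byteCount][payload] frames over a
// single connection. Parameter name and framing are assumptions.
public class BatchFetchSketch {
  static void fetchBatch(String host, int port, String[] mapIds, int reduce)
      throws Exception {
    StringBuilder maps = new StringBuilder();
    for (int i = 0; i < mapIds.length; i++) {
      if (i > 0) maps.append(',');
      maps.append(mapIds[i]);
    }
    URL url = new URL("http://" + host + ":" + port
        + "/mapOutput?maps=" + maps + "&reduce=" + reduce);
    HttpURLConnection conn = (HttpURLConnection) url.openConnection();
    DataInputStream in = new DataInputStream(conn.getInputStream());
    try {
      byte[] buf = new byte[64 * 1024];
      for (int i = 0; i < mapIds.length; i++) {
        String mapId = in.readUTF();    // which map this frame belongs to
        long remaining = in.readLong(); // payload size for this map output
        while (remaining > 0) {
          int n = in.read(buf, 0, (int) Math.min(buf.length, remaining));
          if (n < 0) {
            throw new EOFException("truncated output for map " + mapId);
          }
          remaining -= n;
          // pass the bytes to the merge phase here
        }
      }
    } finally {
      in.close();
    }
  }
}
{code}

The server side would cap the total bytes per response, as the issue
description below suggests, so that a single jetty thread is not held
indefinitely.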
> Improve the shuffle phase by using the "connection: keep-alive" and doing
> batch transfers of files
> --------------------------------------------------------------------------------------------------
>
> Key: HADOOP-1338
> URL: https://issues.apache.org/jira/browse/HADOOP-1338
> Project: Hadoop Core
> Issue Type: Improvement
> Components: mapred
> Reporter: Devaraj Das
> Assignee: Jothi Padmanabhan
>
> We should do transfers of map outputs at the granularity of
> *total-bytes-transferred* rather than the current way of transferring a
> single file and then closing the connection to the server. A single
> TaskTracker might have a couple of map output files for a given reduce, and
> we should transfer multiple of them (up to a certain total size) in a single
> connection to the TaskTracker. Using HTTP/1.1's keep-alive connection would
> help since it would keep the connection open for more than one file transfer.
> We should limit the transfers to a certain size so that we don't hold up a
> jetty thread indefinitely (and cause timeouts for other clients).
> Overall, this should give us improved performance.