[ https://issues.apache.org/jira/browse/HADOOP-4396?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12640433#action_12640433 ]
Jothi Padmanabhan commented on HADOOP-4396:
-------------------------------------------
OK, this might be a non-issue after all.
All my tests have been with mapred.reduce.parallel.copies=60 and
tasktracker.http.threads=100. This does not appear to be the ideal
configuration for the cluster. Runping let me know that he uses
mapred.reduce.parallel.copies=30 and tasktracker.http.threads=50. With this
configuration, sort took the same time as on 18, and gridmix completed in 40+
minutes, which is a reasonable time.
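
For reference, the two configurations compared above can be written out as
site-configuration property entries. The sketch below is only an illustration:
it assumes the settings are placed in a Hadoop site configuration file in the
usual <property> form, and the names render_properties, ORIGINAL and SUGGESTED
are hypothetical, not part of Hadoop.

```python
import xml.etree.ElementTree as ET

# The two configurations compared in this comment.
ORIGINAL = {"mapred.reduce.parallel.copies": 60, "tasktracker.http.threads": 100}
SUGGESTED = {"mapred.reduce.parallel.copies": 30, "tasktracker.http.threads": 50}


def render_properties(settings):
    """Return a <configuration> XML string with one <property> per setting.

    Hypothetical helper for illustration only.
    """
    root = ET.Element("configuration")
    for name, value in settings.items():
        prop = ET.SubElement(root, "property")
        ET.SubElement(prop, "name").text = name
        ET.SubElement(prop, "value").text = str(value)
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    print(render_properties(SUGGESTED))
```

The generated entries would be merged into the cluster's own site file rather
than replacing it.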
With mapred.reduce.parallel.copies=60 and tasktracker.http.threads=100, it is
clear that towards the end of the map phase the load on the disks of the
individual nodes is fairly high, because the reducers are pulling in data from
many more maps in parallel and possibly shuffling it to disk. This seems to be
causing the stragglers that we observed. However, slowing down the maps by
having them write in small chunks seems to mitigate this problem somehow, as
observed both with the LocalFileSystem and when breaking the writes into
chunks with the RawLocalFileSystem.
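
As a rough back-of-the-envelope illustration of why the higher settings put
more load on each node's disks, the sketch below estimates how many map-output
reads a node may be serving at once. It is a simplified model, not a
measurement: it assumes fetches are spread evenly across nodes, that each
served fetch is one concurrent read, and that each node runs 2 reduce tasks
(an assumed value, not taken from the cluster discussed here). The function
name is hypothetical.

```python
def concurrent_reads_per_node(parallel_copies, http_threads, reduces_per_node=2):
    """Estimate the upper bound on map-output reads served at once by a node.

    Assumptions (illustrative only):
      - every reducer keeps `parallel_copies` fetches in flight,
      - fetches are spread evenly, so a node receives on average as many
        requests as its own reducers issue,
      - a node cannot serve more requests at once than it has HTTP threads.
    """
    issued = reduces_per_node * parallel_copies
    return min(issued, http_threads)


if __name__ == "__main__":
    for copies, threads in [(60, 100), (30, 50)]:
        reads = concurrent_reads_per_node(copies, threads)
        print(f"parallel.copies={copies}, http.threads={threads}: "
              f"up to {reads} concurrent reads per node")
```

Under these assumptions, the 30/50 configuration halves the number of
concurrent reads each node may have to serve compared with 60/100.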
> sort on 400 nodes is now slower than in 18
> ------------------------------------------
>
> Key: HADOOP-4396
> URL: https://issues.apache.org/jira/browse/HADOOP-4396
> Project: Hadoop Core
> Issue Type: Bug
> Components: mapred
> Affects Versions: 0.19.0
> Reporter: Jothi Padmanabhan
> Assignee: Jothi Padmanabhan
> Priority: Blocker
> Fix For: 0.19.0
>
> Attachments: 4396-v3.patch
>
>
> Sort on 400 nodes on Hadoop release 18 takes about 29 minutes, but on the
> 19 branch it takes about 32 minutes. This behavior is consistent.