[ 
https://issues.apache.org/jira/browse/MAPREDUCE-6631?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Haibo Chen reassigned MAPREDUCE-6631:
-------------------------------------

    Assignee: Haibo Chen

> shuffle handler would benefit from per-local-dir threads
> --------------------------------------------------------
>
>                 Key: MAPREDUCE-6631
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-6631
>             Project: Hadoop Map/Reduce
>          Issue Type: Improvement
>          Components: nodemanager
>    Affects Versions: 2.7.2, 3.0.0-alpha1
>            Reporter: Nathan Roberts
>            Assignee: Haibo Chen
>
> [~jlowe] and I discussed this while investigating I/O starvation we have been 
> seeing on our clusters lately (possibly amplified by increased Tez 
> workloads). 
> If a particular disk is slow, it is very likely that all shuffle Netty 
> threads will end up blocked on the read side of sendfile(). (sendfile() is 
> asynchronous on the outbound socket side, but not on the read side.) This 
> causes the entire shuffle subsystem to slow down. 
> It seems like we could make the Netty threads more asynchronous by 
> introducing a small set of threads per local-dir that are responsible for the 
> actual sendfile() invocations.
> This would not only improve shuffles that span drives, but also improve 
> situations where there is a single large shuffle from a single local-dir. It 
> would allow other drives to continue serving shuffle requests, and it would 
> avoid a large number of readers (2x the number of cores by default) all 
> fighting for the same drive, which is unfair to everything else on the system.
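
The proposal in the description can be sketched roughly as follows. This is a minimal illustration only, not the ShuffleHandler code or any patch attached to this issue: the class name PerDirTransferPools, its methods, and the threadsPerDir parameter are all hypothetical. It assumes each local-dir gets its own small fixed thread pool, and that the blocking file read happens on that pool via FileChannel.transferTo (which can use sendfile() on Linux when the target is a socket channel), so a slow disk only stalls its own threads.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.nio.channels.Channels;
import java.nio.channels.FileChannel;
import java.nio.channels.WritableByteChannel;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;
import java.util.List;
import java.util.Map;
import java.util.concurrent.Callable;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Hypothetical sketch: one small thread pool per local-dir, so that a slow
// disk can only block the threads dedicated to that directory.
final class PerDirTransferPools implements AutoCloseable {
    private final Map<Path, ExecutorService> pools = new ConcurrentHashMap<>();

    PerDirTransferPools(List<Path> localDirs, int threadsPerDir) {
        for (Path dir : localDirs) {
            pools.put(key(dir), Executors.newFixedThreadPool(threadsPerDir));
        }
    }

    private static Path key(Path dir) {
        return dir.toAbsolutePath().normalize();
    }

    // Called from the I/O (e.g. Netty) thread: hands the blocking transfer to
    // the pool that owns localDir and returns immediately with a Future.
    Future<Long> submit(Path localDir, Path file, WritableByteChannel out) {
        ExecutorService pool = pools.get(key(localDir));
        if (pool == null) {
            throw new IllegalArgumentException("unknown local dir: " + localDir);
        }
        Callable<Long> task = () -> transfer(file, out);
        return pool.submit(task);
    }

    // Runs on a per-dir thread; the disk read blocks here, not on the caller.
    private static long transfer(Path file, WritableByteChannel out) throws IOException {
        try (FileChannel in = FileChannel.open(file, StandardOpenOption.READ)) {
            long pos = 0;
            long size = in.size();
            while (pos < size) {
                long n = in.transferTo(pos, size - pos, out);
                if (n <= 0) {
                    break; // target accepted nothing; stop rather than spin
                }
                pos += n;
            }
            return pos;
        }
    }

    @Override
    public void close() {
        pools.values().forEach(ExecutorService::shutdown);
    }

    public static void main(String[] args) throws Exception {
        // Demo with a temp directory standing in for a NodeManager local-dir.
        Path dir = Files.createTempDirectory("local-dir");
        Path file = Files.write(dir.resolve("map.out"), new byte[8192]);
        ByteArrayOutputStream sink = new ByteArrayOutputStream();
        try (PerDirTransferPools p = new PerDirTransferPools(List.of(dir), 2)) {
            long sent = p.submit(dir, file, Channels.newChannel(sink)).get();
            System.out.println("bytes transferred: " + sent);
        }
    }
}
```

With this shape, the number of concurrent readers per drive is bounded by threadsPerDir rather than by the size of the shared I/O thread pool, which is the fairness property the description asks for. A real implementation would also need to complete the response on the network thread once the Future finishes.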



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

