[jira] [Assigned] (MAPREDUCE-6631) shuffle handler would benefit from per-local-dir threads
     [ https://issues.apache.org/jira/browse/MAPREDUCE-6631?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Haibo Chen reassigned MAPREDUCE-6631:
-------------------------------------

    Assignee:     (was: Haibo Chen)

> shuffle handler would benefit from per-local-dir threads
> ---------------------------------------------------------
>
>                 Key: MAPREDUCE-6631
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-6631
>             Project: Hadoop Map/Reduce
>          Issue Type: Improvement
>          Components: nodemanager
>    Affects Versions: 2.7.2, 3.0.0-alpha1
>            Reporter: Nathan Roberts
>
> [~jlowe] and I discussed this while investigating I/O starvation we have been
> seeing on our clusters lately (possibly amplified by increased tez
> workloads).
> If a particular disk is being slow, it is very likely that all shuffle netty
> threads will be blocked on the read side of sendfile(). (sendfile() is
> asynchronous on the outbound socket side, but not on the read side.) This
> causes the entire shuffle subsystem to slow down.
> It seems like we could make the netty threads more asynchronous by
> introducing a small set of threads per local-dir that are responsible for the
> actual sendfile() invocations.
> This would not only improve shuffles that span drives, but also improve
> situations where there is a single large shuffle from a single local-dir. It
> would allow other drives to continue serving shuffle requests, AND avoid a
> large number of readers (2X number_of_cores by default) all fighting for the
> same drive, which becomes unfair to everything else on the system.

--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

---------------------------------------------------------------------
To unsubscribe, e-mail: mapreduce-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: mapreduce-issues-h...@hadoop.apache.org
[jira] [Assigned] (MAPREDUCE-6631) shuffle handler would benefit from per-local-dir threads
     [ https://issues.apache.org/jira/browse/MAPREDUCE-6631?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Haibo Chen reassigned MAPREDUCE-6631:
-------------------------------------

    Assignee: Haibo Chen

> shuffle handler would benefit from per-local-dir threads
> ---------------------------------------------------------
>
>                 Key: MAPREDUCE-6631
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-6631
>             Project: Hadoop Map/Reduce
>          Issue Type: Improvement
>          Components: nodemanager
>    Affects Versions: 2.7.2, 3.0.0-alpha1
>            Reporter: Nathan Roberts
>            Assignee: Haibo Chen
>
> [~jlowe] and I discussed this while investigating I/O starvation we have been
> seeing on our clusters lately (possibly amplified by increased tez
> workloads).
> If a particular disk is being slow, it is very likely that all shuffle netty
> threads will be blocked on the read side of sendfile(). (sendfile() is
> asynchronous on the outbound socket side, but not on the read side.) This
> causes the entire shuffle subsystem to slow down.
> It seems like we could make the netty threads more asynchronous by
> introducing a small set of threads per local-dir that are responsible for the
> actual sendfile() invocations.
> This would not only improve shuffles that span drives, but also improve
> situations where there is a single large shuffle from a single local-dir. It
> would allow other drives to continue serving shuffle requests, AND avoid a
> large number of readers (2X number_of_cores by default) all fighting for the
> same drive, which becomes unfair to everything else on the system.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
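
The proposal in the issue description (a small set of threads per local-dir
that perform the blocking reads) could look roughly like the sketch below.
This is only an illustration of the idea, not the ShuffleHandler code and not
a patch attached to the issue. The class name PerLocalDirDispatcher, its
methods, and the threadsPerDir parameter are all hypothetical. The blocking
call itself (for example a FileChannel.transferTo invocation) is assumed to
live inside the Callable handed to submit().

```java
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

/**
 * Hypothetical sketch: one small fixed-size thread pool per local-dir, so a
 * slow disk can only stall its own pool. Not part of Hadoop.
 */
public class PerLocalDirDispatcher implements AutoCloseable {
    // local-dir root -> pool that performs the blocking reads for that dir
    private final Map<Path, ExecutorService> pools = new LinkedHashMap<>();

    public PerLocalDirDispatcher(List<String> localDirs, int threadsPerDir) {
        for (String dir : localDirs) {
            Path root = Paths.get(dir).toAbsolutePath().normalize();
            pools.put(root, Executors.newFixedThreadPool(threadsPerDir));
        }
    }

    /** Returns the first configured local-dir that contains the given file. */
    public Path localDirFor(Path file) {
        Path p = file.toAbsolutePath().normalize();
        for (Path root : pools.keySet()) {
            // Path.startsWith compares whole name elements, so
            // /data/disk1 does not match /data/disk10/...
            if (p.startsWith(root)) {
                return root;
            }
        }
        throw new IllegalArgumentException(
            "not under a configured local-dir: " + file);
    }

    /**
     * Queues the blocking work on the pool owning the file's local-dir.
     * The caller (e.g. an event-loop thread) returns immediately.
     */
    public <T> Future<T> submit(Path file, Callable<T> blockingRead) {
        return pools.get(localDirFor(file)).submit(blockingRead);
    }

    /** Stops accepting new work; already queued tasks still run. */
    @Override
    public void close() {
        pools.values().forEach(ExecutorService::shutdown);
    }
}
```

With this shape, the number of concurrent readers per drive is capped at
threadsPerDir instead of the full netty worker count, and requests for files
on other drives keep being served while one drive is slow. How completion
would be signalled back to the netty channel is left out of the sketch.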