[jira] [Commented] (MAPREDUCE-6803) MR AppMaster should assign container that is closest to the data
[ https://issues.apache.org/jira/browse/MAPREDUCE-6803?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15632657#comment-15632657 ]

Hadoop QA commented on MAPREDUCE-6803:
--------------------------------------

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 0s {color} | {color:blue} Docker mode activated. {color} |
| {color:red}-1{color} | {color:red} patch {color} | {color:red} 0m 7s {color} | {color:red} MAPREDUCE-6803 does not apply to trunk. Rebase required? Wrong Branch? See https://wiki.apache.org/hadoop/HowToContribute for help. {color} |
\\
\\
|| Subsystem || Report/Notes ||
| JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12742477/YARN-3856.002.patch |
| JIRA Issue | MAPREDUCE-6803 |
| Console output | https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/6795/console |
| Powered by | Apache Yetus 0.3.0 http://yetus.apache.org |

This message was automatically generated.

> MR AppMaster should assign container that is closest to the data
> ----------------------------------------------------------------
>
> Key: MAPREDUCE-6803
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-6803
> Project: Hadoop Map/Reduce
> Issue Type: Improvement
> Components: applicationmaster
> Environment: Hadoop cluster with multi-level network hierarchy
> Reporter: jaehoon ko
> Labels: oct16-medium
> Attachments: YARN-3856.001.patch, YARN-3856.002.patch
>
>
> Currently, given a Container request for a host, ResourceManager allocates a
> Container with the following priorities (RMContainerAllocator.java):
> - the requested host
> - a host in the same rack as the requested host
> - any host
> This can lead to a sub-optimal allocation if a Hadoop cluster is deployed on
> multi-level networked hosts (which is typical). For example, suppose a
> network architecture with one core switch, two aggregate switches, four ToR
> switches, and 8 hosts. Each switch has two downlinks. Rack IDs of hosts are
> as follows:
> h1, h2: /c/a1/t1
> h3, h4: /c/a1/t2
> h5, h6: /c/a2/t3
> h7, h8: /c/a2/t4
> To allocate a container for data in h1, Hadoop first tries h1 itself, then
> h2, then any of h3 ~ h8. Clearly, h3 and h4 are better than h5 ~ h8 in terms of
> network distance and bandwidth. However, the current implementation chooses one
> from h3 ~ h8 with equal probability.
> This limitation is more obvious for Hadoop clusters deployed on
> VMs or containers. In this case, only the VMs or containers running on the
> same physical host are considered rack local, and hosts that are actually in
> the same rack are chosen with the same probability as far hosts.
> The root cause of this limitation is that RMContainerAllocator.java performs
> exact matching on the rack ID to find a rack-local host. Alternatively, we can
> perform longest-prefix matching to find the closest host. Using the same
> network architecture as above, with longest-prefix matching, hosts are
> selected with the following priorities:
> h1
> h2
> h3 or h4
> h5 or h6 or h7 or h8

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

---------------------------------------------------------------------
To unsubscribe, e-mail: mapreduce-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: mapreduce-issues-h...@hadoop.apache.org
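
The longest-prefix matching proposed in the issue description above can be illustrated with a small standalone sketch. This is not the code from the attached patches and does not use any Hadoop API: the class name LongestPrefixLocality and the methods sharedLevels and rankHosts are hypothetical, and the sketch assumes every rack ID is an absolute path beginning with "/".

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.Comparator;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/** Illustrative only: groups hosts into locality tiers by shared rack-ID prefix. */
class LongestPrefixLocality {

    /** Counts the leading path components that two rack IDs have in common. */
    static int sharedLevels(String rackA, String rackB) {
        String[] a = rackA.split("/");
        String[] b = rackB.split("/");
        int n = Math.min(a.length, b.length);
        int shared = 0;
        // "/c/a1/t1".split("/") yields ["", "c", "a1", "t1"], so start at index 1.
        for (int i = 1; i < n && a[i].equals(b[i]); i++) {
            shared++;
        }
        return shared;
    }

    /**
     * Returns tiers of hosts, closest first: the requested host itself,
     * then the remaining hosts grouped by descending shared prefix length.
     */
    static List<List<String>> rankHosts(String requested, Map<String, String> hostToRack) {
        String targetRack = hostToRack.get(requested);
        if (targetRack == null) {
            throw new IllegalArgumentException("Unknown host: " + requested);
        }
        // Key is the shared prefix length; reverse order puts the longest match first.
        TreeMap<Integer, List<String>> tiers = new TreeMap<>(Comparator.reverseOrder());
        for (Map.Entry<String, String> e : hostToRack.entrySet()) {
            if (e.getKey().equals(requested)) {
                continue;
            }
            int shared = sharedLevels(targetRack, e.getValue());
            tiers.computeIfAbsent(shared, k -> new ArrayList<>()).add(e.getKey());
        }
        List<List<String>> result = new ArrayList<>();
        result.add(Collections.singletonList(requested));
        result.addAll(tiers.values());
        return result;
    }

    public static void main(String[] args) {
        // Topology from the issue description: two hosts per ToR switch.
        Map<String, String> topology = new LinkedHashMap<>();
        topology.put("h1", "/c/a1/t1");
        topology.put("h2", "/c/a1/t1");
        topology.put("h3", "/c/a1/t2");
        topology.put("h4", "/c/a1/t2");
        topology.put("h5", "/c/a2/t3");
        topology.put("h6", "/c/a2/t3");
        topology.put("h7", "/c/a2/t4");
        topology.put("h8", "/c/a2/t4");
        // Prints [[h1], [h2], [h3, h4], [h5, h6, h7, h8]]
        System.out.println(rankHosts("h1", topology));
    }
}
```

With exact rack matching, h3 through h8 would all fall into a single "any host" tier; counting shared path components is what separates h3 and h4 (same aggregate switch) from h5 through h8.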
[jira] [Commented] (MAPREDUCE-6803) MR AppMaster should assign container that is closest to the data
[ https://issues.apache.org/jira/browse/MAPREDUCE-6803?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15632654#comment-15632654 ]

Junping Du commented on MAPREDUCE-6803:
---------------------------------------

Sorry for the late reply; I was traveling. From a quick check, most of the code is still on the YARN side. [~leftnoteasy], maybe we should move this back to YARN?
[jira] [Commented] (MAPREDUCE-6803) MR AppMaster should assign container that is closest to the data
[ https://issues.apache.org/jira/browse/MAPREDUCE-6803?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15612792#comment-15612792 ]

Wangda Tan commented on MAPREDUCE-6803:
---------------------------------------

Moved to MR. [~djp], if you have a chance, could you look at the patch? Thanks.