subject:"\[jira\] \[Commented\] \(MAPREDUCE\-6803\) MR AppMaster should assign container that is closest to the data"

[jira] [Commented] (MAPREDUCE-6803) MR AppMaster should assign container that is closest to the data

2016-11-03 Thread Hadoop QA (JIRA)


[ 
https://issues.apache.org/jira/browse/MAPREDUCE-6803?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15632657#comment-15632657
 ] 

Hadoop QA commented on MAPREDUCE-6803:
--

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 0s 
{color} | {color:blue} Docker mode activated. {color} |
| {color:red}-1{color} | {color:red} patch {color} | {color:red} 0m 7s {color} 
| {color:red} MAPREDUCE-6803 does not apply to trunk. Rebase required? Wrong 
Branch? See https://wiki.apache.org/hadoop/HowToContribute for help. {color} |
\\
\\
|| Subsystem || Report/Notes ||
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12742477/YARN-3856.002.patch |
| JIRA Issue | MAPREDUCE-6803 |
| Console output | 
https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/6795/console |
| Powered by | Apache Yetus 0.3.0   http://yetus.apache.org |


This message was automatically generated.



> MR AppMaster should assign container that is closest to the data
> 
>
> Key: MAPREDUCE-6803
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-6803
> Project: Hadoop Map/Reduce
>  Issue Type: Improvement
>  Components: applicationmaster
> Environment: Hadoop cluster with multi-level network hierarchy
>Reporter: jaehoon ko
>  Labels: oct16-medium
> Attachments: YARN-3856.001.patch, YARN-3856.002.patch
>
>
> Currently, given a Container request for a host, ResourceManager allocates a 
> Container with following priorities (RMContainerAllocator.java):
>  - Requested host
>  - a host in the same rack as the requested host
>  - any host
> This can lead to a sub-optimal allocation if Hadoop cluster is deployed on 
> multi-level networked hosts (which is typical). For example, let's suppose a 
> network architecture with one core switches, two aggregate switches, four ToR 
> switches, and 8 hosts. Each switch has two downlinks. Rack IDs of hosts are 
> as follows:
> h1, h2: /c/a1/t1
> h3, h4: /c/a1/t2
> h5, h6: /c/a2/t3
> h7, h8: /c/a2/t4
> To allocate a container for data in h1, Hadoop first tries h1 itself, then 
> h2, then any of h3 ~ h8. Clearly, h3 or h4 are better than h5~h8 in terms of 
> network distance and bandwidth. However, current implementation choose one 
> from h3~h8 with equal probabilities.
> This limitation is more obvious when considering hadoop clusters deployed on 
> VM or containers. In this case, only the VMs or containers running in the 
> same physical host are considered rack local, and actual rack-local hosts are 
> chosen with same probabilities as far hosts.
> The root cause of this limitation is that RMContainerAllocator.java performs 
> exact matching on rack id to find a rack local host. Alternatively, we can 
> perform longest-prefix matching to find a closest host. Using the same 
> network architecture as above, with longest-prefix matching, hosts are 
> selected with the following priorities:
>  h1
>  h2
>  h3 or h4
>  h5 or h6 or h7 or h8



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: mapreduce-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: mapreduce-issues-h...@hadoop.apache.org

[jira] [Commented] (MAPREDUCE-6803) MR AppMaster should assign container that is closest to the data

2016-11-03 Thread Junping Du (JIRA)


[ 
https://issues.apache.org/jira/browse/MAPREDUCE-6803?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15632654#comment-15632654
 ] 

Junping Du commented on MAPREDUCE-6803:
---

Sorry for replying a bit late as busy in travel previously. From my quick 
check, most of code is still in YARN side. [~leftnoteasy], May be we should 
move back to YARN?

> MR AppMaster should assign container that is closest to the data
> 
>
> Key: MAPREDUCE-6803
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-6803
> Project: Hadoop Map/Reduce
>  Issue Type: Improvement
>  Components: applicationmaster
> Environment: Hadoop cluster with multi-level network hierarchy
>Reporter: jaehoon ko
>  Labels: oct16-medium
> Attachments: YARN-3856.001.patch, YARN-3856.002.patch
>
>
> Currently, given a Container request for a host, ResourceManager allocates a 
> Container with following priorities (RMContainerAllocator.java):
>  - Requested host
>  - a host in the same rack as the requested host
>  - any host
> This can lead to a sub-optimal allocation if Hadoop cluster is deployed on 
> multi-level networked hosts (which is typical). For example, let's suppose a 
> network architecture with one core switches, two aggregate switches, four ToR 
> switches, and 8 hosts. Each switch has two downlinks. Rack IDs of hosts are 
> as follows:
> h1, h2: /c/a1/t1
> h3, h4: /c/a1/t2
> h5, h6: /c/a2/t3
> h7, h8: /c/a2/t4
> To allocate a container for data in h1, Hadoop first tries h1 itself, then 
> h2, then any of h3 ~ h8. Clearly, h3 or h4 are better than h5~h8 in terms of 
> network distance and bandwidth. However, current implementation choose one 
> from h3~h8 with equal probabilities.
> This limitation is more obvious when considering hadoop clusters deployed on 
> VM or containers. In this case, only the VMs or containers running in the 
> same physical host are considered rack local, and actual rack-local hosts are 
> chosen with same probabilities as far hosts.
> The root cause of this limitation is that RMContainerAllocator.java performs 
> exact matching on rack id to find a rack local host. Alternatively, we can 
> perform longest-prefix matching to find a closest host. Using the same 
> network architecture as above, with longest-prefix matching, hosts are 
> selected with the following priorities:
>  h1
>  h2
>  h3 or h4
>  h5 or h6 or h7 or h8



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: mapreduce-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: mapreduce-issues-h...@hadoop.apache.org

[jira] [Commented] (MAPREDUCE-6803) MR AppMaster should assign container that is closest to the data

2016-10-27 Thread Wangda Tan (JIRA)


[ 
https://issues.apache.org/jira/browse/MAPREDUCE-6803?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15612792#comment-15612792
 ] 

Wangda Tan commented on MAPREDUCE-6803:
---

Moved to MR. [~djp] if you have chance, could you look at the patch?

Thanks

> MR AppMaster should assign container that is closest to the data
> 
>
> Key: MAPREDUCE-6803
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-6803
> Project: Hadoop Map/Reduce
>  Issue Type: Improvement
>  Components: applicationmaster
> Environment: Hadoop cluster with multi-level network hierarchy
>Reporter: jaehoon ko
>  Labels: oct16-medium
> Attachments: YARN-3856.001.patch, YARN-3856.002.patch
>
>
> Currently, given a Container request for a host, ResourceManager allocates a 
> Container with following priorities (RMContainerAllocator.java):
>  - Requested host
>  - a host in the same rack as the requested host
>  - any host
> This can lead to a sub-optimal allocation if Hadoop cluster is deployed on 
> multi-level networked hosts (which is typical). For example, let's suppose a 
> network architecture with one core switches, two aggregate switches, four ToR 
> switches, and 8 hosts. Each switch has two downlinks. Rack IDs of hosts are 
> as follows:
> h1, h2: /c/a1/t1
> h3, h4: /c/a1/t2
> h5, h6: /c/a2/t3
> h7, h8: /c/a2/t4
> To allocate a container for data in h1, Hadoop first tries h1 itself, then 
> h2, then any of h3 ~ h8. Clearly, h3 or h4 are better than h5~h8 in terms of 
> network distance and bandwidth. However, current implementation choose one 
> from h3~h8 with equal probabilities.
> This limitation is more obvious when considering hadoop clusters deployed on 
> VM or containers. In this case, only the VMs or containers running in the 
> same physical host are considered rack local, and actual rack-local hosts are 
> chosen with same probabilities as far hosts.
> The root cause of this limitation is that RMContainerAllocator.java performs 
> exact matching on rack id to find a rack local host. Alternatively, we can 
> perform longest-prefix matching to find a closest host. Using the same 
> network architecture as above, with longest-prefix matching, hosts are 
> selected with the following priorities:
>  h1
>  h2
>  h3 or h4
>  h5 or h6 or h7 or h8



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: mapreduce-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: mapreduce-issues-h...@hadoop.apache.org

[jira] [Commented] (MAPREDUCE-6803) MR AppMaster should assign container that is closest to the data

[jira] [Commented] (MAPREDUCE-6803) MR AppMaster should assign container that is closest to the data

[jira] [Commented] (MAPREDUCE-6803) MR AppMaster should assign container that is closest to the data

3 matches

Site Navigation

Mail list logo

Footer information