[ 
https://issues.apache.org/jira/browse/MAPREDUCE-7100?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16499689#comment-16499689
 ] 

Xiang Li edited comment on MAPREDUCE-7100 at 6/4/18 2:54 AM:
-------------------------------------------------------------

Container allocation is faster if I disable adding requests for 
rack-local containers. But the MR job summary shows:
{code}
Rack-local map tasks=xxx
{code}
which seems questionable to me: I did not request rack-local 
containers, so why are there rack-local map tasks?





> Provide options to skip adding resource request for data-local and rack-local 
> respectively
> ------------------------------------------------------------------------------------------
>
>                 Key: MAPREDUCE-7100
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-7100
>             Project: Hadoop Map/Reduce
>          Issue Type: Improvement
>          Components: applicationmaster
>            Reporter: Xiang Li
>            Priority: Minor
>
> We are using Hadoop 2.7.3, and the compute layer runs outside the 
> storage cluster (that is, node managers run on a different set of 
> nodes from data nodes). The problem we are seeing is that container 
> allocation is quite slow for some jobs.
> After some debugging, we found that in 
> org.apache.hadoop.mapreduce.v2.app.rm.RMContainerRequestor#addContainerReq() 
> (the following code is from trunk, not 2.7.3)
> {code}
>   protected void addContainerReq(ContainerRequest req) {
>     // Create resource requests
>     for (String host : req.hosts) {
>       // Data-local
>       if (!isNodeBlacklisted(host)) {
>         addResourceRequest(req.priority, host, req.capability,
>             null);
>       }
>     }
>     // Nothing Rack-local for now
>     for (String rack : req.racks) {
>       addResourceRequest(req.priority, rack, req.capability,
>           null);
>     }
>     // Off-switch
>     addResourceRequest(req.priority, ResourceRequest.ANY, req.capability,
>         req.nodeLabelExpression);
>   }
> {code}
> It seems that the data-local and rack-local requests could be skipped when 
> the compute layer is not co-located with the storage cluster.
> If I understand correctly, req.hosts and req.racks are provided by the 
> InputFormat. If the mapper reads from HDFS, req.hosts is the corresponding 
> data node and req.racks is its rack. The AM debug log looks like this:
> {code}
> org.apache.hadoop.mapreduce.v2.app.rm.RMContainerRequestor: 
> addResourceRequest: applicationId=1 priority=20 resourceName=<data-node> 
> numContainers=1 #asks=1
> org.apache.hadoop.mapreduce.v2.app.rm.RMContainerRequestor: 
> addResourceRequest: applicationId=1 priority=20 resourceName=<its rack> 
> numContainers=1 #asks=2
> org.apache.hadoop.mapreduce.v2.app.rm.RMContainerRequestor: 
> addResourceRequest: applicationId=1 priority=20 resourceName=* 
> numContainers=1 #asks=3
> {code}
> The resource request with resourceName=<data-node> will never be satisfied 
> by the RM (because the data node is not a node manager), so it would be 
> better if the AM did not request data-local or rack-local containers in the 
> first place when we already know that the compute layer runs outside the 
> storage cluster.
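
The proposal above can be illustrated with a minimal standalone sketch. It is not Hadoop code and not a patch: the class name, the `resourceNames` helper, and the `skipDataLocal` / `skipRackLocal` flags are hypothetical names made up for illustration, and the blacklist check from addContainerReq() is left out for brevity. It only models which resource names one container request would ask for, assuming the off-switch name is "*" as shown in the debug log.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class LocalitySkipSketch {
    // Stand-in for the off-switch resource name ("*" in the AM debug log).
    static final String ANY = "*";

    /**
     * Returns the resource names one container request would ask for.
     * skipDataLocal and skipRackLocal are hypothetical options for this
     * sketch; they are not existing Hadoop configuration settings.
     */
    static List<String> resourceNames(List<String> hosts, List<String> racks,
                                      boolean skipDataLocal, boolean skipRackLocal) {
        List<String> names = new ArrayList<>();
        if (!skipDataLocal) {
            names.addAll(hosts);   // data-local asks, one per host
        }
        if (!skipRackLocal) {
            names.addAll(racks);   // rack-local asks, one per rack
        }
        names.add(ANY);            // the off-switch ask is always added
        return names;
    }

    public static void main(String[] args) {
        List<String> hosts = Arrays.asList("data-node-1");
        List<String> racks = Arrays.asList("/rack-1");

        // Current behaviour: node, rack, and off-switch asks.
        System.out.println(resourceNames(hosts, racks, false, false));

        // Proposed behaviour when compute and storage are separate:
        // only the off-switch ask remains.
        System.out.println(resourceNames(hosts, racks, true, true));
    }
}
```

With both flags set, the three asks seen in the debug log would collapse to the single off-switch ask. The two flags are kept separate to mirror the "respectively" in the issue title.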


