[ 
https://issues.apache.org/jira/browse/SAMZA-886?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15220376#comment-15220376
 ] 

Jake Maes commented on SAMZA-886:
---------------------------------

One follow-up comment for posterity.

With this issue, host affinity seems to work in a degraded mode and performs 
better for jobs that have more containers. The reason is that when these rack 
resolver issues occur, none of the preferred-host requests are honored and a 
set of random containers is returned. However, the HostAwareContainerAllocator 
doesn't know this. If any of the random containers happens to match a preferred 
host, it uses that container accordingly. And the more containers requested 
(for a fixed-size cluster), the higher the probability that random containers 
will match a preferred host. 
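
To make the last point concrete, here is a rough back-of-the-envelope sketch. 
It is a toy model, not a measurement: it assumes every returned container 
lands on a host picked uniformly and independently at random, and that each 
request prefers a distinct host. The function name and parameters are made up 
for illustration and are not part of Samza or YARN.

```python
def expected_match_fraction(cluster_size: int, containers: int) -> float:
    """Expected fraction of preferred hosts that receive at least one container.

    Assumes each of `containers` allocations lands on a host chosen uniformly
    and independently at random from `cluster_size` hosts, and that every
    request prefers a distinct host. Both are simplifying assumptions.
    """
    if cluster_size <= 0 or containers < 0:
        raise ValueError("cluster_size must be positive and containers non-negative")
    miss_one = 1.0 - 1.0 / cluster_size      # one allocation misses a given host
    return 1.0 - miss_one ** containers      # at least one allocation hits it


if __name__ == "__main__":
    # 36 hosts, as in the cluster described in the issue below.
    for n in (10, 30):
        print(n, round(expected_match_fraction(36, n), 3))
```

Under these assumptions the fraction is about 0.25 for 10 containers and about 
0.57 for 30 containers on 36 hosts, so it grows with the container count. The 
model ignores cluster utilization and scheduler behavior, so it should only be 
read as an illustration of the trend. 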

> Investigate 'relax locality' to improve Host Affinity 
> ------------------------------------------------------
>
>                 Key: SAMZA-886
>                 URL: https://issues.apache.org/jira/browse/SAMZA-886
>             Project: Samza
>          Issue Type: Bug
>            Reporter: Jagadish
>            Assignee: Jake Maes
>         Attachments: RelaxedLocality experiments.pdf
>
>
> I ran several tests experimenting with Samza on a 36-node cluster. I have 
> the following observations:
> 1. On a cluster with about 50% utilization, the percentage of requests that 
> are mapped to preferred hosts seems to depend on yarn.container.count. The % 
> is higher when yarn.container.count is comparable to the size of the cluster. 
> For example, I get about 50% of requests matched when yarn.container.count 
> is 30, and only 27% when yarn.container.count is 10 (on a 36-node cluster).
> One reason is that when spawning a large # of containers initially, many 
> requests are made in bulk in quick succession, so there is a good chance 
> that any random host in the cluster will match a preferred request. However, 
> when respawning a particular container after a failure, there's only one 
> request for the failed container, and it has a lower chance of a match.
> The results are averaged across 20 runs in each scenario.
> 2. On a cluster with about zero utilization, 100% of requests are matched to 
> preferred hosts irrespective of yarn.container.count.
> This ticket is to explore alternatives to see if they will improve the % of 
> matched hosts. 
> I believe these ideas are worth trying:
> 1. YARN supports the idea of a 'relaxed locality' flag that can be specified 
> with the request. We could set 'relaxed locality' to false. (This will ensure 
> that we get the container on the exact host we ask for.) If we don't get 
> such a container within a timeout, we may re-issue the same request with 
> 'relaxed locality' set to true (as we currently do).
> 2. Re-issue the same preferred host request if the hosts returned don't 
> match the request.
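
Idea 1 above (strict first, then relaxed after a timeout) could be sketched 
roughly as the decision function below. This is only an illustration of the 
control flow under stated assumptions: HostRequest, next_action and the action 
names are hypothetical, nothing here calls the YARN client API, and it is not 
how Samza implements allocation.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class HostRequest:
    """Hypothetical record of one outstanding preferred-host request."""
    container_id: int
    preferred_host: str
    issued_at: float              # seconds, from any monotonic clock
    relax_locality: bool = False  # start strict, relax only after a timeout


def next_action(req: HostRequest, now: float, timeout_s: float,
                offered_host: Optional[str]) -> str:
    """Decide what to do with a request, given the host offered (if any)."""
    if offered_host is not None:
        if offered_host == req.preferred_host:
            return "run_on_preferred"
        if req.relax_locality:
            return "run_on_any"   # already relaxed, accept whatever came back
        return "release"          # strict request got the wrong host
    if not req.relax_locality and now - req.issued_at >= timeout_s:
        return "relax"            # strict request timed out, re-issue relaxed
    return "wait"
```

A caller would re-issue the request with relax_locality=True when the function 
returns "relax", which mirrors the current relaxed behavior as the fallback. 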



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
