[ https://issues.apache.org/jira/browse/SAMZA-886?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15220376#comment-15220376 ]
Jake Maes commented on SAMZA-886:
---------------------------------

One follow-up comment for posterity.

With this issue, host affinity seems to work in a degraded mode, and it performs better for jobs that have more containers.

The reason is that when these rack resolver issues occur, none of the preferred host requests are honored and a set of random containers is returned. However, the HostAwareContainerAllocator doesn't know this. If any of the random containers happens to match a preferred host, it uses that container accordingly. And the more containers requested (for a fixed-size cluster), the higher the probability that a random container will match a preferred host.

> Investigate 'relax locality' to improve Host Affinity
> ------------------------------------------------------
>
>                 Key: SAMZA-886
>                 URL: https://issues.apache.org/jira/browse/SAMZA-886
>             Project: Samza
>          Issue Type: Bug
>            Reporter: Jagadish
>            Assignee: Jake Maes
>         Attachments: RelaxedLocality experiments.pdf
>
>
> I ran several tests experimenting with Samza on a 36-node cluster. I
> have the following observations:
> 1. On a cluster with about 50% utilization, the percentage of requests that
> are mapped to preferred hosts seems to depend on yarn.container.count. The
> percentage is higher when yarn.container.count is comparable to the size of
> the cluster. For example, on the 36-node cluster I get about 50% of requests
> matched when yarn.container.count is 30, but only 27% when
> yarn.container.count is 10.
> One reason is that when a large number of containers is spawned initially,
> many requests are made in bulk in quick succession, so there is a good chance
> that any random host in the cluster will match some preferred request.
> However, when a particular container is respawned after a failure, there is
> only one request, for the failed container, and it has a lower chance of a
> match.
> The results are averaged across 20 runs in each scenario.
> 2. On a cluster with about zero utilization, 100% of requests are matched to
> preferred hosts irrespective of yarn.container.count.
> This ticket is to explore alternatives to see if they will improve the
> percentage of matched hosts.
> I believe these ideas are worth trying:
> 1. Yarn supports a 'relaxed locality' flag that can be specified with the
> request. We could set 'relaxed locality' to false. (This ensures that the
> container is allocated on the exact host we ask for.) If the request is not
> satisfied within a timeout, we can re-issue it with 'relaxed locality' set
> to true (as we currently do).
> 2. Re-issue the same preferred host request if the hosts returned
> don't match the request.
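
To make the probability argument in the comment above a bit more concrete, here is a rough sketch of a toy model. It is not Samza or YARN code. It assumes that every returned container lands on a uniformly random host, independently of the others, and that each request prefers a distinct host. The function names (expected_match_fraction, simulate_match_fraction) are made up for this illustration.

```python
import random


def expected_match_fraction(cluster_size: int, num_requests: int) -> float:
    """Expected fraction of preferred hosts hit by at least one random container.

    Toy assumption: each of the num_requests returned containers lands on a
    uniformly random host, and every request prefers a distinct host.
    """
    miss_one = 1.0 - 1.0 / cluster_size  # one container misses a given host
    return 1.0 - miss_one ** num_requests  # at least one container hits it


def simulate_match_fraction(cluster_size: int, num_requests: int,
                            trials: int = 2000, seed: int = 0) -> float:
    """Monte Carlo estimate of the same quantity, as a sanity check."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(trials):
        # Distinct preferred hosts, one per request (assumption).
        preferred = set(rng.sample(range(cluster_size), num_requests))
        # Hosts of the randomly placed containers (duplicates collapse).
        returned = {rng.randrange(cluster_size) for _ in range(num_requests)}
        total += len(preferred & returned) / num_requests
    return total / trials


if __name__ == "__main__":
    # 36 hosts as in the issue; 1 request models a single failed container.
    for count in (1, 10, 30):
        print(count,
              round(expected_match_fraction(36, count), 3),
              round(simulate_match_fraction(36, count), 3))
```

Under these assumptions a 36-host cluster gives a match fraction of roughly 0.25 for 10 requests and roughly 0.57 for 30, and only 1/36 for a single re-request after a failure. That is the same direction as the 27% and 50% reported in the issue, but the model ignores cluster utilization and scheduler behavior, so it should not be expected to reproduce those measurements.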