[ https://issues.apache.org/jira/browse/SAMZA-886?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15177263#comment-15177263 ]
Jagadish commented on SAMZA-886:
--------------------------------

Attaching a log file from a run where only 50% of containers were matched to their preferred hosts (when spawning 30 containers on a 36-node cluster). When running fewer containers (about 10), I observe that the matched ratio dips to about 27%.

> Investigate 'relax locality' to improve Host Affinity
> ------------------------------------------------------
>
>                 Key: SAMZA-886
>                 URL: https://issues.apache.org/jira/browse/SAMZA-886
>             Project: Samza
>          Issue Type: Bug
>            Reporter: Jagadish
>         Attachments: 50_pct_match.log
>
>
> I ran several tests experimenting with Samza on a 36-node cluster. I
> have the following observations:
> 1. On a cluster with about 50% utilization, the percentage of requests that
> are mapped to preferred hosts seems to depend on yarn.container.count. The
> percentage is higher when yarn.container.count is comparable to the size of
> the cluster. For example, on a 36-node cluster, about 50% of requests are
> matched when yarn.container.count is 30, and only 27% are matched when
> yarn.container.count is 10.
> One reason is that, when spawning a large number of containers initially,
> many requests are made in bulk in quick succession, so there is a good
> chance that any random host in the cluster will match one of the preferred
> host requests. However, when respawning a single container after a failure,
> there is only one request for the failed container, and it has a lower
> chance of a match.
> The results are averaged across 20 runs in each scenario.
> 2. On a cluster with about zero utilization, 100% of requests are matched to
> preferred hosts irrespective of yarn.container.count.
> This ticket is to explore alternatives to see if they will improve the
> percentage of matched hosts.
> I believe these ideas are worth trying:
> 1. YARN supports the idea of relaxed locality on a request. We could set
> 'relaxed locality' to false. (This will ensure that we get the container on
> the exact host we ask for.) If we don't get such a container within a
> timeout, we may re-issue the same request with 'relaxed locality' set to
> true (as we currently do).
> 2. Re-issue the same preferred host request if the hosts returned
> don't match the request.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
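
The two ideas proposed in the ticket can be sketched as a small request policy. This is a minimal illustration in Python, not Samza code: `HostRequest`, `initial_request`, `next_request`, `on_allocation`, the `strict_timeout_s` parameter, and the `retries_left` cap are all hypothetical names and assumptions introduced here. In YARN's Java client the corresponding knob is the `relaxLocality` flag passed when constructing an `AMRMClient.ContainerRequest`; how Samza would wire it in is left open.

```python
from dataclasses import dataclass, replace
from typing import Optional, Tuple


@dataclass(frozen=True)
class HostRequest:
    """Hypothetical stand-in for a container request (not a Samza or YARN class)."""
    container_id: int
    preferred_host: str
    relax_locality: bool
    issued_at: float  # time in seconds when the request was issued


def initial_request(container_id: int, preferred_host: str, now: float) -> HostRequest:
    # Idea 1, first step: ask strictly for the preferred host.
    return HostRequest(container_id, preferred_host, relax_locality=False, issued_at=now)


def next_request(request: HostRequest, now: float, strict_timeout_s: float) -> HostRequest:
    """Idea 1, second step: fall back to relaxed locality once the strict request times out.

    Returns the same object if nothing should change, otherwise a re-issued
    request for the same host with relax_locality=True.
    """
    timed_out = now - request.issued_at >= strict_timeout_s
    if not request.relax_locality and timed_out:
        return replace(request, relax_locality=True, issued_at=now)
    return request


def on_allocation(
    request: HostRequest, allocated_host: str, now: float, retries_left: int
) -> Tuple[str, Optional[HostRequest]]:
    """Idea 2: re-issue the same preferred-host request when the returned host differs.

    Returns ("run", None) to start the container on allocated_host, or
    ("release", follow_up) to give the container back and ask again.
    The retry cap is an assumption of this sketch; the ticket does not specify one.
    """
    if allocated_host == request.preferred_host:
        return "run", None
    if retries_left > 0:
        return "release", replace(request, issued_at=now)
    # Out of retries: accept the non-preferred host rather than wait indefinitely.
    return "run", None
```

A caller would create the request with `initial_request`, call `next_request` periodically while it is pending, and pass each allocation through `on_allocation`, decrementing `retries_left` whenever a container is released.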