[ https://issues.apache.org/jira/browse/SAMZA-886?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Jake Maes updated SAMZA-886:
----------------------------
    Attachment: SAMZA-886_2.patch

SAMZA-886_2.patch == diff 2 in the review

> Investigate 'relax locality' to improve Host Affinity
> ------------------------------------------------------
>
>                 Key: SAMZA-886
>                 URL: https://issues.apache.org/jira/browse/SAMZA-886
>             Project: Samza
>          Issue Type: Bug
>            Reporter: Jagadish
>            Assignee: Jake Maes
>         Attachments: RelaxedLocality experiments.pdf, SAMZA-886.patch,
> SAMZA-886_2.patch
>
>
> I ran several tests experimenting with Samza on a 36-node cluster and made
> the following observations:
> 1. On a cluster at about 50% utilization, the percentage of requests that
> are mapped to preferred hosts seems to depend on yarn.container.count: it is
> higher when yarn.container.count is comparable to the size of the cluster.
> For example, on the 36-node cluster about 50% of requests are matched when
> yarn.container.count is 30, but only 27% when yarn.container.count is 10.
> One likely reason is that when a large number of containers are spawned
> initially, many requests are made in bulk in quick succession, so there is a
> good chance that any random host YARN allocates will match one of the
> outstanding preferred-host requests. However, when a single container is
> restarted after a failure, there is only one request (for the failed
> container), so a random allocation is much less likely to match it.
> The results are averaged across 20 runs in each scenario.
> 2. On a cluster with near-zero utilization, 100% of requests are matched to
> preferred hosts irrespective of yarn.container.count.
> This ticket is to explore alternatives and see whether they improve the
> percentage of requests matched to preferred hosts.
> I believe these ideas are worth trying:
> 1. YARN supports a 'relaxed locality' flag that can be specified with each
> request. We could set 'relaxed locality' to false, which ensures that a
> container is allocated only on the exact host we ask for.
> If no such container is allocated within a timeout, we may re-issue the
> same request with 'relaxed locality' set to true (as we currently do).
> 2. Re-issue the same preferred-host request if the hosts returned don't
> match the request.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
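The bulk-request explanation in observation 1 can be illustrated with a deliberately simplified toy model. This is an assumption, not how the YARN scheduler actually places containers, and it is not meant to reproduce the measured numbers. Assume each of k outstanding requests prefers a distinct host, and YARN hands back k distinct hosts chosen uniformly at random from an N-node cluster, ignoring utilization. The overlap between the two sets is then hypergeometric with mean k*k/N, so the expected fraction of matched requests is k/N. The class and method names below are hypothetical; the Monte Carlo loop simply checks the closed form.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Random;

// Hypothetical toy model of host-affinity matching (not YARN's real scheduler).
// Assumes 0 < containerCount <= clusterSize.
final class AffinityMatchModel {

    // Closed form: k random distinct hosts overlap k preferred hosts
    // k*k/N times on average, so the matched fraction is k/N.
    static double expectedMatchFraction(int clusterSize, int containerCount) {
        return (double) containerCount / clusterSize;
    }

    // Monte Carlo estimate of the same fraction.
    static double simulateMatchFraction(int clusterSize, int containerCount,
                                        int trials, long seed) {
        Random random = new Random(seed);
        List<Integer> hosts = new ArrayList<>();
        for (int h = 0; h < clusterSize; h++) {
            hosts.add(h);
        }
        long matched = 0;
        for (int t = 0; t < trials; t++) {
            // Without loss of generality, container i prefers host i (hosts 0..k-1).
            // In this model YARN returns k distinct hosts uniformly at random.
            Collections.shuffle(hosts, random);
            for (int i = 0; i < containerCount; i++) {
                // An allocated host is a match if some request prefers it.
                if (hosts.get(i) < containerCount) {
                    matched++;
                }
            }
        }
        return (double) matched / ((long) trials * containerCount);
    }
}
```

Under this model a single restart request on a 36-node cluster, `expectedMatchFraction(36, 1)`, has only a 1/36 (about 2.8%) chance of landing on its preferred host, which is the intuition behind the "only one request for the failed container" argument.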
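Idea 1 (strict request first, relaxed request after a timeout) can be sketched as a small bookkeeping class. In Hadoop YARN the flag is the `relaxLocality` argument of `AMRMClient.ContainerRequest(Resource, String[], String[], Priority, boolean)`. To keep the sketch self-contained it does not call YARN; `HostRequest`, `StrictThenRelaxedRequester`, its methods, and the timeout are hypothetical names chosen for illustration, not Samza's actual API. Requests start with `relaxLocality = false`; any strict request still unmatched after `strictTimeoutMs` is cancelled and replaced by a relaxed request for the same host, which corresponds to the current behavior.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

// Hypothetical model of one outstanding container request (not a YARN class).
final class HostRequest {
    final String preferredHost;
    final boolean relaxLocality;
    final long issuedAtMs;

    HostRequest(String preferredHost, boolean relaxLocality, long issuedAtMs) {
        this.preferredHost = preferredHost;
        this.relaxLocality = relaxLocality;
        this.issuedAtMs = issuedAtMs;
    }
}

// Sketch of idea 1: ask strictly for the preferred host, then fall back
// to a relaxed request if nothing is allocated within strictTimeoutMs.
final class StrictThenRelaxedRequester {
    private final long strictTimeoutMs;
    private final List<HostRequest> pending = new ArrayList<>();

    StrictThenRelaxedRequester(long strictTimeoutMs) {
        this.strictTimeoutMs = strictTimeoutMs;
    }

    // Issue a strict (relaxLocality = false) request for the preferred host.
    HostRequest requestStrict(String host, long nowMs) {
        HostRequest request = new HostRequest(host, false, nowMs);
        pending.add(request);
        return request;
    }

    // Called periodically: replace timed-out strict requests with relaxed ones
    // and return the relaxed requests that were newly issued.
    List<HostRequest> expireStrictRequests(long nowMs) {
        List<HostRequest> fallbacks = new ArrayList<>();
        for (Iterator<HostRequest> it = pending.iterator(); it.hasNext(); ) {
            HostRequest request = it.next();
            if (!request.relaxLocality && nowMs - request.issuedAtMs >= strictTimeoutMs) {
                it.remove(); // cancel the strict request before relaxing it
                fallbacks.add(new HostRequest(request.preferredHost, true, nowMs));
            }
        }
        pending.addAll(fallbacks);
        return fallbacks;
    }

    // Called when a container is allocated on `host`; returns the request it
    // satisfies, or null if no outstanding request accepts that host.
    HostRequest onAllocated(String host) {
        // A container on the requested host can satisfy either kind of request.
        for (Iterator<HostRequest> it = pending.iterator(); it.hasNext(); ) {
            HostRequest request = it.next();
            if (request.preferredHost.equals(host)) {
                it.remove();
                return request;
            }
        }
        // A container on any other host can only satisfy a relaxed request.
        for (Iterator<HostRequest> it = pending.iterator(); it.hasNext(); ) {
            HostRequest request = it.next();
            if (request.relaxLocality) {
                it.remove();
                return request;
            }
        }
        return null;
    }

    int pendingCount() {
        return pending.size();
    }
}
```

The sketch removes the strict request before adding the relaxed one because YARN's `AMRMClient` can reject a request whose locality relaxation conflicts with an outstanding request for the same location at the same priority; a real implementation might instead use a different priority for the fallback.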
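Idea 2 (re-issue the preferred-host request when the returned host does not match) can be sketched as a per-container decision policy. The ticket does not say what happens to the mismatched container or how many times to retry, so this sketch assumes the mismatched container is released and caps re-issues at a hypothetical `maxReissues`, after which the container is accepted on whatever host was returned. `ReissueOnMismatch` and `Decision` are illustrative names, not Samza or YARN APIs.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch of idea 2: when a container comes back on the wrong host,
// re-issue the same preferred-host request instead of accepting the mismatch,
// up to maxReissues times (the bound is an assumption, not from the ticket).
final class ReissueOnMismatch {
    enum Decision { ACCEPT_PREFERRED, RELEASE_AND_REISSUE, ACCEPT_ANY_HOST }

    private final int maxReissues;
    // Re-issue attempts so far, keyed by the logical Samza container ID.
    private final Map<String, Integer> reissuesByContainer = new HashMap<>();

    ReissueOnMismatch(int maxReissues) {
        this.maxReissues = maxReissues;
    }

    Decision onAllocated(String samzaContainerId, String preferredHost, String allocatedHost) {
        if (preferredHost.equals(allocatedHost)) {
            reissuesByContainer.remove(samzaContainerId);
            return Decision.ACCEPT_PREFERRED;
        }
        int attempts = reissuesByContainer.getOrDefault(samzaContainerId, 0);
        if (attempts < maxReissues) {
            // Release the mismatched container and ask for the preferred host again.
            reissuesByContainer.put(samzaContainerId, attempts + 1);
            return Decision.RELEASE_AND_REISSUE;
        }
        // Give up on affinity for this container and keep the host we were given.
        reissuesByContainer.remove(samzaContainerId);
        return Decision.ACCEPT_ANY_HOST;
    }
}
```

The cap trades restart latency for affinity: every re-issue delays the container's start, so an unbounded loop could leave a failed container down indefinitely on a busy cluster.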