[ 
https://issues.apache.org/jira/browse/SAMZA-1552?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jagadish updated SAMZA-1552:
----------------------------
    Description: 
Kudos to [~abkshvn] for observing this!

We have observed host-affinity not being honored for some containers in very 
large jobs. When YARN allocates more resources than Samza requested on a 
specific host, the extra resources are added to a spare pool called the 
"ANY_HOST buffer". Later, when Samza requests a resource on the same host and 
YARN does not return one, we don't use the resources previously returned for 
that host that are still sitting in the spare pool. 

This problem is especially pronounced in clusters that are heavily loaded on 
both CPU and memory, where allocations must satisfy both the CPU and memory 
requirements on the available hosts (often, hosts have CPU but not memory, or 
vice versa). A lot of container failures on a particular host in the midst of 
allocation further aggravates the problem.

The fix is as follows:
First check whether there are available containers in the buffer corresponding 
to our preferred host. If not, also scan the ANY_HOST buffer for containers 
that match the preferred host.
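
The two-step lookup described above can be sketched roughly as follows. This is 
only an illustration in Python, not Samza's actual code: the names Resource, 
ResourceBuffers, add and take_for_host are hypothetical, and the sketch assumes 
each buffered resource records the host it was allocated on.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional

ANY_HOST = "ANY_HOST"  # hypothetical key for the spare pool


@dataclass
class Resource:
    """A container allocation returned by the cluster manager (hypothetical type)."""
    resource_id: str
    host: str


@dataclass
class ResourceBuffers:
    """Per-host buffers of allocated resources, plus the ANY_HOST spare pool."""
    buffers: Dict[str, List[Resource]] = field(default_factory=dict)

    def add(self, buffer_key: str, resource: Resource) -> None:
        self.buffers.setdefault(buffer_key, []).append(resource)

    def take_for_host(self, preferred_host: str) -> Optional[Resource]:
        # Step 1: look in the buffer that belongs to the preferred host.
        host_buffer = self.buffers.get(preferred_host, [])
        if host_buffer:
            return host_buffer.pop(0)

        # Step 2: fall back to the ANY_HOST buffer, but only accept a
        # resource that actually lives on the preferred host.
        any_buffer = self.buffers.get(ANY_HOST, [])
        for index, resource in enumerate(any_buffer):
            if resource.host == preferred_host:
                return any_buffer.pop(index)

        # Nothing usable for this host yet.
        return None


buffers = ResourceBuffers()
buffers.add(ANY_HOST, Resource("c1", "host-b"))
buffers.add(ANY_HOST, Resource("c2", "host-a"))  # extra allocation on host-a
match = buffers.take_for_host("host-a")          # found via the ANY_HOST scan
```

In this sketch a request for "host-a" is served from the spare pool even though 
the "host-a" buffer itself is empty, which is the case the description says was 
previously missed.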

  was:
Kudos to [~abkshvn] for observing this!

We have observed host-affinity not being honored for some containers in very 
large jobs. When YARN allocates more resources than Samza requested on a 
specific host, the extra resources are added to a spare pool called the 
"ANY_HOST buffer". Later, when Samza requests a resource on the same host and 
YARN does not return one, we don't use the resources previously returned for 
that host that are still sitting in the spare pool. 

This problem is especially pronounced in clusters that are heavily loaded on 
both CPU and memory, where allocations must satisfy both the CPU and memory 
requirements on the available hosts (often, hosts have CPU but not memory, or 
vice versa). A lot of container failures on a particular host in the midst of 
allocation further aggravates the problem.

The fix is as follows:
We will need to scan the ANY_HOST buffers for available containers to service 
requests for container allocation.


> Host affinity improvements - Improve matching of hosts to allocated resources
> -----------------------------------------------------------------------------
>
>                 Key: SAMZA-1552
>                 URL: https://issues.apache.org/jira/browse/SAMZA-1552
>             Project: Samza
>          Issue Type: Bug
>            Reporter: Abhishek Shivanna
>            Assignee: Jagadish
>



