[ https://issues.apache.org/jira/browse/AURORA-117?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13882006#comment-13882006 ]
brian wickman commented on AURORA-117:
--------------------------------------
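One option is to use a simple limit index to quickly determine whether scheduling
an instance of a task on a slave would violate any limit constraints. The index
keeps a running count of the job's instances per attribute value, so each check
costs a few dictionary lookups per constraint. Rough Python implementation below: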
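{noformat}
from collections import defaultdict


class LimitIndex(defaultdict):
    """An index to keep track of limit constraints per job.

    Maps attribute name -> attribute value -> number of this job's
    instances currently placed on slaves carrying that value.
    """
    def __init__(self, job):
        self.__job = job
        super(LimitIndex, self).__init__(lambda: defaultdict(int))

    def update_job(self, job):
        self.__job = job

    def add_slave(self, slave):
        """Record one instance scheduled on this slave."""
        for name, value in slave.attributes.items():
            self[name][value] += 1

    def remove_slave(self, slave):
        """Record one instance removed from this slave."""
        for name, value in slave.attributes.items():
            self[name][value] -= 1

    def is_valid(self, slave):
        """Would adding this slave go over our attribute limit?"""
        for name, limit in self.__job.constraints.limit_tuples():
            # Assumes every slave carries each constrained attribute.
            # One more instance must stay <= limit, so reject at == limit.
            if self[name][slave.attributes[name]] >= limit:
                return False
        return True
{noformat}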
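A self-contained sketch of the same index that runs as-is. The `Slave` and `Job` classes and the `limit_tuples()` method are hypothetical stand-ins for the real scheduler types, which are not shown in this comment. It also makes two assumptions of its own: a slave lacking a constrained attribute is treated as unable to satisfy that limit, and read-only checks use `.get()` so they never add entries to the index.

```python
from collections import defaultdict
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass
class Slave:
    """Hypothetical stand-in for a slave and its attributes (e.g. rack, host)."""
    host: str
    attributes: Dict[str, str]


@dataclass
class Job:
    """Hypothetical stand-in: attribute name -> max instances per attribute value."""
    limits: Dict[str, int] = field(default_factory=dict)

    def limit_tuples(self) -> List[Tuple[str, int]]:
        return list(self.limits.items())


class LimitIndex:
    """Counts one job's placed instances per (attribute name, attribute value)."""

    def __init__(self, job: Job) -> None:
        self._job = job
        self._counts: Dict[str, Dict[str, int]] = defaultdict(lambda: defaultdict(int))

    def update_job(self, job: Job) -> None:
        self._job = job

    def add_slave(self, slave: Slave) -> None:
        """Call once per instance placed on `slave`."""
        for name, value in slave.attributes.items():
            self._counts[name][value] += 1

    def remove_slave(self, slave: Slave) -> None:
        """Call once per instance removed from `slave`."""
        for name, value in slave.attributes.items():
            self._counts[name][value] -= 1

    def is_valid(self, slave: Slave) -> bool:
        """True if one more instance on `slave` stays within every limit.

        Cost is O(number of limit constraints), independent of job size.
        """
        for name, limit in self._job.limit_tuples():
            value = slave.attributes.get(name)
            if value is None:
                # Assumption: a slave lacking the attribute cannot satisfy the limit.
                return False
            # .get() avoids inserting empty entries during read-only checks.
            if self._counts.get(name, {}).get(value, 0) >= limit:
                return False
        return True


job = Job(limits={"rack": 2, "host": 1})
a1 = Slave("a1", {"rack": "a", "host": "a1"})
a2 = Slave("a2", {"rack": "a", "host": "a2"})
a3 = Slave("a3", {"rack": "a", "host": "a3"})

index = LimitIndex(job)
index.add_slave(a1)
print(index.is_valid(a1))  # False: host a1 already runs 1 instance
print(index.is_valid(a2))  # True: rack a has 1 < 2, host a2 has 0 < 1
index.add_slave(a2)
print(index.is_valid(a3))  # False: rack a now has 2 instances
```

This version uses composition rather than subclassing `defaultdict`, so a lookup in `is_valid` cannot silently grow the index.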
> Scheduler performance issues with very large jobs
> -------------------------------------------------
>
> Key: AURORA-117
> URL: https://issues.apache.org/jira/browse/AURORA-117
> Project: Aurora
> Issue Type: Task
> Components: Scheduler
> Reporter: Bill Farner
>
> The scheduler tends to have performance issues when scheduling very large
> jobs. We've observed this with jobs exceeding 2000 instances. The
> {{TaskScheduler}} thread tends to consume a large amount of CPU (100%,
> limited by the global storage lock). The current hypothesis is that the majority
> of the time is spent satisfying diversity constraints (rack, machine), which
> require expensive queries.
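To illustrate why unindexed diversity checks scale badly, here is a hedged sketch of a hypothetical check that recounts the job's already-placed instances for every limit constraint on every attempt. This is an assumed model, not the scheduler's actual query path: `naive_is_valid` and `schedule_naively` are illustrative names, and hosts are plain attribute dicts. Placing n instances this way visits 2 × (0 + 1 + ... + (n - 1)) = n(n - 1) records with two constraints, which for 2000 instances is 3,998,000 record visits even when each host is tried only once. By contrast, `LimitIndex.is_valid` needs only a few dictionary lookups per constraint regardless of n.

```python
from typing import Dict, List, Tuple

Attrs = Dict[str, str]


def naive_is_valid(placed: List[Attrs], candidate: Attrs,
                   limits: Dict[str, int]) -> Tuple[bool, int]:
    """Hypothetical unindexed limit check.

    Rescans every already-placed instance for each limit constraint and
    returns (valid, number of placed-instance records visited).
    """
    visits = 0
    for name, limit in limits.items():
        value = candidate.get(name)
        if value is None:
            return False, visits
        visits += len(placed)  # full scan of placed instances
        count = sum(1 for attrs in placed if attrs.get(name) == value)
        if count >= limit:
            return False, visits
    return True, visits


def schedule_naively(hosts: List[Attrs],
                     limits: Dict[str, int]) -> Tuple[List[Attrs], int]:
    """Try each host once, in order; return placements and total records visited."""
    placed: List[Attrs] = []
    total = 0
    for host in hosts:
        ok, visits = naive_is_valid(placed, host, limits)
        total += visits
        if ok:
            placed.append(host)
    return placed, total


if __name__ == "__main__":
    # 2000 distinct hosts spread over 100 racks, at most 20 instances per rack.
    hosts = [{"rack": f"r{i % 100}", "host": f"h{i}"} for i in range(2000)]
    placed, total = schedule_naively(hosts, {"rack": 20, "host": 1})
    print(len(placed), total)  # 2000 3998000
```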
--
This message was sent by Atlassian JIRA
(v6.1.5#6160)