[jira] [Commented] (AURORA-117) Scheduler performance issues with very large jobs

brian wickman (JIRA) Sat, 25 Jan 2014 11:35:00 -0800

    [ 
https://issues.apache.org/jira/browse/AURORA-117?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13882006#comment-13882006
 ]


brian wickman commented on AURORA-117:
--------------------------------------

One option is to use a simple limit index to quickly determine whether 
scheduling an instance of a task on a slave would violate any limit 
constraints.  A rough Python implementation is below:

{noformat}
from collections import defaultdict


class LimitIndex(defaultdict):
  """An index to keep track of limit constraints per job.

  Maps attribute name -> attribute value -> count of slaves added with that value.
  """

  def __init__(self, job):
    self.__job = job
    super(LimitIndex, self).__init__(lambda: defaultdict(int))

  def update_job(self, job):
    self.__job = job

  def add_slave(self, slave):
    for name, value in slave.attributes.items():
      self[name][value] += 1

  def remove_slave(self, slave):
    for name, value in slave.attributes.items():
      self[name][value] -= 1

  def is_valid(self, slave):
    """Would adding this slave go over our attribute limit?"""
    for name, limit in self.__job.constraints.limit_tuples():
      # If the count has already reached the limit, one more would exceed it.
      if self[name][slave.attributes[name]] >= limit:
        return False
    return True
{noformat}
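
To show how such an index might sit in a placement loop, here is a minimal self-contained sketch. The {{Slave}}, {{Job}} and {{Constraints}} types and the {{place}} helper are hypothetical stand-ins invented for illustration; they are not the scheduler's real types. The sketch assumes that {{add_slave}} is called once per placed instance and that a slave is rejected once the count for one of its attribute values has reached the limit.

```python
from collections import defaultdict, namedtuple

# Hypothetical stand-ins for the scheduler's slave and job objects.
Slave = namedtuple("Slave", ["host", "attributes"])
Job = namedtuple("Job", ["name", "constraints"])


class Constraints(object):
  """Hypothetical holder for limit constraints: attribute name -> limit."""

  def __init__(self, limits):
    self._limits = dict(limits)

  def limit_tuples(self):
    return list(self._limits.items())


class LimitIndex(defaultdict):
  """Compact copy of the index: attribute name -> value -> count."""

  def __init__(self, job):
    self.__job = job
    super(LimitIndex, self).__init__(lambda: defaultdict(int))

  def add_slave(self, slave):
    for name, value in slave.attributes.items():
      self[name][value] += 1

  def is_valid(self, slave):
    for name, limit in self.__job.constraints.limit_tuples():
      # Reject when the count has already reached the limit.
      if self[name][slave.attributes[name]] >= limit:
        return False
    return True


def place(job, slaves, instances):
  """Greedily place instances, one dictionary lookup per limit per candidate."""
  index = LimitIndex(job)
  placements = []
  for _ in range(instances):
    target = next((s for s in slaves if index.is_valid(s)), None)
    if target is None:
      break  # No slave can take another instance without violating a limit.
    index.add_slave(target)
    placements.append(target.host)
  return placements


demo_slaves = [
  Slave("a1", {"rack": "a", "host": "a1"}),
  Slave("a2", {"rack": "a", "host": "a2"}),
  Slave("a3", {"rack": "a", "host": "a3"}),
  Slave("b1", {"rack": "b", "host": "b1"}),
]
demo_job = Job("demo", Constraints({"rack": 2, "host": 1}))
print(place(demo_job, demo_slaves, 4))  # ['a1', 'a2', 'b1']
```

Each validity check costs one lookup per limit constraint instead of a query over all of the job's tasks. In the demo, the fourth instance cannot be placed because rack "a" is at its limit of 2 and every remaining host already holds one instance.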

> Scheduler performance issues with very large jobs
> -------------------------------------------------
>
>                 Key: AURORA-117
>                 URL: https://issues.apache.org/jira/browse/AURORA-117
>             Project: Aurora
>          Issue Type: Task
>          Components: Scheduler
>            Reporter: Bill Farner
>
> The scheduler tends to have performance issues when scheduling very large 
> jobs.  We've observed this with jobs exceeding 2000 instances.  The 
> {{TaskScheduler}} thread tends to consume a large amount of CPU (100%, 
> limited by the global storage lock).  Current hypothesis is that the majority 
> of the time is spent satisfying diversity constraints (rack, machine), which 
> require expensive queries.



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)
