Maxim Khutornenko created AURORA-1600:
-----------------------------------------

             Summary: Job updates with large count of instance overrides halt scheduler perf
                 Key: AURORA-1600
                 URL: https://issues.apache.org/jira/browse/AURORA-1600
             Project: Aurora
          Issue Type: Bug
          Components: Scheduler
            Reporter: Maxim Khutornenko
            Assignee: Maxim Khutornenko
            Priority: Critical


We have observed a case where a user update with a large number of specified
instance overrides (updateOnlyTheseInstances) degrades performance so severely
that the scheduler processes almost no offers and schedules no pending tasks
for long periods (minutes to hours).

The culprit appears to be the {{selectInstructions}} query. It is unacceptably
slow when the number of instanceConfigs and/or instance overrides approaches
100. Since it is called inside a write lock to guide individual instance
updates, nothing else can proceed, including status updates and offer
activities.
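
To illustrate why a slow query under a write lock stalls everything else, here
is a minimal, generic sketch. It is not Aurora code: the class name, the use of
ReentrantReadWriteLock, and the sleep standing in for the slow query are all
assumptions made for illustration only. It measures how long an unrelated
reader (think of a status update) has to wait while the writer holds the lock.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.locks.ReentrantReadWriteLock;

// Hypothetical illustration only; this is not how Aurora's storage is implemented.
class WriteLockStall {
    // Assumed stand-in for a storage-wide read/write lock.
    private static final ReentrantReadWriteLock LOCK = new ReentrantReadWriteLock();

    /**
     * Holds the write lock for slowQueryMillis (simulating a slow query) and
     * returns how many milliseconds a concurrent reader waited for the lock.
     */
    static long measureReaderWaitMillis(long slowQueryMillis) throws InterruptedException {
        CountDownLatch writerHoldsLock = new CountDownLatch(1);

        Thread writer = new Thread(() -> {
            LOCK.writeLock().lock();
            try {
                writerHoldsLock.countDown();
                Thread.sleep(slowQueryMillis); // simulated slow query inside the write lock
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            } finally {
                LOCK.writeLock().unlock();
            }
        });
        writer.start();

        // Start timing only once the writer really owns the lock.
        writerHoldsLock.await();
        long start = System.nanoTime();

        // Unrelated work (e.g. a status update) cannot proceed until the writer is done.
        LOCK.readLock().lock();
        long waitedMillis = TimeUnit.NANOSECONDS.toMillis(System.nanoTime() - start);
        LOCK.readLock().unlock();

        writer.join();
        return waitedMillis;
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println("reader waited (ms): " + measureReaderWaitMillis(200));
    }
}
```

In this sketch the reader's wait grows with the time spent inside the write
lock, so if the slow call is repeated for every instance in an update, the
stall for everything else adds up accordingly.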

I was able to reproduce this in a JMH benchmark. A fix is incoming.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
