Maxim Khutornenko created AURORA-1549:
-----------------------------------------

             Summary: Updater kills instances with scoped update
                 Key: AURORA-1549
                 URL: https://issues.apache.org/jira/browse/AURORA-1549
             Project: Aurora
          Issue Type: Bug
          Components: Scheduler
            Reporter: Maxim Khutornenko
            Assignee: Maxim Khutornenko


Consider the following sequence for the hello_world job with 3 instances:
{noformat}
aurora job create devcluster/www-data/prod/hello aurora/examples/jobs/hello_world.aurora
<change config to trigger update, e.g. change RAM>
aurora update start devcluster/www-data/prod/hello/0 aurora/examples/jobs/hello_world.aurora
aurora job kill devcluster/www-data/prod/hello/1
aurora update start devcluster/www-data/prod/hello/0,1 aurora/examples/jobs/hello_world.aurora
{noformat}

The expected outcome is all 3 instances running on the same config. The actual 
result: instance 0 is killed, leaving only instances 1 and 2.

The problem is that [UpdateFactory|https://github.com/apache/aurora/blob/33d7e2170a86f54722a02a2dc9cb1e09fb52df25/src/main/java/org/apache/aurora/scheduler/updater/UpdateFactory.java#L95-L101] iterates over the scoped instances, thereby overriding the JobDiff results. This leads [InstanceUpdater|https://github.com/apache/aurora/blob/d7a1619fa85195937e74d1b09594909f0ed0ffd5/src/main/java/org/apache/aurora/scheduler/updater/InstanceUpdater.java#L102-L107] to kill any instance that is present in the actual state but not present in the desired state.

These are the (correct) results produced by [JobDiff|https://github.com/apache/aurora/blob/2e2371481d9aaccd6a45ad0f442d963d5ae7a3c8/src/main/java/org/apache/aurora/scheduler/updater/JobDiff.java#L185-L202] that should be used to drive the update instead:
{noformat}
"Unscoped diff contents:"
Replaced: [2]
Replacements: [1, 2]
Unchanged: [0]
"Scoped (final) diff contents:"
Replaced: []
Replacements: [1]
Unchanged: [2, 0]
{noformat}
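
To make the failure mode concrete, here is a minimal Python sketch of the scenario. It is a simplified model and not the Aurora Java code: the names Diff, compute_diff, action and plan are hypothetical, configs are reduced to the strings "old" and "new", and the buggy loop reflects one reading of the description above (a scoped instance with no replacement entry ends up with no desired state).

```python
from typing import Dict, NamedTuple, Optional, Set


class Diff(NamedTuple):
    replaced: Set[int]      # existing instances whose current task must go
    replacements: Set[int]  # instances that need a task on the new config
    unchanged: Set[int]     # instances the update must leave alone


def compute_diff(actual: Dict[int, str], desired: Dict[int, str],
                 scope: Optional[Set[int]] = None) -> Diff:
    """Diff actual vs. desired state, optionally narrowed to a scope."""
    unchanged = {i for i in actual if desired.get(i) == actual[i]}
    replaced = set(actual) - unchanged
    replacements = set(desired) - unchanged
    if scope is not None:
        # Out-of-scope changes are deferred, so those instances count as unchanged.
        unchanged |= (replaced | replacements) - scope
        replaced &= scope
        replacements &= scope
    return Diff(replaced, replacements, unchanged)


def action(actual_cfg: Optional[str], desired_cfg: Optional[str]) -> str:
    """Simplified per-instance decision: no desired state means kill."""
    if desired_cfg is None:
        return "kill" if actual_cfg is not None else "noop"
    return "add" if actual_cfg is None else "replace"


def plan(instance_ids: Set[int], actual: Dict[int, str],
         diff: Diff, new_config: str) -> Dict[int, str]:
    """Decide an action for each given instance ID."""
    result = {}
    for i in sorted(instance_ids):
        desired_cfg = new_config if i in diff.replacements else None
        result[i] = action(actual.get(i), desired_cfg)
    return result


# State before the second update: instance 0 already updated, instance 1 killed.
actual = {0: "new", 2: "old"}
desired = {0: "new", 1: "new", 2: "new"}
scope = {0, 1}

unscoped = compute_diff(actual, desired)
scoped = compute_diff(actual, desired, scope)

# Assumed buggy behavior: iterate over the scope, so unchanged instance 0 is killed.
buggy = plan(scope, actual, scoped, "new")
# Proposed behavior: iterate only over what the scoped diff says must change.
fixed = plan(scoped.replaced | scoped.replacements, actual, scoped, "new")

print(unscoped, scoped, buggy, fixed, sep="\n")
```

In this model, iterating over the scope yields a kill for instance 0 and an add for instance 1, while iterating over the scoped diff yields only the add for instance 1 and leaves instances 0 and 2 untouched.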

The current behavior appears to be a leftover that should have been removed in 
this refactoring: https://reviews.apache.org/r/25969/.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
