Re: Review Request 58259: Add update affinity to Scheduler

David McLaughlin Wed, 19 Apr 2017 17:01:39 -0700


> On April 19, 2017, 8 a.m., Santhosh Kumar Shanmugham wrote:
> > src/main/java/org/apache/aurora/scheduler/state/TaskAssigner.java
> > Lines 228-229 (patched)
> > <https://reviews.apache.org/r/58259/diff/2/?file=1689847#file1689847line255>
> >
> >     Should we check if the offer is already reserved for preemption before 
> > proceeding? It is possible for a preemption to fail an update, if the 
> > premption steals the instance that was just updated.


Currently that is not possible, since we also check if the agentReserver has 
any reservations for a host inside the maybeAssign default path.


> On April 19, 2017, 8 a.m., Santhosh Kumar Shanmugham wrote:
> > src/main/java/org/apache/aurora/scheduler/updater/InstanceActionHandler.java
> > Lines 98-106 (original), 100-108 (patched)
> > <https://reviews.apache.org/r/58259/diff/2/?file=1689849#file1689849line100>
> >
> >     In order to reduce the duration the offer is help in the 
> > `UpdateAgentReserver`, wonder if this might be a better place to add to the 
> > reserver? Killing a task can take arbitrarily long and we risk expiring the 
> > offer.

There are race conditions that appear when moving the logic to ADD_TASK. Some 
other pending task could be assigned into the offer in the gap between the task 
being killed by Mesos and the JobUpdateController processing it. We use batch 
workers to batch up work, so there's definitely gaps where Mesos can send the 
resources freed up by the kill to Aurora and they are used for some other 
pending task during this window. 

It might seem like a rare race condition, but we've seen at Twitter issues 
where a bad host causes chronic issues because there are no slots for some 
large tasks. So in low capacity situations, we'd see things like this happen a 
lot.  

Also note: killing a task doesn't have unbounded execution. There is an upper 
bound of 5s at the executor level. One other issue here is the turnaround from 
a task being killed to Mesos sending Aurora a new offer with the freed up 
resources. Obviously there are best-effort network calls involved in both 
killing and the offer management, so there are definitely issues where offers 
could expire unnecessarily due to latency. 

At least by reserving the agent upfront when you learn that you're going to be 
sending a future ADD_TASK, you have the control to extend the timeout to 
increase cache hits (although I'd be careful extending it too long). 

We will have run several simulations in our scale test environment to confirm 
some of this before it ships.


- David


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/58259/#review172324
-----------------------------------------------------------


On April 12, 2017, 7:51 a.m., David McLaughlin wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/58259/
> -----------------------------------------------------------
> 
> (Updated April 12, 2017, 7:51 a.m.)
> 
> 
> Review request for Aurora, Santhosh Kumar Shanmugham, Stephan Erb, and Zameer 
> Manji.
> 
> 
> Repository: aurora
> 
> 
> Description
> -------
> 
> In the Dynamic Reservations review (and on the mailing list), I mentioned 
> that we could implement update affinity with less complexity using the same 
> technique as preemption. Here is how that would work. 
> 
> This just adds a simple wrapper around the preemptor's BiCache structure and 
> then optimistically tries to keep an agent free for a task during the update 
> process. 
> 
> 
> Note: I don't bother even checking the resources before reserving the agent. 
> I figure there is a chance the agent has enough room, and if not we'll catch 
> it when we attempt to veto the offer. We need to always check the offer like 
> this anyway in case constraints change. In the worst case it adds some delay 
> in the rare cases you increase resources. 
> 
> We also don't persist the reservations, so if the Scheduler fails over during 
> an update, the worst case is that any instances between the KILLED and 
> ASSIGNED in-flight batch need to fall back to the current first-fit 
> scheduling algorithm.
> 
> 
> Diffs
> -----
> 
>   src/main/java/org/apache/aurora/scheduler/base/TaskTestUtil.java 
> f0b148cd158d61cd89cc51dca9f3fa4c6feb1b49 
>   src/main/java/org/apache/aurora/scheduler/scheduling/TaskScheduler.java 
> 203f62bacc47470545d095e4d25f7e0f25990ed9 
>   src/main/java/org/apache/aurora/scheduler/state/TaskAssigner.java 
> a177b301203143539b052524d14043ec8a85a46d 
>   src/main/java/org/apache/aurora/scheduler/updater/InstanceAction.java 
> b4cd01b3e03029157d5ca5d1d8e79f01296b57c2 
>   
> src/main/java/org/apache/aurora/scheduler/updater/InstanceActionHandler.java 
> f25dc0c6d9c05833b9938b023669c9c36a489f68 
>   src/main/java/org/apache/aurora/scheduler/updater/InstanceUpdater.java 
> c129896d8cd54abd2634e2a339c27921042b0162 
>   
> src/main/java/org/apache/aurora/scheduler/updater/JobUpdateControllerImpl.java
>  e14112479807b4477b82554caf84fe733f62cf58 
>   src/main/java/org/apache/aurora/scheduler/updater/StateEvaluator.java 
> c95943d242dc2f539778bdc9e071f342005e8de3 
>   src/main/java/org/apache/aurora/scheduler/updater/UpdateAgentReserver.java 
> PRE-CREATION 
>   src/main/java/org/apache/aurora/scheduler/updater/UpdaterModule.java 
> 13cbdadad606d9acaadc541320b22b0ae538cc5e 
>   
> src/test/java/org/apache/aurora/scheduler/scheduling/TaskSchedulerImplTest.java
>  fa1a81785802b82542030e1aae786fe9570d9827 
>   src/test/java/org/apache/aurora/scheduler/state/TaskAssignerImplTest.java 
> cf2d25ec2e407df7159e0021ddb44adf937e1777 
>   src/test/java/org/apache/aurora/scheduler/updater/AddTaskTest.java 
> b2c4c66850dd8f35e06a631809530faa3b776252 
>   src/test/java/org/apache/aurora/scheduler/updater/InstanceUpdaterTest.java 
> c78c7fbd7d600586136863c99ce3d7387895efee 
>   src/test/java/org/apache/aurora/scheduler/updater/JobUpdaterIT.java 
> 30b44f88a5b8477e917da21d92361aea1a39ceeb 
>   src/test/java/org/apache/aurora/scheduler/updater/KillTaskTest.java 
> 833fd62c870f96b96343ee5e0eed0d439536381f 
>   
> src/test/java/org/apache/aurora/scheduler/updater/NullAgentReserverTest.java 
> PRE-CREATION 
>   
> src/test/java/org/apache/aurora/scheduler/updater/UpdateAgentReserverImplTest.java
>  PRE-CREATION 
> 
> 
> Diff: https://reviews.apache.org/r/58259/diff/2/
> 
> 
> Testing
> -------
> 
> ./gradlew build
> ./src/test/sh/org/apache/aurora/e2e/test_end_to_end.sh
> 
> 
> Thanks,
> 
> David McLaughlin
> 
>

Re: Review Request 58259: Add update affinity to Scheduler

Reply via email to