> On April 17, 2017, 1:27 p.m., Stephan Erb wrote:
> > src/main/java/org/apache/aurora/scheduler/updater/UpdaterModule.java
> > Lines 52-55 (patched)
> > <https://reviews.apache.org/r/58259/diff/2/?file=1689854#file1689854line52>
> >
> >     I am trying to understand if this is a good default for this 
> > best-effort feature.
> >     
> >     What is your cluster-wide MTTA? It should give us a decent hint for a 
> > suitable default.
> 
> David McLaughlin wrote:
>     Our MTTA can range from a couple milliseconds to several minutes. Depends 
> how many tasks are pending and how full the cluster is.
> 
> Stephan Erb wrote:
>     If I understand this correctly, this patch will help the "good case" but 
> could fall down quickly during overload: If the cluster is getting overloaded 
> with pending tasks, the 1 min timeout might not be sufficient to place a job 
> in its reserved spot. This will then lead to preemptions that further 
> aggregate the overload situation. 
>     
>     We will need a counter to track those expired reservations.
> 
> David McLaughlin wrote:
>     Yup, definitely need more metrics here. If the community gives the 
> overall approach a +1, I will move forward with making this production ready.

A hard-limit of 1 minute may not be enough and raising the limit might causing 
starvation.


- Santhosh Kumar


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/58259/#review172122
-----------------------------------------------------------


On April 12, 2017, 12:51 a.m., David McLaughlin wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/58259/
> -----------------------------------------------------------
> 
> (Updated April 12, 2017, 12:51 a.m.)
> 
> 
> Review request for Aurora, Santhosh Kumar Shanmugham, Stephan Erb, and Zameer 
> Manji.
> 
> 
> Repository: aurora
> 
> 
> Description
> -------
> 
> In the Dynamic Reservations review (and on the mailing list), I mentioned 
> that we could implement update affinity with less complexity using the same 
> technique as preemption. Here is how that would work. 
> 
> This just adds a simple wrapper around the preemptor's BiCache structure and 
> then optimistically tries to keep an agent free for a task during the update 
> process. 
> 
> 
> Note: I don't bother even checking the resources before reserving the agent. 
> I figure there is a chance the agent has enough room, and if not we'll catch 
> it when we attempt to veto the offer. We need to always check the offer like 
> this anyway in case constraints change. In the worst case it adds some delay 
> in the rare cases you increase resources. 
> 
> We also don't persist the reservations, so if the Scheduler fails over during 
> an update, the worst case is that any instances between the KILLED and 
> ASSIGNED in-flight batch need to fall back to the current first-fit 
> scheduling algorithm.
> 
> 
> Diffs
> -----
> 
>   src/main/java/org/apache/aurora/scheduler/base/TaskTestUtil.java 
> f0b148cd158d61cd89cc51dca9f3fa4c6feb1b49 
>   src/main/java/org/apache/aurora/scheduler/scheduling/TaskScheduler.java 
> 203f62bacc47470545d095e4d25f7e0f25990ed9 
>   src/main/java/org/apache/aurora/scheduler/state/TaskAssigner.java 
> a177b301203143539b052524d14043ec8a85a46d 
>   src/main/java/org/apache/aurora/scheduler/updater/InstanceAction.java 
> b4cd01b3e03029157d5ca5d1d8e79f01296b57c2 
>   
> src/main/java/org/apache/aurora/scheduler/updater/InstanceActionHandler.java 
> f25dc0c6d9c05833b9938b023669c9c36a489f68 
>   src/main/java/org/apache/aurora/scheduler/updater/InstanceUpdater.java 
> c129896d8cd54abd2634e2a339c27921042b0162 
>   
> src/main/java/org/apache/aurora/scheduler/updater/JobUpdateControllerImpl.java
>  e14112479807b4477b82554caf84fe733f62cf58 
>   src/main/java/org/apache/aurora/scheduler/updater/StateEvaluator.java 
> c95943d242dc2f539778bdc9e071f342005e8de3 
>   src/main/java/org/apache/aurora/scheduler/updater/UpdateAgentReserver.java 
> PRE-CREATION 
>   src/main/java/org/apache/aurora/scheduler/updater/UpdaterModule.java 
> 13cbdadad606d9acaadc541320b22b0ae538cc5e 
>   
> src/test/java/org/apache/aurora/scheduler/scheduling/TaskSchedulerImplTest.java
>  fa1a81785802b82542030e1aae786fe9570d9827 
>   src/test/java/org/apache/aurora/scheduler/state/TaskAssignerImplTest.java 
> cf2d25ec2e407df7159e0021ddb44adf937e1777 
>   src/test/java/org/apache/aurora/scheduler/updater/AddTaskTest.java 
> b2c4c66850dd8f35e06a631809530faa3b776252 
>   src/test/java/org/apache/aurora/scheduler/updater/InstanceUpdaterTest.java 
> c78c7fbd7d600586136863c99ce3d7387895efee 
>   src/test/java/org/apache/aurora/scheduler/updater/JobUpdaterIT.java 
> 30b44f88a5b8477e917da21d92361aea1a39ceeb 
>   src/test/java/org/apache/aurora/scheduler/updater/KillTaskTest.java 
> 833fd62c870f96b96343ee5e0eed0d439536381f 
>   
> src/test/java/org/apache/aurora/scheduler/updater/NullAgentReserverTest.java 
> PRE-CREATION 
>   
> src/test/java/org/apache/aurora/scheduler/updater/UpdateAgentReserverImplTest.java
>  PRE-CREATION 
> 
> 
> Diff: https://reviews.apache.org/r/58259/diff/2/
> 
> 
> Testing
> -------
> 
> ./gradlew build
> ./src/test/sh/org/apache/aurora/e2e/test_end_to_end.sh
> 
> 
> Thanks,
> 
> David McLaughlin
> 
>

Reply via email to