Re: Review Request 66716: Enable `Tasks` to specify their own custom maintenance SLA.

2018-05-23 Thread Jordan Ly

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/66716/#review203701
---



Great documentation and tests!

Looks pretty much ready to go from my point of view. I've been using the 
`SlaManager` to test my proposed SLA-aware updates for a while.


src/main/java/org/apache/aurora/scheduler/configuration/ConfigurationManager.java
Lines 234 (patched)


I would throw an exception here in case the user is not aware of the 
`minRequiredInstances` and expects SLA to be respected but it will be silently 
ignored.



src/main/java/org/apache/aurora/scheduler/configuration/ConfigurationManager.java
Lines 236 (patched)


Should we also warn users if they are setting an SLA Policy for a 
preemptible task? They may be unaware why SLA is not being respected.



src/main/java/org/apache/aurora/scheduler/configuration/ConfigurationManager.java
Lines 256 (patched)


I would add a more descriptive message here (I found it confusing when I 
first saw it). Maybe something that expresses intent like:

Current percentage will not allow any instances to be killed. Must be less 
than ...



src/main/java/org/apache/aurora/scheduler/configuration/ConfigurationManager.java
Lines 265 (patched)


Same as above: duration_secs must be less than cluster-wide maximum of ...



src/main/java/org/apache/aurora/scheduler/configuration/ConfigurationManager.java
Lines 277 (patched)


Same as above.



src/main/java/org/apache/aurora/scheduler/configuration/ConfigurationManager.java
Lines 285 (patched)


Same as above.



src/main/java/org/apache/aurora/scheduler/sla/SlaManager.java
Lines 236 (patched)


I think listing the unaffected tasks might be a bit gratuitious. I would 
have that as a debug statement if needed.

When doing SLA-aware updates, a big job will produce huge `unaffected 
tasks` lists repeatedly.



src/main/java/org/apache/aurora/scheduler/sla/SlaManager.java
Lines 273 (patched)


I believe that if the coordinator is locked for more than 10 seconds, then 
this method will return false. However, the code will continue since the output 
is ignored despite the lock never being gotten.

I think waiting for 10 seconds is also a long time to block the thread. Is 
it feasible to try and and get the lock and if it is not immediately available 
to return? The caller can invoke the method again 10 seconds later for 
essentially the same effect if they desire.



src/main/java/org/apache/aurora/scheduler/sla/SlaManager.java
Lines 314-318 (patched)


Is there a way to set a timeout or increment a metric when a coordinator 
takes too long to respond?

Is there any downside to allowing long requests?



src/main/java/org/apache/aurora/scheduler/state/MaintenanceController.java
Lines 448-449 (patched)


Remove `.newBuilder()`



src/main/java/org/apache/aurora/scheduler/state/MaintenanceController.java
Lines 453 (patched)


Remove `ISlaPolicy.build`



src/test/java/org/apache/aurora/scheduler/sla/SlaManagerTest.java
Lines 1018 (patched)


`Thread.sleep(15000)` here breaks this test per my comment in `SlaManager`


- Jordan Ly


On May 23, 2018, 12:21 a.m., Santhosh Kumar Shanmugham wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/66716/
> ---
> 
> (Updated May 23, 2018, 12:21 a.m.)
> 
> 
> Review request for Aurora, David McLaughlin, Jordan Ly, and Stephan Erb.
> 
> 
> Repository: aurora
> 
> 
> Description
> ---
> 
> `Tasks` can specify custom SLA requirements as part of
> their `TaskConfig`. One of the new features is the ability
> to specify an external coordinator that can ACK/NACK
> maintenance requests for tasks. This will be hugely
> beneficial for onboarding services that cannot satisfactorily
> specify SLA in terms of running instances.
> 
> Maintenance requests are driven from the Scheduler to
> improve management of nodes in the cluster.
> 
> 
> Diffs
> -
> 
>   RELEASE-NOTES.md 5e1f9940a7974e212140b7e5304695afa7f96e78 
>   api/src/main/thrift/org/apache/aurora/gen/api.thrift 
> ff48000d613ceef3e03586b94944d13275fb127c 

Re: Review Request 66716: Enable `Tasks` to specify their own custom maintenance SLA.

2018-05-23 Thread Santhosh Kumar Shanmugham


> On May 23, 2018, 4:13 p.m., Jordan Ly wrote:
> > src/main/java/org/apache/aurora/scheduler/sla/SlaManager.java
> > Lines 236 (patched)
> > 
> >
> > I think listing the unaffected tasks might be a bit gratuitious. I 
> > would have that as a debug statement if needed.
> > 
> > When doing SLA-aware updates, a big job will produce huge `unaffected 
> > tasks` lists repeatedly.

Dropping the unaffected tasks list.


> On May 23, 2018, 4:13 p.m., Jordan Ly wrote:
> > src/main/java/org/apache/aurora/scheduler/sla/SlaManager.java
> > Lines 273 (patched)
> > 
> >
> > I believe that if the coordinator is locked for more than 10 seconds, 
> > then this method will return false. However, the code will continue since 
> > the output is ignored despite the lock never being gotten.
> > 
> > I think waiting for 10 seconds is also a long time to block the thread. 
> > Is it feasible to try and and get the lock and if it is not immediately 
> > available to return? The caller can invoke the method again 10 seconds 
> > later for essentially the same effect if they desire.

Good catch. Dropping the timeout.


> On May 23, 2018, 4:13 p.m., Jordan Ly wrote:
> > src/main/java/org/apache/aurora/scheduler/sla/SlaManager.java
> > Lines 314-318 (patched)
> > 
> >
> > Is there a way to set a timeout or increment a metric when a 
> > coordinator takes too long to respond?
> > 
> > Is there any downside to allowing long requests?

The `HttpClient` already sets `requestTimeout`, `connectTimeout`, 
`handshakeTimeout`, `readTimeout` etc. Is there something that is not covered 
by this?


> On May 23, 2018, 4:13 p.m., Jordan Ly wrote:
> > src/test/java/org/apache/aurora/scheduler/sla/SlaManagerTest.java
> > Lines 1018 (patched)
> > 
> >
> > `Thread.sleep(15000)` here breaks this test per my comment in 
> > `SlaManager`

Addressed above comment.


- Santhosh Kumar


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/66716/#review203701
---


On May 22, 2018, 5:21 p.m., Santhosh Kumar Shanmugham wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/66716/
> ---
> 
> (Updated May 22, 2018, 5:21 p.m.)
> 
> 
> Review request for Aurora, David McLaughlin, Jordan Ly, and Stephan Erb.
> 
> 
> Repository: aurora
> 
> 
> Description
> ---
> 
> `Tasks` can specify custom SLA requirements as part of
> their `TaskConfig`. One of the new features is the ability
> to specify an external coordinator that can ACK/NACK
> maintenance requests for tasks. This will be hugely
> beneficial for onboarding services that cannot satisfactorily
> specify SLA in terms of running instances.
> 
> Maintenance requests are driven from the Scheduler to
> improve management of nodes in the cluster.
> 
> 
> Diffs
> -
> 
>   RELEASE-NOTES.md 5e1f9940a7974e212140b7e5304695afa7f96e78 
>   api/src/main/thrift/org/apache/aurora/gen/api.thrift 
> ff48000d613ceef3e03586b94944d13275fb127c 
>   docs/README.md 166bf1ce240474f0a181e023439cfbfbe7363822 
>   docs/features/sla-requirements.md PRE-CREATION 
>   docs/reference/configuration.md d4b869b938105ba301fc88d41019af2f1707f6f4 
>   docs/reference/scheduler-configuration.md 
> a659cfac974059b04ef5593286011decbb7f9110 
>   examples/vagrant/systemd/aurora-scheduler.service 
> 57e4bba858672f8da94eaa0499f8e5f3347ab982 
>   src/main/java/org/apache/aurora/scheduler/app/AppModule.java 
> ffc07443fae9e5216a5333ae305f75aa9b452a0c 
>   src/main/java/org/apache/aurora/scheduler/config/CliOptions.java 
> a2fb0393ba47e876c4c8c63e3ed27ebe42cb6ca3 
>   
> src/main/java/org/apache/aurora/scheduler/configuration/ConfigurationManager.java
>  4073229b74d0e0e7fd31552bd96894ceb8a0971a 
>   
> src/main/java/org/apache/aurora/scheduler/maintenance/MaintenanceModule.java 
> PRE-CREATION 
>   src/main/java/org/apache/aurora/scheduler/mesos/MesosCallbackHandler.java 
> 3b4df55a05873e79aae206b117cbc753fa3abb94 
>   src/main/java/org/apache/aurora/scheduler/sla/SlaManager.java PRE-CREATION 
>   src/main/java/org/apache/aurora/scheduler/sla/SlaModule.java 
> 25ed474289f369e74c24e999ad97ed6810c9fd5e 
>   src/main/java/org/apache/aurora/scheduler/state/MaintenanceController.java 
> f58c66aaebe8d31913d67a05add0f3d6054e88d1 
>   src/main/java/org/apache/aurora/scheduler/state/StateModule.java 
> 0e0f90b670bbbcd6cb3aa302ce4a9abfe70ea979 
>   src/main/java/org/apache/aurora/scheduler/thrift/ReadOnlySc

Re: Review Request 66716: Enable `Tasks` to specify their own custom maintenance SLA.

2018-05-23 Thread Santhosh Kumar Shanmugham

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/66716/
---

(Updated May 23, 2018, 11:24 p.m.)


Review request for Aurora, David McLaughlin, Jordan Ly, and Stephan Erb.


Changes
---

- e2e test
- Jordan's feedback


Repository: aurora


Description
---

`Tasks` can specify custom SLA requirements as part of
their `TaskConfig`. One of the new features is the ability
to specify an external coordinator that can ACK/NACK
maintenance requests for tasks. This will be hugely
beneficial for onboarding services that cannot satisfactorily
specify SLA in terms of running instances.

Maintenance requests are driven from the Scheduler to
improve management of nodes in the cluster.


Diffs (updated)
-

  RELEASE-NOTES.md 5e1f9940a7974e212140b7e5304695afa7f96e78 
  api/src/main/thrift/org/apache/aurora/gen/api.thrift 
ff48000d613ceef3e03586b94944d13275fb127c 
  docs/README.md 166bf1ce240474f0a181e023439cfbfbe7363822 
  docs/features/sla-requirements.md PRE-CREATION 
  docs/reference/configuration.md d4b869b938105ba301fc88d41019af2f1707f6f4 
  docs/reference/scheduler-configuration.md 
a659cfac974059b04ef5593286011decbb7f9110 
  examples/vagrant/systemd/aurora-scheduler.service 
57e4bba858672f8da94eaa0499f8e5f3347ab982 
  src/main/java/org/apache/aurora/scheduler/app/AppModule.java 
ffc07443fae9e5216a5333ae305f75aa9b452a0c 
  src/main/java/org/apache/aurora/scheduler/config/CliOptions.java 
a2fb0393ba47e876c4c8c63e3ed27ebe42cb6ca3 
  
src/main/java/org/apache/aurora/scheduler/configuration/ConfigurationManager.java
 4073229b74d0e0e7fd31552bd96894ceb8a0971a 
  src/main/java/org/apache/aurora/scheduler/maintenance/MaintenanceModule.java 
PRE-CREATION 
  src/main/java/org/apache/aurora/scheduler/mesos/MesosCallbackHandler.java 
3b4df55a05873e79aae206b117cbc753fa3abb94 
  src/main/java/org/apache/aurora/scheduler/sla/SlaManager.java PRE-CREATION 
  src/main/java/org/apache/aurora/scheduler/sla/SlaModule.java 
25ed474289f369e74c24e999ad97ed6810c9fd5e 
  src/main/java/org/apache/aurora/scheduler/state/MaintenanceController.java 
f58c66aaebe8d31913d67a05add0f3d6054e88d1 
  src/main/java/org/apache/aurora/scheduler/state/StateModule.java 
0e0f90b670bbbcd6cb3aa302ce4a9abfe70ea979 
  src/main/java/org/apache/aurora/scheduler/thrift/ReadOnlySchedulerImpl.java 
e88cad6cf12312512e6840329db7ca7134ceaae6 
  
src/main/java/org/apache/aurora/scheduler/thrift/SchedulerThriftInterface.java 
9fc0416086dd3eb2e2f4e8f659da59fcdea2b22b 
  src/main/python/apache/aurora/admin/admin_util.py 
8240e8093160623b4c30dd212a88b8e122fd9856 
  src/main/python/apache/aurora/admin/host_maintenance.py 
83fc2b6ece40d3436cc7de7a034f95224235fcfd 
  src/main/python/apache/aurora/admin/maintenance.py 
942a237f47a6e0416bbaf244278685477e0f407d 
  src/main/python/apache/aurora/client/api/__init__.py 
f6fd1dd6d7c2bdd5bca3037f501b36badab78c75 
  src/main/python/apache/aurora/client/cli/context.py 
06b194114a7f44a61943e0932973e71b53f239b4 
  src/main/python/apache/aurora/client/cli/jobs.py 
536d04a21d32c4e586dc943a6f9b0ad0143354a3 
  src/test/java/org/apache/aurora/scheduler/app/SchedulerIT.java 
63c338e5bbdf60de0fba8d68c6613904abb93fa8 
  src/test/java/org/apache/aurora/scheduler/base/TaskTestUtil.java 
8c1f5ce6d7eb94ec4e0302bfd41318bd0797a1a5 
  src/test/java/org/apache/aurora/scheduler/config/CommandLineTest.java 
e66ec116112df164106598d9ff0bc9e8f465e44f 
  
src/test/java/org/apache/aurora/scheduler/configuration/ConfigurationManagerTest.java
 749ffeac6cb851f32bba7606390203d7a046a0e6 
  src/test/java/org/apache/aurora/scheduler/cron/quartz/CronIT.java 
0fabb3370713e57d417adbd2af9e24a4d515c60a 
  src/test/java/org/apache/aurora/scheduler/cron/quartz/QuartzTestUtil.java 
b7dcf3af366c9def63165dc9bef998ab5e95ed49 
  src/test/java/org/apache/aurora/scheduler/mesos/MesosCallbackHandlerTest.java 
c6163bbabc7e7748f167b679893a93f58e4ef1ac 
  src/test/java/org/apache/aurora/scheduler/sla/SlaManagerTest.java 
PRE-CREATION 
  src/test/java/org/apache/aurora/scheduler/sla/SlaModuleTest.java 
d37e7a07e9258bc8c0758bf50aece5b79025126b 
  
src/test/java/org/apache/aurora/scheduler/state/MaintenanceControllerImplTest.java
 770846e84e9980ea3dbf9e1c46b0d45c5488c5b3 
  
src/test/java/org/apache/aurora/scheduler/thrift/SchedulerThriftInterfaceTest.java
 2cf66d8154ad3795989ee9026e45af1be509f244 
  src/test/java/org/apache/aurora/scheduler/thrift/ThriftIT.java 
40851c419e4d62e6545959eebc0ce144fdecc697 
  src/test/java/org/apache/aurora/scheduler/thrift/aop/MockDecoratedThrift.java 
d412090c292691305f01bccd1596fb0f6bb003ad 
  src/test/python/apache/aurora/admin/test_maintenance.py 
ca0239b157f9f9053821af0328b9448703386cd4 
  src/test/python/apache/aurora/api_util.py 
3fc9b478cc9aada0503e8ed8698a37b4ed926cdd 
  src/test/python/apache/aurora/client/api/test_scheduler_client.py 
f2a2eae1539f7f6dff6855e4122cc41c6cbb0f7b 
  src/