Re: Review Request 66716: Enable `Tasks` to specify their own custom maintenance SLA.

2018-05-17 Thread Aurora ReviewBot

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/66716/#review203396
---



Master (805a53f) is red with this patch.
  ./build-support/jenkins/build.sh

Pass 2: Analyzing classes (334 / 347) - 96% complete
Pass 2: Analyzing classes (335 / 347) - 96% complete
Pass 2: Analyzing classes (336 / 347) - 96% complete
Pass 2: Analyzing classes (337 / 347) - 97% complete
Pass 2: Analyzing classes (338 / 347) - 97% complete
Pass 2: Analyzing classes (339 / 347) - 97% complete
Pass 2: Analyzing classes (340 / 347) - 97% complete
Pass 2: Analyzing classes (341 / 347) - 98% complete
Pass 2: Analyzing classes (342 / 347) - 98% complete
Pass 2: Analyzing classes (343 / 347) - 98% complete
Pass 2: Analyzing classes (344 / 347) - 99% complete
Pass 2: Analyzing classes (345 / 347) - 99% complete
Pass 2: Analyzing classes (346 / 347) - 99% complete
Pass 2: Analyzing classes (347 / 347) - 100% complete
Done with analysis
:test

org.apache.aurora.scheduler.events.WebhookTest > 
testTaskChangedWithOldStateError FAILED
java.lang.AssertionError at WebhookTest.java:251
I0518 03:05:13.203 [ShutdownHook, SchedulerMain] Stopping scheduler services. 

1125 tests completed, 1 failed, 1 skipped
:test FAILED
:jacocoTestReport
Coverage report generated: 
file:///home/jenkins/jenkins-slave/workspace/AuroraBot/dist/reports/jacoco/test/html/index.html
:jacocoTestCoverageVerification

FAILURE: Build failed with an exception.

* What went wrong:
Execution failed for task ':test'.
> There were failing tests. See the report at: 
> file:///home/jenkins/jenkins-slave/workspace/AuroraBot/dist/reports/tests/test/index.html

* Try:
Run with --stacktrace option to get the stack trace. Run with --info or --debug 
option to get more log output.

* Get more help at https://help.gradle.org

BUILD FAILED in 13m 59s
45 actionable tasks: 36 executed, 9 up-to-date


I will refresh this build result if you post a review containing "@ReviewBot 
retry"

- Aurora ReviewBot


On May 18, 2018, 2:49 a.m., Santhosh Kumar Shanmugham wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/66716/
> ---
> 
> (Updated May 18, 2018, 2:49 a.m.)
> 
> 
> Review request for Aurora, David McLaughlin, Jordan Ly, and Stephan Erb.
> 
> 
> Repository: aurora
> 
> 
> Description
> ---
> 
> `Tasks` can specify custom SLA requirements as part of
> their `TaskConfig`. One of the new features is the ability
> to specify an external coordinator that can ACK/NACK
> maintenance requests for tasks. This will be hugely
> beneficial for onboarding services that cannot satisfactorily
> specify SLA in terms of running instances.
> 
> Maintenance requests are driven from the Scheduler to
> improve management of nodes in the cluster.
> 
> 
> Diffs
> -
> 
>   RELEASE-NOTES.md 5e1f9940a7974e212140b7e5304695afa7f96e78 
>   api/src/main/thrift/org/apache/aurora/gen/api.thrift 
> ef754e32172e7490a47a13e7b526f243ffa3efeb 
>   api/src/main/thrift/org/apache/aurora/gen/storage.thrift 
> b79e2045ccda05d5058565f81988dfe33feea8f1 
>   docs/README.md 166bf1ce240474f0a181e023439cfbfbe7363822 
>   docs/features/sla-requirements.md PRE-CREATION 
>   docs/reference/configuration.md d4b869b938105ba301fc88d41019af2f1707f6f4 
>   docs/reference/scheduler-configuration.md 
> a659cfac974059b04ef5593286011decbb7f9110 
>   src/main/java/org/apache/aurora/scheduler/app/AppModule.java 
> ffc07443fae9e5216a5333ae305f75aa9b452a0c 
>   src/main/java/org/apache/aurora/scheduler/config/CliOptions.java 
> a2fb0393ba47e876c4c8c63e3ed27ebe42cb6ca3 
>   
> src/main/java/org/apache/aurora/scheduler/configuration/ConfigurationManager.java
>  4073229b74d0e0e7fd31552bd96894ceb8a0971a 
>   
> src/main/java/org/apache/aurora/scheduler/maintenance/MaintenanceModule.java 
> PRE-CREATION 
>   src/main/java/org/apache/aurora/scheduler/mesos/MesosCallbackHandler.java 
> 3b4df55a05873e79aae206b117cbc753fa3abb94 
>   src/main/java/org/apache/aurora/scheduler/sla/SlaManager.java PRE-CREATION 
>   src/main/java/org/apache/aurora/scheduler/sla/SlaModule.java 
> 25ed474289f369e74c24e999ad97ed6810c9fd5e 
>   src/main/java/org/apache/aurora/scheduler/state/MaintenanceController.java 
> f58c66aaebe8d31913d67a05add0f3d6054e88d1 
>   src/main/java/org/apache/aurora/scheduler/state/StateModule.java 
> 0e0f90b670bbbcd6cb3aa302ce4a9abfe70ea979 
>   src/main/java/org/apache/aurora/scheduler/storage/HostMaintenanceStore.java 
> PRE-CREATION 
>   src/main/java/org/apache/aurora/scheduler/storage/Storage.java 
> da5534f886e032ca5a182f3704aa335ff680b258 
>   
> src/main/java/org/apache/aurora/scheduler/storage/durability/DurableStorage.java
>  f1fdc275d3958a36bbe79110d70dfeba640a948a 
>   

Re: Review Request 66716: Enable `Tasks` to specify their own custom maintenance SLA.

2018-05-17 Thread Santhosh Kumar Shanmugham


> On May 16, 2018, 2:35 p.m., Stephan Erb wrote:
> > I have done a first quick pass. I will have a second closer look once the 
> > storage patch has landed.

Thanks for the review. Much appreciated.


> On May 16, 2018, 2:35 p.m., Stephan Erb wrote:
> > src/main/java/org/apache/aurora/scheduler/configuration/ConfigurationManager.java
> > Lines 74 (patched)
> > 
> >
> > I think we should make this value configurable for operators.

Done.


> On May 16, 2018, 2:35 p.m., Stephan Erb wrote:
> > src/main/java/org/apache/aurora/scheduler/configuration/ConfigurationManager.java
> > Lines 248-259 (patched)
> > 
> >
> > Both this validation and the other `instance`-based validation below 
> > have a conceptual problem: Users can freely adjust the number of available 
> > instances via `kill` and `addInstance`. This implies that the policy can 
> > become invalid long after we have checked it here.
> > 
> > I think the most robust way would be if we just print warnings in the 
> > client, and change the scheduler so that it can ignore invalid policies, 
> > similar to how it ignores them after a timeout.

Good catch. I will add the warning and the validation check.


> On May 16, 2018, 2:35 p.m., Stephan Erb wrote:
> > src/main/java/org/apache/aurora/scheduler/configuration/ConfigurationManager.java
> > Lines 276 (patched)
> > 
> >
> > `"CountSlaPolicy: count=5 must be less than 3"` can be hard to 
> > understand as it lacks context. 
> > 
> > If you add that the second number refers to the number of instances, it 
> > it will be easier to users to reason about the error.

Done.


> On May 16, 2018, 2:35 p.m., Stephan Erb wrote:
> > src/main/java/org/apache/aurora/scheduler/sla/SlaManager.java
> > Lines 280 (patched)
> > 
> >
> > Would you be open to the idea of passing the host and (health?) port of 
> > the instance here as well? I have a usecase in mind that would be 
> > simplified by this quite a bit as I can query a running instance for 
> > additional information without having to query the the service discover ZK 
> > first.
> > 
> > In addition, have you considered just passing each information 
> > individually (job, environment, role,...) rather than as a joined string? 
> > Many usescases will probably require string splitting within the 
> > coordinator otherwise.

Adding the entire `ScheduledTask` into the body in JSON format, similar to 
`WebHooks` (but unlike that we will use `TSerializer` to cleanly serialize to 
JSON).


> On May 16, 2018, 2:35 p.m., Stephan Erb wrote:
> > src/main/java/org/apache/aurora/scheduler/sla/SlaManager.java
> > Lines 303 (patched)
> > 
> >
> > What usecase do you have in mind for the statuskey?
> > 
> > I am mostly wondering if a protocol purely based on HTTP status codes 
> > would be sufficient: e.g. `200 OK` means ready to drain and `428 
> > PRECONDITION REQUIRED` asks us to come back again later.

I foresee this becoming a for full-fledged protocol with its own library in 
future depending on the expectations from more stateful services. Some of the 
possible usecases are "come back after x mins from now", "add instance before 
killing".


> On May 16, 2018, 2:35 p.m., Stephan Erb wrote:
> > src/main/java/org/apache/aurora/scheduler/sla/SlaManager.java
> > Lines 384 (patched)
> > 
> >
> > isProduction is deprecated. You will need to check the appropriate tier 
> > config here.

Done.


- Santhosh Kumar


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/66716/#review203276
---


On May 17, 2018, 7:49 p.m., Santhosh Kumar Shanmugham wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/66716/
> ---
> 
> (Updated May 17, 2018, 7:49 p.m.)
> 
> 
> Review request for Aurora, David McLaughlin, Jordan Ly, and Stephan Erb.
> 
> 
> Repository: aurora
> 
> 
> Description
> ---
> 
> `Tasks` can specify custom SLA requirements as part of
> their `TaskConfig`. One of the new features is the ability
> to specify an external coordinator that can ACK/NACK
> maintenance requests for tasks. This will be hugely
> beneficial for onboarding services that cannot satisfactorily
> specify SLA in terms of running instances.
> 
> Maintenance requests are driven from 

Re: Review Request 66716: Enable `Tasks` to specify their own custom maintenance SLA.

2018-05-17 Thread Santhosh Kumar Shanmugham

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/66716/
---

(Updated May 17, 2018, 7:49 p.m.)


Review request for Aurora, David McLaughlin, Jordan Ly, and Stephan Erb.


Changes
---

Addressed Stephan's comments.


Repository: aurora


Description
---

`Tasks` can specify custom SLA requirements as part of
their `TaskConfig`. One of the new features is the ability
to specify an external coordinator that can ACK/NACK
maintenance requests for tasks. This will be hugely
beneficial for onboarding services that cannot satisfactorily
specify SLA in terms of running instances.

Maintenance requests are driven from the Scheduler to
improve management of nodes in the cluster.


Diffs (updated)
-

  RELEASE-NOTES.md 5e1f9940a7974e212140b7e5304695afa7f96e78 
  api/src/main/thrift/org/apache/aurora/gen/api.thrift 
ef754e32172e7490a47a13e7b526f243ffa3efeb 
  api/src/main/thrift/org/apache/aurora/gen/storage.thrift 
b79e2045ccda05d5058565f81988dfe33feea8f1 
  docs/README.md 166bf1ce240474f0a181e023439cfbfbe7363822 
  docs/features/sla-requirements.md PRE-CREATION 
  docs/reference/configuration.md d4b869b938105ba301fc88d41019af2f1707f6f4 
  docs/reference/scheduler-configuration.md 
a659cfac974059b04ef5593286011decbb7f9110 
  src/main/java/org/apache/aurora/scheduler/app/AppModule.java 
ffc07443fae9e5216a5333ae305f75aa9b452a0c 
  src/main/java/org/apache/aurora/scheduler/config/CliOptions.java 
a2fb0393ba47e876c4c8c63e3ed27ebe42cb6ca3 
  
src/main/java/org/apache/aurora/scheduler/configuration/ConfigurationManager.java
 4073229b74d0e0e7fd31552bd96894ceb8a0971a 
  src/main/java/org/apache/aurora/scheduler/maintenance/MaintenanceModule.java 
PRE-CREATION 
  src/main/java/org/apache/aurora/scheduler/mesos/MesosCallbackHandler.java 
3b4df55a05873e79aae206b117cbc753fa3abb94 
  src/main/java/org/apache/aurora/scheduler/sla/SlaManager.java PRE-CREATION 
  src/main/java/org/apache/aurora/scheduler/sla/SlaModule.java 
25ed474289f369e74c24e999ad97ed6810c9fd5e 
  src/main/java/org/apache/aurora/scheduler/state/MaintenanceController.java 
f58c66aaebe8d31913d67a05add0f3d6054e88d1 
  src/main/java/org/apache/aurora/scheduler/state/StateModule.java 
0e0f90b670bbbcd6cb3aa302ce4a9abfe70ea979 
  src/main/java/org/apache/aurora/scheduler/storage/HostMaintenanceStore.java 
PRE-CREATION 
  src/main/java/org/apache/aurora/scheduler/storage/Storage.java 
da5534f886e032ca5a182f3704aa335ff680b258 
  
src/main/java/org/apache/aurora/scheduler/storage/durability/DurableStorage.java
 f1fdc275d3958a36bbe79110d70dfeba640a948a 
  src/main/java/org/apache/aurora/scheduler/storage/durability/Loader.java 
10864f122eff5027c88d835baae6de483d960218 
  
src/main/java/org/apache/aurora/scheduler/storage/durability/WriteRecorder.java 
8d70cae35289a9e36142bab288cf0c9398ebd2d4 
  src/main/java/org/apache/aurora/scheduler/storage/log/SnapshotService.java 
b30de881eafa3226fdc32383b4e9bfd33ca912a5 
  src/main/java/org/apache/aurora/scheduler/storage/log/SnapshotterImpl.java 
4b52be02001e704f4b1a5f447226ac8c2386e3fd 
  
src/main/java/org/apache/aurora/scheduler/storage/mem/MemHostMaintenanceStore.java
 PRE-CREATION 
  src/main/java/org/apache/aurora/scheduler/storage/mem/MemStorage.java 
9f324b010db7e351e98b257d8fc8fecfeac81268 
  src/main/java/org/apache/aurora/scheduler/storage/mem/MemStorageModule.java 
edcea09b4d206cfddb642074237b031ad71cff13 
  src/main/java/org/apache/aurora/scheduler/thrift/ReadOnlySchedulerImpl.java 
e88cad6cf12312512e6840329db7ca7134ceaae6 
  
src/main/java/org/apache/aurora/scheduler/thrift/SchedulerThriftInterface.java 
9fc0416086dd3eb2e2f4e8f659da59fcdea2b22b 
  src/main/python/apache/aurora/admin/admin_util.py 
8240e8093160623b4c30dd212a88b8e122fd9856 
  src/main/python/apache/aurora/admin/host_maintenance.py 
83fc2b6ece40d3436cc7de7a034f95224235fcfd 
  src/main/python/apache/aurora/admin/maintenance.py 
942a237f47a6e0416bbaf244278685477e0f407d 
  src/main/python/apache/aurora/client/api/__init__.py 
f6fd1dd6d7c2bdd5bca3037f501b36badab78c75 
  src/main/python/apache/aurora/client/cli/context.py 
06b194114a7f44a61943e0932973e71b53f239b4 
  src/main/python/apache/aurora/client/cli/jobs.py 
536d04a21d32c4e586dc943a6f9b0ad0143354a3 
  src/main/python/apache/aurora/config/schema/base.py 
a629bcd1261e5959da0a8458a55545d4e2c2a7a5 
  src/main/python/apache/aurora/config/thrift.py 
6d2dde6e964daa68bf6f0e5bbbffecc5bd8c0431 
  src/main/python/apache/aurora/executor/executor_vars.py 
561f9452aedda4cc695c84a2a850bdd7e1d65dec 
  src/test/java/org/apache/aurora/scheduler/app/SchedulerIT.java 
63c338e5bbdf60de0fba8d68c6613904abb93fa8 
  src/test/java/org/apache/aurora/scheduler/base/TaskTestUtil.java 
778148a7c033cba9004954cabc33a2b1d003dccf 
  src/test/java/org/apache/aurora/scheduler/config/CommandLineTest.java 
e66ec116112df164106598d9ff0bc9e8f465e44f 
  

Re: Review Request 67141: Introduce structs to enable specifying custom SLA.

2018-05-17 Thread Renan DelValle

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/67141/#review203355
---


Ship it!




Ship It!

- Renan DelValle


On May 16, 2018, 7:07 p.m., Santhosh Kumar Shanmugham wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/67141/
> ---
> 
> (Updated May 16, 2018, 7:07 p.m.)
> 
> 
> Review request for Aurora, David McLaughlin, Jordan Ly, Renan DelValle, and 
> Stephan Erb.
> 
> 
> Bugs: AURORA-1977
> https://issues.apache.org/jira/browse/AURORA-1977
> 
> 
> Repository: aurora
> 
> 
> Description
> ---
> 
> Add `SlaPolicy` and `HostMaintenanceRequest` structs
> to the thrift definition and introduce a new `HostMaintenanceStore`
> for tracking maintenance requests. These changes will be used in
> https://reviews.apache.org/r/66716 for implementing custom SLA
> and scheduler driven maintenance.
> 
> This RB splits the storage related changes from 
> https://reviews.apache.org/r/66716
> for better rollback story.
> 
> Tested rollback on the vagrant.
> 
> 
> Diffs
> -
> 
>   api/src/main/thrift/org/apache/aurora/gen/api.thrift 
> ef754e32172e7490a47a13e7b526f243ffa3efeb 
>   api/src/main/thrift/org/apache/aurora/gen/storage.thrift 
> b79e2045ccda05d5058565f81988dfe33feea8f1 
>   src/main/java/org/apache/aurora/scheduler/storage/HostMaintenanceStore.java 
> PRE-CREATION 
>   src/main/java/org/apache/aurora/scheduler/storage/Storage.java 
> da5534f886e032ca5a182f3704aa335ff680b258 
>   
> src/main/java/org/apache/aurora/scheduler/storage/durability/DurableStorage.java
>  f1fdc275d3958a36bbe79110d70dfeba640a948a 
>   src/main/java/org/apache/aurora/scheduler/storage/durability/Loader.java 
> 10864f122eff5027c88d835baae6de483d960218 
>   
> src/main/java/org/apache/aurora/scheduler/storage/durability/WriteRecorder.java
>  8d70cae35289a9e36142bab288cf0c9398ebd2d4 
>   src/main/java/org/apache/aurora/scheduler/storage/log/SnapshotService.java 
> b30de881eafa3226fdc32383b4e9bfd33ca912a5 
>   src/main/java/org/apache/aurora/scheduler/storage/log/SnapshotterImpl.java 
> 4b52be02001e704f4b1a5f447226ac8c2386e3fd 
>   
> src/main/java/org/apache/aurora/scheduler/storage/mem/MemHostMaintenanceStore.java
>  PRE-CREATION 
>   src/main/java/org/apache/aurora/scheduler/storage/mem/MemStorage.java 
> 9f324b010db7e351e98b257d8fc8fecfeac81268 
>   src/main/java/org/apache/aurora/scheduler/storage/mem/MemStorageModule.java 
> edcea09b4d206cfddb642074237b031ad71cff13 
>   src/main/python/apache/aurora/config/schema/base.py 
> a629bcd1261e5959da0a8458a55545d4e2c2a7a5 
>   src/main/python/apache/aurora/config/thrift.py 
> 6d2dde6e964daa68bf6f0e5bbbffecc5bd8c0431 
>   src/main/python/apache/aurora/executor/executor_vars.py 
> 561f9452aedda4cc695c84a2a850bdd7e1d65dec 
>   src/test/java/org/apache/aurora/scheduler/base/TaskTestUtil.java 
> 778148a7c033cba9004954cabc33a2b1d003dccf 
>   
> src/test/java/org/apache/aurora/scheduler/storage/AbstractHostMaintenanceStoreTest.java
>  PRE-CREATION 
>   src/test/java/org/apache/aurora/scheduler/storage/backup/RecoveryTest.java 
> ba03ff94bb5fee2b09a6660a9ad759cece7449f1 
>   
> src/test/java/org/apache/aurora/scheduler/storage/durability/DataCompatibilityTest.java
>  31f9545d83a950064df646ef6ba8a95234cf89ec 
>   
> src/test/java/org/apache/aurora/scheduler/storage/durability/DurableStorageTest.java
>  3dd9ce4039b223cb6156462d089f7062a1cde772 
>   
> src/test/java/org/apache/aurora/scheduler/storage/durability/WriteRecorderTest.java
>  27c8c829cd1e417dd5e60a8e9415331ca4a7c918 
>   
> src/test/java/org/apache/aurora/scheduler/storage/log/SnapshotterImplIT.java 
> be07361a27afefa21cc2ba76ce82531a418d9814 
>   
> src/test/java/org/apache/aurora/scheduler/storage/mem/MemHostMaintenanceStoreTest.java
>  PRE-CREATION 
>   
> src/test/java/org/apache/aurora/scheduler/storage/testing/StorageTestUtil.java
>  d59118be13342da9003b0bcb97e12e477d9edf8f 
>   src/test/python/apache/aurora/client/cli/test_inspect.py 
> e4f43d0573c7862adc9bc679f4cea40cc76eac38 
>   src/test/python/apache/aurora/config/test_thrift.py 
> 8e1d0e177959af12b97bdd1cd47845b72bc12fe1 
>   
> src/test/resources/org/apache/aurora/scheduler/storage/durability/goldens/current/removeHostMaintenanceRequest
>  PRE-CREATION 
>   
> src/test/resources/org/apache/aurora/scheduler/storage/durability/goldens/current/saveCronJob
>  88e1c36a1aa2d192b95963f7aa36e243a447e4af 
>   
> src/test/resources/org/apache/aurora/scheduler/storage/durability/goldens/current/saveHostMaintenanceRequest
>  PRE-CREATION 
>   
> src/test/resources/org/apache/aurora/scheduler/storage/durability/goldens/current/saveJobUpdate
>  32fdcdacde58345cdd6c4b449b82c0c90c2b2aae 
>   
>