Re: Review Request 51536: @ReviewBot retry Scheduler updater will not use watch_sec if health check is enabled

2016-08-30 Thread Kai Huang

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/51536/
---

(Updated Aug. 30, 2016, 8:37 p.m.)


Review request for Aurora, Joshua Cohen, Maxim Khutornenko, and Zameer Manji.


Changes
---

@ReviewBot retry


Summary (updated)
-

@ReviewBot retry Scheduler updater will not use watch_sec if health check is 
enabled


Repository: aurora


Description
---

- Scheduler updater will not use watch_sec if health check is enabled
- Add unit tests for successful updates.

Assumptions on executor change:
The executor starts health check at STARTING, if a successful health check is 
performed before initial_interval_sec expires, the executor will sends a status 
message for RUNNING.

What I try to implement:
vCurrent updater will infer the executor version from the presence of a status 
message sent by the executor during RUNNING status update. 

For vPrev executor:
The vCurrent updater will use watch_sec for instance update, this is 
independent from the health check.
For vCurrent executor:
If health check is disabled on vCurrent executor, the vCurrent updater will 
use watch_sec for instance update.
If health check is enabled on vCurrent executor, the vCurrent updater will 
not use watch_sec for instance update.


Diffs (updated)
-

  src/main/java/org/apache/aurora/scheduler/updater/InstanceUpdater.java 
c129896d8cd54abd2634e2a339c27921042b0162 
  src/test/java/org/apache/aurora/scheduler/updater/InstanceUpdaterTest.java 
c78c7fbd7d600586136863c99ce3d7387895efee 

Diff: https://reviews.apache.org/r/51536/diff/


Testing (updated)
---

./gradlew build

./gradlew :test --tests 
"org.apache.aurora.scheduler.updater.InstanceUpdaterTest"

./build-support/jenkins/build.sh


Thanks,

Kai Huang



Re: Review Request 51536: @ReviewBot retry Scheduler updater will not use watch_sec if health check is enabled

2016-08-30 Thread Kai Huang

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/51536/
---

(Updated Aug. 30, 2016, 8:52 p.m.)


Review request for Aurora, Joshua Cohen, Maxim Khutornenko, and Zameer Manji.


Changes
---

Update the description of the review request, provide the JIRA link and design 
doc.


Repository: aurora


Description (updated)
---

- Scheduler updater will not use watch_sec if health check is enabled
- Add unit tests for successful updates.

This feature intends to improve reliability and performance of the Aurora 
scheduler job updater by relying on health check status rather than watch_secs 
timeout when deciding an individual instance update state. 

See this epic: https://issues.apache.org/jira/browse/AURORA-894 
and the design doc: 
https://docs.google.com/document/d/1ZdgW8S4xMhvKW7iQUX99xZm10NXSxEWR0a-21FP5d94/edit#
 for more details and background.


Assumptions on executor change:
The executor starts health check at STARTING, if a successful health check is 
performed before initial_interval_sec expires, the executor will sends a status 
message for RUNNING.

What I try to implement:
vCurrent updater will infer the executor version from the presence of a status 
message sent by the executor during RUNNING status update. 

For vPrev executor:
The vCurrent updater will use watch_sec for instance update, this is 
independent from the health check.
For vCurrent executor:
If health check is disabled on vCurrent executor, the vCurrent updater will 
use watch_sec for instance update.
If health check is enabled on vCurrent executor, the vCurrent updater will 
not use watch_sec for instance update.


Diffs
-

  src/main/java/org/apache/aurora/scheduler/updater/InstanceUpdater.java 
c129896d8cd54abd2634e2a339c27921042b0162 
  src/test/java/org/apache/aurora/scheduler/updater/InstanceUpdaterTest.java 
c78c7fbd7d600586136863c99ce3d7387895efee 

Diff: https://reviews.apache.org/r/51536/diff/


Testing
---

./gradlew build

./gradlew :test --tests 
"org.apache.aurora.scheduler.updater.InstanceUpdaterTest"

./build-support/jenkins/build.sh


Thanks,

Kai Huang



Re: Review Request 51536: @ReviewBot retry Scheduler updater will not use watch_sec if health check is enabled

2016-08-30 Thread Aurora ReviewBot

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/51536/#review147359
---


Ship it!




Master (c99f2fb) is green with this patch.
  ./build-support/jenkins/build.sh

I will refresh this build result if you post a review containing "@ReviewBot 
retry"

- Aurora ReviewBot


On Aug. 30, 2016, 8:37 p.m., Kai Huang wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/51536/
> ---
> 
> (Updated Aug. 30, 2016, 8:37 p.m.)
> 
> 
> Review request for Aurora, Joshua Cohen, Maxim Khutornenko, and Zameer Manji.
> 
> 
> Repository: aurora
> 
> 
> Description
> ---
> 
> - Scheduler updater will not use watch_sec if health check is enabled
> - Add unit tests for successful updates.
> 
> Assumptions on executor change:
> The executor starts health check at STARTING, if a successful health check is 
> performed before initial_interval_sec expires, the executor will sends a 
> status message for RUNNING.
> 
> What I try to implement:
> vCurrent updater will infer the executor version from the presence of a 
> status message sent by the executor during RUNNING status update. 
> 
> For vPrev executor:
> The vCurrent updater will use watch_sec for instance update, this is 
> independent from the health check.
> For vCurrent executor:
> If health check is disabled on vCurrent executor, the vCurrent updater 
> will use watch_sec for instance update.
> If health check is enabled on vCurrent executor, the vCurrent updater 
> will not use watch_sec for instance update.
> 
> 
> Diffs
> -
> 
>   src/main/java/org/apache/aurora/scheduler/updater/InstanceUpdater.java 
> c129896d8cd54abd2634e2a339c27921042b0162 
>   src/test/java/org/apache/aurora/scheduler/updater/InstanceUpdaterTest.java 
> c78c7fbd7d600586136863c99ce3d7387895efee 
> 
> Diff: https://reviews.apache.org/r/51536/diff/
> 
> 
> Testing
> ---
> 
> ./gradlew build
> 
> ./gradlew :test --tests 
> "org.apache.aurora.scheduler.updater.InstanceUpdaterTest"
> 
> ./build-support/jenkins/build.sh
> 
> 
> Thanks,
> 
> Kai Huang
> 
>



Re: Review Request 51536: @ReviewBot retry Scheduler updater will not use watch_sec if health check is enabled

2016-08-30 Thread Joshua Cohen

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/51536/#review147358
---




src/main/java/org/apache/aurora/scheduler/updater/InstanceUpdater.java (lines 
125 - 135)


This could be rewritten a bit more concisely as:

return actualState != null && 
actualState.getTaskEvents().stream().filter(event ->
  event.getStatus() == ScheduleStatus.RUNNING && event.isSetMessage()
).findFirst().isPresent();



src/main/java/org/apache/aurora/scheduler/updater/InstanceUpdater.java (line 
130)


Doing this based simply on the presence of a string feels brittle to me. 
Would it make sense to define some basic form of "executor capabilities" 
passing? Even if it's something simple like a comma delimited list of constants 
that we define in api.thrift (so that both the scheduler and the executor can 
reuse the values)?



src/main/java/org/apache/aurora/scheduler/updater/InstanceUpdater.java (line 
143)


We should add a constant to api.thrift for `health` and use it here as well 
as in the executor.



src/main/java/org/apache/aurora/scheduler/updater/InstanceUpdater.java (lines 
156 - 157)


formatting for this should be:

if (isWatchRunningSkippable(actualState)
&& isHealthCheckEnabled(desiredState.get())
|| appearsStable(actualState)) {
  // blank line
  ...block begins
}



src/test/java/org/apache/aurora/scheduler/updater/InstanceUpdaterTest.java 
(lines 71 - 73)


I'm confused by this comment in the context of this review. 
`initial_interval_secs` is strictly an executor concern...



src/test/java/org/apache/aurora/scheduler/updater/InstanceUpdaterTest.java 
(line 176)


I'd avoid terms like "new executor", as "new" tends to lose meaning over 
time.



src/test/java/org/apache/aurora/scheduler/updater/InstanceUpdaterTest.java 
(lines 191 - 201)


Beyond setting things up to use the new style executor, I don't think this 
test actually verifies anything due to the fact that `INITIAL_INTERVAL_SECOND` 
has the same value as `MIN_RUNNING_TIME`. Ideally we'd advance the time by some 
value less than `MIN_RUNNING_TIME` to prove that we're not waiting that long 
before declaring the task running?


- Joshua Cohen


On Aug. 30, 2016, 8:52 p.m., Kai Huang wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/51536/
> ---
> 
> (Updated Aug. 30, 2016, 8:52 p.m.)
> 
> 
> Review request for Aurora, Joshua Cohen, Maxim Khutornenko, and Zameer Manji.
> 
> 
> Repository: aurora
> 
> 
> Description
> ---
> 
> - Scheduler updater will not use watch_sec if health check is enabled
> - Add unit tests for successful updates.
> 
> This feature intends to improve reliability and performance of the Aurora 
> scheduler job updater by relying on health check status rather than 
> watch_secs timeout when deciding an individual instance update state. 
> 
> See this epic: https://issues.apache.org/jira/browse/AURORA-894 
> and the design doc: 
> https://docs.google.com/document/d/1ZdgW8S4xMhvKW7iQUX99xZm10NXSxEWR0a-21FP5d94/edit#
>  for more details and background.
> 
> 
> Assumptions on executor change:
> The executor starts health check at STARTING, if a successful health check is 
> performed before initial_interval_sec expires, the executor will sends a 
> status message for RUNNING.
> 
> What I try to implement:
> vCurrent updater will infer the executor version from the presence of a 
> status message sent by the executor during RUNNING status update. 
> 
> For vPrev executor:
> The vCurrent updater will use watch_sec for instance update, this is 
> independent from the health check.
> For vCurrent executor:
> If health check is disabled on vCurrent executor, the vCurrent updater 
> will use watch_sec for instance update.
> If health check is enabled on vCurrent executor, the vCurrent updater 
> will not use watch_sec for instance update.
> 
> 
> Diffs
> -
> 
>   src/main/java/org/apache/aurora/scheduler/updater/InstanceUpdater.java 
> c129896d8cd54abd2634e2a339c27921042b0162 
>   src/test/java/org/apache/aurora/scheduler/updater/InstanceUpdaterTest.java 
> c78c7fbd7d600586136863c99ce3d7387895efee 
> 
> Diff: https://reviews.apache.org/r/51536/diff/
> 
> 
> Testing
> ---
> 
> ./gradlew build
> 
> ./gradlew :test --tests 
> "org.apac

Re: Review Request 51536: @ReviewBot retry Scheduler updater will not use watch_sec if health check is enabled

2016-08-30 Thread Kai Huang


> On Aug. 30, 2016, 9:25 p.m., Joshua Cohen wrote:
> > src/test/java/org/apache/aurora/scheduler/updater/InstanceUpdaterTest.java, 
> > lines 191-201
> > 
> >
> > Beyond setting things up to use the new style executor, I don't think 
> > this test actually verifies anything due to the fact that 
> > `INITIAL_INTERVAL_SECOND` has the same value as `MIN_RUNNING_TIME`. Ideally 
> > we'd advance the time by some value less than `MIN_RUNNING_TIME` to prove 
> > that we're not waiting that long before declaring the task running?

I agree that making initial_interval_sec shorter than min_running_time makes 
more sense.

In this test, I'm also testing that:
   A RUNNING status update will causes the updater transit into SUCCEEDED 
state, rather than EVALUATE_ON_STATE_CHANGE.


- Kai


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/51536/#review147358
---


On Aug. 30, 2016, 8:52 p.m., Kai Huang wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/51536/
> ---
> 
> (Updated Aug. 30, 2016, 8:52 p.m.)
> 
> 
> Review request for Aurora, Joshua Cohen, Maxim Khutornenko, and Zameer Manji.
> 
> 
> Repository: aurora
> 
> 
> Description
> ---
> 
> - Scheduler updater will not use watch_sec if health check is enabled
> - Add unit tests for successful updates.
> 
> This feature intends to improve reliability and performance of the Aurora 
> scheduler job updater by relying on health check status rather than 
> watch_secs timeout when deciding an individual instance update state. 
> 
> See this epic: https://issues.apache.org/jira/browse/AURORA-894 
> and the design doc: 
> https://docs.google.com/document/d/1ZdgW8S4xMhvKW7iQUX99xZm10NXSxEWR0a-21FP5d94/edit#
>  for more details and background.
> 
> 
> Assumptions on executor change:
> The executor starts health check at STARTING, if a successful health check is 
> performed before initial_interval_sec expires, the executor will sends a 
> status message for RUNNING.
> 
> What I try to implement:
> vCurrent updater will infer the executor version from the presence of a 
> status message sent by the executor during RUNNING status update. 
> 
> For vPrev executor:
> The vCurrent updater will use watch_sec for instance update, this is 
> independent from the health check.
> For vCurrent executor:
> If health check is disabled on vCurrent executor, the vCurrent updater 
> will use watch_sec for instance update.
> If health check is enabled on vCurrent executor, the vCurrent updater 
> will not use watch_sec for instance update.
> 
> 
> Diffs
> -
> 
>   src/main/java/org/apache/aurora/scheduler/updater/InstanceUpdater.java 
> c129896d8cd54abd2634e2a339c27921042b0162 
>   src/test/java/org/apache/aurora/scheduler/updater/InstanceUpdaterTest.java 
> c78c7fbd7d600586136863c99ce3d7387895efee 
> 
> Diff: https://reviews.apache.org/r/51536/diff/
> 
> 
> Testing
> ---
> 
> ./gradlew build
> 
> ./gradlew :test --tests 
> "org.apache.aurora.scheduler.updater.InstanceUpdaterTest"
> 
> ./build-support/jenkins/build.sh
> 
> 
> Thanks,
> 
> Kai Huang
> 
>



Re: Review Request 51536: @ReviewBot retry Scheduler updater will not use watch_sec if health check is enabled

2016-08-30 Thread Stephan Erb


> On Aug. 30, 2016, 11:25 p.m., Joshua Cohen wrote:
> > src/main/java/org/apache/aurora/scheduler/updater/InstanceUpdater.java, 
> > line 143
> > 
> >
> > We should add a constant to api.thrift for `health` and use it here as 
> > well as in the executor.

We need to look for a more general solution here. The presence of a health port 
does not account for the potential existence of a `.healthchecksnooze` file 
disabling health checks.


- Stephan


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/51536/#review147358
---


On Aug. 30, 2016, 10:52 p.m., Kai Huang wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/51536/
> ---
> 
> (Updated Aug. 30, 2016, 10:52 p.m.)
> 
> 
> Review request for Aurora, Joshua Cohen, Maxim Khutornenko, and Zameer Manji.
> 
> 
> Repository: aurora
> 
> 
> Description
> ---
> 
> - Scheduler updater will not use watch_sec if health check is enabled
> - Add unit tests for successful updates.
> 
> This feature intends to improve reliability and performance of the Aurora 
> scheduler job updater by relying on health check status rather than 
> watch_secs timeout when deciding an individual instance update state. 
> 
> See this epic: https://issues.apache.org/jira/browse/AURORA-894 
> and the design doc: 
> https://docs.google.com/document/d/1ZdgW8S4xMhvKW7iQUX99xZm10NXSxEWR0a-21FP5d94/edit#
>  for more details and background.
> 
> 
> Assumptions on executor change:
> The executor starts health check at STARTING, if a successful health check is 
> performed before initial_interval_sec expires, the executor will sends a 
> status message for RUNNING.
> 
> What I try to implement:
> vCurrent updater will infer the executor version from the presence of a 
> status message sent by the executor during RUNNING status update. 
> 
> For vPrev executor:
> The vCurrent updater will use watch_sec for instance update, this is 
> independent from the health check.
> For vCurrent executor:
> If health check is disabled on vCurrent executor, the vCurrent updater 
> will use watch_sec for instance update.
> If health check is enabled on vCurrent executor, the vCurrent updater 
> will not use watch_sec for instance update.
> 
> 
> Diffs
> -
> 
>   src/main/java/org/apache/aurora/scheduler/updater/InstanceUpdater.java 
> c129896d8cd54abd2634e2a339c27921042b0162 
>   src/test/java/org/apache/aurora/scheduler/updater/InstanceUpdaterTest.java 
> c78c7fbd7d600586136863c99ce3d7387895efee 
> 
> Diff: https://reviews.apache.org/r/51536/diff/
> 
> 
> Testing
> ---
> 
> ./gradlew build
> 
> ./gradlew :test --tests 
> "org.apache.aurora.scheduler.updater.InstanceUpdaterTest"
> 
> ./build-support/jenkins/build.sh
> 
> 
> Thanks,
> 
> Kai Huang
> 
>



Re: Review Request 51536: @ReviewBot retry Scheduler updater will not use watch_sec if health check is enabled

2016-08-30 Thread Kai Huang


> On Aug. 30, 2016, 9:25 p.m., Joshua Cohen wrote:
> > src/main/java/org/apache/aurora/scheduler/updater/InstanceUpdater.java, 
> > line 143
> > 
> >
> > We should add a constant to api.thrift for `health` and use it here as 
> > well as in the executor.
> 
> Stephan Erb wrote:
> We need to look for a more general solution here. The presence of a 
> health port does not account for the potential existence of a 
> `.healthchecksnooze` file disabling health checks.

Thanks, that's a good point. We could combine this logic into "executor 
capabilities" as Joshua mentioned.


- Kai


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/51536/#review147358
---


On Aug. 30, 2016, 8:52 p.m., Kai Huang wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/51536/
> ---
> 
> (Updated Aug. 30, 2016, 8:52 p.m.)
> 
> 
> Review request for Aurora, Joshua Cohen, Maxim Khutornenko, and Zameer Manji.
> 
> 
> Repository: aurora
> 
> 
> Description
> ---
> 
> - Scheduler updater will not use watch_sec if health check is enabled
> - Add unit tests for successful updates.
> 
> This feature intends to improve reliability and performance of the Aurora 
> scheduler job updater by relying on health check status rather than 
> watch_secs timeout when deciding an individual instance update state. 
> 
> See this epic: https://issues.apache.org/jira/browse/AURORA-894 
> and the design doc: 
> https://docs.google.com/document/d/1ZdgW8S4xMhvKW7iQUX99xZm10NXSxEWR0a-21FP5d94/edit#
>  for more details and background.
> 
> 
> Assumptions on executor change:
> The executor starts health check at STARTING, if a successful health check is 
> performed before initial_interval_sec expires, the executor will sends a 
> status message for RUNNING.
> 
> What I try to implement:
> vCurrent updater will infer the executor version from the presence of a 
> status message sent by the executor during RUNNING status update. 
> 
> For vPrev executor:
> The vCurrent updater will use watch_sec for instance update, this is 
> independent from the health check.
> For vCurrent executor:
> If health check is disabled on vCurrent executor, the vCurrent updater 
> will use watch_sec for instance update.
> If health check is enabled on vCurrent executor, the vCurrent updater 
> will not use watch_sec for instance update.
> 
> 
> Diffs
> -
> 
>   src/main/java/org/apache/aurora/scheduler/updater/InstanceUpdater.java 
> c129896d8cd54abd2634e2a339c27921042b0162 
>   src/test/java/org/apache/aurora/scheduler/updater/InstanceUpdaterTest.java 
> c78c7fbd7d600586136863c99ce3d7387895efee 
> 
> Diff: https://reviews.apache.org/r/51536/diff/
> 
> 
> Testing
> ---
> 
> ./gradlew build
> 
> ./gradlew :test --tests 
> "org.apache.aurora.scheduler.updater.InstanceUpdaterTest"
> 
> ./build-support/jenkins/build.sh
> 
> 
> Thanks,
> 
> Kai Huang
> 
>



Re: Review Request 51536: @ReviewBot retry Scheduler updater will not use watch_sec if health check is enabled

2016-08-31 Thread Kai Huang


> On Aug. 30, 2016, 9:25 p.m., Joshua Cohen wrote:
> > src/main/java/org/apache/aurora/scheduler/updater/InstanceUpdater.java, 
> > line 130
> > 
> >
> > Doing this based simply on the presence of a string feels brittle to 
> > me. Would it make sense to define some basic form of "executor 
> > capabilities" passing? Even if it's something simple like a comma delimited 
> > list of constants that we define in api.thrift (so that both the scheduler 
> > and the executor can reuse the values)?

Do you mean adding executor capabilities as one attribute of TaskEvent in 
api.thrift? 

I think on the executor side , we can put (isHealthCheckEnabled() && 
isExecutorUpdated()) into one attribute called "watch_sec_skippable", and send 
it to the scheduler.


- Kai


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/51536/#review147358
---


On Aug. 30, 2016, 8:52 p.m., Kai Huang wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/51536/
> ---
> 
> (Updated Aug. 30, 2016, 8:52 p.m.)
> 
> 
> Review request for Aurora, Joshua Cohen, Maxim Khutornenko, and Zameer Manji.
> 
> 
> Repository: aurora
> 
> 
> Description
> ---
> 
> - Scheduler updater will not use watch_sec if health check is enabled
> - Add unit tests for successful updates.
> 
> This feature intends to improve reliability and performance of the Aurora 
> scheduler job updater by relying on health check status rather than 
> watch_secs timeout when deciding an individual instance update state. 
> 
> See this epic: https://issues.apache.org/jira/browse/AURORA-894 
> and the design doc: 
> https://docs.google.com/document/d/1ZdgW8S4xMhvKW7iQUX99xZm10NXSxEWR0a-21FP5d94/edit#
>  for more details and background.
> 
> 
> Assumptions on executor change:
> The executor starts health check at STARTING, if a successful health check is 
> performed before initial_interval_sec expires, the executor will sends a 
> status message for RUNNING.
> 
> What I try to implement:
> vCurrent updater will infer the executor version from the presence of a 
> status message sent by the executor during RUNNING status update. 
> 
> For vPrev executor:
> The vCurrent updater will use watch_sec for instance update, this is 
> independent from the health check.
> For vCurrent executor:
> If health check is disabled on vCurrent executor, the vCurrent updater 
> will use watch_sec for instance update.
> If health check is enabled on vCurrent executor, the vCurrent updater 
> will not use watch_sec for instance update.
> 
> 
> Diffs
> -
> 
>   src/main/java/org/apache/aurora/scheduler/updater/InstanceUpdater.java 
> c129896d8cd54abd2634e2a339c27921042b0162 
>   src/test/java/org/apache/aurora/scheduler/updater/InstanceUpdaterTest.java 
> c78c7fbd7d600586136863c99ce3d7387895efee 
> 
> Diff: https://reviews.apache.org/r/51536/diff/
> 
> 
> Testing
> ---
> 
> ./gradlew build
> 
> ./gradlew :test --tests 
> "org.apache.aurora.scheduler.updater.InstanceUpdaterTest"
> 
> ./build-support/jenkins/build.sh
> 
> 
> Thanks,
> 
> Kai Huang
> 
>



Re: Review Request 51536: @ReviewBot retry Scheduler updater will not use watch_sec if health check is enabled

2016-08-31 Thread Zameer Manji


> On Aug. 30, 2016, 2:25 p.m., Joshua Cohen wrote:
> > src/main/java/org/apache/aurora/scheduler/updater/InstanceUpdater.java, 
> > line 130
> > 
> >
> > Doing this based simply on the presence of a string feels brittle to 
> > me. Would it make sense to define some basic form of "executor 
> > capabilities" passing? Even if it's something simple like a comma delimited 
> > list of constants that we define in api.thrift (so that both the scheduler 
> > and the executor can reuse the values)?
> 
> Kai Huang wrote:
> Do you mean adding executor capabilities as one attribute of TaskEvent in 
> api.thrift? 
> 
> I think on the executor side , we can put (isHealthCheckEnabled() && 
> isExecutorUpdated()) into one attribute called "watch_sec_skippable", and 
> send it to the scheduler.

"I think on the executor side , we can put (isHealthCheckEnabled() && 
isExecutorUpdated()) into one attribute called "watch_sec_skippable", and send 
it to the scheduler."

We currently have no communication via the executor and scheduler outside of 
status updates. If you want to do this, please discuss this idea on the mailing 
list. The design doc makes no reference to executor capabilities or any 
communication between the executor and scheduler.


- Zameer


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/51536/#review147358
---


On Aug. 30, 2016, 1:52 p.m., Kai Huang wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/51536/
> ---
> 
> (Updated Aug. 30, 2016, 1:52 p.m.)
> 
> 
> Review request for Aurora, Joshua Cohen, Maxim Khutornenko, and Zameer Manji.
> 
> 
> Repository: aurora
> 
> 
> Description
> ---
> 
> - Scheduler updater will not use watch_sec if health check is enabled
> - Add unit tests for successful updates.
> 
> This feature intends to improve reliability and performance of the Aurora 
> scheduler job updater by relying on health check status rather than 
> watch_secs timeout when deciding an individual instance update state. 
> 
> See this epic: https://issues.apache.org/jira/browse/AURORA-894 
> and the design doc: 
> https://docs.google.com/document/d/1ZdgW8S4xMhvKW7iQUX99xZm10NXSxEWR0a-21FP5d94/edit#
>  for more details and background.
> 
> 
> Assumptions on executor change:
> The executor starts health check at STARTING, if a successful health check is 
> performed before initial_interval_sec expires, the executor will sends a 
> status message for RUNNING.
> 
> What I try to implement:
> vCurrent updater will infer the executor version from the presence of a 
> status message sent by the executor during RUNNING status update. 
> 
> For vPrev executor:
> The vCurrent updater will use watch_sec for instance update, this is 
> independent from the health check.
> For vCurrent executor:
> If health check is disabled on vCurrent executor, the vCurrent updater 
> will use watch_sec for instance update.
> If health check is enabled on vCurrent executor, the vCurrent updater 
> will not use watch_sec for instance update.
> 
> 
> Diffs
> -
> 
>   src/main/java/org/apache/aurora/scheduler/updater/InstanceUpdater.java 
> c129896d8cd54abd2634e2a339c27921042b0162 
>   src/test/java/org/apache/aurora/scheduler/updater/InstanceUpdaterTest.java 
> c78c7fbd7d600586136863c99ce3d7387895efee 
> 
> Diff: https://reviews.apache.org/r/51536/diff/
> 
> 
> Testing
> ---
> 
> ./gradlew build
> 
> ./gradlew :test --tests 
> "org.apache.aurora.scheduler.updater.InstanceUpdaterTest"
> 
> ./build-support/jenkins/build.sh
> 
> 
> Thanks,
> 
> Kai Huang
> 
>



Re: Review Request 51536: @ReviewBot retry Scheduler updater will not use watch_sec if health check is enabled

2016-08-31 Thread Maxim Khutornenko


> On Aug. 30, 2016, 9:25 p.m., Joshua Cohen wrote:
> > src/main/java/org/apache/aurora/scheduler/updater/InstanceUpdater.java, 
> > line 143
> > 
> >
> > We should add a constant to api.thrift for `health` and use it here as 
> > well as in the executor.
> 
> Stephan Erb wrote:
> We need to look for a more general solution here. The presence of a 
> health port does not account for the potential existence of a 
> `.healthchecksnooze` file disabling health checks.
> 
> Kai Huang wrote:
> Thanks, that's a good point. We could combine this logic into "executor 
> capabilities" as Joshua mentioned.

I disagree that we should account for the `.healthchecksnooze` for this 
feature. This approach has always been considered extremely risky and "proceed 
at your own risk" way to debug end user issues. If someone drops a 
`.healthchecksnooze` _after_ task entring `STARTING` and _before_ it reaches 
`RUNNING` I believe the following should be enough to address such case (from 
design doc):

> 2. Instance fails to respond OK and the initial_interval_secs expires ? task 
> moves into FAILED.


> On Aug. 30, 2016, 9:25 p.m., Joshua Cohen wrote:
> > src/main/java/org/apache/aurora/scheduler/updater/InstanceUpdater.java, 
> > line 130
> > 
> >
> > Doing this based simply on the presence of a string feels brittle to 
> > me. Would it make sense to define some basic form of "executor 
> > capabilities" passing? Even if it's something simple like a comma delimited 
> > list of constants that we define in api.thrift (so that both the scheduler 
> > and the executor can reuse the values)?
> 
> Kai Huang wrote:
> Do you mean adding executor capabilities as one attribute of TaskEvent in 
> api.thrift? 
> 
> I think on the executor side , we can put (isHealthCheckEnabled() && 
> isExecutorUpdated()) into one attribute called "watch_sec_skippable", and 
> send it to the scheduler.
> 
> Zameer Manji wrote:
> "I think on the executor side , we can put (isHealthCheckEnabled() && 
> isExecutorUpdated()) into one attribute called "watch_sec_skippable", and 
> send it to the scheduler."
> 
> We currently have no communication via the executor and scheduler outside 
> of status updates. If you want to do this, please discuss this idea on the 
> mailing list. The design doc makes no reference to executor capabilities or 
> any communication between the executor and scheduler.

Joshua, how did you see the "executor capabilities" working here? As Zameer 
pointed out, there is currently no way for us to communicate real-time executor 
capabilities to the scheduler. I don't think adding this type of info to every 
status update justifies the amount of work and additional storage space that 
would require. Besides, we'd have to deprecate this capability and remove from 
the schema in the next release or two. 

I agree relying on the presence of the message may appear brittle at first but 
as far as I can see it's the most straightforward approach that will not 
require any special handling or deprecation followups. We are in full control 
of the executor and it's guaranteed through code the message will remain absent 
for any V-n versions running out there.


- Maxim


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/51536/#review147358
---


On Aug. 30, 2016, 8:52 p.m., Kai Huang wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/51536/
> ---
> 
> (Updated Aug. 30, 2016, 8:52 p.m.)
> 
> 
> Review request for Aurora, Joshua Cohen, Maxim Khutornenko, and Zameer Manji.
> 
> 
> Repository: aurora
> 
> 
> Description
> ---
> 
> - Scheduler updater will not use watch_sec if health check is enabled
> - Add unit tests for successful updates.
> 
> This feature intends to improve reliability and performance of the Aurora 
> scheduler job updater by relying on health check status rather than 
> watch_secs timeout when deciding an individual instance update state. 
> 
> See this epic: https://issues.apache.org/jira/browse/AURORA-894 
> and the design doc: 
> https://docs.google.com/document/d/1ZdgW8S4xMhvKW7iQUX99xZm10NXSxEWR0a-21FP5d94/edit#
>  for more details and background.
> 
> 
> Assumptions on executor change:
> The executor starts health check at STARTING, if a successful health check is 
> performed before initial_interval_sec expires, the executor will sends a 
> status message for RUNNING.
> 
> What I try to implement:
> vCurrent updater will infer the executor version from the presence of a 
> status message sent by the executor

Re: Review Request 51536: @ReviewBot retry Scheduler updater will not use watch_sec if health check is enabled

2016-08-31 Thread Kai Huang

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/51536/
---

(Updated Aug. 31, 2016, 10:08 p.m.)


Review request for Aurora, Joshua Cohen, Maxim Khutornenko, and Zameer Manji.


Changes
---

Add a constant to api.thrift for health.

Rename a couple of method names in InstanceUpdaterTest.

Some style improvements.


Repository: aurora


Description
---

- Scheduler updater will not use watch_sec if health check is enabled
- Add unit tests for successful updates.

This feature intends to improve reliability and performance of the Aurora 
scheduler job updater by relying on health check status rather than watch_secs 
timeout when deciding an individual instance update state. 

See this epic: https://issues.apache.org/jira/browse/AURORA-894 
and the design doc: 
https://docs.google.com/document/d/1ZdgW8S4xMhvKW7iQUX99xZm10NXSxEWR0a-21FP5d94/edit#
 for more details and background.


Assumptions on executor change:
The executor starts health check at STARTING, if a successful health check is 
performed before initial_interval_sec expires, the executor will sends a status 
message for RUNNING.

What I try to implement:
vCurrent updater will infer the executor version from the presence of a status 
message sent by the executor during RUNNING status update. 

For vPrev executor:
The vCurrent updater will use watch_sec for instance update, this is 
independent from the health check.
For vCurrent executor:
If health check is disabled on vCurrent executor, the vCurrent updater will 
use watch_sec for instance update.
If health check is enabled on vCurrent executor, the vCurrent updater will 
not use watch_sec for instance update.


Diffs (updated)
-

  api/src/main/thrift/org/apache/aurora/gen/api.thrift 
c5765b70501c101f0535b4eed94e9948c36808f9 
  src/main/java/org/apache/aurora/scheduler/updater/InstanceUpdater.java 
c129896d8cd54abd2634e2a339c27921042b0162 
  src/test/java/org/apache/aurora/scheduler/updater/InstanceUpdaterTest.java 
c78c7fbd7d600586136863c99ce3d7387895efee 

Diff: https://reviews.apache.org/r/51536/diff/


Testing
---

./gradlew build

./gradlew :test --tests 
"org.apache.aurora.scheduler.updater.InstanceUpdaterTest"

./build-support/jenkins/build.sh


Thanks,

Kai Huang



Re: Review Request 51536: @ReviewBot retry Scheduler updater will not use watch_sec if health check is enabled

2016-08-31 Thread Aurora ReviewBot

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/51536/#review147498
---


Ship it!




Master (bd11b1c) is green with this patch.
  ./build-support/jenkins/build.sh

I will refresh this build result if you post a review containing "@ReviewBot 
retry"

- Aurora ReviewBot


On Aug. 31, 2016, 10:08 p.m., Kai Huang wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/51536/
> ---
> 
> (Updated Aug. 31, 2016, 10:08 p.m.)
> 
> 
> Review request for Aurora, Joshua Cohen, Maxim Khutornenko, and Zameer Manji.
> 
> 
> Repository: aurora
> 
> 
> Description
> ---
> 
> - Scheduler updater will not use watch_sec if health check is enabled
> - Add unit tests for successful updates.
> 
> This feature intends to improve reliability and performance of the Aurora 
> scheduler job updater by relying on health check status rather than 
> watch_secs timeout when deciding an individual instance update state. 
> 
> See this epic: https://issues.apache.org/jira/browse/AURORA-894 
> and the design doc: 
> https://docs.google.com/document/d/1ZdgW8S4xMhvKW7iQUX99xZm10NXSxEWR0a-21FP5d94/edit#
>  for more details and background.
> 
> 
> Assumptions on executor change:
> The executor starts health check at STARTING, if a successful health check is 
> performed before initial_interval_sec expires, the executor will sends a 
> status message for RUNNING.
> 
> What I try to implement:
> vCurrent updater will infer the executor version from the presence of a 
> status message sent by the executor during RUNNING status update. 
> 
> For vPrev executor:
> The vCurrent updater will use watch_sec for instance update, this is 
> independent from the health check.
> For vCurrent executor:
> If health check is disabled on vCurrent executor, the vCurrent updater 
> will use watch_sec for instance update.
> If health check is enabled on vCurrent executor, the vCurrent updater 
> will not use watch_sec for instance update.
> 
> 
> Diffs
> -
> 
>   api/src/main/thrift/org/apache/aurora/gen/api.thrift 
> c5765b70501c101f0535b4eed94e9948c36808f9 
>   src/main/java/org/apache/aurora/scheduler/updater/InstanceUpdater.java 
> c129896d8cd54abd2634e2a339c27921042b0162 
>   src/test/java/org/apache/aurora/scheduler/updater/InstanceUpdaterTest.java 
> c78c7fbd7d600586136863c99ce3d7387895efee 
> 
> Diff: https://reviews.apache.org/r/51536/diff/
> 
> 
> Testing
> ---
> 
> ./gradlew build
> 
> ./gradlew :test --tests 
> "org.apache.aurora.scheduler.updater.InstanceUpdaterTest"
> 
> ./build-support/jenkins/build.sh
> 
> 
> Thanks,
> 
> Kai Huang
> 
>



Re: Review Request 51536: @ReviewBot retry Scheduler updater will not use watch_sec if health check is enabled

2016-08-31 Thread Joshua Cohen


> On Aug. 30, 2016, 9:25 p.m., Joshua Cohen wrote:
> > src/main/java/org/apache/aurora/scheduler/updater/InstanceUpdater.java, 
> > line 130
> > 
> >
> > Doing this based simply on the presence of a string feels brittle to 
> > me. Would it make sense to define some basic form of "executor 
> > capabilities" passing? Even if it's something simple like a comma delimited 
> > list of constants that we define in api.thrift (so that both the scheduler 
> > and the executor can reuse the values)?
> 
> Kai Huang wrote:
> Do you mean adding executor capabilities as one attribute of TaskEvent in 
> api.thrift? 
> 
> I think on the executor side , we can put (isHealthCheckEnabled() && 
> isExecutorUpdated()) into one attribute called "watch_sec_skippable", and 
> send it to the scheduler.
> 
> Zameer Manji wrote:
> "I think on the executor side , we can put (isHealthCheckEnabled() && 
> isExecutorUpdated()) into one attribute called "watch_sec_skippable", and 
> send it to the scheduler."
> 
> We currently have no communication via the executor and scheduler outside 
> of status updates. If you want to do this, please discuss this idea on the 
> mailing list. The design doc makes no reference to executor capabilities or 
> any communication between the executor and scheduler.
> 
> Maxim Khutornenko wrote:
> Joshua, how did you see the "executor capabilities" working here? As 
> Zameer pointed out, there is currently no way for us to communicate real-time 
> executor capabilities to the scheduler. I don't think adding this type of 
> info to every status update justifies the amount of work and additional 
> storage space that would require. Besides, we'd have to deprecate this 
> capability and remove from the schema in the next release or two. 
> 
> I agree relying on the presence of the message may appear brittle at 
> first but as far as I can see it's the most straightforward approach that 
> will not require any special handling or deprecation followups. We are in 
> full control of the executor and it's guaranteed through code the message 
> will remain absent for any V-n versions running out there.

I was thinking we would just do it via status updates, as it being done here. 
We don't need to set it on every status update or even store it. Essentially 
I'm just saying instead of relying on the presence of an arbitrary string in 
the message, rely on the presence of a string like: 
"capabilities:CAPABILITY_1,CAPABILITY-2" where `CAPABILITY_1` and 
`CAPABILITY_2` (etc.) are constants defined in api.thrift. Basically just 
formalizing the mechanism and making it a bit more future proof.


- Joshua


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/51536/#review147358
---


On Aug. 31, 2016, 10:08 p.m., Kai Huang wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/51536/
> ---
> 
> (Updated Aug. 31, 2016, 10:08 p.m.)
> 
> 
> Review request for Aurora, Joshua Cohen, Maxim Khutornenko, and Zameer Manji.
> 
> 
> Repository: aurora
> 
> 
> Description
> ---
> 
> - Scheduler updater will not use watch_sec if health check is enabled
> - Add unit tests for successful updates.
> 
> This feature intends to improve reliability and performance of the Aurora 
> scheduler job updater by relying on health check status rather than 
> watch_secs timeout when deciding an individual instance update state. 
> 
> See this epic: https://issues.apache.org/jira/browse/AURORA-894 
> and the design doc: 
> https://docs.google.com/document/d/1ZdgW8S4xMhvKW7iQUX99xZm10NXSxEWR0a-21FP5d94/edit#
>  for more details and background.
> 
> 
> Assumptions on executor change:
> The executor starts health check at STARTING, if a successful health check is 
> performed before initial_interval_sec expires, the executor will sends a 
> status message for RUNNING.
> 
> What I try to implement:
> vCurrent updater will infer the executor version from the presence of a 
> status message sent by the executor during RUNNING status update. 
> 
> For vPrev executor:
> The vCurrent updater will use watch_sec for instance update, this is 
> independent from the health check.
> For vCurrent executor:
> If health check is disabled on vCurrent executor, the vCurrent updater 
> will use watch_sec for instance update.
> If health check is enabled on vCurrent executor, the vCurrent updater 
> will not use watch_sec for instance update.
> 
> 
> Diffs
> -
> 
>   api/src/main/thrift/org/apache/aurora/gen/api.thrift 
> c5765b70501c101f0535b4eed94e9948c36808f9 
>   src/main/java/org/apache/aurora/scheduler/updater/Ins

Re: Review Request 51536: @ReviewBot retry Scheduler updater will not use watch_sec if health check is enabled

2016-09-01 Thread Zameer Manji

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/51536/#review147597
---




api/src/main/thrift/org/apache/aurora/gen/api.thrift (line 467)


This is not sufficent to determine if healthchecking occurs.

We now support shell healthchecking, so a job may not have any port named 
health but it will still have it's health checked by thermos.

Why can't enabling of this feature be a property of the Job or Update?


- Zameer Manji


On Aug. 31, 2016, 3:08 p.m., Kai Huang wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/51536/
> ---
> 
> (Updated Aug. 31, 2016, 3:08 p.m.)
> 
> 
> Review request for Aurora, Joshua Cohen, Maxim Khutornenko, and Zameer Manji.
> 
> 
> Repository: aurora
> 
> 
> Description
> ---
> 
> - Scheduler updater will not use watch_sec if health check is enabled
> - Add unit tests for successful updates.
> 
> This feature intends to improve reliability and performance of the Aurora 
> scheduler job updater by relying on health check status rather than 
> watch_secs timeout when deciding an individual instance update state. 
> 
> See this epic: https://issues.apache.org/jira/browse/AURORA-894 
> and the design doc: 
> https://docs.google.com/document/d/1ZdgW8S4xMhvKW7iQUX99xZm10NXSxEWR0a-21FP5d94/edit#
>  for more details and background.
> 
> 
> Assumptions on executor change:
> The executor starts health check at STARTING, if a successful health check is 
> performed before initial_interval_sec expires, the executor will sends a 
> status message for RUNNING.
> 
> What I try to implement:
> vCurrent updater will infer the executor version from the presence of a 
> status message sent by the executor during RUNNING status update. 
> 
> For vPrev executor:
> The vCurrent updater will use watch_sec for instance update, this is 
> independent from the health check.
> For vCurrent executor:
> If health check is disabled on vCurrent executor, the vCurrent updater 
> will use watch_sec for instance update.
> If health check is enabled on vCurrent executor, the vCurrent updater 
> will not use watch_sec for instance update.
> 
> 
> Diffs
> -
> 
>   api/src/main/thrift/org/apache/aurora/gen/api.thrift 
> c5765b70501c101f0535b4eed94e9948c36808f9 
>   src/main/java/org/apache/aurora/scheduler/updater/InstanceUpdater.java 
> c129896d8cd54abd2634e2a339c27921042b0162 
>   src/test/java/org/apache/aurora/scheduler/updater/InstanceUpdaterTest.java 
> c78c7fbd7d600586136863c99ce3d7387895efee 
> 
> Diff: https://reviews.apache.org/r/51536/diff/
> 
> 
> Testing
> ---
> 
> ./gradlew build
> 
> ./gradlew :test --tests 
> "org.apache.aurora.scheduler.updater.InstanceUpdaterTest"
> 
> ./build-support/jenkins/build.sh
> 
> 
> Thanks,
> 
> Kai Huang
> 
>



Re: Review Request 51536: @ReviewBot retry Scheduler updater will not use watch_sec if health check is enabled

2016-09-01 Thread Dmitriy Shirchenko

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/51536/#review147599
---




src/main/java/org/apache/aurora/scheduler/updater/InstanceUpdater.java (lines 
126 - 131)


+1 on @zmanji's comment. 
https://github.com/apache/aurora/blob/master/docs/reference/configuration.md#shellhealthchecker-objects
 could be used which does not need ports nor will necessarily contain 
HEALTH_PORT_NAME if requested.


- Dmitriy Shirchenko


On Aug. 31, 2016, 10:08 p.m., Kai Huang wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/51536/
> ---
> 
> (Updated Aug. 31, 2016, 10:08 p.m.)
> 
> 
> Review request for Aurora, Joshua Cohen, Maxim Khutornenko, and Zameer Manji.
> 
> 
> Repository: aurora
> 
> 
> Description
> ---
> 
> - Scheduler updater will not use watch_sec if health check is enabled
> - Add unit tests for successful updates.
> 
> This feature intends to improve reliability and performance of the Aurora 
> scheduler job updater by relying on health check status rather than 
> watch_secs timeout when deciding an individual instance update state. 
> 
> See this epic: https://issues.apache.org/jira/browse/AURORA-894 
> and the design doc: 
> https://docs.google.com/document/d/1ZdgW8S4xMhvKW7iQUX99xZm10NXSxEWR0a-21FP5d94/edit#
>  for more details and background.
> 
> 
> Assumptions on executor change:
> The executor starts health check at STARTING, if a successful health check is 
> performed before initial_interval_sec expires, the executor will sends a 
> status message for RUNNING.
> 
> What I try to implement:
> vCurrent updater will infer the executor version from the presence of a 
> status message sent by the executor during RUNNING status update. 
> 
> For vPrev executor:
> The vCurrent updater will use watch_sec for instance update, this is 
> independent from the health check.
> For vCurrent executor:
> If health check is disabled on vCurrent executor, the vCurrent updater 
> will use watch_sec for instance update.
> If health check is enabled on vCurrent executor, the vCurrent updater 
> will not use watch_sec for instance update.
> 
> 
> Diffs
> -
> 
>   api/src/main/thrift/org/apache/aurora/gen/api.thrift 
> c5765b70501c101f0535b4eed94e9948c36808f9 
>   src/main/java/org/apache/aurora/scheduler/updater/InstanceUpdater.java 
> c129896d8cd54abd2634e2a339c27921042b0162 
>   src/test/java/org/apache/aurora/scheduler/updater/InstanceUpdaterTest.java 
> c78c7fbd7d600586136863c99ce3d7387895efee 
> 
> Diff: https://reviews.apache.org/r/51536/diff/
> 
> 
> Testing
> ---
> 
> ./gradlew build
> 
> ./gradlew :test --tests 
> "org.apache.aurora.scheduler.updater.InstanceUpdaterTest"
> 
> ./build-support/jenkins/build.sh
> 
> 
> Thanks,
> 
> Kai Huang
> 
>



Re: Review Request 51536: @ReviewBot retry Scheduler updater will not use watch_sec if health check is enabled

2016-09-01 Thread Kai Huang


> On Sept. 1, 2016, 7:51 p.m., Zameer Manji wrote:
> > api/src/main/thrift/org/apache/aurora/gen/api.thrift, line 467
> > 
> >
> > This is not sufficent to determine if healthchecking occurs.
> > 
> > We now support shell healthchecking, so a job may not have any port 
> > named health but it will still have it's health checked by thermos.
> > 
> > Why can't enabling of this feature be a property of the Job or Update?

We can infer if the health check is enabled from the Job Config, so I think 
rather than modifying the Job, we should modify the Update to direct the 
Instance Updater to skip watch_secs(e.g. creating a boolean named 
"isWatchSecsSkippable" in InstanceUpdater).


- Kai


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/51536/#review147597
---


On Aug. 31, 2016, 10:08 p.m., Kai Huang wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/51536/
> ---
> 
> (Updated Aug. 31, 2016, 10:08 p.m.)
> 
> 
> Review request for Aurora, Joshua Cohen, Maxim Khutornenko, and Zameer Manji.
> 
> 
> Repository: aurora
> 
> 
> Description
> ---
> 
> - Scheduler updater will not use watch_sec if health check is enabled
> - Add unit tests for successful updates.
> 
> This feature intends to improve reliability and performance of the Aurora 
> scheduler job updater by relying on health check status rather than 
> watch_secs timeout when deciding an individual instance update state. 
> 
> See this epic: https://issues.apache.org/jira/browse/AURORA-894 
> and the design doc: 
> https://docs.google.com/document/d/1ZdgW8S4xMhvKW7iQUX99xZm10NXSxEWR0a-21FP5d94/edit#
>  for more details and background.
> 
> 
> Assumptions on executor change:
> The executor starts health check at STARTING, if a successful health check is 
> performed before initial_interval_sec expires, the executor will sends a 
> status message for RUNNING.
> 
> What I try to implement:
> vCurrent updater will infer the executor version from the presence of a 
> status message sent by the executor during RUNNING status update. 
> 
> For vPrev executor:
> The vCurrent updater will use watch_sec for instance update, this is 
> independent from the health check.
> For vCurrent executor:
> If health check is disabled on vCurrent executor, the vCurrent updater 
> will use watch_sec for instance update.
> If health check is enabled on vCurrent executor, the vCurrent updater 
> will not use watch_sec for instance update.
> 
> 
> Diffs
> -
> 
>   api/src/main/thrift/org/apache/aurora/gen/api.thrift 
> c5765b70501c101f0535b4eed94e9948c36808f9 
>   src/main/java/org/apache/aurora/scheduler/updater/InstanceUpdater.java 
> c129896d8cd54abd2634e2a339c27921042b0162 
>   src/test/java/org/apache/aurora/scheduler/updater/InstanceUpdaterTest.java 
> c78c7fbd7d600586136863c99ce3d7387895efee 
> 
> Diff: https://reviews.apache.org/r/51536/diff/
> 
> 
> Testing
> ---
> 
> ./gradlew build
> 
> ./gradlew :test --tests 
> "org.apache.aurora.scheduler.updater.InstanceUpdaterTest"
> 
> ./build-support/jenkins/build.sh
> 
> 
> Thanks,
> 
> Kai Huang
> 
>



Re: Review Request 51536: @ReviewBot retry Scheduler updater will not use watch_sec if health check is enabled

2016-09-01 Thread Kai Huang


> On Sept. 1, 2016, 7:51 p.m., Zameer Manji wrote:
> > api/src/main/thrift/org/apache/aurora/gen/api.thrift, line 467
> > 
> >
> > This is not sufficent to determine if healthchecking occurs.
> > 
> > We now support shell healthchecking, so a job may not have any port 
> > named health but it will still have it's health checked by thermos.
> > 
> > Why can't enabling of this feature be a property of the Job or Update?
> 
> Kai Huang wrote:
> We can infer if the health check is enabled from the Job Config, so I 
> think rather than modifying the Job, we should modify the Update to direct 
> the Instance Updater to skip watch_secs(e.g. creating a boolean named 
> "isWatchSecsSkippable" in InstanceUpdater).

A second thought which seems to be a better solution: We just modify the 
executor to send a message in its healthChecker thread, and on scheduler side 
we check the presence of the task status update message string to determine if 
watch_secs should be skipped.

There is no need to check if health check is enabled at scheduler side nor 
modifying the Update.

The watch_secs is skipped by the vCurrent scheduler if and only if the vCurrent 
executor is doing a health check, and sends an non-empty message string during 
the TASK_RUNNING state transition.


- Kai


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/51536/#review147597
---


On Aug. 31, 2016, 10:08 p.m., Kai Huang wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/51536/
> ---
> 
> (Updated Aug. 31, 2016, 10:08 p.m.)
> 
> 
> Review request for Aurora, Joshua Cohen, Maxim Khutornenko, and Zameer Manji.
> 
> 
> Repository: aurora
> 
> 
> Description
> ---
> 
> - Scheduler updater will not use watch_sec if health check is enabled
> - Add unit tests for successful updates.
> 
> This feature intends to improve reliability and performance of the Aurora 
> scheduler job updater by relying on health check status rather than 
> watch_secs timeout when deciding an individual instance update state. 
> 
> See this epic: https://issues.apache.org/jira/browse/AURORA-894 
> and the design doc: 
> https://docs.google.com/document/d/1ZdgW8S4xMhvKW7iQUX99xZm10NXSxEWR0a-21FP5d94/edit#
>  for more details and background.
> 
> 
> Assumptions on executor change:
> The executor starts health check at STARTING, if a successful health check is 
> performed before initial_interval_sec expires, the executor will sends a 
> status message for RUNNING.
> 
> What I try to implement:
> vCurrent updater will infer the executor version from the presence of a 
> status message sent by the executor during RUNNING status update. 
> 
> For vPrev executor:
> The vCurrent updater will use watch_sec for instance update, this is 
> independent from the health check.
> For vCurrent executor:
> If health check is disabled on vCurrent executor, the vCurrent updater 
> will use watch_sec for instance update.
> If health check is enabled on vCurrent executor, the vCurrent updater 
> will not use watch_sec for instance update.
> 
> 
> Diffs
> -
> 
>   api/src/main/thrift/org/apache/aurora/gen/api.thrift 
> c5765b70501c101f0535b4eed94e9948c36808f9 
>   src/main/java/org/apache/aurora/scheduler/updater/InstanceUpdater.java 
> c129896d8cd54abd2634e2a339c27921042b0162 
>   src/test/java/org/apache/aurora/scheduler/updater/InstanceUpdaterTest.java 
> c78c7fbd7d600586136863c99ce3d7387895efee 
> 
> Diff: https://reviews.apache.org/r/51536/diff/
> 
> 
> Testing
> ---
> 
> ./gradlew build
> 
> ./gradlew :test --tests 
> "org.apache.aurora.scheduler.updater.InstanceUpdaterTest"
> 
> ./build-support/jenkins/build.sh
> 
> 
> Thanks,
> 
> Kai Huang
> 
>



Re: Review Request 51536: @ReviewBot retry Scheduler updater will not use watch_sec if health check is enabled

2016-09-01 Thread Maxim Khutornenko


> On Sept. 1, 2016, 7:51 p.m., Zameer Manji wrote:
> > api/src/main/thrift/org/apache/aurora/gen/api.thrift, line 467
> > 
> >
> > This is not sufficent to determine if healthchecking occurs.
> > 
> > We now support shell healthchecking, so a job may not have any port 
> > named health but it will still have it's health checked by thermos.
> > 
> > Why can't enabling of this feature be a property of the Job or Update?
> 
> Kai Huang wrote:
> We can infer if the health check is enabled from the Job Config, so I 
> think rather than modifying the Job, we should modify the Update to direct 
> the Instance Updater to skip watch_secs(e.g. creating a boolean named 
> "isWatchSecsSkippable" in InstanceUpdater).
> 
> Kai Huang wrote:
> A second thought which seems to be a better solution: We just modify the 
> executor to send a message in its healthChecker thread, and on scheduler side 
> we check the presence of the task status update message string to determine 
> if watch_secs should be skipped.
> 
> There is no need to check if health check is enabled at scheduler side 
> nor modifying the Update.
> 
> The watch_secs is skipped by the vCurrent scheduler if and only if the 
> vCurrent executor is doing a health check, and sends an non-empty message 
> string during the TASK_RUNNING state transition.

I think Kai's latest proposal should be sufficient to determine what policy to 
apply. Iff RUNNING gets back with our "special" message added, the updater will 
skip watch_secs. In all other cases, it will honor watch_secs.


- Maxim


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/51536/#review147597
---


On Aug. 31, 2016, 10:08 p.m., Kai Huang wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/51536/
> ---
> 
> (Updated Aug. 31, 2016, 10:08 p.m.)
> 
> 
> Review request for Aurora, Joshua Cohen, Maxim Khutornenko, and Zameer Manji.
> 
> 
> Repository: aurora
> 
> 
> Description
> ---
> 
> - Scheduler updater will not use watch_sec if health check is enabled
> - Add unit tests for successful updates.
> 
> This feature intends to improve reliability and performance of the Aurora 
> scheduler job updater by relying on health check status rather than 
> watch_secs timeout when deciding an individual instance update state. 
> 
> See this epic: https://issues.apache.org/jira/browse/AURORA-894 
> and the design doc: 
> https://docs.google.com/document/d/1ZdgW8S4xMhvKW7iQUX99xZm10NXSxEWR0a-21FP5d94/edit#
>  for more details and background.
> 
> 
> Assumptions on executor change:
> The executor starts health check at STARTING, if a successful health check is 
> performed before initial_interval_sec expires, the executor will sends a 
> status message for RUNNING.
> 
> What I try to implement:
> vCurrent updater will infer the executor version from the presence of a 
> status message sent by the executor during RUNNING status update. 
> 
> For vPrev executor:
> The vCurrent updater will use watch_sec for instance update, this is 
> independent from the health check.
> For vCurrent executor:
> If health check is disabled on vCurrent executor, the vCurrent updater 
> will use watch_sec for instance update.
> If health check is enabled on vCurrent executor, the vCurrent updater 
> will not use watch_sec for instance update.
> 
> 
> Diffs
> -
> 
>   api/src/main/thrift/org/apache/aurora/gen/api.thrift 
> c5765b70501c101f0535b4eed94e9948c36808f9 
>   src/main/java/org/apache/aurora/scheduler/updater/InstanceUpdater.java 
> c129896d8cd54abd2634e2a339c27921042b0162 
>   src/test/java/org/apache/aurora/scheduler/updater/InstanceUpdaterTest.java 
> c78c7fbd7d600586136863c99ce3d7387895efee 
> 
> Diff: https://reviews.apache.org/r/51536/diff/
> 
> 
> Testing
> ---
> 
> ./gradlew build
> 
> ./gradlew :test --tests 
> "org.apache.aurora.scheduler.updater.InstanceUpdaterTest"
> 
> ./build-support/jenkins/build.sh
> 
> 
> Thanks,
> 
> Kai Huang
> 
>



Re: Review Request 51536: @ReviewBot retry Scheduler updater will not use watch_sec if health check is enabled

2016-09-02 Thread Kai Huang

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/51536/
---

(Updated Sept. 2, 2016, 3:55 p.m.)


Review request for Aurora, Joshua Cohen, Maxim Khutornenko, and Zameer Manji.


Changes
---

Add links for epic.


Bugs: AURORA-894
https://issues.apache.org/jira/browse/AURORA-894


Repository: aurora


Description
---

- Scheduler updater will not use watch_sec if health check is enabled
- Add unit tests for successful updates.

This feature intends to improve reliability and performance of the Aurora 
scheduler job updater by relying on health check status rather than watch_secs 
timeout when deciding an individual instance update state. 

See this epic: https://issues.apache.org/jira/browse/AURORA-894 
and the design doc: 
https://docs.google.com/document/d/1ZdgW8S4xMhvKW7iQUX99xZm10NXSxEWR0a-21FP5d94/edit#
 for more details and background.


Assumptions on executor change:
The executor starts health check at STARTING, if a successful health check is 
performed before initial_interval_sec expires, the executor will sends a status 
message for RUNNING.

What I try to implement:
vCurrent updater will infer the executor version from the presence of a status 
message sent by the executor during RUNNING status update. 

For vPrev executor:
The vCurrent updater will use watch_sec for instance update, this is 
independent from the health check.
For vCurrent executor:
If health check is disabled on vCurrent executor, the vCurrent updater will 
use watch_sec for instance update.
If health check is enabled on vCurrent executor, the vCurrent updater will 
not use watch_sec for instance update.


Diffs
-

  api/src/main/thrift/org/apache/aurora/gen/api.thrift 
c5765b70501c101f0535b4eed94e9948c36808f9 
  src/main/java/org/apache/aurora/scheduler/updater/InstanceUpdater.java 
c129896d8cd54abd2634e2a339c27921042b0162 
  src/test/java/org/apache/aurora/scheduler/updater/InstanceUpdaterTest.java 
c78c7fbd7d600586136863c99ce3d7387895efee 

Diff: https://reviews.apache.org/r/51536/diff/


Testing
---

./gradlew build

./gradlew :test --tests 
"org.apache.aurora.scheduler.updater.InstanceUpdaterTest"

./build-support/jenkins/build.sh


Thanks,

Kai Huang



Re: Review Request 51536: @ReviewBot retry Scheduler updater will not use watch_sec if health check is enabled

2016-09-02 Thread Kai Huang


> On Sept. 1, 2016, 7:59 p.m., Dmitriy Shirchenko wrote:
> > src/main/java/org/apache/aurora/scheduler/updater/InstanceUpdater.java, 
> > lines 126-131
> > 
> >
> > +1 on @zmanji's comment. 
> > https://github.com/apache/aurora/blob/master/docs/reference/configuration.md#shellhealthchecker-objects
> >  could be used which does not need ports nor will necessarily contain 
> > HEALTH_PORT_NAME if requested.

Yeah, the port name check is not sufficient. Actually I'm thinking about 
removing this function from scheduler, since the executor already checks 
whether HealthCheck is enabled or not on its end. See above discussions.


- Kai


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/51536/#review147599
---


On Sept. 2, 2016, 3:55 p.m., Kai Huang wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/51536/
> ---
> 
> (Updated Sept. 2, 2016, 3:55 p.m.)
> 
> 
> Review request for Aurora, Joshua Cohen, Maxim Khutornenko, and Zameer Manji.
> 
> 
> Bugs: AURORA-894
> https://issues.apache.org/jira/browse/AURORA-894
> 
> 
> Repository: aurora
> 
> 
> Description
> ---
> 
> - Scheduler updater will not use watch_sec if health check is enabled
> - Add unit tests for successful updates.
> 
> This feature intends to improve reliability and performance of the Aurora 
> scheduler job updater by relying on health check status rather than 
> watch_secs timeout when deciding an individual instance update state. 
> 
> See this epic: https://issues.apache.org/jira/browse/AURORA-894 
> and the design doc: 
> https://docs.google.com/document/d/1ZdgW8S4xMhvKW7iQUX99xZm10NXSxEWR0a-21FP5d94/edit#
>  for more details and background.
> 
> 
> Assumptions on executor change:
> The executor starts health check at STARTING, if a successful health check is 
> performed before initial_interval_sec expires, the executor will sends a 
> status message for RUNNING.
> 
> What I try to implement:
> vCurrent updater will infer the executor version from the presence of a 
> status message sent by the executor during RUNNING status update. 
> 
> For vPrev executor:
> The vCurrent updater will use watch_sec for instance update, this is 
> independent from the health check.
> For vCurrent executor:
> If health check is disabled on vCurrent executor, the vCurrent updater 
> will use watch_sec for instance update.
> If health check is enabled on vCurrent executor, the vCurrent updater 
> will not use watch_sec for instance update.
> 
> 
> Diffs
> -
> 
>   api/src/main/thrift/org/apache/aurora/gen/api.thrift 
> c5765b70501c101f0535b4eed94e9948c36808f9 
>   src/main/java/org/apache/aurora/scheduler/updater/InstanceUpdater.java 
> c129896d8cd54abd2634e2a339c27921042b0162 
>   src/test/java/org/apache/aurora/scheduler/updater/InstanceUpdaterTest.java 
> c78c7fbd7d600586136863c99ce3d7387895efee 
> 
> Diff: https://reviews.apache.org/r/51536/diff/
> 
> 
> Testing
> ---
> 
> ./gradlew build
> 
> ./gradlew :test --tests 
> "org.apache.aurora.scheduler.updater.InstanceUpdaterTest"
> 
> ./build-support/jenkins/build.sh
> 
> 
> Thanks,
> 
> Kai Huang
> 
>



Re: Review Request 51536: @ReviewBot retry Scheduler updater will not use watch_sec if health check is enabled

2016-09-02 Thread Kai Huang


> On Sept. 1, 2016, 7:51 p.m., Zameer Manji wrote:
> > api/src/main/thrift/org/apache/aurora/gen/api.thrift, line 467
> > 
> >
> > This is not sufficent to determine if healthchecking occurs.
> > 
> > We now support shell healthchecking, so a job may not have any port 
> > named health but it will still have it's health checked by thermos.
> > 
> > Why can't enabling of this feature be a property of the Job or Update?
> 
> Kai Huang wrote:
> We can infer if the health check is enabled from the Job Config, so I 
> think rather than modifying the Job, we should modify the Update to direct 
> the Instance Updater to skip watch_secs(e.g. creating a boolean named 
> "isWatchSecsSkippable" in InstanceUpdater).
> 
> Kai Huang wrote:
> A second thought which seems to be a better solution: We just modify the 
> executor to send a message in its healthChecker thread, and on scheduler side 
> we check the presence of the task status update message string to determine 
> if watch_secs should be skipped.
> 
> There is no need to check if health check is enabled at scheduler side 
> nor modifying the Update.
> 
> The watch_secs is skipped by the vCurrent scheduler if and only if the 
> vCurrent executor is doing a health check, and sends an non-empty message 
> string during the TASK_RUNNING state transition.
> 
> Maxim Khutornenko wrote:
> I think Kai's latest proposal should be sufficient to determine what 
> policy to apply. Iff RUNNING gets back with our "special" message added, the 
> updater will skip watch_secs. In all other cases, it will honor watch_secs.

ping for response.


- Kai


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/51536/#review147597
---


On Sept. 2, 2016, 3:55 p.m., Kai Huang wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/51536/
> ---
> 
> (Updated Sept. 2, 2016, 3:55 p.m.)
> 
> 
> Review request for Aurora, Joshua Cohen, Maxim Khutornenko, and Zameer Manji.
> 
> 
> Bugs: AURORA-894
> https://issues.apache.org/jira/browse/AURORA-894
> 
> 
> Repository: aurora
> 
> 
> Description
> ---
> 
> - Scheduler updater will not use watch_sec if health check is enabled
> - Add unit tests for successful updates.
> 
> This feature intends to improve reliability and performance of the Aurora 
> scheduler job updater by relying on health check status rather than 
> watch_secs timeout when deciding an individual instance update state. 
> 
> See this epic: https://issues.apache.org/jira/browse/AURORA-894 
> and the design doc: 
> https://docs.google.com/document/d/1ZdgW8S4xMhvKW7iQUX99xZm10NXSxEWR0a-21FP5d94/edit#
>  for more details and background.
> 
> 
> Assumptions on executor change:
> The executor starts health check at STARTING, if a successful health check is 
> performed before initial_interval_sec expires, the executor will sends a 
> status message for RUNNING.
> 
> What I try to implement:
> vCurrent updater will infer the executor version from the presence of a 
> status message sent by the executor during RUNNING status update. 
> 
> For vPrev executor:
> The vCurrent updater will use watch_sec for instance update, this is 
> independent from the health check.
> For vCurrent executor:
> If health check is disabled on vCurrent executor, the vCurrent updater 
> will use watch_sec for instance update.
> If health check is enabled on vCurrent executor, the vCurrent updater 
> will not use watch_sec for instance update.
> 
> 
> Diffs
> -
> 
>   api/src/main/thrift/org/apache/aurora/gen/api.thrift 
> c5765b70501c101f0535b4eed94e9948c36808f9 
>   src/main/java/org/apache/aurora/scheduler/updater/InstanceUpdater.java 
> c129896d8cd54abd2634e2a339c27921042b0162 
>   src/test/java/org/apache/aurora/scheduler/updater/InstanceUpdaterTest.java 
> c78c7fbd7d600586136863c99ce3d7387895efee 
> 
> Diff: https://reviews.apache.org/r/51536/diff/
> 
> 
> Testing
> ---
> 
> ./gradlew build
> 
> ./gradlew :test --tests 
> "org.apache.aurora.scheduler.updater.InstanceUpdaterTest"
> 
> ./build-support/jenkins/build.sh
> 
> 
> Thanks,
> 
> Kai Huang
> 
>



Re: Review Request 51536: @ReviewBot retry Scheduler updater will not use watch_sec if health check is enabled

2016-09-02 Thread Zameer Manji


> On Sept. 1, 2016, 12:51 p.m., Zameer Manji wrote:
> > api/src/main/thrift/org/apache/aurora/gen/api.thrift, line 467
> > 
> >
> > This is not sufficent to determine if healthchecking occurs.
> > 
> > We now support shell healthchecking, so a job may not have any port 
> > named health but it will still have it's health checked by thermos.
> > 
> > Why can't enabling of this feature be a property of the Job or Update?
> 
> Kai Huang wrote:
> We can infer if the health check is enabled from the Job Config, so I 
> think rather than modifying the Job, we should modify the Update to direct 
> the Instance Updater to skip watch_secs(e.g. creating a boolean named 
> "isWatchSecsSkippable" in InstanceUpdater).
> 
> Kai Huang wrote:
> A second thought which seems to be a better solution: We just modify the 
> executor to send a message in its healthChecker thread, and on scheduler side 
> we check the presence of the task status update message string to determine 
> if watch_secs should be skipped.
> 
> There is no need to check if health check is enabled at scheduler side 
> nor modifying the Update.
> 
> The watch_secs is skipped by the vCurrent scheduler if and only if the 
> vCurrent executor is doing a health check, and sends an non-empty message 
> string during the TASK_RUNNING state transition.
> 
> Maxim Khutornenko wrote:
> I think Kai's latest proposal should be sufficient to determine what 
> policy to apply. Iff RUNNING gets back with our "special" message added, the 
> updater will skip watch_secs. In all other cases, it will honor watch_secs.
> 
> Kai Huang wrote:
> ping for response.

I disagree with this design, please see the email thread Kai has created on 
dev@. Let's discuss our options there before going forward with the code.


- Zameer


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/51536/#review147597
---


On Sept. 2, 2016, 8:55 a.m., Kai Huang wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/51536/
> ---
> 
> (Updated Sept. 2, 2016, 8:55 a.m.)
> 
> 
> Review request for Aurora, Joshua Cohen, Maxim Khutornenko, and Zameer Manji.
> 
> 
> Bugs: AURORA-894
> https://issues.apache.org/jira/browse/AURORA-894
> 
> 
> Repository: aurora
> 
> 
> Description
> ---
> 
> - Scheduler updater will not use watch_sec if health check is enabled
> - Add unit tests for successful updates.
> 
> This feature intends to improve reliability and performance of the Aurora 
> scheduler job updater by relying on health check status rather than 
> watch_secs timeout when deciding an individual instance update state. 
> 
> See this epic: https://issues.apache.org/jira/browse/AURORA-894 
> and the design doc: 
> https://docs.google.com/document/d/1ZdgW8S4xMhvKW7iQUX99xZm10NXSxEWR0a-21FP5d94/edit#
>  for more details and background.
> 
> 
> Assumptions on executor change:
> The executor starts health check at STARTING, if a successful health check is 
> performed before initial_interval_sec expires, the executor will sends a 
> status message for RUNNING.
> 
> What I try to implement:
> vCurrent updater will infer the executor version from the presence of a 
> status message sent by the executor during RUNNING status update. 
> 
> For vPrev executor:
> The vCurrent updater will use watch_sec for instance update, this is 
> independent from the health check.
> For vCurrent executor:
> If health check is disabled on vCurrent executor, the vCurrent updater 
> will use watch_sec for instance update.
> If health check is enabled on vCurrent executor, the vCurrent updater 
> will not use watch_sec for instance update.
> 
> 
> Diffs
> -
> 
>   api/src/main/thrift/org/apache/aurora/gen/api.thrift 
> c5765b70501c101f0535b4eed94e9948c36808f9 
>   src/main/java/org/apache/aurora/scheduler/updater/InstanceUpdater.java 
> c129896d8cd54abd2634e2a339c27921042b0162 
>   src/test/java/org/apache/aurora/scheduler/updater/InstanceUpdaterTest.java 
> c78c7fbd7d600586136863c99ce3d7387895efee 
> 
> Diff: https://reviews.apache.org/r/51536/diff/
> 
> 
> Testing
> ---
> 
> ./gradlew build
> 
> ./gradlew :test --tests 
> "org.apache.aurora.scheduler.updater.InstanceUpdaterTest"
> 
> ./build-support/jenkins/build.sh
> 
> 
> Thanks,
> 
> Kai Huang
> 
>



Re: Review Request 51536: @ReviewBot retry Scheduler updater will not use watch_sec if health check is enabled

2016-09-07 Thread Kai Huang

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/51536/
---

(Updated Sept. 7, 2016, 10:06 p.m.)


Review request for Aurora, Joshua Cohen, Maxim Khutornenko, and Zameer Manji.


Summary (updated)
-

@ReviewBot retry Scheduler updater will not use watch_sec if health check is 
enabled


Bugs: AURORA-894
https://issues.apache.org/jira/browse/AURORA-894


Repository: aurora


Description
---

- Scheduler updater will not use watch_sec if health check is enabled.

This feature intends to improve reliability and performance of the Aurora 
scheduler job updater by relying on health check status rather than watch_secs 
timeout when deciding an individual instance update state. 

See this epic: https://issues.apache.org/jira/browse/AURORA-894 
and the design doc: 
https://docs.google.com/document/d/1ZdgW8S4xMhvKW7iQUX99xZm10NXSxEWR0a-21FP5d94/edit#
 for more details and background.

After discussion on Aurora dev list, we decided to keep the watch_secs 
infrastructure intact on scheduler side. Our final conclusion is that we adopt 
the following implementation:

1. If the users want purely health checking driven updates they can set 
watch_secs to 0 and enable health checks.

2. If they want to have both health checking and time driven updates they can 
set watch_secs to the time that they care about, and doing health checks at 
STARTING state as well.

3. If they just want time driven updates they can disable health checking and 
set watch_secs to the time that they care about.

In this review, there will be only one scheduler change: 
Currently scheduler does not accept zero value for watch_secs, we need to relax 
this constraint.

Executor change to do (in a separate review):
The executor starts health check at STARTING, if a successful health check is 
performed before initial_interval_sec expires, the executor will sends a status 
message for RUNNING.


Diffs
-

  RELEASE-NOTES.md d79aaadc197697d09a71c83494a01765d6a983d4 
  src/main/java/org/apache/aurora/scheduler/updater/UpdateFactory.java 
ac8df3e5a2da8cf22e1ba8a90944546e19ccdcaa 
  src/test/java/org/apache/aurora/scheduler/updater/JobUpdaterIT.java 
04551f17999d742c53dfb1b36286b119b448550e 

Diff: https://reviews.apache.org/r/51536/diff/


Testing
---

./gradlew build

./gradlew :test --tests "org.apache.aurora.scheduler.updater.JobUpdaterIT"

./build-support/jenkins/build.sh


Thanks,

Kai Huang



Re: Review Request 51536: @ReviewBot retry Scheduler updater will not use watch_sec if health check is enabled

2016-09-07 Thread Aurora ReviewBot

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/51536/#review148100
---


Ship it!




Master (19866b5) is green with this patch.
  ./build-support/jenkins/build.sh

I will refresh this build result if you post a review containing "@ReviewBot 
retry"

- Aurora ReviewBot


On Sept. 7, 2016, 10:06 p.m., Kai Huang wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/51536/
> ---
> 
> (Updated Sept. 7, 2016, 10:06 p.m.)
> 
> 
> Review request for Aurora, Joshua Cohen, Maxim Khutornenko, and Zameer Manji.
> 
> 
> Bugs: AURORA-894
> https://issues.apache.org/jira/browse/AURORA-894
> 
> 
> Repository: aurora
> 
> 
> Description
> ---
> 
> - Scheduler updater will not use watch_sec if health check is enabled.
> 
> This feature intends to improve reliability and performance of the Aurora 
> scheduler job updater by relying on health check status rather than 
> watch_secs timeout when deciding an individual instance update state. 
> 
> See this epic: https://issues.apache.org/jira/browse/AURORA-894 
> and the design doc: 
> https://docs.google.com/document/d/1ZdgW8S4xMhvKW7iQUX99xZm10NXSxEWR0a-21FP5d94/edit#
>  for more details and background.
> 
> After discussion on Aurora dev list, we decided to keep the watch_secs 
> infrastructure intact on scheduler side. Our final conclusion is that we 
> adopt the following implementation:
> 
> 1. If the users want purely health checking driven updates they can set 
> watch_secs to 0 and enable health checks.
> 
> 2. If they want to have both health checking and time driven updates they can 
> set watch_secs to the time that they care about, and doing health checks at 
> STARTING state as well.
> 
> 3. If they just want time driven updates they can disable health checking and 
> set watch_secs to the time that they care about.
> 
> In this review, there will be only one scheduler change: 
> Currently scheduler does not accept zero value for watch_secs, we need to 
> relax this constraint.
> 
> Executor change to do (in a separate review):
> The executor starts health check at STARTING, if a successful health check is 
> performed before initial_interval_sec expires, the executor will sends a 
> status message for RUNNING.
> 
> 
> Diffs
> -
> 
>   RELEASE-NOTES.md d79aaadc197697d09a71c83494a01765d6a983d4 
>   src/main/java/org/apache/aurora/scheduler/updater/UpdateFactory.java 
> ac8df3e5a2da8cf22e1ba8a90944546e19ccdcaa 
>   src/test/java/org/apache/aurora/scheduler/updater/JobUpdaterIT.java 
> 04551f17999d742c53dfb1b36286b119b448550e 
> 
> Diff: https://reviews.apache.org/r/51536/diff/
> 
> 
> Testing
> ---
> 
> ./gradlew build
> 
> ./gradlew :test --tests "org.apache.aurora.scheduler.updater.JobUpdaterIT"
> 
> ./build-support/jenkins/build.sh
> 
> 
> Thanks,
> 
> Kai Huang
> 
>



Re: Review Request 51536: @ReviewBot retry Scheduler updater will not use watch_sec if health check is enabled

2016-09-07 Thread Zameer Manji

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/51536/#review148102
---




RELEASE-NOTES.md (line 37)


I'd change this to "- Allow `watch_secs` to be set to 0".


- Zameer Manji


On Sept. 7, 2016, 3:06 p.m., Kai Huang wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/51536/
> ---
> 
> (Updated Sept. 7, 2016, 3:06 p.m.)
> 
> 
> Review request for Aurora, Joshua Cohen, Maxim Khutornenko, and Zameer Manji.
> 
> 
> Bugs: AURORA-894
> https://issues.apache.org/jira/browse/AURORA-894
> 
> 
> Repository: aurora
> 
> 
> Description
> ---
> 
> - Scheduler updater will not use watch_sec if health check is enabled.
> 
> This feature intends to improve reliability and performance of the Aurora 
> scheduler job updater by relying on health check status rather than 
> watch_secs timeout when deciding an individual instance update state. 
> 
> See this epic: https://issues.apache.org/jira/browse/AURORA-894 
> and the design doc: 
> https://docs.google.com/document/d/1ZdgW8S4xMhvKW7iQUX99xZm10NXSxEWR0a-21FP5d94/edit#
>  for more details and background.
> 
> After discussion on Aurora dev list, we decided to keep the watch_secs 
> infrastructure intact on scheduler side. Our final conclusion is that we 
> adopt the following implementation:
> 
> 1. If the users want purely health checking driven updates they can set 
> watch_secs to 0 and enable health checks.
> 
> 2. If they want to have both health checking and time driven updates they can 
> set watch_secs to the time that they care about, and doing health checks at 
> STARTING state as well.
> 
> 3. If they just want time driven updates they can disable health checking and 
> set watch_secs to the time that they care about.
> 
> In this review, there will be only one scheduler change: 
> Currently scheduler does not accept zero value for watch_secs, we need to 
> relax this constraint.
> 
> Executor change to do (in a separate review):
> The executor starts health check at STARTING, if a successful health check is 
> performed before initial_interval_sec expires, the executor will sends a 
> status message for RUNNING.
> 
> 
> Diffs
> -
> 
>   RELEASE-NOTES.md d79aaadc197697d09a71c83494a01765d6a983d4 
>   src/main/java/org/apache/aurora/scheduler/updater/UpdateFactory.java 
> ac8df3e5a2da8cf22e1ba8a90944546e19ccdcaa 
>   src/test/java/org/apache/aurora/scheduler/updater/JobUpdaterIT.java 
> 04551f17999d742c53dfb1b36286b119b448550e 
> 
> Diff: https://reviews.apache.org/r/51536/diff/
> 
> 
> Testing
> ---
> 
> ./gradlew build
> 
> ./gradlew :test --tests "org.apache.aurora.scheduler.updater.JobUpdaterIT"
> 
> ./build-support/jenkins/build.sh
> 
> 
> Thanks,
> 
> Kai Huang
> 
>



Re: Review Request 51536: @ReviewBot retry Scheduler updater will not use watch_sec if health check is enabled

2016-09-07 Thread Zameer Manji

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/51536/#review148103
---



Also can you please update the review title and commit message to reflect the 
changes here? This has diverged from the initial review. I need the title and 
message to be updated so I can commit this.

- Zameer Manji


On Sept. 7, 2016, 3:06 p.m., Kai Huang wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/51536/
> ---
> 
> (Updated Sept. 7, 2016, 3:06 p.m.)
> 
> 
> Review request for Aurora, Joshua Cohen, Maxim Khutornenko, and Zameer Manji.
> 
> 
> Bugs: AURORA-894
> https://issues.apache.org/jira/browse/AURORA-894
> 
> 
> Repository: aurora
> 
> 
> Description
> ---
> 
> - Scheduler updater will not use watch_sec if health check is enabled.
> 
> This feature intends to improve reliability and performance of the Aurora 
> scheduler job updater by relying on health check status rather than 
> watch_secs timeout when deciding an individual instance update state. 
> 
> See this epic: https://issues.apache.org/jira/browse/AURORA-894 
> and the design doc: 
> https://docs.google.com/document/d/1ZdgW8S4xMhvKW7iQUX99xZm10NXSxEWR0a-21FP5d94/edit#
>  for more details and background.
> 
> After discussion on Aurora dev list, we decided to keep the watch_secs 
> infrastructure intact on scheduler side. Our final conclusion is that we 
> adopt the following implementation:
> 
> 1. If the users want purely health checking driven updates they can set 
> watch_secs to 0 and enable health checks.
> 
> 2. If they want to have both health checking and time driven updates they can 
> set watch_secs to the time that they care about, and doing health checks at 
> STARTING state as well.
> 
> 3. If they just want time driven updates they can disable health checking and 
> set watch_secs to the time that they care about.
> 
> In this review, there will be only one scheduler change: 
> Currently scheduler does not accept zero value for watch_secs, we need to 
> relax this constraint.
> 
> Executor change to do (in a separate review):
> The executor starts health check at STARTING, if a successful health check is 
> performed before initial_interval_sec expires, the executor will sends a 
> status message for RUNNING.
> 
> 
> Diffs
> -
> 
>   RELEASE-NOTES.md d79aaadc197697d09a71c83494a01765d6a983d4 
>   src/main/java/org/apache/aurora/scheduler/updater/UpdateFactory.java 
> ac8df3e5a2da8cf22e1ba8a90944546e19ccdcaa 
>   src/test/java/org/apache/aurora/scheduler/updater/JobUpdaterIT.java 
> 04551f17999d742c53dfb1b36286b119b448550e 
> 
> Diff: https://reviews.apache.org/r/51536/diff/
> 
> 
> Testing
> ---
> 
> ./gradlew build
> 
> ./gradlew :test --tests "org.apache.aurora.scheduler.updater.JobUpdaterIT"
> 
> ./build-support/jenkins/build.sh
> 
> 
> Thanks,
> 
> Kai Huang
> 
>