-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/51876/#review148937
-----------------------------------------------------------




src/main/python/apache/aurora/executor/aurora_executor.py (line 125)
<https://reviews.apache.org/r/51876/#comment216462>

    I don't think we should include `Task is healthy.` message for 
non-health-check-enabled tasks as we have no visibility to the health of such 
tasks.



src/main/python/apache/aurora/executor/aurora_executor.py (lines 133 - 141)
<https://reviews.apache.org/r/51876/#comment216461>

    This code is largely a repetition of what alrady exists in 
health_checker.py. I'd be great to reuse it (say from task_info.py).



src/main/python/apache/aurora/executor/common/health_checker.py (lines 40 - 43)
<https://reviews.apache.org/r/51876/#comment216503>

    Do we really need this enum? A task that is in the middle of the 
`initial_interval_secs` timeout technically is still healthy, right? I feel 
having a boolean `isHealthy` should be enough to cover all possible scenarios 
as `StatusManager` checks for `isHealthy` AND the task status itself to invoke 
a callback.



src/main/python/apache/aurora/executor/common/health_checker.py (line 69)
<https://reviews.apache.org/r/51876/#comment216477>

    This comment needs update.



src/main/python/apache/aurora/executor/common/health_checker.py (line 103)
<https://reviews.apache.org/r/51876/#comment216474>

    `...if is_healthy else...`



src/main/python/apache/aurora/executor/common/health_checker.py (lines 136 - 
150)
<https://reviews.apache.org/r/51876/#comment216506>

    If you get rid of the `HealthCheckStatus` enum I think all we need is this:
    ```
    if self.initial_in_progress:
      if self.current_consecutive_health_checks >= 
self.min_consecutive_health_checks:
      self.healthy = True
    else:
      self.healthy = self.current_consecutive_failures > 
self.max_consecutive_failures
      self.reason = reason
      
      
    @property
    def initial_in_progress(self):
      if not self._expired:
        if self.elapsed_time <= self.initial_interval:
          return true
        else:  
          self._expired = True
        return !self._expired
    ```



src/main/python/apache/aurora/executor/common/health_checker.py (line 149)
<https://reviews.apache.org/r/51876/#comment216482>

    This should be log.info instead.



src/main/python/apache/aurora/executor/common/status_checker.py (line 104)
<https://reviews.apache.org/r/51876/#comment216500>

    Not obvious to me: what's the purpose of this condition here?



src/main/python/apache/aurora/executor/status_manager.py (line 60)
<https://reviews.apache.org/r/51876/#comment216473>

    How about extending the `StatusManager` to take two callbacks instead? The 
old callback would remain a default one and the new would be an optional one to 
exclusively process 'TASK_RUNNING'. Since it's now status aware, it would make 
sense to consolidate status management here and avoid the if-else switch 
upstream.
    
    Also, I'd use `mesos_pb2.TASK_RUNNING` instead of relying on string const.


- Maxim Khutornenko


On Sept. 14, 2016, 4:43 a.m., Kai Huang wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/51876/
> -----------------------------------------------------------
> 
> (Updated Sept. 14, 2016, 4:43 a.m.)
> 
> 
> Review request for Aurora, Joshua Cohen, Maxim Khutornenko, and Zameer Manji.
> 
> 
> Bugs: AURORA-1225
>     https://issues.apache.org/jira/browse/AURORA-1225
> 
> 
> Repository: aurora
> 
> 
> Description
> -------
> 
> Modify executor state transition logic to rely on health checks (if enabled).
> 
> [Summary]
> Executor needs to start executing user content in STARTING and transition to 
> RUNNING when a successful required number of health checks is reached.
> 
> This review contains a series of executor changes that implement the health 
> check driven updates. It gives more context of the design of this feature.
> 
> [Background]
> Please see this epic: https://issues.apache.org/jira/browse/AURORA-1225
> and the design doc: 
> https://docs.google.com/document/d/1ZdgW8S4xMhvKW7iQUX99xZm10NXSxEWR0a-21FP5d94/edit#
>  for more details and background.
> 
> [Description]
> If health check is enabled on vCurrent executor, the health checker will send 
> a "TASK_RUNNING" message when a successful required number of health checks 
> is reached within the initial_interval_secs. On the other hand, a 
> "TASK_FAILED" message was sent if the health checker fails to reach the 
> required number of health checks within that period, or a maximum number of 
> failed health check limit is reached after the initital_interval_secs.
> 
> If health check is disabled on the vCurrent executor, it will sends 
> "TASK_RUNNING" message to scheduler after the thermos runner was started. In 
> this scenario, the behavior of vCurrent executor will be the same as the 
> vPrev executor.
> 
> [Change List]
> The current change set includes:
> 1. Removed the status memoization in ChainedStatusChecker.
> 2. Modified the StatusManager to be edge triggered.
> 3. Changed the Aurora Executor callback function.
> 4. Modified the Health Checker and redefined the meaning 
> initial_interval_secs.
> 
> [TODOs]
> Currently I fixed all broken tests caused by my changes. However, more tests 
> needs to be added in order to accomodate the executor change. I will send 
> follow-up review update when I cover more edge cases. But any feedback on the 
> implementation and tests is greatly appreciated.
> 
> 
> Diffs
> -----
> 
>   src/main/python/apache/aurora/executor/aurora_executor.py 
> ce5ef680f01831cd89fced8969ae3246c7f60cfd 
>   src/main/python/apache/aurora/executor/common/health_checker.py 
> 5fc845eceac6f0c048d7489fdc4c672b0c609ea0 
>   src/main/python/apache/aurora/executor/common/status_checker.py 
> 795dae2d6b661fc528d952c2315196d94127961f 
>   src/main/python/apache/aurora/executor/status_manager.py 
> 228a99a05f339e21cd7e769a42b9b2276e7bc3fc 
>   src/test/python/apache/aurora/executor/common/test_health_checker.py 
> bb6ea69dd94298c5b8cf4d5f06d06eea7790d66e 
>   src/test/python/apache/aurora/executor/common/test_status_checker.py 
> 5be1981c8c8e88258456adb21aa3ca7c0aa472a7 
>   src/test/python/apache/aurora/executor/test_status_manager.py 
> ce4679ba1aa7b42cf0115c943d84663030182d23 
>   src/test/python/apache/aurora/executor/test_thermos_executor.py 
> 0bfe9e931f873c9f804f2ba4012e050e1f9fd24e 
> 
> Diff: https://reviews.apache.org/r/51876/diff/
> 
> 
> Testing
> -------
> 
> ./build-support/jenkins/build.sh
> 
> ./pants test.pytest src/test/python/apache/aurora/executor::
> 
> 
> Thanks,
> 
> Kai Huang
> 
>

Reply via email to