----------------------------------------------------------- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/51876/#review149846 -----------------------------------------------------------
api/src/main/thrift/org/apache/aurora/gen/api.thrift (lines 1218 - 1220) <https://reviews.apache.org/r/51876/#comment217562> What's the motivation behind moving these constants into the thrift layer? FWICT, these are only used inside the `health_checker.py` and should stay there. src/main/python/apache/aurora/executor/aurora_executor.py (line 80) <https://reviews.apache.org/r/51876/#comment217565> This does not have to be a global field, does it? It looks like `_register()` only adds callbacks but nothing ever reads from it. src/main/python/apache/aurora/executor/aurora_executor.py (lines 184 - 185) <https://reviews.apache.org/r/51876/#comment217566> Are there any guarantees the status_manager will always deal with `ThermosTaskRunner.EXIT_STATE_MAP` values? The way I see it any unmapped state will end up not calling the `_shutdown()` at all. Perhaps a safer way could be having an explicit pair of `_running()` to be called only for RUNNING state and `_shutdown()` act as a default callback for anything else? Curious what others think here. src/main/python/apache/aurora/executor/common/health_checker.py (line 140) <https://reviews.apache.org/r/51876/#comment217570> leftover? src/main/python/apache/aurora/executor/common/health_checker.py (lines 153 - 155) <https://reviews.apache.org/r/51876/#comment217578> This should be moved into the `_maybe_update_failure_count()` to have all decision making logic in a single place. src/main/python/apache/aurora/executor/common/health_checker.py (line 225) <https://reviews.apache.org/r/51876/#comment217582> This should be `mesos_pb2.TASK_RUNNING`, right? src/main/python/apache/aurora/executor/common/health_checker.py (lines 270 - 274) <https://reviews.apache.org/r/51876/#comment217563> This logic needs to be refactored to reduce duplication and reuse what's now in `is_health_check_enabled` as much as possible. Ideally, we should have the only place we extract `health_checker` and the like. src/main/python/apache/aurora/executor/status_manager.py (line 68) <https://reviews.apache.org/r/51876/#comment217567> This is exactly the problem I alluded to above. What if there is no entry for a given `status` here? - Maxim Khutornenko On Sept. 20, 2016, 12:25 a.m., Kai Huang wrote: > > ----------------------------------------------------------- > This is an automatically generated e-mail. To reply, visit: > https://reviews.apache.org/r/51876/ > ----------------------------------------------------------- > > (Updated Sept. 20, 2016, 12:25 a.m.) > > > Review request for Aurora, Joshua Cohen, Maxim Khutornenko, and Zameer Manji. > > > Bugs: AURORA-1225 > https://issues.apache.org/jira/browse/AURORA-1225 > > > Repository: aurora > > > Description > ------- > > Modify executor state transition logic to rely on health checks (if enabled). > > [Summary] > Executor needs to start executing user content in STARTING and transition to > RUNNING when a successful required number of health checks is reached. > > This review contains a series of executor changes that implement the health > check driven updates. It gives more context of the design of this feature. > > [Background] > Please see this epic: https://issues.apache.org/jira/browse/AURORA-1225 > and the design doc: > https://docs.google.com/document/d/1ZdgW8S4xMhvKW7iQUX99xZm10NXSxEWR0a-21FP5d94/edit# > for more details and background. > > [Description] > If health check is enabled on vCurrent executor, the health checker will send > a "TASK_RUNNING" message when a successful required number of health checks > is reached within the initial_interval_secs. On the other hand, a > "TASK_FAILED" message was sent if the health checker fails to reach the > required number of health checks within that period, or a maximum number of > failed health check limit is reached after the initital_interval_secs. > > If health check is disabled on the vCurrent executor, it will sends > "TASK_RUNNING" message to scheduler after the thermos runner was started. In > this scenario, the behavior of vCurrent executor will be the same as the > vPrev executor. > > [Change List] > The current change set includes: > 1. Removed the status memoization in ChainedStatusChecker. > 2. Modified the StatusManager to be edge triggered. > 3. Changed the Aurora Executor callback function. > 4. Modified the Health Checker and redefined the meaning > initial_interval_secs. > > > Diffs > ----- > > api/src/main/thrift/org/apache/aurora/gen/api.thrift > c5765b70501c101f0535b4eed94e9948c36808f9 > src/main/python/apache/aurora/executor/aurora_executor.py > ce5ef680f01831cd89fced8969ae3246c7f60cfd > src/main/python/apache/aurora/executor/common/health_checker.py > 5fc845eceac6f0c048d7489fdc4c672b0c609ea0 > src/main/python/apache/aurora/executor/common/status_checker.py > 795dae2d6b661fc528d952c2315196d94127961f > src/main/python/apache/aurora/executor/common/task_info.py > 4ef49e30eeb942886d02c1fb448055264f9aa874 > src/main/python/apache/aurora/executor/status_manager.py > 228a99a05f339e21cd7e769a42b9b2276e7bc3fc > src/test/python/apache/aurora/executor/common/test_health_checker.py > bb6ea69dd94298c5b8cf4d5f06d06eea7790d66e > src/test/python/apache/aurora/executor/common/test_status_checker.py > 5be1981c8c8e88258456adb21aa3ca7c0aa472a7 > src/test/python/apache/aurora/executor/common/test_task_info.py > 0d9cc5cb340d697a887d8e001f23c948f4fa70c7 > src/test/python/apache/aurora/executor/test_status_manager.py > ce4679ba1aa7b42cf0115c943d84663030182d23 > src/test/python/apache/aurora/executor/test_thermos_executor.py > 0bfe9e931f873c9f804f2ba4012e050e1f9fd24e > > Diff: https://reviews.apache.org/r/51876/diff/ > > > Testing > ------- > > ./build-support/jenkins/build.sh > > ./pants test.pytest src/test/python/apache/aurora/executor:: > > > Thanks, > > Kai Huang > >