> On Oct. 17, 2016, 6:53 a.m., haosdent huang wrote:
> > src/health-check/health_checker.cpp, lines 206-217
> > <https://reviews.apache.org/r/52865/diff/1/?file=1537866#file1537866line206>
> >
> > Now that we never stop health checking, `consecutiveFailures` may drop
> > back to 0 after a later success. `killTask` would then flip from `true`
> > to `false` here. Is that the expected behaviour?
>
> Alexander Rukletsov wrote:
> Very good point, Haosdent.
>
> The problem here is that **one entity decides** when a task should be
> killed, but **another entity enforces** this. The first one cannot really
> control what the second does. What is the least surprising behaviour in
> that unfortunate architecture? My opinion is to reset if the second entity,
> i.e. the executor, does not comply.
>
> A better architecture would be to separate the "health checker" from the
> "unhealthy policy enforcer". As we've already agreed, we need a "global"
> health check policy, see
> [MESOS-6171](https://issues.apache.org/jira/browse/MESOS-6171). With two
> "unhealthy policies", local and global, the health checker library should
> simply report the health status, while the executor applies one of the
> policies (which may still be implemented in the health checker library for
> code reuse). If you think this makes sense, do you mind filing a ticket
> about this?
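For illustration only, the separation proposed in the quoted comment could be sketched roughly as below. This is a minimal sketch under assumptions, not the code under review: `HealthStatus`, `HealthReporter`, `shouldKill` and `killDecisions` are hypothetical names and are not Mesos APIs. The reporter only counts and reports; the kill decision is a separate function that the executor would apply to each report. Because that decision is derived from the current counter, it goes back to `false` after a successful check, which is the flip discussed above.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical types for illustration; none of these exist in Mesos.

// What the checker reports after each check: an observation, not a decision.
struct HealthStatus {
  bool healthy;
  uint32_t consecutiveFailures;
};

// Reporter side: keeps checking and counting, never decides to kill.
class HealthReporter {
public:
  // Records one check outcome. A success resets the failure counter.
  HealthStatus record(bool checkSucceeded) {
    consecutiveFailures_ = checkSucceeded ? 0 : consecutiveFailures_ + 1;
    return HealthStatus{checkSucceeded, consecutiveFailures_};
  }

private:
  uint32_t consecutiveFailures_ = 0;
};

// Enforcer side: a stateless "local" policy applied by the executor.
// Assumption: the task is killed once the threshold is reached.
bool shouldKill(const HealthStatus& status, uint32_t maxConsecutiveFailures) {
  assert(maxConsecutiveFailures > 0);
  return status.consecutiveFailures >= maxConsecutiveFailures;
}

// Feeds a sequence of check outcomes through the reporter and returns the
// policy's decision after each one.
std::vector<bool> killDecisions(const std::vector<bool>& outcomes,
                                uint32_t maxConsecutiveFailures) {
  HealthReporter reporter;
  std::vector<bool> decisions;
  for (bool outcome : outcomes) {
    decisions.push_back(
        shouldKill(reporter.record(outcome), maxConsecutiveFailures));
  }
  return decisions;
}
```

With a threshold of 3, three failed checks followed by one success yield the decisions `false, false, true, false`. A policy that should not revert would have to remember its own decision on the enforcer side; whether that is desirable is the open question in this thread.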

Got it, created at https://issues.apache.org/jira/browse/MESOS-6578


- haosdent


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/52865/#review152827
-----------------------------------------------------------


On Oct. 14, 2016, 12:37 p.m., Alexander Rukletsov wrote:
>
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/52865/
> -----------------------------------------------------------
>
> (Updated Oct. 14, 2016, 12:37 p.m.)
>
>
> Review request for mesos, Anand Mazumdar, Benjamin Mahler, Gastón Kleiman,
> and haosdent huang.
>
>
> Bugs: MESOS-5963
>     https://issues.apache.org/jira/browse/MESOS-5963
>
>
> Repository: mesos
>
>
> Description
> -------
>
> Prior to this patch, HealthChecker would stop performing health
> checks after marking the task for kill. Since a task's lifecycle
> is managed by the scheduler and executor, HealthChecker should never
> stop health checking on its own.
>
>
> Diffs
> -----
>
>   src/docker/executor.cpp ab3f0473fdc9105d1c425f0dbe7b81c566d541e8
>   src/health-check/health_checker.hpp 392b4d5bd1e5831994b9366c1eb5a2911e19860f
>   src/health-check/health_checker.cpp 96ae1a733ff3d211b84d0893b4603873af1c89f0
>   src/launcher/default_executor.cpp af4a97f7de5f2157aa65fdab742455b0683c40a4
>   src/launcher/executor.cpp 3e95d6029bea0ce6e0dfb39c24b795fe98d90d13
>   src/tests/health_check_tests.cpp 1d1676d7259bf52cfb1e499954fa815fe7e37522
>
> Diff: https://reviews.apache.org/r/52865/diff/
>
>
> Testing
> -------
>
> See https://reviews.apache.org/r/52873/.
>
>
> Thanks,
>
> Alexander Rukletsov
>
>