> On April 19, 2016, 2:35 p.m., Neil Conway wrote:
> > This patch does not solve the flakiness for me: failed once after 2
> > iterations, then again after 77 iterations. Verbose test log here:
> > https://gist.github.com/neilconway/e6134b4717ee022e7fc32a1f95619fa9
>
> haosdent huang wrote:
> Thank you very much for your test! I saw you use `vagrant@archlinux`. Could
> you share your Vagrantfile with me, so that I can try to reproduce this
> locally?
>
> haosdent huang wrote:
> ```
> I0420 00:33:13.497138 15400 http.cpp:313] HTTP GET for /master/state from 10.0.2.15:44478
> Received task health update, healthy: true
> I0420 00:33:13.502598 15400 slave.cpp:3201] Handling status update TASK_RUNNING (UUID: e19c76cc-096a-4398-b616-afb628b8e5b8) for task 1 in health state healthy of framework 7cf5923c-3d03-4ed6-826a-efa97f54e765-0000 from executor(1)@10.0.2.15:37107
> I0420 00:33:13.504456 15400 status_update_manager.cpp:320] Received status update TASK_RUNNING (UUID: e19c76cc-096a-4398-b616-afb628b8e5b8) for task 1 in health state healthy of framework 7cf5923c-3d03-4ed6-826a-efa97f54e765-0000
> I0420 00:33:13.505009 15400 slave.cpp:3599] Forwarding the update TASK_RUNNING (UUID: e19c76cc-096a-4398-b616-afb628b8e5b8) for task 1 in health state healthy of framework 7cf5923c-3d03-4ed6-826a-efa97f54e765-0000 to master@10.0.2.15:41408
> I0420 00:33:13.505167 15400 slave.cpp:3509] Sending acknowledgement for status update TASK_RUNNING (UUID: e19c76cc-096a-4398-b616-afb628b8e5b8) for task 1 in health state healthy of framework 7cf5923c-3d03-4ed6-826a-efa97f54e765-0000 to executor(1)@10.0.2.15:37107
> I0420 00:33:13.505524 15400 master.cpp:5069] Status update TASK_RUNNING (UUID: e19c76cc-096a-4398-b616-afb628b8e5b8) for task 1 in health state healthy of framework 7cf5923c-3d03-4ed6-826a-efa97f54e765-0000 from agent 7cf5923c-3d03-4ed6-826a-efa97f54e765-S0 at slave(76)@10.0.2.15:41408 (archlinux.vagrant.vm)
> I0420 00:33:13.505602 15400 master.cpp:5117] Forwarding status update TASK_RUNNING (UUID: e19c76cc-096a-4398-b616-afb628b8e5b8) for task 1 in health state healthy of framework 7cf5923c-3d03-4ed6-826a-efa97f54e765-0000
> I0420 00:33:13.505738 15400 master.cpp:6725] Updating the state of task 1 of framework 7cf5923c-3d03-4ed6-826a-efa97f54e765-0000 (latest state: TASK_RUNNING, status update state: TASK_RUNNING)
> I0420 00:33:13.505985 15400 master.cpp:4224] Processing ACKNOWLEDGE call e19c76cc-096a-4398-b616-afb628b8e5b8 for task 1 of framework 7cf5923c-3d03-4ed6-826a-efa97f54e765-0000 (default) at scheduler-5bd5e446-a017-45d9-8193-be7d23002487@10.0.2.15:41408 on agent 7cf5923c-3d03-4ed6-826a-efa97f54e765-S0
> I0420 00:33:13.506142 15400 status_update_manager.cpp:392] Received status update acknowledgement (UUID: e19c76cc-096a-4398-b616-afb628b8e5b8) for task 1 of framework 7cf5923c-3d03-4ed6-826a-efa97f54e765-0000
> rm: cannot remove '/tmp/1NKfr1': No such file or directory
> I0420 00:33:13.508203 15400 http.cpp:178] HTTP GET for /slave(76)/state from 10.0.2.15:44482
> ../../mesos/src/tests/health_check_tests.cpp:647: Failure
> Value of: (find).get()
>   Actual: 16-byte object <05-00 00-00 00-00 00-00 90-C4 2D-03 00-00 00-00>
> Expected: false
> Which is: false
> *** Aborted at 1461076393 (unix time) try "date -d @1461076393" if you are using GNU date ***
> PC: @ 0x1899ba0 testing::UnitTest::AddTestPartResult()
> *** SIGSEGV (@0x0) received by PID 15381 (TID 0x7f0aa958a7c0) from PID 0; stack trace: ***
>
> ```
> It looks like we get `true` here. Let me see how to fix this.

There were at least two different issues in this test (see MESOS-1802), and this patch fixes just one. The one you see will be addressed in the next review in the chain.


- Alexander


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/46307/#review129534
-----------------------------------------------------------


On May 17, 2016, 4:46 p.m., haosdent huang wrote:
>
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/46307/
> -----------------------------------------------------------
>
> (Updated May 17, 2016, 4:46 p.m.)
>
>
> Review request for mesos, Alexander Rukletsov, Benjamin Mahler, Greg Mann,
> Neil Conway, and Timothy Chen.
>
>
> Bugs: MESOS-1802
>     https://issues.apache.org/jira/browse/MESOS-1802
>
>
> Repository: mesos
>
>
> Description
> -------
>
> In the HealthStatusChange test cases, we launch a task that toggles between
> healthy and unhealthy and that will never be killed, because no consecutive
> health failures occur. We need to ignore subsequent status updates, because
> it is possible to continue receiving status updates before we stop the
> driver.
>
>
> Diffs
> -----
>
>   src/tests/health_check_tests.cpp 1c4a554ab07731963a4a38e3ae40b0323bf317bb
>
> Diff: https://reviews.apache.org/r/46307/diff/
>
>
> Testing
> -------
>
> # I still could not reproduce the problem with the old code after repeated
> test runs, so there seems to be no way to verify whether my assumption is
> correct.
>
>
> Thanks,
>
> haosdent huang
>
>