Re: Review Request 46307: Ignored subsequent status update in HealthStatusChange tests.

Alexander Rukletsov Thu, 24 Nov 2016 14:51:06 -0800


> On April 19, 2016, 2:35 p.m., Neil Conway wrote:
> > This patch does not solve the flakiness for me: failed once after 2 
> > iterations, then again after 77 iterations. Verbose test log here: 
> > https://gist.github.com/neilconway/e6134b4717ee022e7fc32a1f95619fa9
> 
> haosdent huang wrote:
>     Thank you very much for testing this! I saw you are using `vagrant@archlinux`; 
> could you share your Vagrantfile with me so that I can try to reproduce this 
> locally?
> 
> haosdent huang wrote:
>     ```
>     I0420 00:33:13.497138 15400 http.cpp:313] HTTP GET for /master/state from 
> 10.0.2.15:44478
>     Received task health update, healthy: true
>     I0420 00:33:13.502598 15400 slave.cpp:3201] Handling status update 
> TASK_RUNNING (UUID: e19c76cc-096a-4398-b616-afb628b8e5b8) for task 1 in 
> health state healthy of framework 7cf5923c-3d03-4ed6-826a-efa97f54e765-0000 
> from executor(1)@10.0.2.15:37107
>     I0420 00:33:13.504456 15400 status_update_manager.cpp:320] Received 
> status update TASK_RUNNING (UUID: e19c76cc-096a-4398-b616-afb628b8e5b8) for 
> task 1 in health state healthy of framework 
> 7cf5923c-3d03-4ed6-826a-efa97f54e765-0000
>     I0420 00:33:13.505009 15400 slave.cpp:3599] Forwarding the update 
> TASK_RUNNING (UUID: e19c76cc-096a-4398-b616-afb628b8e5b8) for task 1 in 
> health state healthy of framework 7cf5923c-3d03-4ed6-826a-efa97f54e765-0000 
> to master@10.0.2.15:41408
>     I0420 00:33:13.505167 15400 slave.cpp:3509] Sending acknowledgement for 
> status update TASK_RUNNING (UUID: e19c76cc-096a-4398-b616-afb628b8e5b8) for 
> task 1 in health state healthy of framework 
> 7cf5923c-3d03-4ed6-826a-efa97f54e765-0000 to executor(1)@10.0.2.15:37107
>     I0420 00:33:13.505524 15400 master.cpp:5069] Status update TASK_RUNNING 
> (UUID: e19c76cc-096a-4398-b616-afb628b8e5b8) for task 1 in health state 
> healthy of framework 7cf5923c-3d03-4ed6-826a-efa97f54e765-0000 from agent 
> 7cf5923c-3d03-4ed6-826a-efa97f54e765-S0 at slave(76)@10.0.2.15:41408 
> (archlinux.vagrant.vm)
>     I0420 00:33:13.505602 15400 master.cpp:5117] Forwarding status update 
> TASK_RUNNING (UUID: e19c76cc-096a-4398-b616-afb628b8e5b8) for task 1 in 
> health state healthy of framework 7cf5923c-3d03-4ed6-826a-efa97f54e765-0000
>     I0420 00:33:13.505738 15400 master.cpp:6725] Updating the state of task 1 
> of framework 7cf5923c-3d03-4ed6-826a-efa97f54e765-0000 (latest state: 
> TASK_RUNNING, status update state: TASK_RUNNING)
>     I0420 00:33:13.505985 15400 master.cpp:4224] Processing ACKNOWLEDGE call 
> e19c76cc-096a-4398-b616-afb628b8e5b8 for task 1 of framework 
> 7cf5923c-3d03-4ed6-826a-efa97f54e765-0000 (default) at 
> scheduler-5bd5e446-a017-45d9-8193-be7d23002487@10.0.2.15:41408 on agent 
> 7cf5923c-3d03-4ed6-826a-efa97f54e765-S0
>     I0420 00:33:13.506142 15400 status_update_manager.cpp:392] Received 
> status update acknowledgement (UUID: e19c76cc-096a-4398-b616-afb628b8e5b8) 
> for task 1 of framework 7cf5923c-3d03-4ed6-826a-efa97f54e765-0000
>     rm: cannot remove '/tmp/1NKfr1': No such file or directory
>     I0420 00:33:13.508203 15400 http.cpp:178] HTTP GET for /slave(76)/state 
> from 10.0.2.15:44482
>     ../../mesos/src/tests/health_check_tests.cpp:647: Failure
>     Value of: (find).get()
>       Actual: 16-byte object <05-00 00-00 00-00 00-00 90-C4 2D-03 00-00 00-00>
>     Expected: false
>     Which is: false
>     *** Aborted at 1461076393 (unix time) try "date -d @1461076393" if you 
> are using GNU date ***
>     PC: @          0x1899ba0 testing::UnitTest::AddTestPartResult()
>     *** SIGSEGV (@0x0) received by PID 15381 (TID 0x7f0aa958a7c0) from PID 0; 
> stack trace: ***
>     
>     ```
>     It looks like we get `true` here. Let me figure out how to fix this.


There were at least two different issues in this test (see MESOS-1802), and 
this patch fixes just one. The one you are seeing will be addressed in the next 
review in the chain.


- Alexander


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/46307/#review129534
-----------------------------------------------------------


On May 17, 2016, 4:46 p.m., haosdent huang wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/46307/
> -----------------------------------------------------------
> 
> (Updated May 17, 2016, 4:46 p.m.)
> 
> 
> Review request for mesos, Alexander Rukletsov, Benjamin Mahler, Greg Mann, 
> Neil Conway, and Timothy Chen.
> 
> 
> Bugs: MESOS-1802
>     https://issues.apache.org/jira/browse/MESOS-1802
> 
> 
> Repository: mesos
> 
> 
> Description
> -------
> 
> In HealthStatusChange test cases, we launch a task that toggles between
> healthy and unhealthy, and will never be killed because no consecutive
> health failures occur. We need to ignore subsequent status updates because
> it is possible to continue receiving status updates before we stop the
> driver.
> 
> 
> Diffs
> -----
> 
>   src/tests/health_check_tests.cpp 1c4a554ab07731963a4a38e3ae40b0323bf317bb 
> 
> Diff: https://reviews.apache.org/r/46307/diff/
> 
> 
> Testing
> -------
> 
> # I still could not reproduce the problem with the old code after repeated 
> test runs, so there seems to be no way to verify whether my assumption is correct.
> 
> 
> Thanks,
> 
> haosdent huang
> 
>
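The Description in the quoted review request makes two claims: a task that alternates between healthy and unhealthy never accumulates consecutive failures, so it is never killed, and the test must therefore tolerate status updates that keep arriving after the ones it inspects. The sketch below models both points in plain C++. It is not the actual diff (which is not reproduced here); `TaskStatus`, `ConsecutiveFailureTracker`, and `StatusCapture` are hypothetical stand-ins, and the kill threshold is an assumption loosely modelled on the `consecutive_failures` field of Mesos' `HealthCheck` message. In Mesos' own gmock-based tests, the "ignore the rest" step is commonly written as a `WillRepeatedly(Return())` action on the mock scheduler's `statusUpdate` expectation.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical, simplified stand-in for Mesos' TaskStatus.
struct TaskStatus {
  std::string state;  // e.g. "TASK_RUNNING"
  bool healthy;
};

// Hypothetical model of the kill decision: the failure counter resets on
// every healthy result, so a task that alternates healthy/unhealthy never
// reaches the threshold and is never killed.
class ConsecutiveFailureTracker {
 public:
  explicit ConsecutiveFailureTracker(unsigned threshold)
    : threshold_(threshold) {}

  // Records one health check result; returns true if the task should be killed.
  bool record(bool healthy) {
    failures_ = healthy ? 0 : failures_ + 1;
    return failures_ >= threshold_;
  }

 private:
  unsigned threshold_;
  unsigned failures_ = 0;
};

// Hypothetical scheduler stub: keeps the first `expected` updates the test
// wants to inspect and silently counts (ignores) every later update instead
// of treating it as unexpected.
class StatusCapture {
 public:
  explicit StatusCapture(std::size_t expected) : expected_(expected) {}

  // Invoked for every status update delivered to the scheduler.
  void statusUpdate(const TaskStatus& status) {
    if (captured_.size() < expected_) {
      captured_.push_back(status);
    } else {
      ++ignored_;  // Subsequent updates are expected and deliberately dropped.
    }
  }

  // The i-th captured update; only valid once that update has arrived.
  const TaskStatus& update(std::size_t i) const {
    assert(i < captured_.size() && "update not received yet");
    return captured_[i];
  }

  std::size_t received() const { return captured_.size(); }
  std::size_t ignored() const { return ignored_; }

 private:
  std::size_t expected_;
  std::vector<TaskStatus> captured_;
  std::size_t ignored_ = 0;
};
```

With a threshold of 3, feeding the tracker alternating results never returns `true`, while three failures in a row do; and a `StatusCapture(2)` that receives four toggling updates keeps the first two and reports two ignored ones, so late updates cannot make the test fail.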
