[
https://issues.apache.org/jira/browse/MESOS-3509?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15099128#comment-15099128
]
Zhitao Li commented on MESOS-3509:
--
I'll post some inconclusive findings since my last change seems to made this
test from "flaky" to "failing".
In short, clock/timer implementation in {{libevent}} seems to allow a timer to
be created *out of order* w.r.t to {{Clock::advance()}} in certain cases. As a
result, timer created after Clock advancing code was not affected by that and
instead waiting for a real wall time of 120s
({{slave::REGISTER_RETRY_INTERVAL_MAX * 2}}), which is longer than the default
15s of {{AWAIT_READY}}.
I took a snippet of log to run the test with {{--libevent}} and {{GLOG_v=3}}:
{panel}
I0114 23:25:16.880738 129300 authenticator.cpp:317] Authentication success
I0114 23:25:16.880803 129295 process.cpp:2502] Resuming master@127.0.0.1:31370
at 2016-01-14 23:25:16.878512896+00:00
I0114 23:25:16.880844 129301 process.cpp:2502] Resuming
crammd5_authenticator(1)@127.0.0.1:31370 at 2016-01-14 23:25:16.878512896+00:00
I0114 23:25:16.880861 129295 master.cpp:5475] Successfully authenticated
principal 'test-principal' at slave(1)@127.0.0.1:31370
I0114 23:25:16.880903 129301 authenticator.cpp:431] Authentication session
cleanup for crammd5_authenticatee(3)@127.0.0.1:31370
I0114 23:25:16.880873 129298 process.cpp:2502] Resuming
crammd5_authenticatee(3)@127.0.0.1:31370 at 2016-01-14 23:25:16.878512896+00:00
I0114 23:25:16.880975 129301 process.cpp:2800] Donating thread to
crammd5_authenticator_session(3)@127.0.0.1:31370 while waiting
I0114 23:25:16.880991 129298 authenticatee.cpp:298] Authentication success
I0114 23:25:16.881000 129301 process.cpp:2502] Resuming
crammd5_authenticator_session(3)@127.0.0.1:31370 at 2016-01-14
23:25:16.878512896+00:00
I0114 23:25:16.881027 129301 process.cpp:2607] Cleaning up
crammd5_authenticator_session(3)@127.0.0.1:31370
I0114 23:25:16.881069 129298 process.cpp:2502] Resuming
slave(1)@127.0.0.1:31370 at 2016-01-14 23:25:16.878512896+00:00
I0114 23:25:16.881184 129300 process.cpp:2502] Resuming
crammd5_authenticatee(3)@127.0.0.1:31370 at 2016-01-14 23:25:16.878512896+00:00
I0114 23:25:16.881225 129300 process.cpp:2607] Cleaning up
crammd5_authenticatee(3)@127.0.0.1:31370
I0114 23:25:16.881301 129298 slave.cpp:860] Successfully authenticated with
master master@127.0.0.1:31370
I0114 23:25:16.881631 129296 process.cpp:2502] Resuming master@127.0.0.1:31370
at 2016-01-14 23:25:16.878512896+00:00
I0114 23:25:16.881629 129298 slave.cpp:1254] Will retry registration in
2.172151ms if necessary
I0114 23:25:16.881724 129298 clock.cpp:279] Created a timer for
slave(1)@127.0.0.1:31370 in 2.172151ms in the future (2016-01-14
23:25:16.880685047+00:00)
I0114 23:25:16.881906 129296 master.cpp:4314] Re-registering slave
a9a5fba6-3191-424d-a1cf-5d12f35ada17-S0 at slave(1)@127.0.0.1:31370 (localhost)
I0114 23:25:16.882228 129294 process.cpp:2502] Resuming
slave(1)@127.0.0.1:31370 at 2016-01-14 23:25:16.878512896+00:00
I0114 23:25:16.882613 129295 process.cpp:2502] Resuming
slave(1)@127.0.0.1:31370 at 2016-01-14 23:25:16.878512896+00:00
I0114 23:25:16.882622 129296 master.cpp:4502] Sending updated checkpointed
resources to slave a9a5fba6-3191-424d-a1cf-5d12f35ada17-S0 at
slave(1)@127.0.0.1:31370 (localhost)
I0114 23:25:16.882712 129295 pid.cpp:93] Attempting to parse
'scheduler-72698b83-ea69-4f94-ac79-1fe005ba5ea9@127.0.0.1:31370' into a PID
W0114 23:25:16.882750 129295 slave.cpp:2162] Dropping updateFramework message
for a9a5fba6-3191-424d-a1cf-5d12f35ada17- because the slave is in
DISCONNECTED state
I0114 23:25:16.882935 129295 slave.cpp:2277] Updated checkpointed resources
from to
I0114 23:25:16.883074 129302 clock.cpp:152] Handling timers up to 2016-01-14
23:25:16.878512896+00:00
I0114 23:25:16.883116 129302 clock.cpp:197] Clock has settled
I0114 23:25:16.888927 129289 clock.cpp:465] Clock is settled
I0114 23:25:16.889067 129289 clock.cpp:381] Clock advanced (2mins) to 0x20681f0
I0114 23:25:16.889143 129302 clock.cpp:152] Handling timers up to 2016-01-14
23:27:16.878512896+00:00
I0114 23:25:16.889176 129302 clock.cpp:159] Have timeout(s) at 2016-01-14
23:25:16.880685047+00:00
I0114 23:25:16.889196 129302 clock.cpp:159] Have timeout(s) at 2016-01-14
23:25:16.882896819+00:00
I0114 23:25:16.889209 129302 clock.cpp:159] Have timeout(s) at 2016-01-14
23:25:16.965011968+00:00
I0114 23:25:16.889220 129302 clock.cpp:159] Have timeout(s) at 2016-01-14
23:25:17.800440064+00:00
I0114 23:25:16.889231 129302 clock.cpp:159] Have timeout(s) at 2016-01-14
23:25:18.572319400+00:00
I0114 23:25:16.889242 129302 clock.cpp:159] Have timeout(s) at 2016-01-14
23:25:21.561927936+00:00
I0114 23:25:16.889253 129302 clock.cpp:159] Have timeout(s) at 2016-01-14
23:25:21.847002880+00:00
I0114 23:25:16.889263 129302 clock.cpp:159] Have timeout(s) at 2016-01-