[jira] [Commented] (MESOS-1802) HealthCheckTest.HealthStatusChange is flaky on jenkins.
[ https://issues.apache.org/jira/browse/MESOS-1802?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15694334#comment-15694334 ] Alexander Rukletsov commented on MESOS-1802: I can reproduce it relatively easy by running _parallel_ {{make check}}. Here is a fresh log: {noformat} [ RUN ] HealthCheckTest.HealthStatusChange I1124 23:20:48.351884 4284416 exec.cpp:162] Version: 1.2.0 I1124 23:20:48.375592 3747840 exec.cpp:237] Executor registered on agent 6db7ef4d-7211-47be-98ba-ad590b528c69-S0 Received SUBSCRIBED event Subscribed executor on alexr.speedportneo09012801000249 Received LAUNCH event Starting task 1 /Users/alex/Projects/mesos/build/parallel/src/mesos-containerizer launch --command="{"shell":true,"value":"sleep 120"}" --help="false" Forked command at 73286 Received task health update, healthy: true rm: /private/tmp/z1PbfH/rG7Gha: No such file or directory W1124 23:20:48.544631 3211264 health_checker.cpp:245] Health check failed 1 times consecutively: COMMAND health check failed: Command returned exited with status 1 Received task health update, healthy: false Received task health update, healthy: true rm: /private/tmp/z1PbfH/rG7Gha: No such file or directory ../../../src/tests/health_check_tests.cpp:790: Failure Value of: (find).get() Actual: 16-byte object <05-00 00-00 00-00 00-00 60-A9 62-1B B2-7F 00-00> Expected: false Which is: false I1124 23:20:48.732457 4284416 exec.cpp:414] Executor asked to shutdown Received SHUTDOWN event Shutting down Sending SIGTERM to process tree at pid 73286 W1124 23:20:48.747885 1064960 health_checker.cpp:245] Health check failed 1 times consecutively: COMMAND health check failed: Command returned exited with status 1 rm: /private/tmp/z1PbfH/rG7Gha: No such file or directory W1124 23:20:48.948562 3747840 health_checker.cpp:245] Health check failed 1 times consecutively: COMMAND health check failed: Command returned exited with status 1 Sent SIGTERM to the following process trees: [ --- 73286 sleep 120 ] Scheduling escalation to SIGKILL in 3secs from now [ FAILED ] HealthCheckTest.HealthStatusChange (1639 ms) {noformat} These lines {noformat} Received task health update, healthy: true rm: /private/tmp/z1PbfH/rG7Gha: No such file or directory ../../../src/tests/health_check_tests.cpp:790: Failure {noformat} obviously hint that we've queried the HTTP endpoint _after_ the next health status change. > HealthCheckTest.HealthStatusChange is flaky on jenkins. > --- > > Key: MESOS-1802 > URL: https://issues.apache.org/jira/browse/MESOS-1802 > Project: Mesos > Issue Type: Bug > Components: test, tests >Affects Versions: 0.26.0 >Reporter: Benjamin Mahler >Assignee: haosdent > Labels: flaky, health-check, mesosphere > Attachments: health_check_flaky_test_log.txt > > > https://builds.apache.org/job/Mesos-Trunk-Ubuntu-Build-Out-Of-Src-Disable-Java-Disable-Python-Disable-Webui/2374/consoleFull > {noformat} > [ RUN ] HealthCheckTest.HealthStatusChange > Using temporary directory '/tmp/HealthCheckTest_HealthStatusChange_IYnlu2' > I0916 22:56:14.034612 21026 leveldb.cpp:176] Opened db in 2.155713ms > I0916 22:56:14.034965 21026 leveldb.cpp:183] Compacted db in 332489ns > I0916 22:56:14.034984 21026 leveldb.cpp:198] Created db iterator in 3710ns > I0916 22:56:14.034996 21026 leveldb.cpp:204] Seeked to beginning of db in > 642ns > I0916 22:56:14.035006 21026 leveldb.cpp:273] Iterated through 0 keys in the > db in 343ns > I0916 22:56:14.035023 21026 replica.cpp:741] Replica recovered with log > positions 0 -> 0 with 1 holes and 0 unlearned > I0916 22:56:14.035200 21054 recover.cpp:425] Starting replica recovery > I0916 22:56:14.035403 21041 recover.cpp:451] Replica is in EMPTY status > I0916 22:56:14.035888 21045 replica.cpp:638] Replica in EMPTY status received > a broadcasted recover request > I0916 22:56:14.035969 21052 recover.cpp:188] Received a recover response from > a replica in EMPTY status > I0916 22:56:14.036118 21042 recover.cpp:542] Updating replica status to > STARTING > I0916 22:56:14.036603 21046 master.cpp:286] Master > 20140916-225614-3125920579-47865-21026 (penates.apache.org) started on > 67.195.81.186:47865 > I0916 22:56:14.036634 21046 master.cpp:332] Master only allowing > authenticated frameworks to register > I0916 22:56:14.036648 21046 master.cpp:337] Master only allowing > authenticated slaves to register > I0916 22:56:14.036659 21046 credentials.hpp:36] Loading credentials for > authentication from > '/tmp/HealthCheckTest_HealthStatusChange_IYnlu2/credentials' > I0916 22:56:14.036686 21045 leveldb.cpp:306] Persisting metadata (8 bytes) to > leveldb took 480322ns > I0916 22:56:14.036700 21045 replica.cpp:320] Persisted replica status to > STARTING > I0916 22:56:14.036769 21046
[jira] [Commented] (MESOS-1802) HealthCheckTest.HealthStatusChange is flaky on jenkins.
[ https://issues.apache.org/jira/browse/MESOS-1802?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15258435#comment-15258435 ] haosdent commented on MESOS-1802: - I prefer to drop the http status check in the test cases, because I still don't have any clues to guarantee the http queries order match the statusUpdate order. [~neilc] [~bmahler] [~tnachen] Do you think this proposal is doable? > HealthCheckTest.HealthStatusChange is flaky on jenkins. > --- > > Key: MESOS-1802 > URL: https://issues.apache.org/jira/browse/MESOS-1802 > Project: Mesos > Issue Type: Bug > Components: test, tests >Affects Versions: 0.26.0 >Reporter: Benjamin Mahler >Assignee: haosdent > Labels: flaky, health-check, mesosphere > Attachments: health_check_flaky_test_log.txt > > > https://builds.apache.org/job/Mesos-Trunk-Ubuntu-Build-Out-Of-Src-Disable-Java-Disable-Python-Disable-Webui/2374/consoleFull > {noformat} > [ RUN ] HealthCheckTest.HealthStatusChange > Using temporary directory '/tmp/HealthCheckTest_HealthStatusChange_IYnlu2' > I0916 22:56:14.034612 21026 leveldb.cpp:176] Opened db in 2.155713ms > I0916 22:56:14.034965 21026 leveldb.cpp:183] Compacted db in 332489ns > I0916 22:56:14.034984 21026 leveldb.cpp:198] Created db iterator in 3710ns > I0916 22:56:14.034996 21026 leveldb.cpp:204] Seeked to beginning of db in > 642ns > I0916 22:56:14.035006 21026 leveldb.cpp:273] Iterated through 0 keys in the > db in 343ns > I0916 22:56:14.035023 21026 replica.cpp:741] Replica recovered with log > positions 0 -> 0 with 1 holes and 0 unlearned > I0916 22:56:14.035200 21054 recover.cpp:425] Starting replica recovery > I0916 22:56:14.035403 21041 recover.cpp:451] Replica is in EMPTY status > I0916 22:56:14.035888 21045 replica.cpp:638] Replica in EMPTY status received > a broadcasted recover request > I0916 22:56:14.035969 21052 recover.cpp:188] Received a recover response from > a replica in EMPTY status > I0916 22:56:14.036118 21042 recover.cpp:542] Updating replica status to > STARTING > I0916 22:56:14.036603 21046 master.cpp:286] Master > 20140916-225614-3125920579-47865-21026 (penates.apache.org) started on > 67.195.81.186:47865 > I0916 22:56:14.036634 21046 master.cpp:332] Master only allowing > authenticated frameworks to register > I0916 22:56:14.036648 21046 master.cpp:337] Master only allowing > authenticated slaves to register > I0916 22:56:14.036659 21046 credentials.hpp:36] Loading credentials for > authentication from > '/tmp/HealthCheckTest_HealthStatusChange_IYnlu2/credentials' > I0916 22:56:14.036686 21045 leveldb.cpp:306] Persisting metadata (8 bytes) to > leveldb took 480322ns > I0916 22:56:14.036700 21045 replica.cpp:320] Persisted replica status to > STARTING > I0916 22:56:14.036769 21046 master.cpp:366] Authorization enabled > I0916 22:56:14.036826 21045 recover.cpp:451] Replica is in STARTING status > I0916 22:56:14.036944 21052 master.cpp:120] No whitelist given. Advertising > offers for all slaves > I0916 22:56:14.036968 21049 hierarchical_allocator_process.hpp:299] > Initializing hierarchical allocator process with master : > master@67.195.81.186:47865 > I0916 22:56:14.037284 21054 replica.cpp:638] Replica in STARTING status > received a broadcasted recover request > I0916 22:56:14.037312 21046 master.cpp:1212] The newly elected leader is > master@67.195.81.186:47865 with id 20140916-225614-3125920579-47865-21026 > I0916 22:56:14.037333 21046 master.cpp:1225] Elected as the leading master! > I0916 22:56:14.037345 21046 master.cpp:1043] Recovering from registrar > I0916 22:56:14.037504 21040 registrar.cpp:313] Recovering registrar > I0916 22:56:14.037505 21053 recover.cpp:188] Received a recover response from > a replica in STARTING status > I0916 22:56:14.037681 21047 recover.cpp:542] Updating replica status to VOTING > I0916 22:56:14.038072 21052 leveldb.cpp:306] Persisting metadata (8 bytes) to > leveldb took 330251ns > I0916 22:56:14.038087 21052 replica.cpp:320] Persisted replica status to > VOTING > I0916 22:56:14.038127 21053 recover.cpp:556] Successfully joined the Paxos > group > I0916 22:56:14.038202 21053 recover.cpp:440] Recover process terminated > I0916 22:56:14.038364 21048 log.cpp:656] Attempting to start the writer > I0916 22:56:14.038812 21053 replica.cpp:474] Replica received implicit > promise request with proposal 1 > I0916 22:56:14.038925 21053 leveldb.cpp:306] Persisting metadata (8 bytes) to > leveldb took 92623ns > I0916 22:56:14.038944 21053 replica.cpp:342] Persisted promised to 1 > I0916 22:56:14.039201 21052 coordinator.cpp:230] Coordinator attemping to > fill missing position > I0916 22:56:14.039676 21047 replica.cpp:375] Replica received explicit > promise request for position 0 with proposal 2 > I0916 22:56:14.039836 21047
[jira] [Commented] (MESOS-1802) HealthCheckTest.HealthStatusChange is flaky on jenkins.
[ https://issues.apache.org/jira/browse/MESOS-1802?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15248222#comment-15248222 ] haosdent commented on MESOS-1802: - According to another flaky log provided by [~neilc] (Thanks a lot!) {code} [ RUN ] HealthCheckTest.HealthStatusChange I0420 00:33:12.691145 15381 cluster.cpp:149] Creating default 'local' authorizer I0420 00:33:12.691897 15381 leveldb.cpp:174] Opened db in 620608ns I0420 00:33:12.692211 15381 leveldb.cpp:181] Compacted db in 283038ns I0420 00:33:12.692265 15381 leveldb.cpp:196] Created db iterator in 15184ns I0420 00:33:12.692273 15381 leveldb.cpp:202] Seeked to beginning of db in 2161ns I0420 00:33:12.692281 15381 leveldb.cpp:271] Iterated through 0 keys in the db in 1084ns I0420 00:33:12.692312 15381 replica.cpp:779] Replica recovered with log positions 0 -> 0 with 1 holes and 0 unlearned I0420 00:33:12.693172 15398 recover.cpp:447] Starting replica recovery I0420 00:33:12.693435 15400 recover.cpp:473] Replica is in EMPTY status I0420 00:33:12.695106 15400 replica.cpp:673] Replica in EMPTY status received a broadcasted recover request from (3755)@10.0.2.15:41408 I0420 00:33:12.695829 15400 recover.cpp:193] Received a recover response from a replica in EMPTY status I0420 00:33:12.696321 15398 recover.cpp:564] Updating replica status to STARTING I0420 00:33:12.696807 15402 leveldb.cpp:304] Persisting metadata (8 bytes) to leveldb took 101090ns I0420 00:33:12.696836 15402 replica.cpp:320] Persisted replica status to STARTING I0420 00:33:12.696964 15402 recover.cpp:473] Replica is in STARTING status I0420 00:33:12.697968 15400 replica.cpp:673] Replica in STARTING status received a broadcasted recover request from (3756)@10.0.2.15:41408 I0420 00:33:12.698274 15400 recover.cpp:193] Received a recover response from a replica in STARTING status I0420 00:33:12.698691 15400 recover.cpp:564] Updating replica status to VOTING I0420 00:33:12.698882 15400 leveldb.cpp:304] Persisting metadata (8 bytes) to leveldb took 62549ns I0420 00:33:12.698938 15400 replica.cpp:320] Persisted replica status to VOTING I0420 00:33:12.699002 15400 recover.cpp:578] Successfully joined the Paxos group I0420 00:33:12.699128 15400 recover.cpp:462] Recover process terminated I0420 00:33:12.721339 15401 master.cpp:382] Master 7cf5923c-3d03-4ed6-826a-efa97f54e765 (archlinux.vagrant.vm) started on 10.0.2.15:41408 I0420 00:33:12.721387 15401 master.cpp:384] Flags at startup: --acls="" --allocation_interval="1secs" --allocator="HierarchicalDRF" --authenticate="true" --authenticate_http="true" --authenticate_http_frameworks="true" --authenticate_slaves="true" --authenticators="crammd5" --authorizers="local" --credentials="/tmp/9zoT36/credentials" --framework_sorter="drf" --help="false" --hostname_lookup="true" --http_authenticators="basic" --http_framework_authenticators="basic" --initialize_driver_logging="true" --log_auto_initialize="true" --logbufsecs="0" --logging_level="INFO" --max_completed_frameworks="50" --max_completed_tasks_per_framework="1000" --max_slave_ping_timeouts="5" --quiet="false" --recovery_slave_removal_limit="100%" --registry="replicated_log" --registry_fetch_timeout="1mins" --registry_store_timeout="100secs" --registry_strict="true" --root_submissions="true" --slave_ping_timeout="15secs" --slave_reregister_timeout="10mins" --user_sorter="drf" --version="false" --webui_dir="/usr/local/share/mesos/webui" --work_dir="/tmp/9zoT36/master" --zk_session_timeout="10secs" I0420 00:33:12.721667 15401 master.cpp:433] Master only allowing authenticated frameworks to register I0420 00:33:12.721736 15401 master.cpp:439] Master only allowing authenticated agents to register I0420 00:33:12.721745 15401 master.cpp:445] Master only allowing authenticated HTTP frameworks to register I0420 00:33:12.721750 15401 credentials.hpp:37] Loading credentials for authentication from '/tmp/9zoT36/credentials' I0420 00:33:12.722005 15401 master.cpp:489] Using default 'crammd5' authenticator I0420 00:33:12.723206 15401 master.cpp:560] Using default 'basic' HTTP authenticator I0420 00:33:12.723458 15401 master.cpp:640] Using default 'basic' HTTP framework authenticator I0420 00:33:12.723574 15401 master.cpp:687] Authorization enabled I0420 00:33:12.725332 15399 master.cpp:1932] The newly elected leader is master@10.0.2.15:41408 with id 7cf5923c-3d03-4ed6-826a-efa97f54e765 I0420 00:33:12.725376 15399 master.cpp:1945] Elected as the leading master! I0420 00:33:12.725389 15399 master.cpp:1632] Recovering from registrar I0420 00:33:12.725544 15399 registrar.cpp:331] Recovering registrar I0420 00:33:12.726506 15399 log.cpp:524] Attempting to start the writer I0420 00:33:12.727424 15399 replica.cpp:493] Replica received implicit promise request from (3759)@10.0.2.15:41408 with proposal 1 I0420 00:33:12.727517 15399 leveldb.cpp:304] Persisting metadata (8 bytes) to leveldb took 50404ns I0420
[jira] [Commented] (MESOS-1802) HealthCheckTest.HealthStatusChange is flaky on jenkins.
[ https://issues.apache.org/jira/browse/MESOS-1802?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15139357#comment-15139357 ] Neil Conway commented on MESOS-1802: Hi [~haosd...@gmail.com] -- I attached the output of {{--verbose}} for a failing run as a file associated with this issue. > HealthCheckTest.HealthStatusChange is flaky on jenkins. > --- > > Key: MESOS-1802 > URL: https://issues.apache.org/jira/browse/MESOS-1802 > Project: Mesos > Issue Type: Bug > Components: test, tests >Affects Versions: 0.26.0 >Reporter: Benjamin Mahler > Labels: flaky, mesosphere > Attachments: health_check_flaky_test_log.txt > > > https://builds.apache.org/job/Mesos-Trunk-Ubuntu-Build-Out-Of-Src-Disable-Java-Disable-Python-Disable-Webui/2374/consoleFull > {noformat} > [ RUN ] HealthCheckTest.HealthStatusChange > Using temporary directory '/tmp/HealthCheckTest_HealthStatusChange_IYnlu2' > I0916 22:56:14.034612 21026 leveldb.cpp:176] Opened db in 2.155713ms > I0916 22:56:14.034965 21026 leveldb.cpp:183] Compacted db in 332489ns > I0916 22:56:14.034984 21026 leveldb.cpp:198] Created db iterator in 3710ns > I0916 22:56:14.034996 21026 leveldb.cpp:204] Seeked to beginning of db in > 642ns > I0916 22:56:14.035006 21026 leveldb.cpp:273] Iterated through 0 keys in the > db in 343ns > I0916 22:56:14.035023 21026 replica.cpp:741] Replica recovered with log > positions 0 -> 0 with 1 holes and 0 unlearned > I0916 22:56:14.035200 21054 recover.cpp:425] Starting replica recovery > I0916 22:56:14.035403 21041 recover.cpp:451] Replica is in EMPTY status > I0916 22:56:14.035888 21045 replica.cpp:638] Replica in EMPTY status received > a broadcasted recover request > I0916 22:56:14.035969 21052 recover.cpp:188] Received a recover response from > a replica in EMPTY status > I0916 22:56:14.036118 21042 recover.cpp:542] Updating replica status to > STARTING > I0916 22:56:14.036603 21046 master.cpp:286] Master > 20140916-225614-3125920579-47865-21026 (penates.apache.org) started on > 67.195.81.186:47865 > I0916 22:56:14.036634 21046 master.cpp:332] Master only allowing > authenticated frameworks to register > I0916 22:56:14.036648 21046 master.cpp:337] Master only allowing > authenticated slaves to register > I0916 22:56:14.036659 21046 credentials.hpp:36] Loading credentials for > authentication from > '/tmp/HealthCheckTest_HealthStatusChange_IYnlu2/credentials' > I0916 22:56:14.036686 21045 leveldb.cpp:306] Persisting metadata (8 bytes) to > leveldb took 480322ns > I0916 22:56:14.036700 21045 replica.cpp:320] Persisted replica status to > STARTING > I0916 22:56:14.036769 21046 master.cpp:366] Authorization enabled > I0916 22:56:14.036826 21045 recover.cpp:451] Replica is in STARTING status > I0916 22:56:14.036944 21052 master.cpp:120] No whitelist given. Advertising > offers for all slaves > I0916 22:56:14.036968 21049 hierarchical_allocator_process.hpp:299] > Initializing hierarchical allocator process with master : > master@67.195.81.186:47865 > I0916 22:56:14.037284 21054 replica.cpp:638] Replica in STARTING status > received a broadcasted recover request > I0916 22:56:14.037312 21046 master.cpp:1212] The newly elected leader is > master@67.195.81.186:47865 with id 20140916-225614-3125920579-47865-21026 > I0916 22:56:14.037333 21046 master.cpp:1225] Elected as the leading master! > I0916 22:56:14.037345 21046 master.cpp:1043] Recovering from registrar > I0916 22:56:14.037504 21040 registrar.cpp:313] Recovering registrar > I0916 22:56:14.037505 21053 recover.cpp:188] Received a recover response from > a replica in STARTING status > I0916 22:56:14.037681 21047 recover.cpp:542] Updating replica status to VOTING > I0916 22:56:14.038072 21052 leveldb.cpp:306] Persisting metadata (8 bytes) to > leveldb took 330251ns > I0916 22:56:14.038087 21052 replica.cpp:320] Persisted replica status to > VOTING > I0916 22:56:14.038127 21053 recover.cpp:556] Successfully joined the Paxos > group > I0916 22:56:14.038202 21053 recover.cpp:440] Recover process terminated > I0916 22:56:14.038364 21048 log.cpp:656] Attempting to start the writer > I0916 22:56:14.038812 21053 replica.cpp:474] Replica received implicit > promise request with proposal 1 > I0916 22:56:14.038925 21053 leveldb.cpp:306] Persisting metadata (8 bytes) to > leveldb took 92623ns > I0916 22:56:14.038944 21053 replica.cpp:342] Persisted promised to 1 > I0916 22:56:14.039201 21052 coordinator.cpp:230] Coordinator attemping to > fill missing position > I0916 22:56:14.039676 21047 replica.cpp:375] Replica received explicit > promise request for position 0 with proposal 2 > I0916 22:56:14.039836 21047 leveldb.cpp:343] Persisting action (8 bytes) to > leveldb took 144215ns > I0916 22:56:14.039850 21047 replica.cpp:676] Persisted action at 0 >
[jira] [Commented] (MESOS-1802) HealthCheckTest.HealthStatusChange is flaky on jenkins.
[ https://issues.apache.org/jira/browse/MESOS-1802?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15139364#comment-15139364 ] haosdent commented on MESOS-1802: - Thank you very much. Let me try to get more information from you log. > HealthCheckTest.HealthStatusChange is flaky on jenkins. > --- > > Key: MESOS-1802 > URL: https://issues.apache.org/jira/browse/MESOS-1802 > Project: Mesos > Issue Type: Bug > Components: test, tests >Affects Versions: 0.26.0 >Reporter: Benjamin Mahler > Labels: flaky, mesosphere > Attachments: health_check_flaky_test_log.txt > > > https://builds.apache.org/job/Mesos-Trunk-Ubuntu-Build-Out-Of-Src-Disable-Java-Disable-Python-Disable-Webui/2374/consoleFull > {noformat} > [ RUN ] HealthCheckTest.HealthStatusChange > Using temporary directory '/tmp/HealthCheckTest_HealthStatusChange_IYnlu2' > I0916 22:56:14.034612 21026 leveldb.cpp:176] Opened db in 2.155713ms > I0916 22:56:14.034965 21026 leveldb.cpp:183] Compacted db in 332489ns > I0916 22:56:14.034984 21026 leveldb.cpp:198] Created db iterator in 3710ns > I0916 22:56:14.034996 21026 leveldb.cpp:204] Seeked to beginning of db in > 642ns > I0916 22:56:14.035006 21026 leveldb.cpp:273] Iterated through 0 keys in the > db in 343ns > I0916 22:56:14.035023 21026 replica.cpp:741] Replica recovered with log > positions 0 -> 0 with 1 holes and 0 unlearned > I0916 22:56:14.035200 21054 recover.cpp:425] Starting replica recovery > I0916 22:56:14.035403 21041 recover.cpp:451] Replica is in EMPTY status > I0916 22:56:14.035888 21045 replica.cpp:638] Replica in EMPTY status received > a broadcasted recover request > I0916 22:56:14.035969 21052 recover.cpp:188] Received a recover response from > a replica in EMPTY status > I0916 22:56:14.036118 21042 recover.cpp:542] Updating replica status to > STARTING > I0916 22:56:14.036603 21046 master.cpp:286] Master > 20140916-225614-3125920579-47865-21026 (penates.apache.org) started on > 67.195.81.186:47865 > I0916 22:56:14.036634 21046 master.cpp:332] Master only allowing > authenticated frameworks to register > I0916 22:56:14.036648 21046 master.cpp:337] Master only allowing > authenticated slaves to register > I0916 22:56:14.036659 21046 credentials.hpp:36] Loading credentials for > authentication from > '/tmp/HealthCheckTest_HealthStatusChange_IYnlu2/credentials' > I0916 22:56:14.036686 21045 leveldb.cpp:306] Persisting metadata (8 bytes) to > leveldb took 480322ns > I0916 22:56:14.036700 21045 replica.cpp:320] Persisted replica status to > STARTING > I0916 22:56:14.036769 21046 master.cpp:366] Authorization enabled > I0916 22:56:14.036826 21045 recover.cpp:451] Replica is in STARTING status > I0916 22:56:14.036944 21052 master.cpp:120] No whitelist given. Advertising > offers for all slaves > I0916 22:56:14.036968 21049 hierarchical_allocator_process.hpp:299] > Initializing hierarchical allocator process with master : > master@67.195.81.186:47865 > I0916 22:56:14.037284 21054 replica.cpp:638] Replica in STARTING status > received a broadcasted recover request > I0916 22:56:14.037312 21046 master.cpp:1212] The newly elected leader is > master@67.195.81.186:47865 with id 20140916-225614-3125920579-47865-21026 > I0916 22:56:14.037333 21046 master.cpp:1225] Elected as the leading master! > I0916 22:56:14.037345 21046 master.cpp:1043] Recovering from registrar > I0916 22:56:14.037504 21040 registrar.cpp:313] Recovering registrar > I0916 22:56:14.037505 21053 recover.cpp:188] Received a recover response from > a replica in STARTING status > I0916 22:56:14.037681 21047 recover.cpp:542] Updating replica status to VOTING > I0916 22:56:14.038072 21052 leveldb.cpp:306] Persisting metadata (8 bytes) to > leveldb took 330251ns > I0916 22:56:14.038087 21052 replica.cpp:320] Persisted replica status to > VOTING > I0916 22:56:14.038127 21053 recover.cpp:556] Successfully joined the Paxos > group > I0916 22:56:14.038202 21053 recover.cpp:440] Recover process terminated > I0916 22:56:14.038364 21048 log.cpp:656] Attempting to start the writer > I0916 22:56:14.038812 21053 replica.cpp:474] Replica received implicit > promise request with proposal 1 > I0916 22:56:14.038925 21053 leveldb.cpp:306] Persisting metadata (8 bytes) to > leveldb took 92623ns > I0916 22:56:14.038944 21053 replica.cpp:342] Persisted promised to 1 > I0916 22:56:14.039201 21052 coordinator.cpp:230] Coordinator attemping to > fill missing position > I0916 22:56:14.039676 21047 replica.cpp:375] Replica received explicit > promise request for position 0 with proposal 2 > I0916 22:56:14.039836 21047 leveldb.cpp:343] Persisting action (8 bytes) to > leveldb took 144215ns > I0916 22:56:14.039850 21047 replica.cpp:676] Persisted action at 0 > I0916 22:56:14.040243 21047 replica.cpp:508] Replica received
[jira] [Commented] (MESOS-1802) HealthCheckTest.HealthStatusChange is flaky on jenkins.
[ https://issues.apache.org/jira/browse/MESOS-1802?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15135906#comment-15135906 ] haosdent commented on MESOS-1802: - Yes, [~neilc] do you have the stdout and stderr log? > HealthCheckTest.HealthStatusChange is flaky on jenkins. > --- > > Key: MESOS-1802 > URL: https://issues.apache.org/jira/browse/MESOS-1802 > Project: Mesos > Issue Type: Bug > Components: test, tests >Affects Versions: 0.26.0 >Reporter: Benjamin Mahler > Labels: flaky, mesosphere > > https://builds.apache.org/job/Mesos-Trunk-Ubuntu-Build-Out-Of-Src-Disable-Java-Disable-Python-Disable-Webui/2374/consoleFull > {noformat} > [ RUN ] HealthCheckTest.HealthStatusChange > Using temporary directory '/tmp/HealthCheckTest_HealthStatusChange_IYnlu2' > I0916 22:56:14.034612 21026 leveldb.cpp:176] Opened db in 2.155713ms > I0916 22:56:14.034965 21026 leveldb.cpp:183] Compacted db in 332489ns > I0916 22:56:14.034984 21026 leveldb.cpp:198] Created db iterator in 3710ns > I0916 22:56:14.034996 21026 leveldb.cpp:204] Seeked to beginning of db in > 642ns > I0916 22:56:14.035006 21026 leveldb.cpp:273] Iterated through 0 keys in the > db in 343ns > I0916 22:56:14.035023 21026 replica.cpp:741] Replica recovered with log > positions 0 -> 0 with 1 holes and 0 unlearned > I0916 22:56:14.035200 21054 recover.cpp:425] Starting replica recovery > I0916 22:56:14.035403 21041 recover.cpp:451] Replica is in EMPTY status > I0916 22:56:14.035888 21045 replica.cpp:638] Replica in EMPTY status received > a broadcasted recover request > I0916 22:56:14.035969 21052 recover.cpp:188] Received a recover response from > a replica in EMPTY status > I0916 22:56:14.036118 21042 recover.cpp:542] Updating replica status to > STARTING > I0916 22:56:14.036603 21046 master.cpp:286] Master > 20140916-225614-3125920579-47865-21026 (penates.apache.org) started on > 67.195.81.186:47865 > I0916 22:56:14.036634 21046 master.cpp:332] Master only allowing > authenticated frameworks to register > I0916 22:56:14.036648 21046 master.cpp:337] Master only allowing > authenticated slaves to register > I0916 22:56:14.036659 21046 credentials.hpp:36] Loading credentials for > authentication from > '/tmp/HealthCheckTest_HealthStatusChange_IYnlu2/credentials' > I0916 22:56:14.036686 21045 leveldb.cpp:306] Persisting metadata (8 bytes) to > leveldb took 480322ns > I0916 22:56:14.036700 21045 replica.cpp:320] Persisted replica status to > STARTING > I0916 22:56:14.036769 21046 master.cpp:366] Authorization enabled > I0916 22:56:14.036826 21045 recover.cpp:451] Replica is in STARTING status > I0916 22:56:14.036944 21052 master.cpp:120] No whitelist given. Advertising > offers for all slaves > I0916 22:56:14.036968 21049 hierarchical_allocator_process.hpp:299] > Initializing hierarchical allocator process with master : > master@67.195.81.186:47865 > I0916 22:56:14.037284 21054 replica.cpp:638] Replica in STARTING status > received a broadcasted recover request > I0916 22:56:14.037312 21046 master.cpp:1212] The newly elected leader is > master@67.195.81.186:47865 with id 20140916-225614-3125920579-47865-21026 > I0916 22:56:14.037333 21046 master.cpp:1225] Elected as the leading master! > I0916 22:56:14.037345 21046 master.cpp:1043] Recovering from registrar > I0916 22:56:14.037504 21040 registrar.cpp:313] Recovering registrar > I0916 22:56:14.037505 21053 recover.cpp:188] Received a recover response from > a replica in STARTING status > I0916 22:56:14.037681 21047 recover.cpp:542] Updating replica status to VOTING > I0916 22:56:14.038072 21052 leveldb.cpp:306] Persisting metadata (8 bytes) to > leveldb took 330251ns > I0916 22:56:14.038087 21052 replica.cpp:320] Persisted replica status to > VOTING > I0916 22:56:14.038127 21053 recover.cpp:556] Successfully joined the Paxos > group > I0916 22:56:14.038202 21053 recover.cpp:440] Recover process terminated > I0916 22:56:14.038364 21048 log.cpp:656] Attempting to start the writer > I0916 22:56:14.038812 21053 replica.cpp:474] Replica received implicit > promise request with proposal 1 > I0916 22:56:14.038925 21053 leveldb.cpp:306] Persisting metadata (8 bytes) to > leveldb took 92623ns > I0916 22:56:14.038944 21053 replica.cpp:342] Persisted promised to 1 > I0916 22:56:14.039201 21052 coordinator.cpp:230] Coordinator attemping to > fill missing position > I0916 22:56:14.039676 21047 replica.cpp:375] Replica received explicit > promise request for position 0 with proposal 2 > I0916 22:56:14.039836 21047 leveldb.cpp:343] Persisting action (8 bytes) to > leveldb took 144215ns > I0916 22:56:14.039850 21047 replica.cpp:676] Persisted action at 0 > I0916 22:56:14.040243 21047 replica.cpp:508] Replica received write request > for position 0 > I0916 22:56:14.040267 21047
[jira] [Commented] (MESOS-1802) HealthCheckTest.HealthStatusChange is flaky on jenkins.
[ https://issues.apache.org/jira/browse/MESOS-1802?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15133401#comment-15133401 ] Neil Conway commented on MESOS-1802: Fails pretty reliably for me: ~3 failures in 35 test runs. If you'd like more information, just let me know. > HealthCheckTest.HealthStatusChange is flaky on jenkins. > --- > > Key: MESOS-1802 > URL: https://issues.apache.org/jira/browse/MESOS-1802 > Project: Mesos > Issue Type: Bug > Components: test, tests >Affects Versions: 0.26.0 >Reporter: Benjamin Mahler > Labels: flaky, mesosphere > > https://builds.apache.org/job/Mesos-Trunk-Ubuntu-Build-Out-Of-Src-Disable-Java-Disable-Python-Disable-Webui/2374/consoleFull > {noformat} > [ RUN ] HealthCheckTest.HealthStatusChange > Using temporary directory '/tmp/HealthCheckTest_HealthStatusChange_IYnlu2' > I0916 22:56:14.034612 21026 leveldb.cpp:176] Opened db in 2.155713ms > I0916 22:56:14.034965 21026 leveldb.cpp:183] Compacted db in 332489ns > I0916 22:56:14.034984 21026 leveldb.cpp:198] Created db iterator in 3710ns > I0916 22:56:14.034996 21026 leveldb.cpp:204] Seeked to beginning of db in > 642ns > I0916 22:56:14.035006 21026 leveldb.cpp:273] Iterated through 0 keys in the > db in 343ns > I0916 22:56:14.035023 21026 replica.cpp:741] Replica recovered with log > positions 0 -> 0 with 1 holes and 0 unlearned > I0916 22:56:14.035200 21054 recover.cpp:425] Starting replica recovery > I0916 22:56:14.035403 21041 recover.cpp:451] Replica is in EMPTY status > I0916 22:56:14.035888 21045 replica.cpp:638] Replica in EMPTY status received > a broadcasted recover request > I0916 22:56:14.035969 21052 recover.cpp:188] Received a recover response from > a replica in EMPTY status > I0916 22:56:14.036118 21042 recover.cpp:542] Updating replica status to > STARTING > I0916 22:56:14.036603 21046 master.cpp:286] Master > 20140916-225614-3125920579-47865-21026 (penates.apache.org) started on > 67.195.81.186:47865 > I0916 22:56:14.036634 21046 master.cpp:332] Master only allowing > authenticated frameworks to register > I0916 22:56:14.036648 21046 master.cpp:337] Master only allowing > authenticated slaves to register > I0916 22:56:14.036659 21046 credentials.hpp:36] Loading credentials for > authentication from > '/tmp/HealthCheckTest_HealthStatusChange_IYnlu2/credentials' > I0916 22:56:14.036686 21045 leveldb.cpp:306] Persisting metadata (8 bytes) to > leveldb took 480322ns > I0916 22:56:14.036700 21045 replica.cpp:320] Persisted replica status to > STARTING > I0916 22:56:14.036769 21046 master.cpp:366] Authorization enabled > I0916 22:56:14.036826 21045 recover.cpp:451] Replica is in STARTING status > I0916 22:56:14.036944 21052 master.cpp:120] No whitelist given. Advertising > offers for all slaves > I0916 22:56:14.036968 21049 hierarchical_allocator_process.hpp:299] > Initializing hierarchical allocator process with master : > master@67.195.81.186:47865 > I0916 22:56:14.037284 21054 replica.cpp:638] Replica in STARTING status > received a broadcasted recover request > I0916 22:56:14.037312 21046 master.cpp:1212] The newly elected leader is > master@67.195.81.186:47865 with id 20140916-225614-3125920579-47865-21026 > I0916 22:56:14.037333 21046 master.cpp:1225] Elected as the leading master! > I0916 22:56:14.037345 21046 master.cpp:1043] Recovering from registrar > I0916 22:56:14.037504 21040 registrar.cpp:313] Recovering registrar > I0916 22:56:14.037505 21053 recover.cpp:188] Received a recover response from > a replica in STARTING status > I0916 22:56:14.037681 21047 recover.cpp:542] Updating replica status to VOTING > I0916 22:56:14.038072 21052 leveldb.cpp:306] Persisting metadata (8 bytes) to > leveldb took 330251ns > I0916 22:56:14.038087 21052 replica.cpp:320] Persisted replica status to > VOTING > I0916 22:56:14.038127 21053 recover.cpp:556] Successfully joined the Paxos > group > I0916 22:56:14.038202 21053 recover.cpp:440] Recover process terminated > I0916 22:56:14.038364 21048 log.cpp:656] Attempting to start the writer > I0916 22:56:14.038812 21053 replica.cpp:474] Replica received implicit > promise request with proposal 1 > I0916 22:56:14.038925 21053 leveldb.cpp:306] Persisting metadata (8 bytes) to > leveldb took 92623ns > I0916 22:56:14.038944 21053 replica.cpp:342] Persisted promised to 1 > I0916 22:56:14.039201 21052 coordinator.cpp:230] Coordinator attemping to > fill missing position > I0916 22:56:14.039676 21047 replica.cpp:375] Replica received explicit > promise request for position 0 with proposal 2 > I0916 22:56:14.039836 21047 leveldb.cpp:343] Persisting action (8 bytes) to > leveldb took 144215ns > I0916 22:56:14.039850 21047 replica.cpp:676] Persisted action at 0 > I0916 22:56:14.040243 21047 replica.cpp:508] Replica received write request
[jira] [Commented] (MESOS-1802) HealthCheckTest.HealthStatusChange is flaky on jenkins.
[ https://issues.apache.org/jira/browse/MESOS-1802?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15124897#comment-15124897 ] haosdent commented on MESOS-1802: - Could it easy reproduce in your env? For example, after xxx cycles the failure happens for sure. Last time I try run a lot cycles in a CentOS 7(physical machine), but it runs well. > HealthCheckTest.HealthStatusChange is flaky on jenkins. > --- > > Key: MESOS-1802 > URL: https://issues.apache.org/jira/browse/MESOS-1802 > Project: Mesos > Issue Type: Bug > Components: test, tests >Affects Versions: 0.26.0 >Reporter: Benjamin Mahler > Labels: flaky, mesosphere > > https://builds.apache.org/job/Mesos-Trunk-Ubuntu-Build-Out-Of-Src-Disable-Java-Disable-Python-Disable-Webui/2374/consoleFull > {noformat} > [ RUN ] HealthCheckTest.HealthStatusChange > Using temporary directory '/tmp/HealthCheckTest_HealthStatusChange_IYnlu2' > I0916 22:56:14.034612 21026 leveldb.cpp:176] Opened db in 2.155713ms > I0916 22:56:14.034965 21026 leveldb.cpp:183] Compacted db in 332489ns > I0916 22:56:14.034984 21026 leveldb.cpp:198] Created db iterator in 3710ns > I0916 22:56:14.034996 21026 leveldb.cpp:204] Seeked to beginning of db in > 642ns > I0916 22:56:14.035006 21026 leveldb.cpp:273] Iterated through 0 keys in the > db in 343ns > I0916 22:56:14.035023 21026 replica.cpp:741] Replica recovered with log > positions 0 -> 0 with 1 holes and 0 unlearned > I0916 22:56:14.035200 21054 recover.cpp:425] Starting replica recovery > I0916 22:56:14.035403 21041 recover.cpp:451] Replica is in EMPTY status > I0916 22:56:14.035888 21045 replica.cpp:638] Replica in EMPTY status received > a broadcasted recover request > I0916 22:56:14.035969 21052 recover.cpp:188] Received a recover response from > a replica in EMPTY status > I0916 22:56:14.036118 21042 recover.cpp:542] Updating replica status to > STARTING > I0916 22:56:14.036603 21046 master.cpp:286] Master > 20140916-225614-3125920579-47865-21026 (penates.apache.org) started on > 67.195.81.186:47865 > I0916 22:56:14.036634 21046 master.cpp:332] Master only allowing > authenticated frameworks to register > I0916 22:56:14.036648 21046 master.cpp:337] Master only allowing > authenticated slaves to register > I0916 22:56:14.036659 21046 credentials.hpp:36] Loading credentials for > authentication from > '/tmp/HealthCheckTest_HealthStatusChange_IYnlu2/credentials' > I0916 22:56:14.036686 21045 leveldb.cpp:306] Persisting metadata (8 bytes) to > leveldb took 480322ns > I0916 22:56:14.036700 21045 replica.cpp:320] Persisted replica status to > STARTING > I0916 22:56:14.036769 21046 master.cpp:366] Authorization enabled > I0916 22:56:14.036826 21045 recover.cpp:451] Replica is in STARTING status > I0916 22:56:14.036944 21052 master.cpp:120] No whitelist given. Advertising > offers for all slaves > I0916 22:56:14.036968 21049 hierarchical_allocator_process.hpp:299] > Initializing hierarchical allocator process with master : > master@67.195.81.186:47865 > I0916 22:56:14.037284 21054 replica.cpp:638] Replica in STARTING status > received a broadcasted recover request > I0916 22:56:14.037312 21046 master.cpp:1212] The newly elected leader is > master@67.195.81.186:47865 with id 20140916-225614-3125920579-47865-21026 > I0916 22:56:14.037333 21046 master.cpp:1225] Elected as the leading master! > I0916 22:56:14.037345 21046 master.cpp:1043] Recovering from registrar > I0916 22:56:14.037504 21040 registrar.cpp:313] Recovering registrar > I0916 22:56:14.037505 21053 recover.cpp:188] Received a recover response from > a replica in STARTING status > I0916 22:56:14.037681 21047 recover.cpp:542] Updating replica status to VOTING > I0916 22:56:14.038072 21052 leveldb.cpp:306] Persisting metadata (8 bytes) to > leveldb took 330251ns > I0916 22:56:14.038087 21052 replica.cpp:320] Persisted replica status to > VOTING > I0916 22:56:14.038127 21053 recover.cpp:556] Successfully joined the Paxos > group > I0916 22:56:14.038202 21053 recover.cpp:440] Recover process terminated > I0916 22:56:14.038364 21048 log.cpp:656] Attempting to start the writer > I0916 22:56:14.038812 21053 replica.cpp:474] Replica received implicit > promise request with proposal 1 > I0916 22:56:14.038925 21053 leveldb.cpp:306] Persisting metadata (8 bytes) to > leveldb took 92623ns > I0916 22:56:14.038944 21053 replica.cpp:342] Persisted promised to 1 > I0916 22:56:14.039201 21052 coordinator.cpp:230] Coordinator attemping to > fill missing position > I0916 22:56:14.039676 21047 replica.cpp:375] Replica received explicit > promise request for position 0 with proposal 2 > I0916 22:56:14.039836 21047 leveldb.cpp:343] Persisting action (8 bytes) to > leveldb took 144215ns > I0916 22:56:14.039850 21047 replica.cpp:676] Persisted action at 0 > I0916
[jira] [Commented] (MESOS-1802) HealthCheckTest.HealthStatusChange is flaky on jenkins.
[ https://issues.apache.org/jira/browse/MESOS-1802?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15124498#comment-15124498 ] Neil Conway commented on MESOS-1802: Just saw this on an Arch Linux VM (Virtualbox): {noformat} [ RUN ] HealthCheckTest.HealthStatusChange /mesos-2/src/tests/health_check_tests.cpp:529: Failure Mock function called more times than expected - returning directly. Function call: statusUpdate(0x7ffd37ae9bd0, @0x7f2afcdf20a0 112-byte object <70-9D F0-08 2B-7F 00-00 00-00 00-00 00-00 00-00 A0-6D 01-CC 2A-7F 00-00 20-EC 1E-02 00-00 00-00 01-00 00-00 02-00 00-00 20-EC 1E-02 00-00 00-00 60-C4 01-CC 2A-7F 00-00 C0-C3 01-CC 2A-7F 00-00 00-00 00-00 00-00 00-00 0C-74 DD-41 00-AB D5-41 20-C5 01-CC 2A-7F 00-00 00-00 00-00 00-00 00-00 B0-20 00-CC 2A-7F 00-00 00-00 00-00 CB-17 00-00>) Expected: to be called 4 times Actual: called 5 times - over-saturated and active /mesos-2/src/tests/health_check_tests.cpp:529: Failure Mock function called more times than expected - returning directly. Function call: statusUpdate(0x7ffd37ae9bd0, @0x7f2afb5ef0a0 112-byte object <70-9D F0-08 2B-7F 00-00 00-00 00-00 00-00 00-00 C0-6E 02-D4 2A-7F 00-00 20-EC 1E-02 00-00 00-00 01-00 00-00 02-00 00-00 20-EC 1E-02 00-00 00-00 10-08 02-D4 2A-7F 00-00 10-09 02-D4 2A-7F 00-00 00-00 00-00 01-00 00-00 2D-D0 E3-41 00-AB D5-41 10-07 02-D4 2A-7F 00-00 00-00 00-00 00-00 00-00 D0-07 02-D4 2A-7F 00-00 00-00 00-00 CB-17 00-00>) Expected: to be called 4 times Actual: called 6 times - over-saturated and active /mesos-2/src/tests/health_check_tests.cpp:529: Failure Mock function called more times than expected - returning directly. Function call: statusUpdate(0x7ffd37ae9bd0, @0x7f2afcdf20a0 112-byte object <70-9D F0-08 2B-7F 00-00 00-00 00-00 00-00 00-00 90-C5 01-CC 2A-7F 00-00 20-EC 1E-02 00-00 00-00 01-00 00-00 02-00 00-00 20-EC 1E-02 00-00 00-00 00-83 02-CC 2A-7F 00-00 A0-6D 01-CC 2A-7F 00-00 00-00 00-00 00-00 00-00 F5-F7 EA-41 00-AB D5-41 20-4D 00-CC 2A-7F 00-00 00-00 00-00 00-00 00-00 F0-C5 01-CC 2A-7F 00-00 00-00 00-00 CB-17 00-00>) Expected: to be called 4 times Actual: called 7 times - over-saturated and active /mesos-2/src/tests/health_check_tests.cpp:529: Failure Mock function called more times than expected - returning directly. Function call: statusUpdate(0x7ffd37ae9bd0, @0x7f2afa5ed0a0 112-byte object <70-9D F0-08 2B-7F 00-00 00-00 00-00 00-00 00-00 A0-89 01-E8 2A-7F 00-00 20-EC 1E-02 00-00 00-00 01-00 00-00 02-00 00-00 20-EC 1E-02 00-00 00-00 60-34 00-E8 2A-7F 00-00 B0-98 01-E8 2A-7F 00-00 00-00 00-00 01-00 00-00 02-0E F2-41 00-AB D5-41 90-BF 00-E8 2A-7F 00-00 00-00 00-00 00-00 00-00 80-B1 00-E8 2A-7F 00-00 00-00 00-00 CB-17 00-00>) Expected: to be called 4 times Actual: called 8 times - over-saturated and active /mesos-2/src/tests/health_check_tests.cpp:529: Failure Mock function called more times than expected - returning directly. Function call: statusUpdate(0x7ffd37ae9bd0, @0x7f2afc5f10a0 112-byte object <70-9D F0-08 2B-7F 00-00 00-00 00-00 00-00 00-00 50-58 02-DC 2A-7F 00-00 20-EC 1E-02 00-00 00-00 01-00 00-00 02-00 00-00 20-EC 1E-02 00-00 00-00 B0-1E 01-DC 2A-7F 00-00 40-1F 01-DC 2A-7F 00-00 00-00 00-00 00-00 00-00 4B-76 F8-41 00-AB D5-41 A0-95 02-DC 2A-7F 00-00 00-00 00-00 00-00 00-00 30-3F 01-DC 2A-7F 00-00 00-00 00-00 CB-17 00-00>) Expected: to be called 4 times Actual: called 9 times - over-saturated and active /mesos-2/src/tests/health_check_tests.cpp:529: Failure Mock function called more times than expected - returning directly. Function call: statusUpdate(0x7ffd37ae9bd0, @0x7f2afb5ef0a0 112-byte object <70-9D F0-08 2B-7F 00-00 00-00 00-00 00-00 00-00 50-65 03-D4 2A-7F 00-00 20-EC 1E-02 00-00 00-00 01-00 00-00 02-00 00-00 20-EC 1E-02 00-00 00-00 80-06 02-D4 2A-7F 00-00 20-06 02-D4 2A-7F 00-00 00-00 00-00 01-00 00-00 FC-17 FF-41 00-AB D5-41 10-09 02-D4 2A-7F 00-00 00-00 00-00 00-00 00-00 E0-63 03-D4 2A-7F 00-00 00-00 00-00 CB-17 00-00>) Expected: to be called 4 times Actual: called 10 times - over-saturated and active /mesos-2/src/tests/health_check_tests.cpp:529: Failure Mock function called more times than expected - returning directly. Function call: statusUpdate(0x7ffd37ae9bd0, @0x7f2afadee0a0 112-byte object <70-9D F0-08 2B-7F 00-00 00-00 00-00 00-00 00-00 10-A5 05-D8 2A-7F 00-00 20-EC 1E-02 00-00 00-00 01-00 00-00 02-00 00-00 20-EC 1E-02 00-00 00-00 60-1C 06-D8 2A-7F 00-00 C0-FF 0A-D8 2A-7F 00-00 00-00 00-00 00-00 00-00 C3-82 05-42 00-AB D5-41 F0-EB 06-D8 2A-7F 00-00 00-00 00-00 00-00 00-00 E0-FB 06-D8 2A-7F 00-00 00-00 00-00 CB-17 00-00>) Expected: to be called 4 times Actual: called 11 times - over-saturated and active /mesos-2/src/tests/health_check_tests.cpp:529: Failure Mock function called more times than expected - returning
[jira] [Commented] (MESOS-1802) HealthCheckTest.HealthStatusChange is flaky on jenkins.
[ https://issues.apache.org/jira/browse/MESOS-1802?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15115532#comment-15115532 ] Greg Mann commented on MESOS-1802: -- Looks like this is still an issue. Just saw this on the ASF CI, Ubuntu 14.04 with gcc: {code} [ RUN ] HealthCheckTest.HealthStatusChange I0125 15:22:26.282974 31962 leveldb.cpp:174] Opened db in 83.974532ms I0125 15:22:26.329022 31962 leveldb.cpp:181] Compacted db in 45.96014ms I0125 15:22:26.329654 31962 leveldb.cpp:196] Created db iterator in 28398ns I0125 15:22:26.330154 31962 leveldb.cpp:202] Seeked to beginning of db in 4095ns I0125 15:22:26.330663 31962 leveldb.cpp:271] Iterated through 0 keys in the db in 468ns I0125 15:22:26.331181 31962 replica.cpp:779] Replica recovered with log positions 0 -> 0 with 1 holes and 0 unlearned I0125 15:22:26.332582 31988 recover.cpp:447] Starting replica recovery I0125 15:22:26.333237 31991 recover.cpp:473] Replica is in EMPTY status I0125 15:22:26.336228 31993 replica.cpp:673] Replica in EMPTY status received a broadcasted recover request from (2239)@172.17.0.2:52355 I0125 15:22:26.336789 31992 recover.cpp:193] Received a recover response from a replica in EMPTY status I0125 15:22:26.338047 31992 recover.cpp:564] Updating replica status to STARTING I0125 15:22:26.350210 31982 master.cpp:374] Master 8bd61c8a-63c9-4f67-99a4-d6ce2e3f77be (6a3fa6c4588a) started on 172.17.0.2:52355 I0125 15:22:26.350651 31982 master.cpp:376] Flags at startup: --acls="" --allocation_interval="1secs" --allocator="HierarchicalDRF" --authenticate="true" --authenticate_http="true" --authenticate_slaves="true" --authenticators="crammd5" --authorizers="local" --credentials="/tmp/inJ0ZL/credentials" --framework_sorter="drf" --help="false" --hostname_lookup="true" --http_authenticators="basic" --initialize_driver_logging="true" --log_auto_initialize="true" --logbufsecs="0" --logging_level="INFO" --max_completed_frameworks="50" --max_completed_tasks_per_framework="1000" --max_slave_ping_timeouts="5" --quiet="false" --recovery_slave_removal_limit="100%" --registry="replicated_log" --registry_fetch_timeout="1mins" --registry_store_timeout="25secs" --registry_strict="true" --root_submissions="true" --slave_ping_timeout="15secs" --slave_reregister_timeout="10mins" --user_sorter="drf" --version="false" --webui_dir="/mesos/mesos-0.27.0/_inst/share/mesos/webui" --work_dir="/tmp/inJ0ZL/master" --zk_session_timeout="10secs" I0125 15:22:26.352558 31982 master.cpp:421] Master only allowing authenticated frameworks to register I0125 15:22:26.352833 31982 master.cpp:426] Master only allowing authenticated slaves to register I0125 15:22:26.353160 31982 credentials.hpp:35] Loading credentials for authentication from '/tmp/inJ0ZL/credentials' I0125 15:22:26.356725 31982 master.cpp:466] Using default 'crammd5' authenticator I0125 15:22:26.357357 31982 master.cpp:535] Using default 'basic' HTTP authenticator I0125 15:22:26.357851 31982 master.cpp:569] Authorization enabled I0125 15:22:26.359246 31986 hierarchical.cpp:144] Initialized hierarchical allocator process I0125 15:22:26.359474 31992 whitelist_watcher.cpp:77] No whitelist given I0125 15:22:26.364750 31991 leveldb.cpp:304] Persisting metadata (8 bytes) to leveldb took 25.971673ms I0125 15:22:26.365209 31991 replica.cpp:320] Persisted replica status to STARTING I0125 15:22:26.365898 31991 recover.cpp:473] Replica is in STARTING status I0125 15:22:26.369220 31993 replica.cpp:673] Replica in STARTING status received a broadcasted recover request from (2241)@172.17.0.2:52355 I0125 15:22:26.370393 31991 recover.cpp:193] Received a recover response from a replica in STARTING status I0125 15:22:26.371255 31993 recover.cpp:564] Updating replica status to VOTING I0125 15:22:26.372786 31982 master.cpp:1710] The newly elected leader is master@172.17.0.2:52355 with id 8bd61c8a-63c9-4f67-99a4-d6ce2e3f77be I0125 15:22:26.373251 31982 master.cpp:1723] Elected as the leading master! I0125 15:22:26.373553 31982 master.cpp:1468] Recovering from registrar I0125 15:22:26.374184 31991 registrar.cpp:307] Recovering registrar I0125 15:22:26.398006 31993 leveldb.cpp:304] Persisting metadata (8 bytes) to leveldb took 26.052513ms I0125 15:22:26.398576 31993 replica.cpp:320] Persisted replica status to VOTING I0125 15:22:26.399235 31982 recover.cpp:578] Successfully joined the Paxos group I0125 15:22:26.399518 31982 recover.cpp:462] Recover process terminated I0125 15:22:26.400599 31993 log.cpp:659] Attempting to start the writer I0125 15:22:26.402815 31992 replica.cpp:493] Replica received implicit promise request from (2242)@172.17.0.2:52355 with proposal 1 I0125 15:22:26.423198 31992 leveldb.cpp:304] Persisting metadata (8 bytes) to leveldb took 19.954945ms I0125 15:22:26.423651 31992 replica.cpp:342] Persisted promised to 1 I0125 15:22:26.425547 31992 coordinator.cpp:238] Coordinator attempting to fill missing
[jira] [Commented] (MESOS-1802) HealthCheckTest.HealthStatusChange is flaky on jenkins.
[ https://issues.apache.org/jira/browse/MESOS-1802?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14997592#comment-14997592 ] Vinod Kone commented on MESOS-1802: --- This is still happening. https://builds.apache.org/job/Mesos/COMPILER=gcc,CONFIGURATION=--verbose,OS=ubuntu%3A14.04,label_exp=docker%7C%7CHadoop/1204/changes {code} [ RUN ] HealthCheckTest.HealthStatusChange I1109 19:00:52.094830 29603 leveldb.cpp:176] Opened db in 282.154657ms I1109 19:00:52.187214 29603 leveldb.cpp:183] Compacted db in 92.28671ms I1109 19:00:52.187328 29603 leveldb.cpp:198] Created db iterator in 31318ns I1109 19:00:52.187345 29603 leveldb.cpp:204] Seeked to beginning of db in 3925ns I1109 19:00:52.187357 29603 leveldb.cpp:273] Iterated through 0 keys in the db in 386ns I1109 19:00:52.187417 29603 replica.cpp:780] Replica recovered with log positions 0 -> 0 with 1 holes and 0 unlearned I1109 19:00:52.188252 29628 recover.cpp:449] Starting replica recovery I1109 19:00:52.188861 29628 recover.cpp:475] Replica is in EMPTY status I1109 19:00:52.190914 29626 replica.cpp:676] Replica in EMPTY status received a broadcasted recover request from (1864)@172.17.14.49:60889 I1109 19:00:52.191453 29632 recover.cpp:195] Received a recover response from a replica in EMPTY status I1109 19:00:52.192078 29632 recover.cpp:566] Updating replica status to STARTING I1109 19:00:52.199542 29624 master.cpp:367] Master 34506b61-7fc1-4e5e-821c-e5f5650782e0 (61f7c652bdf2) started on 172.17.14.49:60889 I1109 19:00:52.200220 29624 master.cpp:369] Flags at startup: --acls="" --allocation_interval="1secs" --allocator="HierarchicalDRF" --authenticate="true" --authenticate_slaves="true" --authenticators="crammd5" --authorizers="local" --credentials="/tmp/z7oqX0/credentials" --framework_sorter="drf" --help="false" --hostname_lookup="true" --initialize_driver_logging="true" --log_auto_initialize="true" --logbufsecs="0" --logging_level="INFO" --max_slave_ping_timeouts="5" --quiet="false" --recovery_slave_removal_limit="100%" --registry="replicated_log" --registry_fetch_timeout="1mins" --registry_store_timeout="25secs" --registry_strict="true" --root_submissions="true" --slave_ping_timeout="15secs" --slave_reregister_timeout="10mins" --user_sorter="drf" --version="false" --webui_dir="/mesos/mesos-0.26.0/_inst/share/mesos/webui" --work_dir="/tmp/z7oqX0/master" --zk_session_timeout="10secs" I1109 19:00:52.200584 29624 master.cpp:414] Master only allowing authenticated frameworks to register I1109 19:00:52.200597 29624 master.cpp:419] Master only allowing authenticated slaves to register I1109 19:00:52.200606 29624 credentials.hpp:37] Loading credentials for authentication from '/tmp/z7oqX0/credentials' I1109 19:00:52.200917 29624 master.cpp:458] Using default 'crammd5' authenticator I1109 19:00:52.201328 29624 master.cpp:495] Authorization enabled I1109 19:00:52.201648 29622 hierarchical.cpp:140] Initialized hierarchical allocator process I1109 19:00:52.201715 29622 whitelist_watcher.cpp:79] No whitelist given I1109 19:00:52.204258 29624 master.cpp:1606] The newly elected leader is master@172.17.14.49:60889 with id 34506b61-7fc1-4e5e-821c-e5f5650782e0 I1109 19:00:52.204290 29624 master.cpp:1619] Elected as the leading master! I1109 19:00:52.204308 29624 master.cpp:1379] Recovering from registrar I1109 19:00:52.204457 29637 registrar.cpp:309] Recovering registrar I1109 19:00:52.257841 29632 leveldb.cpp:306] Persisting metadata (8 bytes) to leveldb took 65.359249ms I1109 19:00:52.257933 29632 replica.cpp:323] Persisted replica status to STARTING I1109 19:00:52.258283 29631 recover.cpp:475] Replica is in STARTING status I1109 19:00:52.259827 29630 replica.cpp:676] Replica in STARTING status received a broadcasted recover request from (1865)@172.17.14.49:60889 I1109 19:00:52.260082 29628 recover.cpp:195] Received a recover response from a replica in STARTING status I1109 19:00:52.260671 29630 recover.cpp:566] Updating replica status to VOTING I1109 19:00:52.369948 29628 leveldb.cpp:306] Persisting metadata (8 bytes) to leveldb took 109.102187ms I1109 19:00:52.370046 29628 replica.cpp:323] Persisted replica status to VOTING I1109 19:00:52.370317 29628 recover.cpp:580] Successfully joined the Paxos group I1109 19:00:52.370551 29628 recover.cpp:464] Recover process terminated I1109 19:00:52.371186 29629 log.cpp:661] Attempting to start the writer I1109 19:00:52.372530 29629 replica.cpp:496] Replica received implicit promise request from (1866)@172.17.14.49:60889 with proposal 1 I1109 19:00:52.505920 29629 leveldb.cpp:306] Persisting metadata (8 bytes) to leveldb took 133.338727ms I1109 19:00:52.505997 29629 replica.cpp:345] Persisted promised to 1 I1109 19:00:52.506805 29634 coordinator.cpp:240] Coordinator attempting to fill missing positions I1109 19:00:52.508029 29622 replica.cpp:391] Replica received explicit promise request from (1867)@172.17.14.49:60889 for
[jira] [Commented] (MESOS-1802) HealthCheckTest.HealthStatusChange is flaky on jenkins.
[ https://issues.apache.org/jira/browse/MESOS-1802?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14511406#comment-14511406 ] Benjamin Hindman commented on MESOS-1802: - Hey [~tnachen], are you working on this? We're doing some backlog grooming! HealthCheckTest.HealthStatusChange is flaky on jenkins. --- Key: MESOS-1802 URL: https://issues.apache.org/jira/browse/MESOS-1802 Project: Mesos Issue Type: Bug Components: test Reporter: Benjamin Mahler Assignee: Timothy Chen Labels: flaky https://builds.apache.org/job/Mesos-Trunk-Ubuntu-Build-Out-Of-Src-Disable-Java-Disable-Python-Disable-Webui/2374/consoleFull {noformat} [ RUN ] HealthCheckTest.HealthStatusChange Using temporary directory '/tmp/HealthCheckTest_HealthStatusChange_IYnlu2' I0916 22:56:14.034612 21026 leveldb.cpp:176] Opened db in 2.155713ms I0916 22:56:14.034965 21026 leveldb.cpp:183] Compacted db in 332489ns I0916 22:56:14.034984 21026 leveldb.cpp:198] Created db iterator in 3710ns I0916 22:56:14.034996 21026 leveldb.cpp:204] Seeked to beginning of db in 642ns I0916 22:56:14.035006 21026 leveldb.cpp:273] Iterated through 0 keys in the db in 343ns I0916 22:56:14.035023 21026 replica.cpp:741] Replica recovered with log positions 0 - 0 with 1 holes and 0 unlearned I0916 22:56:14.035200 21054 recover.cpp:425] Starting replica recovery I0916 22:56:14.035403 21041 recover.cpp:451] Replica is in EMPTY status I0916 22:56:14.035888 21045 replica.cpp:638] Replica in EMPTY status received a broadcasted recover request I0916 22:56:14.035969 21052 recover.cpp:188] Received a recover response from a replica in EMPTY status I0916 22:56:14.036118 21042 recover.cpp:542] Updating replica status to STARTING I0916 22:56:14.036603 21046 master.cpp:286] Master 20140916-225614-3125920579-47865-21026 (penates.apache.org) started on 67.195.81.186:47865 I0916 22:56:14.036634 21046 master.cpp:332] Master only allowing authenticated frameworks to register I0916 22:56:14.036648 21046 master.cpp:337] Master only allowing authenticated slaves to register I0916 22:56:14.036659 21046 credentials.hpp:36] Loading credentials for authentication from '/tmp/HealthCheckTest_HealthStatusChange_IYnlu2/credentials' I0916 22:56:14.036686 21045 leveldb.cpp:306] Persisting metadata (8 bytes) to leveldb took 480322ns I0916 22:56:14.036700 21045 replica.cpp:320] Persisted replica status to STARTING I0916 22:56:14.036769 21046 master.cpp:366] Authorization enabled I0916 22:56:14.036826 21045 recover.cpp:451] Replica is in STARTING status I0916 22:56:14.036944 21052 master.cpp:120] No whitelist given. Advertising offers for all slaves I0916 22:56:14.036968 21049 hierarchical_allocator_process.hpp:299] Initializing hierarchical allocator process with master : master@67.195.81.186:47865 I0916 22:56:14.037284 21054 replica.cpp:638] Replica in STARTING status received a broadcasted recover request I0916 22:56:14.037312 21046 master.cpp:1212] The newly elected leader is master@67.195.81.186:47865 with id 20140916-225614-3125920579-47865-21026 I0916 22:56:14.037333 21046 master.cpp:1225] Elected as the leading master! I0916 22:56:14.037345 21046 master.cpp:1043] Recovering from registrar I0916 22:56:14.037504 21040 registrar.cpp:313] Recovering registrar I0916 22:56:14.037505 21053 recover.cpp:188] Received a recover response from a replica in STARTING status I0916 22:56:14.037681 21047 recover.cpp:542] Updating replica status to VOTING I0916 22:56:14.038072 21052 leveldb.cpp:306] Persisting metadata (8 bytes) to leveldb took 330251ns I0916 22:56:14.038087 21052 replica.cpp:320] Persisted replica status to VOTING I0916 22:56:14.038127 21053 recover.cpp:556] Successfully joined the Paxos group I0916 22:56:14.038202 21053 recover.cpp:440] Recover process terminated I0916 22:56:14.038364 21048 log.cpp:656] Attempting to start the writer I0916 22:56:14.038812 21053 replica.cpp:474] Replica received implicit promise request with proposal 1 I0916 22:56:14.038925 21053 leveldb.cpp:306] Persisting metadata (8 bytes) to leveldb took 92623ns I0916 22:56:14.038944 21053 replica.cpp:342] Persisted promised to 1 I0916 22:56:14.039201 21052 coordinator.cpp:230] Coordinator attemping to fill missing position I0916 22:56:14.039676 21047 replica.cpp:375] Replica received explicit promise request for position 0 with proposal 2 I0916 22:56:14.039836 21047 leveldb.cpp:343] Persisting action (8 bytes) to leveldb took 144215ns I0916 22:56:14.039850 21047 replica.cpp:676] Persisted action at 0 I0916 22:56:14.040243 21047 replica.cpp:508] Replica received write request for position 0 I0916 22:56:14.040267 21047 leveldb.cpp:438] Reading position from leveldb took