[jira] [Commented] (MESOS-4024) HealthCheckTest.CheckCommandTimeout is flaky.
[ https://issues.apache.org/jira/browse/MESOS-4024?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15051350#comment-15051350 ]

haosdent commented on MESOS-4024:
---------------------------------

Could I change
{noformat}
Try<MesosContainerizer*> containerizer = MesosContainerizer::create(flags, false, &fetcher);
{noformat}
to local mode in HealthCheckTest.CheckCommandTimeout?
{noformat}
Try<MesosContainerizer*> containerizer = MesosContainerizer::create(flags, true, &fetcher);
{noformat}
That way it would print the logs to stdout.

> HealthCheckTest.CheckCommandTimeout is flaky.
> ---------------------------------------------
>
>                 Key: MESOS-4024
>                 URL: https://issues.apache.org/jira/browse/MESOS-4024
>             Project: Mesos
>          Issue Type: Bug
>            Reporter: haosdent
>            Assignee: haosdent
>              Labels: flaky-test
>         Attachments: HealthCheckTest_CheckCommandTimeout.log
>
>
> {noformat: title=Failed Run}
> [ RUN      ] HealthCheckTest.CheckCommandTimeout
> I1201 13:03:15.211911 30288 leveldb.cpp:174] Opened db in 126.548747ms
> I1201 13:03:15.254041 30288 leveldb.cpp:181] Compacted db in 42.053948ms
> I1201 13:03:15.254226 30288 leveldb.cpp:196] Created db iterator in 25588ns
> I1201 13:03:15.254281 30288 leveldb.cpp:202] Seeked to beginning of db in
> 3231ns
> I1201 13:03:15.254294 30288 leveldb.cpp:271] Iterated through 0 keys in the
> db in 256ns
> I1201 13:03:15.254348 30288 replica.cpp:778] Replica recovered with log
> positions 0 -> 0 with 1 holes and 0 unlearned
> I1201 13:03:15.255162 30311 recover.cpp:447] Starting replica recovery
> I1201 13:03:15.255502 30311 recover.cpp:473] Replica is in EMPTY status
> I1201 13:03:15.257158 30311 replica.cpp:674] Replica in EMPTY status received
> a broadcasted recover request from (1898)@172.17.21.0:52024
> I1201 13:03:15.258224 30318 recover.cpp:193] Received a recover response from
> a replica in EMPTY status
> I1201 13:03:15.259735 30310 recover.cpp:564] Updating replica status to
> STARTING
> I1201 13:03:15.265080 30322 master.cpp:365] Master
> dd5bff66-362f-4efc-963a-54756b2edcce (fa812f474cf4) started on
> 172.17.21.0:52024
> I1201 13:03:15.265121 30322 master.cpp:367] Flags at startup: --acls=""
> --allocation_interval="1secs" --allocator="HierarchicalDRF"
> --authenticate="true" --authenticate_slaves="true" --authenticators="crammd5"
> --authorizers="local" --credentials="/tmp/IaRntP/credentials"
> --framework_sorter="drf" --help="false" --hostname_lookup="true"
> --initialize_driver_logging="true" --log_auto_initialize="true"
> --logbufsecs="0" --logging_level="INFO" --max_slave_ping_timeouts="5"
> --quiet="false" --recovery_slave_removal_limit="100%"
> --registry="replicated_log" --registry_fetch_timeout="1mins"
> --registry_store_timeout="25secs" --registry_strict="true"
> --root_submissions="true" --slave_ping_timeout="15secs"
> --slave_reregister_timeout="10mins" --user_sorter="drf" --version="false"
> --webui_dir="/mesos/mesos-0.27.0/_inst/share/mesos/webui"
> --work_dir="/tmp/IaRntP/master" --zk_session_timeout="10secs"
> I1201 13:03:15.265487 30322 master.cpp:412] Master only allowing
> authenticated frameworks to register
> I1201 13:03:15.265504 30322 master.cpp:417] Master only allowing
> authenticated slaves to register
> I1201 13:03:15.265513 30322 credentials.hpp:35] Loading credentials for
> authentication from '/tmp/IaRntP/credentials'
> I1201 13:03:15.265842 30322 master.cpp:456] Using default 'crammd5'
> authenticator
> I1201 13:03:15.266006 30322 master.cpp:493] Authorization enabled
> I1201 13:03:15.266464 30308 hierarchical.cpp:162] Initialized hierarchical
> allocator process
> I1201 13:03:15.267225 30321 whitelist_watcher.cpp:77] No whitelist given
> I1201 13:03:15.268847 30322 master.cpp:1637] The newly elected leader is
> master@172.17.21.0:52024 with id dd5bff66-362f-4efc-963a-54756b2edcce
> I1201 13:03:15.268887 30322 master.cpp:1650] Elected as the leading master!
> I1201 13:03:15.268905 30322 master.cpp:1395] Recovering from registrar
> I1201 13:03:15.270830 30322 registrar.cpp:307] Recovering registrar
> I1201 13:03:15.291272 30318 leveldb.cpp:304] Persisting metadata (8 bytes) to
> leveldb took 31.410668ms
> I1201 13:03:15.291363 30318 replica.cpp:321] Persisted replica status to
> STARTING
> I1201 13:03:15.291733 30318 recover.cpp:473] Replica is in STARTING status
> I1201 13:03:15.293392 30318 replica.cpp:674] Replica in STARTING status
> received a broadcasted recover request from (1900)@172.17.21.0:52024
> I1201 13:03:15.294251 30307 recover.cpp:193] Received a recover response from
> a replica in STARTING status
> I1201 13:03:15.294756 30307 recover.cpp:564] Updating replica status to VOTING
> I1201 13:03:15.338260 30307 leveldb.cpp:304] Persisting metadata (8 bytes) to
> leveldb took 43.256127ms
> I1201 13:03:15.338348 30307 replica.cpp:321] Persisted replica status to
> VOTING
> I1201 13:03:15.338601 30307 recover.cpp:578] Successfully joined the Paxos
> group
> I1201 13:03:15.338803 30307 recover.cpp:462] Recover process terminated
> I1201 13:03:15.339624 30307 log.cpp:659] Attempting to start the writer
> I1201 13:03:15.342
[jira] [Commented] (MESOS-4024) HealthCheckTest.CheckCommandTimeout is flaky.
[ https://issues.apache.org/jira/browse/MESOS-4024?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15051302#comment-15051302 ]

haosdent commented on MESOS-4024:
---------------------------------

Still no idea. The health check logs are located in the sandbox and are not displayed in the Jenkins test log. But we could first change consecutiveFailures to 2 and timeoutSeconds to 2 to reduce the test time.
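
To put rough numbers on the suggestion above, here is a minimal sketch of the arithmetic. It assumes each failing health check runs for its full timeout and ignores any delay, interval, or grace period between checks, so it is only a lower bound on the wait, not a measured test time. The helper minWaitSeconds is hypothetical and not part of Mesos; the values 3 and 5 come from the description elsewhere in this thread of a 5-second timeout that has to expire 3 times.

```cpp
#include <cassert>

// Hypothetical helper (not a Mesos API): lower bound, in seconds, on how long
// the test waits before the health checker gives up on the task. Assumes every
// check runs for the full timeout and that there is no extra delay, interval,
// or grace period between consecutive checks.
constexpr int minWaitSeconds(int consecutiveFailures, int timeoutSeconds)
{
  return consecutiveFailures * timeoutSeconds;
}

// Values described in this thread: a 5-second timeout that must expire 3 times.
static_assert(minWaitSeconds(3, 5) == 15, "current settings: at least 15s");

// Proposed values: consecutiveFailures = 2 and timeoutSeconds = 2.
static_assert(minWaitSeconds(2, 2) == 4, "proposed settings: at least 4s");
```

Under these assumptions the proposed values would cut the minimum wait from about 15 seconds to about 4 seconds. This only shortens the test; it does not by itself explain the flakiness.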
[jira] [Commented] (MESOS-4024) HealthCheckTest.CheckCommandTimeout is flaky.
[ https://issues.apache.org/jira/browse/MESOS-4024?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15040774#comment-15040774 ]

haosdent commented on MESOS-4024:
---------------------------------

Sorry for the delay, I will try to investigate this over the weekend.
[jira] [Commented] (MESOS-4024) HealthCheckTest.CheckCommandTimeout is flaky.
[ https://issues.apache.org/jira/browse/MESOS-4024?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15040436#comment-15040436 ]

Timothy Chen commented on MESOS-4024:
-------------------------------------

Ah, the test does take a long time to run, since it waits for the health check program's 5-second timeout to expire 3 times :( Let me fix this to make it shorter.
[jira] [Commented] (MESOS-4024) HealthCheckTest.CheckCommandTimeout is flaky.
[ https://issues.apache.org/jira/browse/MESOS-4024?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15039716#comment-15039716 ]

haosdent commented on MESOS-4024:
---------------------------------

Yes, the "HealthCheckTest_CheckCommandTimeout.log" in the attachments is the plain-text log I copied from Jenkins at that time.