[jira] [Commented] (MESOS-4024) HealthCheckTest.CheckCommandTimeout is flaky.

2015-12-10 Thread haosdent (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-4024?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15051350#comment-15051350
 ] 

haosdent commented on MESOS-4024:
-

Could I change
{noformat}
  Try<MesosContainerizer*> containerizer =
    MesosContainerizer::create(flags, false, &fetcher);
{noformat}
in HealthCheckTest.CheckCommandTimeout to run in local mode?
{noformat}
  Try<MesosContainerizer*> containerizer =
    MesosContainerizer::create(flags, true, &fetcher);
{noformat}

That way it could print the log to stdout.

> HealthCheckTest.CheckCommandTimeout is flaky.
> -
>
> Key: MESOS-4024
> URL: https://issues.apache.org/jira/browse/MESOS-4024
> Project: Mesos
> Issue Type: Bug
> Reporter: haosdent
> Assignee: haosdent
> Labels: flaky-test
> Attachments: HealthCheckTest_CheckCommandTimeout.log
>
>
> {noformat: title=Failed Run}
> [ RUN  ] HealthCheckTest.CheckCommandTimeout
> I1201 13:03:15.211911 30288 leveldb.cpp:174] Opened db in 126.548747ms
> I1201 13:03:15.254041 30288 leveldb.cpp:181] Compacted db in 42.053948ms
> I1201 13:03:15.254226 30288 leveldb.cpp:196] Created db iterator in 25588ns
> I1201 13:03:15.254281 30288 leveldb.cpp:202] Seeked to beginning of db in 
> 3231ns
> I1201 13:03:15.254294 30288 leveldb.cpp:271] Iterated through 0 keys in the 
> db in 256ns
> I1201 13:03:15.254348 30288 replica.cpp:778] Replica recovered with log 
> positions 0 -> 0 with 1 holes and 0 unlearned
> I1201 13:03:15.255162 30311 recover.cpp:447] Starting replica recovery
> I1201 13:03:15.255502 30311 recover.cpp:473] Replica is in EMPTY status
> I1201 13:03:15.257158 30311 replica.cpp:674] Replica in EMPTY status received 
> a broadcasted recover request from (1898)@172.17.21.0:52024
> I1201 13:03:15.258224 30318 recover.cpp:193] Received a recover response from 
> a replica in EMPTY status
> I1201 13:03:15.259735 30310 recover.cpp:564] Updating replica status to 
> STARTING
> I1201 13:03:15.265080 30322 master.cpp:365] Master 
> dd5bff66-362f-4efc-963a-54756b2edcce (fa812f474cf4) started on 
> 172.17.21.0:52024
> I1201 13:03:15.265121 30322 master.cpp:367] Flags at startup: --acls="" 
> --allocation_interval="1secs" --allocator="HierarchicalDRF" 
> --authenticate="true" --authenticate_slaves="true" --authenticators="crammd5" 
> --authorizers="local" --credentials="/tmp/IaRntP/credentials" 
> --framework_sorter="drf" --help="false" --hostname_lookup="true" 
> --initialize_driver_logging="true" --log_auto_initialize="true" 
> --logbufsecs="0" --logging_level="INFO" --max_slave_ping_timeouts="5" 
> --quiet="false" --recovery_slave_removal_limit="100%" 
> --registry="replicated_log" --registry_fetch_timeout="1mins" 
> --registry_store_timeout="25secs" --registry_strict="true" 
> --root_submissions="true" --slave_ping_timeout="15secs" 
> --slave_reregister_timeout="10mins" --user_sorter="drf" --version="false" 
> --webui_dir="/mesos/mesos-0.27.0/_inst/share/mesos/webui" 
> --work_dir="/tmp/IaRntP/master" --zk_session_timeout="10secs"
> I1201 13:03:15.265487 30322 master.cpp:412] Master only allowing 
> authenticated frameworks to register
> I1201 13:03:15.265504 30322 master.cpp:417] Master only allowing 
> authenticated slaves to register
> I1201 13:03:15.265513 30322 credentials.hpp:35] Loading credentials for 
> authentication from '/tmp/IaRntP/credentials'
> I1201 13:03:15.265842 30322 master.cpp:456] Using default 'crammd5' 
> authenticator
> I1201 13:03:15.266006 30322 master.cpp:493] Authorization enabled
> I1201 13:03:15.266464 30308 hierarchical.cpp:162] Initialized hierarchical 
> allocator process
> I1201 13:03:15.267225 30321 whitelist_watcher.cpp:77] No whitelist given
> I1201 13:03:15.268847 30322 master.cpp:1637] The newly elected leader is 
> master@172.17.21.0:52024 with id dd5bff66-362f-4efc-963a-54756b2edcce
> I1201 13:03:15.268887 30322 master.cpp:1650] Elected as the leading master!
> I1201 13:03:15.268905 30322 master.cpp:1395] Recovering from registrar
> I1201 13:03:15.270830 30322 registrar.cpp:307] Recovering registrar
> I1201 13:03:15.291272 30318 leveldb.cpp:304] Persisting metadata (8 bytes) to 
> leveldb took 31.410668ms
> I1201 13:03:15.291363 30318 replica.cpp:321] Persisted replica status to 
> STARTING
> I1201 13:03:15.291733 30318 recover.cpp:473] Replica is in STARTING status
> I1201 13:03:15.293392 30318 replica.cpp:674] Replica in STARTING status 
> received a broadcasted recover request from (1900)@172.17.21.0:52024
> I1201 13:03:15.294251 30307 recover.cpp:193] Received a recover response from 
> a replica in STARTING status
> I1201 13:03:15.294756 30307 recover.cpp:564] Updating replica status to VOTING
> I1201 13:03:15.338260 30307 leveldb.cpp:304] Persisting metadata (8 bytes) to 
> leveldb took 43.256127ms
> I1201 13:03:15.338348 30307 replica.cpp:321] Persisted replica status to 
> VOTING
> I1201 13:03:15.3386

[jira] [Commented] (MESOS-4024) HealthCheckTest.CheckCommandTimeout is flaky.

2015-12-10 Thread haosdent (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-4024?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15051302#comment-15051302
 ] 

haosdent commented on MESOS-4024:
-

Still no idea. The health check logs are located in the sandbox and are not 
displayed in the Jenkins test log. But we could first change 
consecutiveFailures to 2 and timeoutSeconds to 2 to reduce the test time.
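
To make the expected saving concrete, here is a rough sketch of the arithmetic. The helper minimumWait is hypothetical and not part of Mesos; it assumes every health check attempt runs for its full timeout and ignores any delay, interval, or grace period between attempts, so the result is only a lower bound on how long the test has to wait.

```cpp
#include <cassert>
#include <chrono>

// Hypothetical helper (not part of Mesos): lower bound on the time a test
// must wait before a health check that always times out has failed
// `consecutiveFailures` times in a row.
// Assumption: each attempt consumes the full timeout; delays, intervals,
// and grace periods between attempts are ignored.
inline std::chrono::seconds minimumWait(
    int consecutiveFailures,
    std::chrono::seconds timeout)
{
  assert(consecutiveFailures > 0);
  assert(timeout.count() > 0);

  // One full timeout per failed attempt.
  return timeout * consecutiveFailures;
}
```

Under these assumptions, 3 failures with a 5 second timeout give minimumWait(3, std::chrono::seconds(5)), which is 15 seconds, while the proposed 2 failures with a 2 second timeout give 4 seconds.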


[jira] [Commented] (MESOS-4024) HealthCheckTest.CheckCommandTimeout is flaky.

2015-12-03 Thread haosdent (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-4024?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15040774#comment-15040774
 ] 

haosdent commented on MESOS-4024:
-

Sorry for the delay; I will try to investigate this over the weekend.


[jira] [Commented] (MESOS-4024) HealthCheckTest.CheckCommandTimeout is flaky.

2015-12-03 Thread Timothy Chen (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-4024?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15040436#comment-15040436
 ] 

Timothy Chen commented on MESOS-4024:
-

Ah, the test does take a long time to run, since it waits for the health check 
program's 5-second timeout to expire 3 times :(
Let me fix this to make it shorter.


[jira] [Commented] (MESOS-4024) HealthCheckTest.CheckCommandTimeout is flaky.

2015-12-03 Thread haosdent (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-4024?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15039716#comment-15039716
 ] 

haosdent commented on MESOS-4024:
-

Yes, the "HealthCheckTest_CheckCommandTimeout.log" in the attachments is the 
plain-text log I copied from Jenkins at that time.
