[jira] [Commented] (MESOS-1802) HealthCheckTest.HealthStatusChange is flaky on jenkins.

2016-11-24 Thread Alexander Rukletsov (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-1802?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15694334#comment-15694334
 ] 

Alexander Rukletsov commented on MESOS-1802:


I can reproduce it relatively easy by running _parallel_ {{make check}}. Here 
is a fresh log:
{noformat}
[ RUN  ] HealthCheckTest.HealthStatusChange
I1124 23:20:48.351884 4284416 exec.cpp:162] Version: 1.2.0
I1124 23:20:48.375592 3747840 exec.cpp:237] Executor registered on agent 
6db7ef4d-7211-47be-98ba-ad590b528c69-S0
Received SUBSCRIBED event
Subscribed executor on alexr.speedportneo09012801000249
Received LAUNCH event
Starting task 1
/Users/alex/Projects/mesos/build/parallel/src/mesos-containerizer launch 
--command="{"shell":true,"value":"sleep 120"}" --help="false"
Forked command at 73286
Received task health update, healthy: true
rm: /private/tmp/z1PbfH/rG7Gha: No such file or directory
W1124 23:20:48.544631 3211264 health_checker.cpp:245] Health check failed 1 
times consecutively: COMMAND health check failed: Command returned exited with 
status 1
Received task health update, healthy: false
Received task health update, healthy: true
rm: /private/tmp/z1PbfH/rG7Gha: No such file or directory
../../../src/tests/health_check_tests.cpp:790: Failure
Value of: (find).get()
  Actual: 16-byte object <05-00 00-00 00-00 00-00 60-A9 62-1B B2-7F 00-00>
Expected: false
Which is: false
I1124 23:20:48.732457 4284416 exec.cpp:414] Executor asked to shutdown
Received SHUTDOWN event
Shutting down
Sending SIGTERM to process tree at pid 73286
W1124 23:20:48.747885 1064960 health_checker.cpp:245] Health check failed 1 
times consecutively: COMMAND health check failed: Command returned exited with 
status 1
rm: /private/tmp/z1PbfH/rG7Gha: No such file or directory
W1124 23:20:48.948562 3747840 health_checker.cpp:245] Health check failed 1 
times consecutively: COMMAND health check failed: Command returned exited with 
status 1
Sent SIGTERM to the following process trees:
[ 
--- 73286 sleep 120
]
Scheduling escalation to SIGKILL in 3secs from now
[  FAILED  ] HealthCheckTest.HealthStatusChange (1639 ms)
{noformat}

These lines
{noformat}
Received task health update, healthy: true
rm: /private/tmp/z1PbfH/rG7Gha: No such file or directory
../../../src/tests/health_check_tests.cpp:790: Failure
{noformat}
obviously hint that we've queried the HTTP endpoint _after_ the next health 
status change.

> HealthCheckTest.HealthStatusChange is flaky on jenkins.
> ---
>
> Key: MESOS-1802
> URL: https://issues.apache.org/jira/browse/MESOS-1802
> Project: Mesos
>  Issue Type: Bug
>  Components: test, tests
>Affects Versions: 0.26.0
>Reporter: Benjamin Mahler
>Assignee: haosdent
>  Labels: flaky, health-check, mesosphere
> Attachments: health_check_flaky_test_log.txt
>
>
> https://builds.apache.org/job/Mesos-Trunk-Ubuntu-Build-Out-Of-Src-Disable-Java-Disable-Python-Disable-Webui/2374/consoleFull
> {noformat}
> [ RUN  ] HealthCheckTest.HealthStatusChange
> Using temporary directory '/tmp/HealthCheckTest_HealthStatusChange_IYnlu2'
> I0916 22:56:14.034612 21026 leveldb.cpp:176] Opened db in 2.155713ms
> I0916 22:56:14.034965 21026 leveldb.cpp:183] Compacted db in 332489ns
> I0916 22:56:14.034984 21026 leveldb.cpp:198] Created db iterator in 3710ns
> I0916 22:56:14.034996 21026 leveldb.cpp:204] Seeked to beginning of db in 
> 642ns
> I0916 22:56:14.035006 21026 leveldb.cpp:273] Iterated through 0 keys in the 
> db in 343ns
> I0916 22:56:14.035023 21026 replica.cpp:741] Replica recovered with log 
> positions 0 -> 0 with 1 holes and 0 unlearned
> I0916 22:56:14.035200 21054 recover.cpp:425] Starting replica recovery
> I0916 22:56:14.035403 21041 recover.cpp:451] Replica is in EMPTY status
> I0916 22:56:14.035888 21045 replica.cpp:638] Replica in EMPTY status received 
> a broadcasted recover request
> I0916 22:56:14.035969 21052 recover.cpp:188] Received a recover response from 
> a replica in EMPTY status
> I0916 22:56:14.036118 21042 recover.cpp:542] Updating replica status to 
> STARTING
> I0916 22:56:14.036603 21046 master.cpp:286] Master 
> 20140916-225614-3125920579-47865-21026 (penates.apache.org) started on 
> 67.195.81.186:47865
> I0916 22:56:14.036634 21046 master.cpp:332] Master only allowing 
> authenticated frameworks to register
> I0916 22:56:14.036648 21046 master.cpp:337] Master only allowing 
> authenticated slaves to register
> I0916 22:56:14.036659 21046 credentials.hpp:36] Loading credentials for 
> authentication from 
> '/tmp/HealthCheckTest_HealthStatusChange_IYnlu2/credentials'
> I0916 22:56:14.036686 21045 leveldb.cpp:306] Persisting metadata (8 bytes) to 
> leveldb took 480322ns
> I0916 22:56:14.036700 21045 replica.cpp:320] Persisted replica status to 
> STARTING
> I0916 22:56:14.036769 21046 

[jira] [Commented] (MESOS-1802) HealthCheckTest.HealthStatusChange is flaky on jenkins.

2016-04-26 Thread haosdent (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-1802?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15258435#comment-15258435
 ] 

haosdent commented on MESOS-1802:
-

I prefer to drop the http status check in the test cases, because I still don't 
have any clues to guarantee the http queries order match the statusUpdate 
order. [~neilc] [~bmahler] [~tnachen] Do you think this proposal is doable?

> HealthCheckTest.HealthStatusChange is flaky on jenkins.
> ---
>
> Key: MESOS-1802
> URL: https://issues.apache.org/jira/browse/MESOS-1802
> Project: Mesos
>  Issue Type: Bug
>  Components: test, tests
>Affects Versions: 0.26.0
>Reporter: Benjamin Mahler
>Assignee: haosdent
>  Labels: flaky, health-check, mesosphere
> Attachments: health_check_flaky_test_log.txt
>
>
> https://builds.apache.org/job/Mesos-Trunk-Ubuntu-Build-Out-Of-Src-Disable-Java-Disable-Python-Disable-Webui/2374/consoleFull
> {noformat}
> [ RUN  ] HealthCheckTest.HealthStatusChange
> Using temporary directory '/tmp/HealthCheckTest_HealthStatusChange_IYnlu2'
> I0916 22:56:14.034612 21026 leveldb.cpp:176] Opened db in 2.155713ms
> I0916 22:56:14.034965 21026 leveldb.cpp:183] Compacted db in 332489ns
> I0916 22:56:14.034984 21026 leveldb.cpp:198] Created db iterator in 3710ns
> I0916 22:56:14.034996 21026 leveldb.cpp:204] Seeked to beginning of db in 
> 642ns
> I0916 22:56:14.035006 21026 leveldb.cpp:273] Iterated through 0 keys in the 
> db in 343ns
> I0916 22:56:14.035023 21026 replica.cpp:741] Replica recovered with log 
> positions 0 -> 0 with 1 holes and 0 unlearned
> I0916 22:56:14.035200 21054 recover.cpp:425] Starting replica recovery
> I0916 22:56:14.035403 21041 recover.cpp:451] Replica is in EMPTY status
> I0916 22:56:14.035888 21045 replica.cpp:638] Replica in EMPTY status received 
> a broadcasted recover request
> I0916 22:56:14.035969 21052 recover.cpp:188] Received a recover response from 
> a replica in EMPTY status
> I0916 22:56:14.036118 21042 recover.cpp:542] Updating replica status to 
> STARTING
> I0916 22:56:14.036603 21046 master.cpp:286] Master 
> 20140916-225614-3125920579-47865-21026 (penates.apache.org) started on 
> 67.195.81.186:47865
> I0916 22:56:14.036634 21046 master.cpp:332] Master only allowing 
> authenticated frameworks to register
> I0916 22:56:14.036648 21046 master.cpp:337] Master only allowing 
> authenticated slaves to register
> I0916 22:56:14.036659 21046 credentials.hpp:36] Loading credentials for 
> authentication from 
> '/tmp/HealthCheckTest_HealthStatusChange_IYnlu2/credentials'
> I0916 22:56:14.036686 21045 leveldb.cpp:306] Persisting metadata (8 bytes) to 
> leveldb took 480322ns
> I0916 22:56:14.036700 21045 replica.cpp:320] Persisted replica status to 
> STARTING
> I0916 22:56:14.036769 21046 master.cpp:366] Authorization enabled
> I0916 22:56:14.036826 21045 recover.cpp:451] Replica is in STARTING status
> I0916 22:56:14.036944 21052 master.cpp:120] No whitelist given. Advertising 
> offers for all slaves
> I0916 22:56:14.036968 21049 hierarchical_allocator_process.hpp:299] 
> Initializing hierarchical allocator process with master : 
> master@67.195.81.186:47865
> I0916 22:56:14.037284 21054 replica.cpp:638] Replica in STARTING status 
> received a broadcasted recover request
> I0916 22:56:14.037312 21046 master.cpp:1212] The newly elected leader is 
> master@67.195.81.186:47865 with id 20140916-225614-3125920579-47865-21026
> I0916 22:56:14.037333 21046 master.cpp:1225] Elected as the leading master!
> I0916 22:56:14.037345 21046 master.cpp:1043] Recovering from registrar
> I0916 22:56:14.037504 21040 registrar.cpp:313] Recovering registrar
> I0916 22:56:14.037505 21053 recover.cpp:188] Received a recover response from 
> a replica in STARTING status
> I0916 22:56:14.037681 21047 recover.cpp:542] Updating replica status to VOTING
> I0916 22:56:14.038072 21052 leveldb.cpp:306] Persisting metadata (8 bytes) to 
> leveldb took 330251ns
> I0916 22:56:14.038087 21052 replica.cpp:320] Persisted replica status to 
> VOTING
> I0916 22:56:14.038127 21053 recover.cpp:556] Successfully joined the Paxos 
> group
> I0916 22:56:14.038202 21053 recover.cpp:440] Recover process terminated
> I0916 22:56:14.038364 21048 log.cpp:656] Attempting to start the writer
> I0916 22:56:14.038812 21053 replica.cpp:474] Replica received implicit 
> promise request with proposal 1
> I0916 22:56:14.038925 21053 leveldb.cpp:306] Persisting metadata (8 bytes) to 
> leveldb took 92623ns
> I0916 22:56:14.038944 21053 replica.cpp:342] Persisted promised to 1
> I0916 22:56:14.039201 21052 coordinator.cpp:230] Coordinator attemping to 
> fill missing position
> I0916 22:56:14.039676 21047 replica.cpp:375] Replica received explicit 
> promise request for position 0 with proposal 2
> I0916 22:56:14.039836 21047 

[jira] [Commented] (MESOS-1802) HealthCheckTest.HealthStatusChange is flaky on jenkins.

2016-04-19 Thread haosdent (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-1802?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15248222#comment-15248222
 ] 

haosdent commented on MESOS-1802:
-

According to another flaky log provided by [~neilc] (Thanks a lot!)
{code}
[ RUN  ] HealthCheckTest.HealthStatusChange
I0420 00:33:12.691145 15381 cluster.cpp:149] Creating default 'local' authorizer
I0420 00:33:12.691897 15381 leveldb.cpp:174] Opened db in 620608ns
I0420 00:33:12.692211 15381 leveldb.cpp:181] Compacted db in 283038ns
I0420 00:33:12.692265 15381 leveldb.cpp:196] Created db iterator in 15184ns
I0420 00:33:12.692273 15381 leveldb.cpp:202] Seeked to beginning of db in 2161ns
I0420 00:33:12.692281 15381 leveldb.cpp:271] Iterated through 0 keys in the db 
in 1084ns
I0420 00:33:12.692312 15381 replica.cpp:779] Replica recovered with log 
positions 0 -> 0 with 1 holes and 0 unlearned
I0420 00:33:12.693172 15398 recover.cpp:447] Starting replica recovery
I0420 00:33:12.693435 15400 recover.cpp:473] Replica is in EMPTY status
I0420 00:33:12.695106 15400 replica.cpp:673] Replica in EMPTY status received a 
broadcasted recover request from (3755)@10.0.2.15:41408
I0420 00:33:12.695829 15400 recover.cpp:193] Received a recover response from a 
replica in EMPTY status
I0420 00:33:12.696321 15398 recover.cpp:564] Updating replica status to STARTING
I0420 00:33:12.696807 15402 leveldb.cpp:304] Persisting metadata (8 bytes) to 
leveldb took 101090ns
I0420 00:33:12.696836 15402 replica.cpp:320] Persisted replica status to 
STARTING
I0420 00:33:12.696964 15402 recover.cpp:473] Replica is in STARTING status
I0420 00:33:12.697968 15400 replica.cpp:673] Replica in STARTING status 
received a broadcasted recover request from (3756)@10.0.2.15:41408
I0420 00:33:12.698274 15400 recover.cpp:193] Received a recover response from a 
replica in STARTING status
I0420 00:33:12.698691 15400 recover.cpp:564] Updating replica status to VOTING
I0420 00:33:12.698882 15400 leveldb.cpp:304] Persisting metadata (8 bytes) to 
leveldb took 62549ns
I0420 00:33:12.698938 15400 replica.cpp:320] Persisted replica status to VOTING
I0420 00:33:12.699002 15400 recover.cpp:578] Successfully joined the Paxos group
I0420 00:33:12.699128 15400 recover.cpp:462] Recover process terminated
I0420 00:33:12.721339 15401 master.cpp:382] Master 
7cf5923c-3d03-4ed6-826a-efa97f54e765 (archlinux.vagrant.vm) started on 
10.0.2.15:41408
I0420 00:33:12.721387 15401 master.cpp:384] Flags at startup: --acls="" 
--allocation_interval="1secs" --allocator="HierarchicalDRF" 
--authenticate="true" --authenticate_http="true" 
--authenticate_http_frameworks="true" --authenticate_slaves="true" 
--authenticators="crammd5" --authorizers="local" 
--credentials="/tmp/9zoT36/credentials" --framework_sorter="drf" --help="false" 
--hostname_lookup="true" --http_authenticators="basic" 
--http_framework_authenticators="basic" --initialize_driver_logging="true" 
--log_auto_initialize="true" --logbufsecs="0" --logging_level="INFO" 
--max_completed_frameworks="50" --max_completed_tasks_per_framework="1000" 
--max_slave_ping_timeouts="5" --quiet="false" 
--recovery_slave_removal_limit="100%" --registry="replicated_log" 
--registry_fetch_timeout="1mins" --registry_store_timeout="100secs" 
--registry_strict="true" --root_submissions="true" 
--slave_ping_timeout="15secs" --slave_reregister_timeout="10mins" 
--user_sorter="drf" --version="false" 
--webui_dir="/usr/local/share/mesos/webui" --work_dir="/tmp/9zoT36/master" 
--zk_session_timeout="10secs"
I0420 00:33:12.721667 15401 master.cpp:433] Master only allowing authenticated 
frameworks to register
I0420 00:33:12.721736 15401 master.cpp:439] Master only allowing authenticated 
agents to register
I0420 00:33:12.721745 15401 master.cpp:445] Master only allowing authenticated 
HTTP frameworks to register
I0420 00:33:12.721750 15401 credentials.hpp:37] Loading credentials for 
authentication from '/tmp/9zoT36/credentials'
I0420 00:33:12.722005 15401 master.cpp:489] Using default 'crammd5' 
authenticator
I0420 00:33:12.723206 15401 master.cpp:560] Using default 'basic' HTTP 
authenticator
I0420 00:33:12.723458 15401 master.cpp:640] Using default 'basic' HTTP 
framework authenticator
I0420 00:33:12.723574 15401 master.cpp:687] Authorization enabled
I0420 00:33:12.725332 15399 master.cpp:1932] The newly elected leader is 
master@10.0.2.15:41408 with id 7cf5923c-3d03-4ed6-826a-efa97f54e765
I0420 00:33:12.725376 15399 master.cpp:1945] Elected as the leading master!
I0420 00:33:12.725389 15399 master.cpp:1632] Recovering from registrar
I0420 00:33:12.725544 15399 registrar.cpp:331] Recovering registrar
I0420 00:33:12.726506 15399 log.cpp:524] Attempting to start the writer
I0420 00:33:12.727424 15399 replica.cpp:493] Replica received implicit promise 
request from (3759)@10.0.2.15:41408 with proposal 1
I0420 00:33:12.727517 15399 leveldb.cpp:304] Persisting metadata (8 bytes) to 
leveldb took 50404ns
I0420 

[jira] [Commented] (MESOS-1802) HealthCheckTest.HealthStatusChange is flaky on jenkins.

2016-02-09 Thread Neil Conway (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-1802?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15139357#comment-15139357
 ] 

Neil Conway commented on MESOS-1802:


Hi [~haosd...@gmail.com] -- I attached the output of {{--verbose}} for a 
failing run as a file associated with this issue.

> HealthCheckTest.HealthStatusChange is flaky on jenkins.
> ---
>
> Key: MESOS-1802
> URL: https://issues.apache.org/jira/browse/MESOS-1802
> Project: Mesos
>  Issue Type: Bug
>  Components: test, tests
>Affects Versions: 0.26.0
>Reporter: Benjamin Mahler
>  Labels: flaky, mesosphere
> Attachments: health_check_flaky_test_log.txt
>
>
> https://builds.apache.org/job/Mesos-Trunk-Ubuntu-Build-Out-Of-Src-Disable-Java-Disable-Python-Disable-Webui/2374/consoleFull
> {noformat}
> [ RUN  ] HealthCheckTest.HealthStatusChange
> Using temporary directory '/tmp/HealthCheckTest_HealthStatusChange_IYnlu2'
> I0916 22:56:14.034612 21026 leveldb.cpp:176] Opened db in 2.155713ms
> I0916 22:56:14.034965 21026 leveldb.cpp:183] Compacted db in 332489ns
> I0916 22:56:14.034984 21026 leveldb.cpp:198] Created db iterator in 3710ns
> I0916 22:56:14.034996 21026 leveldb.cpp:204] Seeked to beginning of db in 
> 642ns
> I0916 22:56:14.035006 21026 leveldb.cpp:273] Iterated through 0 keys in the 
> db in 343ns
> I0916 22:56:14.035023 21026 replica.cpp:741] Replica recovered with log 
> positions 0 -> 0 with 1 holes and 0 unlearned
> I0916 22:56:14.035200 21054 recover.cpp:425] Starting replica recovery
> I0916 22:56:14.035403 21041 recover.cpp:451] Replica is in EMPTY status
> I0916 22:56:14.035888 21045 replica.cpp:638] Replica in EMPTY status received 
> a broadcasted recover request
> I0916 22:56:14.035969 21052 recover.cpp:188] Received a recover response from 
> a replica in EMPTY status
> I0916 22:56:14.036118 21042 recover.cpp:542] Updating replica status to 
> STARTING
> I0916 22:56:14.036603 21046 master.cpp:286] Master 
> 20140916-225614-3125920579-47865-21026 (penates.apache.org) started on 
> 67.195.81.186:47865
> I0916 22:56:14.036634 21046 master.cpp:332] Master only allowing 
> authenticated frameworks to register
> I0916 22:56:14.036648 21046 master.cpp:337] Master only allowing 
> authenticated slaves to register
> I0916 22:56:14.036659 21046 credentials.hpp:36] Loading credentials for 
> authentication from 
> '/tmp/HealthCheckTest_HealthStatusChange_IYnlu2/credentials'
> I0916 22:56:14.036686 21045 leveldb.cpp:306] Persisting metadata (8 bytes) to 
> leveldb took 480322ns
> I0916 22:56:14.036700 21045 replica.cpp:320] Persisted replica status to 
> STARTING
> I0916 22:56:14.036769 21046 master.cpp:366] Authorization enabled
> I0916 22:56:14.036826 21045 recover.cpp:451] Replica is in STARTING status
> I0916 22:56:14.036944 21052 master.cpp:120] No whitelist given. Advertising 
> offers for all slaves
> I0916 22:56:14.036968 21049 hierarchical_allocator_process.hpp:299] 
> Initializing hierarchical allocator process with master : 
> master@67.195.81.186:47865
> I0916 22:56:14.037284 21054 replica.cpp:638] Replica in STARTING status 
> received a broadcasted recover request
> I0916 22:56:14.037312 21046 master.cpp:1212] The newly elected leader is 
> master@67.195.81.186:47865 with id 20140916-225614-3125920579-47865-21026
> I0916 22:56:14.037333 21046 master.cpp:1225] Elected as the leading master!
> I0916 22:56:14.037345 21046 master.cpp:1043] Recovering from registrar
> I0916 22:56:14.037504 21040 registrar.cpp:313] Recovering registrar
> I0916 22:56:14.037505 21053 recover.cpp:188] Received a recover response from 
> a replica in STARTING status
> I0916 22:56:14.037681 21047 recover.cpp:542] Updating replica status to VOTING
> I0916 22:56:14.038072 21052 leveldb.cpp:306] Persisting metadata (8 bytes) to 
> leveldb took 330251ns
> I0916 22:56:14.038087 21052 replica.cpp:320] Persisted replica status to 
> VOTING
> I0916 22:56:14.038127 21053 recover.cpp:556] Successfully joined the Paxos 
> group
> I0916 22:56:14.038202 21053 recover.cpp:440] Recover process terminated
> I0916 22:56:14.038364 21048 log.cpp:656] Attempting to start the writer
> I0916 22:56:14.038812 21053 replica.cpp:474] Replica received implicit 
> promise request with proposal 1
> I0916 22:56:14.038925 21053 leveldb.cpp:306] Persisting metadata (8 bytes) to 
> leveldb took 92623ns
> I0916 22:56:14.038944 21053 replica.cpp:342] Persisted promised to 1
> I0916 22:56:14.039201 21052 coordinator.cpp:230] Coordinator attemping to 
> fill missing position
> I0916 22:56:14.039676 21047 replica.cpp:375] Replica received explicit 
> promise request for position 0 with proposal 2
> I0916 22:56:14.039836 21047 leveldb.cpp:343] Persisting action (8 bytes) to 
> leveldb took 144215ns
> I0916 22:56:14.039850 21047 replica.cpp:676] Persisted action at 0
> 

[jira] [Commented] (MESOS-1802) HealthCheckTest.HealthStatusChange is flaky on jenkins.

2016-02-09 Thread haosdent (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-1802?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15139364#comment-15139364
 ] 

haosdent commented on MESOS-1802:
-

Thank you very much. Let me try to get more information from you log.

> HealthCheckTest.HealthStatusChange is flaky on jenkins.
> ---
>
> Key: MESOS-1802
> URL: https://issues.apache.org/jira/browse/MESOS-1802
> Project: Mesos
>  Issue Type: Bug
>  Components: test, tests
>Affects Versions: 0.26.0
>Reporter: Benjamin Mahler
>  Labels: flaky, mesosphere
> Attachments: health_check_flaky_test_log.txt
>
>
> https://builds.apache.org/job/Mesos-Trunk-Ubuntu-Build-Out-Of-Src-Disable-Java-Disable-Python-Disable-Webui/2374/consoleFull
> {noformat}
> [ RUN  ] HealthCheckTest.HealthStatusChange
> Using temporary directory '/tmp/HealthCheckTest_HealthStatusChange_IYnlu2'
> I0916 22:56:14.034612 21026 leveldb.cpp:176] Opened db in 2.155713ms
> I0916 22:56:14.034965 21026 leveldb.cpp:183] Compacted db in 332489ns
> I0916 22:56:14.034984 21026 leveldb.cpp:198] Created db iterator in 3710ns
> I0916 22:56:14.034996 21026 leveldb.cpp:204] Seeked to beginning of db in 
> 642ns
> I0916 22:56:14.035006 21026 leveldb.cpp:273] Iterated through 0 keys in the 
> db in 343ns
> I0916 22:56:14.035023 21026 replica.cpp:741] Replica recovered with log 
> positions 0 -> 0 with 1 holes and 0 unlearned
> I0916 22:56:14.035200 21054 recover.cpp:425] Starting replica recovery
> I0916 22:56:14.035403 21041 recover.cpp:451] Replica is in EMPTY status
> I0916 22:56:14.035888 21045 replica.cpp:638] Replica in EMPTY status received 
> a broadcasted recover request
> I0916 22:56:14.035969 21052 recover.cpp:188] Received a recover response from 
> a replica in EMPTY status
> I0916 22:56:14.036118 21042 recover.cpp:542] Updating replica status to 
> STARTING
> I0916 22:56:14.036603 21046 master.cpp:286] Master 
> 20140916-225614-3125920579-47865-21026 (penates.apache.org) started on 
> 67.195.81.186:47865
> I0916 22:56:14.036634 21046 master.cpp:332] Master only allowing 
> authenticated frameworks to register
> I0916 22:56:14.036648 21046 master.cpp:337] Master only allowing 
> authenticated slaves to register
> I0916 22:56:14.036659 21046 credentials.hpp:36] Loading credentials for 
> authentication from 
> '/tmp/HealthCheckTest_HealthStatusChange_IYnlu2/credentials'
> I0916 22:56:14.036686 21045 leveldb.cpp:306] Persisting metadata (8 bytes) to 
> leveldb took 480322ns
> I0916 22:56:14.036700 21045 replica.cpp:320] Persisted replica status to 
> STARTING
> I0916 22:56:14.036769 21046 master.cpp:366] Authorization enabled
> I0916 22:56:14.036826 21045 recover.cpp:451] Replica is in STARTING status
> I0916 22:56:14.036944 21052 master.cpp:120] No whitelist given. Advertising 
> offers for all slaves
> I0916 22:56:14.036968 21049 hierarchical_allocator_process.hpp:299] 
> Initializing hierarchical allocator process with master : 
> master@67.195.81.186:47865
> I0916 22:56:14.037284 21054 replica.cpp:638] Replica in STARTING status 
> received a broadcasted recover request
> I0916 22:56:14.037312 21046 master.cpp:1212] The newly elected leader is 
> master@67.195.81.186:47865 with id 20140916-225614-3125920579-47865-21026
> I0916 22:56:14.037333 21046 master.cpp:1225] Elected as the leading master!
> I0916 22:56:14.037345 21046 master.cpp:1043] Recovering from registrar
> I0916 22:56:14.037504 21040 registrar.cpp:313] Recovering registrar
> I0916 22:56:14.037505 21053 recover.cpp:188] Received a recover response from 
> a replica in STARTING status
> I0916 22:56:14.037681 21047 recover.cpp:542] Updating replica status to VOTING
> I0916 22:56:14.038072 21052 leveldb.cpp:306] Persisting metadata (8 bytes) to 
> leveldb took 330251ns
> I0916 22:56:14.038087 21052 replica.cpp:320] Persisted replica status to 
> VOTING
> I0916 22:56:14.038127 21053 recover.cpp:556] Successfully joined the Paxos 
> group
> I0916 22:56:14.038202 21053 recover.cpp:440] Recover process terminated
> I0916 22:56:14.038364 21048 log.cpp:656] Attempting to start the writer
> I0916 22:56:14.038812 21053 replica.cpp:474] Replica received implicit 
> promise request with proposal 1
> I0916 22:56:14.038925 21053 leveldb.cpp:306] Persisting metadata (8 bytes) to 
> leveldb took 92623ns
> I0916 22:56:14.038944 21053 replica.cpp:342] Persisted promised to 1
> I0916 22:56:14.039201 21052 coordinator.cpp:230] Coordinator attemping to 
> fill missing position
> I0916 22:56:14.039676 21047 replica.cpp:375] Replica received explicit 
> promise request for position 0 with proposal 2
> I0916 22:56:14.039836 21047 leveldb.cpp:343] Persisting action (8 bytes) to 
> leveldb took 144215ns
> I0916 22:56:14.039850 21047 replica.cpp:676] Persisted action at 0
> I0916 22:56:14.040243 21047 replica.cpp:508] Replica received 

[jira] [Commented] (MESOS-1802) HealthCheckTest.HealthStatusChange is flaky on jenkins.

2016-02-06 Thread haosdent (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-1802?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15135906#comment-15135906
 ] 

haosdent commented on MESOS-1802:
-

Yes, [~neilc] do you have the stdout and stderr log?

> HealthCheckTest.HealthStatusChange is flaky on jenkins.
> ---
>
> Key: MESOS-1802
> URL: https://issues.apache.org/jira/browse/MESOS-1802
> Project: Mesos
>  Issue Type: Bug
>  Components: test, tests
>Affects Versions: 0.26.0
>Reporter: Benjamin Mahler
>  Labels: flaky, mesosphere
>
> https://builds.apache.org/job/Mesos-Trunk-Ubuntu-Build-Out-Of-Src-Disable-Java-Disable-Python-Disable-Webui/2374/consoleFull
> {noformat}
> [ RUN  ] HealthCheckTest.HealthStatusChange
> Using temporary directory '/tmp/HealthCheckTest_HealthStatusChange_IYnlu2'
> I0916 22:56:14.034612 21026 leveldb.cpp:176] Opened db in 2.155713ms
> I0916 22:56:14.034965 21026 leveldb.cpp:183] Compacted db in 332489ns
> I0916 22:56:14.034984 21026 leveldb.cpp:198] Created db iterator in 3710ns
> I0916 22:56:14.034996 21026 leveldb.cpp:204] Seeked to beginning of db in 
> 642ns
> I0916 22:56:14.035006 21026 leveldb.cpp:273] Iterated through 0 keys in the 
> db in 343ns
> I0916 22:56:14.035023 21026 replica.cpp:741] Replica recovered with log 
> positions 0 -> 0 with 1 holes and 0 unlearned
> I0916 22:56:14.035200 21054 recover.cpp:425] Starting replica recovery
> I0916 22:56:14.035403 21041 recover.cpp:451] Replica is in EMPTY status
> I0916 22:56:14.035888 21045 replica.cpp:638] Replica in EMPTY status received 
> a broadcasted recover request
> I0916 22:56:14.035969 21052 recover.cpp:188] Received a recover response from 
> a replica in EMPTY status
> I0916 22:56:14.036118 21042 recover.cpp:542] Updating replica status to 
> STARTING
> I0916 22:56:14.036603 21046 master.cpp:286] Master 
> 20140916-225614-3125920579-47865-21026 (penates.apache.org) started on 
> 67.195.81.186:47865
> I0916 22:56:14.036634 21046 master.cpp:332] Master only allowing 
> authenticated frameworks to register
> I0916 22:56:14.036648 21046 master.cpp:337] Master only allowing 
> authenticated slaves to register
> I0916 22:56:14.036659 21046 credentials.hpp:36] Loading credentials for 
> authentication from 
> '/tmp/HealthCheckTest_HealthStatusChange_IYnlu2/credentials'
> I0916 22:56:14.036686 21045 leveldb.cpp:306] Persisting metadata (8 bytes) to 
> leveldb took 480322ns
> I0916 22:56:14.036700 21045 replica.cpp:320] Persisted replica status to 
> STARTING
> I0916 22:56:14.036769 21046 master.cpp:366] Authorization enabled
> I0916 22:56:14.036826 21045 recover.cpp:451] Replica is in STARTING status
> I0916 22:56:14.036944 21052 master.cpp:120] No whitelist given. Advertising 
> offers for all slaves
> I0916 22:56:14.036968 21049 hierarchical_allocator_process.hpp:299] 
> Initializing hierarchical allocator process with master : 
> master@67.195.81.186:47865
> I0916 22:56:14.037284 21054 replica.cpp:638] Replica in STARTING status 
> received a broadcasted recover request
> I0916 22:56:14.037312 21046 master.cpp:1212] The newly elected leader is 
> master@67.195.81.186:47865 with id 20140916-225614-3125920579-47865-21026
> I0916 22:56:14.037333 21046 master.cpp:1225] Elected as the leading master!
> I0916 22:56:14.037345 21046 master.cpp:1043] Recovering from registrar
> I0916 22:56:14.037504 21040 registrar.cpp:313] Recovering registrar
> I0916 22:56:14.037505 21053 recover.cpp:188] Received a recover response from 
> a replica in STARTING status
> I0916 22:56:14.037681 21047 recover.cpp:542] Updating replica status to VOTING
> I0916 22:56:14.038072 21052 leveldb.cpp:306] Persisting metadata (8 bytes) to 
> leveldb took 330251ns
> I0916 22:56:14.038087 21052 replica.cpp:320] Persisted replica status to 
> VOTING
> I0916 22:56:14.038127 21053 recover.cpp:556] Successfully joined the Paxos 
> group
> I0916 22:56:14.038202 21053 recover.cpp:440] Recover process terminated
> I0916 22:56:14.038364 21048 log.cpp:656] Attempting to start the writer
> I0916 22:56:14.038812 21053 replica.cpp:474] Replica received implicit 
> promise request with proposal 1
> I0916 22:56:14.038925 21053 leveldb.cpp:306] Persisting metadata (8 bytes) to 
> leveldb took 92623ns
> I0916 22:56:14.038944 21053 replica.cpp:342] Persisted promised to 1
> I0916 22:56:14.039201 21052 coordinator.cpp:230] Coordinator attemping to 
> fill missing position
> I0916 22:56:14.039676 21047 replica.cpp:375] Replica received explicit 
> promise request for position 0 with proposal 2
> I0916 22:56:14.039836 21047 leveldb.cpp:343] Persisting action (8 bytes) to 
> leveldb took 144215ns
> I0916 22:56:14.039850 21047 replica.cpp:676] Persisted action at 0
> I0916 22:56:14.040243 21047 replica.cpp:508] Replica received write request 
> for position 0
> I0916 22:56:14.040267 21047 

[jira] [Commented] (MESOS-1802) HealthCheckTest.HealthStatusChange is flaky on jenkins.

2016-02-04 Thread Neil Conway (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-1802?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15133401#comment-15133401
 ] 

Neil Conway commented on MESOS-1802:


Fails pretty reliably for me: ~3 failures in 35 test runs. If you'd like more 
information, just let me know.

> HealthCheckTest.HealthStatusChange is flaky on jenkins.
> ---
>
> Key: MESOS-1802
> URL: https://issues.apache.org/jira/browse/MESOS-1802
> Project: Mesos
>  Issue Type: Bug
>  Components: test, tests
>Affects Versions: 0.26.0
>Reporter: Benjamin Mahler
>  Labels: flaky, mesosphere
>
> https://builds.apache.org/job/Mesos-Trunk-Ubuntu-Build-Out-Of-Src-Disable-Java-Disable-Python-Disable-Webui/2374/consoleFull
> {noformat}
> [ RUN  ] HealthCheckTest.HealthStatusChange
> Using temporary directory '/tmp/HealthCheckTest_HealthStatusChange_IYnlu2'
> I0916 22:56:14.034612 21026 leveldb.cpp:176] Opened db in 2.155713ms
> I0916 22:56:14.034965 21026 leveldb.cpp:183] Compacted db in 332489ns
> I0916 22:56:14.034984 21026 leveldb.cpp:198] Created db iterator in 3710ns
> I0916 22:56:14.034996 21026 leveldb.cpp:204] Seeked to beginning of db in 
> 642ns
> I0916 22:56:14.035006 21026 leveldb.cpp:273] Iterated through 0 keys in the 
> db in 343ns
> I0916 22:56:14.035023 21026 replica.cpp:741] Replica recovered with log 
> positions 0 -> 0 with 1 holes and 0 unlearned
> I0916 22:56:14.035200 21054 recover.cpp:425] Starting replica recovery
> I0916 22:56:14.035403 21041 recover.cpp:451] Replica is in EMPTY status
> I0916 22:56:14.035888 21045 replica.cpp:638] Replica in EMPTY status received 
> a broadcasted recover request
> I0916 22:56:14.035969 21052 recover.cpp:188] Received a recover response from 
> a replica in EMPTY status
> I0916 22:56:14.036118 21042 recover.cpp:542] Updating replica status to 
> STARTING
> I0916 22:56:14.036603 21046 master.cpp:286] Master 
> 20140916-225614-3125920579-47865-21026 (penates.apache.org) started on 
> 67.195.81.186:47865
> I0916 22:56:14.036634 21046 master.cpp:332] Master only allowing 
> authenticated frameworks to register
> I0916 22:56:14.036648 21046 master.cpp:337] Master only allowing 
> authenticated slaves to register
> I0916 22:56:14.036659 21046 credentials.hpp:36] Loading credentials for 
> authentication from 
> '/tmp/HealthCheckTest_HealthStatusChange_IYnlu2/credentials'
> I0916 22:56:14.036686 21045 leveldb.cpp:306] Persisting metadata (8 bytes) to 
> leveldb took 480322ns
> I0916 22:56:14.036700 21045 replica.cpp:320] Persisted replica status to 
> STARTING
> I0916 22:56:14.036769 21046 master.cpp:366] Authorization enabled
> I0916 22:56:14.036826 21045 recover.cpp:451] Replica is in STARTING status
> I0916 22:56:14.036944 21052 master.cpp:120] No whitelist given. Advertising 
> offers for all slaves
> I0916 22:56:14.036968 21049 hierarchical_allocator_process.hpp:299] 
> Initializing hierarchical allocator process with master : 
> master@67.195.81.186:47865
> I0916 22:56:14.037284 21054 replica.cpp:638] Replica in STARTING status 
> received a broadcasted recover request
> I0916 22:56:14.037312 21046 master.cpp:1212] The newly elected leader is 
> master@67.195.81.186:47865 with id 20140916-225614-3125920579-47865-21026
> I0916 22:56:14.037333 21046 master.cpp:1225] Elected as the leading master!
> I0916 22:56:14.037345 21046 master.cpp:1043] Recovering from registrar
> I0916 22:56:14.037504 21040 registrar.cpp:313] Recovering registrar
> I0916 22:56:14.037505 21053 recover.cpp:188] Received a recover response from 
> a replica in STARTING status
> I0916 22:56:14.037681 21047 recover.cpp:542] Updating replica status to VOTING
> I0916 22:56:14.038072 21052 leveldb.cpp:306] Persisting metadata (8 bytes) to 
> leveldb took 330251ns
> I0916 22:56:14.038087 21052 replica.cpp:320] Persisted replica status to 
> VOTING
> I0916 22:56:14.038127 21053 recover.cpp:556] Successfully joined the Paxos 
> group
> I0916 22:56:14.038202 21053 recover.cpp:440] Recover process terminated
> I0916 22:56:14.038364 21048 log.cpp:656] Attempting to start the writer
> I0916 22:56:14.038812 21053 replica.cpp:474] Replica received implicit 
> promise request with proposal 1
> I0916 22:56:14.038925 21053 leveldb.cpp:306] Persisting metadata (8 bytes) to 
> leveldb took 92623ns
> I0916 22:56:14.038944 21053 replica.cpp:342] Persisted promised to 1
> I0916 22:56:14.039201 21052 coordinator.cpp:230] Coordinator attemping to 
> fill missing position
> I0916 22:56:14.039676 21047 replica.cpp:375] Replica received explicit 
> promise request for position 0 with proposal 2
> I0916 22:56:14.039836 21047 leveldb.cpp:343] Persisting action (8 bytes) to 
> leveldb took 144215ns
> I0916 22:56:14.039850 21047 replica.cpp:676] Persisted action at 0
> I0916 22:56:14.040243 21047 replica.cpp:508] Replica received write request 

[jira] [Commented] (MESOS-1802) HealthCheckTest.HealthStatusChange is flaky on jenkins.

2016-01-30 Thread haosdent (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-1802?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15124897#comment-15124897
 ] 

haosdent commented on MESOS-1802:
-

Could it easy reproduce in your env? For example, after xxx cycles the failure 
happens for sure. Last time I try run a lot cycles in a CentOS 7(physical 
machine), but it runs well.

> HealthCheckTest.HealthStatusChange is flaky on jenkins.
> ---
>
> Key: MESOS-1802
> URL: https://issues.apache.org/jira/browse/MESOS-1802
> Project: Mesos
>  Issue Type: Bug
>  Components: test, tests
>Affects Versions: 0.26.0
>Reporter: Benjamin Mahler
>  Labels: flaky, mesosphere
>
> https://builds.apache.org/job/Mesos-Trunk-Ubuntu-Build-Out-Of-Src-Disable-Java-Disable-Python-Disable-Webui/2374/consoleFull
> {noformat}
> [ RUN  ] HealthCheckTest.HealthStatusChange
> Using temporary directory '/tmp/HealthCheckTest_HealthStatusChange_IYnlu2'
> I0916 22:56:14.034612 21026 leveldb.cpp:176] Opened db in 2.155713ms
> I0916 22:56:14.034965 21026 leveldb.cpp:183] Compacted db in 332489ns
> I0916 22:56:14.034984 21026 leveldb.cpp:198] Created db iterator in 3710ns
> I0916 22:56:14.034996 21026 leveldb.cpp:204] Seeked to beginning of db in 
> 642ns
> I0916 22:56:14.035006 21026 leveldb.cpp:273] Iterated through 0 keys in the 
> db in 343ns
> I0916 22:56:14.035023 21026 replica.cpp:741] Replica recovered with log 
> positions 0 -> 0 with 1 holes and 0 unlearned
> I0916 22:56:14.035200 21054 recover.cpp:425] Starting replica recovery
> I0916 22:56:14.035403 21041 recover.cpp:451] Replica is in EMPTY status
> I0916 22:56:14.035888 21045 replica.cpp:638] Replica in EMPTY status received 
> a broadcasted recover request
> I0916 22:56:14.035969 21052 recover.cpp:188] Received a recover response from 
> a replica in EMPTY status
> I0916 22:56:14.036118 21042 recover.cpp:542] Updating replica status to 
> STARTING
> I0916 22:56:14.036603 21046 master.cpp:286] Master 
> 20140916-225614-3125920579-47865-21026 (penates.apache.org) started on 
> 67.195.81.186:47865
> I0916 22:56:14.036634 21046 master.cpp:332] Master only allowing 
> authenticated frameworks to register
> I0916 22:56:14.036648 21046 master.cpp:337] Master only allowing 
> authenticated slaves to register
> I0916 22:56:14.036659 21046 credentials.hpp:36] Loading credentials for 
> authentication from 
> '/tmp/HealthCheckTest_HealthStatusChange_IYnlu2/credentials'
> I0916 22:56:14.036686 21045 leveldb.cpp:306] Persisting metadata (8 bytes) to 
> leveldb took 480322ns
> I0916 22:56:14.036700 21045 replica.cpp:320] Persisted replica status to 
> STARTING
> I0916 22:56:14.036769 21046 master.cpp:366] Authorization enabled
> I0916 22:56:14.036826 21045 recover.cpp:451] Replica is in STARTING status
> I0916 22:56:14.036944 21052 master.cpp:120] No whitelist given. Advertising 
> offers for all slaves
> I0916 22:56:14.036968 21049 hierarchical_allocator_process.hpp:299] 
> Initializing hierarchical allocator process with master : 
> master@67.195.81.186:47865
> I0916 22:56:14.037284 21054 replica.cpp:638] Replica in STARTING status 
> received a broadcasted recover request
> I0916 22:56:14.037312 21046 master.cpp:1212] The newly elected leader is 
> master@67.195.81.186:47865 with id 20140916-225614-3125920579-47865-21026
> I0916 22:56:14.037333 21046 master.cpp:1225] Elected as the leading master!
> I0916 22:56:14.037345 21046 master.cpp:1043] Recovering from registrar
> I0916 22:56:14.037504 21040 registrar.cpp:313] Recovering registrar
> I0916 22:56:14.037505 21053 recover.cpp:188] Received a recover response from 
> a replica in STARTING status
> I0916 22:56:14.037681 21047 recover.cpp:542] Updating replica status to VOTING
> I0916 22:56:14.038072 21052 leveldb.cpp:306] Persisting metadata (8 bytes) to 
> leveldb took 330251ns
> I0916 22:56:14.038087 21052 replica.cpp:320] Persisted replica status to 
> VOTING
> I0916 22:56:14.038127 21053 recover.cpp:556] Successfully joined the Paxos 
> group
> I0916 22:56:14.038202 21053 recover.cpp:440] Recover process terminated
> I0916 22:56:14.038364 21048 log.cpp:656] Attempting to start the writer
> I0916 22:56:14.038812 21053 replica.cpp:474] Replica received implicit 
> promise request with proposal 1
> I0916 22:56:14.038925 21053 leveldb.cpp:306] Persisting metadata (8 bytes) to 
> leveldb took 92623ns
> I0916 22:56:14.038944 21053 replica.cpp:342] Persisted promised to 1
> I0916 22:56:14.039201 21052 coordinator.cpp:230] Coordinator attemping to 
> fill missing position
> I0916 22:56:14.039676 21047 replica.cpp:375] Replica received explicit 
> promise request for position 0 with proposal 2
> I0916 22:56:14.039836 21047 leveldb.cpp:343] Persisting action (8 bytes) to 
> leveldb took 144215ns
> I0916 22:56:14.039850 21047 replica.cpp:676] Persisted action at 0
> I0916 

[jira] [Commented] (MESOS-1802) HealthCheckTest.HealthStatusChange is flaky on jenkins.

2016-01-29 Thread Neil Conway (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-1802?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15124498#comment-15124498
 ] 

Neil Conway commented on MESOS-1802:


Just saw this on an Arch Linux VM (Virtualbox):

{noformat}
[ RUN  ] HealthCheckTest.HealthStatusChange
/mesos-2/src/tests/health_check_tests.cpp:529: Failure
Mock function called more times than expected - returning directly.
Function call: statusUpdate(0x7ffd37ae9bd0, @0x7f2afcdf20a0 112-byte object 
<70-9D F0-08 2B-7F 00-00 00-00 00-00 00-00 00-00 A0-6D 01-CC 2A-7F 00-00 20-EC 
1E-02 00-00 00-00 01-00 00-00 02-00 00-00 20-EC 1E-02 00-00 00-00 60-C4 01-CC 
2A-7F 00-00 C0-C3 01-CC 2A-7F 00-00 00-00 00-00 00-00 00-00 0C-74 DD-41 00-AB 
D5-41 20-C5 01-CC 2A-7F 00-00 00-00 00-00 00-00 00-00 B0-20 00-CC 2A-7F 00-00 
00-00 00-00 CB-17 00-00>)
 Expected: to be called 4 times
   Actual: called 5 times - over-saturated and active
/mesos-2/src/tests/health_check_tests.cpp:529: Failure
Mock function called more times than expected - returning directly.
Function call: statusUpdate(0x7ffd37ae9bd0, @0x7f2afb5ef0a0 112-byte object 
<70-9D F0-08 2B-7F 00-00 00-00 00-00 00-00 00-00 C0-6E 02-D4 2A-7F 00-00 20-EC 
1E-02 00-00 00-00 01-00 00-00 02-00 00-00 20-EC 1E-02 00-00 00-00 10-08 02-D4 
2A-7F 00-00 10-09 02-D4 2A-7F 00-00 00-00 00-00 01-00 00-00 2D-D0 E3-41 00-AB 
D5-41 10-07 02-D4 2A-7F 00-00 00-00 00-00 00-00 00-00 D0-07 02-D4 2A-7F 00-00 
00-00 00-00 CB-17 00-00>)
 Expected: to be called 4 times
   Actual: called 6 times - over-saturated and active
/mesos-2/src/tests/health_check_tests.cpp:529: Failure
Mock function called more times than expected - returning directly.
Function call: statusUpdate(0x7ffd37ae9bd0, @0x7f2afcdf20a0 112-byte object 
<70-9D F0-08 2B-7F 00-00 00-00 00-00 00-00 00-00 90-C5 01-CC 2A-7F 00-00 20-EC 
1E-02 00-00 00-00 01-00 00-00 02-00 00-00 20-EC 1E-02 00-00 00-00 00-83 02-CC 
2A-7F 00-00 A0-6D 01-CC 2A-7F 00-00 00-00 00-00 00-00 00-00 F5-F7 EA-41 00-AB 
D5-41 20-4D 00-CC 2A-7F 00-00 00-00 00-00 00-00 00-00 F0-C5 01-CC 2A-7F 00-00 
00-00 00-00 CB-17 00-00>)
 Expected: to be called 4 times
   Actual: called 7 times - over-saturated and active
/mesos-2/src/tests/health_check_tests.cpp:529: Failure
Mock function called more times than expected - returning directly.
Function call: statusUpdate(0x7ffd37ae9bd0, @0x7f2afa5ed0a0 112-byte object 
<70-9D F0-08 2B-7F 00-00 00-00 00-00 00-00 00-00 A0-89 01-E8 2A-7F 00-00 20-EC 
1E-02 00-00 00-00 01-00 00-00 02-00 00-00 20-EC 1E-02 00-00 00-00 60-34 00-E8 
2A-7F 00-00 B0-98 01-E8 2A-7F 00-00 00-00 00-00 01-00 00-00 02-0E F2-41 00-AB 
D5-41 90-BF 00-E8 2A-7F 00-00 00-00 00-00 00-00 00-00 80-B1 00-E8 2A-7F 00-00 
00-00 00-00 CB-17 00-00>)
 Expected: to be called 4 times
   Actual: called 8 times - over-saturated and active
/mesos-2/src/tests/health_check_tests.cpp:529: Failure
Mock function called more times than expected - returning directly.
Function call: statusUpdate(0x7ffd37ae9bd0, @0x7f2afc5f10a0 112-byte object 
<70-9D F0-08 2B-7F 00-00 00-00 00-00 00-00 00-00 50-58 02-DC 2A-7F 00-00 20-EC 
1E-02 00-00 00-00 01-00 00-00 02-00 00-00 20-EC 1E-02 00-00 00-00 B0-1E 01-DC 
2A-7F 00-00 40-1F 01-DC 2A-7F 00-00 00-00 00-00 00-00 00-00 4B-76 F8-41 00-AB 
D5-41 A0-95 02-DC 2A-7F 00-00 00-00 00-00 00-00 00-00 30-3F 01-DC 2A-7F 00-00 
00-00 00-00 CB-17 00-00>)
 Expected: to be called 4 times
   Actual: called 9 times - over-saturated and active
/mesos-2/src/tests/health_check_tests.cpp:529: Failure
Mock function called more times than expected - returning directly.
Function call: statusUpdate(0x7ffd37ae9bd0, @0x7f2afb5ef0a0 112-byte object 
<70-9D F0-08 2B-7F 00-00 00-00 00-00 00-00 00-00 50-65 03-D4 2A-7F 00-00 20-EC 
1E-02 00-00 00-00 01-00 00-00 02-00 00-00 20-EC 1E-02 00-00 00-00 80-06 02-D4 
2A-7F 00-00 20-06 02-D4 2A-7F 00-00 00-00 00-00 01-00 00-00 FC-17 FF-41 00-AB 
D5-41 10-09 02-D4 2A-7F 00-00 00-00 00-00 00-00 00-00 E0-63 03-D4 2A-7F 00-00 
00-00 00-00 CB-17 00-00>)
 Expected: to be called 4 times
   Actual: called 10 times - over-saturated and active
/mesos-2/src/tests/health_check_tests.cpp:529: Failure
Mock function called more times than expected - returning directly.
Function call: statusUpdate(0x7ffd37ae9bd0, @0x7f2afadee0a0 112-byte object 
<70-9D F0-08 2B-7F 00-00 00-00 00-00 00-00 00-00 10-A5 05-D8 2A-7F 00-00 20-EC 
1E-02 00-00 00-00 01-00 00-00 02-00 00-00 20-EC 1E-02 00-00 00-00 60-1C 06-D8 
2A-7F 00-00 C0-FF 0A-D8 2A-7F 00-00 00-00 00-00 00-00 00-00 C3-82 05-42 00-AB 
D5-41 F0-EB 06-D8 2A-7F 00-00 00-00 00-00 00-00 00-00 E0-FB 06-D8 2A-7F 00-00 
00-00 00-00 CB-17 00-00>)
 Expected: to be called 4 times
   Actual: called 11 times - over-saturated and active
/mesos-2/src/tests/health_check_tests.cpp:529: Failure
Mock function called more times than expected - returning 

[jira] [Commented] (MESOS-1802) HealthCheckTest.HealthStatusChange is flaky on jenkins.

2016-01-25 Thread Greg Mann (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-1802?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15115532#comment-15115532
 ] 

Greg Mann commented on MESOS-1802:
--

Looks like this is still an issue. Just saw this on the ASF CI, Ubuntu 14.04 
with gcc:

{code}
[ RUN  ] HealthCheckTest.HealthStatusChange
I0125 15:22:26.282974 31962 leveldb.cpp:174] Opened db in 83.974532ms
I0125 15:22:26.329022 31962 leveldb.cpp:181] Compacted db in 45.96014ms
I0125 15:22:26.329654 31962 leveldb.cpp:196] Created db iterator in 28398ns
I0125 15:22:26.330154 31962 leveldb.cpp:202] Seeked to beginning of db in 4095ns
I0125 15:22:26.330663 31962 leveldb.cpp:271] Iterated through 0 keys in the db 
in 468ns
I0125 15:22:26.331181 31962 replica.cpp:779] Replica recovered with log 
positions 0 -> 0 with 1 holes and 0 unlearned
I0125 15:22:26.332582 31988 recover.cpp:447] Starting replica recovery
I0125 15:22:26.333237 31991 recover.cpp:473] Replica is in EMPTY status
I0125 15:22:26.336228 31993 replica.cpp:673] Replica in EMPTY status received a 
broadcasted recover request from (2239)@172.17.0.2:52355
I0125 15:22:26.336789 31992 recover.cpp:193] Received a recover response from a 
replica in EMPTY status
I0125 15:22:26.338047 31992 recover.cpp:564] Updating replica status to STARTING
I0125 15:22:26.350210 31982 master.cpp:374] Master 
8bd61c8a-63c9-4f67-99a4-d6ce2e3f77be (6a3fa6c4588a) started on 172.17.0.2:52355
I0125 15:22:26.350651 31982 master.cpp:376] Flags at startup: --acls="" 
--allocation_interval="1secs" --allocator="HierarchicalDRF" 
--authenticate="true" --authenticate_http="true" --authenticate_slaves="true" 
--authenticators="crammd5" --authorizers="local" 
--credentials="/tmp/inJ0ZL/credentials" --framework_sorter="drf" --help="false" 
--hostname_lookup="true" --http_authenticators="basic" 
--initialize_driver_logging="true" --log_auto_initialize="true" 
--logbufsecs="0" --logging_level="INFO" --max_completed_frameworks="50" 
--max_completed_tasks_per_framework="1000" --max_slave_ping_timeouts="5" 
--quiet="false" --recovery_slave_removal_limit="100%" 
--registry="replicated_log" --registry_fetch_timeout="1mins" 
--registry_store_timeout="25secs" --registry_strict="true" 
--root_submissions="true" --slave_ping_timeout="15secs" 
--slave_reregister_timeout="10mins" --user_sorter="drf" --version="false" 
--webui_dir="/mesos/mesos-0.27.0/_inst/share/mesos/webui" 
--work_dir="/tmp/inJ0ZL/master" --zk_session_timeout="10secs"
I0125 15:22:26.352558 31982 master.cpp:421] Master only allowing authenticated 
frameworks to register
I0125 15:22:26.352833 31982 master.cpp:426] Master only allowing authenticated 
slaves to register
I0125 15:22:26.353160 31982 credentials.hpp:35] Loading credentials for 
authentication from '/tmp/inJ0ZL/credentials'
I0125 15:22:26.356725 31982 master.cpp:466] Using default 'crammd5' 
authenticator
I0125 15:22:26.357357 31982 master.cpp:535] Using default 'basic' HTTP 
authenticator
I0125 15:22:26.357851 31982 master.cpp:569] Authorization enabled
I0125 15:22:26.359246 31986 hierarchical.cpp:144] Initialized hierarchical 
allocator process
I0125 15:22:26.359474 31992 whitelist_watcher.cpp:77] No whitelist given
I0125 15:22:26.364750 31991 leveldb.cpp:304] Persisting metadata (8 bytes) to 
leveldb took 25.971673ms
I0125 15:22:26.365209 31991 replica.cpp:320] Persisted replica status to 
STARTING
I0125 15:22:26.365898 31991 recover.cpp:473] Replica is in STARTING status
I0125 15:22:26.369220 31993 replica.cpp:673] Replica in STARTING status 
received a broadcasted recover request from (2241)@172.17.0.2:52355
I0125 15:22:26.370393 31991 recover.cpp:193] Received a recover response from a 
replica in STARTING status
I0125 15:22:26.371255 31993 recover.cpp:564] Updating replica status to VOTING
I0125 15:22:26.372786 31982 master.cpp:1710] The newly elected leader is 
master@172.17.0.2:52355 with id 8bd61c8a-63c9-4f67-99a4-d6ce2e3f77be
I0125 15:22:26.373251 31982 master.cpp:1723] Elected as the leading master!
I0125 15:22:26.373553 31982 master.cpp:1468] Recovering from registrar
I0125 15:22:26.374184 31991 registrar.cpp:307] Recovering registrar
I0125 15:22:26.398006 31993 leveldb.cpp:304] Persisting metadata (8 bytes) to 
leveldb took 26.052513ms
I0125 15:22:26.398576 31993 replica.cpp:320] Persisted replica status to VOTING
I0125 15:22:26.399235 31982 recover.cpp:578] Successfully joined the Paxos group
I0125 15:22:26.399518 31982 recover.cpp:462] Recover process terminated
I0125 15:22:26.400599 31993 log.cpp:659] Attempting to start the writer
I0125 15:22:26.402815 31992 replica.cpp:493] Replica received implicit promise 
request from (2242)@172.17.0.2:52355 with proposal 1
I0125 15:22:26.423198 31992 leveldb.cpp:304] Persisting metadata (8 bytes) to 
leveldb took 19.954945ms
I0125 15:22:26.423651 31992 replica.cpp:342] Persisted promised to 1
I0125 15:22:26.425547 31992 coordinator.cpp:238] Coordinator attempting to fill 
missing 

[jira] [Commented] (MESOS-1802) HealthCheckTest.HealthStatusChange is flaky on jenkins.

2015-11-09 Thread Vinod Kone (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-1802?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14997592#comment-14997592
 ] 

Vinod Kone commented on MESOS-1802:
---

This is still happening.

https://builds.apache.org/job/Mesos/COMPILER=gcc,CONFIGURATION=--verbose,OS=ubuntu%3A14.04,label_exp=docker%7C%7CHadoop/1204/changes

{code}

[ RUN  ] HealthCheckTest.HealthStatusChange
I1109 19:00:52.094830 29603 leveldb.cpp:176] Opened db in 282.154657ms
I1109 19:00:52.187214 29603 leveldb.cpp:183] Compacted db in 92.28671ms
I1109 19:00:52.187328 29603 leveldb.cpp:198] Created db iterator in 31318ns
I1109 19:00:52.187345 29603 leveldb.cpp:204] Seeked to beginning of db in 3925ns
I1109 19:00:52.187357 29603 leveldb.cpp:273] Iterated through 0 keys in the db 
in 386ns
I1109 19:00:52.187417 29603 replica.cpp:780] Replica recovered with log 
positions 0 -> 0 with 1 holes and 0 unlearned
I1109 19:00:52.188252 29628 recover.cpp:449] Starting replica recovery
I1109 19:00:52.188861 29628 recover.cpp:475] Replica is in EMPTY status
I1109 19:00:52.190914 29626 replica.cpp:676] Replica in EMPTY status received a 
broadcasted recover request from (1864)@172.17.14.49:60889
I1109 19:00:52.191453 29632 recover.cpp:195] Received a recover response from a 
replica in EMPTY status
I1109 19:00:52.192078 29632 recover.cpp:566] Updating replica status to STARTING
I1109 19:00:52.199542 29624 master.cpp:367] Master 
34506b61-7fc1-4e5e-821c-e5f5650782e0 (61f7c652bdf2) started on 
172.17.14.49:60889
I1109 19:00:52.200220 29624 master.cpp:369] Flags at startup: --acls="" 
--allocation_interval="1secs" --allocator="HierarchicalDRF" 
--authenticate="true" --authenticate_slaves="true" --authenticators="crammd5" 
--authorizers="local" --credentials="/tmp/z7oqX0/credentials" 
--framework_sorter="drf" --help="false" --hostname_lookup="true" 
--initialize_driver_logging="true" --log_auto_initialize="true" 
--logbufsecs="0" --logging_level="INFO" --max_slave_ping_timeouts="5" 
--quiet="false" --recovery_slave_removal_limit="100%" 
--registry="replicated_log" --registry_fetch_timeout="1mins" 
--registry_store_timeout="25secs" --registry_strict="true" 
--root_submissions="true" --slave_ping_timeout="15secs" 
--slave_reregister_timeout="10mins" --user_sorter="drf" --version="false" 
--webui_dir="/mesos/mesos-0.26.0/_inst/share/mesos/webui" 
--work_dir="/tmp/z7oqX0/master" --zk_session_timeout="10secs"
I1109 19:00:52.200584 29624 master.cpp:414] Master only allowing authenticated 
frameworks to register
I1109 19:00:52.200597 29624 master.cpp:419] Master only allowing authenticated 
slaves to register
I1109 19:00:52.200606 29624 credentials.hpp:37] Loading credentials for 
authentication from '/tmp/z7oqX0/credentials'
I1109 19:00:52.200917 29624 master.cpp:458] Using default 'crammd5' 
authenticator
I1109 19:00:52.201328 29624 master.cpp:495] Authorization enabled
I1109 19:00:52.201648 29622 hierarchical.cpp:140] Initialized hierarchical 
allocator process
I1109 19:00:52.201715 29622 whitelist_watcher.cpp:79] No whitelist given
I1109 19:00:52.204258 29624 master.cpp:1606] The newly elected leader is 
master@172.17.14.49:60889 with id 34506b61-7fc1-4e5e-821c-e5f5650782e0
I1109 19:00:52.204290 29624 master.cpp:1619] Elected as the leading master!
I1109 19:00:52.204308 29624 master.cpp:1379] Recovering from registrar
I1109 19:00:52.204457 29637 registrar.cpp:309] Recovering registrar
I1109 19:00:52.257841 29632 leveldb.cpp:306] Persisting metadata (8 bytes) to 
leveldb took 65.359249ms
I1109 19:00:52.257933 29632 replica.cpp:323] Persisted replica status to 
STARTING
I1109 19:00:52.258283 29631 recover.cpp:475] Replica is in STARTING status
I1109 19:00:52.259827 29630 replica.cpp:676] Replica in STARTING status 
received a broadcasted recover request from (1865)@172.17.14.49:60889
I1109 19:00:52.260082 29628 recover.cpp:195] Received a recover response from a 
replica in STARTING status
I1109 19:00:52.260671 29630 recover.cpp:566] Updating replica status to VOTING
I1109 19:00:52.369948 29628 leveldb.cpp:306] Persisting metadata (8 bytes) to 
leveldb took 109.102187ms
I1109 19:00:52.370046 29628 replica.cpp:323] Persisted replica status to VOTING
I1109 19:00:52.370317 29628 recover.cpp:580] Successfully joined the Paxos group
I1109 19:00:52.370551 29628 recover.cpp:464] Recover process terminated
I1109 19:00:52.371186 29629 log.cpp:661] Attempting to start the writer
I1109 19:00:52.372530 29629 replica.cpp:496] Replica received implicit promise 
request from (1866)@172.17.14.49:60889 with proposal 1
I1109 19:00:52.505920 29629 leveldb.cpp:306] Persisting metadata (8 bytes) to 
leveldb took 133.338727ms
I1109 19:00:52.505997 29629 replica.cpp:345] Persisted promised to 1
I1109 19:00:52.506805 29634 coordinator.cpp:240] Coordinator attempting to fill 
missing positions
I1109 19:00:52.508029 29622 replica.cpp:391] Replica received explicit promise 
request from (1867)@172.17.14.49:60889 for 

[jira] [Commented] (MESOS-1802) HealthCheckTest.HealthStatusChange is flaky on jenkins.

2015-04-24 Thread Benjamin Hindman (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-1802?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14511406#comment-14511406
 ] 

Benjamin Hindman commented on MESOS-1802:
-

Hey [~tnachen], are you working on this? We're doing some backlog grooming!

 HealthCheckTest.HealthStatusChange is flaky on jenkins.
 ---

 Key: MESOS-1802
 URL: https://issues.apache.org/jira/browse/MESOS-1802
 Project: Mesos
  Issue Type: Bug
  Components: test
Reporter: Benjamin Mahler
Assignee: Timothy Chen
  Labels: flaky

 https://builds.apache.org/job/Mesos-Trunk-Ubuntu-Build-Out-Of-Src-Disable-Java-Disable-Python-Disable-Webui/2374/consoleFull
 {noformat}
 [ RUN  ] HealthCheckTest.HealthStatusChange
 Using temporary directory '/tmp/HealthCheckTest_HealthStatusChange_IYnlu2'
 I0916 22:56:14.034612 21026 leveldb.cpp:176] Opened db in 2.155713ms
 I0916 22:56:14.034965 21026 leveldb.cpp:183] Compacted db in 332489ns
 I0916 22:56:14.034984 21026 leveldb.cpp:198] Created db iterator in 3710ns
 I0916 22:56:14.034996 21026 leveldb.cpp:204] Seeked to beginning of db in 
 642ns
 I0916 22:56:14.035006 21026 leveldb.cpp:273] Iterated through 0 keys in the 
 db in 343ns
 I0916 22:56:14.035023 21026 replica.cpp:741] Replica recovered with log 
 positions 0 - 0 with 1 holes and 0 unlearned
 I0916 22:56:14.035200 21054 recover.cpp:425] Starting replica recovery
 I0916 22:56:14.035403 21041 recover.cpp:451] Replica is in EMPTY status
 I0916 22:56:14.035888 21045 replica.cpp:638] Replica in EMPTY status received 
 a broadcasted recover request
 I0916 22:56:14.035969 21052 recover.cpp:188] Received a recover response from 
 a replica in EMPTY status
 I0916 22:56:14.036118 21042 recover.cpp:542] Updating replica status to 
 STARTING
 I0916 22:56:14.036603 21046 master.cpp:286] Master 
 20140916-225614-3125920579-47865-21026 (penates.apache.org) started on 
 67.195.81.186:47865
 I0916 22:56:14.036634 21046 master.cpp:332] Master only allowing 
 authenticated frameworks to register
 I0916 22:56:14.036648 21046 master.cpp:337] Master only allowing 
 authenticated slaves to register
 I0916 22:56:14.036659 21046 credentials.hpp:36] Loading credentials for 
 authentication from 
 '/tmp/HealthCheckTest_HealthStatusChange_IYnlu2/credentials'
 I0916 22:56:14.036686 21045 leveldb.cpp:306] Persisting metadata (8 bytes) to 
 leveldb took 480322ns
 I0916 22:56:14.036700 21045 replica.cpp:320] Persisted replica status to 
 STARTING
 I0916 22:56:14.036769 21046 master.cpp:366] Authorization enabled
 I0916 22:56:14.036826 21045 recover.cpp:451] Replica is in STARTING status
 I0916 22:56:14.036944 21052 master.cpp:120] No whitelist given. Advertising 
 offers for all slaves
 I0916 22:56:14.036968 21049 hierarchical_allocator_process.hpp:299] 
 Initializing hierarchical allocator process with master : 
 master@67.195.81.186:47865
 I0916 22:56:14.037284 21054 replica.cpp:638] Replica in STARTING status 
 received a broadcasted recover request
 I0916 22:56:14.037312 21046 master.cpp:1212] The newly elected leader is 
 master@67.195.81.186:47865 with id 20140916-225614-3125920579-47865-21026
 I0916 22:56:14.037333 21046 master.cpp:1225] Elected as the leading master!
 I0916 22:56:14.037345 21046 master.cpp:1043] Recovering from registrar
 I0916 22:56:14.037504 21040 registrar.cpp:313] Recovering registrar
 I0916 22:56:14.037505 21053 recover.cpp:188] Received a recover response from 
 a replica in STARTING status
 I0916 22:56:14.037681 21047 recover.cpp:542] Updating replica status to VOTING
 I0916 22:56:14.038072 21052 leveldb.cpp:306] Persisting metadata (8 bytes) to 
 leveldb took 330251ns
 I0916 22:56:14.038087 21052 replica.cpp:320] Persisted replica status to 
 VOTING
 I0916 22:56:14.038127 21053 recover.cpp:556] Successfully joined the Paxos 
 group
 I0916 22:56:14.038202 21053 recover.cpp:440] Recover process terminated
 I0916 22:56:14.038364 21048 log.cpp:656] Attempting to start the writer
 I0916 22:56:14.038812 21053 replica.cpp:474] Replica received implicit 
 promise request with proposal 1
 I0916 22:56:14.038925 21053 leveldb.cpp:306] Persisting metadata (8 bytes) to 
 leveldb took 92623ns
 I0916 22:56:14.038944 21053 replica.cpp:342] Persisted promised to 1
 I0916 22:56:14.039201 21052 coordinator.cpp:230] Coordinator attemping to 
 fill missing position
 I0916 22:56:14.039676 21047 replica.cpp:375] Replica received explicit 
 promise request for position 0 with proposal 2
 I0916 22:56:14.039836 21047 leveldb.cpp:343] Persisting action (8 bytes) to 
 leveldb took 144215ns
 I0916 22:56:14.039850 21047 replica.cpp:676] Persisted action at 0
 I0916 22:56:14.040243 21047 replica.cpp:508] Replica received write request 
 for position 0
 I0916 22:56:14.040267 21047 leveldb.cpp:438] Reading position from leveldb 
 took