[jira] [Commented] (MESOS-1613) HealthCheckTest.ConsecutiveFailures is flaky
[ https://issues.apache.org/jira/browse/MESOS-1613?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15049850#comment-15049850 ] Benjamin Mahler commented on MESOS-1613:

For posterity, I also wasn't able to reproduce this by just running the test in repetition. However, when I ran one {{openssl speed}} for each core on my laptop in order to induce load, I could reproduce it easily. We probably want to direct folks to try this when they are having trouble reproducing something flaky from CI. I will post a fix through MESOS-4106.

> HealthCheckTest.ConsecutiveFailures is flaky
>
> Key: MESOS-1613
> URL: https://issues.apache.org/jira/browse/MESOS-1613
> Project: Mesos
> Issue Type: Bug
> Components: test
> Affects Versions: 0.20.0
> Environment: Ubuntu 10.04 GCC
> Reporter: Vinod Kone
> Assignee: Timothy Chen
> Labels: flaky, mesosphere
>
> {code}
> [ RUN ] HealthCheckTest.ConsecutiveFailures
> Using temporary directory '/tmp/HealthCheckTest_ConsecutiveFailures_AzK0OV'
> I0717 04:39:59.288471 5009 leveldb.cpp:176] Opened db in 21.575631ms
> I0717 04:39:59.295274 5009 leveldb.cpp:183] Compacted db in 6.471982ms
> I0717 04:39:59.295552 5009 leveldb.cpp:198] Created db iterator in 16783ns
> I0717 04:39:59.296026 5009 leveldb.cpp:204] Seeked to beginning of db in 2125ns
> I0717 04:39:59.296257 5009 leveldb.cpp:273] Iterated through 0 keys in the db in 10747ns
> I0717 04:39:59.296584 5009 replica.cpp:741] Replica recovered with log positions 0 -> 0 with 1 holes and 0 unlearned
> I0717 04:39:59.297322 5033 recover.cpp:425] Starting replica recovery
> I0717 04:39:59.297413 5033 recover.cpp:451] Replica is in EMPTY status
> I0717 04:39:59.297824 5033 replica.cpp:638] Replica in EMPTY status received a broadcasted recover request
> I0717 04:39:59.297899 5033 recover.cpp:188] Received a recover response from a replica in EMPTY status
> I0717 04:39:59.297997 5033 recover.cpp:542] Updating replica status to STARTING
> I0717 04:39:59.301985 5031 master.cpp:288] Master 20140717-043959-16842879-40280-5009 (lucid) started on 127.0.1.1:40280
> I0717 04:39:59.302026 5031 master.cpp:325] Master only allowing authenticated frameworks to register
> I0717 04:39:59.302032 5031 master.cpp:330] Master only allowing authenticated slaves to register
> I0717 04:39:59.302039 5031 credentials.hpp:36] Loading credentials for authentication from '/tmp/HealthCheckTest_ConsecutiveFailures_AzK0OV/credentials'
> I0717 04:39:59.302283 5031 master.cpp:359] Authorization enabled
> I0717 04:39:59.302971 5031 hierarchical_allocator_process.hpp:301] Initializing hierarchical allocator process with master : master@127.0.1.1:40280
> I0717 04:39:59.303022 5031 master.cpp:122] No whitelist given. Advertising offers for all slaves
> I0717 04:39:59.303390 5033 leveldb.cpp:306] Persisting metadata (8 bytes) to leveldb took 5.325097ms
> I0717 04:39:59.303419 5033 replica.cpp:320] Persisted replica status to STARTING
> I0717 04:39:59.304076 5030 master.cpp:1128] The newly elected leader is master@127.0.1.1:40280 with id 20140717-043959-16842879-40280-5009
> I0717 04:39:59.304095 5030 master.cpp:1141] Elected as the leading master!
> I0717 04:39:59.304102 5030 master.cpp:959] Recovering from registrar
> I0717 04:39:59.304182 5030 registrar.cpp:313] Recovering registrar
> I0717 04:39:59.304635 5033 recover.cpp:451] Replica is in STARTING status
> I0717 04:39:59.304962 5033 replica.cpp:638] Replica in STARTING status received a broadcasted recover request
> I0717 04:39:59.305026 5033 recover.cpp:188] Received a recover response from a replica in STARTING status
> I0717 04:39:59.305130 5033 recover.cpp:542] Updating replica status to VOTING
> I0717 04:39:59.310416 5033 leveldb.cpp:306] Persisting metadata (8 bytes) to leveldb took 5.204157ms
> I0717 04:39:59.310459 5033 replica.cpp:320] Persisted replica status to VOTING
> I0717 04:39:59.310534 5033 recover.cpp:556] Successfully joined the Paxos group
> I0717 04:39:59.310607 5033 recover.cpp:440] Recover process terminated
> I0717 04:39:59.310773 5033 log.cpp:656] Attempting to start the writer
> I0717 04:39:59.311157 5033 replica.cpp:474] Replica received implicit promise request with proposal 1
> I0717 04:39:59.313451 5033 leveldb.cpp:306] Persisting metadata (8 bytes) to leveldb took 2.271822ms
> I0717 04:39:59.313627 5033 replica.cpp:342] Persisted promised to 1
> I0717 04:39:59.318038 5031 coordinator.cpp:230] Coordinator attemping to fill missing position
> I0717 04:39:59.318430 5031 replica.cpp:375] Replica received explicit promise request for position 0 with proposal 2
> I0717 04:39:59.323459 5031 leveldb.cpp:343] Persisting action (8 bytes) to leveldb took
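The load-induction trick described in the comment above can be sketched as a small script. This is a sketch of the recipe, not something from the ticket: the test command is a placeholder (`TEST_CMD` defaults to `true`; in practice you would substitute the real gtest invocation, e.g. `mesos-tests` with `--gtest_filter` and `--gtest_repeat`).

```shell
#!/bin/sh
# Sketch: saturate every core with an `openssl speed` worker, run the
# (placeholder) flaky test command under that load, then stop the workers.

# Replace with the real invocation, e.g.:
#   ./src/mesos-tests --gtest_filter=HealthCheckTest.ConsecutiveFailures --gtest_repeat=100
TEST_CMD=${TEST_CMD:-true}

ncores=$(getconf _NPROCESSORS_ONLN)   # portable logical-core count

pids=""
i=0
while [ "$i" -lt "$ncores" ]; do      # one CPU-bound worker per core
  openssl speed >/dev/null 2>&1 &
  pids="$pids $!"
  i=$((i + 1))
done

sh -c "$TEST_CMD"                     # run the flaky test under load
status=$?

kill $pids 2>/dev/null || true        # stop the load generators
echo "test exited with status $status"
```

Timing-sensitive tests that always pass on an idle machine often fail only once the scheduler is contended, which is why this reproduces CI-only flakiness.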
[jira] [Commented] (MESOS-1613) HealthCheckTest.ConsecutiveFailures is flaky
[ https://issues.apache.org/jira/browse/MESOS-1613?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14152062#comment-14152062 ] Timothy Chen commented on MESOS-1613:

Added output to stdout and stderr in my branch, hopefully to see what's going on.
[jira] [Commented] (MESOS-1613) HealthCheckTest.ConsecutiveFailures is flaky
[ https://issues.apache.org/jira/browse/MESOS-1613?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14089385#comment-14089385 ] Timothy Chen commented on MESOS-1613:

Yes, I will need the task logs, and I need your help to get output from the Twitter machines since I can't repro this. I'll try to log the task output in the tests to help debug this.
[jira] [Commented] (MESOS-1613) HealthCheckTest.ConsecutiveFailures is flaky
[ https://issues.apache.org/jira/browse/MESOS-1613?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14090065#comment-14090065 ] Benjamin Mahler commented on MESOS-1613:

[~tnachen] it's failing on ASF CI as well; can you triage or disable it in the interim?
[jira] [Commented] (MESOS-1613) HealthCheckTest.ConsecutiveFailures is flaky
[ https://issues.apache.org/jira/browse/MESOS-1613?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14090266#comment-14090266 ] Vinod Kone commented on MESOS-1613:

Chatted with [~tnachen] on IRC. Decided to temporarily disable the test. He is going to send out a review with more instrumentation for the test to find the root cause.

{code}
commit 7814140991217209494f5ec0f266e1d7b0f1d897
Author: Vinod Kone <vinodk...@gmail.com>
Date:   Thu Aug 7 20:50:14 2014 -0700

    Temporarily disabled HealthcheckTest.ConsecutiveFailures test due to MESOS-1613.
{code}
[jira] [Commented] (MESOS-1613) HealthCheckTest.ConsecutiveFailures is flaky
[ https://issues.apache.org/jira/browse/MESOS-1613?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14088425#comment-14088425 ] Benjamin Mahler commented on MESOS-1613:

So far only Twitter CI is exposing this flakiness. I've pasted the full logs in the comment above; are you looking for logging from the command executor? We may want to investigate wiring up the tests to expose executor output in the test output, to make this easier for you to debug.
[jira] [Commented] (MESOS-1613) HealthCheckTest.ConsecutiveFailures is flaky
[ https://issues.apache.org/jira/browse/MESOS-1613?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14072336#comment-14072336 ] Timothy Chen commented on MESOS-1613:

[~vi...@twitter.com] Vinod, I have a ReviewBoard out already. I saw you commented earlier; can you try it, or commit it if it looks good? Thanks!
[jira] [Commented] (MESOS-1613) HealthCheckTest.ConsecutiveFailures is flaky
[ https://issues.apache.org/jira/browse/MESOS-1613?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14065720#comment-14065720 ] Timothy Chen commented on MESOS-1613: - From the log, it's actually failing because it couldn't find the healthy_state set when the task was killed, which was added in 169ec89. It passes on both Apache Jenkins and my Ubuntu machine, so I'm not really sure how this is failing, since the change should get picked up. How does the Twitter CI run? Does it do a make clean and re-clone before running it again?
[jira] [Commented] (MESOS-1613) HealthCheckTest.ConsecutiveFailures is flaky
[ https://issues.apache.org/jira/browse/MESOS-1613?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14065888#comment-14065888 ] Vinod Kone commented on MESOS-1613: --- Disabled on HEAD. Updated the mesos-tnachen Jenkins build (which watches your health_check branch) as follows: {{GLOG_v=1 MESOS_VERBOSE=1 make distcheck GTEST_SHUFFLE=1 GTEST_FILTER=*ConsecutiveFailures* GTEST_REPEAT=100 GTEST_BREAK_ON_FAILURE=1}} Let's hope we can repro. Also, feel free to add debugging info in your branch and push upstream to your repo.
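For anyone trying to reproduce this locally, the environment variables in the Jenkins build command above are standard Googletest and glog knobs that any gtest binary honors, so they can be exported in a shell before running the tests. A minimal sketch of that setup follows; the test-runner invocations at the end are commented out and build-tree dependent (adjust the path for your checkout):

```shell
# Stress-repro setup for a single flaky gtest case (MESOS-1613).
# GLOG_v / MESOS_VERBOSE raise Mesos logging verbosity;
# the GTEST_* variables are read by the gtest test binary itself.
export GLOG_v=1
export MESOS_VERBOSE=1
export GTEST_SHUFFLE=1                          # randomize test ordering each repeat
export GTEST_FILTER='*ConsecutiveFailures*'     # run only the flaky case
export GTEST_REPEAT=100                         # repeat the filtered run 100 times
export GTEST_BREAK_ON_FAILURE=1                 # stop (SIGTRAP) on the first failure

# Then run the tests, e.g.:
#   make check
# or invoke the test binary directly (path depends on your build tree):
#   ./src/mesos-tests

echo "GTEST_FILTER=$GTEST_FILTER GTEST_REPEAT=$GTEST_REPEAT"
```

Combining {{GTEST_REPEAT}} with {{GTEST_BREAK_ON_FAILURE=1}} is the usual way to catch a flaky test in a debugger: the process traps at the first failing repetition instead of continuing.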