[jira] [Commented] (MESOS-2255) SlaveRecoveryTest/0.MasterFailover is flaky

2015-10-15 Thread Yong Qiao Wang (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-2255?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14958448#comment-14958448
 ] 

Yong Qiao Wang commented on MESOS-2255:
---

[~xujyan], I ran the test case SlaveRecoveryTest/0.MasterFailover again on 
OS X (10.10.4), and it passed:

{noformat:title=}
Yongs-MacBook-Pro:bin yqwyq$ ./mesos-tests.sh --gtest_filter=SlaveRecoveryTest/0.MasterFailover
..
..
[==========] Running 1 test from 1 test case.
[----------] Global test environment set-up.
[----------] 1 test from SlaveRecoveryTest/0, where TypeParam = mesos::internal::slave::MesosContainerizer
[ RUN      ] SlaveRecoveryTest/0.MasterFailover
I1015 14:58:55.538914 1939460864 exec.cpp:136] Version: 0.26.0
..
..
[       OK ] SlaveRecoveryTest/0.MasterFailover (1397 ms)
[----------] 1 test from SlaveRecoveryTest/0 (1397 ms total)

[----------] Global test environment tear-down
[==========] 1 test from 1 test case ran. (1406 ms total)
[  PASSED  ] 1 test.
{noformat:title=}

Could you let me know which OS and version you ran this test case on?
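
Since the issue is about flakiness, a single passing run does not say much either way. A minimal sketch of repeating a command and counting failures follows; the helper name `run_repeatedly` and the run count are hypothetical, not part of Mesos. Google Test also has a built-in `--gtest_repeat=N` flag, which should have a similar effect assuming `mesos-tests.sh` forwards it the same way it forwards `--gtest_filter`.

```shell
# Hypothetical helper: run a command N times and report how many runs failed.
# Usage: run_repeatedly N command [args...]
run_repeatedly() {
  runs=$1
  shift
  failures=0
  i=0
  while [ "$i" -lt "$runs" ]; do
    # Discard the command's output; only its exit status matters here.
    if ! "$@" >/dev/null 2>&1; then
      failures=$((failures + 1))
    fi
    i=$((i + 1))
  done
  echo "failures: $failures/$runs"
  # Return non-zero if any run failed.
  [ "$failures" -eq 0 ]
}

# Example invocation (the run count of 100 is an arbitrary assumption):
# run_repeatedly 100 ./mesos-tests.sh --gtest_filter=SlaveRecoveryTest/0.MasterFailover
```

A non-zero failure count on one platform and zero on another would help narrow down whether the flakiness is OS-specific.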

> SlaveRecoveryTest/0.MasterFailover is flaky
> ---
>
> Key: MESOS-2255
> URL: https://issues.apache.org/jira/browse/MESOS-2255
> Project: Mesos
> Issue Type: Bug
> Components: test
> Affects Versions: 0.22.0
> Reporter: Yan Xu
> Assignee: Yong Qiao Wang
> Labels: flaky, twitter
>
> {noformat:title=}
> [ RUN      ] SlaveRecoveryTest/0.MasterFailover
> Using temporary directory '/tmp/SlaveRecoveryTest_0_MasterFailover_dtF7o0'
> I0123 07:45:49.818686 17634 leveldb.cpp:176] Opened db in 31.195549ms
> I0123 07:45:49.821962 17634 leveldb.cpp:183] Compacted db in 3.190936ms
> I0123 07:45:49.822049 17634 leveldb.cpp:198] Created db iterator in 47324ns
> I0123 07:45:49.822069 17634 leveldb.cpp:204] Seeked to beginning of db in 
> 2038ns
> I0123 07:45:49.822084 17634 leveldb.cpp:273] Iterated through 0 keys in the 
> db in 484ns
> I0123 07:45:49.822160 17634 replica.cpp:744] Replica recovered with log 
> positions 0 -> 0 with 1 holes and 0 unlearned
> I0123 07:45:49.824241 17660 recover.cpp:449] Starting replica recovery
> I0123 07:45:49.825217 17660 recover.cpp:475] Replica is in EMPTY status
> I0123 07:45:49.827020 17660 replica.cpp:641] Replica in EMPTY status received 
> a broadcasted recover request
> I0123 07:45:49.827453 17659 recover.cpp:195] Received a recover response from 
> a replica in EMPTY status
> I0123 07:45:49.828047 17659 recover.cpp:566] Updating replica status to 
> STARTING
> I0123 07:45:49.838543 17659 leveldb.cpp:306] Persisting metadata (8 bytes) to 
> leveldb took 10.24963ms
> I0123 07:45:49.838580 17659 replica.cpp:323] Persisted replica status to 
> STARTING
> I0123 07:45:49.848836 17659 recover.cpp:475] Replica is in STARTING status
> I0123 07:45:49.850039 17659 replica.cpp:641] Replica in STARTING status 
> received a broadcasted recover request
> I0123 07:45:49.850286 17659 recover.cpp:195] Received a recover response from 
> a replica in STARTING status
> I0123 07:45:49.850754 17659 recover.cpp:566] Updating replica status to VOTING
> I0123 07:45:49.853698 17655 master.cpp:262] Master 
> 20150123-074549-16842879-44955-17634 (utopic) started on 127.0.1.1:44955
> I0123 07:45:49.853981 17655 master.cpp:308] Master only allowing 
> authenticated frameworks to register
> I0123 07:45:49.853997 17655 master.cpp:313] Master only allowing 
> authenticated slaves to register
> I0123 07:45:49.854038 17655 credentials.hpp:36] Loading credentials for 
> authentication from 
> '/tmp/SlaveRecoveryTest_0_MasterFailover_dtF7o0/credentials'
> I0123 07:45:49.854557 17655 master.cpp:357] Authorization enabled
> I0123 07:45:49.859633 17659 leveldb.cpp:306] Persisting metadata (8 bytes) to 
> leveldb took 8.742923ms
> I0123 07:45:49.859853 17659 replica.cpp:323] Persisted replica status to 
> VOTING
> I0123 07:45:49.860327 17658 recover.cpp:580] Successfully joined the Paxos 
> group
> I0123 07:45:49.860703 17654 recover.cpp:464] Recover process terminated
> I0123 07:45:49.859591 17655 master.cpp:1219] The newly elected leader is 
> master@127.0.1.1:44955 with id 20150123-074549-16842879-44955-17634
> I0123 07:45:49.864702 17655 master.cpp:1232] Elected as the leading master!
> I0123 07:45:49.864904 17655 master.cpp:1050] Recovering from registrar
> I0123 07:45:49.865406 17660 registrar.cpp:313] Recovering registrar
> I0123 07:45:49.866576 17660 log.cpp:660] Attempting to start the writer
> I0123 07:45:49.868638 17658 replica.cpp:477] Replica received implicit 
> promise request with proposal 1
> I0123 07:45:49.872521 17658 leveldb.cpp:306] Persisting metadata (8 bytes) to 
> leveldb took 3.848859ms
> I0123 07:45:49.872555 17658 replica.cpp:345] Persisted promised to 1
> I0123 07:45:49.873769 17661 coordinator.cpp:230] Coordinator attempin

[jira] [Commented] (MESOS-2255) SlaveRecoveryTest/0.MasterFailover is flaky

2015-10-13 Thread Yong Qiao Wang (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-2255?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14956331#comment-14956331
 ] 

Yong Qiao Wang commented on MESOS-2255:
---

I will re-run this test case and fix it if it is still a problem.

> SlaveRecoveryTest/0.MasterFailover is flaky
> ---
>
> Key: MESOS-2255
> URL: https://issues.apache.org/jira/browse/MESOS-2255
> Project: Mesos
> Issue Type: Bug
> Components: test
> Affects Versions: 0.22.0
> Reporter: Yan Xu
> Assignee: Yong Qiao Wang
> Labels: flaky, twitter
>
> {noformat:title=}
> [ RUN      ] SlaveRecoveryTest/0.MasterFailover
> Using temporary directory '/tmp/SlaveRecoveryTest_0_MasterFailover_dtF7o0'
> I0123 07:45:49.818686 17634 leveldb.cpp:176] Opened db in 31.195549ms
> I0123 07:45:49.821962 17634 leveldb.cpp:183] Compacted db in 3.190936ms
> I0123 07:45:49.822049 17634 leveldb.cpp:198] Created db iterator in 47324ns
> I0123 07:45:49.822069 17634 leveldb.cpp:204] Seeked to beginning of db in 
> 2038ns
> I0123 07:45:49.822084 17634 leveldb.cpp:273] Iterated through 0 keys in the 
> db in 484ns
> I0123 07:45:49.822160 17634 replica.cpp:744] Replica recovered with log 
> positions 0 -> 0 with 1 holes and 0 unlearned
> I0123 07:45:49.824241 17660 recover.cpp:449] Starting replica recovery
> I0123 07:45:49.825217 17660 recover.cpp:475] Replica is in EMPTY status
> I0123 07:45:49.827020 17660 replica.cpp:641] Replica in EMPTY status received 
> a broadcasted recover request
> I0123 07:45:49.827453 17659 recover.cpp:195] Received a recover response from 
> a replica in EMPTY status
> I0123 07:45:49.828047 17659 recover.cpp:566] Updating replica status to 
> STARTING
> I0123 07:45:49.838543 17659 leveldb.cpp:306] Persisting metadata (8 bytes) to 
> leveldb took 10.24963ms
> I0123 07:45:49.838580 17659 replica.cpp:323] Persisted replica status to 
> STARTING
> I0123 07:45:49.848836 17659 recover.cpp:475] Replica is in STARTING status
> I0123 07:45:49.850039 17659 replica.cpp:641] Replica in STARTING status 
> received a broadcasted recover request
> I0123 07:45:49.850286 17659 recover.cpp:195] Received a recover response from 
> a replica in STARTING status
> I0123 07:45:49.850754 17659 recover.cpp:566] Updating replica status to VOTING
> I0123 07:45:49.853698 17655 master.cpp:262] Master 
> 20150123-074549-16842879-44955-17634 (utopic) started on 127.0.1.1:44955
> I0123 07:45:49.853981 17655 master.cpp:308] Master only allowing 
> authenticated frameworks to register
> I0123 07:45:49.853997 17655 master.cpp:313] Master only allowing 
> authenticated slaves to register
> I0123 07:45:49.854038 17655 credentials.hpp:36] Loading credentials for 
> authentication from 
> '/tmp/SlaveRecoveryTest_0_MasterFailover_dtF7o0/credentials'
> I0123 07:45:49.854557 17655 master.cpp:357] Authorization enabled
> I0123 07:45:49.859633 17659 leveldb.cpp:306] Persisting metadata (8 bytes) to 
> leveldb took 8.742923ms
> I0123 07:45:49.859853 17659 replica.cpp:323] Persisted replica status to 
> VOTING
> I0123 07:45:49.860327 17658 recover.cpp:580] Successfully joined the Paxos 
> group
> I0123 07:45:49.860703 17654 recover.cpp:464] Recover process terminated
> I0123 07:45:49.859591 17655 master.cpp:1219] The newly elected leader is 
> master@127.0.1.1:44955 with id 20150123-074549-16842879-44955-17634
> I0123 07:45:49.864702 17655 master.cpp:1232] Elected as the leading master!
> I0123 07:45:49.864904 17655 master.cpp:1050] Recovering from registrar
> I0123 07:45:49.865406 17660 registrar.cpp:313] Recovering registrar
> I0123 07:45:49.866576 17660 log.cpp:660] Attempting to start the writer
> I0123 07:45:49.868638 17658 replica.cpp:477] Replica received implicit 
> promise request with proposal 1
> I0123 07:45:49.872521 17658 leveldb.cpp:306] Persisting metadata (8 bytes) to 
> leveldb took 3.848859ms
> I0123 07:45:49.872555 17658 replica.cpp:345] Persisted promised to 1
> I0123 07:45:49.873769 17661 coordinator.cpp:230] Coordinator attemping to 
> fill missing position
> I0123 07:45:49.875474 17658 replica.cpp:378] Replica received explicit 
> promise request for position 0 with proposal 2
> I0123 07:45:49.880878 17658 leveldb.cpp:343] Persisting action (8 bytes) to 
> leveldb took 5.364021ms
> I0123 07:45:49.880913 17658 replica.cpp:679] Persisted action at 0
> I0123 07:45:49.882619 17657 replica.cpp:511] Replica received write request 
> for position 0
> I0123 07:45:49.882998 17657 leveldb.cpp:438] Reading position from leveldb 
> took 150092ns
> I0123 07:45:49.886488 17657 leveldb.cpp:343] Persisting action (14 bytes) to 
> leveldb took 3.269189ms
> I0123 07:45:49.886536 17657 replica.cpp:679] Persisted action at 0
> I0123 07:45:49.887181 17657 replica.cpp:658] Replica received learned notice 
> for position 0
> I0123 07:45:49.892900 17657 leveldb.c