Benjamin Mahler created MESOS-1789:
--------------------------------------

             Summary: MasterTest.RecoveredSlaveReregisters is flaky.
                 Key: MESOS-1789
                 URL: https://issues.apache.org/jira/browse/MESOS-1789
             Project: Mesos
          Issue Type: Bug
          Components: test
            Reporter: Benjamin Mahler
            Priority: Minor


Seen flaky on a fedora 19 VM w/ clang.

{noformat}
[ RUN      ] MasterTest.RecoveredSlaveReregisters
Using temporary directory '/tmp/MasterTest_RecoveredSlaveReregisters_CHREru'
I0910 23:37:24.522372 22914 leveldb.cpp:176] Opened db in 978us
I0910 23:37:24.522948 22914 leveldb.cpp:183] Compacted db in 554320ns
I0910 23:37:24.522981 22914 leveldb.cpp:198] Created db iterator in 15459ns
I0910 23:37:24.523000 22914 leveldb.cpp:204] Seeked to beginning of db in 9593ns
I0910 23:37:24.523020 22914 leveldb.cpp:273] Iterated through 0 keys in the db 
in 9137ns
I0910 23:37:24.523043 22914 replica.cpp:741] Replica recovered with log 
positions 0 -> 0 with 1 holes and 0 unlearned
I0910 23:37:24.525143 22935 recover.cpp:425] Starting replica recovery
I0910 23:37:24.525266 22935 recover.cpp:451] Replica is in EMPTY status
I0910 23:37:24.525774 22935 replica.cpp:638] Replica in EMPTY status received a 
broadcasted recover request
I0910 23:37:24.525871 22935 recover.cpp:188] Received a recover response from a 
replica in EMPTY status
I0910 23:37:24.526028 22935 recover.cpp:542] Updating replica status to STARTING
I0910 23:37:24.526180 22935 leveldb.cpp:306] Persisting metadata (8 bytes) to 
leveldb took 83617ns
I0910 23:37:24.526211 22935 replica.cpp:320] Persisted replica status to 
STARTING
I0910 23:37:24.526283 22935 recover.cpp:451] Replica is in STARTING status
I0910 23:37:24.526725 22935 replica.cpp:638] Replica in STARTING status 
received a broadcasted recover request
I0910 23:37:24.526813 22935 recover.cpp:188] Received a recover response from a 
replica in STARTING status
I0910 23:37:24.526964 22935 recover.cpp:542] Updating replica status to VOTING
I0910 23:37:24.527061 22935 leveldb.cpp:306] Persisting metadata (8 bytes) to 
leveldb took 44802ns
I0910 23:37:24.527091 22935 replica.cpp:320] Persisted replica status to VOTING
I0910 23:37:24.527139 22935 recover.cpp:556] Successfully joined the Paxos group
I0910 23:37:24.527230 22935 recover.cpp:440] Recover process terminated
I0910 23:37:24.527748 22928 master.cpp:286] Master 
20140910-233724-2272962752-36006-22914 (fedora-19) started on 
192.168.122.135:36006
I0910 23:37:24.527807 22928 master.cpp:332] Master only allowing authenticated 
frameworks to register
I0910 23:37:24.527827 22928 master.cpp:337] Master only allowing authenticated 
slaves to register
I0910 23:37:24.527849 22928 credentials.hpp:36] Loading credentials for 
authentication from 
'/tmp/MasterTest_RecoveredSlaveReregisters_CHREru/credentials'
I0910 23:37:24.528890 22928 master.cpp:366] Authorization enabled
I0910 23:37:24.529822 22928 hierarchical_allocator_process.hpp:299] 
Initializing hierarchical allocator process with master : 
master@192.168.122.135:36006
I0910 23:37:24.529903 22928 master.cpp:120] No whitelist given. Advertising 
offers for all slaves
I0910 23:37:24.530275 22928 master.cpp:1212] The newly elected leader is 
master@192.168.122.135:36006 with id 20140910-233724-2272962752-36006-22914
I0910 23:37:24.530311 22928 master.cpp:1225] Elected as the leading master!
I0910 23:37:24.530328 22928 master.cpp:1043] Recovering from registrar
I0910 23:37:24.530426 22928 registrar.cpp:313] Recovering registrar
I0910 23:37:24.530993 22928 log.cpp:656] Attempting to start the writer
I0910 23:37:24.531601 22928 replica.cpp:474] Replica received implicit promise 
request with proposal 1
I0910 23:37:24.531677 22928 leveldb.cpp:306] Persisting metadata (8 bytes) to 
leveldb took 60319ns
I0910 23:37:24.531707 22928 replica.cpp:342] Persisted promised to 1
I0910 23:37:24.532016 22928 coordinator.cpp:230] Coordinator attemping to fill 
missing position
I0910 23:37:24.532691 22928 replica.cpp:375] Replica received explicit promise 
request for position 0 with proposal 2
I0910 23:37:24.532752 22928 leveldb.cpp:343] Persisting action (8 bytes) to 
leveldb took 45735ns
I0910 23:37:24.532783 22928 replica.cpp:676] Persisted action at 0
I0910 23:37:24.533252 22928 replica.cpp:508] Replica received write request for 
position 0
I0910 23:37:24.533299 22928 leveldb.cpp:438] Reading position from leveldb took 
34066ns
I0910 23:37:24.533354 22928 leveldb.cpp:343] Persisting action (14 bytes) to 
leveldb took 37637ns
I0910 23:37:24.533381 22928 replica.cpp:676] Persisted action at 0
I0910 23:37:24.533701 22928 replica.cpp:655] Replica received learned notice 
for position 0
I0910 23:37:24.533757 22928 leveldb.cpp:343] Persisting action (16 bytes) to 
leveldb took 42842ns
I0910 23:37:24.533785 22928 replica.cpp:676] Persisted action at 0
I0910 23:37:24.533804 22928 replica.cpp:661] Replica learned NOP action at 
position 0
I0910 23:37:24.534191 22928 log.cpp:672] Writer started with ending position 0
I0910 23:37:24.534695 22928 leveldb.cpp:438] Reading position from leveldb took 
36602ns
I0910 23:37:24.536397 22928 registrar.cpp:346] Successfully fetched the 
registry (0B)
I0910 23:37:24.536440 22928 registrar.cpp:422] Attempting to update the 
'registry'
I0910 23:37:24.538182 22928 log.cpp:680] Attempting to append 131 bytes to the 
log
I0910 23:37:24.538269 22928 coordinator.cpp:340] Coordinator attempting to 
write APPEND action at position 1
I0910 23:37:24.538710 22928 replica.cpp:508] Replica received write request for 
position 1
I0910 23:37:24.538791 22928 leveldb.cpp:343] Persisting action (150 bytes) to 
leveldb took 63482ns
I0910 23:37:24.538822 22928 replica.cpp:676] Persisted action at 1
I0910 23:37:24.539145 22928 replica.cpp:655] Replica received learned notice 
for position 1
I0910 23:37:24.539204 22928 leveldb.cpp:343] Persisting action (152 bytes) to 
leveldb took 44797ns
I0910 23:37:24.539233 22928 replica.cpp:676] Persisted action at 1
I0910 23:37:24.539252 22928 replica.cpp:661] Replica learned APPEND action at 
position 1
I0910 23:37:24.539824 22928 registrar.cpp:479] Successfully updated 'registry'
I0910 23:37:24.539904 22928 registrar.cpp:372] Successfully recovered registrar
I0910 23:37:24.539988 22928 log.cpp:699] Attempting to truncate the log to 1
I0910 23:37:24.540114 22928 master.cpp:1070] Recovered 0 slaves from the 
Registry (95B) ; allowing 10mins for slaves to re-register
I0910 23:37:24.540201 22928 coordinator.cpp:340] Coordinator attempting to 
write TRUNCATE action at position 2
I0910 23:37:24.540669 22928 replica.cpp:508] Replica received write request for 
position 2
I0910 23:37:24.540729 22928 leveldb.cpp:343] Persisting action (16 bytes) to 
leveldb took 44502ns
I0910 23:37:24.540757 22928 replica.cpp:676] Persisted action at 2
I0910 23:37:24.541095 22928 replica.cpp:655] Replica received learned notice 
for position 2
I0910 23:37:24.541153 22928 leveldb.cpp:343] Persisting action (18 bytes) to 
leveldb took 43832ns
I0910 23:37:24.541194 22928 leveldb.cpp:401] Deleting ~1 keys from leveldb took 
29490ns
I0910 23:37:24.541220 22928 replica.cpp:676] Persisted action at 2
I0910 23:37:24.541239 22928 replica.cpp:661] Replica learned TRUNCATE action at 
position 2
I0910 23:37:24.553033 22914 containerizer.cpp:89] Using isolation: 
posix/cpu,posix/mem
I0910 23:37:24.556795 22928 slave.cpp:169] Slave started on 
59)@192.168.122.135:36006
I0910 23:37:24.556851 22928 credentials.hpp:84] Loading credential for 
authentication from 
'/tmp/MasterTest_RecoveredSlaveReregisters_kkerkd/credential'
I0910 23:37:24.556998 22928 slave.cpp:276] Slave using credential for: 
test-principal
I0910 23:37:24.557144 22928 slave.cpp:289] Slave resources: cpus(*):2; 
mem(*):1024; disk(*):1024; ports(*):[31000-32000]
I0910 23:37:24.557618 22928 slave.cpp:317] Slave hostname: fedora-19
I0910 23:37:24.557644 22928 slave.cpp:318] Slave checkpoint: true
I0910 23:37:24.558248 22928 state.cpp:33] Recovering state from 
'/tmp/MasterTest_RecoveredSlaveReregisters_kkerkd/meta'
I0910 23:37:24.558413 22928 status_update_manager.cpp:193] Recovering status 
update manager
I0910 23:37:24.558584 22928 containerizer.cpp:252] Recovering containerizer
I0910 23:37:24.559092 22928 slave.cpp:3219] Finished recovery
I0910 23:37:24.559608 22928 slave.cpp:600] New master detected at 
master@192.168.122.135:36006
I0910 23:37:24.559656 22928 slave.cpp:674] Authenticating with master 
master@192.168.122.135:36006
I0910 23:37:24.559743 22928 slave.cpp:647] Detecting new master
I0910 23:37:24.559816 22928 status_update_manager.cpp:167] New master detected 
at master@192.168.122.135:36006
I0910 23:37:24.559878 22928 authenticatee.hpp:128] Creating new client SASL 
connection
I0910 23:37:24.560583 22928 master.cpp:3653] Authenticating 
slave(59)@192.168.122.135:36006
I0910 23:37:24.560736 22928 authenticator.hpp:156] Creating new server SASL 
connection
I0910 23:37:24.561318 22928 authenticatee.hpp:219] Received SASL authentication 
mechanisms: CRAM-MD5
I0910 23:37:24.561362 22928 authenticatee.hpp:245] Attempting to authenticate 
with mechanism 'CRAM-MD5'
I0910 23:37:24.561420 22928 authenticator.hpp:262] Received SASL authentication 
start
I0910 23:37:24.561486 22928 authenticator.hpp:384] Authentication requires more 
steps
I0910 23:37:24.561590 22928 authenticatee.hpp:265] Received SASL authentication 
step
I0910 23:37:24.561666 22928 authenticator.hpp:290] Received SASL authentication 
step
I0910 23:37:24.561702 22928 auxprop.cpp:81] Request to lookup properties for 
user: 'test-principal' realm: 'fedora-19' server FQDN: 'fedora-19' 
SASL_AUXPROP_VERIFY_AGAINST_HASH: false SASL_AUXPROP_OVERRIDE: false 
SASL_AUXPROP_AUTHZID: false 
I0910 23:37:24.561720 22928 auxprop.cpp:153] Looking up auxiliary property 
'*userPassword'
I0910 23:37:24.561741 22928 auxprop.cpp:153] Looking up auxiliary property 
'*cmusaslsecretCRAM-MD5'
I0910 23:37:24.561764 22928 auxprop.cpp:81] Request to lookup properties for 
user: 'test-principal' realm: 'fedora-19' server FQDN: 'fedora-19' 
SASL_AUXPROP_VERIFY_AGAINST_HASH: false SASL_AUXPROP_OVERRIDE: false 
SASL_AUXPROP_AUTHZID: true 
I0910 23:37:24.561781 22928 auxprop.cpp:103] Skipping auxiliary property 
'*userPassword' since SASL_AUXPROP_AUTHZID == true
I0910 23:37:24.561795 22928 auxprop.cpp:103] Skipping auxiliary property 
'*cmusaslsecretCRAM-MD5' since SASL_AUXPROP_AUTHZID == true
I0910 23:37:24.561817 22928 authenticator.hpp:376] Authentication success
I0910 23:37:24.561874 22928 authenticatee.hpp:305] Authentication success
I0910 23:37:24.561924 22928 master.cpp:3693] Successfully authenticated 
principal 'test-principal' at slave(59)@192.168.122.135:36006
I0910 23:37:24.562062 22928 slave.cpp:731] Successfully authenticated with 
master master@192.168.122.135:36006
Itests/master_tests.cpp:1703: Failure
0910 23:37:24.562134 22928 slave.cpp:994] Will retry registration in 
12.871919ms if necessary
I0910 23:37:24.562263 22935 master.cpp:2843] Registering slave at 
slave(59)@192.168.122.135:36006 (fedora-19) with id 
20140910-233724-2272962752-36006-22914-0
I0910 23:37:25.531041 22930 hierarchical_allocator_process.hpp:697] No 
resources available to allocate!
I0910 23:37:35.832602 22930 hierarchical_allocator_process.hpp:659] Performed 
allocation for 0 slaves in 10.301614677secs
I0910 23:37:35.832757 22930 registrar.cpp:422] Attempting to update the 
'registry'
I0910 23:37:29.530727 22929 master.cpp:120] No whitelist given. Advertising 
offers for all slaves
Failed to wait 10secs for slaveRegisteredMessage
I0910 23:37:35.834206 22935 master.cpp:650] Master terminating
I0910 23:37:35.834398 22935 slave.cpp:2378] master@192.168.122.135:36006 exited
W0910 23:37:35.834429 22935 slave.cpp:2381] Master disconnected! Waiting for a 
new master to be elected
I0910 23:37:35.835822 22929 log.cpp:680] Attempting to append 312 bytes to the 
log
I0910 23:37:35.835921 22929 coordinator.cpp:340] Coordinator attempting to 
write APPEND action at position 3
I0910 23:37:35.836328 22929 replica.cpp:508] Replica received write request for 
position 3
I0910 23:37:35.836452 22929 leveldb.cpp:343] Persisting action (331 bytes) to 
leveldb took 95913ns
I0910 23:37:35.836467 22929 replica.cpp:676] Persisted action at 3
I0910 23:37:35.836714 22929 replica.cpp:655] Replica received learned notice 
for position 3
I0910 23:37:35.836751 22929 leveldb.cpp:343] Persisting action (333 bytes) to 
leveldb took 25990ns
I0910 23:37:35.836762 22929 replica.cpp:676] Persisted action at 3
I0910 23:37:35.836771 22929 replica.cpp:661] Replica learned APPEND action at 
position 3
I0910 23:37:35.837183 22929 log.cpp:699] Attempting to truncate the log to 3
I0910 23:37:35.837225 22929 coordinator.cpp:340] Coordinator attempting to 
write TRUNCATE action at position 4
I0910 23:37:35.837359 22929 replica.cpp:508] Replica received write request for 
position 4
I0910 23:37:35.837393 22929 leveldb.cpp:343] Persisting action (16 bytes) to 
leveldb took 22808ns
I0910 23:37:35.837404 22929 replica.cpp:676] Persisted action at 4
I0910 23:37:35.837700 22931 replica.cpp:655] Replica received learned notice 
for position 4
I0910 23:37:35.837745 22931 leveldb.cpp:343] Persisting action (18 bytes) to 
leveldb took 33314ns
I0910 23:37:35.837765 22931 leveldb.cpp:401] Deleting ~2 keys from leveldb took 
13548ns
I0910 23:37:35.837775 22931 replica.cpp:676] Persisted action at 4
I0910 23:37:35.837782 22931 replica.cpp:661] Replica learned TRUNCATE action at 
position 4
I0910 23:37:35.839639 22914 slave.cpp:477] Slave terminating
../3rdparty/libprocess/include/process/gmock.hpp:282: Failure
Actual function call count doesn't match EXPECT_CALL(filter->mock, 
filter(testing::A<const MessageEvent&>()))...
    Expected args: message matcher (8-byte object <28-3C 02-68 ED-7F 00-00>, 1, 
1-byte object <60>)
         Expected: to be called once
           Actual: never called - unsatisfied and active
[  FAILED  ] MasterTest.RecoveredSlaveReregisters (11320 ms)
{noformat}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

Reply via email to