[ https://issues.apache.org/jira/browse/MESOS-1789?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Benjamin Mahler resolved MESOS-1789. ------------------------------------ Resolution: Cannot Reproduce Looks like the disk on this VM may have been blocked for quite some time, some strange interleaving of time in the log lines. Will re-open if this crops up again. > MasterTest.RecoveredSlaveReregisters is flaky. > ---------------------------------------------- > > Key: MESOS-1789 > URL: https://issues.apache.org/jira/browse/MESOS-1789 > Project: Mesos > Issue Type: Bug > Components: test > Reporter: Benjamin Mahler > Priority: Minor > > Seen flaky on a fedora 19 VM w/ clang. > {noformat} > [ RUN ] MasterTest.RecoveredSlaveReregisters > Using temporary directory '/tmp/MasterTest_RecoveredSlaveReregisters_CHREru' > I0910 23:37:24.522372 22914 leveldb.cpp:176] Opened db in 978us > I0910 23:37:24.522948 22914 leveldb.cpp:183] Compacted db in 554320ns > I0910 23:37:24.522981 22914 leveldb.cpp:198] Created db iterator in 15459ns > I0910 23:37:24.523000 22914 leveldb.cpp:204] Seeked to beginning of db in > 9593ns > I0910 23:37:24.523020 22914 leveldb.cpp:273] Iterated through 0 keys in the > db in 9137ns > I0910 23:37:24.523043 22914 replica.cpp:741] Replica recovered with log > positions 0 -> 0 with 1 holes and 0 unlearned > I0910 23:37:24.525143 22935 recover.cpp:425] Starting replica recovery > I0910 23:37:24.525266 22935 recover.cpp:451] Replica is in EMPTY status > I0910 23:37:24.525774 22935 replica.cpp:638] Replica in EMPTY status received > a broadcasted recover request > I0910 23:37:24.525871 22935 recover.cpp:188] Received a recover response from > a replica in EMPTY status > I0910 23:37:24.526028 22935 recover.cpp:542] Updating replica status to > STARTING > I0910 23:37:24.526180 22935 leveldb.cpp:306] Persisting metadata (8 bytes) to > leveldb took 83617ns > I0910 23:37:24.526211 22935 replica.cpp:320] Persisted replica status to > STARTING > I0910 23:37:24.526283 22935 recover.cpp:451] Replica is in STARTING status > I0910 23:37:24.526725 22935 replica.cpp:638] Replica in STARTING status > received a broadcasted recover request > I0910 23:37:24.526813 22935 recover.cpp:188] Received a recover response from > a replica in STARTING status > I0910 23:37:24.526964 22935 recover.cpp:542] Updating replica status to VOTING > I0910 23:37:24.527061 22935 leveldb.cpp:306] Persisting metadata (8 bytes) to > leveldb took 44802ns > I0910 23:37:24.527091 22935 replica.cpp:320] Persisted replica status to > VOTING > I0910 23:37:24.527139 22935 recover.cpp:556] Successfully joined the Paxos > group > I0910 23:37:24.527230 22935 recover.cpp:440] Recover process terminated > I0910 23:37:24.527748 22928 master.cpp:286] Master > 20140910-233724-2272962752-36006-22914 (fedora-19) started on > 192.168.122.135:36006 > I0910 23:37:24.527807 22928 master.cpp:332] Master only allowing > authenticated frameworks to register > I0910 23:37:24.527827 22928 master.cpp:337] Master only allowing > authenticated slaves to register > I0910 23:37:24.527849 22928 credentials.hpp:36] Loading credentials for > authentication from > '/tmp/MasterTest_RecoveredSlaveReregisters_CHREru/credentials' > I0910 23:37:24.528890 22928 master.cpp:366] Authorization enabled > I0910 23:37:24.529822 22928 hierarchical_allocator_process.hpp:299] > Initializing hierarchical allocator process with master : > master@192.168.122.135:36006 > I0910 23:37:24.529903 22928 master.cpp:120] No whitelist given. Advertising > offers for all slaves > I0910 23:37:24.530275 22928 master.cpp:1212] The newly elected leader is > master@192.168.122.135:36006 with id 20140910-233724-2272962752-36006-22914 > I0910 23:37:24.530311 22928 master.cpp:1225] Elected as the leading master! > I0910 23:37:24.530328 22928 master.cpp:1043] Recovering from registrar > I0910 23:37:24.530426 22928 registrar.cpp:313] Recovering registrar > I0910 23:37:24.530993 22928 log.cpp:656] Attempting to start the writer > I0910 23:37:24.531601 22928 replica.cpp:474] Replica received implicit > promise request with proposal 1 > I0910 23:37:24.531677 22928 leveldb.cpp:306] Persisting metadata (8 bytes) to > leveldb took 60319ns > I0910 23:37:24.531707 22928 replica.cpp:342] Persisted promised to 1 > I0910 23:37:24.532016 22928 coordinator.cpp:230] Coordinator attemping to > fill missing position > I0910 23:37:24.532691 22928 replica.cpp:375] Replica received explicit > promise request for position 0 with proposal 2 > I0910 23:37:24.532752 22928 leveldb.cpp:343] Persisting action (8 bytes) to > leveldb took 45735ns > I0910 23:37:24.532783 22928 replica.cpp:676] Persisted action at 0 > I0910 23:37:24.533252 22928 replica.cpp:508] Replica received write request > for position 0 > I0910 23:37:24.533299 22928 leveldb.cpp:438] Reading position from leveldb > took 34066ns > I0910 23:37:24.533354 22928 leveldb.cpp:343] Persisting action (14 bytes) to > leveldb took 37637ns > I0910 23:37:24.533381 22928 replica.cpp:676] Persisted action at 0 > I0910 23:37:24.533701 22928 replica.cpp:655] Replica received learned notice > for position 0 > I0910 23:37:24.533757 22928 leveldb.cpp:343] Persisting action (16 bytes) to > leveldb took 42842ns > I0910 23:37:24.533785 22928 replica.cpp:676] Persisted action at 0 > I0910 23:37:24.533804 22928 replica.cpp:661] Replica learned NOP action at > position 0 > I0910 23:37:24.534191 22928 log.cpp:672] Writer started with ending position 0 > I0910 23:37:24.534695 22928 leveldb.cpp:438] Reading position from leveldb > took 36602ns > I0910 23:37:24.536397 22928 registrar.cpp:346] Successfully fetched the > registry (0B) > I0910 23:37:24.536440 22928 registrar.cpp:422] Attempting to update the > 'registry' > I0910 23:37:24.538182 22928 log.cpp:680] Attempting to append 131 bytes to > the log > I0910 23:37:24.538269 22928 coordinator.cpp:340] Coordinator attempting to > write APPEND action at position 1 > I0910 23:37:24.538710 22928 replica.cpp:508] Replica received write request > for position 1 > I0910 23:37:24.538791 22928 leveldb.cpp:343] Persisting action (150 bytes) to > leveldb took 63482ns > I0910 23:37:24.538822 22928 replica.cpp:676] Persisted action at 1 > I0910 23:37:24.539145 22928 replica.cpp:655] Replica received learned notice > for position 1 > I0910 23:37:24.539204 22928 leveldb.cpp:343] Persisting action (152 bytes) to > leveldb took 44797ns > I0910 23:37:24.539233 22928 replica.cpp:676] Persisted action at 1 > I0910 23:37:24.539252 22928 replica.cpp:661] Replica learned APPEND action at > position 1 > I0910 23:37:24.539824 22928 registrar.cpp:479] Successfully updated 'registry' > I0910 23:37:24.539904 22928 registrar.cpp:372] Successfully recovered > registrar > I0910 23:37:24.539988 22928 log.cpp:699] Attempting to truncate the log to 1 > I0910 23:37:24.540114 22928 master.cpp:1070] Recovered 0 slaves from the > Registry (95B) ; allowing 10mins for slaves to re-register > I0910 23:37:24.540201 22928 coordinator.cpp:340] Coordinator attempting to > write TRUNCATE action at position 2 > I0910 23:37:24.540669 22928 replica.cpp:508] Replica received write request > for position 2 > I0910 23:37:24.540729 22928 leveldb.cpp:343] Persisting action (16 bytes) to > leveldb took 44502ns > I0910 23:37:24.540757 22928 replica.cpp:676] Persisted action at 2 > I0910 23:37:24.541095 22928 replica.cpp:655] Replica received learned notice > for position 2 > I0910 23:37:24.541153 22928 leveldb.cpp:343] Persisting action (18 bytes) to > leveldb took 43832ns > I0910 23:37:24.541194 22928 leveldb.cpp:401] Deleting ~1 keys from leveldb > took 29490ns > I0910 23:37:24.541220 22928 replica.cpp:676] Persisted action at 2 > I0910 23:37:24.541239 22928 replica.cpp:661] Replica learned TRUNCATE action > at position 2 > I0910 23:37:24.553033 22914 containerizer.cpp:89] Using isolation: > posix/cpu,posix/mem > I0910 23:37:24.556795 22928 slave.cpp:169] Slave started on > 59)@192.168.122.135:36006 > I0910 23:37:24.556851 22928 credentials.hpp:84] Loading credential for > authentication from > '/tmp/MasterTest_RecoveredSlaveReregisters_kkerkd/credential' > I0910 23:37:24.556998 22928 slave.cpp:276] Slave using credential for: > test-principal > I0910 23:37:24.557144 22928 slave.cpp:289] Slave resources: cpus(*):2; > mem(*):1024; disk(*):1024; ports(*):[31000-32000] > I0910 23:37:24.557618 22928 slave.cpp:317] Slave hostname: fedora-19 > I0910 23:37:24.557644 22928 slave.cpp:318] Slave checkpoint: true > I0910 23:37:24.558248 22928 state.cpp:33] Recovering state from > '/tmp/MasterTest_RecoveredSlaveReregisters_kkerkd/meta' > I0910 23:37:24.558413 22928 status_update_manager.cpp:193] Recovering status > update manager > I0910 23:37:24.558584 22928 containerizer.cpp:252] Recovering containerizer > I0910 23:37:24.559092 22928 slave.cpp:3219] Finished recovery > I0910 23:37:24.559608 22928 slave.cpp:600] New master detected at > master@192.168.122.135:36006 > I0910 23:37:24.559656 22928 slave.cpp:674] Authenticating with master > master@192.168.122.135:36006 > I0910 23:37:24.559743 22928 slave.cpp:647] Detecting new master > I0910 23:37:24.559816 22928 status_update_manager.cpp:167] New master > detected at master@192.168.122.135:36006 > I0910 23:37:24.559878 22928 authenticatee.hpp:128] Creating new client SASL > connection > I0910 23:37:24.560583 22928 master.cpp:3653] Authenticating > slave(59)@192.168.122.135:36006 > I0910 23:37:24.560736 22928 authenticator.hpp:156] Creating new server SASL > connection > I0910 23:37:24.561318 22928 authenticatee.hpp:219] Received SASL > authentication mechanisms: CRAM-MD5 > I0910 23:37:24.561362 22928 authenticatee.hpp:245] Attempting to authenticate > with mechanism 'CRAM-MD5' > I0910 23:37:24.561420 22928 authenticator.hpp:262] Received SASL > authentication start > I0910 23:37:24.561486 22928 authenticator.hpp:384] Authentication requires > more steps > I0910 23:37:24.561590 22928 authenticatee.hpp:265] Received SASL > authentication step > I0910 23:37:24.561666 22928 authenticator.hpp:290] Received SASL > authentication step > I0910 23:37:24.561702 22928 auxprop.cpp:81] Request to lookup properties for > user: 'test-principal' realm: 'fedora-19' server FQDN: 'fedora-19' > SASL_AUXPROP_VERIFY_AGAINST_HASH: false SASL_AUXPROP_OVERRIDE: false > SASL_AUXPROP_AUTHZID: false > I0910 23:37:24.561720 22928 auxprop.cpp:153] Looking up auxiliary property > '*userPassword' > I0910 23:37:24.561741 22928 auxprop.cpp:153] Looking up auxiliary property > '*cmusaslsecretCRAM-MD5' > I0910 23:37:24.561764 22928 auxprop.cpp:81] Request to lookup properties for > user: 'test-principal' realm: 'fedora-19' server FQDN: 'fedora-19' > SASL_AUXPROP_VERIFY_AGAINST_HASH: false SASL_AUXPROP_OVERRIDE: false > SASL_AUXPROP_AUTHZID: true > I0910 23:37:24.561781 22928 auxprop.cpp:103] Skipping auxiliary property > '*userPassword' since SASL_AUXPROP_AUTHZID == true > I0910 23:37:24.561795 22928 auxprop.cpp:103] Skipping auxiliary property > '*cmusaslsecretCRAM-MD5' since SASL_AUXPROP_AUTHZID == true > I0910 23:37:24.561817 22928 authenticator.hpp:376] Authentication success > I0910 23:37:24.561874 22928 authenticatee.hpp:305] Authentication success > I0910 23:37:24.561924 22928 master.cpp:3693] Successfully authenticated > principal 'test-principal' at slave(59)@192.168.122.135:36006 > I0910 23:37:24.562062 22928 slave.cpp:731] Successfully authenticated with > master master@192.168.122.135:36006 > Itests/master_tests.cpp:1703: Failure > 0910 23:37:24.562134 22928 slave.cpp:994] Will retry registration in > 12.871919ms if necessary > I0910 23:37:24.562263 22935 master.cpp:2843] Registering slave at > slave(59)@192.168.122.135:36006 (fedora-19) with id > 20140910-233724-2272962752-36006-22914-0 > I0910 23:37:25.531041 22930 hierarchical_allocator_process.hpp:697] No > resources available to allocate! > I0910 23:37:35.832602 22930 hierarchical_allocator_process.hpp:659] Performed > allocation for 0 slaves in 10.301614677secs > I0910 23:37:35.832757 22930 registrar.cpp:422] Attempting to update the > 'registry' > I0910 23:37:29.530727 22929 master.cpp:120] No whitelist given. Advertising > offers for all slaves > Failed to wait 10secs for slaveRegisteredMessage > I0910 23:37:35.834206 22935 master.cpp:650] Master terminating > I0910 23:37:35.834398 22935 slave.cpp:2378] master@192.168.122.135:36006 > exited > W0910 23:37:35.834429 22935 slave.cpp:2381] Master disconnected! Waiting for > a new master to be elected > I0910 23:37:35.835822 22929 log.cpp:680] Attempting to append 312 bytes to > the log > I0910 23:37:35.835921 22929 coordinator.cpp:340] Coordinator attempting to > write APPEND action at position 3 > I0910 23:37:35.836328 22929 replica.cpp:508] Replica received write request > for position 3 > I0910 23:37:35.836452 22929 leveldb.cpp:343] Persisting action (331 bytes) to > leveldb took 95913ns > I0910 23:37:35.836467 22929 replica.cpp:676] Persisted action at 3 > I0910 23:37:35.836714 22929 replica.cpp:655] Replica received learned notice > for position 3 > I0910 23:37:35.836751 22929 leveldb.cpp:343] Persisting action (333 bytes) to > leveldb took 25990ns > I0910 23:37:35.836762 22929 replica.cpp:676] Persisted action at 3 > I0910 23:37:35.836771 22929 replica.cpp:661] Replica learned APPEND action at > position 3 > I0910 23:37:35.837183 22929 log.cpp:699] Attempting to truncate the log to 3 > I0910 23:37:35.837225 22929 coordinator.cpp:340] Coordinator attempting to > write TRUNCATE action at position 4 > I0910 23:37:35.837359 22929 replica.cpp:508] Replica received write request > for position 4 > I0910 23:37:35.837393 22929 leveldb.cpp:343] Persisting action (16 bytes) to > leveldb took 22808ns > I0910 23:37:35.837404 22929 replica.cpp:676] Persisted action at 4 > I0910 23:37:35.837700 22931 replica.cpp:655] Replica received learned notice > for position 4 > I0910 23:37:35.837745 22931 leveldb.cpp:343] Persisting action (18 bytes) to > leveldb took 33314ns > I0910 23:37:35.837765 22931 leveldb.cpp:401] Deleting ~2 keys from leveldb > took 13548ns > I0910 23:37:35.837775 22931 replica.cpp:676] Persisted action at 4 > I0910 23:37:35.837782 22931 replica.cpp:661] Replica learned TRUNCATE action > at position 4 > I0910 23:37:35.839639 22914 slave.cpp:477] Slave terminating > ../3rdparty/libprocess/include/process/gmock.hpp:282: Failure > Actual function call count doesn't match EXPECT_CALL(filter->mock, > filter(testing::A<const MessageEvent&>()))... > Expected args: message matcher (8-byte object <28-3C 02-68 ED-7F 00-00>, > 1, 1-byte object <60>) > Expected: to be called once > Actual: never called - unsatisfied and active > [ FAILED ] MasterTest.RecoveredSlaveReregisters (11320 ms) > {noformat} -- This message was sent by Atlassian JIRA (v6.3.4#6332)