[ https://issues.apache.org/jira/browse/MESOS-1803?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Benjamin Mahler resolved MESOS-1803. ------------------------------------ Resolution: Cannot Reproduce The log timings here look as if the threads were starved of CPU: {noformat} I0916 22:59:02.136256 21049 leveldb.cpp:343] Persisting action (165 bytes) to leveldb took 141908ns I0916 22:59:02.136267 21047 leveldb.cpp:343] Persisting action (165 bytes) to leveldb took 111061ns I../../src/tests/registrar_tests.cpp:257: Failure 0916 22:59:02.136276 21049 replica.cpp:676] Persisted action at 1 Failed to wait 10secs for registrar.recover(master) I0916 22:59:14.265326 21049 replica.cpp:661] Replica learned APPEND action at position 1 I0916 22:59:02.136291 21047 replica.cpp:676] Persisted action at 1 E0916 22:59:07.135143 21046 registrar.cpp:500] Registrar aborting: Failed to update 'registry': Failed to perform store within 5secs I0916 22:59:14.265393 21047 replica.cpp:661] Replica learned APPEND action at position 1 {noformat} The logging time stamp is determined at the beginning of the LOG(INFO) expression, when the initial LogMessage object is created. The interleaving of times looks to be a stall of the VM or thread starvation: {noformat} 22:59:02.136267 21047 // Thread 1, 1st LogMessage flushed. 22:59:02.136276 21049 // Thread 2, 2nd LogMessage flushed. 22:59:14.265326 21049 // Thread 2, 5th LogMessage flushed. 22:59:02.136291 21047 // Thread 1, 3rd LogMessage flushed. 22:59:07.135143 21046 // Thread 3, 4th LogMessage flushed. 22:59:14.265393 21047 // Thread 1, 6th LogMessage flushed. {noformat} > Strict/RegistrarTest.remove test is flaky on jenkins. > ----------------------------------------------------- > > Key: MESOS-1803 > URL: https://issues.apache.org/jira/browse/MESOS-1803 > Project: Mesos > Issue Type: Bug > Components: test > Reporter: Benjamin Mahler > Assignee: Benjamin Mahler > > https://builds.apache.org/job/Mesos-Trunk-Ubuntu-Build-Out-Of-Src-Disable-Java-Disable-Python-Disable-Webui/2374/consoleFull > {noformat} > [ RUN ] Strict/RegistrarTest.remove/1 > Using temporary directory '/tmp/Strict_RegistrarTest_remove_1_3QvnOW' > I0916 22:59:02.112568 21026 leveldb.cpp:176] Opened db in 1.779835ms > I0916 22:59:02.112896 21026 leveldb.cpp:183] Compacted db in 301862ns > I0916 22:59:02.112916 21026 leveldb.cpp:198] Created db iterator in 3065ns > I0916 22:59:02.112926 21026 leveldb.cpp:204] Seeked to beginning of db in > 475ns > I0916 22:59:02.112936 21026 leveldb.cpp:273] Iterated through 0 keys in the > db in 330ns > I0916 22:59:02.112951 21026 replica.cpp:741] Replica recovered with log > positions 0 -> 0 with 1 holes and 0 unlearned > I0916 22:59:02.113654 21054 leveldb.cpp:306] Persisting metadata (8 bytes) to > leveldb took 421460ns > I0916 22:59:02.113674 21054 replica.cpp:320] Persisted replica status to > VOTING > I0916 22:59:02.115900 21026 leveldb.cpp:176] Opened db in 1.947919ms > I0916 22:59:02.116263 21026 leveldb.cpp:183] Compacted db in 338043ns > I0916 22:59:02.116283 21026 leveldb.cpp:198] Created db iterator in 2809ns > I0916 22:59:02.116293 21026 leveldb.cpp:204] Seeked to beginning of db in > 468ns > I0916 22:59:02.116302 21026 leveldb.cpp:273] Iterated through 0 keys in the > db in 195ns > I0916 22:59:02.116317 21026 replica.cpp:741] Replica recovered with log > positions 0 -> 0 with 1 holes and 0 unlearned > I0916 22:59:02.117013 21043 leveldb.cpp:306] Persisting metadata (8 bytes) to > leveldb took 472891ns > I0916 22:59:02.117034 21043 replica.cpp:320] Persisted replica status to > VOTING > I0916 22:59:02.119240 21026 leveldb.cpp:176] Opened db in 1.950367ms > I0916 22:59:02.120455 21026 leveldb.cpp:183] Compacted db in 1.188056ms > I0916 22:59:02.120481 21026 leveldb.cpp:198] Created db iterator in 4370ns > I0916 22:59:02.120499 21026 leveldb.cpp:204] Seeked to beginning of db in > 7977ns > I0916 22:59:02.120517 21026 leveldb.cpp:273] Iterated through 1 keys in the > db in 8479ns > I0916 22:59:02.120533 21026 replica.cpp:741] Replica recovered with log > positions 0 -> 0 with 1 holes and 0 unlearned > I0916 22:59:02.122890 21026 leveldb.cpp:176] Opened db in 2.301327ms > I0916 22:59:02.124325 21026 leveldb.cpp:183] Compacted db in 1.406223ms > I0916 22:59:02.124351 21026 leveldb.cpp:198] Created db iterator in 4185ns > I0916 22:59:02.124368 21026 leveldb.cpp:204] Seeked to beginning of db in > 7167ns > I0916 22:59:02.124387 21026 leveldb.cpp:273] Iterated through 1 keys in the > db in 8182ns > I0916 22:59:02.124403 21026 replica.cpp:741] Replica recovered with log > positions 0 -> 0 with 1 holes and 0 unlearned > I0916 22:59:02.124579 21047 recover.cpp:425] Starting replica recovery > I0916 22:59:02.124651 21047 recover.cpp:451] Replica is in VOTING status > I0916 22:59:02.124793 21047 recover.cpp:440] Recover process terminated > I0916 22:59:02.126404 21046 registrar.cpp:313] Recovering registrar > I0916 22:59:02.126597 21050 log.cpp:656] Attempting to start the writer > I0916 22:59:02.127259 21041 replica.cpp:474] Replica received implicit > promise request with proposal 1 > I0916 22:59:02.127321 21050 replica.cpp:474] Replica received implicit > promise request with proposal 1 > I0916 22:59:02.127835 21041 leveldb.cpp:306] Persisting metadata (8 bytes) to > leveldb took 547018ns > I0916 22:59:02.127858 21041 replica.cpp:342] Persisted promised to 1 > I0916 22:59:02.127835 21050 leveldb.cpp:306] Persisting metadata (8 bytes) to > leveldb took 487588ns > I0916 22:59:02.127887 21050 replica.cpp:342] Persisted promised to 1 > I0916 22:59:02.128387 21055 coordinator.cpp:230] Coordinator attemping to > fill missing position > I0916 22:59:02.129546 21042 replica.cpp:375] Replica received explicit > promise request for position 0 with proposal 2 > I0916 22:59:02.129600 21053 replica.cpp:375] Replica received explicit > promise request for position 0 with proposal 2 > I0916 22:59:02.129982 21042 leveldb.cpp:343] Persisting action (8 bytes) to > leveldb took 406954ns > I0916 22:59:02.129982 21053 leveldb.cpp:343] Persisting action (8 bytes) to > leveldb took 357253ns > I0916 22:59:02.130009 21042 replica.cpp:676] Persisted action at 0 > I0916 22:59:02.130029 21053 replica.cpp:676] Persisted action at 0 > I0916 22:59:02.130543 21041 replica.cpp:508] Replica received write request > for position 0 > I0916 22:59:02.130585 21041 leveldb.cpp:438] Reading position from leveldb > took 17424ns > I0916 22:59:02.130599 21046 replica.cpp:508] Replica received write request > for position 0 > I0916 22:59:02.130635 21046 leveldb.cpp:438] Reading position from leveldb > took 12702ns > I0916 22:59:02.130728 21041 leveldb.cpp:343] Persisting action (14 bytes) to > leveldb took 112749ns > I0916 22:59:02.130749 21041 replica.cpp:676] Persisted action at 0 > I0916 22:59:02.130795 21046 leveldb.cpp:343] Persisting action (14 bytes) to > leveldb took 131300ns > I0916 22:59:02.130820 21046 replica.cpp:676] Persisted action at 0 > I0916 22:59:02.131093 21042 replica.cpp:655] Replica received learned notice > for position 0 > I0916 22:59:02.131163 21048 replica.cpp:655] Replica received learned notice > for position 0 > I0916 22:59:02.131232 21042 leveldb.cpp:343] Persisting action (16 bytes) to > leveldb took 117351ns > I0916 22:59:02.131256 21042 replica.cpp:676] Persisted action at 0 > I0916 22:59:02.131273 21042 replica.cpp:661] Replica learned NOP action at > position 0 > I0916 22:59:02.131340 21048 leveldb.cpp:343] Persisting action (16 bytes) to > leveldb took 148690ns > I0916 22:59:02.131358 21048 replica.cpp:676] Persisted action at 0 > I0916 22:59:02.131369 21048 replica.cpp:661] Replica learned NOP action at > position 0 > I0916 22:59:02.131569 21046 log.cpp:672] Writer started with ending position 0 > I0916 22:59:02.131950 21055 leveldb.cpp:438] Reading position from leveldb > took 15477ns > I0916 22:59:02.133436 21052 registrar.cpp:346] Successfully fetched the > registry (0B) > I0916 22:59:02.133471 21052 registrar.cpp:422] Attempting to update the > 'registry' > I0916 22:59:02.134899 21049 log.cpp:680] Attempting to append 144 bytes to > the log > I0916 22:59:02.134968 21048 coordinator.cpp:340] Coordinator attempting to > write APPEND action at position 1 > I0916 22:59:02.135306 21042 replica.cpp:508] Replica received write request > for position 1 > I0916 22:59:02.135407 21052 replica.cpp:508] Replica received write request > for position 1 > I0916 22:59:02.135717 21052 leveldb.cpp:343] Persisting action (163 bytes) to > leveldb took 279603ns > I0916 22:59:02.135721 21042 leveldb.cpp:343] Persisting action (163 bytes) to > leveldb took 388839ns > I0916 22:59:02.135742 21052 replica.cpp:676] Persisted action at 1 > I0916 22:59:02.135752 21042 replica.cpp:676] Persisted action at 1 > I0916 22:59:02.136090 21049 replica.cpp:655] Replica received learned notice > for position 1 > I0916 22:59:02.136127 21047 replica.cpp:655] Replica received learned notice > for position 1 > I0916 22:59:02.136256 21049 leveldb.cpp:343] Persisting action (165 bytes) to > leveldb took 141908ns > I0916 22:59:02.136267 21047 leveldb.cpp:343] Persisting action (165 bytes) to > leveldb took 111061ns > I../../src/tests/registrar_tests.cpp:257: Failure > 0916 22:59:02.136276 21049 replica.cpp:676] Persisted action at 1 > Failed to wait 10secs for registrar.recover(master) > I0916 22:59:14.265326 21049 replica.cpp:661] Replica learned APPEND action at > position 1 > I0916 22:59:02.136291 21047 replica.cpp:676] Persisted action at 1 > E0916 22:59:07.135143 21046 registrar.cpp:500] Registrar aborting: Failed to > update 'registry': Failed to perform store within 5secs > I0916 22:59:14.265393 21047 replica.cpp:661] Replica learned APPEND action at > position 1 > [ FAILED ] Strict/RegistrarTest.remove/1, where GetParam() = true (12156 ms) > {noformat} -- This message was sent by Atlassian JIRA (v6.3.4#6332)