-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/23868/#review48685
-----------------------------------------------------------
Chatted with BenM. We've been having this problem: doReliableRegistration calls itself and thus forms a loop, which stops when a "registered" message is received. A loop is started whenever a new master is detected, so if a new master is detected while the previous loop is still running, a second loop is created, and this can keep happening. This doesn't lead to incorrect slave state, but it generates more events in the slave process and consumes more CPU and memory. We could probably come up with some "loop" abstraction to handle these tasks safely. This is not a big concern for now, as the doReliableRegistration loop is not a tight one and the conditions under which multiple loops are created are relatively rare.


src/master/constants.hpp
<https://reviews.apache.org/r/23868/#comment85433>

    What is this for?


src/slave/slave.cpp
<https://reviews.apache.org/r/23868/#comment85431>

    Should we check "pingTimer.timeout().expired()"? Suppose the slave receives a ping before the timer times out, but its event queue is backed up, so the timer is not cancelled before it fires. The timer then times out and dispatches a redetect() that is executed after ping(). In that case we don't really need to redetect, right?


- Jiang Yan Xu


On July 23, 2014, 7:55 p.m., Ben Mahler wrote:
>
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/23868/
> -----------------------------------------------------------
>
> (Updated July 23, 2014, 7:55 p.m.)
>
>
> Review request for mesos, Vinod Kone and Jiang Yan Xu.
>
>
> Bugs: MESOS-1529
>     https://issues.apache.org/jira/browse/MESOS-1529
>
>
> Repository: mesos-git
>
>
> Description
> -------
>
> This is the first step in MESOS-1529.
>
> If we get into a situation where the slave thinks it is registered, but the
> master does not, then the slave should re-register. This situation can
> often be detected on the slave side when the slave is no longer receiving
> pings from the master.
>
>
> Diffs
> -----
>
>   src/master/constants.hpp 8ace682bc58e4fae65038906a4abec5879f35020
>   src/slave/constants.hpp 97dc1b30fa81000ea60223c4059a0a64d27e91c4
>   src/slave/constants.cpp a75b1ef8eddeb55350810b36ac35136d2e5d6f9d
>   src/slave/slave.hpp a896bb66db5d8cd27ef02b6498c9db93cb0d525f
>   src/slave/slave.cpp 1d5691836822c8587e1aa8ed24860a8012c67a6e
>   src/tests/slave_tests.cpp e45255a6f699e51bf09397da95a5a11edbabe591
>
> Diff: https://reviews.apache.org/r/23868/diff/
>
>
> Testing
> -------
>
> Added tests.
>
>
> Thanks,
>
> Ben Mahler
>
>
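Regarding the duplicate doReliableRegistration loops discussed above: one possible shape for the "loop" abstraction is a generation counter. Each new-master detection bumps the generation and starts a loop tagged with it, and a loop iteration whose tag no longer matches simply stops instead of rescheduling itself. This is a minimal, single-threaded sketch under stated assumptions: `EventQueue` and `Registrar` are hypothetical stand-ins (not libprocess's dispatch/delay API), and the retry is just re-queued rather than delayed with backoff.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <deque>
#include <functional>
#include <utility>

// Hypothetical stand-in for a process event queue: "delayed" calls are
// appended and later run in FIFO order on a single thread.
class EventQueue {
public:
  void post(std::function<void()> f) { events_.push_back(std::move(f)); }

  // Runs at most `n` queued events (including ones posted while running);
  // returns how many actually ran.
  int runSome(int n) {
    int ran = 0;
    while (ran < n && !events_.empty()) {
      std::function<void()> f = std::move(events_.front());
      events_.pop_front();
      f();
      ++ran;
    }
    return ran;
  }

  std::size_t pending() const { return events_.size(); }

private:
  std::deque<std::function<void()>> events_;
};

// Hypothetical registrar: each newMasterDetected() starts a retry loop
// tagged with a fresh generation. Loops from older generations notice
// they are stale and stop, so at most one loop keeps rescheduling.
class Registrar {
public:
  explicit Registrar(EventQueue& q) : queue_(q) {}

  void newMasterDetected() {
    registered_ = false;
    const std::uint64_t gen = ++generation_;
    queue_.post([this, gen] { doReliableRegistration(gen); });
  }

  void registeredReceived() { registered_ = true; }

  int attempts() const { return attempts_; }

private:
  void doReliableRegistration(std::uint64_t gen) {
    // Stop if already registered, or if a newer detection superseded us.
    if (registered_ || gen != generation_) {
      return;
    }
    // The real slave would send a (re-)registration message here.
    ++attempts_;
    queue_.post([this, gen] { doReliableRegistration(gen); });
  }

  EventQueue& queue_;
  std::uint64_t generation_ = 0;
  bool registered_ = false;
  int attempts_ = 0;
};
```

After a second detection, the first loop's pending iteration runs once, sees a stale generation, and exits without re-queuing, so only one loop survives. The trade-off is one extra no-op event per superseded loop, which is cheap compared to an unbounded number of parallel loops.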
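The `expired()` check asked about in the slave.cpp comment can be modelled by having the timeout handler compare the current time against the latest deadline when its (possibly stale) event is finally processed. A minimal sketch, assuming a deterministic fake clock so the race can be replayed: `FakeClock` and `PingMonitor` are hypothetical, they do not use libprocess's Timer/Timeout types, and the 10-second timeout in the usage is arbitrary.

```cpp
#include <cassert>

// Hypothetical deterministic clock (seconds) so the race can be replayed.
struct FakeClock {
  double now = 0.0;
};

class PingMonitor {
public:
  PingMonitor(FakeClock& clock, double timeoutSecs)
    : clock_(clock), timeoutSecs_(timeoutSecs) {
    reset();
  }

  // Called when a ping from the master is processed: push the deadline out.
  void ping() { reset(); }

  // Called when a timer event is processed, possibly after a ping that was
  // queued ahead of it. Returns true only if redetection was triggered.
  bool pingTimeout() {
    if (!expired()) {
      return false;  // A later ping moved the deadline; ignore stale event.
    }
    ++redetects_;
    return true;
  }

  int redetects() const { return redetects_; }

private:
  bool expired() const { return clock_.now >= deadline_; }
  void reset() { deadline_ = clock_.now + timeoutSecs_; }

  FakeClock& clock_;
  double timeoutSecs_;
  double deadline_ = 0.0;
  int redetects_ = 0;
};
```

For example, with `PingMonitor m(clock, 10.0)`: at `clock.now = 10.0` the timer fires, but a queued `m.ping()` is processed first and moves the deadline to 20, so the stale `m.pingTimeout()` returns `false`; once `clock.now` reaches 20 without another ping, it returns `true`.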