----------------------------------------------------------- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/19383/#review37918 -----------------------------------------------------------
src/master/master.hpp <https://reviews.apache.org/r/19383/#comment69713> Comment? src/master/master.hpp <https://reviews.apache.org/r/19383/#comment69715> s/Master/master/ to be consistent with the rest of the code base. AFAICT, we use Master when we refer to the class or its methods (Master::foo) but 'master' when we talk about the entity. src/master/master.hpp <https://reviews.apache.org/r/19383/#comment69714> What does first time mean? First time after a master failover? First time ever? src/master/master.hpp <https://reviews.apache.org/r/19383/#comment69716> Why is Registrar in caps? s/Registrar/registrar/ to be consistent. Here and everywhere else. src/master/master.hpp <https://reviews.apache.org/r/19383/#comment69717> s/Registrar/registrar/ src/master/master.hpp <https://reviews.apache.org/r/19383/#comment69719> Why cache instead of a circular buffer? src/master/master.cpp <https://reviews.apache.org/r/19383/#comment69720> But they could still fire if they are already dispatched and are in the queue right? Would that cause problems? src/master/master.cpp <https://reviews.apache.org/r/19383/#comment69769> Are you worried about multiple slave (re-)registration and update messages getting queued up? If yes, can you please add that to the comment? src/master/master.cpp <https://reviews.apache.org/r/19383/#comment69754> CHECK(elected()); src/master/master.cpp <https://reviews.apache.org/r/19383/#comment69772> s/re-register/re-register within timeout/ src/master/master.cpp <https://reviews.apache.org/r/19383/#comment69797> why the change? what does activated mean? src/master/master.cpp <https://reviews.apache.org/r/19383/#comment69799> s/slaveInfo/_slaveInfo/ and s/copy/slaveInfo/ ? src/master/master.cpp <https://reviews.apache.org/r/19383/#comment69800> s/copy/_slaveInfo/ ? src/master/master.cpp <https://reviews.apache.org/r/19383/#comment69804> s/reply/send(from,/ ? src/master/master.cpp <https://reviews.apache.org/r/19383/#comment69819> Why dereferencing here instead of passing the pointer? src/master/master.cpp <https://reviews.apache.org/r/19383/#comment69826> Why not use the getSlave() helper? src/master/master.cpp <https://reviews.apache.org/r/19383/#comment69827> ditto. src/master/master.cpp <https://reviews.apache.org/r/19383/#comment69828> s/if only/if and only/ src/master/master.cpp <https://reviews.apache.org/r/19383/#comment69847> Why do it this way? Why not do the bulk of this method in _removeSlave() ? src/master/master.cpp <https://reviews.apache.org/r/19383/#comment69848> This almost feels like it could use a 'state' variable in the Slave struct to transition it through various states. Have you considered that option? src/master/master.cpp <https://reviews.apache.org/r/19383/#comment69850> also log the hostname. - Vinod Kone On March 19, 2014, 12:57 a.m., Ben Mahler wrote: > > ----------------------------------------------------------- > This is an automatically generated e-mail. To reply, visit: > https://reviews.apache.org/r/19383/ > ----------------------------------------------------------- > > (Updated March 19, 2014, 12:57 a.m.) > > > Review request for mesos, Benjamin Hindman and Vinod Kone. > > > Bugs: MESOS-764 > https://issues.apache.org/jira/browse/MESOS-764 > > > Repository: mesos-git > > > Description > ------- > > This implements the Registry-backed Master, with the following exceptions > that will be addressed in follow up changes: > > -Note that the --registry_strict flag is enforced to be false in > master/main.cpp. > -Reconciliation remains unimplemented as before. > -Improvements can be made to killTask, specifically we should add SlaveID to > the message in order to drop fewer requests for unknown slaves. > -Orthogonally, this does not address MESOS-682. > > I've updated 'deactivated' slaves to be a cache of SlaveIDs rather than UPIDs > as this was the intent originally (we were concerned about the unbounded > growth of the set, but cache<SlaveID, Nothing> keeps a fixed capacity). > > > Diffs > ----- > > src/master/constants.hpp cdaaad060d4ee777f8b0838b63c0fd031da861ea > src/master/constants.cpp 18548834468243bef8ae090f70363e2b9f571ac5 > src/master/master.hpp 0c7c5204c31087f12c4e98028f90c1b941eab7c7 > src/master/master.cpp 6da776699beb6f449e8160dcb6a125d94c1ab437 > src/messages/messages.proto c26a3d0e69bbbd447c859cf175c139ab8948fde2 > src/slave/slave.cpp d8d3e0fa54972201d72b2650ec0ba922a4912d54 > > Diff: https://reviews.apache.org/r/19383/diff/ > > > Testing > ------- > > This change preserves the previous semantics and so all existing tests pass. > > This is because the Registrar can only operate in a "non-strict" manner. > > An unfortunate effect of this change is that many tests run slower due to the > fact that messages are dropped while we're recovering, an alternative > approach here would be to re-enqueue *all* incoming messages through > recover(). However, this adds queuing delay to each message processed in the > Master and the performance implications of this are not well understood for > large clusters. > > > Thanks, > > Ben Mahler > >
