-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/73131/
-----------------------------------------------------------

Review request for mesos and Benjamin Mahler.


Bugs: MESOS-10209
    https://issues.apache.org/jira/browse/MESOS-10209


Repository: mesos


Description
-------

During master failover, if agent reregistration runs concurrently with
marking the agent as unreachable and finishes before the MarkUnreachable
operation completes, the assertion in Master::_markUnreachable() that the
agent is in the recovered set fails. This happens because, after
readmitting the agent, the master removes it from the recovered set in
Master::__reregisterSlave().

We can fix this by ignoring agent reregistration requests while a
mark-unreachable operation is in progress, as we already do while an
agent is being marked gone. Once the marking operation is complete, the
agent will be able to reregister as usual.
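
The guard described above can be illustrated with a small standalone
model. This is only a sketch, not the code in the diff: `ToyMaster`, its
member names, and the use of plain strings as agent IDs are all
hypothetical, and the real master performs the marking asynchronously
through the registrar. The sketch also covers only the case of an agent
that is still in the recovered set when marking begins.

```cpp
#include <cassert>
#include <set>
#include <string>

// Toy model of the master's agent bookkeeping (hypothetical names; this is
// not the actual Mesos implementation).
struct ToyMaster {
  std::set<std::string> recovered;           // Known from the registry, not yet reregistered.
  std::set<std::string> markingUnreachable;  // Mark-unreachable operation in flight.
  std::set<std::string> registered;
  std::set<std::string> unreachable;

  // Start marking the agent unreachable. In the real master the registry
  // update is asynchronous, which is what opens the race window.
  void beginMarkUnreachable(const std::string& agent) {
    markingUnreachable.insert(agent);
  }

  // Returns true if the reregistration was processed, false if it was
  // ignored because a mark-unreachable operation is in progress.
  bool reregister(const std::string& agent) {
    if (markingUnreachable.count(agent) > 0) {
      return false;  // The agent is expected to retry later.
    }
    recovered.erase(agent);
    unreachable.erase(agent);
    registered.insert(agent);
    return true;
  }

  // Completion of the mark-unreachable operation. The assertion mirrors the
  // one described above: the agent must still be in the recovered set.
  void finishMarkUnreachable(const std::string& agent) {
    assert(recovered.count(agent) > 0);
    markingUnreachable.erase(agent);
    recovered.erase(agent);
    unreachable.insert(agent);
  }
};
```

Without the early return in `reregister()`, a reregistration that lands
between `beginMarkUnreachable()` and `finishMarkUnreachable()` would
remove the agent from `recovered` and trip the assertion. With the guard,
the request is dropped and a later retry succeeds once marking is done.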


Diffs
-----

  src/master/master.cpp 164720a3ad40773b6de0268e3a7119de04bf297e 
  src/tests/master_tests.cpp cd0973ed4cc8fc33de714d59c7680aef05b97b47 


Diff: https://reviews.apache.org/r/73131/diff/1/


Testing
-------

Ran `make check`. Verified that the new test crashes without the fix.


Thanks,

Ilya Pronin
