----------------------------------------------------------- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/69451/#review210904 -----------------------------------------------------------
Thanks for jumping on this and producing a fix! Some small edit suggestions to the description: ``` The Framework::send function assumes that either http or pid is set, which is not true for a unregistered framework recovered from agent // s/unregistered// reregistration. As a result, the master would crash when a recovered // s/reregistration/that hasn't yet re-registered/ executor tries to send a message to such a framework. This patch fixes // s/framework/framework (see MESOS-XXXX). this crash bug. ``` src/master/master.hpp Line 2595 (original), 2597 (patched) <https://reviews.apache.org/r/69451/#comment295720> attempting? src/master/master.hpp Line 2599 (original), 2600-2604 (patched) <https://reviews.apache.org/r/69451/#comment295722> Hm.. based on this comment, should we only be sending in the disconnected+pid case? Also, which messages is this referring to? It seems to suggest we let all messages through, but in the call sites we seem to guard this call to avoid calling it for the disconnected case? src/master/master.hpp Line 2609 (original), 2614 (patched) <https://reviews.apache.org/r/69451/#comment295721> Do we want a similar log warning here for the else case? ``` } else { LOG(WARNING) << "Unable to send event to framework " << *this << ":" << " framework is recovered but hasn't re-registered"; } ``` Is this accurate or are there other cases? My read of the http/pid state comment is that it will be set based on the last connection, therefore the only time one of them won't be set is the recovered case (the change that broke the original one-being-set invariant). - Benjamin Mahler On Nov. 27, 2018, 4:58 a.m., Chun-Hung Hsiao wrote: > > ----------------------------------------------------------- > This is an automatically generated e-mail. To reply, visit: > https://reviews.apache.org/r/69451/ > ----------------------------------------------------------- > > (Updated Nov. 27, 2018, 4:58 a.m.) > > > Review request for mesos, Alexander Rukletsov, Benjamin Mahler, Greg Mann, > and Till Toenshoff. > > > Bugs: MESOS-9419 > https://issues.apache.org/jira/browse/MESOS-9419 > > > Repository: mesos > > > Description > ------- > > The `Framework::send` function assumes that either `http` or `pid` is > set, which is not true for a unregistered framework recovered from agent > reregistration. As a result, the master would crash when a recovered > executor tries to send a message to such a framework. This patch fixes > this crash bug. > > > Diffs > ----- > > src/master/master.hpp 3b3c1a4e61de9503c8d038dd3bee623ded5914c9 > src/master/master.cpp b4b02d8b4d7d6d1aabda1f97b9bf824419f76a9e > > > Diff: https://reviews.apache.org/r/69451/diff/1/ > > > Testing > ------- > > make check > > > Thanks, > > Chun-Hung Hsiao > >