> On March 6, 2018, 2:44 a.m., Zhitao Li wrote:
> > src/master/http.cpp
> > Lines 845-857 (original), 845-861 (patched)
> > <https://reviews.apache.org/r/61262/diff/8/?file=1800242#file1800242line845>
> >
> >     I believe this is a behavior change.
> >
> >     Previous, `master->subscribe(...)` call will not trigger additional
> >     event being sent, so `SUBSCRIBED` is still guaranteed to be the first event.
> >
> >     However, we start the `heartbeater` process immediately without wait
> >     for `SUBSCRIBED` to be sent, so this could mean that subscriber can receive
> >     the `SUBSCRIBED` event after heartbeater sends something.
> >
> >     Should reorder the call so we only call `master->subscribe(http);`
> >     after the `http.send` on `SUBSCRIBED`?
>
> Greg Mann wrote:
>     Reordering is an option, but I wonder if this will really be an issue in
>     practice? We delay the first heartbeat event sent by the Heartbeater by
>     DEFAULT_HEARTBEAT_INTERVAL:
>     https://github.com/apache/mesos/blob/0d247c3887ea08b6273992218cd5899010d89fed/src/master/master.hpp#L1986
>     so the Heartbeater sends its first heartbeat event after that interval
>     elapses. So, the heartbeat event could only arrive first in the presence of
>     some extreme CPU contention, it doesn't seem likely/possible to me? WDYT?
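
To make the ordering question in the quoted discussion concrete, here is a minimal standalone sketch. It is a toy model and not the Mesos code: `Connection`, `Heartbeater`, `subscribeThenSend`, and `sendThenSubscribe` are hypothetical stand-ins, and the timer is replaced by a manual `tick()` so that a slow `SUBSCRIBED` send can be simulated deterministically.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical stand-in for the HTTP connection to one subscriber.
// It only records events in the order they are written.
struct Connection {
  std::vector<std::string> sent;
  void send(const std::string& event) { sent.push_back(event); }
};

// Hypothetical stand-in for the heartbeater. A "tick" is triggered by
// hand here instead of by a timer firing after the heartbeat interval.
struct Heartbeater {
  Connection* connection = nullptr;  // Null until start() is called.
  void start(Connection* c) { connection = c; }
  void tick() {
    if (connection != nullptr) {
      connection->send("HEARTBEAT");
    }
  }
};

// Ordering described as the concern: the heartbeater is started before
// SUBSCRIBED is written. `tickBeforeSend` models the interval elapsing
// while the SUBSCRIBED event is still being prepared.
std::vector<std::string> subscribeThenSend(bool tickBeforeSend) {
  Connection connection;
  Heartbeater heartbeater;
  heartbeater.start(&connection);
  if (tickBeforeSend) {
    heartbeater.tick();
  }
  connection.send("SUBSCRIBED");
  return connection.sent;
}

// Proposed ordering: write SUBSCRIBED first, then start the heartbeater.
// A tick before start() is a no-op, so SUBSCRIBED is always first.
std::vector<std::string> sendThenSubscribe(bool tickBeforeSend) {
  Connection connection;
  Heartbeater heartbeater;
  if (tickBeforeSend) {
    heartbeater.tick();
  }
  connection.send("SUBSCRIBED");
  heartbeater.start(&connection);
  heartbeater.tick();
  return connection.sent;
}
```

In this model, `subscribeThenSend(true)` records `HEARTBEAT` ahead of `SUBSCRIBED`, while `sendThenSubscribe(true)` records `SUBSCRIBED` first regardless of when the tick happens. Whether the real code can hit the first case depends on how long the `SUBSCRIBED` send takes relative to `DEFAULT_HEARTBEAT_INTERVAL`.
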

I think that in a large cluster, sending the first `SUBSCRIBED` message could take longer than `DEFAULT_HEARTBEAT_INTERVAL` in extreme cases. In practice, this has hit our cluster in maybe less than 10% of master failovers.

Still, I would argue that we should ensure the `SUBSCRIBED` message is sent before any `HEARTBEAT`, also because the subscriber cannot even process the `HEARTBEAT` without the `heartbeat_interval_seconds` value from the `SUBSCRIBED` message.


- Zhitao


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/61262/#review198678
-----------------------------------------------------------


On Aug. 18, 2017, 6:54 p.m., Quinn Leng wrote:
>
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/61262/
> -----------------------------------------------------------
>
> (Updated Aug. 18, 2017, 6:54 p.m.)
>
>
> Review request for mesos, Anand Mazumdar and Greg Mann.
>
>
> Bugs: MESOS-7695
>     https://issues.apache.org/jira/browse/MESOS-7695
>
>
> Repository: mesos
>
>
> Description
> -------
>
> Added the 'HEARTBEAT' event for the operator API, modified other
> related test cases to accept heartbeats.
>
>
> Diffs
> -----
>
>   include/mesos/master/master.proto fc5bd894ce55fe8e946d4c5b4b33d3c0505f3c2b
>   include/mesos/v1/master/master.proto c3fb31de2509adcdec8204f8bbe46b46a31540e8
>   src/master/http.cpp 959091c8ec03b6ac7bcb5d21b04d2f7d5aff7d54
>   src/master/master.hpp b802fd153a10f6012cea381f153c28cc78cae995
>   src/tests/api_tests.cpp 3ab4740bcac29ecb89585da6adb1f563d6fc1f5f
>
>
> Diff: https://reviews.apache.org/r/61262/diff/8/
>
>
> Testing
> -------
>
> make check -j48
>
>
> Thanks,
>
> Quinn Leng
>