> On May 12, 2017, 9:36 p.m., Benjamin Mahler wrote: > > src/master/master.cpp > > Lines 6083-6084 (original), 6060-6061 (patched) > > <https://reviews.apache.org/r/59214/diff/1/?file=1716302#file1716302line6085> > > > > What does this null pointer case correspond to? Is it a framework that > > will be "recovered" later in the re-registration? A comment might help > > clarify this for the reader.
I added a comment and a test case to cover this scenario in a separate RR: https://reviews.apache.org/r/59320/ > On May 12, 2017, 9:36 p.m., Benjamin Mahler wrote: > > src/master/master.cpp > > Lines 6180-6229 (original), 6157-6197 (patched) > > <https://reviews.apache.org/r/59214/diff/1/?file=1716302#file1716302line6182> > > > > Hm.. do we still need `ids`? It seems like we could simplify this for > > the reader a bit by removing the `ids` and by consolidating the two cases > > into a single loop: > > > > ``` > > foreach (const FrameworkInfo& frameworkInfo, frameworks) { > > Framework* framework = getFramework(frameworkId); > > > > if (isCompletedFramework(frameworkId)) { > > // do nothing, __reregister already sent the shutdown message > > } else if (framework != nullptr)) { > > // send update framework message > > } else { > > // recover the framework > > } > > } > > ``` > > > > With the framework cases together I found it easier to read. > > > > Looking at this code, I found the breakdown of responsibility a little > > confusing, in that it wasn't clear to me why `__reregisterSlave` sends the > > shutdown but `___reregisterSlave` deals with the update and recovery of the > > framework (I assume it's due to the call-site requirements). Something to > > think about separately from this change. I implemented this cleanup as a separate RR: https://reviews.apache.org/r/59302/ Your point about the division of logic between `__reregisterSlave` and `___reregisterSlave` is a good one, but seems like we can defer it for later. > On May 12, 2017, 9:36 p.m., Benjamin Mahler wrote: > > src/master/master.cpp > > Lines 6779-6780 (original), 6735-6736 (patched) > > <https://reviews.apache.org/r/59214/diff/1/?file=1716302#file1716302line6782> > > > > If you want to send a separate cleanup patch, looks like this case and > > other PARTITION_AWARE cases were missed in the sweep to use > > `framework->capabilities`: > > > > ``` > > if (!framework->capabilities.partitionAware) { > > newTaskState = TASK_LOST; > > } > > ``` I did this cleanup in a separate RR: https://reviews.apache.org/r/59259/ - Neil ----------------------------------------------------------- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/59214/#review174850 ----------------------------------------------------------- On May 15, 2017, 11:13 p.m., Neil Conway wrote: > > ----------------------------------------------------------- > This is an automatically generated e-mail. To reply, visit: > https://reviews.apache.org/r/59214/ > ----------------------------------------------------------- > > (Updated May 15, 2017, 11:13 p.m.) > > > Review request for mesos and Benjamin Mahler. > > > Bugs: MESOS-6977 > https://issues.apache.org/jira/browse/MESOS-6977 > > > Repository: mesos > > > Description > ------- > > In previous versions of Mesos, the master might know about a task > running on an agent, but might not have a `FrameworkInfo` for that > task's framework. This might occur if the master fails over, the agent > re-registers, but the framework has not (yet) re-registered. > > In Mesos 1.0, the agent will now supply the FrameworkInfo for all tasks > it is running when it re-registers with the master. Since we no longer > support 0.x agents, we can remove the old backward compatibility code > for handling a running task with unknown FrameworkInfo. > > Note that this means that "orphan tasks", "orphan executors", and > "unregistered frameworks" are no longer possible. > > > Diffs > ----- > > src/master/http.cpp 7060b8fa53e0682681c50e051908ffbbf50fb7da > src/master/master.cpp 4e7a161f431624bd78d3e9032eb8587687149cad > > > Diff: https://reviews.apache.org/r/59214/diff/2/ > > > Testing > ------- > > `make check` > > > Thanks, > > Neil Conway > >