> On May 12, 2017, 9:36 p.m., Benjamin Mahler wrote:
> > src/master/master.cpp
> > Lines 6083-6084 (original), 6060-6061 (patched)
> > <https://reviews.apache.org/r/59214/diff/1/?file=1716302#file1716302line6085>
> >
> >     What does this null pointer case correspond to? Is it a framework that 
> > will be "recovered" later in the re-registration? A comment might help 
> > clarify this for the reader.

I added a comment and a test case to cover this scenario in a separate RR: 
https://reviews.apache.org/r/59320/


> On May 12, 2017, 9:36 p.m., Benjamin Mahler wrote:
> > src/master/master.cpp
> > Lines 6180-6229 (original), 6157-6197 (patched)
> > <https://reviews.apache.org/r/59214/diff/1/?file=1716302#file1716302line6182>
> >
> >     Hm.. do we still need `ids`? It seems like we could simplify this for 
> > the reader a bit by removing the `ids` and by consolidating the two cases 
> > into a single loop:
> >     
> >     ```
> >     foreach (const FrameworkInfo& frameworkInfo, frameworks) {
> >         Framework* framework = getFramework(frameworkId);
> >     
> >         if (isCompletedFramework(frameworkId)) {
> >           // do nothing, __reregister already sent the shutdown message
> >         } else if (framework != nullptr)) {
> >           // send update framework message
> >         } else {
> >           // recover the framework
> >         }
> >     }
> >     ```
> >     
> >     With the framework cases together I found it easier to read.
> >     
> >     Looking at this code, I found the breakdown of responsibility a little 
> > confusing, in that it wasn't clear to me why `__reregisterSlave` sends the 
> > shutdown but `___reregisterSlave` deals with the update and recovery of the 
> > framework (I assume it's due to the call-site requirements). Something to 
> > think about separately from this change.

I implemented this cleanup as a separate RR: https://reviews.apache.org/r/59302/

Your point about the division of logic between `__reregisterSlave` and 
`___reregisterSlave` is a good one, but seems like we can defer it for later.


> On May 12, 2017, 9:36 p.m., Benjamin Mahler wrote:
> > src/master/master.cpp
> > Lines 6779-6780 (original), 6735-6736 (patched)
> > <https://reviews.apache.org/r/59214/diff/1/?file=1716302#file1716302line6782>
> >
> >     If you want to send a separate cleanup patch, looks like this case and 
> > other PARTITION_AWARE cases were missed in the sweep to use 
> > `framework->capabilities`:
> >     
> >     ```
> >     if (!framework->capabilities.partitionAware) {
> >       newTaskState = TASK_LOST;
> >     }
> >     ```

I did this cleanup in a separate RR: https://reviews.apache.org/r/59259/


- Neil


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/59214/#review174850
-----------------------------------------------------------


On May 15, 2017, 11:13 p.m., Neil Conway wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/59214/
> -----------------------------------------------------------
> 
> (Updated May 15, 2017, 11:13 p.m.)
> 
> 
> Review request for mesos and Benjamin Mahler.
> 
> 
> Bugs: MESOS-6977
>     https://issues.apache.org/jira/browse/MESOS-6977
> 
> 
> Repository: mesos
> 
> 
> Description
> -------
> 
> In previous versions of Mesos, the master might know about a task
> running on an agent, but might not have a `FrameworkInfo` for that
> task's framework. This might occur if the master fails over, the agent
> re-registers, but the framework has not (yet) re-registered.
> 
> In Mesos 1.0, the agent will now supply the FrameworkInfo for all tasks
> it is running when it re-registers with the master. Since we no longer
> support 0.x agents, we can remove the old backward compatibility code
> for handling a running task with unknown FrameworkInfo.
> 
> Note that this means that "orphan tasks", "orphan executors", and
> "unregistered frameworks" are no longer possible.
> 
> 
> Diffs
> -----
> 
>   src/master/http.cpp 7060b8fa53e0682681c50e051908ffbbf50fb7da 
>   src/master/master.cpp 4e7a161f431624bd78d3e9032eb8587687149cad 
> 
> 
> Diff: https://reviews.apache.org/r/59214/diff/2/
> 
> 
> Testing
> -------
> 
> `make check`
> 
> 
> Thanks,
> 
> Neil Conway
> 
>

Reply via email to