Re: Review Request 59214: Removed code for handling missing FrameworkInfo of a running task.

2017-05-16 Thread Neil Conway


> On May 12, 2017, 9:36 p.m., Benjamin Mahler wrote:
> > src/master/master.cpp
> > Lines 6083-6084 (original), 6060-6061 (patched)
> > 
> >
> > What does this null pointer case correspond to? Is it a framework that 
> > will be "recovered" later in the re-registration? A comment might help 
> > clarify this for the reader.

I added a comment and a test case to cover this scenario in a separate RR: 
https://reviews.apache.org/r/59320/


> On May 12, 2017, 9:36 p.m., Benjamin Mahler wrote:
> > src/master/master.cpp
> > Lines 6180-6229 (original), 6157-6197 (patched)
> > 
> >
> > Hm.. do we still need `ids`? It seems like we could simplify this for 
> > the reader a bit by removing the `ids` and by consolidating the two cases 
> > into a single loop:
> > 
> > ```
> > foreach (const FrameworkInfo& frameworkInfo, frameworks) {
> > Framework* framework = getFramework(frameworkId);
> > 
> > if (isCompletedFramework(frameworkId)) {
> >   // do nothing, __reregister already sent the shutdown message
> > } else if (framework != nullptr)) {
> >   // send update framework message
> > } else {
> >   // recover the framework
> > }
> > }
> > ```
> > 
> > With the framework cases together I found it easier to read.
> > 
> > Looking at this code, I found the breakdown of responsibility a little 
> > confusing, in that it wasn't clear to me why `__reregisterSlave` sends the 
> > shutdown but `___reregisterSlave` deals with the update and recovery of the 
> > framework (I assume it's due to the call-site requirements). Something to 
> > think about separately from this change.

I implemented this cleanup as a separate RR: https://reviews.apache.org/r/59302/

Your point about the division of logic between `__reregisterSlave` and 
`___reregisterSlave` is a good one, but seems like we can defer it for later.


> On May 12, 2017, 9:36 p.m., Benjamin Mahler wrote:
> > src/master/master.cpp
> > Lines 6779-6780 (original), 6735-6736 (patched)
> > 
> >
> > If you want to send a separate cleanup patch, looks like this case and 
> > other PARTITION_AWARE cases were missed in the sweep to use 
> > `framework->capabilities`:
> > 
> > ```
> > if (!framework->capabilities.partitionAware) {
> >   newTaskState = TASK_LOST;
> > }
> > ```

I did this cleanup in a separate RR: https://reviews.apache.org/r/59259/


- Neil


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/59214/#review174850
---


On May 15, 2017, 11:13 p.m., Neil Conway wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/59214/
> ---
> 
> (Updated May 15, 2017, 11:13 p.m.)
> 
> 
> Review request for mesos and Benjamin Mahler.
> 
> 
> Bugs: MESOS-6977
> https://issues.apache.org/jira/browse/MESOS-6977
> 
> 
> Repository: mesos
> 
> 
> Description
> ---
> 
> In previous versions of Mesos, the master might know about a task
> running on an agent, but might not have a `FrameworkInfo` for that
> task's framework. This might occur if the master fails over, the agent
> re-registers, but the framework has not (yet) re-registered.
> 
> In Mesos 1.0, the agent will now supply the FrameworkInfo for all tasks
> it is running when it re-registers with the master. Since we no longer
> support 0.x agents, we can remove the old backward compatibility code
> for handling a running task with unknown FrameworkInfo.
> 
> Note that this means that "orphan tasks", "orphan executors", and
> "unregistered frameworks" are no longer possible.
> 
> 
> Diffs
> -
> 
>   src/master/http.cpp 7060b8fa53e0682681c50e051908ffbbf50fb7da 
>   src/master/master.cpp 4e7a161f431624bd78d3e9032eb8587687149cad 
> 
> 
> Diff: https://reviews.apache.org/r/59214/diff/2/
> 
> 
> Testing
> ---
> 
> `make check`
> 
> 
> Thanks,
> 
> Neil Conway
> 
>



Re: Review Request 59214: Removed code for handling missing FrameworkInfo of a running task.

2017-05-15 Thread Neil Conway

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/59214/
---

(Updated May 15, 2017, 11:13 p.m.)


Review request for mesos and Benjamin Mahler.


Changes
---

Clarify code for writing empty JSON arrays.


Bugs: MESOS-6977
https://issues.apache.org/jira/browse/MESOS-6977


Repository: mesos


Description
---

In previous versions of Mesos, the master might know about a task
running on an agent, but might not have a `FrameworkInfo` for that
task's framework. This might occur if the master fails over, the agent
re-registers, but the framework has not (yet) re-registered.

In Mesos 1.0, the agent will now supply the FrameworkInfo for all tasks
it is running when it re-registers with the master. Since we no longer
support 0.x agents, we can remove the old backward compatibility code
for handling a running task with unknown FrameworkInfo.

Note that this means that "orphan tasks", "orphan executors", and
"unregistered frameworks" are no longer possible.


Diffs (updated)
-

  src/master/http.cpp 7060b8fa53e0682681c50e051908ffbbf50fb7da 
  src/master/master.cpp 4e7a161f431624bd78d3e9032eb8587687149cad 


Diff: https://reviews.apache.org/r/59214/diff/2/

Changes: https://reviews.apache.org/r/59214/diff/1-2/


Testing
---

`make check`


Thanks,

Neil Conway



Re: Review Request 59214: Removed code for handling missing FrameworkInfo of a running task.

2017-05-12 Thread Benjamin Mahler

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/59214/#review174850
---


Fix it, then Ship it!





src/master/http.cpp
Lines 1454-1456 (original), 1454-1456 (patched)


Could we do the following?

```
writer->field(
"unregistered_frameworks",
[](JSON::ArrayWriter*) {});
```

The use of vector tripped me up here since it led me to think we were 
exposing integers before.



src/master/http.cpp
Lines 2874-2880 (original), 2825-2831 (patched)


Ditto for these w.r.t. to vector being misleading to the reader.



src/master/master.cpp
Lines 6083-6084 (original), 6060-6061 (patched)


What does this null pointer case correspond to? Is it a framework that will 
be "recovered" later in the re-registration? A comment might help clarify this 
for the reader.



src/master/master.cpp
Lines 6180-6229 (original), 6157-6197 (patched)


Hm.. do we still need `ids`? It seems like we could simplify this for the 
reader a bit by removing the `ids` and by consolidating the two cases into a 
single loop:

```
foreach (const FrameworkInfo& frameworkInfo, frameworks) {
Framework* framework = getFramework(frameworkId);

if (isCompletedFramework(frameworkId)) {
  // do nothing, __reregister already sent the shutdown message
} else if (framework != nullptr)) {
  // send update framework message
} else {
  // recover the framework
}
}
```

With the framework cases together I found it easier to read.

Looking at this code, I found the breakdown of responsibility a little 
confusing, in that it wasn't clear to me why `__reregisterSlave` sends the 
shutdown but `___reregisterSlave` deals with the update and recovery of the 
framework (I assume it's due to the call-site requirements). Something to think 
about separately from this change.



src/master/master.cpp
Lines 6779-6780 (original), 6735-6736 (patched)


If you want to send a separate cleanup patch, looks like this case and 
other PARTITION_AWARE cases were missed in the sweep to use 
`framework->capabilities`:

```
if (!framework->capabilities.partitionAware) {
  newTaskState = TASK_LOST;
}
```


- Benjamin Mahler


On May 11, 2017, 11:31 p.m., Neil Conway wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/59214/
> ---
> 
> (Updated May 11, 2017, 11:31 p.m.)
> 
> 
> Review request for mesos and Benjamin Mahler.
> 
> 
> Bugs: MESOS-6977
> https://issues.apache.org/jira/browse/MESOS-6977
> 
> 
> Repository: mesos
> 
> 
> Description
> ---
> 
> In previous versions of Mesos, the master might know about a task
> running on an agent, but might not have a `FrameworkInfo` for that
> task's framework. This might occur if the master fails over, the agent
> re-registers, but the framework has not (yet) re-registered.
> 
> In Mesos 1.0, the agent will now supply the FrameworkInfo for all tasks
> it is running when it re-registers with the master. Since we no longer
> support 0.x agents, we can remove the old backward compatibility code
> for handling a running task with unknown FrameworkInfo.
> 
> Note that this means that "orphan tasks", "orphan executors", and
> "unregistered frameworks" are no longer possible.
> 
> 
> Diffs
> -
> 
>   src/master/http.cpp e2590a17044ac019b24a24629428d4ec8adc0c31 
>   src/master/master.cpp 4e7a161f431624bd78d3e9032eb8587687149cad 
> 
> 
> Diff: https://reviews.apache.org/r/59214/diff/1/
> 
> 
> Testing
> ---
> 
> `make check`
> 
> 
> Thanks,
> 
> Neil Conway
> 
>