[GitHub] asfgit closed pull request #319: Fixed incorrect RPM build settings.

2018-11-27 Thread GitBox
asfgit closed pull request #319: Fixed incorrect RPM build settings.
URL: https://github.com/apache/mesos/pull/319
 
 
   

This is a PR merged from a forked repository.
As GitHub hides the original diff on merge, it is displayed below for
the sake of provenance:

As this is a foreign pull request (from a fork), the diff is supplied
below (as it won't show otherwise due to GitHub magic):

diff --git a/support/packaging/centos/mesos.spec 
b/support/packaging/centos/mesos.spec
index 59cb2d3e03..8e719ab419 100644
--- a/support/packaging/centos/mesos.spec
+++ b/support/packaging/centos/mesos.spec
@@ -113,11 +113,11 @@ mkdir -p -m0755 %{buildroot}/%{_var}/log/%{name}
 mkdir -p -m0755 %{buildroot}/%{_var}/lib/%{name}
 
 echo zk://localhost:2181/mesos > %{buildroot}%{_sysconfdir}/mesos/zk
-echo %{_var}/log/%{name}   > 
%{buildroot}%{_sysconfdir}/mesos-master/work_dir
-echo %{_var}/log/%{name}   > 
%{buildroot}%{_sysconfdir}/mesos-slave/work_dir
+echo %{_var}/lib/%{name}   > 
%{buildroot}%{_sysconfdir}/mesos-master/work_dir
+echo %{_var}/lib/%{name}   > 
%{buildroot}%{_sysconfdir}/mesos-slave/work_dir
 echo 1 > %{buildroot}%{_sysconfdir}/mesos-master/quorum
 
-install -m 0644 %{SOURCE1} %{buildroot}%{_bindir}/
+install -m 0755 %{SOURCE1} %{buildroot}%{_bindir}/
 install -m 0644 %{SOURCE2} %{SOURCE3} %{SOURCE4} 
%{buildroot}%{_sysconfdir}/default
 
 %if 0%{?el6}


 


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


Re: Review Request 69452: Added a test for executors sending messages to recovered frameworks.

2018-11-27 Thread Mesos Reviewbot Windows

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/69452/#review210918
---



PASS: Mesos patch 69452 was successfully built and tested.

Reviews applied: `['69451', '69464', '69452']`

All the build artifacts available at: 
http://dcos-win.westus2.cloudapp.azure.com/artifacts/mesos-reviewbot-testing/2656/mesos-review-69452

- Mesos Reviewbot Windows


On Nov. 27, 2018, 11:07 p.m., Chun-Hung Hsiao wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/69452/
> ---
> 
> (Updated Nov. 27, 2018, 11:07 p.m.)
> 
> 
> Review request for mesos, Alexander Rukletsov, Benjamin Mahler, Greg Mann, 
> and Till Toenshoff.
> 
> 
> Bugs: MESOS-9419
> https://issues.apache.org/jira/browse/MESOS-9419
> 
> 
> Repository: mesos
> 
> 
> Description
> ---
> 
> This patch adds the `ExecutorMessageToRecoveredHttpFramework` test to
> ensure that a master will not crash when forwarding an executor message
> to a recovered framework to prevent any future regression of MESOS-9419.
> 
> 
> Diffs
> -
> 
>   src/tests/master_tests.cpp 651bb9ba9298fc3d7179ed86487aa55be519ce12 
> 
> 
> Diff: https://reviews.apache.org/r/69452/diff/3/
> 
> 
> Testing
> ---
> 
> make check
> 
> This test will fail without r/69451.
> 
> 
> Thanks,
> 
> Chun-Hung Hsiao
> 
>



Re: Review Request 69307: Changed master to hold subscribers in a circular buffer.

2018-11-27 Thread Joseph Wu


> On Nov. 27, 2018, 3:33 p.m., Benno Evers wrote:
> > src/tests/api_tests.cpp
> > Lines 3688 (patched)
> > 
> >
> > To be honest, I don't completely understand if this is creating some 
> > special "unclean shutdown" condition for the third connection, or if it is 
> > just the simplest way to force removal of the connection on the master side.
> > 
> > In the former case, this might be better suited for a separate unit 
> > test.
> > 
> > In the latter case, it seems like this some  missing API in Mesos (i.e. 
> > an `UNSUBSCRIBE` call), since intuitively this is something that *should* 
> > be easy to do from the clients perspective.

This little hack is meant to close the client's connection, which triggers the 
some code in the master waiting for disconnections.  I resorted to this because 
our HTTP helpers (i.e. `process::http::streaming::post`) returns a 
`http::Response` object, but this Response object does not control the lifetime 
of the connection (presumably because the same object is returned for 
single-requests, streaming responses, and pipelined requests).

I suppose another way to go about this is explicitly create the `Connection` 
object and then send the request over that connection, effectively unwrapping 
`process::http::streaming::post` into the test body.

Also, the `Connection` object does not do anything when a streaming response's 
reader is closed.  We could probably change the library/helper's behavior here. 
 Right now, the `Connection` object is kept alive like this:
```
  // This is a non Keep-Alive request which means the connection
  // will be closed when the response is received. Since the
  // 'Connection' is reference-counted, we must maintain a copy
  // until the disconnection occurs.
  connection.disconnected()
.onAny([connection]() {});
```


- Joseph


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/69307/#review210891
---


On Nov. 26, 2018, 3:40 p.m., Joseph Wu wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/69307/
> ---
> 
> (Updated Nov. 26, 2018, 3:40 p.m.)
> 
> 
> Review request for mesos, Benno Evers, Benjamin Mahler, Gastón Kleiman, and 
> Greg Mann.
> 
> 
> Bugs: MESOS-9258
> https://issues.apache.org/jira/browse/MESOS-9258
> 
> 
> Repository: mesos
> 
> 
> Description
> ---
> 
> This adds a flag (--max_event_stream_subscribers) to the master which
> controls how many active subscribers on the Master's event stream will
> be allowed at any time.  The default is 25 subscribers.
> 
> 
> Diffs
> -
> 
>   docs/configuration/master.md 575476728079d884fe86b1a3a56614647d1b572e 
>   src/master/constants.hpp 76ad0c3b1494cb97888af5545acc998b8fb15bf6 
>   src/master/flags.hpp ed2d76af2831d1b9718724936fd3e4850e99b332 
>   src/master/flags.cpp 2677738d19caa7d600e591aeb4fb37f0c2318d45 
>   src/master/master.hpp e77babf22126838c63cd05e483875c9beb3ac5ff 
>   src/master/master.cpp 1e326ec42a7f79a0835529a4655e7ec272f1cf40 
>   src/tests/api_tests.cpp fdd9f871f75617fc26a28679e2a1e41f506c6133 
> 
> 
> Diff: https://reviews.apache.org/r/69307/diff/1/
> 
> 
> Testing
> ---
> 
> ```
> make check
> 
> src/mesos-tests --gtest_filter="*MaxEventStreamSubscribers*" 
> --gtest_repeat=-1 --gtest_break_on_failure
> ```
> 
> 
> Thanks,
> 
> Joseph Wu
> 
>



Re: Review Request 69451: Fixed master crash when executors send messages to recovered frameworks.

2018-11-27 Thread Chun-Hung Hsiao

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/69451/
---

(Updated Nov. 28, 2018, 12:40 a.m.)


Review request for mesos, Alexander Rukletsov, Benjamin Mahler, Greg Mann, and 
Till Toenshoff.


Changes
---

Addressed BenM's comment.


Bugs: MESOS-9419
https://issues.apache.org/jira/browse/MESOS-9419


Repository: mesos


Description
---

The `Framework::send` function assumes that either `http` or `pid` is
set, which is not true for a framework that hasn't yet reregistered yet
but recovered from a reregistered agent. As a result, the master would
crash when a recovered executor tries to send a message to such a
framework (see MESOS-9419). This patch fixes this crash bug.


Diffs (updated)
-

  src/master/master.hpp 3b3c1a4e61de9503c8d038dd3bee623ded5914c9 
  src/master/master.cpp b4b02d8b4d7d6d1aabda1f97b9bf824419f76a9e 


Diff: https://reviews.apache.org/r/69451/diff/3/

Changes: https://reviews.apache.org/r/69451/diff/2-3/


Testing
---

make check


Thanks,

Chun-Hung Hsiao



Re: Review Request 69463: Added HEARTBEAT events and calls for the executor HTTP API.

2018-11-27 Thread Mesos Reviewbot Windows

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/69463/#review210916
---



PASS: Mesos patch 69463 was successfully built and tested.

Reviews applied: `['69463']`

All the build artifacts available at: 
http://dcos-win.westus2.cloudapp.azure.com/artifacts/mesos-reviewbot-testing/2655/mesos-review-69463

- Mesos Reviewbot Windows


On Nov. 27, 2018, 10:11 p.m., Joseph Wu wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/69463/
> ---
> 
> (Updated Nov. 27, 2018, 10:11 p.m.)
> 
> 
> Review request for mesos, Benno Evers, Gastón Kleiman, and Greg Mann.
> 
> 
> Bugs: MESOS-7564
> https://issues.apache.org/jira/browse/MESOS-7564
> 
> 
> Repository: mesos
> 
> 
> Description
> ---
> 
> These new messages are meant to be backwards compatible, in that
> they won't cause crashes when new executors send heartbeats to old
> agents, or new agents send heartbeats to old executors.  All recipients
> of these heartbeats are currently expected to ignore them, as their
> only purpose is to keep certain connections from being marked "stale"
> by network intermediaries.
> 
> 
> Diffs
> -
> 
>   include/mesos/executor/executor.proto 
> 1b5fa5dab6944a8649fb98447eeec7105495b879 
>   include/mesos/v1/executor/executor.proto 
> b2ef325cf6a72137854355d541889c7c6ae7c230 
> 
> 
> Diff: https://reviews.apache.org/r/69463/diff/1/
> 
> 
> Testing
> ---
> 
> cmake --build . --target mesos-protobufs
> 
> 
> Thanks,
> 
> Joseph Wu
> 
>



Re: Review Request 69157: Fixed handling for offer operation updates.

2018-11-27 Thread Greg Mann


> On Nov. 27, 2018, 11:40 p.m., Greg Mann wrote:
> > src/slave/slave.cpp
> > Line 8037 (original), 8037 (patched)
> > 
> >
> > Hmm it looks like Chun-Hung's comment was resolved, but this line still 
> > uses `update` instead of `operation`?

Whoops sorry for the double-comment.


- Greg


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/69157/#review210897
---


On Nov. 27, 2018, 5:59 p.m., Benjamin Bannier wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/69157/
> ---
> 
> (Updated Nov. 27, 2018, 5:59 p.m.)
> 
> 
> Review request for mesos, Chun-Hung Hsiao and Jan Schlicht.
> 
> 
> Bugs: MESOS-8403
> https://issues.apache.org/jira/browse/MESOS-8403
> 
> 
> Repository: mesos
> 
> 
> Description
> ---
> 
> The handling of offer operation updates introduced in `c946615ec6d`
> made use of an update's `latest_status` without making sure that any
> value was set. This could lead to situation where an uninitialized
> enum value was switched on which would have caused a fatal error at
> runtime.
> 
> This patch replaces uses of `latest_status` with `state` which does
> contain the information we care about. We also adjust the error
> logging so we log the value that lead to the error, not some other
> value.
> 
> 
> Diffs
> -
> 
>   src/slave/slave.cpp 858b78620e1ef33f3587d0bd95a684917aaf5bbb 
> 
> 
> Diff: https://reviews.apache.org/r/69157/diff/5/
> 
> 
> Testing
> ---
> 
> `make check`
> 
> 
> Thanks,
> 
> Benjamin Bannier
> 
>



Re: Review Request 69157: Fixed handling for offer operation updates.

2018-11-27 Thread Greg Mann

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/69157/#review210897
---




src/slave/slave.cpp
Lines 8016-8037 (original), 8016-8037 (patched)


Hmm I think you may have mis-read Chun-Hung's comment here - I think he was 
suggesting that we use `operation->latest_status().state()` in both places?



src/slave/slave.cpp
Line 8037 (original), 8037 (patched)


Hmm it looks like Chun-Hung's comment was resolved, but this line still 
uses `update` instead of `operation`?


- Greg Mann


On Nov. 27, 2018, 5:59 p.m., Benjamin Bannier wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/69157/
> ---
> 
> (Updated Nov. 27, 2018, 5:59 p.m.)
> 
> 
> Review request for mesos, Chun-Hung Hsiao and Jan Schlicht.
> 
> 
> Bugs: MESOS-8403
> https://issues.apache.org/jira/browse/MESOS-8403
> 
> 
> Repository: mesos
> 
> 
> Description
> ---
> 
> The handling of offer operation updates introduced in `c946615ec6d`
> made use of an update's `latest_status` without making sure that any
> value was set. This could lead to situation where an uninitialized
> enum value was switched on which would have caused a fatal error at
> runtime.
> 
> This patch replaces uses of `latest_status` with `state` which does
> contain the information we care about. We also adjust the error
> logging so we log the value that lead to the error, not some other
> value.
> 
> 
> Diffs
> -
> 
>   src/slave/slave.cpp 858b78620e1ef33f3587d0bd95a684917aaf5bbb 
> 
> 
> Diff: https://reviews.apache.org/r/69157/diff/5/
> 
> 
> Testing
> ---
> 
> `make check`
> 
> 
> Thanks,
> 
> Benjamin Bannier
> 
>



Re: Review Request 69464: Made the `createTask` helper work for both v0 and v1 API.

2018-11-27 Thread Benjamin Mahler

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/69464/#review210915
---


Ship it!




Ship It!

- Benjamin Mahler


On Nov. 27, 2018, 11:05 p.m., Chun-Hung Hsiao wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/69464/
> ---
> 
> (Updated Nov. 27, 2018, 11:05 p.m.)
> 
> 
> Review request for mesos, Alexander Rukletsov, Benjamin Mahler, Greg Mann, 
> and Till Toenshoff.
> 
> 
> Bugs: MESOS-9419
> https://issues.apache.org/jira/browse/MESOS-9419
> 
> 
> Repository: mesos
> 
> 
> Description
> ---
> 
> The patched `createTask` helper originally uses the `slave_id` accessor,
> which does not apply to the v1 API. This patch fixes this problem.
> 
> 
> Diffs
> -
> 
>   src/tests/mesos.hpp 576f4bde88c069ee2fa0dd33912a034437338e7e 
> 
> 
> Diff: https://reviews.apache.org/r/69464/diff/1/
> 
> 
> Testing
> ---
> 
> make check
> 
> 
> Thanks,
> 
> Chun-Hung Hsiao
> 
>



Re: Review Request 69451: Fixed master crash when executors send messages to recovered frameworks.

2018-11-27 Thread Benjamin Mahler

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/69451/#review210914
---




src/master/master.hpp
Lines 2594-2609 (original), 2596-2619 (patched)


How about:

```
  if (!connected()) {
LOG(WARNING) << "Master attempting to send message to "
 << (recovered() ? "recovered" : "disconnected")
 << " framework " << *this;
 
// NOTE: We proceed here without returning to support the case where a
// `disconnected()` framework is still talking to the master and the
// master wants to shut it down by sending a `FrameworkErrorMessage`.
// This can occur in a one way link breakage where the master ->
// framework link is broken but the framework -> master link remains
// intact. Note that we don't have periodic heartbeating between master
// and pid-based schedulers.
//
// TODO(cshiao): Update the `FrameworkErrorMessage` call-sites that
// rely on the lack of a `return` here to directly call 
`process::send()`
// so that this function doesn't need to deal with the special case.
// Then we can check that if we're connected -> one of `http` or `pid`
// is set.
  }
  
  if (http.isSome()) {
if (!http->send(message)) {
  LOG(WARNING) << "Unable to send event to framework " << *this << ":"
   << " connection closed";
}
  } else if (pid.isSome()) {
master->send(pid.get(), message);
  }
```


- Benjamin Mahler


On Nov. 27, 2018, 11:01 p.m., Chun-Hung Hsiao wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/69451/
> ---
> 
> (Updated Nov. 27, 2018, 11:01 p.m.)
> 
> 
> Review request for mesos, Alexander Rukletsov, Benjamin Mahler, Greg Mann, 
> and Till Toenshoff.
> 
> 
> Bugs: MESOS-9419
> https://issues.apache.org/jira/browse/MESOS-9419
> 
> 
> Repository: mesos
> 
> 
> Description
> ---
> 
> The `Framework::send` function assumes that either `http` or `pid` is
> set, which is not true for a framework that hasn't yet reregistered yet
> but recovered from a reregistered agent. As a result, the master would
> crash when a recovered executor tries to send a message to such a
> framework (see MESOS-9419). This patch fixes this crash bug.
> 
> 
> Diffs
> -
> 
>   src/master/master.hpp 3b3c1a4e61de9503c8d038dd3bee623ded5914c9 
>   src/master/master.cpp b4b02d8b4d7d6d1aabda1f97b9bf824419f76a9e 
> 
> 
> Diff: https://reviews.apache.org/r/69451/diff/2/
> 
> 
> Testing
> ---
> 
> make check
> 
> 
> Thanks,
> 
> Chun-Hung Hsiao
> 
>



Re: Review Request 69307: Changed master to hold subscribers in a circular buffer.

2018-11-27 Thread Benno Evers

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/69307/#review210891
---



The basic approach still looks good to me, so this review mostly consists of 
the points we already discussed verbally, captured as issues so that readers in 
the future have a chance to reconstruct our thinking.


docs/configuration/master.md
Lines 434 (patched)


Maximum number **of** simultaneous subscribers



docs/configuration/master.md
Lines 435 (patched)


It is already implied here, but I would suggest to explicitly point out 
that older connections will be terminated by the master when the maximum number 
is reached.



src/master/constants.hpp
Lines 96 (patched)


While I don't have any experiments that suggest specific numbers, based on 
gut feeling alone this still seems like at least an order of magnitude too low: 
The situation where we have more active subscribers than this maximum is 
something we ideally never want to encounter during normal operation of mesos, 
so it is probably good to have a generous safety margin.

Also, I think there is some merit in the argument that we should try to 
avoid changing the default behaviour, so maybe we should think about setting 
the default value to `-1`/`std::string::npos`. (along with a comment in 
`boundedhashmap.hpp` that would elevate this from "working hack" to "officially 
supported API" ;)



src/master/master.cpp
Line 12130 (original), 12130 (patched)


We should probably check `subscribers.subscribed.size()` and print a 
warning if we're about to terminate a connection.



src/tests/api_tests.cpp
Lines 3688 (patched)


To be honest, I don't completely understand if this is creating some 
special "unclean shutdown" condition for the third connection, or if it is just 
the simplest way to force removal of the connection on the master side.

In the former case, this might be better suited for a separate unit test.

In the latter case, it seems like this some  missing API in Mesos (i.e. an 
`UNSUBSCRIBE` call), since intuitively this is something that *should* be easy 
to do from the clients perspective.


- Benno Evers


On Nov. 26, 2018, 11:40 p.m., Joseph Wu wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/69307/
> ---
> 
> (Updated Nov. 26, 2018, 11:40 p.m.)
> 
> 
> Review request for mesos, Benno Evers, Benjamin Mahler, Gastón Kleiman, and 
> Greg Mann.
> 
> 
> Bugs: MESOS-9258
> https://issues.apache.org/jira/browse/MESOS-9258
> 
> 
> Repository: mesos
> 
> 
> Description
> ---
> 
> This adds a flag (--max_event_stream_subscribers) to the master which
> controls how many active subscribers on the Master's event stream will
> be allowed at any time.  The default is 25 subscribers.
> 
> 
> Diffs
> -
> 
>   docs/configuration/master.md 575476728079d884fe86b1a3a56614647d1b572e 
>   src/master/constants.hpp 76ad0c3b1494cb97888af5545acc998b8fb15bf6 
>   src/master/flags.hpp ed2d76af2831d1b9718724936fd3e4850e99b332 
>   src/master/flags.cpp 2677738d19caa7d600e591aeb4fb37f0c2318d45 
>   src/master/master.hpp e77babf22126838c63cd05e483875c9beb3ac5ff 
>   src/master/master.cpp 1e326ec42a7f79a0835529a4655e7ec272f1cf40 
>   src/tests/api_tests.cpp fdd9f871f75617fc26a28679e2a1e41f506c6133 
> 
> 
> Diff: https://reviews.apache.org/r/69307/diff/1/
> 
> 
> Testing
> ---
> 
> ```
> make check
> 
> src/mesos-tests --gtest_filter="*MaxEventStreamSubscribers*" 
> --gtest_repeat=-1 --gtest_break_on_failure
> ```
> 
> 
> Thanks,
> 
> Joseph Wu
> 
>



Re: Review Request 69452: Added a test for executors sending messages to recovered frameworks.

2018-11-27 Thread Chun-Hung Hsiao

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/69452/
---

(Updated Nov. 27, 2018, 11:07 p.m.)


Review request for mesos, Alexander Rukletsov, Benjamin Mahler, Greg Mann, and 
Till Toenshoff.


Changes
---

Addressed BenM's comments, added a check for recovered framework, and fixed a 
test flake.


Bugs: MESOS-9419
https://issues.apache.org/jira/browse/MESOS-9419


Repository: mesos


Description
---

This patch adds the `ExecutorMessageToRecoveredHttpFramework` test to
ensure that a master will not crash when forwarding an executor message
to a recovered framework to prevent any future regression of MESOS-9419.


Diffs (updated)
-

  src/tests/master_tests.cpp 651bb9ba9298fc3d7179ed86487aa55be519ce12 


Diff: https://reviews.apache.org/r/69452/diff/2/

Changes: https://reviews.apache.org/r/69452/diff/1-2/


Testing
---

make check

This test will fail without r/69451.


Thanks,

Chun-Hung Hsiao



Review Request 69464: Made the `createTask` helper work for both v0 and v1 API.

2018-11-27 Thread Chun-Hung Hsiao

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/69464/
---

Review request for mesos, Alexander Rukletsov, Benjamin Mahler, Greg Mann, and 
Till Toenshoff.


Bugs: MESOS-9419
https://issues.apache.org/jira/browse/MESOS-9419


Repository: mesos


Description
---

The patched `createTask` helper originally uses the `slave_id` accessor,
which does not apply to the v1 API. This patch fixes this problem.


Diffs
-

  src/tests/mesos.hpp 576f4bde88c069ee2fa0dd33912a034437338e7e 


Diff: https://reviews.apache.org/r/69464/diff/1/


Testing
---

make check


Thanks,

Chun-Hung Hsiao



Re: Review Request 69451: Fixed master crash when executors send messages to recovered frameworks.

2018-11-27 Thread Chun-Hung Hsiao

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/69451/
---

(Updated Nov. 27, 2018, 11:01 p.m.)


Review request for mesos, Alexander Rukletsov, Benjamin Mahler, Greg Mann, and 
Till Toenshoff.


Changes
---

Addressed BenM's comments.


Bugs: MESOS-9419
https://issues.apache.org/jira/browse/MESOS-9419


Repository: mesos


Description (updated)
---

The `Framework::send` function assumes that either `http` or `pid` is
set, which is not true for a framework that hasn't yet reregistered yet
but recovered from a reregistered agent. As a result, the master would
crash when a recovered executor tries to send a message to such a
framework (see MESOS-9419). This patch fixes this crash bug.


Diffs (updated)
-

  src/master/master.hpp 3b3c1a4e61de9503c8d038dd3bee623ded5914c9 
  src/master/master.cpp b4b02d8b4d7d6d1aabda1f97b9bf824419f76a9e 


Diff: https://reviews.apache.org/r/69451/diff/2/

Changes: https://reviews.apache.org/r/69451/diff/1-2/


Testing
---

make check


Thanks,

Chun-Hung Hsiao



Review Request 69463: Added HEARTBEAT events and calls for the executor HTTP API.

2018-11-27 Thread Joseph Wu

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/69463/
---

Review request for mesos, Benno Evers, Gastón Kleiman, and Greg Mann.


Bugs: MESOS-7564
https://issues.apache.org/jira/browse/MESOS-7564


Repository: mesos


Description
---

These new messages are meant to be backwards compatible, in that
they won't cause crashes when new executors send heartbeats to old
agents, or new agents send heartbeats to old executors.  All recipients
of these heartbeats are currently expected to ignore them, as their
only purpose is to keep certain connections from being marked "stale"
by network intermediaries.


Diffs
-

  include/mesos/executor/executor.proto 
1b5fa5dab6944a8649fb98447eeec7105495b879 
  include/mesos/v1/executor/executor.proto 
b2ef325cf6a72137854355d541889c7c6ae7c230 


Diff: https://reviews.apache.org/r/69463/diff/1/


Testing
---

cmake --build . --target mesos-protobufs


Thanks,

Joseph Wu



Re: Review Request 69451: Fixed master crash when executors send messages to recovered frameworks.

2018-11-27 Thread Chun-Hung Hsiao


> On Nov. 27, 2018, 9:02 p.m., Benjamin Mahler wrote:
> > src/master/master.hpp
> > Line 2599 (original), 2600-2604 (patched)
> > 
> >
> > Hm.. based on this comment, should we only be sending in the 
> > disconnected+pid case?
> > 
> > Also, which messages is this referring to? It seems to suggest we let 
> > all messages through, but in the call sites we seem to guard this call to 
> > avoid calling it for the disconnected case?

The "message" refers to the `message` parameter of this function.

I was trying to avoid calling out what messages apply to this special case in 
this function, to avoid a misleading comment if it becomes outdated in the 
future.


> On Nov. 27, 2018, 9:02 p.m., Benjamin Mahler wrote:
> > src/master/master.hpp
> > Line 2609 (original), 2614 (patched)
> > 
> >
> > Do we want a similar log warning here for the else case?
> > 
> > ```
> > } else {
> >   LOG(WARNING) << "Unable to send event to framework " << *this << ":"
> ><< " framework is recovered but hasn't re-registered";
> > }
> > ```
> > 
> > Is this accurate or are there other cases? My read of the http/pid 
> > state comment is that it will be set based on the last connection, 
> > therefore the only time one of them won't be set is the recovered case (the 
> > change that broke the original one-being-set invariant).

Yes I believe this is the case. Let me add the warning.


- Chun-Hung


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/69451/#review210904
---


On Nov. 27, 2018, 4:58 a.m., Chun-Hung Hsiao wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/69451/
> ---
> 
> (Updated Nov. 27, 2018, 4:58 a.m.)
> 
> 
> Review request for mesos, Alexander Rukletsov, Benjamin Mahler, Greg Mann, 
> and Till Toenshoff.
> 
> 
> Bugs: MESOS-9419
> https://issues.apache.org/jira/browse/MESOS-9419
> 
> 
> Repository: mesos
> 
> 
> Description
> ---
> 
> The `Framework::send` function assumes that either `http` or `pid` is
> set, which is not true for a unregistered framework recovered from agent
> reregistration. As a result, the master would crash when a recovered
> executor tries to send a message to such a framework. This patch fixes
> this crash bug.
> 
> 
> Diffs
> -
> 
>   src/master/master.hpp 3b3c1a4e61de9503c8d038dd3bee623ded5914c9 
>   src/master/master.cpp b4b02d8b4d7d6d1aabda1f97b9bf824419f76a9e 
> 
> 
> Diff: https://reviews.apache.org/r/69451/diff/1/
> 
> 
> Testing
> ---
> 
> make check
> 
> 
> Thanks,
> 
> Chun-Hung Hsiao
> 
>



Re: Review Request 69452: Added a test for executors sending messages to recovered frameworks.

2018-11-27 Thread Benjamin Mahler


> On Nov. 27, 2018, 9:16 p.m., Benjamin Mahler wrote:
> > src/tests/master_tests.cpp
> > Lines 3850 (patched)
> > 
> >
> > Maybe a TODO to simplify this test by having a test scheduler that 
> > knows how to launch a mock executor?
> 
> Chun-Hung Hsiao wrote:
> I'm not sure if this is a good idea, because the default (composing or 
> mesos containerizer) would launch the executor in another linux process, 
> which would be impossible for mocking, unless we have another binary that 
> just forward all the requests back to the mock executor in the test process. 
> Using the test containerizer seems more simpler.

Oh I'm still suggesting we'd use the test containerizer, couldn't we have the 
mock executor test schduler use the test containerizer?

I'm just looking for a way to avoid manual subscription, task launching, and so 
on.


- Benjamin


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/69452/#review210905
---


On Nov. 27, 2018, 4:58 a.m., Chun-Hung Hsiao wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/69452/
> ---
> 
> (Updated Nov. 27, 2018, 4:58 a.m.)
> 
> 
> Review request for mesos, Alexander Rukletsov, Benjamin Mahler, Greg Mann, 
> and Till Toenshoff.
> 
> 
> Bugs: MESOS-9419
> https://issues.apache.org/jira/browse/MESOS-9419
> 
> 
> Repository: mesos
> 
> 
> Description
> ---
> 
> This patch adds the `ExecutorMessageToRecoveredHttpFramework` test to
> ensure that a master will not crash when forwarding an executor message
> to a recovered framework to prevent any future regression of MESOS-9419.
> 
> 
> Diffs
> -
> 
>   src/tests/master_tests.cpp 651bb9ba9298fc3d7179ed86487aa55be519ce12 
>   src/tests/mesos.hpp 576f4bde88c069ee2fa0dd33912a034437338e7e 
> 
> 
> Diff: https://reviews.apache.org/r/69452/diff/1/
> 
> 
> Testing
> ---
> 
> make check
> 
> This test will fail without r/69451.
> 
> 
> Thanks,
> 
> Chun-Hung Hsiao
> 
>



Re: Review Request 69452: Added a test for executors sending messages to recovered frameworks.

2018-11-27 Thread Chun-Hung Hsiao


> On Nov. 27, 2018, 9:16 p.m., Benjamin Mahler wrote:
> > src/tests/master_tests.cpp
> > Lines 3850 (patched)
> > 
> >
> > Maybe a TODO to simplify this test by having a test scheduler that 
> > knows how to launch a mock executor?

I'm not sure if this is a good idea, because the default (composing or mesos 
containerizer) would launch the executor in another linux process, which would 
be impossible for mocking, unless we have another binary that just forward all 
the requests back to the mock executor in the test process. Using the test 
containerizer seems more simpler.


> On Nov. 27, 2018, 9:16 p.m., Benjamin Mahler wrote:
> > src/tests/master_tests.cpp
> > Lines 3861 (patched)
> > 
> >
> > executorConnected?

This holds the `executor::Mesos*` object that we'll use later, and the 
`...Library` name is consistent to other tests where we name the object.


> On Nov. 27, 2018, 9:16 p.m., Benjamin Mahler wrote:
> > src/tests/master_tests.cpp
> > Lines 3974-3976 (patched)
> > 
> >
> > Maybe settle at the end anyway, so that we start calling destructors 
> > after the message finishes processing? Right now it will start after the 
> > message starts processing.

Do you mean pausing the clock and do a `Clock::settle()` before the test 
finishes? Yeah right now the teardown will start after the message starts 
processing, but the teardown process would wait for the master process to 
terminate.


> On Nov. 27, 2018, 9:16 p.m., Benjamin Mahler wrote:
> > src/tests/mesos.hpp
> > Line 907 (original), 919 (patched)
> > 
> >
> > just curious, what happened here? seems like this should be a different 
> > patch?

Sure let me split it into another patch. It's just that the common helper uses 
the `slave_id` accesser, which is not universal to v0/v1.


- Chun-Hung


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/69452/#review210905
---


On Nov. 27, 2018, 4:58 a.m., Chun-Hung Hsiao wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/69452/
> ---
> 
> (Updated Nov. 27, 2018, 4:58 a.m.)
> 
> 
> Review request for mesos, Alexander Rukletsov, Benjamin Mahler, Greg Mann, 
> and Till Toenshoff.
> 
> 
> Bugs: MESOS-9419
> https://issues.apache.org/jira/browse/MESOS-9419
> 
> 
> Repository: mesos
> 
> 
> Description
> ---
> 
> This patch adds the `ExecutorMessageToRecoveredHttpFramework` test to
> ensure that a master will not crash when forwarding an executor message
> to a recovered framework to prevent any future regression of MESOS-9419.
> 
> 
> Diffs
> -
> 
>   src/tests/master_tests.cpp 651bb9ba9298fc3d7179ed86487aa55be519ce12 
>   src/tests/mesos.hpp 576f4bde88c069ee2fa0dd33912a034437338e7e 
> 
> 
> Diff: https://reviews.apache.org/r/69452/diff/1/
> 
> 
> Testing
> ---
> 
> make check
> 
> This test will fail without r/69451.
> 
> 
> Thanks,
> 
> Chun-Hung Hsiao
> 
>



Re: Review Request 69452: Added a test for executors sending messages to recovered frameworks.

2018-11-27 Thread Benjamin Mahler

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/69452/#review210905
---


Ship it!




Thanks for the test! Have you run it in repetition to make sure it's not flaky?

Left some suggestions below.


src/tests/master_tests.cpp
Lines 3845 (patched)


s/frameworks that have not re-registered/recovered (but not yet 
re-registered) frameworks/



src/tests/master_tests.cpp
Lines 3846 (patched)


s/messages to their frameworks/executor->framework messages/



src/tests/master_tests.cpp
Lines 3848-3849 (patched)


Probably this could be a bit clearer: that in the pid case the agent 
bypasses the master and directly sends to the scheduler driver.



src/tests/master_tests.cpp
Lines 3850 (patched)


Maybe a TODO to simplify this test by having a test scheduler that knows 
how to launch a mock executor?



src/tests/master_tests.cpp
Lines 3861 (patched)


executorConnected?



src/tests/master_tests.cpp
Lines 3974-3976 (patched)


Maybe settle at the end anyway, so that we start calling destructors after 
the message finishes processing? Right now it will start after the message 
starts processing.



src/tests/mesos.hpp
Line 907 (original), 919 (patched)


just curious, what happened here? seems like this should be a different 
patch?


- Benjamin Mahler


On Nov. 27, 2018, 4:58 a.m., Chun-Hung Hsiao wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/69452/
> ---
> 
> (Updated Nov. 27, 2018, 4:58 a.m.)
> 
> 
> Review request for mesos, Alexander Rukletsov, Benjamin Mahler, Greg Mann, 
> and Till Toenshoff.
> 
> 
> Bugs: MESOS-9419
> https://issues.apache.org/jira/browse/MESOS-9419
> 
> 
> Repository: mesos
> 
> 
> Description
> ---
> 
> This patch adds the `ExecutorMessageToRecoveredHttpFramework` test to
> ensure that a master will not crash when forwarding an executor message
> to a recovered framework to prevent any future regression of MESOS-9419.
> 
> 
> Diffs
> -
> 
>   src/tests/master_tests.cpp 651bb9ba9298fc3d7179ed86487aa55be519ce12 
>   src/tests/mesos.hpp 576f4bde88c069ee2fa0dd33912a034437338e7e 
> 
> 
> Diff: https://reviews.apache.org/r/69452/diff/1/
> 
> 
> Testing
> ---
> 
> make check
> 
> This test will fail without r/69451.
> 
> 
> Thanks,
> 
> Chun-Hung Hsiao
> 
>



Re: Review Request 69451: Fixed master crash when executors send messages to recovered frameworks.

2018-11-27 Thread Benjamin Mahler

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/69451/#review210904
---



Thanks for jumping on this and producing a fix! Some small edit suggestions to 
the description:

```
The Framework::send function assumes that either http or pid is
set, which is not true for a unregistered framework recovered from agent  // 
s/unregistered//
reregistration. As a result, the master would crash when a recovered  // 
s/reregistration/that hasn't yet re-registered/
executor tries to send a message to such a framework. This patch fixes// 
s/framework/framework (see MESOS-).
this crash bug.
```


src/master/master.hpp
Line 2595 (original), 2597 (patched)


attempting?



src/master/master.hpp
Line 2599 (original), 2600-2604 (patched)


Hm.. based on this comment, should we only be sending in the 
disconnected+pid case?

Also, which messages is this referring to? It seems to suggest we let all 
messages through, but in the call sites we seem to guard this call to avoid 
calling it for the disconnected case?



src/master/master.hpp
Line 2609 (original), 2614 (patched)


Do we want a similar log warning here for the else case?

```
} else {
  LOG(WARNING) << "Unable to send event to framework " << *this << ":"
   << " framework is recovered but hasn't re-registered";
}
```

Is this accurate or are there other cases? My read of the http/pid state 
comment is that it will be set based on the last connection, therefore the only 
time one of them won't be set is the recovered case (the change that broke the 
original one-being-set invariant).


- Benjamin Mahler


On Nov. 27, 2018, 4:58 a.m., Chun-Hung Hsiao wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/69451/
> ---
> 
> (Updated Nov. 27, 2018, 4:58 a.m.)
> 
> 
> Review request for mesos, Alexander Rukletsov, Benjamin Mahler, Greg Mann, 
> and Till Toenshoff.
> 
> 
> Bugs: MESOS-9419
> https://issues.apache.org/jira/browse/MESOS-9419
> 
> 
> Repository: mesos
> 
> 
> Description
> ---
> 
> The `Framework::send` function assumes that either `http` or `pid` is
> set, which is not true for a unregistered framework recovered from agent
> reregistration. As a result, the master would crash when a recovered
> executor tries to send a message to such a framework. This patch fixes
> this crash bug.
> 
> 
> Diffs
> -
> 
>   src/master/master.hpp 3b3c1a4e61de9503c8d038dd3bee623ded5914c9 
>   src/master/master.cpp b4b02d8b4d7d6d1aabda1f97b9bf824419f76a9e 
> 
> 
> Diff: https://reviews.apache.org/r/69451/diff/1/
> 
> 
> Testing
> ---
> 
> make check
> 
> 
> Thanks,
> 
> Chun-Hung Hsiao
> 
>



Re: Review Request 69458: Persisted the operation updates we sent out.

2018-11-27 Thread Mesos Reviewbot Windows

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/69458/#review210898
---



PASS: Mesos patch 69458 was successfully built and tested.

Reviews applied: `['69157', '69458']`

All the build artifacts available at: 
http://dcos-win.westus2.cloudapp.azure.com/artifacts/mesos-reviewbot-testing/2654/mesos-review-69458

- Mesos Reviewbot Windows


On Nov. 27, 2018, 5:59 p.m., Benjamin Bannier wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/69458/
> ---
> 
> (Updated Nov. 27, 2018, 5:59 p.m.)
> 
> 
> Review request for mesos, Chun-Hung Hsiao, Gastón Kleiman, Greg Mann, and Jan 
> Schlicht.
> 
> 
> Repository: mesos
> 
> 
> Description
> ---
> 
> When the agent handles an `UpdateOperationStatusMessage` from a resource
> provider, it injects its own ID which is (at least conceptually) unknown
> to the resource provider before forwarding the message to the master,
> and also updates its own tracking for the operation.
> 
> This patch makes sure that we first mutate the message before handing it
> on for updating the internal operation tracking, while previously we
> used the unmodified message. Always using the same message reduces error
> potential if in future changes we e.g., introduce agent operation status
> update managers.
> 
> 
> Diffs
> -
> 
>   src/slave/slave.cpp 858b78620e1ef33f3587d0bd95a684917aaf5bbb 
> 
> 
> Diff: https://reviews.apache.org/r/69458/diff/1/
> 
> 
> Testing
> ---
> 
> `make check`
> 
> 
> Thanks,
> 
> Benjamin Bannier
> 
>



Re: Review Request 69157: Fixed handling for offer operation updates.

2018-11-27 Thread Benjamin Bannier


> On Nov. 20, 2018, 8:05 p.m., Chun-Hung Hsiao wrote:
> > Could you update the commit message accordingly?

Do you have a suggestion?


- Benjamin


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/69157/#review210737
---


On Nov. 27, 2018, 6:59 p.m., Benjamin Bannier wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/69157/
> ---
> 
> (Updated Nov. 27, 2018, 6:59 p.m.)
> 
> 
> Review request for mesos, Chun-Hung Hsiao and Jan Schlicht.
> 
> 
> Bugs: MESOS-8403
> https://issues.apache.org/jira/browse/MESOS-8403
> 
> 
> Repository: mesos
> 
> 
> Description
> ---
> 
> The handling of offer operation updates introduced in `c946615ec6d`
> made use of an update's `latest_status` without making sure that any
> value was set. This could lead to situation where an uninitialized
> enum value was switched on which would have caused a fatal error at
> runtime.
> 
> This patch replaces uses of `latest_status` with `state` which does
> contain the information we care about. We also adjust the error
> logging so we log the value that lead to the error, not some other
> value.
> 
> 
> Diffs
> -
> 
>   src/slave/slave.cpp 858b78620e1ef33f3587d0bd95a684917aaf5bbb 
> 
> 
> Diff: https://reviews.apache.org/r/69157/diff/5/
> 
> 
> Testing
> ---
> 
> `make check`
> 
> 
> Thanks,
> 
> Benjamin Bannier
> 
>



Re: Review Request 69163: Set agent and/or resource provider ID in operation status updates.

2018-11-27 Thread Benjamin Bannier


> On Nov. 20, 2018, 4:24 a.m., Chun-Hung Hsiao wrote:
> > src/master/master.cpp
> > Lines 8192-8201 (patched)
> > 
> >
> > We can get rid of this snippet and simply use `providerId`. Or, 
> > validate that `resourceProviderId` is always the same as `providerId`.
> > 
> > We should probably validate that both resources and operations have 
> > their resource provider id set properly in the resource provider manager, 
> > and issue an `ERROR` event back to the resource provider, instead of 
> > crashing the agent: 
> > https://github.com/apache/mesos/blob/master/src/resource_provider/manager.cpp#L907
> > I created https://issues.apache.org/jira/browse/MESOS-9407 to capture 
> > this. No need to address it in this patch.

Dropping since this is not an issue anymore in the updated code.


> On Nov. 20, 2018, 4:24 a.m., Chun-Hung Hsiao wrote:
> > src/master/master.cpp
> > Lines 8199 (patched)
> > 
> >
> > How about inlining this ternary operation into the use below so we can 
> > get rid of `resourceProviderId_`?

Ditto.


> On Nov. 20, 2018, 4:24 a.m., Chun-Hung Hsiao wrote:
> > src/resource_provider/manager.cpp
> > Lines 887-888 (patched)
> > 
> >
> > It seems not cohesive that we include both the agent ID and the 
> > resource provider ID when generating the `UpdateOperationStatusMessage` in 
> > SLRP, but drop them because `Call::UpdateOperationStatus` does not have 
> > corresponding fields, then:
> > 1. Recover the resource provider ID in the RP manager, and
> > 2. Recover the agent ID in `Slave::handleResourceProviderMessage`.
> > 
> > I was wondering if it is a good idea to add `slave_id` and 
> > `resource_provider_id` in `OperationStatus` instead, then generate 
> > `OperationStatusUpdateMessage` based on `OperationStatus` for backward 
> > compatibility.

I did the suggested change in the preceeding patch, dropping.


> On Nov. 20, 2018, 4:24 a.m., Chun-Hung Hsiao wrote:
> > src/resource_provider/storage/provider.cpp
> > Line 3079 (original), 3080-3081 (patched)
> > 
> >
> > We don't need to set up the slave ID and resource provider ID since 
> > dropped operations will not be bookkept and retried.

This piece of code is now gone, but I think it makes sense to provide semantics 
as consistent as possible to schedulers, so I'd prefer to set as many fields as 
possible, even if redundant.

Dropping.


> On Nov. 20, 2018, 4:24 a.m., Chun-Hung Hsiao wrote:
> > src/slave/slave.cpp
> > Lines 4386 (patched)
> > 
> >
> > How about inlining the ternary operation into its use below so we don't 
> > need this `resourceProviderId_`?

Reworked this code, droppping.


> On Nov. 20, 2018, 4:24 a.m., Chun-Hung Hsiao wrote:
> > src/tests/master_tests.cpp
> > Lines 9038 (patched)
> > 
> >
> > Since the resource provider manager is the one adding the resource 
> > provider ID, does it make sense to move this test to 
> > `resource_provider_manager_tests.cpp`?
> > 
> > If we make the decision to add `slave_id` and `resource_provider_id` in 
> > `OperationStatus` and make the SLRP generates them, we should move this 
> > test to `storage_local_resource_provider_tests.cpp`.

I reworked the code so now RPs inject RP IDs, and agents inject agent IDs. The 
test here really only tests the agent part, as we use a `MockResourceProvider`. 
We could probably move it to `slave_tests.cpp`, but at the same time we also 
test an master API.


- Benjamin


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/69163/#review210695
---


On Nov. 27, 2018, 7 p.m., Benjamin Bannier wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/69163/
> ---
> 
> (Updated Nov. 27, 2018, 7 p.m.)
> 
> 
> Review request for mesos, Chun-Hung Hsiao, Gastón Kleiman, and James DeFelice.
> 
> 
> Bugs: MESOS-9293
> https://issues.apache.org/jira/browse/MESOS-9293
> 
> 
> Repository: mesos
> 
> 
> Description
> ---
> 
> This patch sets agent and/or resource provider ID operation status
> update messages. This is not always possible, e.g., some operations
> might fail validation so that no corresponding IDs can be extracted.
> 
> Since operations failing validation are currently directly rejected by
> the master without going 

Re: Review Request 69162: Added agent and resource provider IDs to operation status messages.

2018-11-27 Thread Benjamin Bannier

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/69162/
---

(Updated Nov. 27, 2018, 7:03 p.m.)


Review request for mesos, Chun-Hung Hsiao, Gastón Kleiman, and James DeFelice.


Changes
---

Moved information into `OperationStatus` as suggested by Chun.


Bugs: MESOS-9293
https://issues.apache.org/jira/browse/MESOS-9293


Repository: mesos


Description
---

This patch add agent and resource provide IDs to
`UpdateOperationStatus` and `UpdateOperationStatusMessage`. With that
frameworks are able to reconcile enough information after failover to
construct operation acknowledgements.

We will add code to populate these fields in a follow-up patch.


Diffs (updated)
-

  include/mesos/mesos.proto 06a901d26693757edc653cd833d55aa42e4ff2c6 
  include/mesos/v1/mesos.proto 75cdb2889b2b645e23d9f5ab263ee63bf62b4221 
  src/common/type_utils.cpp ef13eae47b88efc15f1b2d00852b6387c2fffcbc 


Diff: https://reviews.apache.org/r/69162/diff/5/

Changes: https://reviews.apache.org/r/69162/diff/4-5/


Testing
---

`make check`


Thanks,

Benjamin Bannier



Review Request 69458: Persisted the operation updates we sent out.

2018-11-27 Thread Benjamin Bannier

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/69458/
---

Review request for mesos, Chun-Hung Hsiao, Gastón Kleiman, Greg Mann, and Jan 
Schlicht.


Repository: mesos


Description
---

When the agent handles an `UpdateOperationStatusMessage` from a resource
provider, it injects its own ID which is (at least conceptually) unknown
to the resource provider before forwarding the message to the master,
and also updates its own tracking for the operation.

This patch makes sure that we first mutate the message before handing it
on for updating the internal operation tracking, while previously we
used the unmodified message. Always using the same message reduces error
potential if in future changes we e.g., introduce agent operation status
update managers.


Diffs
-

  src/slave/slave.cpp 858b78620e1ef33f3587d0bd95a684917aaf5bbb 


Diff: https://reviews.apache.org/r/69458/diff/1/


Testing
---

`make check`


Thanks,

Benjamin Bannier



Re: Review Request 69162: Added agent and resource provider IDs to operation status messages.

2018-11-27 Thread Benjamin Bannier


> On Nov. 20, 2018, 2:57 a.m., Chun-Hung Hsiao wrote:
> > include/mesos/scheduler/scheduler.proto
> > Line 140 (original), 140 (patched)
> > 
> >
> > It seems more consistent with `TaskStatus` if we put `slave_id` and 
> > `resource_provider_id` in `OperationStatus`. What's the pros and cons of 
> > doing that instead?

We discussed this offline, and I moved the information into `TaskStatus`.

The gist of the discussion was that moving it into `OperationStatus` instead of 
`UpdateOperationStatus` introduces less coupling and imposes less future 
restrictions (just like for `TaskStatus`).


> On Nov. 20, 2018, 2:57 a.m., Chun-Hung Hsiao wrote:
> > src/messages/messages.cpp
> > Lines 146-149 (original), 156-162 (patched)
> > 
> >
> > How about the following:
> > ```
> >   if (update.has_resource_provider_id()) {
> > stream << " from resource provider " << 
> > update.resource_provider_id();
> >   }
> > 
> >   if (update.has_slave_id()) {
> > stream << " on agent " << update.slave_id();
> >   }
> > ```

Not an issue anymore.


- Benjamin


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/69162/#review210694
---


On Nov. 12, 2018, 9:49 p.m., Benjamin Bannier wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/69162/
> ---
> 
> (Updated Nov. 12, 2018, 9:49 p.m.)
> 
> 
> Review request for mesos, Chun-Hung Hsiao, Gastón Kleiman, and James DeFelice.
> 
> 
> Bugs: MESOS-9293
> https://issues.apache.org/jira/browse/MESOS-9293
> 
> 
> Repository: mesos
> 
> 
> Description
> ---
> 
> This patch add agent and resource provide IDs to
> `UpdateOperationStatus` and `UpdateOperationStatusMessage`. With that
> frameworks are able to reconcile enough information after failover to
> construct operation acknowledgements.
> 
> We will add code to populate these fields in a follow-up patch.
> 
> 
> Diffs
> -
> 
>   include/mesos/scheduler/scheduler.proto 
> f6a780a7b75878b9e74402a28a25bb868f7ac36f 
>   include/mesos/v1/scheduler/scheduler.proto 
> fcfec5e417463103e98dd6917722b4fde41cac7c 
>   src/internal/evolve.cpp aa60efed5a28ed4d847a4a27aa8e994cda82175d 
>   src/messages/messages.cpp dd8f60ecdbc06d10be1152bee1ddb65feaaf8fbb 
>   src/messages/messages.proto 41e6a8a2eab0ae7c2878c1d3286c5dea0eb68ed7 
> 
> 
> Diff: https://reviews.apache.org/r/69162/diff/4/
> 
> 
> Testing
> ---
> 
> `make check`
> 
> 
> Thanks,
> 
> Benjamin Bannier
> 
>



Re: Review Request 69157: Fixed handling for offer operation updates.

2018-11-27 Thread Benjamin Bannier

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/69157/
---

(Updated Nov. 27, 2018, 6:59 p.m.)


Review request for mesos, Chun-Hung Hsiao and Jan Schlicht.


Bugs: MESOS-8403
https://issues.apache.org/jira/browse/MESOS-8403


Repository: mesos


Description
---

The handling of offer operation updates introduced in `c946615ec6d`
made use of an update's `latest_status` without making sure that any
value was set. This could lead to situation where an uninitialized
enum value was switched on which would have caused a fatal error at
runtime.

This patch replaces uses of `latest_status` with `state` which does
contain the information we care about. We also adjust the error
logging so we log the value that lead to the error, not some other
value.


Diffs (updated)
-

  src/slave/slave.cpp 858b78620e1ef33f3587d0bd95a684917aaf5bbb 


Diff: https://reviews.apache.org/r/69157/diff/5/

Changes: https://reviews.apache.org/r/69157/diff/4-5/


Testing
---

`make check`


Thanks,

Benjamin Bannier



Re: Review Request 69163: Set agent and/or resource provider ID in operation status updates.

2018-11-27 Thread Benjamin Bannier

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/69163/
---

(Updated Nov. 27, 2018, 7 p.m.)


Review request for mesos, Chun-Hung Hsiao, Gastón Kleiman, and James DeFelice.


Bugs: MESOS-9293
https://issues.apache.org/jira/browse/MESOS-9293


Repository: mesos


Description
---

This patch sets agent and/or resource provider ID operation status
update messages. This is not always possible, e.g., some operations
might fail validation so that no corresponding IDs can be extracted.

Since operations failing validation are currently directly rejected by
the master without going through a status update manager, they are not
retried either. If a master status update manager for operations is
introduced at a later point it should be possible to forward
acknowledgements for updates to the master's update manager (no agent
ID, not resource provider ID).


Diffs (updated)
-

  src/common/protobuf_utils.hpp 1662125ed3e47b179ee32d08c1d3af75553066ba 
  src/common/protobuf_utils.cpp a45607eed4c4bae5010bcc3f3ffeabd6d911062a 
  src/master/master.cpp b4b02d8b4d7d6d1aabda1f97b9bf824419f76a9e 
  src/resource_provider/manager.cpp 6c81c430e9e1205d71982a7fa2bcd9aa15fc01b2 
  src/resource_provider/storage/provider.cpp 
c137fa4f13edc58d93c03a9dd32fdf9d38b38316 
  src/slave/slave.cpp 858b78620e1ef33f3587d0bd95a684917aaf5bbb 
  src/tests/master_tests.cpp 651bb9ba9298fc3d7179ed86487aa55be519ce12 
  src/tests/mesos.hpp 576f4bde88c069ee2fa0dd33912a034437338e7e 


Diff: https://reviews.apache.org/r/69163/diff/4/

Changes: https://reviews.apache.org/r/69163/diff/3-4/


Testing
---

`make check`


Thanks,

Benjamin Bannier



Re: Review Request 69449: Added the `DISCARD` blkio cgroup operation.

2018-11-27 Thread Gilbert Song

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/69449/#review210892
---


Ship it!




Ship It!

- Gilbert Song


On Nov. 26, 2018, 12:38 p.m., James Peach wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/69449/
> ---
> 
> (Updated Nov. 26, 2018, 12:38 p.m.)
> 
> 
> Review request for mesos, Gilbert Song, Jie Yu, Joseph Wu, and Qian Zhang.
> 
> 
> Repository: mesos
> 
> 
> Description
> ---
> 
> The Linux 4.19 kernel added a new `Discard` operation to the blkio
> cgroup statistics. Added this new operation so that the containerizer
> won't fail to parse blkio statistics on the latest kernels.
> 
> 
> Diffs
> -
> 
>   include/mesos/mesos.proto 06a901d26693757edc653cd833d55aa42e4ff2c6 
>   include/mesos/v1/mesos.proto 75cdb2889b2b645e23d9f5ab263ee63bf62b4221 
>   src/linux/cgroups.hpp f4cc2d829fe4414b36532d0d77730f97e00fcb9a 
>   src/linux/cgroups.cpp 45dd5cf6747c1af8b18805dd83fcf542131d5708 
>   src/slave/containerizer/mesos/isolators/cgroups/subsystems/blkio.cpp 
> fbf1fed783934b10f9f726152ede33d3dade52d0 
> 
> 
> Diff: https://reviews.apache.org/r/69449/diff/1/
> 
> 
> Testing
> ---
> 
> sudo make check (Fedora 29)
> 
> 
> Thanks,
> 
> James Peach
> 
>



Re: Review Request 69454: Added the autoconf `tar-pax` option.

2018-11-27 Thread Mesos Reviewbot Windows

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/69454/#review210886
---



PASS: Mesos patch 69454 was successfully built and tested.

Reviews applied: `['69454']`

All the build artifacts available at: 
http://dcos-win.westus2.cloudapp.azure.com/artifacts/mesos-reviewbot-testing/2653/mesos-review-69454

- Mesos Reviewbot Windows


On Nov. 27, 2018, 5:51 a.m., James Peach wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/69454/
> ---
> 
> (Updated Nov. 27, 2018, 5:51 a.m.)
> 
> 
> Review request for mesos, Benjamin Bannier and Till Toenshoff.
> 
> 
> Repository: mesos
> 
> 
> Description
> ---
> 
> The default tar format used in `make dist` is v7, which only supports
> paths of up to 99 bytes in length. This causes errors when building
> the CentOS RPM and adding files from 3rd party packages.
> 
> 
> Diffs
> -
> 
>   configure.ac c193adf93531170fb31a31c296e3d4ae1d0300b2 
> 
> 
> Diff: https://reviews.apache.org/r/69454/diff/1/
> 
> 
> Testing
> ---
> 
> Ran `support/packaging/centos/build-rpm-docker.sh` and observed no tar errors.
> 
> Without this change, this is what happens:
> ```
> smelt:centos jpeach$ ./build-rpm-docker.sh
> ...
> tardir=mesos-1.8.0 && ${TAR-tar} chof - "$tardir" | GZIP=--best gzip -c 
> >mesos-1.8.0.tar.gz
> tar: 
> mesos-1.8.0/3rdparty/libprocess/3rdparty/glog-0.3.3/vsprojects/logging_unittest/logging_unittest.vcproj:
>  file name is too long (max 99); not dumped
> tar: 
> mesos-1.8.0/3rdparty/libprocess/3rdparty/glog-0.3.3/vsprojects/logging_unittest_static/logging_unittest_static.vcproj:
>  file name is too long (max 99); not dumped
> ...
> ```
> 
> 
> Thanks,
> 
> James Peach
> 
>