Re: Review Request 59464: Add Fetcher task total and failed fetch metrics.

2017-06-21 Thread Joseph Wu

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/59464/#review178543
---


Ship it!




LGTM.  I can add the comment mentioned below before committing.


src/slave/containerizer/fetcher.hpp
Lines 327-328 (patched)


It's important enough to add a comment here about the counter resolution.  
This number will go up once per task, rather than once per artifact.

So if 1 task asks for 10 artifacts, the total number of fetches is 1.  If 1 
to 10 of those artifacts fail to fetch, the total number of failures is also 1.



src/slave/containerizer/fetcher.cpp
Lines 68-78 (original), 70-78 (patched)


This change isn't strictly related to the metrics, so it could be moved 
elsewhere.


- Joseph Wu


On June 9, 2017, 11:55 a.m., James Peach wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/59464/
> ---
> 
> (Updated June 9, 2017, 11:55 a.m.)
> 
> 
> Review request for mesos and Joseph Wu.
> 
> 
> Bugs: MESOS-7524
> https://issues.apache.org/jira/browse/MESOS-7524
> 
> 
> Repository: mesos
> 
> 
> Description
> ---
> 
> Add the Fetcher metrics to track the number of fetch requests
> sent to the Fetcher (`containerizer/fetcher/task_fetches_total`)
> and the number of errors reported by the Fetcher
> (`containerizer/fetcher/task_fetches_failed`).
> 
> 
> Diffs
> -
> 
>   src/slave/containerizer/fetcher.hpp 
> efeadbf4b7804ea4c1e443d1e5212e303796ace4 
>   src/slave/containerizer/fetcher.cpp 
> 770cad3e046e8a6d58b6bc9176eb7ecdbd340db4 
> 
> 
> Diff: https://reviews.apache.org/r/59464/diff/4/
> 
> 
> Testing
> ---
> 
> make check (Fedora 25)
> 
> 
> Thanks,
> 
> James Peach
> 
>



Re: Review Request 59891: Windows: Use `cmd /C exit 1` instead of `false`.

2017-06-21 Thread Joseph Wu

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/59891/#review178554
---


Ship it!




Ship It!

- Joseph Wu


On June 7, 2017, 12:50 p.m., Andrew Schwartzmeyer wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/59891/
> ---
> 
> (Updated June 7, 2017, 12:50 p.m.)
> 
> 
> Review request for mesos, Jeff Coffler, John Kordich, Joseph Wu, and Li Li.
> 
> 
> Bugs: MESOS-7638
> https://issues.apache.org/jira/browse/MESOS-7638
> 
> 
> Repository: mesos
> 
> 
> Description
> ---
> 
> This fixes tests that pass but emit error messages due to `false` not
> existing as a command on Windows.
> 
> 
> Diffs
> -
> 
>   src/tests/health_check_tests.cpp 7917a222daf6a8fbe2a5fe6657852e6ff6e699ba 
> 
> 
> Diff: https://reviews.apache.org/r/59891/diff/2/
> 
> 
> Testing
> ---
> 
> Trying to split out fixes I made in my fix-long-paths branch that can be 
> applied sooner, so this has been tested quite thoroughly but with other 
> patches.
> 
> 
> Thanks,
> 
> Andrew Schwartzmeyer
> 
>



Re: Review Request 59865: Fixed conversion warning in `duration.hpp`.

2017-06-21 Thread Joseph Wu

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/59865/#review178553
---


Ship it!




Ship It!

- Joseph Wu


On June 7, 2017, 1 p.m., Andrew Schwartzmeyer wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/59865/
> ---
> 
> (Updated June 7, 2017, 1 p.m.)
> 
> 
> Review request for mesos, Jeff Coffler, John Kordich, Joseph Wu, and Li Li.
> 
> 
> Bugs: MESOS-7459
> https://issues.apache.org/jira/browse/MESOS-7459
> 
> 
> Repository: mesos
> 
> 
> Description
> ---
> 
> While `ns()` and our other types are `int64_t`, the variables to which
> we're assigning are (by definition of `struct timeval`) a `time_t` or a
> `long` (depending on platform). We cast it because there is no
> alternative, and we otherwise generate hundreds of warnings. We cast to
> `decltype(...)` because the type is different depending on platform.
> 
> 
> Diffs
> -
> 
>   3rdparty/stout/include/stout/duration.hpp 
> 138993e25c51efc82d447fe82037bf3a5fe7fd91 
> 
> 
> Diff: https://reviews.apache.org/r/59865/diff/2/
> 
> 
> Testing
> ---
> 
> Built `stout-tests` on Windows and Linux, no more `duration.hpp` warnings.
> 
> 
> Thanks,
> 
> Andrew Schwartzmeyer
> 
>



Re: Review Request 59817: Changed "future discarded" log message from error to info.

2017-06-21 Thread Joseph Wu

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/59817/#review178551
---


Ship it!




Ship It!

- Joseph Wu


On June 5, 2017, 12:26 p.m., Andrew Schwartzmeyer wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/59817/
> ---
> 
> (Updated June 5, 2017, 12:26 p.m.)
> 
> 
> Review request for mesos and Joseph Wu.
> 
> 
> Repository: mesos
> 
> 
> Description
> ---
> 
> This log line is not usually a problem, and reporting it as an error can
> cause needless debugging.
> 
> 
> Diffs
> -
> 
>   3rdparty/libprocess/src/process.cpp 
> 4ff7448d171f39dbb8cbb81dd9bed136ad43d62d 
> 
> 
> Diff: https://reviews.apache.org/r/59817/diff/1/
> 
> 
> Testing
> ---
> 
> Rebuilt `libprocess-tests` on Windows.
> 
> 
> Thanks,
> 
> Andrew Schwartzmeyer
> 
>



Re: Review Request 59500: Added Windows ReviewBot launch script.

2017-06-21 Thread Joseph Wu

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/59500/#review178550
---


Ship it!




Needs some more newlines between comments (I'll add them), but otherwise LGTM.

- Joseph Wu


On June 5, 2017, 10:34 a.m., Andrew Schwartzmeyer wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/59500/
> ---
> 
> (Updated June 5, 2017, 10:34 a.m.)
> 
> 
> Review request for mesos, Jeff Coffler, Joseph Wu, and Li Li.
> 
> 
> Repository: mesos
> 
> 
> Description
> ---
> 
> This adds a support script used to run the Windows ReviewBot in a loop,
> while properly logging the output and providing a URL for the log.
> 
> 
> Diffs
> -
> 
>   support/mesos-reviewbot.ps1 PRE-CREATION 
>   support/verify-reviews.py 391bef5c15a7399f037e54600d1b13c9bd261811 
> 
> 
> Diff: https://reviews.apache.org/r/59500/diff/3/
> 
> 
> Testing
> ---
> 
> This is running the current Windows ReviewBot.
> 
> 
> Thanks,
> 
> Andrew Schwartzmeyer
> 
>



Re: Review Request 59466: Add metrics check to Fetcher tests.

2017-06-21 Thread Joseph Wu

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/59466/#review178546
---


Ship it!




Ship It!

- Joseph Wu


On June 9, 2017, 11:56 a.m., James Peach wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/59466/
> ---
> 
> (Updated June 9, 2017, 11:56 a.m.)
> 
> 
> Review request for mesos and Joseph Wu.
> 
> 
> Bugs: MESOS-7524
> https://issues.apache.org/jira/browse/MESOS-7524
> 
> 
> Repository: mesos
> 
> 
> Description
> ---
> 
> For each fetcher test, verify that the count of fetches and the count
> of errors is what we expect.
> 
> 
> Diffs
> -
> 
>   src/tests/fetcher_tests.cpp b4124158a0e92f289f0edc0c8cb9394350e1cbb5 
> 
> 
> Diff: https://reviews.apache.org/r/59466/diff/4/
> 
> 
> Testing
> ---
> 
> make check (Fedora 25)
> 
> 
> Thanks,
> 
> James Peach
> 
>



Re: Review Request 59467: Document new Fetcher metrics.

2017-06-21 Thread Joseph Wu

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/59467/#review178547
---


Ship it!




Ship It!

- Joseph Wu


On June 9, 2017, 11:56 a.m., James Peach wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/59467/
> ---
> 
> (Updated June 9, 2017, 11:56 a.m.)
> 
> 
> Review request for mesos and Joseph Wu.
> 
> 
> Bugs: MESOS-7524
> https://issues.apache.org/jira/browse/MESOS-7524
> 
> 
> Repository: mesos
> 
> 
> Description
> ---
> 
> Document new Fetcher metrics.
> 
> 
> Diffs
> -
> 
>   docs/monitoring.md cb2833642e7e41c03c98ea92f7300d156a216a2e 
> 
> 
> Diff: https://reviews.apache.org/r/59467/diff/4/
> 
> 
> Testing
> ---
> 
> None.
> 
> 
> Thanks,
> 
> James Peach
> 
>



Review Request 60290: Handle EINVAL from the capabilities version check.

2017-06-21 Thread James Peach

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/60290/
---

Review request for mesos, Alexander Rukletsov, Benjamin Bannier, and Jie Yu.


Repository: mesos


Description
---

capget will fail with the error EINVAL, and set the version field
to the kernel pre- ferred value of _LINUX_CAPABILITY_VERSION_?
when an unsupported version value is specified.

So if we get EINVAL, we can treat that as success because we can
continue to verify the capabilities version.


Diffs
-

  src/linux/capabilities.cpp b362759c06637a3a2b4359b95e86d6d3b87352dd 


Diff: https://reviews.apache.org/r/60290/diff/1/


Testing
---

make check (CentOS 6)


Thanks,

James Peach



Re: Review Request 59294: Optionally scale egress bandwidth with CPU.

2017-06-21 Thread Ian Downes

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/59294/
---

(Updated June 21, 2017, 1:48 p.m.)


Review request for mesos, Dmitry Zhuk, Ilya Pronin, and Jie Yu.


Changes
---

Include egress rate limit, burst rate limit, and burst size in 
ResourceStatistics.


Bugs: MESOS-7508
https://issues.apache.org/jira/browse/MESOS-7508


Repository: mesos


Description
---

Add support to isolators/port_mapping for optionally scaling egress bandwidth 
with CPU and with minimum and maximum limits.


Diffs (updated)
-

  include/mesos/mesos.proto e2502acc3542d64c2b2b13f8c0363a3c8372e20e 
  src/slave/containerizer/mesos/isolators/network/helper.cpp 
4ed879dca42f85fc9dd7638e763822cf10fa8405 
  src/slave/containerizer/mesos/isolators/network/port_mapping.hpp 
9d38289c7161d5e931053b587d115684ccc44c94 
  src/slave/containerizer/mesos/isolators/network/port_mapping.cpp 
cd008aaebcd42554a9a81d2b059269546f59c966 
  src/slave/flags.hpp e75c1b4227b443aedf445921b3f2108d930c112c 
  src/slave/flags.cpp c84aa6724170bba46bbe8410b71d42a1626e 
  src/tests/containerizer/port_mapping_tests.cpp 
16e015a8ac53a4aa5336b60c40228720b5f6910a 


Diff: https://reviews.apache.org/r/59294/diff/4/

Changes: https://reviews.apache.org/r/59294/diff/3-4/


Testing
---

# added a new test 
$ make check


Thanks,

Ian Downes



Review Request 60288: Fix ambient capability tests on older systems.

2017-06-21 Thread James Peach

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/60288/
---

Review request for mesos, Alexander Rukletsov and Jie Yu.


Repository: mesos


Description
---

If the kernel doesn't support ambinent capabilities we were leaving
stale capabilities in the ambient set from a precious check, causing
the later tests to fail.


Diffs
-

  src/tests/containerizer/capabilities_tests.cpp 
3070e1d57c2f211c46c0435349ac352bc875ee27 


Diff: https://reviews.apache.org/r/60288/diff/1/


Testing
---

make check (CentOS 6)


Thanks,

James Peach



Review Request 60287: Remove unnecessary test logging.

2017-06-21 Thread James Peach

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/60287/
---

Review request for mesos, Alexander Rukletsov and Jie Yu.


Repository: mesos


Description
---

The glog expectations already print the capability sets, so we don't
need to add them to the failure log again.


Diffs
-

  src/tests/containerizer/capabilities_tests.cpp 
3070e1d57c2f211c46c0435349ac352bc875ee27 


Diff: https://reviews.apache.org/r/60287/diff/1/


Testing
---

make check (CentOS 6)


Thanks,

James Peach



Re: Review Request 60284: Removed unused `convertResourceFormat` for `Operation`.

2017-06-21 Thread Neil Conway

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/60284/#review178537
---




src/common/resources_utils.cpp
Line 198 (original)


Also need to update the header file.


- Neil Conway


On June 21, 2017, 7:08 p.m., Michael Park wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/60284/
> ---
> 
> (Updated June 21, 2017, 7:08 p.m.)
> 
> 
> Review request for mesos and Neil Conway.
> 
> 
> Repository: mesos
> 
> 
> Description
> ---
> 
> Removed unused `convertResourceFormat` for `Operation`.
> 
> 
> Diffs
> -
> 
>   src/common/resources_utils.cpp 751194c5ff5c794c77be3bcf16892af39de6d4d9 
> 
> 
> Diff: https://reviews.apache.org/r/60284/diff/1/
> 
> 
> Testing
> ---
> 
> 
> Thanks,
> 
> Michael Park
> 
>



Re: Review Request 60283: Fixed the uses of `convertResourceFormat` with `upgradeResources`.

2017-06-21 Thread Neil Conway

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/60283/#review178536
---




src/master/master.cpp
Lines 4234 (patched)


Per offline discussion, naming this `upgradeResources_` vs validation 
errors as `error` is confusing. Call them both `error` and change the upgrade 
API?



src/master/master.cpp
Line 4302 (original), 4316 (patched)


Can we revert this formatting change?



src/master/master.cpp
Line 4592 (original), 4647 (patched)


Not yours, but using a reference here is strange.



src/master/master.cpp
Lines 4784 (patched)


If we can use `Option` for `upgradeResources`, can we avoid 
duplicating all this error handling code?


- Neil Conway


On June 21, 2017, 7:08 p.m., Michael Park wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/60283/
> ---
> 
> (Updated June 21, 2017, 7:08 p.m.)
> 
> 
> Review request for mesos and Neil Conway.
> 
> 
> Repository: mesos
> 
> 
> Description
> ---
> 
> Initially, it seemed like calling `convertResourceFormat` after
> operation validation seemed safe since the operation validation
> themselves performed `Resources::validate` within them.
> 
> However, the rest of the operation validation code relies on
> the fact that the resources have been validated, and uses
> functions such as `isDynamicallyReserved`. Since functions such as
> `isDynamicallyReserved` now requires "post-reservation-refinement"
> format, we must perform this conversion earlier.
> 
> In this patch, we use `upgradeResources` to perform resources
> validation __and__ convert the resources before going into the
> operation and task validation.
> 
> We really need a better plan for this going forward. MESOS-7702.
> 
> 
> Diffs
> -
> 
>   src/master/master.cpp ec594a8f4fa95e77fc38103c5561d1797fe2b133 
> 
> 
> Diff: https://reviews.apache.org/r/60283/diff/1/
> 
> 
> Testing
> ---
> 
> 
> Thanks,
> 
> Michael Park
> 
>



Re: Review Request 60282: Introduced `upgradeResources` which also validates the resources.

2017-06-21 Thread Neil Conway

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/60282/#review178535
---




src/common/resources_utils.cpp
Lines 271 (patched)


Should we put `validate` in the name, e.g., `validateAndUpgradeResources`? 
As currently named, the fact that it does validation is not obvious. Also, the 
current name suggests it is symmetric with `downgradeResources`, which is not 
quite true.



src/common/resources_utils.cpp
Lines 274 (patched)


Hmmm -- this means that we're going to validate most resources twice in 
most code paths, right? That seems unfortunate, although I guess it isn't easy 
to avoid.


- Neil Conway


On June 21, 2017, 7:07 p.m., Michael Park wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/60282/
> ---
> 
> (Updated June 21, 2017, 7:07 p.m.)
> 
> 
> Review request for mesos and Neil Conway.
> 
> 
> Repository: mesos
> 
> 
> Description
> ---
> 
> Validation code for operations, and tasks validated the resources as
> the first step, and assumed valid resources from then on. This means
> that it started to use functions such as `isDynamicallyReserved`.
> 
> However, since `isDynamicallyReserved` requires the resources to be
> in the "post-reservation-refinement" format, we must convert the
> resources before using those functions. For now, we introduce
> a `upgradeResources` abstraction which is called to validate and
> convert the resources before invoking operation / task validation.
> 
> 
> Diffs
> -
> 
>   src/common/resources_utils.hpp 4b033687fa7224ffa36523cb3ca366f4384fd4f8 
>   src/common/resources_utils.cpp 751194c5ff5c794c77be3bcf16892af39de6d4d9 
> 
> 
> Diff: https://reviews.apache.org/r/60282/diff/1/
> 
> 
> Testing
> ---
> 
> 
> Thanks,
> 
> Michael Park
> 
>



Re: Review Request 60280: Provide full path to the custom executor.

2017-06-21 Thread Aaron Wood via Review Board

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/60280/#review178534
---



Someone correct me if I'm wrong, but I don't think we need this anymore:
```
  if (launchInfo.has_working_directory()) {
Try chdir = os::chdir(launchInfo.working_directory());
if (chdir.isError()) {
  cerr << "Failed to chdir into current working directory "
   << "'" << launchInfo.working_directory() << "': "
   << chdir.error() << endl;
  exitWithStatus(EXIT_FAILURE);
}
  }
```
Everyone okay with me removing this?

- Aaron Wood


On June 21, 2017, 7:15 p.m., Aaron Wood wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/60280/
> ---
> 
> (Updated June 21, 2017, 7:15 p.m.)
> 
> 
> Review request for mesos, Jie Yu, James Peach, and Zhitao Li.
> 
> 
> Bugs: MESOS-7703
> https://issues.apache.org/jira/browse/MESOS-7703
> 
> 
> Repository: mesos
> 
> 
> Description
> ---
> 
> If a framework specifies use of its own executor and sets shell to false the 
> executor is never found. Additionally, the name of the binary is never passed 
> as an argument so executors making use of argv[0] will fail. This provides 
> the full path to the executor so that the `execvp` or `execvpe` is 
> successful. The name of the binary is also passed as the first argument for 
> cases where there is no shell used.
> 
> 
> Diffs
> -
> 
>   src/slave/containerizer/mesos/launch.cpp 162ca1c2e 
> 
> 
> Diff: https://reviews.apache.org/r/60280/diff/1/
> 
> 
> Testing
> ---
> 
> `cd build && cmake .. -DCMAKE_BUILD_TYPE=Release && make -j4`
> Also spun up a master and agent, connected and sent a task using the UCR 
> (both with and without the use of an OCI image) via our own framework, and 
> checked the sandbox to verify that things went accordingly.
> 
> 
> Thanks,
> 
> Aaron Wood
> 
>



Re: Review Request 60281: Added the adjusted `TaskInfo` into the `Operation` to be sent out.

2017-06-21 Thread Neil Conway

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/60281/#review178533
---



Can we write a unit test for this change?

- Neil Conway


On June 21, 2017, 7:06 p.m., Michael Park wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/60281/
> ---
> 
> (Updated June 21, 2017, 7:06 p.m.)
> 
> 
> Review request for mesos and Neil Conway.
> 
> 
> Repository: mesos
> 
> 
> Description
> ---
> 
> Since we drop invalid tasks on a `LAUNCH` operation, we send a new
> `Operation` with only the valid tasks appended. At the same time,
> we adjust the `TaskInfo` a little bit to perform validation, and
> to maintain backwards compatibility. In the end, we were appending
> the unadjusted task to the new `Operation` rather than the adjusted
> task. This patch changes this to append the adjusted task instead.
> 
> 
> Diffs
> -
> 
>   src/master/master.cpp ec594a8f4fa95e77fc38103c5561d1797fe2b133 
> 
> 
> Diff: https://reviews.apache.org/r/60281/diff/1/
> 
> 
> Testing
> ---
> 
> 
> Thanks,
> 
> Michael Park
> 
>



Review Request 60280: Provide full path to the custom executor.

2017-06-21 Thread Aaron Wood via Review Board

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/60280/
---

Review request for mesos, Jie Yu, James Peach, and Zhitao Li.


Bugs: MESOS-7703
https://issues.apache.org/jira/browse/MESOS-7703


Repository: mesos


Description
---

If a framework specifies use of its own executor and sets shell to false the 
executor is never found. Additionally, the name of the binary is never passed 
as an argument so executors making use of argv[0] will fail. This provides the 
full path to the executor so that the `execvp` or `execvpe` is successful. The 
name of the binary is also passed as the first argument for cases where there 
is no shell used.


Diffs
-

  src/slave/containerizer/mesos/launch.cpp 162ca1c2e 


Diff: https://reviews.apache.org/r/60280/diff/1/


Testing
---

`cd build && cmake .. -DCMAKE_BUILD_TYPE=Release && make -j4`
Also spun up a master and agent, connected and sent a task using the UCR (both 
with and without the use of an OCI image) via our own framework, and checked 
the sandbox to verify that things went accordingly.


Thanks,

Aaron Wood



Re: Review Request 59855: Set the fetcher cache size at construction time.

2017-06-21 Thread Joseph Wu

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/59855/#review178528
---


Ship it!




Ship It!

- Joseph Wu


On June 6, 2017, 1:20 p.m., James Peach wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/59855/
> ---
> 
> (Updated June 6, 2017, 1:20 p.m.)
> 
> 
> Review request for mesos and Joseph Wu.
> 
> 
> Bugs: MESOS-7524
> https://issues.apache.org/jira/browse/MESOS-7524
> 
> 
> Repository: mesos
> 
> 
> Description
> ---
> 
> Fixing the TODO, set the FetcherProcess::Cache size at construction time
> rather than updating it in a separate setter function.
> 
> 
> Diffs
> -
> 
>   src/slave/containerizer/fetcher.hpp 
> efeadbf4b7804ea4c1e443d1e5212e303796ace4 
>   src/slave/containerizer/fetcher.cpp 
> 770cad3e046e8a6d58b6bc9176eb7ecdbd340db4 
> 
> 
> Diff: https://reviews.apache.org/r/59855/diff/2/
> 
> 
> Testing
> ---
> 
> make check (Fedora 25)
> 
> 
> Thanks,
> 
> James Peach
> 
>



Review Request 60284: Removed unused `convertResourceFormat` for `Operation`.

2017-06-21 Thread Michael Park

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/60284/
---

Review request for mesos and Neil Conway.


Repository: mesos


Description
---

Removed unused `convertResourceFormat` for `Operation`.


Diffs
-

  src/common/resources_utils.cpp 751194c5ff5c794c77be3bcf16892af39de6d4d9 


Diff: https://reviews.apache.org/r/60284/diff/1/


Testing
---


Thanks,

Michael Park



Review Request 60283: Fixed the uses of `convertResourceFormat` with `upgradeResources`.

2017-06-21 Thread Michael Park

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/60283/
---

Review request for mesos and Neil Conway.


Repository: mesos


Description
---

Initially, it seemed like calling `convertResourceFormat` after
operation validation seemed safe since the operation validation
themselves performed `Resources::validate` within them.

However, the rest of the operation validation code relies on
the fact that the resources have been validated, and uses
functions such as `isDynamicallyReserved`. Since functions such as
`isDynamicallyReserved` now requires "post-reservation-refinement"
format, we must perform this conversion earlier.

In this patch, we use `upgradeResources` to perform resources
validation __and__ convert the resources before going into the
operation and task validation.

We really need a better plan for this going forward. MESOS-7702.


Diffs
-

  src/master/master.cpp ec594a8f4fa95e77fc38103c5561d1797fe2b133 


Diff: https://reviews.apache.org/r/60283/diff/1/


Testing
---


Thanks,

Michael Park



Review Request 60282: Introduced `upgradeResources` which also validates the resources.

2017-06-21 Thread Michael Park

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/60282/
---

Review request for mesos and Neil Conway.


Repository: mesos


Description
---

Validation code for operations, and tasks validated the resources as
the first step, and assumed valid resources from then on. This means
that it started to use functions such as `isDynamicallyReserved`.

However, since `isDynamicallyReserved` requires the resources to be
in the "post-reservation-refinement" format, we must convert the
resources before using those functions. For now, we introduce
a `upgradeResources` abstraction which is called to validate and
convert the resources before invoking operation / task validation.


Diffs
-

  src/common/resources_utils.hpp 4b033687fa7224ffa36523cb3ca366f4384fd4f8 
  src/common/resources_utils.cpp 751194c5ff5c794c77be3bcf16892af39de6d4d9 


Diff: https://reviews.apache.org/r/60282/diff/1/


Testing
---


Thanks,

Michael Park



Review Request 60281: Added the adjusted `TaskInfo` into the `Operation` to be sent out.

2017-06-21 Thread Michael Park

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/60281/
---

Review request for mesos and Neil Conway.


Repository: mesos


Description
---

Since we drop invalid tasks on a `LAUNCH` operation, we send a new
`Operation` with only the valid tasks appended. At the same time,
we adjust the `TaskInfo` a little bit to perform validation, and
to maintain backwards compatibility. In the end, we were appending
the unadjusted task to the new `Operation` rather than the adjusted
task. This patch changes this to append the adjusted task instead.


Diffs
-

  src/master/master.cpp ec594a8f4fa95e77fc38103c5561d1797fe2b133 


Diff: https://reviews.apache.org/r/60281/diff/1/


Testing
---


Thanks,

Michael Park



Re: Review Request 59854: Make additional Fetcher and FetcherProcess methods const.

2017-06-21 Thread Joseph Wu

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/59854/#review178524
---


Ship it!




Ship It!

- Joseph Wu


On June 6, 2017, 1:19 p.m., James Peach wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/59854/
> ---
> 
> (Updated June 6, 2017, 1:19 p.m.)
> 
> 
> Review request for mesos and Joseph Wu.
> 
> 
> Bugs: MESOS-7524
> https://issues.apache.org/jira/browse/MESOS-7524
> 
> 
> Repository: mesos
> 
> 
> Description
> ---
> 
> Make additional Fetcher and FetcherProcess methods const.
> 
> 
> Diffs
> -
> 
>   src/slave/containerizer/fetcher.hpp 
> efeadbf4b7804ea4c1e443d1e5212e303796ace4 
>   src/slave/containerizer/fetcher.cpp 
> 770cad3e046e8a6d58b6bc9176eb7ecdbd340db4 
> 
> 
> Diff: https://reviews.apache.org/r/59854/diff/2/
> 
> 
> Testing
> ---
> 
> make check (Fedora 25)
> 
> 
> Thanks,
> 
> James Peach
> 
>



Re: Review Request 60252: Fixed a bug that causes segfault in ProcessManager::finalize.

2017-06-21 Thread Joseph Wu

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/60252/#review178522
---


Ship it!




I agree this makes the finalization logic safer, but I don't see a way to add a 
`nullptr` to the `ProcessManager::processes` map.

`ProcessManager::spawn` is the only location where the `processes` map is 
inserted, and this method CHECK-fails if you give it a `nullptr`.  Two other 
locations use the map's `operator[]`, but those are guarded by a mutex so they 
will never empty-initialize a map entry.

Can you note how you ran into a segfault here?  It's more likely that the 
segfault was caused by a dereferencing a non-`nullptr` process that had already 
been deallocated (which would not be fixed by this review).

- Joseph Wu


On June 20, 2017, 4:16 p.m., Jiang Yan Xu wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/60252/
> ---
> 
> (Updated June 20, 2017, 4:16 p.m.)
> 
> 
> Review request for mesos and Joseph Wu.
> 
> 
> Repository: mesos
> 
> 
> Description
> ---
> 
> We don't need and shouldn't assume pointers in `processes` are
> non-nullptr.
> 
> 
> Diffs
> -
> 
>   3rdparty/libprocess/src/process.cpp 
> 7ce6d2b13baa68906e091a95c0dd58ee1ca2bc7d 
> 
> 
> Diff: https://reviews.apache.org/r/60252/diff/1/
> 
> 
> Testing
> ---
> 
> make check.
> 
> 
> Thanks,
> 
> Jiang Yan Xu
> 
>



Review Request 60279: Add constructor for ObjectApprover::Object.

2017-06-21 Thread Quinn Leng

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/60279/
---

Review request for mesos, Anand Mazumdar, Greg Mann, and Vinod Kone.


Bugs: MESOS-7630
https://issues.apache.org/jira/browse/MESOS-7630


Repository: mesos


Description
---

Add constructor for ObjectApprover::Object.

Add constructors for ObjectApprover::Object, making the caller side cleaner.
Only one case is ignored, it's in "src/slave/slave.cpp:6709", since it's 
setting parameters after the construction.


Diffs
-

  include/mesos/authorizer/authorizer.hpp 
95cbcf3f9a962a896c9c23e961f02eb16dff36f8 
  src/common/http.cpp 2f7718cbc2e449b4f7c89754e8f84eac2f3c35b6 
  src/master/http.cpp 4dd43fd7c3fb986f4eed78bce574b6d3af156b67 
  src/slave/http.cpp cbbc1dc27cc90bac8d48cbbc84266c3d87490a3c 


Diff: https://reviews.apache.org/r/60279/diff/1/


Testing
---

Passed "make check -j48"
No specific test case related.


Thanks,

Quinn Leng



Re: Review Request 58096: Added authorization for frameworks in /roles endpoint.

2017-06-21 Thread Jay Guo


> On May 17, 2017, 4:08 p.m., Adam B wrote:
> > src/master/http.cpp
> > Lines 3524 (patched)
> > 
> >
> > `futures` is an over-vague variable name, especially since neither are 
> > Futures by this point. Can we do better?

I'm running out of vocabulary here... two very different thing to be captured 
in one word. Maybe `collection`?


- Jay


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/58096/#review175213
---


On June 22, 2017, 2 a.m., Jay Guo wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/58096/
> ---
> 
> (Updated June 22, 2017, 2 a.m.)
> 
> 
> Review request for mesos, Adam B, Alexander Rojas, and Benjamin Mahler.
> 
> 
> Bugs: MESOS-7260
> https://issues.apache.org/jira/browse/MESOS-7260
> 
> 
> Repository: mesos
> 
> 
> Description
> ---
> 
> While /roles displays a list of frameworksIds that register with
> a role, it did NOT filter them based on VIEW_FRAMEWORK ACL, which
> impose a security risk. This patch fixed this issue by taking a
> frameworksApprover in `Master::Http::roles()` which is used to
> filter framework IDs.
> 
> 
> Diffs
> -
> 
>   src/master/http.cpp 4dd43fd7c3fb986f4eed78bce574b6d3af156b67 
> 
> 
> Diff: https://reviews.apache.org/r/58096/diff/8/
> 
> 
> Testing
> ---
> 
> see next patch in the chain.
> 
> 
> Thanks,
> 
> Jay Guo
> 
>



Re: Review Request 58099: Added authorization for frameworks in `GetRoles` v1 API.

2017-06-21 Thread Jay Guo


> On May 17, 2017, 4:29 p.m., Adam B wrote:
> > Seems like we're adding even more duplicate code into this v1 clone of 
> > `roles()`. Can you find a way to reduce the redundance?

OK, let me take look and may submit some follow-up patches for it.


- Jay


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/58099/#review175219
---


On June 22, 2017, 2:02 a.m., Jay Guo wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/58099/
> ---
> 
> (Updated June 22, 2017, 2:02 a.m.)
> 
> 
> Review request for mesos, Adam B, Alexander Rojas, and Benjamin Mahler.
> 
> 
> Bugs: MESOS-7260
> https://issues.apache.org/jira/browse/MESOS-7260
> 
> 
> Repository: mesos
> 
> 
> Description
> ---
> 
> Added authorization for frameworks in `GetRoles` v1 API.
> 
> 
> Diffs
> -
> 
>   src/master/http.cpp 4dd43fd7c3fb986f4eed78bce574b6d3af156b67 
>   src/tests/api_tests.cpp cdaa72462eac1687185115771265539d8c2b09a9 
> 
> 
> Diff: https://reviews.apache.org/r/58099/diff/7/
> 
> 
> Testing
> ---
> 
> 
> Thanks,
> 
> Jay Guo
> 
>



Re: Review Request 58099: Added authorization for frameworks in `GetRoles` v1 API.

2017-06-21 Thread Jay Guo

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/58099/
---

(Updated June 22, 2017, 2:02 a.m.)


Review request for mesos, Adam B, Alexander Rojas, and Benjamin Mahler.


Changes
---

rebase and address comments


Bugs: MESOS-7260
https://issues.apache.org/jira/browse/MESOS-7260


Repository: mesos


Description
---

Added authorization for frameworks in `GetRoles` v1 API.


Diffs (updated)
-

  src/master/http.cpp 4dd43fd7c3fb986f4eed78bce574b6d3af156b67 
  src/tests/api_tests.cpp cdaa72462eac1687185115771265539d8c2b09a9 


Diff: https://reviews.apache.org/r/58099/diff/7/

Changes: https://reviews.apache.org/r/58099/diff/6-7/


Testing
---


Thanks,

Jay Guo



Re: Review Request 58095: Refactored `Master::Http::roles` to use `jsonify`.

2017-06-21 Thread Jay Guo

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/58095/
---

(Updated June 22, 2017, 2:01 a.m.)


Review request for mesos, Adam B, Alexander Rojas, Benjamin Mahler, and Michael 
Park.


Changes
---

minor tweak


Bugs: MESOS-4732 and MESOS-7260
https://issues.apache.org/jira/browse/MESOS-4732
https://issues.apache.org/jira/browse/MESOS-7260


Repository: mesos


Description
---

Refactored `Master::Http::roles` to use `jsonify`.


Diffs (updated)
-

  src/common/http.hpp 93d6088e97c2384f9f6d26e010a501abf2deb43e 
  src/common/http.cpp 2f7718cbc2e449b4f7c89754e8f84eac2f3c35b6 
  src/master/http.cpp 4dd43fd7c3fb986f4eed78bce574b6d3af156b67 


Diff: https://reviews.apache.org/r/58095/diff/9/

Changes: https://reviews.apache.org/r/58095/diff/8-9/


Testing
---

no functional changes
make check


Thanks,

Jay Guo



Re: Review Request 58097: Added a test to check framework filtering in /roles endpoint.

2017-06-21 Thread Jay Guo

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/58097/
---

(Updated June 22, 2017, 2:01 a.m.)


Review request for mesos, Adam B, Alexander Rojas, and Benjamin Mahler.


Changes
---

rebase and address comments


Bugs: MESOS-7260
https://issues.apache.org/jira/browse/MESOS-7260


Repository: mesos


Description
---

Added a test to check framework filtering in /roles endpoint.


Diffs (updated)
-

  src/tests/master_authorization_tests.cpp 
8d54472d8a31956e6e8bb5fe7ffbf47dc793c0bc 


Diff: https://reviews.apache.org/r/58097/diff/7/

Changes: https://reviews.apache.org/r/58097/diff/6-7/


Testing
---

make check


Thanks,

Jay Guo



Re: Review Request 58096: Added authorization for frameworks in /roles endpoint.

2017-06-21 Thread Jay Guo

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/58096/
---

(Updated June 22, 2017, 2 a.m.)


Review request for mesos, Adam B, Alexander Rojas, and Benjamin Mahler.


Changes
---

rebase & address comments


Bugs: MESOS-7260
https://issues.apache.org/jira/browse/MESOS-7260


Repository: mesos


Description
---

While /roles displays a list of frameworksIds that register with
a role, it did NOT filter them based on VIEW_FRAMEWORK ACL, which
impose a security risk. This patch fixed this issue by taking a
frameworksApprover in `Master::Http::roles()` which is used to
filter framework IDs.


Diffs (updated)
-

  src/master/http.cpp 4dd43fd7c3fb986f4eed78bce574b6d3af156b67 


Diff: https://reviews.apache.org/r/58096/diff/8/

Changes: https://reviews.apache.org/r/58096/diff/7-8/


Testing
---

see next patch in the chain.


Thanks,

Jay Guo



Re: Review Request 58095: Refactored `Master::Http::roles` to use `jsonify`.

2017-06-21 Thread Jay Guo

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/58095/
---

(Updated June 22, 2017, 1:57 a.m.)


Review request for mesos, Adam B, Alexander Rojas, Benjamin Mahler, and Michael 
Park.


Changes
---

rebase and address comments


Bugs: MESOS-4732 and MESOS-7260
https://issues.apache.org/jira/browse/MESOS-4732
https://issues.apache.org/jira/browse/MESOS-7260


Repository: mesos


Description
---

Refactored `Master::Http::roles` to use `jsonify`.


Diffs (updated)
-

  src/common/http.hpp 93d6088e97c2384f9f6d26e010a501abf2deb43e 
  src/common/http.cpp 2f7718cbc2e449b4f7c89754e8f84eac2f3c35b6 
  src/master/http.cpp 4dd43fd7c3fb986f4eed78bce574b6d3af156b67 


Diff: https://reviews.apache.org/r/58095/diff/8/

Changes: https://reviews.apache.org/r/58095/diff/7-8/


Testing
---

no functional changes
make check


Thanks,

Jay Guo



Re: Review Request 59294: Optionally scale egress bandwidth with CPU.

2017-06-21 Thread Ian Downes

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/59294/
---

(Updated June 21, 2017, 10:27 a.m.)


Review request for mesos, Dmitry Zhuk, Ilya Pronin, and Jie Yu.


Changes
---

Use HTB class introduced in review 60203 to avoid shelling out. Included 
support for setting burst rate (ceil) and bucket size (cburst).
This is intentionally still in the model of an egress qdisc inside each 
container.


Bugs: MESOS-7508
https://issues.apache.org/jira/browse/MESOS-7508


Repository: mesos


Description
---

Add support to isolators/port_mapping for optionally scaling egress bandwidth 
with CPU and with minimum and maximum limits.


Diffs (updated)
-

  src/slave/containerizer/mesos/isolators/network/helper.cpp 
4ed879dca42f85fc9dd7638e763822cf10fa8405 
  src/slave/containerizer/mesos/isolators/network/port_mapping.hpp 
9d38289c7161d5e931053b587d115684ccc44c94 
  src/slave/containerizer/mesos/isolators/network/port_mapping.cpp 
cd008aaebcd42554a9a81d2b059269546f59c966 
  src/slave/flags.hpp e75c1b4227b443aedf445921b3f2108d930c112c 
  src/slave/flags.cpp c84aa6724170bba46bbe8410b71d42a1626e 
  src/tests/containerizer/port_mapping_tests.cpp 
16e015a8ac53a4aa5336b60c40228720b5f6910a 


Diff: https://reviews.apache.org/r/59294/diff/3/

Changes: https://reviews.apache.org/r/59294/diff/2-3/


Testing
---

# added a new test 
$ make check


Thanks,

Ian Downes



Re: Review Request 59746: Separated discarded and failed cases for container launch.

2017-06-21 Thread Alexander Rukletsov


> On June 2, 2017, 4:37 p.m., Jie Yu wrote:
> > src/slave/slave.cpp
> > Line 5147 (original), 5147 (patched)
> > 
> >
> > Can you explain to me in what scenario, the `future` will be in 
> > DISCARDED state? who discard the promise associated with this future?
> 
> Alexander Rukletsov wrote:
> Sure. Consider docker containerizer.
> 
> 1) During container launch, docker containerizer calls `pull()`: 
> https://github.com/apache/mesos/blob/9e61b3c7af35a29664361067c0bfa8b460bfefb9/src/slave/containerizer/docker.cpp#L1238
> 2) The container enters `PULLING` state: 
> https://github.com/apache/mesos/blob/9e61b3c7af35a29664361067c0bfa8b460bfefb9/src/slave/containerizer/docker.cpp#L435
> 3) While the image is being pulled by docker, future 
> `containers_[containerId]->pull` is returned from `pull()`: 
> https://github.com/apache/mesos/blob/9e61b3c7af35a29664361067c0bfa8b460bfefb9/src/slave/containerizer/docker.cpp#L446
> 4) This future is part of the `.then` chain returned from `_launch()`: 
> https://github.com/apache/mesos/blob/9e61b3c7af35a29664361067c0bfa8b460bfefb9/src/slave/containerizer/docker.cpp#L1269
> 5) Now while docker is pulling, `destroy()` is called, which discards the 
> "pulling future": 
> https://github.com/apache/mesos/blob/9e61b3c7af35a29664361067c0bfa8b460bfefb9/src/slave/containerizer/docker.cpp#L2126-L2128
> 6) But discarding that future is propagated up the chain: 
> https://github.com/apache/mesos/blob/9e61b3c7af35a29664361067c0bfa8b460bfefb9/3rdparty/libprocess/include/process/future.hpp#L1410-L1411
> 7) Which triggers the `onAny` callback attached to launch: 
> https://github.com/apache/mesos/blob/9e61b3c7af35a29664361067c0bfa8b460bfefb9/src/slave/slave.cpp#L2800-L2810
> 8) Which in turn gives us discarded future treated as launch error: 
> https://github.com/apache/mesos/blob/9e61b3c7af35a29664361067c0bfa8b460bfefb9/src/slave/slave.cpp#L5147-L5152
> 
> Jie Yu wrote:
> 
> https://github.com/apache/mesos/blob/9e61b3c7af35a29664361067c0bfa8b460bfefb9/src/slave/containerizer/docker.cpp#L2126-L2128
> 
> This discards the future, but not necessarily transition the future to 
> DISCARDED state. That's the reason we have `hasDiscard` and `isDiscarded` 
> methods for Future becaue they means different things. Can you point to me 
> where the promise associated with this future is actually being transitioned 
> into DISCARDED state?

Sure. In this case, we discard pulling in case client discarded the future: 
https://github.com/apache/mesos/blob/9e61b3c7af35a29664361067c0bfa8b460bfefb9/src/docker/docker.cpp#L1512

Additionally, I've manually reproduced the issue 
(https://issues.apache.org/jira/browse/MESOS-7601)
```
./src/mesos-execute --master=192.99.40.208:5050 --containerizer=docker 
--docker_image=ubuntu:16.04 --name=pull-test --command="sleep 1000"
```
aborted right after the start when docker was pulling the image yielded the 
following verbose agent log:
```
I0621 12:59:22.271728 28980 fetcher.cpp:324] Starting to fetch URIs for 
container: e2227d2f-fb6e-4fba-b6b6-528d2da7b276, directory: 
/tmp/a/slaves/f123fa01-fe6e-45f4-b0a9-022b0fc3ce26-S0/frameworks/f123fa01-fe6e-45f4-b0a9-022b0fc3ce26-0003/executors/pull-test/runs/e2227d2f-fb6e-4fba-b6b6-528d2da7b276
I0621 12:59:22.272665 28989 docker.cpp:1352] Running docker -H 
unix:///var/run/docker.sock inspect ubuntu:16.04
I0621 12:59:22.420902 28990 docker.cpp:1426] Running docker -H 
unix:///var/run/docker.sock pull ubuntu:16.04
I0621 12:59:23.070950 28980 slave.cpp:3130] Asked to shut down framework 
f123fa01-fe6e-45f4-b0a9-022b0fc3ce26-0003 by master@192.99.40.208:5050
I0621 12:59:23.071007 28980 slave.cpp:3155] Shutting down framework 
f123fa01-fe6e-45f4-b0a9-022b0fc3ce26-0003
I0621 12:59:23.071146 28980 slave.cpp:5625] Shutting down executor 'pull-test' 
of framework f123fa01-fe6e-45f4-b0a9-022b0fc3ce26-0003
W0621 12:59:23.071171 28980 slave.hpp:732] Unable to send event to executor 
'pull-test' of framework f123fa01-fe6e-45f4-b0a9-022b0fc3ce26-0003: unknown 
connection type
I0621 12:59:28.072532 28984 slave.cpp:5698] Killing executor 'pull-test' of 
framework f123fa01-fe6e-45f4-b0a9-022b0fc3ce26-0003
I0621 12:59:28.072849 28985 docker.cpp:2125] Destroying container 
e2227d2f-fb6e-4fba-b6b6-528d2da7b276 in PULLING state
I0621 12:59:28.073074 28985 docker.cpp:149] 'docker -H 
unix:///var/run/docker.sock pull ubuntu:16.04' is being discarded
E0621 12:59:28.150388 28981 slave.cpp:5183] Container 
'e2227d2f-fb6e-4fba-b6b6-528d2da7b276' for executor 'pull-test' of framework 
f123fa01-fe6e-45f4-b0a9-022b0fc3ce26-0003 failed to start: future discarded
E0621 12:59:28.150698 28978 slave.cpp:5290] Termination of executor 'pull-test' 
of framework f123fa01-fe6e-45f4-b0a9-022b0fc3ce26-0003 failed: unknown container
W0621 12:59:28.150737 28985 composing.cpp:569] Attempted to destroy unknown 
container e2227d2f-fb6e-

Re: Review Request 60103: Changed variable name _ack to statusUpdateAck.

2017-06-21 Thread Jiang Yan Xu

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/60103/#review178509
---


Ship it!




In the future please always keep the review title up to date and fill out the 
testing done section, even if it's (e.g.,) "make check together with /r/56895/".

- Jiang Yan Xu


On June 20, 2017, 8:41 a.m., Megha Sharma wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/60103/
> ---
> 
> (Updated June 20, 2017, 8:41 a.m.)
> 
> 
> Review request for mesos, Neil Conway, Vinod Kone, and Jiang Yan Xu.
> 
> 
> Bugs: MESOS-6223
> https://issues.apache.org/jira/browse/MESOS-6223
> 
> 
> Repository: mesos
> 
> 
> Description
> ---
> 
> Changed variable name _ack to statusUpdateAck which
> correctly identifies with the purpose of this variable.
> 
> 
> Diffs
> -
> 
>   src/tests/slave_recovery_tests.cpp 38502584186793686f78ff5f4e03f36a3bf7ad1c 
> 
> 
> Diff: https://reviews.apache.org/r/60103/diff/4/
> 
> 
> Testing
> ---
> 
> 
> Thanks,
> 
> Megha Sharma
> 
>



Re: Review Request 60105: Added helper method recoverSlaveState.

2017-06-21 Thread Jiang Yan Xu

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/60105/#review178388
---



PTAL at these comments but I realize for this review it's probably easier if we 
sync on the high-level strategy first.


src/slave/slave.cpp
Lines 5798-5867 (original), 5798-5867 (patched)


So all of this work is only useful for recovering frameworks, it looks like 
we don't need to do them before we decide we actually are going to recover the 
frameworks (i.e., not continue as a new agent)?

This reasoning certainly also applies to the current HEAD but since we are 
donig refactor here, we should probably address this (in a separate preceding 
review perhaps)



src/slave/slave.cpp
Lines 5945 (patched)


An empty blank line between multi-line blocks of code like this. Here and 
elsewhere.



src/slave/slave.cpp
Lines 5945-5946 (patched)


This comment seems to be misplaced, why does the code here cares? It just 
processes the `slaveState` which could be optional regardless of what we have 
done for the reboot case.



src/slave/slave.cpp
Lines 5959 (patched)


Open `{` on the next line.



src/slave/slave.cpp
Line 5941 (original), 5960 (patched)


The indentation of the whole methods seems off?



src/slave/slave.cpp
Line 5950 (original), 5969-5974 (patched)


As suggested by related comments, we are not removing the symlink here so 
we shouldn't need to comment about it here. 

"This is being done ...": here the comment describes the low-level 
mechanics but perhaps we need a high level comment about "this" instead, which 
is that:

Prior to Mesos 1.4 we bypass the state recovery and start as a new agent 
upon reboot (introduced in MESOS-844); starting in Mesos 1.4 we'll attempt to 
recover the slave state even after reboot (so we don't discard the agent ID 
unnecessarily) but in case of SlaveInfo mismatch we'll still continue as a new 
agent so we don't introduce a regression for the reboot scenario.

We can go on to explain the rationale: agent resource and attribute changes 
are more often associated with host reboots and we don't want to agent to flap 
in such cases.

But it's fine if we focus on improving the comments after we first get the 
code rigth.



src/slave/slave.cpp
Lines 5975 (patched)


So at this point you still have the `slaveState` object and it has a 
`rebooted` field, why do we need another `rebooted` field in `recoveryInfo` if 
this is the only place it's used?



src/slave/slave.cpp
Lines 5976-5983 (patched)


We shouldn't need to construct the same message twice. If the same message 
can be used either for LOG(INFO) or Failure then we can construct the message 
in a single place first and decide how to use it / whether to add to it.



src/slave/slave.cpp
Lines 5987-5989 (patched)


You don't need to do this here. If you continue as a new agent, it's going 
to update the latest symlink when it's registered:


https://github.com/apache/mesos/blob/3bea3fcb4eef00ce469ba4f27fcfd2d3eec05aea/src/slave/paths.cpp#L621-L622



src/slave/slave.cpp
Lines 5992 (patched)


Would a None() not work?

Also, I know I originally suggested a helper method but given how it looks 
right now, perhaps it's less of a hassle if we keep it inline?

Even if we keep it as a helper, as a one-use helper whose semantics is so 
tightly coupled with the internals of `Slave::recover()`, perhaps it could just 
be done as a lambda.

```
Try adjustSlaveState = [&slaveState]() { ... }();
```

Let's try to inline it first?



src/slave/slave.cpp
Lines 5993-5995 (patched)


You have to clear it here because it's assigned above prematurely. As the 
comment above suggests, it's a hack for comparing the slave infos.

Pehaps we should just make a copy of the SlaveInfo above for comparision. 
If we are going to continue as a new agent, we are just **not** going to 
execute `info = slaveState->info.get();` below?



src/slave/slave.cpp
Line 5965 (original)


Why remove it?



src/slave/slave.cpp
Lines 6122-6125 (original)


Why remove it?


- Jiang Yan Xu


On June 15, 2017, 12:31 p.m., Megha Sharma

<    1   2