Re: Review Request 51299: Fixed memory leak in master during framework teardown.

2016-08-23 Thread Mesos ReviewBot

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/51299/#review146616
---



Patch looks great!

Reviews applied: [51299]

Passed command: export OS='ubuntu:14.04' BUILDTOOL='autotools' COMPILER='gcc' 
CONFIGURATION='--verbose' ENVIRONMENT='GLOG_v=1 MESOS_VERBOSE=1'; 
./support/docker_build.sh

- Mesos ReviewBot


On Aug. 23, 2016, 10:49 p.m., Joseph Wu wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/51299/
> ---
> 
> (Updated Aug. 23, 2016, 10:49 p.m.)
> 
> 
> Review request for mesos, Anand Mazumdar, Greg Mann, and Artem Harutyunyan.
> 
> 
> Repository: mesos
> 
> 
> Description
> ---
> 
> The `process::http::Pipe` class leaks its underlying `shared_ptr`
> due to how the master (accidentally) causes the `shared_ptr` to hold
> a self-reference.
> 
> When the master receives a `SUBSCRIBE` call from an V1 HTTP API
> framework, the master creates a `process::http::Pipe` to manage the
> reading and writing in separate locations in the code.  For 
> synchronization, the read/write ends of the HTTP connection share
> some state (via `shared_ptr`).
> 
> The master introduces a self-reference via this line in
> `Master::addFramework`:
> ```
>   http.closed()
> .onAny(defer(self(), &Self::exited, framework->id(), http));
> ```
> 
> `http.closed()` returns a future managed by the read-end of the `Pipe`.
> `http` holds a copy of the write-end of the `Pipe`, which in turn has
> a copy of the `shared_ptr`.  Because `http` is passed into the future's 
> continuation, a copy of `http` will live inside the read-end's future 
> until the either (a) the continuation is executed or (b) the future 
> is destructed.
> 
> Due to how we manage the future, neither (a) nor (b) are performed:
> (a) When the read-end of the `Pipe` is closed, we only complete the
> future if the write-end of the `Pipe` is still open.  During
> framework teardown, the write-end is closed first.
> (b) The future in question lives inside a `Promise`, which lives
> inside the `shared_ptr`.  Because the self-reference exists, the
> `shared_ptr` is never destructed; which means the `Promise` and
> future are never destructed either.
> 
> ---
> 
> This patch fixes the leak by making sure the continuation is always
> executed (a) or discarded.  It does so by discarding the related 
> future when the write-end of the `Pipe` is already closed.
> 
> 
> Diffs
> -
> 
>   3rdparty/libprocess/src/http.cpp 298bd460afdeccd18e201d8a505b961fd4cf3d3e 
>   3rdparty/libprocess/src/tests/http_tests.cpp 
> 24b266df5f17e28e0c95326f5d1ea451952500e8 
> 
> Diff: https://reviews.apache.org/r/51299/diff/
> 
> 
> Testing
> ---
> 
> Found this leak in collaboration with Greg :)
> 
> Ran valgrind before applying this patch:
> ```
> LD_RUN_PATH=/path/to/mesos/build/src/.libs 
> LD_LIBRARY_PATH=/path/to/mesos/build/src/.libs valgrind --leak-check=full 
> src/.libs/mesos-tests --gtest_filter="*SchedulerTest.Teardown*"
> ```
> 
> Observed the following leak (among many false positives):
> ```
> 1,881 (216 direct, 1,665 indirect) bytes in 1 blocks are definitely lost in 
> loss record 2,405 of 2,507
>at 0x4C2A105: operator new(unsigned long) (in 
> /usr/lib64/valgrind/vgpreload_memcheck-amd64-linux.so)
>by 0xF259A9: process::http::Pipe::Pipe() (http.hpp:414)
>by 0x5D629A1: 
> mesos::internal::master::Master::Http::scheduler(process::http::Request 
> const&, Option const&) const (http.cpp:796)
>by 0x5DB5806: operator() (master.cpp:858)
>...
> ```
> 
> Applied the patch and re-ran valgrind.  Confirmed that leak is gone.
> 
> 
> Thanks,
> 
> Joseph Wu
> 
>



Re: Review Request 51342: Printed out the received stream ID in scheduler API.

2016-08-23 Thread Mesos ReviewBot

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/51342/#review146615
---



Patch looks great!

Reviews applied: [51342]

Passed command: export OS='ubuntu:14.04' BUILDTOOL='autotools' COMPILER='gcc' 
CONFIGURATION='--verbose' ENVIRONMENT='GLOG_v=1 MESOS_VERBOSE=1'; 
./support/docker_build.sh

- Mesos ReviewBot


On Aug. 23, 2016, 8:29 p.m., Abhishek Dasgupta wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/51342/
> ---
> 
> (Updated Aug. 23, 2016, 8:29 p.m.)
> 
> 
> Review request for mesos, Anand Mazumdar, Vinod Kone, and Zameer Manji.
> 
> 
> Bugs: MESOS-6041
> https://issues.apache.org/jira/browse/MESOS-6041
> 
> 
> Repository: mesos
> 
> 
> Description
> ---
> 
> This patch prints out the received stream ID in error output
> caused by Mesos-Stream-Id mismatch in calls to
> scheduler http api.
> Expected stream ID is not printed in error output as
> it may cause security leak.
> 
> 
> Diffs
> -
> 
>   src/master/http.cpp 52dd80b856cf2317c0b73ba54bf501696786088d 
> 
> Diff: https://reviews.apache.org/r/51342/diff/
> 
> 
> Testing
> ---
> 
> On Ubuntu 16.04:
> sudo make -j4 check
> 
> Manual testing as well.
> Steps to reproduce:
> 1. Run mesos master.
> 2. Run Mesos Slave.
> 3. Subscribe a framework with the command : curl -v  -H  "Accept: 
> application/json" -H "Content-Type: application/json" -d "@subscribe.json" 
> http://127.0.0.1:5050/api/v1/scheduler
> 
> subscribe.json :
> {
>"type"   : "SUBSCRIBE",
>"subscribe"  : {
>   "framework_info"  : {
> "user" :  "bar",
> "name" :  "Example HTTP Framework1"
>   }
>   }
> }
> 4. Accept an offer by following command : curl -v  -H  "Accept: 
> application/json" -H "Content-Type: application/json" -H "Mesos-Stream-Id: 
> 000a38b7-8a2a-4c8f-a374-f782d6ea617f" -d "@accept.json" 
> http://127.0.0.1:5050/api/v1/scheduler
> 
> accept.json:
> {
>   "framework_id": {
>   "value": "ee29ca2a-c56b-4a24-bba1-4919afae813e-"
>   },
>   "type": "ACCEPT",
>   "accept": {
>   "offer_ids": [{
>   "value": "369108e7-2bff-4a54-aca0-4f63b11a9361-O0"
>   }],
>   "operations": [{
>   "type": "LAUNCH",
>   "launch": {
>   "task_infos": [{
>   "name": "My Task",
>   "task_id": {
>   "value": 
> "12220-3440-12532-my-task"
>   },
>   "agent_id": {
>   "value": 
> "94d32bba-2bb4-461f-bd96-30d155fe5965-S0"
>   },
>   "executor": {
>   "command": {
>   "shell": true,
>   "value": "sleep 1000"
>   },
>   "executor_id": {
>   "value": 
> "12214-23523-my-executor"
>   }
>   },
>   "resources": [{
>   "name": "cpus",
>   "role": "*",
>   "type": "SCALAR",
>   "scalar": {
>   "value": 1.0
>   }
>   }, {
>   "name": "mem",
>   "role": "*",
>   "type": "SCALAR",
>   "scalar": {
>   "value": 128.0
>   }
>   }]
>   }]
>   }
>   }],
>   "filters": {
>   "refuse_seconds": 5.0
>   }
>   }
> }
> 
> Note: Mesos-Stream-Id passed through header in this curl command is left 
> mismatched to that received from the response to subscribe call.
> 5. Following error output is received as response: 
>The stream ID '000a38b7-8a2a-4c8f-a374-f782d6ea617f' included in this 
> re

Re: Review Request 51294: Removed unnecessary statements in `CgroupsIsolatorProcess::_watch`.

2016-08-23 Thread Mesos ReviewBot

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/51294/#review146609
---



Patch looks great!

Reviews applied: [51293, 51294]

Passed command: export OS='ubuntu:14.04' BUILDTOOL='autotools' COMPILER='gcc' 
CONFIGURATION='--verbose' ENVIRONMENT='GLOG_v=1 MESOS_VERBOSE=1'; 
./support/docker_build.sh

- Mesos ReviewBot


On Aug. 23, 2016, 1:59 p.m., haosdent huang wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/51294/
> ---
> 
> (Updated Aug. 23, 2016, 1:59 p.m.)
> 
> 
> Review request for mesos, Gilbert Song, Jie Yu, and Qian Zhang.
> 
> 
> Repository: mesos
> 
> 
> Description
> ---
> 
> Removed unnecessary statements in `CgroupsIsolatorProcess::_watch`.
> 
> 
> Diffs
> -
> 
>   src/slave/containerizer/mesos/isolators/cgroups/cgroups.cpp 
> ed9cff2f4879a7cb89f5b928fce379b663bcdcfb 
> 
> Diff: https://reviews.apache.org/r/51294/diff/
> 
> 
> Testing
> ---
> 
> 
> Thanks,
> 
> haosdent huang
> 
>



Re: Review Request 51293: Fixed a typo in the comment of `Subsystem::watch`.

2016-08-23 Thread Qian Zhang

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/51293/#review146608
---


Ship it!




Ship It!

- Qian Zhang


On Aug. 23, 2016, 9:58 p.m., haosdent huang wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/51293/
> ---
> 
> (Updated Aug. 23, 2016, 9:58 p.m.)
> 
> 
> Review request for mesos, Gilbert Song, Jie Yu, and Qian Zhang.
> 
> 
> Repository: mesos
> 
> 
> Description
> ---
> 
> Fixed a typo in the comment of `Subsystem::watch`.
> 
> 
> Diffs
> -
> 
>   src/slave/containerizer/mesos/isolators/cgroups/subsystem.hpp 
> 6dca38cfdd360288c03a7a01aebbf15e3cbc5dce 
> 
> Diff: https://reviews.apache.org/r/51293/diff/
> 
> 
> Testing
> ---
> 
> 
> Thanks,
> 
> haosdent huang
> 
>



Review Request 51358: Implemented recursive helper method findContainerDir for provisioner.

2016-08-23 Thread Gilbert Song

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/51358/
---

Review request for mesos, Benjamin Hindman, Artem Harutyunyan, Jie Yu, and 
Kevin Klues.


Bugs: MESOS-6067
https://issues.apache.org/jira/browse/MESOS-6067


Repository: mesos


Description
---

Implemented recursive helper method findContainerDir for provisioner.


Diffs
-

  src/slave/containerizer/mesos/provisioner/paths.hpp 
9829d6b52c8547ae22297a5bc47852ce5a219e4c 
  src/slave/containerizer/mesos/provisioner/paths.cpp 
86a45f30c22dc1e41f4779c0ce8c11d02dcc46bb 

Diff: https://reviews.apache.org/r/51358/diff/


Testing
---

make check


Thanks,

Gilbert Song



Review Request 51359: Added unit test for provisioner helper findContainerDir.

2016-08-23 Thread Gilbert Song

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/51359/
---

Review request for mesos, Benjamin Hindman, Artem Harutyunyan, Jie Yu, and 
Kevin Klues.


Bugs: MESOS-6067
https://issues.apache.org/jira/browse/MESOS-6067


Repository: mesos


Description
---

Added unit test for provisioner helper findContainerDir.


Diffs
-

  src/Makefile.am 8dc4175c60e4a9776ddb8ad21774fa4b30c28d00 
  src/tests/containerizer/provisioner_paths_tests.cpp PRE-CREATION 

Diff: https://reviews.apache.org/r/51359/diff/


Testing
---

make check


Thanks,

Gilbert Song



Re: Review Request 51317: Documented a bug with the use of `pendingTasks` in the master.

2016-08-23 Thread Guangya Liu


> On 八月 23, 2016, 8:45 a.m., Guangya Liu wrote:
> > src/master/master.cpp, line 3366
> > 
> >
> > I saw that this only occur if the task authorization failed for now, 
> > shall we highlight this in the comments here?
> 
> Benjamin Mahler wrote:
> Hm.. it can occur if authorization fails or if validation fails, why did 
> you mention only authorization failure?
> 
> We could update the code to avoid sending TASK_ERROR REASON_UNAUTHORIZED 
> when the task is not in the `pendingTasks` map, since we know that it is 
> either killed or invalid, but this seemed not worth it since it only half-way 
> fixes the issue.
> 
> Guangya Liu wrote:
> Yes, but in `_accept`, we only send out `TASK_ERROR` when authorization 
> failed but did not handle the case if `validation failed`, what about put 
> `validation failed` in `TODO`?

We already have a `TODO` in `_accept`, I think it is good enough to handle the 
question here. Thanks.


- Guangya


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/51317/#review146492
---


On 八月 23, 2016, 5:11 a.m., Benjamin Mahler wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/51317/
> ---
> 
> (Updated 八月 23, 2016, 5:11 a.m.)
> 
> 
> Review request for mesos and Vinod Kone.
> 
> 
> Repository: mesos
> 
> 
> Description
> ---
> 
> The use of `pendingTasks` cannot distinguish between a duplicate
> TaskID and a task that has been killed while pending. This means
> that if an invalid or unauthorized task is killed while pending,
> TASK_KILLED is sent, and once in Master::_accept, we will also
> send TASK_ERROR.
> 
> 
> Diffs
> -
> 
>   src/master/master.cpp d94a8510c4cee9c010706f79caf27ef4a10b41a8 
> 
> Diff: https://reviews.apache.org/r/51317/diff/
> 
> 
> Testing
> ---
> 
> N/A
> 
> 
> Thanks,
> 
> Benjamin Mahler
> 
>



Re: Review Request 51317: Documented a bug with the use of `pendingTasks` in the master.

2016-08-23 Thread Guangya Liu


> On 八月 23, 2016, 8:45 a.m., Guangya Liu wrote:
> > src/master/master.cpp, line 3362
> > 
> >
> > The current logic is that the task with duplicate ID will be ignored 
> > and will not be launched, so which task will trigger and where do we send 
> > out the TASK_ERROR message?
> > 
> > I saw that the TASK_ERROR message will only be sent out if the task is 
> > killed in `pendingTasks` and invalid.
> 
> Benjamin Mahler wrote:
> Hm.. I'm not sure I follow your comment. Maybe this will clarify things:
> 
> There are two reasons a task will not be contained in `pendingTasks` when 
> we arrive in `_accept()`:
> 
> 1. The task was killed while in `pendingTasks`.
> 2. The task had a duplicate TaskID with another task in `pendingTasks`.
> 
> So if a task is missing in `pendingTasks` we still have to validate it to 
> catch (2). This means we could have sent TASK_ERROR after TASK_KILLED if the 
> task was killed while in `pendingTasks` but is invalid or unauthorized.
> 
> There are a few ways to fix this, but they're all a bit clunky so I 
> decided on just documenting this a bit better for now.

Yes, I think that we are on the same page, and the only comment is shall we 
update the comment a bit for `(and TASK_ERROR will be sent).`, probably update 
this as `(and TASK_ERROR will be sent after TASK_KILLED if the task was killed 
while in pendingTasks but is invalid or unauthorized).` ?


> On 八月 23, 2016, 8:45 a.m., Guangya Liu wrote:
> > src/master/master.cpp, line 3366
> > 
> >
> > I saw that this only occur if the task authorization failed for now, 
> > shall we highlight this in the comments here?
> 
> Benjamin Mahler wrote:
> Hm.. it can occur if authorization fails or if validation fails, why did 
> you mention only authorization failure?
> 
> We could update the code to avoid sending TASK_ERROR REASON_UNAUTHORIZED 
> when the task is not in the `pendingTasks` map, since we know that it is 
> either killed or invalid, but this seemed not worth it since it only half-way 
> fixes the issue.

Yes, but in `_accept`, we only send out `TASK_ERROR` when authorization failed 
but did not handle the case if `validation failed`, what about put `validation 
failed` in `TODO`?


> On 八月 23, 2016, 8:45 a.m., Guangya Liu wrote:
> > src/master/master.cpp, lines 3765-3767
> > 
> >
> > What about put more detail around line 3807 for `invalid` task case by 
> > adding a `TODO`?
> > 
> > ```
> > // TODO(bmahler): Validate the task. We may send TASK_ERROR
> > // after a TASK_KILLED if a task was killed (removed from
> > // `pendingTasks`) *and* the task is invalid here.
> > ```
> 
> Benjamin Mahler wrote:
> Hm.. this looks like a duplicate of the TODO here? Are you suggesting 
> moving it down? Since it covers both authorization and validation I opted to 
> keep it up higher.

I meant split the comments to two: one for `authorization failed` here and the 
other for `validation failed` in #3807, but I do not have strong intention on 
this as long as we can comment this clearly.


- Guangya


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/51317/#review146492
---


On 八月 23, 2016, 5:11 a.m., Benjamin Mahler wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/51317/
> ---
> 
> (Updated 八月 23, 2016, 5:11 a.m.)
> 
> 
> Review request for mesos and Vinod Kone.
> 
> 
> Repository: mesos
> 
> 
> Description
> ---
> 
> The use of `pendingTasks` cannot distinguish between a duplicate
> TaskID and a task that has been killed while pending. This means
> that if an invalid or unauthorized task is killed while pending,
> TASK_KILLED is sent, and once in Master::_accept, we will also
> send TASK_ERROR.
> 
> 
> Diffs
> -
> 
>   src/master/master.cpp d94a8510c4cee9c010706f79caf27ef4a10b41a8 
> 
> Diff: https://reviews.apache.org/r/51317/diff/
> 
> 
> Testing
> ---
> 
> N/A
> 
> 
> Thanks,
> 
> Benjamin Mahler
> 
>



Re: Review Request 51325: Removed unused function `os::dirent_size`.

2016-08-23 Thread Mesos ReviewBot

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/51325/#review146601
---



Patch looks great!

Reviews applied: [51324, 51325]

Passed command: export OS='ubuntu:14.04' BUILDTOOL='autotools' COMPILER='gcc' 
CONFIGURATION='--verbose' ENVIRONMENT='GLOG_v=1 MESOS_VERBOSE=1'; 
./support/docker_build.sh

- Mesos ReviewBot


On Aug. 23, 2016, 9:41 a.m., Neil Conway wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/51325/
> ---
> 
> (Updated Aug. 23, 2016, 9:41 a.m.)
> 
> 
> Review request for mesos and Alexander Rukletsov.
> 
> 
> Bugs: MESOS-6013
> https://issues.apache.org/jira/browse/MESOS-6013
> 
> 
> Repository: mesos
> 
> 
> Description
> ---
> 
> Removed unused function `os::dirent_size`.
> 
> 
> Diffs
> -
> 
>   3rdparty/stout/include/Makefile.am 1f2ee85846342b3f5d1c4f8dafda6c763f6ecdfa 
>   3rdparty/stout/include/stout/os/direntsize.hpp 
> 819f99a89862491e99873bdedc603317b91266b0 
>   3rdparty/stout/include/stout/os/posix/direntsize.hpp 
> 9d8f72eb607a288e77f92b39b91542ff5eb2fa21 
>   3rdparty/stout/include/stout/os/windows/direntsize.hpp 
> 7c8c7a06f478b3a80341a874494cff21f71fc397 
> 
> Diff: https://reviews.apache.org/r/51325/diff/
> 
> 
> Testing
> ---
> 
> make check
> 
> 
> Thanks,
> 
> Neil Conway
> 
>



Re: Review Request 51317: Documented a bug with the use of `pendingTasks` in the master.

2016-08-23 Thread Benjamin Mahler


> On Aug. 23, 2016, 8:45 a.m., Guangya Liu wrote:
> > src/master/master.cpp, line 3362
> > 
> >
> > The current logic is that the task with duplicate ID will be ignored 
> > and will not be launched, so which task will trigger and where do we send 
> > out the TASK_ERROR message?
> > 
> > I saw that the TASK_ERROR message will only be sent out if the task is 
> > killed in `pendingTasks` and invalid.

Hm.. I'm not sure I follow your comment. Maybe this will clarify things:

There are two reasons a task will not be contained in `pendingTasks` when we 
arrive in `_accept()`:

1. The task was killed while in `pendingTasks`.
2. The task had a duplicate TaskID with another task in `pendingTasks`.

So if a task is missing in `pendingTasks` we still have to validate it to catch 
(2). This means we could have sent TASK_ERROR after TASK_KILLED if the task was 
killed while in `pendingTasks` but is invalid or unauthorized.

There are a few ways to fix this, but they're all a bit clunky so I decided on 
just documenting this a bit better for now.


> On Aug. 23, 2016, 8:45 a.m., Guangya Liu wrote:
> > src/master/master.cpp, line 3364
> > 
> >
> > Just a question here: Why do you want to distinguish the difference 
> > between duplicate TaskID and getting killed while pending? What is the 
> > relatinonship of those two?

See my comment above about the 2 cases, hopefully that clarifies things?


> On Aug. 23, 2016, 8:45 a.m., Guangya Liu wrote:
> > src/master/master.cpp, line 3366
> > 
> >
> > I saw that this only occur if the task authorization failed for now, 
> > shall we highlight this in the comments here?

Hm.. it can occur if authorization fails or if validation fails, why did you 
mention only authorization failure?

We could update the code to avoid sending TASK_ERROR REASON_UNAUTHORIZED when 
the task is not in the `pendingTasks` map, since we know that it is either 
killed or invalid, but this seemed not worth it since it only half-way fixes 
the issue.


> On Aug. 23, 2016, 8:45 a.m., Guangya Liu wrote:
> > src/master/master.cpp, lines 3765-3767
> > 
> >
> > What about put more detail around line 3807 for `invalid` task case by 
> > adding a `TODO`?
> > 
> > ```
> > // TODO(bmahler): Validate the task. We may send TASK_ERROR
> > // after a TASK_KILLED if a task was killed (removed from
> > // `pendingTasks`) *and* the task is invalid here.
> > ```

Hm.. this looks like a duplicate of the TODO here? Are you suggesting moving it 
down? Since it covers both authorization and validation I opted to keep it up 
higher.


> On Aug. 23, 2016, 8:45 a.m., Guangya Liu wrote:
> > src/master/master.cpp, line 3767
> > 
> >
> > Do we need to put `unauthorized` in `TODO`? I saw it aws already 
> > handled in line 3774 and 3793.

Yes, we may send TASK_ERROR for an unauthorized task after a TASK_KILLED was 
already sent, much like the validation case.


- Benjamin


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/51317/#review146492
---


On Aug. 23, 2016, 5:11 a.m., Benjamin Mahler wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/51317/
> ---
> 
> (Updated Aug. 23, 2016, 5:11 a.m.)
> 
> 
> Review request for mesos and Vinod Kone.
> 
> 
> Repository: mesos
> 
> 
> Description
> ---
> 
> The use of `pendingTasks` cannot distinguish between a duplicate
> TaskID and a task that has been killed while pending. This means
> that if an invalid or unauthorized task is killed while pending,
> TASK_KILLED is sent, and once in Master::_accept, we will also
> send TASK_ERROR.
> 
> 
> Diffs
> -
> 
>   src/master/master.cpp d94a8510c4cee9c010706f79caf27ef4a10b41a8 
> 
> Diff: https://reviews.apache.org/r/51317/diff/
> 
> 
> Testing
> ---
> 
> N/A
> 
> 
> Thanks,
> 
> Benjamin Mahler
> 
>



Re: Review Request 51321: Added a test to ensure the master handles launching task groups.

2016-08-23 Thread Guangya Liu


> On 八月 24, 2016, 12:41 a.m., Vinod Kone wrote:
> > src/tests/scheduler_tests.cpp, line 548
> > 
> >
> > ASSERT_EQ(runTaskGroupMessage->task_group(), 2);

I think that we are always putting the `expected value` as the first parameter, 
so would expect the following:

```
ASSERT_EQ(2, runTaskGroupMessage->task_group().tasks().size());
```


- Guangya


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/51321/#review146597
---


On 八月 23, 2016, 5:11 a.m., Benjamin Mahler wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/51321/
> ---
> 
> (Updated 八月 23, 2016, 5:11 a.m.)
> 
> 
> Review request for mesos and Vinod Kone.
> 
> 
> Bugs: MESOS-6045
> https://issues.apache.org/jira/browse/MESOS-6045
> 
> 
> Repository: mesos
> 
> 
> Description
> ---
> 
> For now this test ensures the message is sent to the agent
> in the successful launch path. More tests will be added to
> test the all-or-nothing paths (killed, invalid, unauthorized).
> 
> 
> Diffs
> -
> 
>   src/tests/scheduler_tests.cpp ccd0f2f602a7a1a9a44b68fd0f59bdc8e0fa58b1 
> 
> Diff: https://reviews.apache.org/r/51321/diff/
> 
> 
> Testing
> ---
> 
> make check
> 
> 
> Thanks,
> 
> Benjamin Mahler
> 
>



Re: Review Request 50523: Updated docker recovery to account for GPU resources.

2016-08-23 Thread Rajat Phull


> On Aug. 21, 2016, 3 p.m., Guangya Liu wrote:
> > src/slave/containerizer/docker.cpp, line 1021
> > 
> >
> > I'd prefer
> > 
> > ```
> > if (!deviceInspect.isSome()) {
> >   return Nothing();
> > }
> > 
> > other logic here...
> > 
> > ```

The method parseInspectDevices and deviceInspect.isSome will be dropped as the 
other commit gets squashed into this one.


> On Aug. 21, 2016, 3 p.m., Guangya Liu wrote:
> > src/slave/containerizer/docker.cpp, line 1065
> > 
> >
> > kill this

The method parseInspectDevices will be dropped as the other commit gets 
squashed into this commit.


> On Aug. 21, 2016, 3 p.m., Guangya Liu wrote:
> > src/slave/containerizer/docker.cpp, lines 1065-1099
> > 
> >
> > two spaces is enough

This method will be dropped as the commits are squashed


> On Aug. 21, 2016, 3 p.m., Guangya Liu wrote:
> > src/slave/containerizer/docker.cpp, line 1079
> > 
> >
> > s/Option>/Option>

This method will be dropped as the commits are squashed


> On Aug. 21, 2016, 3 p.m., Guangya Liu wrote:
> > src/slave/containerizer/docker.cpp, lines 1081-1082
> > 
> >
> > new line here

This method will be dropped as the commits are squashed


> On Aug. 21, 2016, 3 p.m., Guangya Liu wrote:
> > src/slave/containerizer/docker.cpp, lines 1081-1082
> > 
> >
> > new line

This method will be dropped as the commits are squashed


> On Aug. 21, 2016, 3 p.m., Guangya Liu wrote:
> > src/slave/containerizer/docker.cpp, line 1082
> > 
> >
> > I'd prefer
> > 
> > ```
> > if (!deviceJson.isSome()) {
> >   return None();
> > }
> > 
> > other logic...
> > ```

This method will be dropped as the commits are squashed


> On Aug. 21, 2016, 3 p.m., Guangya Liu wrote:
> > src/slave/containerizer/docker.cpp, line 1084
> > 
> >
> > s/std::vector/vector
> > 
> > s/.get()./->

This method will be dropped as the commits are squashed


> On Aug. 21, 2016, 3 p.m., Guangya Liu wrote:
> > src/slave/containerizer/docker.cpp, line 1085
> > 
> >
> > I'd prefer
> > 
> > ```
> > if (values.size() == 0) {
> >   return None();
> > }
> > ```

This method will be dropped as the commits are squashed


> On Aug. 21, 2016, 3 p.m., Guangya Liu wrote:
> > src/slave/containerizer/docker.cpp, lines 1088-1089
> > 
> >
> > ```
> > Result devicePath =
> >   value.as().find("PathOnHost");
> > ```

This method will be dropped as the commits are squashed


> On Aug. 21, 2016, 3 p.m., Guangya Liu wrote:
> > src/slave/containerizer/docker.cpp, lines 1089-1090
> > 
> >
> > new line

This method will be dropped as the commits are squashed


> On Aug. 21, 2016, 3 p.m., Guangya Liu wrote:
> > src/slave/containerizer/docker.cpp, line 1092
> > 
> >
> > s/.get()./->

This method will be dropped as the commits are squashed


- Rajat


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/50523/#review146261
---


On Aug. 24, 2016, 12:56 a.m., Rajat Phull wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/50523/
> ---
> 
> (Updated Aug. 24, 2016, 12:56 a.m.)
> 
> 
> Review request for mesos, Benjamin Mahler, Kevin Klues, Yubo Li, and Vikrama 
> Ditya.
> 
> 
> Bugs: MESOS-5795
> https://issues.apache.org/jira/browse/MESOS-5795
> 
> 
> Repository: mesos
> 
> 
> Description
> ---
> 
> Updated docker recovery to account for GPU resources.
> 
> 
> Diffs
> -
> 
>   src/slave/containerizer/docker.hpp f2a06065cf99fed934c2c1ffc47461ec8a97f50d 
>   src/slave/containerizer/docker.cpp 5c1ee8e467d1c54c60b67dc5275ef71e1bb90723 
>   src/tests/containerizer/docker_containerizer_tests.cpp 
> 42d4364b6fcbc94c7852721511001c103cb5a90d 
> 
> Diff: https://reviews.apache.org/r/50523/diff/
> 
> 
> T

Re: Review Request 50523: Updated docker recovery to account for GPU resources.

2016-08-23 Thread Rajat Phull

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/50523/
---

(Updated Aug. 24, 2016, 12:56 a.m.)


Review request for mesos, Benjamin Mahler, Kevin Klues, Yubo Li, and Vikrama 
Ditya.


Changes
---

Addressed first set of comments with the exception on one comment on about 
changing nvidiaGpuAllocator.get().allocate(gpus.get()). Investigating it 
further before making that change.


Bugs: MESOS-5795
https://issues.apache.org/jira/browse/MESOS-5795


Repository: mesos


Description
---

Updated docker recovery to account for GPU resources.


Diffs (updated)
-

  src/slave/containerizer/docker.hpp f2a06065cf99fed934c2c1ffc47461ec8a97f50d 
  src/slave/containerizer/docker.cpp 5c1ee8e467d1c54c60b67dc5275ef71e1bb90723 
  src/tests/containerizer/docker_containerizer_tests.cpp 
42d4364b6fcbc94c7852721511001c103cb5a90d 

Diff: https://reviews.apache.org/r/50523/diff/


Testing (updated)
---

GTEST_FILTER="DockerContainerizerTest.ROOT_NVIDIA_GPU_DOCKER_LaunchWithGpuRecovery"
 make -j check


Thanks,

Rajat Phull



Re: Review Request 51321: Added a test to ensure the master handles launching task groups.

2016-08-23 Thread Vinod Kone

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/51321/#review146597
---


Fix it, then Ship it!





src/tests/scheduler_tests.cpp (line 436)


'DEFAULT'



src/tests/scheduler_tests.cpp (line 548)


ASSERT_EQ(runTaskGroupMessage->task_group(), 2);


- Vinod Kone


On Aug. 23, 2016, 5:11 a.m., Benjamin Mahler wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/51321/
> ---
> 
> (Updated Aug. 23, 2016, 5:11 a.m.)
> 
> 
> Review request for mesos and Vinod Kone.
> 
> 
> Bugs: MESOS-6045
> https://issues.apache.org/jira/browse/MESOS-6045
> 
> 
> Repository: mesos
> 
> 
> Description
> ---
> 
> For now this test ensures the message is sent to the agent
> in the successful launch path. More tests will be added to
> test the all-or-nothing paths (killed, invalid, unauthorized).
> 
> 
> Diffs
> -
> 
>   src/tests/scheduler_tests.cpp ccd0f2f602a7a1a9a44b68fd0f59bdc8e0fa58b1 
> 
> Diff: https://reviews.apache.org/r/51321/diff/
> 
> 
> Testing
> ---
> 
> make check
> 
> 
> Thanks,
> 
> Benjamin Mahler
> 
>



Re: Review Request 51320: Implemented the LaunchGroup Offer::Operation in the master.

2016-08-23 Thread Vinod Kone

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/51320/#review146585
---


Fix it, then Ship it!





src/master/master.cpp (line 3257)


do you want to add a "messages_launch_task_groups" metric?



src/master/master.cpp (line 3916)


any tasks



src/master/master.cpp (line 3933)


move this to next line.



src/master/master.cpp (lines 3937 - 3939)


this could be `else {` ?



src/master/master.cpp (lines 4002 - 4004)


why is this comment here? move this to #3975 may be?



src/master/master.cpp (line 4014)


can you add a TODO here to do this killing in `killTask()` instead of here?



src/master/master.cpp (line 4026)


s/executor/agent/



src/master/master.cpp (line 4041)


s/Ok../Now/



src/master/master.cpp (line 4056)


implement the CHECK.



src/messages/messages.proto (line 306)


wrap in back ticks?


- Vinod Kone


On Aug. 23, 2016, 5:11 a.m., Benjamin Mahler wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/51320/
> ---
> 
> (Updated Aug. 23, 2016, 5:11 a.m.)
> 
> 
> Review request for mesos and Vinod Kone.
> 
> 
> Bugs: MESOS-6045
> https://issues.apache.org/jira/browse/MESOS-6045
> 
> 
> Repository: mesos
> 
> 
> Description
> ---
> 
> This operation is all-or-nothing, in that all tasks must be
> launched together. If the operation fails, all tasks will
> fail with TASK_ERROR and the appropriate GROUP reason.
> If a task was killed before delivery to the executor, all
> tasks are killed.
> 
> 
> Diffs
> -
> 
>   include/mesos/mesos.proto 53b6547281b23ce9f47c9f1a418b60667fb4f602 
>   include/mesos/v1/mesos.proto f6b59e156c92a26dd63b11bf403bdba677b9644b 
>   src/master/master.cpp d94a8510c4cee9c010706f79caf27ef4a10b41a8 
>   src/messages/messages.proto 7b5e24fb1e9baf09ce024daeca90745f380d4c2f 
> 
> Diff: https://reviews.apache.org/r/51320/diff/
> 
> 
> Testing
> ---
> 
> Added a test in the subsequent patch.
> 
> More tests will be added for all-or-nothing validation / authorization paths.
> 
> 
> Thanks,
> 
> Benjamin Mahler
> 
>



Review Request 51352: Make docker executor unversioned.

2016-08-23 Thread Yong Tang

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/51352/
---

Review request for mesos, Alexander Rukletsov, Anand Mazumdar, Qian Zhang, and 
Vinod Kone.


Bugs: MESOS-5227
https://issues.apache.org/jira/browse/MESOS-5227


Repository: mesos


Description
---

This fix updates the docker executor so that it is unversioned
(instead of v1).

This fix is part of MESOS-5227.


Diffs
-

  src/docker/executor.cpp 8d679cd33b6ddf3a5c11bb8c458a97b8809473ac 

Diff: https://reviews.apache.org/r/51352/diff/


Testing
---

make check (Ubuntu 14.04)


Thanks,

Yong Tang



Review Request 51351: Update HttpDockerExecutor with v1 API.

2016-08-23 Thread Yong Tang

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/51351/
---

Review request for mesos, Alexander Rukletsov, Anand Mazumdar, Qian Zhang, and 
Vinod Kone.


Bugs: MESOS-5227
https://issues.apache.org/jira/browse/MESOS-5227


Repository: mesos


Description
---

This patch is part of the MESOS-5227:
Implement HTTP Docker Executor that uses the Executor Library.


Diffs
-

  src/docker/executor.cpp 8d679cd33b6ddf3a5c11bb8c458a97b8809473ac 
  src/internal/devolve.hpp 90681ebe454e12e39b214248f7457931e32295dc 
  src/internal/devolve.cpp efcc5d6050c5cb2fb79a4d676e4427f02e03eae9 

Diff: https://reviews.apache.org/r/51351/diff/


Testing
---

make check (Ubuntu 14.04)


Thanks,

Yong Tang



Re: Review Request 51052: Make mesos-docker-execute understand cgroups_enable_cfs: WIP.

2016-08-23 Thread Mesos ReviewBot

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/51052/#review146592
---



Patch looks great!

Reviews applied: [51009, 51052]

Passed command: export OS='ubuntu:14.04' BUILDTOOL='autotools' COMPILER='gcc' 
CONFIGURATION='--verbose' ENVIRONMENT='GLOG_v=1 MESOS_VERBOSE=1'; 
./support/docker_build.sh

- Mesos ReviewBot


On Aug. 22, 2016, 10:24 p.m., Zhitao Li wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/51052/
> ---
> 
> (Updated Aug. 22, 2016, 10:24 p.m.)
> 
> 
> Review request for mesos, Gilbert Song, haosdent huang, and Jie Yu.
> 
> 
> Bugs: MESOS-2154
> https://issues.apache.org/jira/browse/MESOS-2154
> 
> 
> Repository: mesos
> 
> 
> Description
> ---
> 
> This fixes cpu quota for command executor (which runs outside
> of the docker container) by ensuing --cpu-quota flag to docker
> run.
> 
> Note: we have to add the boolean variable to `Docker` class
> because `Docker::run()` has reached the maximum argument length
> GMOCK can support.
> 
> 
> Diffs
> -
> 
>   src/docker/docker.hpp 9093371afc8ea792ba94f61c6875703e547ea6b0 
>   src/docker/docker.cpp e07c4aa723f57db7f8bc31d4af5d32a30ebf98f2 
>   src/docker/executor.hpp 7b63d784d6b8685912598b77fb38cf6e70646ae3 
>   src/docker/executor.cpp 8d679cd33b6ddf3a5c11bb8c458a97b8809473ac 
>   src/slave/containerizer/docker.cpp e447c58bd46ba080529e8f349347eccf5a54110a 
>   src/tests/mesos.cpp f5034f9f8de7040182e10f51be125a87b29fdd24 
> 
> Diff: https://reviews.apache.org/r/51052/diff/
> 
> 
> Testing
> ---
> 
> I am now able to make docker containers launched through mesos-execute have a 
> cpu quota.
> 
> Also making sure `make check` still works on mac os for the linux only flag.
> 
> 
> Thanks,
> 
> Zhitao Li
> 
>



Re: Review Request 51027: Track allocation candidates to bound allocator.

2016-08-23 Thread Guangya Liu

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/51027/#review146591
---



This patch really improved the performance a lot for allocator, I did some test 
with 6000 frameworks and 1000 agents (Add 6000 frameworks first then add 1000 
agents), the result is quite promising! Please take a look at the result of 
adding 1000 agents, the time decreased from `4.6 mins` to `49 secs`! It would 
help a lot for master failover with thousands of frameworks and agents.

Before fix:
```
[ RUN  ] 
SlaveAndFrameworkCount/HierarchicalAllocator_BENCHMARK_Test.DeclineOffers/7
Using 1000 agents and 6000 frameworks
Added 6000 frameworks in 702547us
Added 1000 agents in 4.6021463667mins
```
After fix:
```
[ RUN  ] 
SlaveAndFrameworkCount/HierarchicalAllocator_BENCHMARK_Test.DeclineOffers/7
Using 1000 agents and 6000 frameworks
Added 6000 frameworks in 89707us
Added 1000 agents in 49.351196secs
```

- Guangya Liu


On 八月 23, 2016, 8:49 a.m., Jacob Janco wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/51027/
> ---
> 
> (Updated 八月 23, 2016, 8:49 a.m.)
> 
> 
> Review request for mesos, Benjamin Mahler, Guangya Liu, James Peach, Klaus 
> Ma, and Jiang Yan Xu.
> 
> 
> Bugs: MESOS-3157
> https://issues.apache.org/jira/browse/MESOS-3157
> 
> 
> Repository: mesos
> 
> 
> Description
> ---
> 
> - Triggered allocations dispatch allocate() only
>   if there is no pending allocation in the queue.
> - Allocation candidates are accumulated and only
>   cleared when enqueued allocations are processed.
> - Batched allocations are handled synchronously.
> 
> - Carrying over work from https://reviews.apache.org/r/41658/ and added the 
> previous reviewers
> - Specifically, this patch introduces the boolean flag pendingAllocation, 
> which when set on event 
>   triggered allocations, will prevent additional no-op allocations: the flag 
> is cleared when 
>   the enqueued allocation is processed, subsequent event triggered 
> allocations will update a set
>   of allocation candidates rather than dispatching an additional allocate().
> 
> 
> Diffs
> -
> 
>   src/master/allocator/mesos/hierarchical.hpp 
> bdbc6d3b5b959990538f4e3b7b1a3b031d9aea05 
>   src/master/allocator/mesos/hierarchical.cpp 
> 234ef98529964a0b6d3f132426a6c8ccbb1263ee 
> 
> Diff: https://reviews.apache.org/r/51027/diff/
> 
> 
> Testing
> ---
> 
> make check
> 
> note: check without filters depends on https://reviews.apache.org/r/51028
> 
> With new benchmark https://reviews.apache.org/r/49617: 
> Sample output without 51027:
> [ RUN  ] 
> SlaveAndFrameworkCount/HierarchicalAllocator_BENCHMARK_Test.FrameworkFailover/22
> Using 1 agents and 3000 frameworks
> Added 3000 frameworks in 57251us
> Added 1 agents in 3.2134535333mins
> allocator settled after  1.6123603833mins
> [   OK ] 
> SlaveAndFrameworkCount/HierarchicalAllocator_BENCHMARK_Test.FrameworkFailover/22
>  (290578 ms)
> 
> Sample output with 51027:
> [ RUN  ] 
> SlaveAndFrameworkCount/HierarchicalAllocator_BENCHMARK_Test.FrameworkFailover/22
> Using 1 agents and 3000 frameworks
> Added 3000 frameworks in 39817us
> Added 1 agents in 3.2286054167mins
> allocator settled after  25.525654secs
> [   OK ] 
> SlaveAndFrameworkCount/HierarchicalAllocator_BENCHMARK_Test.FrameworkFailover/22
>  (220137 ms)
> 
> 
> Thanks,
> 
> Jacob Janco
> 
>



Re: Review Request 50836: Made add/subtract resource object as private method.

2016-08-23 Thread Guangya Liu


> On 八月 16, 2016, 11:44 p.m., Jiang Yan Xu wrote:
> > include/mesos/resources.hpp, line 59
> > 
> >
> > I think it would still work if you keep the `static` keyword right?
> > 
> > The `friend` keyword wouldn't allow you to specify `static` but here 
> > you can. 
> > 
> > Then in the cpp files you don't have to remove the `static` keyword on 
> > the definition.
> 
> Guangya Liu wrote:
> Thanks Jiang Yan, but it still does not work as your proposal by adding 
> `static` for the `convertJSON` definition in cpp file and forward 
> declarations in hpp file as the `friend Try 
> internal::convertJSON()` will still think this method is not a `static` 
> method and this will cause build error as following.
> 
> ```
> In file included from ../../src/common/resources_utils.cpp:21:
> In file included from ../../src/common/resources_utils.hpp:21:
> ../../include/mesos/resources.hpp:59:25: error: unused function 
> 'convertJSON' [-Werror,-Wunused-function]
>   static Try convertJSON(
> ^
> 1 error generated.
> make[2]: *** [common/libmesos_no_3rdparty_la-resources_utils.lo] Error 1
> ```
> 
> Another is that I have posted some comments in 
> https://reviews.apache.org/r/50568/ and proposing keep `add` as public, can 
> you please take a look and show your comments if any.
> 
> The following are comments cited from https://reviews.apache.org/r/50568/
> 
> ```
> @Ben and @Yan, I found another place where we can enhance for allocator 
> https://github.com/apache/mesos/blob/master/src/master/allocator/mesos/hierarchical.cpp#L1274
>  and this was only used by quota now, I think we can use add here. The 
> problem is that @Yan porposed to make the add as a private method and I've 
> posted a patch here https://reviews.apache.org/r/50836/diff/1#index_header , 
> but after a second thought, perhaps we can enable framework developers or 
> other users call this add API directly. As sometimes, the user may want to 
> traverse all Resources object and make some update for each Resource object 
> and then add those Resource again just like what we did in allocator for now. 
> So what about keep add as a public method?
> ```
> 
> Jiang Yan Xu wrote:
> My apologies. Looks like it's problematic to declare methods as static in 
> the header anyways. To keep the method internal to resources.cpp you can 
> define it as `inline` and keep the forward and friend declarations without 
> `inline` or `static`.
> 
> On the subject of keeping `add()/subtract()` public so some external code 
> can call them, I feel that this is a limitation of the current 
> Resource-related abstractions. If we keep the method public, we have to count 
> on the callers to know a lot about the internal implementation of `Resources` 
> and use it carefully. This is in general against information hiding and loose 
> coupling. 
> 
> Ideally if we have a C++ `Resource` abstraction that handles validation 
> and we know that a (non-proto) `Resource` object must be valid, we can make 
> `+=` validation free but until then, I understand that there are places in 
> Mesos codebase already with compromises like this. I think we need to weigh 
> the performance gains we can get against the drawback in the cleanliness of 
> the abstraction. We can put a hold on this review and keep the methods public 
> for now if you are thinking about using them to improve performance and 
> conducting benchmarks to compare. :)
> 
> What are your thoughts @BenM?
> 
> Guangya Liu wrote:
> Thanks Jiang Yan, yes, the `inline` works for this.
> 
> For keeping `add()/subtract()`, agree that we need a benchmark test for 
> this. The current 
> https://github.com/apache/mesos/blob/master/src/master/allocator/mesos/hierarchical.cpp#L1274
>  was only used by quota and there is no benchmark test related for quota now, 
> perhaps we need to add one such case for benchmark test, @Ben and @Jiang Yan, 
> what do you think?
> 
> Another point is from 
> https://github.com/apache/mesos/blob/master/src/master/allocator/mesos/hierarchical.cpp#L1283-L1295
>  , if we have thousandes of agents and some quotas, I think the performance 
> will improve much based on the test result of 
> https://reviews.apache.org/r/50557/
> 
> Guangya Liu wrote:
> Had some offline discussion with Ben, we will make those two methods 
> add/subtract as private for now as it only impacts quota related logic in 
> allocator. And we can add some other wrapper functions to expose those two 
> APIs if they are needed in future. Also the benchmark test for `Quota` is not 
> priority, so there is no need to add them for now.
> 
> Jiang Yan Xu wrote:
> Alright I'll commit this then. So you haven't found other places on the 
> critical path that loops through a Resources object and add to

Re: Review Request 50836: Made add/subtract resource object as private method.

2016-08-23 Thread Jiang Yan Xu


> On Aug. 16, 2016, 4:44 p.m., Jiang Yan Xu wrote:
> > include/mesos/resources.hpp, line 59
> > 
> >
> > I think it would still work if you keep the `static` keyword right?
> > 
> > The `friend` keyword wouldn't allow you to specify `static` but here 
> > you can. 
> > 
> > Then in the cpp files you don't have to remove the `static` keyword on 
> > the definition.
> 
> Guangya Liu wrote:
> Thanks Jiang Yan, but it still does not work as your proposal by adding 
> `static` for the `convertJSON` definition in cpp file and forward 
> declarations in hpp file as the `friend Try 
> internal::convertJSON()` will still think this method is not a `static` 
> method and this will cause build error as following.
> 
> ```
> In file included from ../../src/common/resources_utils.cpp:21:
> In file included from ../../src/common/resources_utils.hpp:21:
> ../../include/mesos/resources.hpp:59:25: error: unused function 
> 'convertJSON' [-Werror,-Wunused-function]
>   static Try convertJSON(
> ^
> 1 error generated.
> make[2]: *** [common/libmesos_no_3rdparty_la-resources_utils.lo] Error 1
> ```
> 
> Another is that I have posted some comments in 
> https://reviews.apache.org/r/50568/ and proposing keep `add` as public, can 
> you please take a look and show your comments if any.
> 
> The following are comments cited from https://reviews.apache.org/r/50568/
> 
> ```
> @Ben and @Yan, I found another place where we can enhance for allocator 
> https://github.com/apache/mesos/blob/master/src/master/allocator/mesos/hierarchical.cpp#L1274
>  and this was only used by quota now, I think we can use add here. The 
> problem is that @Yan porposed to make the add as a private method and I've 
> posted a patch here https://reviews.apache.org/r/50836/diff/1#index_header , 
> but after a second thought, perhaps we can enable framework developers or 
> other users call this add API directly. As sometimes, the user may want to 
> traverse all Resources object and make some update for each Resource object 
> and then add those Resource again just like what we did in allocator for now. 
> So what about keep add as a public method?
> ```
> 
> Jiang Yan Xu wrote:
> My apologies. Looks like it's problematic to declare methods as static in 
> the header anyways. To keep the method internal to resources.cpp you can 
> define it as `inline` and keep the forward and friend declarations without 
> `inline` or `static`.
> 
> On the subject of keeping `add()/subtract()` public so some external code 
> can call them, I feel that this is a limitation of the current 
> Resource-related abstractions. If we keep the method public, we have to count 
> on the callers to know a lot about the internal implementation of `Resources` 
> and use it carefully. This is in general against information hiding and loose 
> coupling. 
> 
> Ideally if we have a C++ `Resource` abstraction that handles validation 
> and we know that a (non-proto) `Resource` object must be valid, we can make 
> `+=` validation free but until then, I understand that there are places in 
> Mesos codebase already with compromises like this. I think we need to weigh 
> the performance gains we can get against the drawback in the cleanliness of 
> the abstraction. We can put a hold on this review and keep the methods public 
> for now if you are thinking about using them to improve performance and 
> conducting benchmarks to compare. :)
> 
> What are your thoughts @BenM?
> 
> Guangya Liu wrote:
> Thanks Jiang Yan, yes, the `inline` works for this.
> 
> For keeping `add()/subtract()`, agree that we need a benchmark test for 
> this. The current 
> https://github.com/apache/mesos/blob/master/src/master/allocator/mesos/hierarchical.cpp#L1274
>  was only used by quota and there is no benchmark test related for quota now, 
> perhaps we need to add one such case for benchmark test, @Ben and @Jiang Yan, 
> what do you think?
> 
> Another point is from 
> https://github.com/apache/mesos/blob/master/src/master/allocator/mesos/hierarchical.cpp#L1283-L1295
>  , if we have thousandes of agents and some quotas, I think the performance 
> will improve much based on the test result of 
> https://reviews.apache.org/r/50557/
> 
> Guangya Liu wrote:
> Had some offline discussion with Ben, we will make those two methods 
> add/subtract as private for now as it only impacts quota related logic in 
> allocator. And we can add some other wrapper functions to expose those two 
> APIs if they are needed in future. Also the benchmark test for `Quota` is not 
> priority, so there is no need to add them for now.

Alright I'll commit this then. So you haven't found other places on the 
critical path that loops through a Resources object and add to another 
Resources object?


-

Re: Review Request 51299: Fixed memory leak in master during framework teardown.

2016-08-23 Thread Joseph Wu

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/51299/
---

(Updated Aug. 23, 2016, 3:49 p.m.)


Review request for mesos, Anand Mazumdar, Greg Mann, and Artem Harutyunyan.


Changes
---

Realized that a call to `writer.fail(...)` would show the same leak if the 
master used it.  Change the patch to modify the reader's code instead of the 
writer's code.


Repository: mesos


Description (updated)
---

The `process::http::Pipe` class leaks its underlying `shared_ptr`
due to how the master (accidentally) causes the `shared_ptr` to hold
a self-reference.

When the master receives a `SUBSCRIBE` call from an V1 HTTP API
framework, the master creates a `process::http::Pipe` to manage the
reading and writing in separate locations in the code.  For 
synchronization, the read/write ends of the HTTP connection share
some state (via `shared_ptr`).

The master introduces a self-reference via this line in
`Master::addFramework`:
```
  http.closed()
.onAny(defer(self(), &Self::exited, framework->id(), http));
```

`http.closed()` returns a future managed by the read-end of the `Pipe`.
`http` holds a copy of the write-end of the `Pipe`, which in turn has
a copy of the `shared_ptr`.  Because `http` is passed into the future's 
continuation, a copy of `http` will live inside the read-end's future 
until the either (a) the continuation is executed or (b) the future 
is destructed.

Due to how we manage the future, neither (a) nor (b) are performed:
(a) When the read-end of the `Pipe` is closed, we only complete the
future if the write-end of the `Pipe` is still open.  During
framework teardown, the write-end is closed first.
(b) The future in question lives inside a `Promise`, which lives
inside the `shared_ptr`.  Because the self-reference exists, the
`shared_ptr` is never destructed; which means the `Promise` and
future are never destructed either.

---

This patch fixes the leak by making sure the continuation is always
executed (a) or discarded.  It does so by discarding the related 
future when the write-end of the `Pipe` is already closed.


Diffs (updated)
-

  3rdparty/libprocess/src/http.cpp 298bd460afdeccd18e201d8a505b961fd4cf3d3e 
  3rdparty/libprocess/src/tests/http_tests.cpp 
24b266df5f17e28e0c95326f5d1ea451952500e8 

Diff: https://reviews.apache.org/r/51299/diff/


Testing
---

Found this leak in collaboration with Greg :)

Ran valgrind before applying this patch:
```
LD_RUN_PATH=/path/to/mesos/build/src/.libs 
LD_LIBRARY_PATH=/path/to/mesos/build/src/.libs valgrind --leak-check=full 
src/.libs/mesos-tests --gtest_filter="*SchedulerTest.Teardown*"
```

Observed the following leak (among many false positives):
```
1,881 (216 direct, 1,665 indirect) bytes in 1 blocks are definitely lost in 
loss record 2,405 of 2,507
   at 0x4C2A105: operator new(unsigned long) (in 
/usr/lib64/valgrind/vgpreload_memcheck-amd64-linux.so)
   by 0xF259A9: process::http::Pipe::Pipe() (http.hpp:414)
   by 0x5D629A1: 
mesos::internal::master::Master::Http::scheduler(process::http::Request const&, 
Option const&) const (http.cpp:796)
   by 0x5DB5806: operator() (master.cpp:858)
   ...
```

Applied the patch and re-ran valgrind.  Confirmed that leak is gone.


Thanks,

Joseph Wu



Re: Review Request 51027: Track allocation candidates to bound allocator.

2016-08-23 Thread Jiang Yan Xu


> On Aug. 23, 2016, 2:26 a.m., Alexander Rukletsov wrote:
> > src/master/allocator/mesos/hierarchical.cpp, lines 1194-1195
> > 
> >
> > `delay` already contains `dispatch`, so under "synchronously" you 
> > actually mean double dispatch.

I originally suggested putting this comment inside `batch()` and directly above 
`allocate()` so it's more clear what `synchronously` applies to: calling 
`allocate()` without dispatching.

*delay* in this sentence doesn't mean the `delay()` call but in the literal 
sense. To make it more clear, how about we say:

```
// We run the allocation synchronously here instead of dispatching it **again** 
so
// a batched allocation **doesn't lag behind further** if the allocator is 
backed up.
```

Note the asterisks are just to emphasize the changes.


> On Aug. 23, 2016, 2:26 a.m., Alexander Rukletsov wrote:
> > src/master/allocator/mesos/hierarchical.cpp, lines 276-279
> > 
> >
> > Probably extract this snippet in a function, e.g. 
> > `conditionalAllocate()`?

I thought about it but was struggling to find a short and clear method name. So 
to describe the function in a full sentense it's "dispatch an allocate() call 
if the condition `!allocationPending` is met". I think `conditionalAllocate()` 
is OK but not great, it's not clear what the condition is and not clear about 
the dispatch. I agree it's worth doing if we can abstract this out without 
needing to explain what each line of a 4-line method is doing in the comment...


- Jiang Yan


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/51027/#review146499
---


On Aug. 23, 2016, 1:49 a.m., Jacob Janco wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/51027/
> ---
> 
> (Updated Aug. 23, 2016, 1:49 a.m.)
> 
> 
> Review request for mesos, Benjamin Mahler, Guangya Liu, James Peach, Klaus 
> Ma, and Jiang Yan Xu.
> 
> 
> Bugs: MESOS-3157
> https://issues.apache.org/jira/browse/MESOS-3157
> 
> 
> Repository: mesos
> 
> 
> Description
> ---
> 
> - Triggered allocations dispatch allocate() only
>   if there is no pending allocation in the queue.
> - Allocation candidates are accumulated and only
>   cleared when enqueued allocations are processed.
> - Batched allocations are handled synchronously.
> 
> - Carrying over work from https://reviews.apache.org/r/41658/ and added the 
> previous reviewers
> - Specifically, this patch introduces the boolean flag pendingAllocation, 
> which when set on event 
>   triggered allocations, will prevent additional no-op allocations: the flag 
> is cleared when 
>   the enqueued allocation is processed, subsequent event triggered 
> allocations will update a set
>   of allocation candidates rather than dispatching an additional allocate().
> 
> 
> Diffs
> -
> 
>   src/master/allocator/mesos/hierarchical.hpp 
> bdbc6d3b5b959990538f4e3b7b1a3b031d9aea05 
>   src/master/allocator/mesos/hierarchical.cpp 
> 234ef98529964a0b6d3f132426a6c8ccbb1263ee 
> 
> Diff: https://reviews.apache.org/r/51027/diff/
> 
> 
> Testing
> ---
> 
> make check
> 
> note: check without filters depends on https://reviews.apache.org/r/51028
> 
> With new benchmark https://reviews.apache.org/r/49617: 
> Sample output without 51027:
> [ RUN  ] 
> SlaveAndFrameworkCount/HierarchicalAllocator_BENCHMARK_Test.FrameworkFailover/22
> Using 1 agents and 3000 frameworks
> Added 3000 frameworks in 57251us
> Added 1 agents in 3.2134535333mins
> allocator settled after  1.6123603833mins
> [   OK ] 
> SlaveAndFrameworkCount/HierarchicalAllocator_BENCHMARK_Test.FrameworkFailover/22
>  (290578 ms)
> 
> Sample output with 51027:
> [ RUN  ] 
> SlaveAndFrameworkCount/HierarchicalAllocator_BENCHMARK_Test.FrameworkFailover/22
> Using 1 agents and 3000 frameworks
> Added 3000 frameworks in 39817us
> Added 1 agents in 3.2286054167mins
> allocator settled after  25.525654secs
> [   OK ] 
> SlaveAndFrameworkCount/HierarchicalAllocator_BENCHMARK_Test.FrameworkFailover/22
>  (220137 ms)
> 
> 
> Thanks,
> 
> Jacob Janco
> 
>



Re: Review Request 51027: Track allocation candidates to bound allocator.

2016-08-23 Thread Guangya Liu

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/51027/#review146588
---




src/master/allocator/mesos/hierarchical.hpp (line 380)


s/are/that are



src/master/allocator/mesos/hierarchical.cpp (lines 782 - 783)


What about adding a comment similar as `setQuota`?

```
// Trigger the allocation synchronously in order to promptly
// react to the operator's host maintain request.
```


- Guangya Liu


On 八月 23, 2016, 8:49 a.m., Jacob Janco wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/51027/
> ---
> 
> (Updated 八月 23, 2016, 8:49 a.m.)
> 
> 
> Review request for mesos, Benjamin Mahler, Guangya Liu, James Peach, Klaus 
> Ma, and Jiang Yan Xu.
> 
> 
> Bugs: MESOS-3157
> https://issues.apache.org/jira/browse/MESOS-3157
> 
> 
> Repository: mesos
> 
> 
> Description
> ---
> 
> - Triggered allocations dispatch allocate() only
>   if there is no pending allocation in the queue.
> - Allocation candidates are accumulated and only
>   cleared when enqueued allocations are processed.
> - Batched allocations are handled synchronously.
> 
> - Carrying over work from https://reviews.apache.org/r/41658/ and added the 
> previous reviewers
> - Specifically, this patch introduces the boolean flag pendingAllocation, 
> which when set on event 
>   triggered allocations, will prevent additional no-op allocations: the flag 
> is cleared when 
>   the enqueued allocation is processed, subsequent event triggered 
> allocations will update a set
>   of allocation candidates rather than dispatching an additional allocate().
> 
> 
> Diffs
> -
> 
>   src/master/allocator/mesos/hierarchical.hpp 
> bdbc6d3b5b959990538f4e3b7b1a3b031d9aea05 
>   src/master/allocator/mesos/hierarchical.cpp 
> 234ef98529964a0b6d3f132426a6c8ccbb1263ee 
> 
> Diff: https://reviews.apache.org/r/51027/diff/
> 
> 
> Testing
> ---
> 
> make check
> 
> note: check without filters depends on https://reviews.apache.org/r/51028
> 
> With new benchmark https://reviews.apache.org/r/49617: 
> Sample output without 51027:
> [ RUN  ] 
> SlaveAndFrameworkCount/HierarchicalAllocator_BENCHMARK_Test.FrameworkFailover/22
> Using 1 agents and 3000 frameworks
> Added 3000 frameworks in 57251us
> Added 1 agents in 3.2134535333mins
> allocator settled after  1.6123603833mins
> [   OK ] 
> SlaveAndFrameworkCount/HierarchicalAllocator_BENCHMARK_Test.FrameworkFailover/22
>  (290578 ms)
> 
> Sample output with 51027:
> [ RUN  ] 
> SlaveAndFrameworkCount/HierarchicalAllocator_BENCHMARK_Test.FrameworkFailover/22
> Using 1 agents and 3000 frameworks
> Added 3000 frameworks in 39817us
> Added 1 agents in 3.2286054167mins
> allocator settled after  25.525654secs
> [   OK ] 
> SlaveAndFrameworkCount/HierarchicalAllocator_BENCHMARK_Test.FrameworkFailover/22
>  (220137 ms)
> 
> 
> Thanks,
> 
> Jacob Janco
> 
>



Re: Review Request 50836: Made add/subtract resource object as private method.

2016-08-23 Thread Guangya Liu


> On 八月 16, 2016, 11:44 p.m., Jiang Yan Xu wrote:
> > include/mesos/resources.hpp, line 59
> > 
> >
> > I think it would still work if you keep the `static` keyword right?
> > 
> > The `friend` keyword wouldn't allow you to specify `static` but here 
> > you can. 
> > 
> > Then in the cpp files you don't have to remove the `static` keyword on 
> > the definition.
> 
> Guangya Liu wrote:
> Thanks Jiang Yan, but it still does not work as your proposal by adding 
> `static` for the `convertJSON` definition in cpp file and forward 
> declarations in hpp file as the `friend Try 
> internal::convertJSON()` will still think this method is not a `static` 
> method and this will cause build error as following.
> 
> ```
> In file included from ../../src/common/resources_utils.cpp:21:
> In file included from ../../src/common/resources_utils.hpp:21:
> ../../include/mesos/resources.hpp:59:25: error: unused function 
> 'convertJSON' [-Werror,-Wunused-function]
>   static Try convertJSON(
> ^
> 1 error generated.
> make[2]: *** [common/libmesos_no_3rdparty_la-resources_utils.lo] Error 1
> ```
> 
> Another is that I have posted some comments in 
> https://reviews.apache.org/r/50568/ and proposing keep `add` as public, can 
> you please take a look and show your comments if any.
> 
> The following are comments cited from https://reviews.apache.org/r/50568/
> 
> ```
> @Ben and @Yan, I found another place where we can enhance for allocator 
> https://github.com/apache/mesos/blob/master/src/master/allocator/mesos/hierarchical.cpp#L1274
>  and this was only used by quota now, I think we can use add here. The 
> problem is that @Yan porposed to make the add as a private method and I've 
> posted a patch here https://reviews.apache.org/r/50836/diff/1#index_header , 
> but after a second thought, perhaps we can enable framework developers or 
> other users call this add API directly. As sometimes, the user may want to 
> traverse all Resources object and make some update for each Resource object 
> and then add those Resource again just like what we did in allocator for now. 
> So what about keep add as a public method?
> ```
> 
> Jiang Yan Xu wrote:
> My apologies. Looks like it's problematic to declare methods as static in 
> the header anyways. To keep the method internal to resources.cpp you can 
> define it as `inline` and keep the forward and friend declarations without 
> `inline` or `static`.
> 
> On the subject of keeping `add()/subtract()` public so some external code 
> can call them, I feel that this is a limitation of the current 
> Resource-related abstractions. If we keep the method public, we have to count 
> on the callers to know a lot about the internal implementation of `Resources` 
> and use it carefully. This is in general against information hiding and loose 
> coupling. 
> 
> Ideally if we have a C++ `Resource` abstraction that handles validation 
> and we know that a (non-proto) `Resource` object must be valid, we can make 
> `+=` validation free but until then, I understand that there are places in 
> Mesos codebase already with compromises like this. I think we need to weigh 
> the performance gains we can get against the drawback in the cleanliness of 
> the abstraction. We can put a hold on this review and keep the methods public 
> for now if you are thinking about using them to improve performance and 
> conducting benchmarks to compare. :)
> 
> What are your thoughts @BenM?
> 
> Guangya Liu wrote:
> Thanks Jiang Yan, yes, the `inline` works for this.
> 
> For keeping `add()/subtract()`, agree that we need a benchmark test for 
> this. The current 
> https://github.com/apache/mesos/blob/master/src/master/allocator/mesos/hierarchical.cpp#L1274
>  was only used by quota and there is no benchmark test related for quota now, 
> perhaps we need to add one such case for benchmark test, @Ben and @Jiang Yan, 
> what do you think?
> 
> Another point is from 
> https://github.com/apache/mesos/blob/master/src/master/allocator/mesos/hierarchical.cpp#L1283-L1295
>  , if we have thousandes of agents and some quotas, I think the performance 
> will improve much based on the test result of 
> https://reviews.apache.org/r/50557/

Had some offline discussion with Ben, we will make those two methods 
add/subtract as private for now as it only impacts quota related logic in 
allocator. And we can add some other wrapper functions to expose those two APIs 
if they are needed in future. Also the benchmark test for `Quota` is not 
priority, so there is no need to add them for now.


- Guangya


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/50836/#review145918
--

Re: Review Request 50836: Made add/subtract resource object as private method.

2016-08-23 Thread Guangya Liu

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/50836/
---

(Updated 八月 23, 2016, 10:19 p.m.)


Review request for mesos, Benjamin Mahler and Jiang Yan Xu.


Repository: mesos


Description
---

We do not want people call add/subtract resoure object directly,
so should make those methods as private.


Diffs (updated)
-

  include/mesos/resources.hpp bdbe8ea03a50cd1361640bde13a2342909723fc5 
  include/mesos/v1/resources.hpp c05cb634c7a5add78da00cb84fc75d3472a341bc 
  src/common/resources.cpp 96b4c39507bdb9b88aca5c2178b88057a5fc1881 
  src/v1/resources.cpp 3cc7580d5567370530c53759713be05c369bf298 

Diff: https://reviews.apache.org/r/50836/diff/


Testing
---

make
make check

```
./bin/mesos-tests.sh --gtest_filter="ResourcesTest.*"
./bin/mesos-tests.sh --gtest_filter="SharedResourcesTest.*"
```


Thanks,

Guangya Liu



Re: Review Request 50568: Updated -=/+= to subtract/add for resource object.

2016-08-23 Thread Guangya Liu


> On 八月 1, 2016, 10:39 p.m., Jiang Yan Xu wrote:
> > Can we keep add/subtract private? The fact they were added to save some 
> > redudant work suggests their `private` nature. Otherwise what if people use 
> > `add` directly instead of `+=`?
> > 
> > Looks like they were changed to public just so it can be called from the 
> > free-standing `convertJSON`, which is a private util method itself. We can 
> > just friend it right?
> 
> Benjamin Mahler wrote:
> Making it private sounds good and having a friend sounds good.
> 
> Guangya Liu wrote:
> Thanks Jinag Yan and Ben, I will follow up a patch later.
> 
> Jiang Yan Xu wrote:
> Hey Guangya have you started on it? If not we can take care of it because 
> we'd like to also consolidate `Resources::add(const Resource& r)` and 
> `add(const Resource_& r);`
> 
> Guangya Liu wrote:
> Yes, I was working on this but found one issue and still checking. 
> Currently, the function `static Try convertJSON` is a `static` 
> function and the `friend` do not support `static` function so here we may 
> need to make the `Try convertJSON` as a non-static function if we 
> want to make it a `friend`, what do you think of this?
> 
> BTW: Can we make this and `consolidate Resources::add(const Resource& r) 
> and add(const Resource_& r);` as separate patches?
> 
> Guangya Liu wrote:
> Jiang Yan, I posted a patch here https://reviews.apache.org/r/50836/ and 
> added you as reviewer, can you help check? Thanks.
> 
> Guangya Liu wrote:
> @Ben and @Yan, I found another place where we can enhance for allocator 
> https://github.com/apache/mesos/blob/master/src/master/allocator/mesos/hierarchical.cpp#L1274
>  and this was only used by quota now, I think we can use `add` here. The 
> problem is that @Yan porposed to make the `add` as a private method and I've 
> posted a patch here https://reviews.apache.org/r/50836/diff/1#index_header , 
> but after a second thought, perhaps we can enable framework developers or 
> other users call this `add` API directly. As sometimes, the user may want to 
> traverse all `Resources` object and make some update for each `Resource` 
> object and then add those `Resource` again just like what we did in allocator 
> for now. So what about keep `add` as a public method?

Had some offline discussion with Ben, we will make those two methods 
`add/subtract` as private for now as it only impacts `quota` related logic in 
allocator. And we can add some other wrapper functions to expose those two APIs 
if they are needed in future.


- Guangya


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/50568/#review144394
---


On 七月 29, 2016, 1:34 a.m., Guangya Liu wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/50568/
> ---
> 
> (Updated 七月 29, 2016, 1:34 a.m.)
> 
> 
> Review request for mesos, Benjamin Mahler, Joris Van Remoortere, and Klaus Ma.
> 
> 
> Repository: mesos
> 
> 
> Description
> ---
> 
> This is a follow up action for MESOS-5919. Based on the patch of
> https://reviews.apache.org/r/50553/ and
> https://reviews.apache.org/r/50557/ , we should update -=/+= to
> subtract/add if the resource object is from resources object.
> 
> 
> Diffs
> -
> 
>   include/mesos/resources.hpp 88a9feabf66ed34e7e5b1c6cb7e831818e7f7883 
>   include/mesos/v1/resources.hpp 054ed00a03319ae5e350542add34f497eaf79152 
>   src/common/resources.cpp 3dbff24d6859d3b1ed8589cec50170a5202cfbcb 
>   src/v1/resources.cpp 3c85dc8aa8125962b44e60806ece83a7653d0dc7 
> 
> Diff: https://reviews.apache.org/r/50568/diff/
> 
> 
> Testing
> ---
> 
> make
> make check
> 
> 
> Thanks,
> 
> Guangya Liu
> 
>



Re: Review Request 51277: Added an 'ns::enter' helper.

2016-08-23 Thread Jie Yu

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/51277/#review146575
---




src/linux/ns.hpp (line 230)


One suggestion on naming of this function. Looks like this is more like a 
`spawn` as it does actually clone the namespaces. So maybe we rename this to 
`spawn`?


- Jie Yu


On Aug. 22, 2016, 4:54 a.m., Benjamin Hindman wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/51277/
> ---
> 
> (Updated Aug. 22, 2016, 4:54 a.m.)
> 
> 
> Review request for mesos and Jie Yu.
> 
> 
> Repository: mesos
> 
> 
> Description
> ---
> 
> Added an 'ns::enter' helper.
> 
> 
> Diffs
> -
> 
>   src/linux/ns.hpp 2d6c359ed24d6e964882f34df60d8182491a27c9 
> 
> Diff: https://reviews.apache.org/r/51277/diff/
> 
> 
> Testing
> ---
> 
> 
> Thanks,
> 
> Benjamin Hindman
> 
>



Re: Review Request 50423: Avoided using integers as booleans.

2016-08-23 Thread Jie Yu

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/50423/#review146570
---


Ship it!




Ship It!

- Jie Yu


On Aug. 22, 2016, 11:34 p.m., Guangya Liu wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/50423/
> ---
> 
> (Updated Aug. 22, 2016, 11:34 p.m.)
> 
> 
> Review request for mesos, Benjamin Mahler, Gilbert Song, and Jie Yu.
> 
> 
> Repository: mesos
> 
> 
> Description
> ---
> 
> This is a follow up action for https://reviews.apache.org/r/49352/,
> we should avoid using integers as booleans in favor of explicitly
> checking the condition we care about.
> 
> 
> Diffs
> -
> 
>   src/docker/docker.cpp e07c4aa723f57db7f8bc31d4af5d32a30ebf98f2 
>   src/slave/containerizer/mesos/containerizer.cpp 
> 1cd3b3831feefeef6ab2ade6e3f33ecdbd971ecd 
> 
> Diff: https://reviews.apache.org/r/50423/diff/
> 
> 
> Testing
> ---
> 
> make
> make check
> 
> 
> Thanks,
> 
> Guangya Liu
> 
>



Re: Review Request 51272: Refactored the agent 'launcher' flag to always have a value.

2016-08-23 Thread Jie Yu


> On Aug. 22, 2016, 7 p.m., Jie Yu wrote:
> > src/local/local.cpp, lines 343-350
> > 
> >
> > I'd suggest we move this logic down below. more readable to me (current 
> > logic is broken into two pieces).
> > ```
> > // ...
> > if (flags.num_slaves > 1) {
> >   LOG(WARNING) << ...;
> >   
> >   flags.launcher = "posix";
> > }
> > ```
> 
> Benjamin Hindman wrote:
> Agreed I'd rather have the logic in one place, however: (1) then we'd be 
> printing out a lot of log messages (one for each agent) and (2) we'd still 
> have logic in two places because we'll need to pull out a 'num_slaves' 
> variable (or 'launcher' variable, as I did here) above the loop because 
> 'flags' gets overriden within the loop and I didn't want to make a bigger 
> change by renaming 'flags'. So, either we leave it, rename the inner 'flags', 
> or do you have another suggestion?

Yeah, i'd prefer rename the inner flag to `slaveFlags`. It's confusing to me 
that we have two set of flags using the same name.


- Jie


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/51272/#review146399
---


On Aug. 22, 2016, 4:53 a.m., Benjamin Hindman wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/51272/
> ---
> 
> (Updated Aug. 22, 2016, 4:53 a.m.)
> 
> 
> Review request for mesos, Jie Yu and Kevin Klues.
> 
> 
> Repository: mesos
> 
> 
> Description
> ---
> 
> Refactored the agent 'launcher' flag to always have a value.
> 
> 
> Diffs
> -
> 
>   src/local/local.cpp 1e1d246790a0d44d1baf802b903dc4f2bde1ac63 
>   src/slave/containerizer/mesos/containerizer.cpp 
> 77a502f853e3e04ea8e274419544601778be9421 
>   src/slave/flags.hpp 1a006663e7cc58ee548b3dda686cfbac0c240baa 
>   src/slave/flags.cpp 0f2be1700f41b74da4ea1ce699a81ec33cf92a9a 
> 
> Diff: https://reviews.apache.org/r/51272/diff/
> 
> 
> Testing
> ---
> 
> 
> Thanks,
> 
> Benjamin Hindman
> 
>



Re: Review Request 51278: Refactored LinuxLauncher to be nested container aware.

2016-08-23 Thread Jie Yu

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/51278/#review146532
---




src/slave/containerizer/mesos/linux_launcher.hpp (lines 20 - 22)


Swap these two.



src/slave/containerizer/mesos/linux_launcher.hpp (lines 64 - 85)


Swap these two definitions. According to google style, type definitions 
should be listed first.



src/slave/containerizer/mesos/linux_launcher.hpp (lines 97 - 99)


Move this definition up below `recover` helper.



src/slave/containerizer/mesos/linux_launcher.cpp (lines 131 - 145)


I suggest we move this to a helper. We already have a helper to get cgroup 
from containerId, this will be the reverse operation.

```
string getCgroup(const ContainerID& containerId);
Try getContainerId(const string& cgroup);
```

We can add necessary validation in the helper.



src/slave/containerizer/mesos/linux_launcher.cpp (lines 149 - 151)


As we discussed yesterday, we probably should checkpoint the pid in 
runtime_dir before creating the cgroup in the freezer hierarchy.



src/slave/containerizer/mesos/linux_launcher.cpp (lines 186 - 193)


The accumulation logic here is quite unintuitive. I'd suggest we make 
`cgroups::traverse` simple in the sense that it only applies a _map_ function.

```
template 
Try> cgroups::traverse(
  const string& hierarchy,
  const string& cgroup,
  const lambda::function& f,
  const Traversal& traversal = PRE_ORDER);
```

And we perform the accumulation in LinuxLauncher. In that way, `recover` 
helper can be simplified to just return a `Try`.



src/slave/containerizer/mesos/linux_launcher.cpp (lines 201 - 206)


I am wondering if should traverse the runtime_dir. Since we checkpoint the 
pid before we create the cgroup (i.e., we remove cgroup before we remove the 
checkpointed pid), we might run into the case where  the checkpointed pid 
exists but the freezer cgroup does not exist.

However, for legacy containers, we still have to traverse freezer hierarchy.



src/slave/containerizer/mesos/linux_launcher.cpp (line 209)


s/our process/agent



src/slave/containerizer/mesos/linux_launcher.cpp (line 223)


Why do you need this if check?



src/slave/containerizer/mesos/linux_launcher.cpp (line 229)


s/our process/agent



src/slave/containerizer/mesos/linux_launcher.cpp (lines 402 - 409)


When do we remove `Container` from `containers`? We cannot keep them 
forever as it'll cause a memory leak. I think we should remove it when the top 
level container is being destroyed, and we should do it _after_ the 
`cgroups::destory` is successful.



src/slave/containerizer/mesos/linux_launcher.cpp (lines 408 - 409)


Can you be more specific here? Who will be responsible for cleaning up 
sub-directories under runtime_dir when the top level container terminates? I 
feel that Launcher should be responsible for doing that because it was the one 
that checkpoints the data.

The caveat here is that we cannot remove checkpoint dir for nested 
containers until the top level container terminates. This is because the 
executor might issue `WAIT` call to the agent to get the exit status of the 
nested container.

For the top level container, if we move the entire checkpointed dir during 
destroy, and if slave crashes before it checkpoints the sentinel file. During 
recovery, the containerizer will still call `launcher->wait` on the containerId 
that launcher does not know about. But since the agent does not need the exit 
code for the executor, this is fine for now. In the future, if we want to add 
executor status update, we need to adjust the logics here so that we only 
remove the top level checkpoint dir for the top level container after agent 
checkpoint the sentinel file. We probably need an explicit `launcher->cleanup`.



src/slave/containerizer/mesos/linux_launcher.cpp (lines 414 - 419)


This sounds like an uncessary optimization that buy us nothing. This also 
creates an explicit dependency from launcher to pid namespace isolator. I'd 
suggest we simply just remove this code here (i.e., always use freezer).


- Jie Yu

Re: Review Request 51342: Printed out the received stream ID in scheduler API.

2016-08-23 Thread Abhishek Dasgupta

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/51342/
---

(Updated Aug. 23, 2016, 8:29 p.m.)


Review request for mesos, Anand Mazumdar, Vinod Kone, and Zameer Manji.


Bugs: MESOS-6041
https://issues.apache.org/jira/browse/MESOS-6041


Repository: mesos


Description (updated)
---

This patch prints out the received stream ID in error output
caused by Mesos-Stream-Id mismatch in calls to
scheduler http api.
Expected stream ID is not printed in error output as
it may cause security leak.


Diffs
-

  src/master/http.cpp 52dd80b856cf2317c0b73ba54bf501696786088d 

Diff: https://reviews.apache.org/r/51342/diff/


Testing
---

On Ubuntu 16.04:
sudo make -j4 check

Manual testing as well.
Steps to reproduce:
1. Run mesos master.
2. Run Mesos Slave.
3. Subscribe a framework with the command : curl -v  -H  "Accept: 
application/json" -H "Content-Type: application/json" -d "@subscribe.json" 
http://127.0.0.1:5050/api/v1/scheduler

subscribe.json :
{
   "type"   : "SUBSCRIBE",
   "subscribe"  : {
  "framework_info"  : {
"user" :  "bar",
"name" :  "Example HTTP Framework1"
  }
  }
}
4. Accept an offer by following command : curl -v  -H  "Accept: 
application/json" -H "Content-Type: application/json" -H "Mesos-Stream-Id: 
000a38b7-8a2a-4c8f-a374-f782d6ea617f" -d "@accept.json" 
http://127.0.0.1:5050/api/v1/scheduler

accept.json:
{
"framework_id": {
"value": "ee29ca2a-c56b-4a24-bba1-4919afae813e-"
},
"type": "ACCEPT",
"accept": {
"offer_ids": [{
"value": "369108e7-2bff-4a54-aca0-4f63b11a9361-O0"
}],
"operations": [{
"type": "LAUNCH",
"launch": {
"task_infos": [{
"name": "My Task",
"task_id": {
"value": 
"12220-3440-12532-my-task"
},
"agent_id": {
"value": 
"94d32bba-2bb4-461f-bd96-30d155fe5965-S0"
},
"executor": {
"command": {
"shell": true,
"value": "sleep 1000"
},
"executor_id": {
"value": 
"12214-23523-my-executor"
}
},
"resources": [{
"name": "cpus",
"role": "*",
"type": "SCALAR",
"scalar": {
"value": 1.0
}
}, {
"name": "mem",
"role": "*",
"type": "SCALAR",
"scalar": {
"value": 128.0
}
}]
}]
}
}],
"filters": {
"refuse_seconds": 5.0
}
}
}

Note: Mesos-Stream-Id passed through header in this curl command is left 
mismatched to that received from the response to subscribe call.
5. Following error output is received as response: 
   The stream ID '000a38b7-8a2a-4c8f-a374-f782d6ea617f' included in this 
request didn't match the stream ID currently associated with framework ID 
'ee29ca2a-c56b-4a24-bba1-4919afae813e-'


Thanks,

Abhishek Dasgupta



Re: Review Request 51347: Fixed distcheck build.

2016-08-23 Thread Jie Yu

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/51347/#review146567
---


Ship it!




Ship It!

- Jie Yu


On Aug. 23, 2016, 8:21 p.m., Joseph Wu wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/51347/
> ---
> 
> (Updated Aug. 23, 2016, 8:21 p.m.)
> 
> 
> Review request for mesos, Artem Harutyunyan, Jie Yu, and Vinod Kone.
> 
> 
> Repository: mesos
> 
> 
> Description
> ---
> 
> `include/mesos/slave/containerizer.hpp` was removed from the makefile
> in: https://reviews.apache.org/r/51263/
> 
> 
> Diffs
> -
> 
>   src/Makefile.am 8dc4175c60e4a9776ddb8ad21774fa4b30c28d00 
> 
> Diff: https://reviews.apache.org/r/51347/diff/
> 
> 
> Testing
> ---
> 
> make distcheck (CentOS 7)
> 
> 
> Thanks,
> 
> Joseph Wu
> 
>



Review Request 51347: Fixed distcheck build.

2016-08-23 Thread Joseph Wu

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/51347/
---

Review request for mesos, Artem Harutyunyan, Jie Yu, and Vinod Kone.


Repository: mesos


Description
---

`include/mesos/slave/containerizer.hpp` was removed from the makefile
in: https://reviews.apache.org/r/51263/


Diffs
-

  src/Makefile.am 8dc4175c60e4a9776ddb8ad21774fa4b30c28d00 

Diff: https://reviews.apache.org/r/51347/diff/


Testing
---

make distcheck (CentOS 7)


Thanks,

Joseph Wu



Re: Review Request 51342: Printed out the received stream ID in scheduler API.

2016-08-23 Thread Abhishek Dasgupta

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/51342/
---

(Updated Aug. 23, 2016, 8:16 p.m.)


Review request for mesos, Anand Mazumdar, Vinod Kone, and Zameer Manji.


Bugs: MESOS-6041
https://issues.apache.org/jira/browse/MESOS-6041


Repository: mesos


Description (updated)
---

Printed out the received stream ID in scheduler API.


Diffs (updated)
-

  src/master/http.cpp 52dd80b856cf2317c0b73ba54bf501696786088d 

Diff: https://reviews.apache.org/r/51342/diff/


Testing
---

On Ubuntu 16.04:
sudo make -j4 check

Manual testing as well.
Steps to reproduce:
1. Run mesos master.
2. Run Mesos Slave.
3. Subscribe a framework with the command : curl -v  -H  "Accept: 
application/json" -H "Content-Type: application/json" -d "@subscribe.json" 
http://127.0.0.1:5050/api/v1/scheduler

subscribe.json :
{
   "type"   : "SUBSCRIBE",
   "subscribe"  : {
  "framework_info"  : {
"user" :  "bar",
"name" :  "Example HTTP Framework1"
  }
  }
}
4. Accept an offer by following command : curl -v  -H  "Accept: 
application/json" -H "Content-Type: application/json" -H "Mesos-Stream-Id: 
000a38b7-8a2a-4c8f-a374-f782d6ea617f" -d "@accept.json" 
http://127.0.0.1:5050/api/v1/scheduler

accept.json:
{
"framework_id": {
"value": "ee29ca2a-c56b-4a24-bba1-4919afae813e-"
},
"type": "ACCEPT",
"accept": {
"offer_ids": [{
"value": "369108e7-2bff-4a54-aca0-4f63b11a9361-O0"
}],
"operations": [{
"type": "LAUNCH",
"launch": {
"task_infos": [{
"name": "My Task",
"task_id": {
"value": 
"12220-3440-12532-my-task"
},
"agent_id": {
"value": 
"94d32bba-2bb4-461f-bd96-30d155fe5965-S0"
},
"executor": {
"command": {
"shell": true,
"value": "sleep 1000"
},
"executor_id": {
"value": 
"12214-23523-my-executor"
}
},
"resources": [{
"name": "cpus",
"role": "*",
"type": "SCALAR",
"scalar": {
"value": 1.0
}
}, {
"name": "mem",
"role": "*",
"type": "SCALAR",
"scalar": {
"value": 128.0
}
}]
}]
}
}],
"filters": {
"refuse_seconds": 5.0
}
}
}

Note: Mesos-Stream-Id passed through header in this curl command is left 
mismatched to that received from the response to subscribe call.
5. Following error output is received as response: 
   The stream ID '000a38b7-8a2a-4c8f-a374-f782d6ea617f' included in this 
request didn't match the stream ID currently associated with framework ID 
'ee29ca2a-c56b-4a24-bba1-4919afae813e-'


Thanks,

Abhishek Dasgupta



Re: Review Request 51342: Printed out the received stream ID in scheduler API.

2016-08-23 Thread Anand Mazumdar

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/51342/#review146566
---


Fix it, then Ship it!




Modulo Vinod's comment.


src/master/http.cpp (line 847)


Don't quote the `frameworkId` as it can't have spaces.


- Anand Mazumdar


On Aug. 23, 2016, 5:55 p.m., Abhishek Dasgupta wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/51342/
> ---
> 
> (Updated Aug. 23, 2016, 5:55 p.m.)
> 
> 
> Review request for mesos, Anand Mazumdar, Vinod Kone, and Zameer Manji.
> 
> 
> Bugs: MESOS-6041
> https://issues.apache.org/jira/browse/MESOS-6041
> 
> 
> Repository: mesos
> 
> 
> Description
> ---
> 
> This patch prints out the received stream ID in error output
> caused by Mesos-Stream-Id mismatch in calls to
> scheduler http api.
> Expected stream ID is not printed in error output as
> it may cause security leak.
> 
> 
> Diffs
> -
> 
>   src/master/http.cpp 52dd80b856cf2317c0b73ba54bf501696786088d 
> 
> Diff: https://reviews.apache.org/r/51342/diff/
> 
> 
> Testing
> ---
> 
> On Ubuntu 16.04:
> sudo make -j4 check
> 
> Manual testing as well.
> Steps to reproduce:
> 1. Run mesos master.
> 2. Run Mesos Slave.
> 3. Subscribe a framework with the command : curl -v  -H  "Accept: 
> application/json" -H "Content-Type: application/json" -d "@subscribe.json" 
> http://127.0.0.1:5050/api/v1/scheduler
> 
> subscribe.json :
> {
>"type"   : "SUBSCRIBE",
>"subscribe"  : {
>   "framework_info"  : {
> "user" :  "bar",
> "name" :  "Example HTTP Framework1"
>   }
>   }
> }
> 4. Accept an offer by following command : curl -v  -H  "Accept: 
> application/json" -H "Content-Type: application/json" -H "Mesos-Stream-Id: 
> 000a38b7-8a2a-4c8f-a374-f782d6ea617f" -d "@accept.json" 
> http://127.0.0.1:5050/api/v1/scheduler
> 
> accept.json:
> {
>   "framework_id": {
>   "value": "ee29ca2a-c56b-4a24-bba1-4919afae813e-"
>   },
>   "type": "ACCEPT",
>   "accept": {
>   "offer_ids": [{
>   "value": "369108e7-2bff-4a54-aca0-4f63b11a9361-O0"
>   }],
>   "operations": [{
>   "type": "LAUNCH",
>   "launch": {
>   "task_infos": [{
>   "name": "My Task",
>   "task_id": {
>   "value": 
> "12220-3440-12532-my-task"
>   },
>   "agent_id": {
>   "value": 
> "94d32bba-2bb4-461f-bd96-30d155fe5965-S0"
>   },
>   "executor": {
>   "command": {
>   "shell": true,
>   "value": "sleep 1000"
>   },
>   "executor_id": {
>   "value": 
> "12214-23523-my-executor"
>   }
>   },
>   "resources": [{
>   "name": "cpus",
>   "role": "*",
>   "type": "SCALAR",
>   "scalar": {
>   "value": 1.0
>   }
>   }, {
>   "name": "mem",
>   "role": "*",
>   "type": "SCALAR",
>   "scalar": {
>   "value": 128.0
>   }
>   }]
>   }]
>   }
>   }],
>   "filters": {
>   "refuse_seconds": 5.0
>   }
>   }
> }
> 
> Note: Mesos-Stream-Id passed through header in this curl command is left 
> mismatched to that received from the response to subscribe call.
> 5. Following error output is received as response: 
>The stream ID '000a38b7-8a2a-4c8f-a374-f782d6ea617f' included in this 
> request didn't match the s

Re: Review Request 51319: Removed a hard CHECK for ExecutorInfo.framework_id.

2016-08-23 Thread Vinod Kone

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/51319/#review146565
---


Ship it!





src/tests/master_validation_tests.cpp (line 696)


A TASK_RUNNING would've been nice instead of TASK_FAILED which might trip 
up readers. I think if you use a test containerizer with mock executor that 
should work? Anyway up to you.


- Vinod Kone


On Aug. 23, 2016, 5:11 a.m., Benjamin Mahler wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/51319/
> ---
> 
> (Updated Aug. 23, 2016, 5:11 a.m.)
> 
> 
> Review request for mesos and Vinod Kone.
> 
> 
> Repository: mesos
> 
> 
> Description
> ---
> 
> Removed a hard CHECK for ExecutorInfo.framework_id.
> 
> 
> Diffs
> -
> 
>   src/master/validation.cpp 30319e75d77546754cdc1f39a8a7c5f9d267e80c 
>   src/tests/master_validation_tests.cpp 
> e1a5030d0ec8db978df7f785e3a842b1e81f0d5b 
> 
> Diff: https://reviews.apache.org/r/51319/diff/
> 
> 
> Testing
> ---
> 
> Also added an explicit test to ensure that the filling in
> functions correctly. This was captured by most other tests,
> but wanted to have one that was responsible for testing only
> this behavior.
> 
> 
> Thanks,
> 
> Benjamin Mahler
> 
>



Re: Review Request 51318: Added a TODO to CHECK resource arithemtic invariant when launching task.

2016-08-23 Thread Vinod Kone

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/51318/#review146564
---


Fix it, then Ship it!





src/master/master.cpp (lines 3850 - 3851)


lets implement the TODO instead? you could either do a CHECK or return 
TASK_ERROR.


- Vinod Kone


On Aug. 23, 2016, 5:11 a.m., Benjamin Mahler wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/51318/
> ---
> 
> (Updated Aug. 23, 2016, 5:11 a.m.)
> 
> 
> Review request for mesos and Vinod Kone.
> 
> 
> Repository: mesos
> 
> 
> Description
> ---
> 
> See summary.
> 
> 
> Diffs
> -
> 
>   src/master/master.cpp d94a8510c4cee9c010706f79caf27ef4a10b41a8 
> 
> Diff: https://reviews.apache.org/r/51318/diff/
> 
> 
> Testing
> ---
> 
> N/A
> 
> 
> Thanks,
> 
> Benjamin Mahler
> 
>



Re: Review Request 51317: Documented a bug with the use of `pendingTasks` in the master.

2016-08-23 Thread Vinod Kone

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/51317/#review146562
---


Fix it, then Ship it!





src/master/master.cpp (line 3366)


`_accept()`


- Vinod Kone


On Aug. 23, 2016, 5:11 a.m., Benjamin Mahler wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/51317/
> ---
> 
> (Updated Aug. 23, 2016, 5:11 a.m.)
> 
> 
> Review request for mesos and Vinod Kone.
> 
> 
> Repository: mesos
> 
> 
> Description
> ---
> 
> The use of `pendingTasks` cannot distinguish between a duplicate
> TaskID and a task that has been killed while pending. This means
> that if an invalid or unauthorized task is killed while pending,
> TASK_KILLED is sent, and once in Master::_accept, we will also
> send TASK_ERROR.
> 
> 
> Diffs
> -
> 
>   src/master/master.cpp d94a8510c4cee9c010706f79caf27ef4a10b41a8 
> 
> Diff: https://reviews.apache.org/r/51317/diff/
> 
> 
> Testing
> ---
> 
> N/A
> 
> 
> Thanks,
> 
> Benjamin Mahler
> 
>



Re: Review Request 51316: Documented 'Kill' behavior for task groups.

2016-08-23 Thread Vinod Kone

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/51316/#review146561
---


Fix it, then Ship it!





include/mesos/scheduler/scheduler.proto (lines 289 - 293)


How about:

// If the task belongs to a task group, the kill behavior is as follows:
//
// If the task group hasn't been delivered to the executor, killing of
// any task will result in all tasks in the task group being killed.
//
// If the task group has been delivered to the executor, it is up to the
// executor to decide how to deal with the kill. The default Mesos executor
// will kill all the tasks in the task group if it gets a kill for any task.



include/mesos/v1/scheduler/scheduler.proto (line 281)


see above.


- Vinod Kone


On Aug. 23, 2016, 5:11 a.m., Benjamin Mahler wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/51316/
> ---
> 
> (Updated Aug. 23, 2016, 5:11 a.m.)
> 
> 
> Review request for mesos and Vinod Kone.
> 
> 
> Repository: mesos
> 
> 
> Description
> ---
> 
> See summary.
> 
> 
> Diffs
> -
> 
>   include/mesos/scheduler/scheduler.proto 
> ac5bce993a586e1794743a098692d77fe0f3f5ac 
>   include/mesos/v1/scheduler/scheduler.proto 
> 05ff0e15bd1c0c607d6fd1d261a95013797a2dad 
> 
> Diff: https://reviews.apache.org/r/51316/diff/
> 
> 
> Testing
> ---
> 
> N/A
> 
> 
> Thanks,
> 
> Benjamin Mahler
> 
>



Re: Review Request 51342: Printed out the received stream ID in scheduler API.

2016-08-23 Thread Vinod Kone

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/51342/#review146559
---


Fix it, then Ship it!





src/master/http.cpp (line 842)


s/requestStreamId/streamId/


- Vinod Kone


On Aug. 23, 2016, 5:55 p.m., Abhishek Dasgupta wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/51342/
> ---
> 
> (Updated Aug. 23, 2016, 5:55 p.m.)
> 
> 
> Review request for mesos, Anand Mazumdar, Vinod Kone, and Zameer Manji.
> 
> 
> Bugs: MESOS-6041
> https://issues.apache.org/jira/browse/MESOS-6041
> 
> 
> Repository: mesos
> 
> 
> Description
> ---
> 
> This patch prints out the received stream ID in error output
> caused by Mesos-Stream-Id mismatch in calls to
> scheduler http api.
> Expected stream ID is not printed in error output as
> it may cause security leak.
> 
> 
> Diffs
> -
> 
>   src/master/http.cpp 52dd80b856cf2317c0b73ba54bf501696786088d 
> 
> Diff: https://reviews.apache.org/r/51342/diff/
> 
> 
> Testing
> ---
> 
> On Ubuntu 16.04:
> sudo make -j4 check
> 
> Manual testing as well.
> Steps to reproduce:
> 1. Run mesos master.
> 2. Run Mesos Slave.
> 3. Subscribe a framework with the command : curl -v  -H  "Accept: 
> application/json" -H "Content-Type: application/json" -d "@subscribe.json" 
> http://127.0.0.1:5050/api/v1/scheduler
> 
> subscribe.json :
> {
>"type"   : "SUBSCRIBE",
>"subscribe"  : {
>   "framework_info"  : {
> "user" :  "bar",
> "name" :  "Example HTTP Framework1"
>   }
>   }
> }
> 4. Accept an offer by following command : curl -v  -H  "Accept: 
> application/json" -H "Content-Type: application/json" -H "Mesos-Stream-Id: 
> 000a38b7-8a2a-4c8f-a374-f782d6ea617f" -d "@accept.json" 
> http://127.0.0.1:5050/api/v1/scheduler
> 
> accept.json:
> {
>   "framework_id": {
>   "value": "ee29ca2a-c56b-4a24-bba1-4919afae813e-"
>   },
>   "type": "ACCEPT",
>   "accept": {
>   "offer_ids": [{
>   "value": "369108e7-2bff-4a54-aca0-4f63b11a9361-O0"
>   }],
>   "operations": [{
>   "type": "LAUNCH",
>   "launch": {
>   "task_infos": [{
>   "name": "My Task",
>   "task_id": {
>   "value": 
> "12220-3440-12532-my-task"
>   },
>   "agent_id": {
>   "value": 
> "94d32bba-2bb4-461f-bd96-30d155fe5965-S0"
>   },
>   "executor": {
>   "command": {
>   "shell": true,
>   "value": "sleep 1000"
>   },
>   "executor_id": {
>   "value": 
> "12214-23523-my-executor"
>   }
>   },
>   "resources": [{
>   "name": "cpus",
>   "role": "*",
>   "type": "SCALAR",
>   "scalar": {
>   "value": 1.0
>   }
>   }, {
>   "name": "mem",
>   "role": "*",
>   "type": "SCALAR",
>   "scalar": {
>   "value": 128.0
>   }
>   }]
>   }]
>   }
>   }],
>   "filters": {
>   "refuse_seconds": 5.0
>   }
>   }
> }
> 
> Note: Mesos-Stream-Id passed through header in this curl command is left 
> mismatched to that received from the response to subscribe call.
> 5. Following error output is received as response: 
>The stream ID '000a38b7-8a2a-4c8f-a374-f782d6ea617f' included in this 
> request didn't match the stream ID currently associated with framework ID 
> 'ee29

Re: Review Request 51342: Printed out the received stream ID in scheduler API.

2016-08-23 Thread Abhishek Dasgupta

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/51342/
---

(Updated Aug. 23, 2016, 5:55 p.m.)


Review request for mesos, Anand Mazumdar, Vinod Kone, and Zameer Manji.


Bugs: MESOS-6041
https://issues.apache.org/jira/browse/MESOS-6041


Repository: mesos


Description
---

This patch prints out the received stream ID in error output
caused by Mesos-Stream-Id mismatch in calls to
scheduler http api.
Expected stream ID is not printed in error output as
it may cause security leak.


Diffs
-

  src/master/http.cpp 52dd80b856cf2317c0b73ba54bf501696786088d 

Diff: https://reviews.apache.org/r/51342/diff/


Testing (updated)
---

On Ubuntu 16.04:
sudo make -j4 check

Manual testing as well.
Steps to reproduce:
1. Run mesos master.
2. Run Mesos Slave.
3. Subscribe a framework with the command : curl -v  -H  "Accept: 
application/json" -H "Content-Type: application/json" -d "@subscribe.json" 
http://127.0.0.1:5050/api/v1/scheduler

subscribe.json :
{
   "type"   : "SUBSCRIBE",
   "subscribe"  : {
  "framework_info"  : {
"user" :  "bar",
"name" :  "Example HTTP Framework1"
  }
  }
}
4. Accept an offer by following command : curl -v  -H  "Accept: 
application/json" -H "Content-Type: application/json" -H "Mesos-Stream-Id: 
000a38b7-8a2a-4c8f-a374-f782d6ea617f" -d "@accept.json" 
http://127.0.0.1:5050/api/v1/scheduler

accept.json:
{
"framework_id": {
"value": "ee29ca2a-c56b-4a24-bba1-4919afae813e-"
},
"type": "ACCEPT",
"accept": {
"offer_ids": [{
"value": "369108e7-2bff-4a54-aca0-4f63b11a9361-O0"
}],
"operations": [{
"type": "LAUNCH",
"launch": {
"task_infos": [{
"name": "My Task",
"task_id": {
"value": 
"12220-3440-12532-my-task"
},
"agent_id": {
"value": 
"94d32bba-2bb4-461f-bd96-30d155fe5965-S0"
},
"executor": {
"command": {
"shell": true,
"value": "sleep 1000"
},
"executor_id": {
"value": 
"12214-23523-my-executor"
}
},
"resources": [{
"name": "cpus",
"role": "*",
"type": "SCALAR",
"scalar": {
"value": 1.0
}
}, {
"name": "mem",
"role": "*",
"type": "SCALAR",
"scalar": {
"value": 128.0
}
}]
}]
}
}],
"filters": {
"refuse_seconds": 5.0
}
}
}

Note: Mesos-Stream-Id passed through header in this curl command is left 
mismatched to that received from the response to subscribe call.
5. Following error output is received as response: 
   The stream ID '000a38b7-8a2a-4c8f-a374-f782d6ea617f' included in this 
request didn't match the stream ID currently associated with framework ID 
'ee29ca2a-c56b-4a24-bba1-4919afae813e-'


Thanks,

Abhishek Dasgupta



Re: Review Request 51342: Printed out the received stream ID in scheduler API.

2016-08-23 Thread Zameer Manji

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/51342/#review146552
---


Ship it!




Ship It!

- Zameer Manji


On Aug. 23, 2016, 10:34 a.m., Abhishek Dasgupta wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/51342/
> ---
> 
> (Updated Aug. 23, 2016, 10:34 a.m.)
> 
> 
> Review request for mesos, Anand Mazumdar, Vinod Kone, and Zameer Manji.
> 
> 
> Bugs: MESOS-6041
> https://issues.apache.org/jira/browse/MESOS-6041
> 
> 
> Repository: mesos
> 
> 
> Description
> ---
> 
> This patch prints out the received stream ID in error output
> caused by Mesos-Stream-Id mismatch in calls to
> scheduler http api.
> Expected stream ID is not printed in error output as
> it may cause security leak.
> 
> 
> Diffs
> -
> 
>   src/master/http.cpp 52dd80b856cf2317c0b73ba54bf501696786088d 
> 
> Diff: https://reviews.apache.org/r/51342/diff/
> 
> 
> Testing
> ---
> 
> On Ubuntu 16.04:
> sudo make -j4 check
> 
> Manual testing as well.
> Steps to reproduce:
> 1. Run mesos master.
> 2. Run Mesos Slave.
> 3. Subscribe a framework with the command : curl -v  -H  "Accept: 
> application/json" -H "Content-Type: application/json" -d "@subscribe.json" 
> http://127.0.0.1:5050/api/v1/scheduler
> 
> subscribe.json :
> {
>"type"   : "SUBSCRIBE",
>"subscribe"  : {
>   "framework_info"  : {
> "user" :  "bar",
> "name" :  "Example HTTP Framework1"
>   }
>   }
> }
> 4. Accept an offer by following command : curl -v  -H  "Accept: 
> application/json" -H "Content-Type: application/json" -H "Mesos-Stream-Id: 
> 000a38b7-8a2a-4c8f-a374-f782d6ea617f" -d "@accept.json" 
> http://127.0.0.1:5050/api/v1/scheduler
> 
> accept.json:
> {
>   "framework_id": {
>   "value": "ee29ca2a-c56b-4a24-bba1-4919afae813e-"
>   },
>   "type": "ACCEPT",
>   "accept": {
>   "offer_ids": [{
>   "value": "369108e7-2bff-4a54-aca0-4f63b11a9361-O0"
>   }],
>   "operations": [{
>   "type": "LAUNCH",
>   "launch": {
>   "task_infos": [{
>   "name": "My Task",
>   "task_id": {
>   "value": 
> "12220-3440-12532-my-task"
>   },
>   "agent_id": {
>   "value": 
> "94d32bba-2bb4-461f-bd96-30d155fe5965-S0"
>   },
>   "executor": {
>   "command": {
>   "shell": true,
>   "value": "sleep 1000"
>   },
>   "executor_id": {
>   "value": 
> "12214-23523-my-executor"
>   }
>   },
>   "resources": [{
>   "name": "cpus",
>   "role": "*",
>   "type": "SCALAR",
>   "scalar": {
>   "value": 1.0
>   }
>   }, {
>   "name": "mem",
>   "role": "*",
>   "type": "SCALAR",
>   "scalar": {
>   "value": 128.0
>   }
>   }]
>   }]
>   }
>   }],
>   "filters": {
>   "refuse_seconds": 5.0
>   }
>   }
> }
> 
> Note: Mesos-Stream-Id passed through header in this curl command is left 
> mismatched to that received from the response to subscribe call.
> 5. Following error output is received as response: 
>The stream ID '000a38b7-8a2a-4c8f-a374-f782d6ea617f' included in this 
> request didn't match the stream ID currently associated with framework ID 
> 'd2740805-e94a-4dc0-ad9f-96ffe3daedb3-'
> 
> 
> Thanks,
> 
> Abhishek Dasgupta
> 
>



Review Request 51342: Printed out the received stream ID in scheduler API.

2016-08-23 Thread Abhishek Dasgupta

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/51342/
---

Review request for mesos, Anand Mazumdar, Vinod Kone, and Zameer Manji.


Bugs: MESOS-6041
https://issues.apache.org/jira/browse/MESOS-6041


Repository: mesos


Description
---

This patch prints out the received stream ID in error output
caused by Mesos-Stream-Id mismatch in calls to
scheduler http api.
Expected stream ID is not printed in error output as
it may cause security leak.


Diffs
-

  src/master/http.cpp 52dd80b856cf2317c0b73ba54bf501696786088d 

Diff: https://reviews.apache.org/r/51342/diff/


Testing
---

On Ubuntu 16.04:
sudo make -j4 check

Manual testing as well.
Steps to reproduce:
1. Run mesos master.
2. Run Mesos Slave.
3. Subscribe a framework with the command : curl -v  -H  "Accept: 
application/json" -H "Content-Type: application/json" -d "@subscribe.json" 
http://127.0.0.1:5050/api/v1/scheduler

subscribe.json :
{
   "type"   : "SUBSCRIBE",
   "subscribe"  : {
  "framework_info"  : {
"user" :  "bar",
"name" :  "Example HTTP Framework1"
  }
  }
}
4. Accept an offer by following command : curl -v  -H  "Accept: 
application/json" -H "Content-Type: application/json" -H "Mesos-Stream-Id: 
000a38b7-8a2a-4c8f-a374-f782d6ea617f" -d "@accept.json" 
http://127.0.0.1:5050/api/v1/scheduler

accept.json:
{
"framework_id": {
"value": "ee29ca2a-c56b-4a24-bba1-4919afae813e-"
},
"type": "ACCEPT",
"accept": {
"offer_ids": [{
"value": "369108e7-2bff-4a54-aca0-4f63b11a9361-O0"
}],
"operations": [{
"type": "LAUNCH",
"launch": {
"task_infos": [{
"name": "My Task",
"task_id": {
"value": 
"12220-3440-12532-my-task"
},
"agent_id": {
"value": 
"94d32bba-2bb4-461f-bd96-30d155fe5965-S0"
},
"executor": {
"command": {
"shell": true,
"value": "sleep 1000"
},
"executor_id": {
"value": 
"12214-23523-my-executor"
}
},
"resources": [{
"name": "cpus",
"role": "*",
"type": "SCALAR",
"scalar": {
"value": 1.0
}
}, {
"name": "mem",
"role": "*",
"type": "SCALAR",
"scalar": {
"value": 128.0
}
}]
}]
}
}],
"filters": {
"refuse_seconds": 5.0
}
}
}

Note: Mesos-Stream-Id passed through header in this curl command is left 
mismatched to that received from the response to subscribe call.
5. Following error output is received as response: 
   The stream ID '000a38b7-8a2a-4c8f-a374-f782d6ea617f' included in this 
request didn't match the stream ID currently associated with framework ID 
'd2740805-e94a-4dc0-ad9f-96ffe3daedb3-'


Thanks,

Abhishek Dasgupta



Review Request 51343: Refactored the redundant logic in provisioner recover().

2016-08-23 Thread Gilbert Song

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/51343/
---

Review request for mesos, Benjamin Hindman, Artem Harutyunyan, Jie Yu, and 
Kevin Klues.


Bugs: MESOS-6067
https://issues.apache.org/jira/browse/MESOS-6067


Repository: mesos


Description
---

This patch removed the unnecessary loop to construct an identical
hashmap for info->rootfses.


Diffs
-

  src/slave/containerizer/mesos/provisioner/provisioner.cpp 
8e35ff49ec99a242e764095dcfbb541c5e41ec71 

Diff: https://reviews.apache.org/r/51343/diff/


Testing
---

make check


Thanks,

Gilbert Song



Re: Review Request 50000: Added test to simulate slow/unresponsive fetch.

2016-08-23 Thread Jiang Yan Xu

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/5/#review146439
---


Ship it!




Will commit with the following edits.


src/tests/slave_tests.cpp (lines 282 - 284)


I removed these lines as they actually don't show up in the fetcher log. 
The reason is that the command would be stuck at 
https://github.com/apache/mesos/blob/c3228f3c3d1a1b2c145d1377185cfe22da6079eb/src/hdfs/hdfs.cpp#L118



src/tests/slave_tests.cpp (line 368)


No space.



src/tests/slave_tests.cpp (line 376)


No space.



src/tests/slave_tests.cpp (lines 385 - 386)


Fits in one line.


- Jiang Yan Xu


On Aug. 16, 2016, 10:35 a.m., Megha Sharma wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/5/
> ---
> 
> (Updated Aug. 16, 2016, 10:35 a.m.)
> 
> 
> Review request for mesos and Jiang Yan Xu.
> 
> 
> Bugs: MESOS-5763
> https://issues.apache.org/jira/browse/MESOS-5763
> 
> 
> Repository: mesos
> 
> 
> Description
> ---
> 
> Added test to simulate the scenario of slow/unresponsive HDFS leading
> to executor register timeout and verify that slave gets notified of the
> failure.
> 
> 
> Diffs
> -
> 
>   src/tests/slave_tests.cpp 30ca3da6ca2f1e068bb3f8a6abc7efeebd5bfe8c 
> 
> Diff: https://reviews.apache.org/r/5/diff/
> 
> 
> Testing
> ---
> 
> Testing done with make check on linux
> 
> 
> Thanks,
> 
> Megha Sharma
> 
>



Re: Review Request 51272: Refactored the agent 'launcher' flag to always have a value.

2016-08-23 Thread Benjamin Hindman


> On Aug. 23, 2016, 2:30 p.m., Qian Zhang wrote:
> > src/local/local.cpp, line 345
> > 
> >
> > Do we need to check if `launcher` is already `"posix"` first?

Yes!


- Benjamin


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/51272/#review146505
---


On Aug. 22, 2016, 4:53 a.m., Benjamin Hindman wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/51272/
> ---
> 
> (Updated Aug. 22, 2016, 4:53 a.m.)
> 
> 
> Review request for mesos, Jie Yu and Kevin Klues.
> 
> 
> Repository: mesos
> 
> 
> Description
> ---
> 
> Refactored the agent 'launcher' flag to always have a value.
> 
> 
> Diffs
> -
> 
>   src/local/local.cpp 1e1d246790a0d44d1baf802b903dc4f2bde1ac63 
>   src/slave/containerizer/mesos/containerizer.cpp 
> 77a502f853e3e04ea8e274419544601778be9421 
>   src/slave/flags.hpp 1a006663e7cc58ee548b3dda686cfbac0c240baa 
>   src/slave/flags.cpp 0f2be1700f41b74da4ea1ce699a81ec33cf92a9a 
> 
> Diff: https://reviews.apache.org/r/51272/diff/
> 
> 
> Testing
> ---
> 
> 
> Thanks,
> 
> Benjamin Hindman
> 
>



Re: Review Request 51272: Refactored the agent 'launcher' flag to always have a value.

2016-08-23 Thread Benjamin Hindman


> On Aug. 22, 2016, 7 p.m., Jie Yu wrote:
> > src/local/local.cpp, lines 343-350
> > 
> >
> > I'd suggest we move this logic down below. more readable to me (current 
> > logic is broken into two pieces).
> > ```
> > // ...
> > if (flags.num_slaves > 1) {
> >   LOG(WARNING) << ...;
> >   
> >   flags.launcher = "posix";
> > }
> > ```

Agreed I'd rather have the logic in one place, however: (1) then we'd be 
printing out a lot of log messages (one for each agent) and (2) we'd still have 
logic in two places because we'll need to pull out a 'num_slaves' variable (or 
'launcher' variable, as I did here) above the loop because 'flags' gets 
overriden within the loop and I didn't want to make a bigger change by renaming 
'flags'. So, either we leave it, rename the inner 'flags', or do you have 
another suggestion?


> On Aug. 22, 2016, 7 p.m., Jie Yu wrote:
> > src/slave/containerizer/mesos/containerizer.cpp, line 232
> > 
> >
> > I would use the following. Looks more readable to me.
> > ```
> > if (flags_.launcher != "posix" && flags_.launcher != "windows")
> > ```

Agreed!


- Benjamin


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/51272/#review146399
---


On Aug. 22, 2016, 4:53 a.m., Benjamin Hindman wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/51272/
> ---
> 
> (Updated Aug. 22, 2016, 4:53 a.m.)
> 
> 
> Review request for mesos, Jie Yu and Kevin Klues.
> 
> 
> Repository: mesos
> 
> 
> Description
> ---
> 
> Refactored the agent 'launcher' flag to always have a value.
> 
> 
> Diffs
> -
> 
>   src/local/local.cpp 1e1d246790a0d44d1baf802b903dc4f2bde1ac63 
>   src/slave/containerizer/mesos/containerizer.cpp 
> 77a502f853e3e04ea8e274419544601778be9421 
>   src/slave/flags.hpp 1a006663e7cc58ee548b3dda686cfbac0c240baa 
>   src/slave/flags.cpp 0f2be1700f41b74da4ea1ce699a81ec33cf92a9a 
> 
> Diff: https://reviews.apache.org/r/51272/diff/
> 
> 
> Testing
> ---
> 
> 
> Thanks,
> 
> Benjamin Hindman
> 
>



Re: Review Request 50701: Added registrar operations for marking agents (un-)reachable.

2016-08-23 Thread Neil Conway


> On Aug. 22, 2016, 7:39 p.m., Vinod Kone wrote:
> > src/master/master.hpp, line 1907
> > 
> >
> > do you want to get rid of this parameter (in a different review and for 
> > all operations) because this is no longer configurable?

Yep, definitely: I want to remove all the `registry_strict` stuff, here and 
throughout the master: https://issues.apache.org/jira/browse/MESOS-5951

I'm planning to do that once the main functional changes are landed, though.


- Neil


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/50701/#review146394
---


On Aug. 12, 2016, 12:22 a.m., Neil Conway wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/50701/
> ---
> 
> (Updated Aug. 12, 2016, 12:22 a.m.)
> 
> 
> Review request for mesos and Vinod Kone.
> 
> 
> Bugs: MESOS-4049
> https://issues.apache.org/jira/browse/MESOS-4049
> 
> 
> Repository: mesos
> 
> 
> Description
> ---
> 
> Added registrar operations for marking agents (un-)reachable.
> 
> 
> Diffs
> -
> 
>   src/master/master.hpp 6decff6f4b9c3434de030fd5c06df4c683a7abad 
>   src/tests/registrar_tests.cpp 9a71d8fd0c8d8e662a5e364015d144396a0b1a4c 
> 
> Diff: https://reviews.apache.org/r/50701/diff/
> 
> 
> Testing
> ---
> 
> make check
> 
> 
> Thanks,
> 
> Neil Conway
> 
>



Re: Review Request 50699: Added new TaskState values and PARTITION_AWARE capability.

2016-08-23 Thread Neil Conway

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/50699/
---

(Updated Aug. 23, 2016, 3:35 p.m.)


Review request for mesos and Vinod Kone.


Changes
---

Tweak UNREACHABLE() -> LOG(FATAL).


Bugs: MESOS-4049
https://issues.apache.org/jira/browse/MESOS-4049


Repository: mesos


Description
---

TASK_DROPPED, TASK_UNREACHABLE, TASK_GONE, TASK_GONE_BY_OPERATOR, and
TASK_UNKNOWN. These values are intended to replace the existing
TASK_LOST state by offering more fine-grained information on the
current state of a task. These states will only be sent to frameworks
that opt into this new behavior via the PARTITION_AWARE capability.

Note that this commit doesn't add a master metric for the TASK_UNKNOWN
status, because this is a "default" status reported when the master has
no knowledge of a particular task/agent ID. Hence the number of
"unknown" tasks at any given time is not a well-defined metric.


Diffs (updated)
-

  include/mesos/mesos.proto 53b6547281b23ce9f47c9f1a418b60667fb4f602 
  include/mesos/v1/mesos.proto f6b59e156c92a26dd63b11bf403bdba677b9644b 
  src/common/protobuf_utils.cpp 8c4a7264080385789157703b05b42716fb0563b3 
  src/examples/disk_full_framework.cpp ad304fee94d443125a0dec2b2820267c69508621 
  src/master/http.cpp 52dd80b856cf2317c0b73ba54bf501696786088d 
  src/master/master.cpp d94a8510c4cee9c010706f79caf27ef4a10b41a8 
  src/master/metrics.hpp 9d201fcce1c46a890c86a889ab31029f9a061561 
  src/master/metrics.cpp 88a752dc2b4b73ccb919e99478b9ea2bd83842a0 
  src/tests/master_tests.cpp 398164d09b8ef14f916122ed8780023c4a3cd0f6 

Diff: https://reviews.apache.org/r/50699/diff/


Testing
---

make check


Thanks,

Neil Conway



Re: Review Request 51272: Refactored the agent 'launcher' flag to always have a value.

2016-08-23 Thread Qian Zhang

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/51272/#review146505
---




src/local/local.cpp (line 345)


Do we need to check if `launcher` is already `"posix"` first?


- Qian Zhang


On Aug. 22, 2016, 12:53 p.m., Benjamin Hindman wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/51272/
> ---
> 
> (Updated Aug. 22, 2016, 12:53 p.m.)
> 
> 
> Review request for mesos, Jie Yu and Kevin Klues.
> 
> 
> Repository: mesos
> 
> 
> Description
> ---
> 
> Refactored the agent 'launcher' flag to always have a value.
> 
> 
> Diffs
> -
> 
>   src/local/local.cpp 1e1d246790a0d44d1baf802b903dc4f2bde1ac63 
>   src/slave/containerizer/mesos/containerizer.cpp 
> 77a502f853e3e04ea8e274419544601778be9421 
>   src/slave/flags.hpp 1a006663e7cc58ee548b3dda686cfbac0c240baa 
>   src/slave/flags.cpp 0f2be1700f41b74da4ea1ce699a81ec33cf92a9a 
> 
> Diff: https://reviews.apache.org/r/51272/diff/
> 
> 
> Testing
> ---
> 
> 
> Thanks,
> 
> Benjamin Hindman
> 
>



Re: Review Request 51333: Fixed typos in upgrade doc.

2016-08-23 Thread Till Toenshoff

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/51333/#review146525
---


Ship it!




Ship It!

- Till Toenshoff


On Aug. 23, 2016, 2:24 p.m., Neil Conway wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/51333/
> ---
> 
> (Updated Aug. 23, 2016, 2:24 p.m.)
> 
> 
> Review request for mesos and Till Toenshoff.
> 
> 
> Repository: mesos
> 
> 
> Description
> ---
> 
> Fixed typos in upgrade doc.
> 
> 
> Diffs
> -
> 
>   docs/upgrades.md a07fb096d3afbf9bf1e85b42f2ab1a40477466d0 
> 
> Diff: https://reviews.apache.org/r/51333/diff/
> 
> 
> Testing
> ---
> 
> 
> Thanks,
> 
> Neil Conway
> 
>



Review Request 51333: Fixed typos in upgrade doc.

2016-08-23 Thread Neil Conway

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/51333/
---

Review request for mesos and Till Toenshoff.


Repository: mesos


Description
---

Fixed typos in upgrade doc.


Diffs
-

  docs/upgrades.md a07fb096d3afbf9bf1e85b42f2ab1a40477466d0 

Diff: https://reviews.apache.org/r/51333/diff/


Testing
---


Thanks,

Neil Conway



Re: Review Request 51294: Removed unnecessary statements in `CgroupsIsolatorProcess::_watch`.

2016-08-23 Thread haosdent huang

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/51294/
---

(Updated Aug. 23, 2016, 1:59 p.m.)


Review request for mesos, Gilbert Song, Jie Yu, and Qian Zhang.


Changes
---

Rebase.


Repository: mesos


Description
---

Removed unnecessary statements in `CgroupsIsolatorProcess::_watch`.


Diffs (updated)
-

  src/slave/containerizer/mesos/isolators/cgroups/cgroups.cpp 
ed9cff2f4879a7cb89f5b928fce379b663bcdcfb 

Diff: https://reviews.apache.org/r/51294/diff/


Testing
---


Thanks,

haosdent huang



Re: Review Request 51293: Fixed a typo in the comment of `Subsystem::watch`.

2016-08-23 Thread haosdent huang

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/51293/
---

(Updated Aug. 23, 2016, 1:58 p.m.)


Review request for mesos, Gilbert Song, Jie Yu, and Qian Zhang.


Changes
---

Fixed @qianzhang's comment.


Repository: mesos


Description
---

Fixed a typo in the comment of `Subsystem::watch`.


Diffs (updated)
-

  src/slave/containerizer/mesos/isolators/cgroups/subsystem.hpp 
6dca38cfdd360288c03a7a01aebbf15e3cbc5dce 

Diff: https://reviews.apache.org/r/51293/diff/


Testing
---


Thanks,

haosdent huang



Re: Review Request 51293: Fixed a typo in the comment of `Subsystem::watch`.

2016-08-23 Thread haosdent huang


> On Aug. 23, 2016, 7:54 a.m., Qian Zhang wrote:
> > src/slave/containerizer/mesos/isolators/cgroups/subsystem.hpp, line 99
> > 
> >
> > I think we are not returning `Nothing`, instead we are returning 
> > `ContainerLimitation` on success, right?

Thanks a lot! Fixed.


- haosdent


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/51293/#review146486
---


On Aug. 22, 2016, 4:11 p.m., haosdent huang wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/51293/
> ---
> 
> (Updated Aug. 22, 2016, 4:11 p.m.)
> 
> 
> Review request for mesos, Gilbert Song, Jie Yu, and Qian Zhang.
> 
> 
> Repository: mesos
> 
> 
> Description
> ---
> 
> Fixed a typo in the comment of `Subsystem::watch`.
> 
> 
> Diffs
> -
> 
>   src/slave/containerizer/mesos/isolators/cgroups/subsystem.hpp 
> 6dca38cfdd360288c03a7a01aebbf15e3cbc5dce 
> 
> Diff: https://reviews.apache.org/r/51293/diff/
> 
> 
> Testing
> ---
> 
> 
> Thanks,
> 
> haosdent huang
> 
>



Re: Review Request 51147: Corrected typo in docker_build example.

2016-08-23 Thread Till Toenshoff

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/51147/#review146516
---


Ship it!




Ship It!

- Till Toenshoff


On Aug. 16, 2016, 8:25 p.m., Joerg Schad wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/51147/
> ---
> 
> (Updated Aug. 16, 2016, 8:25 p.m.)
> 
> 
> Review request for mesos and Till Toenshoff.
> 
> 
> Repository: mesos
> 
> 
> Description
> ---
> 
> The example given in the error message was incorrect.
> 
> 
> Diffs
> -
> 
>   support/docker_build.sh 5c38c5c3e4570bb2c1a223a2197c8fc8f39e2c2d 
> 
> Diff: https://reviews.apache.org/r/51147/diff/
> 
> 
> Testing
> ---
> 
> 
> Thanks,
> 
> Joerg Schad
> 
>



Re: Review Request 51283: Fixed broken link in multiple disk documentation.

2016-08-23 Thread Till Toenshoff

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/51283/#review146515
---


Ship it!




Ship It!

- Till Toenshoff


On Aug. 22, 2016, 9:22 a.m., Joerg Schad wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/51283/
> ---
> 
> (Updated Aug. 22, 2016, 9:22 a.m.)
> 
> 
> Review request for mesos, Joris Van Remoortere and Till Toenshoff.
> 
> 
> Repository: mesos
> 
> 
> Description
> ---
> 
> Fixed broken link in multiple disk documentation.
> 
> 
> Diffs
> -
> 
>   docs/multiple-disk.md 20b99e0e57360ad43804201b27e593d7ed48ce2c 
> 
> Diff: https://reviews.apache.org/r/51283/diff/
> 
> 
> Testing
> ---
> 
> Build and checked with website docker container.
> 
> 
> Thanks,
> 
> Joerg Schad
> 
>



Re: Review Request 51329: Fixed incorrect html in upgrades.md.

2016-08-23 Thread Till Toenshoff

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/51329/#review146514
---


Ship it!




Ship It!

- Till Toenshoff


On Aug. 23, 2016, 12:41 p.m., Joerg Schad wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/51329/
> ---
> 
> (Updated Aug. 23, 2016, 12:41 p.m.)
> 
> 
> Review request for mesos and Till Toenshoff.
> 
> 
> Repository: mesos
> 
> 
> Description
> ---
> 
> Fixed incorrect html in upgrades.md.
> 
> 
> Diffs
> -
> 
>   docs/upgrades.md d3e7cdd51b4c294ef49447595d9cb074c5a993fe 
> 
> Diff: https://reviews.apache.org/r/51329/diff/
> 
> 
> Testing
> ---
> 
> Viewed in website docker container.
> 
> 
> Thanks,
> 
> Joerg Schad
> 
>



Re: Review Request 51328: Added some missing includes.

2016-08-23 Thread Till Toenshoff

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/51328/#review146513
---


Ship it!




Ship It!

- Till Toenshoff


On Aug. 23, 2016, 12:45 p.m., Benjamin Bannier wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/51328/
> ---
> 
> (Updated Aug. 23, 2016, 12:45 p.m.)
> 
> 
> Review request for mesos and Till Toenshoff.
> 
> 
> Repository: mesos
> 
> 
> Description
> ---
> 
> See summary.
> 
> 
> Diffs
> -
> 
>   3rdparty/stout/include/stout/os/posix/su.hpp 
> ad7202916a0a072fd11fd9367621cfe261bead86 
> 
> Diff: https://reviews.apache.org/r/51328/diff/
> 
> 
> Testing
> ---
> 
> `make check` (OS X, clang trunk w/o optimizations; CentOS7, GCC-4.8.5 w/o 
> optimizations)
> 
> 
> Thanks,
> 
> Benjamin Bannier
> 
>



Re: Review Request 51328: Added some missing includes.

2016-08-23 Thread Benjamin Bannier

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/51328/
---

(Updated Aug. 23, 2016, 2:45 p.m.)


Review request for mesos and Till Toenshoff.


Repository: mesos


Description (updated)
---

See summary.


Diffs
-

  3rdparty/stout/include/stout/os/posix/su.hpp 
ad7202916a0a072fd11fd9367621cfe261bead86 

Diff: https://reviews.apache.org/r/51328/diff/


Testing (updated)
---

`make check` (OS X, clang trunk w/o optimizations; CentOS7, GCC-4.8.5 w/o 
optimizations)


Thanks,

Benjamin Bannier



Re: Review Request 51329: Fixed incorrect html in upgrades.md.

2016-08-23 Thread Joerg Schad

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/51329/
---

(Updated Aug. 23, 2016, 12:41 p.m.)


Review request for mesos and Till Toenshoff.


Repository: mesos


Description
---

Fixed incorrect html in upgrades.md.


Diffs
-

  docs/upgrades.md d3e7cdd51b4c294ef49447595d9cb074c5a993fe 

Diff: https://reviews.apache.org/r/51329/diff/


Testing (updated)
---

Viewed in website docker container.


Thanks,

Joerg Schad



Review Request 51329: Fixed incorrect html in upgrades.md.

2016-08-23 Thread Joerg Schad

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/51329/
---

Review request for mesos and Till Toenshoff.


Repository: mesos


Description
---

Fixed incorrect html in upgrades.md.


Diffs
-

  docs/upgrades.md d3e7cdd51b4c294ef49447595d9cb074c5a993fe 

Diff: https://reviews.apache.org/r/51329/diff/


Testing
---

Viewed in docker container.


Thanks,

Joerg Schad



Review Request 51328: Added some missing includes.

2016-08-23 Thread Benjamin Bannier

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/51328/
---

Review request for mesos and Till Toenshoff.


Repository: mesos


Description
---

Added some missing includes.


Diffs
-

  3rdparty/stout/include/stout/os/posix/su.hpp 
ad7202916a0a072fd11fd9367621cfe261bead86 

Diff: https://reviews.apache.org/r/51328/diff/


Testing
---

`make check` (OS X, clang trunk w/ optimizations)


Thanks,

Benjamin Bannier



Re: Review Request 51321: Added a test to ensure the master handles launching task groups.

2016-08-23 Thread Guangya Liu

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/51321/#review146510
---




src/tests/scheduler_tests.cpp (lines 504 - 506)


Just a question here: Can I say that we do not need to add blank line if a 
code block has strong affinity even if there are code sentense include multiple 
line?


- Guangya Liu


On 八月 23, 2016, 5:11 a.m., Benjamin Mahler wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/51321/
> ---
> 
> (Updated 八月 23, 2016, 5:11 a.m.)
> 
> 
> Review request for mesos and Vinod Kone.
> 
> 
> Bugs: MESOS-6045
> https://issues.apache.org/jira/browse/MESOS-6045
> 
> 
> Repository: mesos
> 
> 
> Description
> ---
> 
> For now this test ensures the message is sent to the agent
> in the successful launch path. More tests will be added to
> test the all-or-nothing paths (killed, invalid, unauthorized).
> 
> 
> Diffs
> -
> 
>   src/tests/scheduler_tests.cpp ccd0f2f602a7a1a9a44b68fd0f59bdc8e0fa58b1 
> 
> Diff: https://reviews.apache.org/r/51321/diff/
> 
> 
> Testing
> ---
> 
> make check
> 
> 
> Thanks,
> 
> Benjamin Mahler
> 
>



Re: Review Request 51320: Implemented the LaunchGroup Offer::Operation in the master.

2016-08-23 Thread Guangya Liu

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/51320/#review146501
---




src/master/master.cpp (line 93)


I'd prefer that we move this under `using std::xxx`.

```
using std::list;
using std::set;
using std::shared_ptr;
using std::string;
using std::vector;

using google::protobuf::RepeatedPtrField;

sing process::await;
using process::wait; // Necessary on some OS's to disambiguate.
```



src/master/master.cpp (lines 96 - 97)


switch the order



src/master/master.cpp (line 3326)


Just a question here: Do we need the `CHECK` here? I think that from here 
we are pretty sure that the operation type here is 
`Offer::Operation::LAUNCH_GROUP` as we already filtered the operations in line 
3317 and 3318.



src/master/master.cpp (line 3915)


s/they/all tasks in the group



src/master/master.cpp (lines 3932 - 3933)


new line here



src/master/master.cpp (lines 3947 - 3949)


What about following? 

```
error = Error("Failed to authorize task '" +
  stringify(task.task_id()) + "': " +
  authorization.failure());
```



src/master/master.cpp (lines 3949 - 3950)


new line here



src/master/master.cpp (line 3951)


kill this



src/master/master.cpp (lines 3961 - 3963)


How about:

```
error = Error("Task '" + stringify(task.task_id()) + "' "
  "is not authorized to launch as user '" + 
  user + "'");
```



src/master/master.cpp (lines 3963 - 3964)


new line here



src/master/master.cpp (line 3965)


kill this



src/master/master.cpp (lines 4025 - 4026)


What about 

```
"A task within the task group was killed before "
"delivery to the executor");
```



src/master/master.cpp (line 4050)


how about s/int/size_t?



src/master/master.cpp (line 4076)


kill this?


- Guangya Liu


On 八月 23, 2016, 5:11 a.m., Benjamin Mahler wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/51320/
> ---
> 
> (Updated 八月 23, 2016, 5:11 a.m.)
> 
> 
> Review request for mesos and Vinod Kone.
> 
> 
> Bugs: MESOS-6045
> https://issues.apache.org/jira/browse/MESOS-6045
> 
> 
> Repository: mesos
> 
> 
> Description
> ---
> 
> This operation is all-or-nothing, in that all tasks must be
> launched together. If the operation fails, all tasks will
> fail with TASK_ERROR and the appropriate GROUP reason.
> If a task was killed before delivery to the executor, all
> tasks are killed.
> 
> 
> Diffs
> -
> 
>   include/mesos/mesos.proto 53b6547281b23ce9f47c9f1a418b60667fb4f602 
>   include/mesos/v1/mesos.proto f6b59e156c92a26dd63b11bf403bdba677b9644b 
>   src/master/master.cpp d94a8510c4cee9c010706f79caf27ef4a10b41a8 
>   src/messages/messages.proto 7b5e24fb1e9baf09ce024daeca90745f380d4c2f 
> 
> Diff: https://reviews.apache.org/r/51320/diff/
> 
> 
> Testing
> ---
> 
> Added a test in the subsequent patch.
> 
> More tests will be added for all-or-nothing validation / authorization paths.
> 
> 
> Thanks,
> 
> Benjamin Mahler
> 
>



Review Request 51324: Replaced use of `readdir_r` with `readdir`.

2016-08-23 Thread Neil Conway

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/51324/
---

Review request for mesos and Alexander Rukletsov.


Bugs: MESOS-6013
https://issues.apache.org/jira/browse/MESOS-6013


Repository: mesos


Description
---

`readdir_r` is deprecated in glibc >= 2.24 [1], which breaks the build.
In practice, modern implementations of `readdir` should be thread-safe
when concurrently invoked on different directory streams.

[1] https://sourceware.org/ml/libc-alpha/2016-02/msg00093.html


Diffs
-

  3rdparty/stout/include/stout/internal/windows/dirent.hpp 
c6aa907f27350a8d9d4192d320cde3deb307d48b 
  3rdparty/stout/include/stout/os/ls.hpp 
f8da9ef74a885cc39424b3e50cebca905d88ca44 

Diff: https://reviews.apache.org/r/51324/diff/


Testing
---

make check


Thanks,

Neil Conway



Review Request 51325: Removed unused function `os::dirent_size`.

2016-08-23 Thread Neil Conway

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/51325/
---

Review request for mesos and Alexander Rukletsov.


Bugs: MESOS-6013
https://issues.apache.org/jira/browse/MESOS-6013


Repository: mesos


Description
---

Removed unused function `os::dirent_size`.


Diffs
-

  3rdparty/stout/include/Makefile.am 1f2ee85846342b3f5d1c4f8dafda6c763f6ecdfa 
  3rdparty/stout/include/stout/os/direntsize.hpp 
819f99a89862491e99873bdedc603317b91266b0 
  3rdparty/stout/include/stout/os/posix/direntsize.hpp 
9d8f72eb607a288e77f92b39b91542ff5eb2fa21 
  3rdparty/stout/include/stout/os/windows/direntsize.hpp 
7c8c7a06f478b3a80341a874494cff21f71fc397 

Diff: https://reviews.apache.org/r/51325/diff/


Testing
---

make check


Thanks,

Neil Conway



Re: Review Request 51027: Track allocation candidates to bound allocator.

2016-08-23 Thread Alexander Rukletsov

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/51027/#review146499
---




src/master/allocator/mesos/hierarchical.cpp (lines 275 - 278)


Probably extract this snippet in a function, e.g. `conditionalAllocate()`?



src/master/allocator/mesos/hierarchical.cpp (lines 1193 - 1194)


`delay` already contains `dispatch`, so under "synchronously" you actually 
mean double dispatch.



src/master/allocator/mesos/hierarchical.cpp (line 1200)


Remove extra blank line


- Alexander Rukletsov


On Aug. 23, 2016, 8:49 a.m., Jacob Janco wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/51027/
> ---
> 
> (Updated Aug. 23, 2016, 8:49 a.m.)
> 
> 
> Review request for mesos, Benjamin Mahler, Guangya Liu, James Peach, Klaus 
> Ma, and Jiang Yan Xu.
> 
> 
> Bugs: MESOS-3157
> https://issues.apache.org/jira/browse/MESOS-3157
> 
> 
> Repository: mesos
> 
> 
> Description
> ---
> 
> - Triggered allocations dispatch allocate() only
>   if there is no pending allocation in the queue.
> - Allocation candidates are accumulated and only
>   cleared when enqueued allocations are processed.
> - Batched allocations are handled synchronously.
> 
> - Carrying over work from https://reviews.apache.org/r/41658/ and added the 
> previous reviewers
> - Specifically, this patch introduces the boolean flag pendingAllocation, 
> which when set on event 
>   triggered allocations, will prevent additional no-op allocations: the flag 
> is cleared when 
>   the enqueued allocation is processed, subsequent event triggered 
> allocations will update a set
>   of allocation candidates rather than dispatching an additional allocate().
> 
> 
> Diffs
> -
> 
>   src/master/allocator/mesos/hierarchical.hpp 
> bdbc6d3b5b959990538f4e3b7b1a3b031d9aea05 
>   src/master/allocator/mesos/hierarchical.cpp 
> 234ef98529964a0b6d3f132426a6c8ccbb1263ee 
> 
> Diff: https://reviews.apache.org/r/51027/diff/
> 
> 
> Testing
> ---
> 
> make check
> 
> note: check without filters depends on https://reviews.apache.org/r/51028
> 
> With new benchmark https://reviews.apache.org/r/49617: 
> Sample output without 51027:
> [ RUN  ] 
> SlaveAndFrameworkCount/HierarchicalAllocator_BENCHMARK_Test.FrameworkFailover/22
> Using 1 agents and 3000 frameworks
> Added 3000 frameworks in 57251us
> Added 1 agents in 3.2134535333mins
> allocator settled after  1.6123603833mins
> [   OK ] 
> SlaveAndFrameworkCount/HierarchicalAllocator_BENCHMARK_Test.FrameworkFailover/22
>  (290578 ms)
> 
> Sample output with 51027:
> [ RUN  ] 
> SlaveAndFrameworkCount/HierarchicalAllocator_BENCHMARK_Test.FrameworkFailover/22
> Using 1 agents and 3000 frameworks
> Added 3000 frameworks in 39817us
> Added 1 agents in 3.2286054167mins
> allocator settled after  25.525654secs
> [   OK ] 
> SlaveAndFrameworkCount/HierarchicalAllocator_BENCHMARK_Test.FrameworkFailover/22
>  (220137 ms)
> 
> 
> Thanks,
> 
> Jacob Janco
> 
>



Re: Review Request 51027: Track allocation candidates to bound allocator.

2016-08-23 Thread Guangya Liu

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/51027/#review146497
---



You may want to update the `Description` section to make sure each line does 
not exceed 72 characters, otherwise, we cannot apply your patch.

- Guangya Liu


On 八月 23, 2016, 8:49 a.m., Jacob Janco wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/51027/
> ---
> 
> (Updated 八月 23, 2016, 8:49 a.m.)
> 
> 
> Review request for mesos, Benjamin Mahler, Guangya Liu, James Peach, Klaus 
> Ma, and Jiang Yan Xu.
> 
> 
> Bugs: MESOS-3157
> https://issues.apache.org/jira/browse/MESOS-3157
> 
> 
> Repository: mesos
> 
> 
> Description
> ---
> 
> - Triggered allocations dispatch allocate() only
>   if there is no pending allocation in the queue.
> - Allocation candidates are accumulated and only
>   cleared when enqueued allocations are processed.
> - Batched allocations are handled synchronously.
> 
> - Carrying over work from https://reviews.apache.org/r/41658/ and added the 
> previous reviewers
> - Specifically, this patch introduces the boolean flag pendingAllocation, 
> which when set on event 
>   triggered allocations, will prevent additional no-op allocations: the flag 
> is cleared when 
>   the enqueued allocation is processed, subsequent event triggered 
> allocations will update a set
>   of allocation candidates rather than dispatching an additional allocate().
> 
> 
> Diffs
> -
> 
>   src/master/allocator/mesos/hierarchical.hpp 
> bdbc6d3b5b959990538f4e3b7b1a3b031d9aea05 
>   src/master/allocator/mesos/hierarchical.cpp 
> 234ef98529964a0b6d3f132426a6c8ccbb1263ee 
> 
> Diff: https://reviews.apache.org/r/51027/diff/
> 
> 
> Testing
> ---
> 
> make check
> 
> note: check without filters depends on https://reviews.apache.org/r/51028
> 
> With new benchmark https://reviews.apache.org/r/49617: 
> Sample output without 51027:
> [ RUN  ] 
> SlaveAndFrameworkCount/HierarchicalAllocator_BENCHMARK_Test.FrameworkFailover/22
> Using 1 agents and 3000 frameworks
> Added 3000 frameworks in 57251us
> Added 1 agents in 3.2134535333mins
> allocator settled after  1.6123603833mins
> [   OK ] 
> SlaveAndFrameworkCount/HierarchicalAllocator_BENCHMARK_Test.FrameworkFailover/22
>  (290578 ms)
> 
> Sample output with 51027:
> [ RUN  ] 
> SlaveAndFrameworkCount/HierarchicalAllocator_BENCHMARK_Test.FrameworkFailover/22
> Using 1 agents and 3000 frameworks
> Added 3000 frameworks in 39817us
> Added 1 agents in 3.2286054167mins
> allocator settled after  25.525654secs
> [   OK ] 
> SlaveAndFrameworkCount/HierarchicalAllocator_BENCHMARK_Test.FrameworkFailover/22
>  (220137 ms)
> 
> 
> Thanks,
> 
> Jacob Janco
> 
>



Re: Review Request 51317: Documented a bug with the use of `pendingTasks` in the master.

2016-08-23 Thread Guangya Liu

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/51317/#review146492
---




src/master/master.cpp (line 3362)


The current logic is that the task with duplicate ID will be ignored and 
will not be launched, so which task will trigger and where do we send out the 
TASK_ERROR message?

I saw that the TASK_ERROR message will only be sent out if the task is 
killed in `pendingTasks` and invalid.



src/master/master.cpp (line 3364)


Just a question here: Why do you want to distinguish the difference between 
duplicate TaskID and getting killed while pending? What is the relatinonship of 
those two?



src/master/master.cpp (line 3366)


I saw that this only occur if the task authorization failed for now, shall 
we highlight this in the comments here?



src/master/master.cpp (lines 3765 - 3767)


What about put more detail around line 3807 for `invalid` task case by 
adding a `TODO`?

```
// TODO(bmahler): Validate the task. We may send TASK_ERROR
// after a TASK_KILLED if a task was killed (removed from
// `pendingTasks`) *and* the task is invalid here.
```



src/master/master.cpp (line 3767)


Do we need to put `unauthorized` in `TODO`? I saw it aws already handled in 
line 3774 and 3793.


- Guangya Liu


On 八月 23, 2016, 5:11 a.m., Benjamin Mahler wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/51317/
> ---
> 
> (Updated 八月 23, 2016, 5:11 a.m.)
> 
> 
> Review request for mesos and Vinod Kone.
> 
> 
> Repository: mesos
> 
> 
> Description
> ---
> 
> The use of `pendingTasks` cannot distinguish between a duplicate
> TaskID and a task that has been killed while pending. This means
> that if an invalid or unauthorized task is killed while pending,
> TASK_KILLED is sent, and once in Master::_accept, we will also
> send TASK_ERROR.
> 
> 
> Diffs
> -
> 
>   src/master/master.cpp d94a8510c4cee9c010706f79caf27ef4a10b41a8 
> 
> Diff: https://reviews.apache.org/r/51317/diff/
> 
> 
> Testing
> ---
> 
> N/A
> 
> 
> Thanks,
> 
> Benjamin Mahler
> 
>



Re: Review Request 51027: Track allocation candidates to bound allocator.

2016-08-23 Thread Jacob Janco


> On Aug. 18, 2016, 12:38 a.m., Guangya Liu wrote:
> > Can you please also clarify some difference with 
> > https://reviews.apache.org/r/41658/, as here the work is carry from 
> > https://reviews.apache.org/r/41658/ , I think that this is more simple and 
> > clear compared with counter in https://reviews.apache.org/r/41658/ with a 
> > boolean here.
> 
> Jiang Yan Xu wrote:
> This is exactly a rework based on comments in 
> https://reviews.apache.org/r/41658/.
> 
> Guangya Liu wrote:
> Yes, but as the `Description` part mentioned 
> https://reviews.apache.org/r/41658/ , so here it would be great to list some 
> difference with https://reviews.apache.org/r/41658/ , and the most important 
> thing is with the `boolean` value here, we can avoid some no-op allocate 
> operations.

Added.


- Jacob


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/51027/#review146058
---


On Aug. 23, 2016, 8:32 a.m., Jacob Janco wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/51027/
> ---
> 
> (Updated Aug. 23, 2016, 8:32 a.m.)
> 
> 
> Review request for mesos, Benjamin Mahler, James Peach, Klaus Ma, and Jiang 
> Yan Xu.
> 
> 
> Bugs: MESOS-3157
> https://issues.apache.org/jira/browse/MESOS-3157
> 
> 
> Repository: mesos
> 
> 
> Description
> ---
> 
> - Triggered allocations dispatch allocate() only
>   if there is no pending allocation in the queue.
> - Allocation candidates are accumulated and only
>   cleared when enqueued allocations are processed.
> - Batched allocations are handled synchronously.
> 
> - Carrying over work from https://reviews.apache.org/r/41658/ and added the 
> previous reviewers
> - Specifically, this patch introduces the boolean flag pendingAllocation, 
> which when set on event 
>   triggered allocations, will prevent additional no-op allocations: the flag 
> is cleared when 
>   the enqueued allocation is processed, subsequent event triggered 
> allocations will update a set
>   of allocation candidates rather than dispatching an additional allocate().
> 
> 
> Diffs
> -
> 
>   src/master/allocator/mesos/hierarchical.hpp 
> bdbc6d3b5b959990538f4e3b7b1a3b031d9aea05 
>   src/master/allocator/mesos/hierarchical.cpp 
> 234ef98529964a0b6d3f132426a6c8ccbb1263ee 
> 
> Diff: https://reviews.apache.org/r/51027/diff/
> 
> 
> Testing
> ---
> 
> make check
> 
> note: check without filters depends on https://reviews.apache.org/r/51028
> 
> With new benchmark https://reviews.apache.org/r/49617: 
> Sample output without 51027:
> [ RUN  ] 
> SlaveAndFrameworkCount/HierarchicalAllocator_BENCHMARK_Test.FrameworkFailover/22
> Using 1 agents and 3000 frameworks
> Added 3000 frameworks in 57251us
> Added 1 agents in 3.2134535333mins
> allocator settled after  1.6123603833mins
> [   OK ] 
> SlaveAndFrameworkCount/HierarchicalAllocator_BENCHMARK_Test.FrameworkFailover/22
>  (290578 ms)
> 
> Sample output with 51027:
> [ RUN  ] 
> SlaveAndFrameworkCount/HierarchicalAllocator_BENCHMARK_Test.FrameworkFailover/22
> Using 1 agents and 3000 frameworks
> Added 3000 frameworks in 39817us
> Added 1 agents in 3.2286054167mins
> allocator settled after  25.525654secs
> [   OK ] 
> SlaveAndFrameworkCount/HierarchicalAllocator_BENCHMARK_Test.FrameworkFailover/22
>  (220137 ms)
> 
> 
> Thanks,
> 
> Jacob Janco
> 
>



Re: Review Request 51027: Track allocation candidates to bound allocator.

2016-08-23 Thread Jacob Janco

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/51027/
---

(Updated Aug. 23, 2016, 8:32 a.m.)


Review request for mesos, Benjamin Mahler, James Peach, Klaus Ma, and Jiang Yan 
Xu.


Changes
---

Adding back reviewers and dependencies.


Bugs: MESOS-3157
https://issues.apache.org/jira/browse/MESOS-3157


Repository: mesos


Description (updated)
---

- Triggered allocations dispatch allocate() only
  if there is no pending allocation in the queue.
- Allocation candidates are accumulated and only
  cleared when enqueued allocations are processed.
- Batched allocations are handled synchronously.

- Carrying over work from https://reviews.apache.org/r/41658/ and added the 
previous reviewers
- Specifically, this patch introduces the boolean flag pendingAllocation, which 
when set on event 
  triggered allocations, will prevent additional no-op allocations: the flag is 
cleared when 
  the enqueued allocation is processed, subsequent event triggered allocations 
will update a set
  of allocation candidates rather than dispatching an additional allocate().


Diffs
-

  src/master/allocator/mesos/hierarchical.hpp 
bdbc6d3b5b959990538f4e3b7b1a3b031d9aea05 
  src/master/allocator/mesos/hierarchical.cpp 
234ef98529964a0b6d3f132426a6c8ccbb1263ee 

Diff: https://reviews.apache.org/r/51027/diff/


Testing
---

make check

note: check without filters depends on https://reviews.apache.org/r/51028

With new benchmark https://reviews.apache.org/r/49617: 
Sample output without 51027:
[ RUN  ] 
SlaveAndFrameworkCount/HierarchicalAllocator_BENCHMARK_Test.FrameworkFailover/22
Using 1 agents and 3000 frameworks
Added 3000 frameworks in 57251us
Added 1 agents in 3.2134535333mins
allocator settled after  1.6123603833mins
[   OK ] 
SlaveAndFrameworkCount/HierarchicalAllocator_BENCHMARK_Test.FrameworkFailover/22
 (290578 ms)

Sample output with 51027:
[ RUN  ] 
SlaveAndFrameworkCount/HierarchicalAllocator_BENCHMARK_Test.FrameworkFailover/22
Using 1 agents and 3000 frameworks
Added 3000 frameworks in 39817us
Added 1 agents in 3.2286054167mins
allocator settled after  25.525654secs
[   OK ] 
SlaveAndFrameworkCount/HierarchicalAllocator_BENCHMARK_Test.FrameworkFailover/22
 (220137 ms)


Thanks,

Jacob Janco



Re: Review Request 51027: Track allocation candidates to bound allocator.

2016-08-23 Thread Jacob Janco

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/51027/
---

(Updated Aug. 23, 2016, 8:24 a.m.)


Review request for .


Bugs: MESOS-3157
https://issues.apache.org/jira/browse/MESOS-3157


Repository: mesos


Description (updated)
---

- Triggered allocations dispatch allocate() only
  if there is no pending allocation in the queue.
- Allocation candidates are accumulated and only
  cleared when enqueued allocations are processed.
- Batched allocations are handled synchronously.


Diffs (updated)
-

  src/master/allocator/mesos/hierarchical.hpp 
bdbc6d3b5b959990538f4e3b7b1a3b031d9aea05 
  src/master/allocator/mesos/hierarchical.cpp 
234ef98529964a0b6d3f132426a6c8ccbb1263ee 

Diff: https://reviews.apache.org/r/51027/diff/


Testing
---

make check

note: check without filters depends on https://reviews.apache.org/r/51028

With new benchmark https://reviews.apache.org/r/49617: 
Sample output without 51027:
[ RUN  ] 
SlaveAndFrameworkCount/HierarchicalAllocator_BENCHMARK_Test.FrameworkFailover/22
Using 1 agents and 3000 frameworks
Added 3000 frameworks in 57251us
Added 1 agents in 3.2134535333mins
allocator settled after  1.6123603833mins
[   OK ] 
SlaveAndFrameworkCount/HierarchicalAllocator_BENCHMARK_Test.FrameworkFailover/22
 (290578 ms)

Sample output with 51027:
[ RUN  ] 
SlaveAndFrameworkCount/HierarchicalAllocator_BENCHMARK_Test.FrameworkFailover/22
Using 1 agents and 3000 frameworks
Added 3000 frameworks in 39817us
Added 1 agents in 3.2286054167mins
allocator settled after  25.525654secs
[   OK ] 
SlaveAndFrameworkCount/HierarchicalAllocator_BENCHMARK_Test.FrameworkFailover/22
 (220137 ms)


Thanks,

Jacob Janco



Re: Review Request 51027: Track allocation candidates to bound allocator.

2016-08-23 Thread Jacob Janco

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/51027/
---

(Updated Aug. 23, 2016, 8:24 a.m.)


Review request for .


Bugs: MESOS-3157
https://issues.apache.org/jira/browse/MESOS-3157


Repository: mesos


Description (updated)
---

- Triggered allocations dispatch allocate() only
  if there is no pending allocation in the queue.
- Allocation candidates are accumulated and only
  cleared when enqueued allocations are processed.
- Batched allocations are handled synchronously.


Diffs (updated)
-

  src/master/allocator/mesos/hierarchical.hpp 
bdbc6d3b5b959990538f4e3b7b1a3b031d9aea05 
  src/master/allocator/mesos/hierarchical.cpp 
234ef98529964a0b6d3f132426a6c8ccbb1263ee 

Diff: https://reviews.apache.org/r/51027/diff/


Testing
---

make check

note: check without filters depends on https://reviews.apache.org/r/51028

With new benchmark https://reviews.apache.org/r/49617: 
Sample output without 51027:
[ RUN  ] 
SlaveAndFrameworkCount/HierarchicalAllocator_BENCHMARK_Test.FrameworkFailover/22
Using 1 agents and 3000 frameworks
Added 3000 frameworks in 57251us
Added 1 agents in 3.2134535333mins
allocator settled after  1.6123603833mins
[   OK ] 
SlaveAndFrameworkCount/HierarchicalAllocator_BENCHMARK_Test.FrameworkFailover/22
 (290578 ms)

Sample output with 51027:
[ RUN  ] 
SlaveAndFrameworkCount/HierarchicalAllocator_BENCHMARK_Test.FrameworkFailover/22
Using 1 agents and 3000 frameworks
Added 3000 frameworks in 39817us
Added 1 agents in 3.2286054167mins
allocator settled after  25.525654secs
[   OK ] 
SlaveAndFrameworkCount/HierarchicalAllocator_BENCHMARK_Test.FrameworkFailover/22
 (220137 ms)


Thanks,

Jacob Janco



Review Request 51323: Supported provisioner provision() and destroy() to be nested aware.

2016-08-23 Thread Gilbert Song

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/51323/
---

Review request for mesos, Benjamin Hindman, Artem Harutyunyan, Jie Yu, and 
Kevin Klues.


Bugs: MESOS-6067
https://issues.apache.org/jira/browse/MESOS-6067


Repository: mesos


Description
---

This patch makes the provisioner provision() and destory() methods
to be multi-level nested aware, by simply change the helper function
getContainerDir() to be recursive:

1. In provisioner provision(), just need to make sure the rootfs
   returned from getContainerRootfsDir() and the backend dir
   returned from getBackendDir() are correctly nested.
2. In provisioner destroy(), need to simply make sure the rootfs
   from getContainerRootfsDir() is correctly nested.

Both getContainerRootfsDir() and getBackendDir() rely on
the helper method getContainerDir(). So we can simply make them
nested aware by change getContainerDir() to be recursive.


Diffs
-

  src/slave/containerizer/mesos/provisioner/paths.hpp 
9829d6b52c8547ae22297a5bc47852ce5a219e4c 
  src/slave/containerizer/mesos/provisioner/paths.cpp 
86a45f30c22dc1e41f4779c0ce8c11d02dcc46bb 

Diff: https://reviews.apache.org/r/51323/diff/


Testing
---


Thanks,

Gilbert Song



Re: Review Request 51294: Removed unnecessary statements in `CgroupsIsolatorProcess::_watch`.

2016-08-23 Thread Qian Zhang

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/51294/#review146487
---


Ship it!




Ship It!

- Qian Zhang


On Aug. 23, 2016, 12:11 a.m., haosdent huang wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/51294/
> ---
> 
> (Updated Aug. 23, 2016, 12:11 a.m.)
> 
> 
> Review request for mesos, Gilbert Song, Jie Yu, and Qian Zhang.
> 
> 
> Repository: mesos
> 
> 
> Description
> ---
> 
> Removed unnecessary statements in `CgroupsIsolatorProcess::_watch`.
> 
> 
> Diffs
> -
> 
>   src/slave/containerizer/mesos/isolators/cgroups/cgroups.cpp 
> ed9cff2f4879a7cb89f5b928fce379b663bcdcfb 
> 
> Diff: https://reviews.apache.org/r/51294/diff/
> 
> 
> Testing
> ---
> 
> 
> Thanks,
> 
> haosdent huang
> 
>



Re: Review Request 51293: Fixed a typo in the comment of `Subsystem::watch`.

2016-08-23 Thread Qian Zhang

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/51293/#review146486
---




src/slave/containerizer/mesos/isolators/cgroups/subsystem.hpp (line 99)


I think we are not returning `Nothing`, instead we are returning 
`ContainerLimitation` on success, right?


- Qian Zhang


On Aug. 23, 2016, 12:11 a.m., haosdent huang wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/51293/
> ---
> 
> (Updated Aug. 23, 2016, 12:11 a.m.)
> 
> 
> Review request for mesos, Gilbert Song, Jie Yu, and Qian Zhang.
> 
> 
> Repository: mesos
> 
> 
> Description
> ---
> 
> Fixed a typo in the comment of `Subsystem::watch`.
> 
> 
> Diffs
> -
> 
>   src/slave/containerizer/mesos/isolators/cgroups/subsystem.hpp 
> 6dca38cfdd360288c03a7a01aebbf15e3cbc5dce 
> 
> Diff: https://reviews.apache.org/r/51293/diff/
> 
> 
> Testing
> ---
> 
> 
> Thanks,
> 
> haosdent huang
> 
>