Re: Review Request 59687: Added tests for recovering ContainerConfig.

2017-10-27 Thread Zhitao Li

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/59687/
---

(Updated Oct. 27, 2017, 6:02 p.m.)


Review request for mesos, Gilbert Song and Jason Lai.


Changes
---

Fix format lint.


Bugs: MESOS-6894
https://issues.apache.org/jira/browse/MESOS-6894


Repository: mesos


Description
---

Added tests for recovering ContainerConfig.


Diffs (updated)
-

  src/tests/containerizer/nested_mesos_containerizer_tests.cpp 
fbd2887800ccfd64c8628c5d6fd77a511c8f91d5 


Diff: https://reviews.apache.org/r/59687/diff/3/

Changes: https://reviews.apache.org/r/59687/diff/2-3/


Testing
---


Thanks,

Zhitao Li



Re: Review Request 62997: Added checkpoint and recover capability for layers in provisioner.

2017-10-27 Thread Zhitao Li

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/62997/
---

(Updated Oct. 27, 2017, 6:03 p.m.)


Review request for mesos, Gilbert Song, Jason Lai, and Jie Yu.


Changes
---

Fix populating `info->layers` after provision


Bugs: MESOS-8079
https://issues.apache.org/jira/browse/MESOS-8079


Repository: mesos


Description
---

Added checkpoint and recover capability for layers in provisioner.


Diffs (updated)
-

  include/mesos/slave/containerizer.proto 
689acfcbbb07f071b6195472118a7a7520a44abd 
  src/slave/containerizer/mesos/provisioner/paths.hpp 
466f5edab40732b0d8da4252a71fde9c2956e8e9 
  src/slave/containerizer/mesos/provisioner/paths.cpp 
268dbeb4b18374ef53bc73254bf20ce6830e384f 
  src/slave/containerizer/mesos/provisioner/provisioner.hpp 
7cba54ce490d1e6e17081cd7e04fd6759ceddb8e 
  src/slave/containerizer/mesos/provisioner/provisioner.cpp 
450a3b32d69d2882973a6ed4e94e169a0256056b 


Diff: https://reviews.apache.org/r/62997/diff/3/

Changes: https://reviews.apache.org/r/62997/diff/2-3/


Testing
---


Thanks,

Zhitao Li



Re: Review Request 67822: Avoid duplicate unmount dangling mount point.

2018-07-03 Thread Zhitao Li

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/67822/
---

(Updated July 3, 2018, 4:05 p.m.)


Review request for mesos, Chun-Hung Hsiao, Jason Lai, and Jie Yu.


Changes
---

Minor style fix


Bugs: MESOS-9049
https://issues.apache.org/jira/browse/MESOS-9049


Repository: mesos


Description
---

We could potentially schedule the framework dir, executor dir, and
executor run sandbox to gc at the same time, and then these
paths will be gc independently, although they are parents and
children directories. This patch makes sure we do not call unmount
anymore after a success.


Diffs (updated)
-

  src/slave/gc.cpp 407f6b23f87cf2e2bdaf873c8adcda57f5d559b3 


Diff: https://reviews.apache.org/r/67822/diff/2/

Changes: https://reviews.apache.org/r/67822/diff/1-2/


Testing
---

```make```


Thanks,

Zhitao Li



Review Request 67822: Avoid duplicate unmount dangling mount point.

2018-07-03 Thread Zhitao Li

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/67822/
---

Review request for mesos, Chun-Hung Hsiao, Jason Lai, and Jie Yu.


Bugs: MESOS-8830
https://issues.apache.org/jira/browse/MESOS-8830


Repository: mesos


Description
---

We could potentially schedule the framework dir, executor dir, and
executor run sandbox to gc at the same time, and then these
paths will be gc independently, although they are parents and
children directories. This patch makes sure we do not call unmount
anymore after a success.


Diffs
-

  src/slave/gc.cpp 407f6b23f87cf2e2bdaf873c8adcda57f5d559b3 


Diff: https://reviews.apache.org/r/67822/diff/1/


Testing
---

```make```


Thanks,

Zhitao Li



Re: Review Request 64812: Added `excluded_images` parameter to `PRUNE_IMAGES` agent API.

2017-12-23 Thread Zhitao Li

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/64812/
---

(Updated Dec. 24, 2017, 1:21 a.m.)


Review request for mesos, Gilbert Song and Jie Yu.


Changes
---

Add `excluded_images` from `image_gc_config` is not empty.


Bugs: MESOS-4945
https://issues.apache.org/jira/browse/MESOS-4945


Repository: mesos


Description
---

Added `excluded_images` parameter to `PRUNE_IMAGES` agent API.


Diffs (updated)
-

  include/mesos/agent/agent.proto 25af6dc64ca35126fd70dd10388bd2cea6dcbd36 
  include/mesos/v1/agent/agent.proto 4ae9013e6fb27bdce72b4aa9fc9cbd99f79bc548 
  src/slave/http.cpp 3e20c90fb94f0128a757745052534ef03612d849 


Diff: https://reviews.apache.org/r/64812/diff/2/

Changes: https://reviews.apache.org/r/64812/diff/1-2/


Testing
---


Thanks,

Zhitao Li



Re: Review Request 56722: Added a new operator API for `PRUNE_IMAGES`.

2017-12-24 Thread Zhitao Li

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/56722/
---

(Updated Dec. 24, 2017, 7:31 p.m.)


Review request for mesos, Gilbert Song, Jason Lai, and Jie Yu.


Changes
---

Remove unnecessary patch dependancy.


Bugs: MESOS-4945
https://issues.apache.org/jira/browse/MESOS-4945


Repository: mesos


Description
---

Added a new operator API for `PRUNE_IMAGES`.


Diffs
-

  include/mesos/agent/agent.proto 25af6dc64ca35126fd70dd10388bd2cea6dcbd36 
  include/mesos/v1/agent/agent.proto 4ae9013e6fb27bdce72b4aa9fc9cbd99f79bc548 
  src/slave/http.hpp 3cdbf988db923801e2b3d93cf60b33ad14653742 
  src/slave/http.cpp 3e20c90fb94f0128a757745052534ef03612d849 
  src/slave/validation.cpp b7243fc2316a69efa23285bf3394104e0d58759e 


Diff: https://reviews.apache.org/r/56722/diff/9/


Testing
---

1. New unit test to trigger the new call.
2. Manually tested that images previous pulled but not running can be purged 
through new operator API call while active images (running or being pulled) are 
not affected.


Thanks,

Zhitao Li



Re: Review Request 56722: Added a new operator API for `PRUNE_IMAGES`.

2017-12-24 Thread Zhitao Li


> On Dec. 23, 2017, 3:34 a.m., Gilbert Song wrote:
> > src/slave/http.cpp
> > Lines 2452 (patched)
> > <https://reviews.apache.org/r/56722/diff/9/?file=1926989#file1926989line2452>
> >
> > As we discussed offline, we should follow the semantic that if there 
> > are excluded images from the agent flag, they should have higher priority.

Fixed in r/64812


- Zhitao


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/56722/#review194469
-------


On Dec. 24, 2017, 7:31 p.m., Zhitao Li wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/56722/
> ---
> 
> (Updated Dec. 24, 2017, 7:31 p.m.)
> 
> 
> Review request for mesos, Gilbert Song, Jason Lai, and Jie Yu.
> 
> 
> Bugs: MESOS-4945
> https://issues.apache.org/jira/browse/MESOS-4945
> 
> 
> Repository: mesos
> 
> 
> Description
> ---
> 
> Added a new operator API for `PRUNE_IMAGES`.
> 
> 
> Diffs
> -
> 
>   include/mesos/agent/agent.proto 25af6dc64ca35126fd70dd10388bd2cea6dcbd36 
>   include/mesos/v1/agent/agent.proto 4ae9013e6fb27bdce72b4aa9fc9cbd99f79bc548 
>   src/slave/http.hpp 3cdbf988db923801e2b3d93cf60b33ad14653742 
>   src/slave/http.cpp 3e20c90fb94f0128a757745052534ef03612d849 
>   src/slave/validation.cpp b7243fc2316a69efa23285bf3394104e0d58759e 
> 
> 
> Diff: https://reviews.apache.org/r/56722/diff/9/
> 
> 
> Testing
> ---
> 
> 1. New unit test to trigger the new call.
> 2. Manually tested that images previous pulled but not running can be purged 
> through new operator API call while active images (running or being pulled) 
> are not affected.
> 
> 
> Thanks,
> 
> Zhitao Li
> 
>



Re: Review Request 65167: Detached `virtualLatestPath` when recovering the executor.

2018-01-16 Thread Zhitao Li

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/65167/#review195467
---


Ship it!




Ship It!

- Zhitao Li


On Jan. 15, 2018, 1:49 p.m., Qian Zhang wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/65167/
> ---
> 
> (Updated Jan. 15, 2018, 1:49 p.m.)
> 
> 
> Review request for mesos, Vinod Kone and Zhitao Li.
> 
> 
> Bugs: MESOS-8446
> https://issues.apache.org/jira/browse/MESOS-8446
> 
> 
> Repository: mesos
> 
> 
> Description
> ---
> 
> Previously we miss to detach `/frameworks/FID/executors/EID/runs/latest`
> when we find the latest run of the executor was completed in the method
> `Framework::recoverExecutor`, that is a leak.
> 
> 
> Diffs
> -
> 
>   src/slave/slave.cpp aeb0fdaeda78f26de2ec790735bfa88c5be72850 
> 
> 
> Diff: https://reviews.apache.org/r/65167/diff/1/
> 
> 
> Testing
> ---
> 
> 
> Thanks,
> 
> Qian Zhang
> 
>



Re: Review Request 65294: Add revocable resources support to mesos-execute.

2018-01-26 Thread Zhitao Li


> On Jan. 25, 2018, 9:10 p.m., Anish Gupta wrote:
> > src/cli/execute.cpp
> > Lines 1077-1082 (patched)
> > <https://reviews.apache.org/r/65294/diff/1/?file=1944612#file1944612line1077>
> >
> > Why not unconditionally set it since resources are not revocable unless 
> > have RevocableInfo

I feel that is unnecessary, and receiving resources which this utility may not 
need is unnecessary behavior change.

Inferring this capability from input resources seems clean enough to me.


- Zhitao


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/65294/#review196286
-------


On Jan. 24, 2018, 9:51 p.m., Zhitao Li wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/65294/
> ---
> 
> (Updated Jan. 24, 2018, 9:51 p.m.)
> 
> 
> Review request for mesos, Anish Gupta, Jason Lai, and James Peach.
> 
> 
> Bugs: MESOS-8471
> https://issues.apache.org/jira/browse/MESOS-8471
> 
> 
> Repository: mesos
> 
> 
> Description
> ---
> 
> This allows user to use `mesos-execute` to quickly test over subscription.
> 
> 
> Diffs
> -
> 
>   src/cli/execute.cpp 6e626893a64dc07ac9a5f9a5c31e680cb80938fe 
> 
> 
> Diff: https://reviews.apache.org/r/65294/diff/1/
> 
> 
> Testing
> ---
> 
> Submitted a test task with following configuration:
> ```
>  {
>"name": "Name",
>"task_id": {"value" : "1"},
>"agent_id": {"value" : ""},
>"resources": [
>  {
>"name": "cpus",
>"type": "SCALAR",
>"revocable": {},
>"scalar": {
>  "value": 0.1
>}
>  }
>],
>"command": {
>  "value": "sleep 1000"
>}
>  }
>  ```
>  
> With command `mesos-execute --master=localhost:5050 --revocable_resources 
> --task=file:///home/user/test_rev_task.json`
> 
> If the master/agent has revocable cpu, this allows the task to execute.
> 
> 
> Thanks,
> 
> Zhitao Li
> 
>



Re: Review Request 65414: Added tests for `Resources.find` on revocable resources.

2018-02-01 Thread Zhitao Li

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/65414/
---

(Updated Feb. 1, 2018, 9:50 p.m.)


Review request for mesos, Anish Gupta and James Peach.


Bugs: MESOS-8506
https://issues.apache.org/jira/browse/MESOS-8506


Repository: mesos


Description
---

Added tests for `Resources.find` on revocable resources.


Diffs (updated)
-

  src/tests/resources_tests.cpp bd328b254eff3815fa2e85d17179cf593cb8a349 


Diff: https://reviews.apache.org/r/65414/diff/2/

Changes: https://reviews.apache.org/r/65414/diff/1-2/


Testing
---


Thanks,

Zhitao Li



Re: Review Request 65294: Support `revocable_resources` capability in `mesos-execute`.

2018-01-29 Thread Zhitao Li

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/65294/
---

(Updated Jan. 30, 2018, 12:03 a.m.)


Review request for mesos, Anish Gupta, Jason Lai, and James Peach.


Changes
---

Unconditionally set `revocable_resources` capability.


Summary (updated)
-

Support `revocable_resources` capability in `mesos-execute`.


Bugs: MESOS-8471
https://issues.apache.org/jira/browse/MESOS-8471


Repository: mesos


Description (updated)
---

Add revocable resources support to mesos-execute.


Diffs (updated)
-

  src/cli/execute.cpp 6e626893a64dc07ac9a5f9a5c31e680cb80938fe 


Diff: https://reviews.apache.org/r/65294/diff/2/

Changes: https://reviews.apache.org/r/65294/diff/1-2/


Testing
---

Submitted a test task with following configuration:
```
 {
   "name": "Name",
   "task_id": {"value" : "1"},
   "agent_id": {"value" : ""},
   "resources": [
 {
   "name": "cpus",
   "type": "SCALAR",
   "revocable": {},
   "scalar": {
 "value": 0.1
   }
 }
   ],
   "command": {
 "value": "sleep 1000"
   }
 }
 ```
 
With command `mesos-execute --master=localhost:5050 --revocable_resources 
--task=file:///home/user/test_rev_task.json`

If the master/agent has revocable cpu, this allows the task to execute.


Thanks,

Zhitao Li



Review Request 65414: Added tests for `Resources.find` on revocable resources.

2018-01-29 Thread Zhitao Li

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/65414/
---

Review request for mesos, Anish Gupta and James Peach.


Bugs: MESOS-8506
https://issues.apache.org/jira/browse/MESOS-8506


Repository: mesos


Description
---

Added tests for `Resources.find` on revocable resources.


Diffs
-

  src/tests/resources_tests.cpp bd328b254eff3815fa2e85d17179cf593cb8a349 


Diff: https://reviews.apache.org/r/65414/diff/1/


Testing
---


Thanks,

Zhitao Li



Re: Review Request 56722: Added a new operator API for `PRUNE_IMAGES`.

2017-12-26 Thread Zhitao Li

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/56722/#review194516
---




src/slave/http.cpp
Lines 2452 (patched)
<https://reviews.apache.org/r/56722/#comment273307>

In your scenario, if the operator wants to persist the excluded_images 
selection while auto gc is enabled, requiring it to be configured in the agent 
flag seems reasonable to me.

I think this API is useful for opertors who cannot use auto gc for whatever 
reason (i.e, disk space calculation is not applicable), so I don't see it is a 
real problem.

Keeping this field out seems a strange discrepancy between the operator API 
and agent flag.


- Zhitao Li


On Dec. 24, 2017, 7:31 p.m., Zhitao Li wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/56722/
> ---
> 
> (Updated Dec. 24, 2017, 7:31 p.m.)
> 
> 
> Review request for mesos, Gilbert Song, Jason Lai, and Jie Yu.
> 
> 
> Bugs: MESOS-4945
> https://issues.apache.org/jira/browse/MESOS-4945
> 
> 
> Repository: mesos
> 
> 
> Description
> ---
> 
> Added a new operator API for `PRUNE_IMAGES`.
> 
> 
> Diffs
> -
> 
>   include/mesos/agent/agent.proto 25af6dc64ca35126fd70dd10388bd2cea6dcbd36 
>   include/mesos/v1/agent/agent.proto 4ae9013e6fb27bdce72b4aa9fc9cbd99f79bc548 
>   src/slave/http.hpp 3cdbf988db923801e2b3d93cf60b33ad14653742 
>   src/slave/http.cpp 3e20c90fb94f0128a757745052534ef03612d849 
>   src/slave/validation.cpp b7243fc2316a69efa23285bf3394104e0d58759e 
> 
> 
> Diff: https://reviews.apache.org/r/56722/diff/9/
> 
> 
> Testing
> ---
> 
> 1. New unit test to trigger the new call.
> 2. Manually tested that images previous pulled but not running can be purged 
> through new operator API call while active images (running or being pulled) 
> are not affected.
> 
> 
> Thanks,
> 
> Zhitao Li
> 
>



Re: Review Request 64812: Added `excluded_images` parameter to `PRUNE_IMAGES` agent API.

2017-12-26 Thread Zhitao Li


> On Dec. 26, 2017, 4:06 p.m., Gilbert Song wrote:
> > src/slave/http.cpp
> > Line 2450 (original), 2450-2459 (patched)
> > <https://reviews.apache.org/r/64812/diff/2/?file=1927543#file1927543line2450>
> >
> > see my comment at https://reviews.apache.org/r/56722/
> > 
> > excluded images from the operator API might be neglected due to another 
> > auto-triggered GC.

IMO the content of this API was not inteded to be persisted. Even consecutive 
calls with different `excluded_images` can have similar effect of what you 
described and there is nothing we can or should do. If operator enabled auto 
gc, then the list should be maintained or duplicated there.


- Zhitao


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/64812/#review194515
-------


On Dec. 24, 2017, 1:21 a.m., Zhitao Li wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/64812/
> ---
> 
> (Updated Dec. 24, 2017, 1:21 a.m.)
> 
> 
> Review request for mesos, Gilbert Song and Jie Yu.
> 
> 
> Bugs: MESOS-4945
> https://issues.apache.org/jira/browse/MESOS-4945
> 
> 
> Repository: mesos
> 
> 
> Description
> ---
> 
> Added `excluded_images` parameter to `PRUNE_IMAGES` agent API.
> 
> 
> Diffs
> -
> 
>   include/mesos/agent/agent.proto 25af6dc64ca35126fd70dd10388bd2cea6dcbd36 
>   include/mesos/v1/agent/agent.proto 4ae9013e6fb27bdce72b4aa9fc9cbd99f79bc548 
>   src/slave/http.cpp 3e20c90fb94f0128a757745052534ef03612d849 
> 
> 
> Diff: https://reviews.apache.org/r/64812/diff/2/
> 
> 
> Testing
> ---
> 
> 
> Thanks,
> 
> Zhitao Li
> 
>



Re: Review Request 62853: Added test for `PRUNE_IMAGES` operator API call.

2017-12-28 Thread Zhitao Li

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/62853/
---

(Updated Dec. 28, 2017, 8:06 p.m.)


Review request for mesos, Gilbert Song and Jason Lai.


Changes
---

Rebase and fix mock usage.


Bugs: MESOS-4945
https://issues.apache.org/jira/browse/MESOS-4945


Repository: mesos


Description
---

Added test for `PRUNE_IMAGES` operator API call.


Diffs (updated)
-

  src/tests/api_tests.cpp 9246f4222548f8d19a8f4f89fb3c22ecae5b58b0 


Diff: https://reviews.apache.org/r/62853/diff/6/

Changes: https://reviews.apache.org/r/62853/diff/5-6/


Testing
---


Thanks,

Zhitao Li



Review Request 64865: Added test for `prune_images` acl validation.

2017-12-28 Thread Zhitao Li

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/64865/
---

Review request for mesos, Gilbert Song and Jie Yu.


Bugs: MESOS-8365
https://issues.apache.org/jira/browse/MESOS-8365


Repository: mesos


Description
---

Added test for `prune_images` acl validation.


Diffs
-

  src/tests/authorization_tests.cpp 4f3da0862a175b90de192270211fa21549d80e77 


Diff: https://reviews.apache.org/r/64865/diff/1/


Testing
---


Thanks,

Zhitao Li



Review Request 64864: Added authorization for prune images API call.

2017-12-28 Thread Zhitao Li

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/64864/
---

Review request for mesos, Gilbert Song and Jie Yu.


Bugs: MESOS-8365
https://issues.apache.org/jira/browse/MESOS-8365


Repository: mesos


Description
---

Added authorization for prune images API call.


Diffs
-

  include/mesos/authorizer/acls.proto 5c7ed340a41a6808e4bf25cc38b54c7b07d7bb63 
  include/mesos/authorizer/authorizer.proto 
90ffdfa0c0d96c5d682441d318b83e6ad857d09b 
  src/authorizer/local/authorizer.cpp 0722ac9d673a7829b762cce60b2cb9bfc190ab11 
  src/slave/http.cpp 446be55f7bff26d35400e79982cb08d37c9c739f 


Diff: https://reviews.apache.org/r/64864/diff/1/


Testing
---

Tested locally for the cases:
- allowing any principal to prune images;
- allowing no principal to prune images.


Thanks,

Zhitao Li



Re: Review Request 64885: Made sure continuations using actor state are executed in actor context.

2018-01-02 Thread Zhitao Li

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/64885/#review194629
---




src/slave/containerizer/mesos/provisioner/provisioner.cpp
Line 568 (original), 568 (patched)
<https://reviews.apache.org/r/64885/#comment273498>

I don't think the callback here uses any internal state of calling actor, 
so I don't see it really necessary that continuation *must* happen in `self()`.

Do we have a rule to require even continuation without side effect to 
calling actor be defered?


- Zhitao Li


On Jan. 2, 2018, 10:38 a.m., Benjamin Bannier wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/64885/
> ---
> 
> (Updated Jan. 2, 2018, 10:38 a.m.)
> 
> 
> Review request for mesos, Gilbert Song and Zhitao Li.
> 
> 
> Repository: mesos
> 
> 
> Description
> ---
> 
> In `e273efe6976434858edb85bbcf367a02e963a467` we introduced layer
> checkpointing to the provisioner, but did not make sure that
> continutations making use of internal state of the provisioner actor
> installed on futures where always executed in the actor's context.
> This was problematic as continuations could be executed while actor
> state was changing (data races), or after the the actor had terminated
> (use after free).
> 
> In this patch we instead defer execution of continutions to the actor
> owning the data.
> 
> This issue was identified with the clang-tidy `mesos-this-capture`
> check.
> 
> 
> Diffs
> -
> 
>   src/slave/containerizer/mesos/provisioner/provisioner.cpp 
> 61e771872ba2c3f5ecabbe085db102433c7bac3f 
> 
> 
> Diff: https://reviews.apache.org/r/64885/diff/1/
> 
> 
> Testing
> ---
> 
> `make check`
> 
> 
> Thanks,
> 
> Benjamin Bannier
> 
>



Re: Review Request 64885: Made sure continuations using actor state are executed in actor context.

2018-01-02 Thread Zhitao Li

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/64885/#review194634
---


Ship it!




Ship It!


src/slave/containerizer/mesos/provisioner/provisioner.cpp
Line 568 (original), 568 (patched)
<https://reviews.apache.org/r/64885/#comment273513>

Thanks for pointing it out now. I like your rule of avoiding default 
captures in continuation since it makes such issues more readable.

I can submit a future patch to capture explicitly if you don't do it this 
patch.

Thanks for fixing this!


- Zhitao Li


On Jan. 2, 2018, 10:38 a.m., Benjamin Bannier wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/64885/
> ---
> 
> (Updated Jan. 2, 2018, 10:38 a.m.)
> 
> 
> Review request for mesos, Gilbert Song and Zhitao Li.
> 
> 
> Repository: mesos
> 
> 
> Description
> ---
> 
> In `e273efe6976434858edb85bbcf367a02e963a467` we introduced layer
> checkpointing to the provisioner, but did not make sure that
> continutations making use of internal state of the provisioner actor
> installed on futures where always executed in the actor's context.
> This was problematic as continuations could be executed while actor
> state was changing (data races), or after the the actor had terminated
> (use after free).
> 
> In this patch we instead defer execution of continutions to the actor
> owning the data.
> 
> This issue was identified with the clang-tidy `mesos-this-capture`
> check.
> 
> 
> Diffs
> -
> 
>   src/slave/containerizer/mesos/provisioner/provisioner.cpp 
> 61e771872ba2c3f5ecabbe085db102433c7bac3f 
> 
> 
> Diff: https://reviews.apache.org/r/64885/diff/1/
> 
> 
> Testing
> ---
> 
> `make check`
> 
> 
> Thanks,
> 
> Benjamin Bannier
> 
>



Re: Review Request 64885: Made sure continuations using actor state are executed in actor context.

2018-01-02 Thread Zhitao Li

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/64885/#review194632
---


Ship it!




Ship It!

- Zhitao Li


On Jan. 2, 2018, 10:38 a.m., Benjamin Bannier wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/64885/
> ---
> 
> (Updated Jan. 2, 2018, 10:38 a.m.)
> 
> 
> Review request for mesos, Gilbert Song and Zhitao Li.
> 
> 
> Repository: mesos
> 
> 
> Description
> ---
> 
> In `e273efe6976434858edb85bbcf367a02e963a467` we introduced layer
> checkpointing to the provisioner, but did not make sure that
> continutations making use of internal state of the provisioner actor
> installed on futures where always executed in the actor's context.
> This was problematic as continuations could be executed while actor
> state was changing (data races), or after the the actor had terminated
> (use after free).
> 
> In this patch we instead defer execution of continutions to the actor
> owning the data.
> 
> This issue was identified with the clang-tidy `mesos-this-capture`
> check.
> 
> 
> Diffs
> -
> 
>   src/slave/containerizer/mesos/provisioner/provisioner.cpp 
> 61e771872ba2c3f5ecabbe085db102433c7bac3f 
> 
> 
> Diff: https://reviews.apache.org/r/64885/diff/1/
> 
> 
> Testing
> ---
> 
> `make check`
> 
> 
> Thanks,
> 
> Benjamin Bannier
> 
>



Re: Review Request 56722: Added a new operator API for `PRUNE_IMAGES`.

2017-12-22 Thread Zhitao Li

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/56722/
---

(Updated Dec. 22, 2017, 8:10 p.m.)


Review request for mesos, Gilbert Song, Jason Lai, and Jie Yu.


Changes
---

Rebase.


Bugs: MESOS-4945
https://issues.apache.org/jira/browse/MESOS-4945


Repository: mesos


Description
---

Added a new operator API for `PRUNE_IMAGES`.


Diffs (updated)
-

  include/mesos/agent/agent.proto 25af6dc64ca35126fd70dd10388bd2cea6dcbd36 
  include/mesos/v1/agent/agent.proto 4ae9013e6fb27bdce72b4aa9fc9cbd99f79bc548 
  src/slave/http.hpp 3cdbf988db923801e2b3d93cf60b33ad14653742 
  src/slave/http.cpp 3e20c90fb94f0128a757745052534ef03612d849 
  src/slave/validation.cpp b7243fc2316a69efa23285bf3394104e0d58759e 


Diff: https://reviews.apache.org/r/56722/diff/9/

Changes: https://reviews.apache.org/r/56722/diff/8-9/


Testing
---

1. New unit test to trigger the new call.
2. Manually tested that images previous pulled but not running can be purged 
through new operator API call while active images (running or being pulled) are 
not affected.


Thanks,

Zhitao Li



Re: Review Request 64813: Documented new image gc support in Mesos containerizer.

2017-12-22 Thread Zhitao Li

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/64813/
---

(Updated Dec. 22, 2017, 8:21 p.m.)


Review request for mesos, Gilbert Song and Jie Yu.


Bugs: MESOS-8310
https://issues.apache.org/jira/browse/MESOS-8310


Repository: mesos


Description
---

Documented new image gc support in Mesos containerizer.


Diffs
-

  docs/container-image.md 99f4f5c5b617f5deb618614fda1365d72c8685de 
  docs/operator-http-api.md f81e5bcb38455e6595e54892604773665140933d 


Diff: https://reviews.apache.org/r/64813/diff/1/


Testing (updated)
---

https://github.com/zhitaoli/mesos/blob/public/zhitao/image_gc_revision3/docs/container-image.md


Thanks,

Zhitao Li



Review Request 64812: Added `excluded_images` parameter to `PRUNE_IMAGES` agent API.

2017-12-22 Thread Zhitao Li

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/64812/
---

Review request for mesos, Gilbert Song and Jie Yu.


Bugs: MESOS-4945
https://issues.apache.org/jira/browse/MESOS-4945


Repository: mesos


Description
---

Added `excluded_images` parameter to `PRUNE_IMAGES` agent API.


Diffs
-

  include/mesos/agent/agent.proto 25af6dc64ca35126fd70dd10388bd2cea6dcbd36 
  include/mesos/v1/agent/agent.proto 4ae9013e6fb27bdce72b4aa9fc9cbd99f79bc548 
  src/slave/http.cpp 3e20c90fb94f0128a757745052534ef03612d849 


Diff: https://reviews.apache.org/r/64812/diff/1/


Testing
---


Thanks,

Zhitao Li



Review Request 64813: Documented new image gc support in Mesos containerizer.

2017-12-22 Thread Zhitao Li

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/64813/
---

Review request for mesos, Gilbert Song and Jie Yu.


Bugs: MESOS-8310
https://issues.apache.org/jira/browse/MESOS-8310


Repository: mesos


Description
---

Documented new image gc support in Mesos containerizer.


Diffs
-

  docs/container-image.md 99f4f5c5b617f5deb618614fda1365d72c8685de 
  docs/operator-http-api.md f81e5bcb38455e6595e54892604773665140933d 


Diff: https://reviews.apache.org/r/64813/diff/1/


Testing
---


Thanks,

Zhitao Li



Re: Review Request 68088: Replaced `cgroups::DESTROY_TIMEOUT` with new agent flag.

2018-07-30 Thread Zhitao Li

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/68088/
---

(Updated July 30, 2018, 10:50 a.m.)


Review request for mesos, Gilbert Song and Jason Lai.


Changes
---

Squash both diffs.


Summary (updated)
-

Replaced `cgroups::DESTROY_TIMEOUT` with new agent flag.


Bugs: MESOS-8038
https://issues.apache.org/jira/browse/MESOS-8038


Repository: mesos


Description (updated)
---

The new agent flag can be used to reconfigure how long a container
destroy is allowed to take on Mesos containerizer.

The default is also increased to 5 min based on suggestion from Gilbert
because certain containers could have deep system calls which may not
finish within previous 1 min timeout.


Diffs (updated)
-

  src/linux/cgroups.hpp 6a6f0e6df208bc0b0a888d132b3befd062755851 
  src/slave/containerizer/mesos/isolators/cgroups/cgroups.cpp 
8fba6f9d335cc43a7cd0fcb51ed047ba0e7027ed 
  src/slave/containerizer/mesos/linux_launcher.cpp 
3bddcece7028745cec6623ac33dbfcaced629629 
  src/slave/flags.hpp eeb9708f9ec76d83b6719541f4a012544c7c0cbe 
  src/slave/flags.cpp 58cdc0f1100fe244e5bf1036e1ccf39478d5d478 


Diff: https://reviews.apache.org/r/68088/diff/2/

Changes: https://reviews.apache.org/r/68088/diff/1-2/


Testing
---

`make` and `./bin/mesos-slave.sh --help`


Thanks,

Zhitao Li



Review Request 68088: Replaced `cgroups::DESTROY_TIMEOUT` with agent flag.

2018-07-27 Thread Zhitao Li

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/68088/
---

Review request for mesos, Gilbert Song and Jason Lai.


Bugs: MESOS-8038
https://issues.apache.org/jira/browse/MESOS-8038


Repository: mesos


Description
---

Replaced `cgroups::DESTROY_TIMEOUT` with agent flag.


Diffs
-

  src/linux/cgroups.hpp 6a6f0e6df208bc0b0a888d132b3befd062755851 
  src/slave/containerizer/mesos/isolators/cgroups/cgroups.cpp 
8fba6f9d335cc43a7cd0fcb51ed047ba0e7027ed 
  src/slave/containerizer/mesos/linux_launcher.cpp 
3bddcece7028745cec6623ac33dbfcaced629629 


Diff: https://reviews.apache.org/r/68088/diff/1/


Testing
---

`make` and `./bin/mesos-slave.sh --help`


Thanks,

Zhitao Li



Review Request 68087: Added new agent flag `cgroups_destroy_timeout`.

2018-07-27 Thread Zhitao Li

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/68087/
---

Review request for mesos, Gilbert Song and Jason Lai.


Bugs: MESOS-8038
https://issues.apache.org/jira/browse/MESOS-8038


Repository: mesos


Description
---

Added new agent flag `cgroups_destroy_timeout`.


Diffs
-

  src/slave/flags.hpp eeb9708f9ec76d83b6719541f4a012544c7c0cbe 
  src/slave/flags.cpp 58cdc0f1100fe244e5bf1036e1ccf39478d5d478 


Diff: https://reviews.apache.org/r/68087/diff/1/


Testing
---

`make` and run `./bin/mesos-slave.sh --help`.


Thanks,

Zhitao Li



Review Request 68299: Documented new `--cgroups_destroy_timeout` agent option.

2018-08-10 Thread Zhitao Li

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/68299/
---

Review request for mesos, Gilbert Song and Jason Lai.


Repository: mesos


Description
---

Documented new `--cgroups_destroy_timeout` agent option.


Diffs
-

  docs/configuration/agent.md 83b5fed5a8bf287700688507eaa584f37e8ba2b7 


Diff: https://reviews.apache.org/r/68299/diff/1/


Testing
---


Thanks,

Zhitao Li



Re: Review Request 68088: Replaced `cgroups::DESTROY_TIMEOUT` with new agent flag.

2018-08-10 Thread Zhitao Li

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/68088/
---

(Updated Aug. 10, 2018, 10:36 a.m.)


Review request for mesos, Gilbert Song, Jason Lai, and James Peach.


Bugs: MESOS-8038
https://issues.apache.org/jira/browse/MESOS-8038


Repository: mesos


Description (updated)
---

The new agent flag can be used to reconfigure how long a container
destroy is allowed to take on Mesos containerizer.


Diffs (updated)
-

  src/linux/cgroups.hpp 6a6f0e6df208bc0b0a888d132b3befd062755851 
  src/slave/containerizer/mesos/isolators/cgroups/cgroups.cpp 
e016d6dcff15bf49b003c9f3ca94477ce313f7b6 
  src/slave/containerizer/mesos/linux_launcher.cpp 
cd677cc652dc53ea3bf9a715b415c3e5c48c1f89 
  src/slave/flags.hpp eeb9708f9ec76d83b6719541f4a012544c7c0cbe 
  src/slave/flags.cpp 58cdc0f1100fe244e5bf1036e1ccf39478d5d478 


Diff: https://reviews.apache.org/r/68088/diff/3/

Changes: https://reviews.apache.org/r/68088/diff/2-3/


Testing
---

`make` and `./bin/mesos-slave.sh --help`


Thanks,

Zhitao Li



Review Request 68300: Added `gpus` to failure message.

2018-08-10 Thread Zhitao Li

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/68300/
---

Review request for mesos, Gilbert Song and Jason Lai.


Repository: mesos


Description
---

This message would be propagated all the way to scheduler, so making it
explicit that this is related to gpu can speed up debugging.


Diffs
-

  src/slave/containerizer/mesos/isolators/gpu/allocator.cpp 
c288ad634b856702483b9751f41445308babd0c9 


Diff: https://reviews.apache.org/r/68300/diff/1/


Testing
---


Thanks,

Zhitao Li



Review Request 65294: Add revocable resources support to mesos-execute.

2018-01-23 Thread Zhitao Li

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/65294/
---

Review request for mesos and Jason Lai.


Bugs: MESOS-8471
https://issues.apache.org/jira/browse/MESOS-8471


Repository: mesos


Description
---

This allows user to use `mesos-execute` to quickly test over subscription.


Diffs
-

  src/cli/execute.cpp 6e626893a64dc07ac9a5f9a5c31e680cb80938fe 


Diff: https://reviews.apache.org/r/65294/diff/1/


Testing
---

Submitted a test task with following configuration:
```
 {
   "name": "Name",
   "task_id": {"value" : "1"},
   "agent_id": {"value" : ""},
   "resources": [
 {
   "name": "cpus",
   "type": "SCALAR",
   "revocable": {},
   "scalar": {
 "value": 0.1
   }
 }
   ],
   "command": {
 "value": "sleep 1000"
   }
 }
 ```
 
With command `mesos-execute --master=localhost:5050 --revocable_resources 
--task=file:///home/user/test_rev_task.json`

If the master/agent has revocable cpu, this allows the task to execute.


Thanks,

Zhitao Li



Re: Review Request 65930: Start heartbeater after SUBSCRIBED event.

2018-03-07 Thread Zhitao Li

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/65930/
---

(Updated March 7, 2018, 6:54 p.m.)


Review request for mesos, Greg Mann, Jason Lai, Quinn Leng, and Vinod Kone.


Bugs: MESOS-8641
https://issues.apache.org/jira/browse/MESOS-8641


Repository: mesos


Description
---

https://reviews.apache.org/r/61262 added heartbeat support to
event stream, but the implementation could send a `HEARTBEAT`
event before `SUBSCRIBED` event, which changed previous behavior.

This patch fixed the issue by starting heartbeater only after
`SUBSCRIBED` event is sent.


Diffs (updated)
-

  src/master/http.cpp acf10a0f396234ec9f444c9b41515d657379d6ad 


Diff: https://reviews.apache.org/r/65930/diff/3/

Changes: https://reviews.apache.org/r/65930/diff/2-3/


Testing
---


Thanks,

Zhitao Li



Re: Review Request 65930: Start heartbeater after SUBSCRIBED event.

2018-03-06 Thread Zhitao Li

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/65930/
---

(Updated March 6, 2018, 9:07 p.m.)


Review request for mesos, Greg Mann, Jason Lai, Quinn Leng, and Vinod Kone.


Changes
---

Subscribe only after first HEARTBEAT is also sent.


Bugs: MESOS-8641
https://issues.apache.org/jira/browse/MESOS-8641


Repository: mesos


Description
---

https://reviews.apache.org/r/61262 added heartbeat support to
event stream, but the implementation could send a `HEARTBEAT`
event before `SUBSCRIBED` event, which changed previous behavior.

This patch fixed the issue by starting heartbeater only after
`SUBSCRIBED` event is sent.


Diffs (updated)
-

  src/master/http.cpp acf10a0f396234ec9f444c9b41515d657379d6ad 


Diff: https://reviews.apache.org/r/65930/diff/2/

Changes: https://reviews.apache.org/r/65930/diff/1-2/


Testing
---


Thanks,

Zhitao Li



Review Request 65959: Added a test to make sure `slave/recovery_secs` is reported in metrics.

2018-03-07 Thread Zhitao Li

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/65959/
---

Review request for mesos, Gilbert Song, Greg Mann, Jason Lai, and James Peach.


Bugs: MESOS-8609
https://issues.apache.org/jira/browse/MESOS-8609


Repository: mesos


Description
---

Added a test to make sure `slave/recovery_secs` is reported in metrics.


Diffs
-

  src/tests/slave_tests.cpp e253317a67019302f18afe11e2a314e716cec226 


Diff: https://reviews.apache.org/r/65959/diff/1/


Testing
---


Thanks,

Zhitao Li



Re: Review Request 65954: Add a gauge for how long agent recovery takes.

2018-03-07 Thread Zhitao Li

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/65954/
---

(Updated March 7, 2018, 11:20 p.m.)


Review request for mesos, Gilbert Song, Greg Mann, Jason Lai, and James Peach.


Bugs: MESOS-8609
https://issues.apache.org/jira/browse/MESOS-8609


Repository: mesos


Description
---

The new metric `slave/recover_secs` can be used to tell us how long
Mesos agent needed to finish its recovery cycle. This is an important
metric on agent machines which have a lot of completed executor
sandboxes.

Note that the metric 1) will only be available after recovery succeeded
and 2) never change its value across agent process lifecycle afterwards.


Diffs (updated)
-

  src/slave/metrics.hpp 3fc933ca65690d6fad63156398ad9c2c53789296 
  src/slave/metrics.cpp 0eb2b59ed67e14e73b29d7592c239441df0008d5 
  src/slave/slave.cpp e2facb3c15a2f907f6497c58a36842ed707f2c70 


Diff: https://reviews.apache.org/r/65954/diff/2/

Changes: https://reviews.apache.org/r/65954/diff/1-2/


Testing
---


Thanks,

Zhitao Li



Re: Review Request 65811: Add `path::clean` to stout for normalizing path (for POSIX only now)

2018-02-27 Thread Zhitao Li

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/65811/#review198334
---




3rdparty/stout/include/stout/path.hpp
Lines 61 (patched)
<https://reviews.apache.org/r/65811/#comment278472>

Can you clarify whether the rules also apply to windows properly with 
correct separator?



3rdparty/stout/include/stout/path.hpp
Lines 85 (patched)
<https://reviews.apache.org/r/65811/#comment278475>

I don't think c++ vector supports negative index (this is not python).

You can use `components.back()` as long as `components` is guaranteed not 
empty.



3rdparty/stout/include/stout/path.hpp
Lines 99 (patched)
<https://reviews.apache.org/r/65811/#comment278473>

s/slash/separator in comment



3rdparty/stout/include/stout/path.hpp
Lines 100-101 (patched)
<https://reviews.apache.org/r/65811/#comment278474>

`components.insert(components.begin(), "");`


- Zhitao Li


On Feb. 27, 2018, 6:23 a.m., Jason Lai wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/65811/
> ---
> 
> (Updated Feb. 27, 2018, 6:23 a.m.)
> 
> 
> Review request for mesos, Anish Gupta, Eric Chung, Gilbert Song, Jie Yu, 
> James Peach, and Zhitao Li.
> 
> 
> Bugs: MESOS-8257
> https://issues.apache.org/jira/browse/MESOS-8257
> 
> 
> Repository: mesos
> 
> 
> Description
> ---
> 
> Add `path::clean` to stout for normalizing path (for POSIX only now)
> 
> 
> Diffs
> -
> 
>   3rdparty/stout/include/stout/path.hpp 
> 27438d31617b3b78bf3d4deffd25c93340610e8d 
> 
> 
> Diff: https://reviews.apache.org/r/65811/diff/1/
> 
> 
> Testing
> ---
> 
> 
> Thanks,
> 
> Jason Lai
> 
>



Review Request 66048: Added missing comment on a test case.

2018-03-13 Thread Zhitao Li

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/66048/
---

Review request for mesos and Chun-Hung Hsiao.


Repository: mesos


Description
---

Added missing comment on a test case.


Diffs
-

  src/tests/persistent_volume_tests.cpp 
924d8458e54e34a49c99593482b5908c5f7c7a48 


Diff: https://reviews.apache.org/r/66048/diff/1/


Testing
---


Thanks,

Zhitao Li



Review Request 66049: Added offer operation to grow and shrink persistent volumes.

2018-03-13 Thread Zhitao Li

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/66049/
---

Review request for mesos, Chun-Hung Hsiao and Gaston Kleiman.


Bugs: MESOS-4965
https://issues.apache.org/jira/browse/MESOS-4965


Repository: mesos


Description
---

Added offer operation to grow and shrink persistent volumes.


Diffs
-

  include/mesos/mesos.proto e6ba3746456c9241ceaefac39200f68562dd5cb9 
  include/mesos/v1/mesos.proto 30d4d35e865db2af1ba85b12e2b5b0e499ef8de8 


Diff: https://reviews.apache.org/r/66049/diff/1/


Testing
---


Thanks,

Zhitao Li



Review Request 66051: Implemented operator API to grow and shrink persistent volume.

2018-03-13 Thread Zhitao Li

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/66051/
---

Review request for mesos, Chun-Hung Hsiao and Gaston Kleiman.


Bugs: MESOS-4965
https://issues.apache.org/jira/browse/MESOS-4965


Repository: mesos


Description
---

Implemented operator API to grow and shrink persistent volume.


Diffs
-

  include/mesos/master/master.proto 6622e1b82861f380346f505eca326f7174dd8bd6 
  include/mesos/v1/master/master.proto 6034bd5af8ad764f625f9fe80be08b48707bbadb 
  src/master/http.cpp cf03d8bf7df583c849a3b2a326d3772820181b0f 
  src/master/master.hpp 8bf2c763dafdb7df55c46a56f2ff66f2a951d947 
  src/master/validation.cpp 9254cd22e67d9aca53135701575fb0857ff6f7d6 


Diff: https://reviews.apache.org/r/66051/diff/1/


Testing
---


Thanks,

Zhitao Li



Review Request 66052: Added new operator API to grow and shrink persistent volume.

2018-03-13 Thread Zhitao Li

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/66052/
---

Review request for mesos, Chun-Hung Hsiao and Gaston Kleiman.


Bugs: MESOS-4965
https://issues.apache.org/jira/browse/MESOS-4965


Repository: mesos


Description
---

Added new operator API to grow and shrink persistent volume.


Diffs
-

  include/mesos/master/master.proto 6622e1b82861f380346f505eca326f7174dd8bd6 
  include/mesos/v1/master/master.proto 6034bd5af8ad764f625f9fe80be08b48707bbadb 


Diff: https://reviews.apache.org/r/66052/diff/1/


Testing
---


Thanks,

Zhitao Li



Review Request 66050: Implement grow and shrink of persistent volume.

2018-03-13 Thread Zhitao Li

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/66050/
---

Review request for mesos, Chun-Hung Hsiao and Gaston Kleiman.


Bugs: MESOS-4965
https://issues.apache.org/jira/browse/MESOS-4965


Repository: mesos


Description
---

Implement grow and shrink of persistent volume.


Diffs
-

  src/common/protobuf_utils.cpp 9c5fb97afb58f98013b79f3cbaea7dacc3603271 
  src/common/resources_utils.cpp 99b16e0d17b224eefa1e28f5f66c4284069c0e57 
  src/master/master.cpp 223ebf29ac4dd1dea9080e4bef4b2d4d064d847f 
  src/master/validation.hpp 7c129ceb929596acbb64d37025e055661277e6bf 
  src/master/validation.cpp 9254cd22e67d9aca53135701575fb0857ff6f7d6 
  src/tests/mesos.hpp 2c3d0c9350cd2223ff20c0797d1849d38c19 


Diff: https://reviews.apache.org/r/66050/diff/1/


Testing
---


Thanks,

Zhitao Li



Re: Review Request 66034: Remount several proc filesystem entries as read-only.

2018-03-15 Thread Zhitao Li

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/66034/#review199281
---



I feel that the complexity of this code justifies better user doc, possibly 
when we create a new isolator for this?

Also, how much of each mount should be allow to reconfigure? Should this 
behavior be dictated for every user of Mesos containerizer?


src/linux/fs.cpp
Lines 686-692 (original), 686-692 (patched)
<https://reviews.apache.org/r/66034/#comment279584>

Can we move the `TODO` to the sentence about follow-up? The sentence `These 
special filesystem mount points need to be bind-mounted prior to all other ...` 
is a comment on requirement which your follow up work would not change.


- Zhitao Li


On March 15, 2018, 6:24 p.m., Jason Lai wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/66034/
> ---
> 
> (Updated March 15, 2018, 6:24 p.m.)
> 
> 
> Review request for mesos, Eric Chung, Gilbert Song, Ian Downes, Jie Yu, James 
> Peach, and Zhitao Li.
> 
> 
> Bugs: MESOS-8654
> https://issues.apache.org/jira/browse/MESOS-8654
> 
> 
> Repository: mesos
> 
> 
> Description
> ---
> 
> Several entries under the proc FS within Mesos containers need to be
> remounted as readonly for improved security reasons.
> 
> The list should include the important ones introduced by Systemd's
> `ProtectKernelTunables` option:
> 
> * `/proc/bus`
> * `/proc/fs`
> * `/proc/irq`
> * `/proc/sys`
> * `/proc/sysrq-trigger`
> 
> It is particularly necessary to remount `/proc/sysrq-trigger` as
> read-only. Otherwise, it would be possible for processes running in
> containers as `root` to perform privileged operations, such as host
> reboot.
> 
> Extra mount options should include `nosuid,noexec,nodev` (see also
> `mount(2)` for detailed explanations of the options).
> 
> 
> Diffs
> -
> 
>   src/linux/fs.cpp ed26f80ef7315809a1df9f2c50b4fe3445810f8a 
> 
> 
> Diff: https://reviews.apache.org/r/66034/diff/1/
> 
> 
> Testing
> ---
> 
> The mount table of the container launched by the patched version of 
> `mesos-containerizer launch` include the entries listed below, with 
> `nosuid,noexec,nodev` mount options:
> ```
> $ sudo unshare -m -p -f /usr/local/libexec/mesos/mesos-containerizer launch 
> --launch_info="$(jq -c . launch_info.json)" --runtime_directory="$(pwd)"
> Marked '/' as rslave
> Prepared mount 
> '{"flags":20480,"source":"\/etc\/hostname","target":"\/home\/jlai\/containers\/rootfs\/etc\/hostname"}'
> Prepared mount 
> '{"flags":20480,"source":"\/etc\/hosts","target":"\/home\/jlai\/containers\/rootfs\/etc\/hosts"}'
> Prepared mount 
> '{"flags":20480,"source":"\/etc\/resolv.conf","target":"\/home\/jlai\/containers\/rootfs\/etc\/resolv.conf"}'
> Changing root to /home/jlai/containers/rootfs
> bash-4.4# findmnt -a
> TARGET  SOURCE  FSTYPE  OPTIONS
> /   alpine  overlay 
> rw,relatime,lowerdir=overlay/lower,upperdir=overlay/upper,workdir=overlay/work
> |-/etc/hostname /dev/dm-0[/etc/hostname]ext4
> rw,noatime,errors=panic,data=ordered
> |-/etc/hosts/dev/dm-0[/etc/hosts]   ext4
> rw,noatime,errors=panic,data=ordered
> |-/etc/resolv.conf  /dev/dm-0[/etc/resolv.conf] ext4
> rw,noatime,errors=panic,data=ordered
> |-/proc procproc
> rw,nosuid,nodev,noexec,relatime
> | |-/proc/bus   proc[/bus]  proc
> ro,nosuid,nodev,noexec,relatime
> | |-/proc/fsproc[/fs]   proc
> ro,nosuid,nodev,noexec,relatime
> | |-/proc/irq   proc[/irq]  proc
> ro,nosuid,nodev,noexec,relatime
> | |-/proc/sys   proc[/sys]  proc
> ro,nosuid,nodev,noexec,relatime
> | `-/proc/sysrq-trigger proc[/sysrq-trigger]proc
> ro,nosuid,nodev,noexec,relatime
> |-/sys  sysfs   sysfs   
> ro,nosuid,nodev,noexec,relatime
> `-/dev  tmpfs   tmpfs   
> rw,nosuid,noexec,mode=755
>   |-/dev/ptsdevpts  devpts  
> rw,nosuid,noexec,relatime,mode=600,ptmxmode=666
>   `-/dev/shmtmpfs   tmpfs   rw,nosuid,nodev
> ```
> 
> 
> Thanks,
> 
> Jason Lai
> 
>



Re: Review Request 66104: Fixed potential memory leak in the `volume/sandbox_path` isolator.

2018-03-15 Thread Zhitao Li


> On March 15, 2018, 4:33 p.m., Gilbert Song wrote:
> > src/slave/containerizer/mesos/isolators/volume/sandbox_path.cpp
> > Lines 404 (patched)
> > <https://reviews.apache.org/r/66104/diff/1/?file=1978194#file1978194line404>
> >
> > Instead of the check failure. We probably should return a failure 
> > instead.

Do we even need to return a failure here? Is a warning log sufficient?


- Zhitao


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/66104/#review199300
---


On March 15, 2018, 3:31 p.m., Jason Lai wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/66104/
> ---
> 
> (Updated March 15, 2018, 3:31 p.m.)
> 
> 
> Review request for mesos, Eric Chung, Gilbert Song, Jie Yu, and Zhitao Li.
> 
> 
> Bugs: MESOS-8651
> https://issues.apache.org/jira/browse/MESOS-8651
> 
> 
> Repository: mesos
> 
> 
> Description
> ---
> 
> The `volume/sandbox_path` isolator inserts a string of the sandbox path
> to its `sandboxes` hashmap instance variable upon the launch of each
> container. However, it never cleans it up properly and can cause
> unbounded growth of the hashmap object, as isolators are global
> singleton objects.
> 
> The patch ensures the sandbox path associated with a given container ID
> gets removed from the `sandboxes` hashmap upon container cleanup.
> 
> 
> Diffs
> -
> 
>   src/slave/containerizer/mesos/isolators/volume/sandbox_path.hpp 
> 20d5b32fb7ada1a17a40bf1a2438db4d85cf1063 
>   src/slave/containerizer/mesos/isolators/volume/sandbox_path.cpp 
> 5801977e93bcb8f463a2635f73e763098f2aa97d 
> 
> 
> Diff: https://reviews.apache.org/r/66104/diff/1/
> 
> 
> Testing
> ---
> 
> 
> Thanks,
> 
> Jason Lai
> 
>



Re: Review Request 61262: Added 'heartbeat' event for the operator API.

2018-03-06 Thread Zhitao Li


> On March 6, 2018, 2:44 a.m., Zhitao Li wrote:
> > src/master/http.cpp
> > Lines 845-857 (original), 845-861 (patched)
> > <https://reviews.apache.org/r/61262/diff/8/?file=1800242#file1800242line845>
> >
> > I believe this is a behavior change.
> > 
> > Previous, `master->subscribe(...)` call will not trigger additional 
> > event being sent, so `SUBSCRIBED` is still guaranteed to be the first event.
> > 
> > However, we start the `heartbeater` process immediately without wait 
> > for `SUBSCRIBED` to be sent, so this could mean that subscriber can receive 
> > the `SUBSCRIBED` event after heartbeater sends something.
> > 
> > Should reorder the call so we only call `master->subscribe(http);` 
> > after the `http.send` on `SUBSCRIBED`?
> 
> Greg Mann wrote:
> Reordering is an option, but I wonder if this will really be an issue in 
> practice? We delay the first heartbeat event sent by the Heartbeater by 
> DEFAULT_HEARTBEAT_INTERVAL: 
> https://github.com/apache/mesos/blob/0d247c3887ea08b6273992218cd5899010d89fed/src/master/master.hpp#L1986
> so the Heartbeater sends its first heartbeat event after that interval 
> elapses. So, the heartbeat event could only arrive first in the presence of 
> some extreme CPU contention, it doesn't seem likely/possible to me? WDYT?

I think in large cluster, the first SUBSCRIBED message could take longer than 
`DEFAULT_HEARTBEAT_INTERVAL` under extreme cases. In practice this has hit our 
cluster maybe less than 10% of master failovers.

Still, I would argue that we should ensure `SUBSCRIBED` message is sent before 
any `HEARTBEAT`, also because subscriber cannot event process the `HEARTBEAT` 
without the `heartbeat_interval_seconds` value from `SUBSCRIBED` message.


- Zhitao


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/61262/#review198678
---


On Aug. 18, 2017, 6:54 p.m., Quinn Leng wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/61262/
> ---
> 
> (Updated Aug. 18, 2017, 6:54 p.m.)
> 
> 
> Review request for mesos, Anand Mazumdar and Greg Mann.
> 
> 
> Bugs: MESOS-7695
> https://issues.apache.org/jira/browse/MESOS-7695
> 
> 
> Repository: mesos
> 
> 
> Description
> ---
> 
> Added the 'HEARTBEAT' event for the operator API, modified other
> related test cases to accept heartbeats.
> 
> 
> Diffs
> -
> 
>   include/mesos/master/master.proto fc5bd894ce55fe8e946d4c5b4b33d3c0505f3c2b 
>   include/mesos/v1/master/master.proto 
> c3fb31de2509adcdec8204f8bbe46b46a31540e8 
>   src/master/http.cpp 959091c8ec03b6ac7bcb5d21b04d2f7d5aff7d54 
>   src/master/master.hpp b802fd153a10f6012cea381f153c28cc78cae995 
>   src/tests/api_tests.cpp 3ab4740bcac29ecb89585da6adb1f563d6fc1f5f 
> 
> 
> Diff: https://reviews.apache.org/r/61262/diff/8/
> 
> 
> Testing
> ---
> 
> make check -j48
> 
> 
> Thanks,
> 
> Quinn Leng
> 
>



Review Request 65954: Add a gauge for how long agent recovery takes.

2018-03-07 Thread Zhitao Li

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/65954/
---

Review request for mesos, Gilbert Song and Jason Lai.


Bugs: MESOS-8609
https://issues.apache.org/jira/browse/MESOS-8609


Repository: mesos


Description
---

The new metric `slave/recover_secs` can be used to tell us how long
Mesos agent needed to finish its recovery cycle. This is an important
metric on agent machines which have a lot of completed executor
sandboxes.

Note that the metric 1) will only be available after recovery succeeded
and 2) never change its value across agent process lifecycle afterwards.


Diffs
-

  src/slave/metrics.hpp 3fc933ca65690d6fad63156398ad9c2c53789296 
  src/slave/metrics.cpp 0eb2b59ed67e14e73b29d7592c239441df0008d5 
  src/slave/slave.cpp e2facb3c15a2f907f6497c58a36842ed707f2c70 


Diff: https://reviews.apache.org/r/65954/diff/1/


Testing
---


Thanks,

Zhitao Li



Review Request 65930: Start heartbeater after SUBSCRIBED event.

2018-03-06 Thread Zhitao Li

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/65930/
---

Review request for mesos, Greg Mann, Jason Lai, Quinn Leng, and Vinod Kone.


Bugs: MESOS-8641
https://issues.apache.org/jira/browse/MESOS-8641


Repository: mesos


Description
---

https://reviews.apache.org/r/61262 added heartbeat support to
event stream, but the implementation could send a `HEARTBEAT`
event before `SUBSCRIBED` event, which changed previous behavior.

This patch fixed the issue by starting heartbeater only after
`SUBSCRIBED` event is sent.


Diffs
-

  src/master/http.cpp acf10a0f396234ec9f444c9b41515d657379d6ad 


Diff: https://reviews.apache.org/r/65930/diff/1/


Testing
---


Thanks,

Zhitao Li



Re: Review Request 65959: Added a test to make sure `slave/recovery_time_secs` is reported.

2018-03-14 Thread Zhitao Li

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/65959/
---

(Updated March 14, 2018, 6:03 p.m.)


Review request for mesos, Gilbert Song, Greg Mann, Jason Lai, and James Peach.


Changes
---

Updated metric name.


Summary (updated)
-

Added a test to make sure `slave/recovery_time_secs` is reported.


Bugs: MESOS-8609
https://issues.apache.org/jira/browse/MESOS-8609


Repository: mesos


Description (updated)
---

Added a test to make sure `slave/recovery_time_secs` is reported.


Diffs (updated)
-

  src/tests/slave_tests.cpp c2afaa573039e25fb59496765e6717e7643a692f 


Diff: https://reviews.apache.org/r/65959/diff/2/

Changes: https://reviews.apache.org/r/65959/diff/1-2/


Testing
---


Thanks,

Zhitao Li



Re: Review Request 65954: Add a gauge for how long agent recovery takes.

2018-03-14 Thread Zhitao Li

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/65954/
---

(Updated March 14, 2018, 4:06 p.m.)


Review request for mesos, Gilbert Song, Greg Mann, Jason Lai, and James Peach.


Changes
---

Remove usage of atomic and rename metric.


Bugs: MESOS-8609
https://issues.apache.org/jira/browse/MESOS-8609


Repository: mesos


Description (updated)
---

The new metric `slave/recover_time_secs` can be used to tell us how long
Mesos agent needed to finish its recovery cycle. This is an important
metric on agent machines which have a lot of completed executor
sandboxes.

Note that the metric 1) will only be available after recovery succeeded
and 2) never change its value across agent process lifecycle afterwards.


Diffs (updated)
-

  src/slave/metrics.hpp 3fc933ca65690d6fad63156398ad9c2c53789296 
  src/slave/metrics.cpp 0eb2b59ed67e14e73b29d7592c239441df0008d5 
  src/slave/slave.cpp 2f4ab157448eafc0f41372ee50255a76129e90db 


Diff: https://reviews.apache.org/r/65954/diff/3/

Changes: https://reviews.apache.org/r/65954/diff/2-3/


Testing
---


Thanks,

Zhitao Li



Re: Review Request 65954: Add a gauge for how long agent recovery takes.

2018-03-14 Thread Zhitao Li


> On March 9, 2018, 6:53 p.m., James Peach wrote:
> > src/slave/metrics.hpp
> > Lines 45 (patched)
> > <https://reviews.apache.org/r/65954/diff/2/?file=1972383#file1972383line45>
> >
> > This doesn't need to be atomic. The reader will just read either the 
> > old or new values and it doesn't matter which it gets.

Actually I don't even need a class variable. Directly capture it in lambda 
should be sufficient.


- Zhitao


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/65954/#review198952
-----------


On March 14, 2018, 4:06 p.m., Zhitao Li wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/65954/
> ---
> 
> (Updated March 14, 2018, 4:06 p.m.)
> 
> 
> Review request for mesos, Gilbert Song, Greg Mann, Jason Lai, and James Peach.
> 
> 
> Bugs: MESOS-8609
> https://issues.apache.org/jira/browse/MESOS-8609
> 
> 
> Repository: mesos
> 
> 
> Description
> ---
> 
> The new metric `slave/recover_time_secs` can be used to tell us how long
> Mesos agent needed to finish its recovery cycle. This is an important
> metric on agent machines which have a lot of completed executor
> sandboxes.
> 
> Note that the metric 1) will only be available after recovery succeeded
> and 2) never change its value across agent process lifecycle afterwards.
> 
> 
> Diffs
> -
> 
>   src/slave/metrics.hpp 3fc933ca65690d6fad63156398ad9c2c53789296 
>   src/slave/metrics.cpp 0eb2b59ed67e14e73b29d7592c239441df0008d5 
>   src/slave/slave.cpp 2f4ab157448eafc0f41372ee50255a76129e90db 
> 
> 
> Diff: https://reviews.apache.org/r/65954/diff/3/
> 
> 
> Testing
> ---
> 
> 
> Thanks,
> 
> Zhitao Li
> 
>



Review Request 66070: Document new `slave/recovery_time_secs` gauge.

2018-03-14 Thread Zhitao Li

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/66070/
---

Review request for mesos, Jason Lai and James Peach.


Bugs: MESOS-8609
https://issues.apache.org/jira/browse/MESOS-8609


Repository: mesos


Description
---

Document new `slave/recovery_time_secs` gauge.


Diffs
-

  docs/monitoring.md 0fba5cede553e2030260dcf8401d48e66e740270 


Diff: https://reviews.apache.org/r/66070/diff/1/


Testing
---


Thanks,

Zhitao Li



Re: Review Request 65954: Add a gauge for how long agent recovery takes.

2018-03-13 Thread Zhitao Li


> On March 9, 2018, 6:54 p.m., James Peach wrote:
> > This also needs to be documented in `docs/monitoring.md`.

Will do in a follow up patch after receiving shipit.


- Zhitao


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/65954/#review198954
---


On March 7, 2018, 11:20 p.m., Zhitao Li wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/65954/
> ---
> 
> (Updated March 7, 2018, 11:20 p.m.)
> 
> 
> Review request for mesos, Gilbert Song, Greg Mann, Jason Lai, and James Peach.
> 
> 
> Bugs: MESOS-8609
> https://issues.apache.org/jira/browse/MESOS-8609
> 
> 
> Repository: mesos
> 
> 
> Description
> ---
> 
> The new metric `slave/recover_secs` can be used to tell us how long
> Mesos agent needed to finish its recovery cycle. This is an important
> metric on agent machines which have a lot of completed executor
> sandboxes.
> 
> Note that the metric 1) will only be available after recovery succeeded
> and 2) never change its value across agent process lifecycle afterwards.
> 
> 
> Diffs
> -
> 
>   src/slave/metrics.hpp 3fc933ca65690d6fad63156398ad9c2c53789296 
>   src/slave/metrics.cpp 0eb2b59ed67e14e73b29d7592c239441df0008d5 
>   src/slave/slave.cpp e2facb3c15a2f907f6497c58a36842ed707f2c70 
> 
> 
> Diff: https://reviews.apache.org/r/65954/diff/2/
> 
> 
> Testing
> ---
> 
> 
> Thanks,
> 
> Zhitao Li
> 
>



Re: Review Request 65954: Add a gauge for how long agent recovery takes.

2018-03-13 Thread Zhitao Li


> On March 9, 2018, 6:53 p.m., James Peach wrote:
> > src/slave/metrics.cpp
> > Lines 259 (patched)
> > <https://reviews.apache.org/r/65954/diff/2/?file=1972384#file1972384line259>
> >
> > I don't know that I like the idea of a metric that is absent and then 
> > present. I'd prefer that we just published a `0.0` until recovert is 
> > complete.
> > 
> > Suggest we keep the recovery timestamp in the `Slave` and just publish 
> > that.

I thought about that too, but I actually like the idea of the metric being 
absent when the value is not available yet. A zero value could confuse 
downstream aggregation.

For example, our team want to gather an average of recovery time across our 
cluster of thousands of agents, but a presence of zero value could mistake the 
calculation.

I think Mesos already have some precedence on absent then present metrics. For 
instance, metrics in `allocator/mesos/roles//...` could show up if 
framework under a new role registers after Master started.

Let me know what do you think.


> On March 9, 2018, 6:53 p.m., James Peach wrote:
> > src/slave/slave.cpp
> > Lines 7322 (patched)
> > <https://reviews.apache.org/r/65954/diff/2/?file=1972385#file1972385line7322>
> >
> > Since the gauge is being published in seconds, you need to use 
> > `Duration::secs` to convert.

I prefer the API call to work on `Duration` and perform the `secs()` as late as 
possible, as I've seen so many times when people pass a wrong time unit if the 
API task an integer/float.


- Zhitao


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/65954/#review198952
---


On March 7, 2018, 11:20 p.m., Zhitao Li wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/65954/
> ---
> 
> (Updated March 7, 2018, 11:20 p.m.)
> 
> 
> Review request for mesos, Gilbert Song, Greg Mann, Jason Lai, and James Peach.
> 
> 
> Bugs: MESOS-8609
> https://issues.apache.org/jira/browse/MESOS-8609
> 
> 
> Repository: mesos
> 
> 
> Description
> ---
> 
> The new metric `slave/recover_secs` can be used to tell us how long
> Mesos agent needed to finish its recovery cycle. This is an important
> metric on agent machines which have a lot of completed executor
> sandboxes.
> 
> Note that the metric 1) will only be available after recovery succeeded
> and 2) never change its value across agent process lifecycle afterwards.
> 
> 
> Diffs
> -
> 
>   src/slave/metrics.hpp 3fc933ca65690d6fad63156398ad9c2c53789296 
>   src/slave/metrics.cpp 0eb2b59ed67e14e73b29d7592c239441df0008d5 
>   src/slave/slave.cpp e2facb3c15a2f907f6497c58a36842ed707f2c70 
> 
> 
> Diff: https://reviews.apache.org/r/65954/diff/2/
> 
> 
> Testing
> ---
> 
> 
> Thanks,
> 
> Zhitao Li
> 
>



Review Request 66481: Converted `pid` in command executor to `Option`.

2018-04-05 Thread Zhitao Li

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/66481/
---

Review request for mesos, Andrew Schwartzmeyer and James Peach.


Repository: mesos


Description
---

This avoids unix's assumption that `pid_t` is a signed integer (which is
not the case on Windows) and explicitly shows whether a pid has been
assigned from launching.

We also changed the member variable to `pid_` to avoid variable
shadowing in `reaped` method.


Diffs
-

  src/launcher/executor.cpp 383655230fcdad30b652ea984cb9f5fc9c49dd38 


Diff: https://reviews.apache.org/r/66481/diff/1/


Testing
---

`make` on Linux.


Thanks,

Zhitao Li



Re: Review Request 66481: Converted `pid` in command executor to `Option`.

2018-04-09 Thread Zhitao Li

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/66481/
---

(Updated April 9, 2018, 10:20 a.m.)


Review request for mesos, Andrew Schwartzmeyer and James Peach.


Repository: mesos


Description
---

This avoids unix's assumption that `pid_t` is a signed integer (which is
not the case on Windows) and explicitly shows whether a pid has been
assigned from launching.

We also changed argument name in `reaped` method to `_pid` to avoid
shadowing.


Diffs (updated)
-

  src/launcher/executor.cpp 383655230fcdad30b652ea984cb9f5fc9c49dd38 


Diff: https://reviews.apache.org/r/66481/diff/3/

Changes: https://reviews.apache.org/r/66481/diff/2-3/


Testing
---

`make` on Linux.


Thanks,

Zhitao Li



Review Request 66531: Added new authorization for `UpdateVolume`.

2018-04-10 Thread Zhitao Li

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/66531/
---

Review request for mesos, Chun-Hung Hsiao and Greg Mann.


Bugs: MESOS-8748
https://issues.apache.org/jira/browse/MESOS-8748


Repository: mesos


Description
---

The new authorization action is modelled after `CreateVolume`, and will
be shared by both `GROW_VOLUME` and `SHRINK_VOLUME` operations and
corresponding operator APIs.


Diffs
-

  include/mesos/authorizer/acls.proto 8ef33210644e7d2924b402a3158b1b38c1fdb464 
  include/mesos/authorizer/authorizer.proto 
1508c01130013d74ed2b2574a2428f38e3d2c064 
  src/authorizer/local/authorizer.cpp c098ba9ded1b29a2e079cf09ab80b61f6fa4af05 
  src/master/master.hpp 0d9620dd0c232dc1df83477e838eeb7313bf8828 
  src/master/master.cpp f7da675e8fe96159e5335c9e83b65b67ed90eda8 


Diff: https://reviews.apache.org/r/66531/diff/1/


Testing
---

Manually created ACL and verified that untrusted principals will not allow to 
grow/shrink volumes.
Also created unit test in next patch.


Thanks,

Zhitao Li



Review Request 66532: Added test for authorization actions for `UPDATE_VOLUME`.

2018-04-10 Thread Zhitao Li

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/66532/
---

Review request for mesos, Chun-Hung Hsiao and Greg Mann.


Bugs: MESOS-8748
https://issues.apache.org/jira/browse/MESOS-8748


Repository: mesos


Description
---

Added test for authorization actions for `UPDATE_VOLUME`.


Diffs
-

  src/tests/authorization_tests.cpp a76ad18a54ec232e6d6c92ec6a45b445a83f174c 


Diff: https://reviews.apache.org/r/66532/diff/1/


Testing
---


Thanks,

Zhitao Li



Re: Review Request 66220: Added test for `GROW_VOLUME` and `SHRINK_VOLUME` operations.

2018-04-11 Thread Zhitao Li

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/66220/
---

(Updated April 11, 2018, 2:19 p.m.)


Review request for mesos, Chun-Hung Hsiao, Gaston Kleiman, and Greg Mann.


Changes
---

Use proper test macros.


Bugs: MESOS-4965
https://issues.apache.org/jira/browse/MESOS-4965


Repository: mesos


Description
---

Added test for `GROW_VOLUME` and `SHRINK_VOLUME` operations.


Diffs (updated)
-

  src/tests/persistent_volume_tests.cpp 
4edf781711d9efdb994114aeb6289b6af750b87a 


Diff: https://reviews.apache.org/r/66220/diff/4/

Changes: https://reviews.apache.org/r/66220/diff/3-4/


Testing
---


Thanks,

Zhitao Li



Re: Review Request 66219: Added helper functions to create grow and shrink volume in test.

2018-04-11 Thread Zhitao Li

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/66219/
---

(Updated April 11, 2018, 2:18 p.m.)


Review request for mesos, Chun-Hung Hsiao, Gaston Kleiman, and Greg Mann.


Changes
---

Fix proper macros.


Bugs: MESOS-4965
https://issues.apache.org/jira/browse/MESOS-4965


Repository: mesos


Description
---

Added helper functions to create grow and shrink volume in test.


Diffs (updated)
-

  src/tests/mesos.hpp 6f4e0c5567b99891f6d0eceb7e6917d25083db0e 


Diff: https://reviews.apache.org/r/66219/diff/5/

Changes: https://reviews.apache.org/r/66219/diff/4-5/


Testing
---


Thanks,

Zhitao Li



Review Request 66569: Added a test to verify that grow and shrink cannot be combined.

2018-04-11 Thread Zhitao Li

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/66569/
---

Review request for mesos, Chun-Hung Hsiao and Greg Mann.


Bugs: MESOS-4965
https://issues.apache.org/jira/browse/MESOS-4965


Repository: mesos


Description
---

Added a test to verify that grow and shrink cannot be combined.


Diffs
-

  src/tests/persistent_volume_tests.cpp 
4edf781711d9efdb994114aeb6289b6af750b87a 


Diff: https://reviews.apache.org/r/66569/diff/1/


Testing
---


Thanks,

Zhitao Li



Re: Review Request 66227: Added test for `GROW_VOLUME` and `SHRINK_VOLUME` operator API.

2018-04-11 Thread Zhitao Li

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/66227/
---

(Updated April 11, 2018, 2:21 p.m.)


Review request for mesos, Chun-Hung Hsiao, Gaston Kleiman, and Greg Mann.


Changes
---

Fix macro so it will skip on windows.


Bugs: MESOS-8747
https://issues.apache.org/jira/browse/MESOS-8747


Repository: mesos


Description
---

Added test for `GROW_VOLUME` and `SHRINK_VOLUME` operator API.


Diffs (updated)
-

  src/tests/api_tests.cpp dd8e221d8fd1b2a241505345337897e4ee4a6347 


Diff: https://reviews.apache.org/r/66227/diff/4/

Changes: https://reviews.apache.org/r/66227/diff/3-4/


Testing
---


Thanks,

Zhitao Li



Review Request 66568: Dropped GROW and SHRINK volume if combined with other operations.

2018-04-11 Thread Zhitao Li

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/66568/
---

Review request for mesos, Chun-Hung Hsiao and Greg Mann.


Bugs: MESOS-4965
https://issues.apache.org/jira/browse/MESOS-4965


Repository: mesos


Description
---

These two operations are intended to be non-speculative eventually but
are implemented as speculative right now. To avoid frameworks opt-in to
dangerous behavior, we require that accept can only container one
`GROW_VOLUME` or `SHRINK_VOLUME` and no other operations.


Diffs
-

  src/master/master.cpp f7da675e8fe96159e5335c9e83b65b67ed90eda8 


Diff: https://reviews.apache.org/r/66568/diff/1/


Testing
---


Thanks,

Zhitao Li



Review Request 66591: Added a validation that `max_completion_time` must be non-negative.

2018-04-12 Thread Zhitao Li

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/66591/
---

Review request for mesos, Jason Lai and James Peach.


Bugs: MESOS-8725
https://issues.apache.org/jira/browse/MESOS-8725


Repository: mesos


Description
---

Added a validation that `max_completion_time` must be non-negative.


Diffs
-

  src/master/validation.hpp 7c129ceb929596acbb64d37025e055661277e6bf 
  src/master/validation.cpp ac2e1bb8771841ec59b3bdcdeffb6c6230680d4d 


Diff: https://reviews.apache.org/r/66591/diff/1/


Testing
---


Thanks,

Zhitao Li



Re: Review Request 66049: Added offer operation to grow and shrink persistent volumes.

2018-04-12 Thread Zhitao Li


> On March 29, 2018, 5:07 a.m., Benjamin Bannier wrote:
> > include/mesos/mesos.proto
> > Lines 1920-1921 (patched)
> > <https://reviews.apache.org/r/66049/diff/5/?file=1989744#file1989744line1920>
> >
> > These enum values are not handled in a number of switch statements in 
> > the code so that this patch cannot be used in isolation (fatal compiler 
> > warnings). It might be enough to enumerate the new cases explicitly and 
> > mark them `UNIMPLEMENTED`.
> > 
> > Similar for v1 enum below.
> 
> Zhitao Li wrote:
> My plan was to commit all patches leading to proper full implementation 
> in one shot so code on master is still compilable (which seems previous norm 
> to me?)
> 
> If you think marking them as `UNIMPLEMENTED` then gradually implement 
> each in later patches is better, I can happily do that.
> 
> Greg Mann wrote:
> We generally try to make each patch self-contained, so that the codebase 
> will still compile and tests will still pass if that patch were applied 
> alone. We do sometimes break from this when it's very difficult to 
> accomplish. I think that Benjamin's comment here is valid; I'll leave it up 
> to you what you want to do with this patch, but in the future, try to make 
> patches self-contained when possible.

That's fair. I like your proposal to keep patches self-contained if possible 
(just weren't clear on this practice).

I'll add `UNIMPLEMENTED();` to broken enum checks in this patch.


- Zhitao


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/66049/#review200185
---


On March 28, 2018, 11:24 a.m., Zhitao Li wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/66049/
> ---
> 
> (Updated March 28, 2018, 11:24 a.m.)
> 
> 
> Review request for mesos, Chun-Hung Hsiao, Gaston Kleiman, and Greg Mann.
> 
> 
> Bugs: MESOS-4965
> https://issues.apache.org/jira/browse/MESOS-4965
> 
> 
> Repository: mesos
> 
> 
> Description
> ---
> 
> Added offer operation to grow and shrink persistent volumes.
> 
> 
> Diffs
> -
> 
>   include/mesos/mesos.proto 676f0b090cad7ebf59eb32556f17ff8b5f247907 
>   include/mesos/v1/mesos.proto 10d506517c9f098374ab0c8f4dcfda42e1896c95 
> 
> 
> Diff: https://reviews.apache.org/r/66049/diff/6/
> 
> 
> Testing
> ---
> 
> 
> Thanks,
> 
> Zhitao Li
> 
>



Re: Review Request 53267: Added a gauge to track active subscribers.

2018-04-12 Thread Zhitao Li

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/53267/
---

(Updated April 12, 2018, 2:32 p.m.)


Review request for mesos, Anand Mazumdar, Xiaojian Huang, and Kunal Thakar.


Changes
---

Rebase and updated summary.


Summary (updated)
-

Added a gauge to track active subscribers.


Bugs: MESOS-6499
https://issues.apache.org/jira/browse/MESOS-6499


Repository: mesos


Description (updated)
---

Added a gauge to track active subscribers.


Diffs (updated)
-

  src/master/master.hpp 0d9620dd0c232dc1df83477e838eeb7313bf8828 
  src/master/metrics.hpp 5699c64bac19cb735a4c72f9a067b1338c2ac3c2 
  src/master/metrics.cpp 894c04173d218ffd891d203254035fe4756cc8bb 


Diff: https://reviews.apache.org/r/53267/diff/2/

Changes: https://reviews.apache.org/r/53267/diff/1-2/


Testing
---

Ran this on master with a client keeps subscribing and disconnecting. Observes 
that gauge gets updated and new log line is printed.


Thanks,

Zhitao Li



Re: Review Request 66049: Added offer operation to grow and shrink persistent volumes.

2018-04-13 Thread Zhitao Li

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/66049/
---

(Updated April 13, 2018, 10:20 a.m.)


Review request for mesos, Chun-Hung Hsiao, Gaston Kleiman, and Greg Mann.


Changes
---

Remove unnecessary whiteline.


Bugs: MESOS-4965
https://issues.apache.org/jira/browse/MESOS-4965


Repository: mesos


Description
---

Added offer operation to grow and shrink persistent volumes.


Diffs (updated)
-

  include/mesos/mesos.proto 676f0b090cad7ebf59eb32556f17ff8b5f247907 
  include/mesos/v1/mesos.proto 10d506517c9f098374ab0c8f4dcfda42e1896c95 
  src/common/protobuf_utils.cpp 141a444534b776a2c90e2a0daf9727cd21e39080 
  src/common/resources_utils.cpp 9be01c1abd48264e308960f35cc7c2ee8a367518 
  src/master/master.cpp 767ad8cfe142b47ef07172bcb2a4fb49fc3e833a 
  src/tests/mesos.hpp 6f4e0c5567b99891f6d0eceb7e6917d25083db0e 
  src/tests/persistent_volume_tests.cpp 
4edf781711d9efdb994114aeb6289b6af750b87a 
  src/tests/reservation_tests.cpp 5570df2e0b208512d0a0a3079a180e1acfe08f3d 


Diff: https://reviews.apache.org/r/66049/diff/8/

Changes: https://reviews.apache.org/r/66049/diff/7-8/


Testing
---


Thanks,

Zhitao Li



Re: Review Request 66049: Added offer operation to grow and shrink persistent volumes.

2018-04-13 Thread Zhitao Li


> On April 12, 2018, 5:27 p.m., Greg Mann wrote:
> > include/mesos/mesos.proto
> > Lines 1975 (patched)
> > <https://reviews.apache.org/r/66049/diff/6/?file=1994606#file1994606line1975>
> >
> > Hmm I wonder if we should just use a `double` here? Does the 
> > `Value.Scalar` type provide some benefit?
> 
> Chun-Hung Hsiao wrote:
> From the API perspective, `double` is easier to use, but `Value.Scalar` 
> is more consistent with how we represesnt a disk resource.
> 
> From the implementation perspective, I prefer `Value.Scalar` because it's 
> arithmetic operators will do fixed-point conversions for me so I won't forget 
> that.
> 
> Greg Mann wrote:
> My recommendation would be to prioritize easy-of-use in the API over 
> ease-of-implementation when possible :)
> 
> In the implementation, as long as we toss the provided double into a 
> `Scalar::Value` before we do anything with it, I think we could get the same 
> benefit?

I feel that `Scalar` is actually more consistent than `double` here. I do not 
see either API is much easier to use.


- Zhitao


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/66049/#review201067
-------


On April 13, 2018, 10:20 a.m., Zhitao Li wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/66049/
> ---
> 
> (Updated April 13, 2018, 10:20 a.m.)
> 
> 
> Review request for mesos, Chun-Hung Hsiao, Gaston Kleiman, and Greg Mann.
> 
> 
> Bugs: MESOS-4965
> https://issues.apache.org/jira/browse/MESOS-4965
> 
> 
> Repository: mesos
> 
> 
> Description
> ---
> 
> Added offer operation to grow and shrink persistent volumes.
> 
> 
> Diffs
> -
> 
>   include/mesos/mesos.proto 676f0b090cad7ebf59eb32556f17ff8b5f247907 
>   include/mesos/v1/mesos.proto 10d506517c9f098374ab0c8f4dcfda42e1896c95 
>   src/common/protobuf_utils.cpp 141a444534b776a2c90e2a0daf9727cd21e39080 
>   src/common/resources_utils.cpp 9be01c1abd48264e308960f35cc7c2ee8a367518 
>   src/master/master.cpp 767ad8cfe142b47ef07172bcb2a4fb49fc3e833a 
>   src/tests/mesos.hpp 6f4e0c5567b99891f6d0eceb7e6917d25083db0e 
>   src/tests/persistent_volume_tests.cpp 
> 4edf781711d9efdb994114aeb6289b6af750b87a 
>   src/tests/reservation_tests.cpp 5570df2e0b208512d0a0a3079a180e1acfe08f3d 
> 
> 
> Diff: https://reviews.apache.org/r/66049/diff/8/
> 
> 
> Testing
> ---
> 
> 
> Thanks,
> 
> Zhitao Li
> 
>



Re: Review Request 66049: Added offer operation to grow and shrink persistent volumes.

2018-04-13 Thread Zhitao Li

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/66049/
---

(Updated April 13, 2018, 10:15 a.m.)


Review request for mesos, Chun-Hung Hsiao, Gaston Kleiman, and Greg Mann.


Changes
---

Add `UNIMPLEMENTED` annotations so this compiles.


Bugs: MESOS-4965
https://issues.apache.org/jira/browse/MESOS-4965


Repository: mesos


Description
---

Added offer operation to grow and shrink persistent volumes.


Diffs (updated)
-

  include/mesos/mesos.proto 676f0b090cad7ebf59eb32556f17ff8b5f247907 
  include/mesos/v1/mesos.proto 10d506517c9f098374ab0c8f4dcfda42e1896c95 
  src/common/protobuf_utils.cpp 141a444534b776a2c90e2a0daf9727cd21e39080 
  src/common/resources_utils.cpp 9be01c1abd48264e308960f35cc7c2ee8a367518 
  src/master/master.cpp 767ad8cfe142b47ef07172bcb2a4fb49fc3e833a 
  src/tests/mesos.hpp 6f4e0c5567b99891f6d0eceb7e6917d25083db0e 
  src/tests/persistent_volume_tests.cpp 
4edf781711d9efdb994114aeb6289b6af750b87a 
  src/tests/reservation_tests.cpp 5570df2e0b208512d0a0a3079a180e1acfe08f3d 


Diff: https://reviews.apache.org/r/66049/diff/7/

Changes: https://reviews.apache.org/r/66049/diff/6-7/


Testing
---


Thanks,

Zhitao Li



Re: Review Request 66050: Implemented grow and shrink of persistent volumes.

2018-04-09 Thread Zhitao Li

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/66050/
---

(Updated April 9, 2018, 9:18 p.m.)


Review request for mesos, Chun-Hung Hsiao, Gaston Kleiman, and Greg Mann.


Changes
---

Make operation impl as speculative for now.


Bugs: MESOS-4965
https://issues.apache.org/jira/browse/MESOS-4965


Repository: mesos


Description (updated)
---

The new offer operations are implemented as speculative operations, but
we will use validation to make them non-speculative on API level so that
we can transition later without a breaking change.


Diffs (updated)
-

  src/common/protobuf_utils.cpp 141a444534b776a2c90e2a0daf9727cd21e39080 
  src/common/resources_utils.cpp 9be01c1abd48264e308960f35cc7c2ee8a367518 
  src/master/master.cpp f7da675e8fe96159e5335c9e83b65b67ed90eda8 
  src/master/validation.hpp 7c129ceb929596acbb64d37025e055661277e6bf 
  src/master/validation.cpp ac2e1bb8771841ec59b3bdcdeffb6c6230680d4d 
  src/tests/mesos.hpp 6f4e0c5567b99891f6d0eceb7e6917d25083db0e 
  src/tests/persistent_volume_tests.cpp 
4edf781711d9efdb994114aeb6289b6af750b87a 
  src/tests/reservation_tests.cpp 5570df2e0b208512d0a0a3079a180e1acfe08f3d 


Diff: https://reviews.apache.org/r/66050/diff/8/

Changes: https://reviews.apache.org/r/66050/diff/7-8/


Testing
---


Thanks,

Zhitao Li



Re: Review Request 66051: Implemented operator API to grow and shrink persistent volume.

2018-04-09 Thread Zhitao Li

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/66051/
---

(Updated April 9, 2018, 9:21 p.m.)


Review request for mesos, Chun-Hung Hsiao, Gaston Kleiman, and Greg Mann.


Changes
---

Implement as speculative for now.


Bugs: MESOS-4965
https://issues.apache.org/jira/browse/MESOS-4965


Repository: mesos


Description (updated)
---

These operator APIs is implemented as speculative for now, but
we plan to convert them to non-speculative in the future.


Diffs (updated)
-

  src/master/http.cpp 34c9023906eca94965acc994f20e888c1f47b962 
  src/master/master.hpp 0d9620dd0c232dc1df83477e838eeb7313bf8828 
  src/master/validation.cpp ac2e1bb8771841ec59b3bdcdeffb6c6230680d4d 


Diff: https://reviews.apache.org/r/66051/diff/9/

Changes: https://reviews.apache.org/r/66051/diff/8-9/


Testing
---


Thanks,

Zhitao Li



Re: Review Request 66227: Added test for `GROW_VOLUME` and `SHRINK_VOLUME` operator API.

2018-04-09 Thread Zhitao Li

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/66227/
---

(Updated April 9, 2018, 9:22 p.m.)


Review request for mesos, Chun-Hung Hsiao, Gaston Kleiman, and Greg Mann.


Changes
---

Fix typoes and missing expect and make test fully pass.


Bugs: MESOS-8747
https://issues.apache.org/jira/browse/MESOS-8747


Repository: mesos


Description
---

Added test for `GROW_VOLUME` and `SHRINK_VOLUME` operator API.


Diffs (updated)
-

  src/tests/api_tests.cpp dd8e221d8fd1b2a241505345337897e4ee4a6347 


Diff: https://reviews.apache.org/r/66227/diff/3/

Changes: https://reviews.apache.org/r/66227/diff/2-3/


Testing
---


Thanks,

Zhitao Li



Re: Review Request 66051: Implemented operator API to grow and shrink persistent volume.

2018-04-09 Thread Zhitao Li

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/66051/
---

(Updated April 9, 2018, 9:32 p.m.)


Review request for mesos, Chun-Hung Hsiao, Gaston Kleiman, and Greg Mann.


Changes
---

Style and validation comments from Greg.


Bugs: MESOS-4965
https://issues.apache.org/jira/browse/MESOS-4965


Repository: mesos


Description
---

These operator APIs is implemented as speculative for now, but
we plan to convert them to non-speculative in the future.


Diffs (updated)
-

  src/master/http.cpp 34c9023906eca94965acc994f20e888c1f47b962 
  src/master/master.hpp 0d9620dd0c232dc1df83477e838eeb7313bf8828 
  src/master/validation.cpp ac2e1bb8771841ec59b3bdcdeffb6c6230680d4d 


Diff: https://reviews.apache.org/r/66051/diff/10/

Changes: https://reviews.apache.org/r/66051/diff/9-10/


Testing
---


Thanks,

Zhitao Li



Re: Review Request 66051: Implemented operator API to grow and shrink persistent volume.

2018-04-09 Thread Zhitao Li


> On March 27, 2018, 6:05 p.m., Greg Mann wrote:
> > src/master/master.cpp
> > Lines 10798-10837 (original), 10794-10839 (patched)
> > <https://reviews.apache.org/r/66051/diff/7/?file=1988982#file1988982line10798>
> >
> > As discussed in chat, we need to update the agent's resources in the 
> > allocator for operator-initiated operations.

I'm separating the actual logic change of supporting non-speculative operator 
API into future task and patch.


- Zhitao


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/66051/#review200051
-------


On April 9, 2018, 9:32 p.m., Zhitao Li wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/66051/
> ---
> 
> (Updated April 9, 2018, 9:32 p.m.)
> 
> 
> Review request for mesos, Chun-Hung Hsiao, Gaston Kleiman, and Greg Mann.
> 
> 
> Bugs: MESOS-4965
> https://issues.apache.org/jira/browse/MESOS-4965
> 
> 
> Repository: mesos
> 
> 
> Description
> ---
> 
> These operator APIs is implemented as speculative for now, but
> we plan to convert them to non-speculative in the future.
> 
> 
> Diffs
> -
> 
>   src/master/http.cpp 34c9023906eca94965acc994f20e888c1f47b962 
>   src/master/master.hpp 0d9620dd0c232dc1df83477e838eeb7313bf8828 
>   src/master/validation.cpp ac2e1bb8771841ec59b3bdcdeffb6c6230680d4d 
> 
> 
> Diff: https://reviews.apache.org/r/66051/diff/10/
> 
> 
> Testing
> ---
> 
> 
> Thanks,
> 
> Zhitao Li
> 
>



Re: Review Request 66051: Implemented operator API to grow and shrink persistent volume.

2018-04-09 Thread Zhitao Li


> On March 27, 2018, 6:58 p.m., Greg Mann wrote:
> > src/master/master.cpp
> > Lines 11881-11887 (original), 11885-11890 (patched)
> > <https://reviews.apache.org/r/66051/diff/7/?file=1988982#file1988982line11889>
> >
> > Currently, it looks like `Slave::usedResources` is the same as the 
> > "allocated resources" from that agent.
> > 
> > When a GROW or SHRINK operation is initiated by an operator, its 
> > consumed resource is in a unique state in the master while that operation 
> > is pending: the volume is not allocated, but it is "used" in some sense.
> > 
> > One place where we use `Slave::usedResources` is when validating 
> > DESTROY operations: 
> > https://github.com/apache/mesos/blob/21e4b45b388d0b272236b1c58313569f8a1d1fc8/src/master/master.cpp#L4841
> > 
> > This means if the master receives a DESTROY call for a persistent 
> > volume while there is a pending operator-initiated GROW_VOLUME call for 
> > that same volume (i.e. the operation was forwarded to the agent for 
> > processing but the master hasn't heard back yet), we would accept the 
> > DESTROY operation and forward it to the agent. Is this what we want?
> > 
> > This brings something else to mind: would it be possible for the 
> > allocator to send out an offer containing the consumed resource of a 
> > GROW_VOLUME or SHRINK_VOLUME operation, while the operation is pending on 
> > the agent? This seems bad, but I believe it's possible. I think this may be 
> > the first time we have a set of resources which we don't want the allocator 
> > to offer for a period of time, but which are not currently allocated to 
> > some framework/role. I wonder if we need some new tools in the allocator to 
> > handle this?
> 
> Chun-Hung Hsiao wrote:
> Yeah this sounds bad. Would it help if we remove the consumed resources 
> from the allocator and put it back after receiving the terminal status update?
> 
> Zhitao Li wrote:
> I agree this is problematic.
> 
> It seems like we need to make sure several things:
> 1. `consumed` resources of a non-speculative operation can not offered to 
> any framework while pending;
> 2. `consumed` resources should still be visible for various validation;
> 
> Therefore, I'm thinking about introducing a new field in `Master::Slave` 
> struct which represents `pendingConsumedResources` (or a function in `Slave` 
> which can extract this info from pending operations). We can then use it to 
> validation of `DESTROY` call.
> 
> Another issue I found was in `Master::apply()`, which we directly call 
> `allocator->updateAvailable(slave->id, {operation})`, but semantically we 
> should be calling `allocator->updateAvailable(slave->id, consumed, empty)` so 
> that allocator would not see the `converted` resources until operation 
> confirmations comes back.

I'm separating the actual logic change of supporting non-speculative operator 
API into future task and patch.


- Zhitao


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/66051/#review200086
---


On April 9, 2018, 9:32 p.m., Zhitao Li wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/66051/
> ---
> 
> (Updated April 9, 2018, 9:32 p.m.)
> 
> 
> Review request for mesos, Chun-Hung Hsiao, Gaston Kleiman, and Greg Mann.
> 
> 
> Bugs: MESOS-4965
> https://issues.apache.org/jira/browse/MESOS-4965
> 
> 
> Repository: mesos
> 
> 
> Description
> ---
> 
> These operator APIs is implemented as speculative for now, but
> we plan to convert them to non-speculative in the future.
> 
> 
> Diffs
> -
> 
>   src/master/http.cpp 34c9023906eca94965acc994f20e888c1f47b962 
>   src/master/master.hpp 0d9620dd0c232dc1df83477e838eeb7313bf8828 
>   src/master/validation.cpp ac2e1bb8771841ec59b3bdcdeffb6c6230680d4d 
> 
> 
> Diff: https://reviews.apache.org/r/66051/diff/10/
> 
> 
> Testing
> ---
> 
> 
> Thanks,
> 
> Zhitao Li
> 
>



Re: Review Request 66343: Added test for difference operator of hashset.

2018-04-09 Thread Zhitao Li


> On April 5, 2018, 8:26 p.m., Benjamin Mahler wrote:
> > looks like all of these ASSERTs can be EXPECTs?

Existing tests are all using ASSERTs. We can perform a sweeping change 
separately if desired.


- Zhitao


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/66343/#review200623
---


On March 28, 2018, 11:23 a.m., Zhitao Li wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/66343/
> ---
> 
> (Updated March 28, 2018, 11:23 a.m.)
> 
> 
> Review request for mesos, Benjamin Mahler, Chun-Hung Hsiao, and Jason Lai.
> 
> 
> Bugs: MESOS-8746
> https://issues.apache.org/jira/browse/MESOS-8746
> 
> 
> Repository: mesos
> 
> 
> Description
> ---
> 
> Added test for difference operator of hashset.
> 
> 
> Diffs
> -
> 
>   3rdparty/stout/tests/hashset_tests.cpp 
> d2b7bf248647c63c00c8c82b8176130c87c50eb4 
> 
> 
> Diff: https://reviews.apache.org/r/66343/diff/1/
> 
> 
> Testing
> ---
> 
> 
> Thanks,
> 
> Zhitao Li
> 
>



Re: Review Request 66049: Added offer operation to grow and shrink persistent volumes.

2018-04-09 Thread Zhitao Li


> On March 29, 2018, 5:07 a.m., Benjamin Bannier wrote:
> > include/mesos/mesos.proto
> > Lines 1920-1921 (patched)
> > <https://reviews.apache.org/r/66049/diff/5/?file=1989744#file1989744line1920>
> >
> > These enum values are not handled in a number of switch statements in 
> > the code so that this patch cannot be used in isolation (fatal compiler 
> > warnings). It might be enough to enumerate the new cases explicitly and 
> > mark them `UNIMPLEMENTED`.
> > 
> > Similar for v1 enum below.

My plan was to commit all patches leading to proper full implementation in one 
shot so code on master is still compilable (which seems previous norm to me?)

If you think marking them as `UNIMPLEMENTED` then gradually implement each in 
later patches is better, I can happily do that.


- Zhitao


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/66049/#review200185
-------


On March 28, 2018, 11:24 a.m., Zhitao Li wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/66049/
> ---
> 
> (Updated March 28, 2018, 11:24 a.m.)
> 
> 
> Review request for mesos, Chun-Hung Hsiao, Gaston Kleiman, and Greg Mann.
> 
> 
> Bugs: MESOS-4965
> https://issues.apache.org/jira/browse/MESOS-4965
> 
> 
> Repository: mesos
> 
> 
> Description
> ---
> 
> Added offer operation to grow and shrink persistent volumes.
> 
> 
> Diffs
> -
> 
>   include/mesos/mesos.proto 676f0b090cad7ebf59eb32556f17ff8b5f247907 
>   include/mesos/v1/mesos.proto 10d506517c9f098374ab0c8f4dcfda42e1896c95 
> 
> 
> Diff: https://reviews.apache.org/r/66049/diff/5/
> 
> 
> Testing
> ---
> 
> 
> Thanks,
> 
> Zhitao Li
> 
>



Re: Review Request 66481: Converted `pid` in command executor to `Option`.

2018-04-07 Thread Zhitao Li

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/66481/
---

(Updated April 7, 2018, 5:50 p.m.)


Review request for mesos, Andrew Schwartzmeyer and James Peach.


Repository: mesos


Description (updated)
---

This avoids unix's assumption that `pid_t` is a signed integer (which is
not the case on Windows) and explicitly shows whether a pid has been
assigned from launching.

We also changed argument name in `reaped` method to `_pid` to avoid
shadowing.


Diffs (updated)
-

  src/launcher/executor.cpp 383655230fcdad30b652ea984cb9f5fc9c49dd38 


Diff: https://reviews.apache.org/r/66481/diff/2/

Changes: https://reviews.apache.org/r/66481/diff/1-2/


Testing
---

`make` on Linux.


Thanks,

Zhitao Li



Re: Review Request 66258: Added `max_completion_time` to `TaskInfo` and new reason.

2018-04-07 Thread Zhitao Li

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/66258/
---

(Updated April 7, 2018, 5:51 p.m.)


Review request for mesos, Jason Lai and James Peach.


Changes
---

Rename as we discussed.


Summary (updated)
-

Added `max_completion_time` to `TaskInfo` and new reason.


Bugs: MESOS-8725
https://issues.apache.org/jira/browse/MESOS-8725


Repository: mesos


Description
---

This new field can be used support tasks which have a maximum duration
scheduling requirement. The new reason is added to distinguish from
normal task kills.


Diffs (updated)
-

  include/mesos/mesos.proto 676f0b090cad7ebf59eb32556f17ff8b5f247907 
  include/mesos/v1/mesos.proto 10d506517c9f098374ab0c8f4dcfda42e1896c95 


Diff: https://reviews.apache.org/r/66258/diff/2/

Changes: https://reviews.apache.org/r/66258/diff/1-2/


Testing
---


Thanks,

Zhitao Li



Re: Review Request 66259: Added `max_completion_time` support to command executor.

2018-04-07 Thread Zhitao Li

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/66259/
---

(Updated April 7, 2018, 5:51 p.m.)


Review request for mesos, Jason Lai and James Peach.


Changes
---

Rename and proper state.


Summary (updated)
-

Added `max_completion_time` support to command executor.


Bugs: MESOS-8725
https://issues.apache.org/jira/browse/MESOS-8725


Repository: mesos


Description (updated)
---

If `TaskInfo.max_completion_time` is set, command executor will kill
the task with `SIGKILL` immediately. Note that no KillPolicy will be
observed. Framework should only received a `TASK_FAILED` state with
`REASON_MAX_COMPLETION_TIME_REACHED` reason.


Diffs (updated)
-

  src/launcher/executor.cpp 383655230fcdad30b652ea984cb9f5fc9c49dd38 


Diff: https://reviews.apache.org/r/66259/diff/2/

Changes: https://reviews.apache.org/r/66259/diff/1-2/


Testing
---


Thanks,

Zhitao Li



Re: Review Request 66481: Converted `pid` in command executor to `Option`.

2018-04-07 Thread Zhitao Li


> On April 5, 2018, 5:01 p.m., James Peach wrote:
> > I think we should just change all the usages of `pid` rather than renaming 
> > it to `pid_`.

Fixed.


- Zhitao


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/66481/#review200605
---


On April 7, 2018, 5:50 p.m., Zhitao Li wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/66481/
> ---
> 
> (Updated April 7, 2018, 5:50 p.m.)
> 
> 
> Review request for mesos, Andrew Schwartzmeyer and James Peach.
> 
> 
> Repository: mesos
> 
> 
> Description
> ---
> 
> This avoids unix's assumption that `pid_t` is a signed integer (which is
> not the case on Windows) and explicitly shows whether a pid has been
> assigned from launching.
> 
> We also changed argument name in `reaped` method to `_pid` to avoid
> shadowing.
> 
> 
> Diffs
> -
> 
>   src/launcher/executor.cpp 383655230fcdad30b652ea984cb9f5fc9c49dd38 
> 
> 
> Diff: https://reviews.apache.org/r/66481/diff/2/
> 
> 
> Testing
> ---
> 
> `make` on Linux.
> 
> 
> Thanks,
> 
> Zhitao Li
> 
>



Re: Review Request 66291: Added support to `max_completion_time` in default executor.

2018-04-07 Thread Zhitao Li

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/66291/
---

(Updated April 7, 2018, 5:53 p.m.)


Review request for mesos, Jason Lai and James Peach.


Changes
---

Rename and proper state.


Summary (updated)
-

Added support to `max_completion_time` in default executor.


Bugs: MESOS-8725
https://issues.apache.org/jira/browse/MESOS-8725


Repository: mesos


Description (updated)
---

When a task group has multiple tasks:
- each task can have its own `max_completion_time`, or not have one;
- if a task succeeds before its `max_completion_time`, all other tasks
will keep running;
- if a task fails, all other tasks in the same group will fail (as
before);
- if a task does not succeed before its `max_completion_time`, it will
fail with `TASK_FAILED` and reason `REASON_MAX_COMPLETION_TIME_REACHED`,
while other tasks will be killed without above reason.


Diffs (updated)
-

  src/launcher/default_executor.cpp ea0d425bb60e970f209f411632e1d486c279259b 


Diff: https://reviews.apache.org/r/66291/diff/2/

Changes: https://reviews.apache.org/r/66291/diff/1-2/


Testing
---


Thanks,

Zhitao Li



Re: Review Request 66284: Tested `max_completion_time` support in docker executor.

2018-04-07 Thread Zhitao Li

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/66284/
---

(Updated April 7, 2018, 5:53 p.m.)


Review request for mesos, Jason Lai and James Peach.


Changes
---

Updated test according to rename.


Summary (updated)
-

Tested `max_completion_time` support in docker executor.


Bugs: MESOS-8725
https://issues.apache.org/jira/browse/MESOS-8725


Repository: mesos


Description (updated)
---

Tested `max_completion_time` support in docker executor.


Diffs (updated)
-

  src/tests/containerizer/docker_containerizer_tests.cpp 
c198ecbac50ef4eb7ea243bebc17b65db7ce5cb7 


Diff: https://reviews.apache.org/r/66284/diff/2/

Changes: https://reviews.apache.org/r/66284/diff/1-2/


Testing
---


Thanks,

Zhitao Li



Re: Review Request 66283: Added support of `max_completion_time` in docker executor.

2018-04-07 Thread Zhitao Li

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/66283/
---

(Updated April 7, 2018, 5:52 p.m.)


Review request for mesos, Jason Lai and James Peach.


Changes
---

Rename and proper state.


Summary (updated)
-

Added support of `max_completion_time` in docker executor.


Bugs: MESOS-8725
https://issues.apache.org/jira/browse/MESOS-8725


Repository: mesos


Description (updated)
---

If `TaskInfo.max_completion_time` is set, docker executor will kill
the task immediately. We reuse the `shutdown` method to achieve a
forced kill ignoring any `KillPolicy`.

Framework should only received a `TASK_FAILED` state with
`REASON_MAX_COMPLETION_TIME_REACHED` reason.


Diffs (updated)
-

  src/docker/executor.cpp 1d672112bae889cc9d19343a59e0ff66569785c4 


Diff: https://reviews.apache.org/r/66283/diff/2/

Changes: https://reviews.apache.org/r/66283/diff/1-2/


Testing
---


Thanks,

Zhitao Li



Re: Review Request 66293: Tested default executor support of `max_completion_time`.

2018-04-07 Thread Zhitao Li

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/66293/
---

(Updated April 7, 2018, 5:54 p.m.)


Review request for mesos, Jason Lai and James Peach.


Changes
---

Rename and updated test.


Summary (updated)
-

Tested default executor support of `max_completion_time`.


Bugs: MESOS-8725
https://issues.apache.org/jira/browse/MESOS-8725


Repository: mesos


Description (updated)
---

Tested default executor support of `max_completion_time`.


Diffs (updated)
-

  src/tests/default_executor_tests.cpp 293dd20d882447401572835bd31e197faf76861b 


Diff: https://reviews.apache.org/r/66293/diff/2/

Changes: https://reviews.apache.org/r/66293/diff/1-2/


Testing
---


Thanks,

Zhitao Li



Re: Review Request 66260: Tested `max_completion_time` support in command executor.

2018-04-07 Thread Zhitao Li

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/66260/
---

(Updated April 7, 2018, 5:52 p.m.)


Review request for mesos, Jason Lai and James Peach.


Changes
---

Fix test accordingly.


Summary (updated)
-

Tested `max_completion_time` support in command executor.


Bugs: MESOS-8725
https://issues.apache.org/jira/browse/MESOS-8725


Repository: mesos


Description (updated)
---

Tested `max_completion_time` support in command executor.


Diffs (updated)
-

  src/tests/command_executor_tests.cpp 3c5687f918deb8cdda8a97abb0fbd8d9b089926a 


Diff: https://reviews.apache.org/r/66260/diff/2/

Changes: https://reviews.apache.org/r/66260/diff/1-2/


Testing
---


Thanks,

Zhitao Li



Re: Review Request 66568: Dropped combined operations with GROW and SHRINK volumes.

2018-04-17 Thread Zhitao Li

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/66568/
---

(Updated April 17, 2018, 10 a.m.)


Review request for mesos, Chun-Hung Hsiao and Greg Mann.


Changes
---

Drop all operations if combined with GROW/SHRINK.


Summary (updated)
-

Dropped combined operations with GROW and SHRINK volumes.


Bugs: MESOS-4965
https://issues.apache.org/jira/browse/MESOS-4965


Repository: mesos


Description (updated)
---

These two operations are intended to be non-speculative eventually but
are implemented as speculative right now. To avoid frameworks opt-in to
dangerous behavior, we require that accept can only contain one
`GROW_VOLUME` or `SHRINK_VOLUME` and no other operations.

This is implemented by reuse existing code which already drops `LAUNCH`
or `LAUNCH_GROUP` with proper reason and message.


Diffs (updated)
-

  src/master/master.cpp 767ad8cfe142b47ef07172bcb2a4fb49fc3e833a 


Diff: https://reviews.apache.org/r/66568/diff/2/

Changes: https://reviews.apache.org/r/66568/diff/1-2/


Testing
---


Thanks,

Zhitao Li



Re: Review Request 66050: Implemented grow and shrink of persistent volumes.

2018-04-17 Thread Zhitao Li


> On April 12, 2018, 5:33 p.m., Greg Mann wrote:
> > src/master/master.cpp
> > Lines 4986-4994 (patched)
> > <https://reviews.apache.org/r/66050/diff/8/?file=1994610#file1994610line4986>
> >
> > What will happen if these operations are sent to a 1.5 agent which has 
> > the RESOURCE_PROVIDER capability, but does not support the operation?
> 
> Zhitao Li wrote:
> Good catch.
> 
> For a 1.5 agent, I think it will crash at this line: 
> https://github.com/apache/mesos/blob/1.5.0/src/common/protobuf_utils.cpp#L851 
> because it cannot know whether this unknown operation is speculative. I guess 
> that triggers the agent to reregister with master without changing its 
> checkpointed resources.
> 
> In fact, this is much better than attempting to apply, because agent 
> would do a delete followed by recreate and cause data loss on the persistent 
> volume (the fix is in r/66218).
> 
> It seems like agent version was never present in `AgentInfo` so master 
> cannot really know this is an old agent version. Do you want to consider 
> adding that?
> 
> Greg Mann wrote:
> Ah, it looks like the version is included in the 'RegisterSlaveMessage' 
> and 'ReregisterSlaveMessage', and the master places it in the 'Slave' struct: 
> https://github.com/apache/mesos/blob/bd688e4cf9f6f35c9d75aad50b99e1fdeb8104da/src/master/master.cpp#L6586
> 
> Perhaps we should check that the version is >= 1.6.0 rather than checking 
> for the RESOURCE_PROVIDER capability?
> 
> Chun-Hung Hsiao wrote:
> In proto2, since the enum is not defined in 1.5 proto, the enum field 
> will be treated as an unknown field.
> See: https://developers.google.com/protocol-buffers/docs/proto#enum

I'm inclined not to add a version check anymore, but just mention that this is 
not safe to use if operator is using 1.5.x agent with *experimental* 
`RESOURCE_PROVIDER` feature.


- Zhitao


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/66050/#review201055
---


On April 17, 2018, 9:58 a.m., Zhitao Li wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/66050/
> ---
> 
> (Updated April 17, 2018, 9:58 a.m.)
> 
> 
> Review request for mesos, Chun-Hung Hsiao, Gaston Kleiman, and Greg Mann.
> 
> 
> Bugs: MESOS-4965
> https://issues.apache.org/jira/browse/MESOS-4965
> 
> 
> Repository: mesos
> 
> 
> Description
> ---
> 
> The new offer operations are implemented as speculative operations, but
> we will use validation to make them non-speculative on API level so that
> we can transition later without a breaking change.
> 
> 
> Diffs
> -
> 
>   src/common/protobuf_utils.cpp 141a444534b776a2c90e2a0daf9727cd21e39080 
>   src/common/resources_utils.cpp 9be01c1abd48264e308960f35cc7c2ee8a367518 
>   src/master/master.cpp 767ad8cfe142b47ef07172bcb2a4fb49fc3e833a 
>   src/master/validation.hpp 7c129ceb929596acbb64d37025e055661277e6bf 
>   src/master/validation.cpp ac2e1bb8771841ec59b3bdcdeffb6c6230680d4d 
> 
> 
> Diff: https://reviews.apache.org/r/66050/diff/11/
> 
> 
> Testing
> ---
> 
> 
> Thanks,
> 
> Zhitao Li
> 
>



Re: Review Request 66050: Implemented grow and shrink of persistent volumes.

2018-04-17 Thread Zhitao Li

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/66050/
---

(Updated April 17, 2018, 9:58 a.m.)


Review request for mesos, Chun-Hung Hsiao, Gaston Kleiman, and Greg Mann.


Changes
---

Review comments.


Bugs: MESOS-4965
https://issues.apache.org/jira/browse/MESOS-4965


Repository: mesos


Description
---

The new offer operations are implemented as speculative operations, but
we will use validation to make them non-speculative on API level so that
we can transition later without a breaking change.


Diffs (updated)
-

  src/common/protobuf_utils.cpp 141a444534b776a2c90e2a0daf9727cd21e39080 
  src/common/resources_utils.cpp 9be01c1abd48264e308960f35cc7c2ee8a367518 
  src/master/master.cpp 767ad8cfe142b47ef07172bcb2a4fb49fc3e833a 
  src/master/validation.hpp 7c129ceb929596acbb64d37025e055661277e6bf 
  src/master/validation.cpp ac2e1bb8771841ec59b3bdcdeffb6c6230680d4d 


Diff: https://reviews.apache.org/r/66050/diff/11/

Changes: https://reviews.apache.org/r/66050/diff/10-11/


Testing
---


Thanks,

Zhitao Li



Re: Review Request 66049: Added offer operation to grow and shrink persistent volumes.

2018-04-20 Thread Zhitao Li

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/66049/
---

(Updated April 20, 2018, 11:38 a.m.)


Review request for mesos, Chun-Hung Hsiao, Gaston Kleiman, and Greg Mann.


Changes
---

Rebase.


Bugs: MESOS-4965
https://issues.apache.org/jira/browse/MESOS-4965


Repository: mesos


Description
---

Added offer operation to grow and shrink persistent volumes.


Diffs (updated)
-

  include/mesos/mesos.proto 9e24d3ea46edc21572e9226e2e76c7d55618db24 
  include/mesos/v1/mesos.proto 0f3fd8a2608b5edabc21f5fe5df9b70fc0fa8dc2 
  src/common/protobuf_utils.cpp 141a444534b776a2c90e2a0daf9727cd21e39080 
  src/common/resources_utils.cpp 9be01c1abd48264e308960f35cc7c2ee8a367518 
  src/master/master.cpp 767ad8cfe142b47ef07172bcb2a4fb49fc3e833a 
  src/tests/mesos.hpp 7356523a2a37858bbe866527425f35502ff5ad82 
  src/tests/persistent_volume_tests.cpp 
4edf781711d9efdb994114aeb6289b6af750b87a 
  src/tests/reservation_tests.cpp 5570df2e0b208512d0a0a3079a180e1acfe08f3d 


Diff: https://reviews.apache.org/r/66049/diff/10/

Changes: https://reviews.apache.org/r/66049/diff/9-10/


Testing
---


Thanks,

Zhitao Li



Review Request 66733: Added a new `RESIZE_VOLUME` agent capability.

2018-04-20 Thread Zhitao Li

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/66733/
---

Review request for mesos, Chun-Hung Hsiao, Gaston Kleiman, and Greg Mann.


Bugs: MESOS-4945
https://issues.apache.org/jira/browse/MESOS-4945


Repository: mesos


Description
---

This will be used as a feature flag to gate the new volume resize
feature. This feature will be turn on by default once released.


Diffs
-

  include/mesos/mesos.proto 9e24d3ea46edc21572e9226e2e76c7d55618db24 
  include/mesos/v1/mesos.proto 0f3fd8a2608b5edabc21f5fe5df9b70fc0fa8dc2 
  src/common/protobuf_utils.hpp ae060f3a7946ac5862faeca15330e9642f934137 
  src/common/protobuf_utils.cpp 141a444534b776a2c90e2a0daf9727cd21e39080 
  src/slave/constants.cpp 51de71bc45ff4fb862690efd9428b25d92055391 
  src/tests/master_tests.cpp be7a3cc6126d2fe7c56c49b3da5f6f4bf29657f5 
  src/tests/slave_tests.cpp 04f7acad47aa980d507bbe29b29db5eb050c02f3 


Diff: https://reviews.apache.org/r/66733/diff/1/


Testing
---


Thanks,

Zhitao Li



Re: Review Request 66050: Implemented grow and shrink of persistent volumes.

2018-04-20 Thread Zhitao Li

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/66050/
---

(Updated April 20, 2018, 11:40 a.m.)


Review request for mesos, Chun-Hung Hsiao, Gaston Kleiman, and Greg Mann.


Changes
---

Rebase and check resize volume capability.


Bugs: MESOS-4965
https://issues.apache.org/jira/browse/MESOS-4965


Repository: mesos


Description
---

The new offer operations are implemented as speculative operations, but
we will use validation to make them non-speculative on API level so that
we can transition later without a breaking change.


Diffs (updated)
-

  src/common/protobuf_utils.cpp 141a444534b776a2c90e2a0daf9727cd21e39080 
  src/common/resources_utils.cpp 9be01c1abd48264e308960f35cc7c2ee8a367518 
  src/master/master.cpp 767ad8cfe142b47ef07172bcb2a4fb49fc3e833a 
  src/master/validation.hpp 7c129ceb929596acbb64d37025e055661277e6bf 
  src/master/validation.cpp ac2e1bb8771841ec59b3bdcdeffb6c6230680d4d 


Diff: https://reviews.apache.org/r/66050/diff/12/

Changes: https://reviews.apache.org/r/66050/diff/11-12/


Testing
---


Thanks,

Zhitao Li



Re: Review Request 66051: Implemented operator API to grow and shrink persistent volume.

2018-04-20 Thread Zhitao Li

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/66051/
---

(Updated April 20, 2018, 4:18 p.m.)


Review request for mesos, Chun-Hung Hsiao, Gaston Kleiman, and Greg Mann.


Changes
---

Renamed Authorization.


Bugs: MESOS-8747
https://issues.apache.org/jira/browse/MESOS-8747


Repository: mesos


Description
---

These operator APIs is implemented as speculative for now, but
we plan to convert them to non-speculative in the future.


Diffs (updated)
-

  src/master/http.cpp 34c9023906eca94965acc994f20e888c1f47b962 
  src/master/master.hpp 0d9620dd0c232dc1df83477e838eeb7313bf8828 
  src/master/validation.cpp ac2e1bb8771841ec59b3bdcdeffb6c6230680d4d 


Diff: https://reviews.apache.org/r/66051/diff/13/

Changes: https://reviews.apache.org/r/66051/diff/12-13/


Testing
---


Thanks,

Zhitao Li



Re: Review Request 66568: Dropped combined operations with GROW and SHRINK volumes.

2018-04-20 Thread Zhitao Li

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/66568/
---

(Updated April 20, 2018, 4:15 p.m.)


Review request for mesos, Chun-Hung Hsiao and Greg Mann.


Changes
---

Review comments.


Bugs: MESOS-4965
https://issues.apache.org/jira/browse/MESOS-4965


Repository: mesos


Description
---

These two operations are intended to be non-speculative eventually but
are implemented as speculative right now. To avoid frameworks opt-in to
dangerous behavior, we require that accept can only contain one
`GROW_VOLUME` or `SHRINK_VOLUME` and no other operations.

This is implemented by reuse existing code which already drops `LAUNCH`
or `LAUNCH_GROUP` with proper reason and message.


Diffs (updated)
-

  src/master/master.cpp 767ad8cfe142b47ef07172bcb2a4fb49fc3e833a 


Diff: https://reviews.apache.org/r/66568/diff/4/

Changes: https://reviews.apache.org/r/66568/diff/3-4/


Testing
---

Behavior tested in https://reviews.apache.org/r/66569/.


Thanks,

Zhitao Li



Re: Review Request 66532: Added test for authorization actions for `RESIZE_VOLUME`.

2018-04-20 Thread Zhitao Li

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/66532/
---

(Updated April 20, 2018, 4:17 p.m.)


Review request for mesos, Chun-Hung Hsiao and Greg Mann.


Changes
---

Renamed to `ResizeVolume`.


Summary (updated)
-

Added test for authorization actions for `RESIZE_VOLUME`.


Bugs: MESOS-8748
https://issues.apache.org/jira/browse/MESOS-8748


Repository: mesos


Description (updated)
---

Added test for authorization actions for `RESIZE_VOLUME`.


Diffs (updated)
-

  src/tests/authorization_tests.cpp a76ad18a54ec232e6d6c92ec6a45b445a83f174c 


Diff: https://reviews.apache.org/r/66532/diff/4/

Changes: https://reviews.apache.org/r/66532/diff/3-4/


Testing
---


Thanks,

Zhitao Li



Re: Review Request 66531: Added new authorization for `ResizeVolume`.

2018-04-20 Thread Zhitao Li

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/66531/
---

(Updated April 20, 2018, 4:16 p.m.)


Review request for mesos, Chun-Hung Hsiao and Greg Mann.


Changes
---

Rename to `ResizeVolume`.


Summary (updated)
-

Added new authorization for `ResizeVolume`.


Bugs: MESOS-8748
https://issues.apache.org/jira/browse/MESOS-8748


Repository: mesos


Description
---

The new authorization action is modelled after `CreateVolume`, and will
be shared by both `GROW_VOLUME` and `SHRINK_VOLUME` operations and
corresponding operator APIs.


Diffs (updated)
-

  include/mesos/authorizer/acls.proto 8ef33210644e7d2924b402a3158b1b38c1fdb464 
  include/mesos/authorizer/authorizer.proto 
1508c01130013d74ed2b2574a2428f38e3d2c064 
  src/authorizer/local/authorizer.cpp c098ba9ded1b29a2e079cf09ab80b61f6fa4af05 
  src/master/master.hpp 0d9620dd0c232dc1df83477e838eeb7313bf8828 
  src/master/master.cpp 767ad8cfe142b47ef07172bcb2a4fb49fc3e833a 


Diff: https://reviews.apache.org/r/66531/diff/3/

Changes: https://reviews.apache.org/r/66531/diff/2-3/


Testing
---

Manually created ACL and verified that untrusted principals will not allow to 
grow/shrink volumes.
Also created unit test in next patch.


Thanks,

Zhitao Li



Re: Review Request 66050: Implemented grow and shrink of persistent volumes.

2018-04-16 Thread Zhitao Li

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/66050/
---

(Updated April 16, 2018, 10:09 a.m.)


Review request for mesos, Chun-Hung Hsiao, Gaston Kleiman, and Greg Mann.


Changes
---

Style review comments.


Bugs: MESOS-4965
https://issues.apache.org/jira/browse/MESOS-4965


Repository: mesos


Description
---

The new offer operations are implemented as speculative operations, but
we will use validation to make them non-speculative on API level so that
we can transition later without a breaking change.


Diffs (updated)
-

  src/common/protobuf_utils.cpp 141a444534b776a2c90e2a0daf9727cd21e39080 
  src/common/resources_utils.cpp 9be01c1abd48264e308960f35cc7c2ee8a367518 
  src/master/master.cpp 767ad8cfe142b47ef07172bcb2a4fb49fc3e833a 
  src/master/validation.hpp 7c129ceb929596acbb64d37025e055661277e6bf 
  src/master/validation.cpp ac2e1bb8771841ec59b3bdcdeffb6c6230680d4d 


Diff: https://reviews.apache.org/r/66050/diff/10/

Changes: https://reviews.apache.org/r/66050/diff/9-10/


Testing
---


Thanks,

Zhitao Li



Re: Review Request 66220: Added test for `GROW_VOLUME` and `SHRINK_VOLUME` operations.

2018-04-16 Thread Zhitao Li

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/66220/#review201235
---



Please consider unreplied comments as "will do".


src/tests/persistent_volume_tests.cpp
Lines 453 (patched)
<https://reviews.apache.org/r/66220/#comment282370>

Is the goal of splitting the test to have more focused tests? Otherwise, 
just taking out the task launching part from the code seems resulting in 
minimum amount of code.

The file check actually verifies the fix for r/66218, which I think is 
something pretty important (not losing data during a grow/shrink). Only 
checking path exists for the volume would not capture that behavior.



src/tests/persistent_volume_tests.cpp
Lines 455-459 (patched)
<https://reviews.apache.org/r/66220/#comment282372>

I think we discussed about not allowing this but I forgot add this to 
validation. I'll update parent patch to include that validation.

If we will drop any `GROW`/`SHRINK` operation, then we can verify that 
`GROW` does not trigger any `ApplyOperationMessage` and framework will be 
offered original volume as is and keep this well tested.



src/tests/persistent_volume_tests.cpp
Lines 492 (patched)
<https://reviews.apache.org/r/66220/#comment282373>

I do not see a possibility. Let me try to drop it and see what happens.



src/tests/persistent_volume_tests.cpp
Lines 541-542 (patched)
<https://reviews.apache.org/r/66220/#comment282374>

Let's capture this message and test its content to make sure proper 
operation is forwarded.



src/tests/persistent_volume_tests.cpp
Lines 597-603 (patched)
<https://reviews.apache.org/r/66220/#comment282376>

This was necessary if the operation is implemented as non-speculative, 
because master/allocator only sends out `converted` resources after receiving 
the feedback from agent. If we don't ensure this is processed, the offer could 
be subject to race condition (I think).

Now that we changed the course, this may not be necessary in 1.6, but we'd 
need them again once we get to MESOS-8791.

Please let me know if you think keeping it is still a bad idea.



src/tests/persistent_volume_tests.cpp
Lines 677-678 (patched)
<https://reviews.apache.org/r/66220/#comment282378>

Resources::contains only matches a volume if their size perfectly matches, 
so I think it is sufficient.


- Zhitao Li


On April 11, 2018, 2:19 p.m., Zhitao Li wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/66220/
> ---
> 
> (Updated April 11, 2018, 2:19 p.m.)
> 
> 
> Review request for mesos, Chun-Hung Hsiao, Gaston Kleiman, and Greg Mann.
> 
> 
> Bugs: MESOS-4965
> https://issues.apache.org/jira/browse/MESOS-4965
> 
> 
> Repository: mesos
> 
> 
> Description
> ---
> 
> Added test for `GROW_VOLUME` and `SHRINK_VOLUME` operations.
> 
> 
> Diffs
> -
> 
>   src/tests/persistent_volume_tests.cpp 
> 4edf781711d9efdb994114aeb6289b6af750b87a 
> 
> 
> Diff: https://reviews.apache.org/r/66220/diff/4/
> 
> 
> Testing
> ---
> 
> 
> Thanks,
> 
> Zhitao Li
> 
>



Re: Review Request 66050: Implemented grow and shrink of persistent volumes.

2018-04-16 Thread Zhitao Li

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/66050/
---

(Updated April 16, 2018, 9:42 a.m.)


Review request for mesos, Chun-Hung Hsiao, Gaston Kleiman, and Greg Mann.


Changes
---

Rebase after dependent patch change.


Bugs: MESOS-4965
https://issues.apache.org/jira/browse/MESOS-4965


Repository: mesos


Description
---

The new offer operations are implemented as speculative operations, but
we will use validation to make them non-speculative on API level so that
we can transition later without a breaking change.


Diffs (updated)
-

  src/common/protobuf_utils.cpp 141a444534b776a2c90e2a0daf9727cd21e39080 
  src/common/resources_utils.cpp 9be01c1abd48264e308960f35cc7c2ee8a367518 
  src/master/master.cpp 767ad8cfe142b47ef07172bcb2a4fb49fc3e833a 
  src/master/validation.hpp 7c129ceb929596acbb64d37025e055661277e6bf 
  src/master/validation.cpp ac2e1bb8771841ec59b3bdcdeffb6c6230680d4d 


Diff: https://reviews.apache.org/r/66050/diff/9/

Changes: https://reviews.apache.org/r/66050/diff/8-9/


Testing
---


Thanks,

Zhitao Li



Re: Review Request 66050: Implemented grow and shrink of persistent volumes.

2018-04-16 Thread Zhitao Li


> On April 12, 2018, 5:33 p.m., Greg Mann wrote:
> >

Will do for any comment not explicitly replied.


> On April 12, 2018, 5:33 p.m., Greg Mann wrote:
> > src/master/master.cpp
> > Lines 4986-4994 (patched)
> > <https://reviews.apache.org/r/66050/diff/8/?file=1994610#file1994610line4986>
> >
> > What will happen if these operations are sent to a 1.5 agent which has 
> > the RESOURCE_PROVIDER capability, but does not support the operation?

Good catch.

For a 1.5 agent, I think it will crash at this line: 
https://github.com/apache/mesos/blob/1.5.0/src/common/protobuf_utils.cpp#L851 
because it cannot know whether this unknown operation is speculative. I guess 
that triggers the agent to reregister with master without changing its 
checkpointed resources.

In fact, this is much better than attempting to apply, because agent would do a 
delete followed by recreate and cause data loss on the persistent volume (the 
fix is in r/66218).

It seems like agent version was never present in `AgentInfo` so master cannot 
really know this is an old agent version. Do you want to consider adding that?


- Zhitao


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/66050/#review201055
-------


On April 16, 2018, 9:42 a.m., Zhitao Li wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/66050/
> ---
> 
> (Updated April 16, 2018, 9:42 a.m.)
> 
> 
> Review request for mesos, Chun-Hung Hsiao, Gaston Kleiman, and Greg Mann.
> 
> 
> Bugs: MESOS-4965
> https://issues.apache.org/jira/browse/MESOS-4965
> 
> 
> Repository: mesos
> 
> 
> Description
> ---
> 
> The new offer operations are implemented as speculative operations, but
> we will use validation to make them non-speculative on API level so that
> we can transition later without a breaking change.
> 
> 
> Diffs
> -
> 
>   src/common/protobuf_utils.cpp 141a444534b776a2c90e2a0daf9727cd21e39080 
>   src/common/resources_utils.cpp 9be01c1abd48264e308960f35cc7c2ee8a367518 
>   src/master/master.cpp 767ad8cfe142b47ef07172bcb2a4fb49fc3e833a 
>   src/master/validation.hpp 7c129ceb929596acbb64d37025e055661277e6bf 
>   src/master/validation.cpp ac2e1bb8771841ec59b3bdcdeffb6c6230680d4d 
> 
> 
> Diff: https://reviews.apache.org/r/66050/diff/9/
> 
> 
> Testing
> ---
> 
> 
> Thanks,
> 
> Zhitao Li
> 
>



Re: Review Request 66569: Added a test to verify that grow and shrink cannot be combined.

2018-04-19 Thread Zhitao Li


> On April 16, 2018, 3:05 p.m., Chun-Hung Hsiao wrote:
> >

Thanks for the review. I'll combine this patch into r/66220 and address your 
comment in that process.


> On April 16, 2018, 3:05 p.m., Chun-Hung Hsiao wrote:
> > src/tests/persistent_volume_tests.cpp
> > Lines 691-695 (patched)
> > <https://reviews.apache.org/r/66569/diff/1/?file=1996763#file1996763line691>
> >
> > Since this test just verifies that the grow and shrink operations 
> > cannot be combined with other operations, I was wondering if there's a way 
> > to check this without worry about the different behaviors produced by PATH 
> > or MOUNT.
> > 
> > Alternatively, if you take my suggestion to create another 
> > `PersistentVolumeResizeTest` fixture in the previous patch, we can move 
> > this test into that fixture.

Actually the check happens before type of `MOUNT` so this conditional check can 
be safely removed.


> On April 16, 2018, 3:05 p.m., Chun-Hung Hsiao wrote:
> > src/tests/persistent_volume_tests.cpp
> > Lines 775-781 (patched)
> > <https://reviews.apache.org/r/66569/diff/1/?file=1996763#file1996763line775>
> >
> > As discussed, I'm not sure if this is the semantics we want to enforce. 
> > If we end up rejecting the whole `ACCEPT` call, then we could simplify the 
> > test by:
> > 
> > 1. Starting with a persistent volume, some free disk and some CPU.
> > 2. Open receiving an offer, accept it with reserving the CPU and 
> > growing the persistent volume. Different type of resources are used here to 
> > make sure it's not just for preventing pipelining.
> > 3. Verify that the `ACCEPT` call fails by checking that the next offer 
> > contains the original resources.

Semantic already fixed in the other patch. I will fix the test to reflect your 
suggested semantic.


- Zhitao


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/66569/#review201179
---


On April 11, 2018, 2:19 p.m., Zhitao Li wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/66569/
> ---
> 
> (Updated April 11, 2018, 2:19 p.m.)
> 
> 
> Review request for mesos, Chun-Hung Hsiao and Greg Mann.
> 
> 
> Bugs: MESOS-4965
> https://issues.apache.org/jira/browse/MESOS-4965
> 
> 
> Repository: mesos
> 
> 
> Description
> ---
> 
> Added a test to verify that grow and shrink cannot be combined.
> 
> 
> Diffs
> -
> 
>   src/tests/persistent_volume_tests.cpp 
> 4edf781711d9efdb994114aeb6289b6af750b87a 
> 
> 
> Diff: https://reviews.apache.org/r/66569/diff/2/
> 
> 
> Testing
> ---
> 
> 
> Thanks,
> 
> Zhitao Li
> 
>



Re: Review Request 66531: Added new authorization for `UpdateVolume`.

2018-04-19 Thread Zhitao Li


> On April 17, 2018, 4:14 p.m., Greg Mann wrote:
> > src/master/master.cpp
> > Lines 4924-4925 (original), 4980-4981 (patched)
> > <https://reviews.apache.org/r/66531/diff/1/?file=1995228#file1995228line4980>
> >
> > You need to finish implementing authorization here, and below for 
> > SHRINK_VOLUME.

Will fix, but I have question/comments on operation authorization handling in 
`Master::_accept`:
1. We don't really check that `authorizations.size() == operations.size()`. My 
feeling is that this at least leaves space for error when devs adding new 
operations;
2. we allow part of the operations which are authorized to proceed while drop 
unauthorized operations. I'm not convinced it's either safe from security 
perspective or clear from user perspective;
3. I see some comments about `We may want to retry this failed authorization 
request rather than dropping it immediately.`. If authorization result is 
"Unavaiable" rather than "Rejected", should that be reflected from result type?


- Zhitao


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/66531/#review201357
-------


On April 10, 2018, 12:24 p.m., Zhitao Li wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/66531/
> ---
> 
> (Updated April 10, 2018, 12:24 p.m.)
> 
> 
> Review request for mesos, Chun-Hung Hsiao and Greg Mann.
> 
> 
> Bugs: MESOS-8748
> https://issues.apache.org/jira/browse/MESOS-8748
> 
> 
> Repository: mesos
> 
> 
> Description
> ---
> 
> The new authorization action is modelled after `CreateVolume`, and will
> be shared by both `GROW_VOLUME` and `SHRINK_VOLUME` operations and
> corresponding operator APIs.
> 
> 
> Diffs
> -
> 
>   include/mesos/authorizer/acls.proto 
> 8ef33210644e7d2924b402a3158b1b38c1fdb464 
>   include/mesos/authorizer/authorizer.proto 
> 1508c01130013d74ed2b2574a2428f38e3d2c064 
>   src/authorizer/local/authorizer.cpp 
> c098ba9ded1b29a2e079cf09ab80b61f6fa4af05 
>   src/master/master.hpp 0d9620dd0c232dc1df83477e838eeb7313bf8828 
>   src/master/master.cpp f7da675e8fe96159e5335c9e83b65b67ed90eda8 
> 
> 
> Diff: https://reviews.apache.org/r/66531/diff/1/
> 
> 
> Testing
> ---
> 
> Manually created ACL and verified that untrusted principals will not allow to 
> grow/shrink volumes.
> Also created unit test in next patch.
> 
> 
> Thanks,
> 
> Zhitao Li
> 
>



<    3   4   5   6   7   8   9   10   >