Re: Review Request 69669: Notified frameworks when operations are marked as unreachable.

2019-01-14 Thread Greg Mann

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/69669/#review211978
---


Ship it!

- Greg Mann


On Jan. 11, 2019, 2:24 p.m., Benno Evers wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/69669/
> ---
> 
> (Updated Jan. 11, 2019, 2:24 p.m.)
> 
> 
> Review request for mesos, Benjamin Bannier, Gastón Kleiman, Greg Mann, and 
> Joseph Wu.
> 
> 
> Bugs: MESOS-8783
> https://issues.apache.org/jira/browse/MESOS-8783
> 
> 
> Repository: mesos
> 
> 
> Description
> ---
> 
> When an agent is marked as unreachable after missing
> the reregistration timeout, all operations on that agent
> are implicitly transitioned to status `OPERATION_UNREACHABLE`.
> 
> This commit adds an explicit notification for this transition
> to frameworks that opted in to operation feedback.
> 
> 
> Diffs
> -
> 
>   src/master/master.hpp 8de73124dbfb81d6edd0d1d5193adc21756f3fad 
>   src/master/master.cpp 9624832d017dae717da803ca2ff2f8fc5135ea9d 
>   src/tests/api_tests.cpp c597243e2e210e83a4ab7441fbcfa3198b43d849 
> 
> 
> Diff: https://reviews.apache.org/r/69669/diff/3/
> 
> 
> Testing
> ---
> 
> Internal CI run.
> 
> `./src/mesos-tests --gtest_filter="*OperationUpdatesUponUnreachable*" 
> --verbose --gtest_repeat=5000`
> 
> 
> Thanks,
> 
> Benno Evers
> 
>



Re: Review Request 69669: Notified frameworks when operations are marked as unreachable.

2019-01-11 Thread Mesos Reviewbot Windows

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/69669/#review211889
---



PASS: Mesos patch 69669 was successfully built and tested.

Reviews applied: `['69669']`

All the build artifacts available at: 
http://dcos-win.westus2.cloudapp.azure.com/artifacts/mesos-reviewbot-testing/2761/mesos-review-69669

- Mesos Reviewbot Windows


On Jan. 11, 2019, 2:24 p.m., Benno Evers wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/69669/
> ---
> 
> (Updated Jan. 11, 2019, 2:24 p.m.)
> 
> 
> Review request for mesos, Benjamin Bannier, Gastón Kleiman, Greg Mann, and 
> Joseph Wu.
> 
> 
> Bugs: MESOS-8783
> https://issues.apache.org/jira/browse/MESOS-8783
> 
> 
> Repository: mesos
> 
> 
> Description
> ---
> 
> When an agent is marked as unreachable after missing
> the reregistration timeout, all operations on that agent
> are implicitly transitioned to status `OPERATION_UNREACHABLE`.
> 
> This commit adds an explicit notification for this transition
> to frameworks that opted in to operation feedback.
> 
> 
> Diffs
> -
> 
>   src/master/master.hpp 8de73124dbfb81d6edd0d1d5193adc21756f3fad 
>   src/master/master.cpp 9624832d017dae717da803ca2ff2f8fc5135ea9d 
>   src/tests/api_tests.cpp c597243e2e210e83a4ab7441fbcfa3198b43d849 
> 
> 
> Diff: https://reviews.apache.org/r/69669/diff/2/
> 
> 
> Testing
> ---
> 
> Internal CI run.
> 
> `./src/mesos-tests --gtest_filter="*OperationUpdatesUponUnreachable*" 
> --verbose --gtest_repeat=5000`
> 
> 
> Thanks,
> 
> Benno Evers
> 
>



Re: Review Request 69669: Notified frameworks when operations are marked as unreachable.

2019-01-11 Thread Benno Evers

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/69669/
---

(Updated Jan. 11, 2019, 2:24 p.m.)


Review request for mesos, Benjamin Bannier, Gastón Kleiman, Greg Mann, and 
Joseph Wu.


Changes
---

Address comments and rebase onto latest master.


Bugs: MESOS-8783
https://issues.apache.org/jira/browse/MESOS-8783


Repository: mesos


Description
---

When an agent is marked as unreachable after missing
the reregistration timeout, all operations on that agent
are implicitly transitioned to status `OPERATION_UNREACHABLE`.

This commit adds an explicit notification for this transition
to frameworks that opted in to operation feedback.
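
For readers skimming the thread, the behavior described above can be sketched roughly as follows. This is only an illustration under stated assumptions: `Framework`, `Operation`, `Agent`, and `markAgentUnreachable()` are hypothetical stand-ins and not the types or functions touched in `src/master/master.cpp`, and "sending" an update is modeled by appending a string to a vector.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical, simplified stand-ins; NOT the actual Mesos master types.
enum class OperationState { PENDING, UNREACHABLE };

struct Framework {
  std::string id;
  bool operationFeedback;            // Did the framework opt in to feedback?
  std::vector<std::string> updates;  // Updates "sent" to this framework.
};

struct Operation {
  std::string id;
  Framework* framework;  // Owning framework; must not be null in this sketch.
  OperationState state;
};

struct Agent {
  std::vector<Operation> operations;
};

// Transitions every operation on the agent to UNREACHABLE and notifies the
// owning framework only if it opted in to operation feedback.
// Returns the number of notifications sent.
inline int markAgentUnreachable(Agent& agent) {
  int sent = 0;
  for (Operation& operation : agent.operations) {
    assert(operation.framework != nullptr);
    operation.state = OperationState::UNREACHABLE;
    if (operation.framework->operationFeedback) {
      operation.framework->updates.push_back(
          operation.id + ": OPERATION_UNREACHABLE");
      ++sent;
    }
  }
  return sent;
}
```

In this sketch the state change happens for every operation, while the explicit update is conditional on the opt-in, which mirrors the split between the implicit transition and the new notification.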


Diffs (updated)
-

  src/master/master.hpp 8de73124dbfb81d6edd0d1d5193adc21756f3fad 
  src/master/master.cpp 9624832d017dae717da803ca2ff2f8fc5135ea9d 
  src/tests/api_tests.cpp c597243e2e210e83a4ab7441fbcfa3198b43d849 


Diff: https://reviews.apache.org/r/69669/diff/2/

Changes: https://reviews.apache.org/r/69669/diff/1-2/


Testing (updated)
---

Internal CI run.

`./src/mesos-tests --gtest_filter="*OperationUpdatesUponUnreachable*" --verbose 
--gtest_repeat=5000`


Thanks,

Benno Evers



Re: Review Request 69669: Notified frameworks when operations are marked as unreachable.

2019-01-11 Thread Benno Evers


> On Jan. 10, 2019, 8:53 p.m., Greg Mann wrote:
> > src/master/master.cpp
> > Line 8954 (original), 8982-8984 (patched)
> > 
> >
> > Nit: fits on one line.

I'm curious: do we have an (informal) guideline saying that anything that fits 
on one line should be put on one line?

I intentionally spread this out because my eyes kept skipping over this line 
while reading, which left me wondering why we only loop over 
`slave->resourceProviders` below.


> On Jan. 10, 2019, 8:53 p.m., Greg Mann wrote:
> > src/tests/api_tests.cpp
> > Lines 5127 (patched)
> > 
> >
> > Since we don't reference the contents of `slaveFlags` anywhere, you can 
> > omit this variable; `StartSlave()` will use the default argument value of 
> > `None()` and create the slave flags itself before calling 
> > `cluster::Slave::create()`.

Dropping this since we have to use the `slaveFlags` when pausing the clock in 
this test.


- Benno


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/69669/#review211836
---





Re: Review Request 69669: Notified frameworks when operations are marked as unreachable.

2019-01-10 Thread Greg Mann

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/69669/#review211836
---




src/master/master.cpp
Line 8954 (original), 8982-8984 (patched)


Nit: fits on one line.



src/tests/api_tests.cpp
Lines 5127 (patched)


Since we don't reference the contents of `slaveFlags` anywhere, you can 
omit this variable; `StartSlave()` will use the default argument value of 
`None()` and create the slave flags itself before calling 
`cluster::Slave::create()`.



src/tests/api_tests.cpp
Lines 5161-5164 (patched)


I think this variable is unused?



src/tests/api_tests.cpp
Lines 5203-5204 (patched)


Let's get rid of the parentheses:

"Try to reserve the resources managed by the resource provider, because 
currently operation feedback is only supported for that case."



src/tests/api_tests.cpp
Lines 5236-5239 (patched)


Could we just pause the clock for the whole test? It might be necessary to 
retain the `slaveFlags` variable if you do this.

We should also resume the clock at the end of the test.
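
One way to picture "pause for the whole test, resume at the end" is a scope guard. The sketch below is an assumption-laden illustration, not the test in `src/tests/api_tests.cpp`: `FakeClock` and `ScopedPause` are hypothetical names and do not model the libprocess clock beyond pause, advance, and resume.

```cpp
#include <cassert>
#include <chrono>

// Hypothetical stand-in for a pausable test clock; NOT the libprocess clock.
class FakeClock {
public:
  void pause() { paused_ = true; }
  void resume() { paused_ = false; }
  bool paused() const { return paused_; }

  // In this sketch, time only moves explicitly while the clock is paused.
  void advance(std::chrono::milliseconds delta) {
    assert(paused_);
    now_ += delta;
  }

  std::chrono::milliseconds now() const { return now_; }

private:
  bool paused_ = false;
  std::chrono::milliseconds now_{0};
};

// Pauses the clock on construction and resumes it on destruction, so the
// clock is resumed on every exit path of the enclosing scope.
class ScopedPause {
public:
  explicit ScopedPause(FakeClock& clock) : clock_(clock) { clock_.pause(); }
  ~ScopedPause() { clock_.resume(); }

  ScopedPause(const ScopedPause&) = delete;
  ScopedPause& operator=(const ScopedPause&) = delete;

private:
  FakeClock& clock_;
};
```

With the clock paused for the entire scope, every timeout has to be triggered by an explicit `advance()`, which is presumably why the flag values would still be needed in the test.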


- Greg Mann


On Jan. 4, 2019, 4:57 p.m., Benno Evers wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/69669/
> ---
> 
> (Updated Jan. 4, 2019, 4:57 p.m.)
> 
> 
> Review request for mesos, Benjamin Bannier, Gastón Kleiman, Greg Mann, and 
> Joseph Wu.
> 
> 
> Bugs: MESOS-8783
> https://issues.apache.org/jira/browse/MESOS-8783
> 
> 
> Repository: mesos
> 
> 
> Description
> ---
> 
> When an agent is marked as unreachable after missing
> the reregistration timeout, all operations on that agent
> are implicitly transitioned to status `OPERATION_UNREACHABLE`.
> 
> This commit adds an explicit notification for this transition
> to frameworks that opted in to operation feedback.
> 
> 
> Diffs
> -
> 
>   src/master/master.hpp 99549ab857b16d722f0dd991f98dbe54e9ed19a1 
>   src/master/master.cpp b4faf2b077a0288ba36195b7a21402932489d316 
>   src/tests/api_tests.cpp b6064cd749e42e45c2b471c71e9769a41b59f726 
> 
> 
> Diff: https://reviews.apache.org/r/69669/diff/1/
> 
> 
> Testing
> ---
> 
> Internal CI run.
> 
> 
> Thanks,
> 
> Benno Evers
> 
>



Re: Review Request 69669: Notified frameworks when operations are marked as unreachable.

2019-01-09 Thread Mesos Reviewbot Windows

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/69669/#review211789
---



FAIL: Some of the unit tests failed. Please check the relevant logs.

Reviews applied: `['69575', '69597', '69669']`

Failed command: `Start-MesosCITesting`

All the build artifacts available at: 
http://dcos-win.westus2.cloudapp.azure.com/artifacts/mesos-reviewbot-testing/2728/mesos-review-69669

Relevant logs:

- [mesos-tests.log](http://dcos-win.westus2.cloudapp.azure.com/artifacts/mesos-reviewbot-testing/2728/mesos-review-69669/logs/mesos-tests.log):

```
I0109 18:58:57.001946  5572 master.cpp:1271] Agent 
b4e6f5db-e8a0-4bbc-bad7-f7400b2e66b5-S0 at slave(468)@192.10.1.4:52219 
(windows-01.chtsmhjxogyevckjfayqqcnjda.xx.internal.cloudapp.net) disconnected
I0109 18:58:57.001946  5572 master.cpp:3274] Disconnecting agent 
b4e6f5db-e8a0-4bbc-bad7-f7400b2e66b5-S0 at slave(468)@192.10.1.4:52219 
(windows-01.chtsmhjxogyevckjfayqqcnjda.xx.internal.cloudapp.net)
I0109 18:58:57.002950  5572 master.cpp:3293] Deactivating agent 
b4e6f5db-e8a0-4bbc-bad7-f7400b2e66b5-S0 at slave(468)@192.10.1.4:52219 
(windows-01.chtsmhjxogyevckjfayqqcnjda.xx.internal.cloudapp.net)
I0109 18:58:57.002950  6192 hierarchical.cpp:358] Removed framework 
b4e6f5db-e8a0-4bbc-bad7-f7400b2e66b5-
I0109 18:58:57.003962  6192 hierarchical.cpp:802] Agent 
b4e6f5db-e8a0-4bbc-bad7-f7400b2e66b5-S0 deactivated
I0109 18:58:57.003962  5572 containerizer.cpp:2463] Destroying container 
d726a03e-24ba-426d-95fd-07dcf876572d in RUNNING state
I0109 18:58:57.003962  5572 containerizer.cpp:3130] Transitioning the state of 
container d726a03e-24ba-426d-95fd-07dcf876572d from RUNNING to DESTROYING
I0109 18:58:57.004971  5572 launcher.cpp:161] Asked to destroy container 
d726a03e-24ba-426d-95fd-07dcf876572d
W0109 18:58:57.004971  6352 process.cpp:838] Failed to recv on socket 
WindowsFD::Type::SOCKET=1788 to peer '192.10.1.4:54061': IO failed with error 
code: The specified network name is no longer available.

W0109 18:58:57.005954  6352 process.cpp:1423] Failed to recv on socket 
WindowsFD::Type::SOCKET=1800 to peer '192.10.1.4:54060': IO failed with error 
code: The specified network name is no longer available.

I0109 18:58:57.084885  6192 containerizer.cpp:2969] Container 
d726a03e-24ba-426d-95fd-07dcf876572d has exited
[   OK ] IsolationFlag/MemoryIsolatorTest.ROOT_MemUsage/0 (686 ms)
[--] 1 test from IsolationFlag/MemoryIsolatorTest (704 ms total)

[--] Global test environment tear-down
[==] 1086 tests from 104 test cases ran. (499593 ms total)
[  PASSED  ] 1084 tests.
[  FAILED  ] 2 tests, listed below:
[  FAILED  ] DockerFetcherPluginTest.INTERNET_CURL_FetchImage
[  FAILED  ] ContentType/MasterAPITest.OperationUpdatesUponAgentGone/1, where 
GetParam() = application/json

 2 FAILED TESTS
  YOU HAVE 231 DISABLED TESTS

I0109 18:58:57.113801  6064 master.cpp:1111] Master terminating
I0109 18:58:57.115829  5176 hierarchical.cpp:644] Removed agent 
b4e6f5db-e8a0-4bbc-bad7-f7400b2e66b5-S0
I0109 18:58:57.374857  6352 process.cpp:927] Stopped the socket accept loop
```

- Mesos Reviewbot Windows





Re: Review Request 69669: Notified frameworks when operations are marked as unreachable.

2019-01-08 Thread Benno Evers

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/69669/#review211765
---




src/master/master.hpp
Lines 904 (patched)


I thought about making this a member function of `struct Slave` instead, 
but it felt a bit unclean to suddenly add networking code to it, because it 
currently acts mostly as a fancy data store.


- Benno Evers

