Re: Review Request 69505: Added an operation status update manager to the agent.

2018-12-11 Thread Gastón Kleiman

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/69505/#review211209
---


Ship it!




Ship It!

- Gastón Kleiman


On Dec. 10, 2018, 3:34 p.m., Greg Mann wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/69505/
> ---
> 
> (Updated Dec. 10, 2018, 3:34 p.m.)
> 
> 
> Review request for mesos, Benno Evers and Gastón Kleiman.
> 
> 
> Bugs: MESOS-9278
> https://issues.apache.org/jira/browse/MESOS-9278
> 
> 
> Repository: mesos
> 
> 
> Description
> ---
> 
> This patch adds an operation status update manager to the agent
> in order to handle updates for operations on agent default
> resources. A new test is also added which verifies that such
> updates are retried.
> 
> 
> Later patches will integrate this status update manager with
> the agent's checkpointing/recovery code.
> 
> 
> Diffs
> -
> 
>   src/master/master.cpp ae5b24093156a6ba8ffa3e94a46e600eb4def5fa 
>   src/slave/slave.hpp edf7269d4057ec8c95bb54c855210ad00d002a50 
>   src/slave/slave.cpp e13b955ffd92a8872dd513c006e96bd72e298c85 
>   src/tests/slave_tests.cpp 1e918712d73d0a03c4c15b41babff1f649ba9c07 
> 
> 
> Diff: https://reviews.apache.org/r/69505/diff/3/
> 
> 
> Testing
> ---
> 
> `bin/mesos-tests.sh --gtest_filter="*UpdateOperationStatusRetry*" 
> --gtest_repeat=-1 --gtest_break_on_failure`
> 
> 
> Thanks,
> 
> Greg Mann
> 
>



Re: Review Request 69505: Added an operation status update manager to the agent.

2018-12-10 Thread Mesos Reviewbot Windows

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/69505/#review211178
---



FAIL: Some of the unit tests failed. Please check the relevant logs.

Reviews applied: `['69505']`

Failed command: `Start-MesosCITesting`

All the build artifacts available at: 
http://dcos-win.westus2.cloudapp.azure.com/artifacts/mesos-reviewbot-testing/2691/mesos-review-69505

Relevant logs:

- 
[mesos-tests.log](http://dcos-win.westus2.cloudapp.azure.com/artifacts/mesos-reviewbot-testing/2691/mesos-review-69505/logs/mesos-tests.log):

```
W1211 01:12:40.491051 20332 slave.cpp:3933] Ignoring shutdown framework 
45c94a58-1600-4fb7-87c4-d18d704106d2- because it is terminating
I1211 01:12:40.493053 18688 master.cpp:1275] Agent 
45c94a58-1600-4fb7-87c4-d18d704106d2-S0 at slave(462)@192.10.1.6:52101 
(windows-02.chtsmhjxogyevckjfayqqcnjda.xx.internal.cloudapp.net) disconnected
I1211 01:12:40.493053 18688 master.cpp:3278] Disconnecting agent 
45c94a58-1600-4fb7-87c4-d18d704106d2-S0 at slave(462)@192.10.1.6:52101 
(windows-02.chtsmhjxogyevckjfayqqcnjda.xx.internal.cloudapp.net)
I1211 01:12:40.493053 18688 master.cpp:3297] Deactivating agent 
45c94a58-1600-4fb7-87c4-d18d704106d2-S0 at slave(462)@192.10.1.6:52101 
(windows-02.chtsmhjxogyevckjfayqqcnjda.xx.internal.cloudapp.net)
I1211 01:12:40.494048 20708 hierarchical.cpp:357] Removed framework 
45c94a58-1600-4fb7-87c4-d18d704106d2-
I1211 01:12:40.494048 20708 hierarchical.cpp:801] Agent 
45c94a58-1600-4fb7-87c4-d18d704106d2-S0 deactivated
I1211 01:12:40.495060 20708 containerizer.cpp:2463] Destroying container 
94ca8fda-3f8e-4a66-8728-cd3a3337cb82 in RUNNING state
I1211 01:12:40.495060 20708 containerizer.cpp:3130] Transitioning the state of 
container 94ca8fda-3f8e-4a66-8728-cd3a3337cb82 from RUNNING to DESTROYING
I1211 01:12:40.496067 20708 launcher.cpp:161] Asked to destroy container 
94ca8fda-3f8e-4a66-8728-cd3a3337cb82
W1211 01:12:40.497061 14416 process.cpp:1423] Failed to recv on socket 
WindowsFD::Type::SOCKET=2616 to peer '192.10.1.6:53879': IO failed with error 
code: The specified network name is no longer available.

W1211 01:12:40.498051 14416[   OK ] 
IsolationFlag/MemoryIsolatorTest.ROOT_MemUsage/0 (683 ms)
[--] 1 test from IsolationFlag/MemoryIsolatorTest (705 ms total)

[--] Global test environment tear-down
[==] 1058 tests from 104 test cases ran. (531377 ms total)
[  PASSED  ] 1057 tests.
[  FAILED  ] 1 test, listed below:
[  FAILED  ] DockerFetcherPluginTest.INTERNET_CURL_FetchImage

 1 FAILED TEST
  YOU HAVE 231 DISABLED TESTS

 process.cpp:838] Failed to recv on socket WindowsFD::Type::SOCKET=2184 to peer 
'192.10.1.6:53880': IO failed with error code: The specified network name is no 
longer available.

I1211 01:12:40.542299 19384 containerizer.cpp:2969] Container 
94ca8fda-3f8e-4a66-8728-cd3a3337cb82 has exited
I1211 01:12:40.572299 20332 master.cpp:1117] Master terminating
I1211 01:12:40.573297  4976 hierarchical.cpp:643] Removed agent 
45c94a58-1600-4fb7-87c4-d18d704106d2-S0
I1211 01:12:40.941330 14416 process.cpp:927] Stopped the socket accept loop
```

- Mesos Reviewbot Windows


On Dec. 10, 2018, 11:34 p.m., Greg Mann wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/69505/
> ---
> 
> (Updated Dec. 10, 2018, 11:34 p.m.)
> 
> 
> Review request for mesos, Benno Evers and Gastón Kleiman.
> 
> 
> Bugs: MESOS-9278
> https://issues.apache.org/jira/browse/MESOS-9278
> 
> 
> Repository: mesos
> 
> 
> Description
> ---
> 
> This patch adds an operation status update manager to the agent
> in order to handle updates for operations on agent default
> resources. A new test is also added which verifies that such
> updates are retried.
> 
> 
> Later patches will integrate this status update manager with
> the agent's checkpointing/recovery code.
> 
> 
> Diffs
> -
> 
>   src/master/master.cpp ae5b24093156a6ba8ffa3e94a46e600eb4def5fa 
>   src/slave/slave.hpp edf7269d4057ec8c95bb54c855210ad00d002a50 
>   src/slave/slave.cpp e13b955ffd92a8872dd513c006e96bd72e298c85 
>   src/tests/slave_tests.cpp 1e918712d73d0a03c4c15b41babff1f649ba9c07 
> 
> 
> Diff: https://reviews.apache.org/r/69505/diff/3/
> 
> 
> Testing
> ---
> 
> `bin/mesos-tests.sh --gtest_filter="*UpdateOperationStatusRetry*" 
> --gtest_repeat=-1 --gtest_break_on_failure`
> 
> 
> Thanks,
> 
> Greg Mann
> 
>



Re: Review Request 69505: Added an operation status update manager to the agent.

2018-12-10 Thread Greg Mann

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/69505/
---

(Updated Dec. 10, 2018, 11:34 p.m.)


Review request for mesos, Benno Evers and Gastón Kleiman.


Bugs: MESOS-9278
https://issues.apache.org/jira/browse/MESOS-9278


Repository: mesos


Description (updated)
---

This patch adds an operation status update manager to the agent
in order to handle updates for operations on agent default
resources. A new test is also added which verifies that such
updates are retried.


Later patches will integrate this status update manager with
the agent's checkpointing/recovery code.


Diffs (updated)
-

  src/master/master.cpp ae5b24093156a6ba8ffa3e94a46e600eb4def5fa 
  src/slave/slave.hpp edf7269d4057ec8c95bb54c855210ad00d002a50 
  src/slave/slave.cpp e13b955ffd92a8872dd513c006e96bd72e298c85 
  src/tests/slave_tests.cpp 1e918712d73d0a03c4c15b41babff1f649ba9c07 


Diff: https://reviews.apache.org/r/69505/diff/3/

Changes: https://reviews.apache.org/r/69505/diff/2-3/


Testing
---

`bin/mesos-tests.sh --gtest_filter="*UpdateOperationStatusRetry*" 
--gtest_repeat=-1 --gtest_break_on_failure`


Thanks,

Greg Mann



Re: Review Request 69505: Added an operation status update manager to the agent.

2018-12-10 Thread Greg Mann

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/69505/
---

(Updated Dec. 10, 2018, 11:03 p.m.)


Review request for mesos, Benno Evers and Gastón Kleiman.


Bugs: MESOS-9278
https://issues.apache.org/jira/browse/MESOS-9278


Repository: mesos


Description (updated)
---

This patch adds an operation status update manager to the agent
in order to handle updates for operations on agent default
resources. A new test is also added which verifies that such
updates are retried.

Later patches will integrate this status update manager with
the agent's checkpointing/recovery code.


Diffs
-

  src/master/master.cpp ae5b24093156a6ba8ffa3e94a46e600eb4def5fa 
  src/slave/slave.hpp edf7269d4057ec8c95bb54c855210ad00d002a50 
  src/slave/slave.cpp e13b955ffd92a8872dd513c006e96bd72e298c85 
  src/tests/slave_tests.cpp 1e918712d73d0a03c4c15b41babff1f649ba9c07 


Diff: https://reviews.apache.org/r/69505/diff/2/


Testing
---

`bin/mesos-tests.sh --gtest_filter="*UpdateOperationStatusRetry*" 
--gtest_repeat=-1 --gtest_break_on_failure`


Thanks,

Greg Mann



Re: Review Request 69505: Added an operation status update manager to the agent.

2018-12-10 Thread Greg Mann


> On Dec. 6, 2018, 12:32 a.m., Gastón Kleiman wrote:
> > src/master/master.cpp
> > Lines 8539-8541 (original)
> > 
> >
> > Should we replace this with something like the following?
> > 
> > ```
> > CHECK(!resourceProviderId.isError())
> >   << "Could not determine resource provider of operation " << 
> > *operation
> >   << ": " << resourceProviderId.error();
> > ```

Good call, thanks!


- Greg


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/69505/#review211067
---


On Dec. 10, 2018, 11:01 p.m., Greg Mann wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/69505/
> ---
> 
> (Updated Dec. 10, 2018, 11:01 p.m.)
> 
> 
> Review request for mesos, Benno Evers and Gastón Kleiman.
> 
> 
> Bugs: MESOS-9278
> https://issues.apache.org/jira/browse/MESOS-9278
> 
> 
> Repository: mesos
> 
> 
> Description
> ---
> 
> This patch adds an operation status update manager to the
> agent in order to provide reliable status updates to
> schedulers for operations on agent default resources.
> 
> Later patches will integrate this status update manager
> with the agent's checkpointing/recovery code.
> 
> 
> Diffs
> -
> 
>   src/master/master.cpp ae5b24093156a6ba8ffa3e94a46e600eb4def5fa 
>   src/slave/slave.hpp edf7269d4057ec8c95bb54c855210ad00d002a50 
>   src/slave/slave.cpp e13b955ffd92a8872dd513c006e96bd72e298c85 
>   src/tests/slave_tests.cpp 1e918712d73d0a03c4c15b41babff1f649ba9c07 
> 
> 
> Diff: https://reviews.apache.org/r/69505/diff/2/
> 
> 
> Testing
> ---
> 
> `bin/mesos-tests.sh --gtest_filter="*UpdateOperationStatusRetry*" 
> --gtest_repeat=-1 --gtest_break_on_failure`
> 
> 
> Thanks,
> 
> Greg Mann
> 
>



Re: Review Request 69505: Added an operation status update manager to the agent.

2018-12-10 Thread Greg Mann

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/69505/
---

(Updated Dec. 10, 2018, 11:01 p.m.)


Review request for mesos, Benno Evers and Gastón Kleiman.


Bugs: MESOS-9278
https://issues.apache.org/jira/browse/MESOS-9278


Repository: mesos


Description (updated)
---

This patch adds an operation status update manager to the
agent in order to provide reliable status updates to
schedulers for operations on agent default resources.

Later patches will integrate this status update manager
with the agent's checkpointing/recovery code.


Diffs (updated)
-

  src/master/master.cpp ae5b24093156a6ba8ffa3e94a46e600eb4def5fa 
  src/slave/slave.hpp edf7269d4057ec8c95bb54c855210ad00d002a50 
  src/slave/slave.cpp e13b955ffd92a8872dd513c006e96bd72e298c85 
  src/tests/slave_tests.cpp 1e918712d73d0a03c4c15b41babff1f649ba9c07 


Diff: https://reviews.apache.org/r/69505/diff/2/

Changes: https://reviews.apache.org/r/69505/diff/1-2/


Testing
---

`bin/mesos-tests.sh --gtest_filter="*UpdateOperationStatusRetry*" 
--gtest_repeat=-1 --gtest_break_on_failure`


Thanks,

Greg Mann



Re: Review Request 69505: Added an operation status update manager to the agent.

2018-12-05 Thread Gastón Kleiman

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/69505/#review211067
---


Fix it, then Ship it!





src/master/master.cpp
Lines 8539-8541 (original)


Should we replace this with something like the following?

```
CHECK(!resourceProviderId.isError())
  << "Could not determine resource provider of operation " << *operation
  << ": " << resourceProviderId.error();
```



src/tests/slave_tests.cpp
Lines 6505-6507 (patched)


These checks seem redundant and unrelated to this test?


- Gastón Kleiman


On Dec. 4, 2018, 2:22 p.m., Greg Mann wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/69505/
> ---
> 
> (Updated Dec. 4, 2018, 2:22 p.m.)
> 
> 
> Review request for mesos, Benno Evers and Gastón Kleiman.
> 
> 
> Bugs: MESOS-9278
> https://issues.apache.org/jira/browse/MESOS-9278
> 
> 
> Repository: mesos
> 
> 
> Description
> ---
> 
> This patch adds an operation status update manager to the agent
> in order to handle updates for operations on agent default
> resources. A new test is also added which verifies that such
> updates are retried.
> 
> 
> Diffs
> -
> 
>   src/master/master.cpp b4b02d8b4d7d6d1aabda1f97b9bf824419f76a9e 
>   src/slave/slave.hpp edf7269d4057ec8c95bb54c855210ad00d002a50 
>   src/slave/slave.cpp 858b78620e1ef33f3587d0bd95a684917aaf5bbb 
>   src/tests/slave_tests.cpp 1e918712d73d0a03c4c15b41babff1f649ba9c07 
> 
> 
> Diff: https://reviews.apache.org/r/69505/diff/1/
> 
> 
> Testing
> ---
> 
> `bin/mesos-tests.sh --gtest_filter="*UpdateOperationStatusRetry*" 
> --gtest_repeat=-1 --gtest_break_on_failure`
> 
> 
> Thanks,
> 
> Greg Mann
> 
>



Re: Review Request 69505: Added an operation status update manager to the agent.

2018-12-04 Thread Mesos Reviewbot Windows

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/69505/#review211033
---



PASS: Mesos patch 69505 was successfully built and tested.

Reviews applied: `['69505']`

All the build artifacts available at: 
http://dcos-win.westus2.cloudapp.azure.com/artifacts/mesos-reviewbot-testing/2669/mesos-review-69505

- Mesos Reviewbot Windows


On Dec. 4, 2018, 10:22 p.m., Greg Mann wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/69505/
> ---
> 
> (Updated Dec. 4, 2018, 10:22 p.m.)
> 
> 
> Review request for mesos, Benno Evers and Gastón Kleiman.
> 
> 
> Bugs: MESOS-9278
> https://issues.apache.org/jira/browse/MESOS-9278
> 
> 
> Repository: mesos
> 
> 
> Description
> ---
> 
> This patch adds an operation status update manager to the agent
> in order to handle updates for operations on agent default
> resources. A new test is also added which verifies that such
> updates are retried.
> 
> 
> Diffs
> -
> 
>   src/master/master.cpp b4b02d8b4d7d6d1aabda1f97b9bf824419f76a9e 
>   src/slave/slave.hpp edf7269d4057ec8c95bb54c855210ad00d002a50 
>   src/slave/slave.cpp 858b78620e1ef33f3587d0bd95a684917aaf5bbb 
>   src/tests/slave_tests.cpp 1e918712d73d0a03c4c15b41babff1f649ba9c07 
> 
> 
> Diff: https://reviews.apache.org/r/69505/diff/1/
> 
> 
> Testing
> ---
> 
> `bin/mesos-tests.sh --gtest_filter="*UpdateOperationStatusRetry*" 
> --gtest_repeat=-1 --gtest_break_on_failure`
> 
> 
> Thanks,
> 
> Greg Mann
> 
>