Review Request 43285: Fixed flakiness in SlaveRecoveryTest/0.CleanupHTTPExecutor.

2016-02-06 Thread Anand Mazumdar

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/43285/
---

Review request for mesos and Vinod Kone.


Bugs: MESOS-4614
https://issues.apache.org/jira/browse/MESOS-4614


Repository: mesos


Description
---

This change fixes the flakiness in this test. The issue was a race: the 
`connected` callback could be invoked before we called `process::spawn` to 
start the process.

The details of the race that led to the failure are as follows:
- We started the executor library inside the constructor of `TestExecutor`. The 
callback function did `process::defer(self(), &Self::connected)`.
- The `connected` callback can be invoked by the executor library before we get 
a chance to actually invoke `process::spawn` on the `TestExecutor` process 
itself. This can in turn lead to the `dispatch` being silently dropped:
https://github.com/apache/mesos/blob/master/3rdparty/libprocess/src/process.cpp#L2456

This change now starts the library inside the `initialize` function, which is 
guaranteed to be called after `process::spawn` is invoked.
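
To illustrate the ordering problem, here is a minimal, self-contained sketch. 
It is a toy model and does not use libprocess: `ToyProcess`, `ToyLibrary`, 
`BuggyExecutor`, `FixedExecutor` and the two helper functions are hypothetical 
names made up for this example. The toy library fires its callback immediately, 
which deterministically models the losing side of the race; in the real test 
the callback arrives asynchronously, so the failure is only intermittent.

```cpp
#include <functional>
#include <memory>

// Toy stand-in for an actor-style process. NOT the libprocess API.
class ToyProcess {
public:
  virtual ~ToyProcess() = default;

  // Models a dispatch: it is silently dropped if the process
  // has not been spawned yet.
  bool dispatch(const std::function<void()>& f) {
    if (!spawned_) {
      ++dropped_;
      return false;
    }
    f();
    return true;
  }

  // Models spawning: mark the process as running, then call initialize().
  void spawn() {
    spawned_ = true;
    initialize();
  }

  int dropped() const { return dropped_; }

protected:
  virtual void initialize() {}

private:
  bool spawned_ = false;
  int dropped_ = 0;
};

// Toy "executor library": invokes the connected callback as soon as it
// is started (worst case for the race).
struct ToyLibrary {
  explicit ToyLibrary(const std::function<void()>& connected) { connected(); }
};

// Starts the library in the constructor, i.e. before spawn().
class BuggyExecutor : public ToyProcess {
public:
  BuggyExecutor() {
    library_ = std::make_unique<ToyLibrary>(
        [this] { dispatch([this] { connected_ = true; }); });
  }
  bool connected() const { return connected_; }

private:
  bool connected_ = false;
  std::unique_ptr<ToyLibrary> library_;
};

// Starts the library in initialize(), i.e. after spawn().
class FixedExecutor : public ToyProcess {
public:
  bool connected() const { return connected_; }

protected:
  void initialize() override {
    library_ = std::make_unique<ToyLibrary>(
        [this] { dispatch([this] { connected_ = true; }); });
  }

private:
  bool connected_ = false;
  std::unique_ptr<ToyLibrary> library_;
};

// Helpers: construct, spawn, and report whether `connected` was delivered.
bool buggyConnects() {
  BuggyExecutor e;
  e.spawn();
  return e.connected();
}

bool fixedConnects() {
  FixedExecutor e;
  e.spawn();
  return e.connected();
}
```

In this model `buggyConnects()` returns false, because the callback's dispatch 
is dropped before `spawn()`, while `fixedConnects()` returns true.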


Diffs
---

  src/examples/test_http_executor.cpp 4916e0ebb7215d911556561a8560d78a1192001c 

Diff: https://reviews.apache.org/r/43285/diff/


Testing
---

Ran `make check`. Without the fix, the flakiness is easy to reproduce under load:

- Run the tests with `stress --cpu 4 --timeout 120` running in the background.


Thanks,

Anand Mazumdar



Re: Review Request 43285: Fixed flakiness in SlaveRecoveryTest/0.CleanupHTTPExecutor.

2016-02-06 Thread haosdent huang

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/43285/#review118149
---


Ship it!




Ship It!

- haosdent huang





Re: Review Request 43285: Fixed flakiness in SlaveRecoveryTest/0.CleanupHTTPExecutor.

2016-02-08 Thread Vinod Kone

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/43285/#review118290
---


Fix it, then Ship it!




Great catch! Mystery solved!


src/examples/test_http_executor.cpp (line 198)


Can you add a comment here for posterity?


- Vinod Kone





Re: Review Request 43285: Fixed flakiness in SlaveRecoveryTest/0.CleanupHTTPExecutor.

2016-02-08 Thread Anand Mazumdar

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/43285/
---

(Updated Feb. 8, 2016, 10:04 p.m.)


Review request for mesos and Vinod Kone.


Changes
---

Review comments from Vinod


Bugs: MESOS-4614
https://issues.apache.org/jira/browse/MESOS-4614


Repository: mesos


Description
---

This change fixes the flakiness in this test. The issue was a race: the 
`connected` callback could be invoked before we called `process::spawn` to 
start the process.

The details of the race that led to the failure are as follows:
- We started the executor library inside the constructor of `TestExecutor`. The 
callback function did `process::defer(self(), &Self::connected)`.
- The `connected` callback can be invoked by the executor library before we get 
a chance to actually invoke `process::spawn` on the `TestExecutor` process 
itself. This can in turn lead to the `dispatch` being silently dropped:
https://github.com/apache/mesos/blob/master/3rdparty/libprocess/src/process.cpp#L2456

This change now starts the library inside the `initialize` function, which is 
guaranteed to be called after `process::spawn` is invoked.


Diffs (updated)
---

  src/examples/test_http_executor.cpp 4916e0ebb7215d911556561a8560d78a1192001c 

Diff: https://reviews.apache.org/r/43285/diff/


Testing
---

Ran `make check`. Without the fix, the flakiness is easy to reproduce under load:

- Run the tests with `stress --cpu 4 --timeout 120` running in the background.


Thanks,

Anand Mazumdar