Re: Review Request 38645: Fixed Flaky Executor HTTP tests

2015-09-24 Thread Anand Mazumdar

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/38645/
---

(Updated Sept. 24, 2015, 11:24 p.m.)


Review request for mesos, Isabel Jimenez and Vinod Kone.


Changes
---

Addressed review comments: placed the pause and settle calls one after the other.


Repository: mesos


Description
---

This showed up on ASF CI. From the logs:

`I0922 17:31:49.819221 28463 slave.cpp:4104] Finished recovery`
Then
`../../src/tests/executor_http_api_tests.cpp:290: Failure`
`Failed to wait 15secs for __recover`

The test set up `FUTURE_DISPATCH` after calling `StartSlave()`; it should do so 
before starting the slave. In some cases the slave had already recovered by 
the time `FUTURE_DISPATCH` was invoked, so the future never became ready, which 
led to the flakiness.


Diffs (updated)
---

  src/tests/executor_http_api_tests.cpp 9dbc5191b5950df2faa693720f3740e97c7df758

Diff: https://reviews.apache.org/r/38645/diff/


Testing
---

With a sleep introduced before `AWAIT_READY`, the test failed. With this change 
and the sleep still in place, it passed.

Ran in a loop 100 times.

ASF CI error log: 
https://builds.apache.org/job/Mesos/COMPILER=gcc,CONFIGURATION=--verbose,OS=ubuntu%3A14.04,label_exp=docker%7C%7CHadoop/839/changes


Thanks,

Anand Mazumdar



Re: Review Request 38645: Fixed Flaky Executor HTTP tests

2015-09-24 Thread Vinod Kone

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/38645/#review100499
---

Ship it!

- Vinod Kone





Re: Review Request 38645: Fixed Flaky Executor HTTP tests

2015-09-24 Thread Anand Mazumdar

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/38645/
---

(Updated Sept. 24, 2015, 11:09 p.m.)


Review request for mesos, Isabel Jimenez and Vinod Kone.


Changes
---

Addressed comments from Vinod.


Repository: mesos


Description
---

This showed up on ASF CI. From the logs:

`I0922 17:31:49.819221 28463 slave.cpp:4104] Finished recovery`
Then
`../../src/tests/executor_http_api_tests.cpp:290: Failure`
`Failed to wait 15secs for __recover`

The test set up `FUTURE_DISPATCH` after calling `StartSlave()`; it should do so 
before starting the slave. In some cases the slave had already recovered by 
the time `FUTURE_DISPATCH` was invoked, so the future never became ready, which 
led to the flakiness.


Diffs (updated)
---

  src/tests/executor_http_api_tests.cpp 9dbc5191b5950df2faa693720f3740e97c7df758

Diff: https://reviews.apache.org/r/38645/diff/


Testing
---

With a sleep introduced before `AWAIT_READY`, the test failed. With this change 
and the sleep still in place, it passed.

Ran in a loop 100 times.

ASF CI error log: 
https://builds.apache.org/job/Mesos/COMPILER=gcc,CONFIGURATION=--verbose,OS=ubuntu%3A14.04,label_exp=docker%7C%7CHadoop/839/changes


Thanks,

Anand Mazumdar



Re: Review Request 38645: Fixed Flaky Executor HTTP tests

2015-09-24 Thread Anand Mazumdar

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/38645/
---

(Updated Sept. 24, 2015, 11:48 p.m.)


Review request for mesos, Isabel Jimenez and Vinod Kone.


Changes
---

rebased


Repository: mesos


Description
---

This showed up on ASF CI. From the logs:

`I0922 17:31:49.819221 28463 slave.cpp:4104] Finished recovery`
Then
`../../src/tests/executor_http_api_tests.cpp:290: Failure`
`Failed to wait 15secs for __recover`

The test set up `FUTURE_DISPATCH` after calling `StartSlave()`; it should do so 
before starting the slave. In some cases the slave had already recovered by 
the time `FUTURE_DISPATCH` was invoked, so the future never became ready, which 
led to the flakiness.


Diffs (updated)
---

  src/tests/executor_http_api_tests.cpp 9dbc5191b5950df2faa693720f3740e97c7df758

Diff: https://reviews.apache.org/r/38645/diff/


Testing
---

With a sleep introduced before `AWAIT_READY`, the test failed. With this change 
and the sleep still in place, it passed.

Ran in a loop 100 times.

ASF CI error log: 
https://builds.apache.org/job/Mesos/COMPILER=gcc,CONFIGURATION=--verbose,OS=ubuntu%3A14.04,label_exp=docker%7C%7CHadoop/839/changes


Thanks,

Anand Mazumdar



Re: Review Request 38645: Fixed Flaky Executor HTTP tests

2015-09-24 Thread Vinod Kone

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/38645/#review100477
---



src/tests/executor_http_api_tests.cpp (line 95)


You should do a `Clock::settle()` here (and, for that to work, pause the clock 
beforehand) because AFAICT `AWAIT_READY(__recover)` doesn't guarantee that 
`Slave::__recover()` has been executed. It only tells us that the event is about 
to be processed. See `process::resume()`.


- Vinod Kone





Re: Review Request 38645: Fixed Flaky Executor HTTP tests

2015-09-22 Thread Neil Conway

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/38645/#review100066
---


For the sake of repro'ing, maybe you could add a sleep before waiting on the 
future? Obviously not something we want in the actual patch though.

- Neil Conway





Re: Review Request 38645: Fixed Flaky Executor HTTP tests

2015-09-22 Thread Anand Mazumdar

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/38645/
---

(Updated Sept. 22, 2015, 8:46 p.m.)


Review request for mesos, Isabel Jimenez and Vinod Kone.


Repository: mesos


Description (updated)
---

This showed up on ASF CI. From the logs:

`I0922 17:31:49.819221 28463 slave.cpp:4104] Finished recovery`
Then
`../../src/tests/executor_http_api_tests.cpp:290: Failure`
`Failed to wait 15secs for __recover`

The test set up `FUTURE_DISPATCH` after calling `StartSlave()`; it should do so 
before starting the slave. In some cases the slave had already recovered by 
the time `FUTURE_DISPATCH` was invoked, so the future never became ready, which 
led to the flakiness.


Diffs
---

  src/tests/executor_http_api_tests.cpp 9dbc5191b5950df2faa693720f3740e97c7df758

Diff: https://reviews.apache.org/r/38645/diff/


Testing (updated)
---

I was not able to reproduce it before or after this change, but looking at the 
logs it is quite obvious what the issue was. Ran the test in a loop 100 times.

ASF CI error log: 
https://builds.apache.org/job/Mesos/COMPILER=gcc,CONFIGURATION=--verbose,OS=ubuntu%3A14.04,label_exp=docker%7C%7CHadoop/839/changes


Thanks,

Anand Mazumdar



Re: Review Request 38645: Fixed Flaky Executor HTTP tests

2015-09-22 Thread Anand Mazumdar

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/38645/
---

(Updated Sept. 22, 2015, 9:37 p.m.)


Review request for mesos, Isabel Jimenez and Vinod Kone.


Changes
---

Updated the Testing Done section.


Repository: mesos


Description
---

This showed up on ASF CI. From the logs:

`I0922 17:31:49.819221 28463 slave.cpp:4104] Finished recovery`
Then
`../../src/tests/executor_http_api_tests.cpp:290: Failure`
`Failed to wait 15secs for __recover`

The test set up `FUTURE_DISPATCH` after calling `StartSlave()`; it should do so 
before starting the slave. In some cases the slave had already recovered by 
the time `FUTURE_DISPATCH` was invoked, so the future never became ready, which 
led to the flakiness.


Diffs
---

  src/tests/executor_http_api_tests.cpp 9dbc5191b5950df2faa693720f3740e97c7df758

Diff: https://reviews.apache.org/r/38645/diff/


Testing (updated)
---

With a sleep introduced before `AWAIT_READY`, the test failed. With this change 
and the sleep still in place, it passed.

Ran in a loop 100 times.

ASF CI error log: 
https://builds.apache.org/job/Mesos/COMPILER=gcc,CONFIGURATION=--verbose,OS=ubuntu%3A14.04,label_exp=docker%7C%7CHadoop/839/changes


Thanks,

Anand Mazumdar



Re: Review Request 38645: Fixed Flaky Executor HTTP tests

2015-09-22 Thread Neil Conway

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/38645/#review100072
---

Ship it!

- Neil Conway





Re: Review Request 38645: Fixed Flaky Executor HTTP tests

2015-09-22 Thread Anand Mazumdar


> On Sept. 22, 2015, 9:09 p.m., Neil Conway wrote:
> > For the sake of repro'ing, maybe you could add a sleep before waiting on 
> > the future? Obviously not something we want in the actual patch though.

Thanks, Neil, that worked. I have updated the `Testing Done` section with the 
details. I should have spent more time reproducing it rather than just leaving 
it to inference from the error logs.


- Anand


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/38645/#review100066
---





Re: Review Request 38645: Fixed Flaky Executor HTTP tests

2015-09-22 Thread Mesos ReviewBot

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/38645/#review100103
---


Patch looks great!

Reviews applied: [38645]

All tests passed.

- Mesos ReviewBot

