Re: Review Request 44989: Fixed a race in the resource offers tests.

2016-03-20 Thread Greg Mann

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/44989/
---

(Updated March 20, 2016, 5:35 p.m.)


Review request for mesos, Adam B and Joerg Schad.


Bugs: MESOS-4849
https://issues.apache.org/jira/browse/MESOS-4849


Repository: mesos


Description
---

Fixed a race in the resource offers tests.

Adding HTTP credentials to `StartSlave` in 'src/tests/mesos.cpp' has exposed a 
race condition in ResourceOffersTest.ResourceOfferWithMultipleSlaves. The test 
quickly runs `StartSlave` 10 times to create 10 agents. Under the covers, 
`StartSlave` writes data to disk, and it seems that with the additional data 
being written to disk for HTTP credentials, the filesystem operations for one 
`StartSlave` call were not completing before the next call.

By settling the clock between successive invocations of `StartSlave`, this 
patch fixes the race. The test is slower, but it is now reliable: averaged 
over 3 runs, the old implementation took 265ms and the new one takes 359ms.
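Below is a minimal sketch of why settling between starts removes this kind of race. It uses a toy model, not the real libprocess clock. `ToyClock`, `startAgent`, and `startAgents` are hypothetical names, and the queued lambdas stand in for the asynchronous filesystem work that `StartSlave` triggers. Here `settle()` simply drains the queue. That is the property the patch relies on: each agent's writes finish before the next agent is started.

```cpp
#include <cassert>
#include <deque>
#include <functional>
#include <set>
#include <string>
#include <utility>

// Hypothetical stand-in for a paused clock plus an event queue: dispatched
// work only runs when settle() is called. This is NOT the libprocess API.
class ToyClock {
public:
  void dispatch(std::function<void()> work) {
    pending_.push_back(std::move(work));
  }

  // Run queued work until the queue is empty, including any work that the
  // drained work itself dispatches.
  void settle() {
    while (!pending_.empty()) {
      std::function<void()> work = std::move(pending_.front());
      pending_.pop_front();
      work();
    }
  }

private:
  std::deque<std::function<void()>> pending_;
};

// Hypothetical agent start: the credentials write is queued, not done inline.
void startAgent(ToyClock& clock, std::set<std::string>& files, int id) {
  clock.dispatch([&files, id] {
    files.insert("agent" + std::to_string(id) + "/credentials");
  });
}

// Starts `count` agents and returns how many had finished their write
// before the next agent was started.
int startAgents(int count, bool settleBetweenStarts) {
  assert(count >= 0);
  ToyClock clock;
  std::set<std::string> files;
  int finishedInTime = 0;
  for (int i = 0; i < count; ++i) {
    startAgent(clock, files, i);
    if (settleBetweenStarts) {
      clock.settle();
    }
    if (files.count("agent" + std::to_string(i) + "/credentials") == 1) {
      ++finishedInTime;
    }
  }
  clock.settle();  // drain whatever is still queued before returning
  return finishedInTime;
}
```

`startAgents(10, false)` returns 0 and `startAgents(10, true)` returns 10. The toy is deterministic, because queued work never runs until `settle()`. The real race fails only intermittently, which is why repeated runs such as `--gtest_repeat=1000` are needed to gain confidence in a fix.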


Diffs
---

  src/tests/resource_offers_tests.cpp 1cf292ee7931207596f8f06677386bef5965ef15 

Diff: https://reviews.apache.org/r/44989/diff/


Testing
---

`GTEST_FILTER="ResourceOffersTest.ResourceOfferWithMultipleSlaves" 
bin/mesos-tests.sh --gtest_repeat=1000 --gtest_break_on_failure=1` was used to 
test on both OSX and Ubuntu 14.04.


Thanks,

Greg Mann



Re: Review Request 44989: Fixed a race in the resource offers tests.

2016-03-19 Thread Neil Conway


> On March 18, 2016, 9:47 a.m., Adam B wrote:
> > src/tests/resource_offers_tests.cpp, line 63
> > 
> >
> > Why pause so soon? You can wait until after the master is started, but 
> > just before you start calling StartSlave() in the loop
> 
> Joerg Schad wrote:
> I agree with you, but actually we follow this pattern in many other tests 
> as well.
> E.g. 
> // This test ensures that allocation is done per slave. This is done
> // by having 2 slaves and 2 frameworks and making sure each framework
> // gets only one slave's resources during an allocation.
> TEST_F(HierarchicalAllocatorTest, CoarseGrained)
> {
>   // Pausing the clock ensures that the batch allocation does not
>   // influence this test.
> 
> Joerg Schad wrote:
> TEST_F(HierarchicalAllocatorTest, CoarseGrained)
> {
>   // Pausing the clock ensures that the batch allocation does not
>   // influence this test.
>   Clock::pause();

Is there an advantage to delaying the pause? Running the whole test with the 
clock paused (and hence pausing the clock at the beginning of the test) seems 
fine to me.


- Neil


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/44989/#review124159
---


On March 18, 2016, 4:35 p.m., Greg Mann wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/44989/
> ---
> 
> (Updated March 18, 2016, 4:35 p.m.)



Re: Review Request 44989: Fixed a race in the resource offers tests.

2016-03-19 Thread Adam B

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/44989/#review124159
---



"The test is slowed considerably" - Can you provide some stats on how fast it 
is before/after the patch?


src/tests/resource_offers_tests.cpp (line 63)


Why pause so soon? You can wait until after the master is started, but just 
before you start calling StartSlave() in the loop



src/tests/resource_offers_tests.cpp (line 94)


Not yours, but `.Times(1)` is redundant, as it is the default behavior. 
Please remove.


- Adam B


On March 17, 2016, 6 p.m., Greg Mann wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/44989/
> ---
> 
> (Updated March 17, 2016, 6 p.m.)



Re: Review Request 44989: Fixed a race in the resource offers tests.

2016-03-19 Thread Greg Mann


> On March 17, 2016, 11:16 p.m., Neil Conway wrote:
> > src/tests/resource_offers_tests.cpp, line 63
> > 
> >
> > Style-wise, do we want all tests that pause the clock to resume it at the 
> > end? I think we do a mix of both.

Hmm, good question. I know that I'm responsible for some instances where 
`resume` is *not* called at the end of the test, but now that you mention it, 
it does seem like a good idea. I've added it here. We should do a sweep to 
make this consistent across the tests; I could take a look next week.
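One way to make the resume consistent without relying on a trailing `Clock::resume()` call is a scope guard. The sketch below is hypothetical. `ToyClock` only tracks a paused flag and is not the libprocess `Clock`, and `PausedClockGuard` and `runToyTest` are illustrative names. The guard resumes the clock in its destructor. That also covers early returns such as a failed gtest `ASSERT_*`, since fatal assertions return from the current function.

```cpp
// Hypothetical stand-in for a global test clock; it only records whether
// the clock is paused. This is NOT the libprocess Clock API.
struct ToyClock {
  static bool& state() {
    static bool isPaused = false;
    return isPaused;
  }
  static void pause() { state() = true; }
  static void resume() { state() = false; }
  static bool paused() { return state(); }
};

// Pauses the clock for the lifetime of the guard and resumes it on scope
// exit, including early returns.
class PausedClockGuard {
public:
  PausedClockGuard() { ToyClock::pause(); }
  ~PausedClockGuard() { ToyClock::resume(); }
  PausedClockGuard(const PausedClockGuard&) = delete;
  PausedClockGuard& operator=(const PausedClockGuard&) = delete;
};

// Hypothetical test body. Returns whether the clock was paused while the
// body ran; `bailOutEarly` mimics a fatal assertion returning early.
bool runToyTest(bool bailOutEarly) {
  PausedClockGuard guard;  // paused from the very first line of the test
  if (bailOutEarly) {
    return false;
  }
  return ToyClock::paused();
}
```

Because the guard is declared on the first line of the test, it also states up front that the test runs with a paused clock.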


- Greg


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/44989/#review124116
---


On March 18, 2016, 1 a.m., Greg Mann wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/44989/
> ---
> 
> (Updated March 18, 2016, 1 a.m.)



Re: Review Request 44989: Fixed a race in the resource offers tests.

2016-03-19 Thread Greg Mann

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/44989/
---

(Updated March 18, 2016, 4:35 p.m.)


Review request for mesos, Adam B and Joerg Schad.


Bugs: MESOS-4849
https://issues.apache.org/jira/browse/MESOS-4849


Repository: mesos


Description
---

Fixed a race in the resource offers tests.

Adding HTTP credentials to `StartSlave` in 'src/tests/mesos.cpp' has exposed a 
race condition in ResourceOffersTest.ResourceOfferWithMultipleSlaves. The test 
quickly runs `StartSlave` 10 times to create 10 agents. Under the covers, 
`StartSlave` writes data to disk, and it seems that with the additional data 
being written to disk for HTTP credentials, the filesystem operations for one 
`StartSlave` call were not completing before the next call.

By settling the clock in between each invocation of `StartSlave`, this patch 
fixes the race. The test is slowed considerably, but it is now reliable.


Diffs
---

  src/tests/resource_offers_tests.cpp 1cf292ee7931207596f8f06677386bef5965ef15 

Diff: https://reviews.apache.org/r/44989/diff/


Testing
---

`GTEST_FILTER="ResourceOffersTest.ResourceOfferWithMultipleSlaves" 
bin/mesos-tests.sh --gtest_repeat=1000 --gtest_break_on_failure=1` was used to 
test on both OSX and Ubuntu 14.04.


Thanks,

Greg Mann



Re: Review Request 44989: Fixed a race in the resource offers tests.

2016-03-19 Thread Greg Mann

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/44989/
---

(Updated March 18, 2016, 2:48 a.m.)


Review request for mesos, Adam B and Joerg Schad.


Bugs: MESOS-4849
https://issues.apache.org/jira/browse/MESOS-4849


Repository: mesos


Description
---

Fixed a race in the resource offers tests.

Adding HTTP credentials to `StartSlave` in 'src/tests/mesos.cpp' has exposed a 
race condition in ResourceOffersTest.ResourceOfferWithMultipleSlaves. The test 
quickly runs `StartSlave` 10 times to create 10 agents. Under the covers, 
`StartSlave` writes data to disk, and it seems that with the additional data 
being written to disk for HTTP credentials, the filesystem operations for one 
`StartSlave` call were not completing before the next call.

By settling the clock in between each invocation of `StartSlave`, this patch 
fixes the race. The test is slowed considerably, but it is now reliable.


Diffs
---

  src/tests/resource_offers_tests.cpp 1cf292ee7931207596f8f06677386bef5965ef15 

Diff: https://reviews.apache.org/r/44989/diff/


Testing
---

`GTEST_FILTER="ResourceOffersTest.ResourceOfferWithMultipleSlaves" 
bin/mesos-tests.sh --gtest_repeat=1000 --gtest_break_on_failure=1` was used to 
test on both OSX and Ubuntu 14.04.


Thanks,

Greg Mann



Re: Review Request 44989: Fixed a race in the resource offers tests.

2016-03-19 Thread Greg Mann

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/44989/
---

(Updated March 18, 2016, 6:55 p.m.)


Review request for mesos, Adam B and Joerg Schad.


Changes
---

Addressed comment.


Bugs: MESOS-4849
https://issues.apache.org/jira/browse/MESOS-4849


Repository: mesos


Description
---

Fixed a race in the resource offers tests.

Adding HTTP credentials to `StartSlave` in 'src/tests/mesos.cpp' has exposed a 
race condition in ResourceOffersTest.ResourceOfferWithMultipleSlaves. The test 
quickly runs `StartSlave` 10 times to create 10 agents. Under the covers, 
`StartSlave` writes data to disk, and it seems that with the additional data 
being written to disk for HTTP credentials, the filesystem operations for one 
`StartSlave` call were not completing before the next call.

By settling the clock between successive invocations of `StartSlave`, this 
patch fixes the race. The test is slower, but it is now reliable: averaged 
over 3 runs, the old implementation took 265ms and the new one takes 359ms.


Diffs (updated)
---

  src/tests/resource_offers_tests.cpp 1cf292ee7931207596f8f06677386bef5965ef15 

Diff: https://reviews.apache.org/r/44989/diff/


Testing
---

`GTEST_FILTER="ResourceOffersTest.ResourceOfferWithMultipleSlaves" 
bin/mesos-tests.sh --gtest_repeat=1000 --gtest_break_on_failure=1` was used to 
test on both OSX and Ubuntu 14.04.


Thanks,

Greg Mann



Re: Review Request 44989: Fixed a race in the resource offers tests.

2016-03-19 Thread Neil Conway

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/44989/#review124116
---




src/tests/resource_offers_tests.cpp (line 63)


Style-wise, do we want all tests that pause the clock to resume it at the end? 
I think we do a mix of both.



src/tests/resource_offers_tests.cpp (line 75)


Not yours, but this can just be written:

```
slaveFlags.resources = "cpus:2;mem:1024";
```
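The shorthand packs named scalar resources as `name:value` pairs separated by `;`. The sketch below parses only that scalar form. `parseScalarResources` is a hypothetical helper, not Mesos code. Mesos's own resource parsing accepts more syntax, for example ranges and roles.

```cpp
#include <map>
#include <sstream>
#include <stdexcept>
#include <string>

// Hypothetical parser for the simple "name:value;name:value" scalar form.
// Throws std::invalid_argument on a malformed item or non-numeric value.
std::map<std::string, double> parseScalarResources(const std::string& text) {
  std::map<std::string, double> resources;
  std::stringstream stream(text);
  std::string item;
  while (std::getline(stream, item, ';')) {
    if (item.empty()) {
      continue;  // tolerate a trailing ';'
    }
    const std::string::size_type colon = item.find(':');
    if (colon == std::string::npos) {
      throw std::invalid_argument("missing ':' in '" + item + "'");
    }
    resources[item.substr(0, colon)] = std::stod(item.substr(colon + 1));
  }
  return resources;
}
```

For `"cpus:2;mem:1024"` this yields two entries, `cpus` = 2 and `mem` = 1024. That is what the single string literal expresses more compactly than building each resource separately.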


- Neil Conway


On March 17, 2016, 10:56 p.m., Greg Mann wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/44989/
> ---
> 
> (Updated March 17, 2016, 10:56 p.m.)



Re: Review Request 44989: Fixed a race in the resource offers tests.

2016-03-19 Thread Greg Mann


> On March 18, 2016, 9:47 a.m., Adam B wrote:
> > src/tests/resource_offers_tests.cpp, line 63
> > 
> >
> > Why pause so soon? You can wait until after the master is started, but 
> > just before you start calling StartSlave() in the loop
> 
> Joerg Schad wrote:
> I agree with you, but actually we follow this pattern in many other tests 
> as well.
> E.g. 
> // This test ensures that allocation is done per slave. This is done
> // by having 2 slaves and 2 frameworks and making sure each framework
> // gets only one slave's resources during an allocation.
> TEST_F(HierarchicalAllocatorTest, CoarseGrained)
> {
>   // Pausing the clock ensures that the batch allocation does not
>   // influence this test.
> 
> Joerg Schad wrote:
> TEST_F(HierarchicalAllocatorTest, CoarseGrained)
> {
>   // Pausing the clock ensures that the batch allocation does not
>   // influence this test.
>   Clock::pause();
> 
> Neil Conway wrote:
> Is there an advantage to delaying the pause? Running the whole test with 
> the clock paused (and hence pausing the clock at the beginning of the test) 
> seems fine to me.

Personally, I would prefer to pause the clock at the beginning of the test. I 
think it improves readability slightly, as it announces up front that this test 
will be run with the clock paused.


- Greg


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/44989/#review124159
---


On March 18, 2016, 6:52 p.m., Greg Mann wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/44989/
> ---
> 
> (Updated March 18, 2016, 6:52 p.m.)



Re: Review Request 44989: Fixed a race in the resource offers tests.

2016-03-19 Thread Greg Mann

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/44989/
---

(Updated March 18, 2016, 6:52 p.m.)


Review request for mesos, Adam B and Joerg Schad.


Changes
---

Added timing information.


Bugs: MESOS-4849
https://issues.apache.org/jira/browse/MESOS-4849


Repository: mesos


Description (updated)
---

Fixed a race in the resource offers tests.

Adding HTTP credentials to `StartSlave` in 'src/tests/mesos.cpp' has exposed a 
race condition in ResourceOffersTest.ResourceOfferWithMultipleSlaves. The test 
quickly runs `StartSlave` 10 times to create 10 agents. Under the covers, 
`StartSlave` writes data to disk, and it seems that with the additional data 
being written to disk for HTTP credentials, the filesystem operations for one 
`StartSlave` call were not completing before the next call.

By settling the clock between successive invocations of `StartSlave`, this 
patch fixes the race. The test is slower, but it is now reliable: averaged 
over 3 runs, the old implementation took 265ms and the new one takes 359ms.


Diffs
---

  src/tests/resource_offers_tests.cpp 1cf292ee7931207596f8f06677386bef5965ef15 

Diff: https://reviews.apache.org/r/44989/diff/


Testing
---

`GTEST_FILTER="ResourceOffersTest.ResourceOfferWithMultipleSlaves" 
bin/mesos-tests.sh --gtest_repeat=1000 --gtest_break_on_failure=1` was used to 
test on both OSX and Ubuntu 14.04.


Thanks,

Greg Mann



Re: Review Request 44989: Fixed a race in the resource offers tests.

2016-03-19 Thread Greg Mann

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/44989/
---

(Updated March 18, 2016, 1 a.m.)


Review request for mesos, Adam B and Joerg Schad.


Changes
---

Addressed comments.


Bugs: MESOS-4849
https://issues.apache.org/jira/browse/MESOS-4849


Repository: mesos


Description
---

Fixed a race in the resource offers tests.

Adding HTTP credentials to `StartSlave` in 'src/tests/mesos.cpp' has exposed a 
race condition in ResourceOffersTest.ResourceOfferWithMultipleSlaves. The test 
quickly runs `StartSlave` 10 times to create 10 agents. Under the covers, 
`StartSlave` writes data to disk, and it seems that with the additional data 
being written to disk for HTTP credentials, the filesystem operations for one 
`StartSlave` call were not completing before the next call.

By settling the clock in between each invocation of `StartSlave`, this patch 
fixes the race. The test is slowed considerably, but it is now reliable.


Diffs (updated)
---

  src/tests/resource_offers_tests.cpp 1cf292ee7931207596f8f06677386bef5965ef15 

Diff: https://reviews.apache.org/r/44989/diff/


Testing
---

`GTEST_FILTER="ResourceOffersTest.ResourceOfferWithMultipleSlaves" 
bin/mesos-tests.sh --gtest_repeat=1000 --gtest_break_on_failure=1` was used to 
test on both OSX and Ubuntu 14.04.


Thanks,

Greg Mann



Re: Review Request 44989: Fixed a race in the resource offers tests.

2016-03-19 Thread Joerg Schad


> On March 18, 2016, 9:47 a.m., Adam B wrote:
> > src/tests/resource_offers_tests.cpp, line 63
> > 
> >
> > Why pause so soon? You can wait until after the master is started, but 
> > just before you start calling StartSlave() in the loop
> 
> Joerg Schad wrote:
> I agree with you, but actually we follow this pattern in many other tests 
> as well.
> E.g. 
> // This test ensures that allocation is done per slave. This is done
> // by having 2 slaves and 2 frameworks and making sure each framework
> // gets only one slave's resources during an allocation.
> TEST_F(HierarchicalAllocatorTest, CoarseGrained)
> {
>   // Pausing the clock ensures that the batch allocation does not
>   // influence this test.

TEST_F(HierarchicalAllocatorTest, CoarseGrained)
{
  // Pausing the clock ensures that the batch allocation does not
  // influence this test.
  Clock::pause();


- Joerg


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/44989/#review124159
---


On March 18, 2016, 9:48 a.m., Greg Mann wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/44989/
> ---
> 
> (Updated March 18, 2016, 9:48 a.m.)



Re: Review Request 44989: Fixed a race in the resource offers tests.

2016-03-19 Thread Joerg Schad

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/44989/#review124395
---




src/tests/resource_offers_tests.cpp (line 77)


Not yours, but do we actually need 10 agents here? I guess this also 
contributed to the slowdown mentioned.



src/tests/resource_offers_tests.cpp (line 102)


Aren't you missing a `Clock::settle()` here?


- Joerg Schad


On March 18, 2016, 6:55 p.m., Greg Mann wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/44989/
> ---
> 
> (Updated March 18, 2016, 6:55 p.m.)



Re: Review Request 44989: Fixed a race in the resource offers tests.

2016-03-19 Thread Adam B

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/44989/#review124390
---




src/tests/resource_offers_tests.cpp (lines 84 - 85)


Perhaps this is why `CreateSlaveFlags()` used `directory.get()` (mkdtemp) 
instead of `os::getcwd()`?
If so, maybe you don't even need this patch, because you introduced the 
race in a prior patch?
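Adam's hypothesis can be illustrated with plain POSIX calls. This sketch calls `mkdtemp(3)` and `getcwd(3)` directly rather than the stout `os::` helpers used in the Mesos tests, and `makeUniqueDir` and `sharedWorkDir` are hypothetical names. Every call to `makeUniqueDir` yields a distinct directory. A `getcwd`-based default hands every agent the same path, so agents started back to back could write into the same location.

```cpp
#include <stdlib.h>  // mkdtemp (POSIX)
#include <unistd.h>  // getcwd, rmdir (POSIX)

#include <stdexcept>
#include <string>
#include <vector>

// Creates a new, uniquely named directory under `parent` via mkdtemp(3),
// which replaces the trailing "XXXXXX" in the template in place.
std::string makeUniqueDir(const std::string& parent) {
  const std::string pattern = parent + "/agent_XXXXXX";
  std::vector<char> buffer(pattern.begin(), pattern.end());
  buffer.push_back('\0');
  if (mkdtemp(buffer.data()) == nullptr) {
    throw std::runtime_error("mkdtemp failed for " + pattern);
  }
  return std::string(buffer.data());
}

// Returns the current working directory: the same path on every call,
// which is what every agent would share under a getcwd()-based default.
std::string sharedWorkDir() {
  std::vector<char> buffer(4096);
  if (getcwd(buffer.data(), buffer.size()) == nullptr) {
    throw std::runtime_error("getcwd failed");
  }
  return std::string(buffer.data());
}
```

The caller owns the directories `makeUniqueDir` creates and must clean them up, for example with `rmdir()` once they are empty.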


- Adam B


On March 18, 2016, 11:55 a.m., Greg Mann wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/44989/
> ---
> 
> (Updated March 18, 2016, 11:55 a.m.)



Re: Review Request 44989: Fixed a race in the resource offers tests.

2016-03-19 Thread Adam B


> On March 18, 2016, 2:47 a.m., Adam B wrote:
> > src/tests/resource_offers_tests.cpp, line 63
> > 
> >
> > Why pause so soon? You can wait until after the master is started, but 
> > just before you start calling StartSlave() in the loop
> 
> Joerg Schad wrote:
> I agree with you, but actually we follow this pattern in many other tests 
> as well.
> E.g. 
> // This test ensures that allocation is done per slave. This is done
> // by having 2 slaves and 2 frameworks and making sure each framework
> // gets only one slave's resources during an allocation.
> TEST_F(HierarchicalAllocatorTest, CoarseGrained)
> {
>   // Pausing the clock ensures that the batch allocation does not
>   // influence this test.
> 
> Joerg Schad wrote:
> TEST_F(HierarchicalAllocatorTest, CoarseGrained)
> {
>   // Pausing the clock ensures that the batch allocation does not
>   // influence this test.
>   Clock::pause();
> 
> Neil Conway wrote:
> Is there an advantage to delaying the pause? Running the whole test with 
> the clock paused (and hence pausing the clock at the beginning of the test) 
> seems fine to me.
> 
> Greg Mann wrote:
> Personally, I would prefer to pause the clock at the beginning of the 
> test. I think it improves readability slightly, as it announces up front that 
> this test will be run with the clock paused.

sgtm, dropping


- Adam


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/44989/#review124159
---


On March 18, 2016, 11:55 a.m., Greg Mann wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/44989/
> ---
> 
> (Updated March 18, 2016, 11:55 a.m.)



Re: Review Request 44989: Fixed a race in the resource offers tests.

2016-03-19 Thread Joerg Schad


> On March 18, 2016, 9:47 a.m., Adam B wrote:
> > src/tests/resource_offers_tests.cpp, line 63
> > 
> >
> > Why pause so soon? You can wait until after the master is started, but 
> > just before you start calling StartSlave() in the loop

I agree with you, but actually we follow this pattern in many other tests as 
well.
E.g. 
// This test ensures that allocation is done per slave. This is done
// by having 2 slaves and 2 frameworks and making sure each framework
// gets only one slave's resources during an allocation.
TEST_F(HierarchicalAllocatorTest, CoarseGrained)
{
  // Pausing the clock ensures that the batch allocation does not
  // influence this test.


- Joerg


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/44989/#review124159
---


On March 18, 2016, 9:48 a.m., Greg Mann wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/44989/
> ---
> 
> (Updated March 18, 2016, 9:48 a.m.)
> 
> 
> Review request for mesos, Adam B and Joerg Schad.
> 
> 
> Bugs: MESOS-4849
> https://issues.apache.org/jira/browse/MESOS-4849
> 
> 
> Repository: mesos
> 
> 
> Description
> ---
> 
> Fixed a race in the resource offers tests.
> 
> Adding HTTP credentials to `StartSlave` in 'src/tests/mesos.cpp' has exposed 
> a race condition in ResourceOffersTest.ResourceOfferWithMultipleSlaves. The 
> test quickly runs `StartSlave` 10 times to create 10 agents. Under the 
> covers, `StartSlave` writes data to disk, and it seems that with the 
> additional data being written to disk for HTTP credentials, the filesystem 
> operations for one `StartSlave` call were not completing before the next call.
> 
> By settling the clock in between each invocation of `StartSlave`, this patch 
> fixes the race. The test is slowed considerably, but it is now reliable.
> 
> 
> Diffs
> -
> 
>   src/tests/resource_offers_tests.cpp 
> 1cf292ee7931207596f8f06677386bef5965ef15 
> 
> Diff: https://reviews.apache.org/r/44989/diff/
> 
> 
> Testing
> ---
> 
> `GTEST_FILTER="ResourceOffersTest.ResourceOfferWithMultipleSlaves" 
> bin/mesos-tests.sh --gtest_repeat=1000 --gtest_break_on_failure=1` was used 
> to test on both OSX and Ubuntu 14.04.
> 
> 
> Thanks,
> 
> Greg Mann
> 
>



Re: Review Request 44989: Fixed a race in the resource offers tests.

2016-03-18 Thread Greg Mann


> On March 18, 2016, 9:47 a.m., Adam B wrote:
> > "The test is slowed considerably" - Can you provide some stats on how fast 
> > it is before/after the patch?

Good call; I've added timing information, and the slowdown actually wasn't as 
bad as I thought.


- Greg


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/44989/#review124159
---


On March 18, 2016, 6:55 p.m., Greg Mann wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/44989/
> ---
> 
> (Updated March 18, 2016, 6:55 p.m.)
> 
> 
> Review request for mesos, Adam B and Joerg Schad.
> 
> 
> Bugs: MESOS-4849
> https://issues.apache.org/jira/browse/MESOS-4849
> 
> 
> Repository: mesos
> 
> 
> Description
> ---
> 
> Fixed a race in the resource offers tests.
> 
> Adding HTTP credentials to `StartSlave` in 'src/tests/mesos.cpp' has exposed 
> a race condition in ResourceOffersTest.ResourceOfferWithMultipleSlaves. The 
> test quickly runs `StartSlave` 10 times to create 10 agents. Under the 
> covers, `StartSlave` writes data to disk, and it seems that with the 
> additional data being written to disk for HTTP credentials, the filesystem 
> operations for one `StartSlave` call were not completing before the next call.
> 
> By settling the clock in between each invocation of `StartSlave`, this patch 
> fixes the race. The test is slower, but it is now reliable. Running the test 
> 3 times, the old implementation gives an average runtime of 265ms, while the 
> new one runs in an average of 359ms.
> 
> 
> Diffs
> -
> 
>   src/tests/resource_offers_tests.cpp 
> 1cf292ee7931207596f8f06677386bef5965ef15 
> 
> Diff: https://reviews.apache.org/r/44989/diff/
> 
> 
> Testing
> ---
> 
> `GTEST_FILTER="ResourceOffersTest.ResourceOfferWithMultipleSlaves" 
> bin/mesos-tests.sh --gtest_repeat=1000 --gtest_break_on_failure=1` was used 
> to test on both OSX and Ubuntu 14.04.
> 
> 
> Thanks,
> 
> Greg Mann
> 
>



Re: Review Request 44989: Fixed a race in the resource offers tests.

2016-03-18 Thread Benjamin Bannier


> On March 18, 2016, 12:16 a.m., Neil Conway wrote:
> > src/tests/resource_offers_tests.cpp, line 63
> > 
> >
> > Style-wise, do we want all tests to resume the clock if they initially pause 
> > it? I think we do a mix of both.
> 
> Greg Mann wrote:
> Hmm, good question. I know that I'm responsible for some instances where 
> `resume` is *not* called at the end of the test, but now that you mention it, 
> it does seem like a good idea. I've added it here. We should do a sweep and 
> make this consistent across the tests; I could take a look next week.

Note that if an `ASSERT*` fails, a `Clock::resume` at the end of the test will 
never be called, so one would expect this kind of cleanup to be handled 
elsewhere. We already do this; see 
https://github.com/apache/mesos/blob/master/src/tests/mesos.hpp#L984 or 
https://github.com/apache/mesos/blob/master/3rdparty/libprocess/src/tests/main.cpp#L59.
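The point above is that gtest's fatal `ASSERT_*` macros return from the current function on failure, so cleanup written at the end of a test body is skipped. The linked code handles this outside the test body. As an illustration only, the sketch below contrasts a trailing resume with a scoped (RAII) guard. `clockPaused`, `pauseClock()`, `resumeClock()`, and `ScopedClockPause` are hypothetical stand-ins, not libprocess APIs, and the early `return` simulates a failed `ASSERT_*`.

```cpp
#include <cassert>

// Hypothetical flag standing in for libprocess's paused clock state.
bool clockPaused = false;

void pauseClock() { clockPaused = true; }
void resumeClock() { clockPaused = false; }

// Hypothetical RAII guard: pauses on construction and resumes when it
// goes out of scope, including on an early return.
class ScopedClockPause {
public:
  ScopedClockPause() { pauseClock(); }
  ~ScopedClockPause() { resumeClock(); }
  ScopedClockPause(const ScopedClockPause&) = delete;
  ScopedClockPause& operator=(const ScopedClockPause&) = delete;
};

// Test body that resumes at the end; `fail` simulates a failed ASSERT_*,
// which returns from the function early.
void testWithTrailingResume(bool fail) {
  pauseClock();
  if (fail) {
    return;  // The resume below is skipped and the clock stays paused.
  }
  resumeClock();
}

// Same test body using the guard.
void testWithGuard(bool fail) {
  ScopedClockPause guard;
  if (fail) {
    return;  // The guard's destructor still resumes the clock.
  }
}
```

A fixture's `TearDown()` or a gtest event listener's `OnTestEnd()` gives the same guarantee without per-test boilerplate, which is presumably why the linked code handles it centrally.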


- Benjamin


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/44989/#review124116
---


On March 18, 2016, 10:48 a.m., Greg Mann wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/44989/
> ---
> 
> (Updated March 18, 2016, 10:48 a.m.)
> 
> 
> Review request for mesos, Adam B and Joerg Schad.
> 
> 
> Bugs: MESOS-4849
> https://issues.apache.org/jira/browse/MESOS-4849
> 
> 
> Repository: mesos
> 
> 
> Description
> ---
> 
> Fixed a race in the resource offers tests.
> 
> Adding HTTP credentials to `StartSlave` in 'src/tests/mesos.cpp' has exposed 
> a race condition in ResourceOffersTest.ResourceOfferWithMultipleSlaves. The 
> test quickly runs `StartSlave` 10 times to create 10 agents. Under the 
> covers, `StartSlave` writes data to disk, and it seems that with the 
> additional data being written to disk for HTTP credentials, the filesystem 
> operations for one `StartSlave` call were not completing before the next call.
> 
> By settling the clock in between each invocation of `StartSlave`, this patch 
> fixes the race. The test is slowed considerably, but it is now reliable.
> 
> 
> Diffs
> -
> 
>   src/tests/resource_offers_tests.cpp 
> 1cf292ee7931207596f8f06677386bef5965ef15 
> 
> Diff: https://reviews.apache.org/r/44989/diff/
> 
> 
> Testing
> ---
> 
> `GTEST_FILTER="ResourceOffersTest.ResourceOfferWithMultipleSlaves" 
> bin/mesos-tests.sh --gtest_repeat=1000 --gtest_break_on_failure=1` was used 
> to test on both OSX and Ubuntu 14.04.
> 
> 
> Thanks,
> 
> Greg Mann
> 
>