Re: Review Request 54889: Handled all possible offers in test.

Neil Conway Wed, 21 Dec 2016 08:41:28 -0800


> On Dec. 20, 2016, 4:41 p.m., Neil Conway wrote:
> > Can we add an expectation for exactly when we expect to receive the offer? 
> > i.e., another `WillOnce(FutureSatisfy(...))` and then wait for the future 
> > at the appropriate time.
> > 
> > The thing I don't like about the `WillRepeatedly(...)` pattern is that it 
> > obscures whether additional offers are actually expected or not. Lots of 
> > tests have copied this pattern and use it, even when another offer is not 
> > actually expected.
> 
> Benjamin Bannier wrote:
>     This test only uses a single offer, and we do not know when or even if a 
> second offer would arrive since the test does not run with a paused clock, 
> https://github.com/apache/mesos/blob/1e72605e9892eb4e518442ab9c1fe2a1a1696748/src/tests/fault_tolerance_tests.cpp#L853.
>  If execution on the thread running in the test body is delayed sufficiently 
> after the clock is resumed, additional offers might be made concurrently with 
> that part of the test body.
>     
>     It looks to me that as a general rule of thumb, if tests do not pause the 
> clock, one should always ignore subsequent calls to mock interfaces in order 
> to avoid this gmock assertion.
> 
> Neil Conway wrote:
>     Right, although there are plenty of other situations in which _arbitrary_ 
> delays in tests will cause failures (e.g., exceeding the registry read/write 
> timeouts, or even causing "sleep 60" dummy tasks to exit). So I'm not sure 
> that this problem is unique to making offers.
>     
>     It would be good to explore pausing the clock by default for all tests 
> (MESOS-4101). But in the short term, what about making the default allocation 
> interval much higher for tests? Say 60 seconds. Test code should generally 
> not be depending on batch allocations anyway (because it needlessly slows 
> down the test), so this would make such poorly written tests easier to find.
> 
> Benjamin Bannier wrote:
>     Yes, that makes sense.
>     
>     Would you mind taking this over? You already started fixing this test (it 
> wasn't pretty before, but also not flaky then), but we learnt in the meantime 
> and it would be great if we would fix this test completely. Like I already 
> mentioned to you offline, it would be a good idea to also think a little 
> harder about your approach to comparing doubles 
> (https://github.com/apache/mesos/blob/d2117362349ab4c383045720f77d42b2d9fd6871/src/tests/fault_tolerance_tests.cpp#L881-L888),
>  see https://issues.apache.org/jira/browse/MESOS-4695 for the kind of 
> failures this pattern can cause.


I think the current fix is fine for now. I'll take a look at the floating point 
comparison stuff and perhaps changing the `allocation_interval` later.


- Neil


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/54889/#review159727
-----------------------------------------------------------


On Dec. 20, 2016, 10:15 p.m., Benjamin Bannier wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/54889/
> -----------------------------------------------------------
> 
> (Updated Dec. 20, 2016, 10:15 p.m.)
> 
> 
> Review request for mesos, Alexander Rukletsov and Neil Conway.
> 
> 
> Bugs: MESOS-6820
>     https://issues.apache.org/jira/browse/MESOS-6820
> 
> 
> Repository: mesos
> 
> 
> Description
> -------
> 
> The test FaultToleranceTest.FrameworkReregister observes a single
> offer in order to monitor progress and as a trigger to initiate other
> actions later in the test. The change installs expectations for
> possible additional offers. This allows for the thread running the
> test body to be delayed with respect to the master thread which could
> in the meantime make additional offers.
> 
> 
> Diffs
> -----
> 
>   src/tests/fault_tolerance_tests.cpp 
> 05937a917a2c175aa53b52488febb7cfd8400a13 
> 
> Diff: https://reviews.apache.org/r/54889/diff/
> 
> 
> Testing
> -------
> 
> `make check` (OS X, clang-trunk, w/ optimizations, SSL)
> 
> Before this fix `FaultToleranceTest.FrameworkReregister` failed for me 
> multiple times in ~10k iterations; after this patch it didn't fail in a 
> single run I stopped after 170k iterations.
> 
> 
> Thanks,
> 
> Benjamin Bannier
> 
>

Re: Review Request 54889: Handled all possible offers in test.

Reply via email to