On Mon, Jan 16, 2012 at 10:35 PM, Henry Robinson <he...@cloudera.com> wrote:
> On 16 January 2012 17:36, Patrick Hunt <ph...@apache.org> wrote:
>
>> On Sun, Jan 15, 2012 at 11:39 PM, Henry Robinson <he...@cloudera.com> wrote:
>> > Hi -
>> >
>> > The unit tests are taking longer and longer to run, particularly locally. I
>> > was poking about looking for some easy wins, and I noticed that a lot of
>> > the time is spent waiting for servers to come up, which is heavily
>> > dependent on the tick time. Lo and behold, dropping the tick time on (for
>> > example) QuorumPeerMainTest from 4s to 100ms made the test suite quicker by
>> > 30s.
>> >
>> > On builds.apache.org it's not a great idea to reduce the tick time too far
>> > because it generally runs on more contended hardware so timeouts get hit,
>> > but what if we just increase the session expiration time commensurately? We
>> > could set a 500ms tick time with a 30s (or more) max session expiration
>> > time. Latencies due to waiting for servers to start should be lower, but
>> > the tests should remain as stable.
>> >
>> > Any thoughts? Any other ways we can tighten up the test suite runtime?
>>
>> I'd be concerned that we were testing with a different setting than
>> most users set. Would we be more or less likely to find issues by
>> setting this lower?
>>
>
> That's a good point, but I don't know that we can really say the tests as
> they stand are at all representative of what real users are doing. The unit
> tests have ensembles co-located on the same machine, with very synthetic
> workloads - I don't think they mimic production environments, nor should
> they.
>

That's true. Really we should run the tests with various tick time
settings. Currently we happen to use 2 sec, which is pretty consistent
with what I've seen users do in production. It's also what we have
in the sample config.

> Many of the tests start a cluster and then wait for some condition to be
> true, or a timeout to occur (then aborting). I'm suggesting keeping most of
> the timeouts to be similar lengths, but to poll more frequently so that we
> don't waste time waiting to wake up to check the condition, if that makes
> sense.
>

Yes, that makes sense. This is a good idea - what if we made the
tickTime a parameter to the tests (system prop or something) with a
default that's the current default? We could then set the tick time
lower for, say, "test-commit" and higher (what we currently have set,
the default) for "test" when we run as part of CI or the patch build.
We could even have other jobs at some point that vary the tick time
more widely...
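
Roughly what I have in mind for the system prop, as a minimal sketch
only. The property name "zookeeper.test.tickTime" and the class name
are made up for illustration (nothing like this exists in the tree
today), and the 2000 ms default is an assumption taken from the 2 sec
value mentioned above:

```java
// Hypothetical test helper: resolves the tick time for test servers
// from a JVM system property, falling back to the current default.
public class TestTickTime {
    // Assumed property name, for illustration only.
    static final String TICK_TIME_PROP = "zookeeper.test.tickTime";

    // Assumed default: the 2 sec the tests currently happen to use.
    static final int DEFAULT_TICK_TIME_MS = 2000;

    // Returns the configured tick time in milliseconds. Falls back to
    // the default when the property is unset, unparseable, or not
    // a positive number.
    static int tickTimeMs() {
        int value = Integer.getInteger(TICK_TIME_PROP, DEFAULT_TICK_TIME_MS);
        return value > 0 ? value : DEFAULT_TICK_TIME_MS;
    }

    public static void main(String[] args) {
        System.out.println("tickTime(ms) = " + tickTimeMs());
    }
}
```

A quick target would then pass something like
-Dzookeeper.test.tickTime=500 to the test JVM, while the CI and patch
builds would leave the property unset and keep the current behaviour.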

> I like your idea of splitting tests into categories. I think a lot of the
> current tests should exist in the test-commit category but currently take a
> bit long to run. The 'hammer' tests are great examples of tests that should
> be in the full suite, since they're not really testing for a specific
> property, but the QuorumPeerMain tests are mostly testing a very specific
> scenario.
>

Agree.

> I filed ZOOKEEPER-1363 to deal with splitting the tests up by category
> (ZOOKEEPER-725, the only other place I saw this mentioned, is a bit more
> general).

Sounds good!

Patrick
