Re: Test failures are out of control......

Erick Erickson Sun, 25 Feb 2018 20:55:21 -0800

That's fine. I'm not totally clear on what the "anti-regression" path
forward is. This should make tests less flaky, right? I'd guess that
if we run with badapples=true and some tests don't fail for a while,
we'll try un-BadAppling those tests as time passes.


Erick

P.S. Besides, it's already done ;)

On Sun, Feb 25, 2018 at 12:55 PM, Mikhail Khludnev <[email protected]> wrote:
> TLDR;
> I'm going to push https://issues.apache.org/jira/browse/SOLR-12027 in a day.
> Let me know if you think it's a bad idea.
>
> On Fri, Feb 23, 2018 at 8:06 PM, Erick Erickson <[email protected]>
> wrote:
>>
>> Testing distributed systems requires, well, distributed systems which
>> is what starting clusters is all about. The great leap of faith of
>> individual-method unit testing is that if all the small parts are
>> tested, combining them in various ways will "just work". This is
>> emphatically not true with distributed systems.
>>
>> Which is also one of the reasons some of the tests are long. It takes
>> time (as you pointed out) to set up a cluster. So once a cluster is
>> started, testing a bunch of things amortizes the expense of setting up
>> the cluster. If each test of some bit of distributed functionality set
>> up and tore down a cluster, that would extend the time it takes to run
>> a full test suite by quite a bit. Note this is mostly a problem in
>> Solr; Lucene tests tend to run much faster.
>>
>> What Dawid said about randomness. All the randomization functions are
>> controlled by the "seed"; that's what the "reproduce with" line in the
>> results is all about. That "controlled randomization" has uncovered
>> any number of bugs in obscure cases that would have been vastly more
>> painful to discover otherwise. One example I remember went along the
>> lines of "this particular functionality is broken when operating system
>> X thinks it's in the Turkish locale". Which is _also_ why all tests must
>> use the framework random() method provided by LuceneTestCase and never
>> the Java random functions.
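
A minimal sketch of why a fixed seed makes a run reproducible, assuming plain java.util.Random stands in for the framework's random source. The class and method names (SeededRandomSketch, randomInput) are hypothetical, and this is not how LuceneTestCase is implemented; real tests should still call the framework random() as described above.

```java
import java.util.Random;

// Hypothetical illustration only: plain java.util.Random stands in for the
// test framework's seeded random source.
class SeededRandomSketch {

    // Builds an 8-letter "random" test input that depends only on the seed.
    static String randomInput(long seed) {
        Random random = new Random(seed);
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < 8; i++) {
            sb.append((char) ('a' + random.nextInt(26)));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        long seed = 12345L; // arbitrary value; a real run would report its own seed

        // Re-running with the same seed regenerates exactly the same input,
        // which is what lets a reported failure be replayed later.
        String first = randomInput(seed);
        String replay = randomInput(seed);
        System.out.println("seed=" + seed + " input=" + first);
        System.out.println("replay matches: " + first.equals(replay));
    }
}
```

The point of the sketch is only that every random choice flows from one seed, so recording the seed is enough to replay the whole scenario.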
>>
>> For that matter, one _other_ problem uncovered by the randomness is
>> that tests in a suite are executed in different order with different
>> seeds, so side effects of one test method that would affect another
>> are flushed out.
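
The ordering effect can be sketched the same way, assuming a seeded shuffle decides the execution order. ShuffledOrderSketch, executionOrder and the test method names are made up for illustration and are not the framework's actual mechanism.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.Random;

// Hypothetical illustration only: a seeded shuffle stands in for whatever
// the real test runner does to order the methods of a suite.
class ShuffledOrderSketch {

    // Returns the test methods in an order determined solely by the seed.
    // The input list is copied, so the caller's list is left untouched.
    static List<String> executionOrder(List<String> testMethods, long seed) {
        List<String> order = new ArrayList<>(testMethods);
        Collections.shuffle(order, new Random(seed));
        return order;
    }

    public static void main(String[] args) {
        List<String> tests =
            Arrays.asList("testAdd", "testDelete", "testQuery", "testCommit");

        // Each seed fixes one order; replaying a seed replays that order, so a
        // failure caused by one method's side effects can be reproduced.
        for (long seed = 1; seed <= 3; seed++) {
            System.out.println("seed=" + seed + " -> " + executionOrder(tests, seed));
        }
    }
}
```

If testDelete only passes when testAdd has already run, some seed will eventually order them the other way and expose the hidden dependency.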
>>
>> Mind you, this doesn't help with race conditions that are sensitive
>> to, say, the clock speed of the machine you're running on....
>>
>> All that said, there's plenty of room for improving our tests. I'm
>> sure there are tests that spin up a cluster that don't need to. All
>> patches welcome of course.
>>
>> Best,
>> Erick
>>
>>
>>
>> On Fri, Feb 23, 2018 at 8:20 AM, Dawid Weiss <[email protected]>
>> wrote:
>> >> Randomness makes it difficult to correlate a failure with the commit
>> >> that made the test fail (as was pointed out earlier in the
>> >> discussion). If each execution path is different, it may very well be
>> >> that a failure you experience was introduced several commits ago, so
>> >> it may not be your fault.
>> >
>> > This is true only to a certain degree. If you don't randomize, all you
>> > do is essentially run a fixed scenario. This protects you against a
>> > regression in this particular state, but it doesn't help in
>> > discovering new corner cases or environment quirks, which would be
>> > prohibitive to run as a full Cartesian product of all possibilities.
>> > So there is a tradeoff here and most folks in this project have agreed
>> > to it. If you look at how many problems randomization has helped
>> > discover I think it's a good tradeoff.
>> >
>> > Finally: your scenario can actually be reproduced with ease. Run the
>> > tests with a fixed seed before you apply a patch and after you apply
>> > it... if there is no regression you can assume your patch is fine (but
>> > it doesn't mean it won't fail later on with a different seed, which
>> > nobody will blame you for).
>> >
>> > Dawid
>> >
>> > ---------------------------------------------------------------------
>> > To unsubscribe, e-mail: [email protected]
>> > For additional commands, e-mail: [email protected]
>> >
>>
>
>
>
> --
> Sincerely yours
> Mikhail Khludnev
