Re: Random sampling in tests

Marco Gaido Mon, 08 Oct 2018 07:37:17 -0700

Yes, I see. It makes sense.
Thanks.

On Mon, Oct 8, 2018 at 4:35 PM Reynold Xin <r...@databricks.com> wrote:


> Marco - the issue is reproducibility. It is much more annoying for somebody
> else, who might not have touched this test case, to reproduce the error
> given only a timezone. It is much easier to just follow some
> documentation saying "please run TEST_SEED=5 build/sbt ~.... ".
>
>
> On Mon, Oct 8, 2018 at 4:33 PM Marco Gaido <marcogaid...@gmail.com> wrote:
>
>> Hi all,
>>
>> thanks for bringing up the topic, Sean. I also agree with Reynold's idea,
>> but in this specific case, if there is an error, the timezone is part of
>> the error message.
>> So we know exactly which timezone caused the failure. Hence I thought
>> that logging the seed was not necessary, as we can directly use the
>> failing timezone.
>>
>> Thanks,
>> Marco
>>
>> On Mon, Oct 8, 2018 at 4:24 PM Xiao Li <lix...@databricks.com> wrote:
>>
>>> For this specific case, I do not think we should test all the timezones.
>>> If this were fast, I would be fine leaving it unchanged. However, it is
>>> very slow. Thus, I would even prefer reducing the tested timezones to a
>>> smaller number, or just hardcoding some specific time zones.
>>>
>>> In general, I like Reynold’s idea of including the seed value, and we
>>> could add the seed to the test case name. This can help us reproduce it.
>>>
>>> Xiao
>>>
>>> On Mon, Oct 8, 2018 at 7:08 AM Reynold Xin <r...@databricks.com> wrote:
>>>
>>>> I'm personally not a big fan of doing it that way in the PR. It is
>>>> perfectly fine to employ randomized tests, and in this case it might even
>>>> be fine to just pick a couple of different timezones, as was done in
>>>> the PR, but we should:
>>>>
>>>> 1. Document in the code comment why we did it that way.
>>>>
>>>> 2. Use a seed and log the seed, so any test failures can be reproduced
>>>> deterministically. For this one, it'd be better to pick the seed from an
>>>> environment variable. If the variable is not set, fall back to a
>>>> random seed.
>>>>
>>>>
>>>>
>>>> On Mon, Oct 8, 2018 at 3:05 PM Sean Owen <sro...@gmail.com> wrote:
>>>>
>>>>> Recently, I've seen 3 pull requests that try to speed up a test suite
>>>>> that tests a bunch of cases by randomly choosing different subsets of
>>>>> cases to test on each Jenkins run.
>>>>>
>>>>> There's disagreement about whether this is a good approach to improving
>>>>> test runtime. Here's a discussion on one that was committed:
>>>>> https://github.com/apache/spark/pull/22631/files#r223190476
>>>>>
>>>>> I'm flagging it for more input.
>>>>>
