Yes, I see. It makes sense. Thanks.

On Mon, Oct 8, 2018 at 16:35 Reynold Xin <r...@databricks.com> wrote:
> Marco - the issue is reproducibility. It is much more annoying for somebody
> else who might not have touched this test case to reproduce the error
> given only a timezone. It is much easier to just follow some
> documentation saying "please run TEST_SEED=5 build/sbt ~.... ".
>
> On Mon, Oct 8, 2018 at 4:33 PM Marco Gaido <marcogaid...@gmail.com> wrote:
>
>> Hi all,
>>
>> Thanks for bringing up the topic, Sean. I agree with Reynold's idea too,
>> but in this specific case, if there is an error, the timezone is part of
>> the error message.
>> So we know exactly which timezone caused the failure. Hence I thought
>> that logging the seed was not necessary, as we can directly use the
>> failing timezone.
>>
>> Thanks,
>> Marco
>>
>> On Mon, Oct 8, 2018 at 16:24 Xiao Li <lix...@databricks.com> wrote:
>>
>>> For this specific case, I do not think we should test all the timezones.
>>> If this were fast, I would be fine leaving it unchanged. However, it is
>>> very slow. Thus, I would even prefer reducing the tested timezones to a
>>> smaller number or just hardcoding some specific time zones.
>>>
>>> In general, I like Reynold’s idea of including the seed value and adding
>>> the seed name to the test case name. This can help us reproduce failures.
>>>
>>> Xiao
>>>
>>> On Mon, Oct 8, 2018 at 7:08 AM Reynold Xin <r...@databricks.com> wrote:
>>>
>>>> I'm personally not a big fan of doing it that way in the PR. It is
>>>> perfectly fine to employ randomized tests, and in this case it might even
>>>> be fine to just pick a couple of different timezones the way it happened
>>>> in the PR, but we should:
>>>>
>>>> 1. Document in the code comment why we did it that way.
>>>>
>>>> 2. Use a seed and log the seed, so any test failures can be reproduced
>>>> deterministically. For this one, it'd be better to pick the seed from a
>>>> seed environment variable. If the env variable is not set, set it to a
>>>> random seed.
>>>>
>>>> On Mon, Oct 8, 2018 at 3:05 PM Sean Owen <sro...@gmail.com> wrote:
>>>>
>>>>> Recently, I've seen 3 pull requests that try to speed up a test suite
>>>>> that tests a bunch of cases by randomly choosing different subsets of
>>>>> cases to test on each Jenkins run.
>>>>>
>>>>> There's disagreement about whether this is a good approach to improving
>>>>> test runtime. Here's a discussion on one that was committed:
>>>>> https://github.com/apache/spark/pull/22631/files#r223190476
>>>>>
>>>>> I'm flagging it for more input.
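
For readers following the thread, Reynold's second point (take the seed from an environment variable, fall back to a random seed, and log it so a failure can be replayed) can be sketched roughly as below. This is only an illustration in Python, not the code from the PR or anything in Spark's test suites. TEST_SEED is the variable name used as an example in the thread; resolve_seed, pick_timezones, and the CANDIDATE_TIMEZONES list are hypothetical names made up for this sketch.

```python
import os
import random

# Hypothetical candidate list for illustration only; a real suite would
# enumerate whatever time zones it actually wants to cover.
CANDIDATE_TIMEZONES = [
    "UTC",
    "America/Los_Angeles",
    "Europe/Rome",
    "Asia/Kolkata",
    "Asia/Tokyo",
    "Australia/Sydney",
]


def resolve_seed(env=None, var_name="TEST_SEED"):
    """Return the seed from the environment, or a fresh random one if unset."""
    env = os.environ if env is None else env
    raw = env.get(var_name)
    if raw is not None:
        return int(raw)
    # No seed supplied: draw one at random so each run covers a different subset.
    return random.SystemRandom().randrange(2**31)


def pick_timezones(seed, count=3, candidates=CANDIDATE_TIMEZONES):
    """Pick `count` distinct time zones; the same seed always gives the same pick."""
    rng = random.Random(seed)
    # Sort first so the result does not depend on the order of `candidates`.
    return rng.sample(sorted(candidates), count)


if __name__ == "__main__":
    seed = resolve_seed()
    # Log the seed so a failing run can be repeated with TEST_SEED=<seed>.
    print(f"Using TEST_SEED={seed}")
    for tz in pick_timezones(seed):
        print(f"would run the time zone test for {tz}")
```

With something along these lines, a failing run that logged its seed can be repeated by exporting the same TEST_SEED value before rerunning, which addresses the reproducibility concern without requiring the reader to know which timezone to plug in.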