Marco - the issue is reproducibility. It is much more annoying for somebody else, who might never have touched this test case, to have to reproduce the error given just a timezone. It is much easier to follow some documentation saying "please run TEST_SEED=5 build/sbt ~....".
On Mon, Oct 8, 2018 at 4:33 PM Marco Gaido <marcogaid...@gmail.com> wrote:

> Hi all,
>
> thanks for bringing up the topic, Sean. I too agree with Reynold's idea, but in this specific case, if there is an error, the timezone is part of the error message, so we know exactly which timezone caused the failure. Hence I thought that logging the seed is not necessary, as we can directly use the failing timezone.
>
> Thanks,
> Marco
>
> On Mon, Oct 8, 2018 at 4:24 PM Xiao Li <lix...@databricks.com> wrote:
>
>> For this specific case, I do not think we should test all the timezones. If this were fast, I would be fine with leaving it unchanged. However, it is very slow. Thus, I would even prefer reducing the tested timezones to a smaller number, or just hardcoding some specific time zones.
>>
>> In general, I like Reynold's idea of including the seed value, and we could add the seed to the test case name. This can help us reproduce failures.
>>
>> Xiao
>>
>> On Mon, Oct 8, 2018 at 7:08 AM Reynold Xin <r...@databricks.com> wrote:
>>
>>> I'm personally not a big fan of doing it that way in the PR. It is perfectly fine to employ randomized tests, and in this case it might even be fine to just pick a couple of different timezones the way it happened in the PR, but we should:
>>>
>>> 1. Document in a code comment why we did it that way.
>>>
>>> 2. Use a seed and log the seed, so any test failure can be reproduced deterministically. For this one, it'd be better to pick the seed from a seed environment variable. If the env variable is not set, use a random seed.
>>>
>>> On Mon, Oct 8, 2018 at 3:05 PM Sean Owen <sro...@gmail.com> wrote:
>>>
>>>> Recently, I've seen 3 pull requests that try to speed up a test suite that tests a bunch of cases by randomly choosing different subsets of cases to test on each Jenkins run.
>>>>
>>>> There's disagreement about whether this is a good approach to improving test runtime. Here's a discussion on one that was committed: https://github.com/apache/spark/pull/22631/files#r223190476
>>>>
>>>> I'm flagging it for more input.
>>>>
>>>> ---------------------------------------------------------------------
>>>> To unsubscribe e-mail: dev-unsubscr...@spark.apache.org
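For readers following along: the scheme Reynold describes (read a seed from an environment variable, fall back to a random one, and log it so a failing random subset can be re-run deterministically) could be sketched roughly as below. This is a hypothetical illustration, not the actual Spark test code; the names `SeededSubset`, `resolveSeed`, `sample`, and the `TEST_SEED` variable are assumptions for the example.

```scala
import scala.util.Random

object SeededSubset {
  // Resolve the seed: prefer TEST_SEED from the environment when present,
  // otherwise fall back to a freshly generated random seed.
  def resolveSeed(env: Map[String, String] = sys.env): Long =
    env.get("TEST_SEED").map(_.toLong).getOrElse(new Random().nextLong())

  // Deterministically choose n cases for this run, logging the seed so a
  // failure can be reproduced later with TEST_SEED=<seed>.
  def sample[T](cases: Seq[T], n: Int, seed: Long): Seq[T] = {
    println(s"Sampling $n of ${cases.size} test cases with TEST_SEED=$seed")
    new Random(seed).shuffle(cases).take(n)
  }
}
```

With this shape, a normal CI run samples under a logged random seed, while `TEST_SEED=5 build/sbt ...` replays exactly the subset that failed.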