Github user srowen commented on a diff in the pull request:

    https://github.com/apache/spark/pull/22631#discussion_r223224573

    --- Diff: sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/expressions/CastSuite.scala ---
    @@ -110,7 +112,7 @@ class CastSuite extends SparkFunSuite with ExpressionEvalHelper {
       }

       test("cast string to timestamp") {
    -    for (tz <- ALL_TIMEZONES) {
    +    for (tz <- Random.shuffle(ALL_TIMEZONES).take(50)) {
    --- End diff --

    Tests should ideally be deterministic; any sources of randomness should be seeded. Do you see one here that isn't? I think this is like deciding we'll randomly run just 90% of all test suites each time to save time, which is well against good practice. There are other solutions:

    1) pick a subset of timezones that we're confident exercises the code, and explicitly test just those
    2) parallelize these tests within the test suite

    The latter should be trivial in this case: `ALL_TIMEZONES.par.foreach { tz =>` instead. It's the same amount of work, but 8x or 16x faster by wall-clock time, depending on how many cores are available. What about that?
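A minimal sketch of the parallelization idea suggested above. The review proposes Scala parallel collections (`.par`), which is available out of the box on Scala 2.12; on 2.13+ it needs the separate `scala-parallel-collections` module, so this standalone sketch fans out one `Future` per timezone instead, which achieves the same effect with only the standard library. `ParTimezoneCheck` and the body of the loop are illustrative stand-ins, not code from the actual `CastSuite`.

```scala
import java.util.TimeZone
import scala.concurrent.{Await, Future}
import scala.concurrent.ExecutionContext.Implicits.global
import scala.concurrent.duration.Duration

object ParTimezoneCheck {
  // Stand-in for CastSuite's ALL_TIMEZONES: every timezone the JVM knows.
  val ALL_TIMEZONES: Seq[TimeZone] =
    TimeZone.getAvailableIDs.toSeq.map(TimeZone.getTimeZone)

  def main(args: Array[String]): Unit = {
    // Fan out one Future per timezone: the total work is unchanged and
    // every timezone is still checked deterministically, but wall-clock
    // time shrinks roughly with the number of available cores.
    val checks = ALL_TIMEZONES.map { tz =>
      Future {
        // ... the per-timezone assertions of the original test would go here ...
        assert(tz.getID.nonEmpty)
      }
    }
    // Propagate any test failure from the worker threads.
    Await.result(Future.sequence(checks), Duration.Inf)
    println(s"checked ${ALL_TIMEZONES.size} timezones")
  }
}
```

With `.par` on Scala 2.12 the change is even smaller: `for (tz <- ALL_TIMEZONES)` becomes `ALL_TIMEZONES.par.foreach { tz => ... }` with the loop body unchanged.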