Github user mgaido91 commented on a diff in the pull request: https://github.com/apache/spark/pull/22631#discussion_r223292455 --- Diff: sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/expressions/CastSuite.scala --- @@ -110,7 +112,7 @@ class CastSuite extends SparkFunSuite with ExpressionEvalHelper { } test("cast string to timestamp") { - for (tz <- ALL_TIMEZONES) { + for (tz <- Random.shuffle(ALL_TIMEZONES).take(50)) { --- End diff -- I don't think that adding parallelism is a good way for improve test time. The amount of resources used for testing is anyway limited. I think the goal here is not (only) reduce the overall time of the test but also reduce the amount of resources needed to test. Problems with a specific timezone like you mentioned, @HyukjinKwon, are exactly the reason why I am proposing this randomized approach, rather than picking 3 timezones and always use them as done in `DateExpressionsSuite`: if there is a problem with a specific timezone, in this way, we will be able to catch it. With a fixed subset of them (even though not on the single run), we are not. The only safe deterministic way would be to run against all of them, as it was done before, but then I'd argue that we should do the same everywhere we have different timezones involved in tests (why are we testing all timezones only for casting to timestamp and not for all other functions involving dates and times, if it is so important to check all timezones?). But then the amount of time needed to run all the tests would be crazy, so it is not doable.
--- --------------------------------------------------------------------- To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org