Github user mgaido91 commented on a diff in the pull request:

    https://github.com/apache/spark/pull/22631#discussion_r223226770

    --- Diff: sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/expressions/CastSuite.scala ---
    @@ -110,7 +112,7 @@ class CastSuite extends SparkFunSuite with ExpressionEvalHelper {

       test("cast string to timestamp") {
    -    for (tz <- ALL_TIMEZONES) {
    +    for (tz <- Random.shuffle(ALL_TIMEZONES).take(50)) {
    --- End diff --

    Yes, there are many tests where data is randomly generated, and of course they are not seeded. As I said, I think the goal here is to test that the function works well with different timezones: picking a subset of timezones would be fine too, but I prefer taking them randomly from the full set, because if there is a single timezone causing issues (very unlikely, IMHO) we would still discover it eventually (though not on a single run). Moreover, it would be great to be consistent across the codebase on what we test. In `DateExpressionsSuite` we test only 3 timezones, while here we test all 650: that is weird, isn't it? We should probably decide what the right approach is when timezones are involved and always test the same way. Otherwise, testing 650 timezones on one specific function and only 3 on most of the others makes no sense, IMHO.
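The seeding concern raised above could be mitigated by logging the seed, so a failing random subset of timezones is reproducible. A minimal sketch, assuming the timezone IDs come from `java.util.TimeZone.getAvailableIDs` (a stand-in for the suite's `ALL_TIMEZONES` constant, which is not shown here):

```scala
import scala.util.Random

object SeededTimezoneSample {
  def main(args: Array[String]): Unit = {
    // Hypothetical approach: pick the seed once and print it, so a failure
    // on a particular random subset of timezones can be reproduced later.
    val seed = System.nanoTime()
    println(s"timezone sample seed = $seed")

    val rng = new Random(seed)
    // Stand-in for the suite's ALL_TIMEZONES constant.
    val allTimezones: Seq[String] = java.util.TimeZone.getAvailableIDs.toSeq

    // Same sampling strategy as in the diff, but with a reproducible seed.
    val sampled = rng.shuffle(allTimezones).take(50)
    assert(sampled.size == 50)
    assert(sampled.distinct.size == sampled.size)
  }
}
```

With a fixed seed (e.g. `new Random(42L)`), the same 50 timezones are selected on every run, which makes a one-off failure debuggable without giving up the random coverage argued for above.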