Github user mgaido91 commented on a diff in the pull request:

    https://github.com/apache/spark/pull/22631#discussion_r223226770
  
    --- Diff: 
sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/expressions/CastSuite.scala
 ---
    @@ -110,7 +112,7 @@ class CastSuite extends SparkFunSuite with 
ExpressionEvalHelper {
       }
     
       test("cast string to timestamp") {
    -    for (tz <- ALL_TIMEZONES) {
    +    for (tz <- Random.shuffle(ALL_TIMEZONES).take(50)) {
    --- End diff --
    
    Yes, there are many tests where data is randomly generated. And they are 
not seeded of course.
    
    As I said, I think the goal here is to test that the function works well 
with different timezones: then picking a subset of timezones would be fine too, 
but I prefer taking them randomly among all because if there is a single 
timezone creating issues (very unlikely IMHO), we would discover it anyway (not 
on the single run though).
    
    Moreover, it would be great then to be consistent among all the codebase on 
what we test. In `DateExpressionsSuite` we test only 3 timezones and here we 
test all 650: it is a weird, isn't it? We should probably define which is the 
right thing to do when timezones are involved and test always the same. 
Otherwise, testing 650 timezones on a single specific function and 3 on the 
most of the others is a nonsense IMHO.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

Reply via email to