Github user mgaido91 commented on a diff in the pull request:

    https://github.com/apache/spark/pull/22631#discussion_r223202913
  
    --- Diff: 
sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/expressions/CastSuite.scala
 ---
    @@ -110,7 +112,7 @@ class CastSuite extends SparkFunSuite with 
ExpressionEvalHelper {
       }
     
       test("cast string to timestamp") {
    -    for (tz <- ALL_TIMEZONES) {
    +    for (tz <- Random.shuffle(ALL_TIMEZONES).take(50)) {
    --- End diff --
    
    @srowen I think the point here is to test that this works with different 
timezones. In `DateExpressionsSuite`, for instance, we test only 3-4 timezones. 
I don't think it makes sense to test every of the 650 possible timezones: if it 
works with many of them, then it means that the code is generic and respects 
timezones. We can also define a fixed subset of timezones, but IMHO taking 
randomly some of them provides the additional safety that if there is a 
specific single timezone creating problem, we are able to identify it on 
several subsequent runs.
    
    We have many place where we generate data randomly in the test, so we 
already have randomness in the tests. I think the rationale behind them is the 
same: if the function works with some different data, then it generalize 
properly.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

Reply via email to