Github user HyukjinKwon commented on a diff in the pull request:

    https://github.com/apache/spark/pull/22631#discussion_r223285813
  
    --- Diff: sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/expressions/CastSuite.scala ---
    @@ -110,7 +112,7 @@ class CastSuite extends SparkFunSuite with ExpressionEvalHelper {
       }
     
       test("cast string to timestamp") {
    -    for (tz <- ALL_TIMEZONES) {
    +    for (tz <- Random.shuffle(ALL_TIMEZONES).take(50)) {
    --- End diff --
    
    I think tests need to be deterministic in general as well.
    
    In this particular case, ideally we should categorize the timezones and test a fixed set: for instance, timezones with DST, timezones without DST, and some known exceptions, such as the case Python 3.6 addressed recently (https://github.com/python/cpython/blob/e42b705188271da108de42b55d9344642170aa2b/Lib/datetime.py#L1572-L1574), IMHO.
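    
    A minimal sketch of what that categorization could look like (the category names and the specific zone IDs below are illustrative assumptions, not a concrete proposal for `CastSuite`):
    
    ```scala
    import java.util.TimeZone
    
    // Illustrative fixed categories; the specific zone IDs are assumptions.
    val withDst    = Seq("America/Los_Angeles", "Europe/Paris")
    val withoutDst = Seq("UTC", "Asia/Tokyo")
    // Known tricky zones, e.g. an odd offset (Asia/Kathmandu, UTC+5:45) or a
    // zone that skipped a calendar day (Pacific/Apia, in 2011); cf. the
    // Python 3.6 fix linked above.
    val edgeCases  = Seq("Asia/Kathmandu", "Pacific/Apia")
    
    // The deterministic set the suite would always run against.
    val fixedTimeZones: Seq[TimeZone] =
      (withDst ++ withoutDst ++ edgeCases).map(TimeZone.getTimeZone)
    ```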
    
    Of course, this approach requires a lot of investigation and overhead, so as an alternative I would be inclined to go with Sean's approach (https://github.com/apache/spark/pull/22631/files#r223224573) for this particular case.
    
    As for randomness, I think we should primarily have a deterministic set of tests first. Maybe we could additionally have some set of randomized inputs to cover cases we haven't foreseen, but that's secondary.
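    
    If we do keep a randomized layer on top, seeding it would at least make failures reproducible. A rough sketch (the seed value is arbitrary):
    
    ```scala
    import java.util.TimeZone
    import scala.util.Random
    
    // A fixed seed makes the sampled subset identical on every run, so a
    // failing timezone can be reproduced deterministically.
    val rng = new Random(42L)
    val sampledTimeZones: Seq[String] =
      rng.shuffle(TimeZone.getAvailableIDs.toSeq).take(50)
    ```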

