Github user mgaido91 commented on a diff in the pull request:

    https://github.com/apache/spark/pull/22631#discussion_r223292455
  
    --- Diff: 
sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/expressions/CastSuite.scala
 ---
    @@ -110,7 +112,7 @@ class CastSuite extends SparkFunSuite with 
ExpressionEvalHelper {
       }
     
       test("cast string to timestamp") {
    -    for (tz <- ALL_TIMEZONES) {
    +    for (tz <- Random.shuffle(ALL_TIMEZONES).take(50)) {
    --- End diff --
    
    I don't think that adding parallelism is a good way for improve test time. 
The amount of resources used for testing is anyway limited. I think the goal 
here is not (only) reduce the overall time of the test but also reduce the 
amount of resources needed to test.
    
    Problems with a specific timezone like you mentioned, @HyukjinKwon, are 
exactly the reason why I am proposing this randomized approach, rather than 
picking 3 timezones and always use them as done in  `DateExpressionsSuite`: if 
there is a problem with a specific timezone, in this way, we will be able to 
catch it. With a fixed subset of them (even though not on the single run), we 
are not.
    
    The only safe deterministic way would be to run against all of them, as it 
was done before, but then I'd argue that we should do the same everywhere we 
have different timezones involved in tests (why are we testing all timezones 
only for casting to timestamp and not for all other functions involving dates 
and times, if it is so important to check all timezones?). But then the amount 
of time needed to run all the tests would be crazy, so it is not doable.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

Reply via email to