Github user mgaido91 commented on a diff in the pull request:

    https://github.com/apache/spark/pull/22638#discussion_r222952924

    --- Diff: sql/core/src/test/scala/org/apache/spark/sql/DatasetCacheSuite.scala ---
    @@ -127,16 +127,16 @@ class DatasetCacheSuite extends QueryTest with SharedSQLContext with TimeLimits
       }

       test("cache UDF result correctly") {
    -    val expensiveUDF = udf({x: Int => Thread.sleep(5000); x})
    -    val df = spark.range(0, 10).toDF("a").withColumn("b", expensiveUDF($"a"))
    +    val expensiveUDF = udf({x: Int => Thread.sleep(2000); x})
    --- End diff --

    Well, I do think this will pass 100% of the time; my concern was that in case of a regression we might fail to detect it. But yes, with the repartition to 1 you're right. I hadn't considered it; otherwise the UDF calls might have run in parallel. So this seems enough.
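
To illustrate the point about partitioning, here is a minimal sketch, not the code from the pull request. It assumes a local Spark session with four cores; the object name, the `elapsedMs` helper, and the 200 ms sleep are hypothetical and chosen only to keep the run short. With a single partition the ten UDF calls should execute one after another in one task, so the total time grows with the row count; with the default partitioning the calls can be spread over several tasks and finish sooner, which is what could hide a regression behind a time limit.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.{col, udf}

object UdfPartitionTimingSketch {
  // Hypothetical helper: runs `body` and returns the elapsed wall-clock time in ms.
  def elapsedMs(body: => Unit): Long = {
    val start = System.nanoTime()
    body
    (System.nanoTime() - start) / 1000000L
  }

  def main(args: Array[String]): Unit = {
    // Assumption: a local session with 4 cores, so default partitioning can run tasks in parallel.
    val spark = SparkSession.builder()
      .master("local[4]")
      .appName("udf-partition-timing-sketch")
      .getOrCreate()

    // A slow identity UDF; 200 ms instead of the test's 2000 ms to keep the sketch quick.
    val slowUDF = udf((x: Long) => { Thread.sleep(200); x })

    val base = spark.range(0, 10).toDF("a")

    // One partition: all ten rows are processed by a single task.
    val single = base.repartition(1).withColumn("b", slowUDF(col("a")))
    // Default partitioning: rows may be split across several concurrent tasks.
    val parallel = base.withColumn("b", slowUDF(col("a")))

    val singleMs = elapsedMs(single.collect())
    val parallelMs = elapsedMs(parallel.collect())

    println(s"single partition: $singleMs ms, default partitioning: $parallelMs ms")
    spark.stop()
  }
}
```

The measured times include job start-up overhead, so only the relative difference between the two runs is meaningful.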