AngersZhuuuu commented on issue #26500: [SPARK-29874][SQL]Optimize Dataset.isEmpty()
URL: https://github.com/apache/spark/pull/26500#issuecomment-556952943

```
test("benchmark of empty") {
  // Before the change: limit(1) followed by a global count.
  var start = System.currentTimeMillis()
  var isEmpty = spark.range(10000000)
    .repartition(100)
    .limit(1)
    .groupBy()
    .count()
    .queryExecution.executedPlan.executeCollect().head.getLong(0) == 0
  println(isEmpty)
  var end = System.currentTimeMillis()
  // scalastyle:off
  println(s"duration = ${end - start}")

  // After the change: project away all columns and take a single row.
  // executeTake returns an Array, so test it with isEmpty rather than == 0.
  start = System.currentTimeMillis()
  isEmpty = spark.range(10000000)
    .repartition(100)
    .select()
    .queryExecution.executedPlan.executeTake(1).isEmpty
  println(isEmpty)
  end = System.currentTimeMillis()
  // scalastyle:off
  println(s"duration = ${end - start}")
}

Result

false
duration = 7248
false
duration = 1449
```

@cloud-fan @maropu @srowen The test case is simple, but it mimics the behavior before and after the API change. Durations are in milliseconds.
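
The start/end bookkeeping in the test above could be factored into a small helper. The following is only a minimal sketch: `TimedBenchmark` and `timed` are hypothetical names, not part of Spark or of this PR, and the two workloads are plain Scala iterators standing in for the Spark plans, so they say nothing about how Spark executes either query.

```scala
object TimedBenchmark {
  /** Runs `body` once and returns its result together with the
    * elapsed wall-clock time in milliseconds. */
  def timed[A](body: => A): (A, Long) = {
    val start = System.nanoTime()
    val result = body
    val elapsedMs = (System.nanoTime() - start) / 1000000L
    (result, elapsedMs)
  }

  def main(args: Array[String]): Unit = {
    // Stand-in workloads (assumption: plain iterators, not Spark plans).
    // The first walks every element to count them; the second stops
    // after looking at a single element.
    val (viaCount, countMs) = timed { Iterator.range(0, 10000000).size == 0 }
    val (viaTake, takeMs) = timed { Iterator.range(0, 10000000).take(1).isEmpty }

    println(s"count-based: isEmpty = $viaCount, duration = $countMs ms")
    println(s"take-based:  isEmpty = $viaTake, duration = $takeMs ms")
  }
}
```

In the Spark test, each of the two `queryExecution.executedPlan` expressions could be passed to `timed` in place of the iterator expressions. A single measurement without warm-up is noisy on the JVM, so numbers obtained this way are best read as a rough indication.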