Github user viirya commented on the issue:

https://github.com/apache/spark/pull/20806

@cloud-fan @WeichenXu123 OK. I've set up a Spark cluster with 5 nodes for the benchmark. The data used:

```
val r = new Random
val ds = (0 to 10000).map { _ =>
  val a = Array.fill(10000)(if (r.nextDouble() > 0.5) 1.0 else 0.0)
  Tuple1(Vectors.dense(a))
}.toDS
```

The two versions of `treeAggregate` perform very similarly. Thus, using `RDD.treeAggregate` directly would be much simpler.
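For readers who want to see what using `RDD.treeAggregate` directly might look like on data of this shape, here is a minimal sketch. The comment does not show the aggregation that was benchmarked, so the element-wise vector sum below is an assumption, as are the use of `org.apache.spark.ml.linalg.Vectors`, the local `SparkSession`, the reduced data size, and the names `TreeAggregateSketch` and `dim`.

```scala
import org.apache.spark.ml.linalg.Vectors
import org.apache.spark.sql.SparkSession
import scala.util.Random

// Hypothetical object name; not part of the PR.
object TreeAggregateSketch {
  def main(args: Array[String]): Unit = {
    // Local session for illustration only; the benchmark ran on a 5-node cluster.
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("treeAggregate-sketch")
      .getOrCreate()
    import spark.implicits._

    // Assumption: a smaller dimension and row count than the benchmark data.
    val dim = 100
    val r = new Random(42)
    val ds = (0 until 1000).map { _ =>
      Tuple1(Vectors.dense(Array.fill(dim)(if (r.nextDouble() > 0.5) 1.0 else 0.0)))
    }.toDS

    // Assumed aggregation: element-wise sum of all vectors.
    // seqOp folds one row into a partition-local accumulator;
    // combOp merges two accumulators while partial results are combined in a tree.
    val sums = ds.rdd.treeAggregate(new Array[Double](dim))(
      (acc, row) => {
        val values = row._1.toArray
        var i = 0
        while (i < dim) { acc(i) += values(i); i += 1 }
        acc
      },
      (a, b) => {
        var i = 0
        while (i < dim) { a(i) += b(i); i += 1 }
        a
      },
      depth = 2)

    println(sums.take(5).mkString(", "))
    spark.stop()
  }
}
```

Both functions mutate and return their first argument, which `treeAggregate` permits and which avoids allocating a new array per row. Going through `ds.rdd` deserializes each row into a JVM object, so this sketch says nothing about the relative performance measured in the benchmark.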