Github user WeichenXu123 commented on a diff in the pull request:

    https://github.com/apache/spark/pull/19438#discussion_r143492975

    --- Diff: sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/util/QuantileSummariesSuite.scala ---
    @@ -58,7 +58,7 @@ class QuantileSummariesSuite extends SparkFunSuite {
         if (data.nonEmpty) {
           val approx = summary.query(quant).get
           // The rank of the approximation.
    -      val rank = data.count(_ < approx) // has to be <, not <= to be exact
    +      val rank = data.count(_ <= approx)
    --- End diff --

    I agree that the rank here is not accurate, especially in a case such as `[1,2,2,2,2,2,2,2,3]`. Using the average of `data.count(_ < approx)` and `data.count(_ <= approx)` looks more reasonable.
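To illustrate the suggestion above, here is a minimal sketch (not code from the PR) that computes both counts and their average for the example list. `RankSketch` and its method names are hypothetical, and `approx = 2.0` is an assumed query result for the median rather than actual output from `QuantileSummaries`.

```scala
// Hypothetical helper, not part of Spark: compares three ways of ranking an
// approximate quantile within a data set that contains many duplicates.
object RankSketch {
  /** Number of elements strictly below `approx` (the rank before the change). */
  def lowerRank(data: Seq[Double], approx: Double): Int =
    data.count(_ < approx)

  /** Number of elements at or below `approx` (the rank after the change). */
  def upperRank(data: Seq[Double], approx: Double): Int =
    data.count(_ <= approx)

  /** Average of the two counts, as suggested in the comment. */
  def midRank(data: Seq[Double], approx: Double): Double =
    (lowerRank(data, approx) + upperRank(data, approx)) / 2.0

  def main(args: Array[String]): Unit = {
    val data = Seq(1.0, 2.0, 2.0, 2.0, 2.0, 2.0, 2.0, 2.0, 3.0)
    val approx = 2.0 // assumed result of querying the median
    println(
      s"lower=${lowerRank(data, approx)} " +
        s"upper=${upperRank(data, approx)} " +
        s"mid=${midRank(data, approx)}"
    )
  }
}
```

For this input the strict count is 1 and the inclusive count is 8, so either one alone is far from the target rank of 0.5 * 9 = 4.5 for the median, while their average is exactly 4.5.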