GitHub user wzhfy commented on the issue:

https://github.com/apache/spark/pull/19406

@srowen Before the change, the answer in the test case is `2, 2, 2`. With the code before the change, `percentile_approx` would never return the first element when the percentile is in (relativeError, 1/N], where `relativeError` defaults to 1/10000 and N is the total number of elements. Ideally, every percentile in [0, 1/N] should return the first element.

> why this method has to use relativeError

`QuantileSummaries` is a sampled data structure. `relativeError` is used to compute `targetError`, which determines the error bound of the target rank (https://github.com/apache/spark/blob/master/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/util/QuantileSummaries.scala#L204).
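To make the expected behavior concrete, here is a minimal sketch of an exact (non-sampled) percentile lookup in which every percentile in [0, 1/N] maps to the first element. It is not Spark's implementation: `exact_percentile` and the sample data are hypothetical names and values chosen only for illustration, and the rank rule `ceil(p * N)` is an assumption about what "ideal" means here.

```python
import math


def exact_percentile(values, p):
    """Return the element at 1-based rank ceil(p * N) of the sorted values.

    Hypothetical helper for illustration only; it is not part of Spark.
    The rank is clamped to at least 1, so p = 0 returns the first element.
    """
    if not values:
        raise ValueError("values must be non-empty")
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must be in [0, 1]")
    ordered = sorted(values)
    n = len(ordered)
    rank = max(1, math.ceil(p * n))  # 1-based target rank
    return ordered[rank - 1]


# Hypothetical data with N = 4, so 1/N = 0.25.
data = [10, 20, 30, 40]
for p in (0.0, 0.0001, 0.25, 0.26, 1.0):
    print(p, exact_percentile(data, p))
```

With this rule, 0.0, 0.0001 (the default `relativeError`) and 0.25 (1/N) all select the first element, and only percentiles above 1/N move on to the second.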