This is an automated email from the ASF dual-hosted git repository.

gurwls223 pushed a commit to branch branch-3.0
in repository https://gitbox.apache.org/repos/asf/spark.git

The following commit(s) were added to refs/heads/branch-3.0 by this push:
     new 542dc97  [SPARK-32306][SQL][DOCS][3.0] Clarify the result of `percentile_approx()`
542dc97 is described below

commit 542dc97525860e67e3ddcd543cecc8654b19715d
Author: Max Gekk <max.g...@gmail.com>
AuthorDate: Wed Sep 23 20:15:52 2020 +0900

    [SPARK-32306][SQL][DOCS][3.0] Clarify the result of `percentile_approx()`

    ### What changes were proposed in this pull request?
    More precise description of the result of the `percentile_approx()` function and its synonym `approx_percentile()`. The proposed sentence clarifies that the function returns **one of elements** (or array of elements) from the input column.

    ### Why are the changes needed?
    To improve Spark docs and avoid misunderstanding of the function behavior.

    ### Does this PR introduce _any_ user-facing change?
    No

    ### How was this patch tested?
    `./dev/scalastyle`

    Authored-by: Max Gekk <max.gekkgmail.com>
    Signed-off-by: Liang-Chi Hsieh <viiryagmail.com>
    (cherry picked from commit 7c14f177eb5b52d491f41b217926cc8ca5f0ce4c)
    Signed-off-by: Max Gekk <max.gekkgmail.com>

    Closes #29845 from MaxGekk/doc-percentile_approx-3.0.

    Authored-by: Max Gekk <max.g...@gmail.com>
    Signed-off-by: HyukjinKwon <gurwls...@apache.org>
---
 .../expressions/aggregate/ApproximatePercentile.scala        | 12 +++++++-----
 1 file changed, 7 insertions(+), 5 deletions(-)

diff --git a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/aggregate/ApproximatePercentile.scala b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/aggregate/ApproximatePercentile.scala
index 32f21fc..3327f4c 100644
--- a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/aggregate/ApproximatePercentile.scala
+++ b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/aggregate/ApproximatePercentile.scala
@@ -49,11 +49,13 @@ import org.apache.spark.sql.types._
  */
 @ExpressionDescription(
   usage = """
-    _FUNC_(col, percentage [, accuracy]) - Returns the approximate percentile value of numeric
-      column `col` at the given percentage. The value of percentage must be between 0.0
-      and 1.0. The `accuracy` parameter (default: 10000) is a positive numeric literal which
-      controls approximation accuracy at the cost of memory. Higher value of `accuracy` yields
-      better accuracy, `1.0/accuracy` is the relative error of the approximation.
+    _FUNC_(col, percentage [, accuracy]) - Returns the approximate `percentile` of the numeric
+      column `col` which is the smallest value in the ordered `col` values (sorted from least to
+      greatest) such that no more than `percentage` of `col` values is less than the value
+      or equal to that value. The value of percentage must be between 0.0 and 1.0. The `accuracy`
+      parameter (default: 10000) is a positive numeric literal which controls approximation accuracy
+      at the cost of memory. Higher value of `accuracy` yields better accuracy, `1.0/accuracy` is
+      the relative error of the approximation.
       When `percentage` is an array, each value of the percentage array must be between 0.0 and 1.0.
       In this case, returns the approximate percentile array of column `col` at the given
       percentage array.


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org
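
The point of the commit message above, that the function returns one of the elements of the input column rather than an interpolated value, can be illustrated with a small exact computation. The sketch below is not Spark's implementation and makes no claim about Spark's approximate results. It assumes a nearest-rank reading of the revised usage text, and `NearestRankSketch` and `nearestRankPercentile` are hypothetical names introduced only for this illustration.

```scala
// Illustrative sketch only: an exact nearest-rank percentile over a small
// in-memory sequence. This is NOT how Spark computes percentile_approx();
// the object and method names are hypothetical.
object NearestRankSketch {
  def nearestRankPercentile(values: Seq[Double], percentage: Double): Double = {
    require(values.nonEmpty, "values must not be empty")
    require(percentage >= 0.0 && percentage <= 1.0,
      "percentage must be between 0.0 and 1.0")
    val sorted = values.sorted
    // 1-based rank of the chosen element; clamped to 1 so that
    // percentage = 0.0 selects the minimum.
    val rank = math.max(1, math.ceil(percentage * sorted.length).toInt)
    sorted(rank - 1)
  }

  def main(args: Array[String]): Unit = {
    val col = Seq(0.0, 1.0, 2.0, 10.0)
    // Every result is an element of `col`; nothing is interpolated.
    // For percentages 0.5, 0.4 and 0.1 this yields 1.0, 1.0 and 0.0.
    println(Seq(0.5, 0.4, 0.1).map(p => nearestRankPercentile(col, p)))
  }
}
```

Under this reading, a percentage of 0.5 over the four values 0, 1, 2 and 10 selects 1.0, an actual input element, rather than the interpolated midpoint 1.5.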