[spark] branch branch-3.0 updated: [SPARK-32306][SQL][DOCS][3.0] Clarify the result of `percentile_approx()`

gurwls223 Wed, 23 Sep 2020 04:42:52 -0700

This is an automated email from the ASF dual-hosted git repository.

gurwls223 pushed a commit to branch branch-3.0
in repository https://gitbox.apache.org/repos/asf/spark.git



The following commit(s) were added to refs/heads/branch-3.0 by this push:
     new 542dc97  [SPARK-32306][SQL][DOCS][3.0] Clarify the result of `percentile_approx()`
542dc97 is described below

commit 542dc97525860e67e3ddcd543cecc8654b19715d
Author: Max Gekk <max.g...@gmail.com>
AuthorDate: Wed Sep 23 20:15:52 2020 +0900

    [SPARK-32306][SQL][DOCS][3.0] Clarify the result of `percentile_approx()`
    
    ### What changes were proposed in this pull request?
    More precise description of the result of the `percentile_approx()` function and its synonym `approx_percentile()`. The proposed sentence clarifies that the function returns **one of elements** (or array of elements) from the input column.
    
    ### Why are the changes needed?
    To improve Spark docs and avoid misunderstanding of the function behavior.
    
    ### Does this PR introduce _any_ user-facing change?
    No
    
    ### How was this patch tested?
    `./dev/scalastyle`
    
    Authored-by: Max Gekk <max.gekkgmail.com>
    Signed-off-by: Liang-Chi Hsieh <viiryagmail.com>
    (cherry picked from commit 7c14f177eb5b52d491f41b217926cc8ca5f0ce4c)
    Signed-off-by: Max Gekk <max.gekkgmail.com>
    
    Closes #29845 from MaxGekk/doc-percentile_approx-3.0.
    
    Authored-by: Max Gekk <max.g...@gmail.com>
    Signed-off-by: HyukjinKwon <gurwls...@apache.org>
---
 .../expressions/aggregate/ApproximatePercentile.scala        | 12 +++++++-----
 1 file changed, 7 insertions(+), 5 deletions(-)

diff --git a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/aggregate/ApproximatePercentile.scala b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/aggregate/ApproximatePercentile.scala
index 32f21fc..3327f4c 100644
--- a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/aggregate/ApproximatePercentile.scala
+++ b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/aggregate/ApproximatePercentile.scala
@@ -49,11 +49,13 @@ import org.apache.spark.sql.types._
  */
 @ExpressionDescription(
   usage = """
-    _FUNC_(col, percentage [, accuracy]) - Returns the approximate percentile value of numeric
-      column `col` at the given percentage. The value of percentage must be between 0.0
-      and 1.0. The `accuracy` parameter (default: 10000) is a positive numeric literal which
-      controls approximation accuracy at the cost of memory. Higher value of `accuracy` yields
-      better accuracy, `1.0/accuracy` is the relative error of the approximation.
+    _FUNC_(col, percentage [, accuracy]) - Returns the approximate `percentile` of the numeric
+      column `col` which is the smallest value in the ordered `col` values (sorted from least to
+      greatest) such that no more than `percentage` of `col` values is less than the value
+      or equal to that value. The value of percentage must be between 0.0 and 1.0. The `accuracy`
+      parameter (default: 10000) is a positive numeric literal which controls approximation accuracy
+      at the cost of memory. Higher value of `accuracy` yields better accuracy, `1.0/accuracy` is
+      the relative error of the approximation.
       When `percentage` is an array, each value of the percentage array must be between 0.0 and 1.0.
       In this case, returns the approximate percentile array of column `col` at the given
       percentage array.
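
Note on the clarified behavior: the point of the change above is that the result is one of the elements of the input column rather than an interpolated value. The following is a minimal Python sketch of that idea using an exact nearest-rank percentile. It is an illustration only, not Spark's implementation: `nearest_rank_percentile` is a hypothetical helper, the nearest-rank rule is an assumed stand-in for the documented definition, and Spark's result is approximate (governed by `accuracy`), so it may differ on large inputs.

```python
import math


def nearest_rank_percentile(values, percentage):
    """Return the element of `values` at the nearest-rank percentile.

    Hypothetical helper for illustration; not part of Spark.
    The result is always one of the input elements, never interpolated.
    """
    if not values:
        raise ValueError("values must be non-empty")
    if not 0.0 <= percentage <= 1.0:
        raise ValueError("percentage must be between 0.0 and 1.0")
    ordered = sorted(values)  # sorted from least to greatest
    # 1-based nearest rank; clamp so that 0.0 maps to the smallest element.
    rank = max(1, math.ceil(percentage * len(ordered)))
    return ordered[rank - 1]


data = [10, 20, 30, 40]

# Single percentage: the result is an element of `data`.
# An interpolated median would be 25, which is not in `data`.
median = nearest_rank_percentile(data, 0.5)

# Array of percentages: one input element per requested percentage.
quartiles = [nearest_rank_percentile(data, p) for p in (0.25, 0.5, 0.75)]
```

With this sketch, `median` is 20 and `quartiles` is `[10, 20, 30]`; every result appears in `data`, which mirrors the "one of elements (or array of elements)" wording in the commit message.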


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org
