[ https://issues.apache.org/jira/browse/SPARK-44871?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Peter Toth updated SPARK-44871:
-------------------------------
    Description: 
Currently {{percentile_disc()}} returns incorrect results for some percentile values (0.7 and 0.9 in the example below).

For example, this query:
{code:sql}
SELECT
  percentile_disc(0.0) WITHIN GROUP (ORDER BY a) as p0,
  percentile_disc(0.1) WITHIN GROUP (ORDER BY a) as p1,
  percentile_disc(0.2) WITHIN GROUP (ORDER BY a) as p2,
  percentile_disc(0.3) WITHIN GROUP (ORDER BY a) as p3,
  percentile_disc(0.4) WITHIN GROUP (ORDER BY a) as p4,
  percentile_disc(0.5) WITHIN GROUP (ORDER BY a) as p5,
  percentile_disc(0.6) WITHIN GROUP (ORDER BY a) as p6,
  percentile_disc(0.7) WITHIN GROUP (ORDER BY a) as p7,
  percentile_disc(0.8) WITHIN GROUP (ORDER BY a) as p8,
  percentile_disc(0.9) WITHIN GROUP (ORDER BY a) as p9,
  percentile_disc(1.0) WITHIN GROUP (ORDER BY a) as p10
FROM VALUES (0), (1), (2), (3), (4) AS v(a)
{code}
currently returns:
{noformat}
+---+---+---+---+---+---+---+---+---+---+---+
| p0| p1| p2| p3| p4| p5| p6| p7| p8| p9|p10|
+---+---+---+---+---+---+---+---+---+---+---+
|0.0|0.0|0.0|1.0|1.0|2.0|2.0|2.0|3.0|3.0|4.0|
+---+---+---+---+---+---+---+---+---+---+---+
{noformat}
but it should return (note the different {{p7}} and {{p9}} values):
{noformat}
+---+---+---+---+---+---+---+---+---+---+---+
| p0| p1| p2| p3| p4| p5| p6| p7| p8| p9|p10|
+---+---+---+---+---+---+---+---+---+---+---+
|0.0|0.0|0.0|1.0|1.0|2.0|2.0|3.0|3.0|4.0|4.0|
+---+---+---+---+---+---+---+---+---+---+---+
{noformat}
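
The expected row is consistent with the usual definition of {{percentile_disc(p)}}: the first value, in sort order, whose cumulative distribution is at least {{p}}. That definition is an assumption here, since the description does not spell it out. The following is a minimal Python sketch of it, not Spark's implementation. The helper name {{percentile_disc}} and its list-based signature are hypothetical, chosen only for illustration.

```python
def percentile_disc(values, p):
    """Return the first value (in ascending order) whose cumulative
    distribution i / n is >= p. Assumed definition, not Spark's code."""
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must be in [0, 1]")
    ordered = sorted(values)
    n = len(ordered)
    # The i-th row (1-based) of n sorted rows has cume_dist = i / n.
    for i, value in enumerate(ordered, start=1):
        if i / n >= p:
            return value
    return None  # empty input: no rows to aggregate


# Same data and percentiles as the query in this issue.
data = [0, 1, 2, 3, 4]
percentiles = [0.0, 0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 1.0]
results = [percentile_disc(data, p) for p in percentiles]
print(results)  # [0, 0, 0, 1, 1, 2, 2, 3, 3, 4, 4]
```

With five rows the cumulative distribution values are 0.2, 0.4, 0.6, 0.8 and 1.0, so 0.7 maps to the fourth row (3) and 0.9 to the fifth row (4), matching the expected output above.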

> Fix PERCENTILE_DISC behaviour
> -----------------------------
>
>                 Key: SPARK-44871
>                 URL: https://issues.apache.org/jira/browse/SPARK-44871
>             Project: Spark
>          Issue Type: Bug
>          Components: SQL
>    Affects Versions: 3.3.0, 3.4.0, 3.5.0, 4.0.0
>            Reporter: Peter Toth
>            Priority: Critical


