[ 
https://issues.apache.org/jira/browse/SPARK-35480?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-35480:
------------------------------------

    Assignee: Apache Spark

> percentile_approx function doesn't work with pivot
> --------------------------------------------------
>
>                 Key: SPARK-35480
>                 URL: https://issues.apache.org/jira/browse/SPARK-35480
>             Project: Spark
>          Issue Type: Bug
>          Components: PySpark, SQL
>    Affects Versions: 3.1.1
>            Reporter: Christopher Bryant
>            Assignee: Apache Spark
>            Priority: Major
>
> The percentile_approx PySpark function does not appear to treat the 
> "accuracy" parameter correctly when pivoting on a column, causing the query 
> below to fail (this also fails if the accuracy parameter is left unspecified):
> ----
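> {{import pyspark.sql.functions as F}}
> {{# sc is assumed to be the SparkContext provided by the PySpark shell.}}
> {{df = (}}
> {{    sc.parallelize([}}
> {{        ["a", -1.0],}}
> {{        ["a", 5.5],}}
> {{        ["a", 2.5],}}
> {{        ["b", 3.0],}}
> {{        ["b", 5.2],}}
> {{    ]).toDF(["type", "value"])}}
> {{    .groupBy()}}
> {{    .pivot("type", ["a", "b"])}}
> {{    # This aggregation raises the AnalysisException shown below.}}
> {{    .agg(F.percentile_approx("value", [0.5], 10000).alias("percentiles"))}}
> {{)}}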
> ----
> Error message: 
> {{AnalysisException: cannot resolve 'percentile_approx((IF((`type` <=> 
> CAST('a' AS STRING)), `value`, CAST(NULL AS DOUBLE))), (IF((`type` <=> 
> CAST('a' AS STRING)), array(0.5D), NULL)), (IF((`type` <=> CAST('a' AS 
> STRING)), 10000, CAST(NULL AS INT))))' due to data type mismatch: The 
> accuracy or percentage provided must be a constant literal; 'Aggregate 
> [percentile_approx(if ((type#242 <=> cast(a as string))) value#243 else 
> cast(null as double), if ((type#242 <=> cast(a as string))) array(0.5) else 
> cast(null as array<double>), if ((type#242 <=> cast(a as string))) 10000 else 
> cast(null as int), 0, 0) AS a#251, percentile_approx(if ((type#242 <=> cast(b 
> as string))) value#243 else cast(null as double), if ((type#242 <=> cast(b as 
> string))) array(0.5) else cast(null as array<double>), if ((type#242 <=> 
> cast(b as string))) 10000 else cast(null as int), 0, 0) AS b#253] +- 
> LogicalRDD [type#242, value#243], false}}
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org
