[ https://issues.apache.org/jira/browse/SPARK-35480?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Apache Spark reassigned SPARK-35480:
------------------------------------

    Assignee: Apache Spark

> percentile_approx function doesn't work with pivot
> --------------------------------------------------
>
>                 Key: SPARK-35480
>                 URL: https://issues.apache.org/jira/browse/SPARK-35480
>             Project: Spark
>          Issue Type: Bug
>          Components: PySpark, SQL
>    Affects Versions: 3.1.1
>            Reporter: Christopher Bryant
>            Assignee: Apache Spark
>            Priority: Major
>
> The percentile_approx PySpark function does not appear to treat the
> "accuracy" parameter correctly when pivoting on a column, causing the query
> below to fail (this also fails if the accuracy parameter is left unspecified):
> ----
> {{import pyspark.sql.functions as F}}
> {{df = sc.parallelize([}}
> {{ ["a", -1.0],}}
> {{ ["a", 5.5],}}
> {{ ["a", 2.5],}}
> {{ ["b", 3.0],}}
> {{ ["b", 5.2]}}
> {{]).toDF(["type", "value"])}}
> {{ .groupBy()}}
> {{ .pivot("type", ["a", "b"])}}
> {{ .agg(F.percentile_approx("value", [0.5], 10000).alias("percentiles"))}}
> ----
> Error message:
> {{AnalysisException: cannot resolve 'percentile_approx((IF((`type` <=>
> CAST('a' AS STRING)), `value`, CAST(NULL AS DOUBLE))), (IF((`type` <=>
> CAST('a' AS STRING)), array(0.5D), NULL)), (IF((`type` <=> CAST('a' AS
> STRING)), 10000, CAST(NULL AS INT))))' due to data type mismatch: The
> accuracy or percentage provided must be a constant literal;
> 'Aggregate [percentile_approx(if ((type#242 <=> cast(a as string)))
> value#243 else cast(null as double), if ((type#242 <=> cast(a as string)))
> array(0.5) else cast(null as array<double>), if ((type#242 <=> cast(a as
> string))) 10000 else cast(null as int), 0, 0) AS a#251,
> percentile_approx(if ((type#242 <=> cast(b as string))) value#243 else
> cast(null as double), if ((type#242 <=> cast(b as string))) array(0.5) else
> cast(null as array<double>), if ((type#242 <=> cast(b as string))) 10000 else
> cast(null as int), 0, 0) AS b#253]
> +- LogicalRDD [type#242, value#243], false}}
>

--
This message was sent by Atlassian Jira
(v8.3.4#803005)
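
The error message above shows all three arguments of percentile_approx (the value column, the percentage array, and the accuracy) wrapped in an IF on the pivot column, which is why the "constant literal" check rejects them. The following is a minimal toy sketch of that effect in plain Python. It is not Spark's code: the classes `Literal`, `Column`, `EqualsPivotValue`, `If` and the helpers `wrap_for_pivot` and `check_percentile_args` are hypothetical names invented for illustration, and the rule that an IF is constant only when all of its parts are constant is an assumption inferred from the message.

```python
from dataclasses import dataclass
from typing import Any


@dataclass(frozen=True)
class Literal:
    """A constant value such as 10000 or [0.5] (hypothetical toy class)."""
    value: Any

    def is_constant(self) -> bool:
        return True


@dataclass(frozen=True)
class Column:
    """A reference to a per-row column such as `value` or `type`."""
    name: str

    def is_constant(self) -> bool:
        return False


@dataclass(frozen=True)
class EqualsPivotValue:
    """Stands in for `type <=> CAST('a' AS STRING)` in the error message."""
    column: Column
    pivot_value: str

    def is_constant(self) -> bool:
        # Depends on a column, so it varies from row to row.
        return False


@dataclass(frozen=True)
class If:
    """Stands in for IF(predicate, then, otherwise)."""
    predicate: Any
    then: Any
    otherwise: Any

    def is_constant(self) -> bool:
        # Assumption: constant only if every part is constant.
        parts = (self.predicate, self.then, self.otherwise)
        return all(part.is_constant() for part in parts)


def wrap_for_pivot(arg: Any, pivot_column: Column, pivot_value: str) -> If:
    """Wrap one aggregate argument the way the error message shows."""
    return If(EqualsPivotValue(pivot_column, pivot_value), arg, Literal(None))


def check_percentile_args(value: Any, percentage: Any, accuracy: Any) -> str:
    """Toy version of the literal check quoted in the error message."""
    if not (percentage.is_constant() and accuracy.is_constant()):
        return "The accuracy or percentage provided must be a constant literal"
    return "ok"


# Arguments as the user wrote them: percentile_approx("value", [0.5], 10000)
args = (Column("value"), Literal([0.5]), Literal(10000))
before = check_percentile_args(*args)

# The same arguments after each one is wrapped for pivot value "a"
wrapped = tuple(wrap_for_pivot(arg, Column("type"), "a") for arg in args)
after = check_percentile_args(*wrapped)

print(before)
print(after)
```

In this toy model the unwrapped call passes the check, while the wrapped call fails with the same message as in the report, because the IF predicate depends on the `type` column even though the percentage and accuracy inside it are still literals.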