[ https://issues.apache.org/jira/browse/SPARK-42822?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Hyukjin Kwon updated SPARK-42822:
---------------------------------
Summary: Ambiguous reference for case-insensitive grouping column in groupby.applyInPandas (was: Ambiguous reference for case-insensitive grouping column)

Ambiguous reference for case-insensitive grouping column in groupby.applyInPandas
----------------------------------------------------------------------------------

Key: SPARK-42822
URL: https://issues.apache.org/jira/browse/SPARK-42822
Project: Spark
Issue Type: Task
Components: Connect, PySpark
Affects Versions: 3.4.0, 3.5.0
Reporter: Xinrong Meng
Priority: Major

Grouping by a column name that differs from the DataFrame's column only in case (here "COLUMN" vs. "column") fails with an ambiguous-reference error, as shown below:
{code:sh}
>>> df = spark.createDataFrame([[1, 1]], ["column", "score"])
>>> def my_pandas_udf(pdf):
...     return pdf.assign(score=0.5)
...
>>> df.groupby("COLUMN").applyInPandas(my_pandas_udf, schema="column integer, score float").first()
23/03/16 10:18:23 ERROR SparkConnectService: Error during: execute
org.apache.spark.sql.AnalysisException: [AMBIGUOUS_REFERENCE] Reference `column` is ambiguous, could be: [`column`, `column`].
{code}
Relevant change in the legacy (non-Connect) PySpark API: https://github.com/apache/spark/pull/28777/files#diff-32b043dfe6b906fb2b240e8557f98b03a648ba792ce58d11b631744b34bcea71
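The error lists two candidates with the same name, which suggests the analyzed plan carries two attributes named `column` (for example, the grouping key and the original data column) and a by-name lookup cannot pick one. That reading is an assumption, not a confirmed root cause. The toy sketch below models only the lookup step: the `resolve` helper and `AmbiguousReferenceError` class are hypothetical stand-ins, not Spark APIs, and names are compared case-insensitively to mirror the default `spark.sql.caseSensitive=false`.

```python
from typing import List


class AmbiguousReferenceError(Exception):
    """Hypothetical stand-in for Spark's AMBIGUOUS_REFERENCE AnalysisException."""


def resolve(name: str, attributes: List[str], case_sensitive: bool = False) -> str:
    """Toy by-name resolver (not Spark's analyzer).

    Returns the single attribute matching `name`; raises if none or several match.
    """
    def norm(s: str) -> str:
        # Case-insensitive mode compares lower-cased names.
        return s if case_sensitive else s.lower()

    matches = [a for a in attributes if norm(a) == norm(name)]
    if not matches:
        raise KeyError(f"cannot resolve `{name}` among {attributes}")
    if len(matches) > 1:
        candidates = ", ".join(f"`{m}`" for m in matches)
        # Message text imitates the error in the report above.
        raise AmbiguousReferenceError(
            f"[AMBIGUOUS_REFERENCE] Reference `{name}` is ambiguous, "
            f"could be: [{candidates}]."
        )
    return matches[0]


# One attribute per name: "COLUMN" resolves to the data column.
print(resolve("COLUMN", ["column", "score"]))

# Assumed failure shape: a second attribute also named "column" is present.
try:
    resolve("column", ["column", "column", "score"])
except AmbiguousReferenceError as err:
    print(err)
```

With a single `column` attribute, `COLUMN` resolves cleanly; once a second attribute with the same name is present, the same case-insensitive lookup has no unique answer, which matches the shape of the reported failure.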