[ https://issues.apache.org/jira/browse/SPARK-43513?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Frederik Paradis updated SPARK-43513:
-------------------------------------
    Summary: withColumnRenamed duplicates columns if new column already exist  
(was: withColumnRenamed duplicate columns if new column already exist)

> withColumnRenamed duplicates columns if new column already exist
> ----------------------------------------------------------------
>
>                 Key: SPARK-43513
>                 URL: https://issues.apache.org/jira/browse/SPARK-43513
>             Project: Spark
>          Issue Type: Bug
>          Components: PySpark
>    Affects Versions: 3.4.0
>            Reporter: Frederik Paradis
>            Priority: Major
>
> withColumnRenamed should either replace the existing column when the new 
> column name already exists, or the documentation should describe this 
> behavior. The code below shows the current behavior.
> {code:python}
> from pyspark.sql import SparkSession
> spark = SparkSession.builder.master("local[1]").appName("local-spark-session").getOrCreate()
> df = spark.createDataFrame([(1, 0.5, 0.4), (2, 0.5, 0.8)], ["id", "score", "test_score"])
> r = df.withColumnRenamed("test_score", "score")
> print(r)  # DataFrame[id: bigint, score: double, score: double]
> # pyspark.sql.utils.AnalysisException: Reference 'score' is ambiguous, could be: score, score.
> print(r.select("score"))
> {code}
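
Until the behavior is changed or documented, one possible workaround is to drop the colliding column before renaming. The following is a minimal sketch, not part of PySpark: rename_replacing is a hypothetical helper name, and it assumes that replacing the existing column is the desired outcome. It relies only on DataFrame.columns, DataFrame.drop and DataFrame.withColumnRenamed, and it compares names case-sensitively, which may differ from Spark's default case-insensitive column resolution.

```python
def rename_replacing(df, existing, new):
    """Rename `existing` to `new`, replacing any column already named `new`.

    Hypothetical helper (an assumption, not a PySpark API). `df` is expected
    to be a pyspark.sql.DataFrame; only `columns`, `drop` and
    `withColumnRenamed` are used.
    """
    if existing == new or existing not in df.columns:
        # Nothing to rename: withColumnRenamed is a no-op when the source
        # column is absent, so return the DataFrame unchanged.
        return df
    if new in df.columns:
        # Drop the column that would otherwise end up duplicated.
        df = df.drop(new)
    return df.withColumnRenamed(existing, new)
```

With the DataFrame from the example above, rename_replacing(df, "test_score", "score") should leave a single score column holding the former test_score values, so a later select("score") is no longer ambiguous.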



