Frederik Paradis created SPARK-43513:
----------------------------------------

             Summary: withColumnRenamed duplicates columns if new column already exists
                 Key: SPARK-43513
                 URL: https://issues.apache.org/jira/browse/SPARK-43513
             Project: Spark
          Issue Type: Bug
          Components: PySpark
    Affects Versions: 3.4.0
            Reporter: Frederik Paradis


withColumnRenamed should either replace the existing column when the new 
column name already exists, or the documentation should describe this 
behavior explicitly. The code below shows the current behavior.

{code:python}
from pyspark.sql import SparkSession

spark = (
    SparkSession.builder.master("local[1]")
    .appName("local-spark-session")
    .getOrCreate()
)

df = spark.createDataFrame(
    [(1, 0.5, 0.4), (2, 0.5, 0.8)], ["id", "score", "test_score"]
)
r = df.withColumnRenamed("test_score", "score")
print(r)  # DataFrame[id: bigint, score: double, score: double]

# pyspark.sql.utils.AnalysisException: Reference 'score' is ambiguous, could be: score, score.
print(r.select("score"))
{code}
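
Until the behavior is changed or documented, one possible workaround is to 
drop the conflicting column before renaming. The following is only a minimal 
sketch: rename_replacing is a hypothetical helper name, not part of PySpark, 
and it assumes that replacing the existing column is the desired outcome. It 
relies only on DataFrame.columns, DataFrame.drop and 
DataFrame.withColumnRenamed.

```python
from __future__ import annotations

from typing import TYPE_CHECKING

if TYPE_CHECKING:  # imported for type hints only; not needed at runtime
    from pyspark.sql import DataFrame


def rename_replacing(df: DataFrame, existing: str, new: str) -> DataFrame:
    """Rename `existing` to `new`, dropping any column already named `new`.

    Hypothetical helper (not a PySpark API). Names are compared exactly,
    so it does not account for Spark's default case-insensitive column
    resolution; in that case it falls back to plain withColumnRenamed.
    """
    if existing != new and existing in df.columns and new in df.columns:
        # Remove the column that would otherwise end up duplicated.
        df = df.drop(new)
    # withColumnRenamed is a no-op when `existing` is not in the schema.
    return df.withColumnRenamed(existing, new)
```

With the DataFrame from the example above, rename_replacing(df, "test_score", 
"score") should produce a DataFrame with a single "score" column holding the 
former test_score values, so select("score") is no longer ambiguous.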


