[ https://issues.apache.org/jira/browse/SPARK-43513?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Frederik Paradis updated SPARK-43513:
-------------------------------------
    Summary: withColumnRenamed duplicates columns if new column already exist  (was: withColumnRenamed duplicate columns if new column already exist)

> withColumnRenamed duplicates columns if new column already exist
> ----------------------------------------------------------------
>
>                 Key: SPARK-43513
>                 URL: https://issues.apache.org/jira/browse/SPARK-43513
>             Project: Spark
>          Issue Type: Bug
>          Components: PySpark
>    Affects Versions: 3.4.0
>            Reporter: Frederik Paradis
>            Priority: Major
>
> withColumnRenamed should either replace the existing column when the new
> column name is already in use, or the documentation should describe this
> behavior. See the code below for an example of the current behavior.
>
> {code:python}
> from pyspark.sql import SparkSession
>
> spark = SparkSession.builder.master("local[1]").appName("local-spark-session").getOrCreate()
> df = spark.createDataFrame([(1, 0.5, 0.4), (2, 0.5, 0.8)], ["id", "score", "test_score"])
>
> # Renaming onto an existing name keeps both columns.
> r = df.withColumnRenamed("test_score", "score")
> print(r)  # DataFrame[id: bigint, score: double, score: double]
>
> # Raises pyspark.sql.utils.AnalysisException: Reference 'score' is ambiguous, could be: score, score.
> print(r.select("score"))
> {code}

--
This message was sent by Atlassian Jira
(v8.20.10#820010)
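To make the two behaviors discussed in the issue concrete, here is a minimal pure-Python sketch that models a DataFrame schema as a plain list of column names. The helpers `rename_column` and `find_column` are hypothetical illustrations, not PySpark APIs. `replace=False` mimics what the reporter observed on 3.4.0 (a duplicate name that later fails to resolve), and `replace=True` models the "replace the existing column" behavior the issue proposes.

```python
from typing import List


def rename_column(columns: List[str], existing: str, new: str, replace: bool = False) -> List[str]:
    """Hypothetical model of renaming one column in a list of column names.

    replace=False: rename in place and keep any column already called `new`,
    which produces a duplicate name (the behavior reported in SPARK-43513).
    replace=True: drop the pre-existing `new` column so the renamed column
    takes its place (the behavior the issue asks for).
    """
    if existing not in columns:
        # Like withColumnRenamed, do nothing if the source column is absent.
        return list(columns)
    renamed = [new if c == existing else c for c in columns]
    if not replace:
        return renamed
    # Keep the renamed column; drop columns that already had the name `new`.
    return [c for c, orig in zip(renamed, columns) if not (c == new and orig != existing)]


def find_column(columns: List[str], name: str) -> int:
    """Resolve a name to exactly one position, failing on ambiguity like select()."""
    matches = [i for i, c in enumerate(columns) if c == name]
    if len(matches) > 1:
        raise ValueError(f"Reference '{name}' is ambiguous ({len(matches)} matches)")
    if not matches:
        raise ValueError(f"Column '{name}' not found")
    return matches[0]


cols = ["id", "score", "test_score"]
print(rename_column(cols, "test_score", "score"))  # ['id', 'score', 'score']
print(rename_column(cols, "test_score", "score", replace=True))  # ['id', 'score']
```

In PySpark itself, one way to get the replace behavior on 3.4.0 is to drop the conflicting column before renaming, e.g. `df.drop("score").withColumnRenamed("test_score", "score")`, which leaves a single `score` column holding the former `test_score` values.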