Frederik Paradis created SPARK-43513:
----------------------------------------
Summary: withColumnRenamed duplicate columns if new column already exist
Key: SPARK-43513
URL: https://issues.apache.org/jira/browse/SPARK-43513
Project: Spark
Issue Type: Bug
Components: PySpark
Affects Versions: 3.4.0
Reporter: Frederik Paradis

withColumnRenamed should either replace the existing column when the new column name already exists, or the documentation should describe this behavior. The code below shows the current state: the rename succeeds but produces two columns named score, and selecting score afterwards fails because the reference is ambiguous.

{code:python}
from pyspark.sql import SparkSession

spark = SparkSession.builder.master("local[1]").appName("local-spark-session").getOrCreate()

df = spark.createDataFrame([(1, 0.5, 0.4), (2, 0.5, 0.8)], ["id", "score", "test_score"])

# Renaming onto an existing column name keeps both columns.
r = df.withColumnRenamed("test_score", "score")
print(r)  # DataFrame[id: bigint, score: double, score: double]

# pyspark.sql.utils.AnalysisException: Reference 'score' is ambiguous, could be: score, score.
print(r.select("score"))
{code}
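
Until the behavior is changed or documented, one possible workaround is to drop the conflicting column before renaming. The sketch below is not part of Spark: the helper name rename_replacing is hypothetical, and it assumes that "replace" means discarding the old column entirely. It only uses DataFrame.columns, DataFrame.drop and DataFrame.withColumnRenamed. Note that the membership check is case-sensitive, while Spark resolves column names case-insensitively by default, so this may need adjusting for mixed-case names.

```python
def rename_replacing(df, existing, new):
    """Rename column `existing` to `new`, dropping any column already named `new`.

    Hypothetical helper, not a Spark API. `df` is expected to be a
    pyspark.sql.DataFrame (anything exposing .columns, .drop and
    .withColumnRenamed works).
    """
    # Nothing to do if the names match or the source column is absent
    # (withColumnRenamed is itself a no-op for a missing column).
    if existing == new or existing not in df.columns:
        return df
    # Drop the conflicting column first so the result has a single `new`.
    if new in df.columns:
        df = df.drop(new)
    return df.withColumnRenamed(existing, new)


# Usage with the DataFrame from the report (requires a running SparkSession):
# r = rename_replacing(df, "test_score", "score")
# r.select("score")  # resolves to the renamed test_score column
```

With the example data, the result would have the columns id and score, where score holds the former test_score values.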