[ https://issues.apache.org/jira/browse/SPARK-16464?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15385255#comment-15385255 ]
Liwei Lin commented on SPARK-16464:
-----------------------------------

In Scala, {{withColumn}}'s behavior is "adding a column or replacing the existing column that has the same name" (please refer to [Dataset.withColumn|https://github.com/apache/spark/blob/master/sql/core/src/main/scala/org/apache/spark/sql/Dataset.scala#L1708]):

{code}
// results are the same for Spark 1.6.1 and current master

// some setups here
import sqlContext.implicits._  // enables the $"colName" column syntax (already imported in spark-shell)

val ds0 = sqlContext.range(1, 4)
ds0.show()
/* prints
+---+
| id|
+---+
|  1|
|  2|
|  3|
+---+
*/

// "newId" does not exist yet, so a new column is added
val ds1 = ds0.withColumn("newId", $"id")
ds1.show()
/* prints
+---+-----+
| id|newId|
+---+-----+
|  1|    1|
|  2|    2|
|  3|    3|
+---+-----+
*/

// "newId" already exists, so it is replaced rather than duplicated
val ds2 = ds1.withColumn("newId", $"id" * 2)
ds2.show()
/* prints
+---+-----+
| id|newId|
+---+-----+
|  1|    2|
|  2|    4|
|  3|    6|
+---+-----+
*/
{code}

> withColumn() allows illegal creation of duplicate column names on DataFrame
> ---------------------------------------------------------------------------
>
>                 Key: SPARK-16464
>                 URL: https://issues.apache.org/jira/browse/SPARK-16464
>             Project: Spark
>          Issue Type: Bug
>          Components: SparkR, SQL
>    Affects Versions: 1.6.1
>         Environment: Databricks.com
>            Reporter: Neil Dewar
>            Priority: Minor
>
> If I take an existing DataFrame, I am permitted to use withColumn() to create
> a duplicate column name. I assume this should be illegal, and withColumn
> should be prevented from permitting this. Some functions subsequently fail
> due to the duplicate column names. Example:
> sdfCar <- createDataFrame(sqlContext, mtcars)
> sdfCar1 <- withColumn(sdfCar, "isEfficient", sdfCar$mpg<=20)
> sdfCar1 <- withColumn(sdfCar1, "isEfficient", ifelse(sdfCar1$mpg == sdfCar1$mpg,1,0))
> sdfCar2 <- subset(sdfCar1, select=sdfCar1$isEfficient)
> # subset() command fails with message: "Reference 'isEfficient' is ambiguous"
> Note: I have only observed this in SparkR; it might affect other language APIs.