[ https://issues.apache.org/jira/browse/SPARK-16464?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15385389#comment-15385389 ]
Dongjoon Hyun commented on SPARK-16464:
---------------------------------------

Here is a reproduction of the situation.

**1.6.x Branch**
{code}
> sdfCar <- createDataFrame(sqlContext, mtcars)
> sdfCar1 <- withColumn(sdfCar, "isEfficient", sdfCar$mpg<=20)
> sdfCar1 <- withColumn(sdfCar1, "isEfficient", ifelse(sdfCar1$mpg == sdfCar1$mpg,1,0))
16/07/19 22:39:24 WARN Column: Constructing trivially true equals predicate, 'mpg#0 = mpg#0'. Perhaps you need to use aliases.
> str(sdfCar1)
'DataFrame': 13 variables:
 $ mpg        : num 21 21 22.8 21.4 18.7 18.1
 $ cyl        : num 6 6 4 6 8 6
 $ disp       : num 160 160 108 258 360 225
 $ hp         : num 110 110 93 110 175 105
 $ drat       : num 3.9 3.9 3.85 3.08 3.15 2.76
 $ wt         : num 2.62 2.875 2.32 3.215 3.44 3.46
 $ qsec       : num 16.46 17.02 18.61 19.44 17.02 20.22
 $ vs         : num 0 0 1 1 0 1
 $ am         : num 1 1 1 0 0 0
 $ gear       : num 4 4 4 3 3 3
 $ carb       : num 4 4 1 1 2 1
 $ isEfficient: logi FALSE FALSE FALSE FALSE TRUE TRUE
 $ isEfficient: num 1 1 1 1 1 1
{code}

**Master**
{code}
> sdfCar <- createDataFrame(mtcars)
> sdfCar1 <- withColumn(sdfCar, "isEfficient", sdfCar$mpg<=20)
> sdfCar1 <- withColumn(sdfCar1, "isEfficient", ifelse(sdfCar1$mpg == sdfCar1$mpg,1,0))
16/07/19 22:41:45 WARN Column: Constructing trivially true equals predicate, 'mpg#11 = mpg#11'. Perhaps you need to use aliases.
> str(sdfCar1)
'SparkDataFrame': 12 variables:
 $ mpg        : num 21 21 22.8 21.4 18.7 18.1
 $ cyl        : num 6 6 4 6 8 6
 $ disp       : num 160 160 108 258 360 225
 $ hp         : num 110 110 93 110 175 105
 $ drat       : num 3.9 3.9 3.85 3.08 3.15 2.76
 $ wt         : num 2.62 2.875 2.32 3.215 3.44 3.46
 $ qsec       : num 16.46 17.02 18.61 19.44 17.02 20.22
 $ vs         : num 0 0 1 1 0 1
 $ am         : num 1 1 1 0 0 0
 $ gear       : num 4 4 4 3 3 3
 $ carb       : num 4 4 1 1 2 1
 $ isEfficient: num 1 1 1 1 1 1
{code}

This appears to be fixed on master. The same warning is still logged, but master no longer adds a duplicate column: the second withColumn() call replaces the existing isEfficient column, leaving 12 variables instead of 13.
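To check a given build without reading the str() output by eye, the column names can be tested for duplicates directly. The following is only a sketch: `duplicatedNames` is a hypothetical helper (not part of SparkR), and the two name vectors are copied from the str() listings above rather than taken from a live session.

```r
# Hypothetical helper: return the names that occur more than once
# in a character vector of column names.
duplicatedNames <- function(nms) {
  unique(nms[duplicated(nms)])
}

# Column names as listed by str() on the 1.6.x branch above (13 variables).
cols16 <- c("mpg", "cyl", "disp", "hp", "drat", "wt", "qsec", "vs", "am",
            "gear", "carb", "isEfficient", "isEfficient")

# Column names as listed by str() on master above (12 variables).
colsMaster <- cols16[-length(cols16)]

print(duplicatedNames(cols16))      # the duplicated name on 1.6.x
print(duplicatedNames(colsMaster))  # empty on master

# In a live SparkR session (assumption: sdfCar1 from the session above
# exists), the same check would be:
#   duplicatedNames(columns(sdfCar1))
```

An empty result means later calls such as subset() should not hit the "Reference 'isEfficient' is ambiguous" error from this cause.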
> withColumn() allows illegal creation of duplicate column names on DataFrame
> ---------------------------------------------------------------------------
>
>                 Key: SPARK-16464
>                 URL: https://issues.apache.org/jira/browse/SPARK-16464
>             Project: Spark
>          Issue Type: Bug
>          Components: SparkR, SQL
>    Affects Versions: 1.6.1
>         Environment: Databricks.com
>            Reporter: Neil Dewar
>            Priority: Minor
>
> If I take an existing DataFrame, I am permitted to use withColumn() to create
> a duplicate column name. I assume this should be illegal, and withColumn
> should be prevented from permitting this. Some functions subsequently fail
> due to the duplicate column names. Example:
> sdfCar <- createDataFrame(sqlContext, mtcars)
> sdfCar1 <- withColumn(sdfCar, "isEfficient", sdfCar$mpg<=20)
> sdfCar1 <- withColumn(sdfCar1, "isEfficient", ifelse(sdfCar1$mpg == sdfCar1$mpg,1,0))
> sdfCar2 <- subset(sdfCar1, select=sdfCar1$isEfficient)
> # subset() command fails with message: "Reference 'isEfficient' is ambiguous"
> Note: I only know if this is SparkR - it might affect other languages APIs.