[ 
https://issues.apache.org/jira/browse/SPARK-16464?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15385389#comment-15385389
 ] 

Dongjoon Hyun commented on SPARK-16464:
---------------------------------------

Here is a reproduction of the situation.

**1.6.x Branch**
{code}
> sdfCar <- createDataFrame(sqlContext, mtcars)
> sdfCar1 <- withColumn(sdfCar, "isEfficient", sdfCar$mpg<=20)
> sdfCar1 <- withColumn(sdfCar1, "isEfficient", ifelse(sdfCar1$mpg == sdfCar1$mpg,1,0))
16/07/19 22:39:24 WARN Column: Constructing trivially true equals predicate, 'mpg#0 = mpg#0'. Perhaps you need to use aliases.
> str(sdfCar1)
'DataFrame': 13 variables:
 $ mpg        : num 21 21 22.8 21.4 18.7 18.1
 $ cyl        : num 6 6 4 6 8 6
 $ disp       : num 160 160 108 258 360 225
 $ hp         : num 110 110 93 110 175 105
 $ drat       : num 3.9 3.9 3.85 3.08 3.15 2.76
 $ wt         : num 2.62 2.875 2.32 3.215 3.44 3.46
 $ qsec       : num 16.46 17.02 18.61 19.44 17.02 20.22
 $ vs         : num 0 0 1 1 0 1
 $ am         : num 1 1 1 0 0 0
 $ gear       : num 4 4 4 3 3 3
 $ carb       : num 4 4 1 1 2 1
 $ isEfficient: logi FALSE FALSE FALSE FALSE TRUE TRUE
 $ isEfficient: num 1 1 1 1 1 1
{code}

**Master**
{code}
> sdfCar <- createDataFrame(mtcars)
> sdfCar1 <- withColumn(sdfCar, "isEfficient", sdfCar$mpg<=20)
> sdfCar1 <- withColumn(sdfCar1, "isEfficient", ifelse(sdfCar1$mpg == sdfCar1$mpg,1,0))
16/07/19 22:41:45 WARN Column: Constructing trivially true equals predicate, 'mpg#11 = mpg#11'. Perhaps you need to use aliases.
> str(sdfCar1)
'SparkDataFrame': 12 variables:
 $ mpg        : num 21 21 22.8 21.4 18.7 18.1
 $ cyl        : num 6 6 4 6 8 6
 $ disp       : num 160 160 108 258 360 225
 $ hp         : num 110 110 93 110 175 105
 $ drat       : num 3.9 3.9 3.85 3.08 3.15 2.76
 $ wt         : num 2.62 2.875 2.32 3.215 3.44 3.46
 $ qsec       : num 16.46 17.02 18.61 19.44 17.02 20.22
 $ vs         : num 0 0 1 1 0 1
 $ am         : num 1 1 1 0 0 0
 $ gear       : num 4 4 4 3 3 3
 $ carb       : num 4 4 1 1 2 1
 $ isEfficient: num 1 1 1 1 1 1
{code}

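It appears to be fixed on the current master. Although the same warning
message is still shown, master no longer adds a duplicate column: the second
withColumn() call replaces the existing isEfficient column, so the result has
12 variables instead of 13.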
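
The difference between the two str() outputs can also be checked by looking
for repeated column names. The following is only a minimal sketch in base R:
find_duplicate_columns is a hypothetical helper (not part of SparkR), and the
name vectors are copied by hand from the outputs above so that it runs without
a Spark session.

```r
# Hypothetical helper (not part of SparkR): return each column name that
# occurs more than once in a character vector of names.
find_duplicate_columns <- function(col_names) {
  unique(col_names[duplicated(col_names)])
}

# Column names as printed by str() on the 1.6.x branch above.
cols_1_6 <- c("mpg", "cyl", "disp", "hp", "drat", "wt", "qsec", "vs",
              "am", "gear", "carb", "isEfficient", "isEfficient")

# Column names as printed by str() on master above.
cols_master <- c("mpg", "cyl", "disp", "hp", "drat", "wt", "qsec", "vs",
                 "am", "gear", "carb", "isEfficient")

print(find_duplicate_columns(cols_1_6))     # names that appear more than once
print(find_duplicate_columns(cols_master))  # empty when all names are unique

# With a live SparkR session, the names would come from the DataFrame itself:
# find_duplicate_columns(columns(sdfCar1))
```

In a live session, passing columns(sdfCar1) instead of the hand-copied vectors
should give the same answer for whichever branch is running.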

> withColumn() allows illegal creation of duplicate column names on DataFrame
> ---------------------------------------------------------------------------
>
>                 Key: SPARK-16464
>                 URL: https://issues.apache.org/jira/browse/SPARK-16464
>             Project: Spark
>          Issue Type: Bug
>          Components: SparkR, SQL
>    Affects Versions: 1.6.1
>         Environment: Databricks.com
>            Reporter: Neil Dewar
>            Priority: Minor
>
> If I take an existing DataFrame, I am permitted to use withColumn() to create 
> a duplicate column name.  I assume this should be illegal, and withColumn 
> should be prevented from permitting this.  Some functions subsequently fail 
> due to the duplicate column names.  Example:
> sdfCar <- createDataFrame(sqlContext, mtcars)
> sdfCar1 <- withColumn(sdfCar, "isEfficient", sdfCar$mpg<=20)
> sdfCar1 <- withColumn(sdfCar1, "isEfficient", ifelse(sdfCar1$mpg == sdfCar1$mpg,1,0))
> sdfCar2 <- subset(sdfCar1, select=sdfCar1$isEfficient)
> # subset() command fails with message: "Reference 'isEfficient' is ambiguous"
> Note: I only know that this happens in SparkR; it might affect other language APIs as well.


