[ 
https://issues.apache.org/jira/browse/SPARK-10346?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Felix Cheung updated SPARK-10346:
---------------------------------
    Description: 
SparkR's mutate does not seem to replace an existing column that has the same
name (e.g. mutate(df, age = df$age + 2) returns a DataFrame with two columns
named 'age'), so transform does not replace it either for now.

However, the R documentation for transform clearly states that a column with a
matching name should be replaced:

https://stat.ethz.ch/R-manual/R-devel/library/base/html/transform.html

"The tags are matched against names(_data), and for those that match, the value 
replace the corresponding variable in _data, and the others are appended to 
_data."

Also, the resulting DataFrame may be hard to work with when selecting by column
name, registering the table for SQL, and so on, because two columns then share
the same name.


  was:
SparkR's mutate does not seem to replace an existing column that has the same
name (e.g. mutate(df, age = df$age + 2) returns a DataFrame with two columns
named 'age'), so transform does not replace it either for now.

However, the R documentation for transform clearly states that a column with a
matching name should be replaced:

https://stat.ethz.ch/R-manual/R-devel/library/base/html/transform.html

"The tags are matched against names(_data), and for those that match, the value 
replace the corresponding variable in _data, and the others are appended to 
_data."

Also, the resulting DataFrame may be hard to work with when selecting by column
name, and so on.



> SparkR mutate and transform should replace column with same name to match R 
> data.frame behavior
> -----------------------------------------------------------------------------------------------
>
>                 Key: SPARK-10346
>                 URL: https://issues.apache.org/jira/browse/SPARK-10346
>             Project: Spark
>          Issue Type: Bug
>          Components: R
>    Affects Versions: 1.5.0
>            Reporter: Felix Cheung
>
> SparkR's mutate does not seem to replace an existing column that has the same
> name (e.g. mutate(df, age = df$age + 2) returns a DataFrame with two columns
> named 'age'), so transform does not replace it either for now.
> However, the R documentation for transform clearly states that a column with
> a matching name should be replaced:
> https://stat.ethz.ch/R-manual/R-devel/library/base/html/transform.html
> "The tags are matched against names(_data), and for those that match, the 
> value replace the corresponding variable in _data, and the others are 
> appended to _data."
> Also, the resulting DataFrame may be hard to work with when selecting by
> column name, registering the table for SQL, and so on, because two columns
> then share the same name.
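
For reference, the base R behavior quoted from the transform documentation can
be sketched as follows. This is a minimal illustration on a local data.frame,
not SparkR code; the column names and values are made up for the example, and
it only shows the semantics the issue asks SparkR to match.

```r
# Base R only; no Spark session is needed. The data values are hypothetical.
df <- data.frame(name = c("Alice", "Bob"), age = c(30, 40))

# "age" matches an existing column, so its values are replaced in place;
# "height" matches nothing in names(df), so it is appended as a new column.
out <- transform(df, age = age + 2, height = c(1.70, 1.80))

print(names(out))  # "name" "age" "height" (a single 'age' column)
print(out$age)     # 32 42
```

With this behavior there is only ever one column called 'age', so selecting by
name stays unambiguous.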


