[jira] [Commented] (SPARK-18823) Assignation by column name variable not available or bug?
[ https://issues.apache.org/jira/browse/SPARK-18823?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15832324#comment-15832324 ]

Apache Spark commented on SPARK-18823:
--------------------------------------

User 'felixcheung' has created a pull request for this issue:
https://github.com/apache/spark/pull/16663

> Assignation by column name variable not available or bug?
> ---------------------------------------------------------
>
>                 Key: SPARK-18823
>                 URL: https://issues.apache.org/jira/browse/SPARK-18823
>             Project: Spark
>          Issue Type: Question
>          Components: SparkR
>    Affects Versions: 2.0.2
>         Environment: RStudio Server in EC2 Instances (EMR Service of AWS) Emr 4. Or databricks (community.cloud.databricks.com) .
>            Reporter: Vicente Masip
>   Original Estimate: 24h
>  Remaining Estimate: 24h
>
> I really don't know if this is a bug or can be done with some function:
> Sometimes it is very important to assign something to a column whose name
> has to be accessed through a variable. Outside of SparkR, I have always done
> this with double brackets, like this:
> # df could be faithful as a normal data frame or data table.
> # accessing by variable name:
> myname = "waiting"
> df[[myname]] <- c(1:nrow(df))
> # or even column number
> df[[2]] <- df$eruptions
> The error is not caused by the right side of the "<-" assignment operator.
> The problem is that I can't assign to a column using a variable name or a
> column number as I do in these examples outside of Spark. It doesn't matter
> whether I am modifying or creating a column. Same problem.
> I have also tried to use this with no results:
> val df2 = withColumn(df,"tmp", df$eruptions)

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-18823) Assignation by column name variable not available or bug?
[ https://issues.apache.org/jira/browse/SPARK-18823?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15820389#comment-15820389 ]

Felix Cheung commented on SPARK-18823:
--------------------------------------

Yap. I'll start on this shortly.
[jira] [Commented] (SPARK-18823) Assignation by column name variable not available or bug?
[ https://issues.apache.org/jira/browse/SPARK-18823?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15819446#comment-15819446 ]

Shivaram Venkataraman commented on SPARK-18823:
-----------------------------------------------

Yeah, I think it makes sense not to handle the case where we take a local vector. However, adding support for literals and existing columns to `[` and `[[` would be good. I think this is the only item remaining from what is summarized as #1 above?
[jira] [Commented] (SPARK-18823) Assignation by column name variable not available or bug?
[ https://issues.apache.org/jira/browse/SPARK-18823?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15810634#comment-15810634 ]

Felix Cheung commented on SPARK-18823:
--------------------------------------

I think, to Shivaram's point, this is a bit tricky since we would be assuming that the column data can fit in the memory of a single node (where the R client is running). Even then, we would need to handle a potentially large amount of data to serialize, distribute, and so on.
[jira] [Commented] (SPARK-18823) Assignation by column name variable not available or bug?
[ https://issues.apache.org/jira/browse/SPARK-18823?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15752456#comment-15752456 ]

Joseph K. Bradley commented on SPARK-18823:
-------------------------------------------

Note: Please don't set the Target Version or Fix Version. Committers can use those fields for tracking releases. Thanks!
[jira] [Commented] (SPARK-18823) Assignation by column name variable not available or bug?
[ https://issues.apache.org/jira/browse/SPARK-18823?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15747863#comment-15747863 ]

Vicente Masip commented on SPARK-18823:
---------------------------------------

In this issue there is something else missing too: assigning a vector to a column (of the same size, obviously).
[jira] [Commented] (SPARK-18823) Assignation by column name variable not available or bug?
[ https://issues.apache.org/jira/browse/SPARK-18823?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15747433#comment-15747433 ]

Felix Cheung commented on SPARK-18823:
--------------------------------------

We will address both of your suggestions.

As for x$y <- t$q, assuming you mean that x and t are two different Spark DataFrames, this would depend on having the ability to collect a specific column, ideally without transitioning JVM->R->JVM.
[jira] [Commented] (SPARK-18823) Assignation by column name variable not available or bug?
[ https://issues.apache.org/jira/browse/SPARK-18823?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15747429#comment-15747429 ]

Felix Cheung commented on SPARK-18823:
--------------------------------------

For #2, I do agree it could get messy, but I was thinking about

df$waiting <- 1

and I think we should at least support just that. Today one would have to do

df$waiting <- lit(1)

In other words, we could constrain the support to a numeric or character value of length == 1.
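
For reference, a minimal sketch of the `lit()` form mentioned in this comment. It assumes a working local Spark installation with the SparkR package on the library path, and uses the `faithful` data set from the issue description. The column name `flag` is only an illustration and does not come from the thread.

```r
library(SparkR)

# Assumption: Spark is installed locally and SparkR can start a session.
sparkR.session()

df <- createDataFrame(faithful)

# lit() wraps a local scalar in a Column, which `$<-` accepts.
df$waiting <- lit(1)

# The same literal via withColumn(), with the column name held in a variable.
newName <- "flag"   # illustrative name
df <- withColumn(df, newName, lit("a"))

head(df)
```

Whether a bare `df$waiting <- 1` is accepted depends on the SparkR version in use; the `lit()` form is the one the comment describes as working at the time.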
[jira] [Commented] (SPARK-18823) Assignation by column name variable not available or bug?
[ https://issues.apache.org/jira/browse/SPARK-18823?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15745633#comment-15745633 ]

Shivaram Venkataraman commented on SPARK-18823:
-----------------------------------------------

Thanks [~masip85] for verifying this. I think, as [~felixcheung] pointed out, there are two separate issues we can file as feature requests:

1. Supporting assignment of DataFrame columns in `[` and `[[` -- This should be pretty straightforward, I'd guess.
2. Supporting assignment of a local R column using `$` and / or `[[` -- This one I'm less sure about because it will involve determining types, serializing data from local R, splitting it into the existing DataFrame, etc. Also, at a higher level, if the DataFrame has 100M rows then it might not be efficient to ship that much data.
[jira] [Commented] (SPARK-18823) Assignation by column name variable not available or bug?
[ https://issues.apache.org/jira/browse/SPARK-18823?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15744740#comment-15744740 ]

Vicente Masip commented on SPARK-18823:
---------------------------------------

Yes. I've been able to do it with your suggestion. In my case, I use it in a loop through the column names (i: iterator):

df <- withColumn(df, myColNames[i], cast(df[[i]],"boolean"))

Happy to find the solution. It would be friendlier if [[ were available on the left side. Anyway, if this example were in the documentation, that would be a solution too. It should be clear that [[ is available on the right side but not on the left, and that withColumn is the solution if you need it.

I also think that if the x$y <- t$q assignment is available, it should be available with all its consequences. Am I wrong?
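
The loop described in this comment can be sketched as follows. This is an illustration under stated assumptions rather than the reporter's actual script: it assumes a local Spark installation with SparkR available, uses the `faithful` data set, and takes `myColNames` from `columns(df)` since the thread does not show how it was built.

```r
library(SparkR)

# Assumption: Spark is installed locally and SparkR can start a session.
sparkR.session()

df <- createDataFrame(faithful)

# Assumption: the names to process are simply all columns of df.
myColNames <- columns(df)

for (i in seq_along(myColNames)) {
  # `[[` works on the right-hand side (by position or by name);
  # withColumn() takes the target name as a string, so it can be a variable.
  df <- withColumn(df, myColNames[i], cast(df[[i]], "boolean"))
}

printSchema(df)
```

Because `withColumn()` is given a name that already exists, each pass replaces that column instead of appending a new one, so the column order is unchanged.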
[jira] [Commented] (SPARK-18823) Assignation by column name variable not available or bug?
[ https://issues.apache.org/jira/browse/SPARK-18823?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15744519#comment-15744519 ]

Shivaram Venkataraman commented on SPARK-18823:
-----------------------------------------------

Ah, I see your point - `withColumn` does work for this use case? I agree that adding this to `[<-` or `[[<-` would be a better user experience.

{code}
> df2 <- withColumn(df, "tmp1", df$eruptions)
> head(df2)
  eruptions waiting  tmp1
1     3.600      79 3.600
...
{code}
[jira] [Commented] (SPARK-18823) Assignation by column name variable not available or bug?
[ https://issues.apache.org/jira/browse/SPARK-18823?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15744465#comment-15744465 ]

Vicente Masip commented on SPARK-18823:
---------------------------------------

Well, maybe I haven't explained myself. I wrote the right side in both ways using non-Spark examples. But what I need is for the left side of the operation in those non-Spark examples to be a variable. Imagine you have to operate over 30 columns. Do I have to hand-write every operation 30 times? Can't the left-side column name be a variable? This sounds very important to me.
[jira] [Commented] (SPARK-18823) Assignation by column name variable not available or bug?
[ https://issues.apache.org/jira/browse/SPARK-18823?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15744415#comment-15744415 ]

Felix Cheung commented on SPARK-18823:
--------------------------------------

How important is it to support

df[[myname]] <- c(1:nrow(df))

or

df[[2]] <- df$eruptions

I think we should support

df$waiting <- c(1:nrow(df))

which I plan to work on.
[jira] [Commented] (SPARK-18823) Assignation by column name variable not available or bug?
[ https://issues.apache.org/jira/browse/SPARK-18823?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15743648#comment-15743648 ]

Shivaram Venkataraman commented on SPARK-18823:
-----------------------------------------------

We don't support assigning to columns using `[` and `[[` -- the code is just not there, so this is more of a missing feature than a bug. We do support creating new columns with the `$` sign -- for example,

df$eruptions_new <- df$eruptions + 10

-- but there is a limitation that the right-hand side has to be a Column, and thus `c(1:nrow(df))` will not work there either.
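
The distinction drawn in this comment can be sketched as below, assuming a local Spark installation with SparkR available and the `faithful` data set from the issue description. The `tryCatch` wrapper and the `result` variable are illustrative additions so that the rejected assignment does not abort the script.

```r
library(SparkR)

# Assumption: Spark is installed locally and SparkR can start a session.
sparkR.session()

df <- createDataFrame(faithful)

# Right-hand side is a Column expression, so `$<-` accepts it.
df$eruptions_new <- df$eruptions + 10

# Right-hand side is a local R vector, not a Column, so `$<-` is expected
# to raise an error; record the outcome instead of stopping.
result <- tryCatch({
  df$idx <- c(1:nrow(df))
  "assigned"
}, error = function(e) "rejected")

print(result)
print(columns(df))
```

The local vector is refused because it lives only in the R process, while the DataFrame's rows are distributed across the cluster.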