[jira] [Commented] (SPARK-18823) Assignation by column name variable not available or bug?

2017-01-20 Thread Apache Spark (JIRA)

[ https://issues.apache.org/jira/browse/SPARK-18823?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15832324#comment-15832324 ]

Apache Spark commented on SPARK-18823:
--

User 'felixcheung' has created a pull request for this issue:
https://github.com/apache/spark/pull/16663

> Assignation by column name variable not available or bug?
> -
>
> Key: SPARK-18823
> URL: https://issues.apache.org/jira/browse/SPARK-18823
> Project: Spark
>  Issue Type: Question
>  Components: SparkR
>Affects Versions: 2.0.2
> Environment: RStudio Server on EC2 instances (AWS EMR service), EMR 4, or
> Databricks (community.cloud.databricks.com).
>Reporter: Vicente Masip
>   Original Estimate: 24h
>  Remaining Estimate: 24h
>
> I really don't know whether this is a bug or whether it can be done with some function:
> Sometimes it is very important to assign something to a column whose name has to
> be accessed through a variable. Outside SparkR, I have always done this with double
> brackets, like this:
> {code}
> # df could be the faithful data set as a normal data.frame or a data.table.
> # accessing by variable name:
> myname = "waiting"
> df[[myname]] <- c(1:nrow(df))
> # or even by column number
> df[[2]] <- df$eruptions
> {code}
> The error is not caused by the right-hand side of the "<-" assignment operator.
> The problem is that I can't assign to a column using a variable that holds its name,
> or a column number, as I do in these examples outside Spark. It doesn't matter whether
> I am modifying or creating the column: same problem.
> I have also tried this, with no results:
> {code}
> val df2 = withColumn(df,"tmp", df$eruptions)
> {code}
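For comparison, the base R behaviour the reporter relies on can be reproduced with a plain local data.frame. This is a minimal sketch using the built-in `faithful` data set (columns `eruptions` and `waiting`). It does not involve SparkR, and `newname` and `"tmp"` are illustrative names only.

```r
# Local (non-Spark) data.frame: base R supports `[[<-` with a name held in a
# variable or with a column position, for both existing and new columns.
df <- faithful  # datasets::faithful, 272 rows, columns "eruptions", "waiting"

# Overwrite an existing column whose name is held in a variable
myname <- "waiting"
df[[myname]] <- seq_len(nrow(df))  # same values as c(1:nrow(df))

# Overwrite a column selected by position (column 2 is "waiting")
df[[2]] <- df$eruptions

# Create a new column whose name is held in a variable (illustrative name)
newname <- "tmp"
df[[newname]] <- df$eruptions

print(head(df, 3))
```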



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-18823) Assignation by column name variable not available or bug?

2017-01-11 Thread Felix Cheung (JIRA)

[ https://issues.apache.org/jira/browse/SPARK-18823?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15820389#comment-15820389 ]

Felix Cheung commented on SPARK-18823:
--

Yep. I'll start on this shortly.




[jira] [Commented] (SPARK-18823) Assignation by column name variable not available or bug?

2017-01-11 Thread Shivaram Venkataraman (JIRA)

[ https://issues.apache.org/jira/browse/SPARK-18823?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15819446#comment-15819446 ]

Shivaram Venkataraman commented on SPARK-18823:
---

Yeah, I think it makes sense not to handle the case where we take a local
vector. However, adding support for `[` and `[[` assignment with literals and
existing columns would be good. I think this is the only item remaining from
what is summarized as #1 above?




[jira] [Commented] (SPARK-18823) Assignation by column name variable not available or bug?

2017-01-08 Thread Felix Cheung (JIRA)

[ https://issues.apache.org/jira/browse/SPARK-18823?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15810634#comment-15810634 ]

Felix Cheung commented on SPARK-18823:
--

To Shivaram's point, this is a bit tricky, since we would be assuming that the
column data fits in the memory of a single node (where the R client is
running). Even then, we would need to serialize and distribute a potentially
large amount of data, and so on.




[jira] [Commented] (SPARK-18823) Assignation by column name variable not available or bug?

2016-12-15 Thread Joseph K. Bradley (JIRA)

[ https://issues.apache.org/jira/browse/SPARK-18823?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15752456#comment-15752456 ]

Joseph K. Bradley commented on SPARK-18823:
---

Note: Please don't set the Target Version or Fix Version.  Committers can use 
those fields for tracking releases.  Thanks!




[jira] [Commented] (SPARK-18823) Assignation by column name variable not available or bug?

2016-12-14 Thread Vicente Masip (JIRA)

[ https://issues.apache.org/jira/browse/SPARK-18823?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15747863#comment-15747863 ]

Vicente Masip commented on SPARK-18823:
---

Something else is missing in this issue too: assigning a local vector to a
column (of the same length, obviously).

> Assignation by column name variable not available or bug?
> -
>
> Key: SPARK-18823
> URL: https://issues.apache.org/jira/browse/SPARK-18823
> Project: Spark
>  Issue Type: Question
>  Components: SparkR
>Affects Versions: 2.0.2
> Environment: RStudio Server on EC2 instances (AWS EMR service), EMR 4, or
> Databricks (community.cloud.databricks.com).
>Reporter: Vicente Masip
> Fix For: 2.0.2
>
>   Original Estimate: 24h
>  Remaining Estimate: 24h
>
> I really don't know whether this is a bug or whether it can be done with some function:
> Sometimes it is very important to assign something to a column whose name has to
> be accessed through a variable. Outside SparkR, I have always done this with double
> brackets, like this:
> {code}
> # df could be the faithful data set as a normal data.frame or a data.table.
> # accessing by variable name:
> myname = "waiting"
> df[[myname]] <- c(1:nrow(df))
> # or even by column number
> df[[2]] <- df$eruptions
> {code}
> The error is not caused by the right-hand side of the "<-" assignment operator.
> The problem is that I can't assign to a column using a variable that holds its name,
> or a column number, as I do in these examples outside Spark. It doesn't matter whether
> I am modifying or creating the column: same problem.
> I have also tried this, with no results:
> {code}
> val df2 = withColumn(df,"tmp", df$eruptions)
> {code}






[jira] [Commented] (SPARK-18823) Assignation by column name variable not available or bug?

2016-12-13 Thread Felix Cheung (JIRA)

[ https://issues.apache.org/jira/browse/SPARK-18823?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15747433#comment-15747433 ]

Felix Cheung commented on SPARK-18823:
--

We will address both of your suggestions.

As for x$y <- t$q, assuming you mean that x and t are two different Spark
DataFrames, this would depend on being able to collect a specific column,
ideally without a JVM -> R -> JVM round trip.




[jira] [Commented] (SPARK-18823) Assignation by column name variable not available or bug?

2016-12-13 Thread Felix Cheung (JIRA)

[ https://issues.apache.org/jira/browse/SPARK-18823?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15747429#comment-15747429 ]

Felix Cheung commented on SPARK-18823:
--

For #2, I agree it could get messy, but I was thinking about
df$waiting <- 1

and I think we should at least support just that. Today one would have to do

df$waiting <- lit(1)

In other words, we could constrain the support to numeric or character values
of length 1.
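Until `df$waiting <- 1` is supported, a literal can be assigned to a column whose name is held in a variable by combining `lit()` with `withColumn()`. This is a minimal sketch that assumes SparkR is attached and a Spark session is running. `add_constant_column` is a hypothetical helper, not part of the SparkR API.

```r
# Hypothetical helper; assumes library(SparkR) has been called and a Spark
# session exists. Only scalar (length-1) numeric or character values are
# accepted, mirroring the restriction proposed in the comment.
add_constant_column <- function(df, name, value) {
  stopifnot(is.character(name), length(name) == 1,
            is.numeric(value) || is.character(value), length(value) == 1)
  # lit() wraps the R scalar as a literal Column; withColumn() takes the
  # target column name as a plain string, so it can come from a variable.
  withColumn(df, name, lit(value))
}

# Usage (requires a running Spark session):
# df <- createDataFrame(faithful)
# myname <- "waiting"
# df <- add_constant_column(df, myname, 1)
```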





[jira] [Commented] (SPARK-18823) Assignation by column name variable not available or bug?

2016-12-13 Thread Shivaram Venkataraman (JIRA)

[ https://issues.apache.org/jira/browse/SPARK-18823?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15745633#comment-15745633 ]

Shivaram Venkataraman commented on SPARK-18823:
---

Thanks [~masip85] for verifying this. As [~felixcheung] pointed out, I think
there are two separate issues we can file as feature requests:

1. Supporting assignment of DataFrame columns in `[` and `[[`: this should be
pretty straightforward, I'd guess.

2. Supporting assignment of a local R column using `$` and/or `[[`: I'm less
sure about this one, because it involves determining types, serializing data
from local R, splitting it into the existing DataFrame, etc. Also, at a higher
level, if the DataFrame has 100M rows, it might not be efficient to ship that
much data.




[jira] [Commented] (SPARK-18823) Assignation by column name variable not available or bug?

2016-12-13 Thread Vicente Masip (JIRA)

[ https://issues.apache.org/jira/browse/SPARK-18823?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15744740#comment-15744740 ]

Vicente Masip commented on SPARK-18823:
---

Yes, I've been able to do it with your suggestion.

In my case, I use it inside a loop over column names (i is the loop index):

df <- withColumn(df, myColNames[i], cast(df[[i]], "boolean"))

Happy to have found the solution. It would be friendlier if `[[` were available
on the left-hand side. Alternatively, having this example in the documentation
would also solve it: it should be clear that `[[` works on the right-hand side
but not on the left, and that withColumn is the way to go if you need it.

I also think that if x$y <- t$q assignment is available, it should be supported
with all its consequences. Am I wrong?
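The loop workaround described above can be wrapped in a small function. This is a minimal sketch that assumes SparkR is attached and a Spark session is running. `cast_columns` is a hypothetical helper. Like the loop in the comment, it relies on `withColumn()` replacing an existing column that has the same name.

```r
# Hypothetical helper; assumes library(SparkR) and an active Spark session.
# `[[` is used only on the right-hand side, where SparkR supports it; the
# write-back goes through withColumn() instead of the unsupported `[[<-`.
cast_columns <- function(df, col_names, type) {
  for (nm in col_names) {
    df <- withColumn(df, nm, cast(df[[nm]], type))
  }
  df
}

# Usage (requires a running Spark session):
# df <- cast_columns(df, myColNames, "boolean")
```

Looping over names rather than positions keeps the column being read and the column being written in sync, even if the schema order changes.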




[jira] [Commented] (SPARK-18823) Assignation by column name variable not available or bug?

2016-12-13 Thread Shivaram Venkataraman (JIRA)

[ https://issues.apache.org/jira/browse/SPARK-18823?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15744519#comment-15744519 ]

Shivaram Venkataraman commented on SPARK-18823:
---

Ah, I see your point. `withColumn` does work for this use case, right? I agree
that adding this to `[<-` or `[[<-` would be a better user experience.

{code}
> df2 <- withColumn(df, "tmp1", df$eruptions)
> head(df2)
  eruptions waiting  tmp1
1     3.600      79 3.600
...
{code}




[jira] [Commented] (SPARK-18823) Assignation by column name variable not available or bug?

2016-12-12 Thread Vicente Masip (JIRA)

[ https://issues.apache.org/jira/browse/SPARK-18823?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15744465#comment-15744465 ]

Vicente Masip commented on SPARK-18823:
---

Maybe I haven't explained myself well. I wrote the right-hand side both ways
using non-Spark examples. What I need is what those non-Spark examples have: a
variable on the left-hand side of the assignment.

Imagine you have to operate on 30 columns. Do I have to hand-write every
operation 30 times? Can't the column name on the left-hand side be a variable?
This seems very important to me.




[jira] [Commented] (SPARK-18823) Assignation by column name variable not available or bug?

2016-12-12 Thread Felix Cheung (JIRA)

[ https://issues.apache.org/jira/browse/SPARK-18823?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15744415#comment-15744415 ]

Felix Cheung commented on SPARK-18823:
--

How important is it to support these?
df[[myname]] <- c(1:nrow(df))
or
df[[2]] <- df$eruptions

I think we should support
df$waiting <- c(1:nrow(df))

which I plan to work on.




[jira] [Commented] (SPARK-18823) Assignation by column name variable not available or bug?

2016-12-12 Thread Shivaram Venkataraman (JIRA)

[ https://issues.apache.org/jira/browse/SPARK-18823?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15743648#comment-15743648 ]

Shivaram Venkataraman commented on SPARK-18823:
---

We don't support assigning to columns using `[` and `[[`: the code is just
not there, so this is more of a missing feature than a bug. We do support
creating new columns with the `$` sign, for example
df$eruptions_new <- df$eruptions + 10
but there is a limitation that the right-hand side has to be a Column, so
`c(1:nrow(df))` will not work there either.
