[ 
https://issues.apache.org/jira/browse/SPARK-19177?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15821758#comment-15821758
 ] 

Vicente Masip edited comment on SPARK-19177 at 1/13/17 1:18 PM:
----------------------------------------------------------------

Whether I want to specify a schema with gapply or NEED to specify one for 
dapply, I run into a problem. The documentation example is neat: 

schema <- structType(structField("eruptions", "double"),
                     structField("waiting", "double"),
                     structField("waiting_secs", "double"))
df1 <- dapply(df, function(x) { x <- cbind(x, x$waiting * 60) }, schema)

The data.frame returned inside the function has only 3 columns. I have 50 
columns, and I want to return all of them plus a new computed column. 

Imagine the same function(x) { x <- cbind(x, x$waiting * 60) }, but where x 
has many columns and the new column has to be declared in the schema passed to 
the outer dapply call. How would you define the schema? You cannot append a 
structField to a structType.

For now I'm going to work around it by adding a dummy new column with lit, 
taking the schema of the result, and then dropping the dummy column. Not 
elegant, but it lets me keep working.
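
A minimal sketch of that workaround, assuming a running SparkR 2.x session 
and using R's built-in faithful dataset in place of the real 50-column data. 
The names tmp and out_schema are made up for illustration, and the 
placeholder's name and type ("waiting_secs", double) are assumptions that 
must match what the function passed to dapply really returns:

```r
library(SparkR)
sparkR.session()

# `faithful` ships with R and has the `eruptions` and `waiting` columns
# used in the documentation example.
df <- createDataFrame(faithful)

# Add a typed placeholder column so that Spark builds the widened schema.
# Assumption: the new column is a double named "waiting_secs".
tmp <- withColumn(df, "waiting_secs", cast(lit(0), "double"))

# structType holding every original column plus the new one.
out_schema <- schema(tmp)

# Apply the function to the ORIGINAL df; tmp was only needed for its schema.
df1 <- dapply(df, function(x) { cbind(x, x$waiting * 60) }, out_schema)

head(df1)
```

Because tmp is a separate DataFrame, nothing has to be dropped from df 
afterwards. When the new column is a plain column expression like this one, 
withColumn(df, "waiting_secs", df$waiting * 60) may avoid dapply and its 
schema argument altogether.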



> SparkR Data Frame operation between columns elements
> ----------------------------------------------------
>
>                 Key: SPARK-19177
>                 URL: https://issues.apache.org/jira/browse/SPARK-19177
>             Project: Spark
>          Issue Type: Question
>          Components: SparkR
>    Affects Versions: 2.0.2
>            Reporter: Vicente Masip
>            Priority: Minor
>              Labels: schema, sparkR, struct
>
> I have commented on this in another thread, but I think it is important to 
> clarify:
> What happens when you are working with 50 columns and gapply? Do I have to 
> rewrite the 50-column schema plus the new column from the gapply operation? 
> I think there is no alternative, because structFields cannot be appended to 
> a structType. Any suggestions?



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

