[ 
https://issues.apache.org/jira/browse/SPARK-18569?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15748440#comment-15748440
 ] 

Yanbo Liang edited comment on SPARK-18569 at 12/14/16 2:26 PM:
---------------------------------------------------------------

This is generally a good idea, but I think we need to limit the scope of 
these functions and set priorities: which functions are most commonly used by 
R users? Once we figure out the approach for the first arithmetic function, we 
can add the others simply and in a repeatable way. To achieve this, it would be 
better to have a basic design document for the {{RFormula}} improvements, so 
that the architecture is scalable and maintainable.
We actually used a workaround when implementing {{spark.survreg}}, since 
it requires an R formula like {{Surv(futime, fustat) ~ ecog_ps + rx}}, which has 
two elements on the label/response side and needs {{Surv}} to be supported as a 
keyword, but {{RFormula}} currently supports neither. We can consider these 
issues together to find an elegant solution.
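
To make the two-element response side concrete, here is a minimal sketch of how a formula such as Surv(futime, fustat) ~ ecog_ps + rx could be split into a time column, a status column, and a single-label formula. It is plain Python for illustration only, not the actual {{spark.survreg}} code; the names SURV_FORMULA and split_surv_formula are hypothetical, and the pattern assumes simple column names.

```python
import re

# Hypothetical pattern: a Surv(time, status) response followed by ordinary terms.
SURV_FORMULA = re.compile(r"^\s*Surv\(\s*(\w+)\s*,\s*(\w+)\s*\)\s*~\s*(.+?)\s*$")


def split_surv_formula(formula):
    """Return (time_col, status_col, single_label_formula) for a Surv formula."""
    match = SURV_FORMULA.match(formula)
    if match is None:
        raise ValueError(f"not a Surv(time, status) ~ terms formula: {formula!r}")
    time_col, status_col, terms = match.groups()
    # Rewrite to a formula with one label so a single-response parser can
    # handle it; the status (censor) column is carried separately.
    return time_col, status_col, f"{time_col} ~ {terms}"


print(split_surv_formula("Surv(futime, fustat) ~ ecog_ps + rx"))
```

A string rewrite like this is the kind of special case a proper design would replace with first-class support for function calls on the response side.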


> Support R formula arithmetic 
> -----------------------------
>
>                 Key: SPARK-18569
>                 URL: https://issues.apache.org/jira/browse/SPARK-18569
>             Project: Spark
>          Issue Type: Sub-task
>          Components: ML, SparkR
>            Reporter: Felix Cheung
>
> I think we should support arithmetic in formulas, which makes it a lot more 
> convenient to build a model. Something like
> {code}
>   log(y) ~ a + log(x)
> {code}
> And to avoid ambiguity in how operators are resolved, we should support the I() operator:
> {code}
> I(X*Z)   as is: include a new variable consisting of these variables multiplied
> {code}
> Such that this works:
> {code}
> y ~ a + I(b+c)
> {code}
> Here the term b+c is interpreted as the arithmetic sum of b and c, rather than as two separate terms.
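
The distinction the description draws can be sketched in a few lines: a top-level + separates model terms, while a + inside I(...) is ordinary arithmetic. The code below is a toy illustration in plain Python, not Spark's {{RFormula}} parser; expand_terms and evaluate_term are hypothetical helper names, and eval is used only to keep the sketch short.

```python
import math


def expand_terms(rhs):
    """Split a right-hand side on top-level '+' only.

    A '+' inside parentheses (for example inside I(...)) is left alone,
    so it can later be evaluated as arithmetic.
    """
    terms, depth, current = [], 0, ""
    for ch in rhs:
        if ch == "(":
            depth += 1
        elif ch == ")":
            depth -= 1
        if ch == "+" and depth == 0:
            terms.append(current.strip())
            current = ""
        else:
            current += ch
    terms.append(current.strip())
    return terms


def evaluate_term(term, row):
    """Evaluate one term against a row given as a dict of column -> number."""
    if term.startswith("I(") and term.endswith(")"):
        term = term[2:-1]  # I(...) means: treat the contents as plain arithmetic
    # eval is acceptable in a toy sketch only; a real parser would build an AST.
    return eval(term, {"__builtins__": {}, "log": math.log}, dict(row))


row = {"a": 1.0, "b": 2.0, "c": 3.0}
print(expand_terms("a + b + c"))   # three separate terms
print(expand_terms("a + I(b+c)"))  # two terms, the second is a sum
print([evaluate_term(t, row) for t in expand_terms("a + I(b+c)")])
```

With this row, a + b + c yields three feature columns, while a + I(b+c) yields two, the second holding the value of b + c.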



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)