[ https://issues.apache.org/jira/browse/SPARK-18569?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15748440#comment-15748440 ]
Yanbo Liang edited comment on SPARK-18569 at 12/14/16 2:26 PM:
---------------------------------------------------------------

This is generally a good idea, but I think we need to limit the scope of these functions and set priorities: which functions are most commonly used by R users? Once we figure out the route for the first arithmetic function, we can add the others in a simple and reproducible way. To achieve this, it would be better to have a basic design document for {{RFormula}} improvements, so that the architecture is scalable and maintainable.

Actually, we used a tricky workaround when implementing {{spark.survreg}}, since it requires an R formula like Surv(futime, fustat) ~ ecog_ps + rx, which has two elements on the label/response side and needs {{Surv}} to be supported as a keyword, but {{RFormula}} does not currently support either. We can put these issues together to figure out an elegant solution.
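To make the two-element response concrete, here is a minimal Python sketch of pulling such a formula apart. It is not how {{spark.survreg}} or {{RFormula}} handles it; the helper name split_surv_formula and the regular expression are hypothetical and exist only for illustration.

```python
import re

# Hypothetical helper, not part of Spark: matches "Surv(time, status) ~ terms".
SURV_PATTERN = re.compile(r"^\s*Surv\(\s*(\w+)\s*,\s*(\w+)\s*\)\s*~\s*(.+)$")


def split_surv_formula(formula):
    """Return (time column, status column, list of feature terms)."""
    match = SURV_PATTERN.match(formula)
    if match is None:
        raise ValueError(f"not a Surv(time, status) ~ terms formula: {formula!r}")
    time_col, status_col, rhs = match.groups()
    # Only plain "+"-separated terms are handled in this sketch.
    terms = [term.strip() for term in rhs.split("+")]
    return time_col, status_col, terms


print(split_surv_formula("Surv(futime, fustat) ~ ecog_ps + rx"))
```

The point of the sketch is only that the response side carries two columns plus a keyword, which a single-label formula parser has no slot for.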
> Support R formula arithmetic
> -----------------------------
>
>                 Key: SPARK-18569
>                 URL: https://issues.apache.org/jira/browse/SPARK-18569
>             Project: Spark
>          Issue Type: Sub-task
>          Components: ML, SparkR
>            Reporter: Felix Cheung
>
> I think we should support arithmetic, which makes it a lot more convenient to
> build a model. Something like
> {code}
> log(y) ~ a + log(x)
> {code}
> And to avoid resolution confusion we should support the I() operator:
> {code}
> I
>  I(X*Z) as is: include a new variable consisting of these variables multiplied
> {code}
> Such that this works:
> {code}
> y ~ a + I(b+c)
> {code}
> the term b+c is to be interpreted as the sum of b and c.
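As a rough illustration of the semantics requested in the issue, the following Python sketch evaluates a formula against a single row of values. It is an assumption-laden toy, not Spark's RFormula: the names split_top_level and evaluate_formula are hypothetical, only log and I() are supported, and I() is modelled as the identity so that the expression inside it is computed arithmetically rather than treated as two separate terms.

```python
import math


def split_top_level(expr, sep="+"):
    """Split on `sep` only where it is not nested inside parentheses."""
    parts, depth, current = [], 0, []
    for ch in expr:
        if ch == "(":
            depth += 1
        elif ch == ")":
            depth -= 1
        if ch == sep and depth == 0:
            parts.append("".join(current).strip())
            current = []
        else:
            current.append(ch)
    parts.append("".join(current).strip())
    return parts


def evaluate_formula(formula, row):
    """Return (label value, {term: value}) for one row (a dict of columns)."""
    lhs, rhs = formula.split("~", 1)
    # I() is the identity here: its argument is evaluated as plain arithmetic.
    env = {**row, "log": math.log, "I": lambda value: value}

    def evaluate(term):
        # eval is acceptable for a sketch; do not use it on untrusted input.
        return eval(term, {"__builtins__": {}}, env)

    label = evaluate(lhs.strip())
    features = {term: evaluate(term) for term in split_top_level(rhs)}
    return label, features


row = {"y": 1.0, "a": 2.0, "b": 3.0, "c": 4.0, "x": math.e}
print(evaluate_formula("y ~ a + I(b+c)", row))
print(evaluate_formula("log(y) ~ a + log(x)", row))
```

Because the split only happens outside parentheses, the `+` inside I(b+c) stays part of a single term, which is the resolution behaviour the issue asks for.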