[jira] [Commented] (SPARK-17317) Add package vignette to SparkR
[ https://issues.apache.org/jira/browse/SPARK-17317?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15450013#comment-15450013 ]

Junyang Qian commented on SPARK-17317:
--------------------------------------

WIP

> Add package vignette to SparkR
> ------------------------------
>
>                 Key: SPARK-17317
>                 URL: https://issues.apache.org/jira/browse/SPARK-17317
>             Project: Spark
>          Issue Type: Improvement
>            Reporter: Junyang Qian
>
> In publishing SparkR to CRAN, it would be nice to have a vignette as a user guide that
> * describes the big picture
> * introduces the use of various methods
>
> This is important for new users because they may not even know which method to look up.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-17317) Add package vignette to SparkR
Junyang Qian created SPARK-17317:
------------------------------------

             Summary: Add package vignette to SparkR
                 Key: SPARK-17317
                 URL: https://issues.apache.org/jira/browse/SPARK-17317
             Project: Spark
          Issue Type: Improvement
            Reporter: Junyang Qian

In publishing SparkR to CRAN, it would be nice to have a vignette as a user guide that
* describes the big picture
* introduces the use of various methods

This is important for new users because they may not even know which method to look up.
[jira] [Created] (SPARK-17315) Add Kolmogorov-Smirnov Test to SparkR
Junyang Qian created SPARK-17315:
------------------------------------

             Summary: Add Kolmogorov-Smirnov Test to SparkR
                 Key: SPARK-17315
                 URL: https://issues.apache.org/jira/browse/SPARK-17315
             Project: Spark
          Issue Type: New Feature
            Reporter: Junyang Qian

The Kolmogorov-Smirnov test is a popular nonparametric test of the equality of distributions. There is an implementation in MLlib. It would be nice if we could expose that in SparkR.
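To make the proposed feature concrete: the one-sample Kolmogorov-Smirnov statistic compares a sample's empirical CDF against a reference CDF. Below is a minimal sketch of that statistic in plain Python for illustration only; the function names are mine, not MLlib's API, and MLlib's KolmogorovSmirnovTest computes this in a distributed fashion.

```python
# Illustrative one-sample Kolmogorov-Smirnov statistic (pure Python).
# D_n = sup_x |F_n(x) - F(x)|, where F_n is the empirical CDF of the
# sample and F is the reference CDF.

def ks_statistic(sample, cdf):
    """Return the KS D statistic for a sample against a reference CDF."""
    xs = sorted(sample)
    n = len(xs)
    d = 0.0
    for i, x in enumerate(xs, start=1):
        fx = cdf(x)
        # The empirical CDF jumps from (i-1)/n to i/n at x, so the
        # supremum can be attained on either side of the jump.
        d = max(d, i / n - fx, fx - (i - 1) / n)
    return d

def uniform_cdf(x):
    """CDF of Uniform(0, 1)."""
    return min(max(x, 0.0), 1.0)

if __name__ == "__main__":
    # An evenly spread sample matches Uniform(0, 1) well, so D is small.
    sample = [0.1, 0.3, 0.5, 0.7, 0.9]
    print(ks_statistic(sample, uniform_cdf))  # -> 0.1
```

A small D suggests the sample is consistent with the reference distribution; the actual test in MLlib also derives a p-value from D.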
[jira] [Commented] (SPARK-17241) SparkR spark.glm should have configurable regularization parameter
[ https://issues.apache.org/jira/browse/SPARK-17241?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15437717#comment-15437717 ]

Junyang Qian commented on SPARK-17241:
--------------------------------------

I'll take a closer look and see if we can add it easily.

> SparkR spark.glm should have configurable regularization parameter
> ------------------------------------------------------------------
>
>                 Key: SPARK-17241
>                 URL: https://issues.apache.org/jira/browse/SPARK-17241
>             Project: Spark
>          Issue Type: Improvement
>            Reporter: Junyang Qian
>
> Spark has a configurable L2 regularization parameter for generalized linear regression. It is very important to have it in SparkR so that users can run ridge regression.
[jira] [Commented] (SPARK-17241) SparkR spark.glm should have configurable regularization parameter
[ https://issues.apache.org/jira/browse/SPARK-17241?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15437692#comment-15437692 ]

Junyang Qian commented on SPARK-17241:
--------------------------------------

[~shivaram] It seems that Spark has it for linear regression but not for glm.

> SparkR spark.glm should have configurable regularization parameter
> ------------------------------------------------------------------
>
>                 Key: SPARK-17241
>                 URL: https://issues.apache.org/jira/browse/SPARK-17241
>             Project: Spark
>          Issue Type: Improvement
>            Reporter: Junyang Qian
>
> Spark has a configurable L2 regularization parameter for generalized linear regression. It is very important to have it in SparkR so that users can run ridge regression.
[jira] [Updated] (SPARK-17241) SparkR spark.glm should have configurable regularization parameter
[ https://issues.apache.org/jira/browse/SPARK-17241?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Junyang Qian updated SPARK-17241:
---------------------------------
    Summary: SparkR spark.glm should have configurable regularization parameter  (was: SparkR spark.glm should have configurable regularization parameter(s))

> SparkR spark.glm should have configurable regularization parameter
> ------------------------------------------------------------------
>
>                 Key: SPARK-17241
>                 URL: https://issues.apache.org/jira/browse/SPARK-17241
>             Project: Spark
>          Issue Type: Improvement
>            Reporter: Junyang Qian
>
> Spark has configurable L2 regularization parameter for linear regression and an additional elastic-net parameter for generalized linear model. It is very important to have them in SparkR so that users can run ridge regression and elastic-net.
[jira] [Updated] (SPARK-17241) SparkR spark.glm should have configurable regularization parameter
[ https://issues.apache.org/jira/browse/SPARK-17241?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Junyang Qian updated SPARK-17241:
---------------------------------
    Description: Spark has a configurable L2 regularization parameter for generalized linear regression. It is very important to have it in SparkR so that users can run ridge regression.  (was: Spark has configurable L2 regularization parameter for linear regression and an additional elastic-net parameter for generalized linear model. It is very important to have them in SparkR so that users can run ridge regression and elastic-net.)

> SparkR spark.glm should have configurable regularization parameter
> ------------------------------------------------------------------
>
>                 Key: SPARK-17241
>                 URL: https://issues.apache.org/jira/browse/SPARK-17241
>             Project: Spark
>          Issue Type: Improvement
>            Reporter: Junyang Qian
>
> Spark has a configurable L2 regularization parameter for generalized linear regression. It is very important to have it in SparkR so that users can run ridge regression.
[jira] [Created] (SPARK-17241) SparkR spark.glm should have configurable regularization parameter(s)
Junyang Qian created SPARK-17241:
------------------------------------

             Summary: SparkR spark.glm should have configurable regularization parameter(s)
                 Key: SPARK-17241
                 URL: https://issues.apache.org/jira/browse/SPARK-17241
             Project: Spark
          Issue Type: Improvement
            Reporter: Junyang Qian

Spark has a configurable L2 regularization parameter for linear regression and an additional elastic-net parameter for the generalized linear model. It is very important to have them in SparkR so that users can run ridge regression and elastic-net.
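To illustrate what the requested regularization parameter buys the user: with one feature and no intercept, ridge regression has the closed form beta = sum(x*y) / (sum(x^2) + lambda). The Python sketch below is a hedged illustration of that formula only; spark.glm's actual regParam is applied inside MLlib's solvers, and the names here are mine.

```python
# Closed-form ridge regression for a single feature, no intercept:
#   beta = sum(x*y) / (sum(x^2) + lam)
# lam = 0 gives ordinary least squares; larger lam shrinks the
# coefficient toward zero, which is what the regParam knob controls.

def ridge_coef(x, y, lam):
    sxy = sum(xi * yi for xi, yi in zip(x, y))
    sxx = sum(xi * xi for xi in x)
    return sxy / (sxx + lam)

x = [1.0, 2.0, 3.0]
y = [2.0, 4.0, 6.0]  # exactly y = 2x

print(ridge_coef(x, y, 0.0))   # -> 2.0 (OLS recovers the true slope)
print(ridge_coef(x, y, 14.0))  # -> 1.0 (regularization shrinks it)
```

Without a configurable lambda exposed in SparkR, users are stuck at the lam = 0 case and cannot run ridge regression at all.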
[jira] [Commented] (SPARK-16508) Fix documentation warnings found by R CMD check
[ https://issues.apache.org/jira/browse/SPARK-16508?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15411132#comment-15411132 ]

Junyang Qian commented on SPARK-16508:
--------------------------------------

Sounds good. I'll be working on the undocumented/duplicated argument warnings.

> Fix documentation warnings found by R CMD check
> -----------------------------------------------
>
>                 Key: SPARK-16508
>                 URL: https://issues.apache.org/jira/browse/SPARK-16508
>             Project: Spark
>          Issue Type: Sub-task
>          Components: SparkR
>            Reporter: Shivaram Venkataraman
>
> A full list of warnings after the fixes in SPARK-16507 is at
> https://gist.github.com/shivaram/62866c4ca59c5d34b8963939cf04b5eb
[jira] [Commented] (SPARK-16508) Fix documentation warnings found by R CMD check
[ https://issues.apache.org/jira/browse/SPARK-16508?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15410296#comment-15410296 ]

Junyang Qian commented on SPARK-16508:
--------------------------------------

It seems that there are still some warnings in my local check, e.g. undocumented arguments "row.names" and "optional" in as.data.frame. I was wondering whether I missed something or whether we should deal with those?

> Fix documentation warnings found by R CMD check
> -----------------------------------------------
>
>                 Key: SPARK-16508
>                 URL: https://issues.apache.org/jira/browse/SPARK-16508
>             Project: Spark
>          Issue Type: Sub-task
>          Components: SparkR
>            Reporter: Shivaram Venkataraman
>
> A full list of warnings after the fixes in SPARK-16507 is at
> https://gist.github.com/shivaram/62866c4ca59c5d34b8963939cf04b5eb
[jira] [Created] (SPARK-16727) SparkR unit test fails - incorrect expected output
Junyang Qian created SPARK-16727:
------------------------------------

             Summary: SparkR unit test fails - incorrect expected output
                 Key: SPARK-16727
                 URL: https://issues.apache.org/jira/browse/SPARK-16727
             Project: Spark
          Issue Type: Bug
            Reporter: Junyang Qian

https://github.com/apache/spark/blob/master/R/pkg/inst/tests/testthat/test_sparkSQL.R#L1827

When I run spark/R/run-tests.sh, the tests fail with the following message:

1. Failure (at test_sparkSQL.R#1827): describe() and summarize() on a DataFrame
   collect(stats)[4, "name"] not equal to "Andy"
   target is NULL, current is character

2. Failure (at test_sparkSQL.R#1831): describe() and summarize() on a DataFrame
   collect(stats2)[4, "name"] not equal to "Andy"
   target is NULL, current is character

Error: Test failures
Execution halted
[jira] [Commented] (SPARK-16579) Add a spark install function
[ https://issues.apache.org/jira/browse/SPARK-16579?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15384628#comment-15384628 ]

Junyang Qian commented on SPARK-16579:
--------------------------------------

If we find Spark home and the JARs missing, do we still want to install to a cache dir and then redirect Spark home to that dir?

> Add a spark install function
> ----------------------------
>
>                 Key: SPARK-16579
>                 URL: https://issues.apache.org/jira/browse/SPARK-16579
>             Project: Spark
>          Issue Type: Sub-task
>          Components: SparkR
>            Reporter: Shivaram Venkataraman
>            Assignee: Junyang Qian
>
> As described in the design doc we need to introduce a function to install Spark in case the user directly downloads SparkR from CRAN.
> To do that we can introduce an install_spark function that takes the following arguments:
> {code}
> hadoop_version
> url_to_use  # defaults to apache
> local_dir   # defaults to a cache dir
> {code}
> Furthermore, I think we can automatically run this from sparkR.init if we find Spark home and the JARs missing.
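A rough sketch of the install flow the quoted design describes, written in Python for illustration: derive the download URL and local cache path for a given Spark/Hadoop combination, so that initialization could fall back to the cached copy when Spark home and the JARs are missing. All names, the mirror default, and the cache layout below are assumptions for this sketch, not SparkR's actual implementation.

```python
# Hypothetical sketch of the install_spark path logic described above.
import os

# Assumed default mirror ("url_to_use # defaults to apache" in the design).
APACHE_MIRROR = "https://archive.apache.org/dist/spark"

def install_paths(spark_version, hadoop_version,
                  url_to_use=APACHE_MIRROR, local_dir=None):
    """Return (download_url, spark_home) for a Spark/Hadoop combination."""
    if local_dir is None:
        # "local_dir # defaults to a cache dir" in the design; a per-user
        # cache directory is one plausible choice.
        local_dir = os.path.join(os.path.expanduser("~"), ".cache", "spark")
    pkg = "spark-%s-bin-hadoop%s" % (spark_version, hadoop_version)
    url = "%s/spark-%s/%s.tgz" % (url_to_use, spark_version, pkg)
    spark_home = os.path.join(local_dir, pkg)
    return url, spark_home

url, home = install_paths("2.0.0", "2.7", local_dir="/tmp/spark-cache")
print(url)   # where install_spark would download from
print(home)  # where sparkR.init would redirect Spark home
```

The real function would additionally download the tarball to `local_dir`, extract it, and skip the download when `spark_home` already exists.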