[jira] [Commented] (SPARK-17317) Add package vignette to SparkR

2016-08-30 Thread Junyang Qian (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-17317?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15450013#comment-15450013
 ] 

Junyang Qian commented on SPARK-17317:
--

WIP

> Add package vignette to SparkR
> --
>
> Key: SPARK-17317
> URL: https://issues.apache.org/jira/browse/SPARK-17317
> Project: Spark
> Issue Type: Improvement
> Reporter: Junyang Qian
>
> In publishing SparkR to CRAN, it would be nice to have a vignette as a user 
> guide that
> * describes the big picture
> * introduces the use of various methods
>
> This is important for new users because they may not even know which method 
> to look up.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-17317) Add package vignette to SparkR

2016-08-30 Thread Junyang Qian (JIRA)
Junyang Qian created SPARK-17317:


Summary: Add package vignette to SparkR
Key: SPARK-17317
URL: https://issues.apache.org/jira/browse/SPARK-17317
Project: Spark
Issue Type: Improvement
Reporter: Junyang Qian


In publishing SparkR to CRAN, it would be nice to have a vignette as a user 
guide that
* describes the big picture
* introduces the use of various methods

This is important for new users because they may not even know which method to 
look up.






[jira] [Created] (SPARK-17315) Add Kolmogorov-Smirnov Test to SparkR

2016-08-30 Thread Junyang Qian (JIRA)
Junyang Qian created SPARK-17315:


Summary: Add Kolmogorov-Smirnov Test to SparkR
Key: SPARK-17315
URL: https://issues.apache.org/jira/browse/SPARK-17315
Project: Spark
Issue Type: New Feature
Reporter: Junyang Qian


The Kolmogorov-Smirnov test is a popular nonparametric test of equality of
distributions. There is an implementation in MLlib. It would be nice if we could
expose it in SparkR.
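
For readers unfamiliar with the test, here is a minimal sketch of the one-sample Kolmogorov-Smirnov statistic in plain Python. It only illustrates what the test measures; it is not the MLlib implementation or a proposed SparkR API, and the names `normal_cdf` and `ks_statistic` and the sample values are made up for illustration.

```python
import math


def normal_cdf(x, mean=0.0, sd=1.0):
    """CDF of a normal distribution, computed with the error function."""
    return 0.5 * (1.0 + math.erf((x - mean) / (sd * math.sqrt(2.0))))


def ks_statistic(sample, cdf):
    """One-sample KS statistic: the largest gap between the empirical CDF
    of `sample` and the theoretical `cdf`."""
    xs = sorted(sample)
    n = len(xs)
    d = 0.0
    for i, x in enumerate(xs):
        f = cdf(x)
        # The empirical CDF jumps from i/n to (i+1)/n at x; check both sides.
        d = max(d, (i + 1) / n - f, f - i / n)
    return d


sample = [-1.2, -0.4, 0.1, 0.3, 0.9, 1.5]  # hypothetical data
print(ks_statistic(sample, normal_cdf))
```

A small statistic means the sample is close to the reference distribution. A full test would also convert the statistic into a p-value, which this sketch leaves out.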






[jira] [Commented] (SPARK-17241) SparkR spark.glm should have configurable regularization parameter

2016-08-25 Thread Junyang Qian (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-17241?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15437717#comment-15437717
 ] 

Junyang Qian commented on SPARK-17241:
--

I'll take a closer look and see if we can add it easily.

> SparkR spark.glm should have configurable regularization parameter
> --
>
> Key: SPARK-17241
> URL: https://issues.apache.org/jira/browse/SPARK-17241
> Project: Spark
> Issue Type: Improvement
> Reporter: Junyang Qian
>
> Spark has a configurable L2 regularization parameter for generalized linear
> regression. It is very important to have it in SparkR so that users can run
> ridge regression.






[jira] [Commented] (SPARK-17241) SparkR spark.glm should have configurable regularization parameter

2016-08-25 Thread Junyang Qian (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-17241?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15437692#comment-15437692
 ] 

Junyang Qian commented on SPARK-17241:
--

[~shivaram] It seems that Spark has it for linear regression but not for GLM.

> SparkR spark.glm should have configurable regularization parameter
> --
>
> Key: SPARK-17241
> URL: https://issues.apache.org/jira/browse/SPARK-17241
> Project: Spark
> Issue Type: Improvement
> Reporter: Junyang Qian
>
> Spark has a configurable L2 regularization parameter for generalized linear
> regression. It is very important to have it in SparkR so that users can run
> ridge regression.






[jira] [Updated] (SPARK-17241) SparkR spark.glm should have configurable regularization parameter

2016-08-25 Thread Junyang Qian (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-17241?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Junyang Qian updated SPARK-17241:
-
Summary: SparkR spark.glm should have configurable regularization parameter 
 (was: SparkR spark.glm should have configurable regularization parameter(s))

> SparkR spark.glm should have configurable regularization parameter
> --
>
> Key: SPARK-17241
> URL: https://issues.apache.org/jira/browse/SPARK-17241
> Project: Spark
> Issue Type: Improvement
> Reporter: Junyang Qian
>
> Spark has a configurable L2 regularization parameter for linear regression and
> an additional elastic-net parameter for generalized linear models. It is very
> important to have them in SparkR so that users can run ridge regression and
> elastic-net.






[jira] [Updated] (SPARK-17241) SparkR spark.glm should have configurable regularization parameter

2016-08-25 Thread Junyang Qian (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-17241?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Junyang Qian updated SPARK-17241:
-
Description: Spark has configurable L2 regularization parameter for 
generalized linear regression. It is very important to have them in SparkR so 
that users can run ridge regression.  (was: Spark has configurable L2 
regularization parameter for linear regression and an additional elastic-net 
parameter for generalized linear model. It is very important to have them in 
SparkR so that users can run ridge regression and elastic-net.)

> SparkR spark.glm should have configurable regularization parameter
> --
>
> Key: SPARK-17241
> URL: https://issues.apache.org/jira/browse/SPARK-17241
> Project: Spark
> Issue Type: Improvement
> Reporter: Junyang Qian
>
> Spark has a configurable L2 regularization parameter for generalized linear
> regression. It is very important to have it in SparkR so that users can run
> ridge regression.






[jira] [Created] (SPARK-17241) SparkR spark.glm should have configurable regularization parameter(s)

2016-08-25 Thread Junyang Qian (JIRA)
Junyang Qian created SPARK-17241:


Summary: SparkR spark.glm should have configurable regularization parameter(s)
Key: SPARK-17241
URL: https://issues.apache.org/jira/browse/SPARK-17241
Project: Spark
Issue Type: Improvement
Reporter: Junyang Qian


Spark has a configurable L2 regularization parameter for linear regression and an
additional elastic-net parameter for generalized linear models. It is very
important to have them in SparkR so that users can run ridge regression and
elastic-net.
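
To illustrate what an L2 regularization parameter does, here is a minimal sketch of ridge regression with a single feature, in plain Python. It is not the Spark or SparkR API, and Spark's own objective may be scaled and standardized differently. The function name `ridge_fit_1d` and the data are hypothetical.

```python
def ridge_fit_1d(xs, ys, reg_param):
    """Fit y ~ b0 + b1*x by minimizing
    sum((y - b0 - b1*x)^2) + reg_param * b1^2.
    The intercept b0 is not penalized."""
    n = len(xs)
    x_mean = sum(xs) / n
    y_mean = sum(ys) / n
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    sxx = sum((x - x_mean) ** 2 for x in xs)
    # The penalty enters the denominator, shrinking the slope toward zero.
    slope = sxy / (sxx + reg_param)
    intercept = y_mean - slope * x_mean
    return intercept, slope


xs = [0.0, 1.0, 2.0, 3.0]
ys = [1.0, 3.0, 5.0, 7.0]  # hypothetical data lying exactly on y = 1 + 2x
for reg_param in (0.0, 1.0, 10.0):
    print(reg_param, ridge_fit_1d(xs, ys, reg_param))
```

With `reg_param = 0` this reduces to ordinary least squares. Larger values pull the slope toward zero, which is the behaviour a configurable parameter would give SparkR users control over.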






[jira] [Commented] (SPARK-16508) Fix documentation warnings found by R CMD check

2016-08-07 Thread Junyang Qian (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-16508?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15411132#comment-15411132
 ] 

Junyang Qian commented on SPARK-16508:
--

Sounds good. I'll be working on the undocumented/duplicated argument warnings. 

> Fix documentation warnings found by R CMD check
> ---
>
> Key: SPARK-16508
> URL: https://issues.apache.org/jira/browse/SPARK-16508
> Project: Spark
> Issue Type: Sub-task
> Components: SparkR
> Reporter: Shivaram Venkataraman
>
> A full list of warnings after the fixes in SPARK-16507 is at 
> https://gist.github.com/shivaram/62866c4ca59c5d34b8963939cf04b5eb 






[jira] [Commented] (SPARK-16508) Fix documentation warnings found by R CMD check

2016-08-05 Thread Junyang Qian (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-16508?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15410296#comment-15410296
 ] 

Junyang Qian commented on SPARK-16508:
--

It seems that there are still some warnings in my local check, e.g. the
undocumented arguments "row.names" and "optional" in as.data.frame. Did I miss
something, or should we deal with those?

> Fix documentation warnings found by R CMD check
> ---
>
> Key: SPARK-16508
> URL: https://issues.apache.org/jira/browse/SPARK-16508
> Project: Spark
> Issue Type: Sub-task
> Components: SparkR
> Reporter: Shivaram Venkataraman
>
> A full list of warnings after the fixes in SPARK-16507 is at 
> https://gist.github.com/shivaram/62866c4ca59c5d34b8963939cf04b5eb 






[jira] [Created] (SPARK-16727) SparkR unit test fails - incorrect expected output

2016-07-25 Thread Junyang Qian (JIRA)
Junyang Qian created SPARK-16727:


Summary: SparkR unit test fails - incorrect expected output
Key: SPARK-16727
URL: https://issues.apache.org/jira/browse/SPARK-16727
Project: Spark
Issue Type: Bug
Reporter: Junyang Qian


https://github.com/apache/spark/blob/master/R/pkg/inst/tests/testthat/test_sparkSQL.R#L1827

When I run spark/R/run-tests.sh, the tests fail with the following message:

1. Failure (at test_sparkSQL.R#1827): describe() and summarize() on a DataFrame 
collect(stats)[4, "name"] not equal to "Andy"
target is NULL, current is character

2. Failure (at test_sparkSQL.R#1831): describe() and summarize() on a DataFrame 
collect(stats2)[4, "name"] not equal to "Andy"
target is NULL, current is character
Error: Test failures
Execution halted






[jira] [Commented] (SPARK-16579) Add a spark install function

2016-07-19 Thread Junyang Qian (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-16579?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15384628#comment-15384628
 ] 

Junyang Qian commented on SPARK-16579:
--

If we find Spark home and the JARs missing, do we still want to install to a
cache dir and then redirect Spark home to that dir?

> Add a spark install function
> 
>
> Key: SPARK-16579
> URL: https://issues.apache.org/jira/browse/SPARK-16579
> Project: Spark
> Issue Type: Sub-task
> Components: SparkR
> Reporter: Shivaram Venkataraman
> Assignee: Junyang Qian
>
> As described in the design doc we need to introduce a function to install 
> Spark in case the user directly downloads SparkR from CRAN.
> To do that we can introduce an install_spark function that takes the
> following arguments:
> {code}
> hadoop_version
> url_to_use # defaults to apache
> local_dir # defaults to a cache dir
> {code} 
> Furthermore, I think we can automatically run this from sparkR.init if we
> find Spark home and the JARs missing.
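
One possible reading of the argument list and the "install if missing" check quoted above, sketched in plain Python rather than R. Everything beyond the three argument names is an assumption made for illustration: the placeholder URL, the cache location, the `jars` directory name, and the helpers `jars_present` and `plan_install`. The sketch only resolves defaults and decides whether an install would be needed; it downloads nothing.

```python
import os

# Placeholder only; the issue says "defaults to apache" without giving a URL.
DEFAULT_URL = "https://mirror.example.invalid/spark"
# Assumed cache location; the issue only says "defaults to a cache dir".
DEFAULT_CACHE_DIR = os.path.join(os.path.expanduser("~"), ".cache", "spark")


def jars_present(spark_home):
    """True if spark_home is set and holds a non-empty 'jars' directory
    (the directory name is an assumption)."""
    if not spark_home:
        return False
    jars_dir = os.path.join(spark_home, "jars")
    return os.path.isdir(jars_dir) and len(os.listdir(jars_dir)) > 0


def plan_install(hadoop_version, url_to_use=None, local_dir=None, spark_home=None):
    """Resolve default arguments and report whether an install is needed."""
    return {
        "hadoop_version": hadoop_version,
        "url_to_use": url_to_use or DEFAULT_URL,
        "local_dir": local_dir or DEFAULT_CACHE_DIR,
        "install_needed": not jars_present(spark_home),
    }


print(plan_install("2.7"))
```

Whether Spark home should then be redirected to `local_dir` after an install is the open question raised in the comment, so the sketch does not decide it.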


