[GitHub] spark pull request #19342: [MINOR][SparkR] minor fixes for CRAN compliance

2017-09-28 Thread bdwyer2
Github user bdwyer2 closed the pull request at:

https://github.com/apache/spark/pull/19342


[GitHub] spark pull request #19342: [MINOR][SparkR] minor fixes for CRAN compliance

2017-09-25 Thread bdwyer2
GitHub user bdwyer2 opened a pull request:

https://github.com/apache/spark/pull/19342

[MINOR][SparkR] minor fixes for CRAN compliance

## What changes were proposed in this pull request?

Added a `SystemRequirements` field to the `DESCRIPTION` file and an 
`on.exit()` call to the `SparkDataFrame` `attach` method.

Please see the discussion on 
[SPARK-15799](https://issues.apache.org/jira/browse/SPARK-15799).
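For context, a rough sketch of the two idioms involved (illustrative only, not the committed diff):

```R
# DESCRIPTION gains a SystemRequirements entry declaring the Java dependency,
# along the lines of (exact version bound hypothetical):
#   SystemRequirements: Java (>= 7)

# on.exit() registers cleanup that runs when the calling function returns, even
# on error; the same idiom lets an attach()-style side effect be undone reliably:
attachTemporarily <- function(x) {
  attach(x, name = "tmp_attach")             # modifies the global search path
  on.exit(detach("tmp_attach"), add = TRUE)  # guaranteed to restore it afterwards
  head(search())                             # ... work while attached ...
}
```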

## How was this patch tested?

Existing tests


You can merge this pull request into a Git repository by running:

$ git pull https://github.com/bdwyer2/spark minor_cran_fixes

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/19342.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #19342


commit a9e13c00dd861f29c9c4f0c68aa56fb6830e423d
Author: Brendan Dwyer 
Date:   2017-09-25T20:51:34Z

minor fixes for CRAN compliance




[GitHub] spark pull request #18271: [MINOR][DOCS] Improve Running R Tests docs

2017-06-13 Thread bdwyer2
Github user bdwyer2 commented on a diff in the pull request:

https://github.com/apache/spark/pull/18271#discussion_r121768881
  
--- Diff: docs/building-spark.md ---
@@ -218,9 +218,11 @@ The run-tests script also can be limited to a specific 
Python version or a speci
 
 ## Running R Tests
 
-To run the SparkR tests you will need to install the R package `testthat`
-(run `install.packages(testthat)` from R shell).  You can run just the 
SparkR tests using
-the command:
+To run the SparkR tests you will need to install the 
[knitr](https://cran.r-project.org/package=knitr), 
[rmarkdown](https://cran.r-project.org/package=rmarkdown), 
[testthat](https://cran.r-project.org/package=testthat), 
[e1071](https://cran.r-project.org/package=e1071) and 
[survival](https://cran.r-project.org/package=survival) packages first:
+
+R -e 'install.packages(c("knitr", "rmarkdown", "testthat", "e1071", 
"survival"), repos="http://cran.us.r-project.org";)'
--- End diff --

what about a more global mirror? (e.g. https://cloud.r-project.org/)
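For reference, the suggested change would amount to something like this (sketch only):

```R
# cloud.r-project.org automatically redirects to a nearby CRAN mirror:
install.packages(c("knitr", "rmarkdown", "testthat", "e1071", "survival"),
                 repos = "https://cloud.r-project.org/")
```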


[GitHub] spark pull request #18035: [MINOR][SPARKR][ML] Joint coefficients with inter...

2017-05-22 Thread bdwyer2
Github user bdwyer2 commented on a diff in the pull request:

https://github.com/apache/spark/pull/18035#discussion_r117864689
  
--- Diff: R/pkg/R/mllib_classification.R ---
@@ -46,15 +46,16 @@ setClass("MultilayerPerceptronClassificationModel", 
representation(jobj = "jobj"
 #' @note NaiveBayesModel since 2.0.0
 setClass("NaiveBayesModel", representation(jobj = "jobj"))
 
-#' linear SVM Model
+#' Linear SVM Model
 #'
-#' Fits an linear SVM model against a SparkDataFrame. It is a binary 
classifier, similar to svm in glmnet package
+#' Fits a linear SVM model against a SparkDataFrame, similar to svm in 
e1071 package.
+#' Currently only supports binary classification model with linear kernal.
--- End diff --

Do you mean `kernel` instead of `kernal`?


[GitHub] spark pull request #17981: [SPARK-15767][ML][SparkR] Decision Tree wrapper i...

2017-05-16 Thread bdwyer2
Github user bdwyer2 commented on a diff in the pull request:

https://github.com/apache/spark/pull/17981#discussion_r116859197
  
--- Diff: R/pkg/R/mllib_tree.R ---
@@ -499,3 +543,199 @@ setMethod("write.ml", signature(object = 
"RandomForestClassificationModel", path
   function(object, path, overwrite = FALSE) {
 write_internal(object, path, overwrite)
   })
+
+#' Decision Tree Model for Regression and Classification
+#'
+#' \code{spark.decisionTree} fits a Decision Tree Regression model or 
Classification model on
+#' a SparkDataFrame. Users can call \code{summary} to get a summary of the 
fitted Decision Tree
+#' model, \code{predict} to make predictions on new data, and 
\code{write.ml}/\code{read.ml} to
+#' save/load fitted models.
+#' For more details, see
+#' 
\href{http://spark.apache.org/docs/latest/ml-classification-regression.html#decision-tree-regression}{
+#' Decision Tree Regression} and
+#' 
\href{http://spark.apache.org/docs/latest/ml-classification-regression.html#decision-tree-classifier}{
+#' Decision Tree Classification}
+#'
+#' @param data a SparkDataFrame for training.
+#' @param formula a symbolic description of the model to be fitted. 
Currently only a few formula
+#'operators are supported, including '~', ':', '+', and 
'-'.
+#' @param type type of model, one of "regression" or "classification", to 
fit
+#' @param maxDepth Maximum depth of the tree (>= 0).
+#' @param maxBins Maximum number of bins used for discretizing continuous 
features and for choosing
+#'how to split on features at each node. More bins give 
higher granularity. Must be
+#'>= 2 and >= number of categories in any categorical 
feature.
+#' @param impurity Criterion used for information gain calculation.
+#' For regression, must be "variance". For classification, 
must be one of
+#' "entropy" and "gini", default is "gini".
+#' @param seed integer seed for random number generation.
+#' @param minInstancesPerNode Minimum number of instances each child must 
have after split.
+#' @param minInfoGain Minimum information gain for a split to be 
considered at a tree node.
+#' @param checkpointInterval Param for set checkpoint interval (>= 1) or 
disable checkpoint (-1).
+#' @param maxMemoryInMB Maximum memory in MB allocated to histogram 
aggregation.
+#' @param cacheNodeIds If FALSE, the algorithm will pass trees to 
executors to match instances with
+#' nodes. If TRUE, the algorithm will cache node IDs 
for each instance. Caching
+#' can speed up training of deeper trees. Users can 
set how often should the
+#' cache be checkpointed or disable it by setting 
checkpointInterval.
--- End diff --

This is kind of confusing
>Users can set how often should the cache be checkpointed
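For orientation, a minimal usage sketch of the `spark.decisionTree` API documented in the diff above (dataset, formula, and parameter values are my own illustration, and an active `sparkR.session()` is assumed):

```R
df <- createDataFrame(longley)                    # base R economic dataset
model <- spark.decisionTree(df, Employed ~ GNP + Population,
                            type = "regression", maxDepth = 5, maxBins = 16)
summary(model)                                    # inspect the fitted tree
predictions <- predict(model, df)                 # adds a prediction column
write.ml(model, file.path(tempdir(), "dtModel"))  # persist to a temp path
```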


[GitHub] spark pull request #18003: [SparkR] Fix bad examples in DataFrame methods

2017-05-16 Thread bdwyer2
Github user bdwyer2 commented on a diff in the pull request:

https://github.com/apache/spark/pull/18003#discussion_r116820964
  
--- Diff: R/pkg/R/DataFrame.R ---
@@ -3175,7 +3176,8 @@ setMethod("with",
 #' @aliases str,SparkDataFrame-method
 #' @family SparkDataFrame functions
 #' @param object a SparkDataFrame
-#' @examples \dontrun{
+#' @examples
+#' \dontrun{
--- End diff --

There are a lot of these in `functions.R`. Maybe you could fix them too 
with this PR?


[GitHub] spark pull request #17611: [SPARK-20298][SparkR][MINOR] fixed spelling mista...

2017-04-11 Thread bdwyer2
GitHub user bdwyer2 opened a pull request:

https://github.com/apache/spark/pull/17611

[SPARK-20298][SparkR][MINOR] fixed spelling mistake "charactor"

## What changes were proposed in this pull request?

Fixed spelling of "charactor"

## How was this patch tested?

Spelling change only


You can merge this pull request into a Git repository by running:

$ git pull https://github.com/bdwyer2/spark SPARK-20298

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/17611.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #17611


commit 58b2fa882feef2c2a303c86616ff4c921ba558b0
Author: Brendan Dwyer 
Date:   2017-04-11T20:28:32Z

fixed spelling mistake "charactor"




[GitHub] spark pull request #16290: [SPARK-18817] [SPARKR] [SQL] Set default warehous...

2016-12-16 Thread bdwyer2
Github user bdwyer2 commented on a diff in the pull request:

https://github.com/apache/spark/pull/16290#discussion_r92916155
  
--- Diff: R/pkg/inst/tests/testthat/test_context.R ---
@@ -72,6 +72,20 @@ test_that("repeatedly starting and stopping 
SparkSession", {
   }
 })
 
+test_that("Default warehouse dir should be set to tempdir", {
+  sparkR.session.stop()
+  sparkR.session(enableHiveSupport = FALSE)
+
+  # Create a temporary table
+  sql("CREATE TABLE people_warehouse_test")
+  # spark-warehouse should be written only tempdir() and not current 
working directory
+  res <- list.files(path = ".", pattern = ".*spark-warehouse.*",
--- End diff --

should we test to make sure that no files are created during this process 
instead of only checking for `spark-warehouse`?


[GitHub] spark pull request #16290: [SPARK-18817] [SPARKR] [SQL] Set default warehous...

2016-12-14 Thread bdwyer2
Github user bdwyer2 commented on a diff in the pull request:

https://github.com/apache/spark/pull/16290#discussion_r92533173
  
--- Diff: R/pkg/inst/tests/testthat/test_sparkSQL.R ---
@@ -2165,6 +2165,14 @@ test_that("SQL error message is returned from JVM", {
   expect_equal(grepl("blah", retError), TRUE)
 })
 
+test_that("Default warehouse dir should be set to tempdir", {
+  # nothing should be written outside tempdir() without explicit user 
permission
+  inital_working_directory_files <- list.files()
--- End diff --

Does Jenkins start with a new workspace every time it runs a test?


[GitHub] spark pull request #16247: [SPARK-18817][SparkR] set default spark-warehouse...

2016-12-14 Thread bdwyer2
Github user bdwyer2 closed the pull request at:

https://github.com/apache/spark/pull/16247


[GitHub] spark issue #16247: [SPARK-18817][SparkR] set default spark-warehouse path t...

2016-12-14 Thread bdwyer2
Github user bdwyer2 commented on the issue:

https://github.com/apache/spark/pull/16247
  
@shivaram @felixcheung I'll close this PR so that one of you can take over 
in order to have it done in time for the RC.


[GitHub] spark issue #16247: [SPARK-18817][SparkR] set default spark-warehouse path t...

2016-12-14 Thread bdwyer2
Github user bdwyer2 commented on the issue:

https://github.com/apache/spark/pull/16247
  
How would we access that value on the Scala side? Would 
`sparkContext.hadoopConfiguration.get("spark.sql.warehouse.default.dir")` work?

I'm currently unable to compile Spark, which makes experimenting with Scala 
difficult.


[GitHub] spark pull request #16247: [SPARK-18817][SparkR] set default spark-warehouse...

2016-12-13 Thread bdwyer2
Github user bdwyer2 commented on a diff in the pull request:

https://github.com/apache/spark/pull/16247#discussion_r92289217
  
--- Diff: R/pkg/R/sparkR.R ---
@@ -362,6 +362,10 @@ sparkR.session <- function(
   enableHiveSupport = TRUE,
   ...) {
 
+  if (length(sparkConfig[["spark.sql.warehouse.dir"]]) == 0) {
+sparkConfig[["spark.sql.warehouse.dir"]] <- tempdir()
--- End diff --

@felixcheung I'm confused. By "spark property" do you mean something passed 
to `sparkR.session()` via the `sparkConfig` argument?
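For reference, setting it explicitly through `sparkConfig` would look like this (a sketch mirroring the `spark.driver.memory` usage elsewhere in this thread):

```R
sparkR.session(sparkConfig = list(spark.sql.warehouse.dir = tempdir()),
               enableHiveSupport = FALSE)
```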


[GitHub] spark issue #16247: [SPARK-18817][SparkR] set default spark-warehouse path t...

2016-12-13 Thread bdwyer2
Github user bdwyer2 commented on the issue:

https://github.com/apache/spark/pull/16247
  
Would calling the test with this be an acceptable solution?
```R
sparkR.session(enableHiveSupport = FALSE)
```


[GitHub] spark issue #16247: [SPARK-18817][SparkR] set default spark-warehouse path t...

2016-12-12 Thread bdwyer2
Github user bdwyer2 commented on the issue:

https://github.com/apache/spark/pull/16247
  
I don't see how my last commit could have caused this:
```
functions in sparkR.R: .
SparkSQL functions: Spark package found in SPARK_HOME: 
/home/jenkins/workspace/SparkPullRequestBuilder
Error in handleErrors(returnStatus, conn) : 
  java.lang.IllegalArgumentException: Error while instantiating 
'org.apache.spark.sql.hive.HiveSessionState':
at 
org.apache.spark.sql.SparkSession$.org$apache$spark$sql$SparkSession$$reflect(SparkSession.scala:981)
at 
org.apache.spark.sql.SparkSession.sessionState$lzycompute(SparkSession.scala:110)
at 
org.apache.spark.sql.SparkSession.sessionState(SparkSession.scala:109)
at 
org.apache.spark.sql.api.r.SQLUtils$$anonfun$setSparkContextSessionConf$2.apply(SQLUtils.scala:67)
at 
org.apache.spark.sql.api.r.SQLUtils$$anonfun$setSparkContextSessionConf$2.apply(SQLUtils.scala:66)
at 
scala.collection.TraversableLike$WithFilter$$anonfun$foreach$1.apply(TraversableLike.scala:733)
at scala.collection.Iterator$class.foreach(Iterator.scala:893)
at scala.collection.AbstractIterator.foreach(Iterator.scala:1336)
at scala.collection.IterableLike$class.foreach(IterableLike.scala:72)
at scala.collection.AbstractIterable.foreach(Iterable.scala:54)
at scala.collection.TraversableLike$WithF
Calls: test_package ... sparkR.session -> callJStatic -> invokeJava -> 
handleErrors
```


[GitHub] spark issue #16247: [SPARK-18817][SparkR] set default spark-warehouse path t...

2016-12-12 Thread bdwyer2
Github user bdwyer2 commented on the issue:

https://github.com/apache/spark/pull/16247
  
@shivaram I can create a test to verify the output of `list.files()` is the 
same before and after running `sparkR.session()`.
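A rough sketch of what such a check could look like (hypothetical, not a committed test; assumes testthat is attached):

```R
test_that("sparkR.session() does not write to the current working directory", {
  files_before <- list.files()
  sparkR.session(enableHiveSupport = FALSE)
  expect_equal(sort(list.files()), sort(files_before))  # no new files or directories
  sparkR.session.stop()
})
```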


[GitHub] spark pull request #16247: [SPARK-18817][SparkR] set default spark-warehouse...

2016-12-12 Thread bdwyer2
Github user bdwyer2 commented on a diff in the pull request:

https://github.com/apache/spark/pull/16247#discussion_r92010995
  
--- Diff: R/pkg/R/sparkR.R ---
@@ -362,6 +362,10 @@ sparkR.session <- function(
   enableHiveSupport = TRUE,
   ...) {
 
+  if (length(sparkConfig[["spark.sql.warehouse.dir"]]) == 0) {
+sparkConfig[["spark.sql.warehouse.dir"]] <- tempdir()
--- End diff --

How about an argument named `sparkWorkingDirectory` that defaulted to 
`tempdir()`?


[GitHub] spark issue #16247: [MINOR][SparkR] set default spark-warehouse path to temp...

2016-12-10 Thread bdwyer2
Github user bdwyer2 commented on the issue:

https://github.com/apache/spark/pull/16247
  
Should I open a JIRA under SPARK-15799 myself or leave that to one of the 
admins?


[GitHub] spark issue #16247: [MINOR][SparkR] set default spark-warehouse path to temp...

2016-12-10 Thread bdwyer2
Github user bdwyer2 commented on the issue:

https://github.com/apache/spark/pull/16247
  
@HyukjinKwon Yes, but we are restricted by CRAN policies:

> Packages should not write in the users’ home filespace, nor anywhere 
else on the file system apart from the R session’s temporary directory
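In practice that means confining any writes to the session's temporary directory, e.g.:

```R
tempdir()                                                 # e.g. "/tmp/RtmpAbC123"; removed when the R session ends
warehouse_dir <- file.path(tempdir(), "spark-warehouse")  # a CRAN-compliant location (illustrative)
```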


[GitHub] spark pull request #16247: [MINOR][SparkR] set default spark-warehouse path ...

2016-12-10 Thread bdwyer2
GitHub user bdwyer2 opened a pull request:

https://github.com/apache/spark/pull/16247

[MINOR][SparkR] set default spark-warehouse path to tempdir()

## What changes were proposed in this pull request?

Set the default location of `spark.sql.warehouse.dir` to be compliant with 
the CRAN policy (https://cran.r-project.org/web/packages/policies.html) 
regarding writing files outside of the tmp directory. Previously a folder named 
`spark-warehouse` was created in the working directory when `sparkR.session()` 
was called.

See SPARK-15799 for discussion.
cc @shivaram 


## How was this patch tested?

Ran the following code and verified nothing was created in my working 
directory:
```R
sparkR.session(master = "local[*]",
   sparkConfig = list(spark.driver.memory = "2g"),
   enableHiveSupport = FALSE)
```




You can merge this pull request into a Git repository by running:

$ git pull https://github.com/bdwyer2/spark default_sparkr_spark_warehouse_fix

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/16247.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #16247


commit c855c2ce1239650edbaf86a53372adbfd4b3278b
Author: Brendan Dwyer 
Date:   2016-12-10T21:00:45Z

set default location of spark.sql.warehouse.dir



