[GitHub] spark pull request #15666: [SPARK-11421] [Core][Python][R] Added ability for...

2018-04-06 Thread mariusvniekerk
Github user mariusvniekerk closed the pull request at:

https://github.com/apache/spark/pull/15666


---




[GitHub] spark pull request #15666: [SPARK-11421] [Core][Python][R] Added ability for...

2017-10-10 Thread holdenk
Github user holdenk commented on a diff in the pull request:

https://github.com/apache/spark/pull/15666#discussion_r143838425
  
--- Diff: python/pyspark/tests.py ---
@@ -435,6 +436,19 @@ def test_add_file_locally(self):
 with open(download_path) as test_file:
 self.assertEqual("Hello World!\n", test_file.readline())
 
+def test_add_jar(self):
+jvm = self.sc._jvm
+# We shouldn't be able to load anything from the package before it is added
+self.assertFalse(isinstance(jvm.pysparktests.DummyClass, JavaClass))
+# Generate and compile the test jar
+destDir = os.path.join(SPARK_HOME, "python/test_support/jar")
--- End diff --

Do you want to use a temp directory instead?
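
For illustration, a minimal sketch of that change (editor's sketch, not the PR's code; it assumes `createDummyJar` is reachable through the py4j gateway, as the test above already assumes for `jvm.pysparktests`):

```python
import shutil
import tempfile

def test_add_jar(self):
    jvm = self.sc._jvm
    # Build the dummy jar in a throwaway directory instead of a fixed
    # path under SPARK_HOME, so repeated runs leave nothing behind.
    dest_dir = tempfile.mkdtemp()
    try:
        jar_path = jvm.org.apache.spark.TestUtils.createDummyJar(
            dest_dir, "pysparktests", "DummyClass")
        self.sc.addJar(jar_path, addToCurrentClassLoader=True)
    finally:
        shutil.rmtree(dest_dir, ignore_errors=True)
```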


---




[GitHub] spark pull request #15666: [SPARK-11421] [Core][Python][R] Added ability for...

2017-10-05 Thread HyukjinKwon
Github user HyukjinKwon commented on a diff in the pull request:

https://github.com/apache/spark/pull/15666#discussion_r143018955
  
--- Diff: python/pyspark/context.py ---
@@ -863,6 +863,21 @@ def addPyFile(self, path):
 import importlib
 importlib.invalidate_caches()
 
+def addJar(self, path, addToCurrentClassLoader=False):
+"""
+Adds a JAR dependency for all tasks to be executed on this SparkContext in the future.
+The `path` passed can be either a local file, a file in HDFS (or other Hadoop-supported
+filesystems), an HTTP, HTTPS or FTP URI, or local:/path for a file on every worker node.
+If addToCurrentClassLoader is true, add the jar to the current threads'  classloader.
--- End diff --

little nit: `addToCurrentClassLoader` -> `` `addToCurrentClassLoader` `` and `ads'  cl` -> `ads' cl`.


---




[GitHub] spark pull request #15666: [SPARK-11421] [Core][Python][R] Added ability for...

2017-10-05 Thread HyukjinKwon
Github user HyukjinKwon commented on a diff in the pull request:

https://github.com/apache/spark/pull/15666#discussion_r143019603
  
--- Diff: python/pyspark/tests.py ---
@@ -435,6 +436,19 @@ def test_add_file_locally(self):
 with open(download_path) as test_file:
 self.assertEqual("Hello World!\n", test_file.readline())
 
+def test_add_jar(self):
+jvm = self.sc._jvm
+# We shouldn't be able to load anything from the package before it is added
+self.assertFalse(isinstance(jvm.pysparktests.DummyClass, JavaClass))
+# Generate and compile the test jar
+destDir = os.path.join(SPARK_HOME, "python/test_support/jar")
--- End diff --

I'd remove this directory too.
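
For illustration, a minimal way to guarantee that cleanup even when the test fails is unittest's `addCleanup` (editor's sketch; `destDir` is the directory created in the diff above):

```python
import shutil

# Registered cleanups run after tearDown, whether the test passes or fails.
self.addCleanup(lambda: shutil.rmtree(destDir, ignore_errors=True))
```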


---




[GitHub] spark pull request #15666: [SPARK-11421] [Core][Python][R] Added ability for...

2017-10-05 Thread HyukjinKwon
Github user HyukjinKwon commented on a diff in the pull request:

https://github.com/apache/spark/pull/15666#discussion_r143015274
  
--- Diff: R/pkg/tests/fulltests/test_context.R ---
@@ -167,6 +167,18 @@ test_that("spark.lapply should perform simple transforms", {
  sparkR.session.stop()
})

+test_that("add jar should work and allow usage of the jar on the driver node", {
+  sparkR.sparkContext()
+
+  destDir <- file.path(tempdir(), "testjar")
+  jarName <- callJStatic("org.apache.spark.TestUtils", "createDummyJar",
+  destDir, "sparkrTests", "DummyClassForAddJarTest")
--- End diff --

little nit:

```r
jarName <- callJStatic("org.apache.spark.TestUtils", "createDummyJar",
                       destDir, "sparkrTests", "DummyClassForAddJarTest")
```



---




[GitHub] spark pull request #15666: [SPARK-11421] [Core][Python][R] Added ability for...

2017-10-05 Thread HyukjinKwon
Github user HyukjinKwon commented on a diff in the pull request:

https://github.com/apache/spark/pull/15666#discussion_r143014933
  
--- Diff: R/pkg/R/context.R ---
@@ -319,6 +319,32 @@ spark.addFile <- function(path, recursive = FALSE) {
  invisible(callJMethod(sc, "addFile", suppressWarnings(normalizePath(path)), recursive))
}

+
+#' Adds a JAR dependency for all tasks to be executed on this SparkContext in the future.
+#'
+#' The \code{path} passed can be either a local file, a file in HDFS (or other Hadoop-supported
+#' filesystems), an HTTP, HTTPS or FTP URI, or local:/path for a file on every worker node.
+#' If \code{addToCurrentClassLoader} is true, add the jar to the current threads' classloader. In
+#' general adding to the current threads' class loader will impact all other application threads
+#' unless they have explicitly changed their class loader.
+#'
+#' @rdname spark.addJar
+#' @param path The path of the jar to be added
+#' @param addToCurrentClassLoader Whether to add the jar to the current driver classloader.
+#' @export
+#' @examples
+#'\dontrun{
+#' spark.addJar("/path/to/something.jar", TRUE)
+#'}
+#' @note spark.addJar since 2.2.0
+spark.addJar <- function(path, addToCurrentClassLoader = FALSE) {
+  normalizedPath <- suppressWarnings(normalizePath(path))
+  sc <- callJMethod(getSparkContext(), "sc")
+  invisible(callJMethod(sc, "addJar", normalizedPath, addToCurrentClassLoader))
+}
+
+
+
--- End diff --

little nit: I guess we just need a single newline.


---




[GitHub] spark pull request #15666: [SPARK-11421] [Core][Python][R] Added ability for...

2017-10-05 Thread HyukjinKwon
Github user HyukjinKwon commented on a diff in the pull request:

https://github.com/apache/spark/pull/15666#discussion_r143018036
  
--- Diff: core/src/main/scala/org/apache/spark/TestUtils.scala ---
@@ -168,6 +168,27 @@ private[spark] object TestUtils {
 createCompiledClass(className, destDir, sourceFile, classpathUrls)
   }
 
+  /** Create a dummy compile jar for a given package, classname.  Jar will be placed in destDir */
+  def createDummyJar(destDir: String, packageName: String, className: String): String = {
+val srcDir = new File(destDir, packageName)
+srcDir.mkdirs()
+val excSource = new JavaSourceFromString(new File(srcDir, className).toURI.getPath,
+  s"""package $packageName;
+ |
+  |public class $className implements java.io.Serializable {
+ |  public static String helloWorld(String arg) { return "Hello " + arg; }
+ |  public static int addStuff(int arg1, int arg2) { return arg1 + arg2; }
+ |}
+""".
+stripMargin)
--- End diff --

We could make this inlined.
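
For reference, the string with `.stripMargin` inlined and the margins aligned might look like this (editor's sketch):

```scala
val excSource = new JavaSourceFromString(
  new File(srcDir, className).toURI.getPath,
  s"""package $packageName;
     |
     |public class $className implements java.io.Serializable {
     |  public static String helloWorld(String arg) { return "Hello " + arg; }
     |  public static int addStuff(int arg1, int arg2) { return arg1 + arg2; }
     |}
   """.stripMargin)
```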


---




[GitHub] spark pull request #15666: [SPARK-11421] [Core][Python][R] Added ability for...

2017-10-05 Thread HyukjinKwon
Github user HyukjinKwon commented on a diff in the pull request:

https://github.com/apache/spark/pull/15666#discussion_r143015101
  
--- Diff: R/pkg/tests/fulltests/test_context.R ---
@@ -167,6 +167,18 @@ test_that("spark.lapply should perform simple transforms", {
  sparkR.session.stop()
})

+test_that("add jar should work and allow usage of the jar on the driver node", {
+  sparkR.sparkContext()
+
+  destDir <- file.path(tempdir(), "testjar")
--- End diff --

I'd remove this `tempdir()` btw.
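
For illustration, the directory could be removed when the test exits, pass or fail, with `on.exit` (editor's sketch):

```r
destDir <- file.path(tempdir(), "testjar")
# Clean up the generated jar directory once the test block finishes.
on.exit(unlink(destDir, recursive = TRUE), add = TRUE)
```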


---




[GitHub] spark pull request #15666: [SPARK-11421] [Core][Python][R] Added ability for...

2017-10-05 Thread HyukjinKwon
Github user HyukjinKwon commented on a diff in the pull request:

https://github.com/apache/spark/pull/15666#discussion_r143017669
  
--- Diff: core/src/main/scala/org/apache/spark/SparkContext.scala ---
@@ -1845,6 +1859,21 @@ class SparkContext(config: SparkConf) extends Logging {
  logInfo(s"Added JAR $path at $key with timestamp $timestamp")
  postEnvironmentUpdate()
}
+
+if (addToCurrentClassLoader) {
+  val currentCL = Utils.getContextOrSparkClassLoader
+  currentCL match {
+case cl: MutableURLClassLoader =>
+  val uri = if (path.contains("\\")) {
+// For local paths with backslashes on Windows, URI throws an exception
+new File(path).toURI
+  } else {
+new URI(path)
--- End diff --

Could we maybe just use `Utils.resolveURI(path)`? It looks like the logic is similar.
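
A sketch of the simplified branch built on `Utils.resolveURI`, which already falls back to `new File(path).toURI` when the string is not a well-formed URI (editor's illustration, not the PR's final code):

```scala
if (addToCurrentClassLoader) {
  Utils.getContextOrSparkClassLoader match {
    case cl: MutableURLClassLoader =>
      // resolveURI handles plain local paths (including Windows paths with
      // backslashes) as well as URIs such as hdfs:// or http://.
      cl.addURL(Utils.resolveURI(path).toURL)
    case other =>
      logWarning(s"Unsupported class loader $other will not update jars in the thread class loader")
  }
}
```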


---




[GitHub] spark pull request #15666: [SPARK-11421] [Core][Python][R] Added ability for...

2017-10-05 Thread HyukjinKwon
Github user HyukjinKwon commented on a diff in the pull request:

https://github.com/apache/spark/pull/15666#discussion_r143017806
  
--- Diff: core/src/main/scala/org/apache/spark/SparkContext.scala ---
@@ -1845,6 +1859,21 @@ class SparkContext(config: SparkConf) extends Logging {
  logInfo(s"Added JAR $path at $key with timestamp $timestamp")
  postEnvironmentUpdate()
}
+
+if (addToCurrentClassLoader) {
+  val currentCL = Utils.getContextOrSparkClassLoader
+  currentCL match {
+case cl: MutableURLClassLoader =>
+  val uri = if (path.contains("\\")) {
+// For local paths with backslashes on Windows, URI throws an exception
+new File(path).toURI
+  } else {
+new URI(path)
+  }
+  cl.addURL(uri.toURL)
+case _ => logWarning(s"Unsupported cl $currentCL will not update jars thread cl")
--- End diff --

I'd say `class loader` instead of `cl`.


---




[GitHub] spark pull request #15666: [SPARK-11421] [Core][Python][R] Added ability for...

2017-10-05 Thread HyukjinKwon
Github user HyukjinKwon commented on a diff in the pull request:

https://github.com/apache/spark/pull/15666#discussion_r14301
  
--- Diff: R/pkg/tests/fulltests/test_context.R ---
@@ -167,6 +167,18 @@ test_that("spark.lapply should perform simple transforms", {
  sparkR.session.stop()
})

+test_that("add jar should work and allow usage of the jar on the driver node", {
+  sparkR.sparkContext()
+
+  destDir <- file.path(tempdir(), "testjar")
+  jarName <- callJStatic("org.apache.spark.TestUtils", "createDummyJar",
+  destDir, "sparkrTests", "DummyClassForAddJarTest")
--- End diff --

Ah, it looks like the problem is here. `createDummyJar` returns a URI string, for example:

```r
> normalizePath("file:/C:/a/b/c")
[1] "C:\\Users\\IEUser\\workspace\\spark\\file:\\C:\\a\\b\\c"
Warning message:
In normalizePath(path.expand(path), winslash, mustWork) :
  path[1]="file:/C:/a/b/c": The filename, directory name, or volume label syntax is incorrect
```

This ends up with a weird path like `"C:\\Users\\IEUser\\workspace\\spark\\file:\\C:\\a\\b\\c"`.
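
For illustration, one way the R side could avoid mangling URI strings is to skip `normalizePath` when the input already carries a scheme (editor's sketch, not the PR's final fix):

```r
# Only normalize plain filesystem paths; leave URIs such as
# file:/..., hdfs://... or http://... untouched.
isURI <- grepl("^[a-zA-Z][a-zA-Z0-9+.-]*:", path)
normalizedPath <- if (isURI) path else suppressWarnings(normalizePath(path))
```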


---




[GitHub] spark pull request #15666: [SPARK-11421] [Core][Python][R] Added ability for...

2017-06-17 Thread mariusvniekerk
Github user mariusvniekerk commented on a diff in the pull request:

https://github.com/apache/spark/pull/15666#discussion_r122578282
  
--- Diff: R/pkg/R/context.R ---
@@ -319,6 +319,34 @@ spark.addFile <- function(path, recursive = FALSE) {
  invisible(callJMethod(sc, "addFile", suppressWarnings(normalizePath(path)), recursive))
}

+
+#' Adds a JAR dependency for all tasks to be executed on this SparkContext in the future.
--- End diff --

In that case do we want to bother having this method for R?


---



[GitHub] spark pull request #15666: [SPARK-11421] [Core][Python][R] Added ability for...

2017-06-17 Thread mariusvniekerk
Github user mariusvniekerk commented on a diff in the pull request:

https://github.com/apache/spark/pull/15666#discussion_r122578275
  
--- Diff: R/pkg/R/context.R ---
@@ -319,6 +319,34 @@ spark.addFile <- function(path, recursive = FALSE) {
  invisible(callJMethod(sc, "addFile", suppressWarnings(normalizePath(path)), recursive))
}

+
+#' Adds a JAR dependency for all tasks to be executed on this SparkContext in the future.
+#'
+#' The \code{path} passed can be either a local file, a file in HDFS (or other Hadoop-supported
+#' filesystems), an HTTP, HTTPS or FTP URI, or local:/path for a file on every worker node.
+#' If \code{addToCurrentClassLoader} is true, add the jar to the current threads' classloader. In
+#' general adding to the current threads' class loader will impact all other application threads
+#' unless they have explicitly changed their class loader.
+#'
+#' @rdname spark.addJar
+#' @param path The path of the jar to be added
+#' @param addToCurrentClassLoader Whether to add the jar to the current driver classloader.
+#' Default is FALSE.
+#' @export
+#' @examples
+#'\dontrun{
+#' spark.addJar("/path/to/something.jar", TRUE)
+#'}
+#' @note spark.addJar since 2.2.0
+spark.addJar <- function(path, addToCurrentClassLoader = FALSE) {
--- End diff --

Done


---



[GitHub] spark pull request #15666: [SPARK-11421] [Core][Python][R] Added ability for...

2017-03-18 Thread felixcheung
Github user felixcheung commented on a diff in the pull request:

https://github.com/apache/spark/pull/15666#discussion_r106786961
  
--- Diff: R/pkg/R/context.R ---
@@ -319,6 +319,34 @@ spark.addFile <- function(path, recursive = FALSE) {
  invisible(callJMethod(sc, "addFile", suppressWarnings(normalizePath(path)), recursive))
}

+
+#' Adds a JAR dependency for all tasks to be executed on this SparkContext in the future.
+#'
+#' The \code{path} passed can be either a local file, a file in HDFS (or other Hadoop-supported
+#' filesystems), an HTTP, HTTPS or FTP URI, or local:/path for a file on every worker node.
+#' If \code{addToCurrentClassLoader} is true, add the jar to the current threads' classloader. In
+#' general adding to the current threads' class loader will impact all other application threads
+#' unless they have explicitly changed their class loader.
+#'
+#' @rdname spark.addJar
+#' @param path The path of the jar to be added
+#' @param addToCurrentClassLoader Whether to add the jar to the current driver classloader.
+#' Default is FALSE.
+#' @export
+#' @examples
+#'\dontrun{
+#' spark.addJar("/path/to/something.jar", TRUE)
+#'}
+#' @note spark.addJar since 2.2.0
+spark.addJar <- function(path, addToCurrentClassLoader = FALSE) {
+  sc <- getSparkContext()
+  normalizedPath <- suppressWarnings(normalizePath(path))
+  scala_sc <- callJMethod(sc, "sc")
+  invisible(callJMethod(scala_sc, "addJar", normalizedPath, addToCurrentClassLoader))
--- End diff --

It doesn't handle the `file:\C:` prefix - where is that coming from?


---



[GitHub] spark pull request #15666: [SPARK-11421] [Core][Python][R] Added ability for...

2017-03-18 Thread mariusvniekerk
Github user mariusvniekerk commented on a diff in the pull request:

https://github.com/apache/spark/pull/15666#discussion_r106781948
  
--- Diff: R/pkg/R/context.R ---
@@ -319,6 +319,34 @@ spark.addFile <- function(path, recursive = FALSE) {
  invisible(callJMethod(sc, "addFile", suppressWarnings(normalizePath(path)), recursive))
}

+
+#' Adds a JAR dependency for all tasks to be executed on this SparkContext in the future.
+#'
+#' The \code{path} passed can be either a local file, a file in HDFS (or other Hadoop-supported
+#' filesystems), an HTTP, HTTPS or FTP URI, or local:/path for a file on every worker node.
+#' If \code{addToCurrentClassLoader} is true, add the jar to the current threads' classloader. In
+#' general adding to the current threads' class loader will impact all other application threads
+#' unless they have explicitly changed their class loader.
+#'
+#' @rdname spark.addJar
+#' @param path The path of the jar to be added
+#' @param addToCurrentClassLoader Whether to add the jar to the current driver classloader.
+#' Default is FALSE.
+#' @export
+#' @examples
+#'\dontrun{
+#' spark.addJar("/path/to/something.jar", TRUE)
+#'}
+#' @note spark.addJar since 2.2.0
+spark.addJar <- function(path, addToCurrentClassLoader = FALSE) {
+  sc <- getSparkContext()
+  normalizedPath <- suppressWarnings(normalizePath(path))
+  scala_sc <- callJMethod(sc, "sc")
+  invisible(callJMethod(scala_sc, "addJar", normalizedPath, addToCurrentClassLoader))
--- End diff --

Why is `normalizePath` doing that to the URL?


---



[GitHub] spark pull request #15666: [SPARK-11421] [Core][Python][R] Added ability for...

2017-03-14 Thread HyukjinKwon
Github user HyukjinKwon commented on a diff in the pull request:

https://github.com/apache/spark/pull/15666#discussion_r105839398
  
--- Diff: R/pkg/R/context.R ---
@@ -319,6 +319,34 @@ spark.addFile <- function(path, recursive = FALSE) {
  invisible(callJMethod(sc, "addFile", suppressWarnings(normalizePath(path)), recursive))
}

+
+#' Adds a JAR dependency for all tasks to be executed on this SparkContext in the future.
+#'
+#' The \code{path} passed can be either a local file, a file in HDFS (or other Hadoop-supported
+#' filesystems), an HTTP, HTTPS or FTP URI, or local:/path for a file on every worker node.
+#' If \code{addToCurrentClassLoader} is true, add the jar to the current threads' classloader. In
+#' general adding to the current threads' class loader will impact all other application threads
+#' unless they have explicitly changed their class loader.
+#'
+#' @rdname spark.addJar
+#' @param path The path of the jar to be added
+#' @param addToCurrentClassLoader Whether to add the jar to the current driver classloader.
+#' Default is FALSE.
+#' @export
+#' @examples
+#'\dontrun{
+#' spark.addJar("/path/to/something.jar", TRUE)
+#'}
+#' @note spark.addJar since 2.2.0
+spark.addJar <- function(path, addToCurrentClassLoader = FALSE) {
+  sc <- getSparkContext()
+  normalizedPath <- suppressWarnings(normalizePath(path))
+  scala_sc <- callJMethod(sc, "sc")
+  invisible(callJMethod(scala_sc, "addJar", normalizedPath, addToCurrentClassLoader))
--- End diff --

(`C:\projects\spark\R\lib\SparkR\tests\testthat\` was indeed printed together in the logs.)


---



[GitHub] spark pull request #15666: [SPARK-11421] [Core][Python][R] Added ability for...

2017-03-13 Thread HyukjinKwon
Github user HyukjinKwon commented on a diff in the pull request:

https://github.com/apache/spark/pull/15666#discussion_r105723828
  
--- Diff: R/pkg/R/context.R ---
@@ -319,6 +319,34 @@ spark.addFile <- function(path, recursive = FALSE) {
  invisible(callJMethod(sc, "addFile", suppressWarnings(normalizePath(path)), recursive))
}

+
+#' Adds a JAR dependency for all tasks to be executed on this SparkContext in the future.
+#'
+#' The \code{path} passed can be either a local file, a file in HDFS (or other Hadoop-supported
+#' filesystems), an HTTP, HTTPS or FTP URI, or local:/path for a file on every worker node.
+#' If \code{addToCurrentClassLoader} is true, add the jar to the current threads' classloader. In
+#' general adding to the current threads' class loader will impact all other application threads
+#' unless they have explicitly changed their class loader.
+#'
+#' @rdname spark.addJar
+#' @param path The path of the jar to be added
+#' @param addToCurrentClassLoader Whether to add the jar to the current driver classloader.
+#' Default is FALSE.
+#' @export
+#' @examples
+#'\dontrun{
+#' spark.addJar("/path/to/something.jar", TRUE)
+#'}
+#' @note spark.addJar since 2.2.0
+spark.addJar <- function(path, addToCurrentClassLoader = FALSE) {
+  sc <- getSparkContext()
+  normalizedPath <- suppressWarnings(normalizePath(path))
+  scala_sc <- callJMethod(sc, "sc")
+  invisible(callJMethod(scala_sc, "addJar", normalizedPath, addToCurrentClassLoader))
--- End diff --

Right after this call, the path `normalizedPath` was `C:\projects\spark\R\lib\SparkR\tests\testthat\file:\C:\Users\appveyor\AppData\Local\Temp\1\Rtmpg5tOYo\testjar\sparkrTests-DummyClassForAddJarTest-1489422615481.jar`.


---



[GitHub] spark pull request #15666: [SPARK-11421] [Core][Python][R] Added ability for...

2017-03-13 Thread HyukjinKwon
Github user HyukjinKwon commented on a diff in the pull request:

https://github.com/apache/spark/pull/15666#discussion_r105723482
  
--- Diff: R/pkg/inst/tests/testthat/test_context.R ---
@@ -167,6 +167,18 @@ test_that("spark.lapply should perform simple transforms", {
  sparkR.session.stop()
})

+test_that("add jar should work and allow usage of the jar on the driver node", {
+  sparkR.sparkContext()
+
+  destDir <- paste0(tempdir(), "/", "testjar")
+  jarName <- callJStatic("org.apache.spark.TestUtils", "createDummyJar",
--- End diff --

I checked what it got right after the `createDummyJar` call. It was `createDummyJar:C:\Users\appveyor\AppData\Local\Temp\1\Rtmpg5tOYo/testjar`


---



[GitHub] spark pull request #15666: [SPARK-11421] [Core][Python][R] Added ability for...

2017-03-08 Thread mariusvniekerk
Github user mariusvniekerk commented on a diff in the pull request:

https://github.com/apache/spark/pull/15666#discussion_r104908469
  
--- Diff: R/pkg/R/context.R ---
@@ -319,6 +319,34 @@ spark.addFile <- function(path, recursive = FALSE) {
  invisible(callJMethod(sc, "addFile", suppressWarnings(normalizePath(path)), recursive))
}

+
+#' Adds a JAR dependency for all tasks to be executed on this SparkContext in the future.
+#'
+#' The \code{path} passed can be either a local file, a file in HDFS (or other Hadoop-supported
+#' filesystems), an HTTP, HTTPS or FTP URI, or local:/path for a file on every worker node.
+#' If \code{addToCurrentClassLoader} is true, add the jar to the current threads' classloader. In
+#' general adding to the current threads' class loader will impact all other application threads
+#' unless they have explicitly changed their class loader.
+#'
+#' @rdname spark.addJar
+#' @param path The path of the jar to be added
+#' @param addToCurrentClassLoader Whether to add the jar to the current driver classloader.
+#' Default is FALSE.
+#' @export
+#' @examples
+#'\dontrun{
+#' spark.addJar("/path/to/something.jar", TRUE)
+#'}
+#' @note spark.addJar since 2.2.0
+spark.addJar <- function(path, addToCurrentClassLoader = FALSE) {
--- End diff --

Mostly for backwards compatibility. 


---



[GitHub] spark pull request #15666: [SPARK-11421] [Core][Python][R] Added ability for...

2017-03-06 Thread felixcheung
Github user felixcheung commented on a diff in the pull request:

https://github.com/apache/spark/pull/15666#discussion_r104592654
  
--- Diff: R/pkg/R/context.R ---
@@ -319,6 +319,34 @@ spark.addFile <- function(path, recursive = FALSE) {
  invisible(callJMethod(sc, "addFile", suppressWarnings(normalizePath(path)), recursive))
}

+
+#' Adds a JAR dependency for all tasks to be executed on this SparkContext in the future.
+#'
+#' The \code{path} passed can be either a local file, a file in HDFS (or other Hadoop-supported
+#' filesystems), an HTTP, HTTPS or FTP URI, or local:/path for a file on every worker node.
+#' If \code{addToCurrentClassLoader} is true, add the jar to the current threads' classloader. In
+#' general adding to the current threads' class loader will impact all other application threads
+#' unless they have explicitly changed their class loader.
+#'
+#' @rdname spark.addJar
+#' @param path The path of the jar to be added
+#' @param addToCurrentClassLoader Whether to add the jar to the current driver classloader.
+#' Default is FALSE.
+#' @export
+#' @examples
+#'\dontrun{
+#' spark.addJar("/path/to/something.jar", TRUE)
+#'}
+#' @note spark.addJar since 2.2.0
+spark.addJar <- function(path, addToCurrentClassLoader = FALSE) {
+  sc <- getSparkContext()
+  normalizedPath <- suppressWarnings(normalizePath(path))
+  scala_sc <- callJMethod(sc, "sc")
--- End diff --

I'd just combine 342 & 344 and call it sc
```
sc <- callJMethod(getSparkContext(), "sc")
```


---



[GitHub] spark pull request #15666: [SPARK-11421] [Core][Python][R] Added ability for...

2017-03-06 Thread felixcheung
Github user felixcheung commented on a diff in the pull request:

https://github.com/apache/spark/pull/15666#discussion_r104593039
  
--- Diff: R/pkg/R/context.R ---
@@ -319,6 +319,34 @@ spark.addFile <- function(path, recursive = FALSE) {
  invisible(callJMethod(sc, "addFile", suppressWarnings(normalizePath(path)), recursive))
}

+
+#' Adds a JAR dependency for all tasks to be executed on this SparkContext in the future.
+#'
+#' The \code{path} passed can be either a local file, a file in HDFS (or other Hadoop-supported
+#' filesystems), an HTTP, HTTPS or FTP URI, or local:/path for a file on every worker node.
+#' If \code{addToCurrentClassLoader} is true, add the jar to the current threads' classloader. In
+#' general adding to the current threads' class loader will impact all other application threads
+#' unless they have explicitly changed their class loader.
+#'
+#' @rdname spark.addJar
+#' @param path The path of the jar to be added
+#' @param addToCurrentClassLoader Whether to add the jar to the current driver classloader.
+#' Default is FALSE.
+#' @export
+#' @examples
+#'\dontrun{
+#' spark.addJar("/path/to/something.jar", TRUE)
+#'}
+#' @note spark.addJar since 2.2.0
+spark.addJar <- function(path, addToCurrentClassLoader = FALSE) {
--- End diff --

`spark.addJar` needs to be added to the NAMESPACE file, otherwise it won't be accessible. Tests run within the same namespace, so they can always access private methods.
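
For reference, the corresponding entry would be a one-line export in `R/pkg/NAMESPACE`, alongside the existing exported functions (editor's sketch):

```r
export("spark.addJar")
```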


---



[GitHub] spark pull request #15666: [SPARK-11421] [Core][Python][R] Added ability for...

2017-03-06 Thread felixcheung
Github user felixcheung commented on a diff in the pull request:

https://github.com/apache/spark/pull/15666#discussion_r104592294
  
--- Diff: R/pkg/R/context.R ---
@@ -319,6 +319,34 @@ spark.addFile <- function(path, recursive = FALSE) {
  invisible(callJMethod(sc, "addFile", suppressWarnings(normalizePath(path)), recursive))
}

+
+#' Adds a JAR dependency for all tasks to be executed on this SparkContext in the future.
--- End diff --

we don't really expose SparkContext in R actually


---



[GitHub] spark pull request #15666: [SPARK-11421] [Core][Python][R] Added ability for...

2017-03-06 Thread felixcheung
Github user felixcheung commented on a diff in the pull request:

https://github.com/apache/spark/pull/15666#discussion_r104592782
  
--- Diff: R/pkg/R/context.R ---
@@ -319,6 +319,34 @@ spark.addFile <- function(path, recursive = FALSE) {
  invisible(callJMethod(sc, "addFile", suppressWarnings(normalizePath(path)), recursive))
}

+
+#' Adds a JAR dependency for all tasks to be executed on this SparkContext in the future.
+#'
+#' The \code{path} passed can be either a local file, a file in HDFS (or other Hadoop-supported
+#' filesystems), an HTTP, HTTPS or FTP URI, or local:/path for a file on every worker node.
+#' If \code{addToCurrentClassLoader} is true, add the jar to the current threads' classloader. In
+#' general adding to the current threads' class loader will impact all other application threads
+#' unless they have explicitly changed their class loader.
+#'
+#' @rdname spark.addJar
+#' @param path The path of the jar to be added
+#' @param addToCurrentClassLoader Whether to add the jar to the current driver classloader.
+#' Default is FALSE.
--- End diff --

remove default value since it's in the signature already below


---



[GitHub] spark pull request #15666: [SPARK-11421] [Core][Python][R] Added ability for...

2017-03-06 Thread felixcheung
Github user felixcheung commented on a diff in the pull request:

https://github.com/apache/spark/pull/15666#discussion_r104592268
  
--- Diff: R/pkg/R/context.R ---
@@ -319,6 +319,34 @@ spark.addFile <- function(path, recursive = FALSE) {
  invisible(callJMethod(sc, "addFile", suppressWarnings(normalizePath(path)), recursive))
}

+
+#' Adds a JAR dependency for all tasks to be executed on this SparkContext in the future.
+#'
+#' The \code{path} passed can be either a local file, a file in HDFS (or other Hadoop-supported
+#' filesystems), an HTTP, HTTPS or FTP URI, or local:/path for a file on every worker node.
+#' If \code{addToCurrentClassLoader} is true, add the jar to the current threads' classloader. In
+#' general adding to the current threads' class loader will impact all other application threads
+#' unless they have explicitly changed their class loader.
--- End diff --

Should

```
add the jar to the current threads' classloader. In
 +#' general adding to the current threads' class loader will impact all other application threads
 +#' unless they have explicitly changed their class loader.
```

be reworded for R or Python? It should be the JVM. `threads` or `classloader` aren't familiar concepts to R users.
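
One possible R-facing rewording along those lines (editor's sketch, not reviewed wording):

```r
#' If \code{addToCurrentClassLoader} is TRUE, the jar is also made available to
#' the JVM driver process backing this SparkR session, so its classes can be
#' used immediately, e.g. via \code{newJObject} or \code{callJStatic}.
```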


---



[GitHub] spark pull request #15666: [SPARK-11421] [Core][Python][R] Added ability for...

2017-03-06 Thread brkyvz
Github user brkyvz commented on a diff in the pull request:

https://github.com/apache/spark/pull/15666#discussion_r104527246
  
--- Diff: core/src/main/scala/org/apache/spark/SparkContext.scala ---
@@ -1800,17 +1800,39 @@ class SparkContext(config: SparkConf) extends Logging {
* an HTTP, HTTPS or FTP URI, or local:/path for a file on every worker node.
*/
   def addJar(path: String) {
+addJar(path, false)
+  }
+
+  /**
+   * Adds a JAR dependency for all tasks to be executed on this `SparkContext` in the future.
+   * @param path can be either a local file, a file in HDFS (or other Hadoop-supported filesystems),
+   * an HTTP, HTTPS or FTP URI, or local:/path for a file on every worker node.
--- End diff --

nit: alignment
```scala
@param path can be either...
blah more blah
```


---



[GitHub] spark pull request #15666: [SPARK-11421] [Core][Python][R] Added ability for...

2017-03-06 Thread brkyvz
Github user brkyvz commented on a diff in the pull request:

https://github.com/apache/spark/pull/15666#discussion_r104527000
  
--- Diff: core/src/main/scala/org/apache/spark/SparkContext.scala ---
@@ -1800,17 +1800,39 @@ class SparkContext(config: SparkConf) extends Logging {
* an HTTP, HTTPS or FTP URI, or local:/path for a file on every worker node.
*/
   def addJar(path: String) {
+addJar(path, false)
+  }
+
+  /**
+   * Adds a JAR dependency for all tasks to be executed on this `SparkContext` in the future.
+   * @param path can be either a local file, a file in HDFS (or other Hadoop-supported filesystems),
+   * an HTTP, HTTPS or FTP URI, or local:/path for a file on every worker node.
+   * @param addToCurrentClassLoader if true will add the jar to the current threads' classloader.
+   * In general adding to the current threads' class loader will impact all other application
+   * threads unless they have explicitly changed their class loader.
+   */
+  @DeveloperApi
+  def addJar(path: String, addToCurrentClassLoader: Boolean) {
 if (path == null) {
   logWarning("null specified as parameter to addJar")
 } else {
   var key = ""
-  if (path.contains("\\")) {
-// For local paths with backslashes on Windows, URI throws an exception
-key = env.rpcEnv.fileServer.addJar(new File(path))
+
+  val uri = if (path.contains("\\")) {
+// If we have backslashes here this is a windows path, and URI will throws an exception,
+// so we use this constrcutor instead
--- End diff --

`constructor`


---



[GitHub] spark pull request #15666: [SPARK-11421] [Core][Python][R] Added ability for...

2017-03-06 Thread brkyvz
Github user brkyvz commented on a diff in the pull request:

https://github.com/apache/spark/pull/15666#discussion_r104523062
  
--- Diff: R/pkg/inst/tests/testthat/test_context.R ---
@@ -167,6 +167,18 @@ test_that("spark.lapply should perform simple transforms", {
  sparkR.session.stop()
})

+test_that("add jar should work and allow usage of the jar on the driver node", {
+  sparkR.sparkContext()
+
+  destDir <- paste0(tempdir(), "/", "testjar")
--- End diff --

You may want to use https://stat.ethz.ch/R-manual/R-devel/library/base/html/file.path.html instead for Windows compatibility.
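
i.e., something like the following, which later revisions of the PR adopt (editor's sketch):

```r
# file.path builds the path with the platform-appropriate separator.
destDir <- file.path(tempdir(), "testjar")
```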


---



[GitHub] spark pull request #15666: [SPARK-11421] [Core][Python][R] Added ability for...

2017-03-06 Thread brkyvz
Github user brkyvz commented on a diff in the pull request:

https://github.com/apache/spark/pull/15666#discussion_r104522704
  
--- Diff: R/pkg/R/context.R ---
@@ -319,6 +319,34 @@ spark.addFile <- function(path, recursive = FALSE) {
   invisible(callJMethod(sc, "addFile", 
suppressWarnings(normalizePath(path)), recursive))
 }
 
+
+#' Adds a JAR dependency for all tasks to be executed on this SparkContext 
in the future.
+#'
+#' The \code{path} passed can be either a local file, a file in HDFS (or 
other Hadoop-supported
+#' filesystems), an HTTP, HTTPS or FTP URI, or local:/path for a file on 
every worker node.
+#' If \code{addToCurrentClassLoader} is true, add the jar to the current 
threads' classloader. In
+#' general adding to the current threads' class loader will impact all 
other application threads
+#' unless they have explicitly changed their class loader.
+#'
+#' @rdname spark.addJar
+#' @param path The path of the jar to be added
+#' @param addToCurrentClassLoader Whether to add the jar to the current 
driver classloader.
+#' Default is FALSE.
+#' @export
+#' @examples
+#'\dontrun{
+#' spark.addJar("/path/to/something.jar", TRUE)
+#'}
+#' @note spark.addJar since 2.2.0
+spark.addJar <- function(path, addToCurrentClassLoader = FALSE) {
--- End diff --

why don't we want to add it to the driver classpath by default?


---



[GitHub] spark pull request #15666: [SPARK-11421] [Core][Python][R] Added ability for...

2017-03-06 Thread mariusvniekerk
Github user mariusvniekerk commented on a diff in the pull request:

https://github.com/apache/spark/pull/15666#discussion_r104486665
  
--- Diff: R/pkg/inst/tests/testthat/test_context.R ---
@@ -167,6 +167,18 @@ test_that("spark.lapply should perform simple transforms", {
  sparkR.session.stop()
})

+test_that("add jar should work and allow usage of the jar on the driver node", {
+  sparkR.sparkContext()
+
+  destDir <- paste0(tempdir(), "/", "testjar")
+  jarName <- callJStatic("org.apache.spark.TestUtils", "createDummyJar",
+  destDir, "sparkrTests", "DummyClassForAddJarTest")
+
+  spark.addJar(jarName, addToCurrentClassLoader = TRUE)
+  testClass <- newJObject("sparkrTests.DummyClassForAddJarTest")
--- End diff --

Yeah, I suspect that the Windows path didn't make it properly into the classloader.


---



[GitHub] spark pull request #15666: [SPARK-11421] [Core][Python][R] Added ability for...

2017-03-06 Thread HyukjinKwon
Github user HyukjinKwon commented on a diff in the pull request:

https://github.com/apache/spark/pull/15666#discussion_r104433603
  
--- Diff: core/src/main/scala/org/apache/spark/SparkContext.scala ---
@@ -1842,6 +1864,14 @@ class SparkContext(config: SparkConf) extends Logging {
   logInfo(s"Added JAR $path at $key with timestamp $timestamp")
   postEnvironmentUpdate()
 }
+if (addToCurrentClassLoader) {
+  val currentCL = Utils.getContextOrSparkClassLoader
+  currentCL match {
+case cl: MutableURLClassLoader =>
+  cl.addURL(uri.toURL)
--- End diff --

(We could make this inlined too..)
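
i.e., collapsing the arm onto one line (editor's sketch):

```scala
case cl: MutableURLClassLoader => cl.addURL(uri.toURL)
```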


---



[GitHub] spark pull request #15666: [SPARK-11421] [Core][Python][R] Added ability for...

2017-03-06 Thread HyukjinKwon
Github user HyukjinKwon commented on a diff in the pull request:

https://github.com/apache/spark/pull/15666#discussion_r104432976
  
--- Diff: core/src/main/scala/org/apache/spark/TestUtils.scala ---
@@ -168,6 +168,27 @@ private[spark] object TestUtils {
 createCompiledClass(className, destDir, sourceFile, classpathUrls)
   }
 
+  /** Create a dummy compile jar for a given package, classname.  Jar will be placed in destDir */
+  def createDummyJar(destDir: String, packageName: String, className: String): String = {
+val srcDir = new File(destDir, packageName)
+srcDir.mkdirs()
+val excSource = new JavaSourceFromString(new File(srcDir, className).toURI.getPath,
+  s"""package $packageName;
+ |
+  |public class $className implements java.io.Serializable {
+ |  public static String helloWorld(String arg) { return "Hello " + arg; }
+ |  public static int addStuff(int arg1, int arg2) { return arg1 + arg2; }
+ |}
+""".
+stripMargin)
--- End diff --

(We could make this inlined.)


---



[GitHub] spark pull request #15666: [SPARK-11421] [Core][Python][R] Added ability for...

2017-03-06 Thread HyukjinKwon
Github user HyukjinKwon commented on a diff in the pull request:

https://github.com/apache/spark/pull/15666#discussion_r104433213
  
--- Diff: core/src/test/scala/org/apache/spark/SparkContextSuite.scala ---
@@ -308,6 +308,36 @@ class SparkContextSuite extends SparkFunSuite with LocalSparkContext with Eventually
 sc.listJars().head should include (tmpJar.getName)
   }
 
+  for (
+schedulingMode <- Seq("local_mode", "non_local_mode")
+  ) {
--- End diff --

Just personal taste.. maybe 

```scala
Seq("local_mode", "non_local_mode").foreach { schedulingMode =>
  ...
}
```
?


---



[GitHub] spark pull request #15666: [SPARK-11421] [Core][Python][R] Added ability for...

2017-03-06 Thread HyukjinKwon
Github user HyukjinKwon commented on a diff in the pull request:

https://github.com/apache/spark/pull/15666#discussion_r104432962
  
--- Diff: core/src/main/scala/org/apache/spark/TestUtils.scala ---
@@ -168,6 +168,27 @@ private[spark] object TestUtils {
 createCompiledClass(className, destDir, sourceFile, classpathUrls)
   }
 
+  /** Create a dummy compile jar for a given package, classname.  Jar will be placed in destDir */
+  def createDummyJar(destDir: String, packageName: String, className: String): String = {
+val srcDir = new File(destDir, packageName)
+srcDir.mkdirs()
+val excSource = new JavaSourceFromString(new File(srcDir, className).toURI.getPath,
+  s"""package $packageName;
+ |
+  |public class $className implements java.io.Serializable {
--- End diff --

(This indentation seems inconsistent BTW)


---



[GitHub] spark pull request #15666: [SPARK-11421] [Core][Python][R] Added ability for...

2017-03-06 Thread HyukjinKwon
Github user HyukjinKwon commented on a diff in the pull request:

https://github.com/apache/spark/pull/15666#discussion_r104431249
  
--- Diff: R/pkg/inst/tests/testthat/test_context.R ---
@@ -167,6 +167,18 @@ test_that("spark.lapply should perform simple transforms", {
  sparkR.session.stop()
})

+test_that("add jar should work and allow usage of the jar on the driver node", {
+  sparkR.sparkContext()
+
+  destDir <- paste0(tempdir(), "/", "testjar")
+  jarName <- callJStatic("org.apache.spark.TestUtils", "createDummyJar",
+  destDir, "sparkrTests", "DummyClassForAddJarTest")
+
+  spark.addJar(jarName, addToCurrentClassLoader = TRUE)
+  testClass <- newJObject("sparkrTests.DummyClassForAddJarTest")
--- End diff --

Maybe, helpful log:

```
ERROR RBackendHandler:  on sparkrTests.DummyClassForAddJarTest failed
java.lang.ClassNotFoundException: sparkrTests.DummyClassForAddJarTest
at java.net.URLClassLoader.findClass(URLClassLoader.java:381)
at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
at java.lang.Class.forName0(Native Method)
at java.lang.Class.forName(Class.java:348)
at org.apache.spark.util.Utils$.classForName(Utils.scala:230)
at org.apache.spark.api.r.RBackendHandler.handleMethodCall(RBackendHandler.scala:143)
at org.apache.spark.api.r.RBackendHandler.channelRead0(RBackendHandler.scala:108)
at org.apache.spark.api.r.RBackendHandler.channelRead0(RBackendHandler.scala:40)
at io.netty.channel.SimpleChannelInboundHandler.channelRead(SimpleChannelInboundHandler.java:105)
at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:357)
at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:343)
at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:336)
at io.netty.handler.timeout.IdleStateHandler.channelRead(IdleStateHandler.java:287)
at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:357)
at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:343)
at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:336)
at io.netty.handler.codec.MessageToMessageDecoder.channelRead(MessageToMessageDecoder.java:102)
at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:357)
at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:343)
at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:336)
at io.netty.handler.codec.ByteToMessageDecoder.fireChannelRead(ByteToMessageDecoder.java:293)
at io.netty.handler.codec.ByteToMessageDecoder.channelRead(ByteToMessageDecoder.java:267)
at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:357)
at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:343)
at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:336)
at io.netty.channel.DefaultChannelPipeline$HeadContext.channelRead(DefaultChannelPipeline.java:1294)
at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:357)
at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:343)
at io.netty.channel.DefaultChannelPipeline.fireChannelRead(DefaultChannelPipeline.java:911)
at io.netty.channel.nio.AbstractNioByteChannel$NioByteUnsafe.read(AbstractNioByteChannel.java:131)
at io.netty.channel.nio.NioEventLoop.processSelectedKey(NioEventLoop.java:643)
at io.netty.channel.nio.NioEventLoop.processSelectedKeysOptimized(NioEventLoop.java:566)
at io.netty.channel.nio.NioEventLoop.processSelectedKeys(NioEventLoop.java:480)
at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:442)
at io.netty.util.concurrent.SingleThreadEventExecutor$2.run(SingleThreadEventExecutor.java:131)
at io.netty.util.concurrent.DefaultThreadFactory$DefaultRunnableDecorator.run(DefaultThreadFactory.java:144)
at java.lang.Thread.run(Thread.java:745)
```


---

[GitHub] spark pull request #15666: [SPARK-11421] [Core][Python][R] Added ability for...

2017-03-06 Thread HyukjinKwon
Github user HyukjinKwon commented on a diff in the pull request:

https://github.com/apache/spark/pull/15666#discussion_r104429730
  
--- Diff: R/pkg/inst/tests/testthat/test_context.R ---
@@ -167,6 +167,18 @@ test_that("spark.lapply should perform simple transforms", {
  sparkR.session.stop()
})

+test_that("add jar should work and allow usage of the jar on the driver node", {
+  sparkR.sparkContext()
+
+  destDir <- paste0(tempdir(), "/", "testjar")
+  jarName <- callJStatic("org.apache.spark.TestUtils", "createDummyJar",
+  destDir, "sparkrTests", "DummyClassForAddJarTest")
+
+  spark.addJar(jarName, addToCurrentClassLoader = TRUE)
+  testClass <- newJObject("sparkrTests.DummyClassForAddJarTest")
--- End diff --

Hm.. it seems it's this line this time.

```
1. Error: add jar should work and allow usage of the jar on the driver node (@test_context.R#178)
1: newJObject("sparkrTests.DummyClassForAddJarTest") at C:/projects/spark/R/lib/SparkR/tests/testthat/test_context.R:178
2: invokeJava(isStatic = TRUE, className, methodName = "", ...)
3: handleErrors(returnStatus, conn)
4: stop(readString(conn))
```



---



[GitHub] spark pull request #15666: [SPARK-11421] [Core][Python][R] Added ability for...

2017-03-06 Thread HyukjinKwon
Github user HyukjinKwon commented on a diff in the pull request:

https://github.com/apache/spark/pull/15666#discussion_r104408350
  
--- Diff: core/src/main/scala/org/apache/spark/TestUtils.scala ---
@@ -168,6 +168,27 @@ private[spark] object TestUtils {
 createCompiledClass(className, destDir, sourceFile, classpathUrls)
   }
 
+  /** Create a dummy compile jar for a given package, classname.  Jar will be placed in destDir */
+  def createDummyJar(destDir: String, packageName: String, className: String): String = {
+val srcDir = new File(destDir, packageName)
+srcDir.mkdirs()
+val excSource = new JavaSourceFromString(new File(srcDir, className).getAbsolutePath,
--- End diff --

It's fine to push a commit and use AppVeyor IMHO. There is a guide for this - https://github.com/apache/spark/blob/master/R/WINDOWS.md - if you are willing to test manually, but it is kind of grunt work. In my case, I use my personal AppVeyor account.

Otherwise, I can send a PR for your branch if you are happy with waiting till this weekend :).


---



[GitHub] spark pull request #15666: [SPARK-11421] [Core][Python][R] Added ability for...

2017-03-06 Thread HyukjinKwon
Github user HyukjinKwon commented on a diff in the pull request:

https://github.com/apache/spark/pull/15666#discussion_r104360211
  
--- Diff: R/pkg/inst/tests/testthat/test_context.R ---
@@ -167,6 +167,18 @@ test_that("spark.lapply should perform simple transforms", {
  sparkR.session.stop()
})

+test_that("add jar should work and allow usage of the jar on the driver node", {
--- End diff --

This seems failed as below:

```
1. Error: add jar should work and allow usage of the jar on the driver node 
(@test_context.R#174) 
java.lang.IllegalArgumentException: Illegal character in path at index 12: 
string:///C:\Users\appveyor\AppData\Local\Temp\1\RtmpCqEUJL\testjar\sparkrTests\DummyClassForAddJarTest.java
at java.net.URI.create(URI.java:852)
at 
org.apache.spark.TestUtils$.org$apache$spark$TestUtils$$createURI(TestUtils.scala:119)
at 
org.apache.spark.TestUtils$JavaSourceFromString.<init>(TestUtils.scala:123)
```

(Sorry, I left the comments on the wrong line)
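
For anyone following along, a minimal JDK-only sketch of the failure mode (the 
`string:///` prefix and the Windows path mirror the trace above; the output of 
the last line differs by platform):

```
import java.io.File
import java.net.URI

object BackslashUriDemo extends App {
  val winPath = """C:\Users\appveyor\Temp\testjar\DummyClassForAddJarTest.java"""

  // Reproduces the trace above: a raw backslash is an illegal URI character,
  // so URI.create throws IllegalArgumentException ("Illegal character in path").
  try URI.create("string:///" + winPath)
  catch { case e: IllegalArgumentException => println(e.getMessage) }

  // Going through java.io.File first yields a well-formed file: URI
  // (on Windows this prints file:/C:/Users/appveyor/Temp/...).
  println(new File(winPath).toURI)
}
```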


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #15666: [SPARK-11421] [Core][Python][R] Added ability for...

2017-03-05 Thread HyukjinKwon
Github user HyukjinKwon commented on a diff in the pull request:

https://github.com/apache/spark/pull/15666#discussion_r104358500
  
--- Diff: core/src/main/scala/org/apache/spark/TestUtils.scala ---
@@ -168,6 +168,27 @@ private[spark] object TestUtils {
 createCompiledClass(className, destDir, sourceFile, classpathUrls)
   }
 
+  /** Create a dummy compile jar for a given package, classname.  Jar will 
be placed in destDir */
+  def createDummyJar(destDir: String, packageName: String, className: 
String): String = {
+val srcDir = new File(destDir, packageName)
+srcDir.mkdirs()
+val excSource = new JavaSourceFromString(new File(srcDir, 
className).getAbsolutePath,
--- End diff --

Can we use the URI form here? It seems this one is problematic.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #15666: [SPARK-11421] [Core][Python][R] Added ability for...

2017-03-05 Thread HyukjinKwon
Github user HyukjinKwon commented on a diff in the pull request:

https://github.com/apache/spark/pull/15666#discussion_r104357841
  
--- Diff: R/pkg/R/context.R ---
@@ -319,6 +319,34 @@ spark.addFile <- function(path, recursive = FALSE) {
   invisible(callJMethod(sc, "addFile", 
suppressWarnings(normalizePath(path)), recursive))
 }
 
+
+#' Adds a JAR dependency for all tasks to be executed on this SparkContext 
in the future.
+#'
+#' The \code{path} passed can be either a local file, a file in HDFS (or 
other Hadoop-supported
+#' filesystems), an HTTP, HTTPS or FTP URI, or local:/path for a file on 
every worker node.
+#' If \code{addToCurrentClassLoader} is true, add the jar to the current 
threads' classloader. In
+#' general adding to the current threads' class loader will impact all 
other application threads
+#' unless they have explicitly changed their class loader.
+#'
+#' @rdname spark.addJar
+#' @param path The path of the jar to be added
+#' @param addToCurrentClassLoader Whether to add the jar to the current 
driver classloader.
+#' Default is FALSE.
+#' @export
+#' @examples
+#'\dontrun{
+#' spark.addJar("/path/to/something.jar", TRUE)
+#'}
+#' @note spark.addJar since 2.2.0
+spark.addJar <- function(path, addToCurrentClassLoader = FALSE) {
+  sc <- getSparkContext()
+  normalizedPath <- suppressWarnings(normalizePath(path))
+  scala_sc <- callJMethod(sc, "sc")
+  invisible(callJMethod(scala_sc, "addJar", normalizedPath, 
addToCurrentClassLoader))
--- End diff --

This seems failed as below:

```
1. Error: add jar should work and allow usage of the jar on the driver node 
(@test_context.R#174) 
java.lang.IllegalArgumentException: Illegal character in path at index 12: 
string:///C:\Users\appveyor\AppData\Local\Temp\1\RtmpCqEUJL\testjar\sparkrTests\DummyClassForAddJarTest.java
at java.net.URI.create(URI.java:852)
at 
org.apache.spark.TestUtils$.org$apache$spark$TestUtils$$createURI(TestUtils.scala:119)
at 
org.apache.spark.TestUtils$JavaSourceFromString.<init>(TestUtils.scala:123)
```


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #15666: [SPARK-11421] [Core][Python][R] Added ability for...

2017-03-05 Thread HyukjinKwon
Github user HyukjinKwon commented on a diff in the pull request:

https://github.com/apache/spark/pull/15666#discussion_r104358056
  
--- Diff: core/src/main/scala/org/apache/spark/TestUtils.scala ---
@@ -168,6 +168,27 @@ private[spark] object TestUtils {
 createCompiledClass(className, destDir, sourceFile, classpathUrls)
   }
 
+  /** Create a dummy compile jar for a given package, classname.  Jar will 
be placed in destDir */
+  def createDummyJar(destDir: String, packageName: String, className: 
String): String = {
+val srcDir = new File(destDir, packageName)
+srcDir.mkdirs()
+val excSource = new JavaSourceFromString(new File(srcDir, 
className).getAbsolutePath,
+  s"""package $packageName;
+ |
+  |public class $className implements java.io.Serializable {
+ |  public static String helloWorld(String arg) { return "Hello " 
+ arg; }
+ |  public static int addStuff(int arg1, int arg2) { return arg1 + 
arg2; }
+ |}
+""".
+stripMargin)
+val excFile = createCompiledClass(className, srcDir, excSource, 
Seq.empty)
--- End diff --

Can we use the URI form here? It seems this one is problematic.
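
If it helps, one hedged sketch of what "URI form" could mean here: normalize 
the separators before the name reaches `JavaSourceFromString` (assumption: the 
helper only needs a URI-safe identifier, not a real filesystem path):

```
import java.io.File

// Hypothetical sketch, not the PR's actual fix: a forward-slash-only name
// parses cleanly when JavaSourceFromString builds a URI from it on Windows.
def uriSafeName(srcDir: File, className: String): String =
  new File(srcDir, className).getAbsolutePath.replace(File.separatorChar, '/')
```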


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #15666: [SPARK-11421] [Core][Python][R] Added ability for...

2017-02-08 Thread mariusvniekerk
Github user mariusvniekerk commented on a diff in the pull request:

https://github.com/apache/spark/pull/15666#discussion_r100177099
  
--- Diff: core/src/main/scala/org/apache/spark/SparkContext.scala ---
@@ -1802,19 +1802,34 @@ class SparkContext(config: SparkConf) extends 
Logging {
* Adds a JAR dependency for all tasks to be executed on this 
`SparkContext` in the future.
* @param path can be either a local file, a file in HDFS (or other 
Hadoop-supported filesystems),
* an HTTP, HTTPS or FTP URI, or local:/path for a file on every worker 
node.
+   * If addToCurrentClassLoader is true, attempt to add the new class to 
the current threads' class
--- End diff --

Add to the doc that adding a URL will have no effect if the URL is already 
present.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #15666: [SPARK-11421] [Core][Python][R] Added ability for...

2017-02-08 Thread holdenk
Github user holdenk commented on a diff in the pull request:

https://github.com/apache/spark/pull/15666#discussion_r100176735
  
--- Diff: python/pyspark/context.py ---
@@ -836,6 +836,17 @@ def addPyFile(self, path):
 import importlib
 importlib.invalidate_caches()
 
+def addJar(self, path, addToCurrentClassLoader=False):
+"""
+Adds a JAR dependency for all tasks to be executed on this 
SparkContext in the future.
+The `path` passed can be either a local file, a file in HDFS (or 
other Hadoop-supported
+filesystems), an HTTP, HTTPS or FTP URI, or local:/path for a file 
on every worker node.
+If addToCurrentClassLoader is true, attempt to add the new class 
to the current threads'
+class loader. In general adding to the current threads' class 
loader will impact all other
+application threads unless they have explicitly changed their 
class loader.
--- End diff --

Add a `:param:` annotation here as well for path & addToCurrentClassLoader


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #15666: [SPARK-11421] [Core][Python][R] Added ability for...

2017-02-08 Thread mariusvniekerk
Github user mariusvniekerk commented on a diff in the pull request:

https://github.com/apache/spark/pull/15666#discussion_r100176188
  
--- Diff: core/src/main/scala/org/apache/spark/SparkContext.scala ---
@@ -1802,19 +1802,34 @@ class SparkContext(config: SparkConf) extends 
Logging {
* Adds a JAR dependency for all tasks to be executed on this 
`SparkContext` in the future.
* @param path can be either a local file, a file in HDFS (or other 
Hadoop-supported filesystems),
* an HTTP, HTTPS or FTP URI, or local:/path for a file on every worker 
node.
+   * If addToCurrentClassLoader is true, attempt to add the new class to 
the current threads' class
+   * loader. In general adding to the current threads' class loader will 
impact all other
+   * application threads unless they have explicitly changed their class 
loader.
*/
   def addJar(path: String) {
+addJar(path, false)
+  }
+
+  def addJar(path: String, addToCurrentClassLoader: Boolean) {
 if (path == null) {
   logWarning("null specified as parameter to addJar")
 } else {
   var key = ""
-  if (path.contains("\\")) {
+
+  val uri = if (path.contains("\\")) {
 // For local paths with backslashes on Windows, URI throws an 
exception
-key = env.rpcEnv.fileServer.addJar(new File(path))
+new File(path).toURI
   } else {
 val uri = new URI(path)
 // SPARK-17650: Make sure this is a valid URL before adding it to 
the list of dependencies
 Utils.validateURL(uri)
+uri
+  }
+
+  if (path.contains("\\")) {
+// For local paths with backslashes on Windows, URI throws an 
exception
+key = env.rpcEnv.fileServer.addJar(new File(uri))
--- End diff --

If we have backslashes, we are dealing with a local path on Windows.
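
To make the special case concrete, a condensed JDK-only sketch of the branch 
under review (the real code additionally runs Utils.validateURL and registers 
the jar with the RPC file server):

```
import java.io.File
import java.net.URI

// Backslashes cannot survive new URI(path), so Windows local paths take the
// File.toURI route; everything else is parsed as a URI directly.
def resolveJarUri(path: String): URI =
  if (path.contains("\\")) new File(path).toURI
  else new URI(path)

// e.g. resolveJarUri("""C:\jars\dep.jar""") yields file:/C:/jars/dep.jar on
// Windows, while resolveJarUri("hdfs:///jars/dep.jar") parses directly.
```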


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #15666: [SPARK-11421] [Core][Python][R] Added ability for...

2017-02-01 Thread holdenk
Github user holdenk commented on a diff in the pull request:

https://github.com/apache/spark/pull/15666#discussion_r98948530
  
--- Diff: core/src/main/scala/org/apache/spark/SparkContext.scala ---
@@ -1700,19 +1700,34 @@ class SparkContext(config: SparkConf) extends 
Logging {
* Adds a JAR dependency for all tasks to be executed on this 
SparkContext in the future.
* The `path` passed can be either a local file, a file in HDFS (or 
other Hadoop-supported
* filesystems), an HTTP, HTTPS or FTP URI, or local:/path for a file on 
every worker node.
+   * If addToCurrentClassLoader is true, attempt to add the new class to 
the current threads' class
+   * loader. In general adding to the current threads' class loader will 
impact all other
+   * application threads unless they have explicitly changed their class 
loader.
*/
   def addJar(path: String) {
+addJar(path, false)
+  }
+
+  def addJar(path: String, addToCurrentClassLoader: Boolean) {
--- End diff --

What if we marked the Scala API as a Developer API, since we might end up 
wanting to change the Scala side in the future?
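
For reference, a hedged sketch of what that could look like (the annotation 
itself exists in Spark; whether this PR ended up using it is not shown in this 
thread):

```
import org.apache.spark.annotation.DeveloperApi

// Hypothetical shape only: mark the two-arg overload as a Developer API so
// the Scala side can still evolve, while the one-arg addJar stays stable.
@DeveloperApi
def addJar(path: String, addToCurrentClassLoader: Boolean): Unit = {
  // body as in the diff above
}
```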


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #15666: [SPARK-11421] [Core][Python][R] Added ability for...

2017-02-01 Thread holdenk
Github user holdenk commented on a diff in the pull request:

https://github.com/apache/spark/pull/15666#discussion_r98949834
  
--- Diff: core/src/main/scala/org/apache/spark/SparkContext.scala ---
@@ -1700,19 +1700,34 @@ class SparkContext(config: SparkConf) extends 
Logging {
* Adds a JAR dependency for all tasks to be executed on this 
SparkContext in the future.
* The `path` passed can be either a local file, a file in HDFS (or 
other Hadoop-supported
* filesystems), an HTTP, HTTPS or FTP URI, or local:/path for a file on 
every worker node.
+   * If addToCurrentClassLoader is true, attempt to add the new class to 
the current threads' class
+   * loader. In general adding to the current threads' class loader will 
impact all other
+   * application threads unless they have explicitly changed their class 
loader.
*/
   def addJar(path: String) {
+addJar(path, false)
+  }
+
+  def addJar(path: String, addToCurrentClassLoader: Boolean) {
 if (path == null) {
   logWarning("null specified as parameter to addJar")
 } else {
   var key = ""
-  if (path.contains("\\")) {
+
+  val uri = if (path.contains("\\")) {
 // For local paths with backslashes on Windows, URI throws an 
exception
-key = env.rpcEnv.fileServer.addJar(new File(path))
--- End diff --

So if we construct a File and then go through toURI, it's safe? Have we tested 
this on Windows?


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #15666: [SPARK-11421] [Core][Python][R] Added ability for...

2017-02-01 Thread holdenk
Github user holdenk commented on a diff in the pull request:

https://github.com/apache/spark/pull/15666#discussion_r98950178
  
--- Diff: core/src/main/scala/org/apache/spark/SparkContext.scala ---
@@ -1802,19 +1802,34 @@ class SparkContext(config: SparkConf) extends 
Logging {
* Adds a JAR dependency for all tasks to be executed on this 
`SparkContext` in the future.
* @param path can be either a local file, a file in HDFS (or other 
Hadoop-supported filesystems),
* an HTTP, HTTPS or FTP URI, or local:/path for a file on every worker 
node.
+   * If addToCurrentClassLoader is true, attempt to add the new class to 
the current threads' class
+   * loader. In general adding to the current threads' class loader will 
impact all other
+   * application threads unless they have explicitly changed their class 
loader.
*/
   def addJar(path: String) {
+addJar(path, false)
+  }
+
+  def addJar(path: String, addToCurrentClassLoader: Boolean) {
 if (path == null) {
   logWarning("null specified as parameter to addJar")
 } else {
   var key = ""
-  if (path.contains("\\")) {
+
+  val uri = if (path.contains("\\")) {
 // For local paths with backslashes on Windows, URI throws an 
exception
-key = env.rpcEnv.fileServer.addJar(new File(path))
+new File(path).toURI
   } else {
 val uri = new URI(path)
 // SPARK-17650: Make sure this is a valid URL before adding it to 
the list of dependencies
 Utils.validateURL(uri)
+uri
+  }
+
+  if (path.contains("\\")) {
+// For local paths with backslashes on Windows, URI throws an 
exception
+key = env.rpcEnv.fileServer.addJar(new File(uri))
--- End diff --

So I'm a bit confused: we already supposedly have a safe URI object for 
Windows above. Does it throw an exception on getPath? Or is it that \\ always 
means a local path and getScheme doesn't reflect that?
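
A small JDK-only illustration of why the check repeats rather than inspecting 
the scheme (this reads the diff the second way suggested above):

```
import java.io.File

// A raw Windows path never makes it through new URI(path) at all - it throws
// before getScheme or getPath could be consulted - so the code re-tests for
// backslashes instead. After File.toURI the scheme is always "file".
val win = """C:\jars\dep.jar"""
// new java.net.URI(win) would throw URISyntaxException here
println(new File(win).toURI.getScheme) // prints "file"
```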


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #15666: [SPARK-11421] [Core][Python][R] Added ability for...

2017-02-01 Thread holdenk
Github user holdenk commented on a diff in the pull request:

https://github.com/apache/spark/pull/15666#discussion_r98948623
  
--- Diff: core/src/main/scala/org/apache/spark/SparkContext.scala ---
@@ -1802,19 +1802,34 @@ class SparkContext(config: SparkConf) extends 
Logging {
* Adds a JAR dependency for all tasks to be executed on this 
`SparkContext` in the future.
* @param path can be either a local file, a file in HDFS (or other 
Hadoop-supported filesystems),
* an HTTP, HTTPS or FTP URI, or local:/path for a file on every worker 
node.
+   * If addToCurrentClassLoader is true, attempt to add the new class to 
the current threads' class
--- End diff --

"Attempt to" is interesting wording here - when would we expect this to fail, 
and should we explicitly call that out?


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #15666: [SPARK-11421] [Core][Python][R] Added ability for...

2016-12-22 Thread mariusvniekerk
Github user mariusvniekerk commented on a diff in the pull request:

https://github.com/apache/spark/pull/15666#discussion_r93730314
  
--- Diff: core/src/main/scala/org/apache/spark/TestUtils.scala ---
@@ -164,6 +164,27 @@ private[spark] object TestUtils {
 createCompiledClass(className, destDir, sourceFile, classpathUrls)
   }
 
+  /** Create a dummy compile jar for a given package, classname.  Jar will 
be placed in destDir */
+  def createDummyJar(destDir: String, packageName: String, className: 
String): String = {
--- End diff --

The R tests do indeed verify that they can call the internal functions.

I can revert that part of the changes.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #15666: [SPARK-11421] [Core][Python][R] Added ability for...

2016-12-22 Thread mariusvniekerk
Github user mariusvniekerk commented on a diff in the pull request:

https://github.com/apache/spark/pull/15666#discussion_r93729928
  
--- Diff: core/src/main/scala/org/apache/spark/TestUtils.scala ---
@@ -164,6 +164,27 @@ private[spark] object TestUtils {
 createCompiledClass(className, destDir, sourceFile, classpathUrls)
   }
 
+  /** Create a dummy compile jar for a given package, classname.  Jar will 
be placed in destDir */
+  def createDummyJar(destDir: String, packageName: String, className: 
String): String = {
--- End diff --

Yeah, when I wrote this that didn't exist yet. Changing.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #15666: [SPARK-11421] [Core][Python][R] Added ability for...

2016-12-20 Thread holdenk
Github user holdenk commented on a diff in the pull request:

https://github.com/apache/spark/pull/15666#discussion_r93338899
  
--- Diff: core/src/main/scala/org/apache/spark/TestUtils.scala ---
@@ -164,6 +164,27 @@ private[spark] object TestUtils {
 createCompiledClass(className, destDir, sourceFile, classpathUrls)
   }
 
+  /** Create a dummy compile jar for a given package, classname.  Jar will 
be placed in destDir */
+  def createDummyJar(destDir: String, packageName: String, className: 
String): String = {
--- End diff --

I know this is moved code, but could we just use `createJarWithClasses`, or do 
the R tests require these inner functions?

If we do keep this, though, I'd give it a clearer name and description, 
because without reading the code, `createDummyJar` and `createJarWithClasses` 
don't seem all that different.
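
For context, the existing helper can be called roughly like this (signature 
assumed from the TestUtils of that era):

```
import org.apache.spark.TestUtils

// Hedged sketch: createJarWithClasses compiles and packages empty dummy
// classes in one call; createDummyJar additionally emits static methods
// (helloWorld/addStuff) that the R and Python tests invoke reflectively.
val jarUrl = TestUtils.createJarWithClasses(classNames = Seq("DummyClass"))
```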


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #15666: [SPARK-11421] [Core][Python][R] Added ability for...

2016-12-10 Thread mariusvniekerk
Github user mariusvniekerk closed the pull request at:

https://github.com/apache/spark/pull/15666


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #15666: [SPARK-11421] [Core][Python][R] Added ability for...

2016-12-10 Thread mariusvniekerk
GitHub user mariusvniekerk reopened a pull request:

https://github.com/apache/spark/pull/15666

[SPARK-11421] [Core][Python][R] Added ability for addJar to augment the 
current classloader

## What changes were proposed in this pull request?

Adds a flag to sc.addJar to add the jar to the current classloader
## How was this patch tested?

Unit tests, manual tests

This is a continuation of the pull request in 
https://github.com/apache/spark/pull/9313 and is mostly a rebase of that moved 
to master with SparkR additions.
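
For readers skimming the thread, a minimal usage sketch of the proposed flag 
(call shapes taken from the diffs quoted in this thread; `sc` is an existing 
SparkContext):

```
// The boolean also injects the jar into the current thread's classloader on
// the driver, so its classes are usable immediately, not only in future tasks.
sc.addJar("/path/to/something.jar", true)

// Python and R analogues proposed in the same PR:
//   sc.addJar("/path/to/something.jar", addToCurrentClassLoader=True)
//   spark.addJar("/path/to/something.jar", addToCurrentClassLoader = TRUE)
```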

cc @holdenk 


You can merge this pull request into a Git repository by running:

$ git pull https://github.com/mariusvniekerk/spark SPARK-11421

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/15666.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #15666


commit 6fb5d66e7669ebe0e8a515e02b1276e1bab652a2
Author: Marius van Niekerk 
Date:   2016-10-28T00:26:17Z

Squashed content from pull request #9313

commit 6a6e98a0fcc7f388009f36b8a31664bda2ccf5d9
Author: Marius van Niekerk 
Date:   2016-10-28T00:26:29Z

Remove _loadClass method since we dont need it anymore under py4j 0.10

commit 2b1e98e50feb7180b94f7b9e304634566f163718
Author: Marius van Niekerk 
Date:   2016-10-28T00:26:36Z

Expose addJar to sparkR as well

commit 7f37d3a060d574bd6c38539ec896fbc4c94060f3
Author: mariusvniekerk 
Date:   2016-10-29T13:24:40Z

Style fixes

commit 9d838b35b53b1e4fdcf39721b4f638ead9e40fcd
Author: Marius van Niekerk 
Date:   2016-10-29T20:15:32Z

Adjust test suite to test add jar in scala as well.

commit d4416d92610affd363701fd08dc53eb720566130
Author: Marius van Niekerk 
Date:   2016-10-29T21:19:16Z

Fixed scala test not working due to incorrect classloader being used.

commit fccb141dd9e6d36db242997f1c6f3e007caa514f
Author: Marius van Niekerk 
Date:   2016-10-30T00:15:27Z

Fixed typo with test.

commit 26b39de51f9a76b121ebcb70079072dfcc9972bd
Author: Marius van Niekerk 
Date:   2016-11-01T01:46:07Z

Fixed documentation.




---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #15666: [SPARK-11421] [Core][Python][R] Added ability for...

2016-10-31 Thread mariusvniekerk
Github user mariusvniekerk commented on a diff in the pull request:

https://github.com/apache/spark/pull/15666#discussion_r85865112
  
--- Diff: core/src/main/scala/org/apache/spark/SparkContext.scala ---
@@ -1700,19 +1700,34 @@ class SparkContext(config: SparkConf) extends 
Logging {
* Adds a JAR dependency for all tasks to be executed on this 
SparkContext in the future.
* The `path` passed can be either a local file, a file in HDFS (or 
other Hadoop-supported
* filesystems), an HTTP, HTTPS or FTP URI, or local:/path for a file on 
every worker node.
+   * If addToCurrentClassLoader is true, attempt to add the new class to 
the current threads' class
+   * loader. In general adding to the current threads' class loader will 
impact all other
+   * application threads unless they have explicitly changed their class 
loader.
*/
   def addJar(path: String) {
+addJar(path, false)
+  }
+
+  def addJar(path: String, addToCurrentClassLoader: Boolean) {
 if (path == null) {
   logWarning("null specified as parameter to addJar")
 } else {
   var key = ""
-  if (path.contains("\\")) {
+
+  val uri = if (path.contains("\\")) {
 // For local paths with backslashes on Windows, URI throws an 
exception
-key = env.rpcEnv.fileServer.addJar(new File(path))
--- End diff --

So this change gets the URI for the Windows path, which is used later on to 
construct a File instance. That should allow the Windows special case to work.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #15666: [SPARK-11421] [Core][Python][R] Added ability for...

2016-10-31 Thread mariusvniekerk
Github user mariusvniekerk commented on a diff in the pull request:

https://github.com/apache/spark/pull/15666#discussion_r85833766
  
--- Diff: core/src/main/scala/org/apache/spark/SparkContext.scala ---
@@ -1700,19 +1700,34 @@ class SparkContext(config: SparkConf) extends 
Logging {
* Adds a JAR dependency for all tasks to be executed on this 
SparkContext in the future.
* The `path` passed can be either a local file, a file in HDFS (or 
other Hadoop-supported
* filesystems), an HTTP, HTTPS or FTP URI, or local:/path for a file on 
every worker node.
+   * If addToCurrentClassLoader is true, attempt to add the new class to 
the current threads' class
+   * loader. In general adding to the current threads' class loader will 
impact all other
+   * application threads unless they have explicitly changed their class 
loader.
*/
   def addJar(path: String) {
+addJar(path, false)
+  }
+
+  def addJar(path: String, addToCurrentClassLoader: Boolean) {
--- End diff --

Keeping it in the Scala API makes it simpler for other Spark Scala 
interpreters (e.g. Toree, Zeppelin) to make use of this.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #15666: [SPARK-11421] [Core][Python][R] Added ability for...

2016-10-31 Thread felixcheung
Github user felixcheung commented on a diff in the pull request:

https://github.com/apache/spark/pull/15666#discussion_r85830311
  
--- Diff: core/src/main/scala/org/apache/spark/SparkContext.scala ---
@@ -1700,19 +1700,34 @@ class SparkContext(config: SparkConf) extends 
Logging {
* Adds a JAR dependency for all tasks to be executed on this 
SparkContext in the future.
* The `path` passed can be either a local file, a file in HDFS (or 
other Hadoop-supported
* filesystems), an HTTP, HTTPS or FTP URI, or local:/path for a file on 
every worker node.
+   * If addToCurrentClassLoader is true, attempt to add the new class to 
the current threads' class
+   * loader. In general adding to the current threads' class loader will 
impact all other
+   * application threads unless they have explicitly changed their class 
loader.
*/
   def addJar(path: String) {
+addJar(path, false)
+  }
+
+  def addJar(path: String, addToCurrentClassLoader: Boolean) {
--- End diff --

Instead of changing SparkContext.addJar as a broad public API, would it be 
better to have a narrower change in an R-only helper?


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #15666: [SPARK-11421] [Core][Python][R] Added ability for...

2016-10-31 Thread felixcheung
Github user felixcheung commented on a diff in the pull request:

https://github.com/apache/spark/pull/15666#discussion_r85829905
  
--- Diff: core/src/main/scala/org/apache/spark/SparkContext.scala ---
@@ -1700,19 +1700,34 @@ class SparkContext(config: SparkConf) extends 
Logging {
* Adds a JAR dependency for all tasks to be executed on this 
SparkContext in the future.
* The `path` passed can be either a local file, a file in HDFS (or 
other Hadoop-supported
* filesystems), an HTTP, HTTPS or FTP URI, or local:/path for a file on 
every worker node.
+   * If addToCurrentClassLoader is true, attempt to add the new class to 
the current threads' class
+   * loader. In general adding to the current threads' class loader will 
impact all other
+   * application threads unless they have explicitly changed their class 
loader.
*/
   def addJar(path: String) {
+addJar(path, false)
+  }
+
+  def addJar(path: String, addToCurrentClassLoader: Boolean) {
 if (path == null) {
   logWarning("null specified as parameter to addJar")
 } else {
   var key = ""
-  if (path.contains("\\")) {
+
+  val uri = if (path.contains("\\")) {
 // For local paths with backslashes on Windows, URI throws an 
exception
-key = env.rpcEnv.fileServer.addJar(new File(path))
--- End diff --

this seems to be changing existing behavior?


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #15666: [SPARK-11421] [Core][Python][R] Added ability for...

2016-10-31 Thread felixcheung
Github user felixcheung commented on a diff in the pull request:

https://github.com/apache/spark/pull/15666#discussion_r85829640
  
--- Diff: R/pkg/R/context.R ---
@@ -290,6 +290,33 @@ spark.addFile <- function(path, recursive = FALSE) {
   invisible(callJMethod(sc, "addFile", 
suppressWarnings(normalizePath(path)), recursive))
 }
 
+
+#' Adds a JAR dependency for all tasks to be executed on this SparkContext 
in the future.
+#'
+#' The `path` passed can be either a local file, a file in HDFS (or other 
Hadoop-supported
--- End diff --

use `\code{path}` instead of \`path\` in R doc


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #15666: [SPARK-11421] [Core][Python][R] Added ability for...

2016-10-27 Thread mariusvniekerk
GitHub user mariusvniekerk opened a pull request:

https://github.com/apache/spark/pull/15666

[SPARK-11421] [Core][Python][R] Added ability for addJar to augment the 
current classloader

## What changes were proposed in this pull request?

Adds a flag to sc.addJar to add the jar to the current classloader

## How was this patch tested?

Unit tests, manual tests

This is a continuation of the pull request in 
https://github.com/apache/spark/pull/9313 and is mostly a rebase of that moved 
to master with SparkR additions.

cc @holdenk 


You can merge this pull request into a Git repository by running:

$ git pull https://github.com/mariusvniekerk/spark SPARK-11421

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/15666.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #15666


commit 6fb5d66e7669ebe0e8a515e02b1276e1bab652a2
Author: Marius van Niekerk 
Date:   2016-10-28T00:26:17Z

Squashed content from pull request #9313

commit 6a6e98a0fcc7f388009f36b8a31664bda2ccf5d9
Author: Marius van Niekerk 
Date:   2016-10-28T00:26:29Z

Remove _loadClass method since we dont need it anymore under py4j 0.10

commit 2b1e98e50feb7180b94f7b9e304634566f163718
Author: Marius van Niekerk 
Date:   2016-10-28T00:26:36Z

Expose addJar to sparkR as well




---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org