[GitHub] spark pull request #15666: [SPARK-11421] [Core][Python][R] Added ability for...
Github user mariusvniekerk closed the pull request at:

    https://github.com/apache/spark/pull/15666

---
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org
Github user holdenk commented on a diff in the pull request:

    https://github.com/apache/spark/pull/15666#discussion_r143838425

--- Diff: python/pyspark/tests.py ---
@@ -435,6 +436,19 @@ def test_add_file_locally(self):
         with open(download_path) as test_file:
             self.assertEqual("Hello World!\n", test_file.readline())

+    def test_add_jar(self):
+        jvm = self.sc._jvm
+        # We shouldn't be able to load anything from the package before it is added
+        self.assertFalse(isinstance(jvm.pysparktests.DummyClass, JavaClass))
+        # Generate and compile the test jar
+        destDir = os.path.join(SPARK_HOME, "python/test_support/jar")
--- End diff --

Instead, do you want to use a temp directory here?
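The suggestion is to generate the throwaway jar under a temporary directory instead of inside SPARK_HOME, so repeated test runs leave nothing behind. A minimal Python sketch of that pattern (the placeholder bytes stand in for the real jar produced by `TestUtils.createDummyJar`):

```python
import os
import tempfile

# Build the test jar inside a managed temp directory; everything under it is
# deleted automatically when the `with` block exits, so nothing is left in
# SPARK_HOME between test runs.
with tempfile.TemporaryDirectory() as dest_dir:
    jar_path = os.path.join(dest_dir, "pysparktests-DummyClass.jar")
    with open(jar_path, "wb") as f:   # stand-in for TestUtils.createDummyJar
        f.write(b"PK\x03\x04")        # placeholder bytes, not a real jar
    existed_during = os.path.exists(jar_path)

existed_after = os.path.exists(jar_path)
print(existed_during, existed_after)  # True False
```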
Github user HyukjinKwon commented on a diff in the pull request:

    https://github.com/apache/spark/pull/15666#discussion_r143018955

--- Diff: python/pyspark/context.py ---
@@ -863,6 +863,21 @@ def addPyFile(self, path):
         import importlib
         importlib.invalidate_caches()

+    def addJar(self, path, addToCurrentClassLoader=False):
+        """
+        Adds a JAR dependency for all tasks to be executed on this SparkContext in the future.
+        The `path` passed can be either a local file, a file in HDFS (or other Hadoop-supported
+        filesystems), an HTTP, HTTPS or FTP URI, or local:/path for a file on every worker node.
+        If addToCurrentClassLoader is true, add the jar to the current threads' classloader.
--- End diff --

little nit: wrap `addToCurrentClassLoader` in backticks in the docstring, and "current threads' classloader" -> "current thread's class loader".
Github user HyukjinKwon commented on a diff in the pull request:

    https://github.com/apache/spark/pull/15666#discussion_r143019603

--- Diff: python/pyspark/tests.py ---
@@ -435,6 +436,19 @@ def test_add_file_locally(self):
         with open(download_path) as test_file:
             self.assertEqual("Hello World!\n", test_file.readline())

+    def test_add_jar(self):
+        jvm = self.sc._jvm
+        # We shouldn't be able to load anything from the package before it is added
+        self.assertFalse(isinstance(jvm.pysparktests.DummyClass, JavaClass))
+        # Generate and compile the test jar
+        destDir = os.path.join(SPARK_HOME, "python/test_support/jar")
--- End diff --

I'd remove this directory too.
Github user HyukjinKwon commented on a diff in the pull request:

    https://github.com/apache/spark/pull/15666#discussion_r143015274

--- Diff: R/pkg/tests/fulltests/test_context.R ---
@@ -167,6 +167,18 @@ test_that("spark.lapply should perform simple transforms", {
   sparkR.session.stop()
 })

+test_that("add jar should work and allow usage of the jar on the driver node", {
+  sparkR.sparkContext()
+
+  destDir <- file.path(tempdir(), "testjar")
+  jarName <- callJStatic("org.apache.spark.TestUtils", "createDummyJar",
+    destDir, "sparkrTests", "DummyClassForAddJarTest")
--- End diff --

little nit:

```r
jarName <- callJStatic("org.apache.spark.TestUtils", "createDummyJar",
                       destDir, "sparkrTests", "DummyClassForAddJarTest")
```
Github user HyukjinKwon commented on a diff in the pull request:

    https://github.com/apache/spark/pull/15666#discussion_r143014933

--- Diff: R/pkg/R/context.R ---
@@ -319,6 +319,32 @@ spark.addFile <- function(path, recursive = FALSE) {
   invisible(callJMethod(sc, "addFile", suppressWarnings(normalizePath(path)), recursive))
 }

+
+#' Adds a JAR dependency for all tasks to be executed on this SparkContext in the future.
+#'
+#' The \code{path} passed can be either a local file, a file in HDFS (or other Hadoop-supported
+#' filesystems), an HTTP, HTTPS or FTP URI, or local:/path for a file on every worker node.
+#' If \code{addToCurrentClassLoader} is true, add the jar to the current threads' classloader. In
+#' general adding to the current threads' class loader will impact all other application threads
+#' unless they have explicitly changed their class loader.
+#'
+#' @rdname spark.addJar
+#' @param path The path of the jar to be added
+#' @param addToCurrentClassLoader Whether to add the jar to the current driver classloader.
+#' @export
+#' @examples
+#'\dontrun{
+#' spark.addJar("/path/to/something.jar", TRUE)
+#'}
+#' @note spark.addJar since 2.2.0
+spark.addJar <- function(path, addToCurrentClassLoader = FALSE) {
+  normalizedPath <- suppressWarnings(normalizePath(path))
+  sc <- callJMethod(getSparkContext(), "sc")
+  invisible(callJMethod(sc, "addJar", normalizedPath, addToCurrentClassLoader))
+}
+
+
+
--- End diff --

little nit: I guess we just need a single newline.
Github user HyukjinKwon commented on a diff in the pull request:

    https://github.com/apache/spark/pull/15666#discussion_r143018036

--- Diff: core/src/main/scala/org/apache/spark/TestUtils.scala ---
@@ -168,6 +168,27 @@ private[spark] object TestUtils {
     createCompiledClass(className, destDir, sourceFile, classpathUrls)
   }

+  /** Create a dummy compiled jar for a given package and class name. The jar will be placed in destDir. */
+  def createDummyJar(destDir: String, packageName: String, className: String): String = {
+    val srcDir = new File(destDir, packageName)
+    srcDir.mkdirs()
+    val excSource = new JavaSourceFromString(new File(srcDir, className).toURI.getPath,
+      s"""package $packageName;
+         |
+         |public class $className implements java.io.Serializable {
+         |  public static String helloWorld(String arg) { return "Hello " + arg; }
+         |  public static int addStuff(int arg1, int arg2) { return arg1 + arg2; }
+         |}
+       """.
+        stripMargin)
--- End diff --

We could make this inlined, i.e. `""".stripMargin` on one line.
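`createDummyJar` compiles the generated Java source and packages it into a jar. The packaging half is just a zip archive with a manifest; a rough Python equivalent of that step (the class bytes are faked here, since the real helper runs the Java compiler):

```python
import io
import zipfile

def create_dummy_jar(class_name: str, class_bytes: bytes) -> bytes:
    """Package pre-compiled class bytes into a minimal jar (a zip with a manifest)."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w", zipfile.ZIP_DEFLATED) as jar:
        # A jar is a zip whose first entry is conventionally the manifest.
        jar.writestr("META-INF/MANIFEST.MF", "Manifest-Version: 1.0\n\n")
        jar.writestr(f"{class_name}.class", class_bytes)
    return buf.getvalue()

# Fake class bytes; a real .class file starts with the 0xCAFEBABE magic.
jar_bytes = create_dummy_jar("sparkrTests/DummyClassForAddJarTest", b"\xca\xfe\xba\xbe")
names = zipfile.ZipFile(io.BytesIO(jar_bytes)).namelist()
print(names)  # ['META-INF/MANIFEST.MF', 'sparkrTests/DummyClassForAddJarTest.class']
```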
Github user HyukjinKwon commented on a diff in the pull request:

    https://github.com/apache/spark/pull/15666#discussion_r143015101

--- Diff: R/pkg/tests/fulltests/test_context.R ---
@@ -167,6 +167,18 @@ test_that("spark.lapply should perform simple transforms", {
   sparkR.session.stop()
 })

+test_that("add jar should work and allow usage of the jar on the driver node", {
+  sparkR.sparkContext()
+
+  destDir <- file.path(tempdir(), "testjar")
--- End diff --

I'd remove this `tempdir()` btw.
Github user HyukjinKwon commented on a diff in the pull request:

    https://github.com/apache/spark/pull/15666#discussion_r143017669

--- Diff: core/src/main/scala/org/apache/spark/SparkContext.scala ---
@@ -1845,6 +1859,21 @@ class SparkContext(config: SparkConf) extends Logging {
       logInfo(s"Added JAR $path at $key with timestamp $timestamp")
       postEnvironmentUpdate()
     }
+
+    if (addToCurrentClassLoader) {
+      val currentCL = Utils.getContextOrSparkClassLoader
+      currentCL match {
+        case cl: MutableURLClassLoader =>
+          val uri = if (path.contains("\\")) {
+            // For local paths with backslashes on Windows, URI throws an exception
+            new File(path).toURI
+          } else {
+            new URI(path)
--- End diff --

Could we maybe just use `Utils.resolveURI(path)`? It looks like the logic is similar.
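For context on why the backslash branch exists at all: a raw Windows path is not a valid URI, so it has to go through a file-path conversion first. A small stdlib-Python illustration of the same pitfall (the jar paths are made up):

```python
from pathlib import PurePosixPath, PureWindowsPath
from urllib.parse import urlparse

# A Windows path with backslashes is not a valid URI: a naive parse treats
# "C:" as a scheme and the rest of the path structure is lost.
parsed = urlparse("C:\\jars\\dummy.jar")
print(parsed.scheme)  # "c" -- clearly not what was intended

# Converting through a path object yields a proper file: URI instead, which
# is roughly what the Utils.resolveURI suggestion achieves on the Scala side.
print(PureWindowsPath("C:\\jars\\dummy.jar").as_uri())  # file:///C:/jars/dummy.jar
print(PurePosixPath("/opt/jars/dummy.jar").as_uri())    # file:///opt/jars/dummy.jar
```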
Github user HyukjinKwon commented on a diff in the pull request:

    https://github.com/apache/spark/pull/15666#discussion_r143017806

--- Diff: core/src/main/scala/org/apache/spark/SparkContext.scala ---
@@ -1845,6 +1859,21 @@ class SparkContext(config: SparkConf) extends Logging {
       logInfo(s"Added JAR $path at $key with timestamp $timestamp")
       postEnvironmentUpdate()
     }
+
+    if (addToCurrentClassLoader) {
+      val currentCL = Utils.getContextOrSparkClassLoader
+      currentCL match {
+        case cl: MutableURLClassLoader =>
+          val uri = if (path.contains("\\")) {
+            // For local paths with backslashes on Windows, URI throws an exception
+            new File(path).toURI
+          } else {
+            new URI(path)
+          }
+          cl.addURL(uri.toURL)
+        case _ => logWarning(s"Unsupported cl $currentCL will not update jars thread cl")
--- End diff --

I'd say `class loader` instead of `cl`.
Github user HyukjinKwon commented on a diff in the pull request:

    https://github.com/apache/spark/pull/15666#discussion_r14301

--- Diff: R/pkg/tests/fulltests/test_context.R ---
@@ -167,6 +167,18 @@ test_that("spark.lapply should perform simple transforms", {
   sparkR.session.stop()
 })

+test_that("add jar should work and allow usage of the jar on the driver node", {
+  sparkR.sparkContext()
+
+  destDir <- file.path(tempdir(), "testjar")
+  jarName <- callJStatic("org.apache.spark.TestUtils", "createDummyJar",
+    destDir, "sparkrTests", "DummyClassForAddJarTest")
--- End diff --

Ah, it looks like the problem is here. `createDummyJar` returns a URI string, for example:

```r
> normalizePath("file:/C:/a/b/c")
[1] "C:\\Users\\IEUser\\workspace\\spark\\file:\\C:\\a\\b\\c"
Warning message:
In normalizePath(path.expand(path), winslash, mustWork) :
  path[1]="file:/C:/a/b/c": The filename, directory name, or volume label syntax is incorrect
```

This ends up with a weird path like `"C:\\Users\\IEUser\\workspace\\spark\\file:\\C:\\a\\b\\c"`.
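The mangled value comes from feeding a URI string into a filesystem-path normalizer: since `file:/C:/a/b/c` is not an absolute path, the normalizer resolves it against the working directory and glues the two together. The same failure mode is easy to reproduce with Python's stdlib (illustrative only; the guard at the end is one plausible fix, not the PR's actual code):

```python
import os

uri = "file:/C:/a/b/c"   # a URI string, not a filesystem path

# abspath assumes its argument is a path; "file:" is not an absolute prefix,
# so the current working directory gets prepended -- the same shape as the
# broken R result C:\Users\...\workspace\spark\file:\C:\a\b\c.
mangled = os.path.abspath(uri)
print(mangled)  # e.g. /current/dir/file:/C:/a/b/c

# One plausible guard (hypothetical): skip normalization entirely when the
# input already looks like a URI rather than a plain local path.
is_uri = "://" in uri or uri.startswith(("file:", "local:", "hdfs:"))
print(is_uri)   # True -> leave the string untouched
```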
Github user mariusvniekerk commented on a diff in the pull request:

    https://github.com/apache/spark/pull/15666#discussion_r122578282

--- Diff: R/pkg/R/context.R ---
@@ -319,6 +319,34 @@ spark.addFile <- function(path, recursive = FALSE) {
   invisible(callJMethod(sc, "addFile", suppressWarnings(normalizePath(path)), recursive))
 }

+
+#' Adds a JAR dependency for all tasks to be executed on this SparkContext in the future.
--- End diff --

In that case do we want to bother having this method for R?

---
If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA.
Github user mariusvniekerk commented on a diff in the pull request:

    https://github.com/apache/spark/pull/15666#discussion_r122578275

--- Diff: R/pkg/R/context.R ---
@@ -319,6 +319,34 @@ spark.addFile <- function(path, recursive = FALSE) {
   invisible(callJMethod(sc, "addFile", suppressWarnings(normalizePath(path)), recursive))
 }

+
+#' Adds a JAR dependency for all tasks to be executed on this SparkContext in the future.
+#'
+#' The \code{path} passed can be either a local file, a file in HDFS (or other Hadoop-supported
+#' filesystems), an HTTP, HTTPS or FTP URI, or local:/path for a file on every worker node.
+#' If \code{addToCurrentClassLoader} is true, add the jar to the current threads' classloader. In
+#' general adding to the current threads' class loader will impact all other application threads
+#' unless they have explicitly changed their class loader.
+#'
+#' @rdname spark.addJar
+#' @param path The path of the jar to be added
+#' @param addToCurrentClassLoader Whether to add the jar to the current driver classloader.
+#' Default is FALSE.
+#' @export
+#' @examples
+#'\dontrun{
+#' spark.addJar("/path/to/something.jar", TRUE)
+#'}
+#' @note spark.addJar since 2.2.0
+spark.addJar <- function(path, addToCurrentClassLoader = FALSE) {
--- End diff --

Done
Github user felixcheung commented on a diff in the pull request:

    https://github.com/apache/spark/pull/15666#discussion_r106786961

--- Diff: R/pkg/R/context.R ---
@@ -319,6 +319,34 @@ spark.addFile <- function(path, recursive = FALSE) {
   invisible(callJMethod(sc, "addFile", suppressWarnings(normalizePath(path)), recursive))
 }

+spark.addJar <- function(path, addToCurrentClassLoader = FALSE) {
+  sc <- getSparkContext()
+  normalizedPath <- suppressWarnings(normalizePath(path))
+  scala_sc <- callJMethod(sc, "sc")
+  invisible(callJMethod(scala_sc, "addJar", normalizedPath, addToCurrentClassLoader))
--- End diff --

it doesn't handle the `file:\C:` prefix - where is that coming from?
Github user mariusvniekerk commented on a diff in the pull request:

    https://github.com/apache/spark/pull/15666#discussion_r106781948

--- Diff: R/pkg/R/context.R ---
@@ -319,6 +319,34 @@ spark.addFile <- function(path, recursive = FALSE) {
   invisible(callJMethod(sc, "addFile", suppressWarnings(normalizePath(path)), recursive))
 }

+spark.addJar <- function(path, addToCurrentClassLoader = FALSE) {
+  sc <- getSparkContext()
+  normalizedPath <- suppressWarnings(normalizePath(path))
+  scala_sc <- callJMethod(sc, "sc")
+  invisible(callJMethod(scala_sc, "addJar", normalizedPath, addToCurrentClassLoader))
--- End diff --

why is `normalizePath` doing that to the URL?
Github user HyukjinKwon commented on a diff in the pull request:

    https://github.com/apache/spark/pull/15666#discussion_r105839398

--- Diff: R/pkg/R/context.R ---
@@ -319,6 +319,34 @@ spark.addFile <- function(path, recursive = FALSE) {
   invisible(callJMethod(sc, "addFile", suppressWarnings(normalizePath(path)), recursive))
 }

+spark.addJar <- function(path, addToCurrentClassLoader = FALSE) {
+  sc <- getSparkContext()
+  normalizedPath <- suppressWarnings(normalizePath(path))
+  scala_sc <- callJMethod(sc, "sc")
+  invisible(callJMethod(scala_sc, "addJar", normalizedPath, addToCurrentClassLoader))
--- End diff --

(`C:\projects\spark\R\lib\SparkR\tests\testthat\` was indeed printed together in the logs.)
Github user HyukjinKwon commented on a diff in the pull request:

    https://github.com/apache/spark/pull/15666#discussion_r105723828

--- Diff: R/pkg/R/context.R ---
@@ -319,6 +319,34 @@ spark.addFile <- function(path, recursive = FALSE) {
   invisible(callJMethod(sc, "addFile", suppressWarnings(normalizePath(path)), recursive))
 }

+spark.addJar <- function(path, addToCurrentClassLoader = FALSE) {
+  sc <- getSparkContext()
+  normalizedPath <- suppressWarnings(normalizePath(path))
+  scala_sc <- callJMethod(sc, "sc")
+  invisible(callJMethod(scala_sc, "addJar", normalizedPath, addToCurrentClassLoader))
--- End diff --

Right after this call, the path `normalizedPath` was `C:\projects\spark\R\lib\SparkR\tests\testthat\file:\C:\Users\appveyor\AppData\Local\Temp\1\Rtmpg5tOYo\testjar\sparkrTests-DummyClassForAddJarTest-1489422615481.jar`.
Github user HyukjinKwon commented on a diff in the pull request:

    https://github.com/apache/spark/pull/15666#discussion_r105723482

--- Diff: R/pkg/inst/tests/testthat/test_context.R ---
@@ -167,6 +167,18 @@ test_that("spark.lapply should perform simple transforms", {
   sparkR.session.stop()
 })

+test_that("add jar should work and allow usage of the jar on the driver node", {
+  sparkR.sparkContext()
+
+  destDir <- paste0(tempdir(), "/", "testjar")
+  jarName <- callJStatic("org.apache.spark.TestUtils", "createDummyJar",
--- End diff --

I checked what it got right after the `createDummyJar` call. It was `createDummyJar:C:\Users\appveyor\AppData\Local\Temp\1\Rtmpg5tOYo/testjar`.
Github user mariusvniekerk commented on a diff in the pull request:

    https://github.com/apache/spark/pull/15666#discussion_r104908469

--- Diff: R/pkg/R/context.R ---
@@ -319,6 +319,34 @@ spark.addFile <- function(path, recursive = FALSE) {
   invisible(callJMethod(sc, "addFile", suppressWarnings(normalizePath(path)), recursive))
 }

+spark.addJar <- function(path, addToCurrentClassLoader = FALSE) {
--- End diff --

Mostly for backwards compatibility.
Github user felixcheung commented on a diff in the pull request:

    https://github.com/apache/spark/pull/15666#discussion_r104592654

--- Diff: R/pkg/R/context.R ---
@@ -319,6 +319,34 @@ spark.addFile <- function(path, recursive = FALSE) {
   invisible(callJMethod(sc, "addFile", suppressWarnings(normalizePath(path)), recursive))
 }

+spark.addJar <- function(path, addToCurrentClassLoader = FALSE) {
+  sc <- getSparkContext()
+  normalizedPath <- suppressWarnings(normalizePath(path))
+  scala_sc <- callJMethod(sc, "sc")
--- End diff --

I'd just combine 342 & 344 and call it sc:

```r
sc <- callJMethod(getSparkContext(), "sc")
```
Github user felixcheung commented on a diff in the pull request:

    https://github.com/apache/spark/pull/15666#discussion_r104593039

--- Diff: R/pkg/R/context.R ---
@@ -319,6 +319,34 @@ spark.addFile <- function(path, recursive = FALSE) {
   invisible(callJMethod(sc, "addFile", suppressWarnings(normalizePath(path)), recursive))
 }

+spark.addJar <- function(path, addToCurrentClassLoader = FALSE) {
--- End diff --

`spark.addJar` needs to be added to the NAMESPACE file, otherwise it won't be accessible. Tests run within the same namespace, so they're always able to access private methods.
Github user felixcheung commented on a diff in the pull request:

    https://github.com/apache/spark/pull/15666#discussion_r104592294

--- Diff: R/pkg/R/context.R ---
@@ -319,6 +319,34 @@ spark.addFile <- function(path, recursive = FALSE) {
   invisible(callJMethod(sc, "addFile", suppressWarnings(normalizePath(path)), recursive))
 }

+
+#' Adds a JAR dependency for all tasks to be executed on this SparkContext in the future.
--- End diff --

we don't really expose SparkContext in R actually
Github user felixcheung commented on a diff in the pull request:

    https://github.com/apache/spark/pull/15666#discussion_r104592782

--- Diff: R/pkg/R/context.R ---
@@ -319,6 +319,34 @@ spark.addFile <- function(path, recursive = FALSE) {
   invisible(callJMethod(sc, "addFile", suppressWarnings(normalizePath(path)), recursive))
 }

+#' @param addToCurrentClassLoader Whether to add the jar to the current driver classloader.
+#' Default is FALSE.
--- End diff --

remove the default value since it's already in the signature below
Github user felixcheung commented on a diff in the pull request: https://github.com/apache/spark/pull/15666#discussion_r104592268 --- Diff: R/pkg/R/context.R --- @@ -319,6 +319,34 @@ spark.addFile <- function(path, recursive = FALSE) { invisible(callJMethod(sc, "addFile", suppressWarnings(normalizePath(path)), recursive)) } + +#' Adds a JAR dependency for all tasks to be executed on this SparkContext in the future. +#' +#' The \code{path} passed can be either a local file, a file in HDFS (or other Hadoop-supported +#' filesystems), an HTTP, HTTPS or FTP URI, or local:/path for a file on every worker node. +#' If \code{addToCurrentClassLoader} is true, add the jar to the current threads' classloader. In +#' general adding to the current threads' class loader will impact all other application threads +#' unless they have explicitly changed their class loader. --- End diff -- should ``` add the jar to the current threads' classloader. In +#' general adding to the current threads' class loader will impact all other application threads +#' unless they have explicitly changed their class loader. ``` be reworded for R or Python? It should be the JVM. `threads` or `classloader` aren't familiar concepts to R users.
Github user brkyvz commented on a diff in the pull request: https://github.com/apache/spark/pull/15666#discussion_r104527246 --- Diff: core/src/main/scala/org/apache/spark/SparkContext.scala --- @@ -1800,17 +1800,39 @@ class SparkContext(config: SparkConf) extends Logging { * an HTTP, HTTPS or FTP URI, or local:/path for a file on every worker node. */ def addJar(path: String) { +addJar(path, false) + } + + /** + * Adds a JAR dependency for all tasks to be executed on this `SparkContext` in the future. + * @param path can be either a local file, a file in HDFS (or other Hadoop-supported filesystems), + * an HTTP, HTTPS or FTP URI, or local:/path for a file on every worker node. --- End diff -- nit: alignment ```scala @param path can be either... blah more blah ```
Github user brkyvz commented on a diff in the pull request: https://github.com/apache/spark/pull/15666#discussion_r104527000 --- Diff: core/src/main/scala/org/apache/spark/SparkContext.scala --- @@ -1800,17 +1800,39 @@ class SparkContext(config: SparkConf) extends Logging { * an HTTP, HTTPS or FTP URI, or local:/path for a file on every worker node. */ def addJar(path: String) { +addJar(path, false) + } + + /** + * Adds a JAR dependency for all tasks to be executed on this `SparkContext` in the future. + * @param path can be either a local file, a file in HDFS (or other Hadoop-supported filesystems), + * an HTTP, HTTPS or FTP URI, or local:/path for a file on every worker node. + * @param addToCurrentClassLoader if true will add the jar to the current threads' classloader. + * In general adding to the current threads' class loader will impact all other application + * threads unless they have explicitly changed their class loader. + */ + @DeveloperApi + def addJar(path: String, addToCurrentClassLoader: Boolean) { if (path == null) { logWarning("null specified as parameter to addJar") } else { var key = "" - if (path.contains("\\")) { -// For local paths with backslashes on Windows, URI throws an exception -key = env.rpcEnv.fileServer.addJar(new File(path)) + + val uri = if (path.contains("\\")) { +// If we have backslashes here this is a windows path, and URI will throws an exception, +// so we use this constrcutor instead --- End diff -- `constructor`
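[Editor's note] The backslash special-casing in the diff above can be illustrated outside the JVM as well. A minimal Python sketch (the path is hypothetical) of why a raw Windows path is not a valid URI and how going through the filesystem API, the analogue of `new File(path).toURI` in the diff, sidesteps it:

```python
from pathlib import PureWindowsPath
from urllib.parse import urlparse

# A raw Windows path: the backslashes (and drive colon) make it an
# invalid URI, which is why SparkContext.addJar special-cases it.
raw = r"C:\Users\appveyor\jars\dummy.jar"

# Converting through the path API yields a well-formed file: URI with
# forward slashes, safe to hand to URI-consuming code.
uri = PureWindowsPath(raw).as_uri()
print(uri)  # file:///C:/Users/appveyor/jars/dummy.jar
assert urlparse(uri).scheme == "file"
```

The same idea underlies the Scala fix: detect the backslash form early, convert it to a proper URI once, and use that URI everywhere downstream.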
Github user brkyvz commented on a diff in the pull request: https://github.com/apache/spark/pull/15666#discussion_r104523062 --- Diff: R/pkg/inst/tests/testthat/test_context.R --- @@ -167,6 +167,18 @@ test_that("spark.lapply should perform simple transforms", { sparkR.session.stop() }) +test_that("add jar should work and allow usage of the jar on the driver node", { + sparkR.sparkContext() + + destDir <- paste0(tempdir(), "/", "testjar") --- End diff -- you may want to use https://stat.ethz.ch/R-manual/R-devel/library/base/html/file.path.html instead for Windows compatibility
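[Editor's note] The portability concern behind the `file.path` suggestion applies equally to Python test code; a small sketch of why joining with `os.path.join` (the analogue of R's `file.path`) beats concatenating with a hard-coded "/":

```python
import os
import tempfile

# String concatenation hard-codes "/" as the separator; os.path.join
# uses the platform separator (os.sep), which matters on Windows.
dest_concat = tempfile.gettempdir() + "/" + "testjar"
dest_joined = os.path.join(tempfile.gettempdir(), "testjar")

# Both name the same directory on POSIX; only the joined form uses
# backslashes on Windows.
print(dest_joined.endswith("testjar"))  # True
```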
Github user brkyvz commented on a diff in the pull request: https://github.com/apache/spark/pull/15666#discussion_r104522704 --- Diff: R/pkg/R/context.R --- @@ -319,6 +319,34 @@ spark.addFile <- function(path, recursive = FALSE) { invisible(callJMethod(sc, "addFile", suppressWarnings(normalizePath(path)), recursive)) } + +#' Adds a JAR dependency for all tasks to be executed on this SparkContext in the future. +#' +#' The \code{path} passed can be either a local file, a file in HDFS (or other Hadoop-supported +#' filesystems), an HTTP, HTTPS or FTP URI, or local:/path for a file on every worker node. +#' If \code{addToCurrentClassLoader} is true, add the jar to the current threads' classloader. In +#' general adding to the current threads' class loader will impact all other application threads +#' unless they have explicitly changed their class loader. +#' +#' @rdname spark.addJar +#' @param path The path of the jar to be added +#' @param addToCurrentClassLoader Whether to add the jar to the current driver classloader. +#' Default is FALSE. +#' @export +#' @examples +#'\dontrun{
#' spark.addJar("/path/to/something.jar", TRUE) +#'} +#' @note spark.addJar since 2.2.0 +spark.addJar <- function(path, addToCurrentClassLoader = FALSE) { --- End diff -- why don't we want to add it to the driver classpath by default?
Github user mariusvniekerk commented on a diff in the pull request: https://github.com/apache/spark/pull/15666#discussion_r104486665 --- Diff: R/pkg/inst/tests/testthat/test_context.R --- @@ -167,6 +167,18 @@ test_that("spark.lapply should perform simple transforms", { sparkR.session.stop() }) +test_that("add jar should work and allow usage of the jar on the driver node", { + sparkR.sparkContext() + + destDir <- paste0(tempdir(), "/", "testjar") + jarName <- callJStatic("org.apache.spark.TestUtils", "createDummyJar", + destDir, "sparkrTests", "DummyClassForAddJarTest") + + spark.addJar(jarName, addToCurrentClassLoader = TRUE) + testClass <- newJObject("sparkrTests.DummyClassForAddJarTest") --- End diff -- Yeah, I suspect that the Windows path didn't make it properly into the classloader
Github user HyukjinKwon commented on a diff in the pull request: https://github.com/apache/spark/pull/15666#discussion_r104433603 --- Diff: core/src/main/scala/org/apache/spark/SparkContext.scala --- @@ -1842,6 +1864,14 @@ class SparkContext(config: SparkConf) extends Logging { logInfo(s"Added JAR $path at $key with timestamp $timestamp") postEnvironmentUpdate() } +if (addToCurrentClassLoader) { + val currentCL = Utils.getContextOrSparkClassLoader + currentCL match { +case cl: MutableURLClassLoader => + cl.addURL(uri.toURL) --- End diff -- (We could make this inlined too..)
Github user HyukjinKwon commented on a diff in the pull request: https://github.com/apache/spark/pull/15666#discussion_r104432976 --- Diff: core/src/main/scala/org/apache/spark/TestUtils.scala --- @@ -168,6 +168,27 @@ private[spark] object TestUtils { createCompiledClass(className, destDir, sourceFile, classpathUrls) } + /** Create a dummy compile jar for a given package, classname. Jar will be placed in destDir */ + def createDummyJar(destDir: String, packageName: String, className: String): String = { +val srcDir = new File(destDir, packageName) +srcDir.mkdirs() +val excSource = new JavaSourceFromString(new File(srcDir, className).toURI.getPath, + s"""package $packageName; + | + |public class $className implements java.io.Serializable { + | public static String helloWorld(String arg) { return "Hello " + arg; } + | public static int addStuff(int arg1, int arg2) { return arg1 + arg2; } + |} +""". +stripMargin) --- End diff -- (We could make this inlined.)
Github user HyukjinKwon commented on a diff in the pull request: https://github.com/apache/spark/pull/15666#discussion_r104433213 --- Diff: core/src/test/scala/org/apache/spark/SparkContextSuite.scala --- @@ -308,6 +308,36 @@ class SparkContextSuite extends SparkFunSuite with LocalSparkContext with Eventu sc.listJars().head should include (tmpJar.getName) } + for ( +schedulingMode <- Seq("local_mode", "non_local_mode") + ) { --- End diff -- Just personal taste.. maybe ```scala Seq("local_mode", "non_local_mode").foreach { schedulingMode => ... } ``` ?
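[Editor's note] The refactor suggested above, iterating over scheduling modes instead of wrapping tests in a for-comprehension, has a direct analogue in Python test suites; a sketch using `unittest.TestCase.subTest` (the suite name and the trivial assertion are illustrative, not from the PR):

```python
import unittest

class AddJarSuite(unittest.TestCase):
    def test_add_jar_across_modes(self):
        # One parametrized body instead of duplicated test methods,
        # mirroring Seq(...).foreach { schedulingMode => ... } in Scala.
        for mode in ["local_mode", "non_local_mode"]:
            with self.subTest(mode=mode):
                # Placeholder check; a real suite would start a context
                # in `mode` and exercise addJar here.
                self.assertTrue(mode.endswith("mode"))
```

`subTest` keeps the failures labeled per mode, just as the `foreach`-generated test names do in ScalaTest.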
Github user HyukjinKwon commented on a diff in the pull request: https://github.com/apache/spark/pull/15666#discussion_r104432962 --- Diff: core/src/main/scala/org/apache/spark/TestUtils.scala --- @@ -168,6 +168,27 @@ private[spark] object TestUtils { createCompiledClass(className, destDir, sourceFile, classpathUrls) } + /** Create a dummy compile jar for a given package, classname. Jar will be placed in destDir */ + def createDummyJar(destDir: String, packageName: String, className: String): String = { +val srcDir = new File(destDir, packageName) +srcDir.mkdirs() +val excSource = new JavaSourceFromString(new File(srcDir, className).toURI.getPath, + s"""package $packageName; + | + |public class $className implements java.io.Serializable { --- End diff -- (This indentation seems inconsistent BTW)
Github user HyukjinKwon commented on a diff in the pull request: https://github.com/apache/spark/pull/15666#discussion_r104431249 --- Diff: R/pkg/inst/tests/testthat/test_context.R --- @@ -167,6 +167,18 @@ test_that("spark.lapply should perform simple transforms", { sparkR.session.stop() }) +test_that("add jar should work and allow usage of the jar on the driver node", { + sparkR.sparkContext() + + destDir <- paste0(tempdir(), "/", "testjar") + jarName <- callJStatic("org.apache.spark.TestUtils", "createDummyJar", + destDir, "sparkrTests", "DummyClassForAddJarTest") + + spark.addJar(jarName, addToCurrentClassLoader = TRUE) + testClass <- newJObject("sparkrTests.DummyClassForAddJarTest") --- End diff -- Maybe, helpful log: ``` ERROR RBackendHandler: on sparkrTests.DummyClassForAddJarTest failed java.lang.ClassNotFoundException: sparkrTests.DummyClassForAddJarTest at java.net.URLClassLoader.findClass(URLClassLoader.java:381) at java.lang.ClassLoader.loadClass(ClassLoader.java:424) at java.lang.ClassLoader.loadClass(ClassLoader.java:357) at java.lang.Class.forName0(Native Method) at java.lang.Class.forName(Class.java:348) at org.apache.spark.util.Utils$.classForName(Utils.scala:230) at org.apache.spark.api.r.RBackendHandler.handleMethodCall(RBackendHandler.scala:143) at org.apache.spark.api.r.RBackendHandler.channelRead0(RBackendHandler.scala:108) at org.apache.spark.api.r.RBackendHandler.channelRead0(RBackendHandler.scala:40) at io.netty.channel.SimpleChannelInboundHandler.channelRead(SimpleChannelInboundHandler.java:105) at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:357) at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:343) at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:336) at io.netty.handler.timeout.IdleStateHandler.channelRead(IdleStateHandler.java:287) at 
io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:357) at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:343) at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:336) at io.netty.handler.codec.MessageToMessageDecoder.channelRead(MessageToMessageDecoder.java:102) at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:357) at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:343) at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:336) at io.netty.handler.codec.ByteToMessageDecoder.fireChannelRead(ByteToMessageDecoder.java:293) at io.netty.handler.codec.ByteToMessageDecoder.channelRead(ByteToMessageDecoder.java:267) at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:357) at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:343) at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:336) at io.netty.channel.DefaultChannelPipeline$HeadContext.channelRead(DefaultChannelPipeline.java:1294) at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:357) at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:343) at io.netty.channel.DefaultChannelPipeline.fireChannelRead(DefaultChannelPipeline.java:911) at io.netty.channel.nio.AbstractNioByteChannel$NioByteUnsafe.read(AbstractNioByteChannel.java:131) at io.netty.channel.nio.NioEventLoop.processSelectedKey(NioEventLoop.java:643) at io.netty.channel.nio.NioEventLoop.processSelectedKeysOptimized(NioEventLoop.java:566) at io.netty.channel.nio.NioEventLoop.processSelectedKeys(NioEventLoop.java:480) 
at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:442) at io.netty.util.concurrent.SingleThreadEventExecutor$2.run(SingleThreadEventExecutor.java:131) at io.netty.util.concurrent.DefaultThreadFactory$DefaultRunnableDecorator.run(DefaultThreadFactory.java:144) at java.lang.Thread.run(Thread.java:745) ```
Github user HyukjinKwon commented on a diff in the pull request: https://github.com/apache/spark/pull/15666#discussion_r104429730 --- Diff: R/pkg/inst/tests/testthat/test_context.R --- @@ -167,6 +167,18 @@ test_that("spark.lapply should perform simple transforms", { sparkR.session.stop() }) +test_that("add jar should work and allow usage of the jar on the driver node", { + sparkR.sparkContext() + + destDir <- paste0(tempdir(), "/", "testjar") + jarName <- callJStatic("org.apache.spark.TestUtils", "createDummyJar", + destDir, "sparkrTests", "DummyClassForAddJarTest") + + spark.addJar(jarName, addToCurrentClassLoader = TRUE) + testClass <- newJObject("sparkrTests.DummyClassForAddJarTest") --- End diff -- Hm.. it seems this line this time. ``` 1. Error: add jar should work and allow usage of the jar on the driver node (@test_context.R#178) 1: newJObject("sparkrTests.DummyClassForAddJarTest") at C:/projects/spark/R/lib/SparkR/tests/testthat/test_context.R:178 2: invokeJava(isStatic = TRUE, className, methodName = "", ...) 3: handleErrors(returnStatus, conn) 4: stop(readString(conn)) ```
Github user HyukjinKwon commented on a diff in the pull request: https://github.com/apache/spark/pull/15666#discussion_r104408350 --- Diff: core/src/main/scala/org/apache/spark/TestUtils.scala --- @@ -168,6 +168,27 @@ private[spark] object TestUtils { createCompiledClass(className, destDir, sourceFile, classpathUrls) } + /** Create a dummy compile jar for a given package, classname. Jar will be placed in destDir */ + def createDummyJar(destDir: String, packageName: String, className: String): String = { +val srcDir = new File(destDir, packageName) +srcDir.mkdirs() +val excSource = new JavaSourceFromString(new File(srcDir, className).getAbsolutePath, --- End diff -- It's fine to push a commit and use AppVeyor IMHO. There is a guide for this - https://github.com/apache/spark/blob/master/R/WINDOWS.md if you are willing to manually test, but it is kind of grunt work. In my case, I use my personal AppVeyor account. Otherwise, I can send a PR for your branch if you are happy with waiting till this weekend :).
Github user HyukjinKwon commented on a diff in the pull request: https://github.com/apache/spark/pull/15666#discussion_r104360211 --- Diff: R/pkg/inst/tests/testthat/test_context.R --- @@ -167,6 +167,18 @@ test_that("spark.lapply should perform simple transforms", { sparkR.session.stop() }) +test_that("add jar should work and allow usage of the jar on the driver node", { --- End diff -- This seems failed as below: ``` 1. Error: add jar should work and allow usage of the jar on the driver node (@test_context.R#174) java.lang.IllegalArgumentException: Illegal character in path at index 12: string:///C:\Users\appveyor\AppData\Local\Temp\1\RtmpCqEUJL\testjar\sparkrTests\DummyClassForAddJarTest.java at java.net.URI.create(URI.java:852) at org.apache.spark.TestUtils$.org$apache$spark$TestUtils$$createURI(TestUtils.scala:119) at org.apache.spark.TestUtils$JavaSourceFromString.(TestUtils.scala:123) ``` (Sorry, I left the comments in a wrong line)
Github user HyukjinKwon commented on a diff in the pull request: https://github.com/apache/spark/pull/15666#discussion_r104358500 --- Diff: core/src/main/scala/org/apache/spark/TestUtils.scala --- @@ -168,6 +168,27 @@ private[spark] object TestUtils { createCompiledClass(className, destDir, sourceFile, classpathUrls) } + /** Create a dummy compile jar for a given package, classname. Jar will be placed in destDir */ + def createDummyJar(destDir: String, packageName: String, className: String): String = { +val srcDir = new File(destDir, packageName) +srcDir.mkdirs() +val excSource = new JavaSourceFromString(new File(srcDir, className).getAbsolutePath, --- End diff -- Can we use URI form here? it seems this one is problematic.
Github user HyukjinKwon commented on a diff in the pull request: https://github.com/apache/spark/pull/15666#discussion_r104357841 --- Diff: R/pkg/R/context.R --- @@ -319,6 +319,34 @@ spark.addFile <- function(path, recursive = FALSE) { invisible(callJMethod(sc, "addFile", suppressWarnings(normalizePath(path)), recursive)) } + +#' Adds a JAR dependency for all tasks to be executed on this SparkContext in the future. +#' +#' The \code{path} passed can be either a local file, a file in HDFS (or other Hadoop-supported +#' filesystems), an HTTP, HTTPS or FTP URI, or local:/path for a file on every worker node. +#' If \code{addToCurrentClassLoader} is true, add the jar to the current threads' classloader. In +#' general adding to the current threads' class loader will impact all other application threads +#' unless they have explicitly changed their class loader. +#' +#' @rdname spark.addJar +#' @param path The path of the jar to be added +#' @param addToCurrentClassLoader Whether to add the jar to the current driver classloader. +#' Default is FALSE. +#' @export +#' @examples +#'\dontrun{ +#' spark.addJar("/path/to/something.jar", TRUE) +#'} +#' @note spark.addJar since 2.2.0 +spark.addJar <- function(path, addToCurrentClassLoader = FALSE) { + sc <- getSparkContext() + normalizedPath <- suppressWarnings(normalizePath(path)) + scala_sc <- callJMethod(sc, "sc") + invisible(callJMethod(scala_sc, "addJar", normalizedPath, addToCurrentClassLoader)) --- End diff -- This seems failed as below: ``` 1. 
Error: add jar should work and allow usage of the jar on the driver node (@test_context.R#174) java.lang.IllegalArgumentException: Illegal character in path at index 12: string:///C:\Users\appveyor\AppData\Local\Temp\1\RtmpCqEUJL\testjar\sparkrTests\DummyClassForAddJarTest.java at java.net.URI.create(URI.java:852) at org.apache.spark.TestUtils$.org$apache$spark$TestUtils$$createURI(TestUtils.scala:119) at org.apache.spark.TestUtils$JavaSourceFromString.(TestUtils.scala:123) ```
Github user HyukjinKwon commented on a diff in the pull request: https://github.com/apache/spark/pull/15666#discussion_r104358056 --- Diff: core/src/main/scala/org/apache/spark/TestUtils.scala --- @@ -168,6 +168,27 @@ private[spark] object TestUtils { createCompiledClass(className, destDir, sourceFile, classpathUrls) } + /** Create a dummy compile jar for a given package, classname. Jar will be placed in destDir */ + def createDummyJar(destDir: String, packageName: String, className: String): String = { +val srcDir = new File(destDir, packageName) +srcDir.mkdirs() +val excSource = new JavaSourceFromString(new File(srcDir, className).getAbsolutePath, + s"""package $packageName; + | + |public class $className implements java.io.Serializable { + | public static String helloWorld(String arg) { return "Hello " + arg; } + | public static int addStuff(int arg1, int arg2) { return arg1 + arg2; } + |} +""". +stripMargin) +val excFile = createCompiledClass(className, srcDir, excSource, Seq.empty) --- End diff -- Can we use URI form here? it seems this one is problematic.
Github user mariusvniekerk commented on a diff in the pull request: https://github.com/apache/spark/pull/15666#discussion_r100177099 --- Diff: core/src/main/scala/org/apache/spark/SparkContext.scala --- @@ -1802,19 +1802,34 @@ class SparkContext(config: SparkConf) extends Logging { * Adds a JAR dependency for all tasks to be executed on this `SparkContext` in the future. * @param path can be either a local file, a file in HDFS (or other Hadoop-supported filesystems), * an HTTP, HTTPS or FTP URI, or local:/path for a file on every worker node. + * If addToCurrentClassLoader is true, attempt to add the new class to the current threads' class --- End diff -- Add to the doc that adding a URL already present in the class loader has no effect.
Github user holdenk commented on a diff in the pull request: https://github.com/apache/spark/pull/15666#discussion_r100176735 --- Diff: python/pyspark/context.py --- @@ -836,6 +836,17 @@ def addPyFile(self, path): import importlib importlib.invalidate_caches() +def addJar(self, path, addToCurrentClassLoader=False): +""" +Adds a JAR dependency for all tasks to be executed on this SparkContext in the future. +The `path` passed can be either a local file, a file in HDFS (or other Hadoop-supported +filesystems), an HTTP, HTTPS or FTP URI, or local:/path for a file on every worker node. +If addToCurrentClassLoader is true, attempt to add the new class to the current threads' +class loader. In general adding to the current threads' class loader will impact all other +application threads unless they have explicitly changed their class loader. --- End diff -- Add a `:param:` annotation here as well for path & addToCurrentClassLoader
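[Editor's note] A hedged sketch of what the requested docstring could look like; the `:param:` field wording is illustrative, not the merged text, and the method body is elided:

```python
def addJar(self, path, addToCurrentClassLoader=False):
    """
    Adds a JAR dependency for all tasks to be executed on this
    SparkContext in the future.

    :param path: can be either a local file, a file in HDFS (or other
        Hadoop-supported filesystems), an HTTP, HTTPS or FTP URI, or
        local:/path for a file on every worker node.
    :param addToCurrentClassLoader: if True, also add the jar to the
        current thread's class loader; in general this affects all
        other application threads unless they have explicitly changed
        their class loader.
    """
    # Body elided; the real method delegates to the JVM SparkContext.
```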
Github user mariusvniekerk commented on a diff in the pull request: https://github.com/apache/spark/pull/15666#discussion_r100176188 --- Diff: core/src/main/scala/org/apache/spark/SparkContext.scala --- @@ -1802,19 +1802,34 @@ class SparkContext(config: SparkConf) extends Logging { * Adds a JAR dependency for all tasks to be executed on this `SparkContext` in the future. * @param path can be either a local file, a file in HDFS (or other Hadoop-supported filesystems), * an HTTP, HTTPS or FTP URI, or local:/path for a file on every worker node. + * If addToCurrentClassLoader is true, attempt to add the new class to the current threads' class + * loader. In general adding to the current threads' class loader will impact all other + * application threads unless they have explicitly changed their class loader. */ def addJar(path: String) { +addJar(path, false) + } + + def addJar(path: String, addToCurrentClassLoader: Boolean) { if (path == null) { logWarning("null specified as parameter to addJar") } else { var key = "" - if (path.contains("\\")) { + + val uri = if (path.contains("\\")) { // For local paths with backslashes on Windows, URI throws an exception -key = env.rpcEnv.fileServer.addJar(new File(path)) +new File(path).toURI } else { val uri = new URI(path) // SPARK-17650: Make sure this is a valid URL before adding it to the list of dependencies Utils.validateURL(uri) +uri + } + + if (path.contains("\\")) { +// For local paths with backslashes on Windows, URI throws an exception +key = env.rpcEnv.fileServer.addJar(new File(uri)) --- End diff -- If we have backslashes, we are in a local path on Windows.
[GitHub] spark pull request #15666: [SPARK-11421] [Core][Python][R] Added ability for...
Github user holdenk commented on a diff in the pull request: https://github.com/apache/spark/pull/15666#discussion_r98948530 --- Diff: core/src/main/scala/org/apache/spark/SparkContext.scala --- @@ -1700,19 +1700,34 @@ class SparkContext(config: SparkConf) extends Logging { * Adds a JAR dependency for all tasks to be executed on this SparkContext in the future. * The `path` passed can be either a local file, a file in HDFS (or other Hadoop-supported * filesystems), an HTTP, HTTPS or FTP URI, or local:/path for a file on every worker node. + * If addToCurrentClassLoader is true, attempt to add the new class to the current threads' class + * loader. In general adding to the current threads' class loader will impact all other + * application threads unless they have explicitly changed their class loader. */ def addJar(path: String) { +addJar(path, false) + } + + def addJar(path: String, addToCurrentClassLoader: Boolean) { --- End diff -- What if we made the Scala API a Developer API, since we might end up wanting to change the Scala side in the future?
[GitHub] spark pull request #15666: [SPARK-11421] [Core][Python][R] Added ability for...
Github user holdenk commented on a diff in the pull request: https://github.com/apache/spark/pull/15666#discussion_r98949834 --- Diff: core/src/main/scala/org/apache/spark/SparkContext.scala --- @@ -1700,19 +1700,34 @@ class SparkContext(config: SparkConf) extends Logging { * Adds a JAR dependency for all tasks to be executed on this SparkContext in the future. * The `path` passed can be either a local file, a file in HDFS (or other Hadoop-supported * filesystems), an HTTP, HTTPS or FTP URI, or local:/path for a file on every worker node. + * If addToCurrentClassLoader is true, attempt to add the new class to the current threads' class + * loader. In general adding to the current threads' class loader will impact all other + * application threads unless they have explicitly changed their class loader. */ def addJar(path: String) { +addJar(path, false) + } + + def addJar(path: String, addToCurrentClassLoader: Boolean) { if (path == null) { logWarning("null specified as parameter to addJar") } else { var key = "" - if (path.contains("\\")) { + + val uri = if (path.contains("\\")) { // For local paths with backslashes on Windows, URI throws an exception -key = env.rpcEnv.fileServer.addJar(new File(path)) --- End diff -- So if we construct a File and then go with toURI, it's safe? Have we tested this on Windows?
[GitHub] spark pull request #15666: [SPARK-11421] [Core][Python][R] Added ability for...
Github user holdenk commented on a diff in the pull request: https://github.com/apache/spark/pull/15666#discussion_r98950178 --- Diff: core/src/main/scala/org/apache/spark/SparkContext.scala --- @@ -1802,19 +1802,34 @@ class SparkContext(config: SparkConf) extends Logging { * Adds a JAR dependency for all tasks to be executed on this `SparkContext` in the future. * @param path can be either a local file, a file in HDFS (or other Hadoop-supported filesystems), * an HTTP, HTTPS or FTP URI, or local:/path for a file on every worker node. + * If addToCurrentClassLoader is true, attempt to add the new class to the current threads' class + * loader. In general adding to the current threads' class loader will impact all other + * application threads unless they have explicitly changed their class loader. */ def addJar(path: String) { +addJar(path, false) + } + + def addJar(path: String, addToCurrentClassLoader: Boolean) { if (path == null) { logWarning("null specified as parameter to addJar") } else { var key = "" - if (path.contains("\\")) { + + val uri = if (path.contains("\\")) { // For local paths with backslashes on Windows, URI throws an exception -key = env.rpcEnv.fileServer.addJar(new File(path)) +new File(path).toURI } else { val uri = new URI(path) // SPARK-17650: Make sure this is a valid URL before adding it to the list of dependencies Utils.validateURL(uri) +uri + } + + if (path.contains("\\")) { +// For local paths with backslashes on Windows, URI throws an exception +key = env.rpcEnv.fileServer.addJar(new File(uri)) --- End diff -- So I'm a bit confused: we already supposedly have a safe URI object for Windows above. Does it throw an exception on getPath? Or is it that \\ is always local and getScheme doesn't return that?
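The branching the reviewers are debating can be sketched in Python. This is a hypothetical rendering of the patch's logic, not Spark's code: paths containing backslashes are treated as local Windows paths and converted to a `file:` URI through the filesystem API (since parsing them as a URI directly would fail), while everything else is parsed as a URI.

```python
from pathlib import PureWindowsPath
from urllib.parse import urlparse

def jar_uri(path):
    """Return a URI string for `path`, special-casing Windows local paths.
    Hypothetical sketch of the branching in the patch under review."""
    if "\\" in path:
        # Local Windows path: URI parsing would reject the backslashes,
        # so build a file: URI via the filesystem API instead (the Scala
        # patch does the analogous new File(path).toURI).
        return PureWindowsPath(path).as_uri()
    parsed = urlparse(path)
    # Everything else is treated as an already-valid URI (or plain local
    # path); a real implementation would validate it here (SPARK-17650).
    if parsed.scheme not in ("", "file", "local", "http", "https", "ftp", "hdfs"):
        raise ValueError("unsupported scheme: %s" % parsed.scheme)
    return path

assert jar_uri(r"C:\libs\dep.jar") == "file:///C:/libs/dep.jar"
assert jar_uri("https://repo.example.org/dep.jar") == "https://repo.example.org/dep.jar"
```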
[GitHub] spark pull request #15666: [SPARK-11421] [Core][Python][R] Added ability for...
Github user holdenk commented on a diff in the pull request: https://github.com/apache/spark/pull/15666#discussion_r98948623 --- Diff: core/src/main/scala/org/apache/spark/SparkContext.scala --- @@ -1802,19 +1802,34 @@ class SparkContext(config: SparkConf) extends Logging { * Adds a JAR dependency for all tasks to be executed on this `SparkContext` in the future. * @param path can be either a local file, a file in HDFS (or other Hadoop-supported filesystems), * an HTTP, HTTPS or FTP URI, or local:/path for a file on every worker node. + * If addToCurrentClassLoader is true, attempt to add the new class to the current threads' class --- End diff -- `attempt to` is interesting wording here - when would we expect this to fail, and should we explicitly call that out?
[GitHub] spark pull request #15666: [SPARK-11421] [Core][Python][R] Added ability for...
Github user mariusvniekerk commented on a diff in the pull request: https://github.com/apache/spark/pull/15666#discussion_r93730314 --- Diff: core/src/main/scala/org/apache/spark/TestUtils.scala --- @@ -164,6 +164,27 @@ private[spark] object TestUtils { createCompiledClass(className, destDir, sourceFile, classpathUrls) } + /** Create a dummy compile jar for a given package, classname. Jar will be placed in destDir */ + def createDummyJar(destDir: String, packageName: String, className: String): String = { --- End diff -- The R tests do indeed verify that they can call the internal functions. I can revert that part of the changes.
[GitHub] spark pull request #15666: [SPARK-11421] [Core][Python][R] Added ability for...
Github user mariusvniekerk commented on a diff in the pull request: https://github.com/apache/spark/pull/15666#discussion_r93729928 --- Diff: core/src/main/scala/org/apache/spark/TestUtils.scala --- @@ -164,6 +164,27 @@ private[spark] object TestUtils { createCompiledClass(className, destDir, sourceFile, classpathUrls) } + /** Create a dummy compile jar for a given package, classname. Jar will be placed in destDir */ + def createDummyJar(destDir: String, packageName: String, className: String): String = { --- End diff -- Yeah, when I wrote this that didn't exist yet. Changing.
[GitHub] spark pull request #15666: [SPARK-11421] [Core][Python][R] Added ability for...
Github user holdenk commented on a diff in the pull request: https://github.com/apache/spark/pull/15666#discussion_r93338899 --- Diff: core/src/main/scala/org/apache/spark/TestUtils.scala --- @@ -164,6 +164,27 @@ private[spark] object TestUtils { createCompiledClass(className, destDir, sourceFile, classpathUrls) } + /** Create a dummy compile jar for a given package, classname. Jar will be placed in destDir */ + def createDummyJar(destDir: String, packageName: String, className: String): String = { --- End diff -- I know this is moved code, but could we just use `createJarWithClasses` or do the R tests require these inner functions? If we do keep this though I'd give it a clearer name and description because without reading the code, `createDummyJar` and `createJarWithClasses` don't seem all that different.
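A rough Python analogue of what a `createDummyJar`-style test helper produces: a jar (which is just a zip) in a destination directory containing an entry for `package.ClassName`. This sketch is hypothetical and writes only placeholder bytes where the real helper compiles actual Java source into a class file first.

```python
import os
import tempfile
import zipfile

def create_dummy_jar(dest_dir, package_name, class_name):
    """Write a jar containing a placeholder entry for package.ClassName
    and return its path. Hypothetical sketch of a test helper."""
    jar_path = os.path.join(dest_dir, class_name + ".jar")
    entry = package_name.replace(".", "/") + "/" + class_name + ".class"
    with zipfile.ZipFile(jar_path, "w") as jar:
        jar.writestr("META-INF/MANIFEST.MF", "Manifest-Version: 1.0\n")
        # Placeholder bytes (the class-file magic number only); the real
        # helper compiles Java source into a genuine class file.
        jar.writestr(entry, b"\xca\xfe\xba\xbe")
    return jar_path

with tempfile.TemporaryDirectory() as d:
    jar = create_dummy_jar(d, "pysparktests", "DummyClass")
    with zipfile.ZipFile(jar) as z:
        assert "pysparktests/DummyClass.class" in z.namelist()
```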
[GitHub] spark pull request #15666: [SPARK-11421] [Core][Python][R] Added ability for...
Github user mariusvniekerk closed the pull request at: https://github.com/apache/spark/pull/15666
[GitHub] spark pull request #15666: [SPARK-11421] [Core][Python][R] Added ability for...
GitHub user mariusvniekerk reopened a pull request: https://github.com/apache/spark/pull/15666 [SPARK-11421] [Core][Python][R] Added ability for addJar to augment the current classloader ## What changes were proposed in this pull request? Adds a flag to sc.addJar to add the jar to the current classloader ## How was this patch tested? Unit tests, manual tests This is a continuation of the pull request in https://github.com/apache/spark/pull/9313 and is mostly a rebase of that moved to master with SparkR additions. cc @holdenk You can merge this pull request into a Git repository by running: $ git pull https://github.com/mariusvniekerk/spark SPARK-11421 Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/15666.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #15666
commit 6fb5d66e7669ebe0e8a515e02b1276e1bab652a2 Author: Marius van Niekerk Date: 2016-10-28T00:26:17Z Squashed content from pull request #9313
commit 6a6e98a0fcc7f388009f36b8a31664bda2ccf5d9 Author: Marius van Niekerk Date: 2016-10-28T00:26:29Z Remove _loadClass method since we dont need it anymore under py4j 0.10
commit 2b1e98e50feb7180b94f7b9e304634566f163718 Author: Marius van Niekerk Date: 2016-10-28T00:26:36Z Expose addJar to sparkR as well
commit 7f37d3a060d574bd6c38539ec896fbc4c94060f3 Author: mariusvniekerk Date: 2016-10-29T13:24:40Z Style fixes
commit 9d838b35b53b1e4fdcf39721b4f638ead9e40fcd Author: Marius van Niekerk Date: 2016-10-29T20:15:32Z Adjust test suite to test add jar in scala as well.
commit d4416d92610affd363701fd08dc53eb720566130 Author: Marius van Niekerk Date: 2016-10-29T21:19:16Z Fixed scala test not working due to incorrect classloader being used.
commit fccb141dd9e6d36db242997f1c6f3e007caa514f Author: Marius van Niekerk Date: 2016-10-30T00:15:27Z Fixed typo with test.
commit 26b39de51f9a76b121ebcb70079072dfcc9972bd Author: Marius van Niekerk Date: 2016-11-01T01:46:07Z Fixed documentation.
[GitHub] spark pull request #15666: [SPARK-11421] [Core][Python][R] Added ability for...
Github user mariusvniekerk commented on a diff in the pull request: https://github.com/apache/spark/pull/15666#discussion_r85865112 --- Diff: core/src/main/scala/org/apache/spark/SparkContext.scala --- @@ -1700,19 +1700,34 @@ class SparkContext(config: SparkConf) extends Logging { * Adds a JAR dependency for all tasks to be executed on this SparkContext in the future. * The `path` passed can be either a local file, a file in HDFS (or other Hadoop-supported * filesystems), an HTTP, HTTPS or FTP URI, or local:/path for a file on every worker node. + * If addToCurrentClassLoader is true, attempt to add the new class to the current threads' class + * loader. In general adding to the current threads' class loader will impact all other + * application threads unless they have explicitly changed their class loader. */ def addJar(path: String) { +addJar(path, false) + } + + def addJar(path: String, addToCurrentClassLoader: Boolean) { if (path == null) { logWarning("null specified as parameter to addJar") } else { var key = "" - if (path.contains("\\")) { + + val uri = if (path.contains("\\")) { // For local paths with backslashes on Windows, URI throws an exception -key = env.rpcEnv.fileServer.addJar(new File(path)) --- End diff -- So this change builds the URI for the Windows path, which is used later on to construct a File instance. That should allow the Windows special case to work.
[GitHub] spark pull request #15666: [SPARK-11421] [Core][Python][R] Added ability for...
Github user mariusvniekerk commented on a diff in the pull request: https://github.com/apache/spark/pull/15666#discussion_r85833766 --- Diff: core/src/main/scala/org/apache/spark/SparkContext.scala --- @@ -1700,19 +1700,34 @@ class SparkContext(config: SparkConf) extends Logging { * Adds a JAR dependency for all tasks to be executed on this SparkContext in the future. * The `path` passed can be either a local file, a file in HDFS (or other Hadoop-supported * filesystems), an HTTP, HTTPS or FTP URI, or local:/path for a file on every worker node. + * If addToCurrentClassLoader is true, attempt to add the new class to the current threads' class + * loader. In general adding to the current threads' class loader will impact all other + * application threads unless they have explicitly changed their class loader. */ def addJar(path: String) { +addJar(path, false) + } + + def addJar(path: String, addToCurrentClassLoader: Boolean) { --- End diff -- Keeping it in Scala makes it simpler for other Spark Scala interpreters (e.g. Toree, Zeppelin) to make use of this.
[GitHub] spark pull request #15666: [SPARK-11421] [Core][Python][R] Added ability for...
Github user felixcheung commented on a diff in the pull request: https://github.com/apache/spark/pull/15666#discussion_r85830311 --- Diff: core/src/main/scala/org/apache/spark/SparkContext.scala --- @@ -1700,19 +1700,34 @@ class SparkContext(config: SparkConf) extends Logging { * Adds a JAR dependency for all tasks to be executed on this SparkContext in the future. * The `path` passed can be either a local file, a file in HDFS (or other Hadoop-supported * filesystems), an HTTP, HTTPS or FTP URI, or local:/path for a file on every worker node. + * If addToCurrentClassLoader is true, attempt to add the new class to the current threads' class + * loader. In general adding to the current threads' class loader will impact all other + * application threads unless they have explicitly changed their class loader. */ def addJar(path: String) { +addJar(path, false) + } + + def addJar(path: String, addToCurrentClassLoader: Boolean) { --- End diff -- Instead of changing SparkContext.addJar as a broad public API, would it be better to have a more narrow change in an R-only helper?
[GitHub] spark pull request #15666: [SPARK-11421] [Core][Python][R] Added ability for...
Github user felixcheung commented on a diff in the pull request: https://github.com/apache/spark/pull/15666#discussion_r85829905 --- Diff: core/src/main/scala/org/apache/spark/SparkContext.scala --- @@ -1700,19 +1700,34 @@ class SparkContext(config: SparkConf) extends Logging { * Adds a JAR dependency for all tasks to be executed on this SparkContext in the future. * The `path` passed can be either a local file, a file in HDFS (or other Hadoop-supported * filesystems), an HTTP, HTTPS or FTP URI, or local:/path for a file on every worker node. + * If addToCurrentClassLoader is true, attempt to add the new class to the current threads' class + * loader. In general adding to the current threads' class loader will impact all other + * application threads unless they have explicitly changed their class loader. */ def addJar(path: String) { +addJar(path, false) + } + + def addJar(path: String, addToCurrentClassLoader: Boolean) { if (path == null) { logWarning("null specified as parameter to addJar") } else { var key = "" - if (path.contains("\\")) { + + val uri = if (path.contains("\\")) { // For local paths with backslashes on Windows, URI throws an exception -key = env.rpcEnv.fileServer.addJar(new File(path)) --- End diff -- this seems to be changing existing behavior?
[GitHub] spark pull request #15666: [SPARK-11421] [Core][Python][R] Added ability for...
Github user felixcheung commented on a diff in the pull request: https://github.com/apache/spark/pull/15666#discussion_r85829640 --- Diff: R/pkg/R/context.R --- @@ -290,6 +290,33 @@ spark.addFile <- function(path, recursive = FALSE) { invisible(callJMethod(sc, "addFile", suppressWarnings(normalizePath(path)), recursive)) } + +#' Adds a JAR dependency for all tasks to be executed on this SparkContext in the future. +#' +#' The `path` passed can be either a local file, a file in HDFS (or other Hadoop-supported --- End diff -- use `\code{path}` instead of \`path\` in R doc
[GitHub] spark pull request #15666: [SPARK-11421] [Core][Python][R] Added ability for...
GitHub user mariusvniekerk opened a pull request: https://github.com/apache/spark/pull/15666 [SPARK-11421] [Core][Python][R] Added ability for addJar to augment the current classloader ## What changes were proposed in this pull request? Adds a flag to sc.addJar to add the jar to the current classloader ## How was this patch tested? Unit tests, manual tests This is a continuation of the pull request in https://github.com/apache/spark/pull/9313 and is mostly a rebase of that moved to master with SparkR additions. cc @holdenk You can merge this pull request into a Git repository by running: $ git pull https://github.com/mariusvniekerk/spark SPARK-11421 Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/15666.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #15666
commit 6fb5d66e7669ebe0e8a515e02b1276e1bab652a2 Author: Marius van Niekerk Date: 2016-10-28T00:26:17Z Squashed content from pull request #9313
commit 6a6e98a0fcc7f388009f36b8a31664bda2ccf5d9 Author: Marius van Niekerk Date: 2016-10-28T00:26:29Z Remove _loadClass method since we dont need it anymore under py4j 0.10
commit 2b1e98e50feb7180b94f7b9e304634566f163718 Author: Marius van Niekerk Date: 2016-10-28T00:26:36Z Expose addJar to sparkR as well