[GitHub] spark pull request #15888: [SPARK-18444][SPARKR] SparkR running in yarn-clus...
Github user asfgit closed the pull request at: https://github.com/apache/spark/pull/15888

---
If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA.
---
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org
Github user yanboliang commented on a diff in the pull request: https://github.com/apache/spark/pull/15888#discussion_r8285

--- Diff: R/pkg/inst/tests/testthat/test_sparkR.R ---
@@ -0,0 +1,36 @@
+#
+# Licensed to the Apache Software Foundation (ASF) under one or more
+# contributor license agreements. See the NOTICE file distributed with
+# this work for additional information regarding copyright ownership.
+# The ASF licenses this file to You under the Apache License, Version 2.0
+# (the "License"); you may not use this file except in compliance with
+# the License. You may obtain a copy of the License at
+#
+#    http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+#
+
+context("functions in sparkR.R")
+
+test_that("sparkCheckInstall", {
+  # "local, yarn-client, mesos-client" mode, SPARK_HOME was set correctly,
+  # and the SparkR job was submitted by "spark-submit"
+  sparkHome <- paste0(tempdir(), "/", "sparkHome")
+  dir.create(sparkHome)
+  master <- ""
+  deployMode <- ""
+  expect_true(is.null(sparkCheckInstall(sparkHome, master, deployMode)))
+  unlink(sparkHome, recursive = TRUE)
+
+  # "yarn-cluster, mesos-cluster" mode, SPARK_HOME was not set,
+  # and the SparkR job was submitted by "spark-submit"
+  sparkHome <- ""
+  master <- ""
+  deployMode <- ""
+  expect_true(is.null(sparkCheckInstall(sparkHome, master, deployMode)))
--- End diff --

Added.
Github user felixcheung commented on a diff in the pull request: https://github.com/apache/spark/pull/15888#discussion_r88808464

--- Diff: R/pkg/R/sparkR.R ---
@@ -373,8 +373,13 @@ sparkR.session <- function(
     overrideEnvs(sparkConfigMap, paramMap)
   }

+  deployMode <- ""
+  if (exists("spark.submit.deployMode", envir = sparkConfigMap)) {
+    deployMode <- sparkConfigMap[["spark.submit.deployMode"]]
+  }
--- End diff --

fair enough
Github user felixcheung commented on a diff in the pull request: https://github.com/apache/spark/pull/15888#discussion_r88808475

--- Diff: R/pkg/R/sparkR.R ---
@@ -550,24 +555,27 @@ processSparkPackages <- function(packages) {
 #
 # @param sparkHome directory to find Spark package.
 # @param master the Spark master URL, used to check local or remote mode.
+# @param deployMode whether to deploy your driver on the worker nodes (cluster)
+#        or locally as an external client (client).
 # @return NULL if no need to update sparkHome, and new sparkHome otherwise.
-sparkCheckInstall <- function(sparkHome, master) {
+sparkCheckInstall <- function(sparkHome, master, deployMode) {
   if (!isSparkRShell()) {
     if (!is.na(file.info(sparkHome)$isdir)) {
       msg <- paste0("Spark package found in SPARK_HOME: ", sparkHome)
       message(msg)
       NULL
     } else {
-      if (!nzchar(master) || isMasterLocal(master)) {
-        msg <- paste0("Spark not found in SPARK_HOME: ",
-                      sparkHome)
+      if (isMasterLocal(master)) {
+        msg <- paste0("Spark not found in SPARK_HOME: ", sparkHome)
         message(msg)
         packageLocalDir <- install.spark()
         packageLocalDir
-      } else {
+      } else if (isClientMode(master) || deployMode == "client") {
--- End diff --

ah yes..
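The branch in the diff above relies on `isClientMode(master)`, which classifies masters such as `yarn-client` and `mesos-client` by name. A minimal sketch of such a helper follows; the exact regex is an assumption for illustration, not a quote of SparkR's implementation:

```r
# Hypothetical sketch of an isClientMode() helper: treat any master of the
# form "<scheme>-client" (yarn-client, mesos-client, ...) as client mode.
isClientMode <- function(master) {
  grepl("([a-z]+)-client$", master)
}

isClientMode("yarn-client")   # TRUE
isClientMode("mesos-client")  # TRUE
isClientMode("yarn-cluster")  # FALSE
isClientMode("local[*]")      # FALSE
```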
Github user felixcheung commented on a diff in the pull request: https://github.com/apache/spark/pull/15888#discussion_r88763342

--- Diff: R/pkg/R/sparkR.R ---
@@ -373,8 +373,13 @@ sparkR.session <- function(
     overrideEnvs(sparkConfigMap, paramMap)
   }

+  deployMode <- ""
+  if (exists("spark.submit.deployMode", envir = sparkConfigMap)) {
+    deployMode <- sparkConfigMap[["spark.submit.deployMode"]]
+  }
--- End diff --

you could instead just do `deployMode <- sparkConfigMap[["spark.submit.deployMode"]]`; it will be set to NULL if the key is not set.
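The behavior this comment relies on can be checked in a few lines: indexing an R environment with `[[` (like `$`) yields NULL for an unbound name instead of raising an error, so the `exists()` guard is not strictly needed. A standalone sketch, using a fresh environment rather than SparkR's actual config map:

```r
sparkConfigMap <- new.env()

# Key not set: [[ on an environment returns NULL rather than erroring
# (unlike get(), which raises an error for unbound names).
deployMode <- sparkConfigMap[["spark.submit.deployMode"]]
is.null(deployMode)                          # TRUE

# Once the key is bound, [[ returns its value.
assign("spark.submit.deployMode", "cluster", envir = sparkConfigMap)
sparkConfigMap[["spark.submit.deployMode"]]  # "cluster"
```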
Github user yanboliang commented on a diff in the pull request: https://github.com/apache/spark/pull/15888#discussion_r88214305

--- Diff: R/pkg/R/sparkR.R ---
@@ -550,24 +566,28 @@ processSparkPackages <- function(packages) {
 #
 # @param sparkHome directory to find Spark package.
 # @param master the Spark master URL, used to check local or remote mode.
+# @param deployMode whether to deploy your driver on the worker nodes (cluster) or
+#        locally as an external client (client).
 # @return NULL if no need to update sparkHome, and new sparkHome otherwise.
-sparkCheckInstall <- function(sparkHome, master) {
+sparkCheckInstall <- function(sparkHome, master, deployMode) {
   if (!isSparkRShell()) {
     if (!is.na(file.info(sparkHome)$isdir)) {
       msg <- paste0("Spark package found in SPARK_HOME: ", sparkHome)
       message(msg)
       NULL
     } else {
-      if (!nzchar(master) || isMasterLocal(master)) {
+      if (isMasterLocal(master)) {
         msg <- paste0("Spark not found in SPARK_HOME: ", sparkHome)
         message(msg)
         packageLocalDir <- install.spark()
         packageLocalDir
-      } else {
+      } else if (nzchar(deployMode) && deployMode == "client") {
         msg <- paste0("Spark not found in SPARK_HOME: ", sparkHome, "\n",
                       installInstruction("remote"))
         stop(msg)
+      } else {
+        NULL
--- End diff --

@felixcheung If we submit SparkR jobs by ```spark-submit```, then ```master``` and ```deployMode``` should be empty. ```sparkHome``` should be set correctly in ```client``` and ```local``` mode, but empty in ```cluster``` mode. ```sparkCheckInstall``` returns NULL in all of these scenarios. If we start SparkR via ```sparkR.session``` and pass in the corresponding arguments, we should download the Spark package when ```master = local[*]```. If ```deployMode``` was set to ```client``` without ```SPARK_HOME``` set correctly, we should stop due to misconfiguration. Otherwise (deploying with ```yarn``` or ```mesos``` in cluster mode) we return NULL here. Since Spark 2.0, ```master = yarn-cluster``` has been deprecated in favor of ```master = yarn, deployMode = cluster```. Since ```sparkR.session``` was introduced in 2.0, I don't think we should support the old convention here.
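The scenarios enumerated above reduce to a small decision table over `(sparkHome, master, deployMode)`. Here is a self-contained sketch of that control flow; `checkInstall` and its stubs are hypothetical stand-ins for the SparkR internals (`sparkCheckInstall`, `install.spark`, `isMasterLocal`), simplified so the example runs on its own:

```r
# Stubs for SparkR internals -- hypothetical, for illustration only.
isMasterLocal <- function(master) grepl("^local(\\[.*\\])?$", master)
installSpark  <- function() "/tmp/spark-cache"   # stands in for install.spark()

checkInstall <- function(sparkHome, master, deployMode) {
  if (nzchar(sparkHome) && dir.exists(sparkHome)) {
    NULL                    # SPARK_HOME valid: nothing to update
  } else if (isMasterLocal(master)) {
    installSpark()          # local mode: download, return the new home
  } else if (nzchar(deployMode) && deployMode == "client") {
    stop("Spark not found in SPARK_HOME: ", sparkHome)  # client misconfiguration
  } else {
    NULL                    # spark-submit / cluster mode: Spark is already running
  }
}

# spark-submit in cluster mode (master and deployMode empty): never download
is.null(checkInstall("", "", ""))       # TRUE
# sparkR.session with a local master and no SPARK_HOME: (stubbed) download
checkInstall("", "local[*]", "")        # "/tmp/spark-cache"
```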
Github user felixcheung commented on a diff in the pull request: https://github.com/apache/spark/pull/15888#discussion_r88127026

--- Diff: R/pkg/R/sparkR.R ---
@@ -558,16 +558,18 @@ sparkCheckInstall <- function(sparkHome, master) {
       message(msg)
       NULL
     } else {
-      if (!nzchar(master) || isMasterLocal(master)) {
+      if (isMasterLocal(master)) {
         msg <- paste0("Spark not found in SPARK_HOME: ", sparkHome)
         message(msg)
         packageLocalDir <- install.spark()
         packageLocalDir
-      } else {
+      } else if (master == "yarn-client") {
         msg <- paste0("Spark not found in SPARK_HOME: ", sparkHome, "\n",
                       installInstruction("remote"))
         stop(msg)
+      } else {
--- End diff --

Also, what happens with "mesos"? That is what we are saying we should support in SparkR 2.1.0.
Github user yanboliang commented on a diff in the pull request: https://github.com/apache/spark/pull/15888#discussion_r88032672

--- Diff: R/pkg/R/sparkR.R ---
@@ -558,16 +558,18 @@ sparkCheckInstall <- function(sparkHome, master) {
       message(msg)
       NULL
     } else {
-      if (!nzchar(master) || isMasterLocal(master)) {
--- End diff --

```master``` will always be empty if the job was submitted by ```spark-submit```.
Github user yanboliang commented on a diff in the pull request: https://github.com/apache/spark/pull/15888#discussion_r88032786

--- Diff: R/pkg/R/sparkR.R ---
@@ -558,16 +558,18 @@ sparkCheckInstall <- function(sparkHome, master) {
       message(msg)
       NULL
     } else {
-      if (!nzchar(master) || isMasterLocal(master)) {
+      if (isMasterLocal(master)) {
         msg <- paste0("Spark not found in SPARK_HOME: ", sparkHome)
         message(msg)
         packageLocalDir <- install.spark()
         packageLocalDir
-      } else {
+      } else if (master == "yarn-client") {
         msg <- paste0("Spark not found in SPARK_HOME: ", sparkHome, "\n",
                       installInstruction("remote"))
         stop(msg)
+      } else {
--- End diff --

In ```yarn-cluster``` mode, it will do nothing.
GitHub user yanboliang opened a pull request: https://github.com/apache/spark/pull/15888

[SPARK-18444][SPARKR] SparkR running in yarn-cluster mode should not download Spark package.

## What changes were proposed in this pull request?

When running a SparkR job in yarn-cluster mode, it downloads the Spark package from the Apache website, which is unnecessary.

```
./bin/spark-submit --master yarn-cluster ./examples/src/main/r/dataframe.R
```

The following is the output:

```
Attaching package: 'SparkR'

The following objects are masked from 'package:stats':

    cov, filter, lag, na.omit, predict, sd, var, window

The following objects are masked from 'package:base':

    as.data.frame, colnames, colnames<-, drop, endsWith, intersect,
    rank, rbind, sample, startsWith, subset, summary, transform, union

Spark not found in SPARK_HOME:
Spark not found in the cache directory. Installation will start.
MirrorUrl not provided.
Looking for preferred site from apache website...
..
```

There is no ```SPARK_HOME``` in yarn-cluster mode, since the R process runs on a remote host in the yarn cluster rather than on the client host. The JVM comes up first and the R process then connects to it. So in such cases we should never have to download Spark, as Spark is already running.

## How was this patch tested?

Offline test.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/yanboliang/spark spark-18444

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/15888.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message:

    This closes #15888

commit 16aa40086f8e2e58f7e3d7c3ec95a2e4d5967e5b
Author: Yanbo Liang
Date: 2016-11-15T10:01:38Z

    SparkR running in yarn-cluster mode should not download Spark package.