[ https://issues.apache.org/jira/browse/SPARK-18444?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15666725#comment-15666725 ]
Apache Spark commented on SPARK-18444:
--------------------------------------

User 'yanboliang' has created a pull request for this issue:
https://github.com/apache/spark/pull/15888

> SparkR running in yarn-cluster mode should not download Spark package
> ----------------------------------------------------------------------
>
>                 Key: SPARK-18444
>                 URL: https://issues.apache.org/jira/browse/SPARK-18444
>             Project: Spark
>          Issue Type: Bug
>          Components: SparkR
>            Reporter: Yanbo Liang
>
> When running a SparkR job in yarn-cluster mode, it downloads the Spark package
> from the Apache website, which is not necessary.
> {code}
> ./bin/spark-submit --master yarn-cluster ./examples/src/main/r/dataframe.R
> {code}
> The following is the output:
> {code}
> Attaching package: ‘SparkR’
>
> The following objects are masked from ‘package:stats’:
>
>     cov, filter, lag, na.omit, predict, sd, var, window
>
> The following objects are masked from ‘package:base’:
>
>     as.data.frame, colnames, colnames<-, drop, endsWith, intersect,
>     rank, rbind, sample, startsWith, subset, summary, transform, union
>
> Spark not found in SPARK_HOME:
> Spark not found in the cache directory. Installation will start.
> MirrorUrl not provided.
> Looking for preferred site from apache website...
> ......
> {code}
> There is no {{SPARK_HOME}} in yarn-cluster mode, since the R process runs on a
> remote host in the YARN cluster rather than on the client host. The JVM comes
> up first and the R process then connects to it, so in such cases we should
> never have to download Spark, as Spark is already running.
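A minimal sketch of the kind of guard that could avoid the download, assuming the check runs before the SparkR backend is launched: only fall back to downloading Spark when the R process is the client-side driver, i.e. the master is local or the deploy mode is "client". The function name {{sparkCheckInstallSketch}} and the {{deployMode}} argument are illustrative, not SparkR APIs; {{install.spark()}} is the existing SparkR helper that downloads and caches a Spark distribution.

{code}
library(SparkR)

# Sketch only, not the actual SparkR fix: decide whether Spark needs to be
# downloaded before starting the backend JVM.
sparkCheckInstallSketch <- function(sparkHome, master, deployMode) {
  if (nzchar(sparkHome)) {
    # SPARK_HOME already points at an installation; nothing to download.
    return(sparkHome)
  }
  if (grepl("^local", master) || identical(deployMode, "client")) {
    # Client-side R driver without SPARK_HOME: install.spark() downloads a
    # Spark distribution into the user's local cache directory.
    return(install.spark())
  }
  # Cluster deploy modes (e.g. yarn-cluster): the backend JVM is already
  # running alongside the R process, so skip the download entirely.
  invisible(NULL)
}
{code}

With a guard like this, a yarn-cluster run returns NULL and never reaches the mirror lookup shown in the output above.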