Yanbo Liang created SPARK-18444:
-----------------------------------

             Summary: SparkR running in yarn-cluster mode should not download Spark package
                 Key: SPARK-18444
                 URL: https://issues.apache.org/jira/browse/SPARK-18444
             Project: Spark
          Issue Type: Bug
          Components: SparkR
            Reporter: Yanbo Liang


When running a SparkR job in yarn-cluster mode, SparkR downloads the Spark
package from the Apache website, which is unnecessary.
{code}
./bin/spark-submit --master yarn-cluster ./examples/src/main/r/dataframe.R
{code}
The output is as follows:
{code}
Attaching package: ‘SparkR’

The following objects are masked from ‘package:stats’:

    cov, filter, lag, na.omit, predict, sd, var, window

The following objects are masked from ‘package:base’:

    as.data.frame, colnames, colnames<-, drop, endsWith, intersect,
    rank, rbind, sample, startsWith, subset, summary, transform, union

Spark not found in SPARK_HOME:
Spark not found in the cache directory. Installation will start.
MirrorUrl not provided.
Looking for preferred site from apache website...
......
{code} 
There's no {{SPARK_HOME}} in yarn-cluster mode, since the R process runs on a
remote host in the YARN cluster rather than on the client host. The JVM comes
up first and the R process then connects to it, so in this case we should
never need to download Spark: Spark is already running.
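
A minimal sketch of the kind of guard that could avoid the download. The helper name {{maybeInstallSpark}} is hypothetical and only for illustration; {{install.spark}} is SparkR's real installer function, and {{EXISTING_SPARKR_BACKEND_PORT}} is the environment variable the JVM exports when it launches the driver R process in yarn-cluster mode:
{code}
# Hypothetical guard (not the actual SparkR source): fall back to
# downloading Spark only when no running backend and no local
# installation can be found.
maybeInstallSpark <- function(sparkHome) {
  if (nchar(Sys.getenv("EXISTING_SPARKR_BACKEND_PORT")) > 0) {
    # The JVM launched this R process (yarn-cluster mode) and Spark is
    # already running, so there is nothing to install.
    return(invisible(NULL))
  }
  if (nchar(sparkHome) > 0 && dir.exists(sparkHome)) {
    # A valid SPARK_HOME is available on this host.
    return(invisible(NULL))
  }
  # Truly local R session with no Spark present: download and cache it.
  install.spark()
}
{code}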


