GitHub user yanboliang opened a pull request:

    https://github.com/apache/spark/pull/15888

    [SPARK-18444][SPARKR] SparkR running in yarn-cluster mode should not download Spark package.

    ## What changes were proposed in this pull request?
    When a SparkR job runs in yarn-cluster mode, it downloads the Spark package from the Apache website, which is unnecessary. For example:
    ```
    ./bin/spark-submit --master yarn-cluster ./examples/src/main/r/dataframe.R
    ```
    The output is:
    ```
    Attaching package: ‘SparkR’
    
    The following objects are masked from ‘package:stats’:
    
        cov, filter, lag, na.omit, predict, sd, var, window
    
    The following objects are masked from ‘package:base’:
    
        as.data.frame, colnames, colnames<-, drop, endsWith, intersect,
        rank, rbind, sample, startsWith, subset, summary, transform, union
    
    Spark not found in SPARK_HOME:
    Spark not found in the cache directory. Installation will start.
    MirrorUrl not provided.
    Looking for preferred site from apache website...
    ......
    ```
    There is no `SPARK_HOME` in yarn-cluster mode because the R process runs on a remote host in the YARN cluster rather than on the client host. The JVM starts first and the R process then connects to it, so in this case SparkR should never download Spark: Spark is already running.
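    The rule described above can be sketched as a small decision function. This is a minimal sketch, assuming a hypothetical POSIX shell helper named `should_download_spark` that takes the master URL, the deploy mode, and `SPARK_HOME` as arguments; none of these names are part of Spark or SparkR (the real check lives in SparkR's R code). It only models the distinction this PR draws: in cluster deploy mode the JVM is already running, so no Spark package should be fetched.

    ```shell
    #!/bin/sh
    # Hypothetical helper, NOT part of Spark or SparkR: decides whether a
    # SparkR driver should fetch a Spark package, following the PR's rule.
    # $1 = master URL, $2 = deploy mode, $3 = SPARK_HOME (may be empty)
    should_download_spark() {
      master=$1
      deploy_mode=$2
      spark_home=$3

      # A Spark installation already exists locally: nothing to download.
      if [ -n "$spark_home" ] && [ -d "$spark_home" ]; then
        echo no
        return
      fi

      # Cluster deploy mode: the JVM is launched first on a cluster node and
      # the R process connects to it, so Spark is already running there.
      case "$master" in
        yarn-cluster)
          echo no
          return
          ;;
      esac
      if [ "$deploy_mode" = "cluster" ]; then
        echo no
        return
      fi

      # Otherwise (e.g. a standalone R session with a local master), fall
      # back to downloading and caching a Spark package.
      echo yes
    }

    should_download_spark yarn-cluster "" ""     # prints "no"
    should_download_spark yarn cluster ""        # prints "no"
    should_download_spark "local[*]" client ""   # prints "yes"
    ```

    Checking the deploy mode before falling back to a download keeps the cluster case cheap and deterministic; this sketch deliberately ignores other cases (such as a remote client-mode master without `SPARK_HOME`), which SparkR may handle differently.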
    
    ## How was this patch tested?
    Offline test.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/yanboliang/spark spark-18444

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/15888.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #15888
    
----
commit 16aa40086f8e2e58f7e3d7c3ec95a2e4d5967e5b
Author: Yanbo Liang <yblia...@gmail.com>
Date:   2016-11-15T10:01:38Z

    SparkR running in yarn-cluster mode should not download Spark package.

----

