Repository: spark
Updated Branches:
  refs/heads/master 4725cb988 -> 2462dbcce


[SPARK-10971][SPARKR] RRunner should allow setting path to Rscript.

Add a new Spark conf option "spark.r.driver.command" to specify the 
executable for an R script on the driver in client modes.

The existing Spark conf option "spark.sparkr.r.command" (deprecated by this 
change in favor of "spark.r.command", but kept for backward compatibility) 
specifies the executable for an R script in cluster modes, for both the 
driver and the workers. See also [launch R worker 
script](https://github.com/apache/spark/blob/master/core/src/main/scala/org/apache/spark/api/r/RRDD.scala#L395).
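
As a usage illustration (a minimal sketch, not part of the patch), an R 
script could be submitted programmatically through Spark's launcher API so 
these confs take effect; the script and Rscript paths below are hypothetical:

    import org.apache.spark.launcher.SparkLauncher

    object SubmitRScript {
      def main(args: Array[String]): Unit = {
        val app = new SparkLauncher()
          .setAppResource("/path/to/analysis.R") // .R files are routed to RRunner
          .setMaster("yarn")
          .setDeployMode("cluster")
          // Cluster modes: driver and workers both run this executable.
          .setConf("spark.r.command", "/opt/R/bin/Rscript")
          // In client mode, a driver-specific executable could be set instead:
          // .setConf("spark.r.driver.command", "/usr/local/bin/Rscript")
          .launch()
        app.waitFor()
      }
    }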

Incidentally, the [environment variable 
"SPARKR_DRIVER_R"](https://github.com/apache/spark/blob/master/launcher/src/main/java/org/apache/spark/launcher/SparkSubmitCommandBuilder.java#L275)
 is used to locate the R shell on the local host.

For reference, PySpark has two environment variables serving a similar 
purpose:
PYSPARK_PYTHON          Python binary executable to use for PySpark in both 
the driver and workers (default is `python`).
PYSPARK_DRIVER_PYTHON   Python binary executable to use for PySpark in the 
driver only (default is PYSPARK_PYTHON).
PySpark uses the code 
[here](https://github.com/apache/spark/blob/master/core/src/main/scala/org/apache/spark/deploy/PythonRunner.scala#L41)
 to determine the Python executable for a Python script.
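
In both cases the lookup follows the same pattern: the driver-specific 
setting wins, then the general setting, then a bare default. A distilled 
sketch of that precedence (simplified, not the exact PythonRunner code):

    // Simplified precedence sketch: driver-specific env var first, then the
    // general one, then the default interpreter name.
    // SPARKR_DRIVER_R plays the analogous role for locating the SparkR shell.
    val pythonExec = sys.env.getOrElse("PYSPARK_DRIVER_PYTHON",
      sys.env.getOrElse("PYSPARK_PYTHON", "python"))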

Author: Sun Rui <rui....@intel.com>

Closes #9179 from sun-rui/SPARK-10971.


Project: http://git-wip-us.apache.org/repos/asf/spark/repo
Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/2462dbcc
Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/2462dbcc
Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/2462dbcc

Branch: refs/heads/master
Commit: 2462dbcce89d657bca17ae311c99c2a4bee4a5fa
Parents: 4725cb9
Author: Sun Rui <rui....@intel.com>
Authored: Fri Oct 23 21:38:04 2015 -0700
Committer: Shivaram Venkataraman <shiva...@cs.berkeley.edu>
Committed: Fri Oct 23 21:38:04 2015 -0700

----------------------------------------------------------------------
 .../scala/org/apache/spark/deploy/RRunner.scala   | 11 ++++++++++-
 docs/configuration.md                             | 18 ++++++++++++++++++
 2 files changed, 28 insertions(+), 1 deletion(-)
----------------------------------------------------------------------


http://git-wip-us.apache.org/repos/asf/spark/blob/2462dbcc/core/src/main/scala/org/apache/spark/deploy/RRunner.scala
----------------------------------------------------------------------
diff --git a/core/src/main/scala/org/apache/spark/deploy/RRunner.scala b/core/src/main/scala/org/apache/spark/deploy/RRunner.scala
index 58cc1f9..ed183cf 100644
--- a/core/src/main/scala/org/apache/spark/deploy/RRunner.scala
+++ b/core/src/main/scala/org/apache/spark/deploy/RRunner.scala
@@ -40,7 +40,16 @@ object RRunner {
 
     // Time to wait for SparkR backend to initialize in seconds
     val backendTimeout = sys.env.getOrElse("SPARKR_BACKEND_TIMEOUT", "120").toInt
-    val rCommand = "Rscript"
+    val rCommand = {
+      // "spark.sparkr.r.command" is deprecated and replaced by 
"spark.r.command",
+      // but kept here for backward compatibility.
+      var cmd = sys.props.getOrElse("spark.sparkr.r.command", "Rscript")
+      cmd = sys.props.getOrElse("spark.r.command", cmd)
+      if (sys.props.getOrElse("spark.submit.deployMode", "client") == "client") {
+        cmd = sys.props.getOrElse("spark.r.driver.command", cmd)
+      }
+      cmd
+    }
 
     // Check if the file path exists.
     // If not, change directory to current working directory for YARN cluster mode

http://git-wip-us.apache.org/repos/asf/spark/blob/2462dbcc/docs/configuration.md
----------------------------------------------------------------------
diff --git a/docs/configuration.md b/docs/configuration.md
index be9c36b..682384d 100644
--- a/docs/configuration.md
+++ b/docs/configuration.md
@@ -1589,6 +1589,20 @@ Apart from these, the following properties are also available, and may be useful
     Number of threads used by RBackend to handle RPC calls from SparkR package.
   </td>
 </tr>
+<tr>
+  <td><code>spark.r.command</code></td>
+  <td>Rscript</td>
+  <td>
+    Executable for executing R scripts in cluster modes for both driver and workers.
+  </td>
+</tr>
+<tr>
+  <td><code>spark.r.driver.command</code></td>
+  <td>spark.r.command</td>
+  <td>
+    Executable for executing R scripts in client modes for driver. Ignored in cluster modes.
+  </td>
+</tr>
 </table>
 
 #### Cluster Managers
@@ -1629,6 +1643,10 @@ The following variables can be set in `spark-env.sh`:
     <td>Python binary executable to use for PySpark in driver only (default is <code>PYSPARK_PYTHON</code>).</td>
   </tr>
   <tr>
+    <td><code>SPARKR_DRIVER_R</code></td>
+    <td>R binary executable to use for SparkR shell (default is <code>R</code>).</td>
+  </tr>
+  <tr>
     <td><code>SPARK_LOCAL_IP</code></td>
     <td>IP address of the machine to bind to.</td>
   </tr>

