Repository: spark Updated Branches: refs/heads/master 4725cb988 -> 2462dbcce
[SPARK-10971][SPARKR] RRunner should allow setting path to Rscript.

Add a new Spark conf option "spark.r.driver.command" to specify the executable for an R script in client modes. The Spark conf option "spark.r.command" specifies the executable for an R script in cluster modes, for both driver and workers (the older "spark.sparkr.r.command" is kept as a deprecated alias). See also [launch R worker script](https://github.com/apache/spark/blob/master/core/src/main/scala/org/apache/spark/api/r/RRDD.scala#L395).

BTW, the [environment variable "SPARKR_DRIVER_R"](https://github.com/apache/spark/blob/master/launcher/src/main/java/org/apache/spark/launcher/SparkSubmitCommandBuilder.java#L275) is used to locate the R shell on the local host.

For your information, PySpark has two environment variables serving a similar purpose:

PYSPARK_PYTHON          Python binary executable to use for PySpark in both driver and workers (default is `python`).
PYSPARK_DRIVER_PYTHON   Python binary executable to use for PySpark in the driver only (default is PYSPARK_PYTHON).

PySpark uses the code [here](https://github.com/apache/spark/blob/master/core/src/main/scala/org/apache/spark/deploy/PythonRunner.scala#L41) to determine the Python executable for a Python script.

Author: Sun Rui <rui....@intel.com>

Closes #9179 from sun-rui/SPARK-10971.
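The precedence introduced by this patch can be sketched as a small standalone function. This is an illustrative sketch only: `RCommandResolution` and its `Map`-based `props` argument are stand-ins for `RRunner` reading `sys.props` directly, but the property names and resolution order mirror the diff.

```scala
object RCommandResolution {
  // Resolve the R executable the way RRunner does after this patch:
  //   default "Rscript"
  //   < "spark.sparkr.r.command" (deprecated alias)
  //   < "spark.r.command"
  //   < "spark.r.driver.command" (client deploy mode only)
  def resolve(props: Map[String, String]): String = {
    // Deprecated name first, so the newer names below can override it.
    var cmd = props.getOrElse("spark.sparkr.r.command", "Rscript")
    cmd = props.getOrElse("spark.r.command", cmd)
    // The driver-only override applies only in client mode (the default).
    if (props.getOrElse("spark.submit.deployMode", "client") == "client") {
      cmd = props.getOrElse("spark.r.driver.command", cmd)
    }
    cmd
  }
}
```

With this order, passing e.g. `--conf spark.r.driver.command=/opt/R/bin/Rscript` (a hypothetical path) to spark-submit overrides `spark.r.command` for the driver, but only when `spark.submit.deployMode` is `client`.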
Project: http://git-wip-us.apache.org/repos/asf/spark/repo
Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/2462dbcc
Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/2462dbcc
Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/2462dbcc

Branch: refs/heads/master
Commit: 2462dbcce89d657bca17ae311c99c2a4bee4a5fa
Parents: 4725cb9
Author: Sun Rui <rui....@intel.com>
Authored: Fri Oct 23 21:38:04 2015 -0700
Committer: Shivaram Venkataraman <shiva...@cs.berkeley.edu>
Committed: Fri Oct 23 21:38:04 2015 -0700

----------------------------------------------------------------------
 .../scala/org/apache/spark/deploy/RRunner.scala | 11 ++++++++++-
 docs/configuration.md                           | 18 ++++++++++++++++++
 2 files changed, 28 insertions(+), 1 deletion(-)
----------------------------------------------------------------------

http://git-wip-us.apache.org/repos/asf/spark/blob/2462dbcc/core/src/main/scala/org/apache/spark/deploy/RRunner.scala
----------------------------------------------------------------------
diff --git a/core/src/main/scala/org/apache/spark/deploy/RRunner.scala b/core/src/main/scala/org/apache/spark/deploy/RRunner.scala
index 58cc1f9..ed183cf 100644
--- a/core/src/main/scala/org/apache/spark/deploy/RRunner.scala
+++ b/core/src/main/scala/org/apache/spark/deploy/RRunner.scala
@@ -40,7 +40,16 @@ object RRunner {
     // Time to wait for SparkR backend to initialize in seconds
     val backendTimeout = sys.env.getOrElse("SPARKR_BACKEND_TIMEOUT", "120").toInt
-    val rCommand = "Rscript"
+    val rCommand = {
+      // "spark.sparkr.r.command" is deprecated and replaced by "spark.r.command",
+      // but kept here for backward compatibility.
+      var cmd = sys.props.getOrElse("spark.sparkr.r.command", "Rscript")
+      cmd = sys.props.getOrElse("spark.r.command", cmd)
+      if (sys.props.getOrElse("spark.submit.deployMode", "client") == "client") {
+        cmd = sys.props.getOrElse("spark.r.driver.command", cmd)
+      }
+      cmd
+    }

     // Check if the file path exists.
     // If not, change directory to current working directory for YARN cluster mode

http://git-wip-us.apache.org/repos/asf/spark/blob/2462dbcc/docs/configuration.md
----------------------------------------------------------------------
diff --git a/docs/configuration.md b/docs/configuration.md
index be9c36b..682384d 100644
--- a/docs/configuration.md
+++ b/docs/configuration.md
@@ -1589,6 +1589,20 @@ Apart from these, the following properties are also available, and may be useful
     Number of threads used by RBackend to handle RPC calls from SparkR package.
   </td>
 </tr>
+<tr>
+  <td><code>spark.r.command</code></td>
+  <td>Rscript</td>
+  <td>
+    Executable for executing R scripts in cluster modes for both driver and workers.
+  </td>
+</tr>
+<tr>
+  <td><code>spark.r.driver.command</code></td>
+  <td>spark.r.command</td>
+  <td>
+    Executable for executing R scripts in client modes for driver. Ignored in cluster modes.
+  </td>
+</tr>
 </table>

 #### Cluster Managers
@@ -1629,6 +1643,10 @@ The following variables can be set in `spark-env.sh`:
   <td>Python binary executable to use for PySpark in driver only (default is <code>PYSPARK_PYTHON</code>).</td>
 </tr>
 <tr>
+  <td><code>SPARKR_DRIVER_R</code></td>
+  <td>R binary executable to use for SparkR shell (default is <code>R</code>).</td>
+</tr>
+<tr>
   <td><code>SPARK_LOCAL_IP</code></td>
   <td>IP address of the machine to bind to.</td>
 </tr>

---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org