[GitHub] spark pull request: [SPARK-12699][SPARKR] R driver process should ...
Github user felixcheung closed the pull request at: https://github.com/apache/spark/pull/10652 --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: [SPARK-12699][SPARKR] R driver process should ...
Github user felixcheung commented on a diff in the pull request: https://github.com/apache/spark/pull/10652#discussion_r50342342

    --- Diff: core/src/main/scala/org/apache/spark/deploy/RPackageUtils.scala ---
    @@ -36,7 +36,8 @@ private[deploy] object RPackageUtils extends Logging {
       private final val hasRPackage = "Spark-HasRPackage"

       /** Base of the shell command used in order to install R packages. */
    -  private final val baseInstallCmd = Seq("R", "CMD", "INSTALL", "-l")
    +  private final val baseInstallCmd = Seq("R", "--no-save", "--no-site-file", "--no-environ",
    +    "--no-restore", "CMD", "INSTALL", "-l")
    --- End diff --

It actually would load the same site file, saved session etc when launching R with `R CMD` - look for `R CMD` in https://stat.ethz.ch/R-manual/R-devel/library/base/html/Startup.html
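To make the shape of the proposed command concrete, here is a minimal sketch of how the install command could be assembled with and without the clean-state flags. The flags and the `R ... CMD INSTALL -l` ordering come from the diff; the object name `InstallCmdSketch`, the `installCmd` helper, and its parameters are hypothetical and are not part of `RPackageUtils`.

```scala
object InstallCmdSketch {
  // Flags copied from the diff under review.
  val cleanStateFlags: Seq[String] =
    Seq("--no-save", "--no-site-file", "--no-environ", "--no-restore")

  // Hypothetical helper: the flags are placed before CMD, as in the proposed
  // baseInstallCmd. libDir is the value for -l; pkgPath is the package to install.
  def installCmd(libDir: String, pkgPath: String, clean: Boolean): Seq[String] = {
    val flags = if (clean) cleanStateFlags else Seq.empty[String]
    Seq("R") ++ flags ++ Seq("CMD", "INSTALL", "-l", libDir, pkgPath)
  }
}
```

The sketch only builds the argument list; whether R honors these flags for `R CMD INSTALL` is exactly the question debated in this thread.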
[GitHub] spark pull request: [SPARK-12699][SPARKR] R driver process should ...
Github user felixcheung commented on the pull request: https://github.com/apache/spark/pull/10652#issuecomment-173121016

I realize that; my point is that even in client mode the driver could be running on a worker machine, as in the case where a Spark job is submitted from another YARN app.

On Tue, Jan 19, 2016 at 11:22 PM -0800, "sun-rui" wrote:

It is possible to get deploy mode from "spark.submit.deployMode", and check if it is "client". You can take a look at https://github.com/apache/spark/blob/master/core/src/main/scala/org/apache/spark/api/r/RUtils.scala#L49

--- Reply to this email directly or view it on GitHub: https://github.com/apache/spark/pull/10652#issuecomment-173116720
[GitHub] spark pull request: [SPARK-12699][SPARKR] R driver process should ...
Github user sun-rui commented on the pull request: https://github.com/apache/spark/pull/10652#issuecomment-173116720 It is possible to get deploy mode from "spark.submit.deployMode", and check if it is "client". You can take a look at https://github.com/apache/spark/blob/master/core/src/main/scala/org/apache/spark/api/r/RUtils.scala#L49
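A minimal sketch of the deploy-mode check suggested here, assuming the configuration is available as a plain key/value map. The `"spark.submit.deployMode"` key and the `"client"` value come from the comment; the object name, the helper names, the use of a `Map` in place of Spark's own configuration classes, and the default of `"client"` when the key is absent are assumptions made for illustration.

```scala
object DeployModeSketch {
  // Hypothetical stand-in for a configuration lookup; a plain Map keeps the
  // sketch self-contained. Defaulting to "client" is an assumption.
  def isClientMode(conf: Map[String, String]): Boolean =
    conf.getOrElse("spark.submit.deployMode", "client") == "client"

  // Pass --vanilla to R only for a cluster-mode driver, leaving the user's
  // startup files in effect for client mode.
  def driverROptions(conf: Map[String, String]): Seq[String] =
    if (isClientMode(conf)) Seq.empty else Seq("--vanilla")
}
```

For example, `DeployModeSketch.driverROptions(Map("spark.submit.deployMode" -> "cluster"))` yields `Seq("--vanilla")`. The check alone does not address the concern raised in the thread that a client-mode driver may still be running on a worker machine.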
[GitHub] spark pull request: [SPARK-12699][SPARKR] R driver process should ...
Github user felixcheung commented on the pull request: https://github.com/apache/spark/pull/10652#issuecomment-173110574 I don't know if there is a way to distinguish that. It could be `spark-submit` or calling `SparkSubmit` class from Oozie and running the job in YARN client mode in which case the driver is actually running on a worker, which could be the same worker running executors. I guess we could explicitly bypass this if the cluster manager is `LOCAL`?
[GitHub] spark pull request: [SPARK-12699][SPARKR] R driver process should ...
Github user sun-rui commented on the pull request: https://github.com/apache/spark/pull/10652#issuecomment-173084751 @felixcheung, yes, something like that
[GitHub] spark pull request: [SPARK-12699][SPARKR] R driver process should ...
Github user felixcheung commented on the pull request: https://github.com/apache/spark/pull/10652#issuecomment-173078695 @sun-rui is it `spark-submit foo.R`?
[GitHub] spark pull request: [SPARK-12699][SPARKR] R driver process should ...
Github user sun-rui commented on the pull request: https://github.com/apache/spark/pull/10652#issuecomment-173070829 RRunner is not only for running driver on cluster, but also for running an R script locally in client mode.
[GitHub] spark pull request: [SPARK-12699][SPARKR] R driver process should ...
Github user sun-rui commented on a diff in the pull request: https://github.com/apache/spark/pull/10652#discussion_r50209041

    --- Diff: core/src/main/scala/org/apache/spark/deploy/RPackageUtils.scala ---
    @@ -36,7 +36,8 @@ private[deploy] object RPackageUtils extends Logging {
       private final val hasRPackage = "Spark-HasRPackage"

       /** Base of the shell command used in order to install R packages. */
    -  private final val baseInstallCmd = Seq("R", "CMD", "INSTALL", "-l")
    +  private final val baseInstallCmd = Seq("R", "--no-save", "--no-site-file", "--no-environ",
    +    "--no-restore", "CMD", "INSTALL", "-l")
    --- End diff --

This is just installation that will not start R session, so these options won't be used?
[GitHub] spark pull request: [SPARK-12699][SPARKR] R driver process should ...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/10652#issuecomment-173062880 Merged build finished. Test FAILed.
[GitHub] spark pull request: [SPARK-12699][SPARKR] R driver process should ...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/10652#issuecomment-173062882 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/49735/ Test FAILed.
[GitHub] spark pull request: [SPARK-12699][SPARKR] R driver process should ...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/10652#issuecomment-173062773

**[Test build #49735 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/49735/consoleFull)** for PR 10652 at commit [`78eb194`](https://github.com/apache/spark/commit/78eb194ecb699a856861309c37c7814a5310d149).
 * This patch **fails PySpark unit tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.
[GitHub] spark pull request: [SPARK-12699][SPARKR] R driver process should ...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/10652#issuecomment-173037120 **[Test build #49735 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/49735/consoleFull)** for PR 10652 at commit [`78eb194`](https://github.com/apache/spark/commit/78eb194ecb699a856861309c37c7814a5310d149).
[GitHub] spark pull request: [SPARK-12699][SPARKR] R driver process should ...
Github user shivaram commented on the pull request: https://github.com/apache/spark/pull/10652#issuecomment-173032446 Yeah doing it just for the cluster mode driver seems fine to me.
[GitHub] spark pull request: [SPARK-12699][SPARKR] R driver process should ...
Github user felixcheung commented on the pull request: https://github.com/apache/spark/pull/10652#issuecomment-173026763 The driver could also be running in YARN cluster mode, in which case a clean state might make sense? To me this is just to reduce the level of variability, and it was brought up in PR #10171. I could also change this to apply only to the driver in cluster mode and not to the `sparkR` shell.
[GitHub] spark pull request: [SPARK-12699][SPARKR] R driver process should ...
Github user shivaram commented on the pull request: https://github.com/apache/spark/pull/10652#issuecomment-173014932 So I'm not completely sure this is a good idea. Users might have their own R environment setup scripts in their home directory (site-file or init-file as in the R docs you linked to) that they expect to work on the driver side. On the executor side it is much more limited in terms of what code runs (i.e. invisible to the user), so I don't think the same expectations apply there?
[GitHub] spark pull request: [SPARK-12699][SPARKR] R driver process should ...
Github user felixcheung commented on a diff in the pull request: https://github.com/apache/spark/pull/10652#discussion_r49159546

    --- Diff: core/src/main/scala/org/apache/spark/deploy/RPackageUtils.scala ---
    @@ -36,7 +36,8 @@ private[deploy] object RPackageUtils extends Logging {
       private final val hasRPackage = "Spark-HasRPackage"

       /** Base of the shell command used in order to install R packages. */
    -  private final val baseInstallCmd = Seq("R", "CMD", "INSTALL", "-l")
    +  private final val baseInstallCmd = Seq("R", "--no-save", "--no-site-file", "--no-environ",
    +    "--no-restore", "CMD", "INSTALL", "-l")
    --- End diff --

I actually think it does - it's easier to try to install package in a clean state than trying to debug when the job failed because the package failed to install.
[GitHub] spark pull request: [SPARK-12699][SPARKR] R driver process should ...
Github user sun-rui commented on a diff in the pull request: https://github.com/apache/spark/pull/10652#discussion_r49152626

    --- Diff: core/src/main/scala/org/apache/spark/deploy/RPackageUtils.scala ---
    @@ -36,7 +36,8 @@ private[deploy] object RPackageUtils extends Logging {
       private final val hasRPackage = "Spark-HasRPackage"

       /** Base of the shell command used in order to install R packages. */
    -  private final val baseInstallCmd = Seq("R", "CMD", "INSTALL", "-l")
    +  private final val baseInstallCmd = Seq("R", "--no-save", "--no-site-file", "--no-environ",
    +    "--no-restore", "CMD", "INSTALL", "-l")
    --- End diff --

I guess these options do not make sense for R package installation?
[GitHub] spark pull request: [SPARK-12699][SPARKR] R driver process should ...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/10652#issuecomment-169863988 Merged build finished. Test FAILed.
[GitHub] spark pull request: [SPARK-12699][SPARKR] R driver process should ...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/10652#issuecomment-169863989 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/48993/ Test FAILed.
[GitHub] spark pull request: [SPARK-12699][SPARKR] R driver process should ...
Github user felixcheung commented on the pull request: https://github.com/apache/spark/pull/10652#issuecomment-169861623 jenkins, retest this please
[GitHub] spark pull request: [SPARK-12699][SPARKR] R driver process should ...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/10652#issuecomment-169842106 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/48985/ Test FAILed.
[GitHub] spark pull request: [SPARK-12699][SPARKR] R driver process should ...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/10652#issuecomment-169842103 Merged build finished. Test FAILed.
[GitHub] spark pull request: [SPARK-12699][SPARKR] R driver process should ...
Github user felixcheung commented on the pull request: https://github.com/apache/spark/pull/10652#issuecomment-169839126 jenkins, retest this please
[GitHub] spark pull request: [SPARK-12699][SPARKR] R driver process should ...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/10652#issuecomment-169837014 Merged build finished. Test FAILed.
[GitHub] spark pull request: [SPARK-12699][SPARKR] R driver process should ...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/10652#issuecomment-169837015 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/48981/ Test FAILed.
[GitHub] spark pull request: [SPARK-12699][SPARKR] R driver process should ...
GitHub user felixcheung opened a pull request: https://github.com/apache/spark/pull/10652

[SPARK-12699][SPARKR] R driver process should start in a clean state

Currently the R worker process is launched with the --vanilla option, which brings it up in a clean state (without init profile or workspace data, https://stat.ethz.ch/R-manual/R-devel/library/base/html/Startup.html). However, the R process for the Spark driver is not.

We should do that because
1. It would make the driver consistent with the worker process in R; for instance, a library would not be loaded in the driver but not in the worker
2. Since SparkR depends on .libPaths and .First(), it could be broken by something in the user workspace, for example

Here are the changes proposed (start R in a clean state in each case):
1. When starting the `sparkR` shell (except: allow save/restore workspace, since the driver/shell is local)
2. When launching the R driver in cluster mode
3. In cluster mode, when calling R to install a shipped R package

This is discussed in PR #10171

@shivaram @sun-rui

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/felixcheung/spark rvanilla

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/10652.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message:

    This closes #10652

commit c3488c9eda1f731c24769f20eb570d97e4aa5939
Author: felixcheung
Date: 2016-01-07T09:13:54Z

    add R command line options

commit 24fee57e42beec3315979b8db4d817474bcd4baa
Author: felixcheung
Date: 2016-01-07T22:40:50Z

    allow save/restore user workspace when running shell
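The first two proposed changes can be sketched as a choice of R startup flags per launch context. This is an illustration only, not the code in the pull request: the `LaunchContext` type, the object and function names, and the exact flag set chosen for the shell are assumptions. The sketch relies on R's documented behavior that `--vanilla` combines `--no-save`, `--no-restore`, `--no-site-file`, `--no-init-file` and `--no-environ`.

```scala
object RStartupSketch {
  // Hypothetical launch contexts corresponding to proposed changes 1 and 2.
  sealed trait LaunchContext
  case object Shell extends LaunchContext         // interactive sparkR shell
  case object ClusterDriver extends LaunchContext // R driver in cluster mode

  def startupFlags(ctx: LaunchContext): Seq[String] = ctx match {
    // Shell: skip site, init and environ files, but omit --no-save and
    // --no-restore so the local user workspace can still be saved and restored.
    case Shell => Seq("--no-site-file", "--no-init-file", "--no-environ")
    // Cluster-mode driver: fully clean state, matching the R worker.
    case ClusterDriver => Seq("--vanilla")
  }
}
```

Modeling the contexts as a sealed trait makes the compiler flag any launch path that has no flag set defined, which fits a change that must treat the shell and the cluster-mode driver differently.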