[ https://issues.apache.org/jira/browse/SPARK-8596?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14613018#comment-14613018 ]
Vincent Warmerdam edited comment on SPARK-8596 at 7/3/15 10:21 AM:
-------------------------------------------------------------------

I now have a more elegant way to get any R shell connected to Spark. If you have (re)started the Spark standalone cluster:

```
/root/spark/sbin/stop-all.sh
/root/spark/sbin/start-all.sh
```

then this snippet will collect all the data you need automatically (the tutorial involved manual steps):

```
# public hostname of this EC2 instance, via the instance metadata service
region_ip <- system("curl http://169.254.169.254/latest/meta-data/public-hostname", intern=TRUE)
spark_link <- paste0('spark://', region_ip, ':7077')

.libPaths(c(.libPaths(), '/root/spark/R/lib'))
Sys.setenv(SPARK_HOME = '/root/spark')
Sys.setenv(PATH = paste(Sys.getenv("PATH"), '/root/spark/bin', sep=':'))

library(SparkR)

sc <- sparkR.init(spark_link)
sqlContext <- sparkRSQL.init(sc)
```

This snippet can be made part of the '.Rprofile', which would let any RStudio user be connected to Spark automatically. This only works if `/root/spark/sbin/start-all.sh` has been run. Do we want this? A possible downside is that errors will be thrown if the R user doesn't realise that `start-all.sh` needs to be run first; a guard against that is sketched below.

**edit** My current branch does this. After connecting to Spark, the terminal now shows this as well:

```
      ____              __
     / __/__  ___ _____/ /__
    _\ \/ _ \/ _ `/ __/  '_/
   /__ / .__/\_,_/_/ /_/\_\   version 1.4.0
      /_/

Spark Context available as "sc".
Spark SQL Context available as "sqlContext".

During startup - Warning message:
package ‘SparkR’ was built under R version 3.1.3
```

It doesn't work in RStudio yet, but it can be provided as a startup script.
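Since the point of putting this in '.Rprofile' is that users never touch the plumbing, the failure mode above can be softened. Here is a minimal sketch of such a guarded '.Rprofile', assuming the same EC2 layout as the snippet (Spark in /root/spark, standalone master on port 7077); the tryCatch() wrapper is a suggestion of mine, not something the branch already does:

```
# Sketch of a fault-tolerant '.Rprofile' (assumed layout: /root/spark,
# master on :7077). The tryCatch() guard is hypothetical: it turns the
# "start-all.sh was never run" failure into a readable hint instead of
# an opaque error during R startup.
.First <- function() {
  if (!interactive()) return(invisible())

  .libPaths(c(.libPaths(), '/root/spark/R/lib'))
  Sys.setenv(SPARK_HOME = '/root/spark')
  Sys.setenv(PATH = paste(Sys.getenv("PATH"), '/root/spark/bin', sep = ':'))

  # EC2 instance metadata; -s keeps curl's progress output out of the result
  region_ip <- system(
    "curl -s http://169.254.169.254/latest/meta-data/public-hostname",
    intern = TRUE)
  spark_link <- paste0('spark://', region_ip, ':7077')

  tryCatch({
    library(SparkR)
    # assign into the global environment so the user sees sc/sqlContext
    assign("sc", sparkR.init(spark_link), envir = globalenv())
    assign("sqlContext", sparkRSQL.init(get("sc", envir = globalenv())),
           envir = globalenv())
    message('Spark Context available as "sc".')
    message('Spark SQL Context available as "sqlContext".')
  }, error = function(e) {
    message("Could not connect to Spark at ", spark_link)
    message("Did you run /root/spark/sbin/start-all.sh first?")
  })
}
```

With something like this in place, a user who forgot `start-all.sh` gets a hint at the prompt rather than a cryptic connection error, and an R session outside RStudio still starts normally.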
> Install and configure RStudio server on Spark EC2
> -------------------------------------------------
>
>                 Key: SPARK-8596
>                 URL: https://issues.apache.org/jira/browse/SPARK-8596
>             Project: Spark
>          Issue Type: Improvement
>          Components: EC2, SparkR
>            Reporter: Shivaram Venkataraman
>
> This will make it convenient for R users to use SparkR from their browsers