[ https://issues.apache.org/jira/browse/SPARK-8596?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14613018#comment-14613018 ]

Vincent Warmerdam edited comment on SPARK-8596 at 7/3/15 10:21 AM:
-------------------------------------------------------------------

I now have a more elegant way to get any R shell connected to Spark. If you
have (re)started the Spark standalone cluster with:

```
/root/spark/sbin/stop-all.sh
/root/spark/sbin/start-all.sh
```

then this snippet will collect all the information you need automatically (the
tutorial involved manual steps):

```
region_ip <- system("curl 
http://169.254.169.254/latest/meta-data/public-hostname";, intern=TRUE)
spark_link <- paste0('spark://', region_ip, ':7077')

.libPaths(c(.libPaths(), '/root/spark/R/lib'))
Sys.setenv(SPARK_HOME = '/root/spark')
Sys.setenv(PATH = paste(Sys.getenv(c("PATH")), '/root/spark/bin', sep=':'))
library(SparkR)

sc <- sparkR.init(spark_link)
sqlContext <- sparkRSQL.init(sc)
```
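
For a quick sanity check that the contexts actually work, something like the standard SparkR 1.4 calls below should do (this is just a suggested test, not part of the snippet itself):

```
# Turn a built-in R data frame into a Spark DataFrame and pull back a few rows.
df <- createDataFrame(sqlContext, faithful)
head(df)
```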

This snippet can be made part of the `.Rprofile`, which would allow any user of
RStudio to be connected to Spark automatically. This will only work if
`/root/spark/sbin/start-all.sh` has been run. Do we want this? A possible
downside is that errors will be thrown if the R user doesn't understand that
`start-all.sh` needs to be run first; see the sketch below.
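
A minimal, hedged sketch of what such a defensive `.Rprofile` could look like (the `tryCatch` guard and the hint message are my own additions, not something in the current branch):

```
# Sketch of a defensive .Rprofile: try to connect at startup, but turn a
# failure into a readable hint instead of a raw error.
.First <- function() {
  .libPaths(c(.libPaths(), '/root/spark/R/lib'))
  Sys.setenv(SPARK_HOME = '/root/spark')
  tryCatch({
    library(SparkR)
    region_ip <- system("curl http://169.254.169.254/latest/meta-data/public-hostname",
                        intern = TRUE)
    assign("sc", sparkR.init(paste0('spark://', region_ip, ':7077')), envir = globalenv())
    assign("sqlContext", sparkRSQL.init(get("sc", envir = globalenv())), envir = globalenv())
    message('Spark Context available as "sc".')
    message('Spark SQL Context available as "sqlContext".')
  }, error = function(e) {
    message("Could not connect to Spark. Did you run /root/spark/sbin/start-all.sh?")
  })
}
```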

**edit** 

My current branch does this. After connecting to Spark, the terminal now shows
this as well:

```
      ____              __
     / __/__  ___ _____/ /__
    _\ \/ _ \/ _ `/ __/  '_/
   /__ / .__/\_,_/_/ /_/\_\   version 1.4.0
      /_/

Spark Context available as "sc".
Spark SQL Context available as "sqlContext".
During startup - Warning message:
package ‘SparkR’ was built under R version 3.1.3
```
It doesn't yet work in RStudio, but it can be provided as a startup script.
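
Until that works inside RStudio, one hypothetical workaround is to save the connection snippet above as a plain R script and source it from the RStudio console (the path here is made up):

```
# Hypothetical file holding the connection snippet from above.
source("/root/spark-connect.R")
```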




> Install and configure RStudio server on Spark EC2
> -------------------------------------------------
>
>                 Key: SPARK-8596
>                 URL: https://issues.apache.org/jira/browse/SPARK-8596
>             Project: Spark
>          Issue Type: Improvement
>          Components: EC2, SparkR
>            Reporter: Shivaram Venkataraman
>
> This will make it convenient for R users to use SparkR from their browsers 


