[jira] [Comment Edited] (SPARK-8596) Install and configure RStudio server on Spark EC2

2015-07-07 Thread Vincent Warmerdam (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-8596?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14616616#comment-14616616
 ] 

Vincent Warmerdam edited comment on SPARK-8596 at 7/7/15 3:59 PM:
--

Made the changes and confirmed that RStudio now works out of the box with the 
startup script. 

https://github.com/koaning/spark-ec2/tree/rstudio-install

What is the easiest way to double-check and confirm that this will work 
properly? 
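
One way to smoke-test it, as a sketch: relaunch a throwaway cluster that points 
spark-ec2 at the fork (the flags are the ones from the launch command further 
down this thread; substitute your own key pair, identity file, region, and 
cluster name) and check that RStudio Server answers on its default port 8787 on 
the master node. `MASTER_HOST` stands for the public DNS name that spark-ec2 
prints at the end of the launch.

```
./spark-ec2 --key-pair=spark-df \
  --identity-file=/Users/code/Downloads/spark-df.pem \
  --region=eu-west-1 -s 1 --instance-type=c3.2xlarge \
  --spark-ec2-git-repo=https://github.com/koaning/spark-ec2 \
  --spark-ec2-git-branch=rstudio-install launch rstudio-test

# An HTTP 200 here means the RStudio Server login page is being served.
curl -s -o /dev/null -w "%{http_code}\n" http://MASTER_HOST:8787
```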


was (Author: cantdutchthis):
Made the changes and confirmed that RStudio now works out of the box with the 
startup script. 

https://github.com/koaning/spark-ec2/tree/rstudio-install

 Install and configure RStudio server on Spark EC2
 -

 Key: SPARK-8596
 URL: https://issues.apache.org/jira/browse/SPARK-8596
 Project: Spark
  Issue Type: Improvement
  Components: EC2, SparkR
Reporter: Shivaram Venkataraman

 This will make it convenient for R users to use SparkR from their browsers 






[jira] [Comment Edited] (SPARK-8596) Install and configure RStudio server on Spark EC2

2015-07-06 Thread Vincent Warmerdam (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-8596?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14615523#comment-14615523
 ] 

Vincent Warmerdam edited comment on SPARK-8596 at 7/6/15 7:49 PM:
--

1. True. Made the commit: 
https://github.com/koaning/spark-ec2/blob/rstudio-install/rstudio/init.sh#L35-37.
 Do we want this code here, or somewhere else? It seems like this is something 
system-wide and not RStudio-specific... what about /spark/init.sh instead of 
/rstudio/init.sh? 

2. It gives you a breaking error: 

```
> sc <- sparkR.init(spark_link)
Launching java with spark-submit command /root/spark/bin/spark-submit  
sparkr-shell /tmp/RtmpSaSV2q/backend_port53744f8e9f59 
/root/spark/conf/spark-env.sh: line 30: ulimit: open files: cannot modify 
limit: Operation not permitted
```

Raising the open-files limit with `ulimit` requires root, so the line fails for 
the rstudio user.

3. Can I remove this line? 
https://github.com/koaning/spark-ec2/blob/branch-1.4/templates/root/spark/conf/spark-env.sh#L30

Not all users will want to use RStudio, so removing it might break things (I'm 
assuming pyspark might use this script as well?). We do need to change the file 
if we want RStudio to work, though. Perhaps we can move the new 
`/etc/security/limits.conf` parameters currently set in /rstudio/init.sh to 
/spark/init.sh? A sketch of what that could look like follows. 
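
A minimal sketch of what that init-time change could look like, assuming it 
runs as root during cluster setup (the limit value is a placeholder, not 
necessarily the one spark-env.sh uses):

```
# Raise the open-files limit for all users via /etc/security/limits.conf,
# so spark-env.sh no longer needs a root-only `ulimit` call.
cat >> /etc/security/limits.conf <<'EOF'
*  soft  nofile  1000000
*  hard  nofile  1000000
EOF
# PAM applies limits.conf at login, so the rstudio user picks up the new
# limit on its next session without needing root.
```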




was (Author: cantdutchthis):
1. True. Made the commit: 
https://github.com/koaning/spark-ec2/blob/rstudio-install/rstudio/init.sh#L35-37.
 Do we want this code here, or somewhere else? It seems like this is something 
system-wide and not RStudio-specific... what about /spark/init.sh instead of 
/rstudio/init.sh? 

2. It gives you a breaking error: 

```
> sc <- sparkR.init(spark_link)
Launching java with spark-submit command /root/spark/bin/spark-submit  
sparkr-shell /tmp/RtmpSaSV2q/backend_port53744f8e9f59 
/root/spark/conf/spark-env.sh: line 30: ulimit: open files: cannot modify 
limit: Operation not permitted
```

Raising the open-files limit with `ulimit` requires root, so the line fails for 
the rstudio user.

3. Can I remove this line? 
https://github.com/koaning/spark-ec2/blob/branch-1.4/templates/root/spark/conf/spark-env.sh#L30

Not all users will want to use RStudio, so removing it might break things (I'm 
assuming pyspark might use this script as well?). We do need to change the file 
if we want RStudio to work, though. Perhaps we can move the new 
`/etc/security/limits.conf` parameters in here? 









[jira] [Comment Edited] (SPARK-8596) Install and configure RStudio server on Spark EC2

2015-07-06 Thread Vincent Warmerdam (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-8596?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14615523#comment-14615523
 ] 

Vincent Warmerdam edited comment on SPARK-8596 at 7/6/15 7:48 PM:
--

1. True. Made the commit: 
https://github.com/koaning/spark-ec2/blob/rstudio-install/rstudio/init.sh#L35-37.
 Do we want this code here, or somewhere else? It seems like this is something 
system-wide and not RStudio-specific... what about /spark/init.sh instead of 
/rstudio/init.sh? 

2. It gives you a breaking error: 

```
> sc <- sparkR.init(spark_link)
Launching java with spark-submit command /root/spark/bin/spark-submit  
sparkr-shell /tmp/RtmpSaSV2q/backend_port53744f8e9f59 
/root/spark/conf/spark-env.sh: line 30: ulimit: open files: cannot modify 
limit: Operation not permitted
```

Raising the open-files limit with `ulimit` requires root, so the line fails for 
the rstudio user.

3. Can I remove this line? 
https://github.com/koaning/spark-ec2/blob/branch-1.4/templates/root/spark/conf/spark-env.sh#L30

Not all users will want to use RStudio, so removing it might break things (I'm 
assuming pyspark might use this script as well?). We do need to change the file 
if we want RStudio to work, though. Perhaps we can move the new 
`/etc/security/limits.conf` parameters in here? 




was (Author: cantdutchthis):
1. True. Made the commit: 
https://github.com/koaning/spark-ec2/blob/rstudio-install/rstudio/init.sh#L35-37.
 Do we want this code here, or somewhere else? It seems like this is something 
system-wide and not RStudio-specific... what about /spark/init.sh instead of 
/rstudio/init.sh? 

2. It gives you a breaking error: 

```
> sc <- sparkR.init(spark_link)
Launching java with spark-submit command /root/spark/bin/spark-submit  
sparkr-shell /tmp/RtmpSaSV2q/backend_port53744f8e9f59 
/root/spark/conf/spark-env.sh: line 30: ulimit: open files: cannot modify 
limit: Operation not permitted
```

Raising the open-files limit with `ulimit` requires root, and the rstudio user 
isn't root. 

3. Can I remove this line? 
https://github.com/koaning/spark-ec2/blob/branch-1.4/templates/root/spark/conf/spark-env.sh#L30

Not all users will want to use RStudio, and I'm not sure how this might break 
things (I'm assuming pyspark might use this script as well?). Perhaps we can 
move the new `/etc/security/limits.conf` parameters in here? 









[jira] [Comment Edited] (SPARK-8596) Install and configure RStudio server on Spark EC2

2015-07-03 Thread Vincent Warmerdam (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-8596?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14613018#comment-14613018
 ] 

Vincent Warmerdam edited comment on SPARK-8596 at 7/3/15 9:59 AM:
--

I now have a more elegant way to get any R shell connected to Spark. If you 
have run the start scripts: 

```
/root/spark/sbin/stop-all.sh
/root/spark/sbin/start-all.sh
```

then this snippet will collect all the settings you need automatically (the 
tutorial involved manual steps): 

```
region_ip <- system("curl http://169.254.169.254/latest/meta-data/public-hostname", intern=TRUE)
spark_link <- paste0('spark://', region_ip, ':7077')

.libPaths(c(.libPaths(), '/root/spark/R/lib'))
Sys.setenv(SPARK_HOME = '/root/spark')
Sys.setenv(PATH = paste(Sys.getenv(c("PATH")), '/root/spark/bin', sep=':'))
library(SparkR)

sc <- sparkR.init(spark_link)
sqlContext <- sparkRSQL.init(sc)
```

This snippet can be made part of the '.Rprofile', which will allow any user of 
RStudio to be connected to Spark automatically. This will only work if 
`/root/spark/sbin/start-all.sh` has been run. Do we want this? A possible 
downside is that errors will be thrown if the R user doesn't understand that 
`start-all.sh` needs to be run first; a guarded sketch follows below. 
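
A minimal sketch of how that could be wired up, assuming it runs as root from 
the rstudio module's setup script; the snippet file path (/root/spark-connect.R, 
holding the connection code above) and the rstudio user's home directory are 
assumptions. The guard only sources the snippet when something is listening on 
the master port, so users who skipped `start-all.sh` get a plain R session 
instead of an error:

```
# Install a guarded ~/.Rprofile for the rstudio user. The R code in the
# heredoc probes port 7077 and only runs the SparkR connection snippet
# when the Spark master is actually up.
cat > /home/rstudio/.Rprofile <<'EOF'
con <- try(suppressWarnings(
  socketConnection("localhost", port = 7077, timeout = 1)), silent = TRUE)
if (!inherits(con, "try-error")) {
  close(con)
  source("/root/spark-connect.R")
}
EOF
chown rstudio:rstudio /home/rstudio/.Rprofile
```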

**edit** 

My current branch does this. After connecting to Spark, the terminal now shows 
this as well: 

```
    ____              __
   / __/__  ___ _____/ /__
  _\ \/ _ \/ _ `/ __/  '_/
 /___/ .__/\_,_/_/ /_/\_\   version 1.4.0
    /_/

Spark Context available as sc.
Spark SQL Context available as sqlContext.
During startup - Warning message:
package ‘SparkR’ was built under R version 3.1.3
```



was (Author: cantdutchthis):
I now have a more elegant way to get any R shell connected to Spark. If you 
have run the start scripts: 

```
/root/spark/sbin/stop-all.sh
/root/spark/sbin/start-all.sh
```

then this snippet will collect all the settings you need automatically (the 
tutorial involved manual steps): 

```
region_ip <- system("curl http://169.254.169.254/latest/meta-data/public-hostname", intern=TRUE)
spark_link <- paste0('spark://', region_ip, ':7077')

.libPaths(c(.libPaths(), '/root/spark/R/lib'))
Sys.setenv(SPARK_HOME = '/root/spark')
Sys.setenv(PATH = paste(Sys.getenv(c("PATH")), '/root/spark/bin', sep=':'))
library(SparkR)

sc <- sparkR.init(spark_link)
sqlContext <- sparkRSQL.init(sc)
```

This snippet can be made part of the '.Rprofile', which will allow any user of 
RStudio to be connected to Spark automatically. This will only work if 
`/root/spark/sbin/start-all.sh` has been run. Do we want this? A possible 
downside is that errors will be thrown if the R user doesn't understand that 
`start-all.sh` needs to be run first. 







[jira] [Comment Edited] (SPARK-8596) Install and configure RStudio server on Spark EC2

2015-07-03 Thread Vincent Warmerdam (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-8596?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14613018#comment-14613018
 ] 

Vincent Warmerdam edited comment on SPARK-8596 at 7/3/15 10:21 AM:
---

I now have a more elegant way to get any R shell connected to Spark. If you 
have run the start scripts: 

```
/root/spark/sbin/stop-all.sh
/root/spark/sbin/start-all.sh
```

then this snippet will collect all the settings you need automatically (the 
tutorial involved manual steps): 

```
region_ip <- system("curl http://169.254.169.254/latest/meta-data/public-hostname", intern=TRUE)
spark_link <- paste0('spark://', region_ip, ':7077')

.libPaths(c(.libPaths(), '/root/spark/R/lib'))
Sys.setenv(SPARK_HOME = '/root/spark')
Sys.setenv(PATH = paste(Sys.getenv(c("PATH")), '/root/spark/bin', sep=':'))
library(SparkR)

sc <- sparkR.init(spark_link)
sqlContext <- sparkRSQL.init(sc)
```

This snippet can be made part of the '.Rprofile', which will allow any user of 
RStudio to be connected to Spark automatically. This will only work if 
`/root/spark/sbin/start-all.sh` has been run. Do we want this? A possible 
downside is that errors will be thrown if the R user doesn't understand that 
`start-all.sh` needs to be run first. 

**edit** 

My current branch does this. After connecting to Spark, the terminal now shows 
this as well: 

```
    ____              __
   / __/__  ___ _____/ /__
  _\ \/ _ \/ _ `/ __/  '_/
 /___/ .__/\_,_/_/ /_/\_\   version 1.4.0
    /_/

Spark Context available as sc.
Spark SQL Context available as sqlContext.
During startup - Warning message:
package ‘SparkR’ was built under R version 3.1.3
```
It doesn't yet work in RStudio, but it can be provided as a startup script. 


was (Author: cantdutchthis):
I now have a more elegant way to get any R shell connected to Spark. If you 
have run the start scripts: 

```
/root/spark/sbin/stop-all.sh
/root/spark/sbin/start-all.sh
```

then this snippet will collect all the settings you need automatically (the 
tutorial involved manual steps): 

```
region_ip <- system("curl http://169.254.169.254/latest/meta-data/public-hostname", intern=TRUE)
spark_link <- paste0('spark://', region_ip, ':7077')

.libPaths(c(.libPaths(), '/root/spark/R/lib'))
Sys.setenv(SPARK_HOME = '/root/spark')
Sys.setenv(PATH = paste(Sys.getenv(c("PATH")), '/root/spark/bin', sep=':'))
library(SparkR)

sc <- sparkR.init(spark_link)
sqlContext <- sparkRSQL.init(sc)
```

This snippet can be made part of the '.Rprofile', which will allow any user of 
RStudio to be connected to Spark automatically. This will only work if 
`/root/spark/sbin/start-all.sh` has been run. Do we want this? A possible 
downside is that errors will be thrown if the R user doesn't understand that 
`start-all.sh` needs to be run first. 

**edit** 

My current branch does this. After connecting to Spark, the terminal now shows 
this as well: 

```
    ____              __
   / __/__  ___ _____/ /__
  _\ \/ _ \/ _ `/ __/  '_/
 /___/ .__/\_,_/_/ /_/\_\   version 1.4.0
    /_/

Spark Context available as sc.
Spark SQL Context available as sqlContext.
During startup - Warning message:
package ‘SparkR’ was built under R version 3.1.3
```








[jira] [Comment Edited] (SPARK-8596) Install and configure RStudio server on Spark EC2

2015-07-02 Thread Vincent Warmerdam (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-8596?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14611832#comment-14611832
 ] 

Vincent Warmerdam edited comment on SPARK-8596 at 7/2/15 11:55 AM:
---

By the way, I now have scripts that install RStudio (just ran and confirmed). 

The code is here: 

https://github.com/koaning/spark-ec2/tree/rstudio-install (added rstudio as a 
module) 
https://github.com/koaning/spark/tree/rstudio-install

When initializing with this command: 

./spark-ec2 --key-pair=spark-df 
--identity-file=/Users/code/Downloads/spark-df.pem --region=eu-west-1 -s 1 
--instance-type=c3.2xlarge 
--spark-ec2-git-repo=https://github.com/koaning/spark-ec2 
--spark-ec2-git-branch=rstudio-install launch mysparkr

I can confirm that RStudio is installed and that a correct user is added. 
There are two concerns (see the password sketch below for the first):

- should we not force the user to supply the password themselves? Setting a 
standard password seems like a security vulnerability. 
- I am not sure if this gets installed on all the slave nodes. I added this 
module 
(https://github.com/koaning/spark-ec2/blob/rstudio-install/rstudio/init.sh) and 
we only need it on the master node. I wonder what the best way is to ensure 
this.
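
On the password point, a possible middle ground, as a sketch: generate a random 
password per cluster at setup time instead of hard-coding one. The user name is 
an assumption (whatever user the module adds):

```
# Generate a random password and print it once so the operator can note
# it down; nothing is hard-coded into the module.
RSTUDIO_USER=rstudio                  # assumption: user added by the module
RSTUDIO_PASS=$(openssl rand -hex 12)  # 24-character random password
echo "$RSTUDIO_USER:$RSTUDIO_PASS" | chpasswd
echo "RStudio login: user=$RSTUDIO_USER password=$RSTUDIO_PASS"
```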


was (Author: cantdutchthis):
By the way, I now have scripts that install RStudio (just ran and confirmed). 

The code is here: 

https://github.com/koaning/spark-ec2/tree/rstudio-install
https://github.com/koaning/spark/tree/rstudio-install

When initializing with this command: 

./spark-ec2 --key-pair=spark-df 
--identity-file=/Users/code/Downloads/spark-df.pem --region=eu-west-1 -s 1 
--instance-type=c3.2xlarge 
--spark-ec2-git-repo=https://github.com/koaning/spark-ec2 
--spark-ec2-git-branch=rstudio-install launch mysparkr

I can confirm that RStudio is installed and that a correct user is added. 
There are two concerns:

- should we not force the user to supply the password themselves? Setting a 
standard password seems like a security vulnerability. 
- I am not sure if this gets installed on all the slave nodes. I added this 
module 
(https://github.com/koaning/spark-ec2/blob/rstudio-install/rstudio/init.sh) and 
we only need it on the master node. I wonder what the best way is to ensure 
this.







[jira] [Comment Edited] (SPARK-8596) Install and configure RStudio server on Spark EC2

2015-06-28 Thread Vincent Warmerdam (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-8596?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14604587#comment-14604587
 ] 

Vincent Warmerdam edited comment on SPARK-8596 at 6/28/15 8:49 AM:
---

1. On it. Just created "[SPARK-8596][EC2] Added port for Rstudio". 
2. On it. 




was (Author: cantdutchthis):
1. On it. 









[jira] [Comment Edited] (SPARK-8596) Install and configure RStudio server on Spark EC2

2015-06-27 Thread Vincent Warmerdam (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-8596?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14604053#comment-14604053
 ] 

Vincent Warmerdam edited comment on SPARK-8596 at 6/27/15 8:31 AM:
---

I'm writing a small tutorial on getting up and running with RStudio on AWS. It 
works. The main issue seems to be that ec2 currently installs an old version of 
R (3.1) while most packages, like ggplot2, require a newer version (3.2). I'm 
going to share the tutorial with the RStudio guys soon. 

My approach is to run `spark/sbin/start-all.sh` on the master node and then run 
the following commands in RStudio on the master node: 

.libPaths(c(.libPaths(), '/root/spark/R/lib'))
Sys.setenv(SPARK_HOME = '/root/spark')
Sys.setenv(PATH = paste(Sys.getenv(c("PATH")), '/root/spark/bin', sep=':'))
library(SparkR)
sc <- sparkR.init('SPARK MASTER ADR')
sqlContext <- sparkRSQL.init(sc)

This works on my end, and I've been able to use the dataframe API with a JSON 
blob on S3 with this sqlContext. The main issue on my end is that because of 
the old R version I can't install visualisation/knitr packages. The Spark 
dataframe works like a charm in the GUI though. 


was (Author: cantdutchthis):
I'm writing a small tutorial on getting up and running with RStudio on AWS. It 
works. The main issue seems to be that ec2 currently installs an old version of 
R (3.1) while most packages, like ggplot2, require a newer version (3.2). I'm 
going to share the tutorial with the RStudio guys soon. 

My approach is to run `spark/sbin/start-all.sh` on the master node and then run 
the following commands in RStudio on the master node: 

.libPaths(c(.libPaths(), '/root/spark/R/lib'))
Sys.setenv(SPARK_HOME = '/root/spark')
Sys.setenv(PATH = paste(Sys.getenv(c("PATH")), '/root/spark/bin', sep=':'))
library(SparkR)
sc <- sparkR.init('SPARK MASTER ADR')
sqlContext <- sparkRSQL.init(sc)

This works on my end, and I've been able to use the dataframe API with a JSON 
blob on S3 with this sqlContext. The main issue on my end is that because of 
the old R version I can't install visualisation/knitr packages. The Spark 
dataframe works like a charm in the GUI though. 









[jira] [Comment Edited] (SPARK-8596) Install and configure RStudio server on Spark EC2

2015-06-27 Thread Vincent Warmerdam (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-8596?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14604053#comment-14604053
 ] 

Vincent Warmerdam edited comment on SPARK-8596 at 6/27/15 8:03 AM:
---

I'm writing a small tutorial on getting up and running with RStudio on AWS. It 
works. The main issue seems to be that ec2 currently installs an old version of 
R (3.1) while most packages, like ggplot2, require a newer version (3.2). I'm 
going to share the tutorial with the RStudio guys soon. 

My approach is to run `spark/sbin/start-all.sh` on the master node and then run 
the following commands in RStudio on the master node: 

.libPaths(c(.libPaths(), '/root/spark/R/lib'))
Sys.setenv(SPARK_HOME = '/root/spark')
Sys.setenv(PATH = paste(Sys.getenv(c("PATH")), '/root/spark/bin', sep=':'))
library(SparkR)
sc <- sparkR.init('SPARK MASTER ADR')
sqlContext <- sparkRSQL.init(sc)

This works on my end, and I've been able to use the dataframe API with a JSON 
blob on S3 with this sqlContext. The main issue on my end is that because of 
the old R version I can't install visualisation/knitr packages. The Spark 
dataframe works like a charm in the GUI though. 




was (Author: cantdutchthis):
I'm writing a small tutorial on getting up and running with RStudio on AWS. It 
works. The main issue seems to be that ec2 currently installs an old version of 
R (3.1) while most packages, like ggplot2, require a newer version (3.2). I'm 
going to share the tutorial with the RStudio guys soon. 

My approach is to run `spark/sbin/start-all.sh` on the master node and then run 
the following commands in RStudio on the master node: 

.libPaths(c(.libPaths(), '/root/spark/R/lib'))
Sys.setenv(SPARK_HOME = '/root/spark')
Sys.setenv(PATH = paste(Sys.getenv(c("PATH")), '/root/spark/bin', sep=':'))
library(SparkR)
sc <- sparkR.init('SPARK MASTER ADR')
sqlContext <- sparkRSQL.init(sc)

This works on my end, and I've been able to use the dataframe API with a JSON 
blob on S3 with this sqlContext. 







[jira] [Comment Edited] (SPARK-8596) Install and configure RStudio server on Spark EC2

2015-06-27 Thread Vincent Warmerdam (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-8596?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14604053#comment-14604053
 ] 

Vincent Warmerdam edited comment on SPARK-8596 at 6/27/15 8:04 AM:
---

I'm writing a small tutorial on getting up and running with RStudio on AWS. It 
works. The main issue seems to be that ec2 currently installs an old version of 
R (3.1) while most packages, like ggplot2, require a newer version (3.2). I'm 
going to share the tutorial with the RStudio guys soon. 

My approach is to run `spark/sbin/start-all.sh` on the master node and then run 
the following commands in RStudio on the master node: 

.libPaths(c(.libPaths(), '/root/spark/R/lib'))
Sys.setenv(SPARK_HOME = '/root/spark')
Sys.setenv(PATH = paste(Sys.getenv(c("PATH")), '/root/spark/bin', sep=':'))
library(SparkR)
sc <- sparkR.init('SPARK MASTER ADR')
sqlContext <- sparkRSQL.init(sc)

This works on my end, and I've been able to use the dataframe API with a JSON 
blob on S3 with this sqlContext. The main issue on my end is that because of 
the old R version I can't install visualisation/knitr packages. The Spark 
dataframe works like a charm in the GUI though. 




was (Author: cantdutchthis):
I'm writing a small tutorial on getting up and running with RStudio on AWS. It 
works. The main issue seems to be that ec2 currently installs an old version of 
R (3.1) while most packages, like ggplot2, require a newer version (3.2). I'm 
going to share the tutorial with the RStudio guys soon. 

My approach is to run `spark/sbin/start-all.sh` on the master node and then run 
the following commands in RStudio on the master node: 

.libPaths(c(.libPaths(), '/root/spark/R/lib'))
Sys.setenv(SPARK_HOME = '/root/spark')
Sys.setenv(PATH = paste(Sys.getenv(c("PATH")), '/root/spark/bin', sep=':'))
library(SparkR)
sc <- sparkR.init('SPARK MASTER ADR')
sqlContext <- sparkRSQL.init(sc)

This works on my end, and I've been able to use the dataframe API with a JSON 
blob on S3 with this sqlContext. The main issue on my end is that because of 
the old R version I can't install visualisation/knitr packages. The Spark 
dataframe works like a charm in the GUI though. 









[jira] [Comment Edited] (SPARK-8596) Install and configure RStudio server on Spark EC2

2015-06-27 Thread Vincent Warmerdam (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-8596?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14604328#comment-14604328
 ] 

Vincent Warmerdam edited comment on SPARK-8596 at 6/27/15 8:07 PM:
---

RStudio generally shouldn't be run as the root user. RStudio doesn't ship with 
HTTPS out of the box, so any port sniffing becomes a security risk if somebody 
can use the web UI to gain root access. 

Instead, you can just add a new user: 

$ useradd analyst 
$ passwd analyst

This user will then be able to log in. Note that in order to see RStudio, you 
will also need to edit the security group for the master node to allow TCP 
connections to this port; a sketch follows below. 
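
For reference, a sketch of that security-group edit with the AWS CLI, assuming 
the cluster was launched as "mysparkr" (so spark-ec2's master group is named 
mysparkr-master) and that RStudio Server listens on its default port 8787:

```
# Open RStudio Server's default port on the master's security group.
# 0.0.0.0/0 is wide open; restrict the CIDR to your own IP range.
aws ec2 authorize-security-group-ingress \
  --region eu-west-1 \
  --group-name mysparkr-master \
  --protocol tcp --port 8787 --cidr 0.0.0.0/0
```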

I'd love to help out and spend some time on these issues, by the way. I've got 
a small tutorial .md file ready; can I share that via Jira? I'd like to 
double-check it with you because I may be doing a dirty trick: to keep RStudio 
from throwing errors, I remove a line of code in a shell script (because this 
new user is not root, it cannot raise limits with `ulimit`). 

Dirty trick (run as root): 

sed -e 's/^ulimit/#ulimit/g' /root/spark/conf/spark-env.sh > /root/spark/conf/spark-env2.sh
mv /root/spark/conf/spark-env2.sh /root/spark/conf/spark-env.sh
ulimit -n 100
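
For what it's worth, the same edit can be done in place with GNU sed's `-i` 
flag, which avoids the temporary spark-env2.sh file. A sketch (it still has to 
run as root and still edits a file other modules may rely on):

```
# Comment out the ulimit line(s) in spark-env.sh in place.
sed -i 's/^ulimit/#ulimit/' /root/spark/conf/spark-env.sh
```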

Installing the right R version has always been a bit tricky. I will follow the 
other Jira ticket as well; let me know when it is up. 



was (Author: cantdutchthis):
RStudio generally shouldn't be run as the root user. RStudio doesn't ship with 
HTTPS out of the box, so any port sniffing becomes a security risk if somebody 
can use the web UI to gain root access. 

Instead, you can just add a new user: 

$ useradd analyst 
$ passwd analyst

This user will then be able to log in. Note that in order to see RStudio, you 
will also need to edit the security group for the master node to allow TCP 
connections to this port. 

I'd love to help out and spend some time on these issues, by the way. I've got 
a small tutorial .md file ready; can I share that via Jira? I'd like to 
double-check it with you because I may be doing a dirty trick: to keep RStudio 
from throwing errors, I remove a line of code in a shell script (because this 
new user is not root, it cannot raise limits with `ulimit`). 

Dirty trick (run as root): 

sed -e 's/^ulimit/#ulimit/g' /root/spark/conf/spark-env.sh > /root/spark/conf/spark-env2.sh
mv /root/spark/conf/spark-env2.sh /root/spark/conf/spark-env.sh
ulimit -n 100

Installing the right R version has always been a bit tricky. I will follow the 
other Jira ticket as well. 




