[jira] [Comment Edited] (SPARK-8596) Install and configure RStudio server on Spark EC2
[ https://issues.apache.org/jira/browse/SPARK-8596?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14616616#comment-14616616 ] Vincent Warmerdam edited comment on SPARK-8596 at 7/7/15 3:59 PM:
--
Made the changes and confirmed that RStudio now works out of the box with the startup script: https://github.com/koaning/spark-ec2/tree/rstudio-install

What is the easiest way to double-check and confirm that this will work properly?

Install and configure RStudio server on Spark EC2

Key: SPARK-8596
URL: https://issues.apache.org/jira/browse/SPARK-8596
Project: Spark
Issue Type: Improvement
Components: EC2, SparkR
Reporter: Shivaram Venkataraman

This will make it convenient for R users to use SparkR from their browsers.
--
This message was sent by Atlassian JIRA (v6.3.4#6332)
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Comment Edited] (SPARK-8596) Install and configure RStudio server on Spark EC2
[ https://issues.apache.org/jira/browse/SPARK-8596?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14615523#comment-14615523 ] Vincent Warmerdam edited comment on SPARK-8596 at 7/6/15 7:49 PM:
--
1. True. Made the commit: https://github.com/koaning/spark-ec2/blob/rstudio-install/rstudio/init.sh#L35-37. Do we want this code here, or somewhere else? It seems like something system-wide rather than RStudio-specific. What about /spark/init.sh instead of /rstudio/init.sh?

2. It gives you a breaking error:

```
sc <- sparkR.init(spark_link)
Launching java with spark-submit command /root/spark/bin/spark-submit sparkr-shell /tmp/RtmpSaSV2q/backend_port53744f8e9f59
/root/spark/conf/spark-env.sh: line 30: ulimit: open files: cannot modify limit: Operation not permitted
```

The `ulimit` call there can only raise the open-files limit when run by root, and the rstudio user is not root.

3. Can I remove this line: https://github.com/koaning/spark-ec2/blob/branch-1.4/templates/root/spark/conf/spark-env.sh#L30? Not all users will want RStudio, so removing it might break things (I'm assuming PySpark might use this script as well?). We do need to change the file if we want RStudio to work, though. Perhaps we can move the `/etc/security/limits.conf` parameters currently set in /rstudio/init.sh to /spark/init.sh?
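A minimal sketch (not the actual spark-ec2 code) of what moving those `/etc/security/limits.conf` parameters into a shared init step could look like. The file path defaults to a scratch file here so the sketch is safe to run anywhere; on the cluster `LIMITS_FILE` would be `/etc/security/limits.conf`, and the 1000000 value is an assumption:

```shell
# Append nofile limits system-wide so non-root users (like the rstudio
# user) inherit them at login, instead of calling ulimit in spark-env.sh.
LIMITS_FILE="${LIMITS_FILE:-/tmp/limits.conf.demo}"
touch "$LIMITS_FILE"
for entry in "* soft nofile 1000000" "* hard nofile 1000000"; do
  # append each limit only once so re-running the init script stays idempotent
  grep -qxF "$entry" "$LIMITS_FILE" || echo "$entry" >> "$LIMITS_FILE"
done
```

Because of the `grep` guard the script can run on every cluster launch without duplicating lines.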
[jira] [Comment Edited] (SPARK-8596) Install and configure RStudio server on Spark EC2
[ https://issues.apache.org/jira/browse/SPARK-8596?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14613018#comment-14613018 ] Vincent Warmerdam edited comment on SPARK-8596 at 7/3/15 10:21 AM:
--
I now have a more elegant way to get any R shell connected to Spark. If you have restarted the cluster:

```
/root/spark/sbin/stop-all.sh
/root/spark/sbin/start-all.sh
```

then this snippet collects everything you need automatically (the tutorial involved manual steps):

```
region_ip <- system("curl http://169.254.169.254/latest/meta-data/public-hostname", intern=TRUE)
spark_link <- paste0('spark://', region_ip, ':7077')
.libPaths(c(.libPaths(), '/root/spark/R/lib'))
Sys.setenv(SPARK_HOME = '/root/spark')
Sys.setenv(PATH = paste(Sys.getenv(c("PATH")), '/root/spark/bin', sep=':'))
library(SparkR)
sc <- sparkR.init(spark_link)
sqlContext <- sparkRSQL.init(sc)
```

This snippet can be made part of the `.Rprofile`, which will connect any RStudio user to Spark automatically. It only works if `/root/spark/sbin/start-all.sh` has been run. Do we want this? A possible downside is that errors will be thrown if the R user doesn't understand that `start-all.sh` needs to be run first.

**edit** My current branch does this. After connecting to Spark, the terminal now shows this as well:

```
    ____              __
   / __/__  ___ _____/ /__
  _\ \/ _ \/ _ `/ __/  '_/
 /___/ .__/\_,_/_/ /_/\_\   version 1.4.0
    /_/

Spark Context available as sc.
Spark SQL Context available as sqlContext.
During startup - Warning message:
package ‘SparkR’ was built under R version 3.1.3
```

It doesn't yet work in RStudio, but it can be provided as a startup script.
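For completeness, here is a shell analogue of the hostname lookup the R snippet performs; the hostname and the URL construction are the only moving parts (7077 is the standalone master's default port, and the metadata address is the one from the comment above):

```shell
# Build the standalone-master URL from a hostname. On the master node the
# hostname would come from the EC2 metadata service:
#   curl -s http://169.254.169.254/latest/meta-data/public-hostname
get_spark_link() {
  echo "spark://$1:7077"
}

get_spark_link "ec2-54-0-0-1.eu-west-1.compute.amazonaws.com"
# -> spark://ec2-54-0-0-1.eu-west-1.compute.amazonaws.com:7077
```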
[jira] [Comment Edited] (SPARK-8596) Install and configure RStudio server on Spark EC2
[ https://issues.apache.org/jira/browse/SPARK-8596?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14611832#comment-14611832 ] Vincent Warmerdam edited comment on SPARK-8596 at 7/2/15 11:55 AM:
--
By the way, I now have scripts that install RStudio (just ran and confirmed). The code is here:

https://github.com/koaning/spark-ec2/tree/rstudio-install (added rstudio as a module)
https://github.com/koaning/spark/tree/rstudio-install

When initializing with this command:

```
./spark-ec2 --key-pair=spark-df \
  --identity-file=/Users/code/Downloads/spark-df.pem \
  --region=eu-west-1 -s 1 --instance-type=c3.2xlarge \
  --spark-ec2-git-repo=https://github.com/koaning/spark-ec2 \
  --spark-ec2-git-branch=rstudio-install launch mysparkr
```

I can confirm that RStudio is installed and that a correct user is added. There are two concerns:

- Should we not force the user to supply the password themselves? Setting a standard password seems like a security vulnerability.
- I am not sure whether this gets installed on all the slave nodes. I added this module (https://github.com/koaning/spark-ec2/blob/rstudio-install/rstudio/init.sh) and we only need it on the master node. I wonder what the best way is to ensure this.
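One way to keep the module master-only would be a guard at the top of its init script. This is a hypothetical sketch: the masters-file path is an assumption about the spark-ec2 layout and would need verifying.

```shell
# Hypothetical guard: succeed only when this host is listed in the file of
# master hostnames that spark-ec2 is assumed to write at cluster setup.
is_master() {
  masters_file="${1:-/root/spark-ec2/masters}"
  grep -qxF "$(hostname)" "$masters_file"
}
```

Something like `is_master || exit 0` at the top of rstudio/init.sh would then make the script a no-op on the slaves.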
[jira] [Comment Edited] (SPARK-8596) Install and configure RStudio server on Spark EC2
[ https://issues.apache.org/jira/browse/SPARK-8596?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14604587#comment-14604587 ] Vincent Warmerdam edited comment on SPARK-8596 at 6/28/15 8:49 AM:
--
1. On it. Just created [SPARK-8596][EC2] Added port for Rstudio.
2. On it.
[jira] [Comment Edited] (SPARK-8596) Install and configure RStudio server on Spark EC2
[ https://issues.apache.org/jira/browse/SPARK-8596?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14604053#comment-14604053 ] Vincent Warmerdam edited comment on SPARK-8596 at 6/27/15 8:31 AM:
--
I'm writing a small tutorial to get up to speed with RStudio on AWS. It works. The main issue seems to be that ec2 currently installs an old version of R (3.1), while most packages like ggplot2 require a newer version (3.2). I'm going to share the tutorial with the RStudio folks soon.

My approach is to run `spark/sbin/start-all.sh` on the master node and then run the following commands in RStudio on the master node:

```
.libPaths(c(.libPaths(), '/root/spark/R/lib'))
Sys.setenv(SPARK_HOME = '/root/spark')
Sys.setenv(PATH = paste(Sys.getenv(c("PATH")), '/root/spark/bin', sep=':'))
library(SparkR)
sc <- sparkR.init('SPARK MASTER ADR')
sqlContext <- sparkRSQL.init(sc)
```

This works on my end, and I've been able to use the dataframe API on a JSON blob on S3 with this sqlContext. The main issue on my end is that, because of the old R version, I can't install visualisation/knitr packages. The Spark dataframe works like a charm in the GUI, though.
[jira] [Comment Edited] (SPARK-8596) Install and configure RStudio server on Spark EC2
[ https://issues.apache.org/jira/browse/SPARK-8596?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14604328#comment-14604328 ] Vincent Warmerdam edited comment on SPARK-8596 at 6/27/15 8:07 PM:
--
RStudio doesn't want to be run as the root user in general, and it doesn't do https out of the box, so port sniffing becomes a real security risk if somebody can use the web UI to gain root access. Instead, you can just add a new user:

```
$ useradd analyst
$ passwd analyst
```

This user will then be able to log in. Note that in order to reach RStudio you will also need to edit the security group for the master node to allow TCP connections to this port (8787 by default). I'd love to help out and spend some time on these issues, by the way.

I've got a small tutorial .md file ready; can I share that via Jira? I would like to double-check it with you because I may be doing a dirty trick. For RStudio not to give errors, I remove a line of code in a shell script (because the new user is not root, it cannot run `ulimit` commands). Dirty trick (run as root):

```
sed -e 's/^ulimit/#ulimit/g' /root/spark/conf/spark-env.sh > /root/spark/conf/spark-env2.sh
mv /root/spark/conf/spark-env2.sh /root/spark/conf/spark-env.sh
ulimit -n 100
```

Installing the right R version has always been a bit tricky. Will follow the other Jira ticket as well; let me know when it is up.
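A slightly safer variant of that trick is to comment the `ulimit` line out in place with `sed -i`, keeping a backup, instead of round-tripping through a second file. In this sketch `SPARK_ENV` defaults to a scratch copy so it can run anywhere; on the cluster it would be /root/spark/conf/spark-env.sh, and the limit value in the stand-in line is an assumption:

```shell
# Comment out the ulimit line in spark-env.sh, keeping a .bak backup.
SPARK_ENV="${SPARK_ENV:-/tmp/spark-env.sh.demo}"
printf 'ulimit -n 1000000\n' > "$SPARK_ENV"   # stand-in for the real file
sed -i.bak -e 's/^ulimit/#ulimit/' "$SPARK_ENV"
```

The `.bak` copy makes the change easy to revert if some other script on the node turns out to depend on the `ulimit` call.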