[jira] [Commented] (SPARK-8596) Install and configure RStudio server on Spark EC2
[ https://issues.apache.org/jira/browse/SPARK-8596?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14624335#comment-14624335 ]

Apache Spark commented on SPARK-8596:
-------------------------------------

User 'koaning' has created a pull request for this issue:
https://github.com/apache/spark/pull/7366

> Install and configure RStudio server on Spark EC2
> -------------------------------------------------
>
>                 Key: SPARK-8596
>                 URL: https://issues.apache.org/jira/browse/SPARK-8596
>             Project: Spark
>          Issue Type: Improvement
>          Components: EC2, SparkR
>            Reporter: Shivaram Venkataraman
>
> This will make it convenient for R users to use SparkR from their browsers

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org
[ https://issues.apache.org/jira/browse/SPARK-8596?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14618863#comment-14618863 ]

Shivaram Venkataraman commented on SPARK-8596:
----------------------------------------------

Thanks for the PR -- will review it today. And we don't have anything like this open for IPython as far as I know; you can open a new JIRA and discuss it there.
[ https://issues.apache.org/jira/browse/SPARK-8596?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14618064#comment-14618064 ]

Vincent Warmerdam commented on SPARK-8596:
------------------------------------------

I can confirm it works, but I can't confirm that I'm not breaking anything else, so I was wondering whether there is some sort of test script to check that this provisioning script works. Anyway: https://github.com/mesos/spark-ec2/pull/129

One final question: do we have something like this open for the IPython/Jupyter notebook as well?
[ https://issues.apache.org/jira/browse/SPARK-8596?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14616924#comment-14616924 ]

Shivaram Venkataraman commented on SPARK-8596:
----------------------------------------------

You can test this by launching a new cluster with a command that looks like
{code}
./spark-ec2 -s 2 -t r3.xlarge -i pem -k key --spark-ec2-git-repo https://github.com/koaning/spark-ec2 --spark-ec2-git-branch rstudio-install launch rstudio-test
{code}
This cluster setup will then use the spark-ec2 scripts from your repo while setting things up. Once you think it's good, you can open a PR on github.com/mesos/spark-ec2.
[ https://issues.apache.org/jira/browse/SPARK-8596?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14616616#comment-14616616 ]

Vincent Warmerdam commented on SPARK-8596:
------------------------------------------

Made the changes and confirmed that RStudio now works out of the box with the startup script: https://github.com/koaning/spark-ec2/tree/rstudio-install
[ https://issues.apache.org/jira/browse/SPARK-8596?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14614968#comment-14614968 ]

Vincent Warmerdam commented on SPARK-8596:
------------------------------------------

Done and done; this task feels like it is getting ready for merge. I am only wondering about these few lines of code in the `init.sh` script here: https://github.com/koaning/spark-ec2/blob/rstudio-install/rstudio/init.sh#L36-38

```
sed -e 's/^ulimit/#ulimit/g' /root/spark/conf/spark-env.sh > /root/spark/conf/spark-env2.sh
mv /root/spark/conf/spark-env2.sh /root/spark/conf/spark-env.sh
ulimit -n 100
```

The ulimit can only be set by the root user, but it doesn't feel right to remove that line from the `init.sh` script in the rstudio folder. [~shivaram], thoughts?
[ https://issues.apache.org/jira/browse/SPARK-8596?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14615675#comment-14615675 ]

Shivaram Venkataraman commented on SPARK-8596:
----------------------------------------------

I think you can move the /etc/security/limits.conf change to `setup-slave.sh` -- that gets run on every machine. Regarding editing spark-env.sh, can you put that `ulimit` call within an `if user_is_root` check?
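The `if user_is_root` guard suggested here might be sketched in shell like this (a hedged sketch, not the actual spark-ec2 code; the limit value is illustrative):

```shell
# Sketch: only attempt to raise the open-file limit when spark-env.sh is
# sourced by root, so the non-root rstudio user doesn't hit
# "ulimit: open files: cannot modify limit: Operation not permitted".
# The 1000000 value is an assumption, not taken from the actual template.
if [ "$(id -u)" -eq 0 ]; then
  ulimit -n 1000000
fi
echo "current open-file limit: $(ulimit -n)"
```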
[ https://issues.apache.org/jira/browse/SPARK-8596?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14615264#comment-14615264 ]

Shivaram Venkataraman commented on SPARK-8596:
----------------------------------------------

Hmm, the problem is that I'm not sure the ulimit applies across shells, which is why we were doing it in spark-env.sh. Here are a couple of options:
1. We could change it in the AMI, in /etc/security/limits.conf.
2. Does leaving it in there lead to an error, or does it just print a warning? If it's just a warning, I'd say let's leave it in there. The ulimit used to be needed only for really large shuffles, so some of the RStudio use cases might work even without it.
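For option 1, the limits.conf entries might look like the following (a sketch; the values are assumptions, not taken from the actual AMI):

```
# /etc/security/limits.conf -- illustrative nofile entries for all users
*    soft    nofile    1000000
*    hard    nofile    1000000
```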
[ https://issues.apache.org/jira/browse/SPARK-8596?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14615523#comment-14615523 ]

Vincent Warmerdam commented on SPARK-8596:
------------------------------------------

1. True. Made the commit: https://github.com/koaning/spark-ec2/blob/rstudio-install/rstudio/init.sh#L35-37. Do we want this code here, or somewhere else? It seems like something system-wide rather than RStudio-specific... what about /spark/init.sh instead of /rstudio/init.sh?

2. It gives you a breaking error:
```
sc <- sparkR.init(spark_link)
Launching java with spark-submit command /root/spark/bin/spark-submit sparkr-shell /tmp/RtmpSaSV2q/backend_port53744f8e9f59
/root/spark/conf/spark-env.sh: line 30: ulimit: open files: cannot modify limit: Operation not permitted
```
`ulimit` is a command that can only be run by root, and the rstudio user isn't root.

3. Can I remove this line: https://github.com/koaning/spark-ec2/blob/branch-1.4/templates/root/spark/conf/spark-env.sh#L30 ? Not all users will want to use RStudio, and I'm not sure how removing it might break things (I'm assuming PySpark might use this script as well). Perhaps we can move the new `/etc/security/limits.conf` parameters in here?
[ https://issues.apache.org/jira/browse/SPARK-8596?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14614294#comment-14614294 ]

Vincent Warmerdam commented on SPARK-8596:
------------------------------------------

OK. How about I create a 'startSpark.R' file in the home folder of the rstudio user? That way we won't have any interference with the `/root/spark/bin/sparkR` script, and the R user will have a good quick start. To let a new user run jobs, it still seems like I need to run `chmod a+w /mnt/spark` to solve the previously mentioned error:
```
15/06/30 21:38:50 ERROR util.Utils: Failed to create local root dir in /mnt/spark. Ignoring this directory.
15/06/30 21:38:50 ERROR util.Utils: Failed to create local root dir in /mnt2/spark. Ignoring this directory.
```
Do we want to do this, or force the user to do it manually and leave them with a proper tutorial? [~RedOakMark], did you find an alternative solution?
[ https://issues.apache.org/jira/browse/SPARK-8596?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14614331#comment-14614331 ]

Shivaram Venkataraman commented on SPARK-8596:
----------------------------------------------

I think it's fine to make /mnt/spark and /mnt2/spark writable by all users by default. These are just directories used to store temporary files for a Spark job.
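In a setup script, making the scratch directories world-writable could be sketched as follows (the directory list mirrors the paths from the error messages in this thread):

```shell
# Sketch: make Spark's scratch directories writable by all users so a
# non-root user (e.g. the rstudio user) can create its job temp dirs.
for d in /mnt/spark /mnt2/spark; do
  if [ -d "$d" ]; then
    chmod a+w "$d"
  fi
done
echo "scratch dirs updated"
```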
[ https://issues.apache.org/jira/browse/SPARK-8596?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14613273#comment-14613273 ]

Shivaram Venkataraman commented on SPARK-8596:
----------------------------------------------

Some points that might be helpful:
1. `start-all.sh` should be run automatically on cluster startup; users don't need to run it manually.
2. The Spark master URL is present in the file `/root/spark-ec2/cluster-url`. You can just read that file to get the value (no need to get the hostname from EC2, etc.).
3. I think it should be fine to put this in a profile that gets picked up by RStudio. However, I'd say we should not use .Rprofile, as that may interfere with users using the /root/spark/bin/sparkR script.
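Point 2 could look like this in a provisioning or profile script (the path is taken from the comment; the fallback value below is purely illustrative):

```shell
# Sketch: read the Spark master URL from the file spark-ec2 writes on the
# master, instead of querying EC2 instance metadata for the hostname.
CLUSTER_URL_FILE="/root/spark-ec2/cluster-url"
if [ -f "$CLUSTER_URL_FILE" ]; then
  MASTER_URL=$(cat "$CLUSTER_URL_FILE")
else
  MASTER_URL="spark://unknown:7077"  # fallback for illustration only
fi
echo "$MASTER_URL"
```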
[ https://issues.apache.org/jira/browse/SPARK-8596?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14613018#comment-14613018 ]

Vincent Warmerdam commented on SPARK-8596:
------------------------------------------

I now have a more elegant way to get any R shell connected to Spark. If you have run:
```
/root/spark/sbin/stop-all.sh
/root/spark/sbin/start-all.sh
```
then this snippet will collect everything you need automatically (the tutorial involved manual steps):
```
region_ip <- system("curl http://169.254.169.254/latest/meta-data/public-hostname", intern=TRUE)
spark_link <- paste0('spark://', region_ip, ':7077')
.libPaths(c(.libPaths(), '/root/spark/R/lib'))
Sys.setenv(SPARK_HOME = '/root/spark')
Sys.setenv(PATH = paste(Sys.getenv(c("PATH")), '/root/spark/bin', sep=':'))
library(SparkR)
sc <- sparkR.init(spark_link)
sqlContext <- sparkRSQL.init(sc)
```
This snippet can be made part of the '.Rprofile', which would let any RStudio user be connected to Spark automatically. It will only work if `/root/spark/sbin/start-all.sh` has been run. Do we want this? A possible downside is that errors will be thrown if the R user doesn't understand that `start-all.sh` needs to be run first.
[ https://issues.apache.org/jira/browse/SPARK-8596?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14613100#comment-14613100 ]

Vincent Warmerdam commented on SPARK-8596:
------------------------------------------

We just found that `chmod a+w /mnt/spark` solves the problem, but it is not very elegant.
[ https://issues.apache.org/jira/browse/SPARK-8596?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14611747#comment-14611747 ]

Vincent Warmerdam commented on SPARK-8596:
------------------------------------------

Cool, would love to hear your end of the story. It seems to be the only obstacle to getting the script to work. On a slightly different subject: I'm not just a frequent R user; I do a lot of Python as well. Is there a similar ticket open for the IPython (Jupyter) notebook? It seems like the most appropriate GUI for the Python language.
[ https://issues.apache.org/jira/browse/SPARK-8596?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14611832#comment-14611832 ]

Vincent Warmerdam commented on SPARK-8596:
------------------------------------------

By the way, I now have scripts that do install RStudio (just ran and confirmed). The code is here:
https://github.com/koaning/spark-ec2/tree/rstudio-install
https://github.com/koaning/spark/tree/rstudio-install

When initializing with this command:
```
./spark-ec2 --key-pair=spark-df --identity-file=/Users/code/Downloads/spark-df.pem --region=eu-west-1 -s 1 --instance-type=c3.2xlarge --spark-ec2-git-repo=https://github.com/koaning/spark-ec2 --spark-ec2-git-branch=rstudio-install launch mysparkr
```
I can confirm that RStudio is installed and that a correct user is added. There are two concerns:
- Should we not force the user to supply the password themselves? Setting a standard password seems like a security vulnerability.
- I am not sure whether this gets installed on all the slave nodes. I added this module (https://github.com/koaning/spark-ec2/blob/rstudio-install/rstudio/init.sh), and we only need it on the master node. I wonder what the best way is to ensure this.
[ https://issues.apache.org/jira/browse/SPARK-8596?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14612178#comment-14612178 ]

Shivaram Venkataraman commented on SPARK-8596:
----------------------------------------------

Thanks [~cantdutchthis] -- to answer your questions:
1. Regarding the password, we could set a default password and at the end of spark_ec2.py add a 'Please change this password' message. Or, to be a bit more secure, we could generate a random password in spark_ec2.py, set it, and then print it out at the end saying 'please use RStudio with this password'.
2. init.sh by default only runs on the master. If we want it to run on the slaves we need to `ssh` to all the slaves and do it. But for RStudio, what you have should be fine.
I'll test this out soon and get back on this JIRA.
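The random-password option could be sketched in spark_ec2.py roughly like this (a hedged sketch; the function name and message text are hypothetical, not actual spark_ec2.py code):

```python
# Sketch: generate a random RStudio password in spark_ec2.py and print it
# once the cluster is up, instead of shipping a fixed default password.
import random
import string


def generate_rstudio_password(length=12):
    """Return a random alphanumeric password of the given length."""
    rng = random.SystemRandom()  # backed by os.urandom, suitable for secrets
    alphabet = string.ascii_letters + string.digits
    return "".join(rng.choice(alphabet) for _ in range(length))


if __name__ == "__main__":
    print("Please use RStudio with this password: " + generate_rstudio_password())
```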
[ https://issues.apache.org/jira/browse/SPARK-8596?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14610555#comment-14610555 ]

Mark Stephenson commented on SPARK-8596:
----------------------------------------

[~cantdutchthis]: we have been getting the same error, and it's definitely a user-permissions issue. Even when giving the new RStudio user ownership rights to the ./spark folder, there are additional classpath errors. We are working on a solution today that logs in to RStudio as the 'hadoop' user to start with, just to make sure the proof of concept works, and will then work out a longer-term solution with some potential bootstrap code. Will advise once we have it solved.
[ https://issues.apache.org/jira/browse/SPARK-8596?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14609127#comment-14609127 ]

Vincent Warmerdam commented on SPARK-8596:
------------------------------------------

Hmm. I seem to stumble on another issue when adding a new user for RStudio. Here is a link to the tutorial I recently made (which I would like to push to the RStudio blog once this issue is fixed): https://gist.github.com/koaning/5a896eb5c773c24091c2. The odd thing is that the tutorial works fine if you do not run the `/root/spark/bin/sparkR` command and move on to installing RStudio instead. If you run the sparkR shell, then you get this error later, after RStudio has been provisioned:
```
sc <- sparkR.init('spark://ec2-52-18-7-11.eu-west-1.compute.amazonaws.com:7077')
Launching java with spark-submit command /root/spark/bin/spark-submit sparkr-shell /tmp/RtmpxBIfkg/backend_port104b15f47402
15/06/30 21:38:49 INFO spark.SparkContext: Running Spark version 1.4.0
15/06/30 21:38:49 INFO spark.SecurityManager: Changing view acls to: analyst
15/06/30 21:38:49 INFO spark.SecurityManager: Changing modify acls to: analyst
15/06/30 21:38:49 INFO spark.SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users with view permissions: Set(analyst); users with modify permissions: Set(analyst)
15/06/30 21:38:49 INFO slf4j.Slf4jLogger: Slf4jLogger started
15/06/30 21:38:49 INFO Remoting: Starting remoting
15/06/30 21:38:50 INFO Remoting: Remoting started; listening on addresses :[akka.tcp://sparkDriver@172.31.6.135:58940]
15/06/30 21:38:50 INFO util.Utils: Successfully started service 'sparkDriver' on port 58940.
15/06/30 21:38:50 INFO spark.SparkEnv: Registering MapOutputTracker
15/06/30 21:38:50 INFO spark.SparkEnv: Registering BlockManagerMaster
15/06/30 21:38:50 ERROR util.Utils: Failed to create local root dir in /mnt/spark. Ignoring this directory.
15/06/30 21:38:50 ERROR util.Utils: Failed to create local root dir in /mnt2/spark. Ignoring this directory.
15/06/30 21:38:50 ERROR storage.DiskBlockManager: Failed to create any local dir.
15/06/30 21:38:50 INFO util.Utils: Shutdown hook called
Error in readTypedObject(con, type) : Unsupported type for deserialization
```
I get the impression this error is caused by the fact that we create another user that doesn't have full root access and can therefore not create a local dir. What might be the best way of dealing with this? What assumptions does Spark make in terms of permissions? Can any user submit Spark jobs via the master URL, or are there filesystem permissions involved before one can do this?
[ https://issues.apache.org/jira/browse/SPARK-8596?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14609148#comment-14609148 ]

Shivaram Venkataraman commented on SPARK-8596:
----------------------------------------------

I think the assumption is that the root user is running the scripts in /root/spark/bin -- no other use cases have been tested AFAIK. On the other hand, the Spark master (i.e. the service running at spark://master_host_name:7077) doesn't do any authentication as far as I know, so we should be able to submit jobs from other user accounts, but you might need to copy Spark to that user's account before running things.
[ https://issues.apache.org/jira/browse/SPARK-8596?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14604588#comment-14604588 ]

Apache Spark commented on SPARK-8596:
-------------------------------------

User 'koaning' has created a pull request for this issue:
https://github.com/apache/spark/pull/7068
[ https://issues.apache.org/jira/browse/SPARK-8596?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14604587#comment-14604587 ]

Vincent Warmerdam commented on SPARK-8596:
------------------------------------------

1. On it.
[ https://issues.apache.org/jira/browse/SPARK-8596?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14604897#comment-14604897 ]

Shivaram Venkataraman commented on SPARK-8596:
----------------------------------------------

Merged https://github.com/apache/spark/pull/7068 to open the RStudio port. I'm keeping the JIRA open until we fix the second part of this issue too.
[ https://issues.apache.org/jira/browse/SPARK-8596?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14604053#comment-14604053 ]

Vincent Warmerdam commented on SPARK-8596:
------------------------------------------

I'm writing a small tutorial for getting up to speed with RStudio on AWS. It works. The main issue is that EC2 currently installs an old version of R (3.1), while most packages like ggplot require a newer version (3.2). I'm going to share the tutorial with the RStudio guys soon. My approach is to run `spark/bin/start-all.sh` on the master node and then run the following commands in RStudio on the master node:
```
.libPaths(c(.libPaths(), '/root/spark/R/lib'))
Sys.setenv(SPARK_HOME = '/root/spark')
Sys.setenv(PATH = paste(Sys.getenv(c("PATH")), '/root/spark/bin', sep=':'))
library(SparkR)
sc <- sparkR.init('SPARK MASTER ADR')
sqlContext <- sparkRSQL.init(sc)
```
This works on my end, and I've been able to use the DataFrame API with a JSON blob on S3 with this sqlContext.
[ https://issues.apache.org/jira/browse/SPARK-8596?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14604055#comment-14604055 ]

Vincent Warmerdam commented on SPARK-8596:
------------------------------------------

Can we open an issue for the R version? It seems like something that could be fixed by using a different standard AMI.
[ https://issues.apache.org/jira/browse/SPARK-8596?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14604307#comment-14604307 ]

Shivaram Venkataraman commented on SPARK-8596:
----------------------------------------------

[~cantdutchthis] Thanks for taking a look at this. One thing many users have run into is that if you install RStudio Server on the EC2 master node, it doesn't work, as RStudio requires a root user's password. Is there some configuration you did to overcome this? Regarding the R version, let's open a new JIRA for it. I can describe how the Spark EC2 AMIs are built, and we can see whether we can install R from some other yum repo, etc.
[jira] [Commented] (SPARK-8596) Install and configure RStudio server on Spark EC2
[ https://issues.apache.org/jira/browse/SPARK-8596?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14604328#comment-14604328 ] Vincent Warmerdam commented on SPARK-8596: -- RStudio doesn't want to be run as the root user in general. RStudio doesn't ship with HTTPS out of the box, so any port sniffing suddenly becomes a security risk if somebody can use the web UI to gain root access. Instead, you can just go and add a new user:

$ useradd analyst
$ passwd analyst

This user will then be able to log in. Note that in order to see RStudio, you will also need to edit the security group for the master node to allow TCP connections on its port.

I'd love to help out and spend some time on these issues, by the way. I've got a small tutorial .md file ready; can I share that via JIRA? I would like to double-check it with you because I may be doing a dirty trick. For RStudio not to give errors, I remove a line of code from a shell script (because this new user is not root, it cannot run `ulimit` commands). Dirty trick (run as root):

sed -e 's/^ulimit/#ulimit/g' /root/spark/conf/spark-env.sh > /root/spark/conf/spark-env2.sh
mv /root/spark/conf/spark-env2.sh /root/spark/conf/spark-env.sh
ulimit -n 100

Installing the right R version has always been a bit tricky. Will follow the other JIRA ticket as well.
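The comment-out step above can be exercised safely against a scratch copy of spark-env.sh before touching a real cluster; the file contents and paths below are illustrative stand-ins, not the real EC2 layout:

```shell
# Work in a scratch directory so a real Spark install is never touched.
tmpdir=$(mktemp -d)
conf="$tmpdir/spark-env.sh"

# A minimal stand-in for the relevant lines of spark-env.sh.
cat > "$conf" <<'EOF'
export SPARK_MASTER_OPTS=""
ulimit -n 1000000
EOF

# Comment out any line starting with "ulimit", writing to a temp file
# first and then moving it back over the original (same as the trick above).
sed -e 's/^ulimit/#ulimit/g' "$conf" > "$conf.new"
mv "$conf.new" "$conf"

# Show the commented-out line.
grep '^#ulimit' "$conf"
```

The two-step write-then-move avoids truncating the file mid-read; GNU sed's `-i` flag does the same thing in one step.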
[jira] [Commented] (SPARK-8596) Install and configure RStudio server on Spark EC2
[ https://issues.apache.org/jira/browse/SPARK-8596?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14604440#comment-14604440 ] Shivaram Venkataraman commented on SPARK-8596: -- Thanks! These are very useful instructions. We can break this JIRA up into a bunch of smaller issues.

1. Opening the RStudio port in the EC2 cluster. For this we need to add the right port number to the Spark EC2 script at https://github.com/apache/spark/blob/0b5abbf5f96a5f6bfd15a65e8788cf3fa96fe54c/ec2/spark_ec2.py#L507. This should be a pretty simple change -- would you like to open a PR for this?

2. We need to add code to install RStudio and add a new user (let's say username rstudio, password rstudio). To do this we will need to modify scripts in the spark-ec2 repo at https://github.com/mesos/spark-ec2. At a high level, these scripts are run on the master node after the cluster is launched, and they install Spark, Hadoop etc. on the AMI. So we can just add a new module to spark-ec2 called rstudio, and then in rstudio/setup.sh we can add code to set up the new user etc. as well.

Let me know if you want to take a shot at the second one as well.
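A rough sketch of what such a module could look like. Everything here is an assumption drawn from this thread, not anything that exists in spark-ec2 yet: the `rstudio` module name, the `rstudio/setup.sh` path, the rstudio/rstudio credentials, and the RPM name are all placeholders. The sketch writes the hypothetical script into a scratch directory rather than a real checkout:

```shell
# Scratch stand-in for a spark-ec2 checkout (hypothetical layout).
repo=$(mktemp -d)
mkdir -p "$repo/rstudio"

# Hypothetical rstudio/setup.sh, intended to run as root on the master
# node after launch. The RPM name below is a placeholder, not a real URL.
cat > "$repo/rstudio/setup.sh" <<'EOF'
#!/bin/bash
# Install RStudio Server (package name/version is a placeholder).
yum install -y --nogpgcheck rstudio-server.rpm

# RStudio Server refuses root logins, so create a regular user
# (username/password "rstudio", as suggested in this thread).
useradd rstudio
echo "rstudio:rstudio" | chpasswd

# Comment out ulimit calls that the non-root user cannot run.
sed -i -e 's/^ulimit/#ulimit/g' /root/spark/conf/spark-env.sh

rstudio-server restart
EOF
chmod +x "$repo/rstudio/setup.sh"
```

Hard-coding a password in a provisioning script is only acceptable because the RStudio port is closed by default; anyone opening it in the security group would want to change the password first.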
[jira] [Commented] (SPARK-8596) Install and configure RStudio server on Spark EC2
[ https://issues.apache.org/jira/browse/SPARK-8596?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14600164#comment-14600164 ] Guorong Xu commented on SPARK-8596: --- Right now I am using another way to work around this issue: I did not install RStudio on the head node; instead I installed RStudio on the instance which launches the cluster. Then I use the command below to initiate a Spark context.

sc <- sparkR.init(master="spark://[Remote_head_node]:7077", sparkEnvir=list(spark.executor.memory="1g"))
[jira] [Commented] (SPARK-8596) Install and configure RStudio server on Spark EC2
[ https://issues.apache.org/jira/browse/SPARK-8596?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14599739#comment-14599739 ] Guorong Xu commented on SPARK-8596: --- When I install Spark on EC2 following the ec2 script, I assume Spark is installed on the driver node. If I install Spark in /home/rstudio on the driver node again, then I will have two copies of the Spark installation on the driver node. Will RStudio submit jobs to the right Spark and do the computation across all worker nodes?
[jira] [Commented] (SPARK-8596) Install and configure RStudio server on Spark EC2
[ https://issues.apache.org/jira/browse/SPARK-8596?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14599940#comment-14599940 ] Shivaram Venkataraman commented on SPARK-8596: -- I think it should technically work: if you submit jobs to the Spark master URL that is already set up by the EC2 cluster, then you should be able to use the cluster. I think this will work since we don't do any user authentication / permissions in Spark. But I haven't tried it before, so let us know what happens.