Re: Spark 1.4.0 - Using SparkR on EC2 Instance
That’s correct. We were setting up a Spark EC2 cluster from the command line, then installing RStudio Server, logging into that through the web interface, and attempting to initialize the cluster within RStudio. We have made some progress on this outside of the thread - I will see what I can compile and share as a potential walkthrough.

On Jul 8, 2015, at 9:25 PM, BenPorter [via Apache Spark User List] wrote:

RedOakMark - just to make sure I understand what you did: you ran the EC2 script on a local machine to spin up the cluster, but then did not try to run anything in R/RStudio from your local machine. Instead you installed RStudio on the driver and ran it as a local cluster from that driver. Is that correct? Otherwise, you make no reference to the master/EC2 server in this code, so I have to assume that means you were running this directly from the master.

Thanks,
Ben
-- View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/Spark-1-4-0-Using-SparkR-on-EC2-Instance-tp23506p23742.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.
Re: Spark 1.4.0 - Using SparkR on EC2 Instance
For anyone monitoring the thread, I was able to successfully install and run a small Spark cluster and model using this method:

First, make sure that the username being used to log in to RStudio Server is the one that was used to install Spark on the EC2 instance. Thanks to Shivaram for his help here.

Log in to RStudio and ensure that these references are used - set the library location to the folder where Spark is installed. In my case, ~/home/rstudio/spark.

# This line loads SparkR (the R package) from the installed directory
library(SparkR, lib.loc = "./spark/R/lib")

The edits to this line were important, so that Spark knew where the install folder was located when initializing the cluster.

# Initialize the Spark local cluster in R, as 'sc'
sc <- sparkR.init("local[2]", "SparkR", "./spark")

From here, we ran a basic model using Spark, from RStudio, which ran successfully.
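Putting the steps above together, a minimal end-to-end script might look like the following. This is a sketch, not the poster's exact code: it assumes the Spark 1.4.0 install sits at ./spark relative to the RStudio user's home directory, and it adds a small sanity check using the built-in faithful dataset to confirm the context works.

```r
# Load SparkR from the install directory (assumed to be ./spark)
library(SparkR, lib.loc = "./spark/R/lib")

# Start a local[2] SparkContext, pointing sparkHome at the install folder
sc <- sparkR.init(master = "local[2]", appName = "SparkR", sparkHome = "./spark")

# Spark 1.4.0 exposes DataFrames through a SQLContext
sqlContext <- sparkRSQL.init(sc)

# Sanity check: convert a local data.frame into a Spark DataFrame
df <- createDataFrame(sqlContext, faithful)
head(df)

# Shut the context down cleanly when finished
sparkR.stop()
```

If library loading or sparkR.init fails with a permissions error here, it is usually the username mismatch described above: the RStudio login user must be able to read the directory where Spark was installed.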
Spark 1.4.0 - Using SparkR on EC2 Instance
Good morning,

I am having a bit of trouble finalizing the installation and usage of the newest Spark version 1.4.0, deploying to an Amazon EC2 instance and using RStudio to run on top of it.

Using these instructions ( http://spark.apache.org/docs/latest/ec2-scripts.html ) we can fire up an EC2 cluster, which we have been successful doing - we have gotten the cluster to launch from the command line without an issue. Then, I installed RStudio Server on the same EC2 instance (the master) and successfully logged into it (using the test/test user) through the web browser.

This is where I get stuck: within RStudio, when I try to reference the folder where SparkR was installed, in order to load the SparkR library and initialize a SparkContext, I get permissions errors on the folders, or the library cannot be found because I cannot locate the folder in which the library is sitting.

Has anyone successfully launched and utilized SparkR 1.4.0 in this way, with RStudio Server running on top of the master instance? Are we on the right track, or should we manually launch a cluster and attempt to connect to it from another instance running R?

Thank you in advance!
Mark
-- View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/Spark-1-4-0-Using-SparkR-on-EC2-Instance-tp23506.html
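For readers following along, the command-line launch referenced above uses the spark-ec2 script bundled with the Spark distribution. A minimal sketch of that step - the key pair name, identity file, slave count, and cluster name below are placeholders, not values from this thread:

```shell
# Launch a small Spark cluster on EC2 with the bundled spark-ec2 script.
# -k names an existing EC2 key pair, -i is its private key file,
# -s is the number of slave nodes.
./ec2/spark-ec2 -k my-keypair -i ~/my-keypair.pem -s 2 launch sparkr-test

# Log in to the master node of a running cluster:
./ec2/spark-ec2 -k my-keypair -i ~/my-keypair.pem login sparkr-test

# Tear the cluster down when finished:
./ec2/spark-ec2 destroy sparkr-test
```

Note that spark-ec2 installs Spark as root on the master, which is one likely source of the permissions errors described above when a different user (such as the RStudio login user) later tries to read the install directory.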