Are you using SparkR from the latest Spark 1.4 release? The function was not available in the older AMPLab version.
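For reference, a minimal sketch of the 1.4-style flow (the master URL and app name here are placeholders; "local[2]" works for a quick local test). Note that the RDD-level functions from the 2014 demo, such as textFile and reduceByKey, are no longer exported in 1.4; the public API is built around DataFrames:

library(SparkR)

# Initialize the Spark context (Spark 1.4 API)
sc <- sparkR.init(master = "local[2]", appName = "SparkR-test")

# sparkRSQL.init creates the SQL context used by the DataFrame API
sqlContext <- sparkRSQL.init(sc)

# Sanity check: build a DataFrame from a local R data.frame
df <- createDataFrame(sqlContext, faithful)
head(df)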
Shivaram

On Tue, Jun 30, 2015 at 1:43 PM, Nicholas Sharkey <nicholasshar...@gmail.com> wrote:

> Any idea why I can't get the sparkRSQL.init function to work? The other
> parts of SparkR seem to be working fine. And yes, the SparkR library is
> loaded.
>
> Thanks.
>
> sc <- sparkR.init(master="http://ec2-52-18-1-4.eu-west-1.compute.amazonaws.com")
> ...
> sqlContext <- sparkRSQL.init(sc)
> Error: could not find function "sparkRSQL.init"
>
> On Tue, Jun 30, 2015 at 10:56 AM, Shivaram Venkataraman <shiva...@eecs.berkeley.edu> wrote:
>
>> The API exported in the 1.4 release is different from the one used in
>> the 2014 demo. Please see the latest documentation at
>> http://people.apache.org/~pwendell/spark-releases/latest/sparkr.html or
>> Chris's demo from Spark Summit at
>> https://spark-summit.org/2015/events/a-data-frame-abstraction-layer-for-sparkr/
>>
>> Thanks
>> Shivaram
>>
>> On Tue, Jun 30, 2015 at 7:40 AM, Nicholas Sharkey <nicholasshar...@gmail.com> wrote:
>>
>>> Good morning Shivaram,
>>>
>>> I believe I have our setup close, but I'm getting an error on the last
>>> step of the word count example from the Spark Summit
>>> <https://spark-summit.org/2014/wp-content/uploads/2014/07/SparkR-SparkSummit.pdf>.
>>>
>>> Off the top of your head, can you think of where this error (below, and
>>> attached) is coming from? I can get into the details of how I set up
>>> this machine if needed, but wanted to keep the initial question short.
>>>
>>> Thanks.
>>>
>>> Begin Code
>>>
>>> library(SparkR)
>>>
>>> # sc <- sparkR.init("local[2]")
>>> sc <- sparkR.init("http://ec2-54-171-173-195.eu-west-1.compute.amazonaws.com:[2]")
>>>
>>> lines <- textFile(sc, "mytextfile.txt")  # hi hi all all all one one one one
>>>
>>> words <- flatMap(lines,
>>>                  function(line){
>>>                    strsplit(line, " ")[[1]]
>>>                  })
>>>
>>> wordcount <- lapply(words,
>>>                     function(word){
>>>                       list(word, 1)
>>>                     })
>>>
>>> counts <- reduceByKey(wordcount, "+", numPartitions=2)
>>>
>>> # Error in (function (classes, fdef, mtable) :
>>> #   unable to find an inherited method for function ‘reduceByKey’
>>> #   for signature ‘"PipelinedRDD", "character", "numeric"’
>>>
>>> End Code
>>>
>>> On Fri, Jun 26, 2015 at 7:04 PM, Shivaram Venkataraman <shiva...@eecs.berkeley.edu> wrote:
>>>
>>>> My workflow was to install RStudio on a cluster launched using the
>>>> Spark EC2 scripts. However I did a bunch of tweaking after that (like
>>>> copying the Spark installation over etc.). When I get some time I'll
>>>> try to write the steps down in the JIRA.
>>>>
>>>> Thanks
>>>> Shivaram
>>>>
>>>> On Fri, Jun 26, 2015 at 10:21 AM, <m...@redoakstrategic.com> wrote:
>>>>
>>>>> So you created an EC2 instance with RStudio installed first, then
>>>>> installed Spark under that same username? That makes sense, I just
>>>>> want to verify your workflow.
>>>>>
>>>>> Thank you again for your willingness to help!
>>>>>
>>>>> On Fri, Jun 26, 2015 at 10:13 AM -0700, "Shivaram Venkataraman" <shiva...@eecs.berkeley.edu> wrote:
>>>>>
>>>>>> I was using RStudio on the master node of the same cluster in the
>>>>>> demo. However I had installed Spark under the user `rstudio` (i.e.
>>>>>> /home/rstudio) and that will make the permissions work correctly.
>>>>>> You will need to copy the config files from /root/spark/conf after
>>>>>> installing Spark though, and it might need some more manual tweaks.
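>>>>>> Roughly, the RStudio side would then look something like this (a
>>>>>> sketch only; the paths are assumptions based on installing Spark
>>>>>> under /home/rstudio, so adjust them to your layout):
>>>>>>
>>>>>> # Point R at the Spark installation owned by the rstudio user
>>>>>> Sys.setenv(SPARK_HOME = "/home/rstudio/spark")
>>>>>> # In 1.4 the SparkR package ships inside the Spark install under R/lib
>>>>>> .libPaths(c(file.path(Sys.getenv("SPARK_HOME"), "R", "lib"), .libPaths()))
>>>>>> library(SparkR)
>>>>>> # Standalone master on the default port
>>>>>> sc <- sparkR.init(master = "spark://<master-hostname>:7077")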
>>>>>>
>>>>>> Thanks
>>>>>> Shivaram
>>>>>>
>>>>>> On Fri, Jun 26, 2015 at 9:59 AM, Mark Stephenson <m...@redoakstrategic.com> wrote:
>>>>>>
>>>>>>> Thanks!
>>>>>>>
>>>>>>> In your demo video, were you using RStudio to hit a separate EC2
>>>>>>> Spark cluster? I noticed from your browser that you appeared to be
>>>>>>> using EC2 at the time, so I was just curious. It appears that might
>>>>>>> be one of the possible workarounds - fire up a separate EC2
>>>>>>> instance with RStudio Server that initializes the Spark context
>>>>>>> against a separate Spark cluster.
>>>>>>>
>>>>>>> On Jun 26, 2015, at 11:46 AM, Shivaram Venkataraman <shiva...@eecs.berkeley.edu> wrote:
>>>>>>>
>>>>>>> We don't have a documented way to use RStudio on EC2 right now. We
>>>>>>> have a ticket open at https://issues.apache.org/jira/browse/SPARK-8596
>>>>>>> to discuss work-arounds and potential solutions for this.
>>>>>>>
>>>>>>> Thanks
>>>>>>> Shivaram
>>>>>>>
>>>>>>> On Fri, Jun 26, 2015 at 6:27 AM, RedOakMark <m...@redoakstrategic.com> wrote:
>>>>>>>
>>>>>>>> Good morning,
>>>>>>>>
>>>>>>>> I am having a bit of trouble finalizing the installation and usage
>>>>>>>> of the newest Spark version 1.4.0, deploying to an Amazon EC2
>>>>>>>> instance and using RStudio to run on top of it.
>>>>>>>>
>>>>>>>> Using these instructions
>>>>>>>> (http://spark.apache.org/docs/latest/ec2-scripts.html) we can fire
>>>>>>>> up an EC2 instance (which we have been successful doing - we have
>>>>>>>> gotten the cluster to launch from the command line without an
>>>>>>>> issue). Then, I installed RStudio Server on the same EC2 instance
>>>>>>>> (the master) and successfully logged into it (using the test/test
>>>>>>>> user) through the web browser.
>>>>>>>>
>>>>>>>> This is where I get stuck - within RStudio, when I try to find the
>>>>>>>> folder where SparkR was installed so I can load the SparkR library
>>>>>>>> and initialize a SparkContext, I get permission errors on the
>>>>>>>> folders, or the library cannot be found because I cannot locate
>>>>>>>> the folder it is sitting in.
>>>>>>>>
>>>>>>>> Has anyone successfully launched and utilized SparkR 1.4.0 in this
>>>>>>>> way, with RStudio Server running on top of the master instance?
>>>>>>>> Are we on the right track, or should we manually launch a cluster
>>>>>>>> and attempt to connect to it from another instance running R?
>>>>>>>>
>>>>>>>> Thank you in advance!
>>>>>>>>
>>>>>>>> Mark