Are you using SparkR from the latest Spark 1.4 release? The function was
not available in the older AMPLab version.
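
For comparison, this is roughly how the 1.4 release expects the SQL context
to be created (a minimal sketch, assuming a stock Spark 1.4 download; the
master and data are placeholders):

  library(SparkR)
  sc <- sparkR.init(master = "local[2]")
  sqlContext <- sparkRSQL.init(sc)   # exported in 1.4, absent from AMPLab SparkR
  df <- createDataFrame(sqlContext, faithful)
  head(df)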

Shivaram

On Tue, Jun 30, 2015 at 1:43 PM, Nicholas Sharkey <nicholasshar...@gmail.com> wrote:

> Any idea why I can't get the sparkRSQL.init function to work? The other
> parts of SparkR seem to be working fine. And yes, the SparkR library is
> loaded.
>
> Thanks.
>
> > sc <- sparkR.init(master="http://ec2-52-18-1-4.eu-west-1.compute.amazonaws.com")
> ...
>
> > sqlContext <- sparkRSQL.init(sc)
>
> Error: could not find function "sparkRSQL.init"
>
>
> On Tue, Jun 30, 2015 at 10:56 AM, Shivaram Venkataraman <shiva...@eecs.berkeley.edu> wrote:
>
>> The API exported in the 1.4 release is different from the one used in the
>> 2014 demo. Please see the latest documentation at
>> http://people.apache.org/~pwendell/spark-releases/latest/sparkr.html or
>> Chris's demo from Spark Summit at
>> https://spark-summit.org/2015/events/a-data-frame-abstraction-layer-for-sparkr/
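>>
>> A rough sketch of the difference (assuming Spark 1.4; the file name is a
>> placeholder): the RDD functions from the 2014 demo are no longer
>> exported, though they can still be reached through the private namespace
>> as an unsupported stopgap.
>>
>>   library(SparkR)
>>   sc <- sparkR.init(master = "local[2]")
>>
>>   # 2014 demo style -- no longer public in 1.4; note the ::: access
>>   lines <- SparkR:::textFile(sc, "mytextfile.txt")
>>   words <- SparkR:::flatMap(lines, function(line) strsplit(line, " ")[[1]])
>>
>>   # Supported 1.4 style -- work through DataFrames instead
>>   sqlContext <- sparkRSQL.init(sc)
>>   df <- createDataFrame(sqlContext, faithful)
>>   head(df)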
>>
>> Thanks
>> Shivaram
>>
>> On Tue, Jun 30, 2015 at 7:40 AM, Nicholas Sharkey <nicholasshar...@gmail.com> wrote:
>>
>>> Good morning Shivaram,
>>>
>>> I believe our setup is close, but I'm getting an error on the last step
>>> of the word count example from the Spark Summit
>>> <https://spark-summit.org/2014/wp-content/uploads/2014/07/SparkR-SparkSummit.pdf>.
>>>
>>> Off the top of your head, can you think of where this error (below, and
>>> attached) is coming from? I can get into the details of how I set up
>>> this machine if needed, but wanted to keep the initial question short.
>>>
>>> Thanks.
>>>
>>> Begin Code
>>>
>>> library(SparkR)
>>>
>>> # sc <- sparkR.init("local[2]")
>>> sc <- sparkR.init("http://ec2-54-171-173-195.eu-west-1.compute.amazonaws.com:[2]")
>>>
>>> lines <- textFile(sc, "mytextfile.txt") # hi hi all all all one one one one
>>>
>>> words <- flatMap(lines,
>>>                  function(line){
>>>                    strsplit(line, " ")[[1]]
>>>                  })
>>>
>>> wordcount <- lapply(words,
>>>                     function(word){
>>>                       list(word, 1)
>>>                     })
>>>
>>> counts <- reduceByKey(wordcount, "+", numPartitions=2)
>>>
>>> # Error in (function (classes, fdef, mtable) :
>>> #   unable to find an inherited method for function ‘reduceByKey’ for
>>> #   signature ‘"PipelinedRDD", "character", "numeric"’
>>>
>>> End Code
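>>>
>>> One guess from the dispatch signature itself (a sketch, not verified on
>>> this setup): the method may be defined for an integer numPartitions, so
>>> passing an integer literal instead of a numeric could satisfy dispatch.
>>>
>>>   # 2L passes numPartitions as integer rather than numeric
>>>   counts <- reduceByKey(wordcount, "+", 2L)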
>>>
>>> On Fri, Jun 26, 2015 at 7:04 PM, Shivaram Venkataraman <shiva...@eecs.berkeley.edu> wrote:
>>>
>>>> My workflow was to install RStudio on a cluster launched using the
>>>> Spark EC2 scripts. However, I did a bunch of tweaking after that (like
>>>> copying the Spark installation over, etc.). When I get some time I'll
>>>> try to write the steps down in the JIRA.
>>>>
>>>> Thanks
>>>> Shivaram
>>>>
>>>>
>>>> On Fri, Jun 26, 2015 at 10:21 AM, <m...@redoakstrategic.com> wrote:
>>>>
>>>>> So you created an EC2 instance with RStudio installed first, then
>>>>> installed Spark under that same username? That makes sense; I just
>>>>> want to verify your workflow.
>>>>>
>>>>> Thank you again for your willingness to help!
>>>>>
>>>>>
>>>>>
>>>>> On Fri, Jun 26, 2015 at 10:13 AM -0700, "Shivaram Venkataraman" <shiva...@eecs.berkeley.edu> wrote:
>>>>>
>>>>>> I was using RStudio on the master node of the same cluster in the
>>>>>> demo. However, I had installed Spark under the user `rstudio` (i.e.
>>>>>> /home/rstudio), which makes the permissions work correctly. You will
>>>>>> need to copy the config files from /root/spark/conf after installing
>>>>>> Spark, though, and it might need some more manual tweaks.
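>>>>>>
>>>>>> As a rough sketch of the RStudio side (assuming Spark 1.4 was copied
>>>>>> to /home/rstudio/spark; the path is illustrative, adjust to wherever
>>>>>> the installation landed):
>>>>>>
>>>>>>   # Point R at the copied Spark installation before loading SparkR
>>>>>>   Sys.setenv(SPARK_HOME = "/home/rstudio/spark")
>>>>>>   .libPaths(c(file.path(Sys.getenv("SPARK_HOME"), "R", "lib"), .libPaths()))
>>>>>>   library(SparkR)
>>>>>>   sc <- sparkR.init(master = "local[2]")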
>>>>>>
>>>>>> Thanks
>>>>>> Shivaram
>>>>>>
>>>>>> On Fri, Jun 26, 2015 at 9:59 AM, Mark Stephenson <m...@redoakstrategic.com> wrote:
>>>>>>
>>>>>>> Thanks!
>>>>>>>
>>>>>>> In your demo video, were you using RStudio to hit a separate EC2
>>>>>>> Spark cluster? From your browser it appeared you were using EC2 at
>>>>>>> the time, so I was just curious. It appears that might be one of the
>>>>>>> possible workarounds - fire up a separate EC2 instance with RStudio
>>>>>>> Server that initializes the Spark context against a separate Spark
>>>>>>> cluster.
>>>>>>>
>>>>>>> On Jun 26, 2015, at 11:46 AM, Shivaram Venkataraman <shiva...@eecs.berkeley.edu> wrote:
>>>>>>>
>>>>>>> We don't have a documented way to use RStudio on EC2 right now. We
>>>>>>> have a ticket open at
>>>>>>> https://issues.apache.org/jira/browse/SPARK-8596 to discuss
>>>>>>> work-arounds and potential solutions for this.
>>>>>>>
>>>>>>> Thanks
>>>>>>> Shivaram
>>>>>>>
>>>>>>> On Fri, Jun 26, 2015 at 6:27 AM, RedOakMark <m...@redoakstrategic.com> wrote:
>>>>>>>
>>>>>>>> Good morning,
>>>>>>>>
>>>>>>>> I am having a bit of trouble finalizing the installation and usage
>>>>>>>> of the newest Spark version 1.4.0, deploying to an Amazon EC2
>>>>>>>> instance and using RStudio to run on top of it.
>>>>>>>>
>>>>>>>> Using these instructions
>>>>>>>> (http://spark.apache.org/docs/latest/ec2-scripts.html) we can fire
>>>>>>>> up an EC2 instance (which we have been successful doing - we have
>>>>>>>> gotten the cluster to launch from the command line without an
>>>>>>>> issue). Then, I installed RStudio Server on the same EC2 instance
>>>>>>>> (the master) and successfully logged into it (using the test/test
>>>>>>>> user) through the web browser.
>>>>>>>>
>>>>>>>> This is where I get stuck - within RStudio, when I try to find the
>>>>>>>> folder where SparkR is installed, in order to load the SparkR
>>>>>>>> library and initialize a SparkContext, I either get permissions
>>>>>>>> errors on the folders or the library cannot be found, because I
>>>>>>>> cannot locate the folder in which the library is sitting.
>>>>>>>>
>>>>>>>> Has anyone successfully launched and utilized SparkR 1.4.0 in this
>>>>>>>> way, with RStudio Server running on top of the master instance? Are
>>>>>>>> we on the right track, or should we manually launch a cluster and
>>>>>>>> attempt to connect to it from another instance running R?
>>>>>>>>
>>>>>>>> Thank you in advance!
>>>>>>>>
>>>>>>>> Mark
>>>>>>>>
>>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>
>>>>
>>>
>>
>
