Re: Spark and Scala

Mark Hamstra Sat, 13 Sep 2014 00:48:07 -0700

Sorry, posting too late at night.  That should be "...transformations, that
produce further RDDs; and actions, that return values to the driver
program."
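
The distinction can be sketched in a few lines. This is a minimal illustration, not code from the thread: it assumes a recent Spark release (1.3 or later, so that pair-RDD methods such as groupByKey resolve without extra imports) on the classpath, a local master, and a small made-up edge list in place of the input file. The object name, the helper nodeSizes, and the sample data are hypothetical.

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.rdd.RDD

object TransformationsVsActions {
  // Transformations only: every step returns another RDD, and nothing is computed yet.
  def nodeSizes(lines: RDD[String]): RDD[(Int, Int)] =
    lines
      .map { s =>
        val fields = s.split("\\s+")
        (fields(0), fields(1))
      }
      .distinct()
      .groupByKey()
      .map { case (node, neighbours) => (node.toInt, neighbours.size) }

  def main(args: Array[String]): Unit = {
    // Local master and application name are assumptions made for this sketch.
    val conf = new SparkConf().setMaster("local[*]").setAppName("transformations-vs-actions")
    val sc = new SparkContext(conf)
    try {
      // Hypothetical edge list ("source target") standing in for sc.textFile(args(0)).
      val lines: RDD[String] = sc.parallelize(Seq("1 2", "1 3", "2 3", "1 2"))
      val sizes: RDD[(Int, Int)] = nodeSizes(lines)

      // Actions: each one runs the job and returns an ordinary Scala value to the driver.
      val root: Array[(Int, Int)] = sizes.top(1)(Ordering.by[(Int, Int), Int](_._2))
      val total: Long = sizes.count()

      val rootText = root.mkString(", ")
      println(s"root = $rootText, nodes = $total")
    } finally {
      sc.stop()
    }
  }
}
```

The declared types carry the point: map, distinct and groupByKey yield RDDs, while top and count yield an Array and a Long in the driver program.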


On Sat, Sep 13, 2014 at 12:45 AM, Mark Hamstra <m...@clearstorydata.com>
wrote:

> Again, RDD operations are of two basic varieties: transformations, that
> produce further RDDs; and operations, that return values to the driver
> program.  You've used several RDD transformations and then finally the
> top(1) action, which returns an array of one element to your driver
> program.  That is exactly what you should expect from the description of
> RDD#top in the API.
> http://spark.apache.org/docs/latest/api/scala/index.html#org.apache.spark.rdd.RDD
>
> On Sat, Sep 13, 2014 at 12:34 AM, Deep Pradhan <pradhandeep1...@gmail.com>
> wrote:
>
>> Take for example this:
>>
>>
>> val lines = sc.textFile(args(0))
>> val nodes = lines.map { s =>
>>   val fields = s.split("\\s+")
>>   (fields(0), fields(1))
>> }.distinct().groupByKey().cache()
>>
>> val nodeSizeTuple = nodes.map(node => (node._1.toInt, node._2.size))
>> val rootNode = nodeSizeTuple.top(1)(Ordering.by(f => f._2))
>>
>> nodeSizeTuple is an RDD, but rootNode is an array. I have used only RDD
>> operations here, yet I am getting an array.
>> What about this case?
>>
>> On Sat, Sep 13, 2014 at 11:45 AM, Deep Pradhan <pradhandeep1...@gmail.com>
>> wrote:
>>
>>> Is it always true that whenever we apply operations on an RDD, we get
>>> another RDD?
>>> Or does it depend on the return type of the operation?
>>>
>>> On Sat, Sep 13, 2014 at 9:45 AM, Soumya Simanta <
>>> soumya.sima...@gmail.com> wrote:
>>>
>>>>
>>>> An RDD is a fault-tolerant distributed structure. It is the primary
>>>> abstraction in Spark.
>>>>
>>>> I would strongly suggest that you have a look at the following to get a
>>>> basic idea.
>>>>
>>>> http://www.cs.berkeley.edu/~pwendell/strataconf/api/core/spark/RDD.html
>>>> http://spark.apache.org/docs/latest/quick-start.html#basics
>>>>
>>>> https://www.usenix.org/conference/nsdi12/technical-sessions/presentation/zaharia
>>>>
>>>> On Sat, Sep 13, 2014 at 12:06 AM, Deep Pradhan <
>>>> pradhandeep1...@gmail.com> wrote:
>>>>
>>>>> Take for example this:
>>>>> I have declared a queue, *val queue = Queue.empty[Int]*, which is a
>>>>> pure Scala line in the program. I actually want the queue to be an RDD,
>>>>> but there is no direct method to create an RDD that is a queue, right?
>>>>> What do you say to this?
>>>>> Does there exist something like: *create an RDD which is a queue*?
>>>>>
>>>>> On Sat, Sep 13, 2014 at 8:43 AM, Hari Shreedharan <
>>>>> hshreedha...@cloudera.com> wrote:
>>>>>
>>>>>> No, Scala primitives remain primitives. Unless you create an RDD
>>>>>> using one of the many methods, you will not be able to access any of
>>>>>> the RDD methods. There is no automatic porting. Spark is an application
>>>>>> as far as Scala is concerned; there is no compilation (except, of
>>>>>> course, the Scala compilation, JIT compilation, etc.).
>>>>>>
>>>>>> On Fri, Sep 12, 2014 at 8:04 PM, Deep Pradhan <
>>>>>> pradhandeep1...@gmail.com> wrote:
>>>>>>
>>>>>>> I know that unpersist is a method on RDD.
>>>>>>> But my confusion is that, when we port our Scala programs to Spark,
>>>>>>> doesn't everything change to RDDs?
>>>>>>>
>>>>>>> On Fri, Sep 12, 2014 at 10:16 PM, Nicholas Chammas <
>>>>>>> nicholas.cham...@gmail.com> wrote:
>>>>>>>
>>>>>>>> unpersist is a method on RDDs. RDDs are abstractions introduced by
>>>>>>>> Spark.
>>>>>>>>
>>>>>>>> An Int is just a Scala Int. You can't call unpersist on Int in
>>>>>>>> Scala, and that doesn't change in Spark.
>>>>>>>>
>>>>>>>> On Fri, Sep 12, 2014 at 12:33 PM, Deep Pradhan <
>>>>>>>> pradhandeep1...@gmail.com> wrote:
>>>>>>>>
>>>>>>>>> There is one thing that I am confused about.
>>>>>>>>> Spark is implemented in Scala. Now, can we run any Scala code on
>>>>>>>>> the Spark framework? What will be the difference between executing
>>>>>>>>> Scala code on a normal system and on Spark?
>>>>>>>>> The reason for my question is the following:
>>>>>>>>> I had a variable
>>>>>>>>> *val temp = <some operations>*
>>>>>>>>> This temp was being created inside a loop. To manually throw it
>>>>>>>>> out of the cache, I was calling *temp.unpersist()* every time the
>>>>>>>>> loop ended. This returned an error saying that *value unpersist is
>>>>>>>>> not a member of Int*, which means that temp is an Int.
>>>>>>>>> Can someone explain to me why I was not able to call *unpersist*
>>>>>>>>> on *temp*?
>>>>>>>>>
>>>>>>>>> Thank You
>>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>
>>>>>>
>>>>>
>>>>
>>>
>>
>
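
The point made in the replies above, that plain Scala values stay plain Scala values and only RDDs have RDD methods, can be sketched as follows. This is an illustration under stated assumptions rather than code from the thread: it assumes Spark on the classpath and a local master, it uses an immutable Queue, and the object name and the helper toRdd are hypothetical. SparkContext.parallelize copies a local collection into an RDD; the result is a distributed dataset and no longer offers queue operations.

```scala
import scala.collection.immutable.Queue
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.rdd.RDD

object LocalValuesVsRdds {
  // Copies the elements of a local queue into an RDD (hypothetical helper).
  def toRdd(sc: SparkContext, queue: Queue[Int]): RDD[Int] = sc.parallelize(queue)

  def main(args: Array[String]): Unit = {
    // Local master and application name are assumptions made for this sketch.
    val conf = new SparkConf().setMaster("local[*]").setAppName("local-values-vs-rdds")
    val sc = new SparkContext(conf)
    try {
      // An ordinary Scala value that lives only in the driver program.
      val queue: Queue[Int] = Queue(1, 2, 3)

      // An RDD built from that value; cache() marks it for caching.
      val rdd: RDD[Int] = toRdd(sc, queue).cache()

      // reduce is an action, so its result is a plain Int, not an RDD.
      val sum: Int = rdd.reduce(_ + _)

      // sum.unpersist() would not compile: value unpersist is not a member of Int.
      // unpersist is defined on the RDD, so that is where it has to be called.
      rdd.unpersist()

      println(sum)
    } finally {
      sc.stop()
    }
  }
}
```

If a loop variable holds the result of an action, as temp apparently did, the RDD to unpersist is the one the action was called on.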
