Take for example this:

*val lines = sc.textFile(args(0))*
*val nodes = lines.map { s =>*
*  val fields = s.split("\\s+")*
*  (fields(0), fields(1))*
*}.distinct().groupByKey().cache()*

*val nodeSizeTuple = nodes.map(node => (node._1.toInt, node._2.size))*
*val rootNode = nodeSizeTuple.top(1)(Ordering.by(f => f._2))*

Here nodeSizeTuple is an RDD, but rootNode is an Array. I have used only
RDD operations, yet I am getting back an array.
What about this case?
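
For concreteness, a minimal sketch of the distinction at play, assuming a
live SparkContext *sc* (the names here are illustrative):

*// Transformations like map return RDDs and stay distributed.*
*val sizes = nodes.map(node => (node._1.toInt, node._2.size))  // RDD[(Int, Int)]*
*// top is an action: it returns a local Array on the driver.*
*val root = sizes.top(1)(Ordering.by(f => f._2))               // Array[(Int, Int)]*
*// If an RDD is needed again, the local result can be re-parallelized.*
*val rootRdd = sc.parallelize(root)                            // RDD[(Int, Int)]*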

On Sat, Sep 13, 2014 at 11:45 AM, Deep Pradhan <pradhandeep1...@gmail.com>
wrote:

> Is it always true that whenever we apply operations on an RDD, we get
> another RDD?
> Or does it depend on the return type of the operation?
>
> On Sat, Sep 13, 2014 at 9:45 AM, Soumya Simanta <soumya.sima...@gmail.com>
> wrote:
>
>>
>> An RDD is a fault-tolerant distributed structure. It is the primary
>> abstraction in Spark.
>>
>> I would strongly suggest that you have a look at the following to get a
>> basic idea.
>>
>> http://www.cs.berkeley.edu/~pwendell/strataconf/api/core/spark/RDD.html
>> http://spark.apache.org/docs/latest/quick-start.html#basics
>>
>> https://www.usenix.org/conference/nsdi12/technical-sessions/presentation/zaharia
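>>
>> A very small sketch of the basic idea, assuming a live SparkContext *sc*
>> (the names are illustrative):
>>
>> *val rdd = sc.parallelize(Seq(1, 2, 3))  // distribute a local collection*
>> *val doubled = rdd.map(_ * 2)            // transformation: returns an RDD*
>> *val total = doubled.reduce(_ + _)       // action: returns a plain Int (12)*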
>>
>> On Sat, Sep 13, 2014 at 12:06 AM, Deep Pradhan <pradhandeep1...@gmail.com
>> > wrote:
>>
>>> Take for example this:
>>> I have declared a queue, *val queue = Queue.empty[Int]*, which is a
>>> pure Scala line in the program. I actually want the queue to be an RDD,
>>> but there is no direct method to create an RDD that is a queue, right?
>>> What do you say about this?
>>> Does there exist something like *create an RDD which is a queue*?
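>>>
>>> RDDs are immutable, so there is no queue-like RDD as such. One
>>> hypothetical way to distribute the contents of a local queue, assuming
>>> a SparkContext *sc*:
>>>
>>> *import scala.collection.immutable.Queue*
>>> *val queue = Queue(1, 2, 3)*
>>> *val queueRdd = sc.parallelize(queue.toSeq)  // RDD[Int] of the queue's elements*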
>>>
>>> On Sat, Sep 13, 2014 at 8:43 AM, Hari Shreedharan <
>>> hshreedha...@cloudera.com> wrote:
>>>
>>>> No, Scala primitives remain primitives. Unless you create an RDD using
>>>> one of the many methods for doing so, you will not be able to access
>>>> any of the RDD methods. There is no automatic porting. Spark is just an
>>>> application as far as Scala is concerned; there is no special
>>>> compilation step (beyond, of course, the usual Scala and JIT
>>>> compilation).
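>>>>
>>>> For instance (an illustrative sketch, assuming a SparkContext *sc*),
>>>> the same-looking code stays local or becomes distributed depending on
>>>> how the value was created:
>>>>
>>>> *val localList = List(1, 2, 3)            // plain Scala List*
>>>> *val localDoubled = localList.map(_ * 2)  // still a local List[Int]*
>>>> *val rdd = sc.parallelize(localList)      // RDD created explicitly*
>>>> *val rddDoubled = rdd.map(_ * 2)          // distributed RDD[Int]*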
>>>>
>>>> On Fri, Sep 12, 2014 at 8:04 PM, Deep Pradhan <
>>>> pradhandeep1...@gmail.com> wrote:
>>>>
>>>>> I know that unpersist is a method on RDD.
>>>>> But what confuses me is this: when we port our Scala programs to
>>>>> Spark, doesn't everything change to RDDs?
>>>>>
>>>>> On Fri, Sep 12, 2014 at 10:16 PM, Nicholas Chammas <
>>>>> nicholas.cham...@gmail.com> wrote:
>>>>>
>>>>>> unpersist is a method on RDDs. RDDs are abstractions introduced by
>>>>>> Spark.
>>>>>>
>>>>>> An Int is just a Scala Int. You can't call unpersist on Int in Scala,
>>>>>> and that doesn't change in Spark.
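>>>>>>
>>>>>> A tiny sketch of the distinction, assuming a SparkContext *sc*:
>>>>>>
>>>>>> *val rdd = sc.parallelize(1 to 10).cache()  // RDD[Int]*
>>>>>> *rdd.unpersist()         // fine: unpersist is defined on RDDs*
>>>>>> *val n = rdd.count()     // n is a plain Long*
>>>>>> *// n.unpersist()        // would not compile: Long has no such method*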
>>>>>>
>>>>>> On Fri, Sep 12, 2014 at 12:33 PM, Deep Pradhan <
>>>>>> pradhandeep1...@gmail.com> wrote:
>>>>>>
>>>>>>> There is one thing that I am confused about.
>>>>>>> Spark itself is implemented in Scala. Now, can we run any Scala code
>>>>>>> on the Spark framework? And what is the difference between executing
>>>>>>> the Scala code on a normal system and on Spark?
>>>>>>> The reason for my question is the following:
>>>>>>> I had a variable
>>>>>>> *val temp = <some operations>*
>>>>>>> This temp was being created inside the loop, and to manually evict
>>>>>>> it from the cache at the end of each iteration I was calling
>>>>>>> *temp.unpersist()*. This was returning an error saying that *value
>>>>>>> unpersist is not a member of Int*, which means that temp is an Int.
>>>>>>> Can someone explain to me why I was not able to call *unpersist*
>>>>>>> on *temp*?
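>>>>>>>
>>>>>>> A hedged guess at what happened, with a hypothetical *someRdd:
>>>>>>> RDD[Int]*: if temp was assigned the result of an action such as
>>>>>>> *reduce*, it is a plain Int, and only an RDD-typed val can be
>>>>>>> cached and unpersisted:
>>>>>>>
>>>>>>> *val temp = someRdd.map(x => x * x).reduce(_ + _)  // action: plain Int*
>>>>>>> *// temp.unpersist()  // does not compile: temp is an Int*
>>>>>>> *val tempRdd = someRdd.map(x => x * x).cache()     // RDD[Int]*
>>>>>>> *tempRdd.unpersist()  // fine: tempRdd is an RDD*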
>>>>>>>
>>>>>>> Thank You
>>>>>>>
>>>>>>
>>>>>>
>>>>>
>>>>
>>>
>>
>
