Again, RDD operations come in two basic varieties: transformations, which
produce further RDDs; and actions, which return values to the driver
program.  You've used several RDD transformations and then finally the
top(1) action, which returns an array of one element to your driver
program.  That is exactly what you should expect from the description of
RDD#top in the API:
http://spark.apache.org/docs/latest/api/scala/index.html#org.apache.spark.rdd.RDD
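A minimal sketch of the distinction, runnable in spark-shell (where the
SparkContext `sc` is already defined; the values here are just made-up
examples):

```scala
val nums = sc.parallelize(Seq(3, 1, 4, 1, 5))   // RDD[Int]

val doubled = nums.map(_ * 2)                   // transformation: returns another RDD[Int]

val biggest = doubled.top(1)                    // action: returns Array[Int] to the driver
val total   = doubled.reduce(_ + _)             // action: returns a plain Int to the driver

println(biggest.mkString(","))                  // prints "10"
println(total)                                  // prints "28"
```

Transformations are lazy and only describe the next RDD; nothing leaves the
cluster until an action such as top, reduce, collect, or count asks for a
concrete value on the driver.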

On Sat, Sep 13, 2014 at 12:34 AM, Deep Pradhan <pradhandeep1...@gmail.com>
wrote:

> Take for example this:
>
>
> val lines = sc.textFile(args(0))
> val nodes = lines.map(s => {
>     val fields = s.split("\\s+")
>     (fields(0), fields(1))
> }).distinct().groupByKey().cache()
>
> val nodeSizeTuple = nodes.map(node => (node._1.toInt, node._2.size))
> val rootNode = nodeSizeTuple.top(1)(Ordering.by(f => f._2))
>
> The nodeSizeTuple is an RDD, but rootNode is an array. Here I have used
> all RDD operations, but I am getting an array.
> What about this case?
>
> On Sat, Sep 13, 2014 at 11:45 AM, Deep Pradhan <pradhandeep1...@gmail.com>
> wrote:
>
>> Is it always true that whenever we apply operations on an RDD, we get
>> another RDD?
>> Or does it depend on the return type of the operation?
>>
>> On Sat, Sep 13, 2014 at 9:45 AM, Soumya Simanta <soumya.sima...@gmail.com
>> > wrote:
>>
>>>
>>> An RDD is a fault-tolerant distributed structure. It is the primary
>>> abstraction in Spark.
>>>
>>> I would strongly suggest that you have a look at the following to get a
>>> basic idea.
>>>
>>> http://www.cs.berkeley.edu/~pwendell/strataconf/api/core/spark/RDD.html
>>> http://spark.apache.org/docs/latest/quick-start.html#basics
>>>
>>> https://www.usenix.org/conference/nsdi12/technical-sessions/presentation/zaharia
>>>
>>> On Sat, Sep 13, 2014 at 12:06 AM, Deep Pradhan <
>>> pradhandeep1...@gmail.com> wrote:
>>>
>>>> Take for example this:
>>>> I have declared a queue, *val queue = Queue.empty[Int]*, which is a
>>>> pure Scala line in the program. I actually want the queue to be an RDD,
>>>> but there are no direct methods to create an RDD which is a queue,
>>>> right? What are your thoughts on this?
>>>> Does there exist something like *create an RDD which is a queue*?
>>>>
>>>> On Sat, Sep 13, 2014 at 8:43 AM, Hari Shreedharan <
>>>> hshreedha...@cloudera.com> wrote:
>>>>
>>>>> No, Scala primitives remain primitives. Unless you create an RDD using
>>>>> one of the many methods for doing so, you will not be able to access
>>>>> any of the RDD methods. There is no automatic porting. Spark is just an
>>>>> application as far as Scala is concerned - there is no special
>>>>> compilation step (beyond, of course, the usual Scala and JIT
>>>>> compilation).
>>>>>
>>>>> On Fri, Sep 12, 2014 at 8:04 PM, Deep Pradhan <
>>>>> pradhandeep1...@gmail.com> wrote:
>>>>>
>>>>>> I know that unpersist is a method on RDD.
>>>>>> But my confusion is that, when we port our Scala programs to Spark,
>>>>>> doesn't everything change to RDDs?
>>>>>>
>>>>>> On Fri, Sep 12, 2014 at 10:16 PM, Nicholas Chammas <
>>>>>> nicholas.cham...@gmail.com> wrote:
>>>>>>
>>>>>>> unpersist is a method on RDDs. RDDs are abstractions introduced by
>>>>>>> Spark.
>>>>>>>
>>>>>>> An Int is just a Scala Int. You can't call unpersist on Int in
>>>>>>> Scala, and that doesn't change in Spark.
>>>>>>>
>>>>>>> On Fri, Sep 12, 2014 at 12:33 PM, Deep Pradhan <
>>>>>>> pradhandeep1...@gmail.com> wrote:
>>>>>>>
>>>>>>>> There is one thing that I am confused about.
>>>>>>>> Spark itself is implemented in Scala. Now, can we run any Scala code
>>>>>>>> on the Spark framework? What will be the difference between
>>>>>>>> executing Scala code on a normal system and on Spark?
>>>>>>>> The reason for my question is the following:
>>>>>>>> I had a variable
>>>>>>>> *val temp = <some operations>*
>>>>>>>> This temp was being created inside a loop. To manually throw it out
>>>>>>>> of the cache every time the loop ended, I was calling
>>>>>>>> *temp.unpersist()*, but this was returning an error saying that
>>>>>>>> *value unpersist is not a method of Int*, which means that temp is
>>>>>>>> an Int.
>>>>>>>> Can someone explain to me why I was not able to call *unpersist* on
>>>>>>>> *temp*?
>>>>>>>>
>>>>>>>> Thank You
>>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>
>>>>>
>>>>
>>>
>>
>
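On the earlier question in this thread about an RDD that is a queue: there
is no queue-shaped RDD, but scala.collection.immutable.Queue is a Seq, so
its elements can be handed to sc.parallelize like any other collection. A
minimal sketch (assuming spark-shell, where `sc` is already defined):

```scala
import scala.collection.immutable.Queue

val queue = Queue(1, 2, 3, 4)

// Queue extends Seq, so sc.parallelize accepts it directly. Note that the
// result is a plain RDD[Int] of the elements - the queue's FIFO enqueue and
// dequeue semantics are not preserved as RDD operations.
val queueRDD = sc.parallelize(queue)

println(queueRDD.count())   // prints "4"
```

If you need actual queue behavior driving a loop, keep the Queue on the
driver and create RDDs from its contents as needed, rather than trying to
make the RDD itself behave like a queue.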
