Sorry, posting too late at night. That should be "...transformations, that produce further RDDs; and actions, that return values to the driver program."
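
For anyone reading the archive later, the distinction can be sketched roughly as below. This is only an illustration under assumptions, not code from the thread's author: the object name, the helper names `nodeSizes` and `largest`, the local master setting, and the sample edge list are all made up for the example. It assumes Spark is on the classpath; on Spark releases older than 1.3 the pair-RDD methods would also need `import org.apache.spark.SparkContext._`.

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.rdd.RDD

// Hypothetical names throughout; a sketch, not the original poster's program.
object TransformationsVsActions {

  // Transformations only: every step returns another RDD, and nothing is
  // computed until an action is called on the result.
  def nodeSizes(lines: RDD[String]): RDD[(Int, Int)] =
    lines
      .map { s =>
        val fields = s.split("\\s+")
        (fields(0), fields(1))
      }
      .distinct()
      .groupByKey()
      .map { case (node, neighbours) => (node.toInt, neighbours.size) }

  // Action: top(1) runs the job and hands a local Array back to the driver.
  def largest(sizes: RDD[(Int, Int)]): Array[(Int, Int)] =
    sizes.top(1)(Ordering.by[(Int, Int), Int](_._2))

  def main(args: Array[String]): Unit = {
    // Local master and sample data are assumptions made for this example.
    val conf = new SparkConf().setAppName("transformations-vs-actions").setMaster("local[*]")
    val sc = new SparkContext(conf)
    try {
      val lines = sc.parallelize(Seq("1 2", "1 3", "1 4", "2 3"))
      val root = largest(nodeSizes(lines)) // Array[(Int, Int)], not an RDD
      println(root.mkString(", "))         // prints (1,3)
    } finally sc.stop()
  }
}
```

The return types carry the point: `nodeSizes` is typed as an `RDD`, while `largest` is typed as a plain `Array`, which is what the `top` signature in the API docs says.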
On Sat, Sep 13, 2014 at 12:45 AM, Mark Hamstra <m...@clearstorydata.com> wrote:

> Again, RDD operations are of two basic varieties: transformations, that
> produce further RDDs; and operations, that return values to the driver
> program. You've used several RDD transformations and then finally the
> top(1) action, which returns an array of one element to your driver
> program. That is exactly what you should expect from the description of
> RDD#top in the API.
> http://spark.apache.org/docs/latest/api/scala/index.html#org.apache.spark.rdd.RDD
>
> On Sat, Sep 13, 2014 at 12:34 AM, Deep Pradhan <pradhandeep1...@gmail.com> wrote:
>
>> Take for example this:
>>
>> val lines = sc.textFile(args(0))
>> val nodes = lines.map(s => {
>>   val fields = s.split("\\s+")
>>   (fields(0), fields(1))
>> }).distinct().groupByKey().cache()
>>
>> val nodeSizeTuple = nodes.map(node => (node._1.toInt, node._2.size))
>> val rootNode = nodeSizeTuple.top(1)(Ordering.by(f => f._2))
>>
>> The nodeSizeTuple is an RDD, but rootNode is an array. Here I have used
>> all RDD operations, but I am getting an array.
>> What about this case?
>>
>> On Sat, Sep 13, 2014 at 11:45 AM, Deep Pradhan <pradhandeep1...@gmail.com> wrote:
>>
>>> Is it always true that whenever we apply operations on an RDD, we get
>>> another RDD?
>>> Or does it depend on the return type of the operation?
>>>
>>> On Sat, Sep 13, 2014 at 9:45 AM, Soumya Simanta <soumya.sima...@gmail.com> wrote:
>>>
>>>> An RDD is a fault-tolerant distributed structure. It is the primary
>>>> abstraction in Spark.
>>>>
>>>> I would strongly suggest that you have a look at the following to get a
>>>> basic idea.
>>>>
>>>> http://www.cs.berkeley.edu/~pwendell/strataconf/api/core/spark/RDD.html
>>>> http://spark.apache.org/docs/latest/quick-start.html#basics
>>>>
>>>> https://www.usenix.org/conference/nsdi12/technical-sessions/presentation/zaharia
>>>>
>>>> On Sat, Sep 13, 2014 at 12:06 AM, Deep Pradhan <pradhandeep1...@gmail.com> wrote:
>>>>
>>>>> Take for example this:
>>>>> I have declared one queue *val queue = Queue.empty[Int]*, which is a
>>>>> pure Scala line in the program. I actually want the queue to be an RDD, but
>>>>> there are no direct methods to create an RDD which is a queue, right? What
>>>>> do you have to say on this?
>>>>> Does there exist something like: *Create an RDD which is a queue*?
>>>>>
>>>>> On Sat, Sep 13, 2014 at 8:43 AM, Hari Shreedharan <hshreedha...@cloudera.com> wrote:
>>>>>
>>>>>> No, Scala primitives remain primitives. Unless you create an RDD
>>>>>> using one of the many methods, you would not be able to access any of
>>>>>> the RDD methods. There is no automatic porting. Spark is an application as
>>>>>> far as Scala is concerned; there is no compilation (except, of course, the
>>>>>> Scala and JIT compilation, etc.).
>>>>>>
>>>>>> On Fri, Sep 12, 2014 at 8:04 PM, Deep Pradhan <pradhandeep1...@gmail.com> wrote:
>>>>>>
>>>>>>> I know that unpersist is a method on RDD.
>>>>>>> But my confusion is that, when we port our Scala programs to Spark,
>>>>>>> doesn't everything change to RDDs?
>>>>>>>
>>>>>>> On Fri, Sep 12, 2014 at 10:16 PM, Nicholas Chammas <nicholas.cham...@gmail.com> wrote:
>>>>>>>
>>>>>>>> unpersist is a method on RDDs. RDDs are abstractions introduced by
>>>>>>>> Spark.
>>>>>>>>
>>>>>>>> An Int is just a Scala Int. You can't call unpersist on Int in
>>>>>>>> Scala, and that doesn't change in Spark.
>>>>>>>>
>>>>>>>> On Fri, Sep 12, 2014 at 12:33 PM, Deep Pradhan <pradhandeep1...@gmail.com> wrote:
>>>>>>>>
>>>>>>>>> There is one thing that I am confused about.
>>>>>>>>> Spark has code that has been implemented in Scala. Now, can we
>>>>>>>>> run any Scala code on the Spark framework? What will be the
>>>>>>>>> difference in the execution of the Scala code on normal systems and on Spark?
>>>>>>>>> The reason for my question is the following:
>>>>>>>>> I had a variable
>>>>>>>>> val temp = <some operations>
>>>>>>>>> This temp was being created inside the loop. To manually
>>>>>>>>> throw it out of the cache, every time the loop ended I was calling
>>>>>>>>> *temp.unpersist()*. This returned an error saying that *value
>>>>>>>>> unpersist is not a method of Int*, which means that temp is an
>>>>>>>>> Int.
>>>>>>>>> Can someone explain to me why I was not able to call *unpersist*
>>>>>>>>> on *temp*?
>>>>>>>>>
>>>>>>>>> Thank You
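
As a footnote to the unpersist and queue questions quoted above, here is a minimal sketch of how the two cases might look side by side. Everything named here is an assumption for illustration: the object and helper names, the use of an immutable `Queue` (the thread does not say which `Queue` was meant), the local master, and the sample numbers. The original `<some operations>` is unknown, so a `reduce` stands in for whatever action produced the `Int`.

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.rdd.RDD
import scala.collection.immutable.Queue

// Hypothetical names throughout; a sketch under the assumptions stated above.
object RddVersusLocalValue {

  // There is no "queue RDD", but a local Scala collection can be copied into
  // an RDD with parallelize. The queue itself remains an ordinary Scala value.
  def toRdd(sc: SparkContext, queue: Queue[Int]): RDD[Int] =
    sc.parallelize(queue)

  // cache() and unpersist() are RDD methods. The result of an action such as
  // reduce is a plain Int, which has neither.
  def cachedTotal(rdd: RDD[Int]): Int = {
    val squares = rdd.map(x => x * x).cache() // transformation: still an RDD
    val total: Int = squares.reduce(_ + _)    // action: plain Scala Int
    squares.unpersist()                       // fine, squares is an RDD
    // total.unpersist()                      // would not compile, total is an Int
    total
  }

  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setAppName("rdd-versus-local-value").setMaster("local[*]")
    val sc = new SparkContext(conf)
    try {
      val total = cachedTotal(toRdd(sc, Queue(1, 2, 3)))
      println(total) // prints 14
    } finally sc.stop()
  }
}
```

If `temp` in the original loop was the result of an action like this, calling unpersist on the RDD that fed the action, rather than on `temp`, would be the likely fix.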