Declaring multiple RDDs and efficiency concerns

2014-11-14 Thread Simone Franzini
Let's say I have to apply a complex sequence of operations to a certain RDD. In order to make code more modular/readable, I would typically have something like this:

    object myObject {
      def main(args: Array[String]) {
        val rdd1 = function1(myRdd)
        val rdd2 = function2(rdd1)
        val rdd3 =

Re: Declaring multiple RDDs and efficiency concerns

2014-11-14 Thread Rishi Yadav
How about using a fluent style of Scala programming? On Fri, Nov 14, 2014 at 8:31 AM, Simone Franzini captainfr...@gmail.com wrote: Let's say I have to apply a complex sequence of operations to a certain RDD. In order to make code more modular/readable, I would typically have something like

Re: Declaring multiple RDDs and efficiency concerns

2014-11-14 Thread Sean Owen
This code executes on the driver, and an RDD here is really just a handle on all the distributed data out there. It's a local bookkeeping object. So, manipulation of these objects themselves in the local driver code has virtually no performance impact. These two versions would be about identical*.
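
To make the point concrete, here is a minimal sketch comparing the two styles side by side. It assumes a local Spark setup; the bodies of function1 and function2 are hypothetical stand-ins (the thread never shows them), and the object name, app name, and input data are invented for illustration. Both versions build the same chain of lazy transformations on the driver, and no distributed work happens until an action such as collect() is called.

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.rdd.RDD

object NamedVsChained {
  // Hypothetical stand-ins for the thread's function1 / function2.
  def function1(rdd: RDD[Int]): RDD[Int] = rdd.map(_ * 2)
  def function2(rdd: RDD[Int]): RDD[Int] = rdd.filter(_ % 3 == 0)

  // Version 1: every intermediate RDD gets its own val.
  // Each val is only a driver-side handle; nothing is computed here.
  def named(myRdd: RDD[Int]): RDD[Int] = {
    val rdd1 = function1(myRdd)
    val rdd2 = function2(rdd1)
    rdd2
  }

  // Version 2: the same calls nested, with no intermediate names.
  def chained(myRdd: RDD[Int]): RDD[Int] =
    function2(function1(myRdd))

  def main(args: Array[String]): Unit = {
    // Local master and app name are assumptions for a standalone run.
    val conf = new SparkConf().setAppName("named-vs-chained").setMaster("local[*]")
    val sc = new SparkContext(conf)
    try {
      val myRdd = sc.parallelize(1 to 10)
      val a = named(myRdd)
      val b = chained(myRdd)
      // Only the actions below trigger distributed execution.
      println(a.collect().toSeq)
      println(b.collect().toSeq)
      // The lineage of both RDDs has the same shape.
      println(a.toDebugString)
      println(b.toDebugString)
    } finally {
      sc.stop()
    }
  }
}
```

Choosing between the two is then a readability question. The one place naming an intermediate RDD matters for performance is when that RDD is reused by several actions, in which case calling cache() or persist() on it is what avoids recomputation, not the val itself.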