Declaring multiple RDDs and efficiency concerns

Simone Franzini Fri, 14 Nov 2014 08:32:45 -0800

Let's say I have to apply a complex sequence of operations to a certain RDD.
In order to make code more modular/readable, I would typically have
something like this:


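import org.apache.spark.rdd.RDD

object myObject {
  def main(args: Array[String]): Unit = {
    // myRdd is the input RDD, defined elsewhere
    val rdd1 = function1(myRdd)
    val rdd2 = function2(rdd1)
    val rdd3 = function3(rdd2)
  }

  // T is the element type; the bodies are placeholders for the real transformations
  def function1[T](rdd: RDD[T]): RDD[T] = { doSomething }
  def function2[T](rdd: RDD[T]): RDD[T] = { doSomethingElse }
  def function3[T](rdd: RDD[T]): RDD[T] = { doSomethingElseYet }
}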

So I am explicitly declaring vals for the intermediate steps. Does this end
up using more storage than if I just chained all of the operations and
declared only one val instead?
If yes, is there a better way to chain together the operations?
Ideally I would like to do something like:

val rdd = function1.function2.function3

Is there a way I can write the signature of my functions to accomplish
this? Is this also an efficiency issue or just a stylistic one?
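
To make the kind of chaining I have in mind concrete, here is a minimal sketch using Scala's standard andThen on function values. It is only an illustration under stated assumptions: Seq[Int] stands in for an RDD so that it runs without Spark, and the bodies of function1, function2 and function3 are hypothetical.

```scala
object ChainingSketch {
  // Stand-in for RDD[Int] so the sketch runs without a Spark dependency (assumption).
  type Data = Seq[Int]

  // Hypothetical steps, declared as function values so that they compose directly.
  val function1: Data => Data = _.map(_ + 1)
  val function2: Data => Data = _.filter(_ % 2 == 0)
  val function3: Data => Data = _.map(_ * 10)

  // andThen applies the steps left to right: function1, then function2, then function3.
  val pipeline: Data => Data = function1 andThen function2 andThen function3

  def main(args: Array[String]): Unit = {
    // Only one val is declared for the final result.
    val result = pipeline(Seq(1, 2, 3, 4))
    println(result)
  }
}
```

With Spark on the classpath, the same shape should carry over by replacing Data with RDD[Int], since map and filter are also defined on RDDs.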

Simone Franzini, PhD

http://www.linkedin.com/in/simonefranzini
