On Thu, Sep 8, 2016 at 11:35 AM, Ashish Tadose <ashishtad...@gmail.com>
wrote:

> I wish to organize these DataFrame operations by grouping them into Scala
> object methods,
> something like the below:
>
>
>
>> object Driver {
>>   def main(args: Array[String]): Unit = {
>>     val df = Operations.process(sparkContext)
>>   }
>> }
>>
>>
>> object Operations {
>>   def process(sparkContext: SparkContext): DataFrame = {
>>     // series of dataframe operations
>>   }
>> }
>
>
> My stupid question is: is retrieving a DF as the return value of another
> Scala object's method the right thing to do at large scale?
> Would returning the DF to the driver cause all the data to be passed to the
> driver code, or would it return just a pointer to the DF?
>

As long as the methods do not trigger any execution, it is fine to pass a
DataFrame back to the driver.  Think of a DataFrame as an abstraction over
RDDs.  When you return an RDD or DataFrame you're not returning the data
itself.  Instead you're returning a recipe that details the series of
operations needed to produce the data.  Nothing is computed until an action
runs, and only an action such as collect() brings the rows themselves to the
driver.
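
To make the "recipe" idea concrete, here is a minimal sketch of the layout
you describe.  It assumes Spark 2.x and takes a SparkSession rather than the
SparkContext in your snippet (my substitution, since a session is the usual
entry point for building DataFrames there).  The column names, app name, and
local master are made up for illustration.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.functions.col

object Operations {
  // Only transformations here: this builds a query plan and runs no job.
  def process(spark: SparkSession): DataFrame = {
    spark.range(0L, 1000000L)                        // hypothetical input data
      .toDF("id")
      .filter(col("id") % 2 === 0)                   // lazy
      .withColumn("squared", col("id") * col("id"))  // lazy
  }
}

object Driver {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("lazy-dataframe-sketch")  // hypothetical name
      .master("local[*]")                // assumption: local run for testing
      .getOrCreate()

    // Returns a handle to the plan; no rows are moved to the driver.
    val df = Operations.process(spark)

    // Prints the plan that was built, without executing the query.
    df.explain()

    // An action: the job runs on the executors and only the resulting
    // Long comes back to the driver.
    val n = df.count()
    println(s"rows: $n")

    // By contrast, df.collect() would ship every row to the driver.
    spark.stop()
  }
}
```

The point of the split is that process() stays free of actions, so calling it
costs nothing until the driver decides what to do with the result.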
