On Thu, Sep 8, 2016 at 11:35 AM, Ashish Tadose <ashishtad...@gmail.com> wrote:
> I wish to organize these dataframe operations by grouping them into Scala
> object methods. Something like below:
>
>     object Driver {
>       def main(args: Array[String]) {
>         val df = Operations.process(sparkContext)
>       }
>     }
>
>     object Operations {
>       def process(sparkContext: SparkContext): DataFrame = {
>         // series of dataframe operations
>       }
>     }
>
> My question is: would retrieving a DF from another Scala object's method
> as a return type be the right thing to do at large scale?
> Would returning a DF to the driver cause all the data to be passed to the
> driver code, or would it return just a pointer to the DF?

As long as the methods do not trigger any execution, it is fine to pass a DataFrame back to the driver. Think of a DataFrame as an abstraction over RDDs. When you return an RDD or DataFrame, you're not returning the data itself. Instead, you're returning a recipe that details the series of operations needed to produce the data.
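The "recipe" idea can be sketched without Spark at all. Below is a minimal, hypothetical `Plan` class (not Spark's API) where transformations only build up a description of the work, and nothing executes until an action like `collect()` is called. Returning such a plan from a method is cheap, which is exactly why returning a DataFrame from `Operations.process` does not ship any data to the driver:

```scala
// Hypothetical sketch of plan-based lazy evaluation, for illustration only.
// Transformations wrap the recipe; only collect() actually runs it.
case class Plan[A](compute: () => Seq[A]) {
  // map/filter build a new recipe; no data is produced here.
  def map[B](f: A => B): Plan[B] = Plan(() => compute().map(f))
  def filter(p: A => Boolean): Plan[A] = Plan(() => compute().filter(p))
  // The "action": materializes the data.
  def collect(): Seq[A] = compute()
}

object Demo {
  def main(args: Array[String]): Unit = {
    var evaluated = false
    val source = Plan(() => { evaluated = true; Seq(1, 2, 3, 4) })
    // A method can freely return this plan: nothing has executed yet.
    val doubledEvens = source.filter(_ % 2 == 0).map(_ * 2)
    assert(!evaluated)
    // Only the action triggers computation.
    val result = doubledEvens.collect()
    assert(evaluated)
    println(result) // prints List(4, 8)
  }
}
```

Spark's DataFrames work the same way at a much larger scale: transformations extend the logical plan, and only actions (`collect`, `count`, `write`, ...) trigger execution on the cluster.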