Fellow Spark Coders,

I am trying to migrate a reasonably large code base from DataFrames to Datasets. Today the code looks like this:
    df_a = read_csv(...)
    df_b = df_a.withColumn(some_transform_that_adds_more_columns)
    // repeat the above several times

With Datasets, this will require defining a case class for each step:

    case class A(f1, f2, f3)      // fields from the CSV file
    case class B(f1, f2, f3, f4)  // union of A and the new field added by some_transform_that_adds_more_columns
    // repeat this 10 times

Is there a better way?

Mohit.
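To make the duplication concrete, here is a minimal plain-Scala sketch of the pattern described above. The field names, types, and the way f4 is derived are all hypothetical placeholders; in real Spark code addF4 would be applied via ds.map(addF4) on a typed Dataset[A].

```scala
// Hypothetical case classes mirroring the migration described:
case class A(f1: String, f2: Int, f3: Double)           // fields from the CSV file
case class B(f1: String, f2: Int, f3: Double, f4: Int)  // A plus the new field

// What was df_a.withColumn(...) becomes a typed transform A => B.
// The derivation of f4 here (f2 * 2) is a made-up placeholder.
def addF4(a: A): B = B(a.f1, a.f2, a.f3, a.f2 * 2)

// Each additional withColumn step would need yet another case class
// (C, D, ...) repeating all prior fields, which is the duplication at issue.
val b: B = addF4(A("x", 1, 2.0))
```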