Fellow Spark Coders,

I am trying to migrate a reasonably large code base from DataFrames to Datasets. Today the code looks like this:
    df_a = read_csv(...)
    df_b = df_a.withColumn(some_transform_that_adds_more_columns)
    // repeat the above several times

With Datasets, this will require defining a case class for each step:

    case class A(f1, f2, f3)      // fields from the CSV file
    case class B(f1, f2, f3, f4)  // union of A and the new field added by some_transform_that_adds_more_columns
    // repeat this 10 times

Is there a better way?

Mohit.
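To make the duplication concrete, here is a minimal plain-Scala sketch of the pattern described above. The field names, types, and the way f4 is derived are all hypothetical placeholders; in real Spark code addF4 would be applied via ds.map(addF4) on a typed Dataset[A].

```scala
// Hypothetical case classes mirroring the migration described:
case class A(f1: String, f2: Int, f3: Double)           // fields from the CSV file
case class B(f1: String, f2: Int, f3: Double, f4: Int)  // A plus the new field

// What was df_a.withColumn(...) becomes a typed transform A => B.
// The derivation of f4 here (f2 * 2) is a made-up placeholder.
def addF4(a: A): B = B(a.f1, a.f2, a.f3, a.f2 * 2)

// Each additional withColumn step would need yet another case class
// (C, D, ...) repeating all prior fields, which is the duplication at issue.
val b: B = addF4(A("x", 1, 2.0))
```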