ds_a = spark.read.csv("path").as[A]

ds_b = ds_a
  .withColumn("f4", someUdf)
  .withColumn("f5", someUdf)
  .withColumn("f6", someUdf)
  .as[B]
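Spelled out end to end, the pattern looks roughly like the sketch below. The case classes A and B, the column names, the CSV options, and the toy UDF are illustrative assumptions, not from this thread; the point is that withColumn drops you back to an untyped DataFrame, and .as[B] restores typing once all of B's fields exist.

    import org.apache.spark.sql.SparkSession
    import org.apache.spark.sql.functions.udf

    // Illustrative schemas: B is A plus the derived columns.
    case class A(f1: String, f2: String, f3: String)
    case class B(f1: String, f2: String, f3: String, f4: Int, f5: Int, f6: Int)

    object DatasetExample {
      def main(args: Array[String]): Unit = {
        val spark = SparkSession.builder()
          .appName("ds-example")
          .master("local[*]")
          .getOrCreate()
        import spark.implicits._

        // Hypothetical UDF standing in for someUdf.
        val someUdf = udf((s: String) => s.length)

        // "path" is a placeholder; header option so column names match A.
        val ds_a = spark.read.option("header", "true").csv("path").as[A]

        // withColumn returns an untyped DataFrame; .as[B] recovers a
        // typed Dataset once all of B's fields are present.
        val ds_b = ds_a
          .withColumn("f4", someUdf($"f1"))
          .withColumn("f5", someUdf($"f2"))
          .withColumn("f6", someUdf($"f3"))
          .as[B]

        ds_b.show()
        spark.stop()
      }
    }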
Kevin
From: Mohit Jaggi
Sent: Tuesday, January 15, 2019 1:31 PM
To: user
Subject: dataset best practice question
Fellow Spark Coders,
I am trying to move from using DataFrames to Datasets for a reasonably
large code base. Today the code looks like this:

df_a = read_csv
df_b = df_a.withColumn( some_transform_that_adds_more_columns )
// repeat the above several times

With Datasets, this will require defining a new case class for each
intermediate schema.
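For contrast, a minimal sketch of what staying fully typed at every step would mean: one case class per intermediate schema and an explicit map for each added column. All names, fields, and transforms here are made up for illustration.

    import org.apache.spark.sql.{Dataset, SparkSession}

    // Illustrative intermediate schemas: fully typed code needs one
    // case class per transformation step.
    case class Step0(f1: String)
    case class Step1(f1: String, f2: Int)
    case class Step2(f1: String, f2: Int, f3: Int)

    object TypedSteps {
      def main(args: Array[String]): Unit = {
        val spark = SparkSession.builder()
          .appName("typed-steps")
          .master("local[*]")
          .getOrCreate()
        import spark.implicits._

        val ds0: Dataset[Step0] = Seq(Step0("a"), Step0("bb")).toDS()

        // Each added column forces a new case class and an explicit map.
        val ds1: Dataset[Step1] = ds0.map(r => Step1(r.f1, r.f1.length))
        val ds2: Dataset[Step2] = ds1.map(r => Step2(r.f1, r.f2, r.f2 * 2))

        ds2.show()
        spark.stop()
      }
    }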