ds_a = spark.read.csv("path").as[A]

ds_b = ds_a
  .withColumn("f4", someUdf)
  .withColumn("f5", someUdf)
  .withColumn("f6", someUdf)
  .as[B]
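Spelled out end to end, the pattern looks roughly like the sketch below. The case classes A and B, the column names, the CSV options, and the toy UDF are illustrative assumptions, not from this thread; the point is that withColumn drops you back to an untyped DataFrame, and .as[B] restores typing once all of B's fields exist.

    import org.apache.spark.sql.SparkSession
    import org.apache.spark.sql.functions.udf

    // Illustrative schemas: B is A plus the derived columns.
    case class A(f1: String, f2: String, f3: String)
    case class B(f1: String, f2: String, f3: String, f4: Int, f5: Int, f6: Int)

    object DatasetExample {
      def main(args: Array[String]): Unit = {
        val spark = SparkSession.builder()
          .appName("ds-example")
          .master("local[*]")
          .getOrCreate()
        import spark.implicits._

        // Hypothetical UDF standing in for someUdf.
        val someUdf = udf((s: String) => s.length)

        // "path" is a placeholder; header option so column names match A.
        val ds_a = spark.read.option("header", "true").csv("path").as[A]

        // withColumn returns an untyped DataFrame; .as[B] recovers a
        // typed Dataset once all of B's fields are present.
        val ds_b = ds_a
          .withColumn("f4", someUdf($"f1"))
          .withColumn("f5", someUdf($"f2"))
          .withColumn("f6", someUdf($"f3"))
          .as[B]

        ds_b.show()
        spark.stop()
      }
    }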
Kevin
From: Mohit Jaggi
Sent: Tuesday, January 15, 2019 1:31 PM
To: user
Subject: dataset best practice question
Fellow Spark Coders,
I am trying to move from using DataFrames to Datasets for a reasonably
large code base. Today the code looks like this:

df_a = read_csv
df_b = df_a.withColumn( some_transform_that_adds_more_columns )
// repeat the above several times

With Datasets, this will require defining a new case class for each
intermediate schema.
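For contrast, a minimal sketch of what staying fully typed at every step would mean: one case class per intermediate schema and an explicit map for each added column. All names, fields, and transforms here are made up for illustration.

    import org.apache.spark.sql.{Dataset, SparkSession}

    // Illustrative intermediate schemas: fully typed code needs one
    // case class per transformation step.
    case class Step0(f1: String)
    case class Step1(f1: String, f2: Int)
    case class Step2(f1: String, f2: Int, f3: Int)

    object TypedSteps {
      def main(args: Array[String]): Unit = {
        val spark = SparkSession.builder()
          .appName("typed-steps")
          .master("local[*]")
          .getOrCreate()
        import spark.implicits._

        val ds0: Dataset[Step0] = Seq(Step0("a"), Step0("bb")).toDS()

        // Each added column forces a new case class and an explicit map.
        val ds1: Dataset[Step1] = ds0.map(r => Step1(r.f1, r.f1.length))
        val ds2: Dataset[Step2] = ds1.map(r => Step2(r.f1, r.f2, r.f2 * 2))

        ds2.show()
        spark.stop()
      }
    }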