In the upcoming 2.0 release, the signature for map() has become:

  def map[U : Encoder](func: T => U): Dataset[U] = withTypedPlan {
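As a rough illustration of what that signature means in practice, here is a minimal sketch (not taken from the Spark source). It assumes Spark 2.0 on the classpath and a local SparkSession; the object name, app name, and sample data are made up for the example. The Encoder required by map() is assumed to come from spark.implicits._.

```scala
import org.apache.spark.sql.{Dataset, SparkSession}

// Hypothetical example object; names are illustrative only.
object MapExample {
  def main(args: Array[String]): Unit = {
    // Assumption: a local session is good enough for trying this out.
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("map-example")
      .getOrCreate()

    // Supplies the implicit Encoder[Int] that map[U : Encoder] asks for.
    import spark.implicits._

    val ds: Dataset[Int] = Seq(1, 2, 3).toDS()

    // map returns a Dataset[U] here, so there is no RDD round trip.
    val doubled: Dataset[Int] = ds.map(_ * 2)

    doubled.show()
    spark.stop()
  }
}
```

Since the result is already a Dataset[U], there should be no need to convert back from an RDD after each map.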
Note: DataFrame and Dataset are unified in 2.0.

FYI

On Thu, Apr 21, 2016 at 6:49 AM, Apurva Nandan <apurva3...@gmail.com> wrote:

> Hello everyone,
>
> Generally speaking, I guess it's well known that dataframes are much
> faster than RDD when it comes to performance.
> My question is how do you go around when it comes to transforming a
> dataframe using map.
> I mean then the dataframe gets converted into RDD, hence now do you again
> convert this RDD to a new dataframe for better performance?
> Further, if you have a process which involves series of transformations
> i.e. from one RDD to another, do you keep on converting each RDD to a
> dataframe first, all the time?
>
> It's also possible that I might be missing something here, please share
> your experiences.
>
>
> Thanks and Regards,
> Apurva
>