Re: RDD generated from Dataframes

2016-04-21 Thread Sean Owen
I don't think that's generally true, but it is true to the extent that you can push down the work of higher-level logical operators like select and groupBy, on common types, which can be understood and optimized. Your arbitrary user code is opaque and can't be optimized. So DataFrame.groupBy.max is

Re: RDD generated from Dataframes

2016-04-21 Thread Ted Yu
In the upcoming 2.0 release, the signature for map() has become:
  def map[U : Encoder](func: T => U): Dataset[U] = withTypedPlan {
Note: DataFrame and Dataset are unified in 2.0.
FYI
On Thu, Apr 21, 2016 at 6:49 AM, Apurva Nandan wrote:
> Hello everyone,
>
> Generally
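To make the map() signature above concrete, here is a minimal sketch contrasting a declarative aggregation with a map over arbitrary user code. It assumes Spark 2.x on the classpath and a local master. The object name MapVsGroupBy, the helper names, and the sample data are hypothetical and not taken from the thread. In the 2.x API shown above, map takes an Encoder[U] and returns a Dataset[U], which is passed explicitly here to make it visible.

```scala
import org.apache.spark.sql.{DataFrame, Dataset, Encoders, SparkSession}

// Hypothetical example object; names and data are illustrative only.
object MapVsGroupBy {

  // Declarative aggregation: expressed with built-in logical operators,
  // so the optimizer can see and plan the whole computation.
  def maxPerKey(df: DataFrame): DataFrame =
    df.groupBy("key").max("value")

  // Arbitrary user function: the lambda body is opaque to the optimizer.
  // The Encoder[Int] required by map[U : Encoder] is supplied explicitly.
  def doubled(df: DataFrame): Dataset[Int] =
    df.map(row => row.getInt(1) * 2)(Encoders.scalaInt)

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[*]")            // assumption: local run for illustration
      .appName("map-vs-groupBy")
      .getOrCreate()
    import spark.implicits._

    val df = Seq(("a", 1), ("a", 3), ("b", 2)).toDF("key", "value")

    // Compare the physical plans of the two approaches.
    maxPerKey(df).explain()
    doubled(df).explain()

    spark.stop()
  }
}
```

Comparing the two explain() outputs is one way to see where the user function appears in the plan as an opaque step.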

RDD generated from Dataframes

2016-04-21 Thread Apurva Nandan
Hello everyone, Generally speaking, I guess it's well known that DataFrames are much faster than RDDs when it comes to performance. My question is how you go about transforming a DataFrame using map. I mean, the DataFrame then gets converted into an RDD, hence now do you again