I don't think that's generally true, but it is true to the extent that
you can push down the work of higher-level logical operators like
select and groupBy, on common types, that can be understood and
optimized. Your arbitrary user code is opaque and can't be optimized.
So DataFrame.groupBy.max is
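
To make the distinction above concrete, here is a minimal sketch, assuming Spark 2.0 or later running in local mode. The object name, the helper names and the "key"/"value" columns are hypothetical and chosen only for illustration. The first helper uses built-in operators that the optimizer can see as a logical aggregate; the second drops to the RDD API, where the lambdas are opaque to it.

```scala
import org.apache.spark.rdd.RDD
import org.apache.spark.sql.{DataFrame, SparkSession}

// Hypothetical object and helper names, for illustration only.
object PushdownSketch {

  // Built-in operators: the query planner sees a logical aggregate
  // over typed columns and can optimize how it is executed.
  def maxViaDataFrame(df: DataFrame): DataFrame =
    df.groupBy("key").max("value")

  // Arbitrary user lambdas on the underlying RDD: the optimizer cannot
  // look inside them, so they run exactly as written.
  def maxViaRdd(df: DataFrame): RDD[(String, Int)] =
    df.rdd
      .map(row => (row.getString(0), row.getInt(1)))
      .reduceByKey((a, b) => math.max(a, b))

  def main(args: Array[String]): Unit = {
    // Assumption: local mode, just to keep the sketch self-contained.
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("pushdown-sketch")
      .getOrCreate()
    import spark.implicits._

    val df = Seq(("a", 1), ("a", 5), ("b", 3)).toDF("key", "value")

    // Prints the physical plan chosen for the DataFrame version.
    maxViaDataFrame(df).explain()

    maxViaDataFrame(df).show()
    maxViaRdd(df).collect().foreach(println)

    spark.stop()
  }
}
```

Both helpers compute the same per-key maximum; the difference is only in how much of the work the engine can understand and plan.
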
In the upcoming 2.0 release, the signature for map() has become:
def map[U : Encoder](func: T => U): Dataset[U] = withTypedPlan {
Note: DataFrame and Dataset are unified in 2.0.
FYI
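
As a rough illustration of the 2.0-style map() mentioned above, the following sketch assumes Spark 2.0 or later in local mode. The Reading case class, the object name and the helper names are hypothetical. The point it tries to show is that map takes a function T => U plus an implicit Encoder[U], and returns a Dataset rather than an RDD.

```scala
import org.apache.spark.sql.{DataFrame, Dataset, SparkSession}

// Hypothetical names throughout; a sketch, not a reference implementation.
object TypedMapSketch {

  // Hypothetical record type used only for this example.
  case class Reading(sensor: String, celsius: Double)

  // map needs an implicit Encoder[Double]; spark.implicits._ supplies it.
  def toFahrenheit(spark: SparkSession, readings: Dataset[Reading]): Dataset[Double] = {
    import spark.implicits._
    readings.map(r => r.celsius * 9 / 5 + 32)
  }

  // In 2.0 a DataFrame is a Dataset[Row]; as[Reading] gives a typed view
  // so that map can work on Reading values instead of generic rows.
  def sensorNames(spark: SparkSession, df: DataFrame): Dataset[String] = {
    import spark.implicits._
    df.as[Reading].map(_.sensor)
  }

  def main(args: Array[String]): Unit = {
    // Assumption: local mode, just to keep the sketch self-contained.
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("typed-map-sketch")
      .getOrCreate()
    import spark.implicits._

    val readings = Seq(Reading("s1", 20.0), Reading("s2", 25.5)).toDS()

    toFahrenheit(spark, readings).show()
    sensorNames(spark, readings.toDF()).show()

    spark.stop()
  }
}
```

The function passed to map is still arbitrary user code, so the earlier caveat about opaque lambdas applies here as well; what changes is that the result stays in the Dataset API.
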
On Thu, Apr 21, 2016 at 6:49 AM, Apurva Nandan wrote:
> Hello everyone,
>
> Generally
Hello everyone,
Generally speaking, I guess it's well known that DataFrames are much faster
than RDDs when it comes to performance.
My question is: how do you go about transforming a
DataFrame using map?
I mean, the DataFrame then gets converted into an RDD, so do you then again