Re: RDD generated from Dataframes

Ted Yu Thu, 21 Apr 2016 06:56:43 -0700

In the upcoming 2.0 release, the signature for map() has become:

  def map[U : Encoder](func: T => U): Dataset[U] = withTypedPlan {


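Note: DataFrame and Dataset are unified in 2.0 (DataFrame becomes an alias
for Dataset[Row]), so map() on a DataFrame returns a Dataset rather than an
RDD.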
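
A minimal sketch of what that looks like in practice, assuming a Spark 2.x
build on the classpath. The object name, helper name, column names, and
sample data are made up for illustration. The Encoder that the context
bound asks for is passed explicitly here; importing spark.implicits._
would normally supply it for you.

```scala
import org.apache.spark.sql.{DataFrame, Dataset, Encoder, Encoders, SparkSession}

object MapOnDataset {
  // Hypothetical helper: doubles the integer in column 1 of each row.
  // The result is a Dataset, so it stays inside the Spark SQL engine
  // instead of dropping down to an RDD.
  def doubleValues(df: DataFrame): Dataset[(String, Int)] = {
    // This is the Encoder[U] required by map[U : Encoder].
    val enc: Encoder[(String, Int)] =
      Encoders.tuple(Encoders.STRING, Encoders.scalaInt)
    df.map(row => (row.getString(0), row.getInt(1) * 2))(enc)
  }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("map-on-dataset")
      .master("local[*]")
      .getOrCreate()
    import spark.implicits._ // needed for toDF on a local Seq

    val df = Seq(("a", 1), ("b", 2)).toDF("key", "value")
    doubleValues(df).show()

    spark.stop()
  }
}
```

If you need named columns again afterwards, calling toDF("key", "value")
on the result should give you a DataFrame back without going through
createDataFrame on an RDD.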

FYI

On Thu, Apr 21, 2016 at 6:49 AM, Apurva Nandan <apurva3...@gmail.com> wrote:

> Hello everyone,
>
> Generally speaking, I guess it's well known that DataFrames perform much
> better than RDDs.
> My question is how you handle transforming a DataFrame using map.
> The DataFrame gets converted into an RDD, so do you then convert this
> RDD back into a new DataFrame for better performance?
> Further, if you have a process that involves a series of transformations,
> i.e. from one RDD to another, do you keep converting each RDD to a
> DataFrame first, every time?
>
> It's also possible that I might be missing something here, please share
> your experiences.
>
>
> Thanks and Regards,
> Apurva
>
