Thanks a lot!
Best Regards,
Weiwei
On Sat, Feb 20, 2016 at 11:53 PM, Hemant Bhanawat wrote:
toDF internally calls sqlContext.createDataFrame, which transforms the RDD
into an RDD[InternalRow]. This RDD[InternalRow] is then mapped to a DataFrame.
Type conversions (from Scala types to Catalyst types) are involved, but no
shuffling.
Hemant Bhanawat
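
For anyone who wants to check the "no shuffling" point locally, here is a minimal sketch. It is not from the original thread: it assumes Spark 2.x or later (SparkSession instead of the sqlContext discussed above), a local master, and a hypothetical Person case class and ToDFSketch object made up for illustration. It compares partition counts before and after toDF and inspects the lineage for shuffle stages.

```scala
import org.apache.spark.sql.SparkSession

object ToDFSketch {
  // Hypothetical record type, used only for this illustration.
  case class Person(name: String, age: Int)

  // Returns (RDD partition count, DataFrame partition count, lineage string).
  def run(): (Int, Int, String) = {
    // Assumption: a local SparkSession is acceptable for the experiment.
    val spark = SparkSession.builder()
      .appName("toDF-sketch")
      .master("local[2]")
      .getOrCreate()
    import spark.implicits._ // brings rdd.toDF() into scope

    try {
      val rdd = spark.sparkContext.parallelize(
        Seq(Person("a", 1), Person("b", 2), Person("c", 3)),
        numSlices = 4)

      // Row-by-row conversion from Scala objects to Catalyst's internal rows.
      val df = rdd.toDF()

      // Prints the physical plan; a shuffle would show up as an Exchange node.
      df.explain()

      (rdd.getNumPartitions, df.rdd.getNumPartitions, df.rdd.toDebugString)
    } finally {
      spark.stop()
    }
  }

  def main(args: Array[String]): Unit = {
    val (before, after, lineage) = run()
    println(s"partitions before toDF: $before, after: $after")
    println(lineage) // a shuffle would appear as a Shuffled* RDD in this lineage
  }
}
```

If the explanation above holds, the two partition counts should match and neither the plan nor the lineage should mention a shuffle. Later operations on the DataFrame (joins, groupBy, repartition) can of course still introduce one.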
Hi there,
Could someone explain to me what happens behind the scenes in rdd.toDF()? More
importantly, will this step involve a lot of shuffles and cause a surge
in the size of intermediate files? Thank you.
Best Regards,
Vivian