Re: Behind the scene of RDD to DataFrame

2016-02-21 Thread Weiwei Zhang
Thanks a lot! Best Regards, Weiwei On Sat, Feb 20, 2016 at 11:53 PM, Hemant Bhanawat wrote: > toDF internally calls sqlcontext.createDataFrame which transforms the RDD > to RDD[InternalRow]. This RDD[InternalRow] is then mapped to a dataframe. > > Type conversions (from

Re: Behind the scene of RDD to DataFrame

2016-02-20 Thread Hemant Bhanawat
toDF internally calls SQLContext.createDataFrame, which transforms the RDD into an RDD[InternalRow]. This RDD[InternalRow] is then mapped to a DataFrame. Type conversions (from Scala types to Catalyst types) are involved, but no shuffling. Hemant Bhanawat
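To illustrate why a record-by-record type conversion does not need a shuffle, here is a minimal toy model in plain Python. It is not Spark code and not how Spark implements toDF: the "partitions" are ordinary lists, and `to_internal_row` and `convert_partitions` are hypothetical names made up for this sketch. The str-to-bytes step only loosely stands in for the Scala-to-Catalyst type conversion mentioned above.

```python
from typing import Any, List, Tuple

# Hypothetical stand-ins: a partitioned dataset is a list of partitions,
# and each partition is a list of records (tuples).
Record = Tuple[Any, ...]
Partition = List[Record]


def to_internal_row(record: Record) -> Record:
    """Toy type conversion: encode str fields as UTF-8 bytes, keep other fields."""
    return tuple(
        field.encode("utf-8") if isinstance(field, str) else field
        for field in record
    )


def convert_partitions(partitions: List[Partition]) -> List[Partition]:
    """Convert each record inside its own partition.

    Every output partition is built only from the matching input partition,
    so no record ever moves to another partition (no shuffle in this model).
    """
    return [[to_internal_row(record) for record in part] for part in partitions]


# Two partitions holding (name, age) records.
partitions = [[("alice", 30), ("bob", 25)], [("carol", 41)]]
converted = convert_partitions(partitions)

# Partition sizes before and after the conversion.
print([len(part) for part in partitions])
print([len(part) for part in converted])
```

In this model the partition layout is identical before and after the conversion; only the field representation changes. On a real cluster, printing the DataFrame's physical plan with explain() is one way to check whether any exchange step appears.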

Behind the scene of RDD to DataFrame

2016-02-20 Thread Weiwei Zhang
Hi there, Could someone explain to me what happens behind the scenes of rdd.toDF()? More importantly, will this step involve a lot of shuffles and cause a surge in the size of intermediate files? Thank you. Best Regards, Vivian