RE: RDD.join vs spark SQL join

2015-08-15 Thread Xiao JIANG
in that stage). Thanks Best Regards On Fri, Aug 14, 2015 at 1:25 AM, Xiao JIANG jiangxia...@outlook.com wrote: Hi, may I know the performance difference between the rdd.join function and the Spark SQL join operation? If I want to join several big RDDs, how should I decide which one I should use? What are the factors

RDD.join vs spark SQL join

2015-08-13 Thread Xiao JIANG
Hi, may I know the performance difference between the rdd.join function and the Spark SQL join operation? If I want to join several big RDDs, how should I decide which one I should use? What are the factors to consider here? Thanks!

How to get total CPU consumption for Spark job

2015-08-07 Thread Xiao JIANG
Hi all, I was running some Hive/Spark jobs on a Hadoop cluster. I want to see how Spark helps improve not only the elapsed time but also the total CPU consumption. For Hive, I can get the 'Total MapReduce CPU Time Spent' from the log when the job finishes. But I didn't find any CPU stats for Spark