Dear All,


I have a Hive table with 100 million rows, and I just ran some very simple
operations on this dataset, like:



  val df = sqlContext.sql("select * from user")
  df.cache()
  df.registerTempTable("tb")
  val b = sqlContext.sql(
    """select 'uid', max(length(uid)), count(distinct uid), count(uid),
      |  sum(case when uid is null then 0 else 1 end),
      |  sum(case when uid is null then 1 else 0 end),
      |  sum(case when uid is null then 1 else 0 end) / count(uid)
      |from tb""".stripMargin)
  b.show()  // the result is just one row, but this step is extremely slow

Is this expected? Why is show so slow for a DataFrame? Is it a bug in the
optimizer, or did I do something wrong?


Best Regards,
tylor
