Dear All,
I have a Hive table with 100 million rows, and I just ran some very simple operations on this dataset:

    val df = sqlContext.sql("select * from user ").toDF
    df.cache
    df.registerTempTable("tb")
    val b = sqlContext.sql("select 'uid', max(length(uid)), count(distinct(uid)), count(uid), sum(case when uid is null then 0 else 1 end), sum(case when uid is null then 1 else 0 end), sum(case when uid is null then 1 else 0 end)/count(uid) from tb")
    b.show // the result is just one line, but this step is extremely slow

Is this expected? Why is show so slow for a DataFrame? Is it a bug in the optimizer, or did I do something wrong?

Best Regards,
tylor