Dear All,
I have a Hive table with 100 million rows, and I ran some very simple
operations on it, such as:
val df = sqlContext.sql("select * from user ").toDF
df.cache
df.registerTempTable("tb")
val b = sqlContext.sql("""
  select 'uid',
         max(length(uid)),
         count(distinct(uid)),
         count(uid),
         sum(case when uid is null then 0 else 1 end),
         sum(case when uid is null then 1 else 0 end),
         sum(case when uid is null then 1 else 0 end) / count(uid)
  from tb""")
b.show // the result is just one row, but this step is extremely slow
Is this expected? Why is show so slow on a DataFrame? Is it a bug in the
optimizer, or did I do something wrong?
Best Regards,
tylor