RE: spark1.4.1 extremely slow for take(1) or head() or first() or show

2015-12-03 Thread Mich Talebzadeh
Can you try running it directly on Hive, or maybe through spark-sql, to see the timing? Spark does what Hive does, that is, processing large sets of data, but it attempts to do the intermediate iterations in memory if it can (i.e. if there is enough memory available to keep the data set in

spark1.4.1 extremely slow for take(1) or head() or first() or show

2015-12-03 Thread hxw黄祥为
Dear All, I have a Hive table with 100 million records and I just ran some very simple operations on this dataset, like: val df = sqlContext.sql("select * from user ").toDF df.cache df.registerTempTable("tb") val b=sqlContext.sql("select

Re: spark1.4.1 extremely slow for take(1) or head() or first() or show

2015-12-03 Thread Sahil Sareen
"select 'uid',max(length(uid)),count(distinct(uid)),count(uid),sum(case when uid is null then 0 else 1 end),sum(case when uid is null then 1 else 0 end),sum(case when uid is null then 1 else 0 end)/count(uid) from tb" Is this as is, or did you use a UDF here? -Sahil On Thu, Dec 3, 2015 at 4:06