RE: spark1.4.1 extremely slow for take(1) or head() or first() or show

2015-12-03 Thread Mich Talebzadeh
Can you try running it directly on Hive to see the timing, or perhaps through spark-sql? Spark does what Hive does, that is, it processes large data sets, but it attempts to do the intermediate iterations in memory if it can (i.e. if there is enough memory available to keep the data set in

Re: spark1.4.1 extremely slow for take(1) or head() or first() or show

2015-12-03 Thread Sahil Sareen
"select 'uid',max(length(uid)),count(distinct(uid)),count(uid),sum(case when uid is null then 0 else 1 end),sum(case when uid is null then 1 else 0 end),sum(case when uid is null then 1 else 0 end)/count(uid) from tb" Is this as is, or did you use a UDF here? -Sahil On Thu, Dec 3, 2015 at 4:06