Re: Spark-sql versus Impala versus Hive

2015-06-19 Thread Sanjay Subramanian
Hi guys
I am using CDH 5.3.3, which comes with Hive 0.13.1 and Spark 1.2.
So to answer your question, it's not Tez (which I believe comes with Hortonworks).
The Hive query was run with Hive defaults.
I have now used additional Hive params to improve the timings:
SET mapreduce.job.reduces=16;
SET mapreduce.tasktracker.map.tasks.maximum=24;
SET mapreduce.tasktracker.reduce.tasks.maximum=16;
SET mapreduce.map.output.compress.codec=org.apache.hadoop.io.compress.SnappyCodec;
SET mapreduce.map.output.compress=true;

NOW
Hive – Time taken: 140.139 seconds, Fetched: 29597 row(s)
(Surprisingly close to spark-sql now LOL. Time to tweak spark-sql now.)
EARLIER RESULTS
Hive – 326.021 seconds, Fetched: 29597 row(s)
Impala – Fetched 27625 row(s) in 17.02s
spark-sql – Time taken: 120.236 seconds 
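
To put the gaps in perspective, the timings above can be turned into speedup ratios. This is a minimal Python sketch that assumes only the four wall-clock numbers quoted in this message; the dictionary labels and the `speedup` helper are illustrative names, not part of any tool. Note that Impala fetched 27625 rows while Hive fetched 29597, so the comparison may not be strictly like-for-like.

```python
# Wall-clock timings in seconds, copied from the results quoted in this message.
timings = {
    "Hive (defaults)": 326.021,
    "Hive (tuned)": 140.139,
    "spark-sql": 120.236,
    "Impala": 17.02,
}


def speedup(baseline_s, candidate_s):
    """Return how many times faster the candidate is than the baseline."""
    return baseline_s / candidate_s


# Compare every engine against the untuned Hive run, fastest first.
baseline = timings["Hive (defaults)"]
for engine, seconds in sorted(timings.items(), key=lambda kv: kv[1]):
    ratio = speedup(baseline, seconds)
    print(f"{engine:16s} {seconds:9.3f}s  {ratio:5.2f}x vs Hive defaults")
```

A single run per engine says little about variance, so the ratios are best read as rough indicators rather than benchmarks.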

I don't have the bandwidth to manage individual components on the cluster :-) 
since I am doing all this solo and delivering ML solutions to production LOL. So 
I depend on a distribution such as CDH. The downside is that one is always a couple 
of versions behind.
Thanks for your questions.
regards
sanjay
  From: Michael Armbrust mich...@databricks.com
 To: user user@spark.apache.org 
 Sent: Thursday, June 18, 2015 3:25 PM
 Subject: Re: Spark-sql versus Impala versus Hive
   
I would also love to see a more recent version of Spark SQL.  There have been a 
lot of performance improvements between 1.2 and 1.4 :)


On Thu, Jun 18, 2015 at 3:18 PM, Steve Nunez snu...@hortonworks.com wrote:

Interesting. What were the Hive settings? Specifically it would be useful to 
know if this was Hive on Tez.
- Steve
From: Sanjay Subramanian
Reply-To: Sanjay Subramanian
Date: Thursday, June 18, 2015 at 11:08
To: user@spark.apache.org
Subject: Spark-sql versus Impala versus Hive


I just published results of my findings here:
https://bigdatalatte.wordpress.com/2015/06/18/spark-sql-versus-impala-versus-hive/


Spark-sql versus Impala versus Hive

2015-06-18 Thread Sanjay Subramanian
I just published results of my findings here:
https://bigdatalatte.wordpress.com/2015/06/18/spark-sql-versus-impala-versus-hive/




Re: Spark-sql versus Impala versus Hive

2015-06-18 Thread Michael Armbrust
I would also love to see a more recent version of Spark SQL.  There have
been a lot of performance improvements between 1.2 and 1.4 :)

On Thu, Jun 18, 2015 at 3:18 PM, Steve Nunez snu...@hortonworks.com wrote:

   Interesting. What were the Hive settings? Specifically it would be
 useful to know if this was Hive on Tez.

  - Steve

   From: Sanjay Subramanian
 Reply-To: Sanjay Subramanian
 Date: Thursday, June 18, 2015 at 11:08
 To: user@spark.apache.org
 Subject: Spark-sql versus Impala versus Hive

I just published results of my findings here

 https://bigdatalatte.wordpress.com/2015/06/18/spark-sql-versus-impala-versus-hive/





Re: Spark-sql versus Impala versus Hive

2015-06-18 Thread Steve Nunez
Interesting. What were the Hive settings? Specifically it would be useful to 
know if this was Hive on Tez.

- Steve

From: Sanjay Subramanian
Reply-To: Sanjay Subramanian
Date: Thursday, June 18, 2015 at 11:08
To: user@spark.apache.org
Subject: Spark-sql versus Impala versus Hive

I just published results of my findings here
https://bigdatalatte.wordpress.com/2015/06/18/spark-sql-versus-impala-versus-hive/