Hey Matei,
Thanks for your reply. We will keep in mind to use the JDBC server for smaller
queries.
For the MapReduce job start-up, are you referring to JVM initialization
latencies in MR? Other than JVM initialization, does Spark do any
optimization (that is not done by MapReduce) to speed up
It's hard to tell without more details, but the start-up latency in Hive can
sometimes be high, especially if you are running Hive on MapReduce. MR just
takes 20-30 seconds per job to spin up even if the job is doing nothing.
For real use of Spark SQL for short queries, by the way, I'd recommend
Hello,
We were comparing the performance of some of our production Hive queries
between Hive and Spark. We compared Hive (0.13) + Hadoop (1.2.1) against both
Spark 0.9 and 1.1. We saw good performance gains with Spark.
We tried a very simple query,
select count(*) from T where