Spark SQL is not the same as Hive on Spark. Spark SQL is a query engine designed from the ground up for Spark, without the historical baggage of Hive. It also does more than SQL now -- it is meant for structured data processing in general (e.g. the new DataFrame API) as well as SQL. Spark SQL is mostly compatible with Hive, but 100% compatibility is not a goal (nor is it desirable, since Hive has accumulated a lot of odd SQL semantics over the course of its evolution).
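To make the "more than SQL" point concrete, here is a minimal sketch (assuming the Spark 1.3 API in spark-shell, which provides sc; the input file people.json and its schema are hypothetical) of the same query expressed once through the DataFrame API and once through SQL -- both go through the same Catalyst optimizer:

    import org.apache.spark.sql.SQLContext

    val sqlContext = new SQLContext(sc)  // sc is the SparkContext from spark-shell

    // Structured data processing via the DataFrame API
    val people = sqlContext.jsonFile("people.json")  // hypothetical input file
    people.filter(people("age") > 21).select("name").show()

    // The same logic expressed as SQL
    people.registerTempTable("people")
    sqlContext.sql("SELECT name FROM people WHERE age > 21").show()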
Hive on Spark is meant to replace Hive's MapReduce runtime with Spark's. For more information, see this blog post:
https://databricks.com/blog/2014/07/01/shark-spark-sql-hive-on-spark-and-the-future-of-sql-on-spark.html

On Sun, Feb 15, 2015 at 3:03 AM, The Watcher <watche...@gmail.com> wrote:

> I'm a little confused about Hive & Spark; can someone shed some light?
>
> Using Spark, I can access the Hive metastore and run Hive queries. Since I
> am able to do this in standalone mode, it can't be using MapReduce to run
> the Hive queries, so I suppose it's building a query plan and executing it
> all in Spark.
>
> So, is this the same as
> https://cwiki.apache.org/confluence/display/Hive/Hive+on+Spark%3A+Getting+Started
> ?
> If not, why not, and aren't they likely to merge at some point?
>
> If Spark really builds its own query plans, joins, etc. without Hive's, then
> is everything that requires special SQL syntax in Hive supported: window
> functions, cubes, rollups, skewed tables, etc.?
>
> Thanks
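To the metastore question above: what you are seeing is Spark SQL's HiveContext. It uses the Hive metastore for table metadata and parses HiveQL, but it builds its own query plan and executes it entirely on Spark, which is why it works in standalone mode with no MapReduce involved. A minimal sketch (assuming a Spark 1.x spark-shell and a hypothetical Hive table named src):

    import org.apache.spark.sql.hive.HiveContext

    // Talks to the Hive metastore for catalog metadata only;
    // planning and execution happen in Spark, not MapReduce
    val hiveContext = new HiveContext(sc)

    hiveContext.sql("SELECT key, value FROM src WHERE key < 10")
      .collect()
      .foreach(println)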