Spark SQL is not the same as Hive on Spark. Spark SQL is a query engine designed from the ground up for Spark, without the historical baggage of Hive. It also does more than SQL now -- it is meant for structured data processing in general (e.g. the new DataFrame API) as well as SQL. Spark SQL is mostly compatible with Hive, but 100% compatibility is not a goal (nor is it desirable, since Hive has accumulated a lot of odd SQL semantics over the course of its evolution).
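To make the "more than SQL" point concrete, here is a minimal sketch (assuming the Spark 1.3 API in spark-shell, which provides sc; the input file people.json and its schema are hypothetical) of the same query expressed once through the DataFrame API and once through SQL -- both go through the same Catalyst optimizer:

    import org.apache.spark.sql.SQLContext

    val sqlContext = new SQLContext(sc)  // sc is the SparkContext from spark-shell

    // Structured data processing via the DataFrame API
    val people = sqlContext.jsonFile("people.json")  // hypothetical input file
    people.filter(people("age") > 21).select("name").show()

    // The same logic expressed as SQL
    people.registerTempTable("people")
    sqlContext.sql("SELECT name FROM people WHERE age > 21").show()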
Hive on Spark is meant to replace Hive's MapReduce runtime with Spark's. For more information, see this blog post:
https://databricks.com/blog/2014/07/01/shark-spark-sql-hive-on-spark-and-the-future-of-sql-on-spark.html

On Sun, Feb 15, 2015 at 3:03 AM, The Watcher <watche...@gmail.com> wrote:

> I'm a little confused about Hive & Spark; can someone shed some light?
>
> Using Spark, I can access the Hive metastore and run Hive queries. Since I
> am able to do this in standalone mode, it can't be using MapReduce to run
> the Hive queries, so I suppose it's building a query plan and executing it
> all in Spark.
>
> So, is this the same as
> https://cwiki.apache.org/confluence/display/Hive/Hive+on+Spark%3A+Getting+Started
> ?
> If not, why not, and aren't they likely to merge at some point?
>
> If Spark really builds its own query plans, joins, etc. without Hive's, then
> is everything that requires special SQL syntax in Hive supported: window
> functions, cubes, rollups, skewed tables, etc.?
>
> Thanks
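To the metastore question above: what you are seeing is Spark SQL's HiveContext. It uses the Hive metastore for table metadata and parses HiveQL, but it builds its own query plan and executes it entirely on Spark, which is why it works in standalone mode with no MapReduce involved. A minimal sketch (assuming a Spark 1.x spark-shell and a hypothetical Hive table named src):

    import org.apache.spark.sql.hive.HiveContext

    // Talks to the Hive metastore for catalog metadata only;
    // planning and execution happen in Spark, not MapReduce
    val hiveContext = new HiveContext(sc)

    hiveContext.sql("SELECT key, value FROM src WHERE key < 10")
      .collect()
      .foreach(println)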