Re: When queried through hiveContext, does hive executes these queries using its execution engine (default is map-reduce), or spark just reads the data and performs those queries itself?

2016-06-08 Thread lalit sharma
To add to what Vikash said above, here are a few more internals:
1. Two components work together to achieve the Hive + Spark integration:
   a. HiveContext, which extends SQLContext and adds Hive-specific logic,
e.g. loading the jars needed to talk to the underlying metastore db and
loading the configs in hive-site.xml.
   b. HiveThriftServer2, which uses the native HiveServer2 and adds logic
for creating sessions and handling operations.
2. Once the thrift server is up, authentication and session management are
delegated to Hive classes. Once the query is parsed, a logical plan is
created and passed on to create a DataFrame.

So no MapReduce: Spark uses the pieces it needs from Hive and runs the
query on its own execution engine.
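
One way to check this yourself is to ask Spark for the plan of a query
issued through HiveContext. Below is a minimal PySpark sketch, assuming a
Spark 1.x build with Hive support and a reachable metastore. The helper
name describe_query and the table name my_table are placeholders of mine,
not anything from this thread.

```python
def describe_query(hive_ctx, query):
    """Send `query` through a HiveContext and print the plans Spark built."""
    # sql() parses the query and returns a DataFrame backed by a logical plan.
    df = hive_ctx.sql(query)
    # explain(True) prints the logical plans and the physical plan.
    df.explain(True)
    return df


def main():
    # Imported here so the helper above can be read without Spark installed.
    from pyspark import SparkContext
    from pyspark.sql import HiveContext

    sc = SparkContext(appName="hive-context-plan-demo")
    try:
        hive_ctx = HiveContext(sc)
        # my_table is a placeholder; use a table that exists in your metastore.
        describe_query(hive_ctx, "SELECT COUNT(*) FROM my_table")
    finally:
        sc.stop()
```

Calling main() on a cluster should print a plan made of Spark operators,
and the Spark UI should show the work as Spark jobs and stages.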

--Regards,
Lalit

On Wed, Jun 8, 2016 at 9:59 PM, Vikash Pareek wrote:

> Himanshu,
>
> Spark doesn't use the Hive execution engine (MapReduce) to execute the
> query. Spark only reads the metadata from the Hive metastore db and
> executes the query within the Spark execution engine. This metadata is
> used by Spark's own SQL execution engine (which includes components such
> as Catalyst and Tungsten to optimize queries) to execute the query and
> generate results faster than Hive (MapReduce).
>
> Using HiveContext means connecting to the Hive metastore db. Thus,
> HiveContext can access Hive metadata, which includes the location of the
> data, serializers and deserializers (SerDes), compression codecs, columns,
> datatypes, etc. Spark therefore has enough information about the Hive
> tables and their data to understand the target data and execute the query
> on its own execution engine.
>
> Overall, Spark replaced the MapReduce model completely with its
> in-memory (RDD) computation engine.
>
> - Vikash Pareek
>
>
>
> --
> View this message in context:
> http://apache-spark-user-list.1001560.n3.nabble.com/When-queried-through-hiveContext-does-hive-executes-these-queries-using-its-execution-engine-default-tp27114p27117.html
> Sent from the Apache Spark User List mailing list archive at Nabble.com.
>
> -
> To unsubscribe, e-mail: user-unsubscr...@spark.apache.org
> For additional commands, e-mail: user-h...@spark.apache.org
>
>


Re: When queried through hiveContext, does hive executes these queries using its execution engine (default is map-reduce), or spark just reads the data and performs those queries itself?

2016-06-08 Thread Vikash Pareek
Himanshu,

Spark doesn't use the Hive execution engine (MapReduce) to execute the
query. Spark only reads the metadata from the Hive metastore db and
executes the query within the Spark execution engine. This metadata is
used by Spark's own SQL execution engine (which includes components such
as Catalyst and Tungsten to optimize queries) to execute the query and
generate results faster than Hive (MapReduce).

Using HiveContext means connecting to the Hive metastore db. Thus,
HiveContext can access Hive metadata, which includes the location of the
data, serializers and deserializers (SerDes), compression codecs, columns,
datatypes, etc. Spark therefore has enough information about the Hive
tables and their data to understand the target data and execute the query
on its own execution engine.

Overall, Spark replaced the MapReduce model completely with its
in-memory (RDD) computation engine.
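
To see the metadata described above, one option is to ask the metastore
for a table's description through HiveContext. This is a rough PySpark
sketch under the assumption of a Spark 1.x build with Hive support. The
function name table_metadata_rows and the table name my_table are
hypothetical, and the exact layout of the returned rows varies between
Spark versions.

```python
def table_metadata_rows(hive_ctx, table_name):
    """Return the metastore description of `table_name` as plain tuples."""
    # DESCRIBE FORMATTED reports columns, datatypes, storage location and
    # SerDe details recorded in the Hive metastore.
    df = hive_ctx.sql("DESCRIBE FORMATTED {}".format(table_name))
    return [tuple(row) for row in df.collect()]


def main():
    # Imported here so the helper above can be read without Spark installed.
    from pyspark import SparkContext
    from pyspark.sql import HiveContext

    sc = SparkContext(appName="hive-metadata-demo")
    try:
        hive_ctx = HiveContext(sc)
        # my_table is a placeholder; use a table that exists in your metastore.
        for row in table_metadata_rows(hive_ctx, "my_table"):
            print(row)
    finally:
        sc.stop()
```

The rows printed by main() should include the table's storage location and
its SerDe and input/output format classes, which is the information Spark
uses to read the files itself.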

- Vikash Pareek






When queried through hiveContext, does hive executes these queries using its execution engine (default is map-reduce), or spark just reads the data and performs those queries itself?

2016-06-08 Thread Himanshu Mehra
So what happens underneath when we query a Hive table using hiveContext?

1. Does Spark talk to the metastore to get the data location on HDFS and
read the data from there to perform those queries?
2. Or does Spark pass those queries to Hive, which executes them on the
table and returns the results to Spark? In that case, might Hive be using
MapReduce to execute the queries?

Please clarify this confusion. I have looked into the code, and it seems
like Spark is just fetching the data from HDFS. Please convince me otherwise.

Thanks

Best



--
View this message in context: 
http://apache-spark-user-list.1001560.n3.nabble.com/When-queried-through-hiveContext-does-hive-executes-these-queries-using-its-execution-engine-default-tp27114.html