Re: When queried through HiveContext, does Hive execute these queries using its execution engine (the default is MapReduce), or does Spark just read the data and perform those queries itself?
To add to what Vikash said above, a bit more on the internals:

1. Two components work together to achieve the Hive + Spark integration:
   a. HiveContext, which extends SQLContext and adds Hive-specific logic, e.g. loading the jars needed to talk to the underlying metastore DB and loading the configs in hive-site.xml.
   b. HiveThriftServer2, which uses the native HiveServer2 and adds logic for creating sessions and handling operations.
2. Once the thrift server is up, authentication and session management are all delegated to Hive classes. Once the query is parsed, a logical plan is created and passed on to create a DataFrame.

So no MapReduce: Spark intelligently uses the needed pieces from Hive and runs queries on its own execution engine.

--
Regards,
Lalit

On Wed, Jun 8, 2016 at 9:59 PM, Vikash Pareek wrote:
> [quoted text trimmed; Vikash's full reply appears below]
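The flow Lalit describes can be sketched in a few lines. This is a minimal, hedged example assuming a Spark 1.6-era deployment with hive-site.xml on the classpath; the table name `sales` and its columns are hypothetical:

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.sql.hive.HiveContext

object HiveContextSketch {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(new SparkConf().setAppName("hive-context-demo"))

    // HiveContext extends SQLContext: constructing it loads hive-site.xml
    // and connects to the metastore DB, but query execution stays in Spark.
    val hiveContext = new HiveContext(sc)

    // Parsing produces a logical plan, which becomes a DataFrame.
    // `sales` is a hypothetical Hive table.
    val df = hiveContext.sql(
      "SELECT region, SUM(amount) FROM sales GROUP BY region")

    // Printing the extended plan shows Spark operators (a Hive table scan
    // feeding Spark aggregates), not MapReduce stages.
    df.explain(true)
  }
}
```

Running `explain` is the quickest way to see that the physical plan is built entirely from Spark operators.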
Re: When queried through HiveContext, does Hive execute these queries using its execution engine (the default is MapReduce), or does Spark just read the data and perform those queries itself?
Himanshu,

Spark doesn't use Hive's execution engine (MapReduce) to execute queries. Spark only reads the metadata from the Hive metastore DB and executes the query within Spark's own execution engine. This metadata is used by Spark's SQL execution engine (which includes components such as Catalyst and Tungsten to optimize queries) to execute the query and generate results faster than Hive (MapReduce).

Using HiveContext means connecting to the Hive metastore DB. Thus HiveContext can access the Hive metadata, which includes the location of the data, serializers and deserializers, compression codecs, columns, datatypes, etc. Spark therefore has enough information about the Hive tables and their data to understand the target data and execute the query on its own execution engine.

Overall, Spark completely replaces the MapReduce model with its in-memory (RDD) computation engine.

- Vikash Pareek

--
View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/When-queried-through-hiveContext-does-hive-executes-these-queries-using-its-execution-engine-default-tp27114p27117.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.
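The metadata Vikash lists (columns, datatypes, storage location, SerDes, codecs) can be inspected directly. A hedged sketch, again assuming a Spark 1.6-era HiveContext and a hypothetical table `sales`:

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.sql.hive.HiveContext

object MetastoreMetadataSketch {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(new SparkConf().setAppName("metastore-demo"))
    val hiveContext = new HiveContext(sc)

    // Columns and datatypes come straight from the metastore; nothing
    // is executed in Hive to obtain them.
    val schema = hiveContext.table("sales").schema
    schema.fields.foreach(f => println(s"${f.name}: ${f.dataType}"))

    // Storage details (HDFS location, SerDe, compression) are likewise
    // metastore metadata, surfaced here via a Hive DDL command.
    hiveContext.sql("DESCRIBE FORMATTED sales").show(100, false)
  }
}
```

This is only a sketch of how to observe the metadata, not a claim about any particular cluster's output.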
When queried through HiveContext, does Hive execute these queries using its execution engine (the default is MapReduce), or does Spark just read the data and perform those queries itself?
So what happens underneath when we query a Hive table using HiveContext?

1. Does Spark talk to the metastore to get the data's location on HDFS and read the data from there to perform those queries itself?
2. Or does Spark pass those queries to Hive, with Hive executing them on the table and returning the results to Spark? In this case, might Hive be using MapReduce to execute the queries?

Please clarify this confusion. I have looked into the code and it seems like Spark is just fetching the data from HDFS. Please convince me otherwise.

Thanks
Best

--
View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/When-queried-through-hiveContext-does-hive-executes-these-queries-using-its-execution-engine-default-tp27114.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.

-
To unsubscribe, e-mail: user-unsubscr...@spark.apache.org
For additional commands, e-mail: user-h...@spark.apache.org
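The hypothesis in the question ("Spark is just fetching the data from HDFS") can be checked empirically. A minimal sketch, assuming a Spark 1.6-era HiveContext and a hypothetical table `sales`: the RDD lineage of the resulting DataFrame shows a HadoopRDD reading the table's files directly, with no MapReduce job submitted to Hive.

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.sql.hive.HiveContext

object LineageSketch {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(new SparkConf().setAppName("lineage-demo"))
    val hiveContext = new HiveContext(sc)

    val df = hiveContext.sql("SELECT * FROM sales WHERE amount > 100")

    // The debug string prints the RDD lineage; a HadoopRDD at the bottom
    // indicates Spark itself is reading the table's HDFS files.
    println(df.rdd.toDebugString)
  }
}
```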