I guess it really depends on your configuration. The Hive metastore
provides just the metadata/schema for your tables, not the actual
data storage. Hive runs on top of Hadoop. If you configure your
Spark to run on the same Hadoop cluster using YARN, your SQL DataFrame
in Spark will be written automatically as data files in HDFS, in the
format recorded in the metastore. So HDFS is the connecting medium
between Spark and Hive.
The default Spark distribution package is bundled with a Thrift server,
so you can save/retrieve DataFrames to/from Hive tables using a
standalone metastore, without a Hive server present. However, you do
have to run a script (not provided by Spark) to initialize the
standalone metastore. In the standalone Hive mode, Spark reads/writes
these Hive tables directly, and they are stored as plain text files
on disk (not Parquet).
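As an ops-level sketch of that setup, the sequence might look like the commands below. The paths, the Derby `-dbType`, and the default port are assumptions; adjust them to your environment (this is a config fragment, not something runnable as-is):

```shell
# Initialize the standalone metastore schema once, using Hive's
# schematool (shipped with Hive, not with Spark; Derby is the
# simplest backing database for a single-node setup).
$HIVE_HOME/bin/schematool -dbType derby -initSchema

# Start the Thrift JDBC/ODBC server bundled with the Spark distribution.
$SPARK_HOME/sbin/start-thriftserver.sh

# Third-party applications can now reach Spark SQL tables over JDBC,
# e.g. with the beeline client against the default port 10000.
$SPARK_HOME/bin/beeline -u jdbc:hive2://localhost:10000
```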
The benefit of using a Thrift server is having the option of
saving/retrieving Spark data from third-party applications via JDBC
directly. Since most people run Spark without Hadoop, and you can
run standalone Hive to achieve the same outcome, I'm not sure what
benefits running Spark on HDFS and Hive would bring, considering the
huge admin overhead associated with both Hadoop and Hive. Hope this
helps...
On 3/14/22 1:54 PM, Venkatesan Muniappan wrote:
hi Team,
I wanted to understand how Spark connects to Hive. Does it connect to
the Hive metastore directly, bypassing the Hive server? Let's say we are
inserting data into a Hive table whose I/O format is Parquet. Does
Spark create the Parquet file from the DataFrame/RDD/Dataset, put
it in its HDFS location, and update the metastore about the new Parquet
file? Or does it simply run the insert statement on HiveServer (through
JDBC or some other means)?
We are using Spark 2.4.3 and Hive 2.1.1 in our cluster.
Is there a document that explains this? Please share.
Thanks,
Venkat