Hi Pralabh,

You need to check the latest compatibility between Hive and Spark to see
which Spark version can successfully work as the Hive execution engine.
(For your option 2, there is a small sketch at the end of this mail.)
This is my old file alluding to spark-1.3.1 as the execution engine:

set spark.home=/data6/hduser/spark-1.3.1-bin-hadoop2.6;
--set spark.home=/usr/lib/spark-1.6.2-bin-hadoop2.6;
set spark.master=yarn-client;
set hive.execution.engine=spark;

Hive is great as a data warehouse, but the default MapReduce execution
engine is Jurassic Park. On the other hand, Spark has a performant
built-in API for Hive. Otherwise you can connect to Hive on a remote
cluster through JDBC (a sketch of that follows the example below).

In Python you can do

from pyspark.sql import SparkSession
# SparkContext, SQLContext and HiveContext are the legacy entry points;
# since Spark 2.x, SparkSession with Hive support enabled replaces them all

spark = SparkSession.builder \
    .appName("HiveOnSpark") \
    .enableHiveSupport() \
    .getOrCreate()

# the table the example below works against
fullyQualifiedTableName = "test.randomDataPy"

And use it like below

sqltext = ""
if spark.sql("SHOW TABLES IN test LIKE 'randomDataPy'").count() == 1:
    rows = spark.sql(f"""SELECT COUNT(1) FROM {fullyQualifiedTableName}""").collect()[0][0]
    print("number of rows is", rows)
else:
    print("\nTable test.randomDataPy does not exist, creating table")
    sqltext = """
    CREATE TABLE test.randomDataPy(
          ID INT
        , CLUSTERED INT
        , SCATTERED INT
        , RANDOMISED INT
        , RANDOM_STRING VARCHAR(50)
        , SMALL_VC VARCHAR(50)
        , PADDING VARCHAR(4000)
    )
    STORED AS PARQUET
    """
    spark.sql(sqltext)
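For the JDBC route mentioned above, here is a minimal sketch using
Spark's generic JDBC reader. The host, port and database below are made
up, and it assumes HiveServer2 is reachable and the Hive JDBC driver jar
is on the Spark classpath:

from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("HiveOverJDBC").getOrCreate()

# hypothetical HiveServer2 endpoint; substitute your own host/port/database
df = (spark.read.format("jdbc")
      .option("url", "jdbc:hive2://hs2-host:10000/test")
      .option("driver", "org.apache.hive.jdbc.HiveDriver")
      .option("dbtable", "randomDataPy")
      .load())

df.show(10)

This keeps the data in the remote Hive cluster and pulls it over the
wire, so it suits remote clusters rather than co-located ones.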
HTH

view my Linkedin profile
<https://www.linkedin.com/in/mich-talebzadeh-ph-d-5205b2/>

*Disclaimer:* Use it at your own risk. Any and all responsibility for any
loss, damage or destruction of data or any other property which may arise
from relying on this email's technical content is explicitly disclaimed.
The author will in no case be liable for any monetary damages arising
from such loss, damage or destruction.

On Thu, 1 Jul 2021 at 11:50, Pralabh Kumar <pralabhku...@gmail.com> wrote:

> Hi Mich
>
> Thx for replying. Your answer really helps. The comparison was done in
> 2016. I would like to know the latest comparison with Spark 3.0.
>
> Also, what you are suggesting is to migrate queries to Spark, which is
> HiveContext, or Hive on Spark, which is what Facebook also did.
> Is that understanding correct?
>
> Regards
> Pralabh
>
> On Thu, 1 Jul 2021, 15:44 Mich Talebzadeh, <mich.talebza...@gmail.com>
> wrote:
>
>> Hi Pralabh,
>>
>> This question has been asked before :)
>>
>> A few years ago (late 2016), I made a presentation on running Hive
>> queries on the Spark execution engine for Hortonworks.
>>
>> https://www.slideshare.net/MichTalebzadeh1/query-engines-for-hive-mr-spark-tez-with-llap-considerations
>>
>> The issue you will face will be compatibility problems between
>> versions of Hive and Spark.
>>
>> My suggestion would be to use Spark as a massively parallel processing
>> engine and Hive as a storage layer. However, you need to test what can
>> be migrated or not.
>>
>> HTH
>>
>> Mich
>>
>> On Thu, 1 Jul 2021 at 10:52, Pralabh Kumar <pralabhku...@gmail.com>
>> wrote:
>>
>>> Hi Dev
>>>
>>> I have thousands of legacy Hive queries. As part of a plan to move to
>>> Spark, we are planning to migrate those Hive queries to Spark. There
>>> are two approaches:
>>>
>>> 1. Hive on Spark, which is similar to changing the execution engine
>>> in Hive queries, like Tez.
>>> 2. Migrating the Hive queries to HiveContext/Spark SQL, an approach
>>> used by Facebook and presented at a Spark conference:
>>> https://databricks.com/session/experiences-migrating-hive-workload-to-sparksql#:~:text=Spark%20SQL%20in%20Apache%20Spark,SQL%20with%20minimal%20user%20intervention
>>>
>>> Can you please guide me on which option to go for? I am personally
>>> inclined to go for option 2. It also allows the use of the latest
>>> Spark.
>>>
>>> Please help me with the same, as there are not many comparisons
>>> available online that keep Spark 3.0 in perspective.
>>>
>>> Regards
>>> Pralabh Kumar
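PS. As promised, a minimal sketch of what option 2 looks like in
practice. With Hive support enabled, Spark talks to the existing Hive
metastore, so legacy HiveQL can often be passed to spark.sql() unchanged
(the query below is made up for illustration):

from pyspark.sql import SparkSession

# Spark as the processing engine, Hive purely as the storage layer
spark = (SparkSession.builder
         .appName("HiveQueryMigration")
         .enableHiveSupport()
         .getOrCreate())

# hypothetical legacy Hive query, submitted to Spark as-is
legacy_query = """
SELECT CLUSTERED, COUNT(1) AS cnt
FROM test.randomDataPy
GROUP BY CLUSTERED
"""
spark.sql(legacy_query).show()

Queries relying on Hive-only features may still need rework, which is
why you need to test what can be migrated or not.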