Hello Community, I am struggling to save a DataFrame to a Hive table.
Versions: Hive 1.2.1, Spark 2.0.1

*Working code:*

/*
  @Author: Chetan Khatri
  Description: This Scala script was written for the HBase to Hive module; it reads a table from HBase and dumps it out to Hive.
*/

import it.nerdammer.spark.hbase._
import org.apache.spark.sql.Row
import org.apache.spark.sql.types.StructType
import org.apache.spark.sql.types.StructField
import org.apache.spark.sql.types.StringType
import org.apache.spark.sql.SparkSession

// Approach 1:

// Read the HBase table
val hBaseRDD = sc.hbaseTable[(Option[String], Option[String], Option[String], Option[String], Option[String])]("university")
  .select("stid", "name", "subject", "grade", "city")
  .inColumnFamily("emp")

// Iterate over hBaseRDD and generate an RDD[Row]
val rowRDD = hBaseRDD.map(i => Row(i._1.get, i._2.get, i._3.get, i._4.get, i._5.get))

// Create an SQLContext for the createDataFrame method
val sqlContext = new org.apache.spark.sql.SQLContext(sc)

// Define the schema structure
object empSchema {
  val stid = StructField("stid", StringType)
  val name = StructField("name", StringType)
  val subject = StructField("subject", StringType)
  val grade = StructField("grade", StringType)
  val city = StructField("city", StringType)
  val struct = StructType(Array(stid, name, subject, grade, city))
}

import sqlContext.implicits._

// Create a DataFrame from rowRDD and the schema structure
val stdDf = sqlContext.createDataFrame(rowRDD, empSchema.struct)

// Importing Hive
import org.apache.spark.sql.hive

// Enable Hive with the Hive warehouse in the SparkSession
val spark = SparkSession.builder()
  .appName("Spark Hive Example")
  .config("spark.sql.warehouse.dir", "/usr/local/hive/warehouse/")
  .enableHiveSupport()
  .getOrCreate()

// Saving the DataFrame to the Hive table succeeds:
stdDf.write.mode("append").saveAsTable("employee")

// Approach 2 (where the error occurs):

import spark.implicits._
import spark.sql

sql("use default")
sql("create table employee(stid STRING, name STRING, subject STRING, grade STRING, city STRING)")

scala> sql("show TABLES").show()
+---------+-----------+
|tableName|isTemporary|
+---------+-----------+
| employee|      false|
+---------+-----------+

stdDf.write.mode("append").saveAsTable("employee")

ERROR Exception:
org.apache.spark.sql.AnalysisException: Saving data in MetastoreRelation default, employee is not supported.;
  at org.apache.spark.sql.execution.command.CreateDataSourceTableAsSelectCommand.run(createDataSourceTables.scala:221)
  at org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult$lzycompute(commands.scala:58)
  at org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult(commands.scala:56)
  at org.apache.spark.sql.execution.command.ExecutedCommandExec.doExecute(commands.scala:74)
  at org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:115)
  at org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:115)
  at org.apache.spark.sql.execution.SparkPlan$$anonfun$executeQuery$1.apply(SparkPlan.scala:136)
  at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151)
  at org.apache.spark.sql.execution.SparkPlan.executeQuery(SparkPlan.scala:133)
  at org.apache.spark.sql.execution.SparkPlan.execute(SparkPlan.scala:114)
  at org.apache.spark.sql.execution.QueryExecution.toRdd$lzycompute(QueryExecution.scala:86)
  at org.apache.spark.sql.execution.QueryExecution.toRdd(QueryExecution.scala:86)
  at org.apache.spark.sql.DataFrameWriter.saveAsTable(DataFrameWriter.scala:378)
  at org.apache.spark.sql.DataFrameWriter.saveAsTable(DataFrameWriter.scala:354)
  ... 
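For what it's worth, the direction I am considering for the existing-table case, based on the DataFrameWriter API: insertInto writes into an already-created table by column position, rather than trying to (re)create the table the way saveAsTable does. This is only a sketch of Approach 2 rewritten that way; it assumes `spark` is the Hive-enabled SparkSession from above and that `stdDf`'s columns are in the same order as the employee table:

```scala
// Sketch only: insertInto appends to an existing Hive table by column
// position, instead of creating a data-source table like saveAsTable.
// Assumes `spark` is a Hive-enabled SparkSession and `stdDf`'s columns
// match the order of employee(stid, name, subject, grade, city).
spark.sql("use default")
spark.sql("create table if not exists employee(stid STRING, name STRING, subject STRING, grade STRING, city STRING)")

// Write into the pre-created table instead of calling saveAsTable:
stdDf.write.mode("append").insertInto("employee")
```

I have not verified this against my setup yet; it is just how I read the insertInto documentation.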
56 elided

Questions:

1. In Approach 1, the data is stored even though the Hive table was not previously created: when I call saveAsTable it automatically creates the table for me, and on subsequent runs it appends data to it. How can I store data in a previously created table?

2. It also gives the warning
WARN metastore.HiveMetaStore: Location: file:/usr/local/spark/spark-warehouse/employee specified for non-external table:employee
I have already provided the path of the Hive metastore warehouse, so why is it storing data in Spark's own warehouse directory?

The Hive setup was done with reference to http://mitu.co.in/wp-content/uploads/2015/12/Hive-Installation-on-Ubuntu-14.04-and-Hadoop-2.6.3.pdf and it is working well. I cannot change the Hive version; it must be 1.2.1.

Thank you.
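On the warehouse warning, my understanding from the Spark 2.0 notes is that spark.sql.warehouse.dir replaces hive.metastore.warehouse.dir and must be set before the first SparkSession is created; inside spark-shell a session already exists, so setting it later via .config(...) has no effect. A sketch of what I plan to try, passing the property when launching the shell (the path is from my local installation, so adjust as needed):

```shell
# Sketch: set the warehouse location before any SparkSession exists,
# so the Hive-enabled session picks it up instead of defaulting to
# the local ./spark-warehouse directory.
spark-shell --conf spark.sql.warehouse.dir=/usr/local/hive/warehouse/
```

I am not certain this is the whole story on my setup, but it would explain why the table landed under file:/usr/local/spark/spark-warehouse despite the .config(...) call.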