Hello Community, I am struggling to save a DataFrame to a Hive table.
Versions: Hive 1.2.1, Spark 2.0.1

*Working code:*

/*
  @Author: Chetan Khatri
  Description: This Scala script was written for the HBase to Hive module; it reads a table from HBase and dumps it out to Hive.
*/

import it.nerdammer.spark.hbase._
import org.apache.spark.sql.Row
import org.apache.spark.sql.types.StructType
import org.apache.spark.sql.types.StructField
import org.apache.spark.sql.types.StringType
import org.apache.spark.sql.SparkSession

// Approach 1:

// Read the HBase table
val hBaseRDD = sc.hbaseTable[(Option[String], Option[String], Option[String], Option[String], Option[String])]("university")
  .select("stid", "name", "subject", "grade", "city")
  .inColumnFamily("emp")

// Iterate over hBaseRDD and generate an RDD[Row]
val rowRDD = hBaseRDD.map(i => Row(i._1.get, i._2.get, i._3.get, i._4.get, i._5.get))

// Create an SQLContext for the createDataFrame method
val sqlContext = new org.apache.spark.sql.SQLContext(sc)

// Define the schema structure
object empSchema {
  val stid = StructField("stid", StringType)
  val name = StructField("name", StringType)
  val subject = StructField("subject", StringType)
  val grade = StructField("grade", StringType)
  val city = StructField("city", StringType)
  val struct = StructType(Array(stid, name, subject, grade, city))
}

import sqlContext.implicits._

// Create a DataFrame from rowRDD and the schema structure
val stdDf = sqlContext.createDataFrame(rowRDD, empSchema.struct)

// Importing Hive
import org.apache.spark.sql.hive

// Enable Hive with the Hive warehouse in the SparkSession
val spark = SparkSession.builder()
  .appName("Spark Hive Example")
  .config("spark.sql.warehouse.dir", "/usr/local/hive/warehouse/")
  .enableHiveSupport()
  .getOrCreate()

// Saving the DataFrame to the Hive table succeeds:
stdDf.write.mode("append").saveAsTable("employee")

// Approach 2 (where the error occurs):

import spark.implicits._
import spark.sql

sql("use default")
sql("create table employee(stid STRING, name STRING, subject STRING, grade STRING, city STRING)")

scala> sql("show TABLES").show()
+---------+-----------+
|tableName|isTemporary|
+---------+-----------+
| employee|      false|
+---------+-----------+

stdDf.write.mode("append").saveAsTable("employee")

ERROR Exception:
org.apache.spark.sql.AnalysisException: Saving data in MetastoreRelation default, employee is not supported.;
  at org.apache.spark.sql.execution.command.CreateDataSourceTableAsSelectCommand.run(createDataSourceTables.scala:221)
  at org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult$lzycompute(commands.scala:58)
  at org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult(commands.scala:56)
  at org.apache.spark.sql.execution.command.ExecutedCommandExec.doExecute(commands.scala:74)
  at org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:115)
  at org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:115)
  at org.apache.spark.sql.execution.SparkPlan$$anonfun$executeQuery$1.apply(SparkPlan.scala:136)
  at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151)
  at org.apache.spark.sql.execution.SparkPlan.executeQuery(SparkPlan.scala:133)
  at org.apache.spark.sql.execution.SparkPlan.execute(SparkPlan.scala:114)
  at org.apache.spark.sql.execution.QueryExecution.toRdd$lzycompute(QueryExecution.scala:86)
  at org.apache.spark.sql.execution.QueryExecution.toRdd(QueryExecution.scala:86)
  at org.apache.spark.sql.DataFrameWriter.saveAsTable(DataFrameWriter.scala:378)
  at org.apache.spark.sql.DataFrameWriter.saveAsTable(DataFrameWriter.scala:354)
  ... 
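For what it's worth, the direction I am considering for the existing-table case, based on the DataFrameWriter API: insertInto writes into an already-created table by column position, rather than trying to (re)create the table the way saveAsTable does. This is only a sketch of Approach 2 rewritten that way; it assumes `spark` is the Hive-enabled SparkSession from above and that `stdDf`'s columns are in the same order as the employee table:

```scala
// Sketch only: insertInto appends to an existing Hive table by column
// position, instead of creating a data-source table like saveAsTable.
// Assumes `spark` is a Hive-enabled SparkSession and `stdDf`'s columns
// match the order of employee(stid, name, subject, grade, city).
spark.sql("use default")
spark.sql("create table if not exists employee(stid STRING, name STRING, subject STRING, grade STRING, city STRING)")

// Write into the pre-created table instead of calling saveAsTable:
stdDf.write.mode("append").insertInto("employee")
```

I have not verified this against my setup yet; it is just how I read the insertInto documentation.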
56 elided

Questions:

1. In Approach 1, the data is stored even though the Hive table was not previously created: when I call saveAsTable it automatically creates the table for me, and on subsequent runs it appends data to it. How can I store data in a previously created table?

2. It also gives the warning
WARN metastore.HiveMetaStore: Location: file:/usr/local/spark/spark-warehouse/employee specified for non-external table:employee
I have already provided the path of the Hive metastore warehouse, so why is it storing data in Spark's own warehouse directory?

The Hive setup was done with reference to http://mitu.co.in/wp-content/uploads/2015/12/Hive-Installation-on-Ubuntu-14.04-and-Hadoop-2.6.3.pdf and it is working well. I cannot change the Hive version; it must be 1.2.1.

Thank you.
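On the warehouse warning, my understanding from the Spark 2.0 notes is that spark.sql.warehouse.dir replaces hive.metastore.warehouse.dir and must be set before the first SparkSession is created; inside spark-shell a session already exists, so setting it later via .config(...) has no effect. A sketch of what I plan to try, passing the property when launching the shell (the path is from my local installation, so adjust as needed):

```shell
# Sketch: set the warehouse location before any SparkSession exists,
# so the Hive-enabled session picks it up instead of defaulting to
# the local ./spark-warehouse directory.
spark-shell --conf spark.sql.warehouse.dir=/usr/local/hive/warehouse/
```

I am not certain this is the whole story on my setup, but it would explain why the table landed under file:/usr/local/spark/spark-warehouse despite the .config(...) call.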