Hi,

Let us create a DataFrame (DF) based on an existing table in Hive using spark-shell:

scala> val HiveContext = new org.apache.spark.sql.hive.HiveContext(sc)
HiveContext: org.apache.spark.sql.hive.HiveContext =
org.apache.spark.sql.hive.HiveContext@7c666865
// Go to correct database in Hive
scala> HiveContext.sql("use oraclehadoop")
res31: org.apache.spark.sql.DataFrame = [result: string]
// Create a DF based on Hive table sales
scala> val s = HiveContext.table("sales")
s: org.apache.spark.sql.DataFrame = [prod_id: bigint, cust_id: bigint,
time_id: timestamp, channel_id: bigint, promo_id: bigint, quantity_sold:
decimal(10,0), amount_sold: decimal(10,0), year: int, month: int]
// Register it as a temporary table
scala> s.registerTempTable("tmp")
// Count the rows in the underlying Hive table
scala> HiveContext.sql("select count(1) from sales").show
+------+
|   _c0|
+------+
|917359|
+------+
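The temporary registration can be queried the same way. A minimal sketch, assuming the session above (`HiveContext` and the temp table "tmp" already created); note that "tmp" lives only in this Spark session, not in the metastore:

```scala
// Query the temporary registration rather than the Hive table itself.
// "tmp" is session-scoped: it disappears when this spark-shell exits
// and is never visible to Hive or other sessions.
HiveContext.sql("select count(1) from tmp").show
```

The count should match the query against "sales", since "tmp" is just a name bound to the same DataFrame.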

// However, you cannot add a column to that temporary table as shown below,
because ALTER TABLE expects the table to exist in the Hive database

HiveContext.sql("ALTER TABLE tmp ADD COLUMNS(newcol INT)")

16/05/17 08:20:24 ERROR Driver: FAILED: SemanticException [Error 10001]:
Table not found oraclehadoop.tmp
org.apache.hadoop.hive.ql.parse.SemanticException: Table not found
oraclehadoop.tmp

So the assumption is that an in-memory (temporary) table is just a placeholder within the Spark session, not a table in the Hive metastore?
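That seems to be the case: a temp table is a session-scoped name bound to a DataFrame, so Hive DDL does not apply to it. Schema changes go through the DataFrame API instead. A minimal sketch (Spark 1.x API, assuming the DF `s` from above; the column name and fill value are illustrative):

```scala
import org.apache.spark.sql.functions.lit

// ALTER TABLE cannot touch a temp table; derive a new DataFrame instead.
// lit(0) fills the hypothetical new column with a constant for every row.
val withNewCol = s.withColumn("newcol", lit(0))

// Re-register under the same name to replace the previous registration
withNewCol.registerTempTable("tmp")
```

After re-registering, queries against "tmp" see the extra column, while the underlying Hive table "sales" is unchanged.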

HTH

Dr Mich Talebzadeh



LinkedIn: https://www.linkedin.com/profile/view?id=AAEAAAAWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw



http://talebzadehmich.wordpress.com
