Hi,

Let us create a DataFrame (DF) based on an existing table in Hive using spark-shell.
scala> val HiveContext = new org.apache.spark.sql.hive.HiveContext(sc)
HiveContext: org.apache.spark.sql.hive.HiveContext = org.apache.spark.sql.hive.HiveContext@7c666865

// Switch to the correct database in Hive
scala> HiveContext.sql("use oraclehadoop")
res31: org.apache.spark.sql.DataFrame = [result: string]

// Create a DF based on the Hive table "sales"
scala> val s = HiveContext.table("sales")
s: org.apache.spark.sql.DataFrame = [prod_id: bigint, cust_id: bigint, time_id: timestamp, channel_id: bigint, promo_id: bigint, quantity_sold: decimal(10,0), amount_sold: decimal(10,0), year: int, month: int]

// Register it as a temporary table
scala> s.registerTempTable("tmp")

// Get the row count
scala> HiveContext.sql("select count(1) from sales").show
+------+
|   _c0|
+------+
|917359|
+------+

// However, you cannot add a column to the temporary table with DDL,
// because Hive expects the table to exist in the Hive database:
scala> HiveContext.sql("ALTER TABLE tmp ADD COLUMNS(newcol INT)")
16/05/17 08:20:24 ERROR Driver: FAILED: SemanticException [Error 10001]: Table not found oraclehadoop.tmp
org.apache.hadoop.hive.ql.parse.SemanticException: Table not found oraclehadoop.tmp

So it appears that a temporary table registered with registerTempTable lives only in Spark's in-memory catalog and is not visible to the Hive metastore. Is that a correct reading?

HTH

Dr Mich Talebzadeh

LinkedIn: https://www.linkedin.com/profile/view?id=AAEAAAAWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw

http://talebzadehmich.wordpress.com
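If the goal is simply to get the extra column, one workaround (a sketch only, assuming the same spark-shell session and Spark 1.x HiveContext as above; the table name sales_with_newcol is made up for illustration) is to add the column on the DataFrame itself with withColumn and re-register the temp table, or persist it as a real Hive table that DDL can then see:

```scala
// Assumes `s` is the DataFrame created from the Hive "sales" table above.
import org.apache.spark.sql.functions.lit

// Temp tables are just views over DataFrames, so transform the DataFrame:
val s2 = s.withColumn("newcol", lit(0))  // new DF with an extra INT column

// Re-registering under the same name replaces the previous "tmp"
s2.registerTempTable("tmp")

// To make it a real table in the Hive metastore (which ALTER TABLE could
// then operate on), persist it instead:
s2.write.saveAsTable("oraclehadoop.sales_with_newcol")
```

Note that withColumn does not modify the original DataFrame; it returns a new one, which is why the temp table has to be re-registered.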