I am writing a Spark job to persist data using HiveContext so that it can be accessed via the JDBC Thrift server. Although my code doesn't throw an error, I am unable to see my persisted data when I query from the Thrift server.
I tried three different ways to get this to work.

**1)** Save an RDD of a case class as a table:

```scala
val sqlContext = new org.apache.spark.sql.hive.HiveContext(sc)
val schemaRdd: SchemaRDD = sqlContext.createSchemaRDD(rdd)
schemaRdd.saveAsTable("test_table")
```

Here `rdd` is an RDD of a case class and `sc` is the SparkContext. This is the case class used in all my examples:

```scala
case class SomeClass(key: String, value: String) extends Serializable
```

**2)** I then created a table called `test_table` after logging in to the Thrift server, added two dummy records to it, and tried `insertInto`:

```scala
val sqlContext = new org.apache.spark.sql.hive.HiveContext(sc)
import sqlContext._

val usermeta = hql("SELECT key, value FROM test_table")
val rdd = usermeta.map(t => SomeClass("3", "idddddddd"))
val schemaRdd = createSchemaRDD(rdd)
schemaRdd.insertInto("test_table")
```

**3)** I tried the approach documented on the Spark SQL programming guide page:

```scala
val sqlContext = new org.apache.spark.sql.hive.HiveContext(sc)
sqlContext.sql("CREATE TABLE IF NOT EXISTS test_table (key String, value String)")
```

The Spark job does other computations, which it completes with correct results; only the SQL part doesn't work. What am I doing wrong? My understanding was that tables created through a HiveContext could be queried from the Thrift server command line.

Cheers,
Steve
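In case the surrounding setup matters, here is attempt 1 as a self-contained sketch (Spark 1.x API; the app name, master setting, and sample rows are placeholders I made up for illustration):

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.sql.SchemaRDD
import org.apache.spark.sql.hive.HiveContext

// Case class from my examples above
case class SomeClass(key: String, value: String) extends Serializable

object PersistToHive {
  def main(args: Array[String]): Unit = {
    // Placeholder app name/master; in practice these come from spark-submit
    val conf = new SparkConf().setAppName("persist-test")
    val sc = new SparkContext(conf)
    val sqlContext = new HiveContext(sc)

    // Dummy rows standing in for my real RDD
    val rdd = sc.parallelize(Seq(SomeClass("1", "a"), SomeClass("2", "b")))

    // Convert the RDD of case-class instances to a SchemaRDD and persist it;
    // my expectation is that this registers test_table in the Hive metastore
    val schemaRdd: SchemaRDD = sqlContext.createSchemaRDD(rdd)
    schemaRdd.saveAsTable("test_table")

    sc.stop()
  }
}
```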