persist spark output in hive using DataFrame and saveAsTable API

2015-12-07 Thread Divya Gehlot
Hi, I am a newbie to Spark. Could somebody guide me on how I can persist my Spark RDD results in Hive using the saveAsTable API? I would appreciate it if you could provide an example for a Hive external table. Thanks in advance.

Re: persist spark output in hive using DataFrame and saveAsTable API

2015-12-07 Thread UMESH CHAUDHARY
Currently, saveAsTable will create a Hive internal table by default (see here). If you want to save it as an external table, use saveAsParquetFile and create an external Hive table on that Parquet file.
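A minimal sketch of that approach, assuming a HiveContext (sqlContext) and a SparkContext (sc); the source table, output path, table name, and columns below are all illustrative:

import org.apache.spark.sql.hive.HiveContext

val sqlContext = new HiveContext(sc)           // sc: existing SparkContext
val df = sqlContext.table("some_source")       // stand-in for your processed DataFrame
df.saveAsParquetFile("/data/output/mytable")   // Spark 1.3 API: writes Parquet files to this path
sqlContext.sql(
  """CREATE EXTERNAL TABLE mytable (id INT, name STRING)
    |STORED AS PARQUET
    |LOCATION '/data/output/mytable'""".stripMargin)

The column list in CREATE EXTERNAL TABLE must match the DataFrame's schema; the two columns above are placeholders.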

Re: persist spark output in hive using DataFrame and saveAsTable API

2015-12-07 Thread Fengdong Yu
If your RDD is in JSON format, that's easy:

val df = sqlContext.read.json(rdd)
df.saveAsTable("your_table_name")
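A self-contained sketch of that JSON route (names are illustrative; note that sqlContext.read.json arrived in Spark 1.4, so on Spark 1.3 the equivalent call is sqlContext.jsonRDD):

import org.apache.spark.sql.hive.HiveContext

val sqlContext = new HiveContext(sc)                       // sc: existing SparkContext
val rdd = sc.parallelize(Seq("""{"id":1,"name":"a"}"""))   // hypothetical JSON records
val df = sqlContext.jsonRDD(rdd)                           // infers the schema from the JSON
df.saveAsTable("your_table_name")                          // creates a managed (internal) Hive table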

Re: persist spark output in hive using DataFrame and saveAsTable API

2015-12-07 Thread Fengdong Yu
I suppose your output data is ORC and you want to save it to the Hive database test, with external table name testTable:

sqlContext.createExternalTable(
  "test.testTable",
  "org.apache.spark.sql.hive.orc",
  Map("path" -> "/data/test/mydata"))
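For completeness, a hedged end-to-end sketch under the same assumptions (a HiveContext sqlContext, a DataFrame df, an illustrative path), and assuming the ORC data source named above is available in your Spark build:

// Spark 1.3 generic save with an explicit data source name
df.save("/data/test/mydata", "org.apache.spark.sql.hive.orc")
// then register that directory as an external Hive table
sqlContext.createExternalTable(
  "test.testTable",
  "org.apache.spark.sql.hive.orc",
  Map("path" -> "/data/test/mydata"))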

Re: persist spark output in hive using DataFrame and saveAsTable API

2015-12-07 Thread Divya Gehlot
My input format is CSV and I am using Spark 1.3 (HDP 2.2 comes with Spark 1.3, so ...). I am using spark-csv to read my CSV file and the DataFrame API to process it ... I followed these steps and was successfully able to read
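For reference, a minimal sketch of that CSV route on Spark 1.3, assuming the spark-csv package (com.databricks:spark-csv) is on the classpath; the file path, options, and table name are illustrative:

import org.apache.spark.sql.hive.HiveContext

val sqlContext = new HiveContext(sc)       // sc: existing SparkContext
val df = sqlContext.load(
  "com.databricks.spark.csv",              // spark-csv data source
  Map("path" -> "/data/input/mydata.csv",
      "header" -> "true"))                 // treat the first line as column names
df.saveAsTable("my_csv_table")             // persists the result as a managed Hive table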