Re: Spark process creating and writing to a Hive ORC table

2016-04-01 Thread Mich Talebzadeh
Yes, this is feasible. You can use the Databricks spark-csv package to load CSV files from the staging directory. This is pretty standard:

val df = sqlContext.read
  .format("com.databricks.spark.csv")
  .option("inferSchema", "true")
  .option("header", "true")
  .load("hdfs://xx:9000/data/stg/")

You can then create an ORC table in Hive and write the DataFrame's content into it.
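For the follow-on step that the reply alludes to, here is a minimal sketch (not from the original thread) of one way to write the loaded DataFrame out as a Hive ORC table. It assumes a Spark 1.x HiveContext and a hypothetical target table "test.orc_table"; adjust the app name, paths, and table name to your environment.

import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.sql.SaveMode
import org.apache.spark.sql.hive.HiveContext

object CsvToOrc {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(new SparkConf().setAppName("CsvToOrc"))
    // HiveContext is needed so saveAsTable registers the table in the Hive metastore
    val sqlContext = new HiveContext(sc)

    // Load the staged CSV files with the spark-csv package, inferring the schema from the data
    val df = sqlContext.read
      .format("com.databricks.spark.csv")
      .option("inferSchema", "true")
      .option("header", "true")
      .load("hdfs://xx:9000/data/stg/")

    // Persist the DataFrame as an ORC-backed Hive table; Overwrite replaces any existing data
    df.write
      .format("orc")
      .mode(SaveMode.Overwrite)
      .saveAsTable("test.orc_table")
  }
}

Equivalently, you could first create the ORC table yourself with sqlContext.sql("CREATE TABLE ... STORED AS ORC") and then use df.write.insertInto(...) if you want full control over the Hive DDL.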

Spark process creating and writing to a Hive ORC table

2016-03-31 Thread Ashok Kumar
Hello, how feasible is it to use Spark to extract CSV files and then create and write their content to an ORC table in a Hive database? Also, is Parquet the best (optimum) format to write to HDFS from a Spark app? Thanks