Hello community,

I'm currently using Spark 1.3.1 with Hive support to write processed data
to an external Hive table backed by S3. I'm specifying the delimiter
manually, but I'd like to know whether there is any "clean" way to write
in CSV format:

import org.apache.spark.{SparkConf, SparkContext}

val sparkConf = new SparkConf()
val sc = new SparkContext(sparkConf)
val hiveContext = new org.apache.spark.sql.hive.HiveContext(sc)
import hiveContext.implicits._

// External table over the S3 location, comma-delimited
hiveContext.sql(
  "CREATE EXTERNAL TABLE IF NOT EXISTS table_name (field1 STRING, field2 STRING) " +
  "ROW FORMAT DELIMITED FIELDS TERMINATED BY ',' " +
  "LOCATION '" + path_on_s3 + "'")

hiveContext.sql(<an INSERT OVERWRITE query to write into the above table>)
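
The closest I've come to a "clean" approach is the Databricks spark-csv
package, but I'm not sure it's the recommended route. A minimal sketch of
what I have in mind, assuming spark-csv is on the classpath and the
processed data is available as a DataFrame (the SELECT below is just a
placeholder):

import org.apache.spark.sql.SaveMode

// Processed data as a DataFrame (hypothetical source table)
val df = hiveContext.sql("SELECT field1, field2 FROM source_table")

// Spark 1.3.x generic save API, writing through the spark-csv data source
df.save(
  "com.databricks.spark.csv",
  SaveMode.Overwrite,
  Map("path" -> path_on_s3))

Is something along these lines the intended way, or is there a better
option for writing CSV directly?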


I also need the table's header to be written at the top of each output
file. I tried:


hiveContext.sql("set hive.cli.print.header=true")


But it didn't work.
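
As far as I can tell, hive.cli.print.header only affects how the Hive CLI
displays results, not the files produced by the INSERT. If the spark-csv
route above is viable, I'm assuming its header option would cover this,
though I haven't verified that the option is honoured on write:

// Same save as above, asking spark-csv to emit a header row in each part file
df.save(
  "com.databricks.spark.csv",
  SaveMode.Overwrite,
  Map("path" -> path_on_s3, "header" -> "true"))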


Any hint?


Thank you.


Best regards,

Roberto
