Hi,

I'm reading in a CSV file, and I would like to write it back out as a permanent
table, but partitioned in a particular way, e.g. by year.

Currently I do this:

from pyspark.sql import HiveContext
sqlContext = HiveContext(sc)
# spark-csv's documented option name is inferSchema (camelCase)
df = sqlContext.read.format('com.databricks.spark.csv') \
    .options(header='true', inferSchema='true') \
    .load('/Users/imran/Downloads/intermediate.csv')
df.saveAsTable("intermediate")

Which works great.

I also know I can do this:
df.write.partitionBy("year").parquet("path/to/output")

But how do I combine the two, so that I save a permanent table that is both
partitioned and stored as Parquet?
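
My best guess is that the DataFrameWriter calls can simply be chained,
something like the following (untested on my end; I'm not sure whether
saveAsTable actually honors partitionBy, or whether the partitions get
registered properly in the Hive metastore):

# untested guess: chain format, partitionBy, and saveAsTable on df.write
df.write.format("parquet").partitionBy("year").saveAsTable("intermediate")

Is that the right approach, or does it need to be done differently?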

thanks,
imran
