Hi,
I'm reading in a CSV file, and I would like to write it back as a permanent
table, but with partitioning by year, etc.
Currently I do this:
from pyspark.sql import HiveContext
sqlContext = HiveContext(sc)
# Note: the original snippet was cut off here; 'data.csv' is a
# placeholder path, and any further .options(...) were lost.
df = (sqlContext.read
      .format('com.databricks.spark.csv')
      .options(header='true')
      .load('data.csv'))
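Until native partitioned writes land, one possible workaround is to issue HiveQL DDL through the HiveContext above and insert into explicit partitions. This is only a sketch; the table and column names (`events`, `staging_events`, `value`) are hypothetical:

```sql
-- Hypothetical table/column names; assumes a HiveContext is available.
CREATE TABLE events (value STRING)
PARTITIONED BY (year INT)
STORED AS PARQUET;

-- Populate one partition at a time from a registered temp table.
INSERT OVERWRITE TABLE events PARTITION (year = 2015)
SELECT value FROM staging_events;
```

Each statement would be run via sqlContext.sql("...").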
This is tracked by these JIRAs:
https://issues.apache.org/jira/browse/SPARK-5947
https://issues.apache.org/jira/browse/SPARK-5948
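For context, the partitioning work tracked above produces Hive-style partition directories on disk (one `column=value` directory per partition value, with part files inside). The pure-Python sketch below only illustrates that on-disk layout; the rows, column name, and file contents are made up for the example:

```python
import os
import tempfile

# Toy rows standing in for a DataFrame partitioned by 'year'.
rows = [
    {"year": 2014, "value": "a"},
    {"year": 2015, "value": "b"},
    {"year": 2015, "value": "c"},
]

base = tempfile.mkdtemp()
for row in rows:
    # Each partition value becomes a directory: .../year=2015/
    part_dir = os.path.join(base, "year={0}".format(row["year"]))
    os.makedirs(part_dir, exist_ok=True)
    # Spark would write Parquet part files here; we append a text stub.
    with open(os.path.join(part_dir, "part-00000"), "a") as f:
        f.write(row["value"] + "\n")

print(sorted(os.listdir(base)))  # ['year=2014', 'year=2015']
```

Readers that understand this layout (Hive, Spark SQL) can then prune partitions by filtering on the partition column.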
From: denny.g@gmail.com
Date: Wed, 1 Apr 2015 04:35:08 +
Subject: Creating Partitioned Parquet Tables via SparkSQL
To: user@spark.apache.org
Creating Parquet tables via .saveAsTable is great, but I was wondering
whether there is an equivalent way to create partitioned Parquet tables.
Thanks!