Re: How to overwrite partition when writing Parquet?

2015-08-20 Thread Romi Kuntsman
Cheng - what if I want to overwrite a specific partition?

I'll have to remove the folder, as Hemant suggested...
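
A minimal sketch of that manual removal, assuming parquetDir is on a local filesystem and the partition column is named "date" (so the directory is parquetDir/date=<value>, per the partition discovery layout). PartitionCleaner and its methods are hypothetical helper names, not part of Spark; on HDFS the Hadoop FileSystem API would be needed instead of java.nio.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Comparator;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Hypothetical helper (not a Spark class): removes one partition directory
// from a partitioned Parquet dataset stored on the local filesystem.
final class PartitionCleaner {

    // Builds the directory of one partition value, using the column=value
    // naming that partition discovery relies on.
    static Path partitionDir(String parquetDir, String column, String value) {
        return Paths.get(parquetDir, column + "=" + value);
    }

    // Recursively deletes the partition directory.
    // Returns false if the directory did not exist, true once it is removed.
    static boolean deletePartition(String parquetDir, String column, String value)
            throws IOException {
        Path dir = partitionDir(parquetDir, column, value);
        if (!Files.exists(dir)) {
            return false;
        }
        List<Path> entries;
        try (Stream<Path> walk = Files.walk(dir)) {
            // Reverse order lists children before their parent directories.
            entries = walk.sorted(Comparator.reverseOrder())
                          .collect(Collectors.toList());
        }
        for (Path entry : entries) {
            Files.delete(entry);
        }
        return true;
    }
}
```

After PartitionCleaner.deletePartition(parquetDir, "date", "2015-08-19"), writing that day's data with SaveMode.Append should repopulate only that partition and leave the other dates untouched.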

On Thu, Aug 20, 2015 at 1:17 PM Cheng Lian lian.cs@gmail.com wrote:

 You can first apply a filter to select only the data for the needed dates
 and then append it.


 Cheng


 On 8/20/15 4:59 PM, Hemant Bhanawat wrote:

 How can I overwrite only a given partition or manually remove a partition
 before writing?

 I don't know whether there is a way to do that using a mode (and I don't
 think there is). But wouldn't manually deleting the directory of that
 particular partition help? For the directory structure, check this out...


 http://spark.apache.org/docs/latest/sql-programming-guide.html#partition-discovery


 On Wed, Aug 19, 2015 at 8:18 PM, Romi Kuntsman r...@totango.com wrote:

 Hello,

 I have a DataFrame, with a date column which I want to use as a partition.
 Each day I want to write the data for the same date in Parquet, and then
 read a dataframe for a date range.

 I'm using:

 myDataframe.write().partitionBy("date").mode(SaveMode.Overwrite).parquet(parquetDir);

 If I use SaveMode.Append, then writing data for the same partition adds
 the same data there again.
 If I use SaveMode.Overwrite, then writing data for a single partition
 removes all the data for all partitions.

 How can I overwrite only a given partition or manually remove a partition
 before writing?

 Many thanks!
 Romi K.






Re: How to overwrite partition when writing Parquet?

2015-08-20 Thread Cheng Lian
You can first apply a filter to select only the data for the needed dates
and then append it.


Cheng

On 8/20/15 4:59 PM, Hemant Bhanawat wrote:
How can I overwrite only a given partition or manually remove a 
partition before writing?


I don't know whether there is a way to do that using a mode (and I don't
think there is). But wouldn't manually deleting the directory of that
particular partition help? For the directory structure, check this out...


http://spark.apache.org/docs/latest/sql-programming-guide.html#partition-discovery


On Wed, Aug 19, 2015 at 8:18 PM, Romi Kuntsman r...@totango.com wrote:


Hello,

I have a DataFrame, with a date column which I want to use as a
partition.
Each day I want to write the data for the same date in Parquet,
and then read a dataframe for a date range.

I'm using:

myDataframe.write().partitionBy("date").mode(SaveMode.Overwrite).parquet(parquetDir);

If I use SaveMode.Append, then writing data for the same partition
adds the same data there again.
If I use SaveMode.Overwrite, then writing data for a single
partition removes all the data for all partitions.

How can I overwrite only a given partition or manually remove a
partition before writing?

Many thanks!
Romi K.






How to overwrite partition when writing Parquet?

2015-08-19 Thread Romi Kuntsman
Hello,

I have a DataFrame, with a date column which I want to use as a partition.
Each day I want to write the data for the same date in Parquet, and then
read a dataframe for a date range.

I'm using:
myDataframe.write().partitionBy("date").mode(SaveMode.Overwrite).parquet(parquetDir);

If I use SaveMode.Append, then writing data for the same partition adds the
same data there again.
If I use SaveMode.Overwrite, then writing data for a single partition
removes all the data for all partitions.

How can I overwrite only a given partition or manually remove a partition
before writing?

Many thanks!
Romi K.