Re: How to overwrite partition when writing Parquet?
Cheng - what if I want to overwrite a specific partition? I'll have to remove the folder, as Hemant suggested...

On Thu, Aug 20, 2015 at 1:17 PM Cheng Lian lian.cs@gmail.com wrote:

You can apply a filter first to filter out data of needed dates and then append them.

Cheng
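The folder removal discussed in this thread could be sketched roughly as follows. This is only an illustration under stated assumptions: it uses plain java.nio and therefore works on a local filesystem only, and the class and method names (PartitionCleaner, partitionDir, deletePartition) are hypothetical, not part of Spark. It relies on the column=value directory layout described in the partition discovery documentation. On HDFS the equivalent step would presumably go through Hadoop's FileSystem.delete(path, true) instead.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Comparator;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Hypothetical helper, not a Spark API: removes one partition directory
// (e.g. <parquetDir>/date=2015-08-20) so the day can be re-written with Append.
final class PartitionCleaner {

    // Builds the directory that partition discovery expects for one value.
    static Path partitionDir(String parquetDir, String column, String value) {
        return Paths.get(parquetDir, column + "=" + value);
    }

    // Recursively deletes the partition directory on the LOCAL filesystem.
    // Returns true if the directory existed and was removed, false otherwise.
    static boolean deletePartition(String parquetDir, String column, String value)
            throws IOException {
        Path dir = partitionDir(parquetDir, column, value);
        if (!Files.isDirectory(dir)) {
            return false;
        }
        List<Path> paths;
        try (Stream<Path> walk = Files.walk(dir)) {
            // Reverse order so that files are deleted before their parent directories.
            paths = walk.sorted(Comparator.reverseOrder()).collect(Collectors.toList());
        }
        for (Path p : paths) {
            Files.delete(p);
        }
        return true;
    }

    public static void main(String[] args) throws IOException {
        // Small demo on a temporary directory standing in for parquetDir.
        Path root = Files.createTempDirectory("parquetDir");
        Files.createDirectories(root.resolve("date=2015-08-19"));
        Path today = Files.createDirectories(root.resolve("date=2015-08-20"));
        Files.createFile(today.resolve("part-00000.parquet"));

        boolean removed = deletePartition(root.toString(), "date", "2015-08-20");
        System.out.println("removed: " + removed);
    }
}
```

After the directory is gone, the data for that date would be written again with SaveMode.Append, so the other partitions are left untouched.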
Re: How to overwrite partition when writing Parquet?
You can apply a filter first to filter out data of needed dates and then append them.

Cheng

On 8/20/15 4:59 PM, Hemant Bhanawat wrote:

How can I overwrite only a given partition or manually remove a partition before writing?

I don't know if (and I don't think) there is a way to do that using a mode. But doesn't manually deleting the directory of a particular partition help? For directory structure, check this out...
http://spark.apache.org/docs/latest/sql-programming-guide.html#partition-discovery

On Wed, Aug 19, 2015 at 8:18 PM, Romi Kuntsman r...@totango.com wrote:

Hello,

I have a DataFrame, with a date column which I want to use as a partition. Each day I want to write the data for the same date in Parquet, and then read a dataframe for a date range.

I'm using:

    myDataframe.write().partitionBy(date).mode(SaveMode.Overwrite).parquet(parquetDir);

If I use SaveMode.Append, then writing data for the same partition adds the same data there again. If I use SaveMode.Overwrite, then writing data for a single partition removes all the data for all partitions.

How can I overwrite only a given partition or manually remove a partition before writing?

Many thanks!
Romi K.
How to overwrite partition when writing Parquet?
Hello,

I have a DataFrame, with a date column which I want to use as a partition. Each day I want to write the data for the same date in Parquet, and then read a dataframe for a date range.

I'm using:

    myDataframe.write().partitionBy(date).mode(SaveMode.Overwrite).parquet(parquetDir);

If I use SaveMode.Append, then writing data for the same partition adds the same data there again. If I use SaveMode.Overwrite, then writing data for a single partition removes all the data for all partitions.

How can I overwrite only a given partition or manually remove a partition before writing?

Many thanks!
Romi K.