Cheng - what if I want to overwrite a specific partition? I'll have to remove the folder, as Hemant suggested...
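
For what it's worth, here is a minimal sketch of that manual removal. It assumes the Parquet directory lives on a local filesystem and that partitions follow the column=value directory layout from the partition-discovery docs. The names PartitionCleaner and deletePartition are made up for illustration, and on HDFS the same idea would need the Hadoop FileSystem API instead of java.nio.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Comparator;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Hypothetical helper, not part of Spark.
class PartitionCleaner {

    /**
     * Deletes the directory of a single partition, e.g. <parquetDir>/date=2015-08-20.
     * Returns true if the directory existed and was removed, false if it was absent.
     */
    static boolean deletePartition(Path parquetDir, String column, String value)
            throws IOException {
        Path partitionDir = parquetDir.resolve(column + "=" + value);
        if (!Files.isDirectory(partitionDir)) {
            return false;
        }
        List<Path> paths;
        try (Stream<Path> walk = Files.walk(partitionDir)) {
            // Reverse order puts files before the directories that contain them.
            paths = walk.sorted(Comparator.reverseOrder()).collect(Collectors.toList());
        }
        for (Path p : paths) {
            Files.delete(p);
        }
        return true;
    }

    public static void main(String[] args) throws IOException {
        // Demo on a throwaway directory with two fake partitions.
        Path parquetDir = Files.createTempDirectory("parquetDir");
        Files.createDirectories(parquetDir.resolve("date=2015-08-19"));
        Path today = Files.createDirectories(parquetDir.resolve("date=2015-08-20"));
        Files.createFile(today.resolve("part-00000.parquet"));

        boolean removed = deletePartition(parquetDir, "date", "2015-08-20");
        System.out.println("removed = " + removed);
        // After this, the day's data would be written back with
        // SaveMode.Append, so the other partitions are left untouched.
    }
}
```

With the old directory gone, writing only that day's rows with partitionBy("date") and SaveMode.Append should not duplicate anything.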

On Thu, Aug 20, 2015 at 1:17 PM Cheng Lian <lian.cs....@gmail.com> wrote:

> You can apply a filter first to filter out data of needed dates and then
> append them.
>
> Cheng
>
> On 8/20/15 4:59 PM, Hemant Bhanawat wrote:
>
> How can I overwrite only a given partition or manually remove a partition
> before writing?
>
> I don't know if (and I don't think) there is a way to do that using a
> mode. But doesn't manually deleting the directory of a particular partition
> help? For directory structure, check this out...
>
> http://spark.apache.org/docs/latest/sql-programming-guide.html#partition-discovery
>
> On Wed, Aug 19, 2015 at 8:18 PM, Romi Kuntsman <r...@totango.com> wrote:
>
>> Hello,
>>
>> I have a DataFrame, with a date column which I want to use as a partition.
>> Each day I want to write the data for the same date in Parquet, and then
>> read a dataframe for a date range.
>>
>> I'm using:
>>
>> myDataframe.write().partitionBy("date").mode(SaveMode.Overwrite).parquet(parquetDir);
>>
>> If I use SaveMode.Append, then writing data for the same partition adds
>> the same data there again.
>> If I use SaveMode.Overwrite, then writing data for a single partition
>> removes all the data for all partitions.
>>
>> How can I overwrite only a given partition or manually remove a partition
>> before writing?
>>
>> Many thanks!
>> Romi K.