Yes, that would work. You would typically add the option below on the dataframe to use insert overwrite (InsertOverwrite is a new API; I haven't updated the documentation yet).
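A minimal sketch of how such an insert_overwrite write might look from Spark (table name, record key, partition fields, and path here are hypothetical, not from this thread):

```python
# Hedged sketch (names are hypothetical): Hudi write options that
# select the insert_overwrite operation for a DataFrame write.
hudi_options = {
    "hoodie.table.name": "trips",                       # hypothetical table name
    "hoodie.datasource.write.recordkey.field": "uuid",  # hypothetical record key
    "hoodie.datasource.write.partitionpath.field": "date,hour",
    "hoodie.datasource.write.operation": "insert_overwrite",
}

# With a SparkSession and a DataFrame `df` in scope, the write would be:
# df.write.format("hudi").options(**hudi_options).mode("append").save("/path/to/table")
```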
- hoodie.datasource.write.operation: insert_overwrite

Let me know if you have any questions.

@Balaji Thanks for creating the follow-up ticket. Agreed, this can be supported in a much simpler way using the insert_overwrite primitive.

On Wed, Oct 21, 2020 at 6:19 PM Balaji Varadarajan <v.bal...@ymail.com.invalid> wrote:

> cc Satish, who implemented Insert Overwrite support.
>
> We have recently landed Insert Overwrite support in Hudi. Partition-level
> deletion is a logical extension of this feature but is not currently
> available. I have added a JIRA to track this:
> https://issues.apache.org/jira/browse/HUDI-1350
>
> Meanwhile, using the master branch, you can do this in two steps. You can
> generate a record for each partition you want to delete and commit the
> batch; this would essentially truncate the partition to one record. You
> can then issue a hard delete on that record. By keeping cleaner retention
> at 1, you can essentially clean up the files in the directory. Satish,
> can you chime in on whether this makes sense and whether you see any
> issues with it?
>
> Thanks,
> Balaji.V
>
> On Tuesday, October 20, 2020, 11:31:45 PM PDT, selvaraj periyasamy <
> selvaraj.periyasamy1...@gmail.com> wrote:
>
> Team,
>
> I have a COW table which has the sub-partition columns Date/Hour. For
> some use cases, I need to completely remove a few partitions (removing a
> few hours alone). Hudi maintains metadata info, so manually removing
> folders, as well as the corresponding entries in the Hive metastore, may
> mess up Hudi metadata. What is the best way to do this?
>
> Thanks,
> Selva
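Balaji's two-step workaround quoted above could be sketched roughly as follows. All names, fields, and paths here are hypothetical, and the actual writes require Spark with Hudi built from the master branch; only the option dictionaries are shown concretely:

```python
# Hedged sketch of the two-step partition-deletion workaround described in
# the thread; all identifiers are hypothetical.

# Step 1: insert_overwrite each target partition with a single placeholder
# record, which truncates the partition down to that one record.
step1_options = {
    "hoodie.datasource.write.operation": "insert_overwrite",
    "hoodie.datasource.write.partitionpath.field": "date,hour",
    # Retain only 1 commit so the cleaner removes the replaced files quickly.
    "hoodie.cleaner.commits.retained": "1",
}

# Step 2: hard-delete the placeholder record, leaving the partition empty.
step2_options = {
    "hoodie.datasource.write.operation": "delete",
}

# With Spark in scope, each step would be an ordinary Hudi write, e.g.:
# placeholder_df.write.format("hudi").options(**step1_options).mode("append").save(path)
# placeholder_df.write.format("hudi").options(**step2_options).mode("append").save(path)
```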