Yes, that would work. You would typically set the option below on the
DataFrame write to use insert overwrite (insert overwrite is a new API; I
haven't updated the documentation yet).

   - hoodie.datasource.write.operation: insert_overwrite
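
For example, with the Spark DataFrame writer it might look roughly like this
(a sketch; the table name, record key, partition fields, and path are
placeholders, and the other options are the usual Hudi datasource settings):

```python
# Hypothetical write options for insert overwrite; everything except
# hoodie.datasource.write.operation is a placeholder for your setup.
hudi_options = {
    "hoodie.table.name": "my_table",                           # placeholder
    "hoodie.datasource.write.recordkey.field": "uuid",         # placeholder
    "hoodie.datasource.write.partitionpath.field": "date,hour",
    "hoodie.datasource.write.operation": "insert_overwrite",
}

# With a SparkSession available, the write would be issued as:
# df.write.format("hudi").options(**hudi_options).mode("append").save(base_path)
```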


Let me know if you have any questions.

@Balaji Thanks for creating the follow-up ticket. Agree this can be
supported in a much simpler way using the insert_overwrite primitive.
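
The two-step workaround Balaji describes below could be sketched as follows
(an assumption-laden sketch: field names and paths are placeholders, and
`hoodie.cleaner.commits.retained` is the standard cleaner retention setting
he refers to):

```python
# Step 1 (hypothetical): overwrite each target partition with a single
# placeholder record, truncating the partition to one record.
step1_options = {
    "hoodie.datasource.write.operation": "insert_overwrite",
    "hoodie.datasource.write.partitionpath.field": "date,hour",  # placeholder
}

# Step 2 (hypothetical): hard-delete that placeholder record, keeping
# cleaner retention at 1 so the old files are actually cleaned up.
step2_options = {
    "hoodie.datasource.write.operation": "delete",
    "hoodie.cleaner.commits.retained": "1",
}

# With a SparkSession, each step would be a normal Hudi write, e.g.:
# placeholder_df.write.format("hudi").options(**step1_options).mode("append").save(base_path)
# placeholder_df.write.format("hudi").options(**step2_options).mode("append").save(base_path)
```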

On Wed, Oct 21, 2020 at 6:19 PM Balaji Varadarajan
<v.bal...@ymail.com.invalid> wrote:

>  cc Satish who implemented Insert Overwrite support.
> We have recently landed Insert Overwrite support in Hudi. Partition-level
> deletion is a logical extension of this feature but is not yet available.
> I have added a jira to track this:
> https://issues.apache.org/jira/browse/HUDI-1350
> Meanwhile, using master branch, you can do this in 2 steps. You can
> generate a record for each partition you want to delete and commit the
> batch. This would essentially truncate the partition to 1 record. You can
> then issue a hard delete on that record.  By keeping cleaner retention to
> 1, you can essentially cleanup the files in the directory. Satish - Can you
> chime in and see if this makes sense and if you are seeing any issues with
> this ?
> Thanks,
> Balaji.V
>     On Tuesday, October 20, 2020, 11:31:45 PM PDT, selvaraj periyasamy <
> selvaraj.periyasamy1...@gmail.com> wrote:
>
>  Team ,
>
> I have a COW table which has sub-partition columns
> Date/Hour. For some use cases, I need to completely remove a few
> partitions (removing a few hours alone). Hudi maintains metadata info.
> Manually removing the folders, as well as the entries in the Hive
> metastore, may mess up Hudi metadata. What is the best way to do this?
>
>
> Thanks,
> Selva
>
