[ 
https://issues.apache.org/jira/browse/HUDI-1127?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alexey Kudinkin updated HUDI-1127:
----------------------------------
    Sprint: Hudi-Sprint-Jan-24

> Handling late arriving Deletes
> ------------------------------
>
>                 Key: HUDI-1127
>                 URL: https://issues.apache.org/jira/browse/HUDI-1127
>             Project: Apache Hudi
>          Issue Type: Improvement
>          Components: deltastreamer, writer-core
>    Affects Versions: 0.9.0
>            Reporter: Bhavani Sudha
>            Assignee: Alexey Kudinkin
>            Priority: Blocker
>              Labels: sev:high
>             Fix For: 0.11.0
>
>
> Recently I was working on a [PR|https://github.com/apache/hudi/pull/1704] to 
> enhance the OverwriteWithLatestAvroPayload class to consider records in 
> storage when merging. Briefly, this class will ignore older updates if the 
> record in storage is the latest one (based on the precombine field). 
> Based on this, the expectation is that every write operation is handled the 
> same way: if it is older than the record in storage, it should be ignored. 
> While working on this, I found that we cannot handle all deletes the same 
> way. This is because we process deletes in two main ways:
>  * by adding a metadata field `_hoodie_is_deleted` to the original record, 
> enabling it (setting it to true), and sending the record as an UPSERT 
> operation.
>  * by using an empty payload (EmptyHoodieRecordPayload) and sending the write 
> as a DELETE operation. 
> While the former carries an ordering field and can be processed as expected 
> (older deletes are ignored), the latter has no ordering field to identify 
> whether it is an older delete, and hence an older delete is allowed to go 
> through.
> Just opening this issue to track this gap. We need to identify the right 
> choice here and fix as needed.
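
The gap described above can be illustrated with a minimal sketch. This is not Hudi code: `Record`, `merge`, and `apply_write` are hypothetical names used only for illustration, the `is_deleted` flag stands in for `_hoodie_is_deleted`, and an ordering value of `None` stands in for an empty-payload DELETE. Letting an incoming write win on an equal ordering value is also an assumption.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class Record:
    key: str
    ordering: Optional[int]   # precombine value; None models an empty-payload DELETE
    is_deleted: bool = False  # stands in for the `_hoodie_is_deleted` field


def merge(stored: Optional[Record], incoming: Record) -> Optional[Record]:
    """Return the record to keep in storage, or None if the key is removed."""
    if incoming.ordering is None:
        # Empty-payload DELETE: there is nothing to compare against the
        # stored record, so the delete always goes through, even if it is late.
        return None
    if (stored is not None and stored.ordering is not None
            and incoming.ordering < stored.ordering):
        # Older write (update or soft delete): ignore it, keep what is stored.
        return stored
    # Newer (or equal, by assumption) write: a soft delete removes the key,
    # anything else overwrites the stored record.
    return None if incoming.is_deleted else incoming


def apply_write(table: Dict[str, Record], incoming: Record) -> None:
    """Apply one incoming write to an in-memory stand-in for the table."""
    result = merge(table.get(incoming.key), incoming)
    if result is None:
        table.pop(incoming.key, None)
    else:
        table[incoming.key] = result
```

In this sketch, a late soft delete (ordering 5 against a stored ordering of 10) leaves the stored record in place, while a late empty-payload delete for the same key removes it, which is the inconsistency this issue tracks.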



--
This message was sent by Atlassian Jira
(v8.20.1#820001)
