[ https://issues.apache.org/jira/browse/HUDI-1127?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Alexey Kudinkin updated HUDI-1127:
----------------------------------
    Sprint: Hudi-Sprint-Jan-24

> Handling late arriving Deletes
> ------------------------------
>
>                 Key: HUDI-1127
>                 URL: https://issues.apache.org/jira/browse/HUDI-1127
>             Project: Apache Hudi
>          Issue Type: Improvement
>          Components: deltastreamer, writer-core
>    Affects Versions: 0.9.0
>            Reporter: Bhavani Sudha
>            Assignee: Alexey Kudinkin
>            Priority: Blocker
>              Labels: sev:high
>             Fix For: 0.11.0
>
>
> Recently I was working on a [PR|https://github.com/apache/hudi/pull/1704] to
> enhance the OverwriteWithLatestAvroPayload class to consider records in
> storage when merging. Briefly, this class will ignore older updates if the
> record in storage is the latest one (based on the precombine field).
> Based on this, the expectation is that every write operation is handled the
> same way: if it is older than the record in storage, it should be ignored.
> While working on this, I identified that we cannot handle all deletes the
> same way. This is because we process deletes in two main ways:
> * by adding the metadata field `_hoodie_is_deleted` to the original record,
> setting it to true, and sending the record as an UPSERT operation.
> * by using an empty payload (EmptyHoodieRecordPayload) and sending the write
> as a DELETE operation.
> While the former has an ordering field and can be processed as expected
> (older deletes will be ignored), the latter does not have any ordering field
> to identify whether it is an older delete, and hence lets the older delete
> go through.
> Just opening this issue to track this gap. We would need to identify the
> right choice here and fix as needed.
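
The gap described in the issue can be illustrated with a minimal sketch. This is a toy model, not Hudi's API: `Record`, `merge`, and the integer `ordering` value are hypothetical names introduced here, and the strict "older than" comparison is an assumption. It only shows why a delete that carries an ordering value can be rejected as late, while a delete with an empty payload cannot.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Record:
    """Hypothetical stand-in for a record payload (not a Hudi class)."""
    key: str
    # Ordering (precombine) value; None models an empty payload that carries none.
    ordering: Optional[int]
    # Models the `_hoodie_is_deleted` flag, or a DELETE operation.
    deleted: bool = False


def merge(stored: Optional[Record], incoming: Record) -> Optional[Record]:
    """Return the record to keep in storage, or None if the key ends up deleted."""
    # An incoming write can only be recognized as late if it has an ordering value.
    if stored is not None and incoming.ordering is not None:
        if incoming.ordering < stored.ordering:
            return stored  # older write (update or delete) is ignored
    return None if incoming.deleted else incoming


stored = Record("k1", ordering=10)

# Delete sent as an UPSERT with the delete flag set: it has an ordering value.
late_soft_delete = Record("k1", ordering=5, deleted=True)
# Delete sent with an empty payload: there is nothing to compare against.
late_empty_delete = Record("k1", ordering=None, deleted=True)

soft_result = merge(stored, late_soft_delete)    # stored record survives
empty_result = merge(stored, late_empty_delete)  # stored record is removed
```

In this model the late flagged delete is ignored and the stored record is kept, whereas the late empty-payload delete removes a newer record, which is the behavior the issue tracks.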