hudi-bot opened a new issue, #16643: URL: https://github.com/apache/hudi/issues/16643
As of now, when an ingestion writer conflicts with a pending clustering, the ingestion write is aborted while the clustering eventually succeeds. But in many cases the ingestion writer has stronger SLAs and may want to make progress over clustering, since clustering is considered more of an optimization. So it would be good to add this support to Hudi.

## JIRA info
- Link: https://issues.apache.org/jira/browse/HUDI-8291
- Type: Improvement
- Fix version(s): 1.1.0

---

## Comments

02/Oct/24 22:53, shivnarayan:

h2. Design

h3. Design considerations

We have two approaches to take: either we introduce a new state to the timeline, or we avoid making any changes to the timeline but take a hit on ensuring that all readers of pending clustering plans account for concurrent deletion of the plan, or deserialize it eagerly within a lock, which might bring in some complexity. We dive into the specifics below.

h3. Design Option 1: Introducing an "abort" state to actions in the timeline

As of now, an action typically starts in the "requested" state, then moves to "inflight", and eventually goes to "complete". If not, a rollback is triggered, in which the commit files are deleted from the timeline. We are introducing a new state called "abort". With this, an action can move through either:

"requested" ➝ "inflight" ➝ "complete"
"requested" ➝ "inflight" ➝ "abort"

"Abort" means that the action was aborted or cancelled. Cancellation requests for cancellable clustering plans will be added by any ingestion writer to the ".hoodie/cancel_requests/" folder within ".hoodie". Format: {clustering_instant}_{writer_commit_time}.cancel

h3. Writer

Whenever the writer detects a conflict with a clustering and wants to proceed ahead, it will check whether there are any pending requests for cancellation.
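As a rough illustration of this writer-side check, here is a minimal sketch. This is not Hudi code: the class and enum names are hypothetical, and in-memory sets stand in for the cancel-request folder and the timeline.

```java
import java.util.HashSet;
import java.util.Set;
import java.util.concurrent.locks.ReentrantLock;

// Hypothetical sketch of the ingestion writer's cancellation request,
// guarded by a lock as described in the design notes above.
public class CancellationRequester {
  public enum Outcome { REQUESTED_CANCEL, ALREADY_REQUESTED, CLUSTERING_COMPLETE }

  private final ReentrantLock lock = new ReentrantLock();
  // Stand-in for the contents of the cancel-requests folder under ".hoodie".
  private final Set<String> cancelRequests = new HashSet<>();
  // Stand-in for completed clustering instants on the timeline.
  private final Set<String> completedClustering = new HashSet<>();

  // Called by an ingestion writer that conflicts with clusteringInstant
  // and wants to proceed ahead of it.
  public Outcome requestCancellation(String clusteringInstant, String writerCommitTime) {
    lock.lock();
    try {
      if (completedClustering.contains(clusteringInstant)) {
        return Outcome.CLUSTERING_COMPLETE; // too late: clustering already finished
      }
      boolean alreadyRequested = cancelRequests.stream()
          .anyMatch(r -> r.startsWith(clusteringInstant + "_"));
      if (alreadyRequested) {
        return Outcome.ALREADY_REQUESTED; // another writer already asked; safe to proceed
      }
      // File-name format from the design: {clustering_instant}_{writer_commit_time}.cancel
      cancelRequests.add(clusteringInstant + "_" + writerCommitTime + ".cancel");
      return Outcome.REQUESTED_CANCEL;
    } finally {
      lock.unlock();
    }
  }

  public void markClusteringComplete(String clusteringInstant) {
    completedClustering.add(clusteringInstant);
  }
}
```

A real implementation would list files under the cancel-requests folder and refresh the timeline inside the table lock rather than consult in-memory sets.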
{code:java}
acquire lock
if there are no pending cancellation requests and clustering is not yet complete:
    add a cancellation request to ".hoodie/clustering_cancel_requests/"
else if there is already a pending cancellation request:
    return
else if clustering is already complete:
    return
release lock
{code}

In all cases, the writer assumes that it can proceed without any hiccups. Todo: if clustering completes, should the writer bail out?

Eventually, during the conflict resolution step:

{code:java}
if there is a conflict with an ongoing clustering:
    if there is a cancellation request, continue and complete the commit
    else if the clustering moved to the abort state, continue and complete the commit
    else if the clustering is completed, or in progress without any request for
        cancellation, the current writer should abort itself
{code}

h4. Clustering execution runnable

Just before wrapping up, the clustering does the following (we could optimize this down the line to do the check eagerly at different points in time; for now, let's say we do it at the end):

{code:java}
take a lock
check for any cancellation request
if yes, roll back the clustering and move its state to abort
else, continue as usual and do conflict resolution as usual
{code}

The conflict resolution step should not change for the clustering writer, because if the ingestion writer does not want to cancel the clustering, there could otherwise be data loss.

h4. Consideration

Instead of adding the cancel requests to an ad hoc folder in ".hoodie", we could also introduce a new action sequence in the timeline:

t5.rc.requested
t5.rc.inflight
t5.rc.abort.requested
t5.rc.abort

Up until now, for the past 8+ years, we were able to confine ourselves to just 3 states (requested, inflight, complete), whereas this introduces a 4th state. So we need to think hard about whether we want to go this route. The above is just an alternative way of conveying the cancellation request to the clustering worker from another ingestion writer; irrespective of it, the final abort state is required.
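To make the proposed lifecycle concrete, here is a minimal sketch of the four-state transition model described above. It is illustrative only; `InstantLifecycle` is a hypothetical name, not a Hudi class.

```java
import java.util.EnumSet;
import java.util.Map;
import java.util.Set;

// Hypothetical model of the proposed instant lifecycle, adding ABORTED
// as a second terminal state alongside COMPLETED.
public class InstantLifecycle {
  public enum State { REQUESTED, INFLIGHT, COMPLETED, ABORTED }

  // Legal transitions: requested -> inflight -> (completed | aborted).
  private static final Map<State, Set<State>> LEGAL = Map.of(
      State.REQUESTED, EnumSet.of(State.INFLIGHT),
      State.INFLIGHT, EnumSet.of(State.COMPLETED, State.ABORTED),
      State.COMPLETED, EnumSet.noneOf(State.class),
      State.ABORTED, EnumSet.noneOf(State.class));

  public static boolean canTransition(State from, State to) {
    return LEGAL.get(from).contains(to);
  }
}
```

Whether the abort transition also needs an explicit intermediate step, as in the t5.rc.abort.requested alternative above, is left open by the design.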
h4. Fixes to other entities accessing pending clustering

Let's try to understand the usages of pending clustering:

1. Delete partition: when planning a delete partition, ignores file groups to be replaced that already have a pending clustering.
2. Future clustering planning will ignore file groups from pending clustering plans.
2.a. Consistent bucket index: will ignore file groups that are part of an already pending clustering.
3. Future compaction planning will ignore file groups from pending clustering plans.
4. CommitActionExecutor: updateStrategy.handleUpdate() resolves conflicts with pending clustering for incoming writes. We should be able to deprecate this once we have concurrent update support with pending clustering.
5. Upsert partitioner: to avoid directing small-file handling to file groups from pending clustering.
6. Presumably, with the consistent bucket index, incoming records are added to both the replaced and the new file groups of a pending clustering, so the pending clustering plan is required there.

In most cases, callers want to ignore file groups from a pending clustering, so if the plan is eventually deleted, it should not be an issue. Delete partition and the consistent bucket index might need some fixes to redo or re-plan if they detect that a pending clustering was eventually deleted or is up for cancellation.

h4. Example illustration of the above design

Let's walk through a simple scenario and see how it might pan out:

t5.rc.requested
t5.rc.inflight
crashed; on restart, we detect that there is a cancellation request and hence trigger a rollback:
t6.rb.requested
t6.rb.inflight
t6.rb.completed

Note: t6 fully rolls back t5's previous attempt, and then proceeds to move the action to abort:

t5.rc.requested
t5.rc.inflight (empty file)
t5.rc.aborted (empty)

h2. Design choice 2

All we are trying to achieve with approach 1 is to 1) inform the clustering worker that a particular clustering has to be cancelled.
And 2) the abort state is introduced so that no other concurrent reader runs into a deserialization issue; without that concern, we could simply delete the clustering plan. In this design, we take a stab at avoiding both.

1) should be doable by adding some additional metadata to commits. Every writer is given a priority; for simplicity, say the ingestion writer gets priority 1 and clustering gets priority 2. The ingestion writer will not worry about any conflicts with a pending clustering and will proceed to complete its commits. The clustering worker, on the other hand, will find all conflicting commits during the conflict resolution step and look for higher-priority writers (via commit metadata); based on that, it will cancel itself.

2) is what might be tricky. Before we go further, let's try to understand the usages of pending clustering; please refer to the section above where I have compiled all users/callers of pending clustering. It is going to be tough to fix all callers so that they account for the scenario where a pending clustering plan could be deleted or revoked eventually. At a bare minimum, we should account for the scenario where the timeline contains a pending replace commit, but by the time a caller is about to deserialize the replace commit metadata, it has been deleted or rolled back by the concurrent clustering worker. If we really want to achieve this, every caller needs to take a lock, refresh the timeline, eagerly deserialize the replace commit metadata, and then release the lock, so that while the caller holds the lock, deserializing the replace commit cannot fail. But after releasing the lock, a concurrent clustering worker could still delete the replace commit, and the caller would have to account for that.

h2. Conclusion

I feel approach 1 is elegant and does not involve a lot of complexity, but it does introduce a new state to the timeline. Will brainstorm with the team further.

---

08/Oct/24 01:03, shivnarayan:

Here is the RFC with
the first approach: https://github.com/apache/hudi/pull/11555/files?short_path=0de0b99#diff-0de0b9940a382f28dbf83c9007047ea84e0a61c553dfb6b55b279a327cc7f159
