hudi-bot opened a new issue, #16643:
URL: https://github.com/apache/hudi/issues/16643

   As of now, when an ingestion writer conflicts with a pending clustering, the 
ingestion write is aborted while the clustering eventually succeeds. But in many 
cases the ingestion writer has stronger SLAs and should be able to make progress 
over clustering, since clustering is considered more of an optimization.
   
   So, it would be good to add this support to Hudi.
   
   ## JIRA info
   
   - Link: https://issues.apache.org/jira/browse/HUDI-8291
   - Type: Improvement
   - Fix version(s):
     - 1.1.0
   
   
   ---
   
   
   ## Comments
   
   02/Oct/24 22:53, shivnarayan:
   
   h2. Design
   h3. Design considerations:
   We have two approaches to take: either we introduce a new state to the 
timeline, or we avoid making any changes to the timeline but take a hit on 
ensuring that all readers of pending clustering plans account for concurrent 
deletion of the plan (or deserialize it eagerly within the lock), which might 
bring in some complexity.
    
   We can dive into the specifics below.
   h3. Design Option 1: *Introducing an "abort" state to actions in the timeline*
   As of now, an action typically starts in the "requested" state, then moves 
to "inflight", and eventually goes to "complete". If not, a rollback is 
triggered in which the commit files are deleted from the timeline. We are 
introducing a new state called "abort". With this, an action can move from:
   
                               ➝ "complete"
   "requested" ➝ "inflight"
                               ➝ "abort"
    
   "Abort" means that the action was aborted or cancelled.
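   The proposed lifecycle can be sketched as a small state machine. The enum and transition method below are purely illustrative, not Hudi's actual timeline API:

```java
// Illustrative sketch of the proposed four-state lifecycle; the enum name and
// transition method are hypothetical, not Hudi's actual timeline API.
enum InstantState {
    REQUESTED, INFLIGHT, COMPLETE, ABORT;

    // An inflight action can now terminate in either "complete" or "abort".
    boolean canTransitionTo(InstantState next) {
        switch (this) {
            case REQUESTED: return next == INFLIGHT;
            case INFLIGHT:  return next == COMPLETE || next == ABORT;
            default:        return false; // "complete" and "abort" are terminal
        }
    }
}
```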
    
   Cancellation requests for cancellable clustering plans will be added by any 
ingestion writer to a ".hoodie/cancel_requests/" folder within ".hoodie".
    
   format:
   {clustering_instant}_{writer_commit_time}.cancel
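   As a quick sketch of the naming scheme above, a hypothetical helper (not part of Hudi) might build and parse these file names like so:

```java
// Hypothetical helper for the proposed cancel-request file name,
// "{clustering_instant}_{writer_commit_time}.cancel"; not part of Hudi.
class CancelRequestFile {
    static String name(String clusteringInstant, String writerCommitTime) {
        return clusteringInstant + "_" + writerCommitTime + ".cancel";
    }

    // The clustering instant is everything before the first underscore.
    static String clusteringInstant(String fileName) {
        return fileName.substring(0, fileName.indexOf('_'));
    }
}
```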
   h3. Writer:
   Whenever the writer detects a conflict w/ clustering and wants to proceed 
ahead, it will check whether there are any pending cancellation requests.
    
   {code:java}
   if there is no cancellation request yet {
       acquire lock
       if there is no pending cancellation request and clustering is not yet complete:
           add a cancellation request to ".hoodie/clustering_cancel_requests/"
       else if there is already a pending cancellation request:
           return
       else if clustering is already complete:
           return
       release lock
   }
   {code}
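   The writer-side steps could look roughly like the following sketch; `TimelineView`, the lock, and all method names are stand-ins invented here for illustration, not Hudi internals:

```java
import java.util.HashSet;
import java.util.Set;
import java.util.concurrent.locks.ReentrantLock;

// Hedged sketch of the writer-side cancellation request; TimelineView, the
// lock, and all names here are stand-ins invented for illustration.
class CancellationRequester {
    // Simplified stand-in for the timeline state visible under the lock.
    static class TimelineView {
        Set<String> completedClustering = new HashSet<>();
        Set<String> pendingCancellations = new HashSet<>();
    }

    final ReentrantLock tableLock = new ReentrantLock();
    final TimelineView timeline = new TimelineView();

    // Returns true if a cancellation request is in place after the call
    // (idempotent), false if the clustering already completed.
    boolean requestCancellation(String clusteringInstant) {
        tableLock.lock();
        try {
            if (timeline.completedClustering.contains(clusteringInstant)) {
                return false; // too late: clustering already completed
            }
            // Models dropping a file into ".hoodie/clustering_cancel_requests/".
            timeline.pendingCancellations.add(clusteringInstant);
            return true;
        } finally {
            tableLock.unlock();
        }
    }
}
```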
    
    
   In all cases, the writer assumes that it can proceed w/o any hiccups.
   Todo:
   If clustering completes, should the writer bail out?
    
   Eventually, during the conflict resolution step:
   {code:java}
   if there is a conflict w/ an on-going clustering:
       if there is a cancellation request:
           continue and complete the commit
       else if the clustering moved to the abort state:
           continue and complete the commit
       else if clustering is completed, or in progress w/o any cancellation request:
           the current writer should abort itself
   {code}
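   The decision table above can be condensed into one predicate; the enum and method here are hypothetical, not Hudi's actual conflict-resolution API:

```java
// Illustrative condensation of the conflict-resolution decision above; the
// enum and method are hypothetical, not Hudi's actual resolver API.
class WriterConflictResolver {
    enum ClusteringStatus { INFLIGHT, COMPLETED, ABORTED }

    // Returns true if the ingestion writer may complete its commit.
    static boolean canProceed(ClusteringStatus status, boolean cancellationRequested) {
        if (status == ClusteringStatus.ABORTED) {
            return true;  // clustering already gave way
        }
        if (status == ClusteringStatus.INFLIGHT && cancellationRequested) {
            return true;  // clustering will cancel itself before committing
        }
        // Completed, or in progress with no cancellation request: writer aborts.
        return false;
    }
}
```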
    
   h4. Clustering execution runnable:
   Just before wrapping up the clustering (we could optimize this down the 
line to do the check eagerly at different points in time; for now, let's say we 
do this check at the end):
    
   {code:java}
   take the lock
   check for any cancellation request
   if yes:
       roll back the clustering
       and move the instant to the abort state
   else:
       continue as usual, including conflict resolution as usual. Because if
       the writer does not want to cancel the clustering, there could be data
       loss; hence the conflict resolution step should not change for the
       clustering writer.
   {code}
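   A minimal sketch of that final check, assuming the pending cancellation requests have already been listed from ".hoodie/clustering_cancel_requests/" (the class, enum, and method names are illustrative, not Hudi's executor API):

```java
import java.util.Set;

// Illustrative sketch of the clustering worker's pre-commit check; the class,
// enum, and method names are hypothetical, not Hudi's actual executor API.
class ClusteringFinalizer {
    enum Outcome { ABORTED, COMMITTED }

    // Called under the table lock just before the clustering commit.
    // pendingCancellations models a listing of ".hoodie/clustering_cancel_requests/".
    static Outcome preCommitCheck(String clusteringInstant, Set<String> pendingCancellations) {
        if (pendingCancellations.contains(clusteringInstant)) {
            // Roll back the clustering's data files, then move the instant
            // to the abort state.
            return Outcome.ABORTED;
        }
        // No cancellation: run the usual conflict resolution and commit.
        return Outcome.COMMITTED;
    }
}
```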
    
   h4. Consideration:
   Instead of adding the cancel requests to an ad-hoc folder in ".hoodie", we 
could also introduce a new action in the timeline.
    
   t5.rc.requested
   t5.rc.inflight
   t5.rc.abort.requested
   t5.rc.abort
    
   Up until now, for the past 8+ years, we have been able to confine ourselves 
to just 3 states (requested, inflight, complete). But this introduces a 4th 
state, so we need to think hard about whether we want to go this route.
   The above is just an alternative way of conveying the cancellation request 
to the clustering worker by another ingestion writer. Irrespective of this, the 
final abort state is required.
    
   h4. Fixes to other entities accessing pending clustering.
    
   Let's try to understand the usages of pending clustering:
   1. Delete partition: when planning a delete partition, file groups that 
already have a pending clustering are ignored for replacement.
   2. Future clustering planning will ignore file groups from pending 
clustering plans.
   2.a. Consistent bucket index: will ignore file groups that are part of an 
already pending clustering.
   3. Future compaction planning will ignore file groups from pending 
clustering plans.
   4. CommitActionExecutor: updateStrategy.handleUpdate() resolves conflicts 
with pending clustering for incoming writes. We should be able to deprecate 
this once we have concurrent-update support with pending clustering.
   5. Upsert partitioner: to avoid adding small-file handling to file groups 
from pending clustering.
   6. (Guess) W/ the consistent bucket index, incoming records will be added 
to both the replaced and new file groups from pending clustering, and hence 
pending clustering is required there.
    
   In most cases, callers want to ignore file groups from a pending 
clustering. So, if the plan is eventually deleted, that should not be an issue.
   Delete partition and the consistent bucket index might need some fixes to 
redo or re-plan themselves if they detect that a pending clustering was 
eventually deleted or is up for cancellation.
    
   h4. Example illustration of the above design
   Let's walk through a simple scenario and see how it might pan out.
    
   t5.rc.requested
   t5.rc.inflight
   crashed.
    
   restart:
   we detect that there is a cancellation request and hence trigger a rollback.
    
   t6.rb.requested
   t6.rb.inflight
   t6.rb.complete
   Note: t6 fully rolls back t5's previous attempt.
    
   And then we proceed to move the action to abort:
    
   t5.rc.requested
   t5.rc.inflight (empty file)
   t5.rc.aborted (empty)
    
    
   h2. Design choice 2:
   All we are trying to achieve w/ approach 1 is to 1) inform the clustering 
worker that a particular clustering has to be cancelled, and 2) introduce the 
abort state so that no other concurrent reader runs into deserialization 
issues; without it, we would have to delete the clustering plan outright. In 
this design, we take a stab at avoiding both.
    
   1) should be doable by adding some additional metadata to commits. Every 
writer is given a priority; for simplicity, say the ingestion writer gets 
priority 1 and clustering gets priority 2. The ingestion writer will not worry 
about any conflicts w.r.t. a pending clustering and will proceed to complete 
its commits.
   On the other hand, the clustering worker, during the conflict resolution 
step, will find all conflicting commits and look for higher-priority writers 
(via commit metadata). Based on that, it will cancel itself.
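   The priority rule could be as simple as the sketch below; the constants and method are illustrative assumptions, not an existing Hudi API:

```java
// Illustrative sketch of priority-based resolution in design 2: a lower number
// means higher priority; constants and names are hypothetical.
class PriorityResolver {
    static final int INGESTION_PRIORITY = 1;
    static final int CLUSTERING_PRIORITY = 2;

    // The clustering worker cancels itself when any conflicting commit was
    // written by a higher-priority (lower-numbered) writer.
    static boolean shouldCancelClustering(int conflictingWriterPriority) {
        return conflictingWriterPriority < CLUSTERING_PRIORITY;
    }
}
```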
    
   2) is what might be tricky.
   Before we go further, let's try to understand the usages of pending 
clustering.
   Please refer to the section above where I have compiled all users/callers 
of pending clustering.
    
   So, it's going to be tough to fix all callers so that they account for the 
scenario that a pending clustering could eventually be deleted or revoked.
    
   At the bare minimum, we should account for the scenario where the timeline 
contains a pending replace commit, but when a caller is about to deserialize 
the replace commit metadata, it has been deleted or rolled back by the 
concurrent clustering worker.
    
   If we really want to achieve this, every caller needs to take the lock, 
refresh the timeline, eagerly deserialize the replace commit metadata, and 
release the lock, so that while the caller is within the lock, deserializing 
the replace commit cannot fail. But after releasing the lock, a concurrent 
clustering worker could still delete the replace commit, and the caller will 
have to account for that.
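   The lock / refresh / eager-deserialize pattern could be sketched as follows; the `Supplier` stands in for reloading and deserializing the replace commit metadata, and everything here is illustrative rather than Hudi's actual reader code:

```java
import java.util.Optional;
import java.util.concurrent.locks.ReentrantLock;
import java.util.function.Supplier;

// Sketch of the lock / refresh / eager-deserialize pattern; the Supplier
// stands in for reloading and deserializing the replace-commit metadata,
// which in real Hudi could fail if the plan was deleted concurrently.
class PendingClusteringReader {
    final ReentrantLock tableLock = new ReentrantLock();

    <T> Optional<T> readPlanEagerly(Supplier<T> refreshAndDeserialize) {
        tableLock.lock();
        try {
            // Inside the lock, the plan cannot be deleted from under us...
            return Optional.ofNullable(refreshAndDeserialize.get());
        } finally {
            tableLock.unlock();
            // ...but once released, callers must tolerate the plan vanishing.
        }
    }
}
```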
    
   h2. Conclusion:
   I feel approach 1 is elegant and does not involve a lot of complexity, but 
it does introduce a new state to the timeline. Will brainstorm w/ the team 
further.
   
   ---
   
   08/Oct/24 01:03, shivnarayan:
   
   Here is the RFC with the first approach: 
[https://github.com/apache/hudi/pull/11555/files?short_path=0de0b99#diff-0de0b9940a382f28dbf83c9007047ea84e0a61c553dfb6b55b279a327cc7f159]


-- 
This is an automated message from the Apache Git Service.