voonhous commented on PR #8673:
URL: https://github.com/apache/hudi/pull/8673#issuecomment-1878361789

   Adding some illustration for future reference:
   
   # Normal Flow
   
   
![image](https://github.com/apache/hudi/assets/6312314/da36bc4f-4410-413e-b2ad-fdc995cf560e)
   
   As can be seen in the normal flow (between the JM and TM), a cycle is 
depicted by the green box.
   
   
   # TM global failover
   
   
![image](https://github.com/apache/hudi/assets/6312314/22819d88-2bea-4929-a60c-b955a4c4506e)
   
   When there are TM restarts, global_failover will happen and 
`restored_write_meta` events (which will be detected by the JM as a bootstrap 
event) will be sent to the JM. Once JM has collected all the 
`restored_write_meta` events, a normal cycle will be invoked again. 
   
   This is done by rolling back the commit that is already on the timeline (and 
therefore, all its corresponding files that might have been written to the 
filesystem (via markers). After a rollback is done, a new commit is created and 
the normal flow as depicted in the section above is hence restored.
   
   This case is handled and the job can recover properly for such failures
   
   
   # JM blocked while TM global failover is occuring
   
   Note: this case is not handled as of now.
   
   
![image](https://github.com/apache/hudi/assets/6312314/b0d7c74c-1ff5-4569-9432-151d83d06935)
   
   
   As can be seen here, if the JM is blocked by a rollback/archive. In this 
example, an archive, and if checkpoint timeouts, causing a global failover, TMs 
will restart. 
   
   As discussed previously, TM will send `restored_write_meta` event to the JM. 
Since JM's executor is being blocked at this point in time (performing an 
archive) invoked from `#notifyCheckpointComplete`, it will not be able to 
handle events from operator.
   
   Once archive is done, it will be unblocked and a new instant is hence 
created by the JM (following the normal execution flow). For illustration, we 
will name this instant `instant_x`.
   
   After JM's executor exits from `#notifyCheckpointComplete`. At this point, 
TM will see the new `instant_x` and start peforming the writes.  
   
   JM will start handling the operator events and will perform a 
`#startInstant`, rolling back the previous `instant_x` that was just created 
and create a new instant, `instant_y`. TM is unaware of this and will continue 
with the previous fetched `instant_x`. 
   
   Desync happens and the writer will be perform abnormally until the instant 
between JM and TM lines up.
   
   Once desync happens, alot of unhandled cases can happen, depending on the 
state of the a TM when rollback of `instant_x` is happening at the JM. For 
brevity, we will not discuss them as they should not be happening in the first 
place.
   
   # TLDR
   
   Blocked JM might will cause desync between JM and TM.
   
   Unable to reproduce the data-loss, but am able to reproduce the precursor 
that may be causing the data-loss.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

Reply via email to