voonhous commented on PR #8673: URL: https://github.com/apache/hudi/pull/8673#issuecomment-1878361789
Adding some illustrations for future reference:

# Normal Flow

![image](https://github.com/apache/hudi/assets/6312314/da36bc4f-4410-413e-b2ad-fdc995cf560e)

As can be seen in the normal flow (between the JM and TM), one cycle is depicted by the green box.

# TM global failover

![image](https://github.com/apache/hudi/assets/6312314/22819d88-2bea-4929-a60c-b955a4c4506e)

When TMs restart, a global failover happens and `restored_write_meta` events (which the JM detects as bootstrap events) are sent to the JM. Once the JM has collected all the `restored_write_meta` events, a normal cycle is invoked again. This is done by rolling back the commit that is already on the timeline, and with it all the corresponding files that might have been written to the filesystem (located via markers). After the rollback is done, a new commit is created and the normal flow depicted in the section above is restored.

This case is handled, and the job can recover properly from such failures.

# JM blocked while TM global failover is occurring

Note: this case is not handled as of now.

![image](https://github.com/apache/hudi/assets/6312314/b0d7c74c-1ff5-4569-9432-151d83d06935)

As can be seen here, the JM can be blocked by a rollback/archive (in this example, an archive). If a checkpoint times out while the JM is blocked, causing a global failover, the TMs will restart. As discussed previously, the TMs will send `restored_write_meta` events to the JM.

Since the JM's executor is blocked at this point in time (performing an archive invoked from `#notifyCheckpointComplete`), it will not be able to handle events from the operators.

Once the archive is done, the executor is unblocked and a new instant is created by the JM (following the normal execution flow). For illustration, we will name this instant `instant_x`.

Once the JM's executor exits from `#notifyCheckpointComplete`, the TMs will see the new `instant_x` and start performing the writes.
The JM will then start handling the pending operator events and will perform a `#startInstant`, rolling back the `instant_x` that was just created and creating a new instant, `instant_y`. The TMs are unaware of this and will continue with the previously fetched `instant_x`. A desync happens, and the writer will behave abnormally until the instants on the JM and the TMs line up again.

Once a desync happens, a lot of unhandled cases can occur, depending on the state of a TM when the rollback of `instant_x` is happening at the JM. For brevity, we will not discuss them, as they should not be happening in the first place.

# TLDR

A blocked JM may cause a desync between the JM and the TMs. I was unable to reproduce the data loss, but I am able to reproduce the precursor that may be causing it.
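
For anyone who wants to reason about the ordering without the diagrams, here is a minimal toy model of the blocked-JM case. It is only a sketch under stated assumptions: `DesyncSketch`, `Coordinator`, `Writer`, `runNext` and `fetchInstant` are hypothetical names, not Hudi or Flink APIs. The JM's single-threaded executor is modelled as a plain FIFO queue, and the collection of all `restored_write_meta` events is collapsed into a single queued task.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.List;
import java.util.Queue;

/** Toy model of the JM/TM desync. All names here are hypothetical, not Hudi or Flink APIs. */
public class DesyncSketch {

    /** Stand-in for the JM side: tasks run strictly one at a time, in submission order. */
    static class Coordinator {
        private final Queue<Runnable> executor = new ArrayDeque<>();
        final List<String> timeline = new ArrayList<>();
        String currentInstant;

        /** Checkpoint completion: a long archive, then a new instant (normal flow). */
        void notifyCheckpointComplete() {
            executor.add(() -> {
                timeline.add("archive done");
                startInstant("instant_x");
            });
        }

        /** Bootstrap events from restarted TMs; they queue up behind the archive task. */
        void handleBootstrapEvents() {
            executor.add(() -> {
                timeline.add("rollback " + currentInstant);
                startInstant("instant_y");
            });
        }

        private void startInstant(String instant) {
            currentInstant = instant;
            timeline.add("start " + instant);
        }

        /** Runs exactly one queued task, like one turn of a single-threaded executor. */
        void runNext() {
            executor.remove().run();
        }
    }

    /** Stand-in for a TM-side writer that caches the instant it last fetched. */
    static class Writer {
        String instant;

        void fetchInstant(Coordinator jm) {
            instant = jm.currentInstant;
        }
    }

    /** Returns {instant on the JM, instant held by the TM} after the interleaving. */
    static String[] simulate() {
        Coordinator jm = new Coordinator();
        Writer tm = new Writer();

        jm.notifyCheckpointComplete(); // executor becomes busy with the archive
        jm.handleBootstrapEvents();    // global failover: events arrive but must wait

        jm.runNext();                  // archive finishes, instant_x is created
        tm.fetchInstant(jm);           // TM sees instant_x and starts writing
        jm.runNext();                  // queued events: roll back instant_x, start instant_y

        return new String[] {jm.currentInstant, tm.instant};
    }

    public static void main(String[] args) {
        String[] result = simulate();
        System.out.println("JM=" + result[0] + " TM=" + result[1]
                + " desync=" + !result[0].equals(result[1]));
    }
}
```

In this model the TM ends up holding `instant_x` while the JM has already moved on to `instant_y`. The sketch only captures the task ordering; it says nothing about what the real writer does with the stale instant afterwards.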