Re: [D] RocksDB as The Replica of MDT/RLI [hudi]

via GitHub Thu, 19 Mar 2026 05:41:52 -0700


GitHub user danny0405 edited a comment on the discussion: RocksDB as The 
Replica of MDT/RLI


> I don't like how we are coupling index choices and concurrency models.

Yeah, the simple bucket index is currently required to implement NBCC, and we 
may need a more flexible and general design for concurrent modifications in 
streaming concurrent-write scenarios.

> can you please explain in detail, how the failover and OCC handling are 
> related.

I think we can categorize the concurrent write cases into two kinds: writes 
with conflicts and writes without conflicts.

* If the write detects conflicts, the whole job/task will trigger failover and 
the RocksDB replica will re-bootstrap from scratch, which keeps the index 
backend consistent, akin to the MDT RLI index. This requires introducing 
specific early conflict detection within the checkpoint lifecycle:
  * persist the uncommitted write metadata under the Hudi table path;
  * in the last step of the #snapshot of the write function, send a request to 
the coordinator to detect conflicts;
  * use a customized conflict resolution strategy that combines all the existing 
uncommitted write metadata with the latest timeline to validate whether there 
are conflicts.
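
To make the three steps above concrete, here is a minimal Python sketch of the idea. It is not Hudi or Flink code: `WriteMetadata`, `Coordinator`, `snapshot` and `WriteConflictError` are hypothetical names, the "table path" is an in-memory dict, and a conflict is assumed to mean two writes touching the same file group.

```python
from dataclasses import dataclass, field


class WriteConflictError(RuntimeError):
    """Raised inside the snapshot so the checkpoint itself fails."""


@dataclass(frozen=True)
class WriteMetadata:
    """Hypothetical record of what one writer touched in one checkpoint."""
    writer_id: str
    instant_time: str        # instant the write is based on (fixed-width string)
    file_groups: frozenset   # ids of the file groups this write modifies


@dataclass
class Coordinator:
    """Toy stand-in for the write coordinator (assumption, not a Hudi class)."""
    committed: list = field(default_factory=list)  # completed writes on the timeline
    pending: dict = field(default_factory=dict)    # writer_id -> uncommitted metadata

    def persist_uncommitted(self, meta):
        # Step 1: a real implementation would write a file under the table path.
        self.pending[meta.writer_id] = meta

    def detect_conflicts(self, meta):
        # Step 3: combine other writers' uncommitted metadata with writes that
        # landed on the timeline after the instant this write is based on.
        others = [m for w, m in self.pending.items() if w != meta.writer_id]
        newer = [m for m in self.committed if m.instant_time > meta.instant_time]
        touched = set()
        for m in others + newer:
            touched |= m.file_groups
        return sorted(touched & meta.file_groups)


def snapshot(coordinator, meta):
    """Step 2: last action of the write function's snapshot."""
    coordinator.persist_uncommitted(meta)
    conflicts = coordinator.detect_conflicts(meta)
    if conflicts:
        # Failing here fails the checkpoint, so Flink and Hudi agree on the outcome.
        raise WriteConflictError(f"conflicting file groups: {conflicts}")
```

In this sketch the writer that snapshots second is the one that fails; a real strategy would need its own rule for which writer wins.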
        

The pre-commit conflict resolution does not work well for Flink streaming 
because it happens after a successful checkpoint: Hudi deems the write as 
failed if there is a conflict, while Flink deems the write as successful (from 
the latest successful checkpoint). To close this gap, early conflict 
resolution is required here.

* If the write does not detect conflicts, there are still cases where another 
concurrent writer modifies the table with new record locations. The solution is 
that we might need an early check of the index backend's freshness before each 
write: maintain a mapping from job-id to instant time so we can incrementally 
load the index changes made by concurrent writers (put the job-id in the commit 
metadata or maintain it on the coordinator). This introduces a lot of 
complexity though; I'm expecting a more general solution for NBCC that is 
index-type agnostic and does not get stuck in this concurrent index 
modification trap.
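
As a rough illustration of the freshness check described above, the following Python sketch refreshes a local index incrementally before a write. It assumes the job-id is carried in the commit metadata (one of the two options mentioned); `Commit`, `LocalIndex` and `refresh` are hypothetical names, and a plain dict stands in for the RocksDB replica.

```python
from dataclasses import dataclass


@dataclass
class Commit:
    """Hypothetical completed instant with the job-id in its metadata."""
    instant_time: str        # fixed-width string, so string order is time order
    job_id: str
    record_locations: dict   # record key -> file group id written by this commit


class LocalIndex:
    """Toy stand-in for one writer's local index replica."""

    def __init__(self, job_id):
        self.job_id = job_id
        self.locations = {}        # record key -> file group id
        self.synced_instant = ""   # latest instant already reflected locally

    def refresh(self, timeline):
        """Apply changes made by other jobs since the last sync.

        Returns the number of foreign commits that were applied.
        """
        applied = 0
        for commit in sorted(timeline, key=lambda c: c.instant_time):
            if commit.instant_time <= self.synced_instant:
                continue  # already reflected in the local index
            if commit.job_id != self.job_id:
                # Another writer may have moved records to new locations.
                self.locations.update(commit.record_locations)
                applied += 1
            # Own commits were applied locally at write time; just advance.
            self.synced_instant = commit.instant_time
        return applied
```

Calling `refresh` before each write keeps the cost proportional to what other writers committed since the last sync, rather than re-bootstrapping the whole index.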

Here is the table of support for concurrent modifications with Flink RLI:

| use case/concurrency mode | OCC | NBCC |
|---|---|---|
| write & write | Y(with early conflict detection and index refreshing) | N |
| write & compaction | Y | N |
| write & clustering | Y(with early index refreshing) | N |

GitHub link: 
https://github.com/apache/hudi/discussions/18296#discussioncomment-16171800

----
This is an automatically sent email for [email protected].
To unsubscribe, please send an email to: [email protected]
