danny0405 commented on PR #17610: URL: https://github.com/apache/hudi/pull/17610#issuecomment-3995426734
### RocksDB as the Replica of MDT

### The Update to RocksDB

The RocksDB instances are initialized and bootstrapped from scratch by reading the full MDT RLI index on each job restart or task failover.

The incremental index upserts inferred from the data inputs are applied directly to these RocksDB instances. The same index upserts are passed along with the data payloads to the `IndexWrite` op for the actual MDT update.

The MDT update happens in the same lifecycle as the data record write, and the incremental upserts sent to the MDT are a replica image of the upserts applied to RocksDB.

The RLI is used in two cases:

- it serves as the source of truth for the index mapping and is used in the RocksDB bootstrap
- cross-engine compatibility

The new write flow with the RocksDB replica:

<img width="4704" height="1394" alt="image" src="https://github.com/user-attachments/assets/2e36f972-54d7-482e-8d62-045bc96b07a2" />

### The Clean/Eviction of Index Payloads in RocksDB

For global RLI, the RocksDB instance is closed and removed each time a task fails over or the job restarts.

For partitioned RLI, with a local RocksDB instance per `BucketAssign` task, the payloads under the same data partition are stored in a separate column family. When the data partition is datetime-based, a column family can be dropped very efficiently once it falls outside a configurable partition lookup TTL.

### The Additional Storage Cost
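
Referring back to the bootstrap, upsert, and eviction behavior described in the sections above for partitioned RLI, the following is a minimal in-memory sketch of the idea, not the PR's implementation. Plain dictionaries stand in for RocksDB column families, and every name here (`PartitionedIndexReplica`, `bootstrap`, `upsert`, `evict_expired`, `ttl_days`) is hypothetical. It also assumes partition paths are ISO dates such as `2024-01-01`, which the original comment does not state.

```python
from datetime import date, timedelta


class PartitionedIndexReplica:
    """Toy stand-in for a per-task RocksDB replica of a partitioned RLI.

    Each data partition gets its own dict, mimicking one column family
    per partition. All names are illustrative assumptions.
    """

    def __init__(self, ttl_days):
        self.ttl_days = ttl_days
        self.families = {}  # partition -> {record_key: location}

    def bootstrap(self, rli_entries):
        # Rebuild from scratch, as on a job restart or task failover.
        self.families.clear()
        for partition, key, location in rli_entries:
            self.families.setdefault(partition, {})[key] = location

    def upsert(self, partition, key, location):
        # Apply the upsert locally and return the same record so the
        # caller can forward it to the index writer for the MDT update.
        self.families.setdefault(partition, {})[key] = location
        return (partition, key, location)

    def lookup(self, partition, key):
        return self.families.get(partition, {}).get(key)

    def evict_expired(self, today):
        # Drop whole partitions older than the TTL in one step each,
        # analogous to dropping a column family.
        cutoff = today - timedelta(days=self.ttl_days)
        expired = [p for p in self.families if date.fromisoformat(p) < cutoff]
        for partition in expired:
            del self.families[partition]
        return sorted(expired)


replica = PartitionedIndexReplica(ttl_days=2)
replica.bootstrap([
    ("2024-01-01", "k1", "fg-0"),
    ("2024-01-03", "k2", "fg-1"),
])
# Upserts are applied to the replica and collected for the MDT write.
pending_mdt_upserts = [replica.upsert("2024-01-04", "k3", "fg-2")]
dropped = replica.evict_expired(date(2024, 1, 4))
```

Eviction removes a whole partition at once instead of deleting keys one by one, which is the property that makes datetime-based partitions cheap to expire. With a real RocksDB instance, the analogous step would presumably be dropping the partition's column family.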
