GitHub user danny0405 added a comment to the discussion: RocksDB as The Replica 
of MDT/RLI

> Question 1: If we treat RocksDB as the primary source of truth during writes, 
> how do concurrent updates from another writer become visible correctly during 
> the BucketAssignOp stage?

Concurrent upsert consistency still holds under OCC, since a task failover 
retriggers the full bootstrap of the RocksDB replica anyway. As of now, the 
simple bucket index is a prerequisite for NBCC, so this should not be a strong 
concern or blocker for the RLI.
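
As a rough illustration of the failover argument, here is a toy Python model. 
It is only a sketch under assumptions: `ReplicaIndex`, `bootstrap`, and 
`lookup` are hypothetical names, not Hudi or RocksDB APIs, and plain dicts 
stand in for both the MDT/RLI and the RocksDB replica.

```python
# Toy model only: every name here is hypothetical, not a Hudi or RocksDB API.

class ReplicaIndex:
    """Local key -> file group replica that is rebuilt from a source of truth."""

    def __init__(self, source_of_truth):
        self._source = source_of_truth  # stands in for the MDT/RLI
        self._local = {}                # stands in for the RocksDB replica
        self.bootstrap()

    def bootstrap(self):
        # Full bootstrap: drop the local state and reload everything.
        self._local = dict(self._source)

    def lookup(self, key):
        return self._local.get(key)


mdt = {"k1": "fg-1"}
replica = ReplicaIndex(mdt)

# Another writer commits a new mapping; the local replica does not see it yet.
mdt["k2"] = "fg-7"
stale = replica.lookup("k2")

# A task failover re-runs the bootstrap, which picks up the committed mapping.
replica.bootstrap()
fresh = replica.lookup("k2")
```

The point of the sketch is only that the replica never needs incremental 
cross-writer synchronization if every failover rebuilds it from the committed 
state.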

> Question 2: Is the 2x due to just compression? I think there will be an 
> additional 2x storage for un-compacted updates.

RocksDB recommends light compression (LZ4/Snappy) for L0 to L2 to get the best 
write throughput, and heavier compression (Zstd/Zlib) for L3+. As for 
un-compacted updates, the native MDT has a similar case with the payloads in 
its log files, so the difference may come down to the compaction frequency gap 
between MDT (10 delta commits trigger a compaction) and RocksDB (4 SST files 
trigger a compaction).
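
To make the two points above concrete, here is a small Python sketch. It is 
an illustration under assumptions, not RocksDB or Hudi code: the helper names 
`compression_per_level` and `peak_uncompacted` are hypothetical, the level 
count of 7 is an assumed value, and the second helper counts pending write 
batches rather than bytes.

```python
LIGHT = "lz4"   # light codec for the upper levels (Snappy would also fit)
HEAVY = "zstd"  # heavier codec for the lower levels (Zlib would also fit)


def compression_per_level(num_levels, first_heavy_level=3):
    """Return one codec name per LSM level: light for L0..L2, heavy for L3+."""
    return [LIGHT if level < first_heavy_level else HEAVY
            for level in range(num_levels)]


def peak_uncompacted(trigger, num_writes):
    """Largest number of un-compacted batches seen across num_writes writes.

    Each write adds one batch (a delta commit or an SST file); once the
    pending count reaches the trigger, a compaction merges them all.
    """
    pending = 0
    peak = 0
    for _ in range(num_writes):
        pending += 1
        peak = max(peak, pending)
        if pending >= trigger:
            pending = 0  # compaction folds the pending batches into the base
    return peak


levels = compression_per_level(7)        # 7 levels is an assumed value
mdt_peak = peak_uncompacted(10, 100)     # MDT: 10 delta commits per compaction
rocksdb_peak = peak_uncompacted(4, 100)  # RocksDB: 4 SST files per compaction
```

In this simplified model the number of un-compacted batches is bounded by the 
trigger, so a lower trigger keeps fewer un-compacted updates around at any one 
time. Real storage overhead also depends on batch sizes and update overlap, 
which the sketch ignores.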

GitHub link: 
https://github.com/apache/hudi/discussions/18296#discussioncomment-16061388
