GitHub user vinothchandar edited a comment on the discussion: RocksDB as The 
Replica of MDT/RLI

> The incremental index upserts inferred from the data inputs are applied 
> directly on these RocksDB instances

**Question 1**: If we treat RocksDB as the primary source of truth during 
writes, how do concurrent updates from another writer become visible correctly 
during the `BucketAssignOp` stage? We need a solution that works with NBCC and 
multiple writers. Can you expand on this in a comment below?

> the payloads under the same data partition is stored as a separate column 
> family,

+1. This also means that for partitioned RLI, i.e., when records always have 
immutable partitioning field(s), the index size can be simply 
`O(size_of_data_partitions_written_to)` rather than `O(total_size_of_table)` (as is 
the case for global indexing or mutable partition fields).
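
A rough illustration of the sizing argument, as a minimal Python sketch. Plain dicts stand in for RocksDB column families (one per data partition), and `open_index_for_write` is a hypothetical helper, not a Hudi or RocksDB API. The assumption is that with immutable partition fields a writer only needs the column families for the partitions in its input batch.

```python
from typing import Dict, Iterable

# Hypothetical stand-in for the full record-level index:
# data partition -> {record key -> file group id}.
FullIndex = Dict[str, Dict[str, str]]


def open_index_for_write(full_index: FullIndex,
                         touched_partitions: Iterable[str]) -> FullIndex:
    """Load only the 'column families' for partitions the writer touches."""
    return {p: dict(full_index.get(p, {})) for p in set(touched_partitions)}


def index_entries(index: FullIndex) -> int:
    """Count index entries across all loaded partitions."""
    return sum(len(cf) for cf in index.values())


full_index = {
    "2024-01-01": {"k1": "fg-1", "k2": "fg-1"},
    "2024-01-02": {"k3": "fg-2"},
    "2024-01-03": {"k4": "fg-3", "k5": "fg-3", "k6": "fg-4"},
}

# A batch that only writes to one partition loads one column family.
local = open_index_for_write(full_index, ["2024-01-02"])
print(index_entries(local), "of", index_entries(full_index), "entries loaded")
```

With a global index or mutable partition fields, a key may live in any partition, so the writer would have to load all of them.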

> the RocksDB storage size turns out to be a nearly 2x size storage against the 
> native HFile

**Question 2**: Is the 2x due just to compression? I think there will be an 
additional 2x of storage for un-compacted updates, since RocksDB also 
runs its own async compaction periodically.
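
To make the concern concrete, here is a back-of-the-envelope sketch in Python. Both factors are assumptions taken from this thread (roughly 2x from the quoted measurement, and up to 2x more for un-compacted updates as guessed above), not measured RocksDB behavior, and `estimate_rocksdb_size` is a hypothetical helper.

```python
def estimate_rocksdb_size(hfile_size: float,
                          format_overhead: float = 2.0,
                          uncompacted_overhead: float = 2.0) -> float:
    """Worst-case local footprint if both assumed overheads compound."""
    return hfile_size * format_overhead * uncompacted_overhead


hfile_gib = 10
peak_gib = estimate_rocksdb_size(hfile_gib)
print(f"{hfile_gib} GiB of HFile -> up to {peak_gib:.0f} GiB in RocksDB "
      "before compaction catches up")
```

If both factors hold, local disk would need to be provisioned for about 4x the HFile size rather than 2x.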

GitHub link: 
https://github.com/apache/hudi/discussions/18296#discussioncomment-16056769
