GitHub user vinothchandar added a comment to the discussion: RocksDB as The Replica of MDT/RLI

> The incremental index upserts inferred from the data inputs are applied
> directly on these RocksDB instances

**Question 1**: If we treat RocksDB as the primary source of truth during writes, how do concurrent updates from another writer become visible correctly during the `BucketAssignOp` stage? We need a solution that can work with NBCC and multiple writers. Can you expand on this in a comment below?

> the payloads under the same data partition is stored as a separate column
> family,

+1. This also means that for partitioned RLI, i.e. where records always have immutable partitioning field(s), the index size can be simply `O(size_of_data_partitions_written_to)` and not `O(total_size_of_table)` (as is the case for global indexing or mutable partition fields).

> the RocksDB storage size turns out to be a nearly 2x size storage against the
> native HFile

**Question 2**: Is the 2x due to compression alone? I think there will be an additional 2x of storage for un-compacted updates, since RocksDB also does its own async compaction periodically.

GitHub link: https://github.com/apache/hudi/discussions/18296#discussioncomment-16056769
