I don't want to comment on the 10540 ticket since it seems very well focused on vnode-aligned sstable partitioning and compaction. I'm pretty excited about that ticket. RACS should enable:
- smaller-scale LCS with more constrained I/O consumption
- fewer sstables to hit in the read path
- multithreaded/multiprocessor compactions, and even serving of data, based on individual vnodes or pools of vnodes
- better alignment of tombstones with the data they should be nullifying and eventually removing
- repair streaming efficiency
- more granular backups, which can skip uploading sstables that didn't change for the range since the last backup snapshot

There are ongoing discussions where I am about using Priam for cluster management and, as I understand it (superficially), Priam does not use vnodes; it uses manual tokens and expands by node multiples. I believe that has certain advantages over vnodes, including expanding by multiple machines at once, and backups could possibly cover only (nodecount / RF) nodes rather than the mess of vnodes, where you basically have to back up all of them.

But we could still do some divisor split of the manual range and apply RACS to that. I guess this would be vnode-lite. We could have some number like 100 subranges on a node, and expansion might just involve a temporarily lower count of subranges until the sstables can be reprocessed to the typical subrange count.

Is this theoretically correct, or are there glaring things I might have missed with respect to RACS-style compaction and manual tokens?
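To make the "divisor split" idea concrete, here is a rough Python sketch of what I have in mind. It is only an illustration, not anything from the ticket or from Priam: `split_range` and `subrange_index` are hypothetical helper names, and it assumes a node owns a single non-wrapping token range (start, end] with integer tokens, as with Murmur3Partitioner.

```python
from bisect import bisect_left


def split_range(start, end, count):
    """Split the token range (start, end] into `count` contiguous subranges.

    Hypothetical helper: assumes a non-wrapping range with integer tokens.
    """
    if end <= start:
        raise ValueError("wrapping or empty ranges are not handled in this sketch")
    width = end - start
    if count < 1 or count > width:
        raise ValueError("count must be between 1 and the range width")
    # Integer arithmetic keeps the boundaries exact even for 64-bit token ranges.
    bounds = [start + (width * i) // count for i in range(count + 1)]
    return list(zip(bounds[:-1], bounds[1:]))


def subrange_index(subranges, token):
    """Return the index of the subrange (lo, hi] that contains `token`."""
    uppers = [hi for _, hi in subranges]
    if not (subranges[0][0] < token <= uppers[-1]):
        raise ValueError("token is outside this node's range")
    # First subrange whose inclusive upper bound is >= token.
    return bisect_left(uppers, token)


# Example: 100 subranges over the full Murmur3 token space.
full = split_range(-2**63, 2**63 - 1, 100)
# A temporarily lower count, e.g. during expansion.
half = split_range(-2**63, 2**63 - 1, 50)
```

With this particular arithmetic, the 50-way boundaries are a subset of the 100-way boundaries, so each coarse subrange covers exactly two fine ones. That suggests sstables written at the lower count could later be re-split to the typical count without straddling a boundary, though whether a real implementation would work that way is an assumption on my part.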