Re: RangeAwareCompaction for manual token management
It should work fine with num_tokens: 1.

Without vnodes it also flushes to per-range sstables (if you have RF=3 you will "always" get 3 sstables after a flush), while with vnodes it groups the ranges and flushes per disk, so if you have a single entry in data_file_directories you get only one sstable. Compaction will then write them out to per-range sstables once they accumulate enough data.

/Marcus

On Thu, Jul 19, 2018 at 11:34 PM Carl Mueller wrote:

> I don't want to comment on the 10540 ticket since it seems very well
> focused on vnode-aligned sstable partitioning and compaction. I'm pretty
> excited about that ticket. RACS should enable:
>
> - smaller scale LCS, more constrained I/O consumption
> - less sstables to hit in read path
> - multithreaded/multiprocessor compactions and even serving of data based
>   on individual vnode or pools of vnodes
> - better alignment of tombstones with data they should be
>   nullifying/eventually removing
> - repair streaming efficiency
> - backups have more granularity for not uploading sstables that didn't
>   change for the range since last backup snapshot
>
> There is ongoing discussions as to using Priam for cluster management where
> I am, and as I understand it (superficially) Priam does not use vnodes and
> use manual tokens, and expands via node multiples. I believe it has certain
> advantages over vnodes including expanding by multiple machines at once,
> backups could possibly do (nodecount / RF) number of nodes for data backups
> rather than the mess of vnodes where you have to do basically all of them.
>
> But we could still do some divisor split of the manual range and apply RACS
> to that. I guess this would be vnode-lite. We could have some number like
> 100 subranges on a node and expansion might just involve temporary lower
> bound count of subranges until the sstables can be reprocessed to the
> typical subrange count.
>
> Is this theoretically correct, or are there glaring things I might have
> missed with respect to RACS-style compaction and manual tokens?
>
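To illustrate the remark in the reply above that RF=3 without vnodes gives 3 sstables per flush, here is a minimal sketch. It assumes SimpleStrategy-style placement on a ring with one token per node, where each node stores its own primary range plus the primary ranges of the RF - 1 nodes before it. The function name and the example tokens are hypothetical and are not Cassandra APIs.

```python
def replicated_ranges(tokens, node_token, rf):
    """Return the (start, end] token ranges stored by the node owning node_token.

    Assumes one token per node and SimpleStrategy-like placement: a range
    (prev, t] is replicated on the owner of t and on the next rf - 1 nodes
    clockwise on the ring.
    """
    ring = sorted(tokens)
    n = len(ring)
    i = ring.index(node_token)
    ranges = []
    # Walk backwards: our own primary range, then those of rf - 1 predecessors.
    for k in range(min(rf, n)):
        end = ring[(i - k) % n]
        start = ring[(i - k - 1) % n]
        ranges.append((start, end))
    return ranges


ring_tokens = [-6000, -3000, 0, 3000, 6000, 9000]  # hypothetical manual tokens
local = replicated_ranges(ring_tokens, 0, rf=3)
print(local)  # [(-3000, 0), (-6000, -3000), (9000, -6000)]
```

Under these assumptions the node holds three ranges (the last one wraps around the ring), so a per-range flush would produce three sstables.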
Re: RangeAwareCompaction for manual token management
I've had similar thoughts in the past about RACS and manual tokens. I think it would be a good idea to be able to split it based on some configurable factor other than vnodes. I think Marcus may have already addressed this to some extent as well but if not it's theoretically possible.

On 20 July 2018 at 07:34, Carl Mueller wrote:

> I don't want to comment on the 10540 ticket since it seems very well
> focused on vnode-aligned sstable partitioning and compaction. I'm pretty
> excited about that ticket. RACS should enable:
>
> - smaller scale LCS, more constrained I/O consumption
> - less sstables to hit in read path
> - multithreaded/multiprocessor compactions and even serving of data based
>   on individual vnode or pools of vnodes
> - better alignment of tombstones with data they should be
>   nullifying/eventually removing
> - repair streaming efficiency
> - backups have more granularity for not uploading sstables that didn't
>   change for the range since last backup snapshot
>
> There is ongoing discussions as to using Priam for cluster management where
> I am, and as I understand it (superficially) Priam does not use vnodes and
> use manual tokens, and expands via node multiples. I believe it has certain
> advantages over vnodes including expanding by multiple machines at once,
> backups could possibly do (nodecount / RF) number of nodes for data backups
> rather than the mess of vnodes where you have to do basically all of them.
>
> But we could still do some divisor split of the manual range and apply RACS
> to that. I guess this would be vnode-lite. We could have some number like
> 100 subranges on a node and expansion might just involve temporary lower
> bound count of subranges until the sstables can be reprocessed to the
> typical subrange count.
>
> Is this theoretically correct, or are there glaring things I might have
> missed with respect to RACS-style compaction and manual tokens?
>
RangeAwareCompaction for manual token management
I don't want to comment on the 10540 ticket since it seems very well focused on vnode-aligned sstable partitioning and compaction. I'm pretty excited about that ticket. RACS should enable:

- smaller scale LCS, more constrained I/O consumption
- fewer sstables to hit in the read path
- multithreaded/multiprocessor compactions and even serving of data based on individual vnodes or pools of vnodes
- better alignment of tombstones with the data they should be nullifying/eventually removing
- repair streaming efficiency
- backups have more granularity for not uploading sstables that didn't change for the range since the last backup snapshot

There are ongoing discussions about using Priam for cluster management where I am, and as I understand it (superficially) Priam does not use vnodes, uses manual tokens, and expands via node multiples. I believe it has certain advantages over vnodes, including expanding by multiple machines at once. Backups could also possibly cover only (nodecount / RF) nodes for data backups, rather than the mess of vnodes where you have to do basically all of them.

But we could still do some divisor split of the manual range and apply RACS to that. I guess this would be vnode-lite. We could have some number like 100 subranges on a node, and expansion might just involve a temporarily lower count of subranges until the sstables can be reprocessed to the typical subrange count.

Is this theoretically correct, or are there glaring things I might have missed with respect to RACS-style compaction and manual tokens?
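The "divisor split" idea above could be sketched roughly as follows. This is an illustration only, assuming Murmur3Partitioner token bounds (-2^63 to 2^63 - 1) and (start, end] range semantics. The helper names split_range and subrange_index are hypothetical, not part of Cassandra or of the 10540 patch.

```python
MIN_TOKEN = -2**63       # Murmur3Partitioner token bounds (assumed partitioner)
MAX_TOKEN = 2**63 - 1
RING_SIZE = 2**64


def split_range(start, end, parts):
    """Split the (start, end] token range into `parts` near-equal subranges.

    Handles ranges that wrap past MAX_TOKEN back to MIN_TOKEN; start == end
    is treated as the full ring.
    """
    width = (end - start) % RING_SIZE or RING_SIZE
    if not 1 <= parts <= width:
        raise ValueError("parts must be between 1 and the range width")
    bounds = []
    for i in range(parts + 1):
        token = start + width * i // parts
        if token > MAX_TOKEN:          # wrap around the ring
            token -= RING_SIZE
        bounds.append(token)
    return list(zip(bounds[:-1], bounds[1:]))


def subrange_index(subranges, token):
    """Return the index of the subrange (start, end] containing token, or None."""
    for i, (start, end) in enumerate(subranges):
        if start < end:
            if start < token <= end:
                return i
        elif token > start or token <= end:  # this subrange wraps around
            return i
    return None


subs = split_range(0, 1000, 4)
print(subs)                       # [(0, 250), (250, 500), (500, 750), (750, 1000)]
print(subrange_index(subs, 500))  # 1
```

On one reading of the expansion case, if the node count doubles and this node keeps only (500, 1000], splitting that range into 50 parts reproduces the upper 50 of the original 100 subranges: `split_range(500, 1000, 50) == split_range(0, 1000, 100)[50:]`. Existing per-subrange sstables would then still line up until they are reprocessed to the usual count.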