Re: RangeAwareCompaction for manual token management

2018-07-22 Thread Marcus Eriksson
It should work fine with num_tokens: 1

Without vnodes it also flushes to per-range sstables (if you have RF=3 you
will "always" get 3 sstables after flush). With vnodes it groups the ranges
by disk and flushes one sstable per disk, so if you have a single
data_file_directories entry you get only one sstable; compaction will then
write the data out to per-range sstables once the ranges accumulate enough
data.

/Marcus
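
To make the no-vnode flush behaviour above concrete, here is a minimal
Python sketch. It is not Cassandra code: it assumes a hypothetical ring of
single-token nodes with simple placement, where a node replicates its own
primary range plus the primary ranges of the RF - 1 nodes before it. The
names local_ranges, in_range and flush_groups are made up for illustration.

```python
def local_ranges(ring, node_token, rf):
    """Return the (start, end] token ranges a single-token node replicates.

    Assumption: simple placement, so the node holds its primary range plus
    the primary ranges of the rf - 1 nodes preceding it on the ring.
    """
    tokens = sorted(ring)
    i = tokens.index(node_token)
    ranges = []
    for k in range(rf):
        end = tokens[(i - k) % len(tokens)]
        start = tokens[(i - k - 1) % len(tokens)]
        ranges.append((start, end))
    return ranges


def in_range(token, start, end):
    """True if token lies in (start, end], handling ring wrap-around."""
    if start < end:
        return start < token <= end
    return token > start or token <= end


def flush_groups(memtable_tokens, ranges):
    """Group memtable tokens by local range: one sstable per non-empty group."""
    groups = {r: [] for r in ranges}
    for t in memtable_tokens:
        for r in ranges:
            if in_range(t, *r):
                groups[r].append(t)
                break
    return {r: ts for r, ts in groups.items() if ts}


ring = [0, 100, 200, 300, 400]
ranges = local_ranges(ring, node_token=300, rf=3)
sstables = flush_groups([50, 150, 250, 260], ranges)
print(len(sstables))  # one group per local range that received writes
```

With RF=3 the node has three local ranges, so a flush yields up to three
groups; a range that received no writes produces no sstable, which is one
reading of the quotes around "always".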


Re: RangeAwareCompaction for manual token management

2018-07-19 Thread kurt greaves
I've had similar thoughts in the past about RACS and manual tokens. I think
it would be a good idea to be able to split it based on some configurable
factor other than vnodes. I think Marcus may have already addressed this to
some extent as well but if not it's theoretically possible.



RangeAwareCompaction for manual token management

2018-07-19 Thread Carl Mueller
I don't want to comment on the 10540 ticket since it seems very well
focused on vnode-aligned sstable partitioning and compaction. I'm pretty
excited about that ticket. RACS should enable:

- smaller scale LCS, more constrained I/O consumption
- fewer sstables to hit in the read path
- multithreaded/multiprocessor compactions and even serving of data based
on individual vnode or pools of vnodes
- better alignment of tombstones with data they should be
nullifying/eventually removing
- repair streaming efficiency
- backups have more granularity for not uploading sstables that didn't
change for the range since last backup snapshot

There are ongoing discussions about using Priam for cluster management where
I am, and as I understand it (superficially) Priam does not use vnodes; it
uses manual tokens and expands via node multiples. I believe it has certain
advantages over vnodes, including expanding by multiple machines at once, and
backups could possibly cover only (nodecount / RF) nodes for data backups
rather than the mess of vnodes where you have to do basically all of them.

But we could still do some divisor split of the manual range and apply RACS
to that. I guess this would be vnode-lite. We could have some number like
100 subranges on a node, and expansion might just involve a temporarily
lower count of subranges until the sstables can be reprocessed to the
typical subrange count.

Is this theoretically correct, or are there glaring things I might have
missed with respect to RACS-style compaction and manual tokens?
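
As a rough illustration of the "divisor split" idea above, here is a
minimal Python sketch. It is an assumption-laden toy, not how RACS or
Cassandra implements anything: split_range and surviving_subranges are
hypothetical helpers, token ranges are treated as non-wrapping (start, end]
integer intervals, and expansion is modelled as the node's range simply
shrinking.

```python
def split_range(start, end, parts):
    """Split the token range (start, end] into `parts` near-equal subranges.

    Assumes start < end (no ring wrap-around) and at least one token per part.
    """
    width = end - start
    if parts < 1 or width < parts:
        raise ValueError("need 1 <= parts <= end - start")
    bounds = [start + (width * i) // parts for i in range(parts + 1)]
    return list(zip(bounds[:-1], bounds[1:]))


def surviving_subranges(subranges, new_start, new_end):
    """Subranges that still lie fully inside the node's range after expansion."""
    return [(s, e) for (s, e) in subranges if new_start <= s and e <= new_end]


# A node with a manually assigned range, divided into 100 subranges.
before = split_range(0, 1000, 100)

# After doubling the cluster the node keeps only half of its range, so it
# temporarily has fewer subranges (and the sstables aligned to them).
temporary = surviving_subranges(before, 0, 500)

# Once sstables are rewritten, the node is back at the typical count.
after = split_range(0, 500, 100)

print(len(before), len(temporary), len(after))
```

The sketch keeps the old subrange boundaries during expansion so existing
sstables stay aligned, and only re-splits once the data can be rewritten.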