[jira] [Commented] (CASSANDRA-7282) Faster Memtable map

Benedict (JIRA) Mon, 09 Jul 2018 11:55:31 -0700


    [ 
https://issues.apache.org/jira/browse/CASSANDRA-7282?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16537398#comment-16537398
 ]


Benedict commented on CASSANDRA-7282:
-------------------------------------

bq. but without Value Types that's a pretty long shot with Java

Well, you can do n-ary (i.e. B-Tree) equivalents of the Van Emde Boas layout; 
or you could do a simpler binary Van Emde Boas initial search (sized to about a 
cache fetch unit), to select the correct range of a sorted array to search, 
which would probably yield similar results for our domain sizes.  

But to be honest I only really put the binary Van Emde Boas approach in here to 
save time, as it was faster than digging too closely into the differences in 
the benchmark between the two more basic approaches.

bq. I'll start the other ticket by moving the range detection to an upper level 
by creating a new Memtable per range

Great.  Thanks

bq.  I'll need to find a good way from getting that event from bootstrap & 
streaming 

So thinking about this a bit more, it seems there are four options:

1) Locate all the places the token ring changes, insert some code
2) Subscribe to changes in the token ring, and flush if we introduce a new 
range (or expand an existing range)
3) Detect an out-of-range message, and synchronise our ranges with the ring
4) Detect an out-of-range message, and simply route it to an out-of-range bucket

The thing is, there's nothing right now preventing us receiving records for a 
range not assigned to us.  So we will need an out-of-range bucket.  But at the 
same time, we will have to update ourselves, consciously, when the ring 
updates.  If we're moving up out of the memtable to impose this, we cannot rely 
on flush to resynchronise ourselves.

So, probably we will need an out-of-range bucket anyway - that IMO should be 
served by a CSLM until we implement NBHOM degrading to a skip list under skew.

But we also want to know when our owned ranges change - and this can happen for 
a variety of circumstances.  Probably the most robust thing to do, is to 
subscribe to changes in TokenMetadata.  Which isn't currently possible, but 
shouldn't be onerous to add - though I hate adding to an already ugly class.

bq. Also, any table not using ranges (such as system, maybe some new virtual 
ones?) don't use any.

Those tables that don't use any ranges all use the LocalStrategy for 
replication, which right now looks like it would yield a separate range for 
every token in the cluster, all mapping to localhost.  We should probably 
confirm this is the case, and override the method to just return a single range 
owned by localhost.

We should probably take this discussion to another ticket though.


> Faster Memtable map
> -------------------
>
>                 Key: CASSANDRA-7282
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-7282
>             Project: Cassandra
>          Issue Type: Improvement
>            Reporter: Benedict
>            Assignee: Michael Burman
>            Priority: Major
>              Labels: performance
>             Fix For: 4.x
>
>         Attachments: jasobrown-sample-run.txt, profile.yaml, reads.svg, 
> run1.svg, writes.svg
>
>
> Currently we maintain a ConcurrentSkipLastMap of DecoratedKey -> Partition in 
> our memtables. Maintaining this is an O(lg(n)) operation; since the vast 
> majority of users use a hash partitioner, it occurs to me we could maintain a 
> hybrid ordered list / hash map. The list would impose the normal order on the 
> collection, but a hash index would live alongside as part of the same data 
> structure, simply mapping into the list and permitting O(1) lookups and 
> inserts.
> I've chosen to implement this initial version as a linked-list node per item, 
> but we can optimise this in future by storing fatter nodes that permit a 
> cache-line's worth of hashes to be checked at once,  further reducing the 
> constant factor costs for lookups.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org

[jira] [Commented] (CASSANDRA-7282) Faster Memtable map

Reply via email to