[ https://issues.apache.org/jira/browse/CASSANDRA-7282?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16527305#comment-16527305 ]
Michael Burman commented on CASSANDRA-7282:
-------------------------------------------

I've updated this work with the following changes to get the process restarted:
* Ported to the newer storage format and rebased onto current master.
* Removed the visibility optimization from the resize operation, as I could trigger an NPE through a race condition when large resizes happen in quick succession.
** I tested different resize algorithms, such as tracking the read and write path chains and resizing when certain criteria are met. This is the strategy used by the Linux kernel's RCU, but I could not get any notable performance gain from it, so I kept the original resize algorithm.
* Each node token range is now backed by a separate index.
** Reduces contention on index updates, resizes, and the size-parameter update (previously a contention point for all threads).
** Applies a normalization to each token range's hash updates to improve the distribution of hashes, and thereby the coverage of the index:
*** Center the hash range around 0 first, then apply a scaling multiplier (the proportional size of the token range compared to the available hash range [Long.MIN_VALUE, Long.MAX_VALUE]).
** Allows different growth sizes for each token range.
** For system tables, and for cases where a node sees writes that were not known during initialization, an overflow index covering the range [Long.MIN_VALUE, Long.MAX_VALUE] is used (it works like the original patch).
** To look up the correct index I used a balanced BST. If there's a better way, I'm all ears. A linear search would be just as fast when the number of vnodes is small, but the BST also scales to larger numbers of vnodes.

I've run different workloads against it and tried different hashing methods, including the 32-bit hash from MurmurHash to trigger a bad hash distribution, but I couldn't find any scenario with performance degradation compared to CSLM.
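The centering-and-scaling normalization described above could look roughly like this. This is a hypothetical sketch, not the code from the branch; the method name and the use of double arithmetic are my own simplifications:

```java
public class TokenRangeHash
{
    // Sketch of the normalization: center the range's tokens around 0, then
    // scale by the range's proportion of the full
    // [Long.MIN_VALUE, Long.MAX_VALUE] hash space. Double arithmetic is used
    // here for brevity; the actual patch may differ.
    static long normalize(long token, long rangeStart, long rangeEnd)
    {
        double mid = (rangeStart / 2.0) + (rangeEnd / 2.0); // avoids overflow of (start + end)
        double scale = Math.pow(2, 64) / ((double) rangeEnd - (double) rangeStart);
        return (long) ((token - mid) * scale);
    }

    public static void main(String[] args)
    {
        // A token at the middle of its range maps to the middle of the hash space.
        System.out.println(TokenRangeHash.normalize(500L, 0L, 1000L)); // prints 0
    }
}
```

The point of the scaling step is that a narrow vnode range, which would otherwise populate only a thin slice of the hash space, ends up spread over the whole space its dedicated index covers.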
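For the per-range index lookup, a balanced BST keyed by each range's start token can resolve the owning index with a floor lookup. A minimal sketch using the JDK's TreeMap (a red-black tree); the range boundaries and index names are illustrative only:

```java
import java.util.TreeMap;

public class RangeLookup
{
    public static void main(String[] args)
    {
        // Map each token range's start token to its backing index
        // (strings stand in for the real index objects).
        TreeMap<Long, String> indexByRangeStart = new TreeMap<>();
        indexByRangeStart.put(Long.MIN_VALUE, "index-0");
        indexByRangeStart.put(-3000L, "index-1");
        indexByRangeStart.put(4000L, "index-2");

        // floorEntry returns the entry with the greatest range start <= token:
        // O(lg n) in the number of vnode ranges, versus O(n) for a linear scan.
        System.out.println(indexByRangeStart.floorEntry(-42L).getValue());  // prints index-1
        System.out.println(indexByRangeStart.floorEntry(9000L).getValue()); // prints index-2
    }
}
```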
With a hash attack (which Murmur is vulnerable to) this can of course be forced, but such an attack would also break Cassandra's node distribution. Link to the branch: [https://github.com/burmanm/cassandra/tree/7282-trunk-2]

> Faster Memtable map
> -------------------
>
>                 Key: CASSANDRA-7282
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-7282
>             Project: Cassandra
>          Issue Type: Improvement
>            Reporter: Benedict
>            Priority: Major
>              Labels: performance
>             Fix For: 4.x
>
>         Attachments: jasobrown-sample-run.txt, profile.yaml, reads.svg, run1.svg, writes.svg
>
>
> Currently we maintain a ConcurrentSkipListMap of DecoratedKey -> Partition in our memtables. Maintaining this is an O(lg(n)) operation; since the vast majority of users use a hash partitioner, it occurs to me we could maintain a hybrid ordered list / hash map. The list would impose the normal order on the collection, but a hash index would live alongside as part of the same data structure, simply mapping into the list and permitting O(1) lookups and inserts.
> I've chosen to implement this initial version as a linked-list node per item, but we can optimise this in future by storing fatter nodes that permit a cache-line's worth of hashes to be checked at once, further reducing the constant factor costs for lookups.