[ https://issues.apache.org/jira/browse/CASSANDRA-7282?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16527305#comment-16527305 ]
Michael Burman commented on CASSANDRA-7282:
-------------------------------------------

I've updated this work with the following changes to get the process restarted:
* Ported to the newer storage format and rebased onto current master.
* Removed the visibility optimization from the resize operation, as I could trigger an NPE through a race condition when large resizes happen in quick succession.
** I tested different resize algorithms, such as tracking the read and write path chains and resizing when certain criteria are met. This is the strategy used by the Linux kernel's RCU, but I could not get any notable performance gain from it, so I kept the original resize algorithm.
* Each node token range is now backed by a separate index.
** Reduces contention on index updates, resizes, and the size-parameter update (previously a contention point for all threads).
** Applies a normalization to each token range's hash updates to improve the distribution of hashes, and thereby the coverage of the index:
*** Center the hash range around 0 first, then apply a scaling multiplier (the proportional size of the token range compared to the available hash range [Long.MIN_VALUE, Long.MAX_VALUE]).
** Allows different growth sizes for each token range.
** For system tables, and for cases where a node sees writes that were not known during initialization, an overflow index covering the range [Long.MIN_VALUE, Long.MAX_VALUE] is used (it works like the original patch).
** To look up the correct index I used a balanced BST. If there's a better way, I'm all ears. A linear search would be just as fast when the number of vnodes is small, but the BST also scales to larger numbers of vnodes.

I've run different workloads against it and tried different hashing methods, including the 32-bit hash from MurmurHash to trigger a bad hash distribution, but I couldn't find any scenario with performance degradation compared to CSLM.
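The centering-and-scaling normalization described above could look roughly like this. This is a hypothetical sketch, not the code from the branch; the method name and the use of double arithmetic are my own simplifications:

```java
public class TokenRangeHash
{
    // Sketch of the normalization: center the range's tokens around 0, then
    // scale by the range's proportion of the full
    // [Long.MIN_VALUE, Long.MAX_VALUE] hash space. Double arithmetic is used
    // here for brevity; the actual patch may differ.
    static long normalize(long token, long rangeStart, long rangeEnd)
    {
        double mid = (rangeStart / 2.0) + (rangeEnd / 2.0); // avoids overflow of (start + end)
        double scale = Math.pow(2, 64) / ((double) rangeEnd - (double) rangeStart);
        return (long) ((token - mid) * scale);
    }

    public static void main(String[] args)
    {
        // A token at the middle of its range maps to the middle of the hash space.
        System.out.println(TokenRangeHash.normalize(500L, 0L, 1000L)); // prints 0
    }
}
```

The point of the scaling step is that a narrow vnode range, which would otherwise populate only a thin slice of the hash space, ends up spread over the whole space its dedicated index covers.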
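For the per-range index lookup, a balanced BST keyed by each range's start token can resolve the owning index with a floor lookup. A minimal sketch using the JDK's TreeMap (a red-black tree); the range boundaries and index names are illustrative only:

```java
import java.util.TreeMap;

public class RangeLookup
{
    public static void main(String[] args)
    {
        // Map each token range's start token to its backing index
        // (strings stand in for the real index objects).
        TreeMap<Long, String> indexByRangeStart = new TreeMap<>();
        indexByRangeStart.put(Long.MIN_VALUE, "index-0");
        indexByRangeStart.put(-3000L, "index-1");
        indexByRangeStart.put(4000L, "index-2");

        // floorEntry returns the entry with the greatest range start <= token:
        // O(lg n) in the number of vnode ranges, versus O(n) for a linear scan.
        System.out.println(indexByRangeStart.floorEntry(-42L).getValue());  // prints index-1
        System.out.println(indexByRangeStart.floorEntry(9000L).getValue()); // prints index-2
    }
}
```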
With a hash attack (which Murmur is vulnerable to) this can of course be forced, but such an attack would also break Cassandra's node distribution. Link to the branch: [https://github.com/burmanm/cassandra/tree/7282-trunk-2]

> Faster Memtable map
> -------------------
>
>                 Key: CASSANDRA-7282
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-7282
>             Project: Cassandra
>          Issue Type: Improvement
>            Reporter: Benedict
>            Priority: Major
>              Labels: performance
>             Fix For: 4.x
>
>         Attachments: jasobrown-sample-run.txt, profile.yaml, reads.svg, run1.svg, writes.svg
>
>
> Currently we maintain a ConcurrentSkipListMap of DecoratedKey -> Partition in our memtables. Maintaining this is an O(lg(n)) operation; since the vast majority of users use a hash partitioner, it occurs to me we could maintain a hybrid ordered list / hash map. The list would impose the normal order on the collection, but a hash index would live alongside as part of the same data structure, simply mapping into the list and permitting O(1) lookups and inserts.
> I've chosen to implement this initial version as a linked-list node per item, but we can optimise this in future by storing fatter nodes that permit a cache-line's worth of hashes to be checked at once, further reducing the constant factor costs for lookups.