[ https://issues.apache.org/jira/browse/CASSANDRA-17240?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17488240#comment-17488240 ]
Branimir Lambov commented on CASSANDRA-17240:
---------------------------------------------

Attached some performance data comparing the new trie memtable with the legacy skip list one.

The test we ran is a density test which runs a 90:10 write:read workload with 100-byte payloads up to over 1TB of data on an {{i3.4xlarge}} instance, with the following settings to remove some of the biggest throughput bottlenecks:

{code:java}
memtable_allocation_type: offheap_objects
memtable_flush_writers: 8
memtable_heap_space_in_mb: 16384
memtable_offheap_space_in_mb: 16384
concurrent_reads: 256
concurrent_writes: 256
commitlog_total_space_in_mb: 51200
commitlog_segment_size_in_mb: 320
commitlog_compression:
  class_name: LZ4Compressor
disk_access_mode: mmap_index_only
file_cache_size_in_mb: 8192
compaction_throughput_mb_per_sec: 0
concurrent_compactors: 30
{code}

The test is meant to measure sustained throughput, and with the current C* code it is quickly limited by the performance of compaction (compaction cannot keep up, sstables accumulate, and reads start dominating the time). The throughput stage graph looks like this:

!throughput_apache.png!

{{TrieMemtable}} (in red) starts off with double the performance of the legacy {{SkipListMemtable}} (in orange), and maintains a significant lead throughout the test.

We have previously seen a significant improvement in throughput when memtables are sharded, so we also tested two sharded variations of the skip list solution, with and without locking. Both versions lead over the unsharded skip list, but are far from the performance of the new solution. (Note: the locking version (in green), which gives compaction threads more chances to run, catches up with the trie memtable towards the end of the test, when throughput is completely dominated by the effects of compaction.)

With improved and tuned compaction (using further improvements we intend to port to C*), the trie memtable maintains ~2.3x better throughput:

!throughput_SG.png!
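For readers unfamiliar with the sharded skip-list variants mentioned above, here is a minimal sketch of the general idea, with and without per-shard locking. It is an illustration only and not the code that was tested: the class name {{ShardedSkipListSketch}}, the string keys and values, and the choice of picking a shard by key hash are all assumptions made for the example.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ConcurrentSkipListMap;
import java.util.concurrent.locks.ReentrantLock;

/** Hypothetical illustration of a sharded skip-list map; not Cassandra code. */
final class ShardedSkipListSketch {
    private final List<ConcurrentSkipListMap<String, String>> shards = new ArrayList<>();
    private final List<ReentrantLock> locks = new ArrayList<>();
    private final boolean locking;

    ShardedSkipListSketch(int shardCount, boolean locking) {
        this.locking = locking;
        for (int i = 0; i < shardCount; i++) {
            shards.add(new ConcurrentSkipListMap<>());
            locks.add(new ReentrantLock());
        }
    }

    // Assumption: shard chosen by key hash. Each shard is an independent
    // skip list, so writers to different shards do not contend.
    private int shardFor(String key) {
        return Math.floorMod(key.hashCode(), shards.size());
    }

    void put(String key, String value) {
        int i = shardFor(key);
        if (!locking) {
            // Lock-free variant: rely on the concurrent skip list itself.
            shards.get(i).put(key, value);
            return;
        }
        // Locking variant: one writer per shard at a time; writers that
        // cannot get the lock block instead of competing for the CPU.
        ReentrantLock lock = locks.get(i);
        lock.lock();
        try {
            shards.get(i).put(key, value);
        } finally {
            lock.unlock();
        }
    }

    String get(String key) {
        return shards.get(shardFor(key)).get(key);
    }

    int size() {
        int total = 0;
        for (ConcurrentSkipListMap<String, String> shard : shards) {
            total += shard.size();
        }
        return total;
    }
}
```

In this sketch the two variants differ only in the write path, which is one plausible reading of why the locking version leaves more CPU time for other threads such as compaction.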
One interesting aspect of the comparison is the heap behavior, especially old generation sizes, during the throughput stage.

{{SkipListMemtable}}:

!SkipListMemtable-OSS.png!

vs. {{TrieMemtable}}:

!TrieMemtable-OSS.png!

The total garbage collection time through all stages of the test is more than halved.

Additionally, the new memtable is able to accept more data for the same memory allocation, which results in 30% bigger L0 sstables, reducing the number of sstables and the need for compaction and further improving performance.

> CEP-19: Trie memtable implementation
> ------------------------------------
>
> Key: CASSANDRA-17240
> URL: https://issues.apache.org/jira/browse/CASSANDRA-17240
> Project: Cassandra
> Issue Type: Improvement
> Components: Local/Memtable
> Reporter: Branimir Lambov
> Priority: Normal
> Attachments: SkipListMemtable-OSS.png, TrieMemtable-OSS.png, density_SG.html.gz, density_test_with_sharding.html.gz, throughput_SG.png, throughput_apache.png
>
>
> Trie-based memtable implementation as described in CEP-19, built on top of CASSANDRA-17034 and CASSANDRA-6936.
> The implementation is available in this [branch|https://github.com/blambov/cassandra/tree/CASSANDRA-17240].