[jira] [Comment Edited] (CASSANDRA-17240) CEP-19: Trie memtable implementation

Branimir Lambov (Jira) Mon, 07 Feb 2022 08:38:07 -0800


    [ 
https://issues.apache.org/jira/browse/CASSANDRA-17240?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17488240#comment-17488240
 ]

Branimir Lambov edited comment on CASSANDRA-17240 at 2/7/22, 4:37 PM:
----------------------------------------------------------------------

I have attached some performance data comparing the new trie memtable with the 
legacy skip-list one. The test is a density test that runs a 90:10 write:read 
workload with 100-byte payloads until the data set exceeds 1 TB, on an 
{{i3.4xlarge}} instance, with the following settings chosen to remove some of 
the biggest throughput bottlenecks:
{code:java}
memtable_allocation_type: offheap_objects
memtable_flush_writers: 8
memtable_heap_space_in_mb: 16384
memtable_offheap_space_in_mb: 16384
concurrent_reads: 256
concurrent_writes: 256
commitlog_total_space_in_mb: 51200
commitlog_segment_size_in_mb: 320
commitlog_compression:
        class_name: LZ4Compressor
disk_access_mode: mmap_index_only
file_cache_size_in_mb: 8192
compaction_throughput_mb_per_sec: 0
concurrent_compactors: 30
{code}
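As a quick sanity check on the settings above, the following Python sketch 
derives a few totals from them. The dictionary only restates values from the 
listing; the function and key names are illustrative, and treating the sum of 
the heap and off-heap limits as a combined memtable budget is an assumption 
(an upper bound), not something taken from the test harness.

```python
# Values copied from the cassandra.yaml fragment above (all sizes in MiB).
settings = {
    "memtable_heap_space_in_mb": 16384,
    "memtable_offheap_space_in_mb": 16384,
    "commitlog_total_space_in_mb": 51200,
    "commitlog_segment_size_in_mb": 320,
    "file_cache_size_in_mb": 8192,
}


def derived_totals(cfg):
    """Return simple totals derived from the settings (illustrative only)."""
    # Assumption: heap + off-heap limits added together as an upper bound.
    memtable_mib = (cfg["memtable_heap_space_in_mb"]
                    + cfg["memtable_offheap_space_in_mb"])
    # Whole commit log segments that fit in the total commit log space.
    segments = (cfg["commitlog_total_space_in_mb"]
                // cfg["commitlog_segment_size_in_mb"])
    return {
        "memtable_budget_gib": memtable_mib / 1024,
        "max_commitlog_segments": segments,
        "file_cache_gib": cfg["file_cache_size_in_mb"] / 1024,
    }


totals = derived_totals(settings)
print(totals)
```

With these values the combined memtable limit comes to 32 GiB and the commit 
log space holds 160 segments of 320 MiB each. Separately, 
{{compaction_throughput_mb_per_sec: 0}} disables compaction throttling.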
The test is meant to measure sustained throughput and, with the current C* 
code, it is quickly limited by the performance of compaction (compaction cannot 
keep up, sstables accumulate, and reads start to dominate the run time). The 
graph for the throughput stage looks like this:

!throughput_apache.png!

{{TrieMemtable}} (in red) starts off with double the performance of the legacy 
{{SkipListMemtable}} (in orange) and maintains a significant lead throughout 
the test. We have previously seen a significant improvement in throughput when 
memtables are sharded, so we also tested two sharded variations of the 
skip-list solution, with and without locking. Both versions lead over the 
unsharded skip list, but remain far from the performance of the new solution. 
(Note: the locking version (in green), which gives compaction threads more 
chances to run, catches up with the trie memtable towards the end of the test, 
when the run is completely dominated by the effects of compaction.)

With improved and tuned compaction (using further improvements we intend to 
port to C*), the trie memtable maintains ~2.3x better throughput:

!throughput_SG.png!

One interesting aspect of the comparison is the heap behavior, especially old 
generation sizes, during the throughput stage.

{{SkipListMemtable}}:
!SkipListMemtable-OSS.png!
vs. {{TrieMemtable}}:
!TrieMemtable-OSS.png!
The total garbage collection time through all stages of the test is more than 
halved.

Additionally, the new memtable is able to accept more data for the same memory 
allocation, which results in 30% bigger L0 sstables, reducing the number of 
sstables and the need for compaction and further improving performance.
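
To make the effect of the larger flushes concrete, here is a small 
back-of-the-envelope sketch in Python. It assumes that a fixed volume of data 
is flushed and that every L0 sstable grows by exactly 30%; the function names 
and the 1 TB / 1 GB figures in the usage are hypothetical and chosen only for 
illustration.

```python
import math


def l0_sstable_count(total_bytes, sstable_bytes):
    """Number of L0 sstables needed to hold total_bytes at a given flush size."""
    return math.ceil(total_bytes / sstable_bytes)


def relative_sstable_count(size_increase):
    """Ratio of sstable counts when each sstable is (1 + size_increase)x larger."""
    return 1.0 / (1.0 + size_increase)


# 30% bigger sstables (the figure reported in the comment above).
ratio = relative_sstable_count(0.30)
print(f"relative sstable count: {ratio:.3f}")

# Hypothetical illustration: 1 TB of flushed data, 1 GB baseline flush size.
baseline = l0_sstable_count(10**12, 10**9)
larger = l0_sstable_count(10**12, 1_300_000_000)
print(baseline, larger)
```

Under these assumptions the same data fits in roughly 23% fewer L0 sstables, 
which is where the reduced need for compaction comes from.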

> CEP-19: Trie memtable implementation
> ------------------------------------
>
>                 Key: CASSANDRA-17240
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-17240
>             Project: Cassandra
>          Issue Type: Improvement
>          Components: Local/Memtable
>            Reporter: Branimir Lambov
>            Priority: Normal
>         Attachments: SkipListMemtable-OSS.png, TrieMemtable-OSS.png, 
> density_SG.html.gz, density_test_with_sharding.html.gz, throughput_SG.png, 
> throughput_apache.png
>
>
> Trie-based memtable implementation as described in CEP-19, built on top of 
> CASSANDRA-17034 and CASSANDRA-6936.
> The implementation is available in this 
> [branch|https://github.com/blambov/cassandra/tree/CASSANDRA-17240].

--
This message was sent by Atlassian Jira
(v8.20.1#820001)

---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org
