[ https://issues.apache.org/jira/browse/CASSANDRA-15299?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17220921#comment-17220921 ]
Sam Tunnicliffe edited comment on CASSANDRA-15299 at 10/29/20, 11:11 AM:
-------------------------------------------------------------------------

Here are some numbers from running tlp-stress against 4.0-beta2 vs the 15299 branch {{70923bae4}} with protocol v4 & v5. The dataset was sized to be mostly in-memory, reading/writing at {{CL.ONE}}. Each workload ran for 10 minutes, with a 1 minute warmup, on a 3 node cluster. Both latency and throughput are pretty much on a par, with v5 showing a bit of an improvement where the workload is amenable to compression. Memory usage and GC were pretty much the same too. If anything, during the v5 runs the servers spent less time in GC, but the difference was so close as to not be significant. There are definitely some places where we can improve the performance of v5, but I don't think anything here indicates a serious regression in either v4 or v5 performance. Given the additional integrity checks in v5, I think not regressing meets the perf bar here, at least in the first instance.

h4. 100% read workload, reading a single row at a time

{code:java}
                  Reads
                  Count      Latency (p99)  1min (req/s)
4.0-beta2 V4      135449878  72.43          224222.86
15299 V4          138424112  68.30          229765.45
15299 V5          137618437  70.35          231602.36
4.0-beta2 V4 LZ4  103348953  66.68          173437.38
15299 V4 LZ4      105114560  68.83          176192.36
15299 V5 LZ4      131833462  70.19          222092.99
{code}

h4. Mixed r/w workload (50/50). K/V with blobs up to 65KB

{code:java}
                  Writes                                 | Reads
                  Count     Latency (p99)  1min (req/s)  | Count     Latency (p99)  1min (req/s)
4.0-beta2 V4      34455009  76.59          56557.78      | 34464217  74.85          56546.40
15299 V4          33368171  74.90          54859.94      | 33361940  67.77          54848.57
15299 V5          32991815  76.00          54780.74      | 33001153  76.72          54856.99
4.0-beta2 V4 LZ4  32152220  83.41          53306.03      | 32147206  83.22          53334.00
15299 V4 LZ4      31158895  71.01          51106.60      | 31153453  72.75          51087.01
15299 V5 LZ4      32634296  75.73          54370.71      | 32644765  76.70          54396.15
{code}

h4. Mixed r/w/d workload (60/30/10). Wide rows with values up to 200 bytes, slicing on both the selects and deletions

{code:java}
                  Writes                                 | Reads                                 | Deletes
                  Count     Latency (p99)  1min (req/s)  | Count    Latency (p99)  1min (req/s) | Count    Latency (p99)  1min (req/s)
4.0-beta2 V4      18725688  197.47         25377.16      | 9357971  193.86         12687.94     | 3117394  178.44        4221.90
15299 V4          17975636  185.88         24160.78      | 8986125  184.97         12087.13     | 2995562  179.99        4023.88
15299 V5          18429252  192.91         25349.35      | 9223277  188.15         12678.46     | 3073312  184.24        4232.23
4.0-beta2 V4 LZ4  18407719  179.94         25160.16      | 9197664  178.25         12575.03     | 3068134  180.90        4195.38
15299 V4 LZ4      17678994  171.39         24073.09      | 8842952  196.06         12064.18     | 2947344  170.53        4026.37
15299 V5 LZ4      18274085  208.57         25127.02      | 9138491  163.23         12558.28     | 3045264  203.54        4188.88
{code}

> CASSANDRA-13304 follow-up: improve checksumming and compression in protocol v5-beta
> -----------------------------------------------------------------------------------
>
>          Key: CASSANDRA-15299
>          URL: https://issues.apache.org/jira/browse/CASSANDRA-15299
>      Project: Cassandra
>   Issue Type: Improvement
>   Components: Messaging/Client
>     Reporter: Aleksey Yeschenko
>     Assignee: Sam Tunnicliffe
>     Priority: Normal
>       Labels: protocolv5
>      Fix For: 4.0-alpha
>
> CASSANDRA-13304 made an important improvement to our native protocol: it introduced checksumming/CRC32 for request and response bodies. It's an important step forward, but it doesn't cover the entire stream.
> In particular, the message header is not covered by a checksum or CRC, which poses a correctness issue if, for example, the {{streamId}} gets corrupted.
>
> Additionally, we aren't quite using CRC32 correctly, in two ways:
> 1. We are calculating the CRC32 of the *decompressed* value instead of computing the CRC32 of the bytes written on the wire, losing the properties of the CRC32. In some cases, due to this sequencing, attempting to decompress a corrupt stream can cause a segfault in LZ4.
> 2. The CRC32 value is written in the incorrect byte order, also losing some of the protections.
>
> See https://users.ece.cmu.edu/~koopman/pubs/KoopmanCRCWebinar9May2012.pdf for an explanation of the two points above.
>
> Separately, there are some long-standing issues with the protocol - since *way* before CASSANDRA-13304. Importantly, both checksumming and compression operate on individual message bodies rather than frames of multiple complete messages. This has several important additional downsides. To name a couple:
> # For compression, we get poor compression ratios for smaller messages - when operating on tiny sequences of bytes. In practice, for most small requests and responses we end up discarding the compressed value as it'd be larger than the uncompressed one, incurring both redundant allocations and compressions.
> # For checksumming and CRC32 we pay a high overhead price for small messages: 4 extra bytes is *a lot* for an empty write response, for example.
>
> To address the correctness issue of {{streamId}} not being covered by the checksum/CRC32, and the inefficiency in compression and checksumming/CRC32, we should switch to a framing protocol with multiple messages in a single frame. I suggest we reuse the framing protocol recently implemented for internode messaging in CASSANDRA-15066 to the extent that its logic can be borrowed, and that we do it before native protocol v5 graduates from beta.
> See https://github.com/apache/cassandra/blob/trunk/src/java/org/apache/cassandra/net/FrameDecoderCrc.java and https://github.com/apache/cassandra/blob/trunk/src/java/org/apache/cassandra/net/FrameDecoderLZ4.java

--
This message was sent by Atlassian Jira
(v8.3.4#803005)
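The CRC32 points above (checksum the bytes actually written on the wire, and write the CRC value in a fixed, documented byte order) can be illustrated with a minimal sketch using the JDK's {{java.util.zip.CRC32}}. This is not Cassandra's implementation - the class name and little-endian trailer are assumptions for illustration only:

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.util.zip.CRC32;

public class WireCrcSketch {
    // Checksum the exact bytes that go on the wire (i.e. the
    // post-compression payload), not the decompressed value, so the
    // CRC can be verified before any decompression is attempted.
    static int crcOfWireBytes(byte[] wireBytes) {
        CRC32 crc = new CRC32();
        crc.update(wireBytes, 0, wireBytes.length);
        return (int) crc.getValue();
    }

    public static void main(String[] args) {
        byte[] wirePayload = "frame payload".getBytes();
        int crc = crcOfWireBytes(wirePayload);
        // Append the CRC trailer in one fixed byte order (little-endian
        // here, as an assumption) so readers verify it unambiguously.
        ByteBuffer trailer = ByteBuffer.allocate(4).order(ByteOrder.LITTLE_ENDIAN);
        trailer.putInt(crc);
        System.out.println(Integer.toHexString(crc));
    }
}
```

Verifying the CRC over wire bytes before decompressing also sidesteps the segfault scenario described above, since corrupt input never reaches LZ4.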