[ https://issues.apache.org/jira/browse/CASSANDRA-16360?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17771183#comment-17771183 ]
Maxim Muzafarov commented on CASSANDRA-16360: --------------------------------------------- [~samt] Hello, I have prepared a patch to replace the CRC32 algorithm with the more efficient CRC32C. This will only be used for a message payload in encoders and decoders. I suggest keeping the CRC24 algorithm for headers as it is used now (see the details below). New benchmarks have also been added to prove that the changes are worthwhile. The implementation consists of the following parts: - CASSANDRA-18780 with a fix for ConnectionBurnTest which is used for the patch verification; - and a code that changes CRC32 to CRC32C; Can you take a look at these changes? [https://github.com/apache/cassandra/pull/2647] h3. Implementation Plan As you mentioned above and in the [Drift backwards compatibility #2|https://lists.apache.org/thread/2h0xtmf6ygqyf89s5hjy65185pplkh43] thread on the dev-list the difference between server-server and client-server protocols, so this patch focuses only on the internode communication protocol, which seems easier for us to change. Replacing the CRC algorithm in the client-server communication will probably require a bump to up the native protocol version, or as I suggested [Drift backwards compatibility #1|https://lists.apache.org/thread/t977l7b4x78zkcr24rn16j8v0jmhrmk5] we can try to implement the feature flags for it, but still, this thing needs a discussion and it is better to move it to another issue. So, my plan here is: - We replace the CRC for the internode messaging protocol in this issue; - We create an issue to investigate possible solutions to fix the exact replacement in the SimpleClient and other drivers (this is related to client-server communication that we have now); h3. Current solution: Replace CRC32 for the payload only Benedict mentioned that we can also try to get rid of CRC24 and replace it with CRC32C. I think this is not possible and it is not worth it for the following reasons. h4. Benchmarks haven't shown any valuable difference. I have implemented the prototype [cassandra-16360-poc|https://github.com/apache/cassandra/compare/trunk...Mmuzaf:cassandra:cassandra-16360-poc#diff-3e4dbb556bbe3df9b3061a15c96c6be539727cb8788d5274e597425b71c0c58aR62] that replaces both CRC24 and CRC32 in an encoder and its corresponding decoder. So, I have prepared and measured [here|https://gist.github.com/Mmuzaf/349c5f8dbdfada82e5199f36cbfc15b4] a few combinations of encoder-decoder pairs: - CRC24 + CRC32 (current) - CRC24 + CRC32C (change for payload) - CRC32C + CRC32C (a new implementation) So, according to benchmarks, we can see a valuable difference between pairs [CRC24 + CRC32] and [CRC32C + CRC32C] and no difference between [CRC24 + CRC32C] and [CRC32C + CRC32C]. This leads me to think that calculating CRC24 over an 8-byte header makes a really small contribution to the total time. ||CRC24 + CRC32||CRC32C + CRC32C|| |1109.830 ± 111.170 ns/op|863.869 ± 20.206 ns/op| ||CRC24 + CRC32C||CRC32C + CRC32C|| |1039.133 ± 61.430 ns/op|1090.281 ± 97.314 ns/op| h4. Replacing CRC24 makes the header longer than 8 bytes So, for the LZ4 encoder and decoder replacing the CRC24 algorithm (3 bytes) with the CRC32C algorithm (4 bytes) makes the header itself longer than 8 bytes. Currently, we use one hop ({{{}readLong{}}}) to read the whole message header. If the message header is longer than 8 bytes, then the efficiency of the replacement itself is not worthwhile because, as it requires two reads instead of one for almost no good reason. h4. Patch complexity and code duplication It is worth noting that adding another implementation of frame decoders and frame encoders is much more difficult and time-consuming to verify on the reviewer's side, thus reducing the chance that the patch will be reviewed :) h3. Benchmarks I have included the FramingBench, which measures encoding and decoding operations, in the patch to verify that we are on the right track. {code:java} [java] Benchmark (payloadSize) (type) Mode Cnt Score Error Units [java] FramingBench.decode 2037 CRC avgt 4 382.764 ± 75.327 ns/op [java] FramingBench.decode 2037 CRC32C avgt 4 272.388 ± 63.618 ns/op [java] FramingBench.decode 2037 LZ4 avgt 4 377.231 ± 20.023 ns/op [java] FramingBench.decode 2037 LZ4_CRC32C avgt 4 278.301 ± 28.004 ns/op [java] FramingBench.decode 16373 CRC avgt 4 1324.664 ± 39.001 ns/op [java] FramingBench.decode 16373 CRC32C avgt 4 1318.600 ± 102.559 ns/op [java] FramingBench.decode 16373 LZ4 avgt 4 1292.971 ± 66.381 ns/op [java] FramingBench.decode 16373 LZ4_CRC32C avgt 4 1283.954 ± 87.084 ns/op [java] FramingBench.encode 2037 CRC avgt 4 382.130 ± 12.318 ns/op [java] FramingBench.encode 2037 CRC32C avgt 4 311.646 ± 17.114 ns/op [java] FramingBench.encode 2037 LZ4 avgt 4 2126.181 ± 287.391 ns/op [java] FramingBench.encode 2037 LZ4_CRC32C avgt 4 2026.204 ± 219.598 ns/op [java] FramingBench.encode 16373 CRC avgt 4 1311.842 ± 102.518 ns/op [java] FramingBench.encode 16373 CRC32C avgt 4 1310.921 ± 139.089 ns/op [java] FramingBench.encode 16373 LZ4 avgt 4 6818.232 ± 200.489 ns/op [java] FramingBench.encode 16373 LZ4_CRC32C avgt 4 6499.657 ± 618.240 ns/op {code} [https://gist.github.com/Mmuzaf/a7ac6f5759c16ee24683856704e7c941] > CRC32 is inefficient on x86 > --------------------------- > > Key: CASSANDRA-16360 > URL: https://issues.apache.org/jira/browse/CASSANDRA-16360 > Project: Cassandra > Issue Type: Improvement > Components: Messaging/Client > Reporter: Avi Kivity > Assignee: Maxim Muzafarov > Priority: Normal > Labels: protocolv6 > Fix For: 5.x > > Time Spent: 10m > Remaining Estimate: 0h > > The client/server protocol specifies CRC24 and CRC32 as the checksum > algorithm (cql_protocol_V5_framing.asc). Those however are expensive to > compute; this affects both the client and the server. > > A better checksum algorithm is CRC32C, which has hardware support on x86 (as > well as other modern architectures). -- This message was sent by Atlassian Jira (v8.20.10#820010) --------------------------------------------------------------------- To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org