[ https://issues.apache.org/jira/browse/CASSANDRA-15299?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17226048#comment-17226048 ]
Alex Petrov commented on CASSANDRA-15299: ----------------------------------------- Patch looks good to me, I have only several comments: * in {{CQLMessageHeader#processLargeMessage}}, we seem to be creating a {{LargeMessage}} object to call {{processCqlFrame()}} over a frame that is assembled from a single byte buffer. Maybe we can just call {{processCqlFrame}} directly? * this is probably something that we should address outside this ticket, but still: "corrupt frame recovered" seems to be slightly misleading wording, same as {{Frame#recoverable}}. We can not recover the frame itself, we just can skip/drop it. Maybe we can rename this, along with metrics in Messaging, before they become public in 4.0. * in {{CQLMessageHeader#processCqlFrame}}, we only call {{handleErrorAndRelease}}. However, it may theoretically happen that we fail before we finish {{messageDecoder.decode(channel, frame)}}. Maybe we can do something like the [1], to make it consistent with what we do in {{ProtocolDecoder#decode}}? * in {{CQLMessageHeader$LargeMessage#onComplete}}, we wrap {{processCqlFrame(assembleFrame()}} call in try/catch which is identical to try/catch block from {{processCqlFrame}} itself. Just checking if this is intended. * in {{Dispatcher#processRequest}}, we can slightly simplify code by declaring {{FlushItem<?> flushItem}} variable outside try/catch block and assigning a corresponding value in try or catch, and only calling {{flush}} once. * during sever bootstrap initialization, we're using deprecated low/high watermark child options, probably we should use {{.childOption(ChannelOption.WRITE_BUFFER_WATER_MARK, new WriteBufferWaterMark(8 * 1024, 32 * 1024))}} instead. * {{SimpleClientBurnTest#random}} is unused If this is helpful, I've also put nits mentioned above (and a couple cleanups) in a commit [here|https://github.com/apache/cassandra/commit/76ee6e00f42446d94679e9c9001c81ebfa9418ab]. Not sure if this is helpful, but I've also put together a v5 flow chart, which might be helpful if anyone wants a quick overview of what's going on in v5. [1] {code} protected void processCqlFrame(Frame frame) { M message = null; try { message = messageDecoder.decode(channel, frame); dispatcher.accept(channel, message, this::toFlushItem); } catch (Exception e) { if (message == null) frame.release(); handleErrorAndRelease(e, frame.header); } } {code} > CASSANDRA-13304 follow-up: improve checksumming and compression in protocol > v5-beta > ----------------------------------------------------------------------------------- > > Key: CASSANDRA-15299 > URL: https://issues.apache.org/jira/browse/CASSANDRA-15299 > Project: Cassandra > Issue Type: Improvement > Components: Messaging/Client > Reporter: Aleksey Yeschenko > Assignee: Sam Tunnicliffe > Priority: Normal > Labels: protocolv5 > Fix For: 4.0-alpha > > Attachments: Process CQL Frame.png, V5 Flow Chart.png > > > CASSANDRA-13304 made an important improvement to our native protocol: it > introduced checksumming/CRC32 to request and response bodies. It’s an > important step forward, but it doesn’t cover the entire stream. In > particular, the message header is not covered by a checksum or a crc, which > poses a correctness issue if, for example, {{streamId}} gets corrupted. > Additionally, we aren’t quite using CRC32 correctly, in two ways: > 1. We are calculating the CRC32 of the *decompressed* value instead of > computing the CRC32 on the bytes written on the wire - losing the properties > of the CRC32. In some cases, due to this sequencing, attempting to decompress > a corrupt stream can cause a segfault by LZ4. > 2. When using CRC32, the CRC32 value is written in the incorrect byte order, > also losing some of the protections. > See https://users.ece.cmu.edu/~koopman/pubs/KoopmanCRCWebinar9May2012.pdf for > explanation for the two points above. > Separately, there are some long-standing issues with the protocol - since > *way* before CASSANDRA-13304. Importantly, both checksumming and compression > operate on individual message bodies rather than frames of multiple complete > messages. In reality, this has several important additional downsides. To > name a couple: > # For compression, we are getting poor compression ratios for smaller > messages - when operating on tiny sequences of bytes. In reality, for most > small requests and responses we are discarding the compressed value as it’d > be smaller than the uncompressed one - incurring both redundant allocations > and compressions. > # For checksumming and CRC32 we pay a high overhead price for small messages. > 4 bytes extra is *a lot* for an empty write response, for example. > To address the correctness issue of {{streamId}} not being covered by the > checksum/CRC32 and the inefficiency in compression and checksumming/CRC32, we > should switch to a framing protocol with multiple messages in a single frame. > I suggest we reuse the framing protocol recently implemented for internode > messaging in CASSANDRA-15066 to the extent that its logic can be borrowed, > and that we do it before native protocol v5 graduates from beta. See > https://github.com/apache/cassandra/blob/trunk/src/java/org/apache/cassandra/net/FrameDecoderCrc.java > and > https://github.com/apache/cassandra/blob/trunk/src/java/org/apache/cassandra/net/FrameDecoderLZ4.java. -- This message was sent by Atlassian Jira (v8.3.4#803005) --------------------------------------------------------------------- To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org