[ https://issues.apache.org/jira/browse/CASSANDRA-15299?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17219275#comment-17219275 ]
Sam Tunnicliffe commented on CASSANDRA-15299: --------------------------------------------- [~maedhroz], thanks for the initial comments and apologies for the delay in responding. I've pushed an update to the branch with some rework to the backpressure mechanism and resource management/book keeping that popped up during some perf testing (explanations in the commit messages) I've also made a start on your review comments, covering the *Correctness*, *Testing*, *Zombies* and *Questions* sections (see below). I'll have more patches for the remaining sections tomorrow. {quote}Are we incrementing receivedCount both on the first frame of a large message (in processFirstFrameOfLargeMessage()) and on the last frame (in processSubsequentFrameOfLargeMessage())?{quote} Good catch, removed. {quote}forceOverAllocation() only allocates enough to hit the limit, but is it possible for the subsequent release to de-allocate the full frame size?{quote} Removed forceOverAllocation in b8251bf3b157f29dbed258aa2795728964db7804 {quote}Flusher uses a PayloadAllocator from either the CRC or LZ4 encoder. CRC includes the header and trailer size when it hits the BufferPool, while LZ4 uses the Payload constructor that doesn't take those into account. Is this correct?{quote} It is correct. Using the CRC encoder, you know the exact size of the payload in advance, so we allocate a buffer exactly the right size (payload + header + trailer). The producer then writes directly to that buffer and the encoder updates it in place with the header/trailer when done. For LZ4, this isn’t possible as we don’t know the compressed size up front. So in this case, the payload is sized for the frame content and the encoder allocates another buffer once it does know the compressed size. {quote}PostV5ExceptionHandler - in exceptionCaught(), does payload.release() need to be in a finally block?{quote} Added a finally block {quote}CQLMessageHandler - should processOneContainedMessage() release the frame in a finally block?{quote} Not applicable anymore as we only construct the frame if we’re processing it {quote}It isn't really a correctness issue per-se, but FrameSet would feel safer if we used composition rather than inheritance. (Was the motivator not creating the additional object?){quote} As far as I remember, there was no major motivation for it other than we needed a List<Frame> with some mildly specialized behaviour (I think I had recently been doing something in OutboundConnection, where Deliver extends AtomicInteger). I don’t really mind changing it to use composition, but why do you say that would feel safer? (Totally agreed we should rename to XList though, but I will hold off until actually renaming Frame -> X) {quote}CQLTester - It might be a little awkward to follow the semantics of "require" -> "prepare" -> "initialize". Maybe we should just inline prepareNetwork().{quote} Yep, I don’t recall exactly why I broke it out into 3, rather than 2 pieces. Inlined prepare in require. {quote}The large frame accumulation state machine might be a good unit test target. Lowest level might be a test-only subclass of AbstractMessageHandler.LargeMessage that simulates the cases we're interested in. Another would be to go one level up and test the CQLMessageHandler.LargeMessage, but that pulls all of CQLMessageHandler in, as it's not a static class. (We could make it static, create a little interface to abstract CQLMessageHandler away, etc.){quote} I’ll follow up on using a test-only subclass to exercise the accumulation. {quote}It might be worth pulling up WaitQueue (which is already static) as a top-level class, then throw a few unit-level tests around it it.{quote} I agree that it might be nice to unit test this, but I’m not sure I would prioritize it, given that the class is exercised by existing unit and burn tests, plus has been used during extensive perf testing of the messaging subsystem when validating CASSANDRA-15066 {quote}StartupMessage - There's enough logic that it might be nice to unit test execute(). (Mocking Connection and checking replays, etc.){quote} Similarly, I agree that we could add direct test coverage for this but I don’t see it as super high priority as all the functionality should be covered by other unit or dtests (I will check that though). {quote}transport.Frame.Header.BODY_LENGTH_SIZE is unused{quote} It’s used again now that I reverted some of the changes to Frame.Decoder in b8251bf3b157f29dbed258aa2795728964db7804 {quote}PasswordAuthenticatorTest - import java.net.InetSocketAddress is unused{quote} Removed, thanks {quote}OutboundConnection - import java.util.stream.Stream is unused{quote} I don’t think I’ve touched that file {quote}StressSettings - import com.datastax.driver.core.Metadata and import com.google.common.collect.ImmutableMap are unused{quote} Removed, thanks {quote}FrameCompressor.LZ4Compressor - compress() doesn't need the throws clause, and outputOffset + 0 could be simplified{quote} Done {quote}ProtocolNegotiationTest - can remove the dead throws clauses from a few tests{quote} Removed, thanks. {quote}ProtocolErrorTest - testErrorMessageWithNullString() doesn't need throws clause{quote} Removed, thanks {quote}CQLMessageHandler - UNKNOWN_STREAM_ID and import org.apache.cassandra.net.Crc are unused{quote} Both removed in b8251bf3b157f29dbed258aa2795728964db7804 {quote}FrameEncoder.Payload#isEmpty() is unused{quote} Removed, thanks {quote}InflightRequestPayloadTrackerTestGLOBAL_LIMITS is unused{quote} Removed in 409ffb584c3977b190a355ac388eb7f831db15be {quote}CQLConnectionTest - extra semicolon at the end of handleErrorDuringNegotiation() and handleCorruptionAfterNegotiation(), and TestConsumer#scheduled is unused{quote} Thanks, the semicolons got cleaned up in a refactor and I’ve removed the unused variable. {quote}Frame.Decoder - decodeHeader and decodeFrame don't need a throws clause{quote} Thanks, removed them from the equivalent methods following refactor {quote}Frame.Encoder - dead semicolon after private constructor{quote} Removed, thanks {quote}ProtocolVersion - assignment of SUPPORTED has a redundant cast to ProtocolVersion[]{quote} Removed, thanks {quote}Looking at https://users.ece.cmu.edu/~koopman/crc/c32/0x82608edb_len.txt, does a payload size of 128 KB with crc32 mean we have a HD=3 and will detect all 1 and 2 bit errors?{quote} I refer you to [~benedict] :) https://issues.apache.org/jira/browse/CASSANDRA-15299?focusedCommentId=17113865&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-17113865 {quote}Is there anything behind the default stating capacity of FrameSet being 5?{quote} If there was, I’m afraid I don’t recall it. We could make this smarter by sampling the typical per-frame message counts and adjusting the initial capacity, maybe as a follow up item? > CASSANDRA-13304 follow-up: improve checksumming and compression in protocol > v5-beta > ----------------------------------------------------------------------------------- > > Key: CASSANDRA-15299 > URL: https://issues.apache.org/jira/browse/CASSANDRA-15299 > Project: Cassandra > Issue Type: Improvement > Components: Messaging/Client > Reporter: Aleksey Yeschenko > Assignee: Sam Tunnicliffe > Priority: Normal > Labels: protocolv5 > Fix For: 4.0-alpha > > > CASSANDRA-13304 made an important improvement to our native protocol: it > introduced checksumming/CRC32 to request and response bodies. It’s an > important step forward, but it doesn’t cover the entire stream. In > particular, the message header is not covered by a checksum or a crc, which > poses a correctness issue if, for example, {{streamId}} gets corrupted. > Additionally, we aren’t quite using CRC32 correctly, in two ways: > 1. We are calculating the CRC32 of the *decompressed* value instead of > computing the CRC32 on the bytes written on the wire - losing the properties > of the CRC32. In some cases, due to this sequencing, attempting to decompress > a corrupt stream can cause a segfault by LZ4. > 2. When using CRC32, the CRC32 value is written in the incorrect byte order, > also losing some of the protections. > See https://users.ece.cmu.edu/~koopman/pubs/KoopmanCRCWebinar9May2012.pdf for > explanation for the two points above. > Separately, there are some long-standing issues with the protocol - since > *way* before CASSANDRA-13304. Importantly, both checksumming and compression > operate on individual message bodies rather than frames of multiple complete > messages. In reality, this has several important additional downsides. To > name a couple: > # For compression, we are getting poor compression ratios for smaller > messages - when operating on tiny sequences of bytes. In reality, for most > small requests and responses we are discarding the compressed value as it’d > be smaller than the uncompressed one - incurring both redundant allocations > and compressions. > # For checksumming and CRC32 we pay a high overhead price for small messages. > 4 bytes extra is *a lot* for an empty write response, for example. > To address the correctness issue of {{streamId}} not being covered by the > checksum/CRC32 and the inefficiency in compression and checksumming/CRC32, we > should switch to a framing protocol with multiple messages in a single frame. > I suggest we reuse the framing protocol recently implemented for internode > messaging in CASSANDRA-15066 to the extent that its logic can be borrowed, > and that we do it before native protocol v5 graduates from beta. See > https://github.com/apache/cassandra/blob/trunk/src/java/org/apache/cassandra/net/FrameDecoderCrc.java > and > https://github.com/apache/cassandra/blob/trunk/src/java/org/apache/cassandra/net/FrameDecoderLZ4.java. -- This message was sent by Atlassian Jira (v8.3.4#803005) --------------------------------------------------------------------- To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org