[ 
https://issues.apache.org/jira/browse/CASSANDRA-15299?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17219275#comment-17219275
 ] 

Sam Tunnicliffe commented on CASSANDRA-15299:
---------------------------------------------

[~maedhroz], thanks for the initial comments and apologies for the delay in 
responding. 
I've pushed an update to the branch with some rework to the backpressure 
mechanism and resource management/book keeping that popped up during some perf 
testing (explanations in the commit messages)

I've also made a start on your review comments, covering the *Correctness*, 
*Testing*, *Zombies* and *Questions* sections (see below). I'll have more 
patches for the remaining sections tomorrow.


{quote}Are we incrementing receivedCount both on the first frame of a large 
message (in processFirstFrameOfLargeMessage()) and on the last frame (in 
processSubsequentFrameOfLargeMessage())?{quote}

Good catch, removed.

{quote}forceOverAllocation() only allocates enough to hit the limit, but is it 
possible for the subsequent release to de-allocate the full frame size?{quote}

Removed forceOverAllocation in b8251bf3b157f29dbed258aa2795728964db7804

{quote}Flusher uses a PayloadAllocator from either the CRC or LZ4 encoder. CRC 
includes the header and trailer size when it hits the BufferPool, while LZ4 
uses the Payload constructor that doesn't take those into account. Is this 
correct?{quote}

It is correct. Using the CRC encoder, you know the exact size of the payload in 
advance, so we allocate a buffer exactly the right size (payload + header + 
trailer). The producer then writes directly to that buffer and the encoder 
updates it in place with the header/trailer when done. For LZ4, this isn’t 
possible as we don’t know the compressed size up front. So in this case, the 
payload is sized for the frame content and the encoder allocates another buffer 
once it does know the compressed size. 

{quote}PostV5ExceptionHandler - in exceptionCaught(), does payload.release() 
need to be in a finally block?{quote}

Added a finally block

{quote}CQLMessageHandler - should processOneContainedMessage() release the 
frame in a finally block?{quote}

Not applicable anymore as we only construct the frame if we’re processing it

{quote}It isn't really a correctness issue per-se, but FrameSet would feel 
safer if we used composition rather than inheritance. (Was the motivator not 
creating the additional object?){quote}

As far as I remember, there was no major motivation for it other than we needed 
a List<Frame> with some mildly specialized behaviour (I think I had recently 
been doing something in OutboundConnection, where Deliver extends 
AtomicInteger). I don’t really mind changing it to use composition, but why do 
you say that would feel safer? (Totally agreed we should rename to XList 
though, but I will hold off until actually renaming Frame -> X)


{quote}CQLTester - It might be a little awkward to follow the semantics of 
"require" -> "prepare" -> "initialize". Maybe we should just inline 
prepareNetwork().{quote}

Yep, I don’t recall exactly why I broke it out into 3, rather than 2 pieces. 
Inlined prepare in require.

{quote}The large frame accumulation state machine might be a good unit test 
target. Lowest level might be a test-only subclass of 
AbstractMessageHandler.LargeMessage that simulates the cases we're interested 
in. Another would be to go one level up and test the 
CQLMessageHandler.LargeMessage, but that pulls all of CQLMessageHandler in, as 
it's not a static class. (We could make it static, create a little interface to 
abstract CQLMessageHandler away, etc.){quote}

I’ll follow up on using a test-only subclass to exercise the accumulation.

{quote}It might be worth pulling up WaitQueue (which is already static) as a 
top-level class, then throw a few unit-level tests around it it.{quote}

I agree that it might be nice to unit test this, but I’m not sure I would 
prioritize it, given that the class is exercised by existing unit and burn 
tests, plus has been used during extensive perf testing of the messaging 
subsystem when validating CASSANDRA-15066

{quote}StartupMessage - There's enough logic that it might be nice to unit test 
execute(). (Mocking Connection and checking replays, etc.){quote}

Similarly, I agree that we could add direct test coverage for this but I don’t 
see it as super high priority as all the functionality should be covered by 
other unit or dtests (I will check that though).


{quote}transport.Frame.Header.BODY_LENGTH_SIZE is unused{quote}

It’s used again now that I reverted some of the changes to Frame.Decoder in 
b8251bf3b157f29dbed258aa2795728964db7804

{quote}PasswordAuthenticatorTest - import java.net.InetSocketAddress is 
unused{quote}

Removed, thanks

{quote}OutboundConnection - import java.util.stream.Stream is unused{quote}

I don’t think I’ve touched that file

{quote}StressSettings - import com.datastax.driver.core.Metadata and import 
com.google.common.collect.ImmutableMap are unused{quote}

Removed, thanks

{quote}FrameCompressor.LZ4Compressor - compress() doesn't need the throws 
clause, and outputOffset + 0 could be simplified{quote}

Done

{quote}ProtocolNegotiationTest - can remove the dead throws clauses from a few 
tests{quote}

Removed, thanks.

{quote}ProtocolErrorTest - testErrorMessageWithNullString() doesn't need throws 
clause{quote}

Removed, thanks

{quote}CQLMessageHandler - UNKNOWN_STREAM_ID and import 
org.apache.cassandra.net.Crc are unused{quote}

Both removed in b8251bf3b157f29dbed258aa2795728964db7804

{quote}FrameEncoder.Payload#isEmpty() is unused{quote}

Removed, thanks

{quote}InflightRequestPayloadTrackerTestGLOBAL_LIMITS is unused{quote}

Removed in 409ffb584c3977b190a355ac388eb7f831db15be

{quote}CQLConnectionTest - extra semicolon at the end of 
handleErrorDuringNegotiation() and handleCorruptionAfterNegotiation(), and 
TestConsumer#scheduled is unused{quote}

Thanks, the semicolons got cleaned up in a refactor and I’ve removed the unused 
variable.

{quote}Frame.Decoder - decodeHeader and decodeFrame don't need a throws 
clause{quote}

Thanks, removed them from the equivalent methods following refactor

{quote}Frame.Encoder - dead semicolon after private constructor{quote}

Removed, thanks

{quote}ProtocolVersion - assignment of SUPPORTED has a redundant cast to 
ProtocolVersion[]{quote}

Removed, thanks


{quote}Looking at 
https://users.ece.cmu.edu/~koopman/crc/c32/0x82608edb_len.txt, does a payload 
size of 128 KB with crc32 mean we have a HD=3 and will detect all 1 and 2 bit 
errors?{quote}

I refer you to [~benedict] :) 
https://issues.apache.org/jira/browse/CASSANDRA-15299?focusedCommentId=17113865&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-17113865

{quote}Is there anything behind the default stating capacity of FrameSet being 
5?{quote}

If there was, I’m afraid I don’t recall it. We could make this smarter by 
sampling the typical per-frame message counts and adjusting the initial 
capacity, maybe as a follow up item?


> CASSANDRA-13304 follow-up: improve checksumming and compression in protocol 
> v5-beta
> -----------------------------------------------------------------------------------
>
>                 Key: CASSANDRA-15299
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-15299
>             Project: Cassandra
>          Issue Type: Improvement
>          Components: Messaging/Client
>            Reporter: Aleksey Yeschenko
>            Assignee: Sam Tunnicliffe
>            Priority: Normal
>              Labels: protocolv5
>             Fix For: 4.0-alpha
>
>
> CASSANDRA-13304 made an important improvement to our native protocol: it 
> introduced checksumming/CRC32 to request and response bodies. It’s an 
> important step forward, but it doesn’t cover the entire stream. In 
> particular, the message header is not covered by a checksum or a crc, which 
> poses a correctness issue if, for example, {{streamId}} gets corrupted.
> Additionally, we aren’t quite using CRC32 correctly, in two ways:
> 1. We are calculating the CRC32 of the *decompressed* value instead of 
> computing the CRC32 on the bytes written on the wire - losing the properties 
> of the CRC32. In some cases, due to this sequencing, attempting to decompress 
> a corrupt stream can cause a segfault by LZ4.
> 2. When using CRC32, the CRC32 value is written in the incorrect byte order, 
> also losing some of the protections.
> See https://users.ece.cmu.edu/~koopman/pubs/KoopmanCRCWebinar9May2012.pdf for 
> explanation for the two points above.
> Separately, there are some long-standing issues with the protocol - since 
> *way* before CASSANDRA-13304. Importantly, both checksumming and compression 
> operate on individual message bodies rather than frames of multiple complete 
> messages. In reality, this has several important additional downsides. To 
> name a couple:
> # For compression, we are getting poor compression ratios for smaller 
> messages - when operating on tiny sequences of bytes. In reality, for most 
> small requests and responses we are discarding the compressed value as it’d 
> be smaller than the uncompressed one - incurring both redundant allocations 
> and compressions.
> # For checksumming and CRC32 we pay a high overhead price for small messages. 
> 4 bytes extra is *a lot* for an empty write response, for example.
> To address the correctness issue of {{streamId}} not being covered by the 
> checksum/CRC32 and the inefficiency in compression and checksumming/CRC32, we 
> should switch to a framing protocol with multiple messages in a single frame.
> I suggest we reuse the framing protocol recently implemented for internode 
> messaging in CASSANDRA-15066 to the extent that its logic can be borrowed, 
> and that we do it before native protocol v5 graduates from beta. See 
> https://github.com/apache/cassandra/blob/trunk/src/java/org/apache/cassandra/net/FrameDecoderCrc.java
>  and 
> https://github.com/apache/cassandra/blob/trunk/src/java/org/apache/cassandra/net/FrameDecoderLZ4.java.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org

Reply via email to