[ 
https://issues.apache.org/jira/browse/CASSANDRA-16581?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17348687#comment-17348687
 ] 

Sam Tunnicliffe edited comment on CASSANDRA-16581 at 5/20/21, 6:47 PM:
-----------------------------------------------------------------------

Oh, I totally missed that in {{SimpleClient}}, sorry. In that case, I 
definitely think we need to revisit how the test is
 implemented. That assertion in {{FrameEncoder}} is important, not only for the 
client protocol, but for internode which
 shares the class.

There's quite a lot of machinery in {{CQLConnectionTest}} to give control over 
the exact bytes that the v5 pipeline
 classes have to deal with. Unfortunately, retro fitting that to support v4 and 
earlier is not a trivial task. I've
 pushed an alternative reworking of the test 
[here|https://github.com/beobal/cassandra/tree/CASSANDRA-16581-4.0].

Instead of changing the encoders/decoders this adds a test implementation of 
{{Message.Request}} which we can prime with some garbage bytes, so that when 
it's decoded, a{{ProtocolError}} is guaranteed. This required a tiny bit of 
refactoring in {{Envelope}} to make the v4 encoding ask-dont-tell like v5, but 
it gives the test code control over the actual bytes that get decoded on the 
server rather than having to bypass the checks in the outbound pipeline.

A nice side effect of this is that the v5 test is now exercising the CQL 
protocol decoding, rather than hitting the CRC
 errors in the framing layer. There are quite a few tests in 
{{CQLConnectionTest}} which do verify that, so we should be
 good there. This also means that the channel-closing behaviour is the same 
between the v4 & v5 tests.

Now the message pipelines and failure behaviours are aligned across versions, 
we don't need {{WrappedSimpleClient}} any
 more and because the test cases are identical, I've made the test 
{{@Parameterized}} so it will automatically cover all
 supported versions.


was (Author: beobal):
Oh, I totally missed that in {{SimpleClient}}, sorry. In that case, I 
definitely think we need to revisit how the test is
implemented. That assertion in {{FrameEncoder}} is important, not only for the 
client protocol, but for internode which
shares the class. 

There's quite a lot of machinery in {{CQLConnectionTest}} to give control over 
the exact bytes that the v5 pipeline
classes have to deal with. Unfortunately, retro fitting that to support v4 and 
earlier is not a trivial task. I've
pushed an alternative reworking of the test 
[here|https://github.com/beobal/cassandra/tree/CASSANDRA-16581-4.0].

Instead of changing the encoders/decoders this adds a test implementation of 
{{Message.Request}} which we can prime with some garbage bytes, so that when 
it's decoded, a
{{ProtocolError}} is guaranteed. This required a tiny bit of refactoring in 
{{Envelope}} to make the v4 encoding
ask-dont-tell like v5, but it gives the test code control over the actual bytes 
that get decoded on the server rather
than having to bypass the checks in the outbound pipeline.

A nice side effect of this is that the v5 test is now exercising the CQL 
protocol decoding, rather than hitting the CRC
errors in the framing layer. There are quite a few tests in 
{{CQLConnectionTest}} which do verify that, so we should be
good there. This also means that the channel-closing behaviour is the same 
between the v4 & v5 tests.

Now the message pipelines and failure behaviours are aligned across versions, 
we don't need {{WrappedSimpleClient}} any
more and because the test cases are identical, I've made the test 
{{@Parameterized}} so it will automatically cover all
supported versions. 


> Failure to execute queries should emit a KPI other than read 
> timeout/unavailable so it can be alerted/tracked
> -------------------------------------------------------------------------------------------------------------
>
>                 Key: CASSANDRA-16581
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-16581
>             Project: Cassandra
>          Issue Type: Bug
>          Components: Messaging/Client, Observability/Metrics
>            Reporter: David Capwell
>            Assignee: David Capwell
>            Priority: Normal
>             Fix For: 3.0.x, 3.11.x, 4.0-rc
>
>
> When we are unable to parse a message we do not have a way to detect this 
> from a monitoring point of view so can get into situations where we believe 
> the database is fine but the clients are on-fire.  This case popped up in the 
> 2.1 to 3.0 upgrade as paging state wasn’t mixed-mode safe.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org

Reply via email to