[ https://issues.apache.org/jira/browse/CASSANDRA-19753?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Alex Petrov updated CASSANDRA-19753:
------------------------------------
     Bug Category: Parent values: Correctness(12982) Level 1 values: Transient Incorrect Response(12987)
       Complexity: Normal
      Component/s: Messaging/Client
    Discovered By: Unit Test
         Severity: Critical
           Status: Open  (was: Triage Needed)

> Not getting responses with concurrent stream IDs in native protocol v5
> ----------------------------------------------------------------------
>
>                 Key: CASSANDRA-19753
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-19753
>             Project: Cassandra
>          Issue Type: Bug
>          Components: Messaging/Client
>            Reporter: Andrea Leopardi
>            Priority: Normal
>         Attachments: xandra.log
>
>
> This is not gonna be an easy bug to report or to give a great set of repro 
> steps for, so apologies in advance. I’m one of the authors and the maintainer 
> of [Xandra|https://github.com/whatyouhide/xandra], the Cassandra client for 
> Elixir.
> We noticed an issue with request timeouts in a new version of our client. 
> Just for reference, the issue is [this 
> one|https://github.com/whatyouhide/xandra/issues/356].
> After some debugging, we figured out that the issue was limited to *native 
> protocol v5*. With native protocol v5, the issue shows up in C* 4.1 and 5.0. 
> With native protocol v4, those versions (4.1 and 5.0) both work fine. I'm 
> running C* in a Docker container, but I've had folks reproduce this with all 
> sorts of C* setups.
> h2. The Issue
> The new version of our client in question uses concurrent requests. We assign
> each request a sequential stream ID ({{1}}, {{2}}, ...). To the best of my
> knowledge, this complies with [section 2.4.1.3 of the native protocol v5
> spec|https://github.com/apache/cassandra/blob/e7cf38b5de6f804ce121e7a676576135db0c4bb1/doc/native_protocol_v5.spec#L316C1-L316C9].
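Here is a minimal Python sketch of what sequential stream-ID assignment looks like on the wire. It assumes the native protocol's 9-byte envelope header layout: version, flags, a signed 16-bit big-endian stream ID, opcode, and a 32-bit body length. It also assumes `0x05` as the v5 request version byte and `0x05` as the OPTIONS opcode. `StreamIdAllocator` and `encode_envelope` are hypothetical names used for illustration, not Xandra or Cassandra APIs.

```python
import struct

# Envelope header (9 bytes, big-endian, no padding):
# version (1) | flags (1) | stream (signed 16-bit) | opcode (1) | body length (unsigned 32-bit)
ENVELOPE_HEADER = struct.Struct(">BBhBI")

PROTOCOL_V5_REQUEST = 0x05  # responses carry 0x85 (direction bit set)
OPCODE_OPTIONS = 0x05
MAX_STREAM_ID = 32767       # negative stream IDs are reserved for server-initiated events


class StreamIdAllocator:
    """Hypothetical helper: hands out sequential stream IDs (1, 2, ...) and
    remembers which ones are still waiting for a response."""

    def __init__(self):
        self._next = 1
        self.in_flight = set()

    def acquire(self):
        # Scan forward from the last handed-out ID, wrapping to 1, skipping IDs in use.
        for _ in range(MAX_STREAM_ID):
            stream_id = self._next
            self._next = 1 if self._next == MAX_STREAM_ID else self._next + 1
            if stream_id not in self.in_flight:
                self.in_flight.add(stream_id)
                return stream_id
        raise RuntimeError("all stream IDs are in flight")

    def release(self, stream_id):
        # Call when the response for this stream arrives (or the request times out).
        self.in_flight.discard(stream_id)


def encode_envelope(stream_id, opcode, body=b"", flags=0):
    """Hypothetical helper: prefix a request body with a v5 envelope header."""
    header = ENVELOPE_HEADER.pack(PROTOCOL_V5_REQUEST, flags, stream_id, opcode, len(body))
    return header + body


allocator = StreamIdAllocator()
first = encode_envelope(allocator.acquire(), OPCODE_OPTIONS)
second = encode_envelope(allocator.acquire(), OPCODE_OPTIONS)
print(first.hex(), second.hex())  # 050000010500000000 050000020500000000
```

The `in_flight` set also gives a cheap way to spot the symptom described here. Any ID still in the set after its timeout belongs to a stream that never got a response.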
> Now, it seems like C* does not respond to all requests sent this way. We have
> a [simple test|https://github.com/whatyouhide/xandra/pull/368] in our repo
> that reproduces this. It just issues two requests in parallel (with stream
> IDs {{1}} and {{2}}) and then keeps issuing requests as soon as there are
> responses. Almost 100% of the time, we don't get the response on at least
> one stream. I've also attached some debug logs that show this, in case they
> are helpful (they are from the client's perspective). The
> {{<<56, 0, 2, 67, 161, ...>>}} syntax is Erlang's syntax for bytestrings,
> where each number is the decimal value of a single byte. You can see in the
> logs that we never get the response frame on stream ID 1. Sometimes it's
> stream ID 2, or 3, or whatever.
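The leading bytes `<<56, 0, 2, ...>>` are consistent with an uncompressed v5 frame header. This sketch assumes the spec's layout: the first three bytes are read as a little-endian integer holding a 17-bit payload length and a 1-bit self-contained flag, followed by a 3-byte CRC24. Under that assumption, the bytes decode as a self-contained frame with a 56-byte payload. `decode_frame_header_prefix` is a hypothetical helper. The CRC24 is not checked, because the log elides part of it.

```python
def decode_frame_header_prefix(header):
    """Hypothetical helper: decode the first 3 bytes of an uncompressed v5
    frame header, read as a little-endian integer."""
    prefix = int.from_bytes(header[:3], "little")
    payload_length = prefix & 0x1FFFF          # low 17 bits
    self_contained = bool(prefix & (1 << 17))  # bit 17
    return payload_length, self_contained


# Erlang's <<56, 0, 2, ...>> is the byte sequence 56, 0, 2 (decimal values).
length, self_contained = decode_frame_header_prefix(bytes([56, 0, 2]))
print(length, self_contained)  # 56 True
# Assumed frame layout: 6-byte header + payload + 4-byte CRC32 trailer.
print("total frame size:", 6 + length + 4)  # total frame size: 66
```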
> I’m running short on ideas for what to do next on our end. I’ve also tried
> varying the socket buffer size (from {{10}} bytes to {{1000000}} bytes) to
> get the packets to split up in all sorts of places, but everything works as
> expected _except_ for the responses that never come out of C*.
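Splitting socket reads at arbitrary points should not change what a client decodes, as long as it buffers until a whole frame has arrived. The sketch below shows that reassembly under several assumptions. It handles only uncompressed, self-contained v5 frames (6-byte header, payload, 4-byte CRC32 trailer). It skips checksum verification and writes the checksums as zero placeholders. The RESULT envelopes have empty placeholder bodies. `FrameReassembler`, `stream_ids`, `make_frame`, `response_envelopes`, and `collect` are hypothetical names.

```python
import struct

HEADER_LEN = 6   # 3 bytes of length/flags + 3-byte header CRC24
TRAILER_LEN = 4  # CRC32 of the payload
# version | flags | stream (signed 16-bit) | opcode | body length, big-endian
ENVELOPE_HEADER = struct.Struct(">BBhBI")


class FrameReassembler:
    """Hypothetical sketch: buffers socket reads of any size and returns the
    payloads of complete uncompressed v5 frames. Checksums are NOT verified."""

    def __init__(self):
        self._buffer = bytearray()

    def feed(self, chunk):
        self._buffer += chunk
        payloads = []
        while len(self._buffer) >= HEADER_LEN:
            prefix = int.from_bytes(self._buffer[:3], "little")
            payload_length = prefix & 0x1FFFF
            frame_length = HEADER_LEN + payload_length + TRAILER_LEN
            if len(self._buffer) < frame_length:
                break  # partial frame: wait for the next read
            payloads.append(bytes(self._buffer[HEADER_LEN:HEADER_LEN + payload_length]))
            del self._buffer[:frame_length]
        return payloads


def stream_ids(payload):
    """Walk the envelopes packed back to back in a self-contained payload."""
    ids, offset = [], 0
    while offset < len(payload):
        _version, _flags, stream, _opcode, length = ENVELOPE_HEADER.unpack_from(payload, offset)
        ids.append(stream)
        offset += ENVELOPE_HEADER.size + length
    return ids


def make_frame(payload):
    """Build a synthetic self-contained frame with zeroed (placeholder) CRCs."""
    prefix = len(payload) | (1 << 17)
    return prefix.to_bytes(3, "little") + bytes(3) + payload + bytes(TRAILER_LEN)


def response_envelopes(*streams):
    # Version 0x85 (v5 response), opcode 0x08 (RESULT), empty placeholder body.
    return b"".join(ENVELOPE_HEADER.pack(0x85, 0, s, 0x08, 0) for s in streams)


# Two frames on the wire: one carrying streams 1 and 2, one carrying stream 3.
wire = make_frame(response_envelopes(1, 2)) + make_frame(response_envelopes(3))


def collect(chunk_size):
    """Feed the wire bytes in fixed-size chunks and gather every stream ID seen."""
    reassembler, seen = FrameReassembler(), []
    for i in range(0, len(wire), chunk_size):
        for payload in reassembler.feed(wire[i:i + chunk_size]):
            seen.extend(stream_ids(payload))
    return seen


for size in (10, 1_000_000):
    print(size, collect(size))  # prints "10 [1, 2, 3]" then "1000000 [1, 2, 3]"
```

Large messages split across several non-self-contained frames would need an extra buffering step that this sketch leaves out.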
> Any other help is appreciated here, but I've started to suspect this might be
> something in C*. It could totally not be, but I figured it was worth posting
> here.
> Thank you all in advance folks! 💟



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
