Andrea Leopardi created CASSANDRA-19753: -------------------------------------------
Summary: Not getting responses with concurrent stream IDs in native protocol v5
Key: CASSANDRA-19753
URL: https://issues.apache.org/jira/browse/CASSANDRA-19753
Project: Cassandra
Issue Type: Bug
Reporter: Andrea Leopardi
Attachments: xandra.log

This is not gonna be an easy bug to report or to give a great set of repro steps for, so apologies in advance.

I’m one of the authors and the maintainer of [Xandra|https://github.com/whatyouhide/xandra], the Cassandra client for Elixir. We noticed an issue with request timeouts in a new version of our client. Just for reference, the issue is [this one|https://github.com/whatyouhide/xandra/issues/356].

After some debugging, we figured out that the issue was limited to *native protocol v5*. With native protocol v5, the issue shows up in C* 4.1 and 5.0. With native protocol v4, those versions (4.1 and 5.0) both work fine. I'm running C* in a Docker container, but I've had folks reproduce this with all sorts of C* setups.

h2. The Issue

The new version of our client in question uses concurrent requests. We assign each request a sequential stream ID ({{1}}, {{2}}, ...). To the best of my knowledge, we comply with [section 2.4.1.3. of the native protocol v5 spec|https://github.com/apache/cassandra/blob/e7cf38b5de6f804ce121e7a676576135db0c4bb1/doc/native_protocol_v5.spec#L316C1-L316C9].

Now, it seems like C* does not respond to all requests sent this way. We have a [simple test|https://github.com/whatyouhide/xandra/pull/368] in our repo that reproduces this. It just issues two requests in parallel (with stream IDs {{1}} and {{2}}) and then keeps issuing requests as soon as there are responses. Almost 100% of the time, we don't get the response on at least one stream.

I've also attached some debug logs from the client's perspective that show this, in case they are helpful. The {{<<56, 0, 2, 67, 161, ...>>}} syntax is Erlang's syntax for bytestrings, where each number is the decimal value of a single byte.
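For anyone reading the attached log without an Elixir setup, here is a minimal Python sketch (not Xandra code) of how such bytes could be decoded. It assumes the logged bytes start at an *uncompressed* v5 frame header, in which the first three bytes are a little-endian word holding a 17-bit payload length and a self-contained flag, and it assumes the usual 9-byte envelope header (version, flags, stream, opcode, length). The function names and the sample envelope bytes are made up for illustration, and the header CRC is not verified.

```python
import struct
from typing import NamedTuple


class FrameHeader(NamedTuple):
    payload_length: int
    self_contained: bool


class EnvelopeHeader(NamedTuple):
    is_response: bool
    version: int
    flags: int
    stream_id: int
    opcode: int
    body_length: int


def parse_frame_header(data: bytes) -> FrameHeader:
    """Decode the first 3 bytes of an uncompressed v5 frame header.

    Assumption: little-endian word, low 17 bits = payload length,
    bit 17 = self-contained flag. The CRC24 that follows is ignored.
    """
    if len(data) < 3:
        raise ValueError("need at least 3 bytes for the frame header word")
    word = int.from_bytes(data[:3], "little")
    return FrameHeader(word & 0x1FFFF, bool((word >> 17) & 1))


def parse_envelope_header(data: bytes) -> EnvelopeHeader:
    """Decode a 9-byte envelope header (big-endian fields)."""
    if len(data) < 9:
        raise ValueError("need at least 9 bytes for the envelope header")
    # B = version, B = flags, h = signed 16-bit stream ID, B = opcode, I = body length
    version, flags, stream_id, opcode, length = struct.unpack(">BBhBI", data[:9])
    return EnvelopeHeader(
        is_response=bool(version & 0x80),  # high bit marks a response
        version=version & 0x7F,
        flags=flags,
        stream_id=stream_id,
        opcode=opcode,
        body_length=length,
    )


if __name__ == "__main__":
    # First three bytes of the example from the log: <<56, 0, 2, ...>>
    print(parse_frame_header(bytes([56, 0, 2])))
    # FrameHeader(payload_length=56, self_contained=True)

    # Hypothetical response envelope header on stream 1 (sample bytes, not from the log)
    sample = bytes([0x85, 0x00, 0x00, 0x01, 0x08, 0x00, 0x00, 0x00, 0x04])
    print(parse_envelope_header(sample).stream_id)
    # 1
```

Under those assumptions, the leading {{56, 0, 2}} would read as a self-contained frame with a 56-byte payload, and the stream ID of each envelope inside the payload sits in bytes 2 and 3 of its header.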
You can see in the logs that we never get the response frame on stream ID 1. Sometimes the missing response is on stream ID 2, or 3, or some other stream.

I’m pretty short on ideas for what to do next on our end. I’ve tried shuffling around the socket buffer size as well (from {{10}} bytes to {{1000000}} bytes) to get the packets to split up in all sorts of places, but everything works as expected _except_ for the responses that are not coming out of C*.

Any other help is appreciated here, but I've started to suspect this might be something with C*. It could totally not be, but I figured it was worth posting here.

Thank you all in advance folks! 💟