[ https://issues.apache.org/jira/browse/CASSANDRA-19753?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Sam Tunnicliffe reassigned CASSANDRA-19753:
-------------------------------------------

    Assignee: Sam Tunnicliffe

> Not getting responses with concurrent stream IDs in native protocol v5
> ----------------------------------------------------------------------
>
>                 Key: CASSANDRA-19753
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-19753
>             Project: Cassandra
>          Issue Type: Bug
>          Components: Messaging/Client
>            Reporter: Andrea Leopardi
>            Assignee: Sam Tunnicliffe
>            Priority: Normal
>         Attachments: xandra.log
>
>
> This is not gonna be an easy bug to report or to give a great set of repro
> steps for, so apologies in advance. I’m one of the authors and the maintainer
> of [Xandra|https://github.com/whatyouhide/xandra], the Cassandra client for
> Elixir.
> We noticed an issue with request timeouts in a new version of our client.
> Just for reference, the issue is [this
> one|https://github.com/whatyouhide/xandra/issues/356].
> After some debugging, we figured out that the issue is limited to *native
> protocol v5*. With native protocol v5, the issue shows up in C* 4.1 and 5.0.
> With native protocol v4, those versions (4.1 and 5.0) both work fine. I'm
> running C* in a Docker container, but I've had folks reproduce this with all
> sorts of C* setups.
> h2. The Issue
> The new version of our client in question uses concurrent requests. We assign
> each request a sequential stream ID ({{1}}, {{2}}, ...). To the best of my
> knowledge, we comply with [section 2.4.1.3. of the native protocol v5
> spec|https://github.com/apache/cassandra/blob/e7cf38b5de6f804ce121e7a676576135db0c4bb1/doc/native_protocol_v5.spec#L316C1-L316C9].
> Now, it seems like C* does not respond to all requests sent this way. We have
> a [simple test|https://github.com/whatyouhide/xandra/pull/368] in our repo
> that reproduces this. It just issues two requests in parallel (with stream
> IDs {{1}} and {{2}}) and then keeps issuing requests as soon as there are
> responses.
> Almost 100% of the time, we don't get the response on at least one stream.
> I've also attached some debug logs that show this, in case they're helpful
> (from the client's perspective). The {{<<56, 0, 2, 67, 161, ...>>}} syntax is
> Erlang's syntax for bytestrings, where each number is the decimal value of a
> single byte. You can see in the logs that we never get the response frame on
> stream ID 1. Sometimes it's stream ID 2, or 3, or whatever.
> I’m pretty short on ideas for what to do next on our end. I’ve tried changing
> the socket buffer size as well (from {{10}} bytes to {{1000000}} bytes) to
> get the packets to split up in all sorts of places, but everything works as
> expected _except_ for the responses that never come out of C*.
> Any other help is appreciated here, but I've started to suspect this might be
> something with C*. It could totally not be, but I figured it was worth
> posting here.
> Thank you all in advance folks! 💟
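
For anyone reading the attached log outside of Erlang/Elixir, here is a minimal Python sketch of how the `<<...>>` bytestring notation maps to raw bytes, and how the stream ID could be read from a 9-byte native protocol envelope header (version, flags, stream, opcode, body length). The function names and the sample header are hypothetical and not taken from Xandra or from the log. As far as I understand the v5 spec, bytes on the socket are additionally wrapped in frames after the handshake, so the header parser applies to an envelope inside a frame payload, not to the first bytes of a log entry.

```python
import struct
from typing import NamedTuple


def parse_erlang_binary(literal: str) -> bytes:
    """Convert an Erlang-style literal such as '<<56, 0, 2>>' into bytes.

    Truncated literals (containing '...') are rejected with ValueError
    rather than silently shortened.
    """
    text = literal.strip()
    if not (text.startswith("<<") and text.endswith(">>")):
        raise ValueError("expected a literal of the form <<n, n, ...>>")
    body = text[2:-2].strip()
    if not body:
        return b""
    # Each comma-separated item is the decimal value of one byte.
    return bytes(int(item.strip()) for item in body.split(","))


class EnvelopeHeader(NamedTuple):
    version: int       # protocol version without the direction bit
    is_response: bool  # True when the high bit of the version byte is set
    flags: int
    stream: int        # signed 16-bit stream ID
    opcode: int
    body_length: int


def parse_envelope_header(data: bytes) -> EnvelopeHeader:
    """Parse the 9-byte envelope header at the start of `data`."""
    if len(data) < 9:
        raise ValueError("an envelope header is 9 bytes long")
    # Big-endian: version (1), flags (1), stream (2), opcode (1), length (4).
    version_byte, flags, stream, opcode, length = struct.unpack(">BBhBi", data[:9])
    return EnvelopeHeader(
        version=version_byte & 0x7F,
        is_response=bool(version_byte & 0x80),
        flags=flags,
        stream=stream,
        opcode=opcode,
        body_length=length,
    )


if __name__ == "__main__":
    # Hypothetical header: v5 response on stream 1, opcode 8 (RESULT), 4-byte body.
    sample = parse_erlang_binary("<<133, 0, 0, 1, 8, 0, 0, 0, 4>>")
    print(parse_envelope_header(sample))
```

Collecting the `stream` value of every parsed response and comparing that set with the stream IDs that were sent is one way to see which requests never got an answer.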