[jira] [Commented] (CASSANDRA-14503) Internode connection management is race-prone
[ https://issues.apache.org/jira/browse/CASSANDRA-14503?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16801295#comment-16801295 ] Benedict commented on CASSANDRA-14503: -- Thanks for the patch! And sorry for taking so long to get back to you. We did find [some bugs|https://gist.github.com/belliottsmith/0d12df678d8e9ab06776e29116d56b91], and other areas of improvement; I've filed CASSANDRA-15066 as a follow-up, and look forward to hearing your feedback. > Internode connection management is race-prone > - > > Key: CASSANDRA-14503 > URL: https://issues.apache.org/jira/browse/CASSANDRA-14503 > Project: Cassandra > Issue Type: Bug > Components: Legacy/Streaming and Messaging >Reporter: Sergio Bossa >Assignee: Jason Brown >Priority: Normal > Labels: pull-request-available > Fix For: 4.0 > > Time Spent: 0.5h > Remaining Estimate: 0h > > Following CASSANDRA-8457, internode connection management has been rewritten > to rely on Netty, but the new implementation in > {{OutboundMessagingConnection}} seems quite race prone to me, in particular > on those two cases: > * {{#finishHandshake()}} racing with {{#close()}}: i.e. in such case the > former could run into an NPE if the latter nulls the {{channelWriter}} (but > this is just an example, other conflicts might happen). > * Connection timeout and retry racing with state changing methods: > {{connectionRetryFuture}} and {{connectionTimeoutFuture}} are cancelled when > handshaking or closing, but there's no guarantee those will be actually > cancelled (as they might be already running), so they might end up changing > the connection state concurrently with other methods (i.e. by unexpectedly > closing the channel or clearing the backlog). > Overall, the thread safety of {{OutboundMessagingConnection}} is very > difficult to assess given the current implementation: I would suggest to > refactor it into a single-thread model, where all connection state changing > actions are enqueued on a single threaded scheduler, so that state > transitions can be clearly defined and checked. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Commented] (CASSANDRA-14503) Internode connection management is race-prone
[ https://issues.apache.org/jira/browse/CASSANDRA-14503?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16750032#comment-16750032 ] Benedict commented on CASSANDRA-14503: -- It's next on my docket. > Internode connection management is race-prone > - > > Key: CASSANDRA-14503 > URL: https://issues.apache.org/jira/browse/CASSANDRA-14503 > Project: Cassandra > Issue Type: Bug > Components: Legacy/Streaming and Messaging >Reporter: Sergio Bossa >Assignee: Jason Brown >Priority: Major > Labels: pull-request-available > Fix For: 4.0 > > Time Spent: 0.5h > Remaining Estimate: 0h > > Following CASSANDRA-8457, internode connection management has been rewritten > to rely on Netty, but the new implementation in > {{OutboundMessagingConnection}} seems quite race prone to me, in particular > on those two cases: > * {{#finishHandshake()}} racing with {{#close()}}: i.e. in such case the > former could run into an NPE if the latter nulls the {{channelWriter}} (but > this is just an example, other conflicts might happen). > * Connection timeout and retry racing with state changing methods: > {{connectionRetryFuture}} and {{connectionTimeoutFuture}} are cancelled when > handshaking or closing, but there's no guarantee those will be actually > cancelled (as they might be already running), so they might end up changing > the connection state concurrently with other methods (i.e. by unexpectedly > closing the channel or clearing the backlog). > Overall, the thread safety of {{OutboundMessagingConnection}} is very > difficult to assess given the current implementation: I would suggest to > refactor it into a single-thread model, where all connection state changing > actions are enqueued on a single threaded scheduler, so that state > transitions can be clearly defined and checked. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Commented] (CASSANDRA-14503) Internode connection management is race-prone
[ https://issues.apache.org/jira/browse/CASSANDRA-14503?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16749383#comment-16749383 ] Dinesh Joshi commented on CASSANDRA-14503: -- [~jasobrown] I can take a pass at it but I'd rather go after [~benedict]'s done his review. > Internode connection management is race-prone > - > > Key: CASSANDRA-14503 > URL: https://issues.apache.org/jira/browse/CASSANDRA-14503 > Project: Cassandra > Issue Type: Bug > Components: Legacy/Streaming and Messaging >Reporter: Sergio Bossa >Assignee: Jason Brown >Priority: Major > Labels: pull-request-available > Fix For: 4.0 > > Time Spent: 0.5h > Remaining Estimate: 0h > > Following CASSANDRA-8457, internode connection management has been rewritten > to rely on Netty, but the new implementation in > {{OutboundMessagingConnection}} seems quite race prone to me, in particular > on those two cases: > * {{#finishHandshake()}} racing with {{#close()}}: i.e. in such case the > former could run into an NPE if the latter nulls the {{channelWriter}} (but > this is just an example, other conflicts might happen). > * Connection timeout and retry racing with state changing methods: > {{connectionRetryFuture}} and {{connectionTimeoutFuture}} are cancelled when > handshaking or closing, but there's no guarantee those will be actually > cancelled (as they might be already running), so they might end up changing > the connection state concurrently with other methods (i.e. by unexpectedly > closing the channel or clearing the backlog). > Overall, the thread safety of {{OutboundMessagingConnection}} is very > difficult to assess given the current implementation: I would suggest to > refactor it into a single-thread model, where all connection state changing > actions are enqueued on a single threaded scheduler, so that state > transitions can be clearly defined and checked. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Commented] (CASSANDRA-14503) Internode connection management is race-prone
[ https://issues.apache.org/jira/browse/CASSANDRA-14503?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16749356#comment-16749356 ] Jason Brown commented on CASSANDRA-14503: - [~benedict] / [~djoshi3] Any update on reviewing this latest patch? This seems to be a blocker for 4.0. > Internode connection management is race-prone > - > > Key: CASSANDRA-14503 > URL: https://issues.apache.org/jira/browse/CASSANDRA-14503 > Project: Cassandra > Issue Type: Bug > Components: Legacy/Streaming and Messaging >Reporter: Sergio Bossa >Assignee: Jason Brown >Priority: Major > Labels: pull-request-available > Fix For: 4.0 > > Time Spent: 0.5h > Remaining Estimate: 0h > > Following CASSANDRA-8457, internode connection management has been rewritten > to rely on Netty, but the new implementation in > {{OutboundMessagingConnection}} seems quite race prone to me, in particular > on those two cases: > * {{#finishHandshake()}} racing with {{#close()}}: i.e. in such case the > former could run into an NPE if the latter nulls the {{channelWriter}} (but > this is just an example, other conflicts might happen). > * Connection timeout and retry racing with state changing methods: > {{connectionRetryFuture}} and {{connectionTimeoutFuture}} are cancelled when > handshaking or closing, but there's no guarantee those will be actually > cancelled (as they might be already running), so they might end up changing > the connection state concurrently with other methods (i.e. by unexpectedly > closing the channel or clearing the backlog). > Overall, the thread safety of {{OutboundMessagingConnection}} is very > difficult to assess given the current implementation: I would suggest to > refactor it into a single-thread model, where all connection state changing > actions are enqueued on a single threaded scheduler, so that state > transitions can be clearly defined and checked. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Commented] (CASSANDRA-14503) Internode connection management is race-prone
[ https://issues.apache.org/jira/browse/CASSANDRA-14503?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16677289#comment-16677289 ] Vinay Chella commented on CASSANDRA-14503: -- Thank you [~jasobrown] for the patch. [~jolynch] and I benchmarked Jason's 14503-v2 branch, our benchmark results show [trunk-Jason's branch|https://github.com/jasobrown/cassandra/tree/14503-v2] is significantly out-performing 3.0.17 in terms of mean, 99th, and 95th percentile during a pure write benchmark. When systems are under heavy load, we have seen coordinator mean latencies are ~14x better, 99th latencies are ~4x better and 95th latencies are ~6x better on the trunk. When both trunk and 3.0.17 had 67k write QPS applied, throughput is steady on the trunk and 3.0.17 fell over. Note that we have only tested writes in this benchmark. However, the trunk is accumulating more hints than 3.0.17 and dropping messages compared to 3.0.17, these issues are yet to troubleshoot. For a detailed analysis of this benchmarking, find attached document [Cassandra 4.0 testing with CASSANDRA-14503 fixes] > Internode connection management is race-prone > - > > Key: CASSANDRA-14503 > URL: https://issues.apache.org/jira/browse/CASSANDRA-14503 > Project: Cassandra > Issue Type: Bug > Components: Streaming and Messaging >Reporter: Sergio Bossa >Assignee: Jason Brown >Priority: Major > Labels: pull-request-available > Fix For: 4.0 > > Time Spent: 0.5h > Remaining Estimate: 0h > > Following CASSANDRA-8457, internode connection management has been rewritten > to rely on Netty, but the new implementation in > {{OutboundMessagingConnection}} seems quite race prone to me, in particular > on those two cases: > * {{#finishHandshake()}} racing with {{#close()}}: i.e. in such case the > former could run into an NPE if the latter nulls the {{channelWriter}} (but > this is just an example, other conflicts might happen). > * Connection timeout and retry racing with state changing methods: > {{connectionRetryFuture}} and {{connectionTimeoutFuture}} are cancelled when > handshaking or closing, but there's no guarantee those will be actually > cancelled (as they might be already running), so they might end up changing > the connection state concurrently with other methods (i.e. by unexpectedly > closing the channel or clearing the backlog). > Overall, the thread safety of {{OutboundMessagingConnection}} is very > difficult to assess given the current implementation: I would suggest to > refactor it into a single-thread model, where all connection state changing > actions are enqueued on a single threaded scheduler, so that state > transitions can be clearly defined and checked. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Commented] (CASSANDRA-14503) Internode connection management is race-prone
[ https://issues.apache.org/jira/browse/CASSANDRA-14503?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16662265#comment-16662265 ] Jason Brown commented on CASSANDRA-14503: - Based on testing conducted with [~jolynch] and [~vinaykumarcse], here's an updated branch with performance fixes and code improvements: ||v2|| |[branch|https://github.com/jasobrown/cassandra/tree/14503-v2]| |[utests & dtests|https://circleci.com/gh/jasobrown/workflows/cassandra/tree/14503-v2]| ||PR|https://github.com/apache/cassandra/pull/289| The major change in this branch is I experimented with aggregating the messages to send into a single ByteBuf, instead of sending the messages individually to the netty pipeline. Since we can send up to (the current hard-coded size of) 64 messages per each iteration of OMC.dequeueMessages(), that's 64 times to invoke the pipeline mechanics, 64 ByteBuf allocations (and releases), and 64 times to fulfill the promise corresponding to each message. If, instead, we send one ByteBuf (with data serialized into it) then it's just one message into the pipeline, one allocation, and one promise fulfillment. The primary trade-off is that the single buffer will be, of course, large; perhaps large enough to not be efficient with the netty allocator. To that end, wrote a JMH benchmark, and the results are compelling: TL;DR a single buffer is significantly faster than multiple smaller buffers. The closest case is a single buffer is twice as fast, with the typical percentile difference being about 10-20 times faster for the single buffer (1.5 micros vs. 23 micros). To make this work, I need the allocation and serialization code to be moved outside of the pipeline handler (as it now needs to be invoked from OMC). I had already done this work with CASSANDRA-13630. Thus, I pulled that patch into this branch. That patch also greatly reduced the need for the ChannelWriter abstraction, and combined with the outstanding work in this branch, I am able to eliminate ChannelWriter and the confusion it added. However, I still need to handle large messages separately (as we don't want to use our blocking serializers on the event loop), so I've preserved the "move large message serialization on a separate thread" behavior from CASSANDRA-13630 by creating a new abstraction in OMC by adding (the not cleverly named) MessageDequeuer interface, with implementations for large messages and "small messages" (basically the current behavior of this patch that we've been riffing on). One feature that we've been debating again is the whether to include the message coalescing feature. The current branch does not include it - mostly due to the fact that we've been iterating quite quickly over this code, and I broke it when incorporating the CASSANDRA-13630 patch (and killing off ChannelWriter). There is some testing happening to reevaluate the efficacy of message coalescing with the netty internode messaging. Some other points of interest: - switch OMC#backlog from ConcurrentLinkedQueue to MpscLinkedQueue from jctools. MpscLinkedQueue is dramtically better, and ConcurrentLinkedQueue#isEmpty was a CPU drain. - improved scheduling of the consumerTask in OutboundMessagingConnection, though still needs a bit more refinement - ditched the OMC.State from the last branch - added [~jolynch]'s fixes wrt not setting a default SO_SNDBUF value - OMC - introduced consumerTaskThread vs eventLoop member field - ditched the auto-read in RebufferingByteBufDataInputPlus - I need to document this In general I have a small bit of documenting to add, but the branch is ready for review. > Internode connection management is race-prone > - > > Key: CASSANDRA-14503 > URL: https://issues.apache.org/jira/browse/CASSANDRA-14503 > Project: Cassandra > Issue Type: Bug > Components: Streaming and Messaging >Reporter: Sergio Bossa >Assignee: Jason Brown >Priority: Major > Labels: pull-request-available > Fix For: 4.0 > > Time Spent: 0.5h > Remaining Estimate: 0h > > Following CASSANDRA-8457, internode connection management has been rewritten > to rely on Netty, but the new implementation in > {{OutboundMessagingConnection}} seems quite race prone to me, in particular > on those two cases: > * {{#finishHandshake()}} racing with {{#close()}}: i.e. in such case the > former could run into an NPE if the latter nulls the {{channelWriter}} (but > this is just an example, other conflicts might happen). > * Connection timeout and retry racing with state changing methods: > {{connectionRetryFuture}} and {{connectionTimeoutFuture}} are cancelled when > handshaking or closing, but there's no guarantee those will be actually > cancelled (as they might be already runnin
[jira] [Commented] (CASSANDRA-14503) Internode connection management is race-prone
[ https://issues.apache.org/jira/browse/CASSANDRA-14503?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16609099#comment-16609099 ] Jason Brown commented on CASSANDRA-14503: - Patch available here: ||14503|| |[branch|https://github.com/jasobrown/cassandra/tree/14503]| |[utests & dtests|https://circleci.com/gh/jasobrown/workflows/cassandra/tree/14503]| || Additionally, I've [created a Pull Request|https://github.com/apache/cassandra/pull/264] for review, as well. Note: this patch will need to be rebased when CASSANDRA-13630 is committed, and incorprate the changes ChannelWriter for large messages, but that should not affect this patch much (I've been keeping that in mind as I worked on this) - OutboundMessagingConnection changes -- All producer threads queue messages into the backlog, and messages are only consumed by a task on a fixed thread (the event loop). Producers will contend to schedule the consumer, but have no further involvement in sending a message (unlike the current implementation). -- All netty-related activity (setting up a remote connection, connection-related callbacks and time outs, consuming form the backlog and writing to the channel and associated callbacks) are all handled on the event loop. OutboundMessagingConnection gets a reference to a event loop in it's constructor, and uses that for the duration of it's lifetime. -- Finally forward-ported the queue bounding functionality of CASSANDRA-13265. In short, we want to limit the size of queued messages in order to not OOM. Thus, we schedule a task for the consumer thread that examines the queue looking for elements to prune. Further, I've added a naive upper bound to the queue so that producers drop messages before enqueuing if the backlog is in a *really* bad state. @djoshi3 has recomended bounding by message size rather than by message count, which I agree with, but propose saving that for a followup ticket. -- Cleaner, more documented, and better tested State machine to manage state transitions for the class. - ChannelWriter and MessageOutHandler became much simpler as we can control the flush behaviors from the OMC (instead of the previous complicated CW/MOH dance) because we're already on the event loop when consuming from the backlog and writing to the channel. - I was able to clean up/remove a bunch of extra code due to this simplification, as well (ExpiredException, OutboundMessagingParameters, MessageResult) - Updated all the javadoc documentation for these changes (mostly OMC and ChannelWriter) > Internode connection management is race-prone > - > > Key: CASSANDRA-14503 > URL: https://issues.apache.org/jira/browse/CASSANDRA-14503 > Project: Cassandra > Issue Type: Bug > Components: Streaming and Messaging >Reporter: Sergio Bossa >Assignee: Jason Brown >Priority: Major > Labels: pull-request-available > Fix For: 4.0 > > Time Spent: 10m > Remaining Estimate: 0h > > Following CASSANDRA-8457, internode connection management has been rewritten > to rely on Netty, but the new implementation in > {{OutboundMessagingConnection}} seems quite race prone to me, in particular > on those two cases: > * {{#finishHandshake()}} racing with {{#close()}}: i.e. in such case the > former could run into an NPE if the latter nulls the {{channelWriter}} (but > this is just an example, other conflicts might happen). > * Connection timeout and retry racing with state changing methods: > {{connectionRetryFuture}} and {{connectionTimeoutFuture}} are cancelled when > handshaking or closing, but there's no guarantee those will be actually > cancelled (as they might be already running), so they might end up changing > the connection state concurrently with other methods (i.e. by unexpectedly > closing the channel or clearing the backlog). > Overall, the thread safety of {{OutboundMessagingConnection}} is very > difficult to assess given the current implementation: I would suggest to > refactor it into a single-thread model, where all connection state changing > actions are enqueued on a single threaded scheduler, so that state > transitions can be clearly defined and checked. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Commented] (CASSANDRA-14503) Internode connection management is race-prone
[ https://issues.apache.org/jira/browse/CASSANDRA-14503?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16588759#comment-16588759 ] Jason Brown commented on CASSANDRA-14503: - Working on a solution that will also resolve the problems [~sbtourist] raised with CASSANDRA-14507. I've spoken with some of the netty maintainers and they've given me some solid advice about to rework {{OutboubndMessageConnection}}. > Internode connection management is race-prone > - > > Key: CASSANDRA-14503 > URL: https://issues.apache.org/jira/browse/CASSANDRA-14503 > Project: Cassandra > Issue Type: Bug > Components: Streaming and Messaging >Reporter: Sergio Bossa >Assignee: Jason Brown >Priority: Major > > Following CASSANDRA-8457, internode connection management has been rewritten > to rely on Netty, but the new implementation in > {{OutboundMessagingConnection}} seems quite race prone to me, in particular > on those two cases: > * {{#finishHandshake()}} racing with {{#close()}}: i.e. in such case the > former could run into an NPE if the latter nulls the {{channelWriter}} (but > this is just an example, other conflicts might happen). > * Connection timeout and retry racing with state changing methods: > {{connectionRetryFuture}} and {{connectionTimeoutFuture}} are cancelled when > handshaking or closing, but there's no guarantee those will be actually > cancelled (as they might be already running), so they might end up changing > the connection state concurrently with other methods (i.e. by unexpectedly > closing the channel or clearing the backlog). > Overall, the thread safety of {{OutboundMessagingConnection}} is very > difficult to assess given the current implementation: I would suggest to > refactor it into a single-thread model, where all connection state changing > actions are enqueued on a single threaded scheduler, so that state > transitions can be clearly defined and checked. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org