[ https://issues.apache.org/jira/browse/CASSANDRA-14503?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16801295#comment-16801295 ]
Benedict commented on CASSANDRA-14503: -------------------------------------- Thanks for the patch! And sorry for taking so long to get back to you. We did find [some bugs|https://gist.github.com/belliottsmith/0d12df678d8e9ab06776e29116d56b91], and other areas of improvement; I've filed CASSANDRA-15066 as a follow-up, and look forward to hearing your feedback. > Internode connection management is race-prone > --------------------------------------------- > > Key: CASSANDRA-14503 > URL: https://issues.apache.org/jira/browse/CASSANDRA-14503 > Project: Cassandra > Issue Type: Bug > Components: Legacy/Streaming and Messaging > Reporter: Sergio Bossa > Assignee: Jason Brown > Priority: Normal > Labels: pull-request-available > Fix For: 4.0 > > Time Spent: 0.5h > Remaining Estimate: 0h > > Following CASSANDRA-8457, internode connection management has been rewritten > to rely on Netty, but the new implementation in > {{OutboundMessagingConnection}} seems quite race prone to me, in particular > on those two cases: > * {{#finishHandshake()}} racing with {{#close()}}: i.e. in such case the > former could run into an NPE if the latter nulls the {{channelWriter}} (but > this is just an example, other conflicts might happen). > * Connection timeout and retry racing with state changing methods: > {{connectionRetryFuture}} and {{connectionTimeoutFuture}} are cancelled when > handshaking or closing, but there's no guarantee those will be actually > cancelled (as they might be already running), so they might end up changing > the connection state concurrently with other methods (i.e. by unexpectedly > closing the channel or clearing the backlog). > Overall, the thread safety of {{OutboundMessagingConnection}} is very > difficult to assess given the current implementation: I would suggest to > refactor it into a single-thread model, where all connection state changing > actions are enqueued on a single threaded scheduler, so that state > transitions can be clearly defined and checked. -- This message was sent by Atlassian JIRA (v7.6.3#76005) --------------------------------------------------------------------- To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org