[ https://issues.apache.org/jira/browse/CASSANDRA-15958?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17185251#comment-17185251 ]
Adam Holmberg commented on CASSANDRA-15958:
-------------------------------------------

I looked at the {{OutboundMessageQueue}} issue a bit more. Put simply, there's a race between one thread [adding to this queue and updating the expiration deadline|https://github.com/apache/cassandra/blob/405e2dd8b5610208596ab4cb0bb6b9be7a159f5e/src/java/org/apache/cassandra/net/OutboundMessageQueue.java#L89-L92] and another thread that is draining the queue ([1|https://github.com/apache/cassandra/blob/405e2dd8b5610208596ab4cb0bb6b9be7a159f5e/src/java/org/apache/cassandra/net/OutboundMessageQueue.java#L256], [2|https://github.com/apache/cassandra/blob/405e2dd8b5610208596ab4cb0bb6b9be7a159f5e/src/java/org/apache/cassandra/net/OutboundMessageQueue.java#L148], [3|https://github.com/apache/cassandra/blob/405e2dd8b5610208596ab4cb0bb6b9be7a159f5e/src/java/org/apache/cassandra/net/OutboundMessageQueue.java#L484]) and also updating the deadline ([1|https://github.com/apache/cassandra/blob/405e2dd8b5610208596ab4cb0bb6b9be7a159f5e/src/java/org/apache/cassandra/net/OutboundMessageQueue.java#L288], [2|https://github.com/apache/cassandra/blob/405e2dd8b5610208596ab4cb0bb6b9be7a159f5e/src/java/org/apache/cassandra/net/OutboundMessageQueue.java#L488]).

The race is there, but I'm not certain it would be a problem in an operating server, since nothing there spins on an inactive queue waiting for messages to be evacuated, as this test does. In other words, new incoming messages and ongoing delivery would shake this loose naturally.

Two ways to proceed:
1.) We agree that it's not a problem, and I make this test not susceptible to the timeout.
2.) We try to fix it by adding synchronization around both the external queue and the expiry update. I would need to expand the analysis quite a bit to understand what performance implications that might have (since the apparent point of the two-queue design is efficiency).

[~yifanc] I'm interested in your take. Also /cc [~benedict] [~aleksey] for ideas, since this is part of your newish messaging rewrite.
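To make the race described above concrete, here is a minimal sketch of one interleaving that could produce it. It is a simplified model, not the actual {{OutboundMessageQueue}} code: the class {{ExpiryRaceSketch}} and all of its method names are hypothetical, a message is reduced to its expiry time, and the interleaving is replayed deterministically on a single thread rather than observed under real concurrency. It also assumes the pruner publishes the deadline it computed during its scan with a plain set, which is my reading of the linked lines rather than something verified here.

```java
import java.util.Iterator;
import java.util.Queue;
import java.util.concurrent.ConcurrentLinkedQueue;
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical model of the race; NOT the real OutboundMessageQueue.
class ExpiryRaceSketch {
    static final long NO_DEADLINE = Long.MAX_VALUE;

    // Each "message" is represented only by its expiry time.
    final Queue<Long> externalQueue = new ConcurrentLinkedQueue<>();
    final AtomicLong nextExpirationDeadline = new AtomicLong(NO_DEADLINE);

    // Producer, step 1: make the message visible in the queue.
    void offer(long expiresAt) {
        externalQueue.offer(expiresAt);
    }

    // Producer, step 2: lower the shared deadline if this message expires sooner.
    void lowerDeadline(long expiresAt) {
        nextExpirationDeadline.accumulateAndGet(expiresAt, Math::min);
    }

    // Pruner, step 1: drop expired messages, return the earliest expiry that remains.
    long pruneAndScan(long now) {
        long earliest = NO_DEADLINE;
        for (Iterator<Long> it = externalQueue.iterator(); it.hasNext(); ) {
            long expiresAt = it.next();
            if (expiresAt <= now) it.remove();
            else earliest = Math.min(earliest, expiresAt);
        }
        return earliest;
    }

    // Pruner, step 2: publish the deadline computed during the scan (assumed plain set).
    void publishDeadline(long earliest) {
        nextExpirationDeadline.set(earliest);
    }

    // A later caller only prunes once the shared deadline has passed.
    boolean wouldPrune(long now) {
        return now >= nextExpirationDeadline.get();
    }

    // Replays the problematic ordering of the four steps above.
    static ExpiryRaceSketch replayInterleaving() {
        ExpiryRaceSketch q = new ExpiryRaceSketch();
        long stale = q.pruneAndScan(0L); // pruner scans an empty queue, sees no deadline
        q.offer(100L);                   // producer enqueues a message expiring at t=100
        q.lowerDeadline(100L);           // producer lowers the deadline to 100
        q.publishDeadline(stale);        // pruner overwrites it with its stale result
        return q;
    }

    public static void main(String[] args) {
        ExpiryRaceSketch q = replayInterleaving();
        System.out.println("queued messages: " + q.externalQueue.size());
        System.out.println("prune triggered at t=200: " + q.wouldPrune(200L));
    }
}
```

In this model the message stays queued past its expiry until some later add or delivery resets the deadline, which matches the observation that a busy server would recover on its own while a test waiting on an idle queue would time out. Option 2 would amount to making the producer's two steps and the pruner's two steps mutually exclusive, at the cost of contention on the add path.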
> org.apache.cassandra.net.ConnectionTest testMessagePurging
> ----------------------------------------------------------
>
>                 Key: CASSANDRA-15958
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-15958
>             Project: Cassandra
>          Issue Type: Bug
>          Components: Test/unit
>            Reporter: David Capwell
>            Assignee: Adam Holmberg
>            Priority: Normal
>             Fix For: 4.0-beta
>
>
> Build: https://ci-cassandra.apache.org/job/Cassandra-trunk-test/196/testReport/junit/org.apache.cassandra.net/ConnectionTest/testMessagePurging/
> Build: https://ci-cassandra.apache.org/job/Cassandra-trunk-test/194/testReport/junit/org.apache.cassandra.net/ConnectionTest/testMessagePurging/
> java.util.concurrent.TimeoutException
> 	at org.apache.cassandra.net.AsyncPromise.get(AsyncPromise.java:258)
> 	at org.apache.cassandra.net.FutureDelegate.get(FutureDelegate.java:143)
> 	at org.apache.cassandra.net.ConnectionTest.doTestManual(ConnectionTest.java:268)
> 	at org.apache.cassandra.net.ConnectionTest.testManual(ConnectionTest.java:236)
> 	at org.apache.cassandra.net.ConnectionTest.testMessagePurging(ConnectionTest.java:679)

--
This message was sent by Atlassian Jira
(v8.3.4#803005)