[jira] [Commented] (CASSANDRA-6788) Race condition silently kills thrift server
[ https://issues.apache.org/jira/browse/CASSANDRA-6788?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15204079#comment-15204079 ]

Daniel Pinyol commented on CASSANDRA-6788:
------------------------------------------

With both Cassandra 1.2.19 and 2.2.5 (which should contain the patch) I can reproduce a similar problem, on both OS X and Linux. I use thrift with scale7-pelops 1.3-1.1.x. This is my pseudocode. I use dynamic columns.

{noformat}
for(column=1..1000) {
  for(value=1..25) {
    write("CF1", "key1", column, writtenValue);
    readValue = read("CF1", "key1", column);
    // sometimes here readValue != writtenValue.
    // Once this happens, sleeping and reading again does not help
  }
}
{noformat}

The only ways I have found to avoid the problem are:
* inserting a sleep (any duration) right after the put.
* replacing thrift with CQL
* This sounds crazy, but if each value contains the previous one as a prefix (1, 12, 123, 1234...) it never fails.

> Race condition silently kills thrift server
> -------------------------------------------
>
>                 Key: CASSANDRA-6788
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-6788
>             Project: Cassandra
>          Issue Type: Bug
>            Reporter: Christian Rolf
>            Assignee: Christian Rolf
>             Fix For: 1.2.17, 2.0.7, 2.1 beta2
>
>         Attachments: 6788-v2.txt, 6788-v3.txt, 6793-v3-rebased.txt, race_patch.diff
>
>
> There's a race condition in CustomTThreadPoolServer that can cause the thrift server to silently stop listening for connections.
> It happens when the executor service throws a RejectedExecutionException, which is not caught.
>
> Silent in the sense that OpsCenter doesn't notice any problem since JMX is still running fine.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
[jira] [Commented] (CASSANDRA-6788) Race condition silently kills thrift server
[ https://issues.apache.org/jira/browse/CASSANDRA-6788?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14018912#comment-14018912 ]

Brandon Williams commented on CASSANDRA-6788:
---------------------------------------------

v2 just applies to 1.2, so if you're cool with that, [~jbellis], let's just commit it there.
[jira] [Commented] (CASSANDRA-6788) Race condition silently kills thrift server
[ https://issues.apache.org/jira/browse/CASSANDRA-6788?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14018401#comment-14018401 ]

Vincent Mallet commented on CASSANDRA-6788:
-------------------------------------------

+1 on the port to 1.2; we're hoping to grab your patch as soon as you feel comfortable with it and commit it for 1.2.17.
[jira] [Commented] (CASSANDRA-6788) Race condition silently kills thrift server
[ https://issues.apache.org/jira/browse/CASSANDRA-6788?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14012287#comment-14012287 ]

Christian Rolf commented on CASSANDRA-6788:
-------------------------------------------

Thanks for reporting this. Looks like the patch (v2) is exactly the same for 1.2.
[jira] [Commented] (CASSANDRA-6788) Race condition silently kills thrift server
[ https://issues.apache.org/jira/browse/CASSANDRA-6788?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14011656#comment-14011656 ]

sankalp kohli commented on CASSANDRA-6788:
------------------------------------------

+1
[jira] [Commented] (CASSANDRA-6788) Race condition silently kills thrift server
[ https://issues.apache.org/jira/browse/CASSANDRA-6788?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13937937#comment-13937937 ]

Jonathan Ellis commented on CASSANDRA-6788:
-------------------------------------------

committed
[jira] [Commented] (CASSANDRA-6788) Race condition silently kills thrift server
[ https://issues.apache.org/jira/browse/CASSANDRA-6788?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13937776#comment-13937776 ]

Christian Rolf commented on CASSANDRA-6788:
-------------------------------------------

Having read through the (somewhat horrendous) code of java.util.concurrent.ThreadPoolExecutor, I agree whole-heartedly with your version. It's the only way to be completely safe from the race; the contract for afterExecute simply isn't clear enough to rely on.

I see I put myself as the assignee; what do I need to do to submit the patch?
[jira] [Commented] (CASSANDRA-6788) Race condition silently kills thrift server
[ https://issues.apache.org/jira/browse/CASSANDRA-6788?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13933635#comment-13933635 ]

Jonathan Ellis commented on CASSANDRA-6788:
-------------------------------------------

Hmm, doesn't this mean we're back to dying ignominiously if we still happen to get a REE? I would prefer v2 for that reason.
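To make the concern about an uncaught REE concrete, here is a minimal sketch of an accept loop that catches RejectedExecutionException and keeps going. It is an illustration only and is not taken from any of the attached patches; the class name GuardedAcceptLoop, the method serve, and the pool size of 1 are all hypothetical. Each loop iteration stands in for one incoming connection.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.RejectedExecutionException;
import java.util.concurrent.SynchronousQueue;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

// Hypothetical illustration; not the code of CustomTThreadPoolServer.
public class GuardedAcceptLoop {

    /** Simulates `connections` incoming clients; returns {accepted, rejected}. */
    public static int[] serve(int connections) {
        // One worker and no queueing, so every submission after the first is rejected
        // for as long as the first task is still running.
        ThreadPoolExecutor pool = new ThreadPoolExecutor(
                1, 1, 60, TimeUnit.SECONDS, new SynchronousQueue<Runnable>());
        CountDownLatch release = new CountDownLatch(1);
        int accepted = 0;
        int rejected = 0;
        try {
            for (int i = 0; i < connections; i++) {
                try {
                    pool.execute(() -> awaitQuietly(release));
                    accepted++;
                } catch (RejectedExecutionException e) {
                    // Without this catch the exception would escape the loop and
                    // no further connections would be served.
                    rejected++;
                }
            }
        } finally {
            release.countDown(); // let the blocked worker finish
            pool.shutdown();
        }
        return new int[] {accepted, rejected};
    }

    private static void awaitQuietly(CountDownLatch latch) {
        try {
            latch.await();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }

    public static void main(String[] args) {
        int[] result = serve(3);
        System.out.println("accepted=" + result[0] + " rejected=" + result[1]);
    }
}
```

In a real server the catch block would presumably also close the rejected client's transport and log the event; that part is left out here because it depends on the server's own types.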
[jira] [Commented] (CASSANDRA-6788) Race condition silently kills thrift server
[ https://issues.apache.org/jira/browse/CASSANDRA-6788?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13933641#comment-13933641 ]

Jonathan Ellis commented on CASSANDRA-6788:
-------------------------------------------

(attaching my rebase of v3 for posterity)
[jira] [Commented] (CASSANDRA-6788) Race condition silently kills thrift server
[ https://issues.apache.org/jira/browse/CASSANDRA-6788?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13917174#comment-13917174 ]

Christian Rolf commented on CASSANDRA-6788:
-------------------------------------------

Sorry, I should've been more specific; this happens when the number of RPC threads is limited. We've been running a ring of 12 nodes with 2048 as max RPC threads for over a year without problems, but the past week we've been getting zombie nodes almost every day.

Basically, the active thread counter is decremented at line 216 (pre-patch) of CustomTThreadPoolServer.java; this can end the waiting loop at line 98. If a new connection is made before the run method of the old thread has completed, the execute() call at line 108 can throw a RejectedExecutionException.
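The sequence described above can be reproduced deterministically in a simplified model. The sketch below is not Cassandra's code: RejectionRaceModel, MAX_THREADS, activeClients and the latches are hypothetical names, and the pool is capped at one thread to keep the example small. The task decrements the server-side counter while it is still occupying the pool's only worker, so the counter check passes and the next execute() is rejected.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.RejectedExecutionException;
import java.util.concurrent.SynchronousQueue;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical model of the race; not the code of CustomTThreadPoolServer.
public class RejectionRaceModel {
    private static final int MAX_THREADS = 1; // stands in for the max RPC thread setting

    /** Returns true if a new connection is rejected although the counter says a slot is free. */
    public static boolean newConnectionRejected() {
        ThreadPoolExecutor pool = new ThreadPoolExecutor(
                MAX_THREADS, MAX_THREADS, 60, TimeUnit.SECONDS, new SynchronousQueue<Runnable>());
        AtomicInteger activeClients = new AtomicInteger();
        CountDownLatch counterDecremented = new CountDownLatch(1);
        CountDownLatch letWorkerFinish = new CountDownLatch(1);
        try {
            // First connection: count it, then hand it to the pool.
            activeClients.incrementAndGet();
            pool.execute(() -> {
                activeClients.decrementAndGet(); // counter drops inside run() ...
                counterDecremented.countDown();
                awaitQuietly(letWorkerFinish);   // ... but run() has not returned yet
            });
            awaitQuietly(counterDecremented);

            // Accept loop: the counter claims there is room for another client.
            if (activeClients.get() >= MAX_THREADS) {
                return false; // would keep waiting; not reached in this model
            }
            try {
                pool.execute(() -> { });
                return false;
            } catch (RejectedExecutionException e) {
                return true; // the pool's only worker is still busy
            }
        } finally {
            letWorkerFinish.countDown();
            pool.shutdown();
        }
    }

    private static void awaitQuietly(CountDownLatch latch) {
        try {
            latch.await();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println("new connection rejected: " + newConnectionRejected());
    }
}
```

The latches only make the interleaving repeatable; in a live server the same window exists without them, just for a much shorter time.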
[jira] [Commented] (CASSANDRA-6788) Race condition silently kills thrift server
[ https://issues.apache.org/jira/browse/CASSANDRA-6788?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13917144#comment-13917144 ]

Jonathan Ellis commented on CASSANDRA-6788:
-------------------------------------------

I don't understand how (a) a REE will "cause the thrift server to silently stop listening for connections," nor (b) how closing transports fixes it.

Note that the executor is always a ThreadPoolExecutor so a dead worker thread will be replaced automatically.