[jira] [Commented] (CASSANDRA-6788) Race condition silently kills thrift server

2016-03-21 Thread Daniel Pinyol (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-6788?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15204079#comment-15204079
 ] 

Daniel Pinyol commented on CASSANDRA-6788:
--

With both Cassandra 1.2.19 and 2.2.5 (which should contain the patch), I see 
a similar problem on both OS X and Linux. I use Thrift with 
scale7-pelops 1.3-1.1.x. Here is my pseudocode; I use dynamic columns.
{noformat}
for (column = 1..1000)
{
  for (value = 1..25)
  {
    write("CF1", "key1", column, writtenValue);
    readValue = read("CF1", "key1", column);
    // Sometimes readValue != writtenValue here. Once this happens,
    // sleeping and reading again does not help.
  }
}
{noformat}
The only ways to avoid the problem are:
* inserting a sleep (of any duration) right after the write.
* replacing Thrift with CQL.
* This sounds crazy, but if each value contains the previous one as a prefix 
(1, 12, 123, 1234...), it never fails.

> Race condition silently kills thrift server
> ---
>
> Key: CASSANDRA-6788
> URL: https://issues.apache.org/jira/browse/CASSANDRA-6788
> Project: Cassandra
> Issue Type: Bug
> Reporter: Christian Rolf
> Assignee: Christian Rolf
> Fix For: 1.2.17, 2.0.7, 2.1 beta2
>
> Attachments: 6788-v2.txt, 6788-v3.txt, 6793-v3-rebased.txt, race_patch.diff
>
>
> There's a race condition in CustomTThreadPoolServer that can cause the thrift 
> server to silently stop listening for connections. 
> It happens when the executor service throws a RejectedExecutionException, 
> which is not caught.
>  
> Silent in the sense that OpsCenter doesn't notice any problem since JMX is 
> still running fine.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (CASSANDRA-6788) Race condition silently kills thrift server

2014-06-05 Thread Brandon Williams (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-6788?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14018912#comment-14018912
 ] 

Brandon Williams commented on CASSANDRA-6788:
-

v2 just applies to 1.2, so if you're cool with that, [~jbellis], let's just 
commit it there.



[jira] [Commented] (CASSANDRA-6788) Race condition silently kills thrift server

2014-06-04 Thread Vincent Mallet (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-6788?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14018401#comment-14018401
 ] 

Vincent Mallet commented on CASSANDRA-6788:
---

+1 on the port to 1.2, we're hoping to grab your patch as soon as you feel 
comfortable with it and commit it for 1.2.17.




[jira] [Commented] (CASSANDRA-6788) Race condition silently kills thrift server

2014-05-29 Thread Christian Rolf (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-6788?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14012287#comment-14012287
 ] 

Christian Rolf commented on CASSANDRA-6788:
---

Thanks for reporting this. Looks like the patch (v2) is exactly the same for 
1.2. 

> Race condition silently kills thrift server
> ---
>
> Key: CASSANDRA-6788
> URL: https://issues.apache.org/jira/browse/CASSANDRA-6788
> Project: Cassandra
> Issue Type: Bug
> Reporter: Christian Rolf
> Assignee: Christian Rolf
> Fix For: 2.0.7, 2.1 beta2
>
> Attachments: 6788-v2.txt, 6788-v3.txt, 6793-v3-rebased.txt, race_patch.diff
>
>
> There's a race condition in CustomTThreadPoolServer that can cause the thrift 
> server to silently stop listening for connections. 
> It happens when the executor service throws a RejectedExecutionException, 
> which is not caught.
>  
> Silent in the sense that OpsCenter doesn't notice any problem since JMX is 
> still running fine.





[jira] [Commented] (CASSANDRA-6788) Race condition silently kills thrift server

2014-05-28 Thread sankalp kohli (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-6788?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14011656#comment-14011656
 ] 

sankalp kohli commented on CASSANDRA-6788:
--

+1



[jira] [Commented] (CASSANDRA-6788) Race condition silently kills thrift server

2014-03-17 Thread Jonathan Ellis (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-6788?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13937937#comment-13937937
 ] 

Jonathan Ellis commented on CASSANDRA-6788:
---

committed



[jira] [Commented] (CASSANDRA-6788) Race condition silently kills thrift server

2014-03-17 Thread Christian Rolf (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-6788?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13937776#comment-13937776
 ] 

Christian Rolf commented on CASSANDRA-6788:
---

Having read through the (somewhat horrendous) code of 
java.util.concurrent.ThreadPoolExecutor, I agree wholeheartedly with your 
version. It's the only way to be completely safe from the race; the contract 
for afterExecute simply isn't clear enough to rely on.
I see I put myself as the assignee. What do I need to do to submit the patch?
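
For readers following along, here is a minimal sketch of the general idea of 
surviving a rejection instead of letting it escape the accept loop. It is not 
the attached patch: the class GuardedSubmitDemo and the helper submitOrClose 
are hypothetical names, and the one-thread pool with a SynchronousQueue is 
only an assumed stand-in for a thrift pool that has reached its thread limit.

```java
import java.io.Closeable;
import java.io.IOException;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.Executor;
import java.util.concurrent.RejectedExecutionException;
import java.util.concurrent.SynchronousQueue;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicBoolean;

class GuardedSubmitDemo {
    // Hypothetical helper: hand the task to the pool, or drop the
    // connection if the pool refuses it. The caller's loop keeps running.
    static boolean submitOrClose(Executor pool, Runnable task, Closeable client) {
        try {
            pool.execute(task);
            return true;
        } catch (RejectedExecutionException e) {
            try {
                client.close();
            } catch (IOException ignored) {
                // Nothing more can be done for this connection.
            }
            return false;
        }
    }

    // Waits on a latch; returns false if interrupted while waiting.
    static boolean awaitQuietly(CountDownLatch latch) {
        try {
            latch.await();
            return true;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        }
    }

    // Saturates a one-thread pool, then submits again.
    // Returns {accepted, clientClosed} for the second submission.
    static boolean[] rejectOnce() {
        ThreadPoolExecutor pool = new ThreadPoolExecutor(
                1, 1, 60, TimeUnit.SECONDS, new SynchronousQueue<Runnable>());
        CountDownLatch started = new CountDownLatch(1);
        CountDownLatch release = new CountDownLatch(1);
        AtomicBoolean closed = new AtomicBoolean(false);
        try {
            // Occupy the only worker until the demo is finished.
            pool.execute(() -> {
                started.countDown();
                awaitQuietly(release);
            });
            awaitQuietly(started);
            boolean accepted = submitOrClose(pool, () -> { }, () -> closed.set(true));
            return new boolean[] { accepted, closed.get() };
        } finally {
            release.countDown();
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        boolean[] result = rejectOnce();
        System.out.println("accepted=" + result[0] + " clientClosed=" + result[1]);
    }
}
```

In this sketch the rejected connection is closed and the caller simply gets 
false back, so a serving loop built on it would carry on accepting.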

> Race condition silently kills thrift server
> ---
>
> Key: CASSANDRA-6788
> URL: https://issues.apache.org/jira/browse/CASSANDRA-6788
> Project: Cassandra
> Issue Type: Bug
> Reporter: Christian Rolf
> Assignee: Christian Rolf
> Attachments: 6788-v2.txt, 6788-v3.txt, 6793-v3-rebased.txt, race_patch.diff
>
>
> There's a race condition in CustomTThreadPoolServer that can cause the thrift 
> server to silently stop listening for connections. 
> It happens when the executor service throws a RejectedExecutionException, 
> which is not caught.
>  
> Silent in the sense that OpsCenter doesn't notice any problem since JMX is 
> still running fine.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (CASSANDRA-6788) Race condition silently kills thrift server

2014-03-13 Thread Jonathan Ellis (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-6788?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13933635#comment-13933635
 ] 

Jonathan Ellis commented on CASSANDRA-6788:
---

Hmm, doesn't this mean we're back to dying ignominiously if we still happen to 
get a REE?  I would prefer v2 for that reason.



[jira] [Commented] (CASSANDRA-6788) Race condition silently kills thrift server

2014-03-13 Thread Jonathan Ellis (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-6788?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13933641#comment-13933641
 ] 

Jonathan Ellis commented on CASSANDRA-6788:
---

(attaching my rebase of v3 for posterity)



[jira] [Commented] (CASSANDRA-6788) Race condition silently kills thrift server

2014-03-01 Thread Christian Rolf (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-6788?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13917174#comment-13917174
 ] 

Christian Rolf commented on CASSANDRA-6788:
---

Sorry, I should've been more specific; this happens when the number of RPC 
threads is limited. We've been running a ring of 12 nodes with 2048 as max RPC 
threads for over a year without problems, but for the past week we've been 
getting zombie nodes almost every day.

Basically, the active thread counter is decremented at line 216 (pre-patch) of 
CustomTThreadPoolServer.java, which can end the waiting loop at line 98. If a 
new connection is made before the run method of the old thread has completed, 
the execute() call at line 108 can throw a RejectedExecutionException.
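
The pool-side half of this can be reproduced in isolation. The sketch below is 
not Cassandra code: SaturatedPoolDemo is a hypothetical name, and it assumes an 
executor with a hard thread limit and no task queue (one worker and a 
SynchronousQueue here). While the only worker is still inside run(), a second 
execute() is rejected.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.RejectedExecutionException;
import java.util.concurrent.SynchronousQueue;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

class SaturatedPoolDemo {
    // Waits on a latch; returns false if interrupted while waiting.
    static boolean awaitQuietly(CountDownLatch latch) {
        try {
            latch.await();
            return true;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        }
    }

    // Returns true if a second task is rejected while the only worker is busy.
    static boolean secondSubmitIsRejected() {
        // One worker and no task queue: a stand-in for a pool at its limit.
        ThreadPoolExecutor pool = new ThreadPoolExecutor(
                1, 1, 60, TimeUnit.SECONDS, new SynchronousQueue<Runnable>());
        CountDownLatch started = new CountDownLatch(1);
        CountDownLatch release = new CountDownLatch(1);
        try {
            // The "old" connection: its run() has started but not completed.
            pool.execute(() -> {
                started.countDown();
                awaitQuietly(release);
            });
            awaitQuietly(started);
            try {
                // The "new" connection arrives while the worker is occupied.
                pool.execute(() -> { });
                return false;
            } catch (RejectedExecutionException e) {
                return true;
            }
        } finally {
            release.countDown();
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        System.out.println("rejected: " + secondSubmitIsRejected());
    }
}
```

If a separately maintained counter says a slot is free before the worker has 
actually returned, the caller ends up in exactly this position.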

> Race condition silently kills thrift server
> ---
>
> Key: CASSANDRA-6788
> URL: https://issues.apache.org/jira/browse/CASSANDRA-6788
> Project: Cassandra
> Issue Type: Bug
> Reporter: Christian Rolf
> Assignee: Christian Rolf
> Attachments: race_patch.diff
>
>
> There's a race condition in CustomTThreadPoolServer that can cause the thrift 
> server to silently stop listening for connections. 
> It happens when the executor service throws a RejectedExecutionException, 
> which is not caught.
>  
> Silent in the sense that OpsCenter doesn't notice any problem since JMX is 
> still running fine.





[jira] [Commented] (CASSANDRA-6788) Race condition silently kills thrift server

2014-03-01 Thread Jonathan Ellis (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-6788?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13917144#comment-13917144
 ] 

Jonathan Ellis commented on CASSANDRA-6788:
---

I don't understand (a) how a REE will "cause the thrift server to silently 
stop listening for connections," nor (b) how closing transports fixes it.  Note 
that the executor is always a ThreadPoolExecutor, so a dead worker thread will 
be replaced automatically.
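
The worker-replacement behaviour mentioned here can be checked with a small 
standalone sketch. WorkerReplacementDemo is a hypothetical name; the one-thread 
pool with a LinkedBlockingQueue and the silent uncaught-exception handler are 
assumptions made to keep the demo small and quiet, not the thrift server's 
configuration.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.ThreadFactory;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

class WorkerReplacementDemo {
    // Returns true if a task still runs after an earlier task killed
    // the pool's only worker thread with an uncaught exception.
    static boolean poolSurvivesDeadWorker() {
        ThreadFactory quietFactory = runnable -> {
            Thread t = new Thread(runnable);
            // Swallow the uncaught exception so no stack trace is printed.
            t.setUncaughtExceptionHandler((thread, error) -> { });
            return t;
        };
        ThreadPoolExecutor pool = new ThreadPoolExecutor(
                1, 1, 60, TimeUnit.SECONDS,
                new LinkedBlockingQueue<Runnable>(), quietFactory);
        CountDownLatch secondRan = new CountDownLatch(1);
        try {
            // First task: throws, which terminates the worker running it.
            pool.execute(() -> {
                throw new IllegalStateException("worker dies here");
            });
            // Second task: needs a live worker to run.
            pool.execute(secondRan::countDown);
            return secondRan.await(5, TimeUnit.SECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        System.out.println("survived: " + poolSurvivesDeadWorker());
    }
}
```

This only covers an exception thrown inside a worker; an exception thrown by 
execute() itself surfaces in the calling thread instead.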
