[jira] Updated: (CASSANDRA-2072) Race condition during decommission

Brandon Williams (JIRA) Thu, 27 Jan 2011 16:08:09 -0800

     [ https://issues.apache.org/jira/browse/CASSANDRA-2072?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]


Brandon Williams updated CASSANDRA-2072:
----------------------------------------

    Description: 
Occasionally when decommissioning a node, there is a race condition that occurs 
where another node will never remove the token and thus propagate it again with 
a state of down.  With CASSANDRA-1900 we can solve this, but it shouldn't occur 
in the first place.

Given nodes A, B, and C, if you decommission B it will stream to A and C.  When 
complete, B will decommission and receive this stacktrace:

ERROR 00:02:40,282 Fatal exception in thread Thread[Thread-5,5,main]
java.util.concurrent.RejectedExecutionException: ThreadPoolExecutor has shut down
        at org.apache.cassandra.concurrent.DebuggableThreadPoolExecutor$1.rejectedExecution(DebuggableThreadPoolExecutor.java:62)
        at java.util.concurrent.ThreadPoolExecutor.reject(ThreadPoolExecutor.java:767)
        at java.util.concurrent.ThreadPoolExecutor.execute(ThreadPoolExecutor.java:658)
        at org.apache.cassandra.net.MessagingService.receive(MessagingService.java:387)
        at org.apache.cassandra.net.IncomingTcpConnection.run(IncomingTcpConnection.java:91

At this point A will show it is removing B's token, but C will not and instead 
its failure detector will report that B is dead, and nodetool ring on C shows B 
in a leaving/down state.  In another gossip round, C will propagate this state 
back to A.

  was:
Occasionally when decommissioning a node, there is a race condition that occurs 
where another node will never remove the token and thus propagate it again with 
a state of down.  With CASSANDRA-1900 we can solve this, but it shouldn't occur 
in the first place.

Given nodes A, B, and C, if you decommission B it will stream to A and C.  When 
complete, B will decommission and receive this stacktrace:

ERROR 00:02:40,282 Fatal exception in thread Thread[Thread-5,5,main]
java.util.concurrent.RejectedExecutionException: ThreadPoolExecutor has shut down
        at org.apache.cassandra.concurrent.DebuggableThreadPoolExecutor$1.rejectedExecution(DebuggableThreadPoolExecutor.java:62)
        at java.util.concurrent.ThreadPoolExecutor.reject(ThreadPoolExecutor.java:767)
        at java.util.concurrent.ThreadPoolExecutor.execute(ThreadPoolExecutor.java:658)
        at org.apache.cassandra.net.MessagingService.receive(MessagingService.java:387)
        at org.apache.cassandra.net.IncomingTcpConnection.run(IncomingTcpConnection.java:91

At this point A will show it is removing B's token, but C will not and instead 
it's failure detector will report that B is dead, and nodetool ring on C shows 
A in a leaving/down state.  In another gossip round, C will propagate this 
state back to A.


> Race condition during decommission
> ----------------------------------
>
>                 Key: CASSANDRA-2072
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-2072
>             Project: Cassandra
>          Issue Type: Bug
>          Components: Core
>    Affects Versions: 0.7.0
>            Reporter: Brandon Williams
>            Priority: Minor
>
> Occasionally when decommissioning a node, there is a race condition that 
> occurs where another node will never remove the token and thus propagate it 
> again with a state of down.  With CASSANDRA-1900 we can solve this, but it 
> shouldn't occur in the first place.
> Given nodes A, B, and C, if you decommission B it will stream to A and C.  
> When complete, B will decommission and receive this stacktrace:
> ERROR 00:02:40,282 Fatal exception in thread Thread[Thread-5,5,main]
> java.util.concurrent.RejectedExecutionException: ThreadPoolExecutor has shut down
>         at org.apache.cassandra.concurrent.DebuggableThreadPoolExecutor$1.rejectedExecution(DebuggableThreadPoolExecutor.java:62)
>         at java.util.concurrent.ThreadPoolExecutor.reject(ThreadPoolExecutor.java:767)
>         at java.util.concurrent.ThreadPoolExecutor.execute(ThreadPoolExecutor.java:658)
>         at org.apache.cassandra.net.MessagingService.receive(MessagingService.java:387)
>         at org.apache.cassandra.net.IncomingTcpConnection.run(IncomingTcpConnection.java:91
> At this point A will show it is removing B's token, but C will not and 
> instead its failure detector will report that B is dead, and nodetool ring on 
> C shows B in a leaving/down state.  In another gossip round, C will propagate 
> this state back to A.
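
The stack trace in the description shows MessagingService.receive handing an incoming message to a ThreadPoolExecutor that has already shut down, which rejects it and kills the connection thread. The sketch below reproduces that failure mode with plain java.util.concurrent classes only. The names ShutdownRaceSketch and receive are hypothetical and are not Cassandra's code, and dropping the late message is just one possible way to handle the rejection, assumed here for illustration; it is not necessarily the fix adopted for this issue.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.RejectedExecutionException;
import java.util.concurrent.TimeUnit;

public class ShutdownRaceSketch {

    /**
     * Hypothetical stand-in for a message receiver. Returns true if the
     * task was handed to the stage, false if the stage had already shut down.
     */
    static boolean receive(ExecutorService stage, Runnable task) {
        try {
            stage.execute(task);
            return true;
        } catch (RejectedExecutionException e) {
            // The stage shut down between the message arriving and being
            // dispatched. Drop the message instead of letting the exception
            // propagate and terminate the calling thread.
            return false;
        }
    }

    public static void main(String[] args) throws InterruptedException {
        // Default rejection policy: execute() throws once shut down.
        ExecutorService stage = Executors.newSingleThreadExecutor();

        System.out.println(receive(stage, () -> {})); // prints true

        // Simulates the decommissioning node stopping its stages while
        // peers may still be sending messages.
        stage.shutdown();

        System.out.println(receive(stage, () -> {})); // prints false

        stage.awaitTermination(1, TimeUnit.SECONDS);
    }
}
```

Without the try/catch, the second call would throw RejectedExecutionException out of receive, which matches the shape of the trace reported on node B.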

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
