[ https://issues.apache.org/jira/browse/CASSANDRA-8801?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Carl Yeksigian updated CASSANDRA-8801:
--------------------------------------
    Attachment: 8801-v2.txt

I was able to bring up a node again after decommissioning; it doesn't seem like the {{DECOMMISSIONED}} state gets saved to the {{system.local}} table. The cause is IOErrors thrown by MessagingService while it was trying to close the socket threads. Wrapping the MessagingService call in a try block fixed the problem: when I restarted, the node errored out, reporting that it had been decommissioned, and I was able to use the {{override_decommission}} flag.

I've attached the change that I made to make it work.

Just one nit otherwise: there is an unnecessary whitespace change in StorageService.

> Decommissioned nodes are willing to rejoin the cluster if restarted
> -------------------------------------------------------------------
>
>                 Key: CASSANDRA-8801
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-8801
>             Project: Cassandra
>          Issue Type: Bug
>          Components: Core
>            Reporter: Eric Stevens
>            Assignee: Brandon Williams
>             Fix For: 3.0
>
>         Attachments: 8801-v2.txt, 8801.txt
>
>
> This issue comes from the Cassandra user group.
> If a node which was successfully decommissioned gets restarted with its data
> directory intact, it will rejoin the cluster immediately, going to {{UN}} and
> beginning to serve client requests.
> This is wrong: the node has consistency issues, having missed any writes
> while it was offline because no hinted handoffs were being kept. And in the
> best case scenario (it's spotted and remediated immediately), near-100%
> overstreaming will still occur.
> Also, whatever reasons the operator had for decommissioning the node would
> presumably still be valid, so this action may threaten cluster stability if
> the node is underpowered or suffering hardware issues.
> But what elevates this to critical is that if the node had been offline
> longer than gc_grace_seconds, it may cause permanent and unrecoverable
> consistency issues due to data resurrection.
> h3. Recommendation:
> A node should remember that it was decommissioned and refuse to rejoin a
> cluster without at least a -Dflag forcing it to.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
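
For readers following the ticket, here is a minimal, self-contained Java sketch of the two ideas discussed above: tolerate an IOError during messaging shutdown so the decommissioned state still gets persisted, and refuse to start from that state unless an override is given. This is not the attached patch and not Cassandra's actual code; {{DecommissionGuard}}, {{Messaging}}, {{systemLocal}}, {{decommission}}, and {{checkStartup}} are hypothetical names, and the in-memory map is only a stand-in for the {{system.local}} table.

```java
import java.io.IOError;
import java.io.IOException;
import java.util.HashMap;
import java.util.Map;

// Hypothetical illustration only; none of these names come from Cassandra.
final class DecommissionGuard {
    enum BootstrapState { COMPLETED, DECOMMISSIONED }

    static final String KEY = "bootstrapped";

    // In-memory stand-in for the persisted system.local table (assumption).
    static final Map<String, BootstrapState> systemLocal = new HashMap<>();

    // Stand-in for the messaging layer whose shutdown may fail.
    interface Messaging {
        void shutdown();
    }

    // Shut down messaging, but never let a failure there prevent the
    // DECOMMISSIONED state from being saved.
    static void decommission(Messaging messaging) {
        try {
            messaging.shutdown();
        } catch (IOError e) {
            System.err.println("ignoring failure during messaging shutdown: " + e.getCause());
        }
        systemLocal.put(KEY, BootstrapState.DECOMMISSIONED);
    }

    // Refuse to start a decommissioned node unless the operator overrides.
    static void checkStartup(boolean overrideDecommission) {
        if (systemLocal.get(KEY) == BootstrapState.DECOMMISSIONED && !overrideDecommission) {
            throw new IllegalStateException(
                "This node was decommissioned and will not rejoin the ring "
                + "unless the override_decommission flag is set");
        }
    }

    public static void main(String[] args) {
        // Simulate the socket-close failure described in the comment above.
        decommission(() -> {
            throw new IOError(new IOException("socket already closed"));
        });
        System.out.println("saved state: " + systemLocal.get(KEY));

        try {
            checkStartup(false);
        } catch (IllegalStateException e) {
            System.out.println("startup refused: " + e.getMessage());
        }

        checkStartup(true);
        System.out.println("startup allowed with override");
    }
}
```

In a real process the override would more likely be read from a -D system property (for example with {{Boolean.getBoolean}}) than passed as an argument; the exact property name is left out here because the ticket text only refers to it as the {{override_decommission}} flag.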