[ https://issues.apache.org/jira/browse/CASSANDRA-8523?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15228474#comment-15228474 ]
Joel Knighton commented on CASSANDRA-8523:
------------------------------------------

I don't have a good idea for a low-hanging solution here - I think the
cleanest is the combination of a new gossip state for replacing nodes +
enhancement of the failure detector/token metadata. Since that's a bigger
change, let's also make sure that it doesn't conflict with plans for strongly
consistent membership. I'd really rather not couple gossip status to the
internals of the read/write paths.

I also agree that we should make sure that we kill two birds with one stone
and handle this and [CASSANDRA-9244] at the same time. Let me know if I can
help in design and/or review.

> Writes should be sent to a replacement node while it is streaming in data
> -------------------------------------------------------------------------
>
>                 Key: CASSANDRA-8523
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-8523
>             Project: Cassandra
>          Issue Type: Improvement
>            Reporter: Richard Wagner
>            Assignee: Brandon Williams
>             Fix For: 2.1.x
>
>
> In our operations, we make heavy use of replace_address (or
> replace_address_first_boot) in order to replace broken nodes. We now realize
> that writes are not sent to the replacement nodes while they are in hibernate
> state and streaming in data. This runs counter to what our expectations were,
> especially since we know that writes ARE sent to nodes when they are
> bootstrapped into the ring.
> It seems like Cassandra should arrange to send writes to a node that is in
> the process of replacing another node, just like it does for nodes that are
> bootstrapping. I hesitate to phrase this as "we should send writes to a node
> in hibernate" because the concept of hibernate may be useful in other
> contexts, as per CASSANDRA-8336. Maybe a new state is needed here?
> Among other things, the fact that we don't get writes during this period
> makes subsequent repairs more expensive, proportional to the number of writes
> that we miss (and depending on the amount of data that needs to be streamed
> during replacement and the time it may take to rebuild secondary indexes, we
> could miss many hours' worth of writes). It also leaves us more exposed
> to consistency violations.
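
To make the requested behavior concrete, here is a toy Python model, not Cassandra code. It assumes a hypothetical REPLACING state alongside the existing hibernate state, and every name in it (`State`, `PENDING_STATES`, `write_targets`) is invented for illustration. It ignores liveness, hints, and consistency levels. It only shows the idea that a node acquiring a range receives writes when its state is treated like bootstrapping.

```python
from enum import Enum


class State(Enum):
    NORMAL = "normal"
    BOOTSTRAPPING = "bootstrapping"
    HIBERNATE = "hibernate"    # state of a replacing node today, per the report
    REPLACING = "replacing"    # hypothetical new state discussed in this issue


# Assumption: nodes in these states receive writes for ranges they are acquiring.
PENDING_STATES = {State.BOOTSTRAPPING, State.REPLACING}


def write_targets(natural, acquiring, states):
    """Return the nodes a write is sent to for one token range.

    natural   -- nodes that currently own the range (may include the dead node)
    acquiring -- nodes streaming in data for the range
    states    -- mapping of node name to State
    """
    targets = list(natural)
    for node in acquiring:
        if states[node] in PENDING_STATES and node not in targets:
            targets.append(node)
    return targets


states = {
    "a": State.NORMAL,
    "b": State.NORMAL,
    "c_old": State.NORMAL,
    "c_new": State.HIBERNATE,
}

# Behavior described in the report: the replacement misses writes.
current = write_targets(["a", "b", "c_old"], ["c_new"], states)

# With the hypothetical state, the replacement is treated like a bootstrapping node.
states["c_new"] = State.REPLACING
proposed = write_targets(["a", "b", "c_old"], ["c_new"], states)
```

In this model the only difference between the two calls is the state of `c_new`, which is the kind of decoupling the comment asks for: the write path consults a set of pending states rather than special-casing hibernate.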