[jira] [Commented] (CASSANDRA-8523) Writes should be sent to a replacement node while it is streaming in data
[ https://issues.apache.org/jira/browse/CASSANDRA-8523?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16146625#comment-16146625 ] ASF GitHub Bot commented on CASSANDRA-8523: --- Github user asfgit closed the pull request at: https://github.com/apache/cassandra/pull/145 > Writes should be sent to a replacement node while it is streaming in data > - > > Key: CASSANDRA-8523 > URL: https://issues.apache.org/jira/browse/CASSANDRA-8523 > Project: Cassandra > Issue Type: Bug >Reporter: Richard Wagner >Assignee: Paulo Motta > Fix For: 2.2.8, 3.0.9, 3.10 > > > In our operations, we make heavy use of replace_address (or > replace_address_first_boot) in order to replace broken nodes. We now realize > that writes are not sent to the replacement nodes while they are in hibernate > state and streaming in data. This runs counter to what our expectations were, > especially since we know that writes ARE sent to nodes when they are > bootstrapped into the ring. > It seems like cassandra should arrange to send writes to a node that is in > the process of replacing another node, just like it does for a nodes that are > bootstraping. I hesitate to phrase this as "we should send writes to a node > in hibernate" because the concept of hibernate may be useful in other > contexts, as per CASSANDRA-8336. Maybe a new state is needed here? > Among other things, the fact that we don't get writes during this period > makes subsequent repairs more expensive, proportional to the number of writes > that we miss (and depending on the amount of data that needs to be streamed > during replacement and the time it may take to rebuild secondary indexes, we > could miss many many hours worth of writes). It also leaves us more exposed > to consistency violations. -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Commented] (CASSANDRA-8523) Writes should be sent to a replacement node while it is streaming in data
[ https://issues.apache.org/jira/browse/CASSANDRA-8523?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16146592#comment-16146592 ] ASF GitHub Bot commented on CASSANDRA-8523: --- GitHub user kgreav opened a pull request: https://github.com/apache/cassandra/pull/145 Describe replacement cases based on CASSANDRA-8523 and CASSANDRA-12344 Just updating the docs to cover cases where replacing with a same IP node and different IP node. You can merge this pull request into a Git repository by running: $ git pull https://github.com/kgreav/cassandra replace_fix Alternatively you can review and apply these changes as the patch at: https://github.com/apache/cassandra/pull/145.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #145 commit e9228182d7f6916eab0dd64afcc57fedb103f25c Author: kurt Date: 2017-08-29T03:36:12Z Describe replacement cases based on CASSANDRA-8523 and CASSANDRA-12344 > Writes should be sent to a replacement node while it is streaming in data > - > > Key: CASSANDRA-8523 > URL: https://issues.apache.org/jira/browse/CASSANDRA-8523 > Project: Cassandra > Issue Type: Bug >Reporter: Richard Wagner >Assignee: Paulo Motta > Fix For: 2.2.8, 3.0.9, 3.10 > > > In our operations, we make heavy use of replace_address (or > replace_address_first_boot) in order to replace broken nodes. We now realize > that writes are not sent to the replacement nodes while they are in hibernate > state and streaming in data. This runs counter to what our expectations were, > especially since we know that writes ARE sent to nodes when they are > bootstrapped into the ring. > It seems like cassandra should arrange to send writes to a node that is in > the process of replacing another node, just like it does for a nodes that are > bootstraping. I hesitate to phrase this as "we should send writes to a node > in hibernate" because the concept of hibernate may be useful in other > contexts, as per CASSANDRA-8336. Maybe a new state is needed here? > Among other things, the fact that we don't get writes during this period > makes subsequent repairs more expensive, proportional to the number of writes > that we miss (and depending on the amount of data that needs to be streamed > during replacement and the time it may take to rebuild secondary indexes, we > could miss many many hours worth of writes). It also leaves us more exposed > to consistency violations. -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Commented] (CASSANDRA-8523) Writes should be sent to a replacement node while it is streaming in data
[ https://issues.apache.org/jira/browse/CASSANDRA-8523?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15453146#comment-15453146 ] Aleksey Yeschenko commented on CASSANDRA-8523: -- Committed as [b39d984f7bd682c7638415d65dcc4ac9bcb74e5f|https://github.com/apache/cassandra/commit/b39d984f7bd682c7638415d65dcc4ac9bcb74e5f] to 2.2, merged with 3.0 and trunk. Thanks. > Writes should be sent to a replacement node while it is streaming in data > - > > Key: CASSANDRA-8523 > URL: https://issues.apache.org/jira/browse/CASSANDRA-8523 > Project: Cassandra > Issue Type: Bug >Reporter: Richard Wagner >Assignee: Paulo Motta > Fix For: 2.2.8, 3.0.9, 3.10 > > > In our operations, we make heavy use of replace_address (or > replace_address_first_boot) in order to replace broken nodes. We now realize > that writes are not sent to the replacement nodes while they are in hibernate > state and streaming in data. This runs counter to what our expectations were, > especially since we know that writes ARE sent to nodes when they are > bootstrapped into the ring. > It seems like cassandra should arrange to send writes to a node that is in > the process of replacing another node, just like it does for a nodes that are > bootstraping. I hesitate to phrase this as "we should send writes to a node > in hibernate" because the concept of hibernate may be useful in other > contexts, as per CASSANDRA-8336. Maybe a new state is needed here? > Among other things, the fact that we don't get writes during this period > makes subsequent repairs more expensive, proportional to the number of writes > that we miss (and depending on the amount of data that needs to be streamed > during replacement and the time it may take to rebuild secondary indexes, we > could miss many many hours worth of writes). It also leaves us more exposed > to consistency violations. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (CASSANDRA-8523) Writes should be sent to a replacement node while it is streaming in data
[ https://issues.apache.org/jira/browse/CASSANDRA-8523?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15449771#comment-15449771 ] Paulo Motta commented on CASSANDRA-8523: from [PR#1155|https://github.com/riptano/cassandra-dtest/pull/1155], it's done. :) > Writes should be sent to a replacement node while it is streaming in data > - > > Key: CASSANDRA-8523 > URL: https://issues.apache.org/jira/browse/CASSANDRA-8523 > Project: Cassandra > Issue Type: Improvement >Reporter: Richard Wagner >Assignee: Paulo Motta > Fix For: 2.1.x > > > In our operations, we make heavy use of replace_address (or > replace_address_first_boot) in order to replace broken nodes. We now realize > that writes are not sent to the replacement nodes while they are in hibernate > state and streaming in data. This runs counter to what our expectations were, > especially since we know that writes ARE sent to nodes when they are > bootstrapped into the ring. > It seems like cassandra should arrange to send writes to a node that is in > the process of replacing another node, just like it does for a nodes that are > bootstraping. I hesitate to phrase this as "we should send writes to a node > in hibernate" because the concept of hibernate may be useful in other > contexts, as per CASSANDRA-8336. Maybe a new state is needed here? > Among other things, the fact that we don't get writes during this period > makes subsequent repairs more expensive, proportional to the number of writes > that we miss (and depending on the amount of data that needs to be streamed > during replacement and the time it may take to rebuild secondary indexes, we > could miss many many hours worth of writes). It also leaves us more exposed > to consistency violations. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (CASSANDRA-8523) Writes should be sent to a replacement node while it is streaming in data
[ https://issues.apache.org/jira/browse/CASSANDRA-8523?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15449749#comment-15449749 ] sankalp kohli commented on CASSANDRA-8523: -- waiting final +1 from whom? > Writes should be sent to a replacement node while it is streaming in data > - > > Key: CASSANDRA-8523 > URL: https://issues.apache.org/jira/browse/CASSANDRA-8523 > Project: Cassandra > Issue Type: Improvement >Reporter: Richard Wagner >Assignee: Paulo Motta > Fix For: 2.1.x > > > In our operations, we make heavy use of replace_address (or > replace_address_first_boot) in order to replace broken nodes. We now realize > that writes are not sent to the replacement nodes while they are in hibernate > state and streaming in data. This runs counter to what our expectations were, > especially since we know that writes ARE sent to nodes when they are > bootstrapped into the ring. > It seems like cassandra should arrange to send writes to a node that is in > the process of replacing another node, just like it does for a nodes that are > bootstraping. I hesitate to phrase this as "we should send writes to a node > in hibernate" because the concept of hibernate may be useful in other > contexts, as per CASSANDRA-8336. Maybe a new state is needed here? > Among other things, the fact that we don't get writes during this period > makes subsequent repairs more expensive, proportional to the number of writes > that we miss (and depending on the amount of data that needs to be streamed > during replacement and the time it may take to rebuild secondary indexes, we > could miss many many hours worth of writes). It also leaves us more exposed > to consistency violations. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (CASSANDRA-8523) Writes should be sent to a replacement node while it is streaming in data
[ https://issues.apache.org/jira/browse/CASSANDRA-8523?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15449605#comment-15449605 ] Paulo Motta commented on CASSANDRA-8523: Waiting final +1 and then will mark as ready to commit. > Writes should be sent to a replacement node while it is streaming in data > - > > Key: CASSANDRA-8523 > URL: https://issues.apache.org/jira/browse/CASSANDRA-8523 > Project: Cassandra > Issue Type: Improvement >Reporter: Richard Wagner >Assignee: Paulo Motta > Fix For: 2.1.x > > > In our operations, we make heavy use of replace_address (or > replace_address_first_boot) in order to replace broken nodes. We now realize > that writes are not sent to the replacement nodes while they are in hibernate > state and streaming in data. This runs counter to what our expectations were, > especially since we know that writes ARE sent to nodes when they are > bootstrapped into the ring. > It seems like cassandra should arrange to send writes to a node that is in > the process of replacing another node, just like it does for a nodes that are > bootstraping. I hesitate to phrase this as "we should send writes to a node > in hibernate" because the concept of hibernate may be useful in other > contexts, as per CASSANDRA-8336. Maybe a new state is needed here? > Among other things, the fact that we don't get writes during this period > makes subsequent repairs more expensive, proportional to the number of writes > that we miss (and depending on the amount of data that needs to be streamed > during replacement and the time it may take to rebuild secondary indexes, we > could miss many many hours worth of writes). It also leaves us more exposed > to consistency violations. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (CASSANDRA-8523) Writes should be sent to a replacement node while it is streaming in data
[ https://issues.apache.org/jira/browse/CASSANDRA-8523?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15449594#comment-15449594 ] Philip Thompson commented on CASSANDRA-8523: I don't think the tests are waiting on anything at this point, but [~pauloricardomg] can confirm > Writes should be sent to a replacement node while it is streaming in data > - > > Key: CASSANDRA-8523 > URL: https://issues.apache.org/jira/browse/CASSANDRA-8523 > Project: Cassandra > Issue Type: Improvement >Reporter: Richard Wagner >Assignee: Paulo Motta > Fix For: 2.1.x > > > In our operations, we make heavy use of replace_address (or > replace_address_first_boot) in order to replace broken nodes. We now realize > that writes are not sent to the replacement nodes while they are in hibernate > state and streaming in data. This runs counter to what our expectations were, > especially since we know that writes ARE sent to nodes when they are > bootstrapped into the ring. > It seems like cassandra should arrange to send writes to a node that is in > the process of replacing another node, just like it does for a nodes that are > bootstraping. I hesitate to phrase this as "we should send writes to a node > in hibernate" because the concept of hibernate may be useful in other > contexts, as per CASSANDRA-8336. Maybe a new state is needed here? > Among other things, the fact that we don't get writes during this period > makes subsequent repairs more expensive, proportional to the number of writes > that we miss (and depending on the amount of data that needs to be streamed > during replacement and the time it may take to rebuild secondary indexes, we > could miss many many hours worth of writes). It also leaves us more exposed > to consistency violations. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (CASSANDRA-8523) Writes should be sent to a replacement node while it is streaming in data
[ https://issues.apache.org/jira/browse/CASSANDRA-8523?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15449587#comment-15449587 ] sankalp kohli commented on CASSANDRA-8523: -- Any updates about the Dtests > Writes should be sent to a replacement node while it is streaming in data > - > > Key: CASSANDRA-8523 > URL: https://issues.apache.org/jira/browse/CASSANDRA-8523 > Project: Cassandra > Issue Type: Improvement >Reporter: Richard Wagner >Assignee: Paulo Motta > Fix For: 2.1.x > > > In our operations, we make heavy use of replace_address (or > replace_address_first_boot) in order to replace broken nodes. We now realize > that writes are not sent to the replacement nodes while they are in hibernate > state and streaming in data. This runs counter to what our expectations were, > especially since we know that writes ARE sent to nodes when they are > bootstrapped into the ring. > It seems like cassandra should arrange to send writes to a node that is in > the process of replacing another node, just like it does for a nodes that are > bootstraping. I hesitate to phrase this as "we should send writes to a node > in hibernate" because the concept of hibernate may be useful in other > contexts, as per CASSANDRA-8336. Maybe a new state is needed here? > Among other things, the fact that we don't get writes during this period > makes subsequent repairs more expensive, proportional to the number of writes > that we miss (and depending on the amount of data that needs to be streamed > during replacement and the time it may take to rebuild secondary indexes, we > could miss many many hours worth of writes). It also leaves us more exposed > to consistency violations. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (CASSANDRA-8523) Writes should be sent to a replacement node while it is streaming in data
[ https://issues.apache.org/jira/browse/CASSANDRA-8523?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15427337#comment-15427337 ] Paulo Motta commented on CASSANDRA-8523: No, someone from the cassandra-dtest is reviewing, but since this will break existing tests it can only be commited after that is merged. > Writes should be sent to a replacement node while it is streaming in data > - > > Key: CASSANDRA-8523 > URL: https://issues.apache.org/jira/browse/CASSANDRA-8523 > Project: Cassandra > Issue Type: Improvement >Reporter: Richard Wagner >Assignee: Paulo Motta > Fix For: 2.1.x > > > In our operations, we make heavy use of replace_address (or > replace_address_first_boot) in order to replace broken nodes. We now realize > that writes are not sent to the replacement nodes while they are in hibernate > state and streaming in data. This runs counter to what our expectations were, > especially since we know that writes ARE sent to nodes when they are > bootstrapped into the ring. > It seems like cassandra should arrange to send writes to a node that is in > the process of replacing another node, just like it does for a nodes that are > bootstraping. I hesitate to phrase this as "we should send writes to a node > in hibernate" because the concept of hibernate may be useful in other > contexts, as per CASSANDRA-8336. Maybe a new state is needed here? > Among other things, the fact that we don't get writes during this period > makes subsequent repairs more expensive, proportional to the number of writes > that we miss (and depending on the amount of data that needs to be streamed > during replacement and the time it may take to rebuild secondary indexes, we > could miss many many hours worth of writes). It also leaves us more exposed > to consistency violations. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (CASSANDRA-8523) Writes should be sent to a replacement node while it is streaming in data
[ https://issues.apache.org/jira/browse/CASSANDRA-8523?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15427335#comment-15427335 ] Richard Low commented on CASSANDRA-8523: Are you waiting for me to review the dtest PR? > Writes should be sent to a replacement node while it is streaming in data > - > > Key: CASSANDRA-8523 > URL: https://issues.apache.org/jira/browse/CASSANDRA-8523 > Project: Cassandra > Issue Type: Improvement >Reporter: Richard Wagner >Assignee: Paulo Motta > Fix For: 2.1.x > > > In our operations, we make heavy use of replace_address (or > replace_address_first_boot) in order to replace broken nodes. We now realize > that writes are not sent to the replacement nodes while they are in hibernate > state and streaming in data. This runs counter to what our expectations were, > especially since we know that writes ARE sent to nodes when they are > bootstrapped into the ring. > It seems like cassandra should arrange to send writes to a node that is in > the process of replacing another node, just like it does for a nodes that are > bootstraping. I hesitate to phrase this as "we should send writes to a node > in hibernate" because the concept of hibernate may be useful in other > contexts, as per CASSANDRA-8336. Maybe a new state is needed here? > Among other things, the fact that we don't get writes during this period > makes subsequent repairs more expensive, proportional to the number of writes > that we miss (and depending on the amount of data that needs to be streamed > during replacement and the time it may take to rebuild secondary indexes, we > could miss many many hours worth of writes). It also leaves us more exposed > to consistency violations. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (CASSANDRA-8523) Writes should be sent to a replacement node while it is streaming in data
[ https://issues.apache.org/jira/browse/CASSANDRA-8523?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15427310#comment-15427310 ] Paulo Motta commented on CASSANDRA-8523: Thanks! This will be ready to commit after [PR|https://github.com/riptano/cassandra-dtest/pull/1155] is reviewed and merged. I rebased with the updated dtests and submitted a new CI run: ||2.2||3.0||trunk|| |[branch|https://github.com/apache/cassandra/compare/cassandra-2.2...pauloricardomg:2.2-8523]|[branch|https://github.com/apache/cassandra/compare/cassandra-3.0...pauloricardomg:3.0-8523]|[branch|https://github.com/apache/cassandra/compare/trunk...pauloricardomg:trunk-8523]| |[testall|http://cassci.datastax.com/view/Dev/view/paulomotta/job/pauloricardomg-2.2-8523-testall/lastCompletedBuild/testReport/]|[testall|http://cassci.datastax.com/view/Dev/view/paulomotta/job/pauloricardomg-3.0-8523-testall/lastCompletedBuild/testReport/]|[testall|http://cassci.datastax.com/view/Dev/view/paulomotta/job/pauloricardomg-trunk-8523-testall/lastCompletedBuild/testReport/]| |[dtest|http://cassci.datastax.com/view/Dev/view/paulomotta/job/pauloricardomg-2.2-8523-dtest/lastCompletedBuild/testReport/]|[dtest|http://cassci.datastax.com/view/Dev/view/paulomotta/job/pauloricardomg-3.0-8523-dtest/lastCompletedBuild/testReport/]|[dtest|http://cassci.datastax.com/view/Dev/view/paulomotta/job/pauloricardomg-trunk-8523-dtest/lastCompletedBuild/testReport/]| > Writes should be sent to a replacement node while it is streaming in data > - > > Key: CASSANDRA-8523 > URL: https://issues.apache.org/jira/browse/CASSANDRA-8523 > Project: Cassandra > Issue Type: Improvement >Reporter: Richard Wagner >Assignee: Paulo Motta > Fix For: 2.1.x > > > In our operations, we make heavy use of replace_address (or > replace_address_first_boot) in order to replace broken nodes. We now realize > that writes are not sent to the replacement nodes while they are in hibernate > state and streaming in data. This runs counter to what our expectations were, > especially since we know that writes ARE sent to nodes when they are > bootstrapped into the ring. > It seems like cassandra should arrange to send writes to a node that is in > the process of replacing another node, just like it does for a nodes that are > bootstraping. I hesitate to phrase this as "we should send writes to a node > in hibernate" because the concept of hibernate may be useful in other > contexts, as per CASSANDRA-8336. Maybe a new state is needed here? > Among other things, the fact that we don't get writes during this period > makes subsequent repairs more expensive, proportional to the number of writes > that we miss (and depending on the amount of data that needs to be streamed > during replacement and the time it may take to rebuild secondary indexes, we > could miss many many hours worth of writes). It also leaves us more exposed > to consistency violations. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (CASSANDRA-8523) Writes should be sent to a replacement node while it is streaming in data
[ https://issues.apache.org/jira/browse/CASSANDRA-8523?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15416328#comment-15416328 ] Richard Low commented on CASSANDRA-8523: +1 on the 3.9 version too. > Writes should be sent to a replacement node while it is streaming in data > - > > Key: CASSANDRA-8523 > URL: https://issues.apache.org/jira/browse/CASSANDRA-8523 > Project: Cassandra > Issue Type: Improvement >Reporter: Richard Wagner >Assignee: Paulo Motta > Fix For: 2.1.x > > > In our operations, we make heavy use of replace_address (or > replace_address_first_boot) in order to replace broken nodes. We now realize > that writes are not sent to the replacement nodes while they are in hibernate > state and streaming in data. This runs counter to what our expectations were, > especially since we know that writes ARE sent to nodes when they are > bootstrapped into the ring. > It seems like cassandra should arrange to send writes to a node that is in > the process of replacing another node, just like it does for a nodes that are > bootstraping. I hesitate to phrase this as "we should send writes to a node > in hibernate" because the concept of hibernate may be useful in other > contexts, as per CASSANDRA-8336. Maybe a new state is needed here? > Among other things, the fact that we don't get writes during this period > makes subsequent repairs more expensive, proportional to the number of writes > that we miss (and depending on the amount of data that needs to be streamed > during replacement and the time it may take to rebuild secondary indexes, we > could miss many many hours worth of writes). It also leaves us more exposed > to consistency violations. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (CASSANDRA-8523) Writes should be sent to a replacement node while it is streaming in data
[ https://issues.apache.org/jira/browse/CASSANDRA-8523?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15399949#comment-15399949 ] Richard Low commented on CASSANDRA-8523: I'll review the 3.9 version. I'm very much in favour of putting this in 2.2 and 3.0 as this hurts us badly and no doubt others suffer too. > Writes should be sent to a replacement node while it is streaming in data > - > > Key: CASSANDRA-8523 > URL: https://issues.apache.org/jira/browse/CASSANDRA-8523 > Project: Cassandra > Issue Type: Improvement >Reporter: Richard Wagner >Assignee: Paulo Motta > Fix For: 2.1.x > > > In our operations, we make heavy use of replace_address (or > replace_address_first_boot) in order to replace broken nodes. We now realize > that writes are not sent to the replacement nodes while they are in hibernate > state and streaming in data. This runs counter to what our expectations were, > especially since we know that writes ARE sent to nodes when they are > bootstrapped into the ring. > It seems like cassandra should arrange to send writes to a node that is in > the process of replacing another node, just like it does for a nodes that are > bootstraping. I hesitate to phrase this as "we should send writes to a node > in hibernate" because the concept of hibernate may be useful in other > contexts, as per CASSANDRA-8336. Maybe a new state is needed here? > Among other things, the fact that we don't get writes during this period > makes subsequent repairs more expensive, proportional to the number of writes > that we miss (and depending on the amount of data that needs to be streamed > during replacement and the time it may take to rebuild secondary indexes, we > could miss many many hours worth of writes). It also leaves us more exposed > to consistency violations. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (CASSANDRA-8523) Writes should be sent to a replacement node while it is streaming in data
[ https://issues.apache.org/jira/browse/CASSANDRA-8523?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15399941#comment-15399941 ] Paulo Motta commented on CASSANDRA-8523: Thanks for taking a look! Created CASSANDRA-12344 to follow-up with support for this when the replacement node has the same address as the original node. Rebased patch and dtests as well as merged up to 3.0+. All patches and CI results available below: ||2.2||3.0||3.9||trunk||dtest|| |[branch|https://github.com/apache/cassandra/compare/cassandra-2.2...pauloricardomg:2.2-8523]|[branch|https://github.com/apache/cassandra/compare/cassandra-3.0...pauloricardomg:3.0-8523]|[branch|https://github.com/apache/cassandra/compare/cassandra-3.9...pauloricardomg:3.9-8523]|[branch|https://github.com/apache/cassandra/compare/trunk...pauloricardomg:trunk-8523]|[branch|https://github.com/riptano/cassandra-dtest/compare/master...pauloricardomg:8523]| |[testall|http://cassci.datastax.com/view/Dev/view/paulomotta/job/pauloricardomg-2.2-8523-testall/lastCompletedBuild/testReport/]|[testall|http://cassci.datastax.com/view/Dev/view/paulomotta/job/pauloricardomg-3.0-8523-testall/lastCompletedBuild/testReport/]|[testall|http://cassci.datastax.com/view/Dev/view/paulomotta/job/pauloricardomg-3.9-8523-testall/lastCompletedBuild/testReport/]|[testall|http://cassci.datastax.com/view/Dev/view/paulomotta/job/pauloricardomg-trunk-8523-testall/lastCompletedBuild/testReport/]| |[dtest|http://cassci.datastax.com/view/Dev/view/paulomotta/job/pauloricardomg-2.2-8523-dtest/lastCompletedBuild/testReport/]|[dtest|http://cassci.datastax.com/view/Dev/view/paulomotta/job/pauloricardomg-3.0-8523-dtest/lastCompletedBuild/testReport/]|[dtest|http://cassci.datastax.com/view/Dev/view/paulomotta/job/pauloricardomg-3.9-8523-dtest/lastCompletedBuild/testReport/]|[dtest|http://cassci.datastax.com/view/Dev/view/paulomotta/job/pauloricardomg-trunk-8523-dtest/lastCompletedBuild/testReport/]| There were some minor merge conflicts on 3.0, and a slightly larger conflict on 3.9 due to CASSANDRA-10134, so I did some refactoring in the 3.9+ version to move most of the logic to {{prepareForReplacement}}. Can you take another look [~rlow]? Dtest PR created [here|https://github.com/riptano/cassandra-dtest/pull/1155]. While this is marked an improvement and would theoretically only go to trunk, this limitation is pretty counter-intuitive and probably hurts many users in the wild, and since the changeset is relatively small and self-contained, I think it could be interpreted as a bugfix and perhaps go on 2.2+ or maybe 3.0+. WDYT [~brandon.williams] [~jkni] ? > Writes should be sent to a replacement node while it is streaming in data > - > > Key: CASSANDRA-8523 > URL: https://issues.apache.org/jira/browse/CASSANDRA-8523 > Project: Cassandra > Issue Type: Improvement >Reporter: Richard Wagner >Assignee: Paulo Motta > Fix For: 2.1.x > > > In our operations, we make heavy use of replace_address (or > replace_address_first_boot) in order to replace broken nodes. We now realize > that writes are not sent to the replacement nodes while they are in hibernate > state and streaming in data. This runs counter to what our expectations were, > especially since we know that writes ARE sent to nodes when they are > bootstrapped into the ring. > It seems like cassandra should arrange to send writes to a node that is in > the process of replacing another node, just like it does for a nodes that are > bootstraping. I hesitate to phrase this as "we should send writes to a node > in hibernate" because the concept of hibernate may be useful in other > contexts, as per CASSANDRA-8336. Maybe a new state is needed here? > Among other things, the fact that we don't get writes during this period > makes subsequent repairs more expensive, proportional to the number of writes > that we miss (and depending on the amount of data that needs to be streamed > during replacement and the time it may take to rebuild secondary indexes, we > could miss many many hours worth of writes). It also leaves us more exposed > to consistency violations. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (CASSANDRA-8523) Writes should be sent to a replacement node while it is streaming in data
[ https://issues.apache.org/jira/browse/CASSANDRA-8523?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15398667#comment-15398667 ] Richard Low commented on CASSANDRA-8523: +1 patch looks good. Really like the dtests. > Writes should be sent to a replacement node while it is streaming in data > - > > Key: CASSANDRA-8523 > URL: https://issues.apache.org/jira/browse/CASSANDRA-8523 > Project: Cassandra > Issue Type: Improvement >Reporter: Richard Wagner >Assignee: Paulo Motta > Fix For: 2.1.x > > > In our operations, we make heavy use of replace_address (or > replace_address_first_boot) in order to replace broken nodes. We now realize > that writes are not sent to the replacement nodes while they are in hibernate > state and streaming in data. This runs counter to what our expectations were, > especially since we know that writes ARE sent to nodes when they are > bootstrapped into the ring. > It seems like cassandra should arrange to send writes to a node that is in > the process of replacing another node, just like it does for a nodes that are > bootstraping. I hesitate to phrase this as "we should send writes to a node > in hibernate" because the concept of hibernate may be useful in other > contexts, as per CASSANDRA-8336. Maybe a new state is needed here? > Among other things, the fact that we don't get writes during this period > makes subsequent repairs more expensive, proportional to the number of writes > that we miss (and depending on the amount of data that needs to be streamed > during replacement and the time it may take to rebuild secondary indexes, we > could miss many many hours worth of writes). It also leaves us more exposed > to consistency violations. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (CASSANDRA-8523) Writes should be sent to a replacement node while it is streaming in data
[ https://issues.apache.org/jira/browse/CASSANDRA-8523?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15340771#comment-15340771 ] Paulo Motta commented on CASSANDRA-8523: bq. The hints that you might care about are writes dropped during the replacement on the replacing node. Even these should not be a problem, because they are also forwarded to the replacement node, and if they fail they are hinted to replacement node ID. I rebased and resubmitted dtests and fixed minor typo on the original patch. I also realized it's not necessary to change original node state to {{REMOVED_TOKEN}} because it's already removed from gossip when the replacement node changes its state to {{NORMAL}}, so I removed that step. Also added new dtests to test replace in a mixed-version environment to verify backward compatibility and to check for the warning message when replacing a node with the same address. > Writes should be sent to a replacement node while it is streaming in data > - > > Key: CASSANDRA-8523 > URL: https://issues.apache.org/jira/browse/CASSANDRA-8523 > Project: Cassandra > Issue Type: Improvement >Reporter: Richard Wagner >Assignee: Paulo Motta > Fix For: 2.1.x > > > In our operations, we make heavy use of replace_address (or > replace_address_first_boot) in order to replace broken nodes. We now realize > that writes are not sent to the replacement nodes while they are in hibernate > state and streaming in data. This runs counter to what our expectations were, > especially since we know that writes ARE sent to nodes when they are > bootstrapped into the ring. > It seems like cassandra should arrange to send writes to a node that is in > the process of replacing another node, just like it does for a nodes that are > bootstraping. I hesitate to phrase this as "we should send writes to a node > in hibernate" because the concept of hibernate may be useful in other > contexts, as per CASSANDRA-8336. Maybe a new state is needed here? > Among other things, the fact that we don't get writes during this period > makes subsequent repairs more expensive, proportional to the number of writes > that we miss (and depending on the amount of data that needs to be streamed > during replacement and the time it may take to rebuild secondary indexes, we > could miss many many hours worth of writes). It also leaves us more exposed > to consistency violations. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (CASSANDRA-8523) Writes should be sent to a replacement node while it is streaming in data
[ https://issues.apache.org/jira/browse/CASSANDRA-8523?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15340204#comment-15340204 ] Joel Knighton commented on CASSANDRA-8523: -- Assigning you to review [~rlow] - just saw the comment from two days ago. Let me know if anything has changed there; I'm available to review as well. > Writes should be sent to a replacement node while it is streaming in data > - > > Key: CASSANDRA-8523 > URL: https://issues.apache.org/jira/browse/CASSANDRA-8523 > Project: Cassandra > Issue Type: Improvement >Reporter: Richard Wagner >Assignee: Paulo Motta > Fix For: 2.1.x > > > In our operations, we make heavy use of replace_address (or > replace_address_first_boot) in order to replace broken nodes. We now realize > that writes are not sent to the replacement nodes while they are in hibernate > state and streaming in data. This runs counter to what our expectations were, > especially since we know that writes ARE sent to nodes when they are > bootstrapped into the ring. > It seems like cassandra should arrange to send writes to a node that is in > the process of replacing another node, just like it does for a nodes that are > bootstraping. I hesitate to phrase this as "we should send writes to a node > in hibernate" because the concept of hibernate may be useful in other > contexts, as per CASSANDRA-8336. Maybe a new state is needed here? > Among other things, the fact that we don't get writes during this period > makes subsequent repairs more expensive, proportional to the number of writes > that we miss (and depending on the amount of data that needs to be streamed > during replacement and the time it may take to rebuild secondary indexes, we > could miss many many hours worth of writes). It also leaves us more exposed > to consistency violations. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (CASSANDRA-8523) Writes should be sent to a replacement node while it is streaming in data
[ https://issues.apache.org/jira/browse/CASSANDRA-8523?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15340200#comment-15340200 ] Joshua McKenzie commented on CASSANDRA-8523: Missed that comment above - [~rlow] to review. Thanks! > Writes should be sent to a replacement node while it is streaming in data > - > > Key: CASSANDRA-8523 > URL: https://issues.apache.org/jira/browse/CASSANDRA-8523 > Project: Cassandra > Issue Type: Improvement >Reporter: Richard Wagner >Assignee: Paulo Motta > Fix For: 2.1.x > > > In our operations, we make heavy use of replace_address (or > replace_address_first_boot) in order to replace broken nodes. We now realize > that writes are not sent to the replacement nodes while they are in hibernate > state and streaming in data. This runs counter to what our expectations were, > especially since we know that writes ARE sent to nodes when they are > bootstrapped into the ring. > It seems like cassandra should arrange to send writes to a node that is in > the process of replacing another node, just like it does for a nodes that are > bootstraping. I hesitate to phrase this as "we should send writes to a node > in hibernate" because the concept of hibernate may be useful in other > contexts, as per CASSANDRA-8336. Maybe a new state is needed here? > Among other things, the fact that we don't get writes during this period > makes subsequent repairs more expensive, proportional to the number of writes > that we miss (and depending on the amount of data that needs to be streamed > during replacement and the time it may take to rebuild secondary indexes, we > could miss many many hours worth of writes). It also leaves us more exposed > to consistency violations. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (CASSANDRA-8523) Writes should be sent to a replacement node while it is streaming in data
[ https://issues.apache.org/jira/browse/CASSANDRA-8523?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15337328#comment-15337328 ] Richard Low commented on CASSANDRA-8523: It's not those writes that matter - the replacement will get those writes from the other nodes during streaming. The hints that you might care about are writes dropped during the replacement on the replacing node. But those should be extremely rare, and a price well worth paying for getting the vast majority of the writes vs none today. > Writes should be sent to a replacement node while it is streaming in data > - > > Key: CASSANDRA-8523 > URL: https://issues.apache.org/jira/browse/CASSANDRA-8523 > Project: Cassandra > Issue Type: Improvement >Reporter: Richard Wagner >Assignee: Paulo Motta > Fix For: 2.1.x > > > In our operations, we make heavy use of replace_address (or > replace_address_first_boot) in order to replace broken nodes. We now realize > that writes are not sent to the replacement nodes while they are in hibernate > state and streaming in data. This runs counter to what our expectations were, > especially since we know that writes ARE sent to nodes when they are > bootstrapped into the ring. > It seems like cassandra should arrange to send writes to a node that is in > the process of replacing another node, just like it does for a nodes that are > bootstraping. I hesitate to phrase this as "we should send writes to a node > in hibernate" because the concept of hibernate may be useful in other > contexts, as per CASSANDRA-8336. Maybe a new state is needed here? > Among other things, the fact that we don't get writes during this period > makes subsequent repairs more expensive, proportional to the number of writes > that we miss (and depending on the amount of data that needs to be streamed > during replacement and the time it may take to rebuild secondary indexes, we > could miss many many hours worth of writes). It also leaves us more exposed > to consistency violations. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (CASSANDRA-8523) Writes should be sent to a replacement node while it is streaming in data
[ https://issues.apache.org/jira/browse/CASSANDRA-8523?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15337301#comment-15337301 ] Brandon Williams commented on CASSANDRA-8523: - .bq Since it's no longer necessary to forward hints to the replacement node when replace_address != broadcast_address, the replacement node does not need to inherit the same ID of the original node. I don't think that's necessarily true, if a node dies at point A, and then the replacement is issued at point N, the hints between points B and M are still missed if the host id is not the same. > Writes should be sent to a replacement node while it is streaming in data > - > > Key: CASSANDRA-8523 > URL: https://issues.apache.org/jira/browse/CASSANDRA-8523 > Project: Cassandra > Issue Type: Improvement >Reporter: Richard Wagner >Assignee: Paulo Motta > Fix For: 2.1.x > > > In our operations, we make heavy use of replace_address (or > replace_address_first_boot) in order to replace broken nodes. We now realize > that writes are not sent to the replacement nodes while they are in hibernate > state and streaming in data. This runs counter to what our expectations were, > especially since we know that writes ARE sent to nodes when they are > bootstrapped into the ring. > It seems like cassandra should arrange to send writes to a node that is in > the process of replacing another node, just like it does for a nodes that are > bootstraping. I hesitate to phrase this as "we should send writes to a node > in hibernate" because the concept of hibernate may be useful in other > contexts, as per CASSANDRA-8336. Maybe a new state is needed here? > Among other things, the fact that we don't get writes during this period > makes subsequent repairs more expensive, proportional to the number of writes > that we miss (and depending on the amount of data that needs to be streamed > during replacement and the time it may take to rebuild secondary indexes, we > could miss many many hours worth of writes). It also leaves us more exposed > to consistency violations. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (CASSANDRA-8523) Writes should be sent to a replacement node while it is streaming in data
[ https://issues.apache.org/jira/browse/CASSANDRA-8523?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15337290#comment-15337290 ] Richard Low commented on CASSANDRA-8523: Thanks! I'm happy to review. > Writes should be sent to a replacement node while it is streaming in data > - > > Key: CASSANDRA-8523 > URL: https://issues.apache.org/jira/browse/CASSANDRA-8523 > Project: Cassandra > Issue Type: Improvement >Reporter: Richard Wagner >Assignee: Paulo Motta > Fix For: 2.1.x > > > In our operations, we make heavy use of replace_address (or > replace_address_first_boot) in order to replace broken nodes. We now realize > that writes are not sent to the replacement nodes while they are in hibernate > state and streaming in data. This runs counter to what our expectations were, > especially since we know that writes ARE sent to nodes when they are > bootstrapped into the ring. > It seems like cassandra should arrange to send writes to a node that is in > the process of replacing another node, just like it does for a nodes that are > bootstraping. I hesitate to phrase this as "we should send writes to a node > in hibernate" because the concept of hibernate may be useful in other > contexts, as per CASSANDRA-8336. Maybe a new state is needed here? > Among other things, the fact that we don't get writes during this period > makes subsequent repairs more expensive, proportional to the number of writes > that we miss (and depending on the amount of data that needs to be streamed > during replacement and the time it may take to rebuild secondary indexes, we > could miss many many hours worth of writes). It also leaves us more exposed > to consistency violations. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (CASSANDRA-8523) Writes should be sent to a replacement node while it is streaming in data
[ https://issues.apache.org/jira/browse/CASSANDRA-8523?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15337288#comment-15337288 ] Paulo Motta commented on CASSANDRA-8523: Due to the limitations of forwarding writes to replacement nodes with the same IP, I propose initially adding this support only to replacement nodes with a different IP, since it's much simpler and we can do it in a backward-compatible way so it can probably go on 2.2+. After CASSANDRA-11559, we can extend this support to nodes with the same IP quite easily by setting an inactive flag on nodes being replaced and ignore these nodes on read. The central idea is: {quote} * Add a new non-dead gossip state for replace BOOT_REPLACE * When receiving BOOT_REPLACE, other node adds the replacing node as bootstrapping endpoint * Pending ranges are calculated, and writes are sent to the replacing node during replace * When replacing node changes state to NORMAL, the old node is removed and the new node becomes a natural endpoint on TokenMetadata * The final step is to change the original node state to REMOVED_TOKEN so other nodes evict the original node from gossip {quote} Since it's no longer necessary to forward hints to the replacement node when {{replace_address != broadcast_address}}, the replacement node does not need to inherit the same ID of the original node. The replacing process remains unchanged when the replacement node has the same IP as the original node. If that's the case, I added a warn message so users know they need to run repair if the node is down for longer than {{max_hint_window_in_ms}}: {noformat} Writes will not be redirected to this node while it is performing replace because it has the same address as the node to be replaced ({}). If that node has been down for longer than max_hint_window_in_ms, repair must be run after the replacement process in order to make this node consistent. {noformat} I adapted current dtests to test replace_address for both the old and the new path, and when {{replace_address != broadcast_address}} make sure writes are being redirected to the replacement node. Initial patch and tests below (will provide 2.2+ patches after initial review): ||2.2||dtest|| |[branch|https://github.com/apache/cassandra/compare/cassandra-2.2...pauloricardomg:2.2-8523]|[branch|https://github.com/riptano/cassandra-dtest/compare/master...pauloricardomg:8523]| |[testall|http://cassci.datastax.com/view/Dev/view/paulomotta/job/pauloricardomg-2.2-8523-testall/lastCompletedBuild/testReport/]| |[dtest|http://cassci.datastax.com/view/Dev/view/paulomotta/job/pauloricardomg-2.2-8523-dtest/lastCompletedBuild/testReport/]| > Writes should be sent to a replacement node while it is streaming in data > - > > Key: CASSANDRA-8523 > URL: https://issues.apache.org/jira/browse/CASSANDRA-8523 > Project: Cassandra > Issue Type: Improvement >Reporter: Richard Wagner >Assignee: Paulo Motta > Fix For: 2.1.x > > > In our operations, we make heavy use of replace_address (or > replace_address_first_boot) in order to replace broken nodes. We now realize > that writes are not sent to the replacement nodes while they are in hibernate > state and streaming in data. This runs counter to what our expectations were, > especially since we know that writes ARE sent to nodes when they are > bootstrapped into the ring. > It seems like cassandra should arrange to send writes to a node that is in > the process of replacing another node, just like it does for a nodes that are > bootstraping. I hesitate to phrase this as "we should send writes to a node > in hibernate" because the concept of hibernate may be useful in other > contexts, as per CASSANDRA-8336. Maybe a new state is needed here? > Among other things, the fact that we don't get writes during this period > makes subsequent repairs more expensive, proportional to the number of writes > that we miss (and depending on the amount of data that needs to be streamed > during replacement and the time it may take to rebuild secondary indexes, we > could miss many many hours worth of writes). It also leaves us more exposed > to consistency violations. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (CASSANDRA-8523) Writes should be sent to a replacement node while it is streaming in data
[ https://issues.apache.org/jira/browse/CASSANDRA-8523?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15313417#comment-15313417 ] Paulo Motta commented on CASSANDRA-8523: I had a closer look at CASSANDRA-11559 and it should be possible to support that on trunk/3.x series without breaking compatibility, thus making this viable before 4.0 in a robust way. I will provide an initial patch for that soon. We now need to decide whether we want to have a partial/fragile solution to this ticket before the ideal fix on 3.x leveraging #11559. > Writes should be sent to a replacement node while it is streaming in data > - > > Key: CASSANDRA-8523 > URL: https://issues.apache.org/jira/browse/CASSANDRA-8523 > Project: Cassandra > Issue Type: Improvement >Reporter: Richard Wagner >Assignee: Paulo Motta > Fix For: 2.1.x > > > In our operations, we make heavy use of replace_address (or > replace_address_first_boot) in order to replace broken nodes. We now realize > that writes are not sent to the replacement nodes while they are in hibernate > state and streaming in data. This runs counter to what our expectations were, > especially since we know that writes ARE sent to nodes when they are > bootstrapped into the ring. > It seems like cassandra should arrange to send writes to a node that is in > the process of replacing another node, just like it does for a nodes that are > bootstraping. I hesitate to phrase this as "we should send writes to a node > in hibernate" because the concept of hibernate may be useful in other > contexts, as per CASSANDRA-8336. Maybe a new state is needed here? > Among other things, the fact that we don't get writes during this period > makes subsequent repairs more expensive, proportional to the number of writes > that we miss (and depending on the amount of data that needs to be streamed > during replacement and the time it may take to rebuild secondary indexes, we > could miss many many hours worth of writes). It also leaves us more exposed > to consistency violations. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (CASSANDRA-8523) Writes should be sent to a replacement node while it is streaming in data
[ https://issues.apache.org/jira/browse/CASSANDRA-8523?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15312567#comment-15312567 ] Paulo Motta commented on CASSANDRA-8523: Solving this when the replacing node has a different ip address is quite simple: * Add a new non-dead gossip state for replace BOOT_REPLACE * When receiving BOOT_REPLACE, other node adds the replacing node as bootstrapping AND replacing endpoint * Pending ranges are calculated, and writes are sent to the replacing node during replace * When replacing node changes state to NORMAL, the old node is removed and the new node becomes a natural endpoint on {{TokenMetadata}} I have a WIP patch with this approach [here|https://github.com/pauloricardomg/cassandra/commit/a4be7f7a0f298b1d52821f67ff2e927a1d1f1c5e] that is functional. The tricky part is when the replacing node has the same IP as the old node, because if this node ever joins gossip with a non-dead state, {{FailureDetector.isAlive(InetAddress)}} return true and reads are sent to the replacing node, since the original node is a natural endpoint. My initial idea of removing the old node as a natural endpoint from {{TokenMetadata}} obviously does not work because this changes range assignments making reads go the following node in the ring, which has no data for that range. I also considered replacing the node IP in TokenMetadata with a special marker, but this will also not play along well with snitches and topologies. A clean fix to this is to enhance the node representation on {{TokenMetadata}}, which is currently a plain {{InetAddresss}}, to include the information if an endpoint is available for reads (CASSANDRA-11559), but this cannot be done before cassandra 4.0 because it will change {{TokenMetadata}} and {{AbstractReplicationStrategy}} public interfaces and other related classes, so it's a major refactoring in the codebase. One step further is to change the {{FailureDetector}} interface to query nodes by {{UUIDs}} instead of {{InetAddress}}, so a replacing endpoint will not be confused with the original node since they will have different ids. One workaround before that would be to send a read to a normal endpoint if both {{FailureDetector.isAlive(endpoint)}} is true and the endpoint is on {{NORMAL}} state, so we avoid sending reads to a replacing endpoint with the same IP address. A big downside is that this check should also be peformed for other operations such as bootstrap, repairs, hints, batches, etc. I experimented with that route [in this commit|https://github.com/pauloricardomg/cassandra/commit/a439edcbd8d4186ea2e3301681b7cc0b5012a265] by creating a method {{StorageService.isAvailable(InetAddress endpoint, boolean isPending) = \{FailureDetector.instance.isAlive(endpoint) && (isPendingEndpoint || Gossiper.instance.isNormalStatus(endpoint))\}}} that should be used instead of {{FailureDetector.isAlive(endpoint)}}, but I'm not very comfortable with this approach for the following reasons: * It's pretty fragile, because that check must be used instead of {{FailureDetector.isAlive()}} in every place throughout the code * Other developers/drivers/tools wil also need to use the same approach or a similar logic * We will need to modify the write path to segregate writes between natural and pending endpoints (see an example on [doPaxosCommit|https://github.com/pauloricardomg/cassandra/commit/a439edcbd8d4186ea2e3301681b7cc0b5012a265#diff-71f06c193f5b5e270cf8ac695164f43aR498]) So unless someone comes up with a magic idea, I think we're left with two options to fix this before CASSANDRA-11559: 1. Go ahead with {{StorageService.isAvailable}} approach even with its downsides and fragility 2. Fix this only for replacing endpoints with a different IP address, falling back to the previous behavior when the replacing node has the same IP as the old node, which should already fix this in most cloud deployments, where replacement nodes usually have different IPs I'm leaning more towards 2 since it's much simpler/safer and work on a proper general solution for 4.0 after CASSANDRA-11559. WDYT? > Writes should be sent to a replacement node while it is streaming in data > - > > Key: CASSANDRA-8523 > URL: https://issues.apache.org/jira/browse/CASSANDRA-8523 > Project: Cassandra > Issue Type: Improvement >Reporter: Richard Wagner >Assignee: Paulo Motta > Fix For: 2.1.x > > > In our operations, we make heavy use of replace_address (or > replace_address_first_boot) in order to replace broken nodes. We now realize > that writes are not sent to the replacement nodes while they are in hibernate > state and streaming in data. This runs counter to what our expectations were, > especially since we know that
[jira] [Commented] (CASSANDRA-8523) Writes should be sent to a replacement node while it is streaming in data
[ https://issues.apache.org/jira/browse/CASSANDRA-8523?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15311526#comment-15311526 ] Richard Low commented on CASSANDRA-8523: Without understanding the FD details, this sounds good. Losing hints isn't an issue, as you say. > Writes should be sent to a replacement node while it is streaming in data > - > > Key: CASSANDRA-8523 > URL: https://issues.apache.org/jira/browse/CASSANDRA-8523 > Project: Cassandra > Issue Type: Improvement >Reporter: Richard Wagner >Assignee: Paulo Motta > Fix For: 2.1.x > > > In our operations, we make heavy use of replace_address (or > replace_address_first_boot) in order to replace broken nodes. We now realize > that writes are not sent to the replacement nodes while they are in hibernate > state and streaming in data. This runs counter to what our expectations were, > especially since we know that writes ARE sent to nodes when they are > bootstrapped into the ring. > It seems like cassandra should arrange to send writes to a node that is in > the process of replacing another node, just like it does for a nodes that are > bootstraping. I hesitate to phrase this as "we should send writes to a node > in hibernate" because the concept of hibernate may be useful in other > contexts, as per CASSANDRA-8336. Maybe a new state is needed here? > Among other things, the fact that we don't get writes during this period > makes subsequent repairs more expensive, proportional to the number of writes > that we miss (and depending on the amount of data that needs to be streamed > during replacement and the time it may take to rebuild secondary indexes, we > could miss many many hours worth of writes). It also leaves us more exposed > to consistency violations. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (CASSANDRA-8523) Writes should be sent to a replacement node while it is streaming in data
[ https://issues.apache.org/jira/browse/CASSANDRA-8523?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15304281#comment-15304281 ] Paulo Motta commented on CASSANDRA-8523: Initial route I will take is to create a new dead state for replace, that adds the node as replacing endpoint to TokenMetadata. When a replacement endpoint is added to TokenMetadata, if there's an existing down node with the same IP, then we remove it from natural endpoints so the replacement node no longer receive reads when it becomes alive in the FD and include the replacement node as pending joining endpoint so writes are forwarded to it. After pending ranges for replace are calculated the final step is to set the node as alive in the FD and change the dead state logic to not mark "alive" dead state nodes as DOWN automatically (we will probably rename this nomenclature to a better name, like invisible or whatever), so the FD will work as usual and remove the replacing endpoint from TokenMetadata (and restore it as a natural endpoint) if it becomes down. The current streaming logic will probably be unaffected by this, and the node will change its state to NORMAL after stream is finished completing the replace procedure. The only downside I can think of this approach is that we will lose hints during a failed replace, but this is not a big deal as hints are an optimization, and replace will probably take longer than max_hint_window anyway. I will start going this route, feel free to give any feedback or let me know if I'm missing something on this high level flow. > Writes should be sent to a replacement node while it is streaming in data > - > > Key: CASSANDRA-8523 > URL: https://issues.apache.org/jira/browse/CASSANDRA-8523 > Project: Cassandra > Issue Type: Improvement >Reporter: Richard Wagner >Assignee: Paulo Motta > Fix For: 2.1.x > > > In our operations, we make heavy use of replace_address (or > replace_address_first_boot) in order to replace broken nodes. We now realize > that writes are not sent to the replacement nodes while they are in hibernate > state and streaming in data. This runs counter to what our expectations were, > especially since we know that writes ARE sent to nodes when they are > bootstrapped into the ring. > It seems like cassandra should arrange to send writes to a node that is in > the process of replacing another node, just like it does for a nodes that are > bootstraping. I hesitate to phrase this as "we should send writes to a node > in hibernate" because the concept of hibernate may be useful in other > contexts, as per CASSANDRA-8336. Maybe a new state is needed here? > Among other things, the fact that we don't get writes during this period > makes subsequent repairs more expensive, proportional to the number of writes > that we miss (and depending on the amount of data that needs to be streamed > during replacement and the time it may take to rebuild secondary indexes, we > could miss many many hours worth of writes). It also leaves us more exposed > to consistency violations. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (CASSANDRA-8523) Writes should be sent to a replacement node while it is streaming in data
[ https://issues.apache.org/jira/browse/CASSANDRA-8523?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15277068#comment-15277068 ] Brandon Williams commented on CASSANDRA-8523: - [~pauloricardomg] assigning to you, please do continue working on this. > Writes should be sent to a replacement node while it is streaming in data > - > > Key: CASSANDRA-8523 > URL: https://issues.apache.org/jira/browse/CASSANDRA-8523 > Project: Cassandra > Issue Type: Improvement >Reporter: Richard Wagner >Assignee: Paulo Motta > Fix For: 2.1.x > > > In our operations, we make heavy use of replace_address (or > replace_address_first_boot) in order to replace broken nodes. We now realize > that writes are not sent to the replacement nodes while they are in hibernate > state and streaming in data. This runs counter to what our expectations were, > especially since we know that writes ARE sent to nodes when they are > bootstrapped into the ring. > It seems like cassandra should arrange to send writes to a node that is in > the process of replacing another node, just like it does for a nodes that are > bootstraping. I hesitate to phrase this as "we should send writes to a node > in hibernate" because the concept of hibernate may be useful in other > contexts, as per CASSANDRA-8336. Maybe a new state is needed here? > Among other things, the fact that we don't get writes during this period > makes subsequent repairs more expensive, proportional to the number of writes > that we miss (and depending on the amount of data that needs to be streamed > during replacement and the time it may take to rebuild secondary indexes, we > could miss many many hours worth of writes). It also leaves us more exposed > to consistency violations. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (CASSANDRA-8523) Writes should be sent to a replacement node while it is streaming in data
[ https://issues.apache.org/jira/browse/CASSANDRA-8523?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15243834#comment-15243834 ] Jason Brown commented on CASSANDRA-8523: I agree with [~jkni] - a new gossip state and updating FD/TMD is the best way to go. I'll follow up on #9244 to get better grasp of what's going on there, as well. wrt strongly consistent membership, that effort focuses primarily on correct and linearizable changes to the cluster state machine, and not any behaviors that should be triggered due to those changes when they arrive at cluster nodes. So, I don't *think* we'd run into problems between these two efforts, but good to keep the same players involved :) > Writes should be sent to a replacement node while it is streaming in data > - > > Key: CASSANDRA-8523 > URL: https://issues.apache.org/jira/browse/CASSANDRA-8523 > Project: Cassandra > Issue Type: Improvement >Reporter: Richard Wagner >Assignee: Brandon Williams > Fix For: 2.1.x > > > In our operations, we make heavy use of replace_address (or > replace_address_first_boot) in order to replace broken nodes. We now realize > that writes are not sent to the replacement nodes while they are in hibernate > state and streaming in data. This runs counter to what our expectations were, > especially since we know that writes ARE sent to nodes when they are > bootstrapped into the ring. > It seems like cassandra should arrange to send writes to a node that is in > the process of replacing another node, just like it does for a nodes that are > bootstraping. I hesitate to phrase this as "we should send writes to a node > in hibernate" because the concept of hibernate may be useful in other > contexts, as per CASSANDRA-8336. Maybe a new state is needed here? > Among other things, the fact that we don't get writes during this period > makes subsequent repairs more expensive, proportional to the number of writes > that we miss (and depending on the amount of data that needs to be streamed > during replacement and the time it may take to rebuild secondary indexes, we > could miss many many hours worth of writes). It also leaves us more exposed > to consistency violations. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (CASSANDRA-8523) Writes should be sent to a replacement node while it is streaming in data
[ https://issues.apache.org/jira/browse/CASSANDRA-8523?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15228474#comment-15228474 ] Joel Knighton commented on CASSANDRA-8523: -- I don't have a good idea for a low-hanging solution here - I think the cleanest is the combination of a new gossip state for replacing nodes + enhancement of the failure detector/token metadata. Since that's a bigger change, let's also make sure that it doesn't conflict with plans for strongly consistent membership. I'd really rather not couple gossip status to the internals of the read/write paths. I also agree that we should make sure that we kill two birds with one stone and handle this and [CASSANDRA-9244] at the same time. Let me know if I can help in design and/or review. > Writes should be sent to a replacement node while it is streaming in data > - > > Key: CASSANDRA-8523 > URL: https://issues.apache.org/jira/browse/CASSANDRA-8523 > Project: Cassandra > Issue Type: Improvement >Reporter: Richard Wagner >Assignee: Brandon Williams > Fix For: 2.1.x > > > In our operations, we make heavy use of replace_address (or > replace_address_first_boot) in order to replace broken nodes. We now realize > that writes are not sent to the replacement nodes while they are in hibernate > state and streaming in data. This runs counter to what our expectations were, > especially since we know that writes ARE sent to nodes when they are > bootstrapped into the ring. > It seems like cassandra should arrange to send writes to a node that is in > the process of replacing another node, just like it does for a nodes that are > bootstraping. I hesitate to phrase this as "we should send writes to a node > in hibernate" because the concept of hibernate may be useful in other > contexts, as per CASSANDRA-8336. Maybe a new state is needed here? > Among other things, the fact that we don't get writes during this period > makes subsequent repairs more expensive, proportional to the number of writes > that we miss (and depending on the amount of data that needs to be streamed > during replacement and the time it may take to rebuild secondary indexes, we > could miss many many hours worth of writes). It also leaves us more exposed > to consistency violations. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (CASSANDRA-8523) Writes should be sent to a replacement node while it is streaming in data
[ https://issues.apache.org/jira/browse/CASSANDRA-8523?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15228317#comment-15228317 ] Paulo Motta commented on CASSANDRA-8523: There are two scenarios we should consider when replacing a node: 1) The replacing node has the same IP as the previous node 2) The replacing node has a different IP as the previous node On CASSANDRA-9244 I have gotten pretty far in an [implementation|https://issues.apache.org/jira/browse/CASSANDRA-9244?focusedCommentId=15211202&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-15211202] that adds a new non-dead gossip state {{BOOT_REPLACE}} and considers the replacing endpoint as a bootstrapping pending endpoint, solving case 2 transparently. Case 1 is trickier because when the replacing node enters gossip with a non-dead state, other nodes will think the previous node is back up and send reads to him (since he is a natural endpoint). A simple way to solve this is to special-case the read path and ignore nodes in "NON-NORMAL" state when sending reads to natural endpoints. While this will probably solve the problem, there are quite a few different paths we need to hack to make sure this is enforced correctly (paxos, read, hints, etc), so I'm not totally comfortable with that. A more transparent but a bit costlier approach to solve case 1 would be to change the {{TokenMetadata}} to keep nodes as {{(InetAddress, UUID)}} pairs, and create a new interface to the {{FailureDetector}} indexed by {{UUID}}. This way we could keep {{(IP=127.0.0.1,UUID=1)}} in {{TokenMetadata}} as a natural endpoint, and add a replacement node {{(IP=127.0.0.1,UUID=2)}} as a pending endpoint. So, during reads, {{FD.isAlive(UUID=1)}} would return false, and natural reads would not be sent to {{(IP=127.0.0.1,UUID=1)}}, while pending writes would be sent to {{(IP=127.0.0.1,UUID=2)}} because {{FD.isAlive(UUID=2)}} would return true. I'd be happy to continue working on this, so feedback on any of the above or alternative approaches would be greatly appreciated. > Writes should be sent to a replacement node while it is streaming in data > - > > Key: CASSANDRA-8523 > URL: https://issues.apache.org/jira/browse/CASSANDRA-8523 > Project: Cassandra > Issue Type: Improvement >Reporter: Richard Wagner >Assignee: Brandon Williams > Fix For: 2.1.x > > > In our operations, we make heavy use of replace_address (or > replace_address_first_boot) in order to replace broken nodes. We now realize > that writes are not sent to the replacement nodes while they are in hibernate > state and streaming in data. This runs counter to what our expectations were, > especially since we know that writes ARE sent to nodes when they are > bootstrapped into the ring. > It seems like cassandra should arrange to send writes to a node that is in > the process of replacing another node, just like it does for a nodes that are > bootstraping. I hesitate to phrase this as "we should send writes to a node > in hibernate" because the concept of hibernate may be useful in other > contexts, as per CASSANDRA-8336. Maybe a new state is needed here? > Among other things, the fact that we don't get writes during this period > makes subsequent repairs more expensive, proportional to the number of writes > that we miss (and depending on the amount of data that needs to be streamed > during replacement and the time it may take to rebuild secondary indexes, we > could miss many many hours worth of writes). It also leaves us more exposed > to consistency violations. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (CASSANDRA-8523) Writes should be sent to a replacement node while it is streaming in data
[ https://issues.apache.org/jira/browse/CASSANDRA-8523?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15225483#comment-15225483 ] Stefania commented on CASSANDRA-8523: - bq. Hmm, can you can explain further? Well, it would be a bit convoluted so a new Gossip state would probably be cleaner if we can add it. But the idea is to link a host to one or more hosts that act as 'shadows', that is they receive copies of all writes or hints but are not queried for reads. > Writes should be sent to a replacement node while it is streaming in data > - > > Key: CASSANDRA-8523 > URL: https://issues.apache.org/jira/browse/CASSANDRA-8523 > Project: Cassandra > Issue Type: Improvement >Reporter: Richard Wagner >Assignee: Brandon Williams > Fix For: 2.1.x > > > In our operations, we make heavy use of replace_address (or > replace_address_first_boot) in order to replace broken nodes. We now realize > that writes are not sent to the replacement nodes while they are in hibernate > state and streaming in data. This runs counter to what our expectations were, > especially since we know that writes ARE sent to nodes when they are > bootstrapped into the ring. > It seems like cassandra should arrange to send writes to a node that is in > the process of replacing another node, just like it does for a nodes that are > bootstraping. I hesitate to phrase this as "we should send writes to a node > in hibernate" because the concept of hibernate may be useful in other > contexts, as per CASSANDRA-8336. Maybe a new state is needed here? > Among other things, the fact that we don't get writes during this period > makes subsequent repairs more expensive, proportional to the number of writes > that we miss (and depending on the amount of data that needs to be streamed > during replacement and the time it may take to rebuild secondary indexes, we > could miss many many hours worth of writes). It also leaves us more exposed > to consistency violations. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (CASSANDRA-8523) Writes should be sent to a replacement node while it is streaming in data
[ https://issues.apache.org/jira/browse/CASSANDRA-8523?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15225475#comment-15225475 ] Brandon Williams commented on CASSANDRA-8523: - bq. I would consider a new, non dead, state for replacing a node rather than using hibernate. This should allow us to use the Failure Detector right? It looks like this would work now, because the gossiper is the only thing that registers with the FD. In the past this wasn't always true, though. bq. Alternatively, would a new Gossip property that indicates a "shadow" of an existing node be easier to implement? Hmm, can you can explain further? > Writes should be sent to a replacement node while it is streaming in data > - > > Key: CASSANDRA-8523 > URL: https://issues.apache.org/jira/browse/CASSANDRA-8523 > Project: Cassandra > Issue Type: Improvement >Reporter: Richard Wagner >Assignee: Brandon Williams > Fix For: 2.1.x > > > In our operations, we make heavy use of replace_address (or > replace_address_first_boot) in order to replace broken nodes. We now realize > that writes are not sent to the replacement nodes while they are in hibernate > state and streaming in data. This runs counter to what our expectations were, > especially since we know that writes ARE sent to nodes when they are > bootstrapped into the ring. > It seems like cassandra should arrange to send writes to a node that is in > the process of replacing another node, just like it does for a nodes that are > bootstraping. I hesitate to phrase this as "we should send writes to a node > in hibernate" because the concept of hibernate may be useful in other > contexts, as per CASSANDRA-8336. Maybe a new state is needed here? > Among other things, the fact that we don't get writes during this period > makes subsequent repairs more expensive, proportional to the number of writes > that we miss (and depending on the amount of data that needs to be streamed > during replacement and the time it may take to rebuild secondary indexes, we > could miss many many hours worth of writes). It also leaves us more exposed > to consistency violations. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (CASSANDRA-8523) Writes should be sent to a replacement node while it is streaming in data
[ https://issues.apache.org/jira/browse/CASSANDRA-8523?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15225470#comment-15225470 ] Stefania commented on CASSANDRA-8523: - I would consider a new, non dead, state for replacing a node rather than using hibernate. This should allow us to use the Failure Detector right? Alternatively, would a new Gossip property that indicates a "shadow" of an existing node be easier to implement? Could this property be made to expire unless it gets renewed by the shadow node? > Writes should be sent to a replacement node while it is streaming in data > - > > Key: CASSANDRA-8523 > URL: https://issues.apache.org/jira/browse/CASSANDRA-8523 > Project: Cassandra > Issue Type: Improvement >Reporter: Richard Wagner >Assignee: Brandon Williams > Fix For: 2.1.x > > > In our operations, we make heavy use of replace_address (or > replace_address_first_boot) in order to replace broken nodes. We now realize > that writes are not sent to the replacement nodes while they are in hibernate > state and streaming in data. This runs counter to what our expectations were, > especially since we know that writes ARE sent to nodes when they are > bootstrapped into the ring. > It seems like cassandra should arrange to send writes to a node that is in > the process of replacing another node, just like it does for a nodes that are > bootstraping. I hesitate to phrase this as "we should send writes to a node > in hibernate" because the concept of hibernate may be useful in other > contexts, as per CASSANDRA-8336. Maybe a new state is needed here? > Among other things, the fact that we don't get writes during this period > makes subsequent repairs more expensive, proportional to the number of writes > that we miss (and depending on the amount of data that needs to be streamed > during replacement and the time it may take to rebuild secondary indexes, we > could miss many many hours worth of writes). It also leaves us more exposed > to consistency violations. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (CASSANDRA-8523) Writes should be sent to a replacement node while it is streaming in data
[ https://issues.apache.org/jira/browse/CASSANDRA-8523?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14275862#comment-14275862 ] Jonathan Ellis commented on CASSANDRA-8523: --- Marking related to CASSANDRA-8494, but may want to resolve as duplicate. > Writes should be sent to a replacement node while it is streaming in data > - > > Key: CASSANDRA-8523 > URL: https://issues.apache.org/jira/browse/CASSANDRA-8523 > Project: Cassandra > Issue Type: Improvement > Components: Core >Reporter: Richard Wagner >Assignee: Brandon Williams > Fix For: 2.0.12, 2.1.3 > > > In our operations, we make heavy use of replace_address (or > replace_address_first_boot) in order to replace broken nodes. We now realize > that writes are not sent to the replacement nodes while they are in hibernate > state and streaming in data. This runs counter to what our expectations were, > especially since we know that writes ARE sent to nodes when they are > bootstrapped into the ring. > It seems like cassandra should arrange to send writes to a node that is in > the process of replacing another node, just like it does for a nodes that are > bootstraping. I hesitate to phrase this as "we should send writes to a node > in hibernate" because the concept of hibernate may be useful in other > contexts, as per CASSANDRA-8336. Maybe a new state is needed here? > Among other things, the fact that we don't get writes during this period > makes subsequent repairs more expensive, proportional to the number of writes > that we miss (and depending on the amount of data that needs to be streamed > during replacement and the time it may take to rebuild secondary indexes, we > could miss many many hours worth of writes). It also leaves us more exposed > to consistency violations. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (CASSANDRA-8523) Writes should be sent to a replacement node while it is streaming in data
[ https://issues.apache.org/jira/browse/CASSANDRA-8523?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14271464#comment-14271464 ] Brandon Williams commented on CASSANDRA-8523: - I completely agree this is an improvement, but it's going to be pretty tricky, especially since we can't use the FD to determine if the node has died, at least not in its current form since that would mark the node as UP. > Writes should be sent to a replacement node while it is streaming in data > - > > Key: CASSANDRA-8523 > URL: https://issues.apache.org/jira/browse/CASSANDRA-8523 > Project: Cassandra > Issue Type: Improvement > Components: Core >Reporter: Richard Wagner > Fix For: 2.0.12, 2.1.3 > > > In our operations, we make heavy use of replace_address (or > replace_address_first_boot) in order to replace broken nodes. We now realize > that writes are not sent to the replacement nodes while they are in hibernate > state and streaming in data. This runs counter to what our expectations were, > especially since we know that writes ARE sent to nodes when they are > bootstrapped into the ring. > It seems like cassandra should arrange to send writes to a node that is in > the process of replacing another node, just like it does for a nodes that are > bootstraping. I hesitate to phrase this as "we should send writes to a node > in hibernate" because the concept of hibernate may be useful in other > contexts, as per CASSANDRA-8336. Maybe a new state is needed here? > Among other things, the fact that we don't get writes during this period > makes subsequent repairs more expensive, proportional to the number of writes > that we miss (and depending on the amount of data that needs to be streamed > during replacement and the time it may take to rebuild secondary indexes, we > could miss many many hours worth of writes). It also leaves us more exposed > to consistency violations. -- This message was sent by Atlassian JIRA (v6.3.4#6332)