[jira] [Commented] (CASSANDRA-13327) Pending endpoints size check for CAS doesn't play nicely with writes-on-replacement
[ https://issues.apache.org/jira/browse/CASSANDRA-13327?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15951550#comment-15951550 ] Paulo Motta commented on CASSANDRA-13327:
-----------------------------------------
Thanks for clarifying [~slebresne]. [~aweisberg], I guess we can close this then? And maybe open another ticket with Sylvain's suggestion above to lift the max-number-of-pending-endpoints limitation for CAS, if you're willing to take a shot at it?

> Pending endpoints size check for CAS doesn't play nicely with writes-on-replacement
> -----------------------------------------------------------------------------------
>
>         Key: CASSANDRA-13327
>         URL: https://issues.apache.org/jira/browse/CASSANDRA-13327
>     Project: Cassandra
>  Issue Type: Bug
>  Components: Coordination
>    Reporter: Ariel Weisberg
>    Assignee: Ariel Weisberg
>
> Consider this ring:
> 127.0.0.1  MR  UP    JOINING  -7301836195843364181
> 127.0.0.2  MR  UP    NORMAL   -7263405479023135948
> 127.0.0.3  MR  UP    NORMAL   -7205759403792793599
> 127.0.0.4  MR  DOWN  NORMAL   -7148113328562451251
> where 127.0.0.1 was bootstrapping for cluster expansion. Note that, due to the failure of 127.0.0.4, 127.0.0.1 was stuck trying to stream from it and making no progress.
> Then the down node was replaced, so we had:
> 127.0.0.1  MR  UP  JOINING  -7301836195843364181
> 127.0.0.2  MR  UP  NORMAL   -7263405479023135948
> 127.0.0.3  MR  UP  NORMAL   -7205759403792793599
> 127.0.0.5  MR  UP  JOINING  -7148113328562451251
> It's confusing in the ring: the first JOINING is a genuine bootstrap, the second is a replacement. We now had CAS unavailables (but no non-CAS unavailables). I think it's because the pending endpoints check thinks that 127.0.0.5 is gaining a range when it's just replacing.
> The workaround is to kill the stuck JOINING node, but Cassandra shouldn't unnecessarily fail these requests.
> It also appears that required participants is bumped by 1 during a host replacement, so if the replacing host fails you will get unavailables and timeouts.
> This is related to the check added in CASSANDRA-8346

--
This message was sent by Atlassian JIRA (v6.3.15#6346)
[ https://issues.apache.org/jira/browse/CASSANDRA-13327?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15950523#comment-15950523 ] Sylvain Lebresne commented on CASSANDRA-13327:
-----------------------------------------------
bq. so bootstrap can be resumed

Forgot we supported resuming now :)

bq. by making the read phase use an extended (RF + P + 1) / 2 quorum?

Reading from pending nodes is a very bad idea, since by definition those nodes don't have up-to-date data. Well, I guess things are working as they do for decently good reasons here.

That said, thinking about it, the solution from CASSANDRA-8346 may be a bit of a big hammer: I believe it's enough to ensure that we read from at least one replica that responded to PREPARE in the same Paxos round. And since we have timeouts on the Paxos round, it may be possible to drastically reduce the time during which we consider a node pending for CAS, so that it's not a real problem in practice. Something like having a pending node move to an "almost there" state before becoming a true replica, staying in that state for roughly the maximum duration of a Paxos round, after which Paxos might be able to substitute those "almost there" nodes for "pending" ones during PREPARE.

With that said, anything Paxos-related is pretty subtle, so I'm not saying this would work; one would have to look at the idea a lot more closely. Also, this probably wouldn't be a trivial change at all. And to be upfront, I'm unlikely to personally have cycles to devote to this in the short term.
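The "almost there" idea above can be sketched very roughly. Everything below (the state names, the helper function) is invented for illustration; this is not Cassandra code, just a picture of the proposed state transition:

```python
# Purely hypothetical sketch of the proposal above: a pending node would pass
# through an intermediate "almost there" state, held for at least the maximum
# duration of a Paxos round, before becoming a true replica. PREPARE could
# then treat "almost there" nodes as replicas rather than as pending.
from enum import Enum, auto


class RingState(Enum):
    PENDING = auto()       # still streaming; extra cohort member for CAS
    ALMOST_THERE = auto()  # caught up; parked here for >= max Paxos round time
    NORMAL = auto()        # full replica


def counts_as_replica_for_prepare(state: RingState) -> bool:
    """Under the sketched scheme, only PENDING nodes would still widen the
    CAS cohort; ALMOST_THERE nodes would already count as replicas."""
    return state in (RingState.ALMOST_THERE, RingState.NORMAL)
```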
[ https://issues.apache.org/jira/browse/CASSANDRA-13327?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15949808#comment-15949808 ] Paulo Motta commented on CASSANDRA-13327:
-----------------------------------------
bq. If a node is streaming from a node that is replaced, we should probably detect that and fail the bootstrapping node since we know it will never complete (and hence has no reason to be accounted as pending anymore).

Actually this is a feature and not a bug: the joining node remains in JOINING state so bootstrap can be resumed (CASSANDRA-8838) after the failure is resolved. In this case it obviously doesn't play well with CAS, since 2 pending endpoints will remain and CAS will stay unavailable, which is a pity. Is there a reason we can't lift the limitation from CASSANDRA-8346 by making the read phase use an extended (RF + P + 1) / 2 quorum?
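For illustration, the extended quorum suggested above can be written out as a small sketch. The function names are invented here, and this is not how Cassandra computes quorums; it just shows the arithmetic of the suggestion:

```python
# Hypothetical sketch of the extended (RF + P + 1) / 2 quorum proposed above:
# with RF natural replicas and P pending endpoints, the majority is taken over
# the combined cohort, so any two quorums still intersect even after the
# pending nodes are promoted to full replicas.

def classic_quorum(rf: int) -> int:
    """Smallest majority of RF natural replicas."""
    return rf // 2 + 1


def extended_quorum(rf: int, pending: int) -> int:
    """Majority over natural plus pending endpoints; this integer form
    equals (RF + P + 1) / 2 rounded up."""
    return (rf + pending) // 2 + 1
```

With RF = 3 and two pending endpoints this asks for 3 responses instead of 2, which would be the price of tolerating the promotion.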
[ https://issues.apache.org/jira/browse/CASSANDRA-13327?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15948593#comment-15948593 ] Sylvain Lebresne commented on CASSANDRA-13327:
-----------------------------------------------
Fyi, some of the confusion is probably my fault: I initially read the description too quickly and thought the _replaced_ node was in pending, which is what looked unnecessary to me, but that is not what is happening here. Re-reading said description, it does look like there are 2 genuine "pending" nodes: one that is bootstrapping and one that is replacing some other node. In that case, I'm afraid the code is working as designed: a replacing node _is_ gaining a range, in the sense that it's not a replica for that range as far as reads are concerned, but it may become one at any time once the replacement ends.

bq. Note that, due to the failure of 127.0.0.4, 127.0.0.1 was stuck trying to stream from it and making no progress.

I'll submit that this is probably the part where we ought to do better. If a node is streaming from a node that is replaced, we should probably detect that and fail the bootstrapping node, since we know it will never complete (and hence has no reason to be counted as pending anymore).
[ https://issues.apache.org/jira/browse/CASSANDRA-13327?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15947725#comment-15947725 ] Paulo Motta commented on CASSANDRA-13327:
-----------------------------------------
bq. To answer your question. What I believe happens is that while streaming is occurring the replacing node remains in the joining state.

This is designed behavior per CASSANDRA-8523. "JOINING" is just a display name in nodetool, so we should probably fix that to show REPLACING instead; internally it means the node is trying to join the ring with the same tokens as the node it's replacing (but only if the operation completes successfully; that's why it's in pending state).

bq. If the down node is not coming back and you are replacing it why should there be unavailables?

The unavailable only happened because there were 2 pending nodes in the requested range (the joining node AND the replacing node), and the current CAS design forbids more than 1 pending endpoint in the requested range (CASSANDRA-8346).

bq. The question for me is whether the replacing node is really pending? What is the definition of pending and why should it include a replacing node?

The replacing node is pending because we cannot count it as an ordinary node towards the consistency level; otherwise, if the replace operation fails, operations that used the replacement node as a member of the quorum would become inconsistent. That's why CASSANDRA-833 added pending/joining nodes as additional members of the cohort.
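The accounting described above can be illustrated with a small sketch. The function below is hypothetical, not the actual StorageProxy code; it just restates the CASSANDRA-8346 rule as described in this thread:

```python
# Illustrative sketch of the rule described above: pending (joining/replacing)
# endpoints are counted as extra cohort members for CAS, and more than one
# pending endpoint in the requested range makes the operation unavailable.

class Unavailable(Exception):
    pass


def cas_required_participants(natural: int, pending: int) -> int:
    """Majority over natural replicas plus pending endpoints."""
    if pending > 1:
        # CASSANDRA-8346: with more than 1 pending endpoint the quorum
        # guarantees can no longer be maintained, so reject up front.
        raise Unavailable("more than 1 pending endpoint in the range")
    return (natural + pending) // 2 + 1
```

Note that the majority grows from 2 to 3 when one pending endpoint is added at RF = 3; that is the "required participants is bumped by 1 during a host replacement" effect mentioned in the issue description.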
[ https://issues.apache.org/jira/browse/CASSANDRA-13327?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15947607#comment-15947607 ] Ariel Weisberg commented on CASSANDRA-13327:
--------------------------------------------
Give me a bit to package up my dtest demonstrating this.
[ https://issues.apache.org/jira/browse/CASSANDRA-13327?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15947564#comment-15947564 ] Paulo Motta commented on CASSANDRA-13327:
-----------------------------------------
I'm a bit confused here: did the CAS unavailables happen while 127.0.0.5 was being replaced, or did 127.0.0.5 remain in JOINING/replacing state after the replacement finished?

If the former, then this is expected behavior, I guess: you had 3 normal endpoints (of which 1 was down) and 2 pending endpoints (the bootstrapping node and the replacing node) for the requested key, so the CAS should not be allowed, due to CASSANDRA-8346.

If the latter, then this is a bug with replace and must be fixed, since the node must bump to NORMAL state after replacement completes, and the CAS should then succeed. Can you reproduce this easily, or do you have logs showing why the replacement node did not go into NORMAL state after replacement finished?
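Working through the numbers from this ticket's scenario (the values below restate the issue description; the quorum arithmetic is simplified and is not Cassandra code):

```python
# The reported ring has 3 natural replicas for the key (127.0.0.4 down) plus
# 2 pending endpoints: 127.0.0.1 (bootstrapping) and 127.0.0.5 (replacing).

natural = 3        # RF = 3 natural replicas for the requested key
live_natural = 2   # 127.0.0.4 is down
pending = 2        # bootstrapping node + replacing node

# Simplified: a QUORUM majority of the natural replicas is 2, which the two
# live natural nodes (plus the up pending nodes receiving writes) can still
# satisfy; hence the report of no non-CAS unavailables.
plain_quorum = natural // 2 + 1

# CAS, however, is rejected outright by the more-than-1-pending-endpoint
# check, regardless of how many nodes are actually up.
cas_allowed = pending <= 1
```

Here `plain_quorum` is 2 and is met by the live replicas, while `cas_allowed` is False, matching "CAS unavailables (but no non-CAS unavailables)".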
[ https://issues.apache.org/jira/browse/CASSANDRA-13327?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15947306#comment-15947306 ] Sylvain Lebresne commented on CASSANDRA-13327:
-----------------------------------------------
[~pauloricardomg], you worked on replacement, any comments? Is there any reason not to remove a replacing node from pending as soon as it replaces its target?
[ https://issues.apache.org/jira/browse/CASSANDRA-13327?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15944229#comment-15944229 ] Ariel Weisberg commented on CASSANDRA-13327:
--------------------------------------------
It seems to me that for host replacement there should simply be no pending state at all. My reading says that host replacement can only occur when the host being replaced is not alive, although I imagine some nodes might still consider it alive. So why can't the replacing node transition directly to a state where it is no longer joining but has replaced its target node, and in that situation it is simply a node that is badly behind and in need of repair? Bootstrap can then run on the replacing node to provide the missing data while it serves both writes and reads. Is that an issue for CL.ONE, which already has weak guarantees?
[ https://issues.apache.org/jira/browse/CASSANDRA-13327?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15943870#comment-15943870 ] Ariel Weisberg commented on CASSANDRA-13327:
--------------------------------------------
OK. So to remove the node from {{TokenMetadata.pendingEndpointsFor}}, it must then appear as a natural endpoint from {{StorageService.getNaturalEndpoints()}}; otherwise writes would never make it to that participant at all. I guess I need to look up when that transition occurs now and whether doing it sooner makes sense.
[ https://issues.apache.org/jira/browse/CASSANDRA-13327?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15933025#comment-15933025 ] Ariel Weisberg commented on CASSANDRA-13327:
--------------------------------------------
[dtest here|https://github.com/riptano/cassandra-dtest/compare/master...aweisberg:cassandra-13327?expand=1]

[~slebresne], can you review this? Since we don't run Jepsen on this, it's really hard to know whether it will impact the correctness of LWT.
[ https://issues.apache.org/jira/browse/CASSANDRA-13327?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15926934#comment-15926934 ] Ariel Weisberg commented on CASSANDRA-13327:
--------------------------------------------
||code|utests|dtests||
|[trunk|https://github.com/apache/cassandra/compare/trunk...aweisberg:cassandra-13327-trunk?expand=1]|[utests|https://cassci.datastax.com/view/Dev/view/aweisberg/job/aweisberg-cassandra-13327-trunk-testall/1/]|[dtests|https://cassci.datastax.com/view/Dev/view/aweisberg/job/aweisberg-cassandra-13327-trunk-dtest/1/]|

Working on a dtest to see if this at least has the intended effect. I am nowhere near as confident as I need to be that it's safe.