[jira] [Commented] (CASSANDRA-13327) Pending endpoints size check for CAS doesn't play nicely with writes-on-replacement

2017-03-31 Thread Paulo Motta (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-13327?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15951550#comment-15951550
 ] 

Paulo Motta commented on CASSANDRA-13327:
-

Thanks for clarifying [~slebresne].

[~aweisberg] I guess we can close this then? And maybe open another ticket with 
Sylvain's suggestion above to lift the max-number-of-pending-endpoints 
limitation for CAS, if you're willing to take a shot at it?

> Pending endpoints size check for CAS doesn't play nicely with 
> writes-on-replacement
> ---
>
> Key: CASSANDRA-13327
> URL: https://issues.apache.org/jira/browse/CASSANDRA-13327
> Project: Cassandra
>  Issue Type: Bug
>  Components: Coordination
>Reporter: Ariel Weisberg
>Assignee: Ariel Weisberg
>
> Consider this ring:
> 127.0.0.1 MR UP   JOINING -7301836195843364181
> 127.0.0.2 MR UP   NORMAL  -7263405479023135948
> 127.0.0.3 MR UP   NORMAL  -7205759403792793599
> 127.0.0.4 MR DOWN NORMAL  -7148113328562451251
> where 127.0.0.1 was bootstrapping for cluster expansion. Note that, due to 
> the failure of 127.0.0.4, 127.0.0.1 was stuck trying to stream from it and 
> making no progress.
> Then the down node was replaced, so we had:
> 127.0.0.1 MR UP JOINING -7301836195843364181
> 127.0.0.2 MR UP NORMAL  -7263405479023135948
> 127.0.0.3 MR UP NORMAL  -7205759403792793599
> 127.0.0.5 MR UP JOINING -7148113328562451251
> It’s confusing in the ring - the first JOINING is a genuine bootstrap, the 
> second is a replacement. We now had CAS unavailables (but no non-CAS 
> unavailables). I think it’s because the pending endpoints check thinks that 
> 127.0.0.5 is gaining a range when it’s just replacing.
> The workaround is to kill the stuck JOINING node, but Cassandra shouldn’t 
> unnecessarily fail these requests.
> It also appears that required participants is bumped by 1 during a host 
> replacement, so if the replacing host fails you will get unavailables and 
> timeouts.
> This is related to the check added in CASSANDRA-8346.





[jira] [Commented] (CASSANDRA-13327) Pending endpoints size check for CAS doesn't play nicely with writes-on-replacement

2017-03-31 Thread Sylvain Lebresne (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-13327?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15950523#comment-15950523
 ] 

Sylvain Lebresne commented on CASSANDRA-13327:
--

bq. so bootstrap can be resumed

Forgot we supported resuming now :)

bq. by making the read phase use an extended (RF + P + 1) / 2 quorum?

Reading from pending nodes is a very bad idea since by definition those nodes 
don't have up-to-date data.

Well, I guess things are working as they do for decently good reasons here. That 
said, thinking about it, it could be that the solution from CASSANDRA-8346 is a 
bit of a big hammer: I believe it's enough to ensure that we read from at least 
one replica that responded to PREPARE 'in the same Paxos round'. But we have 
timeouts on the Paxos round, so it may be possible to reduce drastically the 
time we consider a node pending for CAS, to the point where it's not a real 
problem in practice. Something like having pending nodes move to an "almost 
there" state before becoming true replicas, staying in that state for basically 
the max time of a Paxos round, and then Paxos might be able to replace "pending" 
nodes by those "almost there" ones for PREPARE.

With that said, anything Paxos-related is pretty subtle, so I'm not saying this 
would work; one would have to look at the idea a lot more closely. Also, this 
probably wouldn't be a trivial change at all. And to be upfront, I'm unlikely 
to personally have cycles to devote to this in the short term. 




[jira] [Commented] (CASSANDRA-13327) Pending endpoints size check for CAS doesn't play nicely with writes-on-replacement

2017-03-30 Thread Paulo Motta (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-13327?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15949808#comment-15949808
 ] 

Paulo Motta commented on CASSANDRA-13327:
-

bq. If a node is streaming from a node that is replaced, we should probably 
detect that and fail the bootstrapping node since we know it will never 
complete (and hence has no reason to be accounted as pending anymore).

Actually this is a feature and not a bug: the joining node remains in JOINING 
state so bootstrap can be resumed (CASSANDRA-8838) after the failure is 
resolved. In this case it obviously doesn't play along well with CAS, since it 
means 2 pending endpoints will remain and CAS will stay unavailable, which is a 
pity.

Is there a reason we can't lift the limitation from CASSANDRA-8346 by making 
the read phase use an extended (RF + P + 1) / 2 quorum?
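
Just to spell out the arithmetic of that suggestion (illustration only, 
assuming P is the number of pending endpoints for the requested range):

{code:java}
// Illustration only of the quorum sizes in the question above; P is assumed
// to be the number of pending endpoints for the requested range.
public final class QuorumMath
{
    // Standard quorum over the natural replicas.
    static int quorum(int rf)
    {
        return rf / 2 + 1;                 // RF=3 -> 2
    }

    // The extended quorum from the question, also covering pending endpoints.
    static int extendedQuorum(int rf, int pending)
    {
        return (rf + pending + 1) / 2;     // RF=3, P=2 -> 3
    }

    public static void main(String[] args)
    {
        System.out.println(quorum(3));             // 2
        System.out.println(extendedQuorum(3, 2));  // 3
    }
}
{code}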



[jira] [Commented] (CASSANDRA-13327) Pending endpoints size check for CAS doesn't play nicely with writes-on-replacement

2017-03-30 Thread Sylvain Lebresne (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-13327?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15948593#comment-15948593
 ] 

Sylvain Lebresne commented on CASSANDRA-13327:
--

Fyi, some of the confusion is probably my fault: I initially read the 
description too quickly and thought the _replaced_ node was in pending, which 
is what looked unnecessary to me, but it appears this is not what is happening 
here. Re-reading said description, it does look like there are 2 genuine 
"pending" nodes: one that is bootstrapping and one that is replacing some other 
node. In that case, I'm afraid the code is working as designed: a replacing 
node _is_ gaining a range in the sense that it's not a replica for that range 
as far as reads are concerned, but it may become one at any time once the 
replacement ends.

bq. Note that, due to the failure of 127.0.0.4, 127.0.0.1 was stuck trying to 
stream from it and making no progress.

I'll submit that this is probably the part where we ought to do better. If a 
node is streaming from a node that is replaced, we should probably detect that 
and fail the bootstrapping node since we know it will never complete (and hence 
has no reason to be accounted as pending anymore).



[jira] [Commented] (CASSANDRA-13327) Pending endpoints size check for CAS doesn't play nicely with writes-on-replacement

2017-03-29 Thread Paulo Motta (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-13327?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15947725#comment-15947725
 ] 

Paulo Motta commented on CASSANDRA-13327:
-

bq. To answer your question. What I believe happens is that while streaming is 
occurring the replacing node remains in the joining state.

This is designed behavior per CASSANDRA-8523. "JOINING" is just a display name 
on nodetool, so we should probably fix that to show REPLACING instead, but 
internally it means the node is trying to join the ring with the same tokens as 
the node it's trying to replace (and it only becomes a replica if the operation 
completes successfully, which is why it's in pending state).

bq. If the down node is not coming back and you are replacing it why should 
there be unavailables?

The unavailables only happened because there were 2 pending nodes in the 
requested range (the joining node AND the replacing node), and the current CAS 
design forbids more than 1 pending endpoint in the requested range 
(CASSANDRA-8346).
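
Roughly, the shape of that restriction is the following (paraphrased sketch, 
not the actual coordinator code):

{code:java}
import java.util.List;

// Paraphrased sketch of the CASSANDRA-8346 restriction (illustration, not the
// actual coordinator code): every natural + pending endpoint for the range is
// a Paxos participant, a majority of them must be alive, and more than one
// pending endpoint makes the range unavailable for CAS.
final class CasParticipantCheck
{
    static int requiredParticipants(List<String> natural, List<String> pending)
    {
        int participants = natural.size() + pending.size();
        return participants / 2 + 1;   // 3 natural + 1 pending -> 3 required
    }

    static void check(List<String> natural, List<String> pending, int liveParticipants)
    {
        if (liveParticipants < requiredParticipants(natural, pending))
            throw new IllegalStateException("Unavailable: not enough live participants");

        // The check hit in this ticket: one bootstrapping node plus one
        // replacing node means 2 pending endpoints for the same range.
        if (pending.size() > 1)
            throw new IllegalStateException("Unavailable: more than 1 pending endpoint for CAS");
    }
}
{code}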

bq. The question for me is whether the replacing node is really pending? What 
is the definition of pending and why should it include a replacing node?

The replacing node is pending because we cannot count that node as an ordinary 
node towards the consistency level: if the replace operation fails, operations 
that used the replacement node as a member of the quorum would become 
inconsistent. That's why CASSANDRA-833 added pending/joining nodes as 
additional members of the cohort.



[jira] [Commented] (CASSANDRA-13327) Pending endpoints size check for CAS doesn't play nicely with writes-on-replacement

2017-03-29 Thread Ariel Weisberg (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-13327?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15947607#comment-15947607
 ] 

Ariel Weisberg commented on CASSANDRA-13327:


Give me a bit to package up my dtest demonstrating this.



[jira] [Commented] (CASSANDRA-13327) Pending endpoints size check for CAS doesn't play nicely with writes-on-replacement

2017-03-29 Thread Paulo Motta (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-13327?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15947564#comment-15947564
 ] 

Paulo Motta commented on CASSANDRA-13327:
-

I'm a bit confused here: did the CAS unavailables happen while 127.0.0.5 was 
being replaced, or did 127.0.0.5 remain in JOINING/replacing state after the 
replace was finished?

If the former, then this is expected behavior, I guess? You had 3 normal 
endpoints (of which 1 was down) and 2 pending endpoints (the bootstrapping and 
the replacing node) for the requested key, so the CAS should not be allowed due 
to CASSANDRA-8346.

If the latter, then this is a bug with replace and must be fixed, since the 
node must move to NORMAL state after replacement is completed and the CAS 
should then succeed. Can you reproduce this easily, or do you have logs to 
understand why the replacement node did not go into NORMAL state after 
replacement was finished?



[jira] [Commented] (CASSANDRA-13327) Pending endpoints size check for CAS doesn't play nicely with writes-on-replacement

2017-03-29 Thread Sylvain Lebresne (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-13327?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15947306#comment-15947306
 ] 

Sylvain Lebresne commented on CASSANDRA-13327:
--

[~pauloricardomg], you worked on replacement, any comments? Is there any reason 
not to remove replacements from pending as soon as we replace them?



[jira] [Commented] (CASSANDRA-13327) Pending endpoints size check for CAS doesn't play nicely with writes-on-replacement

2017-03-27 Thread Ariel Weisberg (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-13327?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15944229#comment-15944229
 ] 

Ariel Weisberg commented on CASSANDRA-13327:


It seems to me that for host replacement there is simply no need for a pending 
state at all. My reading says that host replacement can only occur when the 
host being replaced is not alive, although I imagine on some nodes it might 
still be considered alive.

So why can't the replacing node transition directly to a state where it is no 
longer joining but has replaced its target node, and in that situation it is 
simply a node that is badly behind and in need of repair? Bootstrap can then 
run on the replacing node to provide the missing data while it serves both 
writes and reads.

Is that an issue for CL.ONE, which already has weak guarantees?



[jira] [Commented] (CASSANDRA-13327) Pending endpoints size check for CAS doesn't play nicely with writes-on-replacement

2017-03-27 Thread Ariel Weisberg (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-13327?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15943870#comment-15943870
 ] 

Ariel Weisberg commented on CASSANDRA-13327:


OK. So to remove the node from {{TokenMetadata.pendingEndpointsFor}}, it must 
appear as a natural endpoint from {{StorageService.getNaturalEndpoints()}}, 
otherwise writes would never make it to that participant at all. I guess I need 
to look up when that transition occurs now and whether doing that transition 
sooner makes sense.
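
In other words, my mental model of the write path is something like the sketch 
below (my own paraphrase, not the actual code), which is why a node has to show 
up in one of those two sets to keep receiving writes:

{code:java}
import java.util.ArrayList;
import java.util.List;

// Sketch of my mental model of the write path (paraphrase, not the actual
// code): a mutation is sent to natural + pending endpoints, and the number of
// acks the coordinator blocks for is the consistency level's count over the
// natural replicas plus one per pending endpoint. A node in neither set gets
// no writes at all.
final class WriteTargets
{
    static List<String> targets(List<String> natural, List<String> pending)
    {
        List<String> all = new ArrayList<>(natural);
        all.addAll(pending);    // pending endpoints receive every write for the range
        return all;
    }

    static int blockFor(int rf, int pendingCount)
    {
        int quorum = rf / 2 + 1;        // e.g. QUORUM with RF=3 -> 2
        return quorum + pendingCount;   // bumped by 1 per pending endpoint
    }
}
{code}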





[jira] [Commented] (CASSANDRA-13327) Pending endpoints size check for CAS doesn't play nicely with writes-on-replacement

2017-03-20 Thread Ariel Weisberg (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-13327?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15933025#comment-15933025
 ] 

Ariel Weisberg commented on CASSANDRA-13327:


[dtest 
here|https://github.com/riptano/cassandra-dtest/compare/master...aweisberg:cassandra-13327?expand=1]
[~slebresne] can you review this? Since we don't run Jepsen on this, it's really 
hard to know whether it's going to impact the correctness of LWT.



[jira] [Commented] (CASSANDRA-13327) Pending endpoints size check for CAS doesn't play nicely with writes-on-replacement

2017-03-15 Thread Ariel Weisberg (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-13327?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15926934#comment-15926934
 ] 

Ariel Weisberg commented on CASSANDRA-13327:


||code|utests|dtests||
|[trunk|https://github.com/apache/cassandra/compare/trunk...aweisberg:cassandra-13327-trunk?expand=1]|[utests|https://cassci.datastax.com/view/Dev/view/aweisberg/job/aweisberg-cassandra-13327-trunk-testall/1/]|[dtests|https://cassci.datastax.com/view/Dev/view/aweisberg/job/aweisberg-cassandra-13327-trunk-dtest/1/]|

Working on a dtest to see if this at least has the intended effect. I am 
nowhere near as confident as I need to be that it's safe.
