[ 
https://issues.apache.org/jira/browse/CASSANDRA-13327?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15947725#comment-15947725
 ] 

Paulo Motta commented on CASSANDRA-13327:
-----------------------------------------

bq. To answer your question. What I believe happens is that while streaming is 
occurring the replacing node remains in the joining state.

This is behavior by design per CASSANDRA-8523. "JOINING" is just a display name 
in nodetool, so we should probably fix that to show REPLACING instead, but 
internally it means the node is trying to join the ring with the same tokens as 
the node it's replacing (it only takes ownership if the operation completes 
successfully, which is why it's in a pending state).
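
Roughly speaking, the replacing node claims the dead node's token only as a 
pending owner and is promoted to a normal owner only if the replacement 
completes. The sketch below is a toy model of that lifecycle (my own simplified 
names, not Cassandra's actual TokenMetadata code):

{code:java}
import java.util.HashMap;
import java.util.Map;

// Toy model of the replacement lifecycle; illustrative only, not Cassandra's
// TokenMetadata.
final class ReplacementSketch
{
    private final Map<Long, String> normalOwners = new HashMap<>();  // token -> normal owner
    private final Map<Long, String> pendingOwners = new HashMap<>(); // token -> replacing node

    void addNormal(long token, String endpoint)
    {
        normalOwners.put(token, endpoint);
    }

    // The replacing node claims the same token as the dead node, but only as pending.
    void startReplace(long token, String replacingEndpoint)
    {
        pendingOwners.put(token, replacingEndpoint);
    }

    // Only if the replacement completes successfully does the new node take ownership.
    void finishReplace(long token)
    {
        normalOwners.put(token, pendingOwners.remove(token));
    }

    // If the replacement fails, the pending claim is simply dropped.
    void abortReplace(long token)
    {
        pendingOwners.remove(token);
    }

    public static void main(String[] args)
    {
        ReplacementSketch ring = new ReplacementSketch();
        long token = -7148113328562451251L;
        ring.addNormal(token, "127.0.0.4");

        // While streaming, 127.0.0.5 is only a pending owner of the dead node's token.
        ring.startReplace(token, "127.0.0.5");
        System.out.println("normal=" + ring.normalOwners + " pending=" + ring.pendingOwners);

        // After the replacement succeeds, 127.0.0.5 becomes the normal owner.
        ring.finishReplace(token);
        System.out.println("normal=" + ring.normalOwners + " pending=" + ring.pendingOwners);
    }
}
{code}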

bq. If the down node is not coming back and you are replacing it why should 
there be unavailables?

The unavailables only happened because there were 2 pending nodes in the 
requested range (the joining node AND the replacing node), and the current CAS 
design forbids more than 1 pending endpoint in the requested range 
(CASSANDRA-8346).
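
For reference, the CASSANDRA-8346 restriction amounts to a size check on the 
pending endpoints covering the key before the Paxos round starts. The sketch 
below only illustrates that guard (simplified names and exception, not the 
actual StorageProxy code):

{code:java}
import java.util.List;

// Illustration of the CASSANDRA-8346 guard; simplified names, not the actual
// StorageProxy code.
final class CasPendingCheck
{
    // Stands in for Cassandra's UnavailableException in this sketch.
    static final class UnavailableForCas extends RuntimeException
    {
        UnavailableForCas(String message)
        {
            super(message);
        }
    }

    // CAS over a range with more than one pending endpoint is rejected, because
    // the correctness argument for counting pending nodes only covers a single one.
    static void checkPendingEndpoints(List<String> pendingEndpoints)
    {
        if (pendingEndpoints.size() > 1)
            throw new UnavailableForCas("more than one pending endpoint for the range: " + pendingEndpoints);
    }

    public static void main(String[] args)
    {
        // The situation from this ticket: a bootstrapping node AND a replacing node
        // are both pending for the same range, so the CAS request is rejected.
        checkPendingEndpoints(List.of("127.0.0.1 (bootstrapping)", "127.0.0.5 (replacing)"));
    }
}
{code}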

bq. The question for me is whether the replacing node is really pending? What 
is the definition of pending and why should it include a replacing node?

The replacing node is pending because we cannot count it as an ordinary node 
towards the consistency level; otherwise, if the replace operation fails, the 
operations that used the replacement node as a member of the quorum would 
become inconsistent. That's why CASSANDRA-833 added pending/joining nodes as 
additional members of the cohort.
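
To make the "additional members of the cohort" point concrete, here is a 
simplified sketch of the arithmetic (illustrative only, not the actual Paxos 
participant code): the quorum is taken over natural plus pending replicas, so a 
single replacing node bumps the required participants by one.

{code:java}
import java.util.List;

// Simplified arithmetic behind "pending nodes are additional members of the
// cohort"; illustrative only, not the actual Paxos participant code.
final class PaxosParticipantsSketch
{
    // Quorum taken over natural replicas plus pending (joining/replacing) replicas.
    static int requiredParticipants(List<String> naturalReplicas, List<String> pendingReplicas)
    {
        int participants = naturalReplicas.size() + pendingReplicas.size();
        return participants / 2 + 1;
    }

    public static void main(String[] args)
    {
        List<String> natural = List.of("127.0.0.2", "127.0.0.3", "127.0.0.4");

        // No pending nodes: a plain quorum of 2 out of 3 replicas is enough.
        System.out.println(requiredParticipants(natural, List.of()));            // 2

        // One replacing node is pending, so the requirement is bumped to 3; with
        // the replaced node (127.0.0.4) down, losing the replacing node as well
        // is what shows up as unavailables or timeouts.
        System.out.println(requiredParticipants(natural, List.of("127.0.0.5"))); // 3
    }
}
{code}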

> Pending endpoints size check for CAS doesn't play nicely with 
> writes-on-replacement
> -----------------------------------------------------------------------------------
>
>                 Key: CASSANDRA-13327
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-13327
>             Project: Cassandra
>          Issue Type: Bug
>          Components: Coordination
>            Reporter: Ariel Weisberg
>            Assignee: Ariel Weisberg
>
> Consider this ring:
> 127.0.0.1  MR  UP    JOINING  -7301836195843364181
> 127.0.0.2  MR  UP    NORMAL   -7263405479023135948
> 127.0.0.3  MR  UP    NORMAL   -7205759403792793599
> 127.0.0.4  MR  DOWN  NORMAL   -7148113328562451251
> where 127.0.0.1 was bootstrapping for cluster expansion. Note that, due to 
> the failure of 127.0.0.4, 127.0.0.1 was stuck trying to stream from it and 
> making no progress.
> Then the down node was replaced so we had:
> 127.0.0.1  MR  UP    JOINING  -7301836195843364181
> 127.0.0.2  MR  UP    NORMAL   -7263405479023135948
> 127.0.0.3  MR  UP    NORMAL   -7205759403792793599
> 127.0.0.5  MR  UP    JOINING  -7148113328562451251
> It’s confusing in the ring - the first JOINING is a genuine bootstrap, the 
> second is a replacement. We now had CAS unavailables (but no non-CAS 
> unavailables). I think it’s because the pending endpoints check thinks that 
> 127.0.0.5 is gaining a range when it’s just replacing.
> The workaround is to kill the stuck JOINING node, but Cassandra shouldn’t 
> unnecessarily fail these requests.
> It also appears that required participants is bumped by 1 during a host 
> replacement, so if the replacing host fails you will get unavailables and 
> timeouts.
> This is related to the check added in CASSANDRA-8346



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)
