[ 
https://issues.apache.org/jira/browse/CASSANDRA-14801?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17068746#comment-17068746
 ] 

Aleksandr Sorokoumov commented on CASSANDRA-14801:
--------------------------------------------------

I reproduced the bug using simple [randomized 
test|https://gist.github.com/Gerrrr/f59dc5dedaf6501bb31d79b068244213] that 
creates a cluster of N nodes and adds, moves, and removes nodes from it. I also 
found another bug where {{TokenMetadata#calculatePendingRanges}} fails during 
move-affected replica calculation.

Patch that addresses both problems ([link to 
PR|https://github.com/apache/cassandra/pull/495]):

* 
[a534b2|https://github.com/Gerrrr/cassandra/commit/a534b2be9a653fb0cdda75043c0b79e481ca1701]
 adds tests that reproduce both bugs.
* 
[bb36cd|https://github.com/Gerrrr/cassandra/commit/bb36cd09c164840ad9ab3231f1039c443f8f040c]
 fixes the newly discovered bug ({{test1Leave1Move}} in 
[a534b2|https://github.com/Gerrrr/cassandra/commit/a534b2be9a653fb0cdda75043c0b79e481ca1701]).
 There, the failure happens because for the same pending range we can include 2 
replicas with the same endpoint and different ranges (violation of 
{{PendingRangeMaps Conflict.DUPLICATE}} policy). This happens because right now 
we include in {{PendingRangeMaps}} the entire new replica after leave for 
leave-affected ranges and only the pending part of it for move-affected ranges. 
This commit marks as pending only the missing part of the new replica.
* 
[a33b03|https://github.com/Gerrrr/cassandra/commit/a33b032cd75044db3475505cd32e6afd6f98ad40]
 solves the original issue. Without this commit {{getPendingRanges}} can fail 
if the same replica covers more than 1 pending range. I think that this is a 
valid situation. Consider {{testLeave2}} in 
[a534b2|https://github.com/Gerrrr/cassandra/commit/a534b2be9a653fb0cdda75043c0b79e481ca1701].
 In this case replica {{Full(127.0.0.1:7012,(-9,0])}} covers 2 ranges - 
{{(-9,-4]}} and {{(-4,0]}}. If we run the same test against 3.11 {{(-9, 0]}} is 
represented as {{\{(-9, -4], (-4, 0]\}}} and that possible representation would 
not trigger the bug in 4.0. However, I think that we should not base safety of 
{{getPendingRanges}} execution on the way a particular 
{{AbstractReplicationStrategy}} represents a range and we should allow 
duplicate entries while building pending {{RangesAtEndpoint}}.

With these changes, the randomized test hasn't failed after running for a 
while. As I mentioned, it is a very simplistic test, so any suggestions for 
improvement are welcome. I haven't included it in the PR, as it seems to be a 
good candidate to become flaky. Maybe it is worth adding as a {{long}} test?

> calculatePendingRanges no longer safe for multiple adjacent range movements
> ---------------------------------------------------------------------------
>
>                 Key: CASSANDRA-14801
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-14801
>             Project: Cassandra
>          Issue Type: Bug
>          Components: Legacy/Coordination, Legacy/Distributed Metadata
>            Reporter: Benedict Elliott Smith
>            Assignee: Aleksandr Sorokoumov
>            Priority: Normal
>              Labels: pull-request-available
>             Fix For: 4.0, 4.0-beta
>
>          Time Spent: 10m
>  Remaining Estimate: 0h
>
> Correctness depended upon the narrowing to a {{Set<InetAddressAndport>}}, 
> which we no longer do - we maintain a collection of all {{Replica}}.  Our 
> {{RangesAtEndpoint}} collection built by {{getPendingRanges}} can as a result 
> contain the same endpoint multiple times; and our {{EndpointsForToken}} 
> obtained by {{TokenMetadata.pendingEndpointsFor}} may fail to be constructed, 
> resulting in cluster-wide failures for writes to the affected token ranges 
> for the duration of the range movement.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org

Reply via email to