[ https://issues.apache.org/jira/browse/CASSANDRA-8346?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14234140#comment-14234140 ]
sankalp kohli commented on CASSANDRA-8346:
------------------------------------------

Let me review it.

> Paxos operation can use stale data during multiple range movements
> ------------------------------------------------------------------
>
>                 Key: CASSANDRA-8346
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-8346
>             Project: Cassandra
>          Issue Type: Bug
>            Reporter: Sylvain Lebresne
>            Assignee: Sylvain Lebresne
>             Fix For: 2.0.12
>
>         Attachments: 8346.txt
>
>
> Paxos operations correctly account for pending ranges for all operations
> pertaining to the Paxos state, but those pending ranges are not taken into
> account when reading the data to check for the conditions or during a serial
> read. It's thus possible to break the LWT guarantees by reading a stale
> value. This requires 2 node movements (on the same token range) to be a
> problem, though.
> Basically, we have {{RF}} replicas + {{P}} pending nodes. For the Paxos
> prepare/propose phases, the number of required participants (the "Paxos
> QUORUM") is {{(RF + P + 1) / 2}} ({{SP.getPaxosParticipants}}), but the read
> done to check conditions or for serial reads is done at a "normal" QUORUM (or
> LOCAL_QUORUM), and so a weaker {{(RF + 1) / 2}}. We have a problem if it's
> possible that said read can read only from nodes that were not part of the
> Paxos participants, and so we have a problem if:
> {noformat}
> "normal quorum" == (RF + 1) / 2 <= (RF + P) - ((RF + P + 1) / 2) ==
> "participants considered - blocked for"
> {noformat}
> We're good if {{P = 0}} or {{P = 1}} since this inequality gives us
> respectively {{RF + 1 <= RF - 1}} and {{RF + 1 <= RF}}, both of which are
> impossible. But at {{P = 2}} (2 pending nodes), this inequality is equivalent
> to {{RF <= RF}} and so we might read stale data.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
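
The inequality in the quoted description can be checked numerically. The following is only a sketch of that arithmetic, not Cassandra code: the function name `stale_read_possible` is hypothetical, and it assumes the divisions are exact (as the ticket's algebra does) rather than rounded to whole nodes as real quorum sizes would be.

```python
from fractions import Fraction


def stale_read_possible(rf: int, pending: int) -> bool:
    """Evaluate the ticket's inequality with exact (non-integer) division.

    Hypothetical helper for illustration; it mirrors the formulas quoted in
    the description and is not taken from the Cassandra code base.
    """
    # "normal quorum" used by the condition check / serial read
    normal_quorum = Fraction(rf + 1, 2)
    # "Paxos QUORUM" over replicas plus pending nodes
    paxos_quorum = Fraction(rf + pending + 1, 2)
    # "participants considered - blocked for": nodes Paxos did not wait on
    not_blocked_for = (rf + pending) - paxos_quorum
    # A stale read is possible if the read can be served entirely by nodes
    # that Paxos did not block for.
    return normal_quorum <= not_blocked_for


if __name__ == "__main__":
    for pending in range(3):
        results = [stale_read_possible(rf, pending) for rf in range(1, 6)]
        print(f"P={pending}: {results}")
```

Under that exact-division assumption the result depends only on `P`: the inequality reduces to `P >= 2`, which matches the description's conclusion that one pending node is safe and two are not.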