[ https://issues.apache.org/jira/browse/CASSANDRA-15601?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Sam Tunnicliffe updated CASSANDRA-15601:
----------------------------------------
    Reviewers: Aleksey Yeschenko

> Ensure repaired data tracking reads a consistent amount of data across replicas
> -------------------------------------------------------------------------------
>
>                 Key: CASSANDRA-15601
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-15601
>             Project: Cassandra
>          Issue Type: Bug
>          Components: Consistency/Repair
>            Reporter: Sam Tunnicliffe
>            Assignee: Sam Tunnicliffe
>            Priority: Normal
>             Fix For: 4.0-alpha
>
>
> When generating a digest for repaired data tracking, the amount of repaired 
> data that needs to be read may depend on the unrepaired data on the replica. 
> As this may vary between replicas, digest mismatches can be reported even 
> though the repaired data may actually be in sync.
> For example, consider two replicas, A and B, and a table like:
> {code}
> CREATE TABLE tbl (pk int, ck int, PRIMARY KEY (pk, ck)) WITH CLUSTERING ORDER
> BY (ck DESC);
> Unrepaired
> ===========
> Instance A
> (0, 5)
> Instance B
> (0, 6)
> (0, 5)
> Repaired (Both A & B)
> =========
> (0, 4)
> (0, 3)
> (0, 2)
> (0, 1)
> (0, 0)
> SELECT * FROM tbl WHERE pk = 0 LIMIT 3;
> {code}
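> Instance A would read (0, 5) from its unrepaired set and (0, 4), (0, 3) from
> its repaired set.
> Instance B would read (0, 6), (0, 5) from its unrepaired set and just (0, 4)
> from its repaired set, so the two repaired-data digests differ even though
> the repaired data on A and B is identical.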
> Unrepaired row, range, or partition tombstones that shadow repaired data and
> are present on some replicas but not others have the opposite effect: the
> replicas holding them read more repaired data than the others.
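A minimal Python sketch of the problem, under a deliberately simplified model that is not Cassandra's read path: a single partition (pk = 0) whose rows are identified by ck alone and read in ck DESC order; no timestamps, so an unrepaired tombstone is assumed to shadow the repaired row it matches; only rows the merge actually consumes count as "read" (no read-ahead); and a SHA-256 over the consumed repaired clustering keys stands in for the real repaired-data digest. The names `naive_read`, `digest` and `TOMBSTONE` are hypothetical. It reproduces both cases: extra unrepaired rows on B, and an unrepaired tombstone for (0, 4) on one replica.

```python
import hashlib

TOMBSTONE = "tombstone"  # hypothetical marker for an unrepaired row deletion


def digest(repaired_rows):
    """SHA-256 over the repaired clustering keys a replica consumed."""
    h = hashlib.sha256()
    for ck in repaired_rows:
        h.update(ck.to_bytes(4, "big", signed=True))
    return h.hexdigest()


def naive_read(unrepaired, repaired, limit):
    """Merge one partition's unrepaired and repaired data in ck DESC order,
    stopping as soon as `limit` live rows have been produced.

    unrepaired: dict ck -> "row" or TOMBSTONE; repaired: list of live cks.
    Returns (result rows, repaired rows consumed by the merge)."""
    unrep = sorted(unrepaired.items(), reverse=True)
    rep = sorted(repaired, reverse=True)
    i = j = 0
    result, repaired_read = [], []
    while len(result) < limit and (i < len(unrep) or j < len(rep)):
        u_ck = unrep[i][0] if i < len(unrep) else None
        r_ck = rep[j] if j < len(rep) else None
        if r_ck is None or (u_ck is not None and u_ck > r_ck):
            # Unrepaired data only at this clustering: emit it unless deleted.
            if unrep[i][1] != TOMBSTONE:
                result.append(u_ck)
            i += 1
            continue
        # A repaired row is consumed either way, so it feeds the digest.
        repaired_read.append(r_ck)
        shadowed = u_ck == r_ck and unrep[i][1] == TOMBSTONE
        if u_ck == r_ck:
            i += 1
        if not shadowed:
            result.append(r_ck)
        j += 1
    return result, repaired_read


REPAIRED = [4, 3, 2, 1, 0]  # identical on both replicas

# Extra unrepaired rows on B mean B consumes less repaired data.
a = naive_read({5: "row"}, REPAIRED, 3)
b = naive_read({6: "row", 5: "row"}, REPAIRED, 3)
print(a)  # ([5, 4, 3], [4, 3])
print(b)  # ([6, 5, 4], [4])
print(digest(a[1]) == digest(b[1]))  # False

# An unrepaired tombstone for (0, 4) on one replica has the opposite effect.
c = naive_read({4: TOMBSTONE}, REPAIRED, 3)
d = naive_read({}, REPAIRED, 3)
print(c)  # ([3, 2, 1], [4, 3, 2, 1])
print(d)  # ([4, 3, 2], [4, 3, 2])
```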
> To fix this, when repaired data tracking is in effect, each replica needs to
> overread during a full data read. Replicas should read up to {{LIMIT}} (i.e.
> the {{DataLimit}} of the {{ReadCommand}}) from the repaired set, regardless
> of how much is read from the unrepaired data. Once that amount of repaired
> data has been read, the replica should stop updating the digest. So if
> unrepaired tombstones cause more than {{LIMIT}} repaired data to be read,
> the digest is only calculated over the first {{LIMIT}}-worth of repaired data.
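A minimal Python sketch of the proposed overread, under a simplified model that is not the actual patch: one partition read in ck DESC order, rows identified by ck alone, an unrepaired tombstone assumed to shadow the matching repaired row, and `limit` as a plain row count standing in for the {{DataLimit}}. The hypothetical `RepairedDigestTracker` hashes repaired rows only until `limit` of them have been seen; once the merge has produced its result, `tracked_read` (also hypothetical) keeps consuming repaired rows for the digest alone until `limit` repaired rows have been read.

```python
import hashlib

TOMBSTONE = "tombstone"  # hypothetical marker for an unrepaired row deletion


class RepairedDigestTracker:
    """Hypothetical helper: hashes repaired rows, but only the first `limit`."""

    def __init__(self, limit):
        self.limit = limit
        self.counted = 0
        self._hash = hashlib.sha256()

    def update(self, ck):
        # Stop updating the digest once LIMIT-worth of repaired data is seen.
        if self.counted < self.limit:
            self._hash.update(ck.to_bytes(4, "big", signed=True))
            self.counted += 1

    def hexdigest(self):
        return self._hash.hexdigest()


def tracked_read(unrepaired, repaired, limit):
    """Full data read of one partition (ck DESC) with repaired data tracking.

    unrepaired: dict ck -> "row" or TOMBSTONE; repaired: list of live cks.
    Returns (result rows, repaired-data digest)."""
    unrep = sorted(unrepaired.items(), reverse=True)
    rep = sorted(repaired, reverse=True)
    tracker = RepairedDigestTracker(limit)
    i = j = 0
    result = []
    while len(result) < limit and (i < len(unrep) or j < len(rep)):
        u_ck = unrep[i][0] if i < len(unrep) else None
        r_ck = rep[j] if j < len(rep) else None
        if r_ck is None or (u_ck is not None and u_ck > r_ck):
            # Unrepaired data only at this clustering: emit it unless deleted.
            if unrep[i][1] != TOMBSTONE:
                result.append(u_ck)
            i += 1
            continue
        tracker.update(r_ck)  # repaired row consumed by the merge
        shadowed = u_ck == r_ck and unrep[i][1] == TOMBSTONE
        if u_ck == r_ck:
            i += 1
        if not shadowed:
            result.append(r_ck)
        j += 1
    # Overread: consume further repaired rows, for the digest only, until
    # `limit` repaired rows have been read, however few the merge needed.
    while tracker.counted < limit and j < len(rep):
        tracker.update(rep[j])
        j += 1
    return result, tracker.hexdigest()


REPAIRED = [4, 3, 2, 1, 0]  # identical on every replica
cases = ({5: "row"}, {6: "row", 5: "row"}, {4: TOMBSTONE}, {})
digests = {tracked_read(u, REPAIRED, 3)[1] for u in cases}
print(len(digests))  # 1: every replica hashes repaired rows (0, 4) (0, 3) (0, 2)
```

Counting a shadowed repaired row toward `limit` keeps the digest independent of which unrepaired tombstones a replica happens to hold; the cost is the extra repaired data read on replicas whose unrepaired data would otherwise have satisfied the query on its own.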



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
