[ 
https://issues.apache.org/jira/browse/CASSANDRA-15601?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sam Tunnicliffe updated CASSANDRA-15601:
----------------------------------------
    Test and Documentation Plan: Added new in-jvm-dtests and unit tests
                         Status: Patch Available  (was: In Progress)

A replica will attempt to read a fixed amount of repaired data, based on the 
query's DataLimit, while generating the repaired digest. If less than that 
amount is read in the course of satisfying the query limits, we overread. If 
more than that amount is read during the main read phase (because of unrepaired 
deletions shadowing repaired data), the digest calculation stops when its own 
limit is reached.
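
As a rough illustration of that behaviour, here is a minimal Python sketch. It is not the code in the patch. The names RepairedDigest and repaired_digest are hypothetical, the limit is assumed to be a plain row count, and MD5 merely stands in for whatever digest the replica actually uses.

```python
import hashlib


class RepairedDigest:
    """Digest over at most `limit` repaired rows (hypothetical helper)."""

    def __init__(self, limit):
        self.remaining = limit
        self._hash = hashlib.md5()  # stand-in digest, an assumption

    def add(self, row):
        # Rows arriving after the limit is reached do not affect the digest.
        if self.remaining > 0:
            self._hash.update(repr(row).encode())
            self.remaining -= 1

    def hexdigest(self):
        return self._hash.hexdigest()


def repaired_digest(repaired_rows, rows_read_by_query, limit):
    """Digest the repaired rows for one replica.

    repaired_rows: repaired rows in read order.
    rows_read_by_query: how many of them the main read phase consumed.
    """
    tracker = RepairedDigest(limit)
    # Main read phase: may consume fewer or more rows than the limit.
    for row in repaired_rows[:rows_read_by_query]:
        tracker.add(row)
    # Overread: keep reading repaired data until the limit is reached.
    for row in repaired_rows[rows_read_by_query:]:
        if tracker.remaining == 0:
            break
        tracker.add(row)
    return tracker.hexdigest()


REPAIRED = [(0, 4), (0, 3), (0, 2), (0, 1), (0, 0)]
# The main read consumed 1, 2 or all 5 repaired rows on different replicas.
digests = {repaired_digest(REPAIRED, n, limit=3) for n in (1, 2, 5)}
```

In this sketch the digest always covers the first three repaired rows, however many repaired rows the main read phase happened to consume, so `digests` holds a single value.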

[branch|https://github.com/apache/cassandra/compare/trunk...beobal:15601-trunk],
 [tests|https://circleci.com/workflow-run/96751ea6-7d08-40dd-839c-e409ffebdb81]

> Ensure repaired data tracking reads a consistent amount of data across 
> replicas
> -------------------------------------------------------------------------------
>
>                 Key: CASSANDRA-15601
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-15601
>             Project: Cassandra
>          Issue Type: Bug
>          Components: Consistency/Repair
>            Reporter: Sam Tunnicliffe
>            Assignee: Sam Tunnicliffe
>            Priority: Normal
>             Fix For: 4.0-alpha
>
>
> When generating a digest for repaired data tracking, the amount of repaired 
> data that needs to be read may depend on the unrepaired data on the replica. 
> As this may vary between replicas, digest mismatches can be reported even 
> though the repaired data may actually be in sync.
> For example, two replicas, A & B and a table like
> {code}
> CREATE TABLE tbl (pk int, ck int, PRIMARY KEY (pk, ck)) WITH CLUSTERING ORDER 
> BY (ck DESC); 
> Unrepaired
> ===========
> Instance A
> (0, 5)
> Instance B
> (0, 6)
> (0, 5)
> Repaired (Both A & B)
> =========
> (0, 4)
> (0, 3)
> (0, 2)
> (0, 1)
> (0, 0)
> SELECT * FROM tbl WHERE pk = 0 LIMIT 3;
> {code}
> Instance A would read (0, 5) from the unrepaired set and (0, 4), (0, 3) from 
> the repaired set. 
> Instance B would read (0, 6), (0, 5) from its unrepaired set and just (0, 4) 
> from repaired data, so the two repaired digests cover different rows.
> Unrepaired row/range/partition tombstones shadowing repaired data and present 
> on some replicas but not others will have the opposite effect, with more 
> repaired data being read in comparison.
>  To fix this, when repaired data tracking is in effect each replica needs to 
> overread during a full data read. Replicas should read up to {{LIMIT}} (i.e. 
> the {{DataLimit}} of the {{ReadCommand}}) from the repaired set, regardless 
> of how much is read from the unrepaired data. At the point where that amount 
> of repaired data has been read, the replica should stop updating the digest. So 
> if unrepaired tombstones cause more than {{LIMIT}} repaired data to be read, 
> the digest is only calculated over the first {{LIMIT}}-worth of repaired data.
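
The mismatch in the example can be reproduced with a toy model. The Python sketch below is only an illustration under stated assumptions: rows are (pk, ck) tuples merged in descending ck order, the limit is a row count, MD5 stands in for the real digest, and the helper names digest and naive_repaired_rows are hypothetical.

```python
import hashlib


def digest(rows):
    """Hash a sequence of (pk, ck) rows in order (MD5 is a stand-in)."""
    h = hashlib.md5()
    for pk, ck in rows:
        h.update(f"{pk}:{ck};".encode())
    return h.hexdigest()


def naive_repaired_rows(unrepaired, repaired, limit):
    """Repaired rows touched when the merged read stops after `limit` rows."""
    merged = sorted(
        [(row, False) for row in unrepaired] + [(row, True) for row in repaired],
        key=lambda item: item[0][1],  # order by clustering key ck
        reverse=True,                 # descending, as in the example table
    )
    return [row for row, is_repaired in merged[:limit] if is_repaired]


REPAIRED = [(0, 4), (0, 3), (0, 2), (0, 1), (0, 0)]
UNREPAIRED_A = [(0, 5)]
UNREPAIRED_B = [(0, 6), (0, 5)]

rows_a = naive_repaired_rows(UNREPAIRED_A, REPAIRED, limit=3)
rows_b = naive_repaired_rows(UNREPAIRED_B, REPAIRED, limit=3)
mismatch = digest(rows_a) != digest(rows_b)
```

Here `rows_a` holds (0, 4) and (0, 3) while `rows_b` holds only (0, 4), so `mismatch` is true even though both replicas hold identical repaired data.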



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

