[ https://issues.apache.org/jira/browse/CASSANDRA-15601?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Sam Tunnicliffe updated CASSANDRA-15601:
----------------------------------------
    Reviewers: Aleksey Yeschenko

> Ensure repaired data tracking reads a consistent amount of data across replicas
> -------------------------------------------------------------------------------
>
>                 Key: CASSANDRA-15601
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-15601
>             Project: Cassandra
>          Issue Type: Bug
>          Components: Consistency/Repair
>            Reporter: Sam Tunnicliffe
>            Assignee: Sam Tunnicliffe
>            Priority: Normal
>             Fix For: 4.0-alpha
>
>
> When generating a digest for repaired data tracking, the amount of repaired
> data that needs to be read may depend on the unrepaired data on the replica.
> As this may vary between replicas, digest mismatches can be reported even
> though the repaired data is actually in sync.
> For example, take two replicas, A and B, and a table like:
> {code}
> CREATE TABLE t (pk int, ck int, PRIMARY KEY (pk, ck)) WITH CLUSTERING ORDER BY (ck DESC);
>
> Unrepaired
> ===========
> Instance A
> (0, 5)
> Instance B
> (0, 6)
> (0, 5)
>
> Repaired (Both A & B)
> =========
> (0, 4)
> (0, 3)
> (0, 2)
> (0, 1)
> (0, 0)
>
> SELECT * FROM t WHERE pk = 0 LIMIT 3;
> {code}
> Instance A would read (0, 5) from the unrepaired set and (0, 4) (0, 3) from
> the repaired set.
> Instance B would read (0, 6) (0, 5) from its unrepaired set and just (0, 4)
> from repaired data.
> Unrepaired row, range, or partition tombstones that shadow repaired data and
> are present on some replicas but not others have the opposite effect: those
> replicas read more repaired data than the others.
> To fix this, when repaired data tracking is in effect, each replica needs to
> overread during a full data read. Replicas should read up to {{LIMIT}} (i.e.
> the {{DataLimit}} of the {{ReadCommand}}) from the repaired set, regardless
> of how much is read from the unrepaired data. At the point where that amount
> of repaired data has been read, the replica should stop updating the digest.
> So if unrepaired tombstones cause more than {{LIMIT}} repaired data to be read,
> the digest is only calculated over the first {{LIMIT}}-worth of repaired data.
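
The effect described in the ticket can be reproduced with a small simulation. This is only an illustrative sketch, not Cassandra's implementation: the names `read_partition`, `repaired_digest` and `overread` are hypothetical, unrepaired tombstones are modelled as a plain set of shadowed rows, and SHA-256 stands in for whatever digest the real read path uses.

```python
import hashlib
import heapq


def repaired_digest(rows):
    """Hash the repaired rows that were tracked during a read."""
    h = hashlib.sha256()
    for row in rows:
        h.update(repr(row).encode())
    return h.hexdigest()


def read_partition(unrepaired, repaired, limit, tombstones=(), overread=False):
    """Simulate one replica's read; returns (result_rows, digest).

    Both inputs must be sorted in descending clustering order.
    `tombstones` is a set of rows shadowed by unrepaired tombstones.
    With overread=True the replica keeps reading until `limit` repaired
    rows have been seen, and the digest covers only those first `limit`.
    """
    merged = heapq.merge(
        ((row, False) for row in unrepaired),
        ((row, True) for row in repaired),
        key=lambda tagged: tagged[0],
        reverse=True,
    )
    result = []   # live rows returned to the coordinator
    tracked = []  # repaired rows fed into the digest
    for row, is_repaired in merged:
        result_full = len(result) >= limit
        if overread:
            if result_full and len(tracked) >= limit:
                break
        elif result_full:
            break
        if is_repaired and (not overread or len(tracked) < limit):
            tracked.append(row)
        if not result_full and row not in tombstones:
            result.append(row)
    return result, repaired_digest(tracked)


REPAIRED = [(0, 4), (0, 3), (0, 2), (0, 1), (0, 0)]
UNREPAIRED_A = [(0, 5)]
UNREPAIRED_B = [(0, 6), (0, 5)]

# Without overreading, A tracks (0, 4) (0, 3) and B tracks only (0, 4).
_, digest_a = read_partition(UNREPAIRED_A, REPAIRED, limit=3)
_, digest_b = read_partition(UNREPAIRED_B, REPAIRED, limit=3)
print("digests match without overread:", digest_a == digest_b)

# With overreading, both track exactly the first 3 repaired rows.
_, fixed_a = read_partition(UNREPAIRED_A, REPAIRED, limit=3, overread=True)
_, fixed_b = read_partition(UNREPAIRED_B, REPAIRED, limit=3, overread=True)
print("digests match with overread:", fixed_a == fixed_b)
```

The same function covers the tombstone case: passing `tombstones={(0, 4)}` for one replica makes it consume four repaired rows without overreading, while with `overread=True` the digest is still computed over the first three only.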