[ https://issues.apache.org/jira/browse/CASSANDRA-16710?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Brandon Williams updated CASSANDRA-16710:
-----------------------------------------
    Severity: Critical  (was: Normal)

Raising priority on this as we should either decide to restore the older behavior or document the new one.

> Read repairs can break row isolation
> ------------------------------------
>
>                 Key: CASSANDRA-16710
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-16710
>             Project: Cassandra
>          Issue Type: Bug
>          Components: Consistency/Coordination
>            Reporter: Samuel Klock
>            Priority: Urgent
>             Fix For: 3.0.x, 3.11.x, 4.0.x
>
>
> This issue essentially revives CASSANDRA-8287, which was resolved "Later" in
> 2015. While it was possible in principle at that time for read repair to
> break row isolation, that couldn't happen in practice because Cassandra
> always pulled all of the columns for each row in response to regular reads,
> so read repairs would never partially resolve a row. CASSANDRA-10657
> modified Cassandra to pull only the requested columns for reads, which
> enabled read repair to break row isolation in practice.
> Note also that this is distinct from CASSANDRA-14593 (read repair breaking
> partition-level isolation): that issue (as we understand it) covers
> isolation being broken across multiple rows within an update to a
> partition, while this issue covers isolation being broken across multiple
> columns within an update to a single row.
> This behavior is easy to reproduce under affected versions using {{ccm}}:
> {code:bash}
> ccm create -n 3 -v $VERSION rrtest
> ccm updateconf -y 'hinted_handoff_enabled: false
> max_hint_window_in_ms: 0'
> ccm start
> (cat <<EOF
> CREATE KEYSPACE IF NOT EXISTS rrtest WITH REPLICATION = {'class': 'SimpleStrategy', 'replication_factor': '3'};
> CREATE TABLE IF NOT EXISTS rrtest.kv (key TEXT PRIMARY KEY, col1 TEXT, col2 INT);
> CONSISTENCY ALL;
> INSERT INTO rrtest.kv (key, col1, col2) VALUES ('key', 'a', 1);
> EOF
> ) | ccm node1 cqlsh
> ccm node3 stop
> (cat <<EOF
> CONSISTENCY QUORUM;
> INSERT INTO rrtest.kv (key, col1, col2) VALUES ('key', 'b', 2);
> EOF
> ) | ccm node1 cqlsh
> ccm node3 start
> ccm node2 stop
> (cat <<EOF
> CONSISTENCY QUORUM;
> SELECT key, col1 FROM rrtest.kv WHERE key = 'key';
> EOF
> ) | ccm node1 cqlsh
> ccm node1 stop
> (cat <<EOF
> CONSISTENCY ONE;
> SELECT * FROM rrtest.kv WHERE key = 'key';
> EOF
> ) | ccm node3 cqlsh
> {code}
> This snippet creates a three-node cluster with an RF=3 keyspace containing a
> table with three columns: a partition key and two value columns. (Hinted
> handoff can mask the problem if the repro steps are executed in quick
> succession, so the snippet disables it for this exercise.) Then:
> # It adds a full row to the table with values ('a', 1), ensuring it's
> replicated to all three nodes.
> # It stops a node, then replaces the initial row with new values ('b', 2) in
> a single update, ensuring that it's replicated to both available nodes.
> # It starts the node that was down, then stops one of the other nodes and
> performs a quorum read of just the text column ({{col1}}). The read observes
> 'b'.
> # Finally, it stops the other node that observed the second update, then
> performs a CL=ONE read of the entire row on the node that was down for that
> update.
> If read repair respects row isolation, then the final read should observe
> ('b', 2).
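The failure mode in the repro above can be modeled without a cluster. The sketch below is plain Python, not Cassandra code: replicas are modeled as {column: (value, write_timestamp)} maps, reconciliation is latest-timestamp-wins per column, and read repair (mirroring the post-CASSANDRA-10657 behavior) only touches the columns the client requested. All names are hypothetical.

```python
# Toy model (not Cassandra code) of per-column, timestamp-based
# reconciliation with read repair limited to the requested columns.

def newest(cells):
    """Resolve one column across replicas: latest write timestamp wins."""
    return max(cells, key=lambda c: c[1])

def quorum_read_with_repair(replicas, columns):
    """Read `columns` from `replicas`, repairing stale replicas.

    Only the *requested* columns are read, so only those columns can be
    repaired -- the behavior this issue is about.
    """
    result = {}
    for col in columns:
        winner = newest([r[col] for r in replicas])
        result[col] = winner[0]
        for r in replicas:
            if r[col][1] < winner[1]:
                r[col] = winner  # read repair: requested column only
    return result

# t=1: full row ('a', 1) replicated to all three nodes.
node1 = {"col1": ("a", 1), "col2": (1, 1)}
node2 = {"col1": ("a", 1), "col2": (1, 1)}
node3 = {"col1": ("a", 1), "col2": (1, 1)}

# t=2: node3 is down; the replacement row ('b', 2) lands on node1/node2.
for r in (node1, node2):
    r["col1"] = ("b", 2)
    r["col2"] = (2, 2)

# node3 back up, node2 down: quorum read of col1 only (node1 + node3).
row = quorum_read_with_repair([node1, node3], ["col1"])
assert row == {"col1": "b"}

# node1 down: a CL=ONE read of the whole row from node3 now observes a
# row that was never written: col1 from the second update, col2 from the
# first.
final = {col: node3[col][0] for col in ("col1", "col2")}
print(final)  # {'col1': 'b', 'col2': 1}
```

Under this model the stale replica ends up with the mixed row ('b', 1), matching the observed behavior on 3.11.10 and 4.0-rc1.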
> (('a', 1) would also be acceptable if we're willing to sacrifice
> monotonicity.)
> * With {{VERSION=3.0.24}}, the final read observes ('b', 2), as expected.
> * With {{VERSION=3.11.10}} and {{VERSION=4.0-rc1}}, the final read instead
> observes ('b', 1). The same is true for 3.0.24 if CASSANDRA-10657 is
> backported to it.
> The scenario above is somewhat contrived in that it supposes multiple read
> workflows consulting different sets of columns at different consistency
> levels. Under 3.11, however, asynchronous read repair makes this scenario
> possible even using just CL=ONE -- and, with speculative retry, even if
> {{read_repair_chance}} and {{dclocal_read_repair_chance}} are both zeroed.
> We haven't looked closely at 4.0, but even though (as we understand it) it
> lacks async read repair, scenarios like CL=ONE writes or failed,
> partially-committed CL>ONE writes create some surface area for this
> behavior, even without mixed consistency/column reads.
> Given the importance of paging to reads from wide partitions, it makes some
> intuitive sense that applications shouldn't rely on isolation at the
> partition level. Being unable to rely on row isolation is much more
> surprising, especially given that (modulo the possibility of other
> atomicity bugs) Cassandra did preserve it before 3.11. Cassandra should
> either fix this in code (e.g., when performing a read repair, always
> operate over all of the columns of the table, regardless of which were
> originally requested by the read) or at least update its documentation to
> include appropriate caveats about update isolation.

--
This message was sent by Atlassian Jira
(v8.3.4#803005)
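The code-side remedy the issue suggests (repairing over all of the table's columns, regardless of which were requested) can be sketched in the same toy replica model used above. Again, this is illustrative Python with hypothetical names, not a proposal for the actual Cassandra implementation.

```python
# Toy sketch (not Cassandra code) of the suggested fix: when read repair
# fires, reconcile *all* of the table's columns, not just the columns the
# client requested, then answer with the requested subset.

def newest(cells):
    return max(cells, key=lambda c: c[1])  # latest write timestamp wins

def read_with_full_row_repair(replicas, requested):
    all_columns = set().union(*(r.keys() for r in replicas))
    for col in all_columns:                # repair over every column...
        winner = newest([r[col] for r in replicas])
        for r in replicas:
            if r[col][1] < winner[1]:
                r[col] = winner
    # ...but return only what the client asked for.
    return {c: newest([r[c] for r in replicas])[0] for c in requested}

fresh = {"col1": ("b", 2), "col2": (2, 2)}   # saw the second update
stale = {"col1": ("a", 1), "col2": (1, 1)}   # was down for it

row = read_with_full_row_repair([fresh, stale], ["col1"])
assert row == {"col1": "b"}

# The stale replica now holds the complete second update, so a later
# CL=ONE read of the whole row there observes ('b', 2): row isolation
# is preserved.
print(stale)  # {'col1': ('b', 2), 'col2': (2, 2)}
```

The trade-off, which the issue leaves open, is that every repair then reads and ships full rows, which is more expensive for wide rows than repairing only the requested columns.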