[jira] [Comment Edited] (CASSANDRA-19018) An SAI-specific mechanism to ensure consistency isn't violated for multi-column (i.e. AND) queries at CL > ONE

Caleb Rackliffe (Jira) Tue, 06 Feb 2024 08:49:09 -0800


    [ https://issues.apache.org/jira/browse/CASSANDRA-19018?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17814134#comment-17814134 ]


Caleb Rackliffe edited comment on CASSANDRA-19018 at 2/6/24 4:48 PM:
---------------------------------------------------------------------

To make the "short read" problem above more concrete, here's a pretty easy 
repro:

{noformat}
@Test
public void testPartialUpdatesWithShortRead()
{
    CLUSTER.schemaChange(withKeyspace("CREATE TABLE %s.partial_updates (k int PRIMARY KEY, a int, b int) WITH read_repair = 'NONE'"));
    CLUSTER.schemaChange(withKeyspace("CREATE INDEX ON %s.partial_updates(a) USING 'sai'"));
    CLUSTER.schemaChange(withKeyspace("CREATE INDEX ON %s.partial_updates(b) USING 'sai'"));
    SAIUtil.waitForIndexQueryable(CLUSTER, KEYSPACE);

    // insert a split row
    CLUSTER.get(1).executeInternal(withKeyspace("INSERT INTO %s.partial_updates(k, a) VALUES (0, 1) USING TIMESTAMP 1"));
    CLUSTER.get(2).executeInternal(withKeyspace("INSERT INTO %s.partial_updates(k, b) VALUES (0, 2) USING TIMESTAMP 2"));

    // insert a split row that only matches on non-strict filtering but is kicked out by RFP
    CLUSTER.get(1).executeInternal(withKeyspace("INSERT INTO %s.partial_updates(k, a, b) VALUES (1, 4, 2) USING TIMESTAMP 3"));
    CLUSTER.get(2).executeInternal(withKeyspace("INSERT INTO %s.partial_updates(k, a, b) VALUES (1, 1, 4) USING TIMESTAMP 4"));

    String select = withKeyspace("SELECT * FROM %s.partial_updates WHERE a = 1 AND b = 2");
    Iterator<Object[]> initialRows = CLUSTER.coordinator(1).executeWithPaging(select, ConsistencyLevel.ALL, 1);
    assertRows(initialRows, row(0, 1, 2));
}
{noformat}

This fails the same way if you bump the page size to 2 and set a {{LIMIT 1}}.
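
To spell out why the first page comes back empty, here is a minimal, self-contained sketch of the same data in plain Java. It is a toy model, not Cassandra code: the {{Row}} class, the helper names, and the "match any restriction" reading of non-strict filtering are all assumptions made for illustration, as is the ordering that puts k=1 ahead of k=0 on each replica.

```java
import java.util.ArrayList;
import java.util.List;

// Toy model only: none of these types or methods are Cassandra classes.
class ShortReadSketch
{
    static final long NOT_WRITTEN = Long.MIN_VALUE;

    // One replica's view of a row; a null value means the column was never written there.
    static final class Row
    {
        final int k;
        final Integer a; final long aTs;
        final Integer b; final long bTs;

        Row(int k, Integer a, long aTs, Integer b, long bTs)
        {
            this.k = k; this.a = a; this.aTs = aTs; this.b = b; this.bTs = bTs;
        }
    }

    static boolean eq(Integer value, int expected) { return value != null && value == expected; }

    // Assumed "non-strict" replica filter: keep rows matching ANY restriction of a = 1 AND b = 2.
    static boolean matchesAny(Row r) { return eq(r.a, 1) || eq(r.b, 2); }

    // Strict filter the coordinator applies after reconciling replica responses.
    static boolean matchesAll(Row r) { return eq(r.a, 1) && eq(r.b, 2); }

    // First page of a replica's non-strict matches, in the replica's (assumed) row order.
    static List<Row> firstPage(List<Row> replica, int pageSize)
    {
        List<Row> page = new ArrayList<>();
        for (Row r : replica)
        {
            if (page.size() == pageSize) break;
            if (matchesAny(r)) page.add(r);
        }
        return page;
    }

    // Reconcile two versions of the same key: per column, the newest timestamp wins.
    static Row merge(Row x, Row y)
    {
        boolean aFromX = x.aTs >= y.aTs;
        boolean bFromX = x.bTs >= y.bTs;
        return new Row(x.k, aFromX ? x.a : y.a, aFromX ? x.aTs : y.aTs,
                            bFromX ? x.b : y.b, bFromX ? x.bTs : y.bTs);
    }

    // Same writes as the repro above; k=1 is assumed to sort before k=0.
    static List<Row> node1()
    {
        return List.of(new Row(1, 4, 3L, 2, 3L), new Row(0, 1, 1L, null, NOT_WRITTEN));
    }

    static List<Row> node2()
    {
        return List.of(new Row(1, 1, 4L, 4, 4L), new Row(0, null, NOT_WRITTEN, 2, 2L));
    }

    public static void main(String[] args)
    {
        List<Row> page1 = firstPage(node1(), 1);
        List<Row> page2 = firstPage(node2(), 1);
        Row merged = merge(page1.get(0), page2.get(0));

        boolean survives = matchesAll(merged);
        // Both replicas filled their page, yet nothing survived: more data must be fetched.
        boolean shortRead = !survives && page1.size() == 1 && page2.size() == 1;
        System.out.println("k=" + merged.k + " survives strict filtering: " + survives);
        System.out.println("coordinator must fetch more rows: " + shortRead);

        Row next = merge(node1().get(1), node2().get(1));
        System.out.println("k=" + next.k + " survives strict filtering: " + matchesAll(next));
    }
}
```

In this model both replicas spend their single-row page on k=1, which reconciles to a row that fails the strict filter. The matching row k=0 is only reachable if the coordinator notices the short read and asks for more.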


was (Author: maedhroz):
To make the "short read" problem above more concrete, here's a pretty easy 
repro:

{noformat}
@Test
public void testPartialUpdatesWithShortRead()
{
    CLUSTER.schemaChange(withKeyspace("CREATE TABLE %s.partial_updates (k int PRIMARY KEY, a int, b int) WITH read_repair = 'NONE'"));
    CLUSTER.schemaChange(withKeyspace("CREATE INDEX ON %s.partial_updates(a) USING 'sai'"));
    CLUSTER.schemaChange(withKeyspace("CREATE INDEX ON %s.partial_updates(b) USING 'sai'"));
    SAIUtil.waitForIndexQueryable(CLUSTER, KEYSPACE);

    // insert a split row
    CLUSTER.get(1).executeInternal(withKeyspace("INSERT INTO %s.partial_updates(k, a) VALUES (0, 1) USING TIMESTAMP 1"));
    CLUSTER.get(2).executeInternal(withKeyspace("INSERT INTO %s.partial_updates(k, b) VALUES (0, 2) USING TIMESTAMP 2"));

    // insert a split row that only matches on non-strict filtering but is kicked out by RFP
    CLUSTER.get(1).executeInternal(withKeyspace("INSERT INTO %s.partial_updates(k, a, b) VALUES (1, 4, 2) USING TIMESTAMP 3"));
    CLUSTER.get(2).executeInternal(withKeyspace("INSERT INTO %s.partial_updates(k, a, b) VALUES (1, 1, 4) USING TIMESTAMP 4"));

    String select = withKeyspace("SELECT * FROM %s.partial_updates WHERE a = 1 AND b = 2");
    Iterator<Object[]> initialRows = CLUSTER.coordinator(1).executeWithPaging(select, ConsistencyLevel.ALL, 1);
    assertRows(initialRows, row(0, 1, 2));
}
{noformat}

This fails the same way if you bump the page size to 2 and set a {{LIMIT 1}}.

[~adelapena] Something like 
https://github.com/apache/cassandra/pull/3044/commits/8d701c4f67bd7670fa1968fe4540259e4145000e#diff-d8fb4ccd5cc47d7e0043f473bee725932acd5cc9a9ac2721eb96dabcecca3515R272
 seems to fix this, but I'm not sure if it's correct overall.

> An SAI-specific mechanism to ensure consistency isn't violated for 
> multi-column (i.e. AND) queries at CL > ONE
> --------------------------------------------------------------------------------------------------------------
>
>                 Key: CASSANDRA-19018
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-19018
>             Project: Cassandra
>          Issue Type: Bug
>          Components: Consistency/Coordination, Feature/SAI
>            Reporter: Caleb Rackliffe
>            Assignee: Caleb Rackliffe
>            Priority: Normal
>             Fix For: 5.0-rc, 5.x
>
>         Attachments: ci_summary-1.html, ci_summary.html, 
> result_details.tar-1.gz, result_details.tar.gz
>
>          Time Spent: 8h 50m
>  Remaining Estimate: 0h
>
> CASSANDRA-19007 is going to be where we add a guardrail around 
> filtering/index queries that use intersection/AND over partially updated 
> non-key columns. (For example, restricting one clustering column and one 
> normal column does not cause a consistency problem, as primary keys cannot be 
> partially updated.) This issue exists to attempt to fix this specifically for 
> SAI in 5.0.x, as Accord will (last I checked) not be available until the 5.1 
> release.
> The SAI-specific version of the originally reported issue is this:
> {noformat}
> try (Cluster cluster = init(Cluster.build(2).withConfig(config -> config.with(GOSSIP).with(NETWORK)).start()))
> {
>     cluster.schemaChange(withKeyspace("CREATE TABLE %s.t (k int PRIMARY KEY, a int, b int)"));
>     cluster.schemaChange(withKeyspace("CREATE INDEX ON %s.t(a) USING 'sai'"));
>     cluster.schemaChange(withKeyspace("CREATE INDEX ON %s.t(b) USING 'sai'"));
>     // insert a split row
>     cluster.get(1).executeInternal(withKeyspace("INSERT INTO %s.t(k, a) VALUES (0, 1)"));
>     cluster.get(2).executeInternal(withKeyspace("INSERT INTO %s.t(k, b) VALUES (0, 2)"));
>     // Uncomment this line and test succeeds w/ partial writes completed...
>     //cluster.get(1).nodetoolResult("repair", KEYSPACE).asserts().success();
>     String select = withKeyspace("SELECT * FROM %s.t WHERE a = 1 AND b = 2");
>     Object[][] initialRows = cluster.coordinator(1).execute(select, ConsistencyLevel.ALL);
>     assertRows(initialRows, row(0, 1, 2)); // not found!!
> }
> {noformat}
> To make a long story short, the local SAI indexes are hiding local partial 
> matches from the coordinator that would combine there to form full matches. 
> Simple non-index filtering queries also suffer from this problem, but they 
> hide the partial matches in a different way. I'll outline a possible solution 
> for this in the comments that takes advantage of replica filtering protection 
> and the repaired/unrepaired datasets...and attempts to minimize the amount of 
> extra row data sent to the coordinator.
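
The "hidden partial match" effect described above can be illustrated with a small, self-contained sketch. This is a toy model under stated assumptions, not Cassandra code: the {{Cell}} class and the {{matchesAll}} and {{merge}} helpers are hypothetical names, and reconciliation is reduced to "newest timestamp wins per column".

```java
import java.util.HashMap;
import java.util.Map;

// Toy model only: none of these types or methods are Cassandra classes.
class PartialMatchSketch
{
    // A column value plus its write timestamp.
    static final class Cell
    {
        final int value;
        final long timestamp;

        Cell(int value, long timestamp)
        {
            this.value = value;
            this.timestamp = timestamp;
        }
    }

    // True only when every restricted column is present with the expected value.
    static boolean matchesAll(Map<String, Cell> row, Map<String, Integer> restrictions)
    {
        for (Map.Entry<String, Integer> restriction : restrictions.entrySet())
        {
            Cell cell = row.get(restriction.getKey());
            if (cell == null || cell.value != restriction.getValue())
                return false;
        }
        return true;
    }

    // Coordinator-side reconciliation: per column, keep the cell with the newest timestamp.
    static Map<String, Cell> merge(Map<String, Cell> left, Map<String, Cell> right)
    {
        Map<String, Cell> merged = new HashMap<>(left);
        right.forEach((column, cell) ->
            merged.merge(column, cell, (x, y) -> x.timestamp >= y.timestamp ? x : y));
        return merged;
    }

    public static void main(String[] args)
    {
        // WHERE a = 1 AND b = 2
        Map<String, Integer> restrictions = Map.of("a", 1, "b", 2);

        // The split row for k = 0: each node saw only one of the two writes.
        Map<String, Cell> onNode1 = Map.of("a", new Cell(1, 1L));
        Map<String, Cell> onNode2 = Map.of("b", new Cell(2, 2L));

        // Each replica evaluates the full intersection locally and returns nothing...
        System.out.println("node1 returns row: " + matchesAll(onNode1, restrictions));
        System.out.println("node2 returns row: " + matchesAll(onNode2, restrictions));

        // ...although the reconciled row would have satisfied the query.
        System.out.println("merged row matches: " + matchesAll(merge(onNode1, onNode2), restrictions));
    }
}
```

Because each replica applies the whole intersection to its local data, the coordinator never receives the two halves it would need to reconcile, which is why the query misses the row even at {{ConsistencyLevel.ALL}}.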



