[jira] [Commented] (CASSANDRA-8505) Invalid results are returned while secondary index are being build

Benjamin Lerer (JIRA) Thu, 29 Oct 2015 09:56:47 -0700

    [ 
https://issues.apache.org/jira/browse/CASSANDRA-8505?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14980790#comment-14980790
 ]


Benjamin Lerer commented on CASSANDRA-8505:
-------------------------------------------

Secondary indexes and their built/not-built status are node-local. Consequently, 
a coordinator node cannot know whether an index is fully built: it can be built 
on the coordinator but still building on other nodes. Furthermore, an index 
rebuild can be triggered at any time.
Therefore the only moment at which we can check whether the index is ready is at 
query execution time.
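The following minimal sketch shows why the readiness check can only happen when the query executes. Each node keeps its own index build status, and a replica consults only that local status when it executes an index read. All names ({{LocalIndexStatus}}, {{IndexNotReadyException}}, {{executeIndexRead}}) are hypothetical and are not Cassandra's actual classes or APIs.

```java
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

/** Hypothetical error a replica raises when a queried index is still being built. */
class IndexNotReadyException extends RuntimeException {
    IndexNotReadyException(String indexName) {
        super("Index " + indexName + " is not ready yet");
    }
}

/** Hypothetical node-local registry: each node only knows the status of its own indexes. */
class LocalIndexStatus {
    private final Set<String> builtIndexes = ConcurrentHashMap.newKeySet();

    void markBuilt(String indexName) {
        builtIndexes.add(indexName);
    }

    // A rebuild can be triggered at any time, so an index can go back to "not built".
    void markRebuilding(String indexName) {
        builtIndexes.remove(indexName);
    }

    boolean isBuilt(String indexName) {
        return builtIndexes.contains(indexName);
    }

    /** Runs on the replica at execution time, the only point where its local status is known. */
    String executeIndexRead(String indexName) {
        if (!isBuilt(indexName))
            throw new IndexNotReadyException(indexName);
        return "rows read through " + indexName; // placeholder for the actual index lookup
    }
}
```

In this sketch, a coordinator whose own copy reports the index as built learns nothing about a replica that is still building or has just started a rebuild. Only the replica's own check at execution time can reject the read.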

The first problem with rejecting index queries at execution time is that some 
{{ALLOW FILTERING}} queries that could have been processed without an index 
will be rejected. As the {{ALLOW FILTERING}} information is not passed with the 
command, we have no way to know whether the query could instead be executed 
using filtering. On the other hand, if an index currently exists but is not 
built, Cassandra might silently return wrong results. Consequently, rejecting 
the query is still an improvement, in my opinion, and we can create a new ticket 
to improve the situation in the future.
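The trade-off can be sketched as follows, assuming a simplified replica that holds rows {{\{a, b, c\}}} from the test table. The {{allowFiltering}} field is purely hypothetical. As explained above, the real command does not carry the {{ALLOW FILTERING}} information, which is why the patch can only reject. The class names are illustrative, not Cassandra's.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Predicate;

/** Hypothetical replica-side command; a row is modelled as {a, b, c}. */
final class SketchReadCommand {
    final boolean indexBuilt;     // node-local status, known only at execution time
    final boolean allowFiltering; // hypothetical: the real command does not carry this
    final Predicate<int[]> predicate;

    SketchReadCommand(boolean indexBuilt, boolean allowFiltering, Predicate<int[]> predicate) {
        this.indexBuilt = indexBuilt;
        this.allowFiltering = allowFiltering;
        this.predicate = predicate;
    }
}

final class SketchReplica {
    private final List<int[]> rows;

    SketchReplica(List<int[]> rows) {
        this.rows = rows;
    }

    List<int[]> execute(SketchReadCommand cmd) {
        if (!cmd.indexBuilt && !cmd.allowFiltering) {
            // Filtering is not known to be allowed, so rejecting is the only safe choice:
            // answering from a partially built index could silently drop rows.
            throw new IllegalStateException("index not built, query rejected");
        }
        // Both remaining paths must return exactly the matching rows. For brevity this
        // sketch scans all rows in both cases instead of doing a real index lookup.
        List<int[]> result = new ArrayList<>();
        for (int[] row : rows)
            if (cmd.predicate.test(row))
                result.add(row);
        return result;
    }
}
```

With the real command, {{allowFiltering}} is effectively always {{false}}. Every query that hits an unbuilt index therefore takes the rejection branch, including queries that a filtering fallback could have answered correctly.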

The second problem is communicating the error back to the coordinator 
node. CASSANDRA-7886 added a mechanism for that, but it is not perfect. The user 
will receive a {{ReadFailureException}} but will have to look in the logs 
to find the root cause of the problem. Ideally, this mechanism should be 
improved so that the error message can be passed to the {{ReadFailureException}}. 
The other problem with the mechanism is that it is only available since {{2.2}}, 
so I could not create a patch for {{2.1}}.
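The following minimal sketch shows the improvement suggested above. Each replica's failure response optionally carries a reason, and the coordinator folds those reasons into the exception it raises. {{ReplicaResponse}}, {{SketchReadFailure}} and {{SketchCoordinator}} are hypothetical stand-ins. They do not reproduce the actual CASSANDRA-7886 classes or the real {{ReadFailureException}} signature.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

/** Hypothetical replica reply: a failure may or may not carry a human-readable reason. */
final class ReplicaResponse {
    final boolean failed;
    final String reason; // null when the failure carries no message

    private ReplicaResponse(boolean failed, String reason) {
        this.failed = failed;
        this.reason = reason;
    }

    static ReplicaResponse success() {
        return new ReplicaResponse(false, null);
    }

    static ReplicaResponse failure(String reason) {
        return new ReplicaResponse(true, reason);
    }
}

/** Hypothetical stand-in for a read-failure exception that also reports replica reasons. */
final class SketchReadFailure extends RuntimeException {
    final int received;
    final int failures;
    final List<String> reasons;

    SketchReadFailure(int received, int failures, List<String> reasons) {
        super("Read failed: " + received + " responses, " + failures + " failures"
              + (reasons.isEmpty() ? " (see replica logs)" : ", reasons: " + reasons));
        this.received = received;
        this.failures = failures;
        this.reasons = Collections.unmodifiableList(new ArrayList<>(reasons));
    }
}

final class SketchCoordinator {
    /** Throws if fewer than blockFor replicas answered successfully. */
    static void checkResponses(List<ReplicaResponse> responses, int blockFor) {
        int received = 0;
        int failures = 0;
        List<String> reasons = new ArrayList<>();
        for (ReplicaResponse response : responses) {
            if (response.failed) {
                failures++;
                if (response.reason != null)
                    reasons.add(response.reason);
            } else {
                received++;
            }
        }
        if (received < blockFor)
            throw new SketchReadFailure(received, failures, reasons);
    }
}
```

With a {{null}} reason, the sketch degrades to the situation described above: the caller sees only failure counts and has to consult the replica logs.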

The patch for {{2.2}} is 
[here|https://github.com/apache/cassandra/compare/trunk...blerer:8505-2.2] and 
the patch for {{3.0}} is 
[here|https://github.com/apache/cassandra/compare/trunk...blerer:8505-3.0].

Both patches keep the index state in memory and throw an exception if the index 
is not ready when a request arrives.
The patches also skip building an index if the base table is empty. 
This optimisation prevents many of the existing index tests from failing.
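The behaviour described for both patches (in-memory index state, rejection while the index is not ready, and skipping the build for an empty base table) could look roughly like the following sketch. {{SketchIndexManager}} and its methods are hypothetical. The real patches may be structured quite differently, and a real build would run asynchronously rather than inline.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

/** Hypothetical in-memory index state tracker, loosely following the patch description. */
final class SketchIndexManager {
    enum State { BUILDING, READY }

    private final Map<String, State> states = new ConcurrentHashMap<>();

    /** Registers an index and builds it, skipping the build when the base table is empty. */
    void createIndex(String indexName, long baseTableRowCount, Runnable build) {
        if (baseTableRowCount == 0) {
            // Nothing to index: mark the index ready straight away.
            states.put(indexName, State.READY);
            return;
        }
        states.put(indexName, State.BUILDING);
        build.run(); // inline for the sketch; a real build would run asynchronously
        states.put(indexName, State.READY);
    }

    State state(String indexName) {
        return states.get(indexName);
    }

    /** Called when a request arrives: rejects it unless the index is ready on this node. */
    void checkReady(String indexName) {
        if (states.get(indexName) != State.READY)
            throw new IllegalStateException("Index " + indexName + " is not ready");
    }
}
```

The empty-table shortcut matters for tests that create an index on a freshly created, empty table and query it right away. The index is {{READY}} immediately, so such queries are presumably not rejected.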

* The unit test results for {{2.2}} are [here|http://cassci.datastax.com/view/Dev/view/blerer/job/blerer-8505-2.2-testall/3/]
* The dtest results for {{2.2}} are [here|http://cassci.datastax.com/view/Dev/view/blerer/job/blerer-8505-2.2-dtest/3/]
* The unit test results for {{3.0}} are [here|http://cassci.datastax.com/view/Dev/view/blerer/job/blerer-8505-3.0-testall/1/]
* The dtest results for {{3.0}} are [here|http://cassci.datastax.com/view/Dev/view/blerer/job/blerer-8505-3.0-dtest/1/]

The {{secondary_indexes_test.TestSecondaryIndexesOnCollections.test_map_indexes}} 
dtest fails in {{2.2}} because it does not wait for the index to be built 
before querying it. I will provide a patch for the dtest.
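The usual fix for such a test is to poll until the index reports itself as built before issuing the query. The dtest itself is written in Python. The sketch below shows the same polling pattern in Java, using a hypothetical {{WaitUtil.waitUntil}} helper. What the condition actually checks (for example, some node-local built-index information) is left as an assumption.

```java
import java.util.concurrent.TimeUnit;
import java.util.function.BooleanSupplier;

/** Hypothetical test helper: polls a condition until it holds or a timeout elapses. */
final class WaitUtil {
    /** Returns true if the condition became true before the timeout, false otherwise. */
    static boolean waitUntil(BooleanSupplier condition, long timeoutMillis, long pollMillis) {
        long deadline = System.nanoTime() + TimeUnit.MILLISECONDS.toNanos(timeoutMillis);
        while (!condition.getAsBoolean()) {
            if (System.nanoTime() - deadline >= 0)
                return false;
            try {
                Thread.sleep(pollMillis);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt(); // preserve the interrupt for the caller
                return false;
            }
        }
        return true;
    }
}
```

A test would call {{waitUntil}} with a condition such as "the index is built on every node" and fail fast if it returns {{false}}, instead of querying an index that may still be building.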

   

> Invalid results are returned while secondary index are being build
> ------------------------------------------------------------------
>
>                 Key: CASSANDRA-8505
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-8505
>             Project: Cassandra
>          Issue Type: Bug
>            Reporter: Benjamin Lerer
>            Assignee: Benjamin Lerer
>             Fix For: 2.1.x
>
>
> If you request an index creation and then execute a query that uses the index, 
> the results returned might be invalid until the index is fully built. This is 
> caused by the fact that the table column is marked as indexed before the 
> index is ready.
> The following unit tests can be used to reproduce the problem:
> {code}
>     @Test
>     public void testIndexCreatedAfterInsert() throws Throwable
>     {
>         createTable("CREATE TABLE %s (a int, b int, c int, primary key((a, b)))");
>         execute("INSERT INTO %s (a, b, c) VALUES (0, 0, 0);");
>         execute("INSERT INTO %s (a, b, c) VALUES (0, 1, 1);");
>         execute("INSERT INTO %s (a, b, c) VALUES (0, 2, 2);");
>         execute("INSERT INTO %s (a, b, c) VALUES (1, 0, 3);");
>         execute("INSERT INTO %s (a, b, c) VALUES (1, 1, 4);");
>         
>         createIndex("CREATE INDEX ON %s(b)");
>         
>         assertRows(execute("SELECT * FROM %s WHERE b = ?;", 1),
>                    row(0, 1, 1),
>                    row(1, 1, 4));
>     }
>     
>     @Test
>     public void testIndexCreatedBeforeInsert() throws Throwable
>     {
>         createTable("CREATE TABLE %s (a int, b int, c int, primary key((a, b)))");
>         createIndex("CREATE INDEX ON %s(b)");
>         
>         execute("INSERT INTO %s (a, b, c) VALUES (0, 0, 0);");
>         execute("INSERT INTO %s (a, b, c) VALUES (0, 1, 1);");
>         execute("INSERT INTO %s (a, b, c) VALUES (0, 2, 2);");
>         execute("INSERT INTO %s (a, b, c) VALUES (1, 0, 3);");
>         execute("INSERT INTO %s (a, b, c) VALUES (1, 1, 4);");
>         assertRows(execute("SELECT * FROM %s WHERE b = ?;", 1),
>                    row(0, 1, 1),
>                    row(1, 1, 4));
>     }
> {code}
> The first test fails while the second passes. 
> In my opinion, the first test's query should be rejected as invalid (as if the 
> index did not exist) until the index is fully built.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
