[ https://issues.apache.org/jira/browse/CASSANDRA-13547?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16040527#comment-16040527 ]
Krishna Dattu Koneru edited comment on CASSANDRA-13547 at 6/23/17 3:26 AM: --------------------------------------------------------------------------- {code} cqlsh> UPDATE test.table1 SET enabled = TRUE WHERE id = 1 AND name = 'One'; cqlsh> SELECT * FROM test.table1; id | name | enabled | foo ----+------+---------+----- 1 | One | True | Bar (1 rows) {code} {code:title=Problem 1 - Missing Updates|borderStyle=solid} cqlsh> SELECT * FROM test.table1_mv1; name | id | foo ------+----+----- (0 rows) {code} Logic in ViewUpdateGenerator.java does not update the view row if updated column is not denormalized in the view. in the above case,{{enabled}} is not denormalized and so update has not propagated to the view.View metadata only has pk columns + columns in select statement of create view. Now that filtering on non-pk columns is supported , we have to make sure that all non-primary key columns that have filters are denormalized.With this we can also make sure that {{ALTER TABLE}} does not drop a column that is used in view. (delete does not do this check because it does not have to. This is the reason row delete worked when {{enabled}} is set to false.) {code:title=Problem 2 - incorrect non-pk tombstones|borderStyle=solid} cqlsh> SELECT * FROM test.table1_mv2; name | id | enabled | foo ------+----+---------+------ One | 1 | True | null (1 rows) {code} This happens because of the way liveliness/deletion info is computed in the view. {{computeTimestampForEntryDeletion())}} method takes the biggest timestamp of all the columns (including non-pk) in the row and uses it in Deletion info when deleting. But,when inserting/updating, {{computeLivenessInfoForEntry()}} uses the biggest timestamp of the primary keys for liveliness info. This causes non-pk columns to be treated as deleted because view tombstones have higher timestamp than live cell from base row. I have uploaded a [patch for 3.11 | https://github.com/apache/cassandra/compare/cassandra-3.11...krishna-koneru:cassandra-3.11-13547]branch which fixes above two issues. I will make patches for other branches if this patch looks okay. Comments appreciated ! was (Author: krishna.koneru): {code} cqlsh> UPDATE test.table1 SET enabled = TRUE WHERE id = 1 AND name = 'One'; cqlsh> SELECT * FROM test.table1; id | name | enabled | foo ----+------+---------+----- 1 | One | True | Bar (1 rows) {code} {code:title=Problem 1 - Missing Updates|borderStyle=solid} cqlsh> SELECT * FROM test.table1_mv1; name | id | foo ------+----+----- (0 rows) {code} Logic in ViewUpdateGenerator.java does not update the view row if updated column is not denormalized in the view. in the above case,{{enabled}} is not denormalized and so update has not propagated to the view.View metadata only has pk columns + columns in select statement of create view. Now that filtering on non-pk columns is supported , we have to make sure that all non-primary key columns that have filters are denormalized. (delete does not do this check because it does not have to. This is the reason row delete worked when {{enabled}} is set to false.) {code:title=Problem 2 - incorrect non-pk tombstones|borderStyle=solid} cqlsh> SELECT * FROM test.table1_mv2; name | id | enabled | foo ------+----+---------+------ One | 1 | True | null (1 rows) {code} This happens because of the way liveliness/deletion info is computed in the view. {{computeTimestampForEntryDeletion())}} method takes the biggest timestamp of all the columns (including non-pk) in the row and uses it in Deletion info when deleting. But,when inserting/updating, {{computeLivenessInfoForEntry()}} uses the biggest timestamp of the primary keys for liveliness info. This causes non-pk columns to be treated as deleted because view tombstones have higher timestamp than live cell from base row. I have uploaded a [patch for 3.11 | https://github.com/apache/cassandra/compare/cassandra-3.11...krishna-koneru:cassandra-3.11-13547]branch which fixes above two issues. I will make patches for other branches if this patch looks okay. Comments appreciated ! > Filtered materialized views missing data > ---------------------------------------- > > Key: CASSANDRA-13547 > URL: https://issues.apache.org/jira/browse/CASSANDRA-13547 > Project: Cassandra > Issue Type: Bug > Components: Materialized Views > Environment: Official Cassandra 3.10 Docker image (ID 154b919bf8ce). > Reporter: Craig Nicholson > Assignee: Krishna Dattu Koneru > Priority: Blocker > Labels: materializedviews > Fix For: 3.11.x > > > When creating a materialized view against a base table the materialized view > does not always reflect the correct data. > Using the following test schema: > {code:title=Schema|language=sql} > DROP KEYSPACE IF EXISTS test; > CREATE KEYSPACE test > WITH REPLICATION = { > 'class' : 'SimpleStrategy', > 'replication_factor' : 1 > }; > CREATE TABLE test.table1 ( > id int, > name text, > enabled boolean, > foo text, > PRIMARY KEY (id, name)); > CREATE MATERIALIZED VIEW test.table1_mv1 AS SELECT id, name, foo > FROM test.table1 > WHERE id IS NOT NULL > AND name IS NOT NULL > AND enabled = TRUE > PRIMARY KEY ((name), id); > CREATE MATERIALIZED VIEW test.table1_mv2 AS SELECT id, name, foo, enabled > FROM test.table1 > WHERE id IS NOT NULL > AND name IS NOT NULL > AND enabled = TRUE > PRIMARY KEY ((name), id); > {code} > When I insert a row into the base table the materialized views are updated > appropriately. (+) > {code:title=Insert row|language=sql} > cqlsh> INSERT INTO test.table1 (id, name, enabled, foo) VALUES (1, 'One', > TRUE, 'Bar'); > cqlsh> SELECT * FROM test.table1; > id | name | enabled | foo > ----+------+---------+----- > 1 | One | True | Bar > (1 rows) > cqlsh> SELECT * FROM test.table1_mv1; > name | id | foo > ------+----+----- > One | 1 | Bar > (1 rows) > cqlsh> SELECT * FROM test.table1_mv2; > name | id | enabled | foo > ------+----+---------+----- > One | 1 | True | Bar > (1 rows) > {code} > Updating the record in the base table and setting enabled to FALSE will > filter the record from both materialized views. (+) > {code:title=Disable the row|language=sql} > cqlsh> UPDATE test.table1 SET enabled = FALSE WHERE id = 1 AND name = 'One'; > cqlsh> SELECT * FROM test.table1; > id | name | enabled | foo > ----+------+---------+----- > 1 | One | False | Bar > (1 rows) > cqlsh> SELECT * FROM test.table1_mv1; > name | id | foo > ------+----+----- > (0 rows) > cqlsh> SELECT * FROM test.table1_mv2; > name | id | enabled | foo > ------+----+---------+----- > (0 rows) > {code} > However a further update to the base table setting enabled to TRUE should > include the record in both materialzed views, however only one view > (table1_mv2) gets updated. (-) > It appears that only the view (table1_mv2) that returns the filtered column > (enabled) is updated. (-) > Additionally columns that are not part of the partiion or clustering key are > not updated. You can see that the foo column has a null value in table1_mv2. > (-) > {code:title=Enable the row|language=sql} > cqlsh> UPDATE test.table1 SET enabled = TRUE WHERE id = 1 AND name = 'One'; > cqlsh> SELECT * FROM test.table1; > id | name | enabled | foo > ----+------+---------+----- > 1 | One | True | Bar > (1 rows) > cqlsh> SELECT * FROM test.table1_mv1; > name | id | foo > ------+----+----- > (0 rows) > cqlsh> SELECT * FROM test.table1_mv2; > name | id | enabled | foo > ------+----+---------+------ > One | 1 | True | null > (1 rows) > {code} -- This message was sent by Atlassian JIRA (v6.4.14#64029) --------------------------------------------------------------------- To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org