[ 
https://issues.apache.org/jira/browse/CASSANDRA-13547?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16040527#comment-16040527
 ] 

Krishna Dattu Koneru edited comment on CASSANDRA-13547 at 6/23/17 3:26 AM:
---------------------------------------------------------------------------

{code}
cqlsh> UPDATE test.table1 SET enabled = TRUE WHERE id = 1 AND name = 'One';

cqlsh> SELECT * FROM test.table1;

 id | name | enabled | foo
----+------+---------+-----
  1 |  One |    True | Bar

(1 rows)
{code}
{code:title=Problem 1 - Missing Updates|borderStyle=solid}

cqlsh> SELECT * FROM test.table1_mv1;

 name | id | foo
------+----+-----

(0 rows)
{code}

Logic in ViewUpdateGenerator.java does not update the view row if updated 
column is not denormalized in the view.
in the above case,{{enabled}} is not denormalized and so update has not 
propagated to the view.View metadata only has pk columns + columns in select 
statement of create view.
Now that filtering on non-pk columns is supported , we have to make sure that 
all non-primary key columns that have filters are denormalized.With this we can 
also make sure that {{ALTER TABLE}} does not drop a column that is used in view.
(delete does not do this check because it does not have to. This is the reason 
row delete worked when {{enabled}} is set to false.)

{code:title=Problem 2 - incorrect non-pk tombstones|borderStyle=solid}
cqlsh> SELECT * FROM test.table1_mv2;

 name | id | enabled | foo
------+----+---------+------
  One |  1 |    True | null

(1 rows)
{code}

This happens because of the way liveliness/deletion info  is computed in the 
view. 
{{computeTimestampForEntryDeletion())}} method takes the biggest timestamp of 
all the columns (including non-pk) in the row and uses it in Deletion info when 
deleting.

But,when inserting/updating, {{computeLivenessInfoForEntry()}} uses the biggest 
timestamp of the primary keys for liveliness info.
This causes non-pk columns to be treated as deleted because view tombstones 
have higher timestamp than live cell from base row.


I have uploaded a [patch for 3.11 | 
https://github.com/apache/cassandra/compare/cassandra-3.11...krishna-koneru:cassandra-3.11-13547]branch
 which fixes above two issues.

I will make patches for other branches if this patch looks okay.
Comments appreciated !


was (Author: krishna.koneru):

{code}
cqlsh> UPDATE test.table1 SET enabled = TRUE WHERE id = 1 AND name = 'One';

cqlsh> SELECT * FROM test.table1;

 id | name | enabled | foo
----+------+---------+-----
  1 |  One |    True | Bar

(1 rows)
{code}
{code:title=Problem 1 - Missing Updates|borderStyle=solid}

cqlsh> SELECT * FROM test.table1_mv1;

 name | id | foo
------+----+-----

(0 rows)
{code}

Logic in ViewUpdateGenerator.java does not update the view row if updated 
column is not denormalized in the view.
in the above case,{{enabled}} is not denormalized and so update has not 
propagated to the view.View metadata only has pk columns + columns in select 
statement of create view.
Now that filtering on non-pk columns is supported , we have to make sure that 
all non-primary key columns that have filters are denormalized.
(delete does not do this check because it does not have to. This is the reason 
row delete worked when {{enabled}} is set to false.)

{code:title=Problem 2 - incorrect non-pk tombstones|borderStyle=solid}
cqlsh> SELECT * FROM test.table1_mv2;

 name | id | enabled | foo
------+----+---------+------
  One |  1 |    True | null

(1 rows)
{code}

This happens because of the way liveliness/deletion info  is computed in the 
view. 
{{computeTimestampForEntryDeletion())}} method takes the biggest timestamp of 
all the columns (including non-pk) in the row and uses it in Deletion info when 
deleting.

But,when inserting/updating, {{computeLivenessInfoForEntry()}} uses the biggest 
timestamp of the primary keys for liveliness info.
This causes non-pk columns to be treated as deleted because view tombstones 
have higher timestamp than live cell from base row.


I have uploaded a [patch for 3.11 | 
https://github.com/apache/cassandra/compare/cassandra-3.11...krishna-koneru:cassandra-3.11-13547]branch
 which fixes above two issues.

I will make patches for other branches if this patch looks okay.
Comments appreciated !

> Filtered materialized views missing data
> ----------------------------------------
>
>                 Key: CASSANDRA-13547
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-13547
>             Project: Cassandra
>          Issue Type: Bug
>          Components: Materialized Views
>         Environment: Official Cassandra 3.10 Docker image (ID 154b919bf8ce).
>            Reporter: Craig Nicholson
>            Assignee: Krishna Dattu Koneru
>            Priority: Blocker
>              Labels: materializedviews
>             Fix For: 3.11.x
>
>
> When creating a materialized view against a base table the materialized view 
> does not always reflect the correct data.
> Using the following test schema:
> {code:title=Schema|language=sql}
> DROP KEYSPACE IF EXISTS test;
> CREATE KEYSPACE test
>   WITH REPLICATION = { 
>    'class' : 'SimpleStrategy', 
>    'replication_factor' : 1 
>   };
> CREATE TABLE test.table1 (
>                 id int,
>                 name text,
>                 enabled boolean,
>                 foo text,
>                 PRIMARY KEY (id, name));
> CREATE MATERIALIZED VIEW test.table1_mv1 AS SELECT id, name, foo
>                 FROM test.table1
>                 WHERE id IS NOT NULL 
>                 AND name IS NOT NULL 
>                 AND enabled = TRUE
>                 PRIMARY KEY ((name), id);
> CREATE MATERIALIZED VIEW test.table1_mv2 AS SELECT id, name, foo, enabled
>                 FROM test.table1
>                 WHERE id IS NOT NULL 
>                 AND name IS NOT NULL 
>                 AND enabled = TRUE
>                 PRIMARY KEY ((name), id);
> {code}
> When I insert a row into the base table the materialized views are updated 
> appropriately. (+)
> {code:title=Insert row|language=sql}
> cqlsh> INSERT INTO test.table1 (id, name, enabled, foo) VALUES (1, 'One', 
> TRUE, 'Bar');
> cqlsh> SELECT * FROM test.table1;
>  id | name | enabled | foo
> ----+------+---------+-----
>   1 |  One |    True | Bar
> (1 rows)
> cqlsh> SELECT * FROM test.table1_mv1;
>  name | id | foo
> ------+----+-----
>   One |  1 | Bar
> (1 rows)
> cqlsh> SELECT * FROM test.table1_mv2;
>  name | id | enabled | foo
> ------+----+---------+-----
>   One |  1 |    True | Bar
> (1 rows)
> {code}
> Updating the record in the base table and setting enabled to FALSE will 
> filter the record from both materialized views. (+)
> {code:title=Disable the row|language=sql}
> cqlsh> UPDATE test.table1 SET enabled = FALSE WHERE id = 1 AND name = 'One';
> cqlsh> SELECT * FROM test.table1;
>  id | name | enabled | foo
> ----+------+---------+-----
>   1 |  One |   False | Bar
> (1 rows)
> cqlsh> SELECT * FROM test.table1_mv1;
>  name | id | foo
> ------+----+-----
> (0 rows)
> cqlsh> SELECT * FROM test.table1_mv2;
>  name | id | enabled | foo
> ------+----+---------+-----
> (0 rows)
> {code}
> However a further update to the base table setting enabled to TRUE should 
> include the record in both materialzed views, however only one view 
> (table1_mv2) gets updated. (-)
> It appears that only the view (table1_mv2) that returns the filtered column 
> (enabled) is updated. (-)
> Additionally columns that are not part of the partiion or clustering key are 
> not updated. You can see that the foo column has a null value in table1_mv2. 
> (-)
> {code:title=Enable the row|language=sql}
> cqlsh> UPDATE test.table1 SET enabled = TRUE WHERE id = 1 AND name = 'One';
> cqlsh> SELECT * FROM test.table1;
>  id | name | enabled | foo
> ----+------+---------+-----
>   1 |  One |    True | Bar
> (1 rows)
> cqlsh> SELECT * FROM test.table1_mv1;
>  name | id | foo
> ------+----+-----
> (0 rows)
> cqlsh> SELECT * FROM test.table1_mv2;
>  name | id | enabled | foo
> ------+----+---------+------
>   One |  1 |    True | null
> (1 rows)
> {code}



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org

Reply via email to