[jira] [Commented] (HIVE-28050) Disable Incremental non aggregated materialized view rebuild in presence of delete operations

2024-02-08 Thread Stamatis Zampetakis (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-28050?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17815643#comment-17815643
 ] 

Stamatis Zampetakis commented on HIVE-28050:


Thanks for the explanation and additional examples. I definitely get a better 
understanding of the problem by reading those! I approved the PR dropping the 
incremental rebuild in this case makes sense.

> Disable Incremental non aggregated materialized view rebuild in presence of 
> delete operations
> -
>
> Key: HIVE-28050
> URL: https://issues.apache.org/jira/browse/HIVE-28050
> Project: Hive
>  Issue Type: Bug
>  Components: Materialized views
>Reporter: Krisztian Kasa
>Assignee: Krisztian Kasa
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.1.0
>
>
> To support incremental rebuild of materialized views which definition does 
> not have aggregate in presence of delete operations in any of its source 
> tables the records of the source tables need to be uniquely identified and 
> right joined with the records present in the view.
> The join keys should be the unique columns of each table in the definition 
> query but we can not determine which are those columns.
> One possibility is to project ROW_IDs of each source table in the view 
> definition but the writeId component is changing at delete.
> Another way is to project columns of primary keys or unique keys but these 
> constraints are not enforced in Hive.
> Current implementation leads to data correctness issues:
> {code}
> set hive.support.concurrency=true;
> set hive.txn.manager=org.apache.hadoop.hive.ql.lockmgr.DbTxnManager;
> create table cmv_basetable_n6 (a int, b varchar(256), c decimal(10,2), d int) 
> stored as orc TBLPROPERTIES ('transactional'='true');
> insert into cmv_basetable_n6 values
>  (1, 'bob', 10.30, 2),
>  (1, 'alfred', 10.30, 2),
>  (2, 'bob', 3.14, 3),
>  (2, 'bonnie', 172342.2, 3),
>  (3, 'calvin', 978.76, 3),
>  (3, 'charlie', 9.8, 1);
> create table cmv_basetable_2_n3 (a int, b varchar(256), c decimal(10,2), d 
> int) stored as orc TBLPROPERTIES ('transactional'='true');
> insert into cmv_basetable_2_n3 values
>  (1, 'alfred', 10.30, 2),
>  (3, 'calvin', 978.76, 3);
> CREATE MATERIALIZED VIEW cmv_mat_view_n6
>   TBLPROPERTIES ('transactional'='true') AS
>   SELECT cmv_basetable_n6.a, cmv_basetable_2_n3.c
>   FROM cmv_basetable_n6 JOIN cmv_basetable_2_n3 ON (cmv_basetable_n6.a = 
> cmv_basetable_2_n3.a)
>   WHERE cmv_basetable_2_n3.c > 10.0;
> delete from cmv_basetable_n6 where b = 'bob';
> explain cbo
> alter materialized view cmv_mat_view_n6 rebuild;
> alter materialized view cmv_mat_view_n6 rebuild;
> select * from cmv_mat_view_n6;
> {code}
> {code}
> 3 978.76
> 3 978.76
> {code}
> but it should be
> {code}
> 1 10.30
> 3 978.76
> 3 978.76
> {code}



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (HIVE-28050) Disable Incremental non aggregated materialized view rebuild in presence of delete operations

2024-02-07 Thread Krisztian Kasa (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-28050?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17815523#comment-17815523
 ] 

Krisztian Kasa commented on HIVE-28050:
---

[~zabetak]
Another example without join:
{code}
set hive.support.concurrency=true;
set hive.txn.manager=org.apache.hadoop.hive.ql.lockmgr.DbTxnManager;

create table t1 (a int, b int) stored as orc TBLPROPERTIES 
('transactional'='true');

insert into t1 values
(3, 3),
(2, 1),
(2, 2),
(1, 2),
(1, 1);

CREATE MATERIALIZED VIEW mat1
  TBLPROPERTIES ('transactional'='true') AS
SELECT a
FROM t1
WHERE b < 10;

delete from t1 where b = 2;

alter materialized view mat1 rebuild;

SELECT a
FROM t1
WHERE b < 10;
{code}
{code}
3
{code}
vs
{code}
drop materialized view mat1;

SELECT a
FROM t1
WHERE b < 10;
{code}
{code}
3
2
1
{code}

> Disable Incremental non aggregated materialized view rebuild in presence of 
> delete operations
> -
>
> Key: HIVE-28050
> URL: https://issues.apache.org/jira/browse/HIVE-28050
> Project: Hive
>  Issue Type: Bug
>  Components: Materialized views
>Reporter: Krisztian Kasa
>Assignee: Krisztian Kasa
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.1.0
>
>
> To support incremental rebuild of materialized views which definition does 
> not have aggregate in presence of delete operations in any of its source 
> tables the records of the source tables need to be uniquely identified and 
> right joined with the records present in the view.
> The join keys should be the unique columns of each table in the definition 
> query but we can not determine which are those columns.
> One possibility is to project ROW_IDs of each source table in the view 
> definition but the writeId component is changing at delete.
> Another way is to project columns of primary keys or unique keys but these 
> constraints are not enforced in Hive.
> Current implementation leads to data correctness issues:
> {code}
> set hive.support.concurrency=true;
> set hive.txn.manager=org.apache.hadoop.hive.ql.lockmgr.DbTxnManager;
> create table cmv_basetable_n6 (a int, b varchar(256), c decimal(10,2), d int) 
> stored as orc TBLPROPERTIES ('transactional'='true');
> insert into cmv_basetable_n6 values
>  (1, 'bob', 10.30, 2),
>  (1, 'alfred', 10.30, 2),
>  (2, 'bob', 3.14, 3),
>  (2, 'bonnie', 172342.2, 3),
>  (3, 'calvin', 978.76, 3),
>  (3, 'charlie', 9.8, 1);
> create table cmv_basetable_2_n3 (a int, b varchar(256), c decimal(10,2), d 
> int) stored as orc TBLPROPERTIES ('transactional'='true');
> insert into cmv_basetable_2_n3 values
>  (1, 'alfred', 10.30, 2),
>  (3, 'calvin', 978.76, 3);
> CREATE MATERIALIZED VIEW cmv_mat_view_n6
>   TBLPROPERTIES ('transactional'='true') AS
>   SELECT cmv_basetable_n6.a, cmv_basetable_2_n3.c
>   FROM cmv_basetable_n6 JOIN cmv_basetable_2_n3 ON (cmv_basetable_n6.a = 
> cmv_basetable_2_n3.a)
>   WHERE cmv_basetable_2_n3.c > 10.0;
> delete from cmv_basetable_n6 where b = 'bob';
> explain cbo
> alter materialized view cmv_mat_view_n6 rebuild;
> alter materialized view cmv_mat_view_n6 rebuild;
> select * from cmv_mat_view_n6;
> {code}
> {code}
> 3 978.76
> 3 978.76
> {code}
> but it should be
> {code}
> 1 10.30
> 3 978.76
> 3 978.76
> {code}



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (HIVE-28050) Disable Incremental non aggregated materialized view rebuild in presence of delete operations

2024-02-07 Thread Stamatis Zampetakis (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-28050?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17815309#comment-17815309
 ] 

Stamatis Zampetakis commented on HIVE-28050:


The materialized view definition in the description contains a join of two 
tables. Is the problem also relevant when there are no joins in the MV 
definition?

> Disable Incremental non aggregated materialized view rebuild in presence of 
> delete operations
> -
>
> Key: HIVE-28050
> URL: https://issues.apache.org/jira/browse/HIVE-28050
> Project: Hive
>  Issue Type: Bug
>  Components: Materialized views
>Reporter: Krisztian Kasa
>Assignee: Krisztian Kasa
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.1.0
>
>
> To support incremental rebuild of materialized views which definition does 
> not have aggregate in presence of delete operations in any of its source 
> tables the records of the source tables need to be uniquely identified and 
> right joined with the records present in the view.
> The join keys should be the unique columns of each table in the definition 
> query but we can not determine which are those columns.
> One possibility is to project ROW_IDs of each source table in the view 
> definition but the writeId component is changing at delete.
> Another way is to project columns of primary keys or unique keys but these 
> constraints are not enforced in Hive.
> Current implementation leads to data correctness issues:
> {code}
> set hive.support.concurrency=true;
> set hive.txn.manager=org.apache.hadoop.hive.ql.lockmgr.DbTxnManager;
> create table cmv_basetable_n6 (a int, b varchar(256), c decimal(10,2), d int) 
> stored as orc TBLPROPERTIES ('transactional'='true');
> insert into cmv_basetable_n6 values
>  (1, 'bob', 10.30, 2),
>  (1, 'alfred', 10.30, 2),
>  (2, 'bob', 3.14, 3),
>  (2, 'bonnie', 172342.2, 3),
>  (3, 'calvin', 978.76, 3),
>  (3, 'charlie', 9.8, 1);
> create table cmv_basetable_2_n3 (a int, b varchar(256), c decimal(10,2), d 
> int) stored as orc TBLPROPERTIES ('transactional'='true');
> insert into cmv_basetable_2_n3 values
>  (1, 'alfred', 10.30, 2),
>  (3, 'calvin', 978.76, 3);
> CREATE MATERIALIZED VIEW cmv_mat_view_n6
>   TBLPROPERTIES ('transactional'='true') AS
>   SELECT cmv_basetable_n6.a, cmv_basetable_2_n3.c
>   FROM cmv_basetable_n6 JOIN cmv_basetable_2_n3 ON (cmv_basetable_n6.a = 
> cmv_basetable_2_n3.a)
>   WHERE cmv_basetable_2_n3.c > 10.0;
> delete from cmv_basetable_n6 where b = 'bob';
> explain cbo
> alter materialized view cmv_mat_view_n6 rebuild;
> alter materialized view cmv_mat_view_n6 rebuild;
> select * from cmv_mat_view_n6;
> {code}
> {code}
> 3 978.76
> 3 978.76
> {code}
> but it should be
> {code}
> 1 10.30
> 3 978.76
> 3 978.76
> {code}



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (HIVE-28050) Disable Incremental non aggregated materialized view rebuild in presence of delete operations

2024-02-07 Thread Krisztian Kasa (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-28050?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17815276#comment-17815276
 ] 

Krisztian Kasa commented on HIVE-28050:
---

[~zabetak]
Updated the description with an example.
The current implementation should check whether there are unique columns 
projected from each source table in the view definition query. Unfortunately we 
can not identify unique columns. If a method is found to do it the feature can 
be restored in the future.

> Disable Incremental non aggregated materialized view rebuild in presence of 
> delete operations
> -
>
> Key: HIVE-28050
> URL: https://issues.apache.org/jira/browse/HIVE-28050
> Project: Hive
>  Issue Type: Bug
>  Components: Materialized views
>Reporter: Krisztian Kasa
>Assignee: Krisztian Kasa
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.1.0
>
>
> To support incremental rebuild of materialized views which definition does 
> not have aggregate in presence of delete operations in any of its source 
> tables the records of the source tables need to be uniquely identified and 
> right joined with the records present in the view.
> The join keys should be the unique columns of each table in the definition 
> query but we can not determine which are those columns.
> One possibility is to project ROW_IDs of each source table in the view 
> definition but the writeId component is changing at delete.
> Another way is to project columns of primary keys or unique keys but these 
> constraints are not enforced in Hive.
> Current implementation leads to data correctness issues:
> {code}
> set hive.support.concurrency=true;
> set hive.txn.manager=org.apache.hadoop.hive.ql.lockmgr.DbTxnManager;
> create table cmv_basetable_n6 (a int, b varchar(256), c decimal(10,2), d int) 
> stored as orc TBLPROPERTIES ('transactional'='true');
> insert into cmv_basetable_n6 values
>  (1, 'bob', 10.30, 2),
>  (1, 'alfred', 10.30, 2),
>  (2, 'bob', 3.14, 3),
>  (2, 'bonnie', 172342.2, 3),
>  (3, 'calvin', 978.76, 3),
>  (3, 'charlie', 9.8, 1);
> create table cmv_basetable_2_n3 (a int, b varchar(256), c decimal(10,2), d 
> int) stored as orc TBLPROPERTIES ('transactional'='true');
> insert into cmv_basetable_2_n3 values
>  (1, 'alfred', 10.30, 2),
>  (3, 'calvin', 978.76, 3);
> CREATE MATERIALIZED VIEW cmv_mat_view_n6
>   TBLPROPERTIES ('transactional'='true') AS
>   SELECT cmv_basetable_n6.a, cmv_basetable_2_n3.c
>   FROM cmv_basetable_n6 JOIN cmv_basetable_2_n3 ON (cmv_basetable_n6.a = 
> cmv_basetable_2_n3.a)
>   WHERE cmv_basetable_2_n3.c > 10.0;
> delete from cmv_basetable_n6 where b = 'bob';
> explain cbo
> alter materialized view cmv_mat_view_n6 rebuild;
> alter materialized view cmv_mat_view_n6 rebuild;
> select * from cmv_mat_view_n6;
> {code}
> {code}
> 3 978.76
> 3 978.76
> {code}
> but it should be
> {code}
> 1 10.30
> 3 978.76
> 3 978.76
> {code}



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (HIVE-28050) Disable Incremental non aggregated materialized view rebuild in presence of delete operations

2024-02-07 Thread Stamatis Zampetakis (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-28050?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17815170#comment-17815170
 ] 

Stamatis Zampetakis commented on HIVE-28050:


Hey [~kkasa], I see that the PR removes code so I assume we are removing the 
feature introduced by HIVE-24854 is that correct? Is there a plan to restore 
the functionality in the future?

Can you briefly outline here why it is impossible to fix the existing code and 
instead you opted for removal?

> Disable Incremental non aggregated materialized view rebuild in presence of 
> delete operations
> -
>
> Key: HIVE-28050
> URL: https://issues.apache.org/jira/browse/HIVE-28050
> Project: Hive
>  Issue Type: Bug
>  Components: Materialized views
>Reporter: Krisztian Kasa
>Assignee: Krisztian Kasa
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.1.0
>
>
> To support incremental rebuild of materialized views which definition does 
> not have aggregate in presence of delete operations in any of its source 
> tables the records of the source tables need to be uniquely identified and 
> joined with the records present in the view.
> One possibility is to project ROW_IDs of each source table in the view 
> definition but the writeId component is changing at delete.
> Another way is to project columns of primary keys or unique keys but these 
> constraints are not enforced in Hive.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)