[jira] [Commented] (HIVE-27948) Wrong results when using materialized views with non-deterministic/dynamic functions

2023-12-15 Thread Krisztian Kasa (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-27948?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17797157#comment-17797157
 ] 

Krisztian Kasa commented on HIVE-27948:
---

The general rule is if there is any error message during materialized view 
definition validation the MV can not be used by Calcite to perform query 
rewrites but it can be used for text/AST based rewrite algorithm so MV creation 
succeeds with a warning message.

 As the description shows this logic can not be applied to any type of MVs, so 
I submitted a patch to filter out MVs which doesn't support any type of rewrite 
algorithm.

Thanks for reporting this bug.

> Wrong results when using materialized views with non-deterministic/dynamic 
> functions
> 
>
> Key: HIVE-27948
> URL: https://issues.apache.org/jira/browse/HIVE-27948
> Project: Hive
>  Issue Type: Bug
>  Components: CBO
>Affects Versions: 4.0.0-beta-1
>Reporter: Stamatis Zampetakis
>Assignee: Krisztian Kasa
>Priority: Critical
>  Labels: pull-request-available
> Attachments: materialized_view_unix_timestamp.q
>
>
> There are certain SQL functions that return different results across 
> different executions. Usually we refer to these functions as 
> non-deterministic or dynamic. Some examples are: UNIX_TIMESTAMP(), 
> CURRENT_TIMESTAMP, CURRENT_DATE, etc.
> When a materialized view definition contains such functions the queries that 
> are using this view may return wrong results.
> Consider the following scenario where we populate the employee table with 
> timestamps representing the future. For making this easily reproable in 
> self-contained test the timestamps are only a few seconds apart.
> {code:sql}
> CREATE TABLE EMPS (ENAME STRING, BIRTH_EPOCH_SECS INT) STORED AS ORC 
> TBLPROPERTIES ('transactional'='true');
> INSERT INTO EMPS
> VALUES ('Victor', UNIX_TIMESTAMP()),
>('Alex', UNIX_TIMESTAMP() + 2),
>('Bob', UNIX_TIMESTAMP() + 5),
>('Alice', UNIX_TIMESTAMP() + 10);
> CREATE MATERIALIZED VIEW v_emp AS SELECT * FROM EMPS WHERE BIRTH_EPOCH_SECS 
> <= UNIX_TIMESTAMP();
> {code}
> When the materialized view is created it is populated with only the rows that 
> match the timestamp at the given time.
> To demonstrate the problem run the following queries with view based 
> rewritting disabled and enabled.
> {code:sql}
> set hive.materializedview.rewriting.sql=false;
> SELECT * FROM EMPS WHERE BIRTH_EPOCH_SECS <= UNIX_TIMESTAMP();
> {code}
> {noformat}
> Victor1702302786
> Alex  1702302788
> Bob   1702302791
> {noformat}
> {code:sql}
> set hive.materializedview.rewriting.sql=true;
> SELECT * FROM EMPS WHERE BIRTH_EPOCH_SECS <= UNIX_TIMESTAMP();
> {code}
> {noformat}
> Victor1702302786
> Alex  1702302788
> {noformat}
> Naturally the second query should return more rows than the first one since 
> UNIX_TIMESTAMP is constantly growing. However, when view based rewritting is 
> in use the second query will use the results from the materialized view which 
> are by now obsolete (missing Bob entry).



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (HIVE-27948) Wrong results when using materialized views with non-deterministic/dynamic functions

2023-12-11 Thread Stamatis Zampetakis (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-27948?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17795424#comment-17795424
 ] 

Stamatis Zampetakis commented on HIVE-27948:


The problem is reproducible in [current 
master|https://github.com/apache/hive/commit/0e9282b3ed7c59dee3e3a950bd3cd390cca2652b]
 using [^materialized_view_unix_timestamp.q]:
{noformat}
 mvn test -Dtest=TestMiniLlapLocalCliDriver 
-Dqfile=materialized_view_unix_timestamp.q -Dtest.output.overwrite
{noformat}

> Wrong results when using materialized views with non-deterministic/dynamic 
> functions
> 
>
> Key: HIVE-27948
> URL: https://issues.apache.org/jira/browse/HIVE-27948
> Project: Hive
>  Issue Type: Bug
>  Components: CBO
>Affects Versions: 4.0.0-beta-1
>Reporter: Stamatis Zampetakis
>Priority: Major
> Attachments: materialized_view_unix_timestamp.q
>
>
> There are certain SQL functions that return different results across 
> different executions. Usually we refer to these functions as 
> non-deterministic or dynamic. Some examples are: UNIX_TIMESTAMP(), 
> CURRENT_TIMESTAMP, CURRENT_DATE, etc.
> When a materialized view definition contains such functions the queries that 
> are using this view may return wrong results.
> Consider the following scenario where we populate the employee table with 
> timestamps representing the future. For making this easily reproable in 
> self-contained test the timestamps are only a few seconds apart.
> {code:sql}
> CREATE TABLE EMPS (ENAME STRING, BIRTH_EPOCH_SECS INT) STORED AS ORC 
> TBLPROPERTIES ('transactional'='true');
> INSERT INTO EMPS
> VALUES ('Victor', UNIX_TIMESTAMP()),
>('Alex', UNIX_TIMESTAMP() + 2),
>('Bob', UNIX_TIMESTAMP() + 5),
>('Alice', UNIX_TIMESTAMP() + 10);
> CREATE MATERIALIZED VIEW v_emp AS SELECT * FROM EMPS WHERE BIRTH_EPOCH_SECS 
> <= UNIX_TIMESTAMP();
> {code}
> When the materialized view is created it is populated with only the rows that 
> match the timestamp at the given time.
> To demonstrate the problem run the following queries with view based 
> rewritting disabled and enabled.
> {code:sql}
> set hive.materializedview.rewriting.sql=false;
> SELECT * FROM EMPS WHERE BIRTH_EPOCH_SECS <= UNIX_TIMESTAMP();
> {code}
> {noformat}
> Victor1702302786
> Alex  1702302788
> Bob   1702302791
> {noformat}
> {code:sql}
> set hive.materializedview.rewriting.sql=true;
> SELECT * FROM EMPS WHERE BIRTH_EPOCH_SECS <= UNIX_TIMESTAMP();
> {code}
> {noformat}
> Victor1702302786
> Alex  1702302788
> {noformat}
> Naturally the second query should return more rows than the first one since 
> UNIX_TIMESTAMP is constantly growing. However, when view based rewritting is 
> in use the second query will use the results from the materialized view which 
> are by now obsolete (missing Bob entry).



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (HIVE-27948) Wrong results when using materialized views with non-deterministic/dynamic functions

2023-12-11 Thread Stamatis Zampetakis (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-27948?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17795421#comment-17795421
 ] 

Stamatis Zampetakis commented on HIVE-27948:


It seems that before HIVE-24274, it was not possible to create materialized 
views with non-deterministic/dynamic functions. I am not sure why this 
requirement was relaxed with the feature added by HIVE-24274. [~kkasa] any 
ideas why we changed the SemanticException to a plain console/log message in 
[SemanticAnalyzer|https://github.com/apache/hive/blob/0e9282b3ed7c59dee3e3a950bd3cd390cca2652b/ql/src/java/org/apache/hadoop/hive/ql/parse/SemanticAnalyzer.java#L14668]?

> Wrong results when using materialized views with non-deterministic/dynamic 
> functions
> 
>
> Key: HIVE-27948
> URL: https://issues.apache.org/jira/browse/HIVE-27948
> Project: Hive
>  Issue Type: Bug
>  Components: CBO
>Affects Versions: 4.0.0-beta-1
>Reporter: Stamatis Zampetakis
>Priority: Major
>
> There are certain SQL functions that return different results across 
> different executions. Usually we refer to these functions as 
> non-deterministic or dynamic. Some examples are: UNIX_TIMESTAMP(), 
> CURRENT_TIMESTAMP, CURRENT_DATE, etc.
> When a materialized view definition contains such functions the queries that 
> are using this view may return wrong results.
> Consider the following scenario where we populate the employee table with 
> timestamps representing the future. For making this easily reproable in 
> self-contained test the timestamps are only a few seconds apart.
> {code:sql}
> CREATE TABLE EMPS (ENAME STRING, BIRTH_EPOCH_SECS INT) STORED AS ORC 
> TBLPROPERTIES ('transactional'='true');
> INSERT INTO EMPS
> VALUES ('Victor', UNIX_TIMESTAMP()),
>('Alex', UNIX_TIMESTAMP() + 2),
>('Bob', UNIX_TIMESTAMP() + 5),
>('Alice', UNIX_TIMESTAMP() + 10);
> CREATE MATERIALIZED VIEW v_emp AS SELECT * FROM EMPS WHERE BIRTH_EPOCH_SECS 
> <= UNIX_TIMESTAMP();
> {code}
> When the materialized view is created it is populated with only the rows that 
> match the timestamp at the given time.
> To demonstrate the problem run the following queries with view based 
> rewritting disabled and enabled.
> {code:sql}
> set hive.materializedview.rewriting.sql=false;
> SELECT * FROM EMPS WHERE BIRTH_EPOCH_SECS <= UNIX_TIMESTAMP();
> {code}
> {noformat}
> Victor1702302786
> Alex  1702302788
> Bob   1702302791
> {noformat}
> {code:sql}
> set hive.materializedview.rewriting.sql=true;
> SELECT * FROM EMPS WHERE BIRTH_EPOCH_SECS <= UNIX_TIMESTAMP();
> {code}
> {noformat}
> Victor1702302786
> Alex  1702302788
> {noformat}
> Naturally the second query should return more rows than the first one since 
> UNIX_TIMESTAMP is constantly growing. However, when view based rewritting is 
> in use the second query will use the results from the materialized view which 
> are by now obsolete (missing Bob entry).



--
This message was sent by Atlassian Jira
(v8.20.10#820010)