[jira] [Updated] (HIVE-27948) Wrong results when using materialized views with non-deterministic/dynamic functions

2024-01-08 Thread Krisztian Kasa (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-27948?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Krisztian Kasa updated HIVE-27948:
--
Fix Version/s: 4.1.0
   Resolution: Fixed
   Status: Resolved  (was: Patch Available)

Merged to master. Thanks [~zabetak] for reporting, reproducing the issue and 
review the PR.

> Wrong results when using materialized views with non-deterministic/dynamic 
> functions
> 
>
> Key: HIVE-27948
> URL: https://issues.apache.org/jira/browse/HIVE-27948
> Project: Hive
>  Issue Type: Bug
>  Components: CBO
>Affects Versions: 4.0.0-beta-1
>Reporter: Stamatis Zampetakis
>Assignee: Krisztian Kasa
>Priority: Critical
>  Labels: pull-request-available
> Fix For: 4.1.0
>
> Attachments: materialized_view_unix_timestamp.q
>
>
> There are certain SQL functions that return different results across 
> different executions. Usually we refer to these functions as 
> non-deterministic or dynamic. Some examples are: UNIX_TIMESTAMP(), 
> CURRENT_TIMESTAMP, CURRENT_DATE, etc.
> When a materialized view definition contains such functions the queries that 
> are using this view may return wrong results.
> Consider the following scenario where we populate the employee table with 
> timestamps representing the future. For making this easily reproable in 
> self-contained test the timestamps are only a few seconds apart.
> {code:sql}
> CREATE TABLE EMPS (ENAME STRING, BIRTH_EPOCH_SECS INT) STORED AS ORC 
> TBLPROPERTIES ('transactional'='true');
> INSERT INTO EMPS
> VALUES ('Victor', UNIX_TIMESTAMP()),
>('Alex', UNIX_TIMESTAMP() + 2),
>('Bob', UNIX_TIMESTAMP() + 5),
>('Alice', UNIX_TIMESTAMP() + 10);
> CREATE MATERIALIZED VIEW v_emp AS SELECT * FROM EMPS WHERE BIRTH_EPOCH_SECS 
> <= UNIX_TIMESTAMP();
> {code}
> When the materialized view is created it is populated with only the rows that 
> match the timestamp at the given time.
> To demonstrate the problem run the following queries with view based 
> rewritting disabled and enabled.
> {code:sql}
> set hive.materializedview.rewriting.sql=false;
> SELECT * FROM EMPS WHERE BIRTH_EPOCH_SECS <= UNIX_TIMESTAMP();
> {code}
> {noformat}
> Victor1702302786
> Alex  1702302788
> Bob   1702302791
> {noformat}
> {code:sql}
> set hive.materializedview.rewriting.sql=true;
> SELECT * FROM EMPS WHERE BIRTH_EPOCH_SECS <= UNIX_TIMESTAMP();
> {code}
> {noformat}
> Victor1702302786
> Alex  1702302788
> {noformat}
> Naturally the second query should return more rows than the first one since 
> UNIX_TIMESTAMP is constantly growing. However, when view based rewritting is 
> in use the second query will use the results from the materialized view which 
> are by now obsolete (missing Bob entry).



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (HIVE-27948) Wrong results when using materialized views with non-deterministic/dynamic functions

2024-01-08 Thread Krisztian Kasa (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-27948?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Krisztian Kasa updated HIVE-27948:
--
Component/s: Materialized views

> Wrong results when using materialized views with non-deterministic/dynamic 
> functions
> 
>
> Key: HIVE-27948
> URL: https://issues.apache.org/jira/browse/HIVE-27948
> Project: Hive
>  Issue Type: Bug
>  Components: CBO, Materialized views
>Affects Versions: 4.0.0-beta-1
>Reporter: Stamatis Zampetakis
>Assignee: Krisztian Kasa
>Priority: Critical
>  Labels: pull-request-available
> Fix For: 4.1.0
>
> Attachments: materialized_view_unix_timestamp.q
>
>
> There are certain SQL functions that return different results across 
> different executions. Usually we refer to these functions as 
> non-deterministic or dynamic. Some examples are: UNIX_TIMESTAMP(), 
> CURRENT_TIMESTAMP, CURRENT_DATE, etc.
> When a materialized view definition contains such functions the queries that 
> are using this view may return wrong results.
> Consider the following scenario where we populate the employee table with 
> timestamps representing the future. For making this easily reproable in 
> self-contained test the timestamps are only a few seconds apart.
> {code:sql}
> CREATE TABLE EMPS (ENAME STRING, BIRTH_EPOCH_SECS INT) STORED AS ORC 
> TBLPROPERTIES ('transactional'='true');
> INSERT INTO EMPS
> VALUES ('Victor', UNIX_TIMESTAMP()),
>('Alex', UNIX_TIMESTAMP() + 2),
>('Bob', UNIX_TIMESTAMP() + 5),
>('Alice', UNIX_TIMESTAMP() + 10);
> CREATE MATERIALIZED VIEW v_emp AS SELECT * FROM EMPS WHERE BIRTH_EPOCH_SECS 
> <= UNIX_TIMESTAMP();
> {code}
> When the materialized view is created it is populated with only the rows that 
> match the timestamp at the given time.
> To demonstrate the problem run the following queries with view based 
> rewritting disabled and enabled.
> {code:sql}
> set hive.materializedview.rewriting.sql=false;
> SELECT * FROM EMPS WHERE BIRTH_EPOCH_SECS <= UNIX_TIMESTAMP();
> {code}
> {noformat}
> Victor1702302786
> Alex  1702302788
> Bob   1702302791
> {noformat}
> {code:sql}
> set hive.materializedview.rewriting.sql=true;
> SELECT * FROM EMPS WHERE BIRTH_EPOCH_SECS <= UNIX_TIMESTAMP();
> {code}
> {noformat}
> Victor1702302786
> Alex  1702302788
> {noformat}
> Naturally the second query should return more rows than the first one since 
> UNIX_TIMESTAMP is constantly growing. However, when view based rewritting is 
> in use the second query will use the results from the materialized view which 
> are by now obsolete (missing Bob entry).



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (HIVE-27948) Wrong results when using materialized views with non-deterministic/dynamic functions

2023-12-20 Thread Denys Kuzmenko (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-27948?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Denys Kuzmenko updated HIVE-27948:
--
Status: Patch Available  (was: Open)

> Wrong results when using materialized views with non-deterministic/dynamic 
> functions
> 
>
> Key: HIVE-27948
> URL: https://issues.apache.org/jira/browse/HIVE-27948
> Project: Hive
>  Issue Type: Bug
>  Components: CBO
>Affects Versions: 4.0.0-beta-1
>Reporter: Stamatis Zampetakis
>Assignee: Krisztian Kasa
>Priority: Critical
>  Labels: pull-request-available
> Attachments: materialized_view_unix_timestamp.q
>
>
> There are certain SQL functions that return different results across 
> different executions. Usually we refer to these functions as 
> non-deterministic or dynamic. Some examples are: UNIX_TIMESTAMP(), 
> CURRENT_TIMESTAMP, CURRENT_DATE, etc.
> When a materialized view definition contains such functions the queries that 
> are using this view may return wrong results.
> Consider the following scenario where we populate the employee table with 
> timestamps representing the future. For making this easily reproable in 
> self-contained test the timestamps are only a few seconds apart.
> {code:sql}
> CREATE TABLE EMPS (ENAME STRING, BIRTH_EPOCH_SECS INT) STORED AS ORC 
> TBLPROPERTIES ('transactional'='true');
> INSERT INTO EMPS
> VALUES ('Victor', UNIX_TIMESTAMP()),
>('Alex', UNIX_TIMESTAMP() + 2),
>('Bob', UNIX_TIMESTAMP() + 5),
>('Alice', UNIX_TIMESTAMP() + 10);
> CREATE MATERIALIZED VIEW v_emp AS SELECT * FROM EMPS WHERE BIRTH_EPOCH_SECS 
> <= UNIX_TIMESTAMP();
> {code}
> When the materialized view is created it is populated with only the rows that 
> match the timestamp at the given time.
> To demonstrate the problem run the following queries with view based 
> rewritting disabled and enabled.
> {code:sql}
> set hive.materializedview.rewriting.sql=false;
> SELECT * FROM EMPS WHERE BIRTH_EPOCH_SECS <= UNIX_TIMESTAMP();
> {code}
> {noformat}
> Victor1702302786
> Alex  1702302788
> Bob   1702302791
> {noformat}
> {code:sql}
> set hive.materializedview.rewriting.sql=true;
> SELECT * FROM EMPS WHERE BIRTH_EPOCH_SECS <= UNIX_TIMESTAMP();
> {code}
> {noformat}
> Victor1702302786
> Alex  1702302788
> {noformat}
> Naturally the second query should return more rows than the first one since 
> UNIX_TIMESTAMP is constantly growing. However, when view based rewritting is 
> in use the second query will use the results from the materialized view which 
> are by now obsolete (missing Bob entry).



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (HIVE-27948) Wrong results when using materialized views with non-deterministic/dynamic functions

2023-12-15 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-27948?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated HIVE-27948:
--
Labels: pull-request-available  (was: )

> Wrong results when using materialized views with non-deterministic/dynamic 
> functions
> 
>
> Key: HIVE-27948
> URL: https://issues.apache.org/jira/browse/HIVE-27948
> Project: Hive
>  Issue Type: Bug
>  Components: CBO
>Affects Versions: 4.0.0-beta-1
>Reporter: Stamatis Zampetakis
>Assignee: Krisztian Kasa
>Priority: Critical
>  Labels: pull-request-available
> Attachments: materialized_view_unix_timestamp.q
>
>
> There are certain SQL functions that return different results across 
> different executions. Usually we refer to these functions as 
> non-deterministic or dynamic. Some examples are: UNIX_TIMESTAMP(), 
> CURRENT_TIMESTAMP, CURRENT_DATE, etc.
> When a materialized view definition contains such functions the queries that 
> are using this view may return wrong results.
> Consider the following scenario where we populate the employee table with 
> timestamps representing the future. For making this easily reproable in 
> self-contained test the timestamps are only a few seconds apart.
> {code:sql}
> CREATE TABLE EMPS (ENAME STRING, BIRTH_EPOCH_SECS INT) STORED AS ORC 
> TBLPROPERTIES ('transactional'='true');
> INSERT INTO EMPS
> VALUES ('Victor', UNIX_TIMESTAMP()),
>('Alex', UNIX_TIMESTAMP() + 2),
>('Bob', UNIX_TIMESTAMP() + 5),
>('Alice', UNIX_TIMESTAMP() + 10);
> CREATE MATERIALIZED VIEW v_emp AS SELECT * FROM EMPS WHERE BIRTH_EPOCH_SECS 
> <= UNIX_TIMESTAMP();
> {code}
> When the materialized view is created it is populated with only the rows that 
> match the timestamp at the given time.
> To demonstrate the problem run the following queries with view based 
> rewritting disabled and enabled.
> {code:sql}
> set hive.materializedview.rewriting.sql=false;
> SELECT * FROM EMPS WHERE BIRTH_EPOCH_SECS <= UNIX_TIMESTAMP();
> {code}
> {noformat}
> Victor1702302786
> Alex  1702302788
> Bob   1702302791
> {noformat}
> {code:sql}
> set hive.materializedview.rewriting.sql=true;
> SELECT * FROM EMPS WHERE BIRTH_EPOCH_SECS <= UNIX_TIMESTAMP();
> {code}
> {noformat}
> Victor1702302786
> Alex  1702302788
> {noformat}
> Naturally the second query should return more rows than the first one since 
> UNIX_TIMESTAMP is constantly growing. However, when view based rewritting is 
> in use the second query will use the results from the materialized view which 
> are by now obsolete (missing Bob entry).



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (HIVE-27948) Wrong results when using materialized views with non-deterministic/dynamic functions

2023-12-13 Thread Ayush Saxena (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-27948?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ayush Saxena updated HIVE-27948:

Priority: Critical  (was: Major)

> Wrong results when using materialized views with non-deterministic/dynamic 
> functions
> 
>
> Key: HIVE-27948
> URL: https://issues.apache.org/jira/browse/HIVE-27948
> Project: Hive
>  Issue Type: Bug
>  Components: CBO
>Affects Versions: 4.0.0-beta-1
>Reporter: Stamatis Zampetakis
>Priority: Critical
> Attachments: materialized_view_unix_timestamp.q
>
>
> There are certain SQL functions that return different results across 
> different executions. Usually we refer to these functions as 
> non-deterministic or dynamic. Some examples are: UNIX_TIMESTAMP(), 
> CURRENT_TIMESTAMP, CURRENT_DATE, etc.
> When a materialized view definition contains such functions the queries that 
> are using this view may return wrong results.
> Consider the following scenario where we populate the employee table with 
> timestamps representing the future. For making this easily reproable in 
> self-contained test the timestamps are only a few seconds apart.
> {code:sql}
> CREATE TABLE EMPS (ENAME STRING, BIRTH_EPOCH_SECS INT) STORED AS ORC 
> TBLPROPERTIES ('transactional'='true');
> INSERT INTO EMPS
> VALUES ('Victor', UNIX_TIMESTAMP()),
>('Alex', UNIX_TIMESTAMP() + 2),
>('Bob', UNIX_TIMESTAMP() + 5),
>('Alice', UNIX_TIMESTAMP() + 10);
> CREATE MATERIALIZED VIEW v_emp AS SELECT * FROM EMPS WHERE BIRTH_EPOCH_SECS 
> <= UNIX_TIMESTAMP();
> {code}
> When the materialized view is created it is populated with only the rows that 
> match the timestamp at the given time.
> To demonstrate the problem run the following queries with view based 
> rewritting disabled and enabled.
> {code:sql}
> set hive.materializedview.rewriting.sql=false;
> SELECT * FROM EMPS WHERE BIRTH_EPOCH_SECS <= UNIX_TIMESTAMP();
> {code}
> {noformat}
> Victor1702302786
> Alex  1702302788
> Bob   1702302791
> {noformat}
> {code:sql}
> set hive.materializedview.rewriting.sql=true;
> SELECT * FROM EMPS WHERE BIRTH_EPOCH_SECS <= UNIX_TIMESTAMP();
> {code}
> {noformat}
> Victor1702302786
> Alex  1702302788
> {noformat}
> Naturally the second query should return more rows than the first one since 
> UNIX_TIMESTAMP is constantly growing. However, when view based rewritting is 
> in use the second query will use the results from the materialized view which 
> are by now obsolete (missing Bob entry).



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (HIVE-27948) Wrong results when using materialized views with non-deterministic/dynamic functions

2023-12-11 Thread Sungwoo Park (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-27948?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sungwoo Park updated HIVE-27948:

Description: 
There are certain SQL functions that return different results across different 
executions. Usually we refer to these functions as non-deterministic or 
dynamic. Some examples are: UNIX_TIMESTAMP(), CURRENT_TIMESTAMP, CURRENT_DATE, 
etc.

When a materialized view definition contains such functions the queries that 
are using this view may return wrong results.

Consider the following scenario where we populate the employee table with 
timestamps representing the future. For making this easily reproable in 
self-contained test the timestamps are only a few seconds apart.
{code:sql}
CREATE TABLE EMPS (ENAME STRING, BIRTH_EPOCH_SECS INT) STORED AS ORC 
TBLPROPERTIES ('transactional'='true');

INSERT INTO EMPS
VALUES ('Victor', UNIX_TIMESTAMP()),
   ('Alex', UNIX_TIMESTAMP() + 2),
   ('Bob', UNIX_TIMESTAMP() + 5),
   ('Alice', UNIX_TIMESTAMP() + 10);

CREATE MATERIALIZED VIEW v_emp AS SELECT * FROM EMPS WHERE BIRTH_EPOCH_SECS <= 
UNIX_TIMESTAMP();
{code}
When the materialized view is created it is populated with only the rows that 
match the timestamp at the given time.

To demonstrate the problem run the following queries with view based rewritting 
disabled and enabled.
{code:sql}
set hive.materializedview.rewriting.sql=false;
SELECT * FROM EMPS WHERE BIRTH_EPOCH_SECS <= UNIX_TIMESTAMP();
{code}
{noformat}
Victor  1702302786
Alex1702302788
Bob 1702302791
{noformat}
{code:sql}
set hive.materializedview.rewriting.sql=true;
SELECT * FROM EMPS WHERE BIRTH_EPOCH_SECS <= UNIX_TIMESTAMP();
{code}
{noformat}
Victor  1702302786
Alex1702302788
{noformat}
Naturally the second query should return more rows than the first one since 
UNIX_TIMESTAMP is constantly growing. However, when view based rewritting is in 
use the second query will use the results from the materialized view which are 
by now obsolete (missing Bob entry).

  was:
{color:red}colored text{color}There are certain SQL functions that return 
different results across different executions. Usually we refer to these 
functions as non-deterministic or dynamic. Some examples are: UNIX_TIMESTAMP(), 
CURRENT_TIMESTAMP, CURRENT_DATE, etc.

When a materialized view definition contains such functions the queries that 
are using this view may return wrong results.

Consider the following scenario where we populate the employee table with 
timestamps representing the future. For making this easily reproable in 
self-contained test the timestamps are only a few seconds apart.
{code:sql}
CREATE TABLE EMPS (ENAME STRING, BIRTH_EPOCH_SECS INT) STORED AS ORC 
TBLPROPERTIES ('transactional'='true');

INSERT INTO EMPS
VALUES ('Victor', UNIX_TIMESTAMP()),
   ('Alex', UNIX_TIMESTAMP() + 2),
   ('Bob', UNIX_TIMESTAMP() + 5),
   ('Alice', UNIX_TIMESTAMP() + 10);

CREATE MATERIALIZED VIEW v_emp AS SELECT * FROM EMPS WHERE BIRTH_EPOCH_SECS <= 
UNIX_TIMESTAMP();
{code}
When the materialized view is created it is populated with only the rows that 
match the timestamp at the given time.

To demonstrate the problem run the following queries with view based rewritting 
disabled and enabled.
{code:sql}
set hive.materializedview.rewriting.sql=false;
SELECT * FROM EMPS WHERE BIRTH_EPOCH_SECS <= UNIX_TIMESTAMP();
{code}
{noformat}
Victor  1702302786
Alex1702302788
Bob 1702302791
{noformat}
{code:sql}
set hive.materializedview.rewriting.sql=true;
SELECT * FROM EMPS WHERE BIRTH_EPOCH_SECS <= UNIX_TIMESTAMP();
{code}
{noformat}
Victor  1702302786
Alex1702302788
{noformat}
Naturally the second query should return more rows than the first one since 
UNIX_TIMESTAMP is constantly growing. However, when view based rewritting is in 
use the second query will use the results from the materialized view which are 
by now obsolete (missing Bob entry).


> Wrong results when using materialized views with non-deterministic/dynamic 
> functions
> 
>
> Key: HIVE-27948
> URL: https://issues.apache.org/jira/browse/HIVE-27948
> Project: Hive
>  Issue Type: Bug
>  Components: CBO
>Affects Versions: 4.0.0-beta-1
>Reporter: Stamatis Zampetakis
>Priority: Major
> Attachments: materialized_view_unix_timestamp.q
>
>
> There are certain SQL functions that return different results across 
> different executions. Usually we refer to these functions as 
> non-deterministic or dynamic. Some examples are: UNIX_TIMESTAMP(), 
> CURRENT_TIMESTAMP, CURRENT_DATE, etc.
> When a materialized view definition contains such functions the queries that 
> are using this view may return wrong results.
> Consider the following scenario where we populate the employee table with 
> timestamps represent

[jira] [Updated] (HIVE-27948) Wrong results when using materialized views with non-deterministic/dynamic functions

2023-12-11 Thread Sungwoo Park (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-27948?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sungwoo Park updated HIVE-27948:

Description: 
{color:red}colored text{color}There are certain SQL functions that return 
different results across different executions. Usually we refer to these 
functions as non-deterministic or dynamic. Some examples are: UNIX_TIMESTAMP(), 
CURRENT_TIMESTAMP, CURRENT_DATE, etc.

When a materialized view definition contains such functions the queries that 
are using this view may return wrong results.

Consider the following scenario where we populate the employee table with 
timestamps representing the future. For making this easily reproable in 
self-contained test the timestamps are only a few seconds apart.
{code:sql}
CREATE TABLE EMPS (ENAME STRING, BIRTH_EPOCH_SECS INT) STORED AS ORC 
TBLPROPERTIES ('transactional'='true');

INSERT INTO EMPS
VALUES ('Victor', UNIX_TIMESTAMP()),
   ('Alex', UNIX_TIMESTAMP() + 2),
   ('Bob', UNIX_TIMESTAMP() + 5),
   ('Alice', UNIX_TIMESTAMP() + 10);

CREATE MATERIALIZED VIEW v_emp AS SELECT * FROM EMPS WHERE BIRTH_EPOCH_SECS <= 
UNIX_TIMESTAMP();
{code}
When the materialized view is created it is populated with only the rows that 
match the timestamp at the given time.

To demonstrate the problem run the following queries with view based rewritting 
disabled and enabled.
{code:sql}
set hive.materializedview.rewriting.sql=false;
SELECT * FROM EMPS WHERE BIRTH_EPOCH_SECS <= UNIX_TIMESTAMP();
{code}
{noformat}
Victor  1702302786
Alex1702302788
Bob 1702302791
{noformat}
{code:sql}
set hive.materializedview.rewriting.sql=true;
SELECT * FROM EMPS WHERE BIRTH_EPOCH_SECS <= UNIX_TIMESTAMP();
{code}
{noformat}
Victor  1702302786
Alex1702302788
{noformat}
Naturally the second query should return more rows than the first one since 
UNIX_TIMESTAMP is constantly growing. However, when view based rewritting is in 
use the second query will use the results from the materialized view which are 
by now obsolete (missing Bob entry).

  was:
There are certain SQL functions that return different results across different 
executions. Usually we refer to these functions as non-deterministic or 
dynamic. Some examples are: UNIX_TIMESTAMP(), CURRENT_TIMESTAMP, CURRENT_DATE, 
etc.

When a materialized view definition contains such functions the queries that 
are using this view may return wrong results.

Consider the following scenario where we populate the employee table with 
timestamps representing the future. For making this easily reproable in 
self-contained test the timestamps are only a few seconds apart.
{code:sql}
CREATE TABLE EMPS (ENAME STRING, BIRTH_EPOCH_SECS INT) STORED AS ORC 
TBLPROPERTIES ('transactional'='true');

INSERT INTO EMPS
VALUES ('Victor', UNIX_TIMESTAMP()),
   ('Alex', UNIX_TIMESTAMP() + 2),
   ('Bob', UNIX_TIMESTAMP() + 5),
   ('Alice', UNIX_TIMESTAMP() + 10);

CREATE MATERIALIZED VIEW v_emp AS SELECT * FROM EMPS WHERE BIRTH_EPOCH_SECS <= 
UNIX_TIMESTAMP();
{code}
When the materialized view is created it is populated with only the rows that 
match the timestamp at the given time.

To demonstrate the problem run the following queries with view based rewritting 
disabled and enabled.
{code:sql}
set hive.materializedview.rewriting.sql=false;
SELECT * FROM EMPS WHERE BIRTH_EPOCH_SECS <= UNIX_TIMESTAMP();
{code}
{noformat}
Victor  1702302786
Alex1702302788
Bob 1702302791
{noformat}
{code:sql}
set hive.materializedview.rewriting.sql=true;
SELECT * FROM EMPS WHERE BIRTH_EPOCH_SECS <= UNIX_TIMESTAMP();
{code}
{noformat}
Victor  1702302786
Alex1702302788
{noformat}
Naturally the second query should return more rows than the first one since 
UNIX_TIMESTAMP is constantly growing. However, when view based rewritting is in 
use the second query will use the results from the materialized view which are 
by now obsolete (missing Bob entry).


> Wrong results when using materialized views with non-deterministic/dynamic 
> functions
> 
>
> Key: HIVE-27948
> URL: https://issues.apache.org/jira/browse/HIVE-27948
> Project: Hive
>  Issue Type: Bug
>  Components: CBO
>Affects Versions: 4.0.0-beta-1
>Reporter: Stamatis Zampetakis
>Priority: Major
> Attachments: materialized_view_unix_timestamp.q
>
>
> {color:red}colored text{color}There are certain SQL functions that return 
> different results across different executions. Usually we refer to these 
> functions as non-deterministic or dynamic. Some examples are: 
> UNIX_TIMESTAMP(), CURRENT_TIMESTAMP, CURRENT_DATE, etc.
> When a materialized view definition contains such functions the queries that 
> are using this view may return wrong results.
> Consider the following scenario where we populate the employee tabl

[jira] [Updated] (HIVE-27948) Wrong results when using materialized views with non-deterministic/dynamic functions

2023-12-11 Thread Stamatis Zampetakis (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-27948?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Stamatis Zampetakis updated HIVE-27948:
---
Attachment: materialized_view_unix_timestamp.q

> Wrong results when using materialized views with non-deterministic/dynamic 
> functions
> 
>
> Key: HIVE-27948
> URL: https://issues.apache.org/jira/browse/HIVE-27948
> Project: Hive
>  Issue Type: Bug
>  Components: CBO
>Affects Versions: 4.0.0-beta-1
>Reporter: Stamatis Zampetakis
>Priority: Major
> Attachments: materialized_view_unix_timestamp.q
>
>
> There are certain SQL functions that return different results across 
> different executions. Usually we refer to these functions as 
> non-deterministic or dynamic. Some examples are: UNIX_TIMESTAMP(), 
> CURRENT_TIMESTAMP, CURRENT_DATE, etc.
> When a materialized view definition contains such functions the queries that 
> are using this view may return wrong results.
> Consider the following scenario where we populate the employee table with 
> timestamps representing the future. For making this easily reproable in 
> self-contained test the timestamps are only a few seconds apart.
> {code:sql}
> CREATE TABLE EMPS (ENAME STRING, BIRTH_EPOCH_SECS INT) STORED AS ORC 
> TBLPROPERTIES ('transactional'='true');
> INSERT INTO EMPS
> VALUES ('Victor', UNIX_TIMESTAMP()),
>('Alex', UNIX_TIMESTAMP() + 2),
>('Bob', UNIX_TIMESTAMP() + 5),
>('Alice', UNIX_TIMESTAMP() + 10);
> CREATE MATERIALIZED VIEW v_emp AS SELECT * FROM EMPS WHERE BIRTH_EPOCH_SECS 
> <= UNIX_TIMESTAMP();
> {code}
> When the materialized view is created it is populated with only the rows that 
> match the timestamp at the given time.
> To demonstrate the problem run the following queries with view based 
> rewritting disabled and enabled.
> {code:sql}
> set hive.materializedview.rewriting.sql=false;
> SELECT * FROM EMPS WHERE BIRTH_EPOCH_SECS <= UNIX_TIMESTAMP();
> {code}
> {noformat}
> Victor1702302786
> Alex  1702302788
> Bob   1702302791
> {noformat}
> {code:sql}
> set hive.materializedview.rewriting.sql=true;
> SELECT * FROM EMPS WHERE BIRTH_EPOCH_SECS <= UNIX_TIMESTAMP();
> {code}
> {noformat}
> Victor1702302786
> Alex  1702302788
> {noformat}
> Naturally the second query should return more rows than the first one since 
> UNIX_TIMESTAMP is constantly growing. However, when view based rewritting is 
> in use the second query will use the results from the materialized view which 
> are by now obsolete (missing Bob entry).



--
This message was sent by Atlassian Jira
(v8.20.10#820010)