[jira] [Comment Edited] (HIVE-27712) GenericUDAFNumericStatsEvaluator throws NPE
[ https://issues.apache.org/jira/browse/HIVE-27712?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17793539#comment-17793539 ] liang yu edited comment on HIVE-27712 at 12/6/23 7:56 AM: -- [~zhangbutao] Thanks for your comments. I checked the patch; it deprecates the compute_stats function and uses other functions to get the same result, which introduces too many changes. As I mentioned in my solution and description, this is just a bug in the compute_stats function, which can be fixed easily; if we make so many new changes, there might be more problems and bugs. was (Author: JIRAUSER299608): [~zhangbutao] I checked the patch; it deprecates the compute_stats function and uses other functions to get the same result, which introduces too many changes. As I mentioned in my solution and description, this is just a bug in the compute_stats function, which can be fixed easily; if we make so many new changes, there might be more problems and bugs. > GenericUDAFNumericStatsEvaluator throws NPE > --- > > Key: HIVE-27712 > URL: https://issues.apache.org/jira/browse/HIVE-27712 > Project: Hive > Issue Type: Bug >Reporter: liang yu >Assignee: liang yu >Priority: Major > Labels: pull-request-available > Attachments: image-2023-09-19-16-33-49-881.png > > > Using Hadoop 3.3.4 and Hive 3.1.3. > When I set the config: > {code:java} > set hive.groupby.skewindata=true; > set hive.map.aggr=true; {code} > and execute a SQL query with a group-by and a join, I get the NullPointerException below: > > !image-2023-09-19-16-33-49-881.png! -- This message was sent by Atlassian Jira (v8.20.10#820010)
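The failure mode described in the issue can be sketched as follows. This is a hypothetical, simplified model of the NPE pattern in a numeric-stats aggregator, not Hive's actual GenericUDAFNumericStatsEvaluator code: the `Partial`/`Agg` classes and both merge methods are illustrative names only.

```java
// Hypothetical, simplified sketch -- not Hive's actual code. Under
// hive.groupby.skewindata=true an extra reduce stage can hand the merge
// phase a partial result whose min/max were never set (an all-NULL group);
// merging it without a null guard dereferences a null field.
class NumericStatsSketch {

    static class Partial {
        Long min;        // stays null when the partial saw no non-NULL values
        Long max;
        long countNulls;
    }

    static class Agg {
        Long min;
        Long max;
        long countNulls;

        // Unguarded merge: throws NullPointerException when this aggregation
        // already holds a min/max and the incoming partial carries none
        // (Math.min unboxes the null p.min).
        void mergeUnsafe(Partial p) {
            countNulls += p.countNulls;
            min = (min == null) ? p.min : Math.min(min, p.min);
            max = (max == null) ? p.max : Math.max(max, p.max);
        }

        // Guarded merge: skip the min/max update for an "empty" partial.
        void mergeSafe(Partial p) {
            countNulls += p.countNulls;
            if (p.min != null) {
                min = (min == null) ? p.min : Math.min(min, p.min);
            }
            if (p.max != null) {
                max = (max == null) ? p.max : Math.max(max, p.max);
            }
        }
    }
}
```

This matches the reporter's point that the bug is local to the stats aggregation merge and can be fixed with a small guard rather than a rewrite.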
[jira] [Commented] (HIVE-27712) GenericUDAFNumericStatsEvaluator throws NPE
[ https://issues.apache.org/jira/browse/HIVE-27712?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17793539#comment-17793539 ] liang yu commented on HIVE-27712: - [~zhangbutao] I checked the patch; it deprecates the compute_stats function and uses other functions to get the same result, which introduces too many changes. As I mentioned in my solution and description, this is just a bug in the compute_stats function, which can be fixed easily; if we make so many new changes, there might be more problems and bugs. > GenericUDAFNumericStatsEvaluator throws NPE > --- > > Key: HIVE-27712 > URL: https://issues.apache.org/jira/browse/HIVE-27712 > Project: Hive > Issue Type: Bug >Reporter: liang yu >Assignee: liang yu >Priority: Major > Labels: pull-request-available > Attachments: image-2023-09-19-16-33-49-881.png > > > Using Hadoop 3.3.4 and Hive 3.1.3. > When I set the config: > {code:java} > set hive.groupby.skewindata=true; > set hive.map.aggr=true; {code} > and execute a SQL query with a group-by and a join, I get the NullPointerException below: > > !image-2023-09-19-16-33-49-881.png!
[jira] [Updated] (HIVE-27925) HiveConf: unify ConfVars enum and use underscore for better readability
[ https://issues.apache.org/jira/browse/HIVE-27925?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated HIVE-27925: -- Labels: pull-request-available (was: ) > HiveConf: unify ConfVars enum and use underscore for better readability > > > Key: HIVE-27925 > URL: https://issues.apache.org/jira/browse/HIVE-27925 > Project: Hive > Issue Type: Improvement >Reporter: László Bodor >Assignee: Kokila N >Priority: Major > Labels: pull-request-available > Fix For: 4.0.0 > > > When I read something like > "[BASICSTATSTASKSMAXTHREADSFACTOR|https://github.com/apache/hive/blob/70f34e27349dccf5fabbfc6c63e63c7be0785360/common/src/java/org/apache/hadoop/hive/conf/HiveConf.java#L753]" > I feel someone in the world laughs out loud thinking of me struggling. I can > read it, but I hate it :) Imagine if we had vars like > [HIVE_MATERIALIZED_VIEW_ENABLE_AUTO_REWRITING_SUBQUERY_SQL|https://github.com/apache/hive/blob/70f34e27349dccf5fabbfc6c63e63c7be0785360/common/src/java/org/apache/hadoop/hive/conf/HiveConf.java#L1921] > without underscores... okay, let me help, it is: > HIVEMATERIALIZEDVIEWENABLEAUTOREWRITINGSUBQUERYSQL :D > Please, let's fix this in 4.0.0
[jira] [Work started] (HIVE-27925) HiveConf: unify ConfVars enum and use underscore for better readability
[ https://issues.apache.org/jira/browse/HIVE-27925?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Work on HIVE-27925 started by Kokila N. --- > HiveConf: unify ConfVars enum and use underscore for better readability > > > Key: HIVE-27925 > URL: https://issues.apache.org/jira/browse/HIVE-27925 > Project: Hive > Issue Type: Improvement >Reporter: László Bodor >Assignee: Kokila N >Priority: Major > Fix For: 4.0.0 > > > When I read something like > "[BASICSTATSTASKSMAXTHREADSFACTOR|https://github.com/apache/hive/blob/70f34e27349dccf5fabbfc6c63e63c7be0785360/common/src/java/org/apache/hadoop/hive/conf/HiveConf.java#L753]" > I feel someone in the world laughs out loud thinking of me struggling. I can > read it, but I hate it :) Imagine if we had vars like > [HIVE_MATERIALIZED_VIEW_ENABLE_AUTO_REWRITING_SUBQUERY_SQL|https://github.com/apache/hive/blob/70f34e27349dccf5fabbfc6c63e63c7be0785360/common/src/java/org/apache/hadoop/hive/conf/HiveConf.java#L1921] > without underscores... okay, let me help, it is: > HIVEMATERIALIZEDVIEWENABLEAUTOREWRITINGSUBQUERYSQL :D > Please, let's fix this in 4.0.0
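The shape of the rename can be sketched as below. This is an illustrative example only, not the actual HIVE-27925 patch, and the config key string is a made-up placeholder: the point is that only the Java enum constant gains underscores, while the user-facing configuration key is untouched, so no user config breaks.

```java
// Illustrative sketch; "hive.example.stats.threads.factor" is a placeholder
// key, not a real Hive configuration property. Only the enum constant name
// changes for readability; the key string stays stable.
enum ConfVarsSketch {
    // before the rename this constant would be spelled EXAMPLESTATSTHREADSFACTOR
    EXAMPLE_STATS_THREADS_FACTOR("hive.example.stats.threads.factor");

    final String varname;

    ConfVarsSketch(String varname) {
        this.varname = varname;
    }
}
```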
[jira] [Work started] (HIVE-27556) Add Unit Test for KafkaStorageHandlerInfo
[ https://issues.apache.org/jira/browse/HIVE-27556?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Work on HIVE-27556 started by Kokila N. --- > Add Unit Test for KafkaStorageHandlerInfo > - > > Key: HIVE-27556 > URL: https://issues.apache.org/jira/browse/HIVE-27556 > Project: Hive > Issue Type: Test > Components: kafka integration, StorageHandler >Reporter: Kokila N >Assignee: Kokila N >Priority: Major > Labels: pull-request-available > > Adding unit tests for KafkaStorageHandlerInfo.
[jira] [Commented] (HIVE-27894) Enhance HMS Handler Logs for all 'get_partition' functions.
[ https://issues.apache.org/jira/browse/HIVE-27894?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17793502#comment-17793502 ] Chinna Rao Lalam commented on HIVE-27894: - Merged to master !! Thanks for the patch [~shivijha30] > Enhance HMS Handler Logs for all 'get_partition' functions. > --- > > Key: HIVE-27894 > URL: https://issues.apache.org/jira/browse/HIVE-27894 > Project: Hive > Issue Type: Improvement >Reporter: Shivangi Jha >Assignee: Shivangi Jha >Priority: Major > Labels: pull-request-available > > The HMSHandler > (standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/HMSHandler.java) > class encompasses various functions pertaining to partition information, yet > its current implementation lacks comprehensive logging of substantial > partition data. Enhancing this aspect would significantly contribute to > improved log readability and facilitate more effective debugging processes.
[jira] [Resolved] (HIVE-27894) Enhance HMS Handler Logs for all 'get_partition' functions.
[ https://issues.apache.org/jira/browse/HIVE-27894?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chinna Rao Lalam resolved HIVE-27894. - Fix Version/s: 4.1.0 Resolution: Fixed > Enhance HMS Handler Logs for all 'get_partition' functions. > --- > > Key: HIVE-27894 > URL: https://issues.apache.org/jira/browse/HIVE-27894 > Project: Hive > Issue Type: Improvement >Reporter: Shivangi Jha >Assignee: Shivangi Jha >Priority: Major > Labels: pull-request-available > Fix For: 4.1.0 > > > The HMSHandler > (standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/HMSHandler.java) > class encompasses various functions pertaining to partition information, yet > its current implementation lacks comprehensive logging of substantial > partition data. Enhancing this aspect would significantly contribute to > improved log readability and facilitate more effective debugging processes.
[jira] [Commented] (HIVE-27894) Enhance HMS Handler Logs for all 'get_partition' functions.
[ https://issues.apache.org/jira/browse/HIVE-27894?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17793500#comment-17793500 ] Chinna Rao Lalam commented on HIVE-27894: - +1 LGTM > Enhance HMS Handler Logs for all 'get_partition' functions. > --- > > Key: HIVE-27894 > URL: https://issues.apache.org/jira/browse/HIVE-27894 > Project: Hive > Issue Type: Improvement >Reporter: Shivangi Jha >Assignee: Shivangi Jha >Priority: Major > Labels: pull-request-available > > The HMSHandler > (standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/HMSHandler.java) > class encompasses various functions pertaining to partition information, yet > its current implementation lacks comprehensive logging of substantial > partition data. Enhancing this aspect would significantly contribute to > improved log readability and facilitate more effective debugging processes.
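The kind of logging the issue asks for can be illustrated as below. This is a hypothetical sketch, not the actual HIVE-27894 patch; `PartitionRequestLogSketch` and `describe` are made-up names. The idea is that each get_partition* entry point in HMSHandler logs its inputs, so a slow or failing metastore call can be traced to a concrete database/table/partition spec.

```java
import java.util.List;

// Hypothetical illustration only: build one structured log line per
// get_partition* request from its input parameters, so the metastore log
// shows which db/table/partition values a call was operating on.
class PartitionRequestLogSketch {
    static String describe(String method, String db, String table, List<String> partVals) {
        return method + " request: db=" + db + " tbl=" + table + " partVals=" + partVals;
    }
}
```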
[jira] [Work started] (HIVE-27924) Incremental rebuild goes wrong when inserts and deletes overlap between the source tables
[ https://issues.apache.org/jira/browse/HIVE-27924?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Work on HIVE-27924 started by Krisztian Kasa. - > Incremental rebuild goes wrong when inserts and deletes overlap between the > source tables > - > > Key: HIVE-27924 > URL: https://issues.apache.org/jira/browse/HIVE-27924 > Project: Hive > Issue Type: Bug > Components: Materialized views >Affects Versions: 4.0.0-beta-1 > Environment: * Docker version : 19.03.6 > * Hive version : 4.0.0-beta-1 > * Driver version : Hive JDBC (4.0.0-beta-1) > * Beeline version : 4.0.0-beta-1 >Reporter: Wenhao Li >Assignee: Krisztian Kasa >Priority: Critical > Labels: bug, hive, materializedviews > Attachments: 截图.PNG, 截图1.PNG, 截图2.PNG, 截图3.PNG, 截图4.PNG, 截图5.PNG, > 截图6.PNG, 截图7.PNG, 截图8.PNG, 截图9.PNG > > > h1. Summary > The incremental rebuild plan and execution output are incorrect when one side > of the table join has inserted/deleted join keys that the other side has > deleted/inserted (note the order). > The argument is that tuples that have never been present simultaneously > should not interact with one another, i.e., one's inserts should not join the > other's deletes. > h1. Related Test Case > The bug was discovered during replication of the test case: > ??hive/ql/src/test/queries/clientpositive/materialized_view_create_rewrite_5.q?? > h1. 
Steps to Reproduce the Issue > # Configurations: > {code:sql} > SET hive.vectorized.execution.enabled=false; > set hive.support.concurrency=true; > set hive.txn.manager=org.apache.hadoop.hive.ql.lockmgr.DbTxnManager; > set hive.strict.checks.cartesian.product=false; > set hive.materializedview.rewriting=true;{code} > # > {code:sql} > create table cmv_basetable_n6 (a int, b varchar(256), c decimal(10,2), d int) > stored as orc TBLPROPERTIES ('transactional'='true'); {code} > # > {code:sql} > insert into cmv_basetable_n6 values > (1, 'alfred', 10.30, 2), > (1, 'charlie', 20.30, 2); {code} > # > {code:sql} > create table cmv_basetable_2_n3 (a int, b varchar(256), c decimal(10,2), d > int) stored as orc TBLPROPERTIES ('transactional'='true'); {code} > # > {code:sql} > insert into cmv_basetable_2_n3 values > (1, 'bob', 30.30, 2), > (1, 'bonnie', 40.30, 2);{code} > # > {code:sql} > CREATE MATERIALIZED VIEW cmv_mat_view_n6 TBLPROPERTIES > ('transactional'='true') AS > SELECT cmv_basetable_n6.a, cmv_basetable_2_n3.c > FROM cmv_basetable_n6 JOIN cmv_basetable_2_n3 ON (cmv_basetable_n6.a = > cmv_basetable_2_n3.a) > WHERE cmv_basetable_2_n3.c > 10.0;{code} > # > {code:sql} > show tables; {code} > !截图.PNG! > # Select tuples, including deletion and with VirtualColumn's, from the MV > and source tables. We see that the MV is correctly built upon creation: > {code:sql} > SELECT ROW__IS__DELETED, ROW__ID, * FROM > cmv_mat_view_n6('acid.fetch.deleted.rows'='true');{code} > !截图1.PNG! > # > {code:sql} > SELECT ROW__IS__DELETED, ROW__ID, * FROM > cmv_basetable_n6('acid.fetch.deleted.rows'='true'); {code} > !截图2.PNG! > # > {code:sql} > SELECT ROW__IS__DELETED, ROW__ID, * FROM > cmv_basetable_2_n3('acid.fetch.deleted.rows'='true'); {code} > !截图3.PNG! 
> # Now make an insert to the LHS and a delete to the RHS source table: > {code:sql} > insert into cmv_basetable_n6 values > (1, 'kevin', 50.30, 2); > DELETE FROM cmv_basetable_2_n3 WHERE b = 'bonnie';{code} > # Select again to see what happened: > {code:sql} > SELECT ROW__IS__DELETED, ROW__ID, * FROM > cmv_basetable_n6('acid.fetch.deleted.rows'='true'); {code} > !截图4.PNG! > # > {code:sql} > SELECT ROW__IS__DELETED, ROW__ID, * FROM > cmv_basetable_2_n3('acid.fetch.deleted.rows'='true'); {code} > !截图5.PNG! > # Use {{EXPLAIN CBO}} to produce the incremental rebuild plan for the MV, > which is incorrect already: > {code:sql} > EXPLAIN CBO > ALTER MATERIALIZED VIEW cmv_mat_view_n6 REBUILD; {code} > !截图6.PNG! > # Rebuild MV and see (incorrect) results: > {code:sql} > ALTER MATERIALIZED VIEW cmv_mat_view_n6 REBUILD; > SELECT ROW__IS__DELETED, ROW__ID, * FROM > cmv_mat_view_n6('acid.fetch.deleted.rows'='true');{code} > !截图7.PNG! > # Run MV definition directly, which outputs incorrect results because the MV > is enabled for MV-based query rewrite, i.e., the following query will output > what's in the MV for the time being: > {code:sql} > SELECT cmv_basetable_n6.a, cmv_basetable_2_n3.c > FROM cmv_basetable_n6 JOIN cmv_basetable_2_n3 ON (cmv_basetable_n6.a = > cmv_basetable_2_n3.a) > WHERE cmv_basetable_2_n3.c > 10.0; {code} > !截图8.PNG! > # Disable MV-based query rewrite for the MV and re-run the definition, which > should give the correct results: > {code:sql} > ALTER MATERIALIZED VIEW cmv_mat_view_n6 DISABLE REWRITE; > SELECT
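The correctness criterion behind the repro above can be modeled with a toy recompute. This is a made-up helper, not Hive code: whatever incremental plan the rebuild uses, its result must equal a full recompute of the view over the current table states. With the repro's data, three left rows with key 1 after the insert and a single surviving right row (c = 30.30) after the delete, the correct view has exactly 3 rows; the left insert must never join the right side's deleted row because those tuples never coexisted.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Toy reference semantics for the materialized view: a full recompute of
// the join over the *current* table states. An incremental rebuild is
// correct only if it produces the same rows.
class MvRebuildSketch {
    static List<String> recompute(List<Integer> leftKeys, Map<Integer, List<Double>> rightByKey) {
        List<String> view = new ArrayList<>();
        for (int a : leftKeys) {                                     // join on column a
            for (double c : rightByKey.getOrDefault(a, List.of())) {
                if (c > 10.0) {                                      // WHERE c > 10.0
                    view.add(a + "," + c);
                }
            }
        }
        return view;
    }
}
```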
[jira] [Assigned] (HIVE-27924) Incremental rebuild goes wrong when inserts and deletes overlap between the source tables
[ https://issues.apache.org/jira/browse/HIVE-27924?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Krisztian Kasa reassigned HIVE-27924: - Assignee: Krisztian Kasa > Incremental rebuild goes wrong when inserts and deletes overlap between the > source tables > - > > Key: HIVE-27924 > URL: https://issues.apache.org/jira/browse/HIVE-27924 > Project: Hive > Issue Type: Bug > Components: Materialized views >Affects Versions: 4.0.0-beta-1 > Environment: * Docker version : 19.03.6 > * Hive version : 4.0.0-beta-1 > * Driver version : Hive JDBC (4.0.0-beta-1) > * Beeline version : 4.0.0-beta-1 >Reporter: Wenhao Li >Assignee: Krisztian Kasa >Priority: Critical > Labels: bug, hive, materializedviews > Attachments: 截图.PNG, 截图1.PNG, 截图2.PNG, 截图3.PNG, 截图4.PNG, 截图5.PNG, > 截图6.PNG, 截图7.PNG, 截图8.PNG, 截图9.PNG > > > h1. Summary > The incremental rebuild plan and execution output are incorrect when one side > of the table join has inserted/deleted join keys that the other side has > deleted/inserted (note the order). > The argument is that tuples that have never been present simultaneously > should not interact with one another, i.e., one's inserts should not join the > other's deletes. > h1. Related Test Case > The bug was discovered during replication of the test case: > ??hive/ql/src/test/queries/clientpositive/materialized_view_create_rewrite_5.q?? > h1. 
Steps to Reproduce the Issue > # Configurations: > {code:sql} > SET hive.vectorized.execution.enabled=false; > set hive.support.concurrency=true; > set hive.txn.manager=org.apache.hadoop.hive.ql.lockmgr.DbTxnManager; > set hive.strict.checks.cartesian.product=false; > set hive.materializedview.rewriting=true;{code} > # > {code:sql} > create table cmv_basetable_n6 (a int, b varchar(256), c decimal(10,2), d int) > stored as orc TBLPROPERTIES ('transactional'='true'); {code} > # > {code:sql} > insert into cmv_basetable_n6 values > (1, 'alfred', 10.30, 2), > (1, 'charlie', 20.30, 2); {code} > # > {code:sql} > create table cmv_basetable_2_n3 (a int, b varchar(256), c decimal(10,2), d > int) stored as orc TBLPROPERTIES ('transactional'='true'); {code} > # > {code:sql} > insert into cmv_basetable_2_n3 values > (1, 'bob', 30.30, 2), > (1, 'bonnie', 40.30, 2);{code} > # > {code:sql} > CREATE MATERIALIZED VIEW cmv_mat_view_n6 TBLPROPERTIES > ('transactional'='true') AS > SELECT cmv_basetable_n6.a, cmv_basetable_2_n3.c > FROM cmv_basetable_n6 JOIN cmv_basetable_2_n3 ON (cmv_basetable_n6.a = > cmv_basetable_2_n3.a) > WHERE cmv_basetable_2_n3.c > 10.0;{code} > # > {code:sql} > show tables; {code} > !截图.PNG! > # Select tuples, including deletion and with VirtualColumn's, from the MV > and source tables. We see that the MV is correctly built upon creation: > {code:sql} > SELECT ROW__IS__DELETED, ROW__ID, * FROM > cmv_mat_view_n6('acid.fetch.deleted.rows'='true');{code} > !截图1.PNG! > # > {code:sql} > SELECT ROW__IS__DELETED, ROW__ID, * FROM > cmv_basetable_n6('acid.fetch.deleted.rows'='true'); {code} > !截图2.PNG! > # > {code:sql} > SELECT ROW__IS__DELETED, ROW__ID, * FROM > cmv_basetable_2_n3('acid.fetch.deleted.rows'='true'); {code} > !截图3.PNG! 
> # Now make an insert to the LHS and a delete to the RHS source table: > {code:sql} > insert into cmv_basetable_n6 values > (1, 'kevin', 50.30, 2); > DELETE FROM cmv_basetable_2_n3 WHERE b = 'bonnie';{code} > # Select again to see what happened: > {code:sql} > SELECT ROW__IS__DELETED, ROW__ID, * FROM > cmv_basetable_n6('acid.fetch.deleted.rows'='true'); {code} > !截图4.PNG! > # > {code:sql} > SELECT ROW__IS__DELETED, ROW__ID, * FROM > cmv_basetable_2_n3('acid.fetch.deleted.rows'='true'); {code} > !截图5.PNG! > # Use {{EXPLAIN CBO}} to produce the incremental rebuild plan for the MV, > which is incorrect already: > {code:sql} > EXPLAIN CBO > ALTER MATERIALIZED VIEW cmv_mat_view_n6 REBUILD; {code} > !截图6.PNG! > # Rebuild MV and see (incorrect) results: > {code:sql} > ALTER MATERIALIZED VIEW cmv_mat_view_n6 REBUILD; > SELECT ROW__IS__DELETED, ROW__ID, * FROM > cmv_mat_view_n6('acid.fetch.deleted.rows'='true');{code} > !截图7.PNG! > # Run MV definition directly, which outputs incorrect results because the MV > is enabled for MV-based query rewrite, i.e., the following query will output > what's in the MV for the time being: > {code:sql} > SELECT cmv_basetable_n6.a, cmv_basetable_2_n3.c > FROM cmv_basetable_n6 JOIN cmv_basetable_2_n3 ON (cmv_basetable_n6.a = > cmv_basetable_2_n3.a) > WHERE cmv_basetable_2_n3.c > 10.0; {code} > !截图8.PNG! > # Disable MV-based query rewrite for the MV and re-run the definition, which > should give the correct results: > {code:sql} > ALTER MATERIALIZED VIEW cmv_mat_view_n6 DISABLE REWRITE; >
[jira] [Commented] (HIVE-27226) FullOuterJoin with filter expressions is not computed correctly
[ https://issues.apache.org/jira/browse/HIVE-27226?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17793209#comment-17793209 ] Seonggon Namgung commented on HIVE-27226: - [~dkuzmenko], I think that would not take much time; it seems that we can disable the HIVE-18908 optimization by adding an extra condition in ConvertJoinMapJoin.getMapJoinConversion(). > FullOuterJoin with filter expressions is not computed correctly > --- > > Key: HIVE-27226 > URL: https://issues.apache.org/jira/browse/HIVE-27226 > Project: Hive > Issue Type: Bug >Reporter: Seonggon Namgung >Priority: Major > Labels: hive-4.0.0-must > > I tested many OuterJoin queries as an extension of HIVE-27138, and I found > that Hive returns an incorrect result for a query containing a FullOuterJoin with > filter expressions. In a nutshell, all JoinOperators that run on the Tez engine > return incorrect results for OuterJoin queries, and one of the reasons for the > incorrect computation lies in CommonJoinOperator, which is the base of all > JoinOperators. I attached the queries and configuration that I used at the > bottom of the document. I am still inspecting these problems, and I will share > an update when I find another cause. Any comments and opinions > would be appreciated. > First of all, I observed that current Hive ignores filter expressions > contained in MapJoinOperator. For example, the attached result of query1 > shows that MapJoinOperator performs an inner join, not a full outer join. This > problem stems from the removal of the filterMap. When converting JoinOperator to > MapJoinOperator, ConvertJoinMapJoin#convertJoinDynamicPartitionedHashJoin() > removes the filterMap of the MapJoinOperator. Because MapJoinOperator does not > evaluate filter expressions if filterMap is null, this change makes > MapJoinOperator ignore filter expressions, and it always joins tables > regardless of whether they satisfy the filter expressions.
To solve this > problem, I disabled FullOuterMapJoinOptimization and applied a patch for > HIVE-27138, which prevents the NPE. (The patch is available at the following > link: LINK.) The rest of this document uses this modified Hive, but most of the > problems occur with current Hive, too. > The second problem I found is that Hive returns the same left-null or > right-null rows multiple times when it uses MapJoinOperator or > CommonMergeJoinOperator. This is caused by the logic of the current > CommonJoinOperator. Both JoinOperators join tables in two steps. > First, they create RowContainers, each of which is a group of rows from one > table sharing the same key. Second, they call > CommonJoinOperator#checkAndGenObject() with the created RowContainers. This > method checks the filterTag of each row in the RowContainers and forwards joined rows > if they meet all filter conditions. For an OuterJoin, checkAndGenObject() > forwards non-matching rows if there is no matching row in the RowContainer. The > problem happens when there are multiple RowContainers for the same key and > table. For example, suppose that there are two left RowContainers and one > right RowContainer. If no row in the two left RowContainers satisfies the > filter condition, then checkAndGenObject() will forward a left-null row for > each right row. Because checkAndGenObject() is called once per left > RowContainer, there will be two duplicated left-null rows for every right row. > In the case of MapJoinOperator, it always creates singleton RowContainers for > the big table. Therefore, it always produces duplicated non-matching rows. > CommonMergeJoinOperator also creates multiple RowContainers for the big table, > each of size hive.join.emit.interval. In the experiment below, I also set > hive.join.shortcut.unmatched.rows=false and hive.exec.reducers.max=1 to > disable the specialized algorithm for an OuterJoin of two tables and to force > calling checkAndGenObject() before all rows with the same key are gathered.
I didn't > observe this problem when using VectorMapJoinOperator, and I will inspect > whether the problem can be reproduced with it as well. > I think the second problem is not limited to FullOuterJoin, but I couldn't > find such a query so far. I will also add it to this issue if I can > write a query that reproduces the second problem without FullOuterJoin. > I also found that Hive returns a wrong result for query2 even when I used > VectorMapJoinOperator. I am still inspecting this problem, and I will add an > update on it when I find out the reason. > > Experiment: > > {code:java} > Configuration > set hive.optimize.shared.work=false; > -- Std MapJoin > set hive.auto.convert.join=true; > set hive.vectorized.execution.enabled=false; > -- Vec MapJoin > set hive.auto.convert.join=true; > set
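The duplication mechanism described above can be reproduced with a toy model. This is not Hive code: checkAndGenObject() is modeled here as deciding, per RowContainer, whether to emit non-matching (left-null) rows. When the rows of one key group are split across N containers, as with MapJoinOperator's singleton containers, the left-null output is emitted once per container instead of once per key group.

```java
import java.util.List;

// Toy model of the per-container non-matching-row emission described in the
// issue. Each inner list is one "RowContainer" of left rows with the same
// key; true means the row passes the join filter.
class OuterJoinDupSketch {
    static int countLeftNullRows(List<List<Boolean>> leftContainers, int rightRows) {
        int emitted = 0;
        for (List<Boolean> container : leftContainers) {
            if (!container.contains(true)) {
                // per-container decision: forward a left-null row per right row
                emitted += rightRows;
            }
        }
        return emitted;
    }
}
```

With both containers failing the filter and one right row, splitting the key group into two containers emits two left-null rows where a whole-group evaluation would emit one, which is exactly the duplication the comment describes.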
[jira] [Resolved] (HIVE-27918) Iceberg: Push transforms for clustering during table writes
[ https://issues.apache.org/jira/browse/HIVE-27918?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sourabh Badhya resolved HIVE-27918. --- Fix Version/s: 4.1.0 Resolution: Fixed Merged to master. Thanks [~dkuzmenko] for the reviews. > Iceberg: Push transforms for clustering during table writes > --- > > Key: HIVE-27918 > URL: https://issues.apache.org/jira/browse/HIVE-27918 > Project: Hive > Issue Type: Improvement >Reporter: Sourabh Badhya >Assignee: Sourabh Badhya >Priority: Major > Labels: pull-request-available > Fix For: 4.1.0 > > > Currently, transformed columns (except for the bucket transform) are not pushed / > passed as clustering columns. This can lead to incorrect clustering on such > columns, which can lead to non-performant writes. > Hence, push transforms for clustering during table writes.
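The idea of clustering by a partition transform can be sketched as follows. This is a conceptual illustration only, not Hive's or Iceberg's implementation: "pushing a transform for clustering" means grouping (or sorting) incoming rows by the transformed partition value, here a day(ts)-style transform, so a writer emits one contiguous run per partition instead of scattering rows, and therefore small files, across all of them.

```java
import java.time.Instant;
import java.time.LocalDate;
import java.time.ZoneOffset;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Conceptual sketch: cluster rows by the transformed value day(ts) before
// writing, so each partition's rows land together.
class ClusterByTransformSketch {
    static Map<LocalDate, List<Instant>> clusterByDay(List<Instant> rows) {
        Map<LocalDate, List<Instant>> buckets = new TreeMap<>();
        for (Instant ts : rows) {
            LocalDate day = ts.atZone(ZoneOffset.UTC).toLocalDate(); // day(ts) transform
            buckets.computeIfAbsent(day, d -> new ArrayList<>()).add(ts);
        }
        return buckets;
    }
}
```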