[jira] [Work started] (HIVE-26075) hive metastore connection leaking when hiveserver2 kerberos enable and hive.server2.enable.doAs set to true
[ https://issues.apache.org/jira/browse/HIVE-26075?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Work on HIVE-26075 started by liuguanghua.
--
> hive metastore connection leaking when hiveserver2 kerberos enable and
> hive.server2.enable.doAs set to true
>
> Key: HIVE-26075
> URL: https://issues.apache.org/jira/browse/HIVE-26075
> Project: Hive
> Issue Type: Bug
> Components: Hive
> Affects Versions: 1.2.0
> Reporter: liuguanghua
> Assignee: liuguanghua
> Priority: Major
> Labels: pull-request-available
> Attachments: HIVE-26075.patch
>
> Time Spent: 10m
> Remaining Estimate: 0h
>
> (1) The Hadoop cluster has Kerberos enabled.
> (2) HiveServer2 has hive.server2.enable.doAs set to true.
> After a Beeline script has been executed, the Hive metastore connections that
> were created remain in the ESTABLISHED state and are never closed.
> If we submit many tasks to HiveServer2, the metastore Thrift worker threads
> (1000 by default) fill up, so new tasks fail.
>
> HiveServer2 uses a ThreadLocal to store each thread's metastore connection;
> the application should call Hive.closeCurrent() to close the connection after
> the task finishes.
>
> When HiveServer2 impersonation is enabled (hive.server2.enable.doAs set to
> true), the UGI creates a proxy user via
> UserGroupInformation.createProxyUser(owner, UserGroupInformation.getLoginUser()),
> and the old metastore client is never closed.
>
--
This message was sent by Atlassian Jira
(v8.20.1#820001)
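The leak mechanism described in the report can be sketched without Hive itself. The following is a minimal, illustrative simulation under stated assumptions: `Client`, `switchUserLeaky`, and `switchUserFixed` are hypothetical stand-ins, not Hive classes; the "fixed" path mirrors what calling the real `Hive.closeCurrent()` before swapping in a proxy-user client would achieve.

```java
// Illustrative simulation (NOT Hive code) of the ThreadLocal connection leak
// described in HIVE-26075. Client stands in for the Thrift metastore client;
// the counter stands in for connections left in the ESTABLISHED state.
public class MetastoreLeakSketch {
    static int open = 0; // number of simulated connections currently open

    // Stand-in for the per-thread metastore client.
    static class Client {
        Client() { open++; }
        void close() { open--; }
    }

    // HiveServer2 keeps the client in a ThreadLocal, as the report notes.
    static final ThreadLocal<Client> current = new ThreadLocal<>();

    // Buggy path: a new client (e.g. created for a proxy user obtained via
    // UserGroupInformation.createProxyUser) overwrites the ThreadLocal; the
    // old client becomes unreachable but its connection stays open.
    static void switchUserLeaky() {
        current.set(new Client());
    }

    // Fixed path: close the existing client before replacing it, which is
    // what calling Hive.closeCurrent() before the switch achieves.
    static void switchUserFixed() {
        Client old = current.get();
        if (old != null) {
            old.close();
        }
        current.set(new Client());
    }

    static void reset() { open = 0; current.remove(); }

    public static void main(String[] args) {
        reset();
        for (int i = 0; i < 1000; i++) switchUserLeaky();
        System.out.println("leaky: open connections = " + open); // 1000

        reset();
        for (int i = 0; i < 1000; i++) switchUserFixed();
        System.out.println("fixed: open connections = " + open); // 1
    }
}
```

With 1000 iterations the leaky path mirrors the report's observation that the metastore's default pool of 1000 Thrift worker threads fills up, while the fixed path keeps only a single live connection per thread.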
[jira] [Work stopped] (HIVE-26075) hive metastore connection leaking when hiveserver2 kerberos enable and hive.server2.enable.doAs set to true
[ https://issues.apache.org/jira/browse/HIVE-26075?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Work on HIVE-26075 stopped by liuguanghua.
--
> hive metastore connection leaking when hiveserver2 kerberos enable and
> hive.server2.enable.doAs set to true
>
> Key: HIVE-26075
> URL: https://issues.apache.org/jira/browse/HIVE-26075
> Project: Hive
> Issue Type: Bug
> Components: Hive
> Affects Versions: 1.2.0
> Reporter: liuguanghua
> Assignee: liuguanghua
> Priority: Major
> Labels: pull-request-available
> Attachments: HIVE-26075.patch
>
> Time Spent: 10m
> Remaining Estimate: 0h
>
> (1) The Hadoop cluster has Kerberos enabled.
> (2) HiveServer2 has hive.server2.enable.doAs set to true.
> After a Beeline script has been executed, the Hive metastore connections that
> were created remain in the ESTABLISHED state and are never closed.
> If we submit many tasks to HiveServer2, the metastore Thrift worker threads
> (1000 by default) fill up, so new tasks fail.
>
> HiveServer2 uses a ThreadLocal to store each thread's metastore connection;
> the application should call Hive.closeCurrent() to close the connection after
> the task finishes.
>
> When HiveServer2 impersonation is enabled (hive.server2.enable.doAs set to
> true), the UGI creates a proxy user via
> UserGroupInformation.createProxyUser(owner, UserGroupInformation.getLoginUser()),
> and the old metastore client is never closed.
>
--
This message was sent by Atlassian Jira
(v8.20.1#820001)
[jira] [Updated] (HIVE-26111) FULL JOIN returns incorrect result
[ https://issues.apache.org/jira/browse/HIVE-26111?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Youjun Yuan updated HIVE-26111:
---
Description:
We hit a query that FULL JOINs two tables; Hive produces an incorrect result: for a single value of the join key, it produces two records, each with a valid value from one table and NULL for the other.

The query is:
{code:java}
SET mapreduce.job.reduces=2;
SELECT d.id, u.id
FROM (
  SELECT id
  FROM airflow.tableA rud
  WHERE rud.dt = '2022-04-02-1row'
) d
FULL JOIN (
  SELECT id
  FROM default.tableB
  WHERE dt = '2022-04-01' and device_token='blabla'
) u
ON u.id = d.id
; {code}
According to the job log, the two reducers each get an input record and output a record, producing two records for id=350570497:
{code:java}
350570497	NULL
NULL	350570497
Time taken: 62.692 seconds, Fetched: 2 row(s) {code}
I am sure tableB has only one row where device_token='blabla'.

And we tried:
1, SET mapreduce.job.reduces=1; then it produces the right result;
-2, SET hive.execution.engine=mr; then it produces the right result;- mr also has the issue.
3, JOIN (instead of FULL JOIN) worked as expected.
4, in subquery u, changing the filter device_token='blabla' to id=350570497 worked OK.
5, flattening the subqueries works OK, like below:
{code:java}
SELECT d.id, u.id
from airflow.rds_users_delta d
full join default.users u on (u.id = d.id)
where d.dt = '2022-04-02-1row'
  and u.dt = '2022-04-01'
  and u.device_token='blabla' {code}
Below is the explain output of the query:
{code:java}
Plan optimized by CBO.

Vertex dependency in root stage
Reducer 3 <- Map 1 (CUSTOM_SIMPLE_EDGE), Map 2 (CUSTOM_SIMPLE_EDGE)

Stage-0
  Fetch Operator
    limit:-1
    Stage-1
      Reducer 3
      File Output Operator [FS_10]
        Map Join Operator [MAPJOIN_13] (rows=2 width=8)
          Conds:RS_6.KEY.reducesinkkey0=RS_7.KEY.reducesinkkey0(Outer),DynamicPartitionHashJoin:true,Output:["_col0","_col1"]
        <-Map 1 [CUSTOM_SIMPLE_EDGE]
          PARTITION_ONLY_SHUFFLE [RS_6]
            PartitionCols:_col0
            Select Operator [SEL_2] (rows=1 width=4)
              Output:["_col0"]
              TableScan [TS_0] (rows=1 width=4)
                airflow@rds_users_delta,rud,Tbl:COMPLETE,Col:COMPLETE,Output:["id"]
        <-Map 2 [CUSTOM_SIMPLE_EDGE]
          PARTITION_ONLY_SHUFFLE [RS_7]
            PartitionCols:_col0
            Select Operator [SEL_5] (rows=1 width=4)
              Output:["_col0"]
              Filter Operator [FIL_12] (rows=1 width=110)
                predicate:(device_token = 'blabla')
                TableScan [TS_3] (rows=215192362 width=109)
                  default@users,users,Tbl:COMPLETE,Col:COMPLETE,Output:["id","device_token"] {code}
I can't generate a small enough data set to reproduce the issue. I have minimized tableA to only 1 row; tableB has ~200m rows, but if I further reduce the size of tableB, the issue can't be reproduced.

Any suggestion would be highly appreciated, regarding the root cause of the issue, how to work around it, or how to reproduce it with a small enough dataset.

Below is the log found in hive.log:
{code:java}
220405004014_2c3b3486-9bc7-4d1d-9639-693dad39da17 : STAGE DEPENDENCIES:
  Stage-1 is a root stage [MAPRED]
  Stage-0 depends on stages: Stage-1 [FETCH]

STAGE PLANS:
  Stage: Stage-1
    Tez
      DagId: ec2-user_20220405004014_2c3b3486-9bc7-4d1d-9639-693dad39da17:1
      Edges:
        Reducer 3 <- Map 1 (CUSTOM_SIMPLE_EDGE), Map 2 (CUSTOM_SIMPLE_EDGE)
      DagName: ec2-user_20220405004014_2c3b3486-9bc7-4d1d-9639-693dad39da17:1
      Vertices:
        Map 1
          Map Operator Tree:
            TableScan
              alias: rud
              Statistics: Num rows: 1 Data size: 4 Basic stats: COMPLETE Column stats: COMPLETE
              GatherStats: false
              Select Operator
                expressions: id (type: int)
                outputColumnNames: _col0
                Statistics: Num rows: 1 Data size: 4 Basic stats: COMPLETE Column stats: COMPLETE
                Reduce Output Operator
                  key expressions: _col0 (type: int)
                  null sort order: a
                  sort order: +
                  Map-reduce partition columns: _col0 (type: int)
                  Statistics: Num rows: 1 Data size: 4 Basic stats: COMPLETE Column stats: COMPLETE
                  tag: 0
                  auto parallelism: true
      Path -> Alias:
        s3a://.../rds_users_delta/dt=2022-04-02-1row/hh=00 [rud]
      Path -> Partition:
        s3a://.../rds_users_delta/dt=2022-04-02-1row/hh=00
          Partition
            base file name: hh=00
            input format: org.apache.hadoop.mapred.TextInputFormat
            output format: org.apache.
[jira] [Updated] (HIVE-26111) FULL JOIN returns incorrect result
[ https://issues.apache.org/jira/browse/HIVE-26111?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Youjun Yuan updated HIVE-26111:
---
Summary: FULL JOIN returns incorrect result (was: FULL JOIN returns incorrect result with Tez engine)

> FULL JOIN returns incorrect result
> --
>
> Key: HIVE-26111
> URL: https://issues.apache.org/jira/browse/HIVE-26111
> Project: Hive
> Issue Type: Bug
> Environment: aws EMR (hive 3.1.2 + Tez 0.10.1)
> Reporter: Youjun Yuan
> Priority: Blocker
>
> We hit a query that FULL JOINs two tables; Hive produces an incorrect result:
> for a single value of the join key, it produces two records, each with a
> valid value from one table and NULL for the other.
> The query is:
> {code:java}
> SET mapreduce.job.reduces=2;
> SELECT d.id, u.id
> FROM (
>   SELECT id
>   FROM airflow.tableA rud
>   WHERE rud.dt = '2022-04-02-1row'
> ) d
> FULL JOIN (
>   SELECT id
>   FROM default.tableB
>   WHERE dt = '2022-04-01' and device_token='blabla'
> ) u
> ON u.id = d.id
> ; {code}
> According to the job log, the two reducers each get an input record and
> output a record, producing two records for id=350570497:
> {code:java}
> 350570497	NULL
> NULL	350570497
> Time taken: 62.692 seconds, Fetched: 2 row(s) {code}
> I am sure tableB has only one row where device_token='blabla'.
> And we tried:
> 1, SET mapreduce.job.reduces=1; then it produces the right result;
> -2, SET hive.execution.engine=mr; then it produces the right result;- mr
> also has the issue.
> 3, JOIN (instead of FULL JOIN) worked as expected.
> 4, in subquery u, changing the filter device_token='blabla' to id=350570497
> worked OK.
> Below is the explain output of the query:
> {code:java}
> Plan optimized by CBO.
>
> Vertex dependency in root stage
> Reducer 3 <- Map 1 (CUSTOM_SIMPLE_EDGE), Map 2 (CUSTOM_SIMPLE_EDGE)
>
> Stage-0
>   Fetch Operator
>     limit:-1
>     Stage-1
>       Reducer 3
>       File Output Operator [FS_10]
>         Map Join Operator [MAPJOIN_13] (rows=2 width=8)
>           Conds:RS_6.KEY.reducesinkkey0=RS_7.KEY.reducesinkkey0(Outer),DynamicPartitionHashJoin:true,Output:["_col0","_col1"]
>         <-Map 1 [CUSTOM_SIMPLE_EDGE]
>           PARTITION_ONLY_SHUFFLE [RS_6]
>             PartitionCols:_col0
>             Select Operator [SEL_2] (rows=1 width=4)
>               Output:["_col0"]
>               TableScan [TS_0] (rows=1 width=4)
>                 airflow@rds_users_delta,rud,Tbl:COMPLETE,Col:COMPLETE,Output:["id"]
>         <-Map 2 [CUSTOM_SIMPLE_EDGE]
>           PARTITION_ONLY_SHUFFLE [RS_7]
>             PartitionCols:_col0
>             Select Operator [SEL_5] (rows=1 width=4)
>               Output:["_col0"]
>               Filter Operator [FIL_12] (rows=1 width=110)
>                 predicate:(device_token = 'blabla')
>                 TableScan [TS_3] (rows=215192362 width=109)
>                   default@users,users,Tbl:COMPLETE,Col:COMPLETE,Output:["id","device_token"] {code}
> I can't generate a small enough data set to reproduce the issue. I have
> minimized tableA to only 1 row; tableB has ~200m rows, but if I further
> reduce the size of tableB, the issue can't be reproduced.
> Any suggestion would be highly appreciated, regarding the root cause of the
> issue, how to work around it, or how to reproduce it with a small enough
> dataset.
>
> Below is the log found in hive.log:
> {code:java}
> 220405004014_2c3b3486-9bc7-4d1d-9639-693dad39da17 : STAGE DEPENDENCIES:
>   Stage-1 is a root stage [MAPRED]
>   Stage-0 depends on stages: Stage-1 [FETCH]
>
> STAGE PLANS:
>   Stage: Stage-1
>     Tez
>       DagId: ec2-user_20220405004014_2c3b3486-9bc7-4d1d-9639-693dad39da17:1
>       Edges:
>         Reducer 3 <- Map 1 (CUSTOM_SIMPLE_EDGE), Map 2 (CUSTOM_SIMPLE_EDGE)
>       DagName: ec2-user_20220405004014_2c3b3486-9bc7-4d1d-9639-693dad39da17:1
>       Vertices:
>         Map 1
>           Map Operator Tree:
>             TableScan
>               alias: rud
>               Statistics: Num rows: 1 Data size: 4 Basic stats: COMPLETE Column stats: COMPLETE
>               GatherStats: false
>               Select Operator
>                 expressions: id (type: int)
>                 outputColumnNames: _col0
>                 Statistics: Num rows: 1 Data size: 4 Basic stats: COMPLETE Column stats: COMPLETE
>                 Reduce Output Operator
>                   key expressions: _col0 (type: int)
>                   null sort order: a
>                   sort order: +
>                   Map-reduce partition columns: _col0 (type: int)
>                   Statistics: Num rows: 1 Data size: 4 Basic stats: COMPLETE Column stats: COMPLETE
>                   tag: 0
>                   aut
[jira] [Assigned] (HIVE-26124) Upgrade HBase from 2.0.0-alpha4 to 2.0.0
[ https://issues.apache.org/jira/browse/HIVE-26124?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Peter Vary reassigned HIVE-26124:
-
Assignee: Peter Vary

> Upgrade HBase from 2.0.0-alpha4 to 2.0.0
>
> Key: HIVE-26124
> URL: https://issues.apache.org/jira/browse/HIVE-26124
> Project: Hive
> Issue Type: Task
> Reporter: Peter Vary
> Assignee: Peter Vary
> Priority: Major
> Labels: pull-request-available
> Time Spent: 10m
> Remaining Estimate: 0h
>
> We should move from the alpha version to the stable one.
--
This message was sent by Atlassian Jira
(v8.20.1#820001)
[jira] [Work started] (HIVE-26124) Upgrade HBase from 2.0.0-alpha4 to 2.0.0
[ https://issues.apache.org/jira/browse/HIVE-26124?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Work on HIVE-26124 started by Peter Vary.
-
> Upgrade HBase from 2.0.0-alpha4 to 2.0.0
>
> Key: HIVE-26124
> URL: https://issues.apache.org/jira/browse/HIVE-26124
> Project: Hive
> Issue Type: Task
> Reporter: Peter Vary
> Assignee: Peter Vary
> Priority: Major
> Labels: pull-request-available
> Time Spent: 10m
> Remaining Estimate: 0h
>
> We should move from the alpha version to the stable one.
--
This message was sent by Atlassian Jira
(v8.20.1#820001)
[jira] [Work logged] (HIVE-26124) Upgrade HBase from 2.0.0-alpha4 to 2.0.0
[ https://issues.apache.org/jira/browse/HIVE-26124?focusedWorklogId=753680&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-753680 ] ASF GitHub Bot logged work on HIVE-26124:
-
Author: ASF GitHub Bot
Created on: 06/Apr/22 20:47
Start Date: 06/Apr/22 20:47
Worklog Time Spent: 10m
Work Description: pvary opened a new pull request, #3186:
URL: https://github.com/apache/hive/pull/3186

### What changes were proposed in this pull request?
Upgrade HBase to 2.0.0.

### Why are the changes needed?
A release should at minimum depend on a stable version.

### Does this PR introduce _any_ user-facing change?
No

### How was this patch tested?
Unit tests

Issue Time Tracking
---
Worklog Id: (was: 753680)
Remaining Estimate: 0h
Time Spent: 10m

> Upgrade HBase from 2.0.0-alpha4 to 2.0.0
>
> Key: HIVE-26124
> URL: https://issues.apache.org/jira/browse/HIVE-26124
> Project: Hive
> Issue Type: Task
> Reporter: Peter Vary
> Priority: Major
> Time Spent: 10m
> Remaining Estimate: 0h
>
> We should move from the alpha version to the stable one.
--
This message was sent by Atlassian Jira
(v8.20.1#820001)
[jira] [Updated] (HIVE-26124) Upgrade HBase from 2.0.0-alpha4 to 2.0.0
[ https://issues.apache.org/jira/browse/HIVE-26124?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated HIVE-26124:
--
Labels: pull-request-available (was: )

> Upgrade HBase from 2.0.0-alpha4 to 2.0.0
>
> Key: HIVE-26124
> URL: https://issues.apache.org/jira/browse/HIVE-26124
> Project: Hive
> Issue Type: Task
> Reporter: Peter Vary
> Priority: Major
> Labels: pull-request-available
> Time Spent: 10m
> Remaining Estimate: 0h
>
> We should move from the alpha version to the stable one.
--
This message was sent by Atlassian Jira
(v8.20.1#820001)
[jira] [Assigned] (HIVE-26092) Fix javadoc errors for the 4.0.0 release
[ https://issues.apache.org/jira/browse/HIVE-26092?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Peter Vary reassigned HIVE-26092: - Assignee: Peter Vary > Fix javadoc errors for the 4.0.0 release > > > Key: HIVE-26092 > URL: https://issues.apache.org/jira/browse/HIVE-26092 > Project: Hive > Issue Type: Task >Reporter: Peter Vary >Assignee: Peter Vary >Priority: Major > Labels: pull-request-available > Time Spent: 10m > Remaining Estimate: 0h > > Currently there are plenty of errors in the javadoc. > We should fix those before a final release -- This message was sent by Atlassian Jira (v8.20.1#820001)
[jira] [Work started] (HIVE-26092) Fix javadoc errors for the 4.0.0 release
[ https://issues.apache.org/jira/browse/HIVE-26092?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Work on HIVE-26092 started by Peter Vary. - > Fix javadoc errors for the 4.0.0 release > > > Key: HIVE-26092 > URL: https://issues.apache.org/jira/browse/HIVE-26092 > Project: Hive > Issue Type: Task >Reporter: Peter Vary >Assignee: Peter Vary >Priority: Major > Labels: pull-request-available > Time Spent: 10m > Remaining Estimate: 0h > > Currently there are plenty of errors in the javadoc. > We should fix those before a final release -- This message was sent by Atlassian Jira (v8.20.1#820001)
[jira] [Work logged] (HIVE-26092) Fix javadoc errors for the 4.0.0 release
[ https://issues.apache.org/jira/browse/HIVE-26092?focusedWorklogId=753669&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-753669 ] ASF GitHub Bot logged work on HIVE-26092:
-
Author: ASF GitHub Bot
Created on: 06/Apr/22 20:36
Start Date: 06/Apr/22 20:36
Worklog Time Spent: 10m
Work Description: pvary opened a new pull request, #3185:
URL: https://github.com/apache/hive/pull/3185

### What changes were proposed in this pull request?
Fixes the javadoc errors and adds a CI test for generating the javadoc.

### Why are the changes needed?
To fix the errors and prevent new ones from occurring.

### Does this PR introduce _any_ user-facing change?
No

### How was this patch tested?
Manually running:
```
mvn install javadoc:javadoc javadoc:aggregate -DskipTests
```

Issue Time Tracking
---
Worklog Id: (was: 753669)
Remaining Estimate: 0h
Time Spent: 10m

> Fix javadoc errors for the 4.0.0 release
>
> Key: HIVE-26092
> URL: https://issues.apache.org/jira/browse/HIVE-26092
> Project: Hive
> Issue Type: Task
> Reporter: Peter Vary
> Priority: Major
> Time Spent: 10m
> Remaining Estimate: 0h
>
> Currently there are plenty of errors in the javadoc.
> We should fix those before a final release.
--
This message was sent by Atlassian Jira
(v8.20.1#820001)
[jira] [Updated] (HIVE-26092) Fix javadoc errors for the 4.0.0 release
[ https://issues.apache.org/jira/browse/HIVE-26092?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated HIVE-26092: -- Labels: pull-request-available (was: ) > Fix javadoc errors for the 4.0.0 release > > > Key: HIVE-26092 > URL: https://issues.apache.org/jira/browse/HIVE-26092 > Project: Hive > Issue Type: Task >Reporter: Peter Vary >Priority: Major > Labels: pull-request-available > Time Spent: 10m > Remaining Estimate: 0h > > Currently there are plenty of errors in the javadoc. > We should fix those before a final release -- This message was sent by Atlassian Jira (v8.20.1#820001)
[jira] [Updated] (HIVE-25882) Using the Hive Metastore with Kudu not work
[ https://issues.apache.org/jira/browse/HIVE-25882?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] liu updated HIVE-25882:
---
Affects Version/s: 4.0.0

> Using the Hive Metastore with Kudu not work
> ---
>
> Key: HIVE-25882
> URL: https://issues.apache.org/jira/browse/HIVE-25882
> Project: Hive
> Issue Type: Bug
> Components: Accumulo Storage Handler
> Affects Versions: 3.1.2, 4.0.0
> Environment: HIVE: 3.1
> HDP: 3.1.1.3.1
> KUDU: kudu 1.15.0
> Reporter: liu
> Priority: Critical
>
> I followed the configuration on this page, and it looks as if the
> configuration was successful:
> [https://kudu.apache.org/docs/hive_metastore.html#enabling-the-hive-metastore-integration]
> Kudu master start log:
> {code:java}
> I1115 18:51:37.391942  1832 catalog_manager.cc:1253] Loading table and tablet metadata into memory...
> I1115 18:51:37.392135  1832 catalog_manager.cc:495] Loaded metadata for table $schemas [id=9c31d249228f42b38468835a7ae2c6e6]
> I1115 18:51:37.392266  1832 catalog_manager.cc:549] Loaded metadata for tablet 1526622b192145b8973fc852c2cfbd8f (table $schemas [id=9c31d249228f42b38468835a7ae2c6e6])
> I1115 18:51:37.392287  1832 catalog_manager.cc:549] Loaded metadata for tablet 2842be87bec74f0592a01ca0535bd9aa (table $schemas [id=9c31d249228f42b38468835a7ae2c6e6])
> I1115 18:51:37.392294  1832 catalog_manager.cc:1262] Initializing Kudu cluster ID...
> I1115 18:51:37.392381  1832 catalog_manager.cc:1098] Loaded cluster ID: 70b19944b04543759922355e6ce259ac
> I1115 18:51:37.392387  1832 catalog_manager.cc:1273] Initializing Kudu internal certificate authority...
> I1115 18:51:37.392593  1832 catalog_manager.cc:1282] Loading token signing keys...
> I1115 18:51:37.392693  1832 catalog_manager.cc:5093] T  P 2bc3b2318ca640a78a99fcbe4d058a9f: Loaded TSK: 0
> I1115 18:51:37.392736  1832 catalog_manager.cc:1292] Initializing in-progress tserver states...
> I1115 18:51:37.392812  1832 catalog_manager.cc:1305] Loading latest processed Hive Metastore notification log event ID... {code}
> Now I use Trino to connect to Kudu and execute the following script.
> {code:java}
> trino:default> create schema cdr;
> CREATE SCHEMA
> trino:default> use cdr;
> USE
> trino:cdr> show schemas;
> Schema
>
> cdr
> default
> information_schema
> (3 rows)
>
> Query 2028_033415_00020_4gwuw, FINISHED, 3 nodes
> Splits: 36 total, 36 done (100.00%)
> 0.22 [3 rows, 43B] [13 rows/s, 195B/s]
>
> trino:cdr> CREATE TABLE kudu.cdr.users (
>         ->   user_id int WITH (primary_key = true),
>         ->   first_name varchar,
>         ->   last_name varchar
>         -> ) WITH (
>         ->   partition_by_hash_columns = ARRAY['user_id'],
>         ->   partition_by_hash_buckets = 2
>         -> );
>
> W1118 13:56:00.671370 31226 catalog_manager.cc:1959] Remote error: failed to create HMS catalog entry for table [id=3490249b929842509d3364a18f07a4e5]: failed to create Hive MetaStore table: TException - service has thrown: MetaException(message=NoSuchObjectException(message:cdr)) {code}
> Master log:
> {code:java}
> W1118 13:56:00.671370 31226 catalog_manager.cc:1959] Remote error: failed to create HMS catalog entry for table [id=3490249b929842509d3364a18f07a4e5]: failed to create Hive MetaStore table: TException - service has thrown: MetaException(message=NoSuchObjectException(message:cdr)) {code}
> The schema failed to synchronize to the Hive metastore. If I create the
> database in Hive first, the error log is:
> {code:java}
> W1118 13:30:00.148990 31226 catalog_manager.cc:1959] Remote error: failed to create HMS catalog entry for table [id=4a40e0c12d9a4d26a11fcce0cf259d35]: failed to create Hive MetaStore table: TException - service has thrown: MetaException(message=java.lang.IllegalArgumentException: Can not create a Path from an empty string) {code}
> Now I don't know how to solve this problem.
> [link title|https://issues.apache.org/jira/browse/KUDU-3338]
--
This message was sent by Atlassian Jira
(v8.20.1#820001)
[jira] [Work logged] (HIVE-26122) Factorize out common docker code between DatabaseRule and AbstractExternalDB
[ https://issues.apache.org/jira/browse/HIVE-26122?focusedWorklogId=753467&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-753467 ] ASF GitHub Bot logged work on HIVE-26122:
-
Author: ASF GitHub Bot
Created on: 06/Apr/22 15:30
Start Date: 06/Apr/22 15:30
Worklog Time Spent: 10m
Work Description: asolimando closed pull request #3182: HIVE-26122: Factorize out common docker code between DatabaseRule and…
URL: https://github.com/apache/hive/pull/3182

Issue Time Tracking
---
Worklog Id: (was: 753467)
Time Spent: 20m (was: 10m)

> Factorize out common docker code between DatabaseRule and AbstractExternalDB
>
> Key: HIVE-26122
> URL: https://issues.apache.org/jira/browse/HIVE-26122
> Project: Hive
> Issue Type: Improvement
> Components: Testing Infrastructure
> Affects Versions: 4.0.0-alpha-2
> Reporter: Alessandro Solimando
> Assignee: Alessandro Solimando
> Priority: Major
> Labels: pull-request-available
> Fix For: 4.0.0-alpha-2
>
> Time Spent: 20m
> Remaining Estimate: 0h
>
> Currently there is a lot of shared code between the two classes, which could
> be extracted into a utility class called DockerUtils, since all this code
> pertains to Docker.
--
This message was sent by Atlassian Jira
(v8.20.1#820001)
[jira] [Resolved] (HIVE-26122) Factorize out common docker code between DatabaseRule and AbstractExternalDB
[ https://issues.apache.org/jira/browse/HIVE-26122?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Alessandro Solimando resolved HIVE-26122.
-
Resolution: Duplicate

Thanks [~zabetak], I had missed that; closing as a duplicate.

> Factorize out common docker code between DatabaseRule and AbstractExternalDB
>
> Key: HIVE-26122
> URL: https://issues.apache.org/jira/browse/HIVE-26122
> Project: Hive
> Issue Type: Improvement
> Components: Testing Infrastructure
> Affects Versions: 4.0.0-alpha-2
> Reporter: Alessandro Solimando
> Assignee: Alessandro Solimando
> Priority: Major
> Labels: pull-request-available
> Fix For: 4.0.0-alpha-2
>
> Time Spent: 10m
> Remaining Estimate: 0h
>
> Currently there is a lot of shared code between the two classes, which could
> be extracted into a utility class called DockerUtils, since all this code
> pertains to Docker.
--
This message was sent by Atlassian Jira
(v8.20.1#820001)
[jira] [Work logged] (HIVE-26102) Implement DELETE statements for Iceberg tables
[ https://issues.apache.org/jira/browse/HIVE-26102?focusedWorklogId=753466&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-753466 ] ASF GitHub Bot logged work on HIVE-26102: - Author: ASF GitHub Bot Created on: 06/Apr/22 15:29 Start Date: 06/Apr/22 15:29 Worklog Time Spent: 10m Work Description: pvary commented on code in PR #3131: URL: https://github.com/apache/hive/pull/3131#discussion_r844089279 ## iceberg/iceberg-handler/src/test/queries/negative/delete_iceberg_vectorized.q: ## @@ -0,0 +1,10 @@ +set hive.vectorized.execution.enabled=true; +set hive.support.concurrency=true; +set hive.txn.manager=org.apache.hadoop.hive.ql.lockmgr.DbTxnManager; Review Comment: If there is a way to skip this check for Iceberg tables, then it would be nice Issue Time Tracking --- Worklog Id: (was: 753466) Time Spent: 7.5h (was: 7h 20m) > Implement DELETE statements for Iceberg tables > -- > > Key: HIVE-26102 > URL: https://issues.apache.org/jira/browse/HIVE-26102 > Project: Hive > Issue Type: New Feature >Reporter: Marton Bod >Assignee: Marton Bod >Priority: Major > Labels: pull-request-available > Time Spent: 7.5h > Remaining Estimate: 0h > -- This message was sent by Atlassian Jira (v8.20.1#820001)
[jira] [Work logged] (HIVE-26102) Implement DELETE statements for Iceberg tables
[ https://issues.apache.org/jira/browse/HIVE-26102?focusedWorklogId=753462&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-753462 ] ASF GitHub Bot logged work on HIVE-26102: - Author: ASF GitHub Bot Created on: 06/Apr/22 15:23 Start Date: 06/Apr/22 15:23 Worklog Time Spent: 10m Work Description: marton-bod commented on code in PR #3131: URL: https://github.com/apache/hive/pull/3131#discussion_r844076945 ## iceberg/iceberg-handler/src/test/queries/negative/delete_iceberg_vectorized.q: ## @@ -0,0 +1,10 @@ +set hive.vectorized.execution.enabled=true; +set hive.support.concurrency=true; +set hive.txn.manager=org.apache.hadoop.hive.ql.lockmgr.DbTxnManager; Review Comment: We get an exception here if the txn handler does not support acid operations: https://github.com/apache/hive/blob/master/ql/src/java/org/apache/hadoop/hive/ql/parse/RewriteSemanticAnalyzer.java#L70 It crossed my mind whether to disable this check for Iceberg, but it didn't seem worth the effort, since we only have the ASTTree available in this method so the parsing might be complicated Issue Time Tracking --- Worklog Id: (was: 753462) Time Spent: 7h 20m (was: 7h 10m) > Implement DELETE statements for Iceberg tables > -- > > Key: HIVE-26102 > URL: https://issues.apache.org/jira/browse/HIVE-26102 > Project: Hive > Issue Type: New Feature >Reporter: Marton Bod >Assignee: Marton Bod >Priority: Major > Labels: pull-request-available > Time Spent: 7h 20m > Remaining Estimate: 0h > -- This message was sent by Atlassian Jira (v8.20.1#820001)
[jira] [Work logged] (HIVE-26102) Implement DELETE statements for Iceberg tables
[ https://issues.apache.org/jira/browse/HIVE-26102?focusedWorklogId=753461&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-753461 ] ASF GitHub Bot logged work on HIVE-26102: - Author: ASF GitHub Bot Created on: 06/Apr/22 15:22 Start Date: 06/Apr/22 15:22 Worklog Time Spent: 10m Work Description: marton-bod commented on code in PR #3131: URL: https://github.com/apache/hive/pull/3131#discussion_r844076945 ## iceberg/iceberg-handler/src/test/queries/negative/delete_iceberg_vectorized.q: ## @@ -0,0 +1,10 @@ +set hive.vectorized.execution.enabled=true; +set hive.support.concurrency=true; +set hive.txn.manager=org.apache.hadoop.hive.ql.lockmgr.DbTxnManager; Review Comment: We get an exception here if the txn handler does not support acid operations: https://github.com/apache/hive/blob/master/ql/src/java/org/apache/hadoop/hive/ql/parse/RewriteSemanticAnalyzer.java#L70 It crossed my mind whether to disable this check for Iceberg, but it didn't seem worth the effort, since we only have the ASTTree available in this method Issue Time Tracking --- Worklog Id: (was: 753461) Time Spent: 7h 10m (was: 7h) > Implement DELETE statements for Iceberg tables > -- > > Key: HIVE-26102 > URL: https://issues.apache.org/jira/browse/HIVE-26102 > Project: Hive > Issue Type: New Feature >Reporter: Marton Bod >Assignee: Marton Bod >Priority: Major > Labels: pull-request-available > Time Spent: 7h 10m > Remaining Estimate: 0h > -- This message was sent by Atlassian Jira (v8.20.1#820001)
[jira] [Work logged] (HIVE-26113) Align HMS and metastore tables's schema
[ https://issues.apache.org/jira/browse/HIVE-26113?focusedWorklogId=753460&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-753460 ] ASF GitHub Bot logged work on HIVE-26113: - Author: ASF GitHub Bot Created on: 06/Apr/22 15:20 Start Date: 06/Apr/22 15:20 Worklog Time Spent: 10m Work Description: asolimando commented on PR #3175: URL: https://github.com/apache/hive/pull/3175#issuecomment-1090396057 > Might be a different story, but I think it would be good to have some tests in place where we can at least run a single query against all of the tables on all of the different supported databases. I am a bit concerned that we write wrong sqls and we do not run a test against them. You are right, I have filed https://issues.apache.org/jira/browse/HIVE-26123 and I am working on it, I will resume this one once I have it working. Issue Time Tracking --- Worklog Id: (was: 753460) Time Spent: 40m (was: 0.5h) > Align HMS and metastore tables's schema > --- > > Key: HIVE-26113 > URL: https://issues.apache.org/jira/browse/HIVE-26113 > Project: Hive > Issue Type: Bug > Components: Metastore >Affects Versions: 4.0.0-alpha-2 >Reporter: Alessandro Solimando >Assignee: Alessandro Solimando >Priority: Major > Labels: pull-request-available > Fix For: 4.0.0-alpha-2 > > Time Spent: 40m > Remaining Estimate: 0h > > HMS tables should be in sync with those exposed by Hive metastore via _sysdb_. > At the moment there are some discrepancies for the existing tables, the > present ticket aims at bridging this gap. -- This message was sent by Atlassian Jira (v8.20.1#820001)
[jira] [Work logged] (HIVE-26102) Implement DELETE statements for Iceberg tables
[ https://issues.apache.org/jira/browse/HIVE-26102?focusedWorklogId=753459&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-753459 ] ASF GitHub Bot logged work on HIVE-26102: - Author: ASF GitHub Bot Created on: 06/Apr/22 15:18 Start Date: 06/Apr/22 15:18 Worklog Time Spent: 10m Work Description: marton-bod commented on code in PR #3131: URL: https://github.com/apache/hive/pull/3131#discussion_r844076945 ## iceberg/iceberg-handler/src/test/queries/negative/delete_iceberg_vectorized.q: ## @@ -0,0 +1,10 @@ +set hive.vectorized.execution.enabled=true; +set hive.support.concurrency=true; +set hive.txn.manager=org.apache.hadoop.hive.ql.lockmgr.DbTxnManager; Review Comment: We get an exception here if the txn handler does not support acid operations: https://github.com/apache/hive/blob/master/ql/src/java/org/apache/hadoop/hive/ql/parse/RewriteSemanticAnalyzer.java#L70 I wonder if we should avoid this check for Iceberg? Issue Time Tracking --- Worklog Id: (was: 753459) Time Spent: 7h (was: 6h 50m) > Implement DELETE statements for Iceberg tables > -- > > Key: HIVE-26102 > URL: https://issues.apache.org/jira/browse/HIVE-26102 > Project: Hive > Issue Type: New Feature >Reporter: Marton Bod >Assignee: Marton Bod >Priority: Major > Labels: pull-request-available > Time Spent: 7h > Remaining Estimate: 0h > -- This message was sent by Atlassian Jira (v8.20.1#820001)
[jira] [Updated] (HIVE-26122) Factorize out common docker code between DatabaseRule and AbstractExternalDB
[ https://issues.apache.org/jira/browse/HIVE-26122?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Alessandro Solimando updated HIVE-26122: Affects Version/s: 4.0.0-alpha-2 (was: 4.0.0-alpha-1) > Factorize out common docker code between DatabaseRule and AbstractExternalDB > > > Key: HIVE-26122 > URL: https://issues.apache.org/jira/browse/HIVE-26122 > Project: Hive > Issue Type: Improvement > Components: Testing Infrastructure >Affects Versions: 4.0.0-alpha-2 >Reporter: Alessandro Solimando >Assignee: Alessandro Solimando >Priority: Major > Labels: pull-request-available > Fix For: 4.0.0-alpha-2 > > Time Spent: 10m > Remaining Estimate: 0h > > Currently there is a lot of shared code between the two classes which could > be extracted into a utility class called DockerUtils, since all this code > pertains to docker. -- This message was sent by Atlassian Jira (v8.20.1#820001)
[jira] [Updated] (HIVE-26123) Introduce test coverage for sysdb for the different metastores
[ https://issues.apache.org/jira/browse/HIVE-26123?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Alessandro Solimando updated HIVE-26123: Affects Version/s: 4.0.0-alpha-2 (was: 4.0.0-alpha-1) > Introduce test coverage for sysdb for the different metastores > -- > > Key: HIVE-26123 > URL: https://issues.apache.org/jira/browse/HIVE-26123 > Project: Hive > Issue Type: Test > Components: Testing Infrastructure >Affects Versions: 4.0.0-alpha-2 >Reporter: Alessandro Solimando >Assignee: Alessandro Solimando >Priority: Major > Fix For: 4.0.0-alpha-2 > > > _sysdb_ provides a view over (some) metastore tables from Hive via JDBC > queries. > Existing tests run only against Derby, meaning that any change > to the sysdb query mapping is not covered by CI. > The present ticket aims at bridging this gap by introducing sysdb test coverage for > the different supported metastores. -- This message was sent by Atlassian Jira (v8.20.1#820001)
[jira] [Updated] (HIVE-25540) Enable batch update of column stats only for MySQL and Postgres
[ https://issues.apache.org/jira/browse/HIVE-25540?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Stamatis Zampetakis updated HIVE-25540: --- Fix Version/s: 4.0.0-alpha-2 (was: 4.0.0-alpha-1) > Enable batch update of column stats only for MySQL and Postgres > > > Key: HIVE-25540 > URL: https://issues.apache.org/jira/browse/HIVE-25540 > Project: Hive > Issue Type: Sub-task >Reporter: mahesh kumar behera >Assignee: mahesh kumar behera >Priority: Major > Labels: pull-request-available > Fix For: 4.0.0-alpha-2 > > Time Spent: 50m > Remaining Estimate: 0h > > The batch update of partition column stats using direct SQL is tested only > for MySQL and Postgres. -- This message was sent by Atlassian Jira (v8.20.1#820001)
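The change above restricts the direct-SQL batch path to the two backends it was validated on. A minimal sketch of what such a gate can look like follows; the enum and method names are hypothetical illustrations, not the actual HMS code:

```java
import java.util.Arrays;
import java.util.List;

public class BatchStatsGate {

    // Hypothetical database-type enum; the real metastore detects the backend
    // from the JDBC connection.
    public enum DbType { MYSQL, POSTGRES, ORACLE, DERBY, SQLSERVER }

    // Batch update of partition column stats via direct SQL is only enabled
    // for the databases it has been tested against.
    public static boolean useDirectSqlBatch(DbType db) {
        return db == DbType.MYSQL || db == DbType.POSTGRES;
    }

    // Either one batched statement, or one statement per partition (fallback).
    public static int statementCount(DbType db, List<String> partitions) {
        return useDirectSqlBatch(db) ? 1 : partitions.size();
    }

    public static void main(String[] args) {
        List<String> parts = Arrays.asList("p=1", "p=2", "p=3");
        System.out.println("mysql: " + statementCount(DbType.MYSQL, parts)); // 1
        System.out.println("derby: " + statementCount(DbType.DERBY, parts)); // 3
    }
}
```

The point of the gate is that untested backends keep the safe per-partition path rather than risk a wrong batched statement.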
[jira] [Commented] (HIVE-26122) Factorize out common docker code between DatabaseRule and AbstractExternalDB
[ https://issues.apache.org/jira/browse/HIVE-26122?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17518216#comment-17518216 ] Stamatis Zampetakis commented on HIVE-26122: [~asolimando] This looks like a duplicate of https://issues.apache.org/jira/browse/HIVE-25667. Have you seen that? > Factorize out common docker code between DatabaseRule and AbstractExternalDB > > > Key: HIVE-26122 > URL: https://issues.apache.org/jira/browse/HIVE-26122 > Project: Hive > Issue Type: Improvement > Components: Testing Infrastructure >Affects Versions: 4.0.0-alpha-1 >Reporter: Alessandro Solimando >Assignee: Alessandro Solimando >Priority: Major > Labels: pull-request-available > Fix For: 4.0.0-alpha-2 > > Time Spent: 10m > Remaining Estimate: 0h > > Currently there is a lot of shared code between the two classes which could > be extracted into a utility class called DockerUtils, since all this code > pertains to docker. -- This message was sent by Atlassian Jira (v8.20.1#820001)
[jira] [Updated] (HIVE-26075) hive metastore connection leaking when hiveserver2 kerberos enable and hive.server2.enable.doAs set to true
[ https://issues.apache.org/jira/browse/HIVE-26075?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated HIVE-26075: -- Labels: pull-request-available (was: ) > hive metastore connection leaking when hiveserver2 kerberos enable and > hive.server2.enable.doAs set to true > > > Key: HIVE-26075 > URL: https://issues.apache.org/jira/browse/HIVE-26075 > Project: Hive > Issue Type: Bug > Components: Hive >Affects Versions: 1.2.0 >Reporter: liuguanghua >Assignee: liuguanghua >Priority: Major > Labels: pull-request-available > Attachments: HIVE-26075.patch > > Time Spent: 10m > Remaining Estimate: 0h > > (1) When Hadoop cluster Kerberos is enabled > (2) HiveServer2 config hive.server2.enable.doAs is set to true > After a Beeline script has been executed, the Hive metastore connections that were > created remain in ESTABLISHED state and are never closed. > If we submit a lot of tasks to HiveServer2, this will fill the Hive metastore > Thrift thread pool (default is 1000), and thus new tasks will fail. > > HiveServer2 uses a ThreadLocal to store per-thread metastore > connections; the application should call Hive.closeCurrent() to close the > connection after the task has finished. > > When HiveServer2 impersonation is enabled (hive.server2.enable.doAs is set to > true), the UGI will create a proxy user via > UserGroupInformation.createProxyUser( > owner, UserGroupInformation.getLoginUser()), and the old metastore client is never > closed. > > > -- This message was sent by Atlassian Jira (v8.20.1#820001)
[jira] [Work logged] (HIVE-26075) hive metastore connection leaking when hiveserver2 kerberos enable and hive.server2.enable.doAs set to true
[ https://issues.apache.org/jira/browse/HIVE-26075?focusedWorklogId=753456&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-753456 ] ASF GitHub Bot logged work on HIVE-26075: - Author: ASF GitHub Bot Created on: 06/Apr/22 15:13 Start Date: 06/Apr/22 15:13 Worklog Time Spent: 10m Work Description: lgh-cn opened a new pull request, #3183: URL: https://github.com/apache/hive/pull/3183 ### What changes were proposed in this pull request? ### Why are the changes needed? ### Does this PR introduce _any_ user-facing change? ### How was this patch tested? Issue Time Tracking --- Worklog Id: (was: 753456) Remaining Estimate: 0h Time Spent: 10m > hive metastore connection leaking when hiveserver2 kerberos enable and > hive.server2.enable.doAs set to true > > > Key: HIVE-26075 > URL: https://issues.apache.org/jira/browse/HIVE-26075 > Project: Hive > Issue Type: Bug > Components: Hive >Affects Versions: 1.2.0 >Reporter: liuguanghua >Assignee: liuguanghua >Priority: Major > Attachments: HIVE-26075.patch > > Time Spent: 10m > Remaining Estimate: 0h > > (1) When Hadoop cluster Kerberos is enabled > (2) HiveServer2 config hive.server2.enable.doAs is set to true > After a Beeline script has been executed, the Hive metastore connections that were > created remain in ESTABLISHED state and are never closed. > If we submit a lot of tasks to HiveServer2, this will fill the Hive metastore > Thrift thread pool (default is 1000), and thus new tasks will fail. > > HiveServer2 uses a ThreadLocal to store per-thread metastore > connections; the application should call Hive.closeCurrent() to close the > connection after the task has finished. > > When HiveServer2 impersonation is enabled (hive.server2.enable.doAs is set to > true), the UGI will create a proxy user via > UserGroupInformation.createProxyUser( > owner, UserGroupInformation.getLoginUser()), and the old metastore client is never > closed. > > > -- This message was sent by Atlassian Jira (v8.20.1#820001)
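The leak sequence described in this ticket can be sketched with a self-contained stand-in. The class and method names below are illustrative, not Hive's actual API: a per-thread cached client is silently replaced when the session switches to a proxy-user UGI, and the old connection stays ESTABLISHED unless something like Hive.closeCurrent() runs first.

```java
import java.util.concurrent.atomic.AtomicInteger;

public class MetastoreLeakSketch {

    // Counts connections currently open, like ESTABLISHED sockets in netstat.
    static final AtomicInteger OPEN = new AtomicInteger();

    static class MetaStoreClient implements AutoCloseable {
        MetaStoreClient() { OPEN.incrementAndGet(); }
        @Override public void close() { OPEN.decrementAndGet(); }
    }

    // Per-thread cached client, mirroring the ThreadLocal described in the ticket.
    static final ThreadLocal<MetaStoreClient> CURRENT = new ThreadLocal<>();

    public static MetaStoreClient get() {
        MetaStoreClient c = CURRENT.get();
        if (c == null) { c = new MetaStoreClient(); CURRENT.set(c); }
        return c;
    }

    // Leaky path: a new client replaces the cached one (e.g. after switching
    // to a proxy-user UGI) and the old connection is never closed.
    public static void switchUserLeaky() { CURRENT.set(new MetaStoreClient()); }

    // Fixed path: close the cached client first, as Hive.closeCurrent() would.
    public static void switchUserSafely() {
        MetaStoreClient old = CURRENT.get();
        if (old != null) { old.close(); }
        CURRENT.set(new MetaStoreClient());
    }

    public static int openConnections() { return OPEN.get(); }

    public static void reset() { CURRENT.remove(); OPEN.set(0); }

    public static void main(String[] args) {
        get();                 // one connection for this thread
        switchUserLeaky();
        switchUserLeaky();
        System.out.println("open after 2 leaky switches: " + openConnections()); // 3
        reset();
        get();
        switchUserSafely();
        switchUserSafely();
        System.out.println("open after 2 safe switches: " + openConnections());  // 1
    }
}
```

Under this sketch, every impersonated request taking the leaky path adds one permanently open connection, which is how the metastore's Thrift worker pool (default 1000) eventually fills up.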
[jira] [Commented] (HIVE-26104) HIVE-19138 May block queries to compile
[ https://issues.apache.org/jira/browse/HIVE-26104?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17518180#comment-17518180 ] Stamatis Zampetakis commented on HIVE-26104: [~liuyan] Can you clarify if we are talking about queries in the same session or different sessions? > HIVE-19138 May block queries to compile > --- > > Key: HIVE-26104 > URL: https://issues.apache.org/jira/browse/HIVE-26104 > Project: Hive > Issue Type: Bug > Components: CBO >Affects Versions: 3.0.0, 3.1.2 >Reporter: liuyan >Priority: Critical > > HIVE-19138 introduced a way to allow other queries to stay in the compilation > state while there is a placeholder for the same query in the results cache. > However, multiple queries may enter the same state and hence use up all the > available parallel compilation slots allowed via > hive.driver.parallel.compilation.global.limit. Although we can turn off > this feature by setting hive.query.results.cache.wait.for.pending.results = > false, this seems to defeat the purpose that HIVE-19138 was trying to > achieve. We need a better solution for such situations. -- This message was sent by Atlassian Jira (v8.20.1#820001)
[jira] [Work started] (HIVE-26123) Introduce test coverage for sysdb for the different metastores
[ https://issues.apache.org/jira/browse/HIVE-26123?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Work on HIVE-26123 started by Alessandro Solimando. --- > Introduce test coverage for sysdb for the different metastores > -- > > Key: HIVE-26123 > URL: https://issues.apache.org/jira/browse/HIVE-26123 > Project: Hive > Issue Type: Test > Components: Testing Infrastructure >Affects Versions: 4.0.0-alpha-1 >Reporter: Alessandro Solimando >Assignee: Alessandro Solimando >Priority: Major > Fix For: 4.0.0-alpha-2 > > > _sysdb_ provides a view over (some) metastore tables from Hive via JDBC > queries. Existing tests run only against Derby, meaning that any > change to the sysdb query mapping is not covered by CI. > The present ticket aims at bridging this gap by introducing sysdb test coverage for > the different supported metastores. -- This message was sent by Atlassian Jira (v8.20.1#820001)
[jira] [Assigned] (HIVE-26123) Introduce test coverage for sysdb for the different metastores
[ https://issues.apache.org/jira/browse/HIVE-26123?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Alessandro Solimando reassigned HIVE-26123: --- > Introduce test coverage for sysdb for the different metastores > -- > > Key: HIVE-26123 > URL: https://issues.apache.org/jira/browse/HIVE-26123 > Project: Hive > Issue Type: Test > Components: Testing Infrastructure >Affects Versions: 4.0.0-alpha-1 >Reporter: Alessandro Solimando >Assignee: Alessandro Solimando >Priority: Major > Fix For: 4.0.0-alpha-2 > > > _sysdb_ provides a view over (some) metastore tables from Hive via JDBC > queries. Existing tests run only against Derby, meaning that any > change to the sysdb query mapping is not covered by CI. > The present ticket aims at bridging this gap by introducing sysdb test coverage for > the different supported metastores. -- This message was sent by Atlassian Jira (v8.20.1#820001)
[jira] [Updated] (HIVE-26123) Introduce test coverage for sysdb for the different metastores
[ https://issues.apache.org/jira/browse/HIVE-26123?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Alessandro Solimando updated HIVE-26123: Description: _sysdb_ provides a view over (some) metastore tables from Hive via JDBC queries. Existing tests are running only against Derby, meaning that any change against sysdb query mapping is not covered by CI. The present ticket aims at bridging this gap by introducing test coverage for the different supported metastores for sysdb. was: _sysdb_ provides a view over (some) metastore tables from Hive via JDBC queries. Existing tests are running only against Derby, meaning that any change against sysdb query mapping are not covered by CI. The present ticket aims at bridging this gap by introducing test coverage for the different supported metastores for sysdb. > Introduce test coverage for sysdb for the different metastores > -- > > Key: HIVE-26123 > URL: https://issues.apache.org/jira/browse/HIVE-26123 > Project: Hive > Issue Type: Test > Components: Testing Infrastructure >Affects Versions: 4.0.0-alpha-1 >Reporter: Alessandro Solimando >Assignee: Alessandro Solimando >Priority: Major > Fix For: 4.0.0-alpha-2 > > > _sysdb_ provides a view over (some) metastore tables from Hive via JDBC > queries. Existing tests are running only against Derby, meaning that any > change against sysdb query mapping is not covered by CI. > The present ticket aims at bridging this gap by introducing test coverage for > the different supported metastores for sysdb. -- This message was sent by Atlassian Jira (v8.20.1#820001)
[jira] [Updated] (HIVE-26122) Factorize out common docker code between DatabaseRule and AbstractExternalDB
[ https://issues.apache.org/jira/browse/HIVE-26122?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated HIVE-26122: -- Labels: pull-request-available (was: ) > Factorize out common docker code between DatabaseRule and AbstractExternalDB > > > Key: HIVE-26122 > URL: https://issues.apache.org/jira/browse/HIVE-26122 > Project: Hive > Issue Type: Improvement > Components: Testing Infrastructure >Affects Versions: 4.0.0-alpha-1 >Reporter: Alessandro Solimando >Assignee: Alessandro Solimando >Priority: Major > Labels: pull-request-available > Fix For: 4.0.0-alpha-2 > > Time Spent: 10m > Remaining Estimate: 0h > > Currently there is a lot of shared code between the two classes which could > be extracted into a utility class called DockerUtils, since all this code > pertains to docker. -- This message was sent by Atlassian Jira (v8.20.1#820001)
[jira] [Work logged] (HIVE-26122) Factorize out common docker code between DatabaseRule and AbstractExternalDB
[ https://issues.apache.org/jira/browse/HIVE-26122?focusedWorklogId=753375&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-753375 ] ASF GitHub Bot logged work on HIVE-26122: - Author: ASF GitHub Bot Created on: 06/Apr/22 13:12 Start Date: 06/Apr/22 13:12 Worklog Time Spent: 10m Work Description: asolimando opened a new pull request, #3182: URL: https://github.com/apache/hive/pull/3182 … AbstractExternalDB Introduced support for running docker-based tests on MacOS ### What changes were proposed in this pull request? Reduce code duplication by introducing a utility class for common code. ### Why are the changes needed? There is a lot of redundancy between the classes. ### Does this PR introduce _any_ user-facing change? No ### How was this patch tested? Locally on MacOS and via remote CI. Issue Time Tracking --- Worklog Id: (was: 753375) Remaining Estimate: 0h Time Spent: 10m > Factorize out common docker code between DatabaseRule and AbstractExternalDB > > > Key: HIVE-26122 > URL: https://issues.apache.org/jira/browse/HIVE-26122 > Project: Hive > Issue Type: Improvement > Components: Testing Infrastructure >Affects Versions: 4.0.0-alpha-1 >Reporter: Alessandro Solimando >Assignee: Alessandro Solimando >Priority: Major > Fix For: 4.0.0-alpha-2 > > Time Spent: 10m > Remaining Estimate: 0h > > Currently there is a lot of shared code between the two classes which could > be extracted into a utility class called DockerUtils, since all this code > pertains to docker. -- This message was sent by Atlassian Jira (v8.20.1#820001)
[jira] [Work started] (HIVE-26122) Factorize out common docker code between DatabaseRule and AbstractExternalDB
[ https://issues.apache.org/jira/browse/HIVE-26122?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Work on HIVE-26122 started by Alessandro Solimando. --- > Factorize out common docker code between DatabaseRule and AbstractExternalDB > > > Key: HIVE-26122 > URL: https://issues.apache.org/jira/browse/HIVE-26122 > Project: Hive > Issue Type: Improvement > Components: Testing Infrastructure >Affects Versions: 4.0.0-alpha-1 >Reporter: Alessandro Solimando >Assignee: Alessandro Solimando >Priority: Major > Fix For: 4.0.0-alpha-2 > > > Currently there is a lot of shared code between the two classes which could > be extracted into a utility class called DockerUtils, since all this code > pertains to docker. -- This message was sent by Atlassian Jira (v8.20.1#820001)
[jira] [Assigned] (HIVE-26122) Factorize out common docker code between DatabaseRule and AbstractExternalDB
[ https://issues.apache.org/jira/browse/HIVE-26122?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Alessandro Solimando reassigned HIVE-26122: --- > Factorize out common docker code between DatabaseRule and AbstractExternalDB > > > Key: HIVE-26122 > URL: https://issues.apache.org/jira/browse/HIVE-26122 > Project: Hive > Issue Type: Improvement > Components: Testing Infrastructure >Affects Versions: 4.0.0-alpha-1 >Reporter: Alessandro Solimando >Assignee: Alessandro Solimando >Priority: Major > Fix For: 4.0.0-alpha-2 > > > Currently there is a lot of shared code between the two classes which could > be extracted into a utility class called DockerUtils, since all this code > pertains to docker. -- This message was sent by Atlassian Jira (v8.20.1#820001)
[jira] [Work logged] (HIVE-26102) Implement DELETE statements for Iceberg tables
[ https://issues.apache.org/jira/browse/HIVE-26102?focusedWorklogId=753367&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-753367 ] ASF GitHub Bot logged work on HIVE-26102: - Author: ASF GitHub Bot Created on: 06/Apr/22 12:57 Start Date: 06/Apr/22 12:57 Worklog Time Spent: 10m Work Description: marton-bod commented on code in PR #3131: URL: https://github.com/apache/hive/pull/3131#discussion_r843917563 ## iceberg/iceberg-handler/src/main/java/org/apache/iceberg/mr/hive/HiveIcebergDeleteWriter.java: ## @@ -0,0 +1,81 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, + * software distributed under the License is distributed on an + * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY + * KIND, either express or implied. See the License for the + * specific language governing permissions and limitations + * under the License. 
+ */ + +package org.apache.iceberg.mr.hive; + +import java.io.IOException; +import java.util.List; +import org.apache.hadoop.io.Writable; +import org.apache.hadoop.mapred.TaskAttemptID; +import org.apache.iceberg.DeleteFile; +import org.apache.iceberg.FileFormat; +import org.apache.iceberg.PartitionSpec; +import org.apache.iceberg.Schema; +import org.apache.iceberg.data.Record; +import org.apache.iceberg.deletes.PositionDelete; +import org.apache.iceberg.io.ClusteredPositionDeleteWriter; +import org.apache.iceberg.io.FileIO; +import org.apache.iceberg.io.FileWriterFactory; +import org.apache.iceberg.io.OutputFileFactory; +import org.apache.iceberg.mr.mapred.Container; +import org.apache.iceberg.util.Tasks; +import org.slf4j.Logger; +import org.slf4j.LoggerFactory; + +public class HiveIcebergDeleteWriter extends HiveIcebergWriter { + private static final Logger LOG = LoggerFactory.getLogger(HiveIcebergDeleteWriter.class); + + private final ClusteredPositionDeleteWriter<Record> innerWriter; + + HiveIcebergDeleteWriter(Schema schema, PartitionSpec spec, FileFormat fileFormat, + FileWriterFactory<Record> writerFactory, OutputFileFactory fileFactory, FileIO io, long targetFileSize, + TaskAttemptID taskAttemptID, String tableName) { +super(schema, spec, io, taskAttemptID, tableName, true); +this.innerWriter = new ClusteredPositionDeleteWriter<>(writerFactory, fileFactory, io, fileFormat, targetFileSize); + } + + @Override + public void write(Writable row) throws IOException { +Record rec = ((Container<Record>) row).get(); +PositionDelete<Record> positionDelete = IcebergAcidUtil.getPositionDelete(spec.schema(), rec); +innerWriter.write(positionDelete, spec, partition(positionDelete.row())); + } + + @Override + public void close(boolean abort) throws IOException { +innerWriter.close(); +List<DeleteFile> deleteFiles = deleteFiles(); + +// If abort then remove the unnecessary files +if (abort) { + Tasks.foreach(deleteFiles) + .retry(3) + .suppressFailureWhenFinished() + .onFailure((file, exception) ->
LOG.debug("Failed to remove delete file {} on abort", file, exception)) + .run(deleteFile -> io.deleteFile(deleteFile.path().toString())); +} + +LOG.info("IcebergDeleteWriter is closed with abort={}. Created {} files", abort, deleteFiles.size()); + } + + @Override + public List<DeleteFile> deleteFiles() { Review Comment: Refactored interface to be `protected abstract FilesForCommit files()` and moved the `close()` method into the parent class Issue Time Tracking --- Worklog Id: (was: 753367) Time Spent: 6h 40m (was: 6.5h) > Implement DELETE statements for Iceberg tables > -- > > Key: HIVE-26102 > URL: https://issues.apache.org/jira/browse/HIVE-26102 > Project: Hive > Issue Type: New Feature >Reporter: Marton Bod >Assignee: Marton Bod >Priority: Major > Labels: pull-request-available > Time Spent: 6h 40m > Remaining Estimate: 0h > -- This message was sent by Atlassian Jira (v8.20.1#820001)
[jira] [Work logged] (HIVE-26102) Implement DELETE statements for Iceberg tables
[ https://issues.apache.org/jira/browse/HIVE-26102?focusedWorklogId=753368&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-753368 ] ASF GitHub Bot logged work on HIVE-26102: - Author: ASF GitHub Bot Created on: 06/Apr/22 12:57 Start Date: 06/Apr/22 12:57 Worklog Time Spent: 10m Work Description: marton-bod commented on code in PR #3131: URL: https://github.com/apache/hive/pull/3131#discussion_r843918164 ## iceberg/iceberg-handler/src/main/java/org/apache/iceberg/mr/hive/HiveIcebergOutputCommitter.java: ## @@ -118,18 +120,23 @@ public void commitTask(TaskAttemptContext originalContext) throws IOException { .run(output -> { Table table = HiveIcebergStorageHandler.table(context.getJobConf(), output); if (table != null) { - HiveIcebergRecordWriter writer = writers.get(output); - DataFile[] closedFiles; + HiveIcebergWriter writer = writers.get(output); + HiveIcebergWriter delWriter = delWriters.get(output); + String fileForCommitLocation = generateFileForCommitLocation(table.location(), jobConf, + attemptID.getJobID(), attemptID.getTaskID().getId()); + if (delWriter != null) { +DeleteFile[] closedFiles = delWriter.deleteFiles().toArray(new DeleteFile[0]); +createFileForCommit(closedFiles, fileForCommitLocation, table.io()); Review Comment: I've created a new container class `FilesForCommit`, which we now use to serialize into S3 during commitTask, and read it back during jobCommit Issue Time Tracking --- Worklog Id: (was: 753368) Time Spent: 6h 50m (was: 6h 40m) > Implement DELETE statements for Iceberg tables > -- > > Key: HIVE-26102 > URL: https://issues.apache.org/jira/browse/HIVE-26102 > Project: Hive > Issue Type: New Feature >Reporter: Marton Bod >Assignee: Marton Bod >Priority: Major > Labels: pull-request-available > Time Spent: 6h 50m > Remaining Estimate: 0h > -- This message was sent by Atlassian Jira (v8.20.1#820001)
[jira] [Closed] (HIVE-25934) Non blocking RENAME PARTITION implementation
[ https://issues.apache.org/jira/browse/HIVE-25934?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Denys Kuzmenko closed HIVE-25934. - > Non blocking RENAME PARTITION implementation > > > Key: HIVE-25934 > URL: https://issues.apache.org/jira/browse/HIVE-25934 > Project: Hive > Issue Type: Task >Reporter: Denys Kuzmenko >Assignee: Denys Kuzmenko >Priority: Major > Labels: pull-request-available > Time Spent: 2h 40m > Remaining Estimate: 0h > > Implement RENAME PARTITION in a way that doesn't have to wait for currently > running read operations to be finished. -- This message was sent by Atlassian Jira (v8.20.1#820001)
[jira] [Commented] (HIVE-26104) HIVE-19138 May block queries to compile
[ https://issues.apache.org/jira/browse/HIVE-26104?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17518050#comment-17518050 ] liuyan commented on HIVE-26104: --- Log findings 2022-03-22 05:23:00,252 INFO org.apache.hadoop.hive.ql.cache.results.QueryResultsCache: [f4c8782f-903d-46da-96d5-4e45d72ff431 HiveServer2-Handler-Pool: Thread-8884649]: Waiting on pending cacheEntry 2022-03-22 05:54:25,257 INFO org.apache.hadoop.hive.ql.Driver: [f4c8782f-903d-46da-96d5-4e45d72ff431 HiveServer2-Handler-Pool: Thread-8884649]: Semantic Analysis Completed (retrial = false) 2022-03-22 05:54:25,304 INFO org.apache.hadoop.hive.ql.Driver: [f4c8782f-903d-46da-96d5-4e45d72ff431 HiveServer2-Handler-Pool: Thread-8884649]: Completed compiling command(queryId=hive_20220322052300_7c219f1f-b969-49bb-aa7b-ea2f8926ac76); Time taken: 1885.298 seconds It seems the query (hive_20220322052300_7c219f1f-b969-49bb-aa7b-ea2f8926ac76) was frozen for 30 minutes during compilation due to waiting on a pending cache entry. This introduces two issues: 1. The user is not aware of the waiting-for-pending-cache status, so from the Beeline or client side the user does not know why the query has not been executing for a very long period. We need to notify the user in some way so that they are aware the query is currently waiting for the cache (and hence will not run before the cache entry reaches the ready state).
2. We had hive.driver.parallel.compilation.global.limit normally set to 3, which means that if 4 identical queries run on the managed table, the 4th query will be blocked, as well as any following queries sent to this HS2. > HIVE-19138 May block queries to compile > --- > > Key: HIVE-26104 > URL: https://issues.apache.org/jira/browse/HIVE-26104 > Project: Hive > Issue Type: Bug > Components: CBO >Affects Versions: 3.0.0, 3.1.2 >Reporter: liuyan >Priority: Critical > > HIVE-19138 introduced a way to allow other queries to stay in the compilation > state while there is a placeholder for the same query in the results cache. > However, multiple queries may enter the same state and hence use up all the > available parallel compilation slots allowed via > hive.driver.parallel.compilation.global.limit. Although we can turn off > this feature by setting hive.query.results.cache.wait.for.pending.results = > false, this seems to defeat the purpose that HIVE-19138 was trying to > achieve. We need a better solution for such situations. -- This message was sent by Atlassian Jira (v8.20.1#820001)
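One way to see the trade-off described in this comment: a bounded wait on the pending cache entry keeps the benefit of HIVE-19138 for results that arrive quickly, while preventing a 30-minute freeze from pinning a compilation slot. The sketch below is illustrative only, using plain java.util.concurrent rather than Hive's QueryResultsCache API, and the "compiled-fresh" fallback string is a stand-in for actually compiling the query:

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

public class PendingCacheWaitSketch {

    // Bounded wait: block on the pending entry for at most timeoutMs, then
    // give up and compile the query ourselves instead of holding a global
    // compilation slot indefinitely.
    public static String waitOrCompile(CompletableFuture<String> pendingEntry, long timeoutMs) {
        try {
            return pendingEntry.get(timeoutMs, TimeUnit.MILLISECONDS); // reuse cached result
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return "compiled-fresh";
        } catch (ExecutionException | TimeoutException e) {
            return "compiled-fresh"; // pending entry failed or never became ready
        }
    }

    // Pending entry that never completes: the waiter falls back after the timeout.
    public static String demoTimeout() {
        return waitOrCompile(new CompletableFuture<>(), 50);
    }

    // Pending entry that is already ready: the waiter reuses it immediately.
    public static String demoHit() {
        return waitOrCompile(CompletableFuture.completedFuture("cached"), 50);
    }

    public static void main(String[] args) {
        System.out.println(demoTimeout()); // compiled-fresh
        System.out.println(demoHit());     // cached
    }
}
```

A bounded wait would also surface progress to the client (point 1 above), since the session thread regains control at every timeout and can report that it is still waiting on the cache.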
[jira] [Work logged] (HIVE-26121) Hive transaction rollback should be thread-safe
[ https://issues.apache.org/jira/browse/HIVE-26121?focusedWorklogId=753313&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-753313 ] ASF GitHub Bot logged work on HIVE-26121: - Author: ASF GitHub Bot Created on: 06/Apr/22 11:28 Start Date: 06/Apr/22 11:28 Worklog Time Spent: 10m Work Description: deniskuzZ commented on code in PR #3181: URL: https://github.com/apache/hive/pull/3181#discussion_r843829763 ## ql/src/java/org/apache/hadoop/hive/ql/DriverTxnHandler.java: ## @@ -570,7 +570,7 @@ void endTransactionAndCleanup(boolean commit) throws LockException { txnRollbackRunner = null; } - void endTransactionAndCleanup(boolean commit, HiveTxnManager txnManager) throws LockException { + synchronized void endTransactionAndCleanup(boolean commit, HiveTxnManager txnManager) throws LockException { Review Comment: added Issue Time Tracking --- Worklog Id: (was: 753313) Time Spent: 0.5h (was: 20m) > Hive transaction rollback should be thread-safe > --- > > Key: HIVE-26121 > URL: https://issues.apache.org/jira/browse/HIVE-26121 > Project: Hive > Issue Type: Task >Reporter: Denys Kuzmenko >Priority: Major > Labels: pull-request-available > Time Spent: 0.5h > Remaining Estimate: 0h > > When a Hive query is interrupted via a cancel request, both the background > pool thread (HiveServer2-Background) executing the query and the HttpHandler > thread (HiveServer2-Handler) running the HiveSession.cancelOperation logic > will eventually trigger the below method: > {code} > DriverTxnHandler.endTransactionAndCleanup(boolean commit) > {code} > Since this method could be invoked concurrently, we need to synchronize access > to it, so that only one thread would attempt to abort the transaction and stop > the heartbeat. -- This message was sent by Atlassian Jira (v8.20.1#820001)
[jira] [Work logged] (HIVE-26121) Hive transaction rollback should be thread-safe
[ https://issues.apache.org/jira/browse/HIVE-26121?focusedWorklogId=753311&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-753311 ] ASF GitHub Bot logged work on HIVE-26121: - Author: ASF GitHub Bot Created on: 06/Apr/22 11:27 Start Date: 06/Apr/22 11:27 Worklog Time Spent: 10m Work Description: deniskuzZ commented on code in PR #3181: URL: https://github.com/apache/hive/pull/3181#discussion_r843829383 ## ql/src/java/org/apache/hadoop/hive/ql/lockmgr/DbTxnManager.java: ## @@ -710,49 +691,32 @@ private Heartbeater startHeartbeat(long initialDelay) throws LockException { return task; } - private void stopHeartbeat() { -if (heartbeatTask == null) { - // avoid unnecessary locking if the field is null - return; -} - -boolean isLockAcquired = false; -try { - // The lock should not be held by other thread trying to stop the heartbeat for more than 31 seconds - isLockAcquired = heartbeatTaskLock.tryLock(31000, TimeUnit.MILLISECONDS); -} catch (InterruptedException e) { - // safe to go on -} - -try { - if (isLockAcquired && heartbeatTask != null) { -heartbeatTask.cancel(true); -long startTime = System.currentTimeMillis(); -long sleepInterval = 100; -while (!heartbeatTask.isCancelled() && !heartbeatTask.isDone()) { - // We will wait for 30 seconds for the task to be cancelled. - // If it's still not cancelled (unlikely), we will just move on. - long now = System.currentTimeMillis(); - if (now - startTime > 3) { -LOG.warn("Heartbeat task cannot be cancelled for unknown reason. 
QueryId: " + queryId); -break; - } - try { -Thread.sleep(sleepInterval); - } catch (InterruptedException e) { - } - sleepInterval *= 2; + private synchronized void stopHeartbeat() { Review Comment: added Issue Time Tracking --- Worklog Id: (was: 753311) Remaining Estimate: 0h Time Spent: 10m > Hive transaction rollback should be thread-safe > --- > > Key: HIVE-26121 > URL: https://issues.apache.org/jira/browse/HIVE-26121 > Project: Hive > Issue Type: Task >Reporter: Denys Kuzmenko >Priority: Major > Time Spent: 10m > Remaining Estimate: 0h > > When a Hive query is interrupted via a cancel request, both the background > pool thread (HiveServer2-Background) executing the query and the HttpHandler > thread (HiveServer2-Handler) running the HiveSession.cancelOperation logic > will eventually trigger the below method: > {code} > DriverTxnHandler.endTransactionAndCleanup(boolean commit) > {code} > Since this method could be invoked concurrently, we need to synchronize access > to it, so that only one thread would attempt to abort the transaction and stop > the heartbeat. -- This message was sent by Atlassian Jira (v8.20.1#820001)
[jira] [Work logged] (HIVE-26121) Hive transaction rollback should be thread-safe
[ https://issues.apache.org/jira/browse/HIVE-26121?focusedWorklogId=753312&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-753312 ]

ASF GitHub Bot logged work on HIVE-26121:
-----------------------------------------
Author: ASF GitHub Bot
Created on: 06/Apr/22 11:27
Start Date: 06/Apr/22 11:27
Worklog Time Spent: 10m

Work Description: deniskuzZ commented on code in PR #3181:
URL: https://github.com/apache/hive/pull/3181#discussion_r843829613

## ql/src/java/org/apache/hadoop/hive/ql/lockmgr/DbTxnManager.java:

@@ -574,30 +571,24 @@ public void rollbackTxn() throws LockException {
     if (!isTxnOpen()) {
       throw new RuntimeException("Attempt to rollback before opening a transaction");
     }
-    stopHeartbeat();
-
     try {
-      lockMgr.clearLocalLockRecords();
+      clearLocksAndHB();
       LOG.debug("Rolling back " + JavaUtils.txnIdToString(txnId));
-
-      // Re-checking as txn could have been closed, in the meantime, by a competing thread.
-      if (isTxnOpen()) {
-        if (replPolicy != null) {
-          getMS().replRollbackTxn(txnId, replPolicy, TxnType.DEFAULT);
-        } else {
-          getMS().rollbackTxn(txnId);
-        }
+
+      if (replPolicy != null) {

Review Comment: marked as @NotThreadSafe

Issue Time Tracking
-------------------
Worklog Id: (was: 753312)
Time Spent: 20m (was: 10m)

> Hive transaction rollback should be thread-safe
> -----------------------------------------------
>
> Key: HIVE-26121
> URL: https://issues.apache.org/jira/browse/HIVE-26121
> Project: Hive
> Issue Type: Task
> Reporter: Denys Kuzmenko
> Priority: Major
> Labels: pull-request-available
> Time Spent: 20m
> Remaining Estimate: 0h
>
> When Hive query is being interrupted via cancel request, both the background
> pool thread (HiveServer2-Background) executing the query and the HttpHandler
> thread (HiveServer2-Handler) running the HiveSession.cancelOperation logic
> will eventually trigger the below method:
> {code}
> DriverTxnHandler.endTransactionAndCleanup(boolean commit)
> {code}
> Since this method could be invoked concurrently we need to synchronize access
> to it, so that only 1 thread would attempt to abort the transaction and stop
> the heartbeat.

--
This message was sent by Atlassian Jira
(v8.20.1#820001)
[jira] [Updated] (HIVE-26121) Hive transaction rollback should be thread-safe
[ https://issues.apache.org/jira/browse/HIVE-26121?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated HIVE-26121: -- Labels: pull-request-available (was: ) > Hive transaction rollback should be thread-safe > --- > > Key: HIVE-26121 > URL: https://issues.apache.org/jira/browse/HIVE-26121 > Project: Hive > Issue Type: Task >Reporter: Denys Kuzmenko >Priority: Major > Labels: pull-request-available > Time Spent: 10m > Remaining Estimate: 0h > > When Hive query is being interrupted via cancel request, both the background > pool thread (HiveServer2-Background) executing the query and the HttpHandler > thread (HiveServer2-Handler) running the HiveSession.cancelOperation logic > will eventually trigger the below method: > {code} > DriverTxnHandler.endTransactionAndCleanup(boolean commit) > {code} > Since this method could be invoked concurrently we need to synchronize access > to it, so that only 1 thread would attempt to abort the transaction and stop > the heartbeat. -- This message was sent by Atlassian Jira (v8.20.1#820001)
[jira] [Updated] (HIVE-26121) Hive transaction rollback should be thread-safe
[ https://issues.apache.org/jira/browse/HIVE-26121?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Denys Kuzmenko updated HIVE-26121: -- Description: When Hive query is being interrupted via cancel request, both the background pool thread (HiveServer2-Background) executing the query and the HttpHandler thread (HiveServer2-Handler) running the HiveSession.cancelOperation logic will eventually trigger the below method: {code} DriverTxnHandler.endTransactionAndCleanup(boolean commit) {code} Since this method could be invoked concurrently we need to synchronize access to it, so that only 1 thread would attempt to abort the transaction and stop the heartbeat. was: When Hive query is being interrupted via cancel request, both the background pool thread (HiveServer2-Background) executing the query and the HttpHandler thread (HiveServer2-Handler) running the HiveSession.cancelOperation logic will eventually trigger the below method: {code} DriverTxnHandler.endTransactionAndCleanup(boolean commit) {code} Since this method could be invoked concurrently we need to synchronize access to it, so that one 1 thread would abort the transaction and stop the heartbeat. > Hive transaction rollback should be thread-safe > --- > > Key: HIVE-26121 > URL: https://issues.apache.org/jira/browse/HIVE-26121 > Project: Hive > Issue Type: Task >Reporter: Denys Kuzmenko >Priority: Major > > When Hive query is being interrupted via cancel request, both the background > pool thread (HiveServer2-Background) executing the query and the HttpHandler > thread (HiveServer2-Handler) running the HiveSession.cancelOperation logic > will eventually trigger the below method: > {code} > DriverTxnHandler.endTransactionAndCleanup(boolean commit) > {code} > Since this method could be invoked concurrently we need to synchronize access > to it, so that only 1 thread would attempt to abort the transaction and stop > the heartbeat. -- This message was sent by Atlassian Jira (v8.20.1#820001)
[jira] [Updated] (HIVE-26121) Hive transaction rollback should be thread-safe
[ https://issues.apache.org/jira/browse/HIVE-26121?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Denys Kuzmenko updated HIVE-26121: -- Description: When Hive query is being interrupted via cancel request, both the background pool thread (HiveServer2-Background) executing the query and the HttpHandler thread (HiveServer2-Handler) running the HiveSession.cancelOperation logic will eventually trigger the below method: {code} DriverTxnHandler.endTransactionAndCleanup(boolean commit) {code} Since this method could be invoked concurrently we need to synchronize access to it, so that one 1 thread would abort the transaction and stop the heartbeat. > Hive transaction rollback should be thread-safe > --- > > Key: HIVE-26121 > URL: https://issues.apache.org/jira/browse/HIVE-26121 > Project: Hive > Issue Type: Task >Reporter: Denys Kuzmenko >Priority: Major > > When Hive query is being interrupted via cancel request, both the background > pool thread (HiveServer2-Background) executing the query and the HttpHandler > thread (HiveServer2-Handler) running the HiveSession.cancelOperation logic > will eventually trigger the below method: > {code} > DriverTxnHandler.endTransactionAndCleanup(boolean commit) > {code} > Since this method could be invoked concurrently we need to synchronize access > to it, so that one 1 thread would abort the transaction and stop the > heartbeat. -- This message was sent by Atlassian Jira (v8.20.1#820001)
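The fix discussed for HIVE-26121 serializes endTransactionAndCleanup so that only one thread aborts the transaction. A standalone sketch of that guard pattern follows; the class and field names are illustrative, not Hive's actual DriverTxnHandler:

```java
import java.util.concurrent.atomic.AtomicInteger;

public class RollbackGuardDemo {
  static final AtomicInteger aborts = new AtomicInteger();

  private long txnId = 42;

  private boolean isTxnOpen() {
    return txnId > 0;
  }

  // synchronized plus an isTxnOpen() check makes rollback idempotent:
  // the thread that loses the race sees txnId == 0 and does nothing,
  // so the metastore never receives a rollback for txnId 0.
  synchronized void endTransactionAndCleanup(boolean commit) {
    if (!isTxnOpen()) {
      return;
    }
    aborts.incrementAndGet(); // stands in for getMS().rollbackTxn(txnId)
    txnId = 0;                // stands in for stopping the heartbeat and clearing state
  }

  public static void main(String[] args) throws InterruptedException {
    RollbackGuardDemo d = new RollbackGuardDemo();

    // Mimic the HiveServer2-Background thread and the HiveServer2-Handler
    // thread (cancelOperation) hitting the cleanup path concurrently.
    Thread background = new Thread(() -> d.endTransactionAndCleanup(false));
    Thread handler = new Thread(() -> d.endTransactionAndCleanup(false));
    background.start();
    handler.start();
    background.join();
    handler.join();

    System.out.println("aborts = " + aborts.get()); // prints 1
  }
}
```

The monitor guarantees the two calls run one after the other, and the re-check inside the lock turns the second call into a no-op, which is exactly the "only 1 thread would attempt to abort" requirement from the description.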
[jira] [Work logged] (HIVE-22420) DbTxnManager.stopHeartbeat() should be thread-safe
[ https://issues.apache.org/jira/browse/HIVE-22420?focusedWorklogId=753289&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-753289 ]

ASF GitHub Bot logged work on HIVE-22420:
-----------------------------------------
Author: ASF GitHub Bot
Created on: 06/Apr/22 10:05
Start Date: 06/Apr/22 10:05
Worklog Time Spent: 10m

Work Description: pvary commented on code in PR #3181:
URL: https://github.com/apache/hive/pull/3181#discussion_r843750551

## ql/src/java/org/apache/hadoop/hive/ql/lockmgr/DbTxnManager.java:

@@ -710,49 +691,32 @@ private Heartbeater startHeartbeat(long initialDelay) throws LockException {
     return task;
   }

-  private void stopHeartbeat() {
-    if (heartbeatTask == null) {
-      // avoid unnecessary locking if the field is null
-      return;
-    }
-
-    boolean isLockAcquired = false;
-    try {
-      // The lock should not be held by other thread trying to stop the heartbeat for more than 31 seconds
-      isLockAcquired = heartbeatTaskLock.tryLock(31000, TimeUnit.MILLISECONDS);
-    } catch (InterruptedException e) {
-      // safe to go on
-    }
-
-    try {
-      if (isLockAcquired && heartbeatTask != null) {
-        heartbeatTask.cancel(true);
-        long startTime = System.currentTimeMillis();
-        long sleepInterval = 100;
-        while (!heartbeatTask.isCancelled() && !heartbeatTask.isDone()) {
-          // We will wait for 30 seconds for the task to be cancelled.
-          // If it's still not cancelled (unlikely), we will just move on.
-          long now = System.currentTimeMillis();
-          if (now - startTime > 3) {
-            LOG.warn("Heartbeat task cannot be cancelled for unknown reason. QueryId: " + queryId);
-            break;
-          }
-          try {
-            Thread.sleep(sleepInterval);
-          } catch (InterruptedException e) {
-          }
-          sleepInterval *= 2;
+  private synchronized void stopHeartbeat() {

Review Comment: Comment here as well

Issue Time Tracking
-------------------
Worklog Id: (was: 753289)
Time Spent: 40m (was: 0.5h)

> DbTxnManager.stopHeartbeat() should be thread-safe
> --------------------------------------------------
>
> Key: HIVE-22420
> URL: https://issues.apache.org/jira/browse/HIVE-22420
> Project: Hive
> Issue Type: Bug
> Affects Versions: 3.1.0
> Reporter: Aron Hamvas
> Assignee: Aron Hamvas
> Priority: Major
> Labels: pull-request-available
> Fix For: 4.0.0, 4.0.0-alpha-1
>
> Attachments: HIVE-22420.1.patch, HIVE-22420.2.patch
>
> Time Spent: 40m
> Remaining Estimate: 0h
>
> When a transactional query is being executed and interrupted via HS2 close
> operation request, both the background pool thread executing the query and
> the HttpHandler thread running the close operation logic will eventually call
> the below method:
> {noformat}
> Driver.releaseLocksAndCommitOrRollback(commit boolean)
> {noformat}
> Since this method is invoked several times in both threads, it can happen
> that the two threads invoke it at the same time, and due to a race condition,
> the txnId field of the DbTxnManager used by both threads could be set to 0
> without actually successfully aborting the transaction.
> The root cause is the stopHeartbeat() method in DbTxnManager not being
> thread-safe: when Thread-1 and Thread-2 enter stopHeartbeat() with very
> little time difference, Thread-1 might successfully cancel the heartbeat task
> and set the heartbeatTask field to null while Thread-2 is trying to observe
> its state. Thread-1 will return to the calling rollbackTxn() method and
> continue execution there, while Thread-2 is thrown back to the same method
> with a NullPointerException. Thread-2 will then set txnId to 0, and Thread-1
> is sending this 0 value to HMS. So, the txn will not be aborted, and the
> locks cannot be released later on either.

--
This message was sent by Atlassian Jira
(v8.20.1#820001)
[jira] [Work logged] (HIVE-22420) DbTxnManager.stopHeartbeat() should be thread-safe
[ https://issues.apache.org/jira/browse/HIVE-22420?focusedWorklogId=753288&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-753288 ]

ASF GitHub Bot logged work on HIVE-22420:
-----------------------------------------
Author: ASF GitHub Bot
Created on: 06/Apr/22 10:04
Start Date: 06/Apr/22 10:04
Worklog Time Spent: 10m

Work Description: pvary commented on code in PR #3181:
URL: https://github.com/apache/hive/pull/3181#discussion_r843750056

## ql/src/java/org/apache/hadoop/hive/ql/lockmgr/DbTxnManager.java:

@@ -574,30 +571,24 @@ public void rollbackTxn() throws LockException {
     if (!isTxnOpen()) {
       throw new RuntimeException("Attempt to rollback before opening a transaction");
     }
-    stopHeartbeat();
-
     try {
-      lockMgr.clearLocalLockRecords();
+      clearLocksAndHB();
       LOG.debug("Rolling back " + JavaUtils.txnIdToString(txnId));
-
-      // Re-checking as txn could have been closed, in the meantime, by a competing thread.
-      if (isTxnOpen()) {
-        if (replPolicy != null) {
-          getMS().replRollbackTxn(txnId, replPolicy, TxnType.DEFAULT);
-        } else {
-          getMS().rollbackTxn(txnId);
-        }
+
+      if (replPolicy != null) {

Review Comment: If we expect that this class should not be shared between threads, then we should write a comment on the class level for it

Issue Time Tracking
-------------------
Worklog Id: (was: 753288)
Time Spent: 0.5h (was: 20m)

> DbTxnManager.stopHeartbeat() should be thread-safe
> --------------------------------------------------
>
> Key: HIVE-22420
> URL: https://issues.apache.org/jira/browse/HIVE-22420
> Project: Hive
> Issue Type: Bug
> Affects Versions: 3.1.0
> Reporter: Aron Hamvas
> Assignee: Aron Hamvas
> Priority: Major
> Labels: pull-request-available
> Fix For: 4.0.0, 4.0.0-alpha-1
>
> Attachments: HIVE-22420.1.patch, HIVE-22420.2.patch
>
> Time Spent: 0.5h
> Remaining Estimate: 0h
>
> When a transactional query is being executed and interrupted via HS2 close
> operation request, both the background pool thread executing the query and
> the HttpHandler thread running the close operation logic will eventually call
> the below method:
> {noformat}
> Driver.releaseLocksAndCommitOrRollback(commit boolean)
> {noformat}
> Since this method is invoked several times in both threads, it can happen
> that the two threads invoke it at the same time, and due to a race condition,
> the txnId field of the DbTxnManager used by both threads could be set to 0
> without actually successfully aborting the transaction.
> The root cause is the stopHeartbeat() method in DbTxnManager not being
> thread-safe: when Thread-1 and Thread-2 enter stopHeartbeat() with very
> little time difference, Thread-1 might successfully cancel the heartbeat task
> and set the heartbeatTask field to null while Thread-2 is trying to observe
> its state. Thread-1 will return to the calling rollbackTxn() method and
> continue execution there, while Thread-2 is thrown back to the same method
> with a NullPointerException. Thread-2 will then set txnId to 0, and Thread-1
> is sending this 0 value to HMS. So, the txn will not be aborted, and the
> locks cannot be released later on either.

--
This message was sent by Atlassian Jira
(v8.20.1#820001)
[jira] [Work logged] (HIVE-22420) DbTxnManager.stopHeartbeat() should be thread-safe
[ https://issues.apache.org/jira/browse/HIVE-22420?focusedWorklogId=753285&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-753285 ]

ASF GitHub Bot logged work on HIVE-22420:
-----------------------------------------
Author: ASF GitHub Bot
Created on: 06/Apr/22 10:00
Start Date: 06/Apr/22 10:00
Worklog Time Spent: 10m

Work Description: pvary commented on code in PR #3181:
URL: https://github.com/apache/hive/pull/3181#discussion_r843745809

## ql/src/java/org/apache/hadoop/hive/ql/DriverTxnHandler.java:

@@ -570,7 +570,7 @@ void endTransactionAndCleanup(boolean commit) throws LockException {
     txnRollbackRunner = null;
   }

-  void endTransactionAndCleanup(boolean commit, HiveTxnManager txnManager) throws LockException {
+  synchronized void endTransactionAndCleanup(boolean commit, HiveTxnManager txnManager) throws LockException {

Review Comment: Could we leave a comment here on why this is synchronized?

Issue Time Tracking
-------------------
Worklog Id: (was: 753285)
Time Spent: 20m (was: 10m)

> DbTxnManager.stopHeartbeat() should be thread-safe
> --------------------------------------------------
>
> Key: HIVE-22420
> URL: https://issues.apache.org/jira/browse/HIVE-22420
> Project: Hive
> Issue Type: Bug
> Affects Versions: 3.1.0
> Reporter: Aron Hamvas
> Assignee: Aron Hamvas
> Priority: Major
> Labels: pull-request-available
> Fix For: 4.0.0, 4.0.0-alpha-1
>
> Attachments: HIVE-22420.1.patch, HIVE-22420.2.patch
>
> Time Spent: 20m
> Remaining Estimate: 0h
>
> When a transactional query is being executed and interrupted via HS2 close
> operation request, both the background pool thread executing the query and
> the HttpHandler thread running the close operation logic will eventually call
> the below method:
> {noformat}
> Driver.releaseLocksAndCommitOrRollback(commit boolean)
> {noformat}
> Since this method is invoked several times in both threads, it can happen
> that the two threads invoke it at the same time, and due to a race condition,
> the txnId field of the DbTxnManager used by both threads could be set to 0
> without actually successfully aborting the transaction.
> The root cause is the stopHeartbeat() method in DbTxnManager not being
> thread-safe: when Thread-1 and Thread-2 enter stopHeartbeat() with very
> little time difference, Thread-1 might successfully cancel the heartbeat task
> and set the heartbeatTask field to null while Thread-2 is trying to observe
> its state. Thread-1 will return to the calling rollbackTxn() method and
> continue execution there, while Thread-2 is thrown back to the same method
> with a NullPointerException. Thread-2 will then set txnId to 0, and Thread-1
> is sending this 0 value to HMS. So, the txn will not be aborted, and the
> locks cannot be released later on either.

--
This message was sent by Atlassian Jira
(v8.20.1#820001)
[jira] [Updated] (HIVE-26120) beeline return 0 when Could not open connection to the HS2 server ERROR
[ https://issues.apache.org/jira/browse/HIVE-26120?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

MK updated HIVE-26120:
----------------------
Summary: beeline return 0 when Could not open connection to the HS2 server ERROR (was: beeline return 0 when Could not open connection to the HS2 server)

> beeline return 0 when Could not open connection to the HS2 server ERROR
> -----------------------------------------------------------------------
>
> Key: HIVE-26120
> URL: https://issues.apache.org/jira/browse/HIVE-26120
> Project: Hive
> Issue Type: Bug
> Components: Beeline
> Reporter: MK
> Priority: Major
>
> When executing beeline -u 'jdbc:hive2://bigdata-hs111:10003' -n 'etl' -p '**' -f /opt/project/DWD/SPD/xxx.sql and bigdata-hs111 does not exist or cannot be connected to, the command's return code is 0, NOT a non-zero value.
>
> SLF4J: Class path contains multiple SLF4J bindings.
> SLF4J: Found binding in [jar:file:/data/programs/apache-hive-3.1.2-bin/lib/log4j-slf4j-impl-2.17.0.jar!/org/slf4j/impl/StaticLoggerBinder.class]
> SLF4J: Found binding in [jar:file:/data/programs/hadoop-3.1.4/share/hadoop/common/lib/slf4j-log4j12-1.7.25.jar!/org/slf4j/impl/StaticLoggerBinder.class]
> SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
> SLF4J: Actual binding is of type [org.apache.logging.slf4j.Log4jLoggerFactory]
> Connecting to jdbc:hive2://bigdata-hs111:10003
> 2022-04-06T17:28:04,247 WARN [main] org.apache.hive.jdbc.Utils - Could not retrieve canonical hostname for bigdata-hs111
> java.net.UnknownHostException: bigdata-hs111: Name or service not known
>         at java.net.Inet4AddressImpl.lookupAllHostAddr(Native Method) ~[?:1.8.0_191]
>         at java.net.InetAddress$2.lookupAllHostAddr(InetAddress.java:929) ~[?:1.8.0_191]
>         at java.net.InetAddress.getAddressesFromNameService(InetAddress.java:1324) ~[?:1.8.0_191]
>         at java.net.InetAddress.getAllByName0(InetAddress.java:1277) ~[?:1.8.0_191]
>         at java.net.InetAddress.getAllByName(InetAddress.java:1193) ~[?:1.8.0_191]
>         at java.net.InetAddress.getAllByName(InetAddress.java:1127) ~[?:1.8.0_191]
>         at java.net.InetAddress.getByName(InetAddress.java:1077) ~[?:1.8.0_191]
>         at org.apache.hive.jdbc.Utils.getCanonicalHostName(Utils.java:701) [hive-jdbc-3.1.2.jar:3.1.2]
>         at org.apache.hive.jdbc.HiveConnection.<init>(HiveConnection.java:178) [hive-jdbc-3.1.2.jar:3.1.2]
>         at org.apache.hive.jdbc.HiveDriver.connect(HiveDriver.java:107) [hive-jdbc-3.1.2.jar:3.1.2]
>         at java.sql.DriverManager.getConnection(DriverManager.java:664) [?:1.8.0_191]
>         at java.sql.DriverManager.getConnection(DriverManager.java:208) [?:1.8.0_191]
>         at org.apache.hive.beeline.DatabaseConnection.connect(DatabaseConnection.java:145) [hive-beeline-3.1.2.jar:3.1.2]
>         at org.apache.hive.beeline.DatabaseConnection.getConnection(DatabaseConnection.java:209) [hive-beeline-3.1.2.jar:3.1.2]
>         at org.apache.hive.beeline.Commands.connect(Commands.java:1641) [hive-beeline-3.1.2.jar:3.1.2]
>         at org.apache.hive.beeline.Commands.connect(Commands.java:1536) [hive-beeline-3.1.2.jar:3.1.2]
>         at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) ~[?:1.8.0_191]
>         at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) ~[?:1.8.0_191]
>         at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) ~[?:1.8.0_191]
>         at java.lang.reflect.Method.invoke(Method.java:498) ~[?:1.8.0_191]
>         at org.apache.hive.beeline.ReflectiveCommandHandler.execute(ReflectiveCommandHandler.java:56) [hive-beeline-3.1.2.jar:3.1.2]
>         at org.apache.hive.beeline.BeeLine.execCommandWithPrefix(BeeLine.java:1384) [hive-beeline-3.1.2.jar:3.1.2]
>         at org.apache.hive.beeline.BeeLine.dispatch(BeeLine.java:1423) [hive-beeline-3.1.2.jar:3.1.2]
>         at org.apache.hive.beeline.BeeLine.connectUsingArgs(BeeLine.java:900) [hive-beeline-3.1.2.jar:3.1.2]
>         at org.apache.hive.beeline.BeeLine.initArgs(BeeLine.java:795) [hive-beeline-3.1.2.jar:3.1.2]
>         at org.apache.hive.beeline.BeeLine.begin(BeeLine.java:1048) [hive-beeline-3.1.2.jar:3.1.2]
>         at org.apache.hive.beeline.BeeLine.mainWithInputRedirection(BeeLine.java:538) [hive-beeline-3.1.2.jar:3.1.2]
>         at org.apache.hive.beeline.BeeLine.main(BeeLine.java:520) [hive-beeline-3.1.2.jar:3.1.2]
>         at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) ~[?:1.8.0_191]
>         at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) ~[?:1.8.0_191]
>         at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorIm
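Until beeline itself returns a non-zero code on connection failure, any caller that wraps it has to inspect the child process's exit value directly. A minimal sketch of that caller-side check, using a stand-in command in place of beeline (the `sh -c "exit 2"` invocation is only a placeholder for a failing beeline run, and assumes a POSIX shell is available):

```java
import java.io.IOException;

public class BeelineExitCodeCheck {
  public static void main(String[] args) throws IOException, InterruptedException {
    // Stand-in for: beeline -u 'jdbc:hive2://host:10003' -f script.sql
    // We run a shell command that exits with 2, the way beeline arguably
    // should when it cannot open a connection to HS2.
    Process p = new ProcessBuilder("sh", "-c", "exit 2").start();
    int code = p.waitFor();
    if (code != 0) {
      System.out.println("command failed with exit code " + code);
    } else {
      System.out.println("command succeeded");
    }
  }
}
```

The point of the bug report is precisely that this check is useless today: beeline exits 0 even after the UnknownHostException above, so the orchestration layer cannot distinguish a failed run from a successful one.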
[jira] [Work logged] (HIVE-26102) Implement DELETE statements for Iceberg tables
[ https://issues.apache.org/jira/browse/HIVE-26102?focusedWorklogId=753278&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-753278 ]

ASF GitHub Bot logged work on HIVE-26102:
-----------------------------------------
Author: ASF GitHub Bot
Created on: 06/Apr/22 09:29
Start Date: 06/Apr/22 09:29
Worklog Time Spent: 10m

Work Description: marton-bod commented on code in PR #3131:
URL: https://github.com/apache/hive/pull/3131#discussion_r843715671

## iceberg/iceberg-handler/src/main/java/org/apache/iceberg/mr/hive/HiveIcebergOutputCommitter.java:

@@ -156,11 +163,19 @@ public void abortTask(TaskAttemptContext originalContext) throws IOException {
     TaskAttemptContext context = TezUtil.enrichContextWithAttemptWrapper(originalContext);

     // Clean up writer data from the local store
-    Map writers = HiveIcebergRecordWriter.removeWriters(context.getTaskAttemptID());
+    Map writers = HiveIcebergWriter.getRecordWriters(context.getTaskAttemptID());

Review Comment: As discussed, let's use a single writer map for both DeleteWriters and RecordWriters

Issue Time Tracking
-------------------
Worklog Id: (was: 753278)
Time Spent: 6.5h (was: 6h 20m)

> Implement DELETE statements for Iceberg tables
> ----------------------------------------------
>
> Key: HIVE-26102
> URL: https://issues.apache.org/jira/browse/HIVE-26102
> Project: Hive
> Issue Type: New Feature
> Reporter: Marton Bod
> Assignee: Marton Bod
> Priority: Major
> Labels: pull-request-available
> Time Spent: 6.5h
> Remaining Estimate: 0h

--
This message was sent by Atlassian Jira
(v8.20.1#820001)
[jira] [Work logged] (HIVE-26102) Implement DELETE statements for Iceberg tables
[ https://issues.apache.org/jira/browse/HIVE-26102?focusedWorklogId=753277&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-753277 ]

ASF GitHub Bot logged work on HIVE-26102:
-----------------------------------------
Author: ASF GitHub Bot
Created on: 06/Apr/22 09:28
Start Date: 06/Apr/22 09:28
Worklog Time Spent: 10m

Work Description: marton-bod commented on code in PR #3131:
URL: https://github.com/apache/hive/pull/3131#discussion_r843714944

## iceberg/iceberg-handler/src/main/java/org/apache/iceberg/mr/hive/HiveIcebergOutputCommitter.java:

@@ -118,18 +120,23 @@ public void commitTask(TaskAttemptContext originalContext) throws IOException {
         .run(output -> {
           Table table = HiveIcebergStorageHandler.table(context.getJobConf(), output);
           if (table != null) {
-            HiveIcebergRecordWriter writer = writers.get(output);
-            DataFile[] closedFiles;
+            HiveIcebergWriter writer = writers.get(output);
+            HiveIcebergWriter delWriter = delWriters.get(output);
+            String fileForCommitLocation = generateFileForCommitLocation(table.location(), jobConf,
+                attemptID.getJobID(), attemptID.getTaskID().getId());
+            if (delWriter != null) {
+              DeleteFile[] closedFiles = delWriter.deleteFiles().toArray(new DeleteFile[0]);
+              createFileForCommit(closedFiles, fileForCommitLocation, table.io());

Review Comment:
> the S3 files is where we will spend some serious time

Makes sense. As discussed, let's create a container object which we can serialize/deserialize

Issue Time Tracking
-------------------
Worklog Id: (was: 753277)
Time Spent: 6h 20m (was: 6h 10m)

> Implement DELETE statements for Iceberg tables
> ----------------------------------------------
>
> Key: HIVE-26102
> URL: https://issues.apache.org/jira/browse/HIVE-26102
> Project: Hive
> Issue Type: New Feature
> Reporter: Marton Bod
> Assignee: Marton Bod
> Priority: Major
> Labels: pull-request-available
> Time Spent: 6h 20m
> Remaining Estimate: 0h

--
This message was sent by Atlassian Jira
(v8.20.1#820001)
[jira] [Comment Edited] (HIVE-26075) hive metastore connection leaking when hiveserver2 kerberos enable and hive.server2.enable.doAs set to true
[ https://issues.apache.org/jira/browse/HIVE-26075?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17517981#comment-17517981 ]

liuguanghua edited comment on HIVE-26075 at 4/6/22 9:22 AM:
------------------------------------------------------------
I have verified that this problem reproduces on Hive 1.2.2, but version 2.3.3 does not have the problem. I have not tested master because I lack the environment, so I will push a PR against version 1.2.2.

was (Author: liuguanghua):
I have verified that this problem reproduces on Hive 1.2.2, but version 2.3.3 does not have the problem. I have not tested master, so I will push a PR against version 1.2.2.

> hive metastore connection leaking when hiveserver2 kerberos enable and
> hive.server2.enable.doAs set to true
> ----------------------------------------------------------------------
>
> Key: HIVE-26075
> URL: https://issues.apache.org/jira/browse/HIVE-26075
> Project: Hive
> Issue Type: Bug
> Components: Hive
> Affects Versions: 1.2.0
> Reporter: liuguanghua
> Assignee: liuguanghua
> Priority: Major
> Attachments: HIVE-26075.patch
>
> (1) When Hadoop cluster Kerberos is enabled
> (2) the HiveServer2 config hive.server2.enable.doAs is set to true
> After a beeline script has been executed, the hive metastore connections that
> were created are left in ESTABLISHED state and never closed.
> If we submit a lot of tasks to HiveServer2, this will fill up the hive
> metastore thrift thread pool (default is 1000), and thus new tasks will fail.
>
> HiveServer2 uses a ThreadLocal to store per-thread metastore connections; the
> application should call Hive.closeCurrent() to close the connection after a
> task finishes.
>
> When HiveServer2 impersonation is enabled (hive.server2.enable.doAs is set to
> true), the ugi will create a proxy user via UserGroupInformation.createProxyUser(
> owner, UserGroupInformation.getLoginUser()); the old metastore client is never
> closed.

--
This message was sent by Atlassian Jira
(v8.20.1#820001)
[jira] [Commented] (HIVE-26075) hive metastore connection leaking when hiveserver2 kerberos enable and hive.server2.enable.doAs set to true
[ https://issues.apache.org/jira/browse/HIVE-26075?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17517981#comment-17517981 ]

liuguanghua commented on HIVE-26075:
------------------------------------
I have verified that this problem reproduces on Hive 1.2.2, but version 2.3.3 does not have the problem. I have not tested master, so I will push a PR against version 1.2.2.

> hive metastore connection leaking when hiveserver2 kerberos enable and
> hive.server2.enable.doAs set to true
> ----------------------------------------------------------------------
>
> Key: HIVE-26075
> URL: https://issues.apache.org/jira/browse/HIVE-26075
> Project: Hive
> Issue Type: Bug
> Components: Hive
> Affects Versions: 1.2.0
> Reporter: liuguanghua
> Assignee: liuguanghua
> Priority: Major
> Attachments: HIVE-26075.patch
>
> (1) When Hadoop cluster Kerberos is enabled
> (2) the HiveServer2 config hive.server2.enable.doAs is set to true
> After a beeline script has been executed, the hive metastore connections that
> were created are left in ESTABLISHED state and never closed.
> If we submit a lot of tasks to HiveServer2, this will fill up the hive
> metastore thrift thread pool (default is 1000), and thus new tasks will fail.
>
> HiveServer2 uses a ThreadLocal to store per-thread metastore connections; the
> application should call Hive.closeCurrent() to close the connection after a
> task finishes.
>
> When HiveServer2 impersonation is enabled (hive.server2.enable.doAs is set to
> true), the ugi will create a proxy user via UserGroupInformation.createProxyUser(
> owner, UserGroupInformation.getLoginUser()); the old metastore client is never
> closed.

--
This message was sent by Atlassian Jira
(v8.20.1#820001)
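The leak pattern described in HIVE-26075 - replacing a ThreadLocal-held client without closing the old one - can be modeled in isolation. In the sketch below, Client and the counter are stand-ins (not Hive classes), and replaceClosing() plays the role that calling Hive.closeCurrent() before creating a proxy-user client would play:

```java
public class ThreadLocalLeakDemo {
  static int openClients = 0;

  // Stand-in for the per-thread metastore client; not Hive's real class.
  static class Client implements AutoCloseable {
    Client() { openClients++; }
    @Override public void close() { openClients--; }
  }

  static final ThreadLocal<Client> current = new ThreadLocal<>();

  // Buggy pattern: the slot is overwritten and the old connection
  // stays ESTABLISHED with nothing left referencing it.
  static void replaceLeaky() {
    current.set(new Client());
  }

  // Fixed pattern: close the existing client before replacing it,
  // which is what invoking Hive.closeCurrent() accomplishes.
  static void replaceClosing() {
    Client old = current.get();
    if (old != null) {
      old.close();
    }
    current.set(new Client());
  }

  public static void main(String[] args) throws Exception {
    replaceLeaky();
    replaceLeaky();
    System.out.println("leaky replaces leave open: " + openClients);   // prints 2 (one orphaned)

    current.get().close();   // reset state before trying the fixed variant
    current.remove();
    openClients = 0;

    replaceClosing();
    replaceClosing();
    System.out.println("closing replaces leave open: " + openClients); // prints 1 (no orphan)
  }
}
```

Under doAs, each impersonated UserGroupInformation triggers a fresh client for the same worker thread, so the buggy variant accumulates one orphaned ESTABLISHED connection per replacement until the metastore's thrift thread pool is exhausted.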
[jira] (HIVE-26075) hive metastore connection leaking when hiveserver2 kerberos enable and hive.server2.enable.doAs set to true
[ https://issues.apache.org/jira/browse/HIVE-26075 ]

liuguanghua deleted comment on HIVE-26075:
-------------------------------------------
was (Author: liuguanghua):
I have tested Hive versions 1.2.2 and 2.3.3; both of them have the same problem.

> hive metastore connection leaking when hiveserver2 kerberos enable and
> hive.server2.enable.doAs set to true
> ----------------------------------------------------------------------
>
> Key: HIVE-26075
> URL: https://issues.apache.org/jira/browse/HIVE-26075
> Project: Hive
> Issue Type: Bug
> Components: Hive
> Affects Versions: 1.2.0
> Reporter: liuguanghua
> Assignee: liuguanghua
> Priority: Major
> Attachments: HIVE-26075.patch
>
> (1) When Hadoop cluster Kerberos is enabled
> (2) the HiveServer2 config hive.server2.enable.doAs is set to true
> After a beeline script has been executed, the hive metastore connections that
> were created are left in ESTABLISHED state and never closed.
> If we submit a lot of tasks to HiveServer2, this will fill up the hive
> metastore thrift thread pool (default is 1000), and thus new tasks will fail.
>
> HiveServer2 uses a ThreadLocal to store per-thread metastore connections; the
> application should call Hive.closeCurrent() to close the connection after a
> task finishes.
>
> When HiveServer2 impersonation is enabled (hive.server2.enable.doAs is set to
> true), the ugi will create a proxy user via UserGroupInformation.createProxyUser(
> owner, UserGroupInformation.getLoginUser()); the old metastore client is never
> closed.

--
This message was sent by Atlassian Jira
(v8.20.1#820001)
[jira] (HIVE-26075) hive metastore connection leaking when hiveserver2 kerberos enable and hive.server2.enable.doAs set to true
[ https://issues.apache.org/jira/browse/HIVE-26075 ]

liuguanghua deleted comment on HIVE-26075:
-------------------------------------------
was (Author: liuguanghua):
Thank you very much. I will open a pull request on GitHub.

> hive metastore connection leaking when hiveserver2 kerberos enable and
> hive.server2.enable.doAs set to true
> ----------------------------------------------------------------------
>
> Key: HIVE-26075
> URL: https://issues.apache.org/jira/browse/HIVE-26075
> Project: Hive
> Issue Type: Bug
> Components: Hive
> Affects Versions: 1.2.0
> Reporter: liuguanghua
> Assignee: liuguanghua
> Priority: Major
> Attachments: HIVE-26075.patch
>
> (1) When Hadoop cluster Kerberos is enabled
> (2) the HiveServer2 config hive.server2.enable.doAs is set to true
> After a beeline script has been executed, the hive metastore connections that
> were created are left in ESTABLISHED state and never closed.
> If we submit a lot of tasks to HiveServer2, this will fill up the hive
> metastore thrift thread pool (default is 1000), and thus new tasks will fail.
>
> HiveServer2 uses a ThreadLocal to store per-thread metastore connections; the
> application should call Hive.closeCurrent() to close the connection after a
> task finishes.
>
> When HiveServer2 impersonation is enabled (hive.server2.enable.doAs is set to
> true), the ugi will create a proxy user via UserGroupInformation.createProxyUser(
> owner, UserGroupInformation.getLoginUser()); the old metastore client is never
> closed.

--
This message was sent by Atlassian Jira
(v8.20.1#820001)
[jira] [Updated] (HIVE-26075) hive metastore connection leaking when hiveserver2 kerberos enable and hive.server2.enable.doAs set to true
[ https://issues.apache.org/jira/browse/HIVE-26075?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

liuguanghua updated HIVE-26075:
-------------------------------
Affects Version/s: 1.2.0 (was: All Versions)

> hive metastore connection leaking when hiveserver2 kerberos enable and
> hive.server2.enable.doAs set to true
> ----------------------------------------------------------------------
>
> Key: HIVE-26075
> URL: https://issues.apache.org/jira/browse/HIVE-26075
> Project: Hive
> Issue Type: Bug
> Components: Hive
> Affects Versions: 1.2.0
> Reporter: liuguanghua
> Assignee: liuguanghua
> Priority: Major
> Attachments: HIVE-26075.patch
>
> (1) When Hadoop cluster Kerberos is enabled
> (2) the HiveServer2 config hive.server2.enable.doAs is set to true
> After a beeline script has been executed, the hive metastore connections that
> were created are left in ESTABLISHED state and never closed.
> If we submit a lot of tasks to HiveServer2, this will fill up the hive
> metastore thrift thread pool (default is 1000), and thus new tasks will fail.
>
> HiveServer2 uses a ThreadLocal to store per-thread metastore connections; the
> application should call Hive.closeCurrent() to close the connection after a
> task finishes.
>
> When HiveServer2 impersonation is enabled (hive.server2.enable.doAs is set to
> true), the ugi will create a proxy user via UserGroupInformation.createProxyUser(
> owner, UserGroupInformation.getLoginUser()); the old metastore client is never
> closed.

--
This message was sent by Atlassian Jira
(v8.20.1#820001)
[jira] [Work logged] (HIVE-26102) Implement DELETE statements for Iceberg tables
[ https://issues.apache.org/jira/browse/HIVE-26102?focusedWorklogId=753266&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-753266 ] ASF GitHub Bot logged work on HIVE-26102: - Author: ASF GitHub Bot Created on: 06/Apr/22 09:06 Start Date: 06/Apr/22 09:06 Worklog Time Spent: 10m Work Description: pvary commented on code in PR #3131: URL: https://github.com/apache/hive/pull/3131#discussion_r843693954 ## iceberg/iceberg-handler/src/main/java/org/apache/iceberg/mr/hive/HiveIcebergDeleteWriter.java: ## @@ -0,0 +1,81 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, + * software distributed under the License is distributed on an + * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY + * KIND, either express or implied. See the License for the + * specific language governing permissions and limitations + * under the License. 
+ */ + +package org.apache.iceberg.mr.hive; + +import java.io.IOException; +import java.util.List; +import org.apache.hadoop.io.Writable; +import org.apache.hadoop.mapred.TaskAttemptID; +import org.apache.iceberg.DeleteFile; +import org.apache.iceberg.FileFormat; +import org.apache.iceberg.PartitionSpec; +import org.apache.iceberg.Schema; +import org.apache.iceberg.data.Record; +import org.apache.iceberg.deletes.PositionDelete; +import org.apache.iceberg.io.ClusteredPositionDeleteWriter; +import org.apache.iceberg.io.FileIO; +import org.apache.iceberg.io.FileWriterFactory; +import org.apache.iceberg.io.OutputFileFactory; +import org.apache.iceberg.mr.mapred.Container; +import org.apache.iceberg.util.Tasks; +import org.slf4j.Logger; +import org.slf4j.LoggerFactory; + +public class HiveIcebergDeleteWriter extends HiveIcebergWriter { Review Comment: Let's talk about this offline Issue Time Tracking --- Worklog Id: (was: 753266) Time Spent: 6h 10m (was: 6h) > Implement DELETE statements for Iceberg tables > -- > > Key: HIVE-26102 > URL: https://issues.apache.org/jira/browse/HIVE-26102 > Project: Hive > Issue Type: New Feature >Reporter: Marton Bod >Assignee: Marton Bod >Priority: Major > Labels: pull-request-available > Time Spent: 6h 10m > Remaining Estimate: 0h > -- This message was sent by Atlassian Jira (v8.20.1#820001)
[jira] [Work logged] (HIVE-26102) Implement DELETE statements for Iceberg tables
[ https://issues.apache.org/jira/browse/HIVE-26102?focusedWorklogId=753264&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-753264 ] ASF GitHub Bot logged work on HIVE-26102: - Author: ASF GitHub Bot Created on: 06/Apr/22 09:05 Start Date: 06/Apr/22 09:05 Worklog Time Spent: 10m Work Description: pvary commented on code in PR #3131: URL: https://github.com/apache/hive/pull/3131#discussion_r843693178 ## iceberg/iceberg-handler/src/main/java/org/apache/iceberg/mr/hive/HiveIcebergOutputCommitter.java: ## @@ -118,18 +120,23 @@ public void commitTask(TaskAttemptContext originalContext) throws IOException { .run(output -> { Table table = HiveIcebergStorageHandler.table(context.getJobConf(), output); if (table != null) { - HiveIcebergRecordWriter writer = writers.get(output); - DataFile[] closedFiles; + HiveIcebergWriter writer = writers.get(output); + HiveIcebergWriter delWriter = delWriters.get(output); + String fileForCommitLocation = generateFileForCommitLocation(table.location(), jobConf, + attemptID.getJobID(), attemptID.getTaskID().getId()); + if (delWriter != null) { +DeleteFile[] closedFiles = delWriter.deleteFiles().toArray(new DeleteFile[0]); +createFileForCommit(closedFiles, fileForCommitLocation, table.io()); Review Comment: Maybe we can create a little bit more complex data structure to serialise. I think creating/reading back the S3 files is where we will spend some serious time Issue Time Tracking --- Worklog Id: (was: 753264) Time Spent: 6h (was: 5h 50m) > Implement DELETE statements for Iceberg tables > -- > > Key: HIVE-26102 > URL: https://issues.apache.org/jira/browse/HIVE-26102 > Project: Hive > Issue Type: New Feature >Reporter: Marton Bod >Assignee: Marton Bod >Priority: Major > Labels: pull-request-available > Time Spent: 6h > Remaining Estimate: 0h > -- This message was sent by Atlassian Jira (v8.20.1#820001)
[jira] [Work logged] (HIVE-25980) Reduce fs calls in HiveMetaStoreChecker.checkTable
[ https://issues.apache.org/jira/browse/HIVE-25980?focusedWorklogId=753261&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-753261 ] ASF GitHub Bot logged work on HIVE-25980: - Author: ASF GitHub Bot Created on: 06/Apr/22 09:01 Start Date: 06/Apr/22 09:01 Worklog Time Spent: 10m Work Description: pvary commented on code in PR #3053: URL: https://github.com/apache/hive/pull/3053#discussion_r843688528 ## standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/HiveMetaStoreChecker.java: ## @@ -422,18 +418,50 @@ void findUnknownPartitions(Table table, Set partPaths, byte[] filterExp, } allPartDirs = partDirs; } -// don't want the table dir -allPartDirs.remove(tablePath); - -// remove the partition paths we know about -allPartDirs.removeAll(partPaths); - Set partColNames = Sets.newHashSet(); for(FieldSchema fSchema : getPartCols(table)) { partColNames.add(fSchema.getName()); } Map partitionColToTypeMap = getPartitionColtoTypeMap(table.getPartitionKeys()); + +Set partPathsInMS = new HashSet<>(partPaths); Review Comment: Could we just collect the needed path objects outside? Issue Time Tracking --- Worklog Id: (was: 753261) Time Spent: 5h (was: 4h 50m) > Reduce fs calls in HiveMetaStoreChecker.checkTable > -- > > Key: HIVE-25980 > URL: https://issues.apache.org/jira/browse/HIVE-25980 > Project: Hive > Issue Type: Improvement > Components: Standalone Metastore >Affects Versions: 3.1.2, 4.0.0 >Reporter: Chiran Ravani >Assignee: Chiran Ravani >Priority: Major > Labels: pull-request-available > Time Spent: 5h > Remaining Estimate: 0h > > MSCK Repair table for high partition table can perform slow on Cloud Storage > such as S3, one of the case we found where slowness was observed in > HiveMetaStoreChecker.checkTable. 
> {code:java} > "HiveServer2-Background-Pool: Thread-382" #382 prio=5 os_prio=0 > tid=0x7f97fc4a4000 nid=0x5c2a runnable [0x7f97c41a8000] >java.lang.Thread.State: RUNNABLE > at java.net.SocketInputStream.socketRead0(Native Method) > at java.net.SocketInputStream.socketRead(SocketInputStream.java:116) > at java.net.SocketInputStream.read(SocketInputStream.java:171) > at java.net.SocketInputStream.read(SocketInputStream.java:141) > at > sun.security.ssl.SSLSocketInputRecord.read(SSLSocketInputRecord.java:464) > at > sun.security.ssl.SSLSocketInputRecord.bytesInCompletePacket(SSLSocketInputRecord.java:68) > at > sun.security.ssl.SSLSocketImpl.readApplicationRecord(SSLSocketImpl.java:1341) > at sun.security.ssl.SSLSocketImpl.access$300(SSLSocketImpl.java:73) > at > sun.security.ssl.SSLSocketImpl$AppInputStream.read(SSLSocketImpl.java:957) > at > com.amazonaws.thirdparty.apache.http.impl.io.SessionInputBufferImpl.streamRead(SessionInputBufferImpl.java:137) > at > com.amazonaws.thirdparty.apache.http.impl.io.SessionInputBufferImpl.fillBuffer(SessionInputBufferImpl.java:153) > at > com.amazonaws.thirdparty.apache.http.impl.io.SessionInputBufferImpl.readLine(SessionInputBufferImpl.java:280) > at > com.amazonaws.thirdparty.apache.http.impl.conn.DefaultHttpResponseParser.parseHead(DefaultHttpResponseParser.java:138) > at > com.amazonaws.thirdparty.apache.http.impl.conn.DefaultHttpResponseParser.parseHead(DefaultHttpResponseParser.java:56) > at > com.amazonaws.thirdparty.apache.http.impl.io.AbstractMessageParser.parse(AbstractMessageParser.java:259) > at > com.amazonaws.thirdparty.apache.http.impl.DefaultBHttpClientConnection.receiveResponseHeader(DefaultBHttpClientConnection.java:163) > at > com.amazonaws.thirdparty.apache.http.impl.conn.CPoolProxy.receiveResponseHeader(CPoolProxy.java:157) > at > com.amazonaws.thirdparty.apache.http.protocol.HttpRequestExecutor.doReceiveResponse(HttpRequestExecutor.java:273) > at > 
com.amazonaws.http.protocol.SdkHttpRequestExecutor.doReceiveResponse(SdkHttpRequestExecutor.java:82) > at > com.amazonaws.thirdparty.apache.http.protocol.HttpRequestExecutor.execute(HttpRequestExecutor.java:125) > at > com.amazonaws.thirdparty.apache.http.impl.execchain.MainClientExec.execute(MainClientExec.java:272) > at > com.amazonaws.thirdparty.apache.http.impl.execchain.ProtocolExec.execute(ProtocolExec.java:186) > at > com.amazonaws.thirdparty.apache.http.impl.client.InternalHttpClient.doExecute(InternalHttpClient.java:185) > at > com.amazonaws.thirdparty.apache.http.impl.client.CloseableHttpClient.execute(CloseableHttpClient.java:83) > at > com.amazonaws.thir
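The HiveMetaStoreChecker logic the review above discusses — finding partition directories present on the filesystem but unknown to the metastore — is plain set difference: all directories under the table path, minus the table directory itself, minus the partition paths the metastore already knows. A minimal sketch of that comparison (the paths and the `findUnknown` helper are illustrative, not the actual HiveMetaStoreChecker code):

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

public class UnknownPartitionsDemo {

    // Directories on the filesystem that are not registered as partitions:
    // FS paths minus the table dir, minus the known partition paths.
    static Set<String> findUnknown(Set<String> allDirsOnFs, Set<String> partPathsInMs, String tablePath) {
        Set<String> unknown = new HashSet<>(allDirsOnFs);
        unknown.remove(tablePath);        // don't want the table dir
        unknown.removeAll(partPathsInMs); // remove the partition paths we know about
        return unknown;
    }

    public static void main(String[] args) {
        Set<String> onFs = new HashSet<>(Arrays.asList(
            "/warehouse/t", "/warehouse/t/ds=1", "/warehouse/t/ds=2", "/warehouse/t/ds=3"));
        Set<String> inMs = new HashSet<>(Arrays.asList(
            "/warehouse/t/ds=1", "/warehouse/t/ds=2"));
        System.out.println(findUnknown(onFs, inMs, "/warehouse/t")); // [/warehouse/t/ds=3]
    }
}
```

The set arithmetic itself is cheap; as the stack trace above shows, the cost on S3 comes from the filesystem listing calls that populate these sets, which is what the patch reduces.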
[jira] [Work logged] (HIVE-25967) Prevent residual expressions from getting serialized in Iceberg splits
[ https://issues.apache.org/jira/browse/HIVE-25967?focusedWorklogId=753252&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-753252 ] ASF GitHub Bot logged work on HIVE-25967: - Author: ASF GitHub Bot Created on: 06/Apr/22 08:36 Start Date: 06/Apr/22 08:36 Worklog Time Spent: 10m Work Description: szlta merged PR #3178: URL: https://github.com/apache/hive/pull/3178 Issue Time Tracking --- Worklog Id: (was: 753252) Time Spent: 1h (was: 50m) > Prevent residual expressions from getting serialized in Iceberg splits > -- > > Key: HIVE-25967 > URL: https://issues.apache.org/jira/browse/HIVE-25967 > Project: Hive > Issue Type: Bug >Reporter: Ádám Szita >Assignee: Ádám Szita >Priority: Major > Labels: pull-request-available > Time Spent: 1h > Remaining Estimate: 0h > > This hack removes residual expressions from the file scan task just before > split serialization. > Residuals can sometimes take up too much space in the payload, causing the Tez AM > to OOM. > Unfortunately, the Tez AM doesn't distribute splits in a streamed way; that is, it > serializes all splits for a job before sending them out to executors. Some > residuals may take ~ 1 MB in memory; multiplied by thousands of splits, this could > kill the Tez AM JVM. > Until streamed split distribution is implemented, we will kick residuals > out of the split. -- This message was sent by Atlassian Jira (v8.20.1#820001)
[jira] [Work logged] (HIVE-22420) DbTxnManager.stopHeartbeat() should be thread-safe
[ https://issues.apache.org/jira/browse/HIVE-22420?focusedWorklogId=753241&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-753241 ] ASF GitHub Bot logged work on HIVE-22420: - Author: ASF GitHub Bot Created on: 06/Apr/22 08:06 Start Date: 06/Apr/22 08:06 Worklog Time Spent: 10m Work Description: deniskuzZ opened a new pull request, #3181: URL: https://github.com/apache/hive/pull/3181 ### What changes were proposed in this pull request? ### Why are the changes needed? Proper ACID handling in case of operation interruption ### Does this PR introduce _any_ user-facing change? ### How was this patch tested? Issue Time Tracking --- Worklog Id: (was: 753241) Remaining Estimate: 0h Time Spent: 10m > DbTxnManager.stopHeartbeat() should be thread-safe > -- > > Key: HIVE-22420 > URL: https://issues.apache.org/jira/browse/HIVE-22420 > Project: Hive > Issue Type: Bug >Affects Versions: 3.1.0 >Reporter: Aron Hamvas >Assignee: Aron Hamvas >Priority: Major > Fix For: 4.0.0, 4.0.0-alpha-1 > > Attachments: HIVE-22420.1.patch, HIVE-22420.2.patch > > Time Spent: 10m > Remaining Estimate: 0h > > When a transactional query is being executed and interrupted via an HS2 close > operation request, both the background pool thread executing the query and > the HttpHandler thread running the close operation logic will eventually call > the below method: > {noformat} > Driver.releaseLocksAndCommitOrRollback(boolean commit) > {noformat} > Since this method is invoked several times in both threads, it can happen > that the two threads invoke it at the same time, and due to a race condition, > the txnId field of the DbTxnManager used by both threads could be set to 0 > without actually successfully aborting the transaction. 
> The root cause is the stopHeartbeat() method in DbTxnManager not being thread > safe: > When Thread-1 and Thread-2 enter stopHeartbeat() with very little time > difference, Thread-1 might successfully cancel the heartbeat task and set the > heartbeatTask field to null, while Thread-2 is trying to observe its state. > Thread-1 will return to the calling rollbackTxn() method and continue > execution there, while Thread-2 is thrown back to the same method with a > NullPointerException. Thread-2 will then set txnId to 0, and Thread-1 is > sending this 0 value to HMS. So, the txn will not be aborted, and the locks > cannot be released later on either. -- This message was sent by Atlassian Jira (v8.20.1#820001)
[jira] [Updated] (HIVE-22420) DbTxnManager.stopHeartbeat() should be thread-safe
[ https://issues.apache.org/jira/browse/HIVE-22420?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated HIVE-22420: -- Labels: pull-request-available (was: ) > DbTxnManager.stopHeartbeat() should be thread-safe > -- > > Key: HIVE-22420 > URL: https://issues.apache.org/jira/browse/HIVE-22420 > Project: Hive > Issue Type: Bug >Affects Versions: 3.1.0 >Reporter: Aron Hamvas >Assignee: Aron Hamvas >Priority: Major > Labels: pull-request-available > Fix For: 4.0.0, 4.0.0-alpha-1 > > Attachments: HIVE-22420.1.patch, HIVE-22420.2.patch > > Time Spent: 10m > Remaining Estimate: 0h > > When a transactional query is being executed and interrupted via an HS2 close > operation request, both the background pool thread executing the query and > the HttpHandler thread running the close operation logic will eventually call > the below method: > {noformat} > Driver.releaseLocksAndCommitOrRollback(boolean commit) > {noformat} > Since this method is invoked several times in both threads, it can happen > that the two threads invoke it at the same time, and due to a race condition, > the txnId field of the DbTxnManager used by both threads could be set to 0 > without actually successfully aborting the transaction. > The root cause is the stopHeartbeat() method in DbTxnManager not being thread > safe: > When Thread-1 and Thread-2 enter stopHeartbeat() with very little time > difference, Thread-1 might successfully cancel the heartbeat task and set the > heartbeatTask field to null, while Thread-2 is trying to observe its state. > Thread-1 will return to the calling rollbackTxn() method and continue > execution there, while Thread-2 is thrown back to the same method with a > NullPointerException. Thread-2 will then set txnId to 0, and Thread-1 is > sending this 0 value to HMS. So, the txn will not be aborted, and the locks > cannot be released later on either. -- This message was sent by Atlassian Jira (v8.20.1#820001)
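The race described in HIVE-22420 above — two threads both seeing a non-null heartbeatTask, one cancelling and nulling it while the other dereferences it — is commonly removed by handing the task reference off atomically, so at most one caller ever obtains it. The sketch below shows that idea in a self-contained form; it is an illustration of the technique, not the actual DbTxnManager patch, and the class and field names are hypothetical.

```java
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.ScheduledFuture;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicReference;

public class StopHeartbeatDemo {

    final ScheduledExecutorService pool = Executors.newScheduledThreadPool(1);
    // Holding the task in an AtomicReference lets us hand it to exactly one
    // stopper; a losing thread sees null instead of a half-cancelled task.
    final AtomicReference<ScheduledFuture<?>> heartbeatTask = new AtomicReference<>();

    void startHeartbeat() {
        heartbeatTask.set(pool.scheduleAtFixedRate(
            () -> { /* heartbeat the open transaction here */ }, 0, 50, TimeUnit.MILLISECONDS));
    }

    // Thread-safe stop: getAndSet(null) is atomic, so only one of any number
    // of concurrent callers obtains the task and cancels it; the rest no-op.
    void stopHeartbeat() {
        ScheduledFuture<?> task = heartbeatTask.getAndSet(null);
        if (task != null) {
            task.cancel(true);
        }
    }

    public static void main(String[] args) throws InterruptedException {
        StopHeartbeatDemo mgr = new StopHeartbeatDemo();
        mgr.startHeartbeat();

        // Simulate the background pool thread and the HttpHandler thread
        // racing on stopHeartbeat(); with the atomic swap neither can NPE.
        Thread t1 = new Thread(mgr::stopHeartbeat);
        Thread t2 = new Thread(mgr::stopHeartbeat);
        t1.start(); t2.start();
        t1.join(); t2.join();

        mgr.pool.shutdown();
        System.out.println("stopped cleanly, task reference = " + mgr.heartbeatTask.get()); // null
    }
}
```

Making the whole method `synchronized` would also close the race; the atomic swap is shown here because it keeps stopHeartbeat() lock-free on the hot path.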
[jira] [Resolved] (HIVE-26116) Fix handling of compaction requests originating from aborted dynamic partition queries in Initiator
[ https://issues.apache.org/jira/browse/HIVE-26116?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Karen Coppage resolved HIVE-26116. -- Fix Version/s: 4.0.0 Resolution: Fixed Committed to master branch. Thanks for your contribution [~veghlaci05] ! > Fix handling of compaction requests originating from aborted dynamic > partition queries in Initiator > --- > > Key: HIVE-26116 > URL: https://issues.apache.org/jira/browse/HIVE-26116 > Project: Hive > Issue Type: Bug >Reporter: László Végh >Assignee: László Végh >Priority: Major > Labels: pull-request-available > Fix For: 4.0.0 > > Time Spent: 1h > Remaining Estimate: 0h > > Compaction requests originating from an abort of a dynamic partition insert > can cause an NPE in the Initiator. -- This message was sent by Atlassian Jira (v8.20.1#820001)
[jira] [Work logged] (HIVE-26116) Fix handling of compaction requests originating from aborted dynamic partition queries in Initiator
[ https://issues.apache.org/jira/browse/HIVE-26116?focusedWorklogId=753240&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-753240 ] ASF GitHub Bot logged work on HIVE-26116: - Author: ASF GitHub Bot Created on: 06/Apr/22 08:01 Start Date: 06/Apr/22 08:01 Worklog Time Spent: 10m Work Description: klcopp merged PR #3177: URL: https://github.com/apache/hive/pull/3177 Issue Time Tracking --- Worklog Id: (was: 753240) Time Spent: 1h (was: 50m) > Fix handling of compaction requests originating from aborted dynamic > partition queries in Initiator > --- > > Key: HIVE-26116 > URL: https://issues.apache.org/jira/browse/HIVE-26116 > Project: Hive > Issue Type: Bug >Reporter: László Végh >Assignee: László Végh >Priority: Major > Labels: pull-request-available > Time Spent: 1h > Remaining Estimate: 0h > > Compaction requests originating from an abort of a dynamic partition insert > can cause an NPE in the Initiator. -- This message was sent by Atlassian Jira (v8.20.1#820001)
[jira] [Work logged] (HIVE-26102) Implement DELETE statements for Iceberg tables
[ https://issues.apache.org/jira/browse/HIVE-26102?focusedWorklogId=753230&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-753230 ] ASF GitHub Bot logged work on HIVE-26102: - Author: ASF GitHub Bot Created on: 06/Apr/22 07:26 Start Date: 06/Apr/22 07:26 Worklog Time Spent: 10m Work Description: pvary commented on code in PR #3131: URL: https://github.com/apache/hive/pull/3131#discussion_r843564071 ## ql/src/java/org/apache/hadoop/hive/ql/parse/UpdateDeleteSemanticAnalyzer.java: ## @@ -97,12 +100,22 @@ private void reparseAndSuperAnalyze(ASTNode tree) throws SemanticException { Table mTable = getTargetTable(tabName); validateTargetTable(mTable); +// save the operation type into the query state +SessionStateUtil.addResource(conf, Context.Operation.class.getSimpleName(), operation.name()); + StringBuilder rewrittenQueryStr = new StringBuilder(); rewrittenQueryStr.append("insert into table "); rewrittenQueryStr.append(getFullTableNameForSQL(tabName)); addPartitionColsToInsert(mTable.getPartCols(), rewrittenQueryStr); -rewrittenQueryStr.append(" select ROW__ID"); +boolean nonNativeAcid = mTable.getStorageHandler() != null && mTable.getStorageHandler().supportsAcidOperations(); Review Comment: Maybe an util for this? I have seen this several times Issue Time Tracking --- Worklog Id: (was: 753230) Time Spent: 5h 50m (was: 5h 40m) > Implement DELETE statements for Iceberg tables > -- > > Key: HIVE-26102 > URL: https://issues.apache.org/jira/browse/HIVE-26102 > Project: Hive > Issue Type: New Feature >Reporter: Marton Bod >Assignee: Marton Bod >Priority: Major > Labels: pull-request-available > Time Spent: 5h 50m > Remaining Estimate: 0h > -- This message was sent by Atlassian Jira (v8.20.1#820001)
[jira] [Work logged] (HIVE-26102) Implement DELETE statements for Iceberg tables
[ https://issues.apache.org/jira/browse/HIVE-26102?focusedWorklogId=753227&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-753227 ] ASF GitHub Bot logged work on HIVE-26102: - Author: ASF GitHub Bot Created on: 06/Apr/22 07:20 Start Date: 06/Apr/22 07:20 Worklog Time Spent: 10m Work Description: pvary commented on code in PR #3131: URL: https://github.com/apache/hive/pull/3131#discussion_r843558677 ## ql/src/java/org/apache/hadoop/hive/ql/parse/SemanticAnalyzer.java: ## @@ -7822,9 +7824,18 @@ protected Operator genFileSinkPlan(String dest, QB qb, Operator input) List vecCol = new ArrayList(); -if (updating(dest) || deleting(dest)) { +boolean nonNativeAcid = Optional.ofNullable(destinationTable) +.map(Table::getStorageHandler) +.map(HiveStorageHandler::supportsAcidOperations) +.orElse(false); +boolean isUpdateDelete = updating(dest) || deleting(dest); +if (!nonNativeAcid && isUpdateDelete) { Review Comment: Is it a valid situation that: isUpdateDelete and we need to go to the `else`? If not it might be easier to read: ``` if (updating(dest) || deleting(dest)) { if (nonNativeAcid) { ... } else { ... } else { .. } ``` Issue Time Tracking --- Worklog Id: (was: 753227) Time Spent: 5h 40m (was: 5.5h) > Implement DELETE statements for Iceberg tables > -- > > Key: HIVE-26102 > URL: https://issues.apache.org/jira/browse/HIVE-26102 > Project: Hive > Issue Type: New Feature >Reporter: Marton Bod >Assignee: Marton Bod >Priority: Major > Labels: pull-request-available > Time Spent: 5h 40m > Remaining Estimate: 0h > -- This message was sent by Atlassian Jira (v8.20.1#820001)
[jira] [Assigned] (HIVE-26008) Dynamic partition pruning not sending right partitions with subqueries
[ https://issues.apache.org/jira/browse/HIVE-26008?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] László Bodor reassigned HIVE-26008: --- Assignee: László Bodor > Dynamic partition pruning not sending right partitions with subqueries > -- > > Key: HIVE-26008 > URL: https://issues.apache.org/jira/browse/HIVE-26008 > Project: Hive > Issue Type: Improvement > Components: HiveServer2 >Reporter: Rajesh Balamohan >Assignee: László Bodor >Priority: Major > Labels: performance > Attachments: Screenshot 2022-03-08 at 5.04.02 AM.png > > > DPP doesn't work correctly when subqueries are involved. Here is an example > query (q83). > Note that "date_dim" is itself filtered by another subquery. Because of this, the DPP operator > ends up sending the entire "date_dim" to the fact tables. > As a result, the data scanned for the fact tables is much higher and the query > runtime increases. > For context, on a very small cluster, this query ran for 265 seconds, while > the rewritten query finished in 11 seconds! The fact table scan was 10 MB vs > 10 GB. 
> {noformat} > HiveJoin(condition=[=($2, $5)], joinType=[inner]) > HiveJoin(condition=[=($0, $3)], joinType=[inner]) > HiveProject(cr_item_sk=[$1], cr_return_quantity=[$16], > cr_returned_date_sk=[$26]) > HiveFilter(condition=[AND(IS NOT NULL($26), IS NOT > NULL($1))]) > HiveTableScan(table=[[tpcds_bin_partitioned_orc_1, > catalog_returns]], table:alias=[catalog_returns]) > HiveProject(i_item_sk=[$0], i_item_id=[$1]) > HiveFilter(condition=[AND(IS NOT NULL($1), IS NOT > NULL($0))]) > HiveTableScan(table=[[tpcds_bin_partitioned_orc_1, > item]], table:alias=[item]) > HiveProject(d_date_sk=[$0], d_date=[$2]) > HiveFilter(condition=[AND(IS NOT NULL($2), IS NOT > NULL($0))]) > HiveTableScan(table=[[tpcds_bin_partitioned_orc_1, > date_dim]], table:alias=[date_dim]) > HiveProject(d_date=[$0]) > HiveSemiJoin(condition=[=($1, $2)], joinType=[semi]) > HiveProject(d_date=[$2], d_week_seq=[$4]) > HiveFilter(condition=[AND(IS NOT NULL($4), IS NOT > NULL($2))]) > HiveTableScan(table=[[tpcds_bin_partitioned_orc_1, > date_dim]], table:alias=[date_dim]) > HiveProject(d_week_seq=[$4]) > HiveFilter(condition=[AND(IN($2, 1998-01-02:DATE, > 1998-10-15:DATE, 1998-11-10:DATE), IS NOT NULL($4))]) > HiveTableScan(table=[[tpcds_bin_partitioned_orc_1, > date_dim]], table:alias=[date_dim]) > {noformat} > *Original Query & Plan: * > {noformat} > explain cbo with sr_items as > (select i_item_id item_id, > sum(sr_return_quantity) sr_item_qty > from store_returns, > item, > date_dim > where sr_item_sk = i_item_sk > and d_date in > (select d_date > from date_dim > where d_week_seq in > (select d_week_seq > from date_dim > where d_date in ('1998-01-02','1998-10-15','1998-11-10'))) > and sr_returned_date_sk = d_date_sk > group by i_item_id), > cr_items as > (select i_item_id item_id, > sum(cr_return_quantity) cr_item_qty > from catalog_returns, > item, > date_dim > where cr_item_sk = i_item_sk > and d_date in > (select d_date > from date_dim > where d_week_seq in > (select d_week_seq > from date_dim > 
where d_date in ('1998-01-02','1998-10-15','1998-11-10'))) > and cr_returned_date_sk = d_date_sk > group by i_item_id), > wr_items as > (select i_item_id item_id, > sum(wr_return_quantity) wr_item_qty > from web_returns, > item, > date_dim > where wr_item_sk = i_item_sk > and d_date in > (select d_date > from date_dim > where d_week_seq in > (select d_week_seq > from date_dim > where d_date in ('1998-01-02','1998-10-15','1998-11-10'))) > and wr_returned_date_sk = d_date_sk > group by i_item_id) > select sr_items.item_id > ,sr_item_qty > ,sr_item_qty/(sr_item_qty+cr_item_qty+wr_item_qty)/3.0 * 100 sr_dev > ,cr_item_qty > ,cr_item_qty/(sr_item_qty+cr_item_qty+wr_item_qty)/3.0 * 100 cr_dev > ,wr_item_qty > ,wr_item_qty/(sr_item_qty+cr_item_qty+wr_item_qty)/3.0 * 100 wr_dev > ,(sr_item_qty+cr_item_qty+wr_item_qty)/3.0 average > from sr_items > ,cr_items > ,wr_items > where sr_items.item_id=cr_items.item_id > and sr_items.item_id=wr_items.item_id > order by sr_items.item_id > ,sr_item_qty > limit 100 > INFO : Starting task [Stage-3:EXPLAIN] in serial mode > INFO : Completed executing > command(queryId=hive_20220307055109_88ad0cbd-bd40-45bc-92ae-ab15fa6b1da4); > Time taken: 0.973 seconds > INFO : OK > Explain > CBO PLAN: > HiveSortLimit(sort0=[$0], sort1=[$1], dir0=[ASC], dir1=[ASC], fetch=
[jira] [Work logged] (HIVE-26102) Implement DELETE statements for Iceberg tables
[ https://issues.apache.org/jira/browse/HIVE-26102?focusedWorklogId=753225&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-753225 ] ASF GitHub Bot logged work on HIVE-26102: - Author: ASF GitHub Bot Created on: 06/Apr/22 07:15 Start Date: 06/Apr/22 07:15 Worklog Time Spent: 10m Work Description: pvary commented on code in PR #3131: URL: https://github.com/apache/hive/pull/3131#discussion_r843554296 ## ql/src/java/org/apache/hadoop/hive/ql/metadata/VirtualColumn.java: ## @@ -50,10 +50,14 @@ RAWDATASIZE("RAW__DATA__SIZE", TypeInfoFactory.longTypeInfo), /** - * {@link org.apache.hadoop.hive.ql.io.RecordIdentifier} + * {@link org.apache.hadoop.hive.ql.io.RecordIdentifier} */ ROWID("ROW__ID", RecordIdentifier.StructInfo.typeInfo, true, RecordIdentifier.StructInfo.oi), ROWISDELETED("ROW__IS__DELETED", TypeInfoFactory.booleanTypeInfo), + PARTITION_SPEC_ID("PARTITION__SPEC__ID", TypeInfoFactory.intTypeInfo), + PARTITION_HASH("PARTITION__HASH", TypeInfoFactory.longTypeInfo), + FILE_PATH("FILE__PATH", TypeInfoFactory.stringTypeInfo), + ROW_POSITION("ROW__POSITION", TypeInfoFactory.longTypeInfo), Review Comment: How is this handled inside the `ROW__ID`? Issue Time Tracking --- Worklog Id: (was: 753225) Time Spent: 5.5h (was: 5h 20m) > Implement DELETE statements for Iceberg tables > -- > > Key: HIVE-26102 > URL: https://issues.apache.org/jira/browse/HIVE-26102 > Project: Hive > Issue Type: New Feature >Reporter: Marton Bod >Assignee: Marton Bod >Priority: Major > Labels: pull-request-available > Time Spent: 5.5h > Remaining Estimate: 0h > -- This message was sent by Atlassian Jira (v8.20.1#820001)
[jira] [Work logged] (HIVE-26102) Implement DELETE statements for Iceberg tables
[ https://issues.apache.org/jira/browse/HIVE-26102?focusedWorklogId=753224&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-753224 ] ASF GitHub Bot logged work on HIVE-26102: - Author: ASF GitHub Bot Created on: 06/Apr/22 07:14 Start Date: 06/Apr/22 07:14 Worklog Time Spent: 10m Work Description: pvary commented on code in PR #3131: URL: https://github.com/apache/hive/pull/3131#discussion_r843553566 ## ql/src/java/org/apache/hadoop/hive/ql/metadata/VirtualColumn.java: ## @@ -50,10 +50,14 @@ RAWDATASIZE("RAW__DATA__SIZE", TypeInfoFactory.longTypeInfo), /** - * {@link org.apache.hadoop.hive.ql.io.RecordIdentifier} + * {@link org.apache.hadoop.hive.ql.io.RecordIdentifier} */ ROWID("ROW__ID", RecordIdentifier.StructInfo.typeInfo, true, RecordIdentifier.StructInfo.oi), ROWISDELETED("ROW__IS__DELETED", TypeInfoFactory.booleanTypeInfo), + PARTITION_SPEC_ID("PARTITION__SPEC__ID", TypeInfoFactory.intTypeInfo), + PARTITION_HASH("PARTITION__HASH", TypeInfoFactory.longTypeInfo), + FILE_PATH("FILE__PATH", TypeInfoFactory.stringTypeInfo), Review Comment: Isn't this the same as the `INPUT__FILE__NAME` in the delete case? Issue Time Tracking --- Worklog Id: (was: 753224) Time Spent: 5h 20m (was: 5h 10m) > Implement DELETE statements for Iceberg tables > -- > > Key: HIVE-26102 > URL: https://issues.apache.org/jira/browse/HIVE-26102 > Project: Hive > Issue Type: New Feature >Reporter: Marton Bod >Assignee: Marton Bod >Priority: Major > Labels: pull-request-available > Time Spent: 5h 20m > Remaining Estimate: 0h > -- This message was sent by Atlassian Jira (v8.20.1#820001)
[jira] [Work logged] (HIVE-26102) Implement DELETE statements for Iceberg tables
[ https://issues.apache.org/jira/browse/HIVE-26102?focusedWorklogId=753223&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-753223 ] ASF GitHub Bot logged work on HIVE-26102: - Author: ASF GitHub Bot Created on: 06/Apr/22 07:13 Start Date: 06/Apr/22 07:13 Worklog Time Spent: 10m Work Description: pvary commented on code in PR #3131: URL: https://github.com/apache/hive/pull/3131#discussion_r843552057 ## iceberg/iceberg-handler/src/test/queries/positive/delete_iceberg_partitioned_avro.q: ## @@ -0,0 +1,26 @@ +set hive.vectorized.execution.enabled=false; +set hive.support.concurrency=true; +set hive.txn.manager=org.apache.hadoop.hive.ql.lockmgr.DbTxnManager; + +drop table if exists tbl_ice; +create external table tbl_ice(a int, b string, c int) partitioned by spec (bucket(16, a), truncate(3, b)) stored by iceberg stored as avro tblproperties ('format-version'='2'); + + Issue Time Tracking --- Worklog Id: (was: 753223) Time Spent: 5h 10m (was: 5h) > Implement DELETE statements for Iceberg tables > -- > > Key: HIVE-26102 > URL: https://issues.apache.org/jira/browse/HIVE-26102 > Project: Hive > Issue Type: New Feature >Reporter: Marton Bod >Assignee: Marton Bod >Priority: Major > Labels: pull-request-available > Time Spent: 5h 10m > Remaining Estimate: 0h > -- This message was sent by Atlassian Jira (v8.20.1#820001)
[jira] [Work logged] (HIVE-26102) Implement DELETE statements for Iceberg tables
[ https://issues.apache.org/jira/browse/HIVE-26102?focusedWorklogId=753221&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-753221 ]

ASF GitHub Bot logged work on HIVE-26102:
-----------------------------------------
Author: ASF GitHub Bot
Created on: 06/Apr/22 07:10
Start Date: 06/Apr/22 07:10
Worklog Time Spent: 10m

Work Description: pvary commented on code in PR #3131:
URL: https://github.com/apache/hive/pull/3131#discussion_r843549581

## iceberg/iceberg-handler/src/test/java/org/apache/iceberg/mr/hive/TestHiveIcebergV2.java:

## @@ -228,6 +230,104 @@ public void testReadAndWriteFormatV2Partitioned_PosDelete_RowSupplied() throws I
     Assert.assertArrayEquals(new Object[] {2L, "Trudy", "Pink"}, objects.get(3));
   }

+  @Test
+  public void testDeleteStatementUnpartitioned() {
+    Assume.assumeFalse("Iceberg DELETEs are only implemented for non-vectorized mode for now", isVectorized);
+
+    // create and insert an initial batch of records
+    testTables.createTable(shell, "customers", HiveIcebergStorageHandlerTestUtils.CUSTOMER_SCHEMA,
+        PartitionSpec.unpartitioned(), fileFormat, HiveIcebergStorageHandlerTestUtils.OTHER_CUSTOMER_RECORDS_2, 2);
+    // insert one more batch so that we have multiple data files within the same partition
+    shell.executeStatement(testTables.getInsertQuery(HiveIcebergStorageHandlerTestUtils.OTHER_CUSTOMER_RECORDS_1,
+        TableIdentifier.of("default", "customers"), false));
+
+    shell.executeStatement("DELETE FROM customers WHERE customer_id=3 or first_name='Joanna'");
+
+    List<Object[]> objects = shell.executeStatement("SELECT * FROM customers ORDER BY customer_id, last_name");
+    Assert.assertEquals(6, objects.size());
+    List<Record> expected = TestHelper.RecordsBuilder.newInstance(HiveIcebergStorageHandlerTestUtils.CUSTOMER_SCHEMA)
+        .add(1L, "Sharon", "Taylor")
+        .add(2L, "Jake", "Donnel")
+        .add(2L, "Susan", "Morrison")
+        .add(2L, "Bob", "Silver")
+        .add(4L, "Laci", "Zold")
+        .add(5L, "Peti", "Rozsaszin")
+        .build();
+    HiveIcebergTestUtils.validateData(expected,
+        HiveIcebergTestUtils.valueForRow(HiveIcebergStorageHandlerTestUtils.CUSTOMER_SCHEMA, objects), 0);
+  }
+
+  @Test
+  public void testDeleteStatementPartitioned() {
+    Assume.assumeFalse("Iceberg DELETEs are only implemented for non-vectorized mode for now", isVectorized);
+    PartitionSpec spec = PartitionSpec.builderFor(HiveIcebergStorageHandlerTestUtils.CUSTOMER_SCHEMA)
+        .identity("last_name").bucket("customer_id", 16).build();
+
+    // create and insert an initial batch of records
+    testTables.createTable(shell, "customers", HiveIcebergStorageHandlerTestUtils.CUSTOMER_SCHEMA,
+        spec, fileFormat, HiveIcebergStorageHandlerTestUtils.OTHER_CUSTOMER_RECORDS_2, 2);
+    // insert one more batch so that we have multiple data files within the same partition
+    shell.executeStatement(testTables.getInsertQuery(HiveIcebergStorageHandlerTestUtils.OTHER_CUSTOMER_RECORDS_1,
+        TableIdentifier.of("default", "customers"), false));
+
+    shell.executeStatement("DELETE FROM customers WHERE customer_id=3 or first_name='Joanna'");
+
+    List<Object[]> objects = shell.executeStatement("SELECT * FROM customers ORDER BY customer_id, last_name");
+    Assert.assertEquals(6, objects.size());
+    List<Record> expected = TestHelper.RecordsBuilder.newInstance(HiveIcebergStorageHandlerTestUtils.CUSTOMER_SCHEMA)
+        .add(1L, "Sharon", "Taylor")
+        .add(2L, "Jake", "Donnel")
+        .add(2L, "Susan", "Morrison")
+        .add(2L, "Bob", "Silver")
+        .add(4L, "Laci", "Zold")
+        .add(5L, "Peti", "Rozsaszin")
+        .build();
+    HiveIcebergTestUtils.validateData(expected,
+        HiveIcebergTestUtils.valueForRow(HiveIcebergStorageHandlerTestUtils.CUSTOMER_SCHEMA, objects), 0);
+  }
+
+  @Test
+  public void testDeleteStatementWithOtherTable() {
+    Assume.assumeFalse("Iceberg DELETEs are only implemented for non-vectorized mode for now", isVectorized);
+    PartitionSpec spec = PartitionSpec.builderFor(HiveIcebergStorageHandlerTestUtils.CUSTOMER_SCHEMA)
+        .identity("last_name").bucket("customer_id", 16).build();
+
+    // create a couple of tables, with an initial batch of records
+    testTables.createTable(shell, "customers", HiveIcebergStorageHandlerTestUtils.CUSTOMER_SCHEMA,
+        spec, fileFormat, HiveIcebergStorageHandlerTestUtils.OTHER_CUSTOMER_RECORDS_2, 2);
+    testTables.createTable(shell, "other", HiveIcebergStorageHandlerTestUtils.CUSTOMER_SCHEMA,
+        spec, fileFormat, HiveIcebergStorageHandlerTestUtils.OTHER_CUSTOMER_RECORDS_1, 2);
+
+    shell.executeStatement("DELETE FROM customers WHERE customer_id in (select t1.customer_id from customers t1 join " +
+        "other t2 on t1.customer_id = t2.customer_id) or " +
+        "first_name in (select first_name from cus
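The three DELETE tests above all assert the same six surviving rows after `DELETE FROM customers WHERE customer_id=3 or first_name='Joanna'`. As a standalone sketch of that predicate (the pre-delete rows below are assumptions reconstructed from the expected result; the actual OTHER_CUSTOMER_RECORDS fixtures are not shown in this diff):

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class DeletePredicateSketch {
    public static void main(String[] args) {
        // Hypothetical pre-delete contents of "customers": the six expected
        // survivors from the tests above, plus two invented rows that match
        // the delete predicate (the real test fixtures are not in the diff).
        List<Object[]> customers = new ArrayList<>(Arrays.asList(
            new Object[] {1L, "Sharon", "Taylor"},
            new Object[] {2L, "Jake", "Donnel"},
            new Object[] {2L, "Susan", "Morrison"},
            new Object[] {2L, "Bob", "Silver"},
            new Object[] {3L, "Blake", "Burr"},    // matches customer_id=3
            new Object[] {4L, "Joanna", "Pierce"}, // matches first_name='Joanna'
            new Object[] {4L, "Laci", "Zold"},
            new Object[] {5L, "Peti", "Rozsaszin"}));

        // DELETE FROM customers WHERE customer_id=3 or first_name='Joanna'
        customers.removeIf(row -> (Long) row[0] == 3L || "Joanna".equals(row[1]));

        // Mirrors Assert.assertEquals(6, objects.size()) in the tests above
        System.out.println(customers.size()); // prints 6
    }
}
```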
[jira] [Work logged] (HIVE-26102) Implement DELETE statements for Iceberg tables
[ https://issues.apache.org/jira/browse/HIVE-26102?focusedWorklogId=753222&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-753222 ]

ASF GitHub Bot logged work on HIVE-26102:
-----------------------------------------
Author: ASF GitHub Bot
Created on: 06/Apr/22 07:10
Start Date: 06/Apr/22 07:10
Worklog Time Spent: 10m

Work Description: pvary commented on code in PR #3131:
URL: https://github.com/apache/hive/pull/3131#discussion_r843549998

## iceberg/iceberg-handler/src/test/queries/negative/delete_iceberg_vectorized.q:

## @@ -0,0 +1,10 @@
+set hive.vectorized.execution.enabled=true;
+set hive.support.concurrency=true;
+set hive.txn.manager=org.apache.hadoop.hive.ql.lockmgr.DbTxnManager;

Review Comment: Why do we set these?

Issue Time Tracking
-------------------
Worklog Id: (was: 753222)
Time Spent: 5h (was: 4h 50m)

> Implement DELETE statements for Iceberg tables
> ----------------------------------------------
>
>                 Key: HIVE-26102
>                 URL: https://issues.apache.org/jira/browse/HIVE-26102
>             Project: Hive
>          Issue Type: New Feature
>            Reporter: Marton Bod
>            Assignee: Marton Bod
>            Priority: Major
>              Labels: pull-request-available
>          Time Spent: 5h
>  Remaining Estimate: 0h
>

--
This message was sent by Atlassian Jira
(v8.20.1#820001)
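Beyond the vectorization guard tested by the negative qtest above, the feature itself targets Iceberg format V2, where a DELETE is persisted as delete files rather than rewritten data files (the first worklog's hunk header references testReadAndWriteFormatV2Partitioned_PosDelete_RowSupplied). The following is a conceptual simulation only, not the Iceberg API: it models how a reader merges data files with position deletes (data file path plus 0-based row ordinal), with all file names and row values invented for illustration.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.stream.Collectors;

public class PosDeleteSketch {
    // Hypothetical model of an Iceberg V2 position delete entry.
    record PosDelete(String filePath, long pos) {}

    public static void main(String[] args) {
        // Two invented data files with rows addressed by ordinal position.
        Map<String, List<String>> dataFiles = new LinkedHashMap<>();
        dataFiles.put("data-00001.parquet", List.of("row0", "row1", "row2"));
        dataFiles.put("data-00002.parquet", List.of("row0", "row1"));

        // Suppose a DELETE statement matched one row in each data file.
        List<PosDelete> deletes = List.of(
            new PosDelete("data-00001.parquet", 1L),
            new PosDelete("data-00002.parquet", 0L));

        // A merge-on-read scan skips the deleted ordinals per data file.
        List<String> live = new ArrayList<>();
        for (var e : dataFiles.entrySet()) {
            Set<Long> deletedPos = deletes.stream()
                .filter(d -> d.filePath().equals(e.getKey()))
                .map(PosDelete::pos)
                .collect(Collectors.toSet());
            for (int i = 0; i < e.getValue().size(); i++) {
                if (!deletedPos.contains((long) i)) {
                    live.add(e.getKey() + ":" + e.getValue().get(i));
                }
            }
        }
        System.out.println(live.size()); // prints 3 (5 rows minus 2 deletes)
    }
}
```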