[jira] [Work started] (HIVE-26075) hive metastore connection leaking when hiveserver2 kerberos enable and hive.server2.enable.doAs set to true
[ https://issues.apache.org/jira/browse/HIVE-26075?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Work on HIVE-26075 started by liuguanghua.
--
> hive metastore connection leaking when hiveserver2 kerberos enable and
> hive.server2.enable.doAs set to true
>
> Key: HIVE-26075
> URL: https://issues.apache.org/jira/browse/HIVE-26075
> Project: Hive
> Issue Type: Bug
> Components: Hive
> Affects Versions: 1.2.0
> Reporter: liuguanghua
> Assignee: liuguanghua
> Priority: Major
> Labels: pull-request-available
> Attachments: HIVE-26075.patch
>
> Time Spent: 10m
> Remaining Estimate: 0h
>
> (1) The Hadoop cluster has Kerberos enabled.
> (2) HiveServer2 has hive.server2.enable.doAs set to true.
> After a Beeline script has been executed, the Hive metastore connections that
> were created remain in the ESTABLISHED state and are never closed.
> If we submit many tasks to HiveServer2, the metastore Thrift worker threads
> (1000 by default) fill up, so new tasks fail.
>
> HiveServer2 uses a ThreadLocal to store each thread's metastore connection;
> the application should call Hive.closeCurrent() to close the connection after
> the task finishes.
>
> When HiveServer2 impersonation is enabled (hive.server2.enable.doAs set to
> true), the UGI creates a proxy user via
> UserGroupInformation.createProxyUser(owner, UserGroupInformation.getLoginUser()),
> and the old metastore client is never closed.
>
--
This message was sent by Atlassian Jira
(v8.20.1#820001)
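The leak mechanism described in the report can be sketched without Hive itself. The following is a minimal, illustrative simulation under stated assumptions: `Client`, `switchUserLeaky`, and `switchUserFixed` are hypothetical stand-ins, not Hive classes; the "fixed" path mirrors what calling the real `Hive.closeCurrent()` before swapping in a proxy-user client would achieve.

```java
// Illustrative simulation (NOT Hive code) of the ThreadLocal connection leak
// described in HIVE-26075. Client stands in for the Thrift metastore client;
// the counter stands in for connections left in the ESTABLISHED state.
public class MetastoreLeakSketch {
    static int open = 0; // number of simulated connections currently open

    // Stand-in for the per-thread metastore client.
    static class Client {
        Client() { open++; }
        void close() { open--; }
    }

    // HiveServer2 keeps the client in a ThreadLocal, as the report notes.
    static final ThreadLocal<Client> current = new ThreadLocal<>();

    // Buggy path: a new client (e.g. created for a proxy user obtained via
    // UserGroupInformation.createProxyUser) overwrites the ThreadLocal; the
    // old client becomes unreachable but its connection stays open.
    static void switchUserLeaky() {
        current.set(new Client());
    }

    // Fixed path: close the existing client before replacing it, which is
    // what calling Hive.closeCurrent() before the switch achieves.
    static void switchUserFixed() {
        Client old = current.get();
        if (old != null) {
            old.close();
        }
        current.set(new Client());
    }

    static void reset() { open = 0; current.remove(); }

    public static void main(String[] args) {
        reset();
        for (int i = 0; i < 1000; i++) switchUserLeaky();
        System.out.println("leaky: open connections = " + open); // 1000

        reset();
        for (int i = 0; i < 1000; i++) switchUserFixed();
        System.out.println("fixed: open connections = " + open); // 1
    }
}
```

With 1000 iterations the leaky path mirrors the report's observation that the metastore's default pool of 1000 Thrift worker threads fills up, while the fixed path keeps only a single live connection per thread.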
[jira] [Work stopped] (HIVE-26075) hive metastore connection leaking when hiveserver2 kerberos enable and hive.server2.enable.doAs set to true
[ https://issues.apache.org/jira/browse/HIVE-26075?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Work on HIVE-26075 stopped by liuguanghua.
--
> hive metastore connection leaking when hiveserver2 kerberos enable and
> hive.server2.enable.doAs set to true
>
> Key: HIVE-26075
> URL: https://issues.apache.org/jira/browse/HIVE-26075
> Project: Hive
> Issue Type: Bug
> Components: Hive
> Affects Versions: 1.2.0
> Reporter: liuguanghua
> Assignee: liuguanghua
> Priority: Major
> Labels: pull-request-available
> Attachments: HIVE-26075.patch
>
> Time Spent: 10m
> Remaining Estimate: 0h
>
> (1) The Hadoop cluster has Kerberos enabled.
> (2) HiveServer2 has hive.server2.enable.doAs set to true.
> After a Beeline script has been executed, the Hive metastore connections that
> were created remain in the ESTABLISHED state and are never closed.
> If we submit many tasks to HiveServer2, the metastore Thrift worker threads
> (1000 by default) fill up, so new tasks fail.
>
> HiveServer2 uses a ThreadLocal to store each thread's metastore connection;
> the application should call Hive.closeCurrent() to close the connection after
> the task finishes.
>
> When HiveServer2 impersonation is enabled (hive.server2.enable.doAs set to
> true), the UGI creates a proxy user via
> UserGroupInformation.createProxyUser(owner, UserGroupInformation.getLoginUser()),
> and the old metastore client is never closed.
>
--
This message was sent by Atlassian Jira
(v8.20.1#820001)
[jira] [Updated] (HIVE-26111) FULL JOIN returns incorrect result
[ https://issues.apache.org/jira/browse/HIVE-26111?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Youjun Yuan updated HIVE-26111:
---
Description:
We hit a query that FULL JOINs two tables; Hive produces an incorrect result: for a single value of the join key, it produces two records, each with a valid value from one table and NULL for the other.

The query is:
{code:java}
SET mapreduce.job.reduces=2;
SELECT d.id, u.id
FROM (
  SELECT id
  FROM airflow.tableA rud
  WHERE rud.dt = '2022-04-02-1row'
) d
FULL JOIN (
  SELECT id
  FROM default.tableB
  WHERE dt = '2022-04-01' and device_token='blabla'
) u
ON u.id = d.id
; {code}
According to the job log, the two reducers each get an input record and output a record, producing two records for id=350570497:
{code:java}
350570497	NULL
NULL	350570497
Time taken: 62.692 seconds, Fetched: 2 row(s) {code}
I am sure tableB has only one row where device_token='blabla'.

And we tried:
1, SET mapreduce.job.reduces=1; then it produces the right result;
-2, SET hive.execution.engine=mr; then it produces the right result;- mr also has the issue.
3, JOIN (instead of FULL JOIN) worked as expected.
4, in subquery u, changing the filter device_token='blabla' to id=350570497 worked OK.
5, flattening the subqueries works OK, like below:
{code:java}
SELECT d.id, u.id
from airflow.rds_users_delta d
full join default.users u on (u.id = d.id)
where d.dt = '2022-04-02-1row'
  and u.dt = '2022-04-01'
  and u.device_token='blabla' {code}
Below is the explain output of the query:
{code:java}
Plan optimized by CBO.

Vertex dependency in root stage
Reducer 3 <- Map 1 (CUSTOM_SIMPLE_EDGE), Map 2 (CUSTOM_SIMPLE_EDGE)

Stage-0
  Fetch Operator
    limit:-1
    Stage-1
      Reducer 3
      File Output Operator [FS_10]
        Map Join Operator [MAPJOIN_13] (rows=2 width=8)
          Conds:RS_6.KEY.reducesinkkey0=RS_7.KEY.reducesinkkey0(Outer),DynamicPartitionHashJoin:true,Output:["_col0","_col1"]
        <-Map 1 [CUSTOM_SIMPLE_EDGE]
          PARTITION_ONLY_SHUFFLE [RS_6]
            PartitionCols:_col0
            Select Operator [SEL_2] (rows=1 width=4)
              Output:["_col0"]
              TableScan [TS_0] (rows=1 width=4)
                airflow@rds_users_delta,rud,Tbl:COMPLETE,Col:COMPLETE,Output:["id"]
        <-Map 2 [CUSTOM_SIMPLE_EDGE]
          PARTITION_ONLY_SHUFFLE [RS_7]
            PartitionCols:_col0
            Select Operator [SEL_5] (rows=1 width=4)
              Output:["_col0"]
              Filter Operator [FIL_12] (rows=1 width=110)
                predicate:(device_token = 'blabla')
                TableScan [TS_3] (rows=215192362 width=109)
                  default@users,users,Tbl:COMPLETE,Col:COMPLETE,Output:["id","device_token"] {code}
I can't generate a small enough data set to reproduce the issue. I have minimized tableA to only 1 row; tableB has ~200m rows, but if I further reduce the size of tableB, the issue can't be reproduced.

Any suggestion would be highly appreciated, regarding the root cause of the issue, how to work around it, or how to reproduce it with a small enough dataset.

Below is the log found in hive.log:
{code:java}
220405004014_2c3b3486-9bc7-4d1d-9639-693dad39da17 : STAGE DEPENDENCIES:
  Stage-1 is a root stage [MAPRED]
  Stage-0 depends on stages: Stage-1 [FETCH]

STAGE PLANS:
  Stage: Stage-1
    Tez
      DagId: ec2-user_20220405004014_2c3b3486-9bc7-4d1d-9639-693dad39da17:1
      Edges:
        Reducer 3 <- Map 1 (CUSTOM_SIMPLE_EDGE), Map 2 (CUSTOM_SIMPLE_EDGE)
      DagName: ec2-user_20220405004014_2c3b3486-9bc7-4d1d-9639-693dad39da17:1
      Vertices:
        Map 1
          Map Operator Tree:
            TableScan
              alias: rud
              Statistics: Num rows: 1 Data size: 4 Basic stats: COMPLETE Column stats: COMPLETE
              GatherStats: false
              Select Operator
                expressions: id (type: int)
                outputColumnNames: _col0
                Statistics: Num rows: 1 Data size: 4 Basic stats: COMPLETE Column stats: COMPLETE
                Reduce Output Operator
                  key expressions: _col0 (type: int)
                  null sort order: a
                  sort order: +
                  Map-reduce partition columns: _col0 (type: int)
                  Statistics: Num rows: 1 Data size: 4 Basic stats: COMPLETE Column stats: COMPLETE
                  tag: 0
                  auto parallelism: true
      Path -> Alias:
        s3a://.../rds_users_delta/dt=2022-04-02-1row/hh=00 [rud]
      Path -> Partition:
        s3a://.../rds_users_delta/dt=2022-04-02-1row/hh=00
          Partition
            base file name: hh=00
            input format: org.apache.hadoop.mapred.TextInputFormat
            output format: org.apache.
[jira] [Updated] (HIVE-26111) FULL JOIN returns incorrect result
[ https://issues.apache.org/jira/browse/HIVE-26111?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Youjun Yuan updated HIVE-26111:
---
Summary: FULL JOIN returns incorrect result (was: FULL JOIN returns incorrect result with Tez engine)

> FULL JOIN returns incorrect result
> --
>
> Key: HIVE-26111
> URL: https://issues.apache.org/jira/browse/HIVE-26111
> Project: Hive
> Issue Type: Bug
> Environment: aws EMR (hive 3.1.2 + Tez 0.10.1)
> Reporter: Youjun Yuan
> Priority: Blocker
>
> We hit a query that FULL JOINs two tables; Hive produces an incorrect result:
> for a single value of the join key, it produces two records, each with a
> valid value from one table and NULL for the other.
> The query is:
> {code:java}
> SET mapreduce.job.reduces=2;
> SELECT d.id, u.id
> FROM (
>   SELECT id
>   FROM airflow.tableA rud
>   WHERE rud.dt = '2022-04-02-1row'
> ) d
> FULL JOIN (
>   SELECT id
>   FROM default.tableB
>   WHERE dt = '2022-04-01' and device_token='blabla'
> ) u
> ON u.id = d.id
> ; {code}
> According to the job log, the two reducers each get an input record and
> output a record, producing two records for id=350570497:
> {code:java}
> 350570497	NULL
> NULL	350570497
> Time taken: 62.692 seconds, Fetched: 2 row(s) {code}
> I am sure tableB has only one row where device_token='blabla'.
> And we tried:
> 1, SET mapreduce.job.reduces=1; then it produces the right result;
> -2, SET hive.execution.engine=mr; then it produces the right result;- mr
> also has the issue.
> 3, JOIN (instead of FULL JOIN) worked as expected.
> 4, in subquery u, changing the filter device_token='blabla' to id=350570497
> worked OK.
> Below is the explain output of the query:
> {code:java}
> Plan optimized by CBO.
>
> Vertex dependency in root stage
> Reducer 3 <- Map 1 (CUSTOM_SIMPLE_EDGE), Map 2 (CUSTOM_SIMPLE_EDGE)
>
> Stage-0
>   Fetch Operator
>     limit:-1
>     Stage-1
>       Reducer 3
>       File Output Operator [FS_10]
>         Map Join Operator [MAPJOIN_13] (rows=2 width=8)
>           Conds:RS_6.KEY.reducesinkkey0=RS_7.KEY.reducesinkkey0(Outer),DynamicPartitionHashJoin:true,Output:["_col0","_col1"]
>         <-Map 1 [CUSTOM_SIMPLE_EDGE]
>           PARTITION_ONLY_SHUFFLE [RS_6]
>             PartitionCols:_col0
>             Select Operator [SEL_2] (rows=1 width=4)
>               Output:["_col0"]
>               TableScan [TS_0] (rows=1 width=4)
>                 airflow@rds_users_delta,rud,Tbl:COMPLETE,Col:COMPLETE,Output:["id"]
>         <-Map 2 [CUSTOM_SIMPLE_EDGE]
>           PARTITION_ONLY_SHUFFLE [RS_7]
>             PartitionCols:_col0
>             Select Operator [SEL_5] (rows=1 width=4)
>               Output:["_col0"]
>               Filter Operator [FIL_12] (rows=1 width=110)
>                 predicate:(device_token = 'blabla')
>                 TableScan [TS_3] (rows=215192362 width=109)
>                   default@users,users,Tbl:COMPLETE,Col:COMPLETE,Output:["id","device_token"] {code}
> I can't generate a small enough data set to reproduce the issue. I have
> minimized tableA to only 1 row; tableB has ~200m rows, but if I further
> reduce the size of tableB, the issue can't be reproduced.
> Any suggestion would be highly appreciated, regarding the root cause of the
> issue, how to work around it, or how to reproduce it with a small enough
> dataset.
>
> Below is the log found in hive.log:
> {code:java}
> 220405004014_2c3b3486-9bc7-4d1d-9639-693dad39da17 : STAGE DEPENDENCIES:
>   Stage-1 is a root stage [MAPRED]
>   Stage-0 depends on stages: Stage-1 [FETCH]
>
> STAGE PLANS:
>   Stage: Stage-1
>     Tez
>       DagId: ec2-user_20220405004014_2c3b3486-9bc7-4d1d-9639-693dad39da17:1
>       Edges:
>         Reducer 3 <- Map 1 (CUSTOM_SIMPLE_EDGE), Map 2 (CUSTOM_SIMPLE_EDGE)
>       DagName: ec2-user_20220405004014_2c3b3486-9bc7-4d1d-9639-693dad39da17:1
>       Vertices:
>         Map 1
>           Map Operator Tree:
>             TableScan
>               alias: rud
>               Statistics: Num rows: 1 Data size: 4 Basic stats: COMPLETE Column stats: COMPLETE
>               GatherStats: false
>               Select Operator
>                 expressions: id (type: int)
>                 outputColumnNames: _col0
>                 Statistics: Num rows: 1 Data size: 4 Basic stats: COMPLETE Column stats: COMPLETE
>                 Reduce Output Operator
>                   key expressions: _col0 (type: int)
>                   null sort order: a
>                   sort order: +
>                   Map-reduce partition columns: _col0 (type: int)
>                   Statistics: Num rows: 1 Data size: 4 Basic stats: COMPLETE Column stats: COMPLETE
>                   tag: 0
>                   aut
[jira] [Assigned] (HIVE-26124) Upgrade HBase from 2.0.0-alpha4 to 2.0.0
[ https://issues.apache.org/jira/browse/HIVE-26124?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Peter Vary reassigned HIVE-26124:
-
Assignee: Peter Vary

> Upgrade HBase from 2.0.0-alpha4 to 2.0.0
>
> Key: HIVE-26124
> URL: https://issues.apache.org/jira/browse/HIVE-26124
> Project: Hive
> Issue Type: Task
> Reporter: Peter Vary
> Assignee: Peter Vary
> Priority: Major
> Labels: pull-request-available
> Time Spent: 10m
> Remaining Estimate: 0h
>
> We should move from the alpha version to the stable one.
--
This message was sent by Atlassian Jira
(v8.20.1#820001)
[jira] [Work started] (HIVE-26124) Upgrade HBase from 2.0.0-alpha4 to 2.0.0
[ https://issues.apache.org/jira/browse/HIVE-26124?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Work on HIVE-26124 started by Peter Vary.
-
> Upgrade HBase from 2.0.0-alpha4 to 2.0.0
>
> Key: HIVE-26124
> URL: https://issues.apache.org/jira/browse/HIVE-26124
> Project: Hive
> Issue Type: Task
> Reporter: Peter Vary
> Assignee: Peter Vary
> Priority: Major
> Labels: pull-request-available
> Time Spent: 10m
> Remaining Estimate: 0h
>
> We should move from the alpha version to the stable one.
--
This message was sent by Atlassian Jira
(v8.20.1#820001)
[jira] [Work logged] (HIVE-26124) Upgrade HBase from 2.0.0-alpha4 to 2.0.0
[ https://issues.apache.org/jira/browse/HIVE-26124?focusedWorklogId=753680&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-753680 ] ASF GitHub Bot logged work on HIVE-26124:
-
Author: ASF GitHub Bot
Created on: 06/Apr/22 20:47
Start Date: 06/Apr/22 20:47
Worklog Time Spent: 10m
Work Description: pvary opened a new pull request, #3186:
URL: https://github.com/apache/hive/pull/3186

### What changes were proposed in this pull request?
Upgrade HBase to 2.0.0.

### Why are the changes needed?
A release should at minimum depend on a stable version.

### Does this PR introduce _any_ user-facing change?
No

### How was this patch tested?
Unit tests

Issue Time Tracking
---
Worklog Id: (was: 753680)
Remaining Estimate: 0h
Time Spent: 10m

> Upgrade HBase from 2.0.0-alpha4 to 2.0.0
>
> Key: HIVE-26124
> URL: https://issues.apache.org/jira/browse/HIVE-26124
> Project: Hive
> Issue Type: Task
> Reporter: Peter Vary
> Priority: Major
> Time Spent: 10m
> Remaining Estimate: 0h
>
> We should move from the alpha version to the stable one.
--
This message was sent by Atlassian Jira
(v8.20.1#820001)
[jira] [Updated] (HIVE-26124) Upgrade HBase from 2.0.0-alpha4 to 2.0.0
[ https://issues.apache.org/jira/browse/HIVE-26124?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated HIVE-26124:
--
Labels: pull-request-available (was: )

> Upgrade HBase from 2.0.0-alpha4 to 2.0.0
>
> Key: HIVE-26124
> URL: https://issues.apache.org/jira/browse/HIVE-26124
> Project: Hive
> Issue Type: Task
> Reporter: Peter Vary
> Priority: Major
> Labels: pull-request-available
> Time Spent: 10m
> Remaining Estimate: 0h
>
> We should move from the alpha version to the stable one.
--
This message was sent by Atlassian Jira
(v8.20.1#820001)
[jira] [Assigned] (HIVE-26092) Fix javadoc errors for the 4.0.0 release
[ https://issues.apache.org/jira/browse/HIVE-26092?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Peter Vary reassigned HIVE-26092: - Assignee: Peter Vary > Fix javadoc errors for the 4.0.0 release > > > Key: HIVE-26092 > URL: https://issues.apache.org/jira/browse/HIVE-26092 > Project: Hive > Issue Type: Task >Reporter: Peter Vary >Assignee: Peter Vary >Priority: Major > Labels: pull-request-available > Time Spent: 10m > Remaining Estimate: 0h > > Currently there are plenty of errors in the javadoc. > We should fix those before a final release -- This message was sent by Atlassian Jira (v8.20.1#820001)
[jira] [Work started] (HIVE-26092) Fix javadoc errors for the 4.0.0 release
[ https://issues.apache.org/jira/browse/HIVE-26092?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Work on HIVE-26092 started by Peter Vary. - > Fix javadoc errors for the 4.0.0 release > > > Key: HIVE-26092 > URL: https://issues.apache.org/jira/browse/HIVE-26092 > Project: Hive > Issue Type: Task >Reporter: Peter Vary >Assignee: Peter Vary >Priority: Major > Labels: pull-request-available > Time Spent: 10m > Remaining Estimate: 0h > > Currently there are plenty of errors in the javadoc. > We should fix those before a final release -- This message was sent by Atlassian Jira (v8.20.1#820001)
[jira] [Work logged] (HIVE-26092) Fix javadoc errors for the 4.0.0 release
[ https://issues.apache.org/jira/browse/HIVE-26092?focusedWorklogId=753669&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-753669 ] ASF GitHub Bot logged work on HIVE-26092:
-
Author: ASF GitHub Bot
Created on: 06/Apr/22 20:36
Start Date: 06/Apr/22 20:36
Worklog Time Spent: 10m
Work Description: pvary opened a new pull request, #3185:
URL: https://github.com/apache/hive/pull/3185

### What changes were proposed in this pull request?
Fixes the javadoc errors and adds a CI test for generating the javadoc.

### Why are the changes needed?
To fix the errors and prevent new ones from occurring.

### Does this PR introduce _any_ user-facing change?
No

### How was this patch tested?
Manually running:
```
mvn install javadoc:javadoc javadoc:aggregate -DskipTests
```

Issue Time Tracking
---
Worklog Id: (was: 753669)
Remaining Estimate: 0h
Time Spent: 10m

> Fix javadoc errors for the 4.0.0 release
>
> Key: HIVE-26092
> URL: https://issues.apache.org/jira/browse/HIVE-26092
> Project: Hive
> Issue Type: Task
> Reporter: Peter Vary
> Priority: Major
> Time Spent: 10m
> Remaining Estimate: 0h
>
> Currently there are plenty of errors in the javadoc.
> We should fix those before a final release.
--
This message was sent by Atlassian Jira
(v8.20.1#820001)
[jira] [Updated] (HIVE-26092) Fix javadoc errors for the 4.0.0 release
[ https://issues.apache.org/jira/browse/HIVE-26092?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated HIVE-26092: -- Labels: pull-request-available (was: ) > Fix javadoc errors for the 4.0.0 release > > > Key: HIVE-26092 > URL: https://issues.apache.org/jira/browse/HIVE-26092 > Project: Hive > Issue Type: Task >Reporter: Peter Vary >Priority: Major > Labels: pull-request-available > Time Spent: 10m > Remaining Estimate: 0h > > Currently there are plenty of errors in the javadoc. > We should fix those before a final release -- This message was sent by Atlassian Jira (v8.20.1#820001)
[jira] [Updated] (HIVE-25882) Using the Hive Metastore with Kudu not work
[ https://issues.apache.org/jira/browse/HIVE-25882?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] liu updated HIVE-25882:
---
Affects Version/s: 4.0.0

> Using the Hive Metastore with Kudu not work
> ---
>
> Key: HIVE-25882
> URL: https://issues.apache.org/jira/browse/HIVE-25882
> Project: Hive
> Issue Type: Bug
> Components: Accumulo Storage Handler
> Affects Versions: 3.1.2, 4.0.0
> Environment: HIVE: 3.1
> HDP: 3.1.1.3.1
> KUDU: kudu 1.15.0
> Reporter: liu
> Priority: Critical
>
> I followed the configuration on this page, and it looks as if the
> configuration was successful:
> [https://kudu.apache.org/docs/hive_metastore.html#enabling-the-hive-metastore-integration]
> Kudu master start log:
> {code:java}
> I1115 18:51:37.391942  1832 catalog_manager.cc:1253] Loading table and tablet metadata into memory...
> I1115 18:51:37.392135  1832 catalog_manager.cc:495] Loaded metadata for table $schemas [id=9c31d249228f42b38468835a7ae2c6e6]
> I1115 18:51:37.392266  1832 catalog_manager.cc:549] Loaded metadata for tablet 1526622b192145b8973fc852c2cfbd8f (table $schemas [id=9c31d249228f42b38468835a7ae2c6e6])
> I1115 18:51:37.392287  1832 catalog_manager.cc:549] Loaded metadata for tablet 2842be87bec74f0592a01ca0535bd9aa (table $schemas [id=9c31d249228f42b38468835a7ae2c6e6])
> I1115 18:51:37.392294  1832 catalog_manager.cc:1262] Initializing Kudu cluster ID...
> I1115 18:51:37.392381  1832 catalog_manager.cc:1098] Loaded cluster ID: 70b19944b04543759922355e6ce259ac
> I1115 18:51:37.392387  1832 catalog_manager.cc:1273] Initializing Kudu internal certificate authority...
> I1115 18:51:37.392593  1832 catalog_manager.cc:1282] Loading token signing keys...
> I1115 18:51:37.392693  1832 catalog_manager.cc:5093] T  P 2bc3b2318ca640a78a99fcbe4d058a9f: Loaded TSK: 0
> I1115 18:51:37.392736  1832 catalog_manager.cc:1292] Initializing in-progress tserver states...
> I1115 18:51:37.392812  1832 catalog_manager.cc:1305] Loading latest processed Hive Metastore notification log event ID... {code}
> Now I use Trino to connect to Kudu and execute the following script.
> {code:java}
> trino:default> create schema cdr;
> CREATE SCHEMA
> trino:default> use cdr;
> USE
> trino:cdr> show schemas;
> Schema
>
> cdr
> default
> information_schema
> (3 rows)
>
> Query 2028_033415_00020_4gwuw, FINISHED, 3 nodes
> Splits: 36 total, 36 done (100.00%)
> 0.22 [3 rows, 43B] [13 rows/s, 195B/s]
>
> trino:cdr> CREATE TABLE kudu.cdr.users (
>         ->   user_id int WITH (primary_key = true),
>         ->   first_name varchar,
>         ->   last_name varchar
>         -> ) WITH (
>         ->   partition_by_hash_columns = ARRAY['user_id'],
>         ->   partition_by_hash_buckets = 2
>         -> );
>
> W1118 13:56:00.671370 31226 catalog_manager.cc:1959] Remote error: failed to create HMS catalog entry for table [id=3490249b929842509d3364a18f07a4e5]: failed to create Hive MetaStore table: TException - service has thrown: MetaException(message=NoSuchObjectException(message:cdr)) {code}
> Master log:
> {code:java}
> W1118 13:56:00.671370 31226 catalog_manager.cc:1959] Remote error: failed to create HMS catalog entry for table [id=3490249b929842509d3364a18f07a4e5]: failed to create Hive MetaStore table: TException - service has thrown: MetaException(message=NoSuchObjectException(message:cdr)) {code}
> The schema failed to synchronize to the Hive metastore. If I create the
> database in Hive first, the error log is:
> {code:java}
> W1118 13:30:00.148990 31226 catalog_manager.cc:1959] Remote error: failed to create HMS catalog entry for table [id=4a40e0c12d9a4d26a11fcce0cf259d35]: failed to create Hive MetaStore table: TException - service has thrown: MetaException(message=java.lang.IllegalArgumentException: Can not create a Path from an empty string) {code}
> Now I don't know how to solve this problem.
> [link title|https://issues.apache.org/jira/browse/KUDU-3338]
--
This message was sent by Atlassian Jira
(v8.20.1#820001)
[jira] [Work logged] (HIVE-26122) Factorize out common docker code between DatabaseRule and AbstractExternalDB
[ https://issues.apache.org/jira/browse/HIVE-26122?focusedWorklogId=753467&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-753467 ] ASF GitHub Bot logged work on HIVE-26122:
-
Author: ASF GitHub Bot
Created on: 06/Apr/22 15:30
Start Date: 06/Apr/22 15:30
Worklog Time Spent: 10m
Work Description: asolimando closed pull request #3182: HIVE-26122: Factorize out common docker code between DatabaseRule and…
URL: https://github.com/apache/hive/pull/3182

Issue Time Tracking
---
Worklog Id: (was: 753467)
Time Spent: 20m (was: 10m)

> Factorize out common docker code between DatabaseRule and AbstractExternalDB
>
> Key: HIVE-26122
> URL: https://issues.apache.org/jira/browse/HIVE-26122
> Project: Hive
> Issue Type: Improvement
> Components: Testing Infrastructure
> Affects Versions: 4.0.0-alpha-2
> Reporter: Alessandro Solimando
> Assignee: Alessandro Solimando
> Priority: Major
> Labels: pull-request-available
> Fix For: 4.0.0-alpha-2
>
> Time Spent: 20m
> Remaining Estimate: 0h
>
> Currently there is a lot of shared code between the two classes, which could
> be extracted into a utility class called DockerUtils, since all this code
> pertains to Docker.
--
This message was sent by Atlassian Jira
(v8.20.1#820001)
[jira] [Resolved] (HIVE-26122) Factorize out common docker code between DatabaseRule and AbstractExternalDB
[ https://issues.apache.org/jira/browse/HIVE-26122?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Alessandro Solimando resolved HIVE-26122.
-
Resolution: Duplicate

Thanks [~zabetak], I had missed that; closing as a duplicate.

> Factorize out common docker code between DatabaseRule and AbstractExternalDB
>
> Key: HIVE-26122
> URL: https://issues.apache.org/jira/browse/HIVE-26122
> Project: Hive
> Issue Type: Improvement
> Components: Testing Infrastructure
> Affects Versions: 4.0.0-alpha-2
> Reporter: Alessandro Solimando
> Assignee: Alessandro Solimando
> Priority: Major
> Labels: pull-request-available
> Fix For: 4.0.0-alpha-2
>
> Time Spent: 10m
> Remaining Estimate: 0h
>
> Currently there is a lot of shared code between the two classes, which could
> be extracted into a utility class called DockerUtils, since all this code
> pertains to Docker.
--
This message was sent by Atlassian Jira
(v8.20.1#820001)
[jira] [Work logged] (HIVE-26102) Implement DELETE statements for Iceberg tables
[ https://issues.apache.org/jira/browse/HIVE-26102?focusedWorklogId=753466&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-753466 ] ASF GitHub Bot logged work on HIVE-26102: - Author: ASF GitHub Bot Created on: 06/Apr/22 15:29 Start Date: 06/Apr/22 15:29 Worklog Time Spent: 10m Work Description: pvary commented on code in PR #3131: URL: https://github.com/apache/hive/pull/3131#discussion_r844089279 ## iceberg/iceberg-handler/src/test/queries/negative/delete_iceberg_vectorized.q: ## @@ -0,0 +1,10 @@ +set hive.vectorized.execution.enabled=true; +set hive.support.concurrency=true; +set hive.txn.manager=org.apache.hadoop.hive.ql.lockmgr.DbTxnManager; Review Comment: If there is a way to skip this check for Iceberg tables, then it would be nice Issue Time Tracking --- Worklog Id: (was: 753466) Time Spent: 7.5h (was: 7h 20m) > Implement DELETE statements for Iceberg tables > -- > > Key: HIVE-26102 > URL: https://issues.apache.org/jira/browse/HIVE-26102 > Project: Hive > Issue Type: New Feature >Reporter: Marton Bod >Assignee: Marton Bod >Priority: Major > Labels: pull-request-available > Time Spent: 7.5h > Remaining Estimate: 0h > -- This message was sent by Atlassian Jira (v8.20.1#820001)
[jira] [Work logged] (HIVE-26102) Implement DELETE statements for Iceberg tables
[ https://issues.apache.org/jira/browse/HIVE-26102?focusedWorklogId=753462&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-753462 ] ASF GitHub Bot logged work on HIVE-26102: - Author: ASF GitHub Bot Created on: 06/Apr/22 15:23 Start Date: 06/Apr/22 15:23 Worklog Time Spent: 10m Work Description: marton-bod commented on code in PR #3131: URL: https://github.com/apache/hive/pull/3131#discussion_r844076945 ## iceberg/iceberg-handler/src/test/queries/negative/delete_iceberg_vectorized.q: ## @@ -0,0 +1,10 @@ +set hive.vectorized.execution.enabled=true; +set hive.support.concurrency=true; +set hive.txn.manager=org.apache.hadoop.hive.ql.lockmgr.DbTxnManager; Review Comment: We get an exception here if the txn handler does not support acid operations: https://github.com/apache/hive/blob/master/ql/src/java/org/apache/hadoop/hive/ql/parse/RewriteSemanticAnalyzer.java#L70 It crossed my mind whether to disable this check for Iceberg, but it didn't seem worth the effort, since we only have the ASTTree available in this method so the parsing might be complicated Issue Time Tracking --- Worklog Id: (was: 753462) Time Spent: 7h 20m (was: 7h 10m) > Implement DELETE statements for Iceberg tables > -- > > Key: HIVE-26102 > URL: https://issues.apache.org/jira/browse/HIVE-26102 > Project: Hive > Issue Type: New Feature >Reporter: Marton Bod >Assignee: Marton Bod >Priority: Major > Labels: pull-request-available > Time Spent: 7h 20m > Remaining Estimate: 0h > -- This message was sent by Atlassian Jira (v8.20.1#820001)
[jira] [Work logged] (HIVE-26102) Implement DELETE statements for Iceberg tables
[ https://issues.apache.org/jira/browse/HIVE-26102?focusedWorklogId=753461&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-753461 ] ASF GitHub Bot logged work on HIVE-26102: - Author: ASF GitHub Bot Created on: 06/Apr/22 15:22 Start Date: 06/Apr/22 15:22 Worklog Time Spent: 10m Work Description: marton-bod commented on code in PR #3131: URL: https://github.com/apache/hive/pull/3131#discussion_r844076945 ## iceberg/iceberg-handler/src/test/queries/negative/delete_iceberg_vectorized.q: ## @@ -0,0 +1,10 @@ +set hive.vectorized.execution.enabled=true; +set hive.support.concurrency=true; +set hive.txn.manager=org.apache.hadoop.hive.ql.lockmgr.DbTxnManager; Review Comment: We get an exception here if the txn handler does not support acid operations: https://github.com/apache/hive/blob/master/ql/src/java/org/apache/hadoop/hive/ql/parse/RewriteSemanticAnalyzer.java#L70 It crossed my mind whether to disable this check for Iceberg, but it didn't seem worth the effort, since we only have the ASTTree available in this method Issue Time Tracking --- Worklog Id: (was: 753461) Time Spent: 7h 10m (was: 7h) > Implement DELETE statements for Iceberg tables > -- > > Key: HIVE-26102 > URL: https://issues.apache.org/jira/browse/HIVE-26102 > Project: Hive > Issue Type: New Feature >Reporter: Marton Bod >Assignee: Marton Bod >Priority: Major > Labels: pull-request-available > Time Spent: 7h 10m > Remaining Estimate: 0h > -- This message was sent by Atlassian Jira (v8.20.1#820001)
[jira] [Work logged] (HIVE-26113) Align HMS and metastore tables's schema
[ https://issues.apache.org/jira/browse/HIVE-26113?focusedWorklogId=753460&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-753460 ] ASF GitHub Bot logged work on HIVE-26113: - Author: ASF GitHub Bot Created on: 06/Apr/22 15:20 Start Date: 06/Apr/22 15:20 Worklog Time Spent: 10m Work Description: asolimando commented on PR #3175: URL: https://github.com/apache/hive/pull/3175#issuecomment-1090396057 > Might be a different story, but I think it would be good to have some tests in place where we can at least run a single query against all of the tables on all of the different supported databases. I am a bit concerned that we write wrong sqls and we do not run a test against them. You are right, I have filed https://issues.apache.org/jira/browse/HIVE-26123 and I am working on it, I will resume this one once I have it working. Issue Time Tracking --- Worklog Id: (was: 753460) Time Spent: 40m (was: 0.5h) > Align HMS and metastore tables's schema > --- > > Key: HIVE-26113 > URL: https://issues.apache.org/jira/browse/HIVE-26113 > Project: Hive > Issue Type: Bug > Components: Metastore >Affects Versions: 4.0.0-alpha-2 >Reporter: Alessandro Solimando >Assignee: Alessandro Solimando >Priority: Major > Labels: pull-request-available > Fix For: 4.0.0-alpha-2 > > Time Spent: 40m > Remaining Estimate: 0h > > HMS tables should be in sync with those exposed by Hive metastore via _sysdb_. > At the moment there are some discrepancies for the existing tables, the > present ticket aims at bridging this gap. -- This message was sent by Atlassian Jira (v8.20.1#820001)
[jira] [Work logged] (HIVE-26102) Implement DELETE statements for Iceberg tables
[ https://issues.apache.org/jira/browse/HIVE-26102?focusedWorklogId=753459&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-753459 ] ASF GitHub Bot logged work on HIVE-26102: - Author: ASF GitHub Bot Created on: 06/Apr/22 15:18 Start Date: 06/Apr/22 15:18 Worklog Time Spent: 10m Work Description: marton-bod commented on code in PR #3131: URL: https://github.com/apache/hive/pull/3131#discussion_r844076945 ## iceberg/iceberg-handler/src/test/queries/negative/delete_iceberg_vectorized.q: ## @@ -0,0 +1,10 @@ +set hive.vectorized.execution.enabled=true; +set hive.support.concurrency=true; +set hive.txn.manager=org.apache.hadoop.hive.ql.lockmgr.DbTxnManager; Review Comment: We get an exception here if the txn handler does not support acid operations: https://github.com/apache/hive/blob/master/ql/src/java/org/apache/hadoop/hive/ql/parse/RewriteSemanticAnalyzer.java#L70 I wonder if we should avoid this check for Iceberg? Issue Time Tracking --- Worklog Id: (was: 753459) Time Spent: 7h (was: 6h 50m) > Implement DELETE statements for Iceberg tables > -- > > Key: HIVE-26102 > URL: https://issues.apache.org/jira/browse/HIVE-26102 > Project: Hive > Issue Type: New Feature >Reporter: Marton Bod >Assignee: Marton Bod >Priority: Major > Labels: pull-request-available > Time Spent: 7h > Remaining Estimate: 0h > -- This message was sent by Atlassian Jira (v8.20.1#820001)
[jira] [Updated] (HIVE-26122) Factorize out common docker code between DatabaseRule and AbstractExternalDB
[ https://issues.apache.org/jira/browse/HIVE-26122?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Alessandro Solimando updated HIVE-26122: Affects Version/s: 4.0.0-alpha-2 (was: 4.0.0-alpha-1) > Factorize out common docker code between DatabaseRule and AbstractExternalDB > > > Key: HIVE-26122 > URL: https://issues.apache.org/jira/browse/HIVE-26122 > Project: Hive > Issue Type: Improvement > Components: Testing Infrastructure >Affects Versions: 4.0.0-alpha-2 >Reporter: Alessandro Solimando >Assignee: Alessandro Solimando >Priority: Major > Labels: pull-request-available > Fix For: 4.0.0-alpha-2 > > Time Spent: 10m > Remaining Estimate: 0h > > Currently there is a lot of shared code between the two classes which could > be extracted into a utility class called DockerUtils, since all this code > pertains to docker. -- This message was sent by Atlassian Jira (v8.20.1#820001)
[jira] [Updated] (HIVE-26123) Introduce test coverage for sysdb for the different metastores
[ https://issues.apache.org/jira/browse/HIVE-26123?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Alessandro Solimando updated HIVE-26123: Affects Version/s: 4.0.0-alpha-2 (was: 4.0.0-alpha-1) > Introduce test coverage for sysdb for the different metastores > -- > > Key: HIVE-26123 > URL: https://issues.apache.org/jira/browse/HIVE-26123 > Project: Hive > Issue Type: Test > Components: Testing Infrastructure >Affects Versions: 4.0.0-alpha-2 >Reporter: Alessandro Solimando >Assignee: Alessandro Solimando >Priority: Major > Fix For: 4.0.0-alpha-2 > > > _sysdb_ provides a view over (some) metastore tables from Hive via JDBC > queries. > Existing tests run only against Derby, meaning that any change > to the sysdb query mapping is not covered by CI. > The present ticket aims at bridging this gap by introducing sysdb test coverage for > the different supported metastores. -- This message was sent by Atlassian Jira (v8.20.1#820001)
[jira] [Updated] (HIVE-25540) Enable batch update of column stats only for MySQL and Postgres
[ https://issues.apache.org/jira/browse/HIVE-25540?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Stamatis Zampetakis updated HIVE-25540: --- Fix Version/s: 4.0.0-alpha-2 (was: 4.0.0-alpha-1) > Enable batch update of column stats only for MySQL and Postgres > > > Key: HIVE-25540 > URL: https://issues.apache.org/jira/browse/HIVE-25540 > Project: Hive > Issue Type: Sub-task >Reporter: mahesh kumar behera >Assignee: mahesh kumar behera >Priority: Major > Labels: pull-request-available > Fix For: 4.0.0-alpha-2 > > Time Spent: 50m > Remaining Estimate: 0h > > The batch update of partition column stats using direct SQL is tested only > for MySQL and Postgres. -- This message was sent by Atlassian Jira (v8.20.1#820001)
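The change above restricts the direct-SQL batch path to the two backends it was validated on. A minimal sketch of what such a gate can look like follows; the enum and method names are hypothetical illustrations, not the actual HMS code:

```java
import java.util.Arrays;
import java.util.List;

public class BatchStatsGate {

    // Hypothetical database-type enum; the real metastore detects the backend
    // from the JDBC connection.
    public enum DbType { MYSQL, POSTGRES, ORACLE, DERBY, SQLSERVER }

    // Batch update of partition column stats via direct SQL is only enabled
    // for the databases it has been tested against.
    public static boolean useDirectSqlBatch(DbType db) {
        return db == DbType.MYSQL || db == DbType.POSTGRES;
    }

    // Either one batched statement, or one statement per partition (fallback).
    public static int statementCount(DbType db, List<String> partitions) {
        return useDirectSqlBatch(db) ? 1 : partitions.size();
    }

    public static void main(String[] args) {
        List<String> parts = Arrays.asList("p=1", "p=2", "p=3");
        System.out.println("mysql: " + statementCount(DbType.MYSQL, parts)); // 1
        System.out.println("derby: " + statementCount(DbType.DERBY, parts)); // 3
    }
}
```

The point of the gate is that untested backends keep the safe per-partition path rather than risk a wrong batched statement.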
[jira] [Commented] (HIVE-26122) Factorize out common docker code between DatabaseRule and AbstractExternalDB
[ https://issues.apache.org/jira/browse/HIVE-26122?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17518216#comment-17518216 ] Stamatis Zampetakis commented on HIVE-26122: [~asolimando] This looks like a duplicate of https://issues.apache.org/jira/browse/HIVE-25667. Have you seen that? > Factorize out common docker code between DatabaseRule and AbstractExternalDB > > > Key: HIVE-26122 > URL: https://issues.apache.org/jira/browse/HIVE-26122 > Project: Hive > Issue Type: Improvement > Components: Testing Infrastructure >Affects Versions: 4.0.0-alpha-1 >Reporter: Alessandro Solimando >Assignee: Alessandro Solimando >Priority: Major > Labels: pull-request-available > Fix For: 4.0.0-alpha-2 > > Time Spent: 10m > Remaining Estimate: 0h > > Currently there is a lot of shared code between the two classes which could > be extracted into a utility class called DockerUtils, since all this code > pertains to docker. -- This message was sent by Atlassian Jira (v8.20.1#820001)
[jira] [Updated] (HIVE-26075) hive metastore connection leaking when hiveserver2 kerberos enable and hive.server2.enable.doAs set to true
[ https://issues.apache.org/jira/browse/HIVE-26075?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated HIVE-26075: -- Labels: pull-request-available (was: ) > hive metastore connection leaking when hiveserver2 kerberos enable and > hive.server2.enable.doAs set to true > > > Key: HIVE-26075 > URL: https://issues.apache.org/jira/browse/HIVE-26075 > Project: Hive > Issue Type: Bug > Components: Hive >Affects Versions: 1.2.0 >Reporter: liuguanghua >Assignee: liuguanghua >Priority: Major > Labels: pull-request-available > Attachments: HIVE-26075.patch > > Time Spent: 10m > Remaining Estimate: 0h > > (1) When Hadoop cluster Kerberos is enabled > (2) HiveServer2 config hive.server2.enable.doAs is set to true > After a Beeline script has been executed, the Hive metastore connections that were > created remain in ESTABLISHED state and are never closed. > If we submit a lot of tasks to HiveServer2, this will fill the Hive metastore > Thrift thread pool (default is 1000), and thus new tasks will fail. > > HiveServer2 uses a ThreadLocal to store per-thread metastore > connections; the application should call Hive.closeCurrent() to close the > connection after the task has finished. > > When HiveServer2 impersonation is enabled (hive.server2.enable.doAs is set to > true), the UGI will create a proxy user via > UserGroupInformation.createProxyUser( > owner, UserGroupInformation.getLoginUser()), and the old metastore client is never > closed. > > > -- This message was sent by Atlassian Jira (v8.20.1#820001)
[jira] [Work logged] (HIVE-26075) hive metastore connection leaking when hiveserver2 kerberos enable and hive.server2.enable.doAs set to true
[ https://issues.apache.org/jira/browse/HIVE-26075?focusedWorklogId=753456&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-753456 ] ASF GitHub Bot logged work on HIVE-26075: - Author: ASF GitHub Bot Created on: 06/Apr/22 15:13 Start Date: 06/Apr/22 15:13 Worklog Time Spent: 10m Work Description: lgh-cn opened a new pull request, #3183: URL: https://github.com/apache/hive/pull/3183 ### What changes were proposed in this pull request? ### Why are the changes needed? ### Does this PR introduce _any_ user-facing change? ### How was this patch tested? Issue Time Tracking --- Worklog Id: (was: 753456) Remaining Estimate: 0h Time Spent: 10m > hive metastore connection leaking when hiveserver2 kerberos enable and > hive.server2.enable.doAs set to true > > > Key: HIVE-26075 > URL: https://issues.apache.org/jira/browse/HIVE-26075 > Project: Hive > Issue Type: Bug > Components: Hive >Affects Versions: 1.2.0 >Reporter: liuguanghua >Assignee: liuguanghua >Priority: Major > Attachments: HIVE-26075.patch > > Time Spent: 10m > Remaining Estimate: 0h > > (1) When Hadoop cluster Kerberos is enabled > (2) HiveServer2 config hive.server2.enable.doAs is set to true > After a Beeline script has been executed, the Hive metastore connections that were > created remain in ESTABLISHED state and are never closed. > If we submit a lot of tasks to HiveServer2, this will fill the Hive metastore > Thrift thread pool (default is 1000), and thus new tasks will fail. > > HiveServer2 uses a ThreadLocal to store per-thread metastore > connections; the application should call Hive.closeCurrent() to close the > connection after the task has finished. > > When HiveServer2 impersonation is enabled (hive.server2.enable.doAs is set to > true), the UGI will create a proxy user via > UserGroupInformation.createProxyUser( > owner, UserGroupInformation.getLoginUser()), and the old metastore client is never > closed. > > > -- This message was sent by Atlassian Jira (v8.20.1#820001)
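The leak sequence described in this ticket can be sketched with a self-contained stand-in. The class and method names below are illustrative, not Hive's actual API: a per-thread cached client is silently replaced when the session switches to a proxy-user UGI, and the old connection stays ESTABLISHED unless something like Hive.closeCurrent() runs first.

```java
import java.util.concurrent.atomic.AtomicInteger;

public class MetastoreLeakSketch {

    // Counts connections currently open, like ESTABLISHED sockets in netstat.
    static final AtomicInteger OPEN = new AtomicInteger();

    static class MetaStoreClient implements AutoCloseable {
        MetaStoreClient() { OPEN.incrementAndGet(); }
        @Override public void close() { OPEN.decrementAndGet(); }
    }

    // Per-thread cached client, mirroring the ThreadLocal described in the ticket.
    static final ThreadLocal<MetaStoreClient> CURRENT = new ThreadLocal<>();

    public static MetaStoreClient get() {
        MetaStoreClient c = CURRENT.get();
        if (c == null) { c = new MetaStoreClient(); CURRENT.set(c); }
        return c;
    }

    // Leaky path: a new client replaces the cached one (e.g. after switching
    // to a proxy-user UGI) and the old connection is never closed.
    public static void switchUserLeaky() { CURRENT.set(new MetaStoreClient()); }

    // Fixed path: close the cached client first, as Hive.closeCurrent() would.
    public static void switchUserSafely() {
        MetaStoreClient old = CURRENT.get();
        if (old != null) { old.close(); }
        CURRENT.set(new MetaStoreClient());
    }

    public static int openConnections() { return OPEN.get(); }

    public static void reset() { CURRENT.remove(); OPEN.set(0); }

    public static void main(String[] args) {
        get();                 // one connection for this thread
        switchUserLeaky();
        switchUserLeaky();
        System.out.println("open after 2 leaky switches: " + openConnections()); // 3
        reset();
        get();
        switchUserSafely();
        switchUserSafely();
        System.out.println("open after 2 safe switches: " + openConnections());  // 1
    }
}
```

Under this sketch, every impersonated request taking the leaky path adds one permanently open connection, which is how the metastore's Thrift worker pool (default 1000) eventually fills up.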
[jira] [Commented] (HIVE-26104) HIVE-19138 May block queries to compile
[ https://issues.apache.org/jira/browse/HIVE-26104?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17518180#comment-17518180 ] Stamatis Zampetakis commented on HIVE-26104: [~liuyan] Can you clarify if we are talking about queries in the same session or different sessions? > HIVE-19138 May block queries to compile > --- > > Key: HIVE-26104 > URL: https://issues.apache.org/jira/browse/HIVE-26104 > Project: Hive > Issue Type: Bug > Components: CBO >Affects Versions: 3.0.0, 3.1.2 >Reporter: liuyan >Priority: Critical > > HIVE-19138 introduced a way to allow other queries to stay in the compilation > state while there is a placeholder for the same query in the results cache. > However, multiple queries may enter the same state and hence use up all the > available parallel compilation slots allowed via > hive.driver.parallel.compilation.global.limit. Although we can turn off > this feature by setting hive.query.results.cache.wait.for.pending.results = > false, this seems to defeat the purpose that HIVE-19138 was trying to > achieve. We need a better solution for such situations. -- This message was sent by Atlassian Jira (v8.20.1#820001)
[jira] [Work started] (HIVE-26123) Introduce test coverage for sysdb for the different metastores
[ https://issues.apache.org/jira/browse/HIVE-26123?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Work on HIVE-26123 started by Alessandro Solimando. --- > Introduce test coverage for sysdb for the different metastores > -- > > Key: HIVE-26123 > URL: https://issues.apache.org/jira/browse/HIVE-26123 > Project: Hive > Issue Type: Test > Components: Testing Infrastructure >Affects Versions: 4.0.0-alpha-1 >Reporter: Alessandro Solimando >Assignee: Alessandro Solimando >Priority: Major > Fix For: 4.0.0-alpha-2 > > > _sysdb_ provides a view over (some) metastore tables from Hive via JDBC > queries. Existing tests run only against Derby, meaning that any > change to the sysdb query mapping is not covered by CI. > The present ticket aims at bridging this gap by introducing sysdb test coverage for > the different supported metastores. -- This message was sent by Atlassian Jira (v8.20.1#820001)
[jira] [Assigned] (HIVE-26123) Introduce test coverage for sysdb for the different metastores
[ https://issues.apache.org/jira/browse/HIVE-26123?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Alessandro Solimando reassigned HIVE-26123: --- > Introduce test coverage for sysdb for the different metastores > -- > > Key: HIVE-26123 > URL: https://issues.apache.org/jira/browse/HIVE-26123 > Project: Hive > Issue Type: Test > Components: Testing Infrastructure >Affects Versions: 4.0.0-alpha-1 >Reporter: Alessandro Solimando >Assignee: Alessandro Solimando >Priority: Major > Fix For: 4.0.0-alpha-2 > > > _sysdb_ provides a view over (some) metastore tables from Hive via JDBC > queries. Existing tests run only against Derby, meaning that any > change to the sysdb query mapping is not covered by CI. > The present ticket aims at bridging this gap by introducing sysdb test coverage for > the different supported metastores. -- This message was sent by Atlassian Jira (v8.20.1#820001)
[jira] [Updated] (HIVE-26123) Introduce test coverage for sysdb for the different metastores
[ https://issues.apache.org/jira/browse/HIVE-26123?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Alessandro Solimando updated HIVE-26123: Description: _sysdb_ provides a view over (some) metastore tables from Hive via JDBC queries. Existing tests are running only against Derby, meaning that any change against sysdb query mapping is not covered by CI. The present ticket aims at bridging this gap by introducing test coverage for the different supported metastores for sysdb. was: _sysdb_ provides a view over (some) metastore tables from Hive via JDBC queries. Existing tests are running only against Derby, meaning that any change against sysdb query mapping are not covered by CI. The present ticket aims at bridging this gap by introducing test coverage for the different supported metastores for sysdb. > Introduce test coverage for sysdb for the different metastores > -- > > Key: HIVE-26123 > URL: https://issues.apache.org/jira/browse/HIVE-26123 > Project: Hive > Issue Type: Test > Components: Testing Infrastructure >Affects Versions: 4.0.0-alpha-1 >Reporter: Alessandro Solimando >Assignee: Alessandro Solimando >Priority: Major > Fix For: 4.0.0-alpha-2 > > > _sysdb_ provides a view over (some) metastore tables from Hive via JDBC > queries. Existing tests are running only against Derby, meaning that any > change against sysdb query mapping is not covered by CI. > The present ticket aims at bridging this gap by introducing test coverage for > the different supported metastores for sysdb. -- This message was sent by Atlassian Jira (v8.20.1#820001)
[jira] [Updated] (HIVE-26122) Factorize out common docker code between DatabaseRule and AbstractExternalDB
[ https://issues.apache.org/jira/browse/HIVE-26122?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated HIVE-26122: -- Labels: pull-request-available (was: ) > Factorize out common docker code between DatabaseRule and AbstractExternalDB > > > Key: HIVE-26122 > URL: https://issues.apache.org/jira/browse/HIVE-26122 > Project: Hive > Issue Type: Improvement > Components: Testing Infrastructure >Affects Versions: 4.0.0-alpha-1 >Reporter: Alessandro Solimando >Assignee: Alessandro Solimando >Priority: Major > Labels: pull-request-available > Fix For: 4.0.0-alpha-2 > > Time Spent: 10m > Remaining Estimate: 0h > > Currently there is a lot of shared code between the two classes which could > be extracted into a utility class called DockerUtils, since all this code > pertains to docker. -- This message was sent by Atlassian Jira (v8.20.1#820001)
[jira] [Work logged] (HIVE-26122) Factorize out common docker code between DatabaseRule and AbstractExternalDB
[ https://issues.apache.org/jira/browse/HIVE-26122?focusedWorklogId=753375&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-753375 ] ASF GitHub Bot logged work on HIVE-26122: - Author: ASF GitHub Bot Created on: 06/Apr/22 13:12 Start Date: 06/Apr/22 13:12 Worklog Time Spent: 10m Work Description: asolimando opened a new pull request, #3182: URL: https://github.com/apache/hive/pull/3182 … AbstractExternalDB Introduced support for running docker-based tests on MacOS ### What changes were proposed in this pull request? Reduce code duplication by introducing a utility class for common code. ### Why are the changes needed? There is a lot of redundancy between the classes. ### Does this PR introduce _any_ user-facing change? No ### How was this patch tested? Locally on MacOS and via remote CI. Issue Time Tracking --- Worklog Id: (was: 753375) Remaining Estimate: 0h Time Spent: 10m > Factorize out common docker code between DatabaseRule and AbstractExternalDB > > > Key: HIVE-26122 > URL: https://issues.apache.org/jira/browse/HIVE-26122 > Project: Hive > Issue Type: Improvement > Components: Testing Infrastructure >Affects Versions: 4.0.0-alpha-1 >Reporter: Alessandro Solimando >Assignee: Alessandro Solimando >Priority: Major > Fix For: 4.0.0-alpha-2 > > Time Spent: 10m > Remaining Estimate: 0h > > Currently there is a lot of shared code between the two classes which could > be extracted into a utility class called DockerUtils, since all this code > pertains to docker. -- This message was sent by Atlassian Jira (v8.20.1#820001)
[jira] [Work started] (HIVE-26122) Factorize out common docker code between DatabaseRule and AbstractExternalDB
[ https://issues.apache.org/jira/browse/HIVE-26122?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Work on HIVE-26122 started by Alessandro Solimando. --- > Factorize out common docker code between DatabaseRule and AbstractExternalDB > > > Key: HIVE-26122 > URL: https://issues.apache.org/jira/browse/HIVE-26122 > Project: Hive > Issue Type: Improvement > Components: Testing Infrastructure >Affects Versions: 4.0.0-alpha-1 >Reporter: Alessandro Solimando >Assignee: Alessandro Solimando >Priority: Major > Fix For: 4.0.0-alpha-2 > > > Currently there is a lot of shared code between the two classes which could > be extracted into a utility class called DockerUtils, since all this code > pertains to docker. -- This message was sent by Atlassian Jira (v8.20.1#820001)
[jira] [Assigned] (HIVE-26122) Factorize out common docker code between DatabaseRule and AbstractExternalDB
[ https://issues.apache.org/jira/browse/HIVE-26122?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Alessandro Solimando reassigned HIVE-26122: --- > Factorize out common docker code between DatabaseRule and AbstractExternalDB > > > Key: HIVE-26122 > URL: https://issues.apache.org/jira/browse/HIVE-26122 > Project: Hive > Issue Type: Improvement > Components: Testing Infrastructure >Affects Versions: 4.0.0-alpha-1 >Reporter: Alessandro Solimando >Assignee: Alessandro Solimando >Priority: Major > Fix For: 4.0.0-alpha-2 > > > Currently there is a lot of shared code between the two classes which could > be extracted into a utility class called DockerUtils, since all this code > pertains to docker. -- This message was sent by Atlassian Jira (v8.20.1#820001)
[jira] [Work logged] (HIVE-26102) Implement DELETE statements for Iceberg tables
[ https://issues.apache.org/jira/browse/HIVE-26102?focusedWorklogId=753367&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-753367 ] ASF GitHub Bot logged work on HIVE-26102: - Author: ASF GitHub Bot Created on: 06/Apr/22 12:57 Start Date: 06/Apr/22 12:57 Worklog Time Spent: 10m Work Description: marton-bod commented on code in PR #3131: URL: https://github.com/apache/hive/pull/3131#discussion_r843917563 ## iceberg/iceberg-handler/src/main/java/org/apache/iceberg/mr/hive/HiveIcebergDeleteWriter.java: ## @@ -0,0 +1,81 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, + * software distributed under the License is distributed on an + * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY + * KIND, either express or implied. See the License for the + * specific language governing permissions and limitations + * under the License. 
+ */ + +package org.apache.iceberg.mr.hive; + +import java.io.IOException; +import java.util.List; +import org.apache.hadoop.io.Writable; +import org.apache.hadoop.mapred.TaskAttemptID; +import org.apache.iceberg.DeleteFile; +import org.apache.iceberg.FileFormat; +import org.apache.iceberg.PartitionSpec; +import org.apache.iceberg.Schema; +import org.apache.iceberg.data.Record; +import org.apache.iceberg.deletes.PositionDelete; +import org.apache.iceberg.io.ClusteredPositionDeleteWriter; +import org.apache.iceberg.io.FileIO; +import org.apache.iceberg.io.FileWriterFactory; +import org.apache.iceberg.io.OutputFileFactory; +import org.apache.iceberg.mr.mapred.Container; +import org.apache.iceberg.util.Tasks; +import org.slf4j.Logger; +import org.slf4j.LoggerFactory; + +public class HiveIcebergDeleteWriter extends HiveIcebergWriter { + private static final Logger LOG = LoggerFactory.getLogger(HiveIcebergDeleteWriter.class); + + private final ClusteredPositionDeleteWriter<Record> innerWriter; + + HiveIcebergDeleteWriter(Schema schema, PartitionSpec spec, FileFormat fileFormat, + FileWriterFactory<Record> writerFactory, OutputFileFactory fileFactory, FileIO io, long targetFileSize, + TaskAttemptID taskAttemptID, String tableName) { +super(schema, spec, io, taskAttemptID, tableName, true); +this.innerWriter = new ClusteredPositionDeleteWriter<>(writerFactory, fileFactory, io, fileFormat, targetFileSize); + } + + @Override + public void write(Writable row) throws IOException { +Record rec = ((Container<Record>) row).get(); +PositionDelete<Record> positionDelete = IcebergAcidUtil.getPositionDelete(spec.schema(), rec); +innerWriter.write(positionDelete, spec, partition(positionDelete.row())); + } + + @Override + public void close(boolean abort) throws IOException { +innerWriter.close(); +List<DeleteFile> deleteFiles = deleteFiles(); + +// If abort then remove the unnecessary files +if (abort) { + Tasks.foreach(deleteFiles) + .retry(3) + .suppressFailureWhenFinished() + .onFailure((file, exception) ->
LOG.debug("Failed to remove delete file {} on abort", file, exception)) + .run(deleteFile -> io.deleteFile(deleteFile.path().toString())); +} + +LOG.info("IcebergDeleteWriter is closed with abort={}. Created {} files", abort, deleteFiles.size()); + } + + @Override + public List<DeleteFile> deleteFiles() { Review Comment: Refactored interface to be `protected abstract FilesForCommit files()` and moved the `close()` method into the parent class Issue Time Tracking --- Worklog Id: (was: 753367) Time Spent: 6h 40m (was: 6.5h) > Implement DELETE statements for Iceberg tables > -- > > Key: HIVE-26102 > URL: https://issues.apache.org/jira/browse/HIVE-26102 > Project: Hive > Issue Type: New Feature >Reporter: Marton Bod >Assignee: Marton Bod >Priority: Major > Labels: pull-request-available > Time Spent: 6h 40m > Remaining Estimate: 0h > -- This message was sent by Atlassian Jira (v8.20.1#820001)
[jira] [Work logged] (HIVE-26102) Implement DELETE statements for Iceberg tables
[ https://issues.apache.org/jira/browse/HIVE-26102?focusedWorklogId=753368&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-753368 ] ASF GitHub Bot logged work on HIVE-26102: - Author: ASF GitHub Bot Created on: 06/Apr/22 12:57 Start Date: 06/Apr/22 12:57 Worklog Time Spent: 10m Work Description: marton-bod commented on code in PR #3131: URL: https://github.com/apache/hive/pull/3131#discussion_r843918164 ## iceberg/iceberg-handler/src/main/java/org/apache/iceberg/mr/hive/HiveIcebergOutputCommitter.java: ## @@ -118,18 +120,23 @@ public void commitTask(TaskAttemptContext originalContext) throws IOException { .run(output -> { Table table = HiveIcebergStorageHandler.table(context.getJobConf(), output); if (table != null) { - HiveIcebergRecordWriter writer = writers.get(output); - DataFile[] closedFiles; + HiveIcebergWriter writer = writers.get(output); + HiveIcebergWriter delWriter = delWriters.get(output); + String fileForCommitLocation = generateFileForCommitLocation(table.location(), jobConf, + attemptID.getJobID(), attemptID.getTaskID().getId()); + if (delWriter != null) { +DeleteFile[] closedFiles = delWriter.deleteFiles().toArray(new DeleteFile[0]); +createFileForCommit(closedFiles, fileForCommitLocation, table.io()); Review Comment: I've created a new container class `FilesForCommit`, which we now use to serialize into S3 during commitTask, and read it back during jobCommit Issue Time Tracking --- Worklog Id: (was: 753368) Time Spent: 6h 50m (was: 6h 40m) > Implement DELETE statements for Iceberg tables > -- > > Key: HIVE-26102 > URL: https://issues.apache.org/jira/browse/HIVE-26102 > Project: Hive > Issue Type: New Feature >Reporter: Marton Bod >Assignee: Marton Bod >Priority: Major > Labels: pull-request-available > Time Spent: 6h 50m > Remaining Estimate: 0h > -- This message was sent by Atlassian Jira (v8.20.1#820001)
[jira] [Closed] (HIVE-25934) Non blocking RENAME PARTITION implementation
[ https://issues.apache.org/jira/browse/HIVE-25934?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Denys Kuzmenko closed HIVE-25934. - > Non blocking RENAME PARTITION implementation > > > Key: HIVE-25934 > URL: https://issues.apache.org/jira/browse/HIVE-25934 > Project: Hive > Issue Type: Task >Reporter: Denys Kuzmenko >Assignee: Denys Kuzmenko >Priority: Major > Labels: pull-request-available > Time Spent: 2h 40m > Remaining Estimate: 0h > > Implement RENAME PARTITION in a way that doesn't have to wait for currently > running read operations to be finished. -- This message was sent by Atlassian Jira (v8.20.1#820001)
[jira] [Commented] (HIVE-26104) HIVE-19138 May block queries to compile
[ https://issues.apache.org/jira/browse/HIVE-26104?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17518050#comment-17518050 ] liuyan commented on HIVE-26104: --- Log findings 2022-03-22 05:23:00,252 INFO org.apache.hadoop.hive.ql.cache.results.QueryResultsCache: [f4c8782f-903d-46da-96d5-4e45d72ff431 HiveServer2-Handler-Pool: Thread-8884649]: Waiting on pending cacheEntry 2022-03-22 05:54:25,257 INFO org.apache.hadoop.hive.ql.Driver: [f4c8782f-903d-46da-96d5-4e45d72ff431 HiveServer2-Handler-Pool: Thread-8884649]: Semantic Analysis Completed (retrial = false) 2022-03-22 05:54:25,304 INFO org.apache.hadoop.hive.ql.Driver: [f4c8782f-903d-46da-96d5-4e45d72ff431 HiveServer2-Handler-Pool: Thread-8884649]: Completed compiling command(queryId=hive_20220322052300_7c219f1f-b969-49bb-aa7b-ea2f8926ac76); Time taken: 1885.298 seconds It seems the query (hive_20220322052300_7c219f1f-b969-49bb-aa7b-ea2f8926ac76) was frozen for 30 minutes during compilation due to waiting on a pending cache entry. This introduces two issues: 1. The user is not aware of the waiting-for-pending-cache status, so from the Beeline or client side the user does not know why the query has not been executing for a very long period. We need to notify the user in some way so that they are aware the query is currently waiting for the cache (and hence will not run before the cache entry reaches the ready state).
2. We had hive.driver.parallel.compilation.global.limit normally set to 3, which means that if 4 identical queries run on the managed table, the 4th query will be blocked, as well as any following queries sent to this HS2. > HIVE-19138 May block queries to compile > --- > > Key: HIVE-26104 > URL: https://issues.apache.org/jira/browse/HIVE-26104 > Project: Hive > Issue Type: Bug > Components: CBO >Affects Versions: 3.0.0, 3.1.2 >Reporter: liuyan >Priority: Critical > > HIVE-19138 introduced a way to allow other queries to stay in the compilation > state while there is a placeholder for the same query in the results cache. > However, multiple queries may enter the same state and hence use up all the > available parallel compilation slots allowed via > hive.driver.parallel.compilation.global.limit. Although we can turn off > this feature by setting hive.query.results.cache.wait.for.pending.results = > false, this seems to defeat the purpose that HIVE-19138 was trying to > achieve. We need a better solution for such situations. -- This message was sent by Atlassian Jira (v8.20.1#820001)
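One way to see the trade-off described in this comment: a bounded wait on the pending cache entry keeps the benefit of HIVE-19138 for results that arrive quickly, while preventing a 30-minute freeze from pinning a compilation slot. The sketch below is illustrative only, using plain java.util.concurrent rather than Hive's QueryResultsCache API, and the "compiled-fresh" fallback string is a stand-in for actually compiling the query:

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

public class PendingCacheWaitSketch {

    // Bounded wait: block on the pending entry for at most timeoutMs, then
    // give up and compile the query ourselves instead of holding a global
    // compilation slot indefinitely.
    public static String waitOrCompile(CompletableFuture<String> pendingEntry, long timeoutMs) {
        try {
            return pendingEntry.get(timeoutMs, TimeUnit.MILLISECONDS); // reuse cached result
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return "compiled-fresh";
        } catch (ExecutionException | TimeoutException e) {
            return "compiled-fresh"; // pending entry failed or never became ready
        }
    }

    // Pending entry that never completes: the waiter falls back after the timeout.
    public static String demoTimeout() {
        return waitOrCompile(new CompletableFuture<>(), 50);
    }

    // Pending entry that is already ready: the waiter reuses it immediately.
    public static String demoHit() {
        return waitOrCompile(CompletableFuture.completedFuture("cached"), 50);
    }

    public static void main(String[] args) {
        System.out.println(demoTimeout()); // compiled-fresh
        System.out.println(demoHit());     // cached
    }
}
```

A bounded wait would also surface progress to the client (point 1 above), since the session thread regains control at every timeout and can report that it is still waiting on the cache.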
[jira] [Work logged] (HIVE-26121) Hive transaction rollback should be thread-safe
[ https://issues.apache.org/jira/browse/HIVE-26121?focusedWorklogId=753313&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-753313 ] ASF GitHub Bot logged work on HIVE-26121: - Author: ASF GitHub Bot Created on: 06/Apr/22 11:28 Start Date: 06/Apr/22 11:28 Worklog Time Spent: 10m Work Description: deniskuzZ commented on code in PR #3181: URL: https://github.com/apache/hive/pull/3181#discussion_r843829763 ## ql/src/java/org/apache/hadoop/hive/ql/DriverTxnHandler.java: ## @@ -570,7 +570,7 @@ void endTransactionAndCleanup(boolean commit) throws LockException { txnRollbackRunner = null; } - void endTransactionAndCleanup(boolean commit, HiveTxnManager txnManager) throws LockException { + synchronized void endTransactionAndCleanup(boolean commit, HiveTxnManager txnManager) throws LockException { Review Comment: added Issue Time Tracking --- Worklog Id: (was: 753313) Time Spent: 0.5h (was: 20m) > Hive transaction rollback should be thread-safe > --- > > Key: HIVE-26121 > URL: https://issues.apache.org/jira/browse/HIVE-26121 > Project: Hive > Issue Type: Task >Reporter: Denys Kuzmenko >Priority: Major > Labels: pull-request-available > Time Spent: 0.5h > Remaining Estimate: 0h > > When a Hive query is interrupted via a cancel request, both the background > pool thread (HiveServer2-Background) executing the query and the HttpHandler > thread (HiveServer2-Handler) running the HiveSession.cancelOperation logic > will eventually trigger the below method: > {code} > DriverTxnHandler.endTransactionAndCleanup(boolean commit) > {code} > Since this method could be invoked concurrently, we need to synchronize access > to it, so that only one thread would attempt to abort the transaction and stop > the heartbeat. -- This message was sent by Atlassian Jira (v8.20.1#820001)
[jira] [Work logged] (HIVE-26121) Hive transaction rollback should be thread-safe
[ https://issues.apache.org/jira/browse/HIVE-26121?focusedWorklogId=753311&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-753311 ] ASF GitHub Bot logged work on HIVE-26121: - Author: ASF GitHub Bot Created on: 06/Apr/22 11:27 Start Date: 06/Apr/22 11:27 Worklog Time Spent: 10m Work Description: deniskuzZ commented on code in PR #3181: URL: https://github.com/apache/hive/pull/3181#discussion_r843829383 ## ql/src/java/org/apache/hadoop/hive/ql/lockmgr/DbTxnManager.java: ## @@ -710,49 +691,32 @@ private Heartbeater startHeartbeat(long initialDelay) throws LockException { return task; } - private void stopHeartbeat() { -if (heartbeatTask == null) { - // avoid unnecessary locking if the field is null - return; -} - -boolean isLockAcquired = false; -try { - // The lock should not be held by other thread trying to stop the heartbeat for more than 31 seconds - isLockAcquired = heartbeatTaskLock.tryLock(31000, TimeUnit.MILLISECONDS); -} catch (InterruptedException e) { - // safe to go on -} - -try { - if (isLockAcquired && heartbeatTask != null) { -heartbeatTask.cancel(true); -long startTime = System.currentTimeMillis(); -long sleepInterval = 100; -while (!heartbeatTask.isCancelled() && !heartbeatTask.isDone()) { - // We will wait for 30 seconds for the task to be cancelled. - // If it's still not cancelled (unlikely), we will just move on. - long now = System.currentTimeMillis(); - if (now - startTime > 3) { -LOG.warn("Heartbeat task cannot be cancelled for unknown reason. 
QueryId: " + queryId); -break; - } - try { -Thread.sleep(sleepInterval); - } catch (InterruptedException e) { - } - sleepInterval *= 2; + private synchronized void stopHeartbeat() { Review Comment: added Issue Time Tracking --- Worklog Id: (was: 753311) Remaining Estimate: 0h Time Spent: 10m > Hive transaction rollback should be thread-safe > --- > > Key: HIVE-26121 > URL: https://issues.apache.org/jira/browse/HIVE-26121 > Project: Hive > Issue Type: Task >Reporter: Denys Kuzmenko >Priority: Major > Time Spent: 10m > Remaining Estimate: 0h > > When a Hive query is interrupted via a cancel request, both the background > pool thread (HiveServer2-Background) executing the query and the HttpHandler > thread (HiveServer2-Handler) running the HiveSession.cancelOperation logic > will eventually trigger the below method: > {code} > DriverTxnHandler.endTransactionAndCleanup(boolean commit) > {code} > Since this method could be invoked concurrently, we need to synchronize access > to it, so that only one thread would attempt to abort the transaction and stop > the heartbeat. -- This message was sent by Atlassian Jira (v8.20.1#820001)
[jira] [Work logged] (HIVE-26121) Hive transaction rollback should be thread-safe
[ https://issues.apache.org/jira/browse/HIVE-26121?focusedWorklogId=753312&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-753312 ]

ASF GitHub Bot logged work on HIVE-26121:
-----------------------------------------
Author: ASF GitHub Bot
Created on: 06/Apr/22 11:27
Start Date: 06/Apr/22 11:27
Worklog Time Spent: 10m

Work Description: deniskuzZ commented on code in PR #3181:
URL: https://github.com/apache/hive/pull/3181#discussion_r843829613

## ql/src/java/org/apache/hadoop/hive/ql/lockmgr/DbTxnManager.java:

@@ -574,30 +571,24 @@ public void rollbackTxn() throws LockException {
     if (!isTxnOpen()) {
       throw new RuntimeException("Attempt to rollback before opening a transaction");
     }
-    stopHeartbeat();
-
     try {
-      lockMgr.clearLocalLockRecords();
+      clearLocksAndHB();
       LOG.debug("Rolling back " + JavaUtils.txnIdToString(txnId));
-
-      // Re-checking as txn could have been closed, in the meantime, by a competing thread.
-      if (isTxnOpen()) {
-        if (replPolicy != null) {
-          getMS().replRollbackTxn(txnId, replPolicy, TxnType.DEFAULT);
-        } else {
-          getMS().rollbackTxn(txnId);
-        }
+
+      if (replPolicy != null) {

Review Comment: marked as @NotThreadSafe

Issue Time Tracking
-------------------
Worklog Id: (was: 753312)
Time Spent: 20m (was: 10m)

> Hive transaction rollback should be thread-safe
> -----------------------------------------------
>
> Key: HIVE-26121
> URL: https://issues.apache.org/jira/browse/HIVE-26121
> Project: Hive
> Issue Type: Task
> Reporter: Denys Kuzmenko
> Priority: Major
> Labels: pull-request-available
> Time Spent: 20m
> Remaining Estimate: 0h
>
> When Hive query is being interrupted via cancel request, both the background
> pool thread (HiveServer2-Background) executing the query and the HttpHandler
> thread (HiveServer2-Handler) running the HiveSession.cancelOperation logic
> will eventually trigger the below method:
> {code}
> DriverTxnHandler.endTransactionAndCleanup(boolean commit)
> {code}
> Since this method could be invoked concurrently we need to synchronize access
> to it, so that only 1 thread would attempt to abort the transaction and stop
> the heartbeat.

--
This message was sent by Atlassian Jira
(v8.20.1#820001)
[jira] [Updated] (HIVE-26121) Hive transaction rollback should be thread-safe
[ https://issues.apache.org/jira/browse/HIVE-26121?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated HIVE-26121: -- Labels: pull-request-available (was: ) > Hive transaction rollback should be thread-safe > --- > > Key: HIVE-26121 > URL: https://issues.apache.org/jira/browse/HIVE-26121 > Project: Hive > Issue Type: Task >Reporter: Denys Kuzmenko >Priority: Major > Labels: pull-request-available > Time Spent: 10m > Remaining Estimate: 0h > > When Hive query is being interrupted via cancel request, both the background > pool thread (HiveServer2-Background) executing the query and the HttpHandler > thread (HiveServer2-Handler) running the HiveSession.cancelOperation logic > will eventually trigger the below method: > {code} > DriverTxnHandler.endTransactionAndCleanup(boolean commit) > {code} > Since this method could be invoked concurrently we need to synchronize access > to it, so that only 1 thread would attempt to abort the transaction and stop > the heartbeat. -- This message was sent by Atlassian Jira (v8.20.1#820001)
[jira] [Updated] (HIVE-26121) Hive transaction rollback should be thread-safe
[ https://issues.apache.org/jira/browse/HIVE-26121?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Denys Kuzmenko updated HIVE-26121: -- Description: When Hive query is being interrupted via cancel request, both the background pool thread (HiveServer2-Background) executing the query and the HttpHandler thread (HiveServer2-Handler) running the HiveSession.cancelOperation logic will eventually trigger the below method: {code} DriverTxnHandler.endTransactionAndCleanup(boolean commit) {code} Since this method could be invoked concurrently we need to synchronize access to it, so that only 1 thread would attempt to abort the transaction and stop the heartbeat. was: When Hive query is being interrupted via cancel request, both the background pool thread (HiveServer2-Background) executing the query and the HttpHandler thread (HiveServer2-Handler) running the HiveSession.cancelOperation logic will eventually trigger the below method: {code} DriverTxnHandler.endTransactionAndCleanup(boolean commit) {code} Since this method could be invoked concurrently we need to synchronize access to it, so that one 1 thread would abort the transaction and stop the heartbeat. > Hive transaction rollback should be thread-safe > --- > > Key: HIVE-26121 > URL: https://issues.apache.org/jira/browse/HIVE-26121 > Project: Hive > Issue Type: Task >Reporter: Denys Kuzmenko >Priority: Major > > When Hive query is being interrupted via cancel request, both the background > pool thread (HiveServer2-Background) executing the query and the HttpHandler > thread (HiveServer2-Handler) running the HiveSession.cancelOperation logic > will eventually trigger the below method: > {code} > DriverTxnHandler.endTransactionAndCleanup(boolean commit) > {code} > Since this method could be invoked concurrently we need to synchronize access > to it, so that only 1 thread would attempt to abort the transaction and stop > the heartbeat. -- This message was sent by Atlassian Jira (v8.20.1#820001)
[jira] [Updated] (HIVE-26121) Hive transaction rollback should be thread-safe
[ https://issues.apache.org/jira/browse/HIVE-26121?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Denys Kuzmenko updated HIVE-26121: -- Description: When Hive query is being interrupted via cancel request, both the background pool thread (HiveServer2-Background) executing the query and the HttpHandler thread (HiveServer2-Handler) running the HiveSession.cancelOperation logic will eventually trigger the below method: {code} DriverTxnHandler.endTransactionAndCleanup(boolean commit) {code} Since this method could be invoked concurrently we need to synchronize access to it, so that one 1 thread would abort the transaction and stop the heartbeat. > Hive transaction rollback should be thread-safe > --- > > Key: HIVE-26121 > URL: https://issues.apache.org/jira/browse/HIVE-26121 > Project: Hive > Issue Type: Task >Reporter: Denys Kuzmenko >Priority: Major > > When Hive query is being interrupted via cancel request, both the background > pool thread (HiveServer2-Background) executing the query and the HttpHandler > thread (HiveServer2-Handler) running the HiveSession.cancelOperation logic > will eventually trigger the below method: > {code} > DriverTxnHandler.endTransactionAndCleanup(boolean commit) > {code} > Since this method could be invoked concurrently we need to synchronize access > to it, so that one 1 thread would abort the transaction and stop the > heartbeat. -- This message was sent by Atlassian Jira (v8.20.1#820001)
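The fix discussed for HIVE-26121 serializes endTransactionAndCleanup so that only one thread aborts the transaction. A standalone sketch of that guard pattern follows; the class and field names are illustrative, not Hive's actual DriverTxnHandler:

```java
import java.util.concurrent.atomic.AtomicInteger;

public class RollbackGuardDemo {
  static final AtomicInteger aborts = new AtomicInteger();

  private long txnId = 42;

  private boolean isTxnOpen() {
    return txnId > 0;
  }

  // synchronized plus an isTxnOpen() check makes rollback idempotent:
  // the thread that loses the race sees txnId == 0 and does nothing,
  // so the metastore never receives a rollback for txnId 0.
  synchronized void endTransactionAndCleanup(boolean commit) {
    if (!isTxnOpen()) {
      return;
    }
    aborts.incrementAndGet(); // stands in for getMS().rollbackTxn(txnId)
    txnId = 0;                // stands in for stopping the heartbeat and clearing state
  }

  public static void main(String[] args) throws InterruptedException {
    RollbackGuardDemo d = new RollbackGuardDemo();

    // Mimic the HiveServer2-Background thread and the HiveServer2-Handler
    // thread (cancelOperation) hitting the cleanup path concurrently.
    Thread background = new Thread(() -> d.endTransactionAndCleanup(false));
    Thread handler = new Thread(() -> d.endTransactionAndCleanup(false));
    background.start();
    handler.start();
    background.join();
    handler.join();

    System.out.println("aborts = " + aborts.get()); // prints 1
  }
}
```

The monitor guarantees the two calls run one after the other, and the re-check inside the lock turns the second call into a no-op, which is exactly the "only 1 thread would attempt to abort" requirement from the description.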
[jira] [Work logged] (HIVE-22420) DbTxnManager.stopHeartbeat() should be thread-safe
[ https://issues.apache.org/jira/browse/HIVE-22420?focusedWorklogId=753289&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-753289 ]

ASF GitHub Bot logged work on HIVE-22420:
-----------------------------------------
Author: ASF GitHub Bot
Created on: 06/Apr/22 10:05
Start Date: 06/Apr/22 10:05
Worklog Time Spent: 10m

Work Description: pvary commented on code in PR #3181:
URL: https://github.com/apache/hive/pull/3181#discussion_r843750551

## ql/src/java/org/apache/hadoop/hive/ql/lockmgr/DbTxnManager.java:

@@ -710,49 +691,32 @@ private Heartbeater startHeartbeat(long initialDelay) throws LockException {
     return task;
   }

-  private void stopHeartbeat() {
-    if (heartbeatTask == null) {
-      // avoid unnecessary locking if the field is null
-      return;
-    }
-
-    boolean isLockAcquired = false;
-    try {
-      // The lock should not be held by other thread trying to stop the heartbeat for more than 31 seconds
-      isLockAcquired = heartbeatTaskLock.tryLock(31000, TimeUnit.MILLISECONDS);
-    } catch (InterruptedException e) {
-      // safe to go on
-    }
-
-    try {
-      if (isLockAcquired && heartbeatTask != null) {
-        heartbeatTask.cancel(true);
-        long startTime = System.currentTimeMillis();
-        long sleepInterval = 100;
-        while (!heartbeatTask.isCancelled() && !heartbeatTask.isDone()) {
-          // We will wait for 30 seconds for the task to be cancelled.
-          // If it's still not cancelled (unlikely), we will just move on.
-          long now = System.currentTimeMillis();
-          if (now - startTime > 3) {
-            LOG.warn("Heartbeat task cannot be cancelled for unknown reason. QueryId: " + queryId);
-            break;
-          }
-          try {
-            Thread.sleep(sleepInterval);
-          } catch (InterruptedException e) {
-          }
-          sleepInterval *= 2;
+  private synchronized void stopHeartbeat() {

Review Comment: Comment here as well

Issue Time Tracking
-------------------
Worklog Id: (was: 753289)
Time Spent: 40m (was: 0.5h)

> DbTxnManager.stopHeartbeat() should be thread-safe
> --------------------------------------------------
>
> Key: HIVE-22420
> URL: https://issues.apache.org/jira/browse/HIVE-22420
> Project: Hive
> Issue Type: Bug
> Affects Versions: 3.1.0
> Reporter: Aron Hamvas
> Assignee: Aron Hamvas
> Priority: Major
> Labels: pull-request-available
> Fix For: 4.0.0, 4.0.0-alpha-1
>
> Attachments: HIVE-22420.1.patch, HIVE-22420.2.patch
>
> Time Spent: 40m
> Remaining Estimate: 0h
>
> When a transactional query is being executed and interrupted via HS2 close
> operation request, both the background pool thread executing the query and
> the HttpHandler thread running the close operation logic will eventually call
> the below method:
> {noformat}
> Driver.releaseLocksAndCommitOrRollback(commit boolean)
> {noformat}
> Since this method is invoked several times in both threads, it can happen
> that the two threads invoke it at the same time, and due to a race condition,
> the txnId field of the DbTxnManager used by both threads could be set to 0
> without actually successfully aborting the transaction.
> The root cause is the stopHeartbeat() method in DbTxnManager not being
> thread-safe: when Thread-1 and Thread-2 enter stopHeartbeat() with very
> little time difference, Thread-1 might successfully cancel the heartbeat task
> and set the heartbeatTask field to null while Thread-2 is trying to observe
> its state. Thread-1 will return to the calling rollbackTxn() method and
> continue execution there, while Thread-2 is thrown back to the same method
> with a NullPointerException. Thread-2 will then set txnId to 0, and Thread-1
> is sending this 0 value to HMS. So, the txn will not be aborted, and the
> locks cannot be released later on either.

--
This message was sent by Atlassian Jira
(v8.20.1#820001)
[jira] [Work logged] (HIVE-22420) DbTxnManager.stopHeartbeat() should be thread-safe
[ https://issues.apache.org/jira/browse/HIVE-22420?focusedWorklogId=753288&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-753288 ]

ASF GitHub Bot logged work on HIVE-22420:
-----------------------------------------
Author: ASF GitHub Bot
Created on: 06/Apr/22 10:04
Start Date: 06/Apr/22 10:04
Worklog Time Spent: 10m

Work Description: pvary commented on code in PR #3181:
URL: https://github.com/apache/hive/pull/3181#discussion_r843750056

## ql/src/java/org/apache/hadoop/hive/ql/lockmgr/DbTxnManager.java:

@@ -574,30 +571,24 @@ public void rollbackTxn() throws LockException {
     if (!isTxnOpen()) {
       throw new RuntimeException("Attempt to rollback before opening a transaction");
     }
-    stopHeartbeat();
-
     try {
-      lockMgr.clearLocalLockRecords();
+      clearLocksAndHB();
       LOG.debug("Rolling back " + JavaUtils.txnIdToString(txnId));
-
-      // Re-checking as txn could have been closed, in the meantime, by a competing thread.
-      if (isTxnOpen()) {
-        if (replPolicy != null) {
-          getMS().replRollbackTxn(txnId, replPolicy, TxnType.DEFAULT);
-        } else {
-          getMS().rollbackTxn(txnId);
-        }
+
+      if (replPolicy != null) {

Review Comment: If we expect that this class should not be shared between threads, then we should write a comment on the class level for it

Issue Time Tracking
-------------------
Worklog Id: (was: 753288)
Time Spent: 0.5h (was: 20m)

> DbTxnManager.stopHeartbeat() should be thread-safe
> --------------------------------------------------
>
> Key: HIVE-22420
> URL: https://issues.apache.org/jira/browse/HIVE-22420
> Project: Hive
> Issue Type: Bug
> Affects Versions: 3.1.0
> Reporter: Aron Hamvas
> Assignee: Aron Hamvas
> Priority: Major
> Labels: pull-request-available
> Fix For: 4.0.0, 4.0.0-alpha-1
>
> Attachments: HIVE-22420.1.patch, HIVE-22420.2.patch
>
> Time Spent: 0.5h
> Remaining Estimate: 0h
>
> When a transactional query is being executed and interrupted via HS2 close
> operation request, both the background pool thread executing the query and
> the HttpHandler thread running the close operation logic will eventually call
> the below method:
> {noformat}
> Driver.releaseLocksAndCommitOrRollback(commit boolean)
> {noformat}
> Since this method is invoked several times in both threads, it can happen
> that the two threads invoke it at the same time, and due to a race condition,
> the txnId field of the DbTxnManager used by both threads could be set to 0
> without actually successfully aborting the transaction.
> The root cause is the stopHeartbeat() method in DbTxnManager not being
> thread-safe: when Thread-1 and Thread-2 enter stopHeartbeat() with very
> little time difference, Thread-1 might successfully cancel the heartbeat task
> and set the heartbeatTask field to null while Thread-2 is trying to observe
> its state. Thread-1 will return to the calling rollbackTxn() method and
> continue execution there, while Thread-2 is thrown back to the same method
> with a NullPointerException. Thread-2 will then set txnId to 0, and Thread-1
> is sending this 0 value to HMS. So, the txn will not be aborted, and the
> locks cannot be released later on either.

--
This message was sent by Atlassian Jira
(v8.20.1#820001)
[jira] [Work logged] (HIVE-22420) DbTxnManager.stopHeartbeat() should be thread-safe
[ https://issues.apache.org/jira/browse/HIVE-22420?focusedWorklogId=753285&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-753285 ]

ASF GitHub Bot logged work on HIVE-22420:
-----------------------------------------
Author: ASF GitHub Bot
Created on: 06/Apr/22 10:00
Start Date: 06/Apr/22 10:00
Worklog Time Spent: 10m

Work Description: pvary commented on code in PR #3181:
URL: https://github.com/apache/hive/pull/3181#discussion_r843745809

## ql/src/java/org/apache/hadoop/hive/ql/DriverTxnHandler.java:

@@ -570,7 +570,7 @@ void endTransactionAndCleanup(boolean commit) throws LockException {
     txnRollbackRunner = null;
   }

-  void endTransactionAndCleanup(boolean commit, HiveTxnManager txnManager) throws LockException {
+  synchronized void endTransactionAndCleanup(boolean commit, HiveTxnManager txnManager) throws LockException {

Review Comment: Could we leave a comment here on why this is synchronized?

Issue Time Tracking
-------------------
Worklog Id: (was: 753285)
Time Spent: 20m (was: 10m)

> DbTxnManager.stopHeartbeat() should be thread-safe
> --------------------------------------------------
>
> Key: HIVE-22420
> URL: https://issues.apache.org/jira/browse/HIVE-22420
> Project: Hive
> Issue Type: Bug
> Affects Versions: 3.1.0
> Reporter: Aron Hamvas
> Assignee: Aron Hamvas
> Priority: Major
> Labels: pull-request-available
> Fix For: 4.0.0, 4.0.0-alpha-1
>
> Attachments: HIVE-22420.1.patch, HIVE-22420.2.patch
>
> Time Spent: 20m
> Remaining Estimate: 0h
>
> When a transactional query is being executed and interrupted via HS2 close
> operation request, both the background pool thread executing the query and
> the HttpHandler thread running the close operation logic will eventually call
> the below method:
> {noformat}
> Driver.releaseLocksAndCommitOrRollback(commit boolean)
> {noformat}
> Since this method is invoked several times in both threads, it can happen
> that the two threads invoke it at the same time, and due to a race condition,
> the txnId field of the DbTxnManager used by both threads could be set to 0
> without actually successfully aborting the transaction.
> The root cause is the stopHeartbeat() method in DbTxnManager not being
> thread-safe: when Thread-1 and Thread-2 enter stopHeartbeat() with very
> little time difference, Thread-1 might successfully cancel the heartbeat task
> and set the heartbeatTask field to null while Thread-2 is trying to observe
> its state. Thread-1 will return to the calling rollbackTxn() method and
> continue execution there, while Thread-2 is thrown back to the same method
> with a NullPointerException. Thread-2 will then set txnId to 0, and Thread-1
> is sending this 0 value to HMS. So, the txn will not be aborted, and the
> locks cannot be released later on either.

--
This message was sent by Atlassian Jira
(v8.20.1#820001)
[jira] [Updated] (HIVE-26120) beeline return 0 when Could not open connection to the HS2 server ERROR
[ https://issues.apache.org/jira/browse/HIVE-26120?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

MK updated HIVE-26120:
----------------------
Summary: beeline return 0 when Could not open connection to the HS2 server ERROR (was: beeline return 0 when Could not open connection to the HS2 server)

> beeline return 0 when Could not open connection to the HS2 server ERROR
> -----------------------------------------------------------------------
>
> Key: HIVE-26120
> URL: https://issues.apache.org/jira/browse/HIVE-26120
> Project: Hive
> Issue Type: Bug
> Components: Beeline
> Reporter: MK
> Priority: Major
>
> When executing beeline -u 'jdbc:hive2://bigdata-hs111:10003' -n 'etl' -p '**' -f /opt/project/DWD/SPD/xxx.sql and bigdata-hs111 does not exist or cannot be connected to, the command's return code is 0, NOT a non-zero value.
>
> SLF4J: Class path contains multiple SLF4J bindings.
> SLF4J: Found binding in [jar:file:/data/programs/apache-hive-3.1.2-bin/lib/log4j-slf4j-impl-2.17.0.jar!/org/slf4j/impl/StaticLoggerBinder.class]
> SLF4J: Found binding in [jar:file:/data/programs/hadoop-3.1.4/share/hadoop/common/lib/slf4j-log4j12-1.7.25.jar!/org/slf4j/impl/StaticLoggerBinder.class]
> SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
> SLF4J: Actual binding is of type [org.apache.logging.slf4j.Log4jLoggerFactory]
> Connecting to jdbc:hive2://bigdata-hs111:10003
> 2022-04-06T17:28:04,247 WARN [main] org.apache.hive.jdbc.Utils - Could not retrieve canonical hostname for bigdata-hs111
> java.net.UnknownHostException: bigdata-hs111: Name or service not known
>         at java.net.Inet4AddressImpl.lookupAllHostAddr(Native Method) ~[?:1.8.0_191]
>         at java.net.InetAddress$2.lookupAllHostAddr(InetAddress.java:929) ~[?:1.8.0_191]
>         at java.net.InetAddress.getAddressesFromNameService(InetAddress.java:1324) ~[?:1.8.0_191]
>         at java.net.InetAddress.getAllByName0(InetAddress.java:1277) ~[?:1.8.0_191]
>         at java.net.InetAddress.getAllByName(InetAddress.java:1193) ~[?:1.8.0_191]
>         at java.net.InetAddress.getAllByName(InetAddress.java:1127) ~[?:1.8.0_191]
>         at java.net.InetAddress.getByName(InetAddress.java:1077) ~[?:1.8.0_191]
>         at org.apache.hive.jdbc.Utils.getCanonicalHostName(Utils.java:701) [hive-jdbc-3.1.2.jar:3.1.2]
>         at org.apache.hive.jdbc.HiveConnection.<init>(HiveConnection.java:178) [hive-jdbc-3.1.2.jar:3.1.2]
>         at org.apache.hive.jdbc.HiveDriver.connect(HiveDriver.java:107) [hive-jdbc-3.1.2.jar:3.1.2]
>         at java.sql.DriverManager.getConnection(DriverManager.java:664) [?:1.8.0_191]
>         at java.sql.DriverManager.getConnection(DriverManager.java:208) [?:1.8.0_191]
>         at org.apache.hive.beeline.DatabaseConnection.connect(DatabaseConnection.java:145) [hive-beeline-3.1.2.jar:3.1.2]
>         at org.apache.hive.beeline.DatabaseConnection.getConnection(DatabaseConnection.java:209) [hive-beeline-3.1.2.jar:3.1.2]
>         at org.apache.hive.beeline.Commands.connect(Commands.java:1641) [hive-beeline-3.1.2.jar:3.1.2]
>         at org.apache.hive.beeline.Commands.connect(Commands.java:1536) [hive-beeline-3.1.2.jar:3.1.2]
>         at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) ~[?:1.8.0_191]
>         at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) ~[?:1.8.0_191]
>         at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) ~[?:1.8.0_191]
>         at java.lang.reflect.Method.invoke(Method.java:498) ~[?:1.8.0_191]
>         at org.apache.hive.beeline.ReflectiveCommandHandler.execute(ReflectiveCommandHandler.java:56) [hive-beeline-3.1.2.jar:3.1.2]
>         at org.apache.hive.beeline.BeeLine.execCommandWithPrefix(BeeLine.java:1384) [hive-beeline-3.1.2.jar:3.1.2]
>         at org.apache.hive.beeline.BeeLine.dispatch(BeeLine.java:1423) [hive-beeline-3.1.2.jar:3.1.2]
>         at org.apache.hive.beeline.BeeLine.connectUsingArgs(BeeLine.java:900) [hive-beeline-3.1.2.jar:3.1.2]
>         at org.apache.hive.beeline.BeeLine.initArgs(BeeLine.java:795) [hive-beeline-3.1.2.jar:3.1.2]
>         at org.apache.hive.beeline.BeeLine.begin(BeeLine.java:1048) [hive-beeline-3.1.2.jar:3.1.2]
>         at org.apache.hive.beeline.BeeLine.mainWithInputRedirection(BeeLine.java:538) [hive-beeline-3.1.2.jar:3.1.2]
>         at org.apache.hive.beeline.BeeLine.main(BeeLine.java:520) [hive-beeline-3.1.2.jar:3.1.2]
>         at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) ~[?:1.8.0_191]
>         at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) ~[?:1.8.0_191]
>         at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorIm
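Until beeline itself returns a non-zero code on connection failure, any caller that wraps it has to inspect the child process's exit value directly. A minimal sketch of that caller-side check, using a stand-in command in place of beeline (the `sh -c "exit 2"` invocation is only a placeholder for a failing beeline run, and assumes a POSIX shell is available):

```java
import java.io.IOException;

public class BeelineExitCodeCheck {
  public static void main(String[] args) throws IOException, InterruptedException {
    // Stand-in for: beeline -u 'jdbc:hive2://host:10003' -f script.sql
    // We run a shell command that exits with 2, the way beeline arguably
    // should when it cannot open a connection to HS2.
    Process p = new ProcessBuilder("sh", "-c", "exit 2").start();
    int code = p.waitFor();
    if (code != 0) {
      System.out.println("command failed with exit code " + code);
    } else {
      System.out.println("command succeeded");
    }
  }
}
```

The point of the bug report is precisely that this check is useless today: beeline exits 0 even after the UnknownHostException above, so the orchestration layer cannot distinguish a failed run from a successful one.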
[jira] [Work logged] (HIVE-26102) Implement DELETE statements for Iceberg tables
[ https://issues.apache.org/jira/browse/HIVE-26102?focusedWorklogId=753278&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-753278 ]

ASF GitHub Bot logged work on HIVE-26102:
-----------------------------------------
Author: ASF GitHub Bot
Created on: 06/Apr/22 09:29
Start Date: 06/Apr/22 09:29
Worklog Time Spent: 10m

Work Description: marton-bod commented on code in PR #3131:
URL: https://github.com/apache/hive/pull/3131#discussion_r843715671

## iceberg/iceberg-handler/src/main/java/org/apache/iceberg/mr/hive/HiveIcebergOutputCommitter.java:

@@ -156,11 +163,19 @@ public void abortTask(TaskAttemptContext originalContext) throws IOException {
     TaskAttemptContext context = TezUtil.enrichContextWithAttemptWrapper(originalContext);

     // Clean up writer data from the local store
-    Map writers = HiveIcebergRecordWriter.removeWriters(context.getTaskAttemptID());
+    Map writers = HiveIcebergWriter.getRecordWriters(context.getTaskAttemptID());

Review Comment: As discussed, let's use a single writer map for both DeleteWriters and RecordWriters

Issue Time Tracking
-------------------
Worklog Id: (was: 753278)
Time Spent: 6.5h (was: 6h 20m)

> Implement DELETE statements for Iceberg tables
> ----------------------------------------------
>
> Key: HIVE-26102
> URL: https://issues.apache.org/jira/browse/HIVE-26102
> Project: Hive
> Issue Type: New Feature
> Reporter: Marton Bod
> Assignee: Marton Bod
> Priority: Major
> Labels: pull-request-available
> Time Spent: 6.5h
> Remaining Estimate: 0h

--
This message was sent by Atlassian Jira
(v8.20.1#820001)
[jira] [Work logged] (HIVE-26102) Implement DELETE statements for Iceberg tables
[ https://issues.apache.org/jira/browse/HIVE-26102?focusedWorklogId=753277&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-753277 ]

ASF GitHub Bot logged work on HIVE-26102:
-----------------------------------------
Author: ASF GitHub Bot
Created on: 06/Apr/22 09:28
Start Date: 06/Apr/22 09:28
Worklog Time Spent: 10m

Work Description: marton-bod commented on code in PR #3131:
URL: https://github.com/apache/hive/pull/3131#discussion_r843714944

## iceberg/iceberg-handler/src/main/java/org/apache/iceberg/mr/hive/HiveIcebergOutputCommitter.java:

@@ -118,18 +120,23 @@ public void commitTask(TaskAttemptContext originalContext) throws IOException {
         .run(output -> {
           Table table = HiveIcebergStorageHandler.table(context.getJobConf(), output);
           if (table != null) {
-            HiveIcebergRecordWriter writer = writers.get(output);
-            DataFile[] closedFiles;
+            HiveIcebergWriter writer = writers.get(output);
+            HiveIcebergWriter delWriter = delWriters.get(output);
+            String fileForCommitLocation = generateFileForCommitLocation(table.location(), jobConf,
+                attemptID.getJobID(), attemptID.getTaskID().getId());
+            if (delWriter != null) {
+              DeleteFile[] closedFiles = delWriter.deleteFiles().toArray(new DeleteFile[0]);
+              createFileForCommit(closedFiles, fileForCommitLocation, table.io());

Review Comment:
> the S3 files is where we will spend some serious time

Makes sense. As discussed, let's create a container object which we can serialize/deserialize

Issue Time Tracking
-------------------
Worklog Id: (was: 753277)
Time Spent: 6h 20m (was: 6h 10m)

> Implement DELETE statements for Iceberg tables
> ----------------------------------------------
>
> Key: HIVE-26102
> URL: https://issues.apache.org/jira/browse/HIVE-26102
> Project: Hive
> Issue Type: New Feature
> Reporter: Marton Bod
> Assignee: Marton Bod
> Priority: Major
> Labels: pull-request-available
> Time Spent: 6h 20m
> Remaining Estimate: 0h

--
This message was sent by Atlassian Jira
(v8.20.1#820001)
[jira] [Comment Edited] (HIVE-26075) hive metastore connection leaking when hiveserver2 kerberos enable and hive.server2.enable.doAs set to true
[ https://issues.apache.org/jira/browse/HIVE-26075?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17517981#comment-17517981 ]

liuguanghua edited comment on HIVE-26075 at 4/6/22 9:22 AM:
------------------------------------------------------------
I have verified that this problem reproduces on Hive 1.2.2, but version 2.3.3 does not have the problem. I have not tested master because I lack the environment, so I will push a PR against version 1.2.2.

was (Author: liuguanghua):
I have verified that this problem reproduces on Hive 1.2.2, but version 2.3.3 does not have the problem. I have not tested master, so I will push a PR against version 1.2.2.

> hive metastore connection leaking when hiveserver2 kerberos enable and
> hive.server2.enable.doAs set to true
> ----------------------------------------------------------------------
>
> Key: HIVE-26075
> URL: https://issues.apache.org/jira/browse/HIVE-26075
> Project: Hive
> Issue Type: Bug
> Components: Hive
> Affects Versions: 1.2.0
> Reporter: liuguanghua
> Assignee: liuguanghua
> Priority: Major
> Attachments: HIVE-26075.patch
>
> (1) When Hadoop cluster Kerberos is enabled
> (2) the HiveServer2 config hive.server2.enable.doAs is set to true
> After a beeline script has been executed, the hive metastore connections that
> were created are left in ESTABLISHED state and never closed.
> If we submit a lot of tasks to HiveServer2, this will fill up the hive
> metastore thrift thread pool (default is 1000), and thus new tasks will fail.
>
> HiveServer2 uses a ThreadLocal to store per-thread metastore connections; the
> application should call Hive.closeCurrent() to close the connection after a
> task finishes.
>
> When HiveServer2 impersonation is enabled (hive.server2.enable.doAs is set to
> true), the ugi will create a proxy user via UserGroupInformation.createProxyUser(
> owner, UserGroupInformation.getLoginUser()); the old metastore client is never
> closed.

--
This message was sent by Atlassian Jira
(v8.20.1#820001)
[jira] [Commented] (HIVE-26075) hive metastore connection leaking when hiveserver2 kerberos enable and hive.server2.enable.doAs set to true
[ https://issues.apache.org/jira/browse/HIVE-26075?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17517981#comment-17517981 ]

liuguanghua commented on HIVE-26075:
------------------------------------
I have verified that this problem reproduces on Hive 1.2.2, but version 2.3.3 does not have the problem. I have not tested master, so I will push a PR against version 1.2.2.

> hive metastore connection leaking when hiveserver2 kerberos enable and
> hive.server2.enable.doAs set to true
> ----------------------------------------------------------------------
>
> Key: HIVE-26075
> URL: https://issues.apache.org/jira/browse/HIVE-26075
> Project: Hive
> Issue Type: Bug
> Components: Hive
> Affects Versions: 1.2.0
> Reporter: liuguanghua
> Assignee: liuguanghua
> Priority: Major
> Attachments: HIVE-26075.patch
>
> (1) When Hadoop cluster Kerberos is enabled
> (2) the HiveServer2 config hive.server2.enable.doAs is set to true
> After a beeline script has been executed, the hive metastore connections that
> were created are left in ESTABLISHED state and never closed.
> If we submit a lot of tasks to HiveServer2, this will fill up the hive
> metastore thrift thread pool (default is 1000), and thus new tasks will fail.
>
> HiveServer2 uses a ThreadLocal to store per-thread metastore connections; the
> application should call Hive.closeCurrent() to close the connection after a
> task finishes.
>
> When HiveServer2 impersonation is enabled (hive.server2.enable.doAs is set to
> true), the ugi will create a proxy user via UserGroupInformation.createProxyUser(
> owner, UserGroupInformation.getLoginUser()); the old metastore client is never
> closed.

--
This message was sent by Atlassian Jira
(v8.20.1#820001)
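The leak pattern described in HIVE-26075 - replacing a ThreadLocal-held client without closing the old one - can be modeled in isolation. In the sketch below, Client and the counter are stand-ins (not Hive classes), and replaceClosing() plays the role that calling Hive.closeCurrent() before creating a proxy-user client would play:

```java
public class ThreadLocalLeakDemo {
  static int openClients = 0;

  // Stand-in for the per-thread metastore client; not Hive's real class.
  static class Client implements AutoCloseable {
    Client() { openClients++; }
    @Override public void close() { openClients--; }
  }

  static final ThreadLocal<Client> current = new ThreadLocal<>();

  // Buggy pattern: the slot is overwritten and the old connection
  // stays ESTABLISHED with nothing left referencing it.
  static void replaceLeaky() {
    current.set(new Client());
  }

  // Fixed pattern: close the existing client before replacing it,
  // which is what invoking Hive.closeCurrent() accomplishes.
  static void replaceClosing() {
    Client old = current.get();
    if (old != null) {
      old.close();
    }
    current.set(new Client());
  }

  public static void main(String[] args) throws Exception {
    replaceLeaky();
    replaceLeaky();
    System.out.println("leaky replaces leave open: " + openClients);   // prints 2 (one orphaned)

    current.get().close();   // reset state before trying the fixed variant
    current.remove();
    openClients = 0;

    replaceClosing();
    replaceClosing();
    System.out.println("closing replaces leave open: " + openClients); // prints 1 (no orphan)
  }
}
```

Under doAs, each impersonated UserGroupInformation triggers a fresh client for the same worker thread, so the buggy variant accumulates one orphaned ESTABLISHED connection per replacement until the metastore's thrift thread pool is exhausted.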
[jira] (HIVE-26075) hive metastore connection leaking when hiveserver2 kerberos enable and hive.server2.enable.doAs set to true
[ https://issues.apache.org/jira/browse/HIVE-26075 ]

liuguanghua deleted comment on HIVE-26075:
-------------------------------------------
was (Author: liuguanghua):
I have tested Hive versions 1.2.2 and 2.3.3; both of them have the same problem.

> hive metastore connection leaking when hiveserver2 kerberos enable and
> hive.server2.enable.doAs set to true
> ----------------------------------------------------------------------
>
> Key: HIVE-26075
> URL: https://issues.apache.org/jira/browse/HIVE-26075
> Project: Hive
> Issue Type: Bug
> Components: Hive
> Affects Versions: 1.2.0
> Reporter: liuguanghua
> Assignee: liuguanghua
> Priority: Major
> Attachments: HIVE-26075.patch
>
> (1) When Hadoop cluster Kerberos is enabled
> (2) the HiveServer2 config hive.server2.enable.doAs is set to true
> After a beeline script has been executed, the hive metastore connections that
> were created are left in ESTABLISHED state and never closed.
> If we submit a lot of tasks to HiveServer2, this will fill up the hive
> metastore thrift thread pool (default is 1000), and thus new tasks will fail.
>
> HiveServer2 uses a ThreadLocal to store per-thread metastore connections; the
> application should call Hive.closeCurrent() to close the connection after a
> task finishes.
>
> When HiveServer2 impersonation is enabled (hive.server2.enable.doAs is set to
> true), the ugi will create a proxy user via UserGroupInformation.createProxyUser(
> owner, UserGroupInformation.getLoginUser()); the old metastore client is never
> closed.

--
This message was sent by Atlassian Jira
(v8.20.1#820001)
[jira] (HIVE-26075) hive metastore connection leaking when hiveserver2 kerberos enable and hive.server2.enable.doAs set to true
[ https://issues.apache.org/jira/browse/HIVE-26075 ]

liuguanghua deleted comment on HIVE-26075:
-------------------------------------------
was (Author: liuguanghua):
Thank you very much. I will open a pull request on GitHub.

> hive metastore connection leaking when hiveserver2 kerberos enable and
> hive.server2.enable.doAs set to true
> ----------------------------------------------------------------------
>
> Key: HIVE-26075
> URL: https://issues.apache.org/jira/browse/HIVE-26075
> Project: Hive
> Issue Type: Bug
> Components: Hive
> Affects Versions: 1.2.0
> Reporter: liuguanghua
> Assignee: liuguanghua
> Priority: Major
> Attachments: HIVE-26075.patch
>
> (1) When Hadoop cluster Kerberos is enabled
> (2) the HiveServer2 config hive.server2.enable.doAs is set to true
> After a beeline script has been executed, the hive metastore connections that
> were created are left in ESTABLISHED state and never closed.
> If we submit a lot of tasks to HiveServer2, this will fill up the hive
> metastore thrift thread pool (default is 1000), and thus new tasks will fail.
>
> HiveServer2 uses a ThreadLocal to store per-thread metastore connections; the
> application should call Hive.closeCurrent() to close the connection after a
> task finishes.
>
> When HiveServer2 impersonation is enabled (hive.server2.enable.doAs is set to
> true), the ugi will create a proxy user via UserGroupInformation.createProxyUser(
> owner, UserGroupInformation.getLoginUser()); the old metastore client is never
> closed.

--
This message was sent by Atlassian Jira
(v8.20.1#820001)
[jira] [Updated] (HIVE-26075) hive metastore connection leaking when hiveserver2 kerberos enable and hive.server2.enable.doAs set to true
[ https://issues.apache.org/jira/browse/HIVE-26075?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

liuguanghua updated HIVE-26075:
-------------------------------
Affects Version/s: 1.2.0 (was: All Versions)

> hive metastore connection leaking when hiveserver2 kerberos enable and
> hive.server2.enable.doAs set to true
> ----------------------------------------------------------------------
>
> Key: HIVE-26075
> URL: https://issues.apache.org/jira/browse/HIVE-26075
> Project: Hive
> Issue Type: Bug
> Components: Hive
> Affects Versions: 1.2.0
> Reporter: liuguanghua
> Assignee: liuguanghua
> Priority: Major
> Attachments: HIVE-26075.patch
>
> (1) When Hadoop cluster Kerberos is enabled
> (2) the HiveServer2 config hive.server2.enable.doAs is set to true
> After a beeline script has been executed, the hive metastore connections that
> were created are left in ESTABLISHED state and never closed.
> If we submit a lot of tasks to HiveServer2, this will fill up the hive
> metastore thrift thread pool (default is 1000), and thus new tasks will fail.
>
> HiveServer2 uses a ThreadLocal to store per-thread metastore connections; the
> application should call Hive.closeCurrent() to close the connection after a
> task finishes.
>
> When HiveServer2 impersonation is enabled (hive.server2.enable.doAs is set to
> true), the ugi will create a proxy user via UserGroupInformation.createProxyUser(
> owner, UserGroupInformation.getLoginUser()); the old metastore client is never
> closed.

--
This message was sent by Atlassian Jira
(v8.20.1#820001)
[jira] [Work logged] (HIVE-26102) Implement DELETE statements for Iceberg tables
[ https://issues.apache.org/jira/browse/HIVE-26102?focusedWorklogId=753266&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-753266 ] ASF GitHub Bot logged work on HIVE-26102: - Author: ASF GitHub Bot Created on: 06/Apr/22 09:06 Start Date: 06/Apr/22 09:06 Worklog Time Spent: 10m Work Description: pvary commented on code in PR #3131: URL: https://github.com/apache/hive/pull/3131#discussion_r843693954 ## iceberg/iceberg-handler/src/main/java/org/apache/iceberg/mr/hive/HiveIcebergDeleteWriter.java: ## @@ -0,0 +1,81 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, + * software distributed under the License is distributed on an + * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY + * KIND, either express or implied. See the License for the + * specific language governing permissions and limitations + * under the License. 
+ */ + +package org.apache.iceberg.mr.hive; + +import java.io.IOException; +import java.util.List; +import org.apache.hadoop.io.Writable; +import org.apache.hadoop.mapred.TaskAttemptID; +import org.apache.iceberg.DeleteFile; +import org.apache.iceberg.FileFormat; +import org.apache.iceberg.PartitionSpec; +import org.apache.iceberg.Schema; +import org.apache.iceberg.data.Record; +import org.apache.iceberg.deletes.PositionDelete; +import org.apache.iceberg.io.ClusteredPositionDeleteWriter; +import org.apache.iceberg.io.FileIO; +import org.apache.iceberg.io.FileWriterFactory; +import org.apache.iceberg.io.OutputFileFactory; +import org.apache.iceberg.mr.mapred.Container; +import org.apache.iceberg.util.Tasks; +import org.slf4j.Logger; +import org.slf4j.LoggerFactory; + +public class HiveIcebergDeleteWriter extends HiveIcebergWriter { Review Comment: Let's talk about this offline Issue Time Tracking --- Worklog Id: (was: 753266) Time Spent: 6h 10m (was: 6h) > Implement DELETE statements for Iceberg tables > -- > > Key: HIVE-26102 > URL: https://issues.apache.org/jira/browse/HIVE-26102 > Project: Hive > Issue Type: New Feature >Reporter: Marton Bod >Assignee: Marton Bod >Priority: Major > Labels: pull-request-available > Time Spent: 6h 10m > Remaining Estimate: 0h > -- This message was sent by Atlassian Jira (v8.20.1#820001)
[jira] [Work logged] (HIVE-26102) Implement DELETE statements for Iceberg tables
[ https://issues.apache.org/jira/browse/HIVE-26102?focusedWorklogId=753264&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-753264 ] ASF GitHub Bot logged work on HIVE-26102: - Author: ASF GitHub Bot Created on: 06/Apr/22 09:05 Start Date: 06/Apr/22 09:05 Worklog Time Spent: 10m Work Description: pvary commented on code in PR #3131: URL: https://github.com/apache/hive/pull/3131#discussion_r843693178 ## iceberg/iceberg-handler/src/main/java/org/apache/iceberg/mr/hive/HiveIcebergOutputCommitter.java: ## @@ -118,18 +120,23 @@ public void commitTask(TaskAttemptContext originalContext) throws IOException { .run(output -> { Table table = HiveIcebergStorageHandler.table(context.getJobConf(), output); if (table != null) { - HiveIcebergRecordWriter writer = writers.get(output); - DataFile[] closedFiles; + HiveIcebergWriter writer = writers.get(output); + HiveIcebergWriter delWriter = delWriters.get(output); + String fileForCommitLocation = generateFileForCommitLocation(table.location(), jobConf, + attemptID.getJobID(), attemptID.getTaskID().getId()); + if (delWriter != null) { +DeleteFile[] closedFiles = delWriter.deleteFiles().toArray(new DeleteFile[0]); +createFileForCommit(closedFiles, fileForCommitLocation, table.io()); Review Comment: Maybe we can create a little bit more complex data structure to serialise. I think creating/reading back the S3 files is where we will spend some serious time Issue Time Tracking --- Worklog Id: (was: 753264) Time Spent: 6h (was: 5h 50m) > Implement DELETE statements for Iceberg tables > -- > > Key: HIVE-26102 > URL: https://issues.apache.org/jira/browse/HIVE-26102 > Project: Hive > Issue Type: New Feature >Reporter: Marton Bod >Assignee: Marton Bod >Priority: Major > Labels: pull-request-available > Time Spent: 6h > Remaining Estimate: 0h > -- This message was sent by Atlassian Jira (v8.20.1#820001)
[jira] [Work logged] (HIVE-25980) Reduce fs calls in HiveMetaStoreChecker.checkTable
[ https://issues.apache.org/jira/browse/HIVE-25980?focusedWorklogId=753261&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-753261 ] ASF GitHub Bot logged work on HIVE-25980: - Author: ASF GitHub Bot Created on: 06/Apr/22 09:01 Start Date: 06/Apr/22 09:01 Worklog Time Spent: 10m Work Description: pvary commented on code in PR #3053: URL: https://github.com/apache/hive/pull/3053#discussion_r843688528 ## standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/HiveMetaStoreChecker.java: ## @@ -422,18 +418,50 @@ void findUnknownPartitions(Table table, Set partPaths, byte[] filterExp, } allPartDirs = partDirs; } -// don't want the table dir -allPartDirs.remove(tablePath); - -// remove the partition paths we know about -allPartDirs.removeAll(partPaths); - Set partColNames = Sets.newHashSet(); for(FieldSchema fSchema : getPartCols(table)) { partColNames.add(fSchema.getName()); } Map partitionColToTypeMap = getPartitionColtoTypeMap(table.getPartitionKeys()); + +Set partPathsInMS = new HashSet<>(partPaths); Review Comment: Could we just collect the needed path objects outside? Issue Time Tracking --- Worklog Id: (was: 753261) Time Spent: 5h (was: 4h 50m) > Reduce fs calls in HiveMetaStoreChecker.checkTable > -- > > Key: HIVE-25980 > URL: https://issues.apache.org/jira/browse/HIVE-25980 > Project: Hive > Issue Type: Improvement > Components: Standalone Metastore >Affects Versions: 3.1.2, 4.0.0 >Reporter: Chiran Ravani >Assignee: Chiran Ravani >Priority: Major > Labels: pull-request-available > Time Spent: 5h > Remaining Estimate: 0h > > MSCK Repair table for high partition table can perform slow on Cloud Storage > such as S3, one of the case we found where slowness was observed in > HiveMetaStoreChecker.checkTable. 
> {code:java} > "HiveServer2-Background-Pool: Thread-382" #382 prio=5 os_prio=0 > tid=0x7f97fc4a4000 nid=0x5c2a runnable [0x7f97c41a8000] >java.lang.Thread.State: RUNNABLE > at java.net.SocketInputStream.socketRead0(Native Method) > at java.net.SocketInputStream.socketRead(SocketInputStream.java:116) > at java.net.SocketInputStream.read(SocketInputStream.java:171) > at java.net.SocketInputStream.read(SocketInputStream.java:141) > at > sun.security.ssl.SSLSocketInputRecord.read(SSLSocketInputRecord.java:464) > at > sun.security.ssl.SSLSocketInputRecord.bytesInCompletePacket(SSLSocketInputRecord.java:68) > at > sun.security.ssl.SSLSocketImpl.readApplicationRecord(SSLSocketImpl.java:1341) > at sun.security.ssl.SSLSocketImpl.access$300(SSLSocketImpl.java:73) > at > sun.security.ssl.SSLSocketImpl$AppInputStream.read(SSLSocketImpl.java:957) > at > com.amazonaws.thirdparty.apache.http.impl.io.SessionInputBufferImpl.streamRead(SessionInputBufferImpl.java:137) > at > com.amazonaws.thirdparty.apache.http.impl.io.SessionInputBufferImpl.fillBuffer(SessionInputBufferImpl.java:153) > at > com.amazonaws.thirdparty.apache.http.impl.io.SessionInputBufferImpl.readLine(SessionInputBufferImpl.java:280) > at > com.amazonaws.thirdparty.apache.http.impl.conn.DefaultHttpResponseParser.parseHead(DefaultHttpResponseParser.java:138) > at > com.amazonaws.thirdparty.apache.http.impl.conn.DefaultHttpResponseParser.parseHead(DefaultHttpResponseParser.java:56) > at > com.amazonaws.thirdparty.apache.http.impl.io.AbstractMessageParser.parse(AbstractMessageParser.java:259) > at > com.amazonaws.thirdparty.apache.http.impl.DefaultBHttpClientConnection.receiveResponseHeader(DefaultBHttpClientConnection.java:163) > at > com.amazonaws.thirdparty.apache.http.impl.conn.CPoolProxy.receiveResponseHeader(CPoolProxy.java:157) > at > com.amazonaws.thirdparty.apache.http.protocol.HttpRequestExecutor.doReceiveResponse(HttpRequestExecutor.java:273) > at > 
com.amazonaws.http.protocol.SdkHttpRequestExecutor.doReceiveResponse(SdkHttpRequestExecutor.java:82) > at > com.amazonaws.thirdparty.apache.http.protocol.HttpRequestExecutor.execute(HttpRequestExecutor.java:125) > at > com.amazonaws.thirdparty.apache.http.impl.execchain.MainClientExec.execute(MainClientExec.java:272) > at > com.amazonaws.thirdparty.apache.http.impl.execchain.ProtocolExec.execute(ProtocolExec.java:186) > at > com.amazonaws.thirdparty.apache.http.impl.client.InternalHttpClient.doExecute(InternalHttpClient.java:185) > at > com.amazonaws.thirdparty.apache.http.impl.client.CloseableHttpClient.execute(CloseableHttpClient.java:83) > at > com.amazonaws.thir
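The HiveMetaStoreChecker logic the review above discusses — finding partition directories present on the filesystem but unknown to the metastore — is plain set difference: all directories under the table path, minus the table directory itself, minus the partition paths the metastore already knows. A minimal sketch of that comparison (the paths and the `findUnknown` helper are illustrative, not the actual HiveMetaStoreChecker code):

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

public class UnknownPartitionsDemo {

    // Directories on the filesystem that are not registered as partitions:
    // FS paths minus the table dir, minus the known partition paths.
    static Set<String> findUnknown(Set<String> allDirsOnFs, Set<String> partPathsInMs, String tablePath) {
        Set<String> unknown = new HashSet<>(allDirsOnFs);
        unknown.remove(tablePath);        // don't want the table dir
        unknown.removeAll(partPathsInMs); // remove the partition paths we know about
        return unknown;
    }

    public static void main(String[] args) {
        Set<String> onFs = new HashSet<>(Arrays.asList(
            "/warehouse/t", "/warehouse/t/ds=1", "/warehouse/t/ds=2", "/warehouse/t/ds=3"));
        Set<String> inMs = new HashSet<>(Arrays.asList(
            "/warehouse/t/ds=1", "/warehouse/t/ds=2"));
        System.out.println(findUnknown(onFs, inMs, "/warehouse/t")); // [/warehouse/t/ds=3]
    }
}
```

The set arithmetic itself is cheap; as the stack trace above shows, the cost on S3 comes from the filesystem listing calls that populate these sets, which is what the patch reduces.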
[jira] [Work logged] (HIVE-25967) Prevent residual expressions from getting serialized in Iceberg splits
[ https://issues.apache.org/jira/browse/HIVE-25967?focusedWorklogId=753252&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-753252 ] ASF GitHub Bot logged work on HIVE-25967: - Author: ASF GitHub Bot Created on: 06/Apr/22 08:36 Start Date: 06/Apr/22 08:36 Worklog Time Spent: 10m Work Description: szlta merged PR #3178: URL: https://github.com/apache/hive/pull/3178 Issue Time Tracking --- Worklog Id: (was: 753252) Time Spent: 1h (was: 50m) > Prevent residual expressions from getting serialized in Iceberg splits > -- > > Key: HIVE-25967 > URL: https://issues.apache.org/jira/browse/HIVE-25967 > Project: Hive > Issue Type: Bug >Reporter: Ádám Szita >Assignee: Ádám Szita >Priority: Major > Labels: pull-request-available > Time Spent: 1h > Remaining Estimate: 0h > > This hack removes residual expressions from the file scan task just before > split serialization. > Residuals can sometimes take up too much space in the payload, causing the Tez AM > to OOM. > Unfortunately, the Tez AM doesn't distribute splits in a streamed way; that is, it > serializes all splits for a job before sending them out to executors. Some > residuals may take ~ 1 MB in memory; multiplied by thousands of splits, this could > kill the Tez AM JVM. > Until streamed split distribution is implemented, we will kick residuals > out of the split. -- This message was sent by Atlassian Jira (v8.20.1#820001)
[jira] [Work logged] (HIVE-22420) DbTxnManager.stopHeartbeat() should be thread-safe
[ https://issues.apache.org/jira/browse/HIVE-22420?focusedWorklogId=753241&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-753241 ] ASF GitHub Bot logged work on HIVE-22420: - Author: ASF GitHub Bot Created on: 06/Apr/22 08:06 Start Date: 06/Apr/22 08:06 Worklog Time Spent: 10m Work Description: deniskuzZ opened a new pull request, #3181: URL: https://github.com/apache/hive/pull/3181 ### What changes were proposed in this pull request? ### Why are the changes needed? Proper ACID handling in case of operation interruption ### Does this PR introduce _any_ user-facing change? ### How was this patch tested? Issue Time Tracking --- Worklog Id: (was: 753241) Remaining Estimate: 0h Time Spent: 10m > DbTxnManager.stopHeartbeat() should be thread-safe > -- > > Key: HIVE-22420 > URL: https://issues.apache.org/jira/browse/HIVE-22420 > Project: Hive > Issue Type: Bug >Affects Versions: 3.1.0 >Reporter: Aron Hamvas >Assignee: Aron Hamvas >Priority: Major > Fix For: 4.0.0, 4.0.0-alpha-1 > > Attachments: HIVE-22420.1.patch, HIVE-22420.2.patch > > Time Spent: 10m > Remaining Estimate: 0h > > When a transactional query is being executed and interrupted via an HS2 close > operation request, both the background pool thread executing the query and > the HttpHandler thread running the close operation logic will eventually call > the below method: > {noformat} > Driver.releaseLocksAndCommitOrRollback(boolean commit) > {noformat} > Since this method is invoked several times in both threads, it can happen > that the two threads invoke it at the same time, and due to a race condition, > the txnId field of the DbTxnManager used by both threads could be set to 0 > without actually successfully aborting the transaction. 
> The root cause is the stopHeartbeat() method in DbTxnManager not being thread > safe: > When Thread-1 and Thread-2 enter stopHeartbeat() with very little time > difference, Thread-1 might successfully cancel the heartbeat task and set the > heartbeatTask field to null, while Thread-2 is trying to observe its state. > Thread-1 will return to the calling rollbackTxn() method and continue > execution there, while Thread-2 is thrown back to the same method with a > NullPointerException. Thread-2 will then set txnId to 0, and Thread-1 is > sending this 0 value to HMS. So, the txn will not be aborted, and the locks > cannot be released later on either. -- This message was sent by Atlassian Jira (v8.20.1#820001)
[jira] [Updated] (HIVE-22420) DbTxnManager.stopHeartbeat() should be thread-safe
[ https://issues.apache.org/jira/browse/HIVE-22420?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated HIVE-22420: -- Labels: pull-request-available (was: ) > DbTxnManager.stopHeartbeat() should be thread-safe > -- > > Key: HIVE-22420 > URL: https://issues.apache.org/jira/browse/HIVE-22420 > Project: Hive > Issue Type: Bug >Affects Versions: 3.1.0 >Reporter: Aron Hamvas >Assignee: Aron Hamvas >Priority: Major > Labels: pull-request-available > Fix For: 4.0.0, 4.0.0-alpha-1 > > Attachments: HIVE-22420.1.patch, HIVE-22420.2.patch > > Time Spent: 10m > Remaining Estimate: 0h > > When a transactional query is being executed and interrupted via an HS2 close > operation request, both the background pool thread executing the query and > the HttpHandler thread running the close operation logic will eventually call > the below method: > {noformat} > Driver.releaseLocksAndCommitOrRollback(boolean commit) > {noformat} > Since this method is invoked several times in both threads, it can happen > that the two threads invoke it at the same time, and due to a race condition, > the txnId field of the DbTxnManager used by both threads could be set to 0 > without actually successfully aborting the transaction. > The root cause is the stopHeartbeat() method in DbTxnManager not being thread > safe: > When Thread-1 and Thread-2 enter stopHeartbeat() with very little time > difference, Thread-1 might successfully cancel the heartbeat task and set the > heartbeatTask field to null, while Thread-2 is trying to observe its state. > Thread-1 will return to the calling rollbackTxn() method and continue > execution there, while Thread-2 is thrown back to the same method with a > NullPointerException. Thread-2 will then set txnId to 0, and Thread-1 is > sending this 0 value to HMS. So, the txn will not be aborted, and the locks > cannot be released later on either. -- This message was sent by Atlassian Jira (v8.20.1#820001)
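The race described in HIVE-22420 above — two threads both seeing a non-null heartbeatTask, one cancelling and nulling it while the other dereferences it — is commonly removed by handing the task reference off atomically, so at most one caller ever obtains it. The sketch below shows that idea in a self-contained form; it is an illustration of the technique, not the actual DbTxnManager patch, and the class and field names are hypothetical.

```java
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.ScheduledFuture;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicReference;

public class StopHeartbeatDemo {

    final ScheduledExecutorService pool = Executors.newScheduledThreadPool(1);
    // Holding the task in an AtomicReference lets us hand it to exactly one
    // stopper; a losing thread sees null instead of a half-cancelled task.
    final AtomicReference<ScheduledFuture<?>> heartbeatTask = new AtomicReference<>();

    void startHeartbeat() {
        heartbeatTask.set(pool.scheduleAtFixedRate(
            () -> { /* heartbeat the open transaction here */ }, 0, 50, TimeUnit.MILLISECONDS));
    }

    // Thread-safe stop: getAndSet(null) is atomic, so only one of any number
    // of concurrent callers obtains the task and cancels it; the rest no-op.
    void stopHeartbeat() {
        ScheduledFuture<?> task = heartbeatTask.getAndSet(null);
        if (task != null) {
            task.cancel(true);
        }
    }

    public static void main(String[] args) throws InterruptedException {
        StopHeartbeatDemo mgr = new StopHeartbeatDemo();
        mgr.startHeartbeat();

        // Simulate the background pool thread and the HttpHandler thread
        // racing on stopHeartbeat(); with the atomic swap neither can NPE.
        Thread t1 = new Thread(mgr::stopHeartbeat);
        Thread t2 = new Thread(mgr::stopHeartbeat);
        t1.start(); t2.start();
        t1.join(); t2.join();

        mgr.pool.shutdown();
        System.out.println("stopped cleanly, task reference = " + mgr.heartbeatTask.get()); // null
    }
}
```

Making the whole method `synchronized` would also close the race; the atomic swap is shown here because it keeps stopHeartbeat() lock-free on the hot path.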
[jira] [Resolved] (HIVE-26116) Fix handling of compaction requests originating from aborted dynamic partition queries in Initiator
[ https://issues.apache.org/jira/browse/HIVE-26116?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Karen Coppage resolved HIVE-26116. -- Fix Version/s: 4.0.0 Resolution: Fixed Committed to master branch. Thanks for your contribution [~veghlaci05] ! > Fix handling of compaction requests originating from aborted dynamic > partition queries in Initiator > --- > > Key: HIVE-26116 > URL: https://issues.apache.org/jira/browse/HIVE-26116 > Project: Hive > Issue Type: Bug >Reporter: László Végh >Assignee: László Végh >Priority: Major > Labels: pull-request-available > Fix For: 4.0.0 > > Time Spent: 1h > Remaining Estimate: 0h > > Compaction requests originating from an abort of a dynamic partition insert > can cause an NPE in the Initiator. -- This message was sent by Atlassian Jira (v8.20.1#820001)
[jira] [Work logged] (HIVE-26116) Fix handling of compaction requests originating from aborted dynamic partition queries in Initiator
[ https://issues.apache.org/jira/browse/HIVE-26116?focusedWorklogId=753240&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-753240 ] ASF GitHub Bot logged work on HIVE-26116: - Author: ASF GitHub Bot Created on: 06/Apr/22 08:01 Start Date: 06/Apr/22 08:01 Worklog Time Spent: 10m Work Description: klcopp merged PR #3177: URL: https://github.com/apache/hive/pull/3177 Issue Time Tracking --- Worklog Id: (was: 753240) Time Spent: 1h (was: 50m) > Fix handling of compaction requests originating from aborted dynamic > partition queries in Initiator > --- > > Key: HIVE-26116 > URL: https://issues.apache.org/jira/browse/HIVE-26116 > Project: Hive > Issue Type: Bug >Reporter: László Végh >Assignee: László Végh >Priority: Major > Labels: pull-request-available > Time Spent: 1h > Remaining Estimate: 0h > > Compaction requests originating from an abort of a dynamic partition insert > can cause an NPE in the Initiator. -- This message was sent by Atlassian Jira (v8.20.1#820001)
[jira] [Work logged] (HIVE-26102) Implement DELETE statements for Iceberg tables
[ https://issues.apache.org/jira/browse/HIVE-26102?focusedWorklogId=753230&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-753230 ] ASF GitHub Bot logged work on HIVE-26102: - Author: ASF GitHub Bot Created on: 06/Apr/22 07:26 Start Date: 06/Apr/22 07:26 Worklog Time Spent: 10m Work Description: pvary commented on code in PR #3131: URL: https://github.com/apache/hive/pull/3131#discussion_r843564071 ## ql/src/java/org/apache/hadoop/hive/ql/parse/UpdateDeleteSemanticAnalyzer.java: ## @@ -97,12 +100,22 @@ private void reparseAndSuperAnalyze(ASTNode tree) throws SemanticException { Table mTable = getTargetTable(tabName); validateTargetTable(mTable); +// save the operation type into the query state +SessionStateUtil.addResource(conf, Context.Operation.class.getSimpleName(), operation.name()); + StringBuilder rewrittenQueryStr = new StringBuilder(); rewrittenQueryStr.append("insert into table "); rewrittenQueryStr.append(getFullTableNameForSQL(tabName)); addPartitionColsToInsert(mTable.getPartCols(), rewrittenQueryStr); -rewrittenQueryStr.append(" select ROW__ID"); +boolean nonNativeAcid = mTable.getStorageHandler() != null && mTable.getStorageHandler().supportsAcidOperations(); Review Comment: Maybe an util for this? I have seen this several times Issue Time Tracking --- Worklog Id: (was: 753230) Time Spent: 5h 50m (was: 5h 40m) > Implement DELETE statements for Iceberg tables > -- > > Key: HIVE-26102 > URL: https://issues.apache.org/jira/browse/HIVE-26102 > Project: Hive > Issue Type: New Feature >Reporter: Marton Bod >Assignee: Marton Bod >Priority: Major > Labels: pull-request-available > Time Spent: 5h 50m > Remaining Estimate: 0h > -- This message was sent by Atlassian Jira (v8.20.1#820001)
[jira] [Work logged] (HIVE-26102) Implement DELETE statements for Iceberg tables
[ https://issues.apache.org/jira/browse/HIVE-26102?focusedWorklogId=753227&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-753227 ] ASF GitHub Bot logged work on HIVE-26102: - Author: ASF GitHub Bot Created on: 06/Apr/22 07:20 Start Date: 06/Apr/22 07:20 Worklog Time Spent: 10m Work Description: pvary commented on code in PR #3131: URL: https://github.com/apache/hive/pull/3131#discussion_r843558677 ## ql/src/java/org/apache/hadoop/hive/ql/parse/SemanticAnalyzer.java: ## @@ -7822,9 +7824,18 @@ protected Operator genFileSinkPlan(String dest, QB qb, Operator input) List vecCol = new ArrayList(); -if (updating(dest) || deleting(dest)) { +boolean nonNativeAcid = Optional.ofNullable(destinationTable) +.map(Table::getStorageHandler) +.map(HiveStorageHandler::supportsAcidOperations) +.orElse(false); +boolean isUpdateDelete = updating(dest) || deleting(dest); +if (!nonNativeAcid && isUpdateDelete) { Review Comment: Is it a valid situation that: isUpdateDelete and we need to go to the `else`? If not it might be easier to read: ``` if (updating(dest) || deleting(dest)) { if (nonNativeAcid) { ... } else { ... } else { .. } ``` Issue Time Tracking --- Worklog Id: (was: 753227) Time Spent: 5h 40m (was: 5.5h) > Implement DELETE statements for Iceberg tables > -- > > Key: HIVE-26102 > URL: https://issues.apache.org/jira/browse/HIVE-26102 > Project: Hive > Issue Type: New Feature >Reporter: Marton Bod >Assignee: Marton Bod >Priority: Major > Labels: pull-request-available > Time Spent: 5h 40m > Remaining Estimate: 0h > -- This message was sent by Atlassian Jira (v8.20.1#820001)
[jira] [Assigned] (HIVE-26008) Dynamic partition pruning not sending right partitions with subqueries
[ https://issues.apache.org/jira/browse/HIVE-26008?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] László Bodor reassigned HIVE-26008: --- Assignee: László Bodor > Dynamic partition pruning not sending right partitions with subqueries > -- > > Key: HIVE-26008 > URL: https://issues.apache.org/jira/browse/HIVE-26008 > Project: Hive > Issue Type: Improvement > Components: HiveServer2 >Reporter: Rajesh Balamohan >Assignee: László Bodor >Priority: Major > Labels: performance > Attachments: Screenshot 2022-03-08 at 5.04.02 AM.png > > > DPP doesn't work correctly when subqueries are involved. Here is an example > query (q83). > Note that "date_dim" is itself filtered by another subquery. Because of this, the DPP operator > ends up sending the entire "date_dim" to the fact tables. > As a result, the data scanned for the fact tables is much higher and the query > runtime increases. > For context, on a very small cluster, this query ran for 265 seconds, while > the rewritten query finished in 11 seconds! The fact table scan was 10 MB vs > 10 GB. 
> {noformat} > HiveJoin(condition=[=($2, $5)], joinType=[inner]) > HiveJoin(condition=[=($0, $3)], joinType=[inner]) > HiveProject(cr_item_sk=[$1], cr_return_quantity=[$16], > cr_returned_date_sk=[$26]) > HiveFilter(condition=[AND(IS NOT NULL($26), IS NOT > NULL($1))]) > HiveTableScan(table=[[tpcds_bin_partitioned_orc_1, > catalog_returns]], table:alias=[catalog_returns]) > HiveProject(i_item_sk=[$0], i_item_id=[$1]) > HiveFilter(condition=[AND(IS NOT NULL($1), IS NOT > NULL($0))]) > HiveTableScan(table=[[tpcds_bin_partitioned_orc_1, > item]], table:alias=[item]) > HiveProject(d_date_sk=[$0], d_date=[$2]) > HiveFilter(condition=[AND(IS NOT NULL($2), IS NOT > NULL($0))]) > HiveTableScan(table=[[tpcds_bin_partitioned_orc_1, > date_dim]], table:alias=[date_dim]) > HiveProject(d_date=[$0]) > HiveSemiJoin(condition=[=($1, $2)], joinType=[semi]) > HiveProject(d_date=[$2], d_week_seq=[$4]) > HiveFilter(condition=[AND(IS NOT NULL($4), IS NOT > NULL($2))]) > HiveTableScan(table=[[tpcds_bin_partitioned_orc_1, > date_dim]], table:alias=[date_dim]) > HiveProject(d_week_seq=[$4]) > HiveFilter(condition=[AND(IN($2, 1998-01-02:DATE, > 1998-10-15:DATE, 1998-11-10:DATE), IS NOT NULL($4))]) > HiveTableScan(table=[[tpcds_bin_partitioned_orc_1, > date_dim]], table:alias=[date_dim]) > {noformat} > *Original Query & Plan: * > {noformat} > explain cbo with sr_items as > (select i_item_id item_id, > sum(sr_return_quantity) sr_item_qty > from store_returns, > item, > date_dim > where sr_item_sk = i_item_sk > and d_date in > (select d_date > from date_dim > where d_week_seq in > (select d_week_seq > from date_dim > where d_date in ('1998-01-02','1998-10-15','1998-11-10'))) > and sr_returned_date_sk = d_date_sk > group by i_item_id), > cr_items as > (select i_item_id item_id, > sum(cr_return_quantity) cr_item_qty > from catalog_returns, > item, > date_dim > where cr_item_sk = i_item_sk > and d_date in > (select d_date > from date_dim > where d_week_seq in > (select d_week_seq > from date_dim > 
where d_date in ('1998-01-02','1998-10-15','1998-11-10'))) > and cr_returned_date_sk = d_date_sk > group by i_item_id), > wr_items as > (select i_item_id item_id, > sum(wr_return_quantity) wr_item_qty > from web_returns, > item, > date_dim > where wr_item_sk = i_item_sk > and d_date in > (select d_date > from date_dim > where d_week_seq in > (select d_week_seq > from date_dim > where d_date in ('1998-01-02','1998-10-15','1998-11-10'))) > and wr_returned_date_sk = d_date_sk > group by i_item_id) > select sr_items.item_id > ,sr_item_qty > ,sr_item_qty/(sr_item_qty+cr_item_qty+wr_item_qty)/3.0 * 100 sr_dev > ,cr_item_qty > ,cr_item_qty/(sr_item_qty+cr_item_qty+wr_item_qty)/3.0 * 100 cr_dev > ,wr_item_qty > ,wr_item_qty/(sr_item_qty+cr_item_qty+wr_item_qty)/3.0 * 100 wr_dev > ,(sr_item_qty+cr_item_qty+wr_item_qty)/3.0 average > from sr_items > ,cr_items > ,wr_items > where sr_items.item_id=cr_items.item_id > and sr_items.item_id=wr_items.item_id > order by sr_items.item_id > ,sr_item_qty > limit 100 > INFO : Starting task [Stage-3:EXPLAIN] in serial mode > INFO : Completed executing > command(queryId=hive_20220307055109_88ad0cbd-bd40-45bc-92ae-ab15fa6b1da4); > Time taken: 0.973 seconds > INFO : OK > Explain > CBO PLAN: > HiveSortLimit(sort0=[$0], sort1=[$1], dir0=[ASC], dir1=[ASC], fetch=
[jira] [Work logged] (HIVE-26102) Implement DELETE statements for Iceberg tables
[ https://issues.apache.org/jira/browse/HIVE-26102?focusedWorklogId=753225&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-753225 ] ASF GitHub Bot logged work on HIVE-26102: - Author: ASF GitHub Bot Created on: 06/Apr/22 07:15 Start Date: 06/Apr/22 07:15 Worklog Time Spent: 10m Work Description: pvary commented on code in PR #3131: URL: https://github.com/apache/hive/pull/3131#discussion_r843554296 ## ql/src/java/org/apache/hadoop/hive/ql/metadata/VirtualColumn.java: ## @@ -50,10 +50,14 @@ RAWDATASIZE("RAW__DATA__SIZE", TypeInfoFactory.longTypeInfo), /** - * {@link org.apache.hadoop.hive.ql.io.RecordIdentifier} + * {@link org.apache.hadoop.hive.ql.io.RecordIdentifier} */ ROWID("ROW__ID", RecordIdentifier.StructInfo.typeInfo, true, RecordIdentifier.StructInfo.oi), ROWISDELETED("ROW__IS__DELETED", TypeInfoFactory.booleanTypeInfo), + PARTITION_SPEC_ID("PARTITION__SPEC__ID", TypeInfoFactory.intTypeInfo), + PARTITION_HASH("PARTITION__HASH", TypeInfoFactory.longTypeInfo), + FILE_PATH("FILE__PATH", TypeInfoFactory.stringTypeInfo), + ROW_POSITION("ROW__POSITION", TypeInfoFactory.longTypeInfo), Review Comment: How is this handled inside the `ROW__ID`? Issue Time Tracking --- Worklog Id: (was: 753225) Time Spent: 5.5h (was: 5h 20m) > Implement DELETE statements for Iceberg tables > -- > > Key: HIVE-26102 > URL: https://issues.apache.org/jira/browse/HIVE-26102 > Project: Hive > Issue Type: New Feature >Reporter: Marton Bod >Assignee: Marton Bod >Priority: Major > Labels: pull-request-available > Time Spent: 5.5h > Remaining Estimate: 0h > -- This message was sent by Atlassian Jira (v8.20.1#820001)
[jira] [Work logged] (HIVE-26102) Implement DELETE statements for Iceberg tables
[ https://issues.apache.org/jira/browse/HIVE-26102?focusedWorklogId=753224&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-753224 ] ASF GitHub Bot logged work on HIVE-26102: - Author: ASF GitHub Bot Created on: 06/Apr/22 07:14 Start Date: 06/Apr/22 07:14 Worklog Time Spent: 10m Work Description: pvary commented on code in PR #3131: URL: https://github.com/apache/hive/pull/3131#discussion_r843553566 ## ql/src/java/org/apache/hadoop/hive/ql/metadata/VirtualColumn.java: ## @@ -50,10 +50,14 @@ RAWDATASIZE("RAW__DATA__SIZE", TypeInfoFactory.longTypeInfo), /** - * {@link org.apache.hadoop.hive.ql.io.RecordIdentifier} + * {@link org.apache.hadoop.hive.ql.io.RecordIdentifier} */ ROWID("ROW__ID", RecordIdentifier.StructInfo.typeInfo, true, RecordIdentifier.StructInfo.oi), ROWISDELETED("ROW__IS__DELETED", TypeInfoFactory.booleanTypeInfo), + PARTITION_SPEC_ID("PARTITION__SPEC__ID", TypeInfoFactory.intTypeInfo), + PARTITION_HASH("PARTITION__HASH", TypeInfoFactory.longTypeInfo), + FILE_PATH("FILE__PATH", TypeInfoFactory.stringTypeInfo), Review Comment: Isn't this the same as the `INPUT__FILE__NAME` in the delete case? Issue Time Tracking --- Worklog Id: (was: 753224) Time Spent: 5h 20m (was: 5h 10m) > Implement DELETE statements for Iceberg tables > -- > > Key: HIVE-26102 > URL: https://issues.apache.org/jira/browse/HIVE-26102 > Project: Hive > Issue Type: New Feature >Reporter: Marton Bod >Assignee: Marton Bod >Priority: Major > Labels: pull-request-available > Time Spent: 5h 20m > Remaining Estimate: 0h > -- This message was sent by Atlassian Jira (v8.20.1#820001)
[jira] [Work logged] (HIVE-26102) Implement DELETE statements for Iceberg tables
[ https://issues.apache.org/jira/browse/HIVE-26102?focusedWorklogId=753223&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-753223 ] ASF GitHub Bot logged work on HIVE-26102: - Author: ASF GitHub Bot Created on: 06/Apr/22 07:13 Start Date: 06/Apr/22 07:13 Worklog Time Spent: 10m Work Description: pvary commented on code in PR #3131: URL: https://github.com/apache/hive/pull/3131#discussion_r843552057 ## iceberg/iceberg-handler/src/test/queries/positive/delete_iceberg_partitioned_avro.q: ## @@ -0,0 +1,26 @@ +set hive.vectorized.execution.enabled=false; +set hive.support.concurrency=true; +set hive.txn.manager=org.apache.hadoop.hive.ql.lockmgr.DbTxnManager; + +drop table if exists tbl_ice; +create external table tbl_ice(a int, b string, c int) partitioned by spec (bucket(16, a), truncate(3, b)) stored by iceberg stored as avro tblproperties ('format-version'='2'); + + Issue Time Tracking --- Worklog Id: (was: 753223) Time Spent: 5h 10m (was: 5h) > Implement DELETE statements for Iceberg tables > -- > > Key: HIVE-26102 > URL: https://issues.apache.org/jira/browse/HIVE-26102 > Project: Hive > Issue Type: New Feature >Reporter: Marton Bod >Assignee: Marton Bod >Priority: Major > Labels: pull-request-available > Time Spent: 5h 10m > Remaining Estimate: 0h > -- This message was sent by Atlassian Jira (v8.20.1#820001)
[jira] [Work logged] (HIVE-26102) Implement DELETE statements for Iceberg tables
[ https://issues.apache.org/jira/browse/HIVE-26102?focusedWorklogId=753221&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-753221 ]

ASF GitHub Bot logged work on HIVE-26102:
-----------------------------------------
Author: ASF GitHub Bot
Created on: 06/Apr/22 07:10
Start Date: 06/Apr/22 07:10
Worklog Time Spent: 10m

Work Description: pvary commented on code in PR #3131:
URL: https://github.com/apache/hive/pull/3131#discussion_r843549581

## iceberg/iceberg-handler/src/test/java/org/apache/iceberg/mr/hive/TestHiveIcebergV2.java:

## @@ -228,6 +230,104 @@ public void testReadAndWriteFormatV2Partitioned_PosDelete_RowSupplied() throws I
     Assert.assertArrayEquals(new Object[] {2L, "Trudy", "Pink"}, objects.get(3));
   }

+  @Test
+  public void testDeleteStatementUnpartitioned() {
+    Assume.assumeFalse("Iceberg DELETEs are only implemented for non-vectorized mode for now", isVectorized);
+
+    // create and insert an initial batch of records
+    testTables.createTable(shell, "customers", HiveIcebergStorageHandlerTestUtils.CUSTOMER_SCHEMA,
+        PartitionSpec.unpartitioned(), fileFormat, HiveIcebergStorageHandlerTestUtils.OTHER_CUSTOMER_RECORDS_2, 2);
+    // insert one more batch so that we have multiple data files within the same partition
+    shell.executeStatement(testTables.getInsertQuery(HiveIcebergStorageHandlerTestUtils.OTHER_CUSTOMER_RECORDS_1,
+        TableIdentifier.of("default", "customers"), false));
+
+    shell.executeStatement("DELETE FROM customers WHERE customer_id=3 or first_name='Joanna'");
+
+    List<Object[]> objects = shell.executeStatement("SELECT * FROM customers ORDER BY customer_id, last_name");
+    Assert.assertEquals(6, objects.size());
+    List<Record> expected = TestHelper.RecordsBuilder.newInstance(HiveIcebergStorageHandlerTestUtils.CUSTOMER_SCHEMA)
+        .add(1L, "Sharon", "Taylor")
+        .add(2L, "Jake", "Donnel")
+        .add(2L, "Susan", "Morrison")
+        .add(2L, "Bob", "Silver")
+        .add(4L, "Laci", "Zold")
+        .add(5L, "Peti", "Rozsaszin")
+        .build();
+    HiveIcebergTestUtils.validateData(expected,
+        HiveIcebergTestUtils.valueForRow(HiveIcebergStorageHandlerTestUtils.CUSTOMER_SCHEMA, objects), 0);
+  }
+
+  @Test
+  public void testDeleteStatementPartitioned() {
+    Assume.assumeFalse("Iceberg DELETEs are only implemented for non-vectorized mode for now", isVectorized);
+    PartitionSpec spec = PartitionSpec.builderFor(HiveIcebergStorageHandlerTestUtils.CUSTOMER_SCHEMA)
+        .identity("last_name").bucket("customer_id", 16).build();
+
+    // create and insert an initial batch of records
+    testTables.createTable(shell, "customers", HiveIcebergStorageHandlerTestUtils.CUSTOMER_SCHEMA,
+        spec, fileFormat, HiveIcebergStorageHandlerTestUtils.OTHER_CUSTOMER_RECORDS_2, 2);
+    // insert one more batch so that we have multiple data files within the same partition
+    shell.executeStatement(testTables.getInsertQuery(HiveIcebergStorageHandlerTestUtils.OTHER_CUSTOMER_RECORDS_1,
+        TableIdentifier.of("default", "customers"), false));
+
+    shell.executeStatement("DELETE FROM customers WHERE customer_id=3 or first_name='Joanna'");
+
+    List<Object[]> objects = shell.executeStatement("SELECT * FROM customers ORDER BY customer_id, last_name");
+    Assert.assertEquals(6, objects.size());
+    List<Record> expected = TestHelper.RecordsBuilder.newInstance(HiveIcebergStorageHandlerTestUtils.CUSTOMER_SCHEMA)
+        .add(1L, "Sharon", "Taylor")
+        .add(2L, "Jake", "Donnel")
+        .add(2L, "Susan", "Morrison")
+        .add(2L, "Bob", "Silver")
+        .add(4L, "Laci", "Zold")
+        .add(5L, "Peti", "Rozsaszin")
+        .build();
+    HiveIcebergTestUtils.validateData(expected,
+        HiveIcebergTestUtils.valueForRow(HiveIcebergStorageHandlerTestUtils.CUSTOMER_SCHEMA, objects), 0);
+  }
+
+  @Test
+  public void testDeleteStatementWithOtherTable() {
+    Assume.assumeFalse("Iceberg DELETEs are only implemented for non-vectorized mode for now", isVectorized);
+    PartitionSpec spec = PartitionSpec.builderFor(HiveIcebergStorageHandlerTestUtils.CUSTOMER_SCHEMA)
+        .identity("last_name").bucket("customer_id", 16).build();
+
+    // create a couple of tables, with an initial batch of records
+    testTables.createTable(shell, "customers", HiveIcebergStorageHandlerTestUtils.CUSTOMER_SCHEMA,
+        spec, fileFormat, HiveIcebergStorageHandlerTestUtils.OTHER_CUSTOMER_RECORDS_2, 2);
+    testTables.createTable(shell, "other", HiveIcebergStorageHandlerTestUtils.CUSTOMER_SCHEMA,
+        spec, fileFormat, HiveIcebergStorageHandlerTestUtils.OTHER_CUSTOMER_RECORDS_1, 2);
+
+    shell.executeStatement("DELETE FROM customers WHERE customer_id in (select t1.customer_id from customers t1 join " +
+        "other t2 on t1.customer_id = t2.customer_id) or " +
+        "first_name in (select first_name from cus
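The three DELETE tests above all assert the same six surviving rows after `DELETE FROM customers WHERE customer_id=3 or first_name='Joanna'`. As a standalone sketch of that predicate (the pre-delete rows below are assumptions reconstructed from the expected result; the actual OTHER_CUSTOMER_RECORDS fixtures are not shown in this diff):

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class DeletePredicateSketch {
    public static void main(String[] args) {
        // Hypothetical pre-delete contents of "customers": the six expected
        // survivors from the tests above, plus two invented rows that match
        // the delete predicate (the real test fixtures are not in the diff).
        List<Object[]> customers = new ArrayList<>(Arrays.asList(
            new Object[] {1L, "Sharon", "Taylor"},
            new Object[] {2L, "Jake", "Donnel"},
            new Object[] {2L, "Susan", "Morrison"},
            new Object[] {2L, "Bob", "Silver"},
            new Object[] {3L, "Blake", "Burr"},    // matches customer_id=3
            new Object[] {4L, "Joanna", "Pierce"}, // matches first_name='Joanna'
            new Object[] {4L, "Laci", "Zold"},
            new Object[] {5L, "Peti", "Rozsaszin"}));

        // DELETE FROM customers WHERE customer_id=3 or first_name='Joanna'
        customers.removeIf(row -> (Long) row[0] == 3L || "Joanna".equals(row[1]));

        // Mirrors Assert.assertEquals(6, objects.size()) in the tests above
        System.out.println(customers.size()); // prints 6
    }
}
```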
[jira] [Work logged] (HIVE-26102) Implement DELETE statements for Iceberg tables
[ https://issues.apache.org/jira/browse/HIVE-26102?focusedWorklogId=753222&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-753222 ]

ASF GitHub Bot logged work on HIVE-26102:
-----------------------------------------
Author: ASF GitHub Bot
Created on: 06/Apr/22 07:10
Start Date: 06/Apr/22 07:10
Worklog Time Spent: 10m

Work Description: pvary commented on code in PR #3131:
URL: https://github.com/apache/hive/pull/3131#discussion_r843549998

## iceberg/iceberg-handler/src/test/queries/negative/delete_iceberg_vectorized.q:

## @@ -0,0 +1,10 @@
+set hive.vectorized.execution.enabled=true;
+set hive.support.concurrency=true;
+set hive.txn.manager=org.apache.hadoop.hive.ql.lockmgr.DbTxnManager;

Review Comment: Why do we set these?

Issue Time Tracking
-------------------
Worklog Id: (was: 753222)
Time Spent: 5h (was: 4h 50m)

> Implement DELETE statements for Iceberg tables
> ----------------------------------------------
>
>                 Key: HIVE-26102
>                 URL: https://issues.apache.org/jira/browse/HIVE-26102
>             Project: Hive
>          Issue Type: New Feature
>            Reporter: Marton Bod
>            Assignee: Marton Bod
>            Priority: Major
>              Labels: pull-request-available
>          Time Spent: 5h
>  Remaining Estimate: 0h
>

--
This message was sent by Atlassian Jira
(v8.20.1#820001)
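Beyond the vectorization guard tested by the negative qtest above, the feature itself targets Iceberg format V2, where a DELETE is persisted as delete files rather than rewritten data files (the first worklog's hunk header references testReadAndWriteFormatV2Partitioned_PosDelete_RowSupplied). The following is a conceptual simulation only, not the Iceberg API: it models how a reader merges data files with position deletes (data file path plus 0-based row ordinal), with all file names and row values invented for illustration.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.stream.Collectors;

public class PosDeleteSketch {
    // Hypothetical model of an Iceberg V2 position delete entry.
    record PosDelete(String filePath, long pos) {}

    public static void main(String[] args) {
        // Two invented data files with rows addressed by ordinal position.
        Map<String, List<String>> dataFiles = new LinkedHashMap<>();
        dataFiles.put("data-00001.parquet", List.of("row0", "row1", "row2"));
        dataFiles.put("data-00002.parquet", List.of("row0", "row1"));

        // Suppose a DELETE statement matched one row in each data file.
        List<PosDelete> deletes = List.of(
            new PosDelete("data-00001.parquet", 1L),
            new PosDelete("data-00002.parquet", 0L));

        // A merge-on-read scan skips the deleted ordinals per data file.
        List<String> live = new ArrayList<>();
        for (var e : dataFiles.entrySet()) {
            Set<Long> deletedPos = deletes.stream()
                .filter(d -> d.filePath().equals(e.getKey()))
                .map(PosDelete::pos)
                .collect(Collectors.toSet());
            for (int i = 0; i < e.getValue().size(); i++) {
                if (!deletedPos.contains((long) i)) {
                    live.add(e.getKey() + ":" + e.getValue().get(i));
                }
            }
        }
        System.out.println(live.size()); // prints 3 (5 rows minus 2 deletes)
    }
}
```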