[jira] [Updated] (HIVE-26119) Remove unnecessary Exceptions from DDLPlanUtils

2022-04-05 Thread Soumyakanti Das (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-26119?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Soumyakanti Das updated HIVE-26119:
---
Description: A few {{HiveExceptions}} were added to methods such as 
{{getCreateTableCommand}}, {{getColumns}}, and {{formatType}}; these can be 
removed. Some related methods in {{ExplainTask}} can also be cleaned up.  
(was: There are a few {{HiveExceptions}} which were added to a few methods 
like {{getCreateTableCommand}}, {{getColumns}}, {{formatType}}, etc, which can 
be removed. Some methods in {{ExplainTask}} can also be cleaned up which are 
related.)
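As an illustration of the cleanup described above (hypothetical names, not the actual DDLPlanUtils code): a checked exception declared on a method that never throws it forces every caller to catch or re-declare it for no reason, and the `throws` clause can simply be dropped.

```java
// Hypothetical sketch; not the real DDLPlanUtils or HiveException classes.
class HiveException extends Exception {
}

public class DdlPlanUtilsSketch {
    // Before: declares a checked exception it never actually throws,
    // forcing every caller to catch or re-declare HiveException.
    static String formatTypeOld(String type) throws HiveException {
        return type.toLowerCase();
    }

    // After: the unused throws clause is removed, so callers such as a
    // getCreateTableCommand-style method can drop their try/catch blocks.
    static String formatType(String type) {
        return type.toLowerCase();
    }

    public static void main(String[] args) throws HiveException {
        System.out.println(formatTypeOld("BIGINT")); // prints "bigint"
        System.out.println(formatType("BIGINT"));    // prints "bigint"
    }
}
```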

> Remove unnecessary Exceptions from DDLPlanUtils
> ---
>
> Key: HIVE-26119
> URL: https://issues.apache.org/jira/browse/HIVE-26119
> Project: Hive
>  Issue Type: Improvement
>Reporter: Soumyakanti Das
>Assignee: Soumyakanti Das
>Priority: Trivial
>
> A few {{HiveExceptions}} were added to methods such as 
> {{getCreateTableCommand}}, {{getColumns}}, and {{formatType}}; these can be 
> removed. Some related methods in {{ExplainTask}} can also be cleaned up.



--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Assigned] (HIVE-26119) Remove unnecessary Exceptions from DDLPlanUtils

2022-04-05 Thread Soumyakanti Das (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-26119?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Soumyakanti Das reassigned HIVE-26119:
--


> Remove unnecessary Exceptions from DDLPlanUtils
> ---
>
> Key: HIVE-26119
> URL: https://issues.apache.org/jira/browse/HIVE-26119
> Project: Hive
>  Issue Type: Improvement
>Reporter: Soumyakanti Das
>Assignee: Soumyakanti Das
>Priority: Trivial
>
> A few {{HiveExceptions}} were added to methods such as 
> {{getCreateTableCommand}}, {{getColumns}}, and {{formatType}}; these can be 
> removed. Some related methods in {{ExplainTask}} can also be cleaned up.



--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Resolved] (HIVE-25728) ParseException while gathering Column Stats

2022-04-05 Thread Krisztian Kasa (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25728?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Krisztian Kasa resolved HIVE-25728.
---
Resolution: Fixed

Pushed to master. Thanks [~soumyakanti.das] for the fix.

> ParseException while gathering Column Stats
> ---
>
> Key: HIVE-25728
> URL: https://issues.apache.org/jira/browse/HIVE-25728
> Project: Hive
>  Issue Type: Bug
>Reporter: Soumyakanti Das
>Assignee: Soumyakanti Das
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 1h
>  Remaining Estimate: 0h
>
> The {{columnName}} is escaped twice in {{ColumnStatsSemanticAnalyzer}} at 
> [line 
> 261|https://github.com/apache/hive/blob/934faa73c56920fa19f86da53b5daa5bf7c98ef4/ql/src/java/org/apache/hadoop/hive/ql/parse/ColumnStatsSemanticAnalyzer.java#L261],
>  which can cause a ParseException. A potential solution is simply not to 
> escape it a second time.
> This can be reproduced as follows:
> {code:java}
> CREATE TABLE table1(
>    t1_col1 bigint);
>  CREATE TABLE table2(
>    t2_col1 bigint,
>    t2_col2 int)
>  PARTITIONED BY (
>    t2_col3 date);
> insert into table1 values(1);
> insert into table2 values("1","1","1");
> --set hive.stats.autogather=false;
> set hive.support.quoted.identifiers=none;
> create external table ext_table STORED AS ORC 
> tblproperties('compression'='snappy','external.table.purge'='true') as
> SELECT a.* ,d.`(t2_col1|t2_col3)?+.+`
> FROM table1 a
> LEFT JOIN (SELECT * FROM table2 where t2_col3 like '2021-01-%') d
> on a.t1_col1 = d.t2_col1;{code}
> and it fails with the following stack trace:
> {noformat}
> See ./ql/target/tmp/log/hive.log or ./itests/qtest/target/tmp/log/hive.log, 
> or check ./ql/target/surefire-reports or 
> ./itests/qtest/target/surefire-reports/ for specific test cases logs.
>  org.apache.hadoop.hive.ql.parse.SemanticException: 
> org.apache.hadoop.hive.ql.parse.ParseException: line 1:772 rule Identifier 
> failed predicate: {allowQuotedId() != Quotation.NONE}?
> line 1:778 rule Identifier failed predicate: {allowQuotedId() != 
> Quotation.NONE}?
> line 1:782 rule Identifier failed predicate: {allowQuotedId() != 
> Quotation.NONE}?
> line 1:807 character '' not supported here
> at 
> org.apache.hadoop.hive.ql.parse.ColumnStatsAutoGatherContext.insertAnalyzePipeline(ColumnStatsAutoGatherContext.java:144)
> at 
> org.apache.hadoop.hive.ql.parse.ColumnStatsAutoGatherContext.insertTableValuesAnalyzePipeline(ColumnStatsAutoGatherContext.java:135)
> at 
> org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genAutoColumnStatsGatheringPipeline(SemanticAnalyzer.java:8380)
> at 
> org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genFileSinkPlan(SemanticAnalyzer.java:7915)
> at 
> org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genPostGroupByBodyPlan(SemanticAnalyzer.java:11064)
> at 
> org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genBodyPlan(SemanticAnalyzer.java:10939)
> at 
> org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genPlan(SemanticAnalyzer.java:11854)
> at 
> org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genPlan(SemanticAnalyzer.java:11724)
> at 
> org.apache.hadoop.hive.ql.parse.CalcitePlanner.genOPTree(CalcitePlanner.java:625)
> at 
> org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.analyzeInternal(SemanticAnalyzer.java:12557)
> at 
> org.apache.hadoop.hive.ql.parse.CalcitePlanner.analyzeInternal(CalcitePlanner.java:455)
> at 
> org.apache.hadoop.hive.ql.parse.BaseSemanticAnalyzer.analyze(BaseSemanticAnalyzer.java:317)
> at org.apache.hadoop.hive.ql.Compiler.analyze(Compiler.java:223)
> at org.apache.hadoop.hive.ql.Compiler.compile(Compiler.java:105)
> at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:500)
> at org.apache.hadoop.hive.ql.Driver.compileInternal(Driver.java:453)
> at org.apache.hadoop.hive.ql.Driver.compileAndRespond(Driver.java:417)
> at org.apache.hadoop.hive.ql.Driver.compileAndRespond(Driver.java:411)
> at 
> org.apache.hadoop.hive.ql.reexec.ReExecDriver.compileAndRespond(ReExecDriver.java:125)
> at 
> org.apache.hadoop.hive.ql.reexec.ReExecDriver.run(ReExecDriver.java:229)
> at 
> org.apache.hadoop.hive.cli.CliDriver.processLocalCmd(CliDriver.java:256)
> at org.apache.hadoop.hive.cli.CliDriver.processCmd1(CliDriver.java:201)
> at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:127)
> at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:422)
> at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:353)
> at 
> org.apache.hadoop.hive.ql.QTestUtil.executeClientInternal(QTestUtil.java:783)
> at org.apache.hadoop.hive.ql.QTestUtil.executeClient(QTestUtil.java:753)
> at 
> org.apache.hadoop.hive.cli.control.CoreCliDriver.runTest(CoreCliDriver.java:142)
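The double escaping described in HIVE-25728 can be sketched with a hypothetical mirror of Hive's identifier quoting (the helper below is illustrative, not the actual HiveUtils/ColumnStatsSemanticAnalyzer code): quoting an already-quoted identifier a second time yields a string that is no longer a single parseable quoted identifier.

```java
public class EscapeSketch {
    // Hypothetical stand-in for Hive's identifier quoting: wrap the name
    // in backticks, doubling any backticks already inside it.
    static String unparseIdentifier(String id) {
        return "`" + id.replace("`", "``") + "`";
    }

    public static void main(String[] args) {
        String col = "(t2_col1|t2_col3)?+.+";
        // Escaped once: a single, parseable quoted identifier.
        System.out.println(unparseIdentifier(col));
        // Escaped twice: the outer backticks now wrap an already-quoted
        // string, so a rewritten stats query embedding it no longer parses
        // as one identifier -- the kind of breakage seen in the stack trace.
        System.out.println(unparseIdentifier(unparseIdentifier(col)));
    }
}
```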

[jira] [Work logged] (HIVE-25728) ParseException while gathering Column Stats

2022-04-05 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25728?focusedWorklogId=753185&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-753185
 ]

ASF GitHub Bot logged work on HIVE-25728:
-

Author: ASF GitHub Bot
Created on: 06/Apr/22 04:07
Start Date: 06/Apr/22 04:07
Worklog Time Spent: 10m 
  Work Description: kasakrisz merged PR #3169:
URL: https://github.com/apache/hive/pull/3169




Issue Time Tracking
---

Worklog Id: (was: 753185)
Time Spent: 1h  (was: 50m)

> ParseException while gathering Column Stats
> ---
>
> Key: HIVE-25728
> URL: https://issues.apache.org/jira/browse/HIVE-25728
> Project: Hive
>  Issue Type: Bug
>Reporter: Soumyakanti Das
>Assignee: Soumyakanti Das
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 1h
>  Remaining Estimate: 0h
>
> The {{columnName}} is escaped twice in {{ColumnStatsSemanticAnalyzer}} at 
> [line 
> 261|https://github.com/apache/hive/blob/934faa73c56920fa19f86da53b5daa5bf7c98ef4/ql/src/java/org/apache/hadoop/hive/ql/parse/ColumnStatsSemanticAnalyzer.java#L261],
>  which can cause a ParseException. A potential solution is simply not to 
> escape it a second time.
> This can be reproduced as follows:
> {code:java}
> CREATE TABLE table1(
>    t1_col1 bigint);
>  CREATE TABLE table2(
>    t2_col1 bigint,
>    t2_col2 int)
>  PARTITIONED BY (
>    t2_col3 date);
> insert into table1 values(1);
> insert into table2 values("1","1","1");
> --set hive.stats.autogather=false;
> set hive.support.quoted.identifiers=none;
> create external table ext_table STORED AS ORC 
> tblproperties('compression'='snappy','external.table.purge'='true') as
> SELECT a.* ,d.`(t2_col1|t2_col3)?+.+`
> FROM table1 a
> LEFT JOIN (SELECT * FROM table2 where t2_col3 like '2021-01-%') d
> on a.t1_col1 = d.t2_col1;{code}
> and it fails with the following stack trace:
> {noformat}
> See ./ql/target/tmp/log/hive.log or ./itests/qtest/target/tmp/log/hive.log, 
> or check ./ql/target/surefire-reports or 
> ./itests/qtest/target/surefire-reports/ for specific test cases logs.
>  org.apache.hadoop.hive.ql.parse.SemanticException: 
> org.apache.hadoop.hive.ql.parse.ParseException: line 1:772 rule Identifier 
> failed predicate: {allowQuotedId() != Quotation.NONE}?
> line 1:778 rule Identifier failed predicate: {allowQuotedId() != 
> Quotation.NONE}?
> line 1:782 rule Identifier failed predicate: {allowQuotedId() != 
> Quotation.NONE}?
> line 1:807 character '' not supported here
> at 
> org.apache.hadoop.hive.ql.parse.ColumnStatsAutoGatherContext.insertAnalyzePipeline(ColumnStatsAutoGatherContext.java:144)
> at 
> org.apache.hadoop.hive.ql.parse.ColumnStatsAutoGatherContext.insertTableValuesAnalyzePipeline(ColumnStatsAutoGatherContext.java:135)
> at 
> org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genAutoColumnStatsGatheringPipeline(SemanticAnalyzer.java:8380)
> at 
> org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genFileSinkPlan(SemanticAnalyzer.java:7915)
> at 
> org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genPostGroupByBodyPlan(SemanticAnalyzer.java:11064)
> at 
> org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genBodyPlan(SemanticAnalyzer.java:10939)
> at 
> org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genPlan(SemanticAnalyzer.java:11854)
> at 
> org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genPlan(SemanticAnalyzer.java:11724)
> at 
> org.apache.hadoop.hive.ql.parse.CalcitePlanner.genOPTree(CalcitePlanner.java:625)
> at 
> org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.analyzeInternal(SemanticAnalyzer.java:12557)
> at 
> org.apache.hadoop.hive.ql.parse.CalcitePlanner.analyzeInternal(CalcitePlanner.java:455)
> at 
> org.apache.hadoop.hive.ql.parse.BaseSemanticAnalyzer.analyze(BaseSemanticAnalyzer.java:317)
> at org.apache.hadoop.hive.ql.Compiler.analyze(Compiler.java:223)
> at org.apache.hadoop.hive.ql.Compiler.compile(Compiler.java:105)
> at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:500)
> at org.apache.hadoop.hive.ql.Driver.compileInternal(Driver.java:453)
> at org.apache.hadoop.hive.ql.Driver.compileAndRespond(Driver.java:417)
> at org.apache.hadoop.hive.ql.Driver.compileAndRespond(Driver.java:411)
> at 
> org.apache.hadoop.hive.ql.reexec.ReExecDriver.compileAndRespond(ReExecDriver.java:125)
> at 
> org.apache.hadoop.hive.ql.reexec.ReExecDriver.run(ReExecDriver.java:229)
> at 
> org.apache.hadoop.hive.cli.CliDriver.processLocalCmd(CliDriver.java:256)
> at org.apache.hadoop.hive.cli.CliDriver.processCmd1(CliDriver.java:201)
> at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:127)
> at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:422)
> at 

[jira] [Work logged] (HIVE-26118) [Standalone Beeline] Jar name mismatch between build and assembly

2022-04-05 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-26118?focusedWorklogId=753177&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-753177
 ]

ASF GitHub Bot logged work on HIVE-26118:
-

Author: ASF GitHub Bot
Created on: 06/Apr/22 02:28
Start Date: 06/Apr/22 02:28
Worklog Time Spent: 10m 
  Work Description: dengzhhu653 commented on PR #3180:
URL: https://github.com/apache/hive/pull/3180#issuecomment-1089685882

   lgtm, +1




Issue Time Tracking
---

Worklog Id: (was: 753177)
Time Spent: 20m  (was: 10m)

> [Standalone Beeline] Jar name mismatch between build and assembly
> -
>
> Key: HIVE-26118
> URL: https://issues.apache.org/jira/browse/HIVE-26118
> Project: Hive
>  Issue Type: Sub-task
>  Components: Beeline
>Affects Versions: 3.0.0
>Reporter: Naveen Gangam
>Assignee: Naveen Gangam
>Priority: Minor
>  Labels: pull-request-available
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> The fix from HIVE-25750 has an issue: the beeline build produces a jar named 
> "jar-with-dependencies.jar", but the assembly looks for a jar named 
> "original-jar-with-dependencies.jar". As a result, the uber jar never gets 
> included in the distribution.



--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Updated] (HIVE-26111) FULL JOIN returns incorrect result with Tez engine

2022-04-05 Thread Youjun Yuan (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-26111?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Youjun Yuan updated HIVE-26111:
---
Description: 
We hit a query that FULL JOINs two tables, and Hive produces incorrect 
results: for a single value of the join key, it produces two records, each 
record having a valid value from one table and NULL for the other.

The query is:
{code:java}
SET mapreduce.job.reduces=2;
SELECT d.id, u.id
FROM (
       SELECT id
       FROM   airflow.tableA rud
       WHERE  rud.dt = '2022-04-02-1row'
) d
FULL JOIN (
       SELECT id
       FROM   default.tableB
       WHERE  dt = '2022-04-01' and device_token='blabla'
 ) u
ON u.id = d.id
; {code}
According to the job log, the two reducers each get an input record, and output 
a record.

It produces two records for id=350570497:
{code:java}
350570497    NULL
NULL    350570497
Time taken: 62.692 seconds, Fetched: 2 row(s) {code}
I am sure tableB has only one row where device_token='blabla'.

And we tried:

1. SET mapreduce.job.reduces=1; then it produces the correct result;

2. SET hive.execution.engine=mr; then it produces the correct result;

3. JOIN (instead of FULL JOIN) worked as expected;

4. in subquery u, changing the filter device_token='blabla' to id=350570497 
worked OK.

Below is the explain output of the query:
{code:java}
Plan optimized by CBO.

Vertex dependency in root stage:
Reducer 3 <- Map 1 (CUSTOM_SIMPLE_EDGE), Map 2 (CUSTOM_SIMPLE_EDGE)

Stage-0
  Fetch Operator
    limit:-1
    Stage-1
      Reducer 3
      File Output Operator [FS_10]
        Map Join Operator [MAPJOIN_13] (rows=2 width=8)
          
Conds:RS_6.KEY.reducesinkkey0=RS_7.KEY.reducesinkkey0(Outer),DynamicPartitionHashJoin:true,Output:["_col0","_col1"]
        <-Map 1 [CUSTOM_SIMPLE_EDGE]
          PARTITION_ONLY_SHUFFLE [RS_6]
            PartitionCols:_col0
            Select Operator [SEL_2] (rows=1 width=4)
              Output:["_col0"]
              TableScan [TS_0] (rows=1 width=4)
                
airflow@rds_users_delta,rud,Tbl:COMPLETE,Col:COMPLETE,Output:["id"]
        <-Map 2 [CUSTOM_SIMPLE_EDGE]
          PARTITION_ONLY_SHUFFLE [RS_7]
            PartitionCols:_col0
            Select Operator [SEL_5] (rows=1 width=4)
              Output:["_col0"]
              Filter Operator [FIL_12] (rows=1 width=110)
                predicate:(device_token = 'blabla')
                TableScan [TS_3] (rows=215192362 width=109)
                  
default@users,users,Tbl:COMPLETE,Col:COMPLETE,Output:["id","device_token"]  
{code}
I can't generate a small enough data set to reproduce the issue. I have 
minimized tableA to only 1 row; tableB has ~200M rows, but if I further reduce 
the size of tableB, the issue can't be reproduced.

Any suggestion would be highly appreciated regarding the root cause of the 
issue, how to work around it, or how to reproduce it with a small enough 
dataset.
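One purely illustrative way such NULL-padded duplicates can arise (an assumption about the mechanism, not a confirmed diagnosis of this bug) is when the two join inputs are routed to reducers inconsistently, so no single reducer sees both sides of the key and each side is joined against an empty opposite side:

```java
import java.util.*;

public class FullJoinRoutingSketch {
    // Merge one reducer's view of both sides as a FULL OUTER JOIN on the key.
    static List<String> reduceFullJoin(Set<Long> left, Set<Long> right) {
        List<String> out = new ArrayList<>();
        for (long k : left) {
            out.add(k + "\t" + (right.contains(k) ? String.valueOf(k) : "NULL"));
        }
        for (long k : right) {
            if (!left.contains(k)) out.add("NULL\t" + k);
        }
        return out;
    }

    public static void main(String[] args) {
        long id = 350570497L;
        // Consistent routing: both sides of the key reach the same reducer,
        // which emits the single expected matched row.
        System.out.println(reduceFullJoin(Set.of(id), Set.of(id)));
        // Inconsistent routing: reducer 0 sees only the left side, reducer 1
        // only the right side; together they emit the two NULL-padded rows
        // resembling the ones in the bug report.
        List<String> bad = new ArrayList<>();
        bad.addAll(reduceFullJoin(Set.of(id), Set.of())); // reducer 0's view
        bad.addAll(reduceFullJoin(Set.of(), Set.of(id))); // reducer 1's view
        System.out.println(bad);
    }
}
```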

  was:
We hit a query that FULL JOINs two tables, and Hive produces incorrect 
results: for a single value of the join key, it produces two records, each 
record having a valid value from one table and NULL for the other.

The query is:
{code:java}
SELECT d.id, u.id
FROM (
       SELECT id
       FROM   airflow.tableA rud
       WHERE  rud.dt = '2022-04-02-1row'
) d
FULL JOIN (
       SELECT id
       FROM   default.tableB
       WHERE  dt = '2022-04-01' and device_token='blabla'
 ) u
ON u.id = d.id
; {code}
It produces two records for id=350570497:
{code:java}
350570497    NULL
NULL    350570497
Time taken: 62.692 seconds, Fetched: 2 row(s) {code}
I am sure tableB has only one row where device_token='blabla'.

And we tried:

1. SET mapreduce.job.reduces=1; then it produces the correct result;

2. SET hive.execution.engine=mr; then it produces the correct result;

3. JOIN (instead of FULL JOIN) worked as expected;

4. in subquery u, changing the filter device_token='blabla' to id=350570497 
worked OK.

Below is the explain output of the query:
{code:java}
Plan optimized by CBO.

Vertex dependency in root stage:
Reducer 3 <- Map 1 (CUSTOM_SIMPLE_EDGE), Map 2 (CUSTOM_SIMPLE_EDGE)

Stage-0
  Fetch Operator
    limit:-1
    Stage-1
      Reducer 3
      File Output Operator [FS_10]
        Map Join Operator [MAPJOIN_13] (rows=2 width=8)
          
Conds:RS_6.KEY.reducesinkkey0=RS_7.KEY.reducesinkkey0(Outer),DynamicPartitionHashJoin:true,Output:["_col0","_col1"]
        <-Map 1 [CUSTOM_SIMPLE_EDGE]
          PARTITION_ONLY_SHUFFLE [RS_6]
            PartitionCols:_col0
            Select Operator [SEL_2] (rows=1 width=4)
              Output:["_col0"]
              TableScan [TS_0] (rows=1 width=4)
                
airflow@rds_users_delta,rud,Tbl:COMPLETE,Col:COMPLETE,Output:["id"]
        <-Map 2 [CUSTOM_SIMPLE_EDGE]
          PARTITION_ONLY_SHUFFLE [RS_7]
            PartitionCols:_col0
            Select Operator [SEL_5] (rows=1 width=4)
              Output:["_col0"]
              Filter Operator [FIL_12] (rows=1 width=110)
                

[jira] [Work logged] (HIVE-26118) [Standalone Beeline] Jar name mismatch between build and assembly

2022-04-05 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-26118?focusedWorklogId=753168&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-753168
 ]

ASF GitHub Bot logged work on HIVE-26118:
-

Author: ASF GitHub Bot
Created on: 06/Apr/22 01:36
Start Date: 06/Apr/22 01:36
Worklog Time Spent: 10m 
  Work Description: nrg4878 opened a new pull request, #3180:
URL: https://github.com/apache/hive/pull/3180

   ### What changes were proposed in this pull request?
   The jar name the beeline standalone assembly build is looking for, 
original-jar-with-dependencies.jar, does not match the name of the jar being 
built by the beeline build, jar-with-dependencies.jar. So the uber jar is not 
being bundled into the final apache-hive-beeline.xxx.tar.gz
   
   ### Why are the changes needed?
   To include the uber jar into the beeline assembly.
   
   ### Does this PR introduce _any_ user-facing change?
   No
   
   ### How was this patch tested?
   Manually




Issue Time Tracking
---

Worklog Id: (was: 753168)
Remaining Estimate: 0h
Time Spent: 10m

> [Standalone Beeline] Jar name mismatch between build and assembly
> -
>
> Key: HIVE-26118
> URL: https://issues.apache.org/jira/browse/HIVE-26118
> Project: Hive
>  Issue Type: Sub-task
>  Components: Beeline
>Affects Versions: 3.0.0
>Reporter: Naveen Gangam
>Assignee: Naveen Gangam
>Priority: Minor
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> The fix from HIVE-25750 has an issue: the beeline build produces a jar named 
> "jar-with-dependencies.jar", but the assembly looks for a jar named 
> "original-jar-with-dependencies.jar". As a result, the uber jar never gets 
> included in the distribution.



--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Updated] (HIVE-26118) [Standalone Beeline] Jar name mismatch between build and assembly

2022-04-05 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-26118?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated HIVE-26118:
--
Labels: pull-request-available  (was: )

> [Standalone Beeline] Jar name mismatch between build and assembly
> -
>
> Key: HIVE-26118
> URL: https://issues.apache.org/jira/browse/HIVE-26118
> Project: Hive
>  Issue Type: Sub-task
>  Components: Beeline
>Affects Versions: 3.0.0
>Reporter: Naveen Gangam
>Assignee: Naveen Gangam
>Priority: Minor
>  Labels: pull-request-available
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> The fix from HIVE-25750 has an issue: the beeline build produces a jar named 
> "jar-with-dependencies.jar", but the assembly looks for a jar named 
> "original-jar-with-dependencies.jar". As a result, the uber jar never gets 
> included in the distribution.



--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Work logged] (HIVE-23644) Fix FindBug issues in hive-jdbc

2022-04-05 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-23644?focusedWorklogId=753153&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-753153
 ]

ASF GitHub Bot logged work on HIVE-23644:
-

Author: ASF GitHub Bot
Created on: 06/Apr/22 00:17
Start Date: 06/Apr/22 00:17
Worklog Time Spent: 10m 
  Work Description: github-actions[bot] closed pull request #2960: 
HIVE-23644: Fix FindBug issues in hive-jdbc
URL: https://github.com/apache/hive/pull/2960




Issue Time Tracking
---

Worklog Id: (was: 753153)
Time Spent: 0.5h  (was: 20m)

> Fix FindBug issues in hive-jdbc
> ---
>
> Key: HIVE-23644
> URL: https://issues.apache.org/jira/browse/HIVE-23644
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Panagiotis Garefalakis
>Assignee: David Mollitor
>Priority: Major
>  Labels: pull-request-available
> Attachments: spotbugsXml.xml
>
>  Time Spent: 0.5h
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Work logged] (HIVE-23650) Fix FindBug issues in hive-streaming

2022-04-05 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-23650?focusedWorklogId=753151&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-753151
 ]

ASF GitHub Bot logged work on HIVE-23650:
-

Author: ASF GitHub Bot
Created on: 06/Apr/22 00:17
Start Date: 06/Apr/22 00:17
Worklog Time Spent: 10m 
  Work Description: github-actions[bot] closed pull request #2983: 
HIVE-23650: Fix FindBug issues in hive-streaming
URL: https://github.com/apache/hive/pull/2983




Issue Time Tracking
---

Worklog Id: (was: 753151)
Time Spent: 0.5h  (was: 20m)

> Fix FindBug issues in hive-streaming
> 
>
> Key: HIVE-23650
> URL: https://issues.apache.org/jira/browse/HIVE-23650
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Panagiotis Garefalakis
>Priority: Major
>  Labels: pull-request-available
> Attachments: spotbugsXml.xml
>
>  Time Spent: 0.5h
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Work logged] (HIVE-23635) Fix FindBug issues in hive-vector-code-gen

2022-04-05 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-23635?focusedWorklogId=753152&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-753152
 ]

ASF GitHub Bot logged work on HIVE-23635:
-

Author: ASF GitHub Bot
Created on: 06/Apr/22 00:17
Start Date: 06/Apr/22 00:17
Worklog Time Spent: 10m 
  Work Description: github-actions[bot] closed pull request #2982: 
HIVE-23635: Fix FindBug issues in hive-vector-code-gen
URL: https://github.com/apache/hive/pull/2982




Issue Time Tracking
---

Worklog Id: (was: 753152)
Time Spent: 0.5h  (was: 20m)

> Fix FindBug issues in hive-vector-code-gen
> --
>
> Key: HIVE-23635
> URL: https://issues.apache.org/jira/browse/HIVE-23635
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Panagiotis Garefalakis
>Priority: Major
>  Labels: pull-request-available
> Attachments: spotbugsXml.xml
>
>  Time Spent: 0.5h
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Assigned] (HIVE-26118) [Standalone Beeline] Jar name mismatch between build and assembly

2022-04-05 Thread Naveen Gangam (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-26118?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Naveen Gangam reassigned HIVE-26118:



> [Standalone Beeline] Jar name mismatch between build and assembly
> -
>
> Key: HIVE-26118
> URL: https://issues.apache.org/jira/browse/HIVE-26118
> Project: Hive
>  Issue Type: Sub-task
>  Components: Beeline
>Affects Versions: 3.0.0
>Reporter: Naveen Gangam
>Assignee: Naveen Gangam
>Priority: Minor
>
> The fix from HIVE-25750 has an issue: the beeline build produces a jar named 
> "jar-with-dependencies.jar", but the assembly looks for a jar named 
> "original-jar-with-dependencies.jar". As a result, the uber jar never gets 
> included in the distribution.



--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Work logged] (HIVE-26117) Remove 2 superfluous lines of code in genJoinRelNode

2022-04-05 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-26117?focusedWorklogId=753038&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-753038
 ]

ASF GitHub Bot logged work on HIVE-26117:
-

Author: ASF GitHub Bot
Created on: 05/Apr/22 18:23
Start Date: 05/Apr/22 18:23
Worklog Time Spent: 10m 
  Work Description: scarlin-cloudera opened a new pull request, #3179:
URL: https://github.com/apache/hive/pull/3179

   
   
   ### What changes were proposed in this pull request?
   Removed 2 lines of code that had no effect on compilation
   
   
   
   ### Why are the changes needed?
   Dead code removed
   
   
   
   ### Does this PR introduce _any_ user-facing change?
   No
   
   
   
   ### How was this patch tested?
   Run through unit tests
   
   




Issue Time Tracking
---

Worklog Id: (was: 753038)
Remaining Estimate: 0h
Time Spent: 10m

> Remove 2 superfluous lines of code in genJoinRelNode
> 
>
> Key: HIVE-26117
> URL: https://issues.apache.org/jira/browse/HIVE-26117
> Project: Hive
>  Issue Type: Bug
>  Components: CBO
>Reporter: Steve Carlin
>Priority: Minor
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> The code was rewritten to associate ASTNodes to RexNodes.  Some code was left 
> behind that doesn't add any value.



--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Updated] (HIVE-26117) Remove 2 superfluous lines of code in genJoinRelNode

2022-04-05 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-26117?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated HIVE-26117:
--
Labels: pull-request-available  (was: )

> Remove 2 superfluous lines of code in genJoinRelNode
> 
>
> Key: HIVE-26117
> URL: https://issues.apache.org/jira/browse/HIVE-26117
> Project: Hive
>  Issue Type: Bug
>  Components: CBO
>Reporter: Steve Carlin
>Priority: Minor
>  Labels: pull-request-available
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> The code was rewritten to associate ASTNodes to RexNodes.  Some code was left 
> behind that doesn't add any value.



--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Work logged] (HIVE-26116) Fix handling of compaction requests originating from aborted dynamic partition queries in Initiator

2022-04-05 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-26116?focusedWorklogId=752971&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-752971
 ]

ASF GitHub Bot logged work on HIVE-26116:
-

Author: ASF GitHub Bot
Created on: 05/Apr/22 16:03
Start Date: 05/Apr/22 16:03
Worklog Time Spent: 10m 
  Work Description: klcopp commented on code in PR #3177:
URL: https://github.com/apache/hive/pull/3177#discussion_r842986110


##
ql/src/java/org/apache/hadoop/hive/ql/txn/compactor/Initiator.java:
##
@@ -320,10 +320,19 @@ private boolean 
foundCurrentOrFailedCompactions(ShowCompactResponse compactions,
 if (compactions.getCompacts() == null) {
   return false;
 }
+
+//In case of an aborted Dynamic partition insert, the created entry in the 
compaction queue does not contain
+//a partition name even for partitioned tables. As a result it can happen 
that the ShowCompactResponse contains
+//an element without partition name for partitioned tables. Therefore, it 
is necessary to check the partition name of
+//the ShowCompactResponseElement even if the CompactionInfo.partName is 
not null. These special compaction requests
+//are skipped by the worker, and only cleaner will pick them up, so in 
this special case it is OK to schedule a
+//compaction for the particular table and partition even if the dynamic 
partition abort originated compaction request
+//is still in WORKING or INITIATED state.

Review Comment:
   It's also okay because the DP abort compaction doesn't specify a partition, 
so it can't count as a compaction on the partition we are currently 
investigating.





Issue Time Tracking
---

Worklog Id: (was: 752971)
Time Spent: 50m  (was: 40m)

> Fix handling of compaction requests originating from aborted dynamic 
> partition queries in Initiator
> ---
>
> Key: HIVE-26116
> URL: https://issues.apache.org/jira/browse/HIVE-26116
> Project: Hive
>  Issue Type: Bug
>Reporter: László Végh
>Assignee: László Végh
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 50m
>  Remaining Estimate: 0h
>
> Compaction requests originated from an abort of a dynamic partition insert 
> can cause a NPE in Initiator.
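A minimal sketch of the null-safe comparison this fix is about (hypothetical method and parameter names, not the actual Initiator code): queue entries created by an aborted dynamic-partition insert may carry a null partition name even for partitioned tables, so comparing partition names must tolerate null on either side instead of calling `.equals(...)` on a possibly-null string.

```java
import java.util.Objects;

public class PartNameCheckSketch {
    // Hypothetical helper: does a compaction-queue entry match the partition
    // currently being considered? entryPartName may be null for entries
    // created by an aborted dynamic partition insert, so Objects.equals is
    // used rather than candidatePartName.equals(entryPartName), which would
    // throw a NullPointerException when either side is null.
    static boolean samePartition(String candidatePartName, String entryPartName) {
        return Objects.equals(candidatePartName, entryPartName);
    }

    public static void main(String[] args) {
        System.out.println(samePartition("dt=2022-04-05", "dt=2022-04-05")); // true
        System.out.println(samePartition("dt=2022-04-05", null));            // false, no NPE
        System.out.println(samePartition(null, null));                       // true
    }
}
```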



--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Work logged] (HIVE-26116) Fix handling of compaction requests originating from aborted dynamic partition queries in Initiator

2022-04-05 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-26116?focusedWorklogId=752967&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-752967
 ]

ASF GitHub Bot logged work on HIVE-26116:
-

Author: ASF GitHub Bot
Created on: 05/Apr/22 16:00
Start Date: 05/Apr/22 16:00
Worklog Time Spent: 10m 
  Work Description: klcopp commented on code in PR #3177:
URL: https://github.com/apache/hive/pull/3177#discussion_r842981037


##
ql/src/java/org/apache/hadoop/hive/ql/txn/compactor/Initiator.java:
##
@@ -320,10 +320,19 @@ private boolean 
foundCurrentOrFailedCompactions(ShowCompactResponse compactions,
 if (compactions.getCompacts() == null) {
   return false;
 }
+
+//In case of an aborted Dynamic partition insert, the created entry in the 
compaction queue does not contain
+//a partition name even for partitioned tables. As a result it can happen 
that the ShowCompactResponse contains
+//an element without partition name for partitioned tables. Therefore, it 
is necessary to check the partition name of

Review Comment:
   I have to say I got confused here. For people like me, instead of:
   > Therefore, it is necessary to check

   we could have:
   > Therefore, it is necessary to null check



##
ql/src/java/org/apache/hadoop/hive/ql/txn/compactor/Initiator.java:
##
@@ -320,10 +320,19 @@ private boolean 
foundCurrentOrFailedCompactions(ShowCompactResponse compactions,
 if (compactions.getCompacts() == null) {
   return false;
 }
+
+//In case of an aborted Dynamic partition insert, the created entry in the 
compaction queue does not contain
+//a partition name even for partitioned tables. As a result it can happen 
that the ShowCompactResponse contains
+//an element without partition name for partitioned tables. Therefore, it 
is necessary to check the partition name of

Review Comment:
   I have to say I got confused here. For people like me, instead of:
   > Therefore, it is necessary to check
   
   we could have:
   > Therefore, it is necessary to null check





Issue Time Tracking
---

Worklog Id: (was: 752967)
Time Spent: 0.5h  (was: 20m)

> Fix handling of compaction requests originating from aborted dynamic 
> partition queries in Initiator
> ---
>
> Key: HIVE-26116
> URL: https://issues.apache.org/jira/browse/HIVE-26116
> Project: Hive
>  Issue Type: Bug
>Reporter: László Végh
>Assignee: László Végh
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> Compaction requests originating from an abort of a dynamic partition insert
> can cause an NPE in the Initiator.



--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Work logged] (HIVE-26116) Fix handling of compaction requests originating from aborted dynamic partition queries in Initiator

2022-04-05 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-26116?focusedWorklogId=752968&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-752968
 ]

ASF GitHub Bot logged work on HIVE-26116:
-

Author: ASF GitHub Bot
Created on: 05/Apr/22 16:00
Start Date: 05/Apr/22 16:00
Worklog Time Spent: 10m 
  Work Description: klcopp commented on code in PR #3177:
URL: https://github.com/apache/hive/pull/3177#discussion_r842981037


##
ql/src/java/org/apache/hadoop/hive/ql/txn/compactor/Initiator.java:
##
@@ -320,10 +320,19 @@ private boolean 
foundCurrentOrFailedCompactions(ShowCompactResponse compactions,
 if (compactions.getCompacts() == null) {
   return false;
 }
+
+//In case of an aborted Dynamic partition insert, the created entry in the 
compaction queue does not contain
+//a partition name even for partitioned tables. As a result it can happen 
that the ShowCompactResponse contains
+//an element without partition name for partitioned tables. Therefore, it 
is necessary to check the partition name of

Review Comment:
   I have to say I got confused here. For people like me, instead of:
   > Therefore, it is necessary to check
   
   this would be better:
   > Therefore, it is necessary to null check





Issue Time Tracking
---

Worklog Id: (was: 752968)
Time Spent: 40m  (was: 0.5h)

> Fix handling of compaction requests originating from aborted dynamic 
> partition queries in Initiator
> ---
>
> Key: HIVE-26116
> URL: https://issues.apache.org/jira/browse/HIVE-26116
> Project: Hive
>  Issue Type: Bug
>Reporter: László Végh
>Assignee: László Végh
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 40m
>  Remaining Estimate: 0h
>
> Compaction requests originating from an abort of a dynamic partition insert
> can cause an NPE in the Initiator.



--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Resolved] (HIVE-26110) Bulk insert into partitioned table creates lots of files in iceberg

2022-04-05 Thread Jira


 [ 
https://issues.apache.org/jira/browse/HIVE-26110?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ádám Szita resolved HIVE-26110.
---
Fix Version/s: 4.0.0
 Assignee: Ádám Szita
   Resolution: Fixed

Committed to master. Thanks for the review [~rajesh.balamohan] [~pvary] [~mbod] 

> Bulk insert into partitioned table creates lots of files in iceberg
> ---
>
> Key: HIVE-26110
> URL: https://issues.apache.org/jira/browse/HIVE-26110
> Project: Hive
>  Issue Type: Bug
>Reporter: Rajesh Balamohan
>Assignee: Ádám Szita
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>  Time Spent: 1h
>  Remaining Estimate: 0h
>
> For example, create the web_returns table from TPC-DS in Iceberg format and
> try to copy over data from the regular table, along the lines of "insert into
> web_returns_iceberg as select * from web_returns".
> This inserts the data correctly; however, there are a lot of files present in
> each partition. IMO, the dynamic sort optimisation isn't working correctly
> and this causes records not to be grouped in the final phase.



--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Work logged] (HIVE-25967) Prevent residual expressions from getting serialized in Iceberg splits

2022-04-05 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25967?focusedWorklogId=752945&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-752945
 ]

ASF GitHub Bot logged work on HIVE-25967:
-

Author: ASF GitHub Bot
Created on: 05/Apr/22 15:26
Start Date: 05/Apr/22 15:26
Worklog Time Spent: 10m 
  Work Description: szlta opened a new pull request, #3178:
URL: https://github.com/apache/hive/pull/3178

   I originally thought that we only needed the hack whenever residuals are 
present, so I added this condition:
   
   
https://github.com/apache/hive/commit/1aa6ce84798e78ea53c3bec2beedb5f55b6c#diff-9487d7073613adf5132783cf905ea72164eb4c19461c50e5ce3cd735bb5704a3R127
   
   What I didn't know is that in some cases the residuals() invocation may end 
up returning True while the expression is still some longer construct. The 
residuals() invocation actually evaluates said expression against the partition 
information found in the base scan file task. Because of this, the residuals 
are left untouched and will cause an OOM.
   
   This addendum removes the aforementioned unnecessary condition.




Issue Time Tracking
---

Worklog Id: (was: 752945)
Time Spent: 50m  (was: 40m)

> Prevent residual expressions from getting serialized in Iceberg splits
> --
>
> Key: HIVE-25967
> URL: https://issues.apache.org/jira/browse/HIVE-25967
> Project: Hive
>  Issue Type: Bug
>Reporter: Ádám Szita
>Assignee: Ádám Szita
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 50m
>  Remaining Estimate: 0h
>
> This hack removes residual expressions from the file scan task just before
> split serialization.
> Residuals can sometimes take up too much space in the payload, causing the
> Tez AM to OOM.
> Unfortunately, the Tez AM doesn't distribute splits in a streamed way; that
> is, it serializes all splits for a job before sending them out to executors.
> Some residuals may take ~1 MB in memory, which, multiplied by thousands of
> splits, could kill the Tez AM JVM.
> Until streamed split distribution is implemented, we will kick residuals out
> of the split.



--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Work logged] (HIVE-26102) Implement DELETE statements for Iceberg tables

2022-04-05 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-26102?focusedWorklogId=752931&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-752931
 ]

ASF GitHub Bot logged work on HIVE-26102:
-

Author: ASF GitHub Bot
Created on: 05/Apr/22 14:52
Start Date: 05/Apr/22 14:52
Worklog Time Spent: 10m 
  Work Description: marton-bod commented on code in PR #3131:
URL: https://github.com/apache/hive/pull/3131#discussion_r842882389


##
iceberg/iceberg-handler/src/main/java/org/apache/iceberg/mr/hive/HiveIcebergOutputCommitter.java:
##
@@ -156,11 +163,19 @@ public void abortTask(TaskAttemptContext originalContext) 
throws IOException {
 TaskAttemptContext context = 
TezUtil.enrichContextWithAttemptWrapper(originalContext);
 
 // Clean up writer data from the local store
-Map<String, HiveIcebergRecordWriter> writers = HiveIcebergRecordWriter.removeWriters(context.getTaskAttemptID());
+Map<String, HiveIcebergWriter> writers = HiveIcebergWriter.getRecordWriters(context.getTaskAttemptID());

Review Comment:
   We could move the logic to a static method in the writer, e.g. 
`HiveIcebergWriter.closeAndRemoveWriters(attemptId)`. Is this what you had in 
mind? This way we could call this static method in both `commitTask` and 
`abortTask`.
   
   Actually I don't see the writer closing logic in the `commitTask` method 
currently (only in `abortTask`), so it's possible we're not even closing out 
our iceberg writers currently in the happy path?
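
The `closeAndRemoveWriters(attemptId)` idea suggested above could look roughly like this (all names hypothetical; the real Hive writer registry is keyed and typed differently):

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical registry sketch: both commitTask and abortTask could call
// closeAndRemoveWriters so writers are always closed exactly once, in both
// the happy path and the abort path.
class WriterRegistrySketch {
    interface Writer {
        void close(boolean abort);
    }

    private static final Map<String, List<Writer>> WRITERS = new ConcurrentHashMap<>();

    static void register(String attemptId, Writer writer) {
        WRITERS.computeIfAbsent(attemptId, k -> new ArrayList<>()).add(writer);
    }

    // Close every writer registered for the attempt and drop them from the
    // registry in one step, so neither path can leak an open writer.
    static int closeAndRemoveWriters(String attemptId, boolean abort) {
        List<Writer> writers = WRITERS.remove(attemptId);
        if (writers == null) {
            return 0;
        }
        writers.forEach(w -> w.close(abort));
        return writers.size();
    }
}
```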





Issue Time Tracking
---

Worklog Id: (was: 752931)
Time Spent: 4h 40m  (was: 4.5h)

> Implement DELETE statements for Iceberg tables
> --
>
> Key: HIVE-26102
> URL: https://issues.apache.org/jira/browse/HIVE-26102
> Project: Hive
>  Issue Type: New Feature
>Reporter: Marton Bod
>Assignee: Marton Bod
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 4h 40m
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Work logged] (HIVE-26102) Implement DELETE statements for Iceberg tables

2022-04-05 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-26102?focusedWorklogId=752930&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-752930
 ]

ASF GitHub Bot logged work on HIVE-26102:
-

Author: ASF GitHub Bot
Created on: 05/Apr/22 14:46
Start Date: 05/Apr/22 14:46
Worklog Time Spent: 10m 
  Work Description: marton-bod commented on code in PR #3131:
URL: https://github.com/apache/hive/pull/3131#discussion_r842875835


##
iceberg/iceberg-handler/src/main/java/org/apache/iceberg/mr/hive/HiveIcebergDeleteWriter.java:
##
@@ -0,0 +1,81 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+
+package org.apache.iceberg.mr.hive;
+
+import java.io.IOException;
+import java.util.List;
+import org.apache.hadoop.io.Writable;
+import org.apache.hadoop.mapred.TaskAttemptID;
+import org.apache.iceberg.DeleteFile;
+import org.apache.iceberg.FileFormat;
+import org.apache.iceberg.PartitionSpec;
+import org.apache.iceberg.Schema;
+import org.apache.iceberg.data.Record;
+import org.apache.iceberg.deletes.PositionDelete;
+import org.apache.iceberg.io.ClusteredPositionDeleteWriter;
+import org.apache.iceberg.io.FileIO;
+import org.apache.iceberg.io.FileWriterFactory;
+import org.apache.iceberg.io.OutputFileFactory;
+import org.apache.iceberg.mr.mapred.Container;
+import org.apache.iceberg.util.Tasks;
+import org.slf4j.Logger;
+import org.slf4j.LoggerFactory;
+
+public class HiveIcebergDeleteWriter extends HiveIcebergWriter {
+  private static final Logger LOG = 
LoggerFactory.getLogger(HiveIcebergDeleteWriter.class);
+
+  private final ClusteredPositionDeleteWriter<Record> innerWriter;
+
+  HiveIcebergDeleteWriter(Schema schema, PartitionSpec spec, FileFormat fileFormat,
+  FileWriterFactory<Record> writerFactory, OutputFileFactory fileFactory, FileIO io, long targetFileSize,
+  TaskAttemptID taskAttemptID, String tableName) {
+super(schema, spec, io, taskAttemptID, tableName, true);
+this.innerWriter = new ClusteredPositionDeleteWriter<>(writerFactory, 
fileFactory, io, fileFormat, targetFileSize);
+  }
+
+  @Override
+  public void write(Writable row) throws IOException {
+Record rec = ((Container<Record>) row).get();
+PositionDelete<Record> positionDelete = IcebergAcidUtil.getPositionDelete(spec.schema(), rec);
+innerWriter.write(positionDelete, spec, partition(positionDelete.row()));
+  }
+
+  @Override
+  public void close(boolean abort) throws IOException {
+innerWriter.close();
+List<DeleteFile> deleteFiles = deleteFiles();
+
+// If abort then remove the unnecessary files
+if (abort) {
+  Tasks.foreach(deleteFiles)
+  .retry(3)
+  .suppressFailureWhenFinished()
+  .onFailure((file, exception) -> LOG.debug("Failed to remove delete file {} on abort", file, exception))
+  .run(deleteFile -> io.deleteFile(deleteFile.path().toString()));
+}
+
+LOG.info("IcebergDeleteWriter is closed with abort={}. Created {} files", 
abort, deleteFiles.size());
+  }
+
+  @Override
+  public List<DeleteFile> deleteFiles() {

Review Comment:
   That's a good idea. We can use this signature I think: `public List files()`





Issue Time Tracking
---

Worklog Id: (was: 752930)
Time Spent: 4.5h  (was: 4h 20m)

> Implement DELETE statements for Iceberg tables
> --
>
> Key: HIVE-26102
> URL: https://issues.apache.org/jira/browse/HIVE-26102
> Project: Hive
>  Issue Type: New Feature
>Reporter: Marton Bod
>Assignee: Marton Bod
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 4.5h
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Work logged] (HIVE-26102) Implement DELETE statements for Iceberg tables

2022-04-05 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-26102?focusedWorklogId=752928&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-752928
 ]

ASF GitHub Bot logged work on HIVE-26102:
-

Author: ASF GitHub Bot
Created on: 05/Apr/22 14:39
Start Date: 05/Apr/22 14:39
Worklog Time Spent: 10m 
  Work Description: marton-bod commented on code in PR #3131:
URL: https://github.com/apache/hive/pull/3131#discussion_r842867791


##
iceberg/iceberg-handler/src/main/java/org/apache/iceberg/mr/mapreduce/IcebergInputFormat.java:
##
@@ -261,6 +268,13 @@ public boolean nextKeyValue() throws IOException {
   while (true) {
 if (currentIterator.hasNext()) {
   current = currentIterator.next();
+  Configuration conf = context.getConfiguration();
+  if (HiveIcebergStorageHandler.isDelete(conf, 
conf.get(Catalogs.NAME))) {
+if (current instanceof GenericRecord) {
+  PositionDeleteInfo pdi = 
IcebergAcidUtil.parsePositionDeleteInfoFromRecord((GenericRecord) current);

Review Comment:
   > Can we reuse the pdi object to save some time for GC?
   
   Yes, I think it's a good idea to reuse it. We can make it mutable and the 
AcidUtil can just update its fields
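
The reuse being agreed on above can be sketched as a mutable holder (field names are assumptions; the real `PositionDeleteInfo` layout may differ):

```java
// Hypothetical mutable holder: one instance is allocated per reader and its
// fields are overwritten for each record, instead of allocating a fresh
// PositionDeleteInfo per row and pressuring the GC.
class PositionDeleteInfoSketch {
    private String filePath;
    private long position;

    // Called once per record; the same instance is handed back each time.
    void set(String filePath, long position) {
        this.filePath = filePath;
        this.position = position;
    }

    String filePath() { return filePath; }
    long position() { return position; }
}
```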





Issue Time Tracking
---

Worklog Id: (was: 752928)
Time Spent: 4h 20m  (was: 4h 10m)

> Implement DELETE statements for Iceberg tables
> --
>
> Key: HIVE-26102
> URL: https://issues.apache.org/jira/browse/HIVE-26102
> Project: Hive
>  Issue Type: New Feature
>Reporter: Marton Bod
>Assignee: Marton Bod
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 4h 20m
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Work logged] (HIVE-26102) Implement DELETE statements for Iceberg tables

2022-04-05 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-26102?focusedWorklogId=752926&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-752926
 ]

ASF GitHub Bot logged work on HIVE-26102:
-

Author: ASF GitHub Bot
Created on: 05/Apr/22 14:38
Start Date: 05/Apr/22 14:38
Worklog Time Spent: 10m 
  Work Description: marton-bod commented on code in PR #3131:
URL: https://github.com/apache/hive/pull/3131#discussion_r842866645


##
iceberg/iceberg-handler/src/main/java/org/apache/iceberg/mr/mapreduce/IcebergInputFormat.java:
##
@@ -261,6 +268,13 @@ public boolean nextKeyValue() throws IOException {
   while (true) {
 if (currentIterator.hasNext()) {
   current = currentIterator.next();
+  Configuration conf = context.getConfiguration();
+  if (HiveIcebergStorageHandler.isDelete(conf, 
conf.get(Catalogs.NAME))) {
+if (current instanceof GenericRecord) {
+  PositionDeleteInfo pdi = 
IcebergAcidUtil.parsePositionDeleteInfoFromRecord((GenericRecord) current);

Review Comment:
   > Do we need a GenericRecord here?
   
   We need to have a GenericRecord here because it's straightforward to grab 
the positional delete info for each record from the GenericRecord. I haven't 
looked into how to grab it from a VectorizedRowBatch (or its parquet 
equivalent), but seemed more complicated at first glance. As for whether we 
need the instanceof check, that's a different question, I guess we don't, since 
we already assert that vectorization is off during compilation, so I can remove 
that extra check







Issue Time Tracking
---

Worklog Id: (was: 752926)
Time Spent: 4h 10m  (was: 4h)

> Implement DELETE statements for Iceberg tables
> --
>
> Key: HIVE-26102
> URL: https://issues.apache.org/jira/browse/HIVE-26102
> Project: Hive
>  Issue Type: New Feature
>Reporter: Marton Bod
>Assignee: Marton Bod
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 4h 10m
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Work logged] (HIVE-26102) Implement DELETE statements for Iceberg tables

2022-04-05 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-26102?focusedWorklogId=752924&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-752924
 ]

ASF GitHub Bot logged work on HIVE-26102:
-

Author: ASF GitHub Bot
Created on: 05/Apr/22 14:35
Start Date: 05/Apr/22 14:35
Worklog Time Spent: 10m 
  Work Description: marton-bod commented on code in PR #3131:
URL: https://github.com/apache/hive/pull/3131#discussion_r842863629


##
iceberg/iceberg-handler/src/main/java/org/apache/iceberg/mr/mapreduce/IcebergInputFormat.java:
##
@@ -261,6 +268,13 @@ public boolean nextKeyValue() throws IOException {
   while (true) {
 if (currentIterator.hasNext()) {
   current = currentIterator.next();
+  Configuration conf = context.getConfiguration();

Review Comment:
   No, it's not changing, it makes sense to move it outside the loop





Issue Time Tracking
---

Worklog Id: (was: 752924)
Time Spent: 4h  (was: 3h 50m)

> Implement DELETE statements for Iceberg tables
> --
>
> Key: HIVE-26102
> URL: https://issues.apache.org/jira/browse/HIVE-26102
> Project: Hive
>  Issue Type: New Feature
>Reporter: Marton Bod
>Assignee: Marton Bod
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 4h
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Work logged] (HIVE-26102) Implement DELETE statements for Iceberg tables

2022-04-05 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-26102?focusedWorklogId=752923&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-752923
 ]

ASF GitHub Bot logged work on HIVE-26102:
-

Author: ASF GitHub Bot
Created on: 05/Apr/22 14:34
Start Date: 05/Apr/22 14:34
Worklog Time Spent: 10m 
  Work Description: marton-bod commented on code in PR #3131:
URL: https://github.com/apache/hive/pull/3131#discussion_r842862653


##
iceberg/iceberg-handler/src/main/java/org/apache/iceberg/mr/hive/HiveIcebergDeleteWriter.java:
##
@@ -0,0 +1,81 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+
+package org.apache.iceberg.mr.hive;
+
+import java.io.IOException;
+import java.util.List;
+import org.apache.hadoop.io.Writable;
+import org.apache.hadoop.mapred.TaskAttemptID;
+import org.apache.iceberg.DeleteFile;
+import org.apache.iceberg.FileFormat;
+import org.apache.iceberg.PartitionSpec;
+import org.apache.iceberg.Schema;
+import org.apache.iceberg.data.Record;
+import org.apache.iceberg.deletes.PositionDelete;
+import org.apache.iceberg.io.ClusteredPositionDeleteWriter;
+import org.apache.iceberg.io.FileIO;
+import org.apache.iceberg.io.FileWriterFactory;
+import org.apache.iceberg.io.OutputFileFactory;
+import org.apache.iceberg.mr.mapred.Container;
+import org.apache.iceberg.util.Tasks;
+import org.slf4j.Logger;
+import org.slf4j.LoggerFactory;
+
+public class HiveIcebergDeleteWriter extends HiveIcebergWriter {

Review Comment:
   Do you mean moving this logic into a factory?
   ```
   if (HiveIcebergStorageHandler.isDelete(jc, tableName)) {
 return new HiveIcebergDeleteWriter(schema, spec, fileFormat, 
writerFactory, outputFileFactory, io, targetFileSize,
 taskAttemptID, tableName);
   } else {
 return new HiveIcebergRecordWriter(schema, spec, fileFormat, 
writerFactory, outputFileFactory, io, targetFileSize,
 taskAttemptID, tableName);
   }
   ```
   ->
   ```
   return HiveIcebergWriterFactory.getWriter(schema, spec, fileFormat, 
writerFactory, outputFileFactory, io, targetFileSize,
 taskAttemptID, tableName, isDelete);
   ```
   As for implementing the iceberg `FileWriterFactory`, I don't see the benefit 
yet, but we can discuss if that's what you had in mind.





Issue Time Tracking
---

Worklog Id: (was: 752923)
Time Spent: 3h 50m  (was: 3h 40m)

> Implement DELETE statements for Iceberg tables
> --
>
> Key: HIVE-26102
> URL: https://issues.apache.org/jira/browse/HIVE-26102
> Project: Hive
>  Issue Type: New Feature
>Reporter: Marton Bod
>Assignee: Marton Bod
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 3h 50m
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Work logged] (HIVE-26102) Implement DELETE statements for Iceberg tables

2022-04-05 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-26102?focusedWorklogId=752921&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-752921
 ]

ASF GitHub Bot logged work on HIVE-26102:
-

Author: ASF GitHub Bot
Created on: 05/Apr/22 14:30
Start Date: 05/Apr/22 14:30
Worklog Time Spent: 10m 
  Work Description: marton-bod commented on code in PR #3131:
URL: https://github.com/apache/hive/pull/3131#discussion_r842857047


##
iceberg/iceberg-handler/src/main/java/org/apache/iceberg/mr/hive/HiveIcebergOutputCommitter.java:
##
@@ -118,18 +120,23 @@ public void commitTask(TaskAttemptContext 
originalContext) throws IOException {
   .run(output -> {
 Table table = 
HiveIcebergStorageHandler.table(context.getJobConf(), output);
 if (table != null) {
-  HiveIcebergRecordWriter writer = writers.get(output);
-  DataFile[] closedFiles;
+  HiveIcebergWriter writer = writers.get(output);
+  HiveIcebergWriter delWriter = delWriters.get(output);
+  String fileForCommitLocation = 
generateFileForCommitLocation(table.location(), jobConf,
+  attemptID.getJobID(), attemptID.getTaskID().getId());
+  if (delWriter != null) {
+DeleteFile[] closedFiles = delWriter.deleteFiles().toArray(new 
DeleteFile[0]);
+createFileForCommit(closedFiles, fileForCommitLocation, 
table.io());

Review Comment:
   We could write into a single forCommit file, but it would complicate things 
during the jobCommit operation. We'd have to read the files back from disk and 
check for each file separately whether it's a DataFile or a DeleteFile, since 
they each need a different Iceberg API call to commit them (`table.newAppend()` 
vs `table.newRowDelta()`, respectively).
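
The per-file type check a mixed forCommit file would force at jobCommit time can be sketched with stub types (only the commit API names `table.newAppend()`/`table.newRowDelta()` come from the comment above; everything else here is a stand-in):

```java
import java.util.List;

// Stub types standing in for Iceberg's DataFile / DeleteFile to illustrate
// why mixing both in one forCommit file forces a per-file type check: data
// files go through one commit API, position deletes through another.
class MixedCommitSketch {
    interface ContentFile { }
    static class DataFile implements ContentFile { }
    static class DeleteFile implements ContentFile { }

    static String commitAll(List<ContentFile> files) {
        int appends = 0;
        int rowDeltas = 0;
        for (ContentFile file : files) {
            if (file instanceof DeleteFile) {
                rowDeltas++;   // would feed table.newRowDelta()
            } else {
                appends++;     // would feed table.newAppend()
            }
        }
        return appends + " appended, " + rowDeltas + " row-delta";
    }
}
```

Keeping separate forCommit files per writer type avoids this branch entirely, which is the trade-off described above.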





Issue Time Tracking
---

Worklog Id: (was: 752921)
Time Spent: 3h 40m  (was: 3.5h)

> Implement DELETE statements for Iceberg tables
> --
>
> Key: HIVE-26102
> URL: https://issues.apache.org/jira/browse/HIVE-26102
> Project: Hive
>  Issue Type: New Feature
>Reporter: Marton Bod
>Assignee: Marton Bod
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 3h 40m
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Work logged] (HIVE-26102) Implement DELETE statements for Iceberg tables

2022-04-05 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-26102?focusedWorklogId=752913&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-752913
 ]

ASF GitHub Bot logged work on HIVE-26102:
-

Author: ASF GitHub Bot
Created on: 05/Apr/22 14:09
Start Date: 05/Apr/22 14:09
Worklog Time Spent: 10m 
  Work Description: pvary commented on code in PR #3131:
URL: https://github.com/apache/hive/pull/3131#discussion_r842831675


##
iceberg/iceberg-handler/src/main/java/org/apache/iceberg/mr/mapreduce/IcebergInputFormat.java:
##
@@ -261,6 +268,13 @@ public boolean nextKeyValue() throws IOException {
   while (true) {
 if (currentIterator.hasNext()) {
   current = currentIterator.next();
+  Configuration conf = context.getConfiguration();

Review Comment:
   Is the context changing here? Shall we move the conf object outside the 
loop?





Issue Time Tracking
---

Worklog Id: (was: 752913)
Time Spent: 3.5h  (was: 3h 20m)

> Implement DELETE statements for Iceberg tables
> --
>
> Key: HIVE-26102
> URL: https://issues.apache.org/jira/browse/HIVE-26102
> Project: Hive
>  Issue Type: New Feature
>Reporter: Marton Bod
>Assignee: Marton Bod
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 3.5h
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Work logged] (HIVE-26102) Implement DELETE statements for Iceberg tables

2022-04-05 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-26102?focusedWorklogId=752912&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-752912
 ]

ASF GitHub Bot logged work on HIVE-26102:
-

Author: ASF GitHub Bot
Created on: 05/Apr/22 14:08
Start Date: 05/Apr/22 14:08
Worklog Time Spent: 10m 
  Work Description: pvary commented on code in PR #3131:
URL: https://github.com/apache/hive/pull/3131#discussion_r842830816


##
iceberg/iceberg-handler/src/main/java/org/apache/iceberg/mr/mapreduce/IcebergInputFormat.java:
##
@@ -261,6 +268,13 @@ public boolean nextKeyValue() throws IOException {
   while (true) {
 if (currentIterator.hasNext()) {
   current = currentIterator.next();
+  Configuration conf = context.getConfiguration();
+  if (HiveIcebergStorageHandler.isDelete(conf, 
conf.get(Catalogs.NAME))) {
+if (current instanceof GenericRecord) {
+  PositionDeleteInfo pdi = 
IcebergAcidUtil.parsePositionDeleteInfoFromRecord((GenericRecord) current);

Review Comment:
   Can we reuse the pdi object to save some time for GC?





Issue Time Tracking
---

Worklog Id: (was: 752912)
Time Spent: 3h 20m  (was: 3h 10m)

> Implement DELETE statements for Iceberg tables
> --
>
> Key: HIVE-26102
> URL: https://issues.apache.org/jira/browse/HIVE-26102
> Project: Hive
>  Issue Type: New Feature
>Reporter: Marton Bod
>Assignee: Marton Bod
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 3h 20m
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Work logged] (HIVE-26102) Implement DELETE statements for Iceberg tables

2022-04-05 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-26102?focusedWorklogId=752911&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-752911
 ]

ASF GitHub Bot logged work on HIVE-26102:
-

Author: ASF GitHub Bot
Created on: 05/Apr/22 14:07
Start Date: 05/Apr/22 14:07
Worklog Time Spent: 10m 
  Work Description: pvary commented on code in PR #3131:
URL: https://github.com/apache/hive/pull/3131#discussion_r842829926


##
iceberg/iceberg-handler/src/main/java/org/apache/iceberg/mr/mapreduce/IcebergInputFormat.java:
##
@@ -261,6 +268,13 @@ public boolean nextKeyValue() throws IOException {
   while (true) {
 if (currentIterator.hasNext()) {
   current = currentIterator.next();
+  Configuration conf = context.getConfiguration();
+  if (HiveIcebergStorageHandler.isDelete(conf, 
conf.get(Catalogs.NAME))) {
+if (current instanceof GenericRecord) {
+  PositionDeleteInfo pdi = 
IcebergAcidUtil.parsePositionDeleteInfoFromRecord((GenericRecord) current);

Review Comment:
   Do we need a `GenericRecord` here?





Issue Time Tracking
---

Worklog Id: (was: 752911)
Time Spent: 3h 10m  (was: 3h)

> Implement DELETE statements for Iceberg tables
> --
>
> Key: HIVE-26102
> URL: https://issues.apache.org/jira/browse/HIVE-26102
> Project: Hive
>  Issue Type: New Feature
>Reporter: Marton Bod
>Assignee: Marton Bod
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 3h 10m
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Work logged] (HIVE-26102) Implement DELETE statements for Iceberg tables

2022-04-05 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-26102?focusedWorklogId=752910&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-752910
 ]

ASF GitHub Bot logged work on HIVE-26102:
-

Author: ASF GitHub Bot
Created on: 05/Apr/22 14:05
Start Date: 05/Apr/22 14:05
Worklog Time Spent: 10m 
  Work Description: pvary commented on code in PR #3131:
URL: https://github.com/apache/hive/pull/3131#discussion_r842827417


##
iceberg/iceberg-handler/src/main/java/org/apache/iceberg/mr/hive/HiveIcebergOutputCommitter.java:
##
@@ -156,11 +163,19 @@ public void abortTask(TaskAttemptContext originalContext) 
throws IOException {
 TaskAttemptContext context = 
TezUtil.enrichContextWithAttemptWrapper(originalContext);
 
 // Clean up writer data from the local store
-Map<String, HiveIcebergRecordWriter> writers = HiveIcebergRecordWriter.removeWriters(context.getTaskAttemptID());
+Map<String, HiveIcebergWriter> writers = HiveIcebergWriter.getRecordWriters(context.getTaskAttemptID());

Review Comment:
   Maybe the place of this should not be here, but in the respecitve writers, 
or in the Committer?





Issue Time Tracking
---

Worklog Id: (was: 752910)
Time Spent: 3h  (was: 2h 50m)

> Implement DELETE statements for Iceberg tables
> --
>
> Key: HIVE-26102
> URL: https://issues.apache.org/jira/browse/HIVE-26102
> Project: Hive
>  Issue Type: New Feature
>Reporter: Marton Bod
>Assignee: Marton Bod
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 3h
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Work logged] (HIVE-26102) Implement DELETE statements for Iceberg tables

2022-04-05 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-26102?focusedWorklogId=752909&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-752909
 ]

ASF GitHub Bot logged work on HIVE-26102:
-

Author: ASF GitHub Bot
Created on: 05/Apr/22 14:00
Start Date: 05/Apr/22 14:00
Worklog Time Spent: 10m 
  Work Description: pvary commented on code in PR #3131:
URL: https://github.com/apache/hive/pull/3131#discussion_r842821652


##
iceberg/iceberg-handler/src/main/java/org/apache/iceberg/mr/hive/HiveIcebergDeleteWriter.java:
##
@@ -0,0 +1,81 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+
+package org.apache.iceberg.mr.hive;
+
+import java.io.IOException;
+import java.util.List;
+import org.apache.hadoop.io.Writable;
+import org.apache.hadoop.mapred.TaskAttemptID;
+import org.apache.iceberg.DeleteFile;
+import org.apache.iceberg.FileFormat;
+import org.apache.iceberg.PartitionSpec;
+import org.apache.iceberg.Schema;
+import org.apache.iceberg.data.Record;
+import org.apache.iceberg.deletes.PositionDelete;
+import org.apache.iceberg.io.ClusteredPositionDeleteWriter;
+import org.apache.iceberg.io.FileIO;
+import org.apache.iceberg.io.FileWriterFactory;
+import org.apache.iceberg.io.OutputFileFactory;
+import org.apache.iceberg.mr.mapred.Container;
+import org.apache.iceberg.util.Tasks;
+import org.slf4j.Logger;
+import org.slf4j.LoggerFactory;
+
+public class HiveIcebergDeleteWriter extends HiveIcebergWriter {
+  private static final Logger LOG = LoggerFactory.getLogger(HiveIcebergDeleteWriter.class);
+
+  private final ClusteredPositionDeleteWriter<Record> innerWriter;
+
+  HiveIcebergDeleteWriter(Schema schema, PartitionSpec spec, FileFormat fileFormat,
+      FileWriterFactory<Record> writerFactory, OutputFileFactory fileFactory, FileIO io, long targetFileSize,
+      TaskAttemptID taskAttemptID, String tableName) {
+    super(schema, spec, io, taskAttemptID, tableName, true);
+    this.innerWriter = new ClusteredPositionDeleteWriter<>(writerFactory, fileFactory, io, fileFormat, targetFileSize);
+  }
+
+  @Override
+  public void write(Writable row) throws IOException {
+    Record rec = ((Container<Record>) row).get();
+    PositionDelete<Record> positionDelete = IcebergAcidUtil.getPositionDelete(spec.schema(), rec);
+    innerWriter.write(positionDelete, spec, partition(positionDelete.row()));
+  }
+
+  @Override
+  public void close(boolean abort) throws IOException {
+    innerWriter.close();
+    List<DeleteFile> deleteFiles = deleteFiles();
+
+    // If abort then remove the unnecessary files
+    if (abort) {
+      Tasks.foreach(deleteFiles)
+          .retry(3)
+          .suppressFailureWhenFinished()
+          .onFailure((file, exception) -> LOG.debug("Failed to remove delete file {} on abort", file, exception))
+          .run(deleteFile -> io.deleteFile(deleteFile.path().toString()));
+    }
+
+    LOG.info("IcebergDeleteWriter is closed with abort={}. Created {} files", abort, deleteFiles.size());
+  }
+
+  @Override
+  public List<DeleteFile> deleteFiles() {

Review Comment:
   Why not just `files()`? Then we could generalize the `close()` too
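For readers following the review, the position-delete concept the writer above builds on can be sketched independently of Hive. This is a hedged illustration only: the `PosDelete` class and `apply` helper below are invented for the example; Iceberg's real type is `org.apache.iceberg.deletes.PositionDelete`.

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative sketch: a position delete records which row ordinal of which
// data file is deleted, instead of rewriting the data file itself.
public class PositionDeleteSketch {
    static final class PosDelete {
        final String filePath;
        final long position;
        PosDelete(String filePath, long position) {
            this.filePath = filePath;
            this.position = position;
        }
    }

    // Apply a list of position deletes to the in-memory rows of one data file:
    // a row survives unless some delete names this file and this ordinal.
    static List<String> apply(List<String> rows, List<PosDelete> deletes, String filePath) {
        List<String> result = new ArrayList<>();
        for (int i = 0; i < rows.size(); i++) {
            final int pos = i;
            boolean deleted = deletes.stream()
                .anyMatch(d -> d.filePath.equals(filePath) && d.position == pos);
            if (!deleted) {
                result.add(rows.get(i));
            }
        }
        return result;
    }

    public static void main(String[] args) {
        List<String> rows = List.of("row0", "row1", "row2");
        List<PosDelete> deletes = List.of(new PosDelete("data-00001.parquet", 1L));
        System.out.println(apply(rows, deletes, "data-00001.parquet"));
    }
}
```

A reader applying the delete above sees the middle row masked while the data rows themselves stay untouched, which is why the writer only has to emit small delete files on a DELETE statement.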





Issue Time Tracking
---

Worklog Id: (was: 752909)
Time Spent: 2h 50m  (was: 2h 40m)

> Implement DELETE statements for Iceberg tables
> --
>
> Key: HIVE-26102
> URL: https://issues.apache.org/jira/browse/HIVE-26102
> Project: Hive
>  Issue Type: New Feature
>Reporter: Marton Bod
>Assignee: Marton Bod
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 2h 50m
>  Remaining Estimate: 0h
>






[jira] [Resolved] (HIVE-25934) Non blocking RENAME PARTITION implementation

2022-04-05 Thread Denys Kuzmenko (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25934?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Denys Kuzmenko resolved HIVE-25934.
---
Resolution: Fixed

> Non blocking RENAME PARTITION implementation
> 
>
> Key: HIVE-25934
> URL: https://issues.apache.org/jira/browse/HIVE-25934
> Project: Hive
>  Issue Type: Task
>Reporter: Denys Kuzmenko
>Assignee: Denys Kuzmenko
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 2h 40m
>  Remaining Estimate: 0h
>
> Implement RENAME PARTITION in a way that doesn't have to wait for currently 
> running read operations to be finished.





[jira] [Commented] (HIVE-25934) Non blocking RENAME PARTITION implementation

2022-04-05 Thread Denys Kuzmenko (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-25934?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17517452#comment-17517452
 ] 

Denys Kuzmenko commented on HIVE-25934:
---

Merged to master.
[~pvary], [~rajesh.balamohan], thank you for the review!

> Non blocking RENAME PARTITION implementation
> 
>
> Key: HIVE-25934
> URL: https://issues.apache.org/jira/browse/HIVE-25934
> Project: Hive
>  Issue Type: Task
>Reporter: Denys Kuzmenko
>Assignee: Denys Kuzmenko
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 2h 40m
>  Remaining Estimate: 0h
>
> Implement RENAME PARTITION in a way that doesn't have to wait for currently 
> running read operations to be finished.





[jira] [Work logged] (HIVE-26102) Implement DELETE statements for Iceberg tables

2022-04-05 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-26102?focusedWorklogId=752904=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-752904
 ]

ASF GitHub Bot logged work on HIVE-26102:
-

Author: ASF GitHub Bot
Created on: 05/Apr/22 13:53
Start Date: 05/Apr/22 13:53
Worklog Time Spent: 10m 
  Work Description: pvary commented on code in PR #3131:
URL: https://github.com/apache/hive/pull/3131#discussion_r842813397


##
iceberg/iceberg-handler/src/main/java/org/apache/iceberg/mr/hive/HiveIcebergDeleteWriter.java:
##
@@ -0,0 +1,81 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+
+package org.apache.iceberg.mr.hive;
+
+import java.io.IOException;
+import java.util.List;
+import org.apache.hadoop.io.Writable;
+import org.apache.hadoop.mapred.TaskAttemptID;
+import org.apache.iceberg.DeleteFile;
+import org.apache.iceberg.FileFormat;
+import org.apache.iceberg.PartitionSpec;
+import org.apache.iceberg.Schema;
+import org.apache.iceberg.data.Record;
+import org.apache.iceberg.deletes.PositionDelete;
+import org.apache.iceberg.io.ClusteredPositionDeleteWriter;
+import org.apache.iceberg.io.FileIO;
+import org.apache.iceberg.io.FileWriterFactory;
+import org.apache.iceberg.io.OutputFileFactory;
+import org.apache.iceberg.mr.mapred.Container;
+import org.apache.iceberg.util.Tasks;
+import org.slf4j.Logger;
+import org.slf4j.LoggerFactory;
+
+public class HiveIcebergDeleteWriter extends HiveIcebergWriter {

Review Comment:
   Do we want to create a single `HiveWriterFactory implements 
FileWriterFactory`?





Issue Time Tracking
---

Worklog Id: (was: 752904)
Time Spent: 2h 40m  (was: 2.5h)

> Implement DELETE statements for Iceberg tables
> --
>
> Key: HIVE-26102
> URL: https://issues.apache.org/jira/browse/HIVE-26102
> Project: Hive
>  Issue Type: New Feature
>Reporter: Marton Bod
>Assignee: Marton Bod
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 2h 40m
>  Remaining Estimate: 0h
>






[jira] [Work logged] (HIVE-26102) Implement DELETE statements for Iceberg tables

2022-04-05 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-26102?focusedWorklogId=752902=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-752902
 ]

ASF GitHub Bot logged work on HIVE-26102:
-

Author: ASF GitHub Bot
Created on: 05/Apr/22 13:51
Start Date: 05/Apr/22 13:51
Worklog Time Spent: 10m 
  Work Description: pvary commented on code in PR #3131:
URL: https://github.com/apache/hive/pull/3131#discussion_r842811770


##
iceberg/iceberg-handler/src/main/java/org/apache/iceberg/mr/hive/HiveIcebergOutputCommitter.java:
##
@@ -118,18 +120,23 @@ public void commitTask(TaskAttemptContext originalContext) throws IOException {
         .run(output -> {
           Table table = HiveIcebergStorageHandler.table(context.getJobConf(), output);
           if (table != null) {
-            HiveIcebergRecordWriter writer = writers.get(output);
-            DataFile[] closedFiles;
+            HiveIcebergWriter writer = writers.get(output);
+            HiveIcebergWriter delWriter = delWriters.get(output);
+            String fileForCommitLocation = generateFileForCommitLocation(table.location(), jobConf,
+                attemptID.getJobID(), attemptID.getTaskID().getId());
+            if (delWriter != null) {
+              DeleteFile[] closedFiles = delWriter.deleteFiles().toArray(new DeleteFile[0]);
+              createFileForCommit(closedFiles, fileForCommitLocation, table.io());

Review Comment:
   would it make sense to write into a single `forCommit` file?





Issue Time Tracking
---

Worklog Id: (was: 752902)
Time Spent: 2.5h  (was: 2h 20m)

> Implement DELETE statements for Iceberg tables
> --
>
> Key: HIVE-26102
> URL: https://issues.apache.org/jira/browse/HIVE-26102
> Project: Hive
>  Issue Type: New Feature
>Reporter: Marton Bod
>Assignee: Marton Bod
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 2.5h
>  Remaining Estimate: 0h
>






[jira] [Work logged] (HIVE-25934) Non blocking RENAME PARTITION implementation

2022-04-05 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25934?focusedWorklogId=752880=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-752880
 ]

ASF GitHub Bot logged work on HIVE-25934:
-

Author: ASF GitHub Bot
Created on: 05/Apr/22 13:26
Start Date: 05/Apr/22 13:26
Worklog Time Spent: 10m 
  Work Description: deniskuzZ merged PR #3015:
URL: https://github.com/apache/hive/pull/3015




Issue Time Tracking
---

Worklog Id: (was: 752880)
Time Spent: 2h 40m  (was: 2.5h)

> Non blocking RENAME PARTITION implementation
> 
>
> Key: HIVE-25934
> URL: https://issues.apache.org/jira/browse/HIVE-25934
> Project: Hive
>  Issue Type: Task
>Reporter: Denys Kuzmenko
>Assignee: Denys Kuzmenko
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 2h 40m
>  Remaining Estimate: 0h
>
> Implement RENAME PARTITION in a way that doesn't have to wait for currently 
> running read operations to be finished.





[jira] [Work logged] (HIVE-26116) Fix handling of compaction requests originating from aborted dynamic partition queries in Initiator

2022-04-05 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-26116?focusedWorklogId=752866=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-752866
 ]

ASF GitHub Bot logged work on HIVE-26116:
-

Author: ASF GitHub Bot
Created on: 05/Apr/22 13:03
Start Date: 05/Apr/22 13:03
Worklog Time Spent: 10m 
  Work Description: pvary commented on PR #3177:
URL: https://github.com/apache/hive/pull/3177#issuecomment-1088677854

   +1 pending tests




Issue Time Tracking
---

Worklog Id: (was: 752866)
Time Spent: 20m  (was: 10m)

> Fix handling of compaction requests originating from aborted dynamic 
> partition queries in Initiator
> ---
>
> Key: HIVE-26116
> URL: https://issues.apache.org/jira/browse/HIVE-26116
> Project: Hive
>  Issue Type: Bug
>Reporter: László Végh
>Assignee: László Végh
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> Compaction requests originated from an abort of a dynamic partition insert 
> can cause a NPE in Initiator.





[jira] [Updated] (HIVE-26116) Fix handling of compaction requests originating from aborted dynamic partition queries in Initiator

2022-04-05 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-26116?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated HIVE-26116:
--
Labels: pull-request-available  (was: )

> Fix handling of compaction requests originating from aborted dynamic 
> partition queries in Initiator
> ---
>
> Key: HIVE-26116
> URL: https://issues.apache.org/jira/browse/HIVE-26116
> Project: Hive
>  Issue Type: Bug
>Reporter: László Végh
>Assignee: László Végh
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> Compaction requests originated from an abort of a dynamic partition insert 
> can cause a NPE in Initiator.





[jira] [Work logged] (HIVE-26116) Fix handling of compaction requests originating from aborted dynamic partition queries in Initiator

2022-04-05 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-26116?focusedWorklogId=752854=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-752854
 ]

ASF GitHub Bot logged work on HIVE-26116:
-

Author: ASF GitHub Bot
Created on: 05/Apr/22 12:46
Start Date: 05/Apr/22 12:46
Worklog Time Spent: 10m 
  Work Description: veghlaci05 opened a new pull request, #3177:
URL: https://github.com/apache/hive/pull/3177

   ### What changes were proposed in this pull request?
   
> Key: HIVE-26116
> URL: https://issues.apache.org/jira/browse/HIVE-26116
> Project: Hive
>  Issue Type: Bug
>Reporter: László Végh
>Assignee: László Végh
>Priority: Major
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> Compaction requests originated from an abort of a dynamic partition insert 
> can cause a NPE in Initiator.





[jira] [Comment Edited] (HIVE-26112) Missing scripts for metastore

2022-04-05 Thread Alessandro Solimando (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-26112?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17517403#comment-17517403
 ] 

Alessandro Solimando edited comment on HIVE-26112 at 4/5/22 12:38 PM:
--

After "HIVE-26044: Remove hardcoded version references from the tests (Peter 
Vary reviewed by Marton Bod and Stamatis Zampetakis)" we should not have this 
issue for newer versions, if the script for the current version is missing, it 
will be caught here.

So for this ticket I will create the missing 3.2.0 script, I don't think we 
need to add more tests.

EDIT: something that is not covered is the upgrade path of sysdb, I will keep 
checking the existing tests to see if there is anything along that line.


was (Author: asolimando):
After "HIVE-26044: Remove hardcoded version references from the tests (Peter 
Vary reviewed by Marton Bod and Stamatis Zampetakis)" we should not have this 
issue for newer versions, if the script for the current version is missing, it 
will be caught here.

So for this ticket I will create the missing 3.2.0 script, I don't think we 
need to add more tests.

> Missing scripts for metastore
> -
>
> Key: HIVE-26112
> URL: https://issues.apache.org/jira/browse/HIVE-26112
> Project: Hive
>  Issue Type: Bug
>  Components: Metastore
>Affects Versions: 4.0.0-alpha-2
>Reporter: Alessandro Solimando
>Assignee: Alessandro Solimando
>Priority: Blocker
> Fix For: 4.0.0-alpha-2
>
>
> The version of the scripts for _metastore_ and _standalone-metastore_ should 
> be in sync, but at the moment for the metastore side we are missing 3.2.0 
> scripts (in _metastore/scripts/upgrade/hive_), while they are present in the 
> standalone_metastore counterpart(s):
> * hive-schema-3.2.0.*.sql
> * upgrade-3.1.0-to-3.2.0.*.sql
> * upgrade-3.2.0-to-4.0.0-alpha-1.*.sql
> * upgrade-4.0.0-alpha-1-to-4.0.0-alpha-2.*.sql
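The sync check the ticket describes reduces to a set difference over script file names. A hedged sketch follows; the file names below are illustrative samples, not a scan of a real Hive checkout.

```java
import java.util.Arrays;
import java.util.Set;
import java.util.TreeSet;

// Illustrative sketch: report scripts present in a reference directory
// listing (e.g. standalone-metastore) but missing from another (e.g.
// metastore/scripts/upgrade/hive).
public class ScriptDriftSketch {
    static Set<String> missingFrom(Set<String> reference, Set<String> actual) {
        Set<String> missing = new TreeSet<>(reference);
        missing.removeAll(actual);
        return missing;
    }

    public static void main(String[] args) {
        Set<String> standalone = new TreeSet<>(Arrays.asList(
            "hive-schema-3.1.0.hive.sql",
            "hive-schema-3.2.0.hive.sql",
            "upgrade-3.1.0-to-3.2.0.hive.sql"));
        Set<String> metastore = new TreeSet<>(Arrays.asList(
            "hive-schema-3.1.0.hive.sql"));
        // Prints the scripts that would need to be created to restore parity.
        System.out.println("missing: " + missingFrom(standalone, metastore));
    }
}
```

In practice the two sets would come from directory listings of the two script trees; the same difference in the other direction would flag scripts that exist only on the metastore side.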





[jira] [Assigned] (HIVE-26116) Fix handling of compaction requests originating from aborted dynamic partition queries in Initiator

2022-04-05 Thread Jira


 [ 
https://issues.apache.org/jira/browse/HIVE-26116?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

László Végh reassigned HIVE-26116:
--

Assignee: László Végh

> Fix handling of compaction requests originating from aborted dynamic 
> partition queries in Initiator
> ---
>
> Key: HIVE-26116
> URL: https://issues.apache.org/jira/browse/HIVE-26116
> Project: Hive
>  Issue Type: Bug
>Reporter: László Végh
>Assignee: László Végh
>Priority: Major
>
> Compaction requests originated from an abort of a dynamic partition insert 
> can cause a NPE in Initiator.



--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Comment Edited] (HIVE-26112) Missing scripts for metastore

2022-04-05 Thread Alessandro Solimando (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-26112?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17517403#comment-17517403
 ] 

Alessandro Solimando edited comment on HIVE-26112 at 4/5/22 12:20 PM:
--

After "HIVE-26044: Remove hardcoded version references from the tests (Peter 
Vary reviewed by Marton Bod and Stamatis Zampetakis)" we should not have this 
issue for newer versions, if the script for the current version is missing, it 
will be caught here.

So for this ticket I will create the missing 3.2.0 script, I don't think we 
need to add more tests.


was (Author: asolimando):
After HIVE-26044: Remove hardcoded version references from the tests (Peter 
Vary reviewed by Marton Bod and Stamatis Zampetakis) we should not have this 
issue for newer versions, if the script for the current version is missing, it 
will be caught here.

So for this ticket I will create the missing 3.2.0 script, I don't think we 
need to add more tests.

> Missing scripts for metastore
> -
>
> Key: HIVE-26112
> URL: https://issues.apache.org/jira/browse/HIVE-26112
> Project: Hive
>  Issue Type: Bug
>  Components: Metastore
>Affects Versions: 4.0.0-alpha-2
>Reporter: Alessandro Solimando
>Assignee: Alessandro Solimando
>Priority: Blocker
> Fix For: 4.0.0-alpha-2
>
>
> The version of the scripts for _metastore_ and _standalone-metastore_ should 
> be in sync, but at the moment for the metastore side we are missing 3.2.0 
> scripts (in _metastore/scripts/upgrade/hive_), while they are present in the 
> standalone_metastore counterpart(s):
> * hive-schema-3.2.0.*.sql
> * upgrade-3.1.0-to-3.2.0.*.sql
> * upgrade-3.2.0-to-4.0.0-alpha-1.*.sql
> * upgrade-4.0.0-alpha-1-to-4.0.0-alpha-2.*.sql





[jira] [Commented] (HIVE-26112) Missing scripts for metastore

2022-04-05 Thread Alessandro Solimando (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-26112?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17517403#comment-17517403
 ] 

Alessandro Solimando commented on HIVE-26112:
-

After HIVE-26044: Remove hardcoded version references from the tests (Peter 
Vary reviewed by Marton Bod and Stamatis Zampetakis) we should not have this 
issue for newer versions, if the script for the current version is missing, it 
will be caught here.

So for this ticket I will create the missing 3.2.0 script, I don't think we 
need to add more tests.

> Missing scripts for metastore
> -
>
> Key: HIVE-26112
> URL: https://issues.apache.org/jira/browse/HIVE-26112
> Project: Hive
>  Issue Type: Bug
>  Components: Metastore
>Affects Versions: 4.0.0-alpha-2
>Reporter: Alessandro Solimando
>Assignee: Alessandro Solimando
>Priority: Blocker
> Fix For: 4.0.0-alpha-2
>
>
> The version of the scripts for _metastore_ and _standalone-metastore_ should 
> be in sync, but at the moment for the metastore side we are missing 3.2.0 
> scripts (in _metastore/scripts/upgrade/hive_), while they are present in the 
> standalone_metastore counterpart(s):
> * hive-schema-3.2.0.*.sql
> * upgrade-3.1.0-to-3.2.0.*.sql
> * upgrade-3.2.0-to-4.0.0-alpha-1.*.sql
> * upgrade-4.0.0-alpha-1-to-4.0.0-alpha-2.*.sql





[jira] [Commented] (HIVE-2342) mirror.facebook.net is 404ing

2022-04-05 Thread Swarup Patra (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-2342?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17517366#comment-17517366
 ] 

Swarup Patra commented on HIVE-2342:



> mirror.facebook.net is 404ing
> -
>
> Key: HIVE-2342
> URL: https://issues.apache.org/jira/browse/HIVE-2342
> Project: Hive
>  Issue Type: Bug
>  Components: Build Infrastructure
>Reporter: Andrew Bayer
>Assignee: Carl Steinbach
>Priority: Major
> Fix For: 0.8.0
>
> Attachments: HIVE-2342.1.patch.txt, HIVE-2342.2.patch.txt
>
>
> http://mirror.facebook.net/ and everything under it is 404ing, which is 
> blocking any attempt to build Hive from working.





[jira] [Work logged] (HIVE-26110) Bulk insert into partitioned table creates lots of files in iceberg

2022-04-05 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-26110?focusedWorklogId=752722=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-752722
 ]

ASF GitHub Bot logged work on HIVE-26110:
-

Author: ASF GitHub Bot
Created on: 05/Apr/22 09:01
Start Date: 05/Apr/22 09:01
Worklog Time Spent: 10m 
  Work Description: szlta merged PR #3174:
URL: https://github.com/apache/hive/pull/3174




Issue Time Tracking
---

Worklog Id: (was: 752722)
Time Spent: 1h  (was: 50m)

> Bulk insert into partitioned table creates lots of files in iceberg
> ---
>
> Key: HIVE-26110
> URL: https://issues.apache.org/jira/browse/HIVE-26110
> Project: Hive
>  Issue Type: Bug
>Reporter: Rajesh Balamohan
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 1h
>  Remaining Estimate: 0h
>
> For e.g, create web_returns table in tpcds in iceberg format and try to copy 
> over data from regular table. More like "insert into web_returns_iceberg as 
> select * from web_returns".
> This inserts the data correctly, however there are lot of files present in 
> each partition. IMO, dynamic sort optimisation isn't working fine and this 
> causes records not to be grouped in the final phase.





[jira] [Updated] (HIVE-26110) Bulk insert into partitioned table creates lots of files in iceberg

2022-04-05 Thread Jira


 [ 
https://issues.apache.org/jira/browse/HIVE-26110?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ádám Szita updated HIVE-26110:
--
Summary: Bulk insert into partitioned table creates lots of files in 
iceberg  (was: bulk insert into partitioned table creates lots of files in 
iceberg)

> Bulk insert into partitioned table creates lots of files in iceberg
> ---
>
> Key: HIVE-26110
> URL: https://issues.apache.org/jira/browse/HIVE-26110
> Project: Hive
>  Issue Type: Bug
>Reporter: Rajesh Balamohan
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 50m
>  Remaining Estimate: 0h
>
> For e.g, create web_returns table in tpcds in iceberg format and try to copy 
> over data from regular table. More like "insert into web_returns_iceberg as 
> select * from web_returns".
> This inserts the data correctly, however there are lot of files present in 
> each partition. IMO, dynamic sort optimisation isn't working fine and this 
> causes records not to be grouped in the final phase.





[jira] [Work logged] (HIVE-26110) bulk insert into partitioned table creates lots of files in iceberg

2022-04-05 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-26110?focusedWorklogId=752721=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-752721
 ]

ASF GitHub Bot logged work on HIVE-26110:
-

Author: ASF GitHub Bot
Created on: 05/Apr/22 09:00
Start Date: 05/Apr/22 09:00
Worklog Time Spent: 10m 
  Work Description: szlta commented on code in PR #3174:
URL: https://github.com/apache/hive/pull/3174#discussion_r842544398


##
ql/src/java/org/apache/hadoop/hive/ql/optimizer/SortedDynPartitionOptimizer.java:
##
@@ -648,7 +648,12 @@ public ReduceSinkOperator getReduceSinkOp(List<Integer> partitionPositions, List
     ArrayList<ExprNodeDesc> partCols = Lists.newArrayList();
 
     for (Function<List<ExprNodeDesc>, ExprNodeDesc> customSortExpr : customSortExprs) {
-      keyCols.add(customSortExpr.apply(allCols));
+      ExprNodeDesc colExpr = customSortExpr.apply(allCols);
+      // Custom sort expressions are marked as KEYs, which is required for sorting the rows that are going for
+      // a particular reducer instance. They also need to be marked as 'partition' columns for MapReduce shuffle
+      // phase, in order to gather the same keys to the same reducer instances.
+      keyCols.add(colExpr);
+      partCols.add(colExpr);

Review Comment:
   Thx!
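The point made in the diff comment above, that custom sort expressions must be both sort keys and shuffle partition columns, can be illustrated with a hedged, Hive-independent sketch. All names below are invented for illustration; rows are routed to reducers by hashing the chosen routing column, mimicking a MapReduce-style shuffle.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Illustrative sketch: if rows are routed by the table-partition column, each
// partition lands on a single reducer (one output file per partition); if they
// are routed by anything else, a partition's rows scatter across reducers and
// every reducer opens its own file for that partition.
public class ShuffleSketch {
    // row[0] is the table-partition value; routeCol picks the shuffle column.
    static Map<String, Set<Integer>> reducersPerPartition(
            List<String[]> rows, int numReducers, int routeCol) {
        Map<String, Set<Integer>> seen = new HashMap<>();
        for (String[] row : rows) {
            int reducer = Math.floorMod(row[routeCol].hashCode(), numReducers);
            seen.computeIfAbsent(row[0], k -> new HashSet<>()).add(reducer);
        }
        return seen;
    }

    public static void main(String[] args) {
        List<String[]> rows = new ArrayList<>();
        for (String part : new String[]{"a", "b", "c"}) {
            for (int v = 0; v < 100; v++) {
                rows.add(new String[]{part, Integer.toString(v)});
            }
        }
        // Route by the partition column: one reducer per partition.
        reducersPerPartition(rows, 8, 0).forEach(
            (p, r) -> System.out.println(p + " -> " + r.size() + " reducer(s)"));
        // Route by an unrelated value column: partitions scatter.
        reducersPerPartition(rows, 8, 1).forEach(
            (p, r) -> System.out.println(p + " -> " + r.size() + " reducer(s)"));
    }
}
```

This is the many-small-files symptom the ticket reports: without the `partCols.add(colExpr)` change, the sort keys shape ordering within a reducer but not which reducer a row reaches.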





Issue Time Tracking
---

Worklog Id: (was: 752721)
Time Spent: 50m  (was: 40m)

> bulk insert into partitioned table creates lots of files in iceberg
> ---
>
> Key: HIVE-26110
> URL: https://issues.apache.org/jira/browse/HIVE-26110
> Project: Hive
>  Issue Type: Bug
>Reporter: Rajesh Balamohan
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 50m
>  Remaining Estimate: 0h
>
> For e.g, create web_returns table in tpcds in iceberg format and try to copy 
> over data from regular table. More like "insert into web_returns_iceberg as 
> select * from web_returns".
> This inserts the data correctly, however there are lot of files present in 
> each partition. IMO, dynamic sort optimisation isn't working fine and this 
> causes records not to be grouped in the final phase.





[jira] [Work logged] (HIVE-26114) jdbc connection hiveserver2 using dfs command with prefix space will cause exception

2022-04-05 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-26114?focusedWorklogId=752716=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-752716
 ]

ASF GitHub Bot logged work on HIVE-26114:
-

Author: ASF GitHub Bot
Created on: 05/Apr/22 08:50
Start Date: 05/Apr/22 08:50
Worklog Time Spent: 10m 
  Work Description: pvary commented on PR #3176:
URL: https://github.com/apache/hive/pull/3176#issuecomment-1088439292

   @ming95: Can we have a unit test which fails before the patch, and runs 
successfully after the patch?
   Maybe `TestBeeLineDriver`, or `TestBeeLineWithArgs`, or something like them
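Independent of which test harness is chosen, the failure mode itself can be sketched in isolation. This is a hedged illustration, not Hive's actual `HiveCommandOperation` logic: dispatching on the first whitespace-split token of an untrimmed statement yields an empty token for `" dfs -ls /"`.

```java
// Illustrative sketch of the reported bug; firstToken() is an invented stand-in
// for the command-dispatch step, not Hive's real implementation.
public class CommandDispatchSketch {
    static String firstToken(String statement) {
        // String.split() keeps a leading empty element when the input starts
        // with whitespace, so the "command" token comes out empty.
        return statement.split("\\s+")[0];
    }

    public static void main(String[] args) {
        String raw = " dfs -ls /";
        System.out.println("untrimmed -> '" + firstToken(raw) + "'");        // empty token
        System.out.println("trimmed   -> '" + firstToken(raw.trim()) + "'"); // "dfs"
    }
}
```

The same empty-token effect occurs for a leading `"\n"`, matching the reporter's second example; trimming the statement before dispatch restores the expected behavior.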




Issue Time Tracking
---

Worklog Id: (was: 752716)
Time Spent: 0.5h  (was: 20m)

> jdbc connection hiveserver2 using dfs command with prefix space will cause 
> exception
> -
>
> Key: HIVE-26114
> URL: https://issues.apache.org/jira/browse/HIVE-26114
> Project: Hive
>  Issue Type: Bug
>  Components: JDBC
>Affects Versions: 2.3.8, 3.1.2
>Reporter: shezm
>Assignee: shezm
>Priority: Major
>  Labels: pull-request-available
> Fix For: 3.2.0
>
>  Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> {code:java}
>         Connection con = 
> DriverManager.getConnection("jdbc:hive2://10.214.35.115:1/");
>         Statement stmt = con.createStatement();
>         // dfs command with prefix space or "\n"
>         ResultSet res = stmt.executeQuery(" dfs -ls /");
>         //ResultSet res = stmt.executeQuery("\ndfs -ls /"); {code}
> it will cause exception
> {code:java}
> Exception in thread "main" org.apache.hive.service.cli.HiveSQLException: 
> Error while processing statement: null
>     at org.apache.hive.jdbc.Utils.verifySuccess(Utils.java:231)
>     at org.apache.hive.jdbc.Utils.verifySuccessWithInfo(Utils.java:217)
>     at org.apache.hive.jdbc.HiveStatement.execute(HiveStatement.java:244)
>     at org.apache.hive.jdbc.HiveStatement.executeQuery(HiveStatement.java:375)
>     at com.ne.gdc.whitemane.shezm.TestJdbc.main(TestJdbc.java:30)
> Caused by: org.apache.hive.service.cli.HiveSQLException: Error while 
> processing statement: null
>     at 
> org.apache.hive.service.cli.operation.Operation.toSQLException(Operation.java:380)
>     at 
> org.apache.hive.service.cli.operation.HiveCommandOperation.runInternal(HiveCommandOperation.java:118)
>     at org.apache.hive.service.cli.operation.Operation.run(Operation.java:320)
>     at 
> org.apache.hive.service.cli.session.HiveSessionImpl.executeStatementInternal(HiveSessionImpl.java:530)
>     at 
> org.apache.hive.service.cli.session.HiveSessionImpl.executeStatementAsync(HiveSessionImpl.java:517)
>     at sun.reflect.GeneratedMethodAccessor65.invoke(Unknown Source)
>     at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>     at java.lang.reflect.Method.invoke(Method.java:498)
>     at 
> org.apache.hive.service.cli.session.HiveSessionProxy.invoke(HiveSessionProxy.java:78)
>     at 
> org.apache.hive.service.cli.session.HiveSessionProxy.access$000(HiveSessionProxy.java:36)
>     at 
> org.apache.hive.service.cli.session.HiveSessionProxy$1.run(HiveSessionProxy.java:63)
>     at java.security.AccessController.doPrivileged(Native Method)
>     at javax.security.auth.Subject.doAs(Subject.java:422)
>     at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1730)
>     at 
> org.apache.hive.service.cli.session.HiveSessionProxy.invoke(HiveSessionProxy.java:59)
>     at com.sun.proxy.$Proxy43.executeStatementAsync(Unknown Source)
>     at 
> org.apache.hive.service.cli.CLIService.executeStatementAsync(CLIService.java:310)
>     at 
> org.apache.hive.service.cli.thrift.ThriftCLIService.ExecuteStatement(ThriftCLIService.java:530)
>     at 
> org.apache.hive.service.rpc.thrift.TCLIService$Processor$ExecuteStatement.getResult(TCLIService.java:1437)
>     at 
> org.apache.hive.service.rpc.thrift.TCLIService$Processor$ExecuteStatement.getResult(TCLIService.java:1422)
>     at org.apache.thrift.ProcessFunction.process(ProcessFunction.java:39)
>     at org.apache.thrift.TBaseProcessor.process(TBaseProcessor.java:39)
>     at 
> org.apache.hadoop.hive.thrift.HadoopThriftAuthBridge$Server$TUGIAssumingProcessor.process(HadoopThriftAuthBridge.java:605)
>     at 
> org.apache.thrift.server.TThreadPoolServer$WorkerProcess.run(TThreadPoolServer.java:286)
>     at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
>     at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
>     at java.lang.Thread.run(Thread.java:748)
>  {code}
> But when I execute sql with prefix "\n" it works fine
> {code:java}
> ResultSet 

[jira] [Work logged] (HIVE-26110) bulk insert into partitioned table creates lots of files in iceberg

2022-04-05 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-26110?focusedWorklogId=752712=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-752712
 ]

ASF GitHub Bot logged work on HIVE-26110:
-

Author: ASF GitHub Bot
Created on: 05/Apr/22 08:41
Start Date: 05/Apr/22 08:41
Worklog Time Spent: 10m 
  Work Description: rbalamohan commented on code in PR #3174:
URL: https://github.com/apache/hive/pull/3174#discussion_r842526145


##########
ql/src/java/org/apache/hadoop/hive/ql/optimizer/SortedDynPartitionOptimizer.java:
##########
@@ -648,7 +648,12 @@ public ReduceSinkOperator getReduceSinkOp(List<Integer> partitionPositions, List
     ArrayList<ExprNodeDesc> partCols = Lists.newArrayList();
 
     for (Function<List<ExprNodeDesc>, ExprNodeDesc> customSortExpr : customSortExprs) {
-      keyCols.add(customSortExpr.apply(allCols));
+      ExprNodeDesc colExpr = customSortExpr.apply(allCols);
+      // Custom sort expressions are marked as KEYs, which is required for sorting the rows that are going for
+      // a particular reducer instance. They also need to be marked as 'partition' columns for MapReduce shuffle
+      // phase, in order to gather the same keys to the same reducer instances.
+      keyCols.add(colExpr);
+      partCols.add(colExpr);

Review Comment:
   I didn't realise that the entire "customSortExprs" copying was part of 
HIVE-25975. Should be fine in this case. 
   
   +1 pending tests.
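The point made in the patch comment — that a custom sort expression must be both a KEY (per-reducer ordering) and a partition column (reducer routing) — can be illustrated with a toy hash-based router. This is a sketch only; `reducerFor` is a made-up stand-in for the MapReduce shuffle, not a Hive API:

```java
import java.util.Arrays;

public class ShuffleSketch {
    // Toy stand-in for the shuffle: a row is routed to a reducer by hashing
    // its *partition* columns only; sort keys merely order the rows that
    // arrive at each reducer.
    static int reducerFor(Object[] partitionCols, int numReducers) {
        return Math.floorMod(Arrays.hashCode(partitionCols), numReducers);
    }

    public static void main(String[] args) {
        int reducers = 4;
        // Two rows sharing the same custom-sort value: once that value is in
        // the partition columns, both rows are routed to the same reducer,
        // so their records end up grouped into one output file.
        int r1 = reducerFor(new Object[]{"2022-04-05"}, reducers);
        int r2 = reducerFor(new Object[]{"2022-04-05"}, reducers);
        if (r1 != r2) throw new AssertionError("equal keys must co-locate");
        System.out.println("both rows routed to reducer " + r1);
    }
}
```

If the expression were only a sort KEY, routing would fall back to other columns and equal values could scatter across reducers, which matches the many-small-files symptom reported in this issue.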





Issue Time Tracking
---

Worklog Id: (was: 752712)
Time Spent: 40m  (was: 0.5h)

> bulk insert into partitioned table creates lots of files in iceberg
> ---
>
> Key: HIVE-26110
> URL: https://issues.apache.org/jira/browse/HIVE-26110
> Project: Hive
>  Issue Type: Bug
>Reporter: Rajesh Balamohan
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 40m
>  Remaining Estimate: 0h
>
> For example, create the web_returns table from tpcds in Iceberg format and try 
> to copy over data from the regular table, along the lines of "insert into 
> web_returns_iceberg as select * from web_returns".
> This inserts the data correctly; however, there are a lot of files present in 
> each partition. IMO, the dynamic sort optimisation isn't working correctly, and 
> this causes records not to be grouped in the final phase.



--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Work logged] (HIVE-26114) JDBC connection to HiveServer2 using dfs command with prefix space will cause exception

2022-04-05 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-26114?focusedWorklogId=752710&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-752710
 ]

ASF GitHub Bot logged work on HIVE-26114:
-----------------------------------------

Author: ASF GitHub Bot
Created on: 05/Apr/22 08:40
Start Date: 05/Apr/22 08:40
Worklog Time Spent: 10m 
  Work Description: ming95 commented on PR #3176:
URL: https://github.com/apache/hive/pull/3176#issuecomment-1088429808

   @deniskuzZ @pvary @sankarh @adesh-rao @zabetak
   
   It's a simple bug. Could you guys please review the PR?




Issue Time Tracking
---

Worklog Id: (was: 752710)
Time Spent: 20m  (was: 10m)

> JDBC connection to HiveServer2 using dfs command with prefix space will cause 
> exception
> -
>
> Key: HIVE-26114
> URL: https://issues.apache.org/jira/browse/HIVE-26114
> Project: Hive
>  Issue Type: Bug
>  Components: JDBC
>Affects Versions: 2.3.8, 3.1.2
>Reporter: shezm
>Assignee: shezm
>Priority: Major
>  Labels: pull-request-available
> Fix For: 3.2.0
>
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> {code:java}
>         Connection con = 
> DriverManager.getConnection("jdbc:hive2://10.214.35.115:1/");
>         Statement stmt = con.createStatement();
>         // dfs command with prefix space or "\n"
>         ResultSet res = stmt.executeQuery(" dfs -ls /");
>         //ResultSet res = stmt.executeQuery("\ndfs -ls /"); {code}
> it will cause exception
> {code:java}
> Exception in thread "main" org.apache.hive.service.cli.HiveSQLException: 
> Error while processing statement: null
>     at org.apache.hive.jdbc.Utils.verifySuccess(Utils.java:231)
>     at org.apache.hive.jdbc.Utils.verifySuccessWithInfo(Utils.java:217)
>     at org.apache.hive.jdbc.HiveStatement.execute(HiveStatement.java:244)
>     at org.apache.hive.jdbc.HiveStatement.executeQuery(HiveStatement.java:375)
>     at com.ne.gdc.whitemane.shezm.TestJdbc.main(TestJdbc.java:30)
> Caused by: org.apache.hive.service.cli.HiveSQLException: Error while 
> processing statement: null
>     at 
> org.apache.hive.service.cli.operation.Operation.toSQLException(Operation.java:380)
>     at 
> org.apache.hive.service.cli.operation.HiveCommandOperation.runInternal(HiveCommandOperation.java:118)
>     at org.apache.hive.service.cli.operation.Operation.run(Operation.java:320)
>     at 
> org.apache.hive.service.cli.session.HiveSessionImpl.executeStatementInternal(HiveSessionImpl.java:530)
>     at 
> org.apache.hive.service.cli.session.HiveSessionImpl.executeStatementAsync(HiveSessionImpl.java:517)
>     at sun.reflect.GeneratedMethodAccessor65.invoke(Unknown Source)
>     at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>     at java.lang.reflect.Method.invoke(Method.java:498)
>     at 
> org.apache.hive.service.cli.session.HiveSessionProxy.invoke(HiveSessionProxy.java:78)
>     at 
> org.apache.hive.service.cli.session.HiveSessionProxy.access$000(HiveSessionProxy.java:36)
>     at 
> org.apache.hive.service.cli.session.HiveSessionProxy$1.run(HiveSessionProxy.java:63)
>     at java.security.AccessController.doPrivileged(Native Method)
>     at javax.security.auth.Subject.doAs(Subject.java:422)
>     at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1730)
>     at 
> org.apache.hive.service.cli.session.HiveSessionProxy.invoke(HiveSessionProxy.java:59)
>     at com.sun.proxy.$Proxy43.executeStatementAsync(Unknown Source)
>     at 
> org.apache.hive.service.cli.CLIService.executeStatementAsync(CLIService.java:310)
>     at 
> org.apache.hive.service.cli.thrift.ThriftCLIService.ExecuteStatement(ThriftCLIService.java:530)
>     at 
> org.apache.hive.service.rpc.thrift.TCLIService$Processor$ExecuteStatement.getResult(TCLIService.java:1437)
>     at 
> org.apache.hive.service.rpc.thrift.TCLIService$Processor$ExecuteStatement.getResult(TCLIService.java:1422)
>     at org.apache.thrift.ProcessFunction.process(ProcessFunction.java:39)
>     at org.apache.thrift.TBaseProcessor.process(TBaseProcessor.java:39)
>     at 
> org.apache.hadoop.hive.thrift.HadoopThriftAuthBridge$Server$TUGIAssumingProcessor.process(HadoopThriftAuthBridge.java:605)
>     at 
> org.apache.thrift.server.TThreadPoolServer$WorkerProcess.run(TThreadPoolServer.java:286)
>     at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
>     at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
>     at java.lang.Thread.run(Thread.java:748)
>  {code}
> But when I execute SQL with the "\n" prefix, it works fine:
> {code:java}
> ResultSet res = stmt.executeQuery("\n select 1"); {code}
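A minimal sketch of the suspected failure mode (hypothetical helper names, not the actual HiveServer2 dispatch code): prefix-matching the raw statement misses commands carrying leading whitespace, while trimming first handles both the space and the "\n" cases:

```java
public class CommandDispatchSketch {
    // Hypothetical helpers contrasting the two behaviours reported here.
    static boolean isDfsCommandNaive(String statement) {
        // Prefix match on the raw statement: misses " dfs -ls /" and "\ndfs -ls /".
        return statement.startsWith("dfs");
    }

    static boolean isDfsCommandTrimmed(String statement) {
        // Trimming first tolerates leading spaces and newlines.
        return statement.trim().startsWith("dfs");
    }

    public static void main(String[] args) {
        if (isDfsCommandNaive(" dfs -ls /")) throw new AssertionError();
        if (!isDfsCommandTrimmed(" dfs -ls /")) throw new AssertionError();
        if (!isDfsCommandTrimmed("\ndfs -ls /")) throw new AssertionError();
        System.out.println("trimmed dispatch recognizes both variants");
    }
}
```

This would also explain why "\n select 1" works: SQL statements go through the compiler path, which is tolerant of leading whitespace, while the command path is not.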



--
This message was sent 

[jira] [Commented] (HIVE-25540) Enable batch update of column stats only for MySql and Postgres

2022-04-05 Thread mahesh kumar behera (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-25540?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17517302#comment-17517302
 ] 

mahesh kumar behera commented on HIVE-25540:


[~zabetak] 

The batch update has been tested at scale only with the MySQL and Postgres backends. 

> Enable batch update of column stats only for MySql and Postgres 
> 
>
> Key: HIVE-25540
> URL: https://issues.apache.org/jira/browse/HIVE-25540
> Project: Hive
>  Issue Type: Sub-task
>Reporter: mahesh kumar behera
>Assignee: mahesh kumar behera
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0-alpha-1
>
>  Time Spent: 50m
>  Remaining Estimate: 0h
>
> The batch update of partition column stats using direct SQL has been tested 
> only for MySQL and Postgres.



--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Work logged] (HIVE-25492) Major query-based compaction is skipped if partition is empty

2022-04-05 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25492?focusedWorklogId=752705&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-752705
 ]

ASF GitHub Bot logged work on HIVE-25492:
-----------------------------------------

Author: ASF GitHub Bot
Created on: 05/Apr/22 08:21
Start Date: 05/Apr/22 08:21
Worklog Time Spent: 10m 
  Work Description: deniskuzZ commented on code in PR #3157:
URL: https://github.com/apache/hive/pull/3157#discussion_r842506372


##########
ql/src/java/org/apache/hadoop/hive/ql/io/AcidUtils.java:
##########
@@ -1480,6 +1482,57 @@ private static ValidTxnList getValidTxnList(Configuration conf) {
     return validTxnList;
   }
 
+
+  /**
+   * In case of the cleaner, we don't need to go into file level, it is enough to collect base/delta/deletedelta directories.
+   *
+   * @param fs the filesystem used for the directory lookup
+   * @param path the path of the table or partition needs to be cleaned
+   * @return The listed directory snapshot needs to be checked for cleaning
+   * @throws IOException on filesystem errors
+   */
+  public static Map<Path, HdfsDirSnapshot> getHdfsDirSnapshotsForCleaner(final FileSystem fs, final Path path)
+      throws IOException {
+    Map<Path, HdfsDirSnapshot> dirToSnapshots = new HashMap<>();
+    // depth first search
+    Deque<RemoteIterator<FileStatus>> stack = new ArrayDeque<>();
+    stack.push(fs.listStatusIterator(path));
+    while (!stack.isEmpty()) {
+      RemoteIterator<FileStatus> itr = stack.pop();
+      while (itr.hasNext()) {
+        FileStatus fStatus = itr.next();
+        Path fPath = fStatus.getPath();
+        if (acidHiddenFileFilter.accept(fPath) && acidTempDirFilter.accept(fPath)) {

Review Comment:
   We could use hiddenFileFilter, as we don't need to include METADATA_FILE & 
ACID_FORMAT.
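The snapshot collection in the patch walks the directory tree iteratively with an explicit stack instead of recursion. A runnable analogue of that traversal pattern on `java.nio.file` (illustrative only; `listRecursive` is not a Hive API) looks like:

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

public class DirSnapshotSketch {
    // Iterative depth-first directory listing: same shape as the patch's
    // Deque-of-iterators loop, but on the local filesystem for illustration.
    static List<Path> listRecursive(Path root) throws IOException {
        List<Path> seen = new ArrayList<>();
        Deque<Path> stack = new ArrayDeque<>();
        stack.push(root);
        while (!stack.isEmpty()) {
            try (DirectoryStream<Path> ds = Files.newDirectoryStream(stack.pop())) {
                for (Path p : ds) {
                    seen.add(p);
                    if (Files.isDirectory(p)) {
                        stack.push(p); // descend later, no recursion needed
                    }
                }
            }
        }
        return seen;
    }

    public static void main(String[] args) throws IOException {
        Path root = Files.createTempDirectory("snap");
        Files.createDirectories(root.resolve("base_1/sub"));
        Files.createFile(root.resolve("base_1/sub/bucket_0"));
        List<Path> seen = listRecursive(root);
        if (seen.size() != 3) throw new AssertionError(seen.toString());
        System.out.println("listed " + seen.size() + " entries");
    }
}
```

The explicit stack keeps memory bounded on deep partition trees and, as the Javadoc in the patch notes, lets the cleaner stop at directory granularity rather than descending into every file.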





Issue Time Tracking
---

Worklog Id: (was: 752705)
Time Spent: 2h  (was: 1h 50m)

> Major query-based compaction is skipped if partition is empty
> -
>
> Key: HIVE-25492
> URL: https://issues.apache.org/jira/browse/HIVE-25492
> Project: Hive
>  Issue Type: Bug
>Reporter: Karen Coppage
>Assignee: Antal Sinkovits
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 2h
>  Remaining Estimate: 0h
>
> Currently if the result of query-based compaction is an empty base, delta, or 
> delete delta, the empty directory is deleted.
> This is because of minor compaction – if there are only deltas to compact, 
> then no compacted delete delta should be created (only a compacted delta). In 
> the same way, if there are only delete deltas to compact, then no compacted 
> delta should be created (only a compacted delete delta).
> There is an issue with major compaction. If all the data in the partition has 
> been deleted, then we should get an empty base directory after compaction. 
> Instead, the empty base directory is deleted because it's empty and 
> compaction claims to succeed but we end up with the same deltas/delete deltas 
> we started with – basically compaction does not run.
> Where to start? MajorQueryCompactor#commitCompaction
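The fix hinted at above can be reduced to a small decision rule. This is a sketch only; `keepEmptyResultDir` is a hypothetical name, not the actual MajorQueryCompactor API: an empty base after MAJOR compaction is meaningful (it records that every row was deleted), while an empty delta or delete delta after MINOR compaction carries no information and can still be dropped.

```java
public class CommitCompactionSketch {
    enum CompactionType { MINOR, MAJOR }

    // Illustrative-only decision table: keep an empty result directory only
    // when it is the base produced by a major compaction.
    static boolean keepEmptyResultDir(CompactionType type, boolean isBaseDir) {
        return type == CompactionType.MAJOR && isBaseDir;
    }

    public static void main(String[] args) {
        // Major compaction of a fully-deleted partition: the empty base
        // must survive so the old deltas/delete deltas become obsolete.
        if (!keepEmptyResultDir(CompactionType.MAJOR, true)) throw new AssertionError();
        // Minor compaction: empty compacted deltas stay deletable.
        if (keepEmptyResultDir(CompactionType.MINOR, false)) throw new AssertionError();
        System.out.println("empty base survives major compaction");
    }
}
```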



--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Resolved] (HIVE-26105) Show columns shows extra values if column comments contains specific Chinese character

2022-04-05 Thread mahesh kumar behera (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-26105?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

mahesh kumar behera resolved HIVE-26105.

Resolution: Fixed

> Show columns shows extra values if column comments contains specific Chinese 
> character 
> ---
>
> Key: HIVE-26105
> URL: https://issues.apache.org/jira/browse/HIVE-26105
> Project: Hive
>  Issue Type: Bug
>  Components: Hive, HiveServer2
>Reporter: mahesh kumar behera
>Assignee: mahesh kumar behera
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> The issue happens because the encoded value of one of the Chinese characters 
> contains the byte 0x0D ('\r', CR). Because of this, the Hadoop line reader 
> (used by the fetch task in Hive) treats whatever follows that character as a 
> new value, and this extra junk value gets displayed. The problem character is 
> 0x540D 名: its low byte is 0x0D, i.e. 13, which the Hadoop line reader 
> interprets as CR ('\r'), so an extra junk value appears in the output. For 
> show columns, we do not need the comments, so when writing to the file only 
> the column names should be included.
> [https://github.com/apache/hadoop/blob/0fbd96a2449ec49f840d93e1c7d290c5218ef4ea/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/util/LineReader.java#L238]
>  
> {code:java}
> create table tbl_test  (fld0 string COMMENT  '期 ' , fld string COMMENT 
> '期末日期', fld1 string COMMENT '班次名称', fld2  string COMMENT '排班人数');
> show columns from tbl_test;
> ++
> | field  |
> ++
> | fld    |
> | fld0   |
> | fld1   |
> | �      |
> | fld2   |
> ++
> 5 rows selected (171.809 seconds)
>  {code}
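The arithmetic behind the report can be checked directly: the UTF-16 code unit for 名 is 0x540D, whose low byte is 0x0D, i.e. '\r'. This snippet only verifies that byte-level claim, not the exact serialization path through the fetch task:

```java
public class CarriageReturnSketch {
    public static void main(String[] args) {
        char c = '\u540D'; // 名
        // The low byte of the UTF-16 code unit is 0x0D, the carriage return
        // a byte-oriented line reader can mistake for a record terminator.
        if ((c & 0xFF) != 0x0D) throw new AssertionError();
        if ((char) (c & 0xFF) != '\r') throw new AssertionError();
        System.out.printf("U+%04X low byte = 0x%02X%n", (int) c, c & 0xFF);
    }
}
```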



--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Work logged] (HIVE-26105) Show columns shows extra values if column comments contains specific Chinese character

2022-04-05 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-26105?focusedWorklogId=752701&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-752701
 ]

ASF GitHub Bot logged work on HIVE-26105:
-----------------------------------------

Author: ASF GitHub Bot
Created on: 05/Apr/22 08:17
Start Date: 05/Apr/22 08:17
Worklog Time Spent: 10m 
  Work Description: maheshk114 merged PR #3166:
URL: https://github.com/apache/hive/pull/3166




Issue Time Tracking
---

Worklog Id: (was: 752701)
Time Spent: 0.5h  (was: 20m)

> Show columns shows extra values if column comments contains specific Chinese 
> character 
> ---
>
> Key: HIVE-26105
> URL: https://issues.apache.org/jira/browse/HIVE-26105
> Project: Hive
>  Issue Type: Bug
>  Components: Hive, HiveServer2
>Reporter: mahesh kumar behera
>Assignee: mahesh kumar behera
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> The issue happens because the encoded value of one of the Chinese characters 
> contains the byte 0x0D ('\r', CR). Because of this, the Hadoop line reader 
> (used by the fetch task in Hive) treats whatever follows that character as a 
> new value, and this extra junk value gets displayed. The problem character is 
> 0x540D 名: its low byte is 0x0D, i.e. 13, which the Hadoop line reader 
> interprets as CR ('\r'), so an extra junk value appears in the output. For 
> show columns, we do not need the comments, so when writing to the file only 
> the column names should be included.
> [https://github.com/apache/hadoop/blob/0fbd96a2449ec49f840d93e1c7d290c5218ef4ea/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/util/LineReader.java#L238]
>  
> {code:java}
> create table tbl_test  (fld0 string COMMENT  '期 ' , fld string COMMENT 
> '期末日期', fld1 string COMMENT '班次名称', fld2  string COMMENT '排班人数');
> show columns from tbl_test;
> ++
> | field  |
> ++
> | fld    |
> | fld0   |
> | fld1   |
> | �      |
> | fld2   |
> ++
> 5 rows selected (171.809 seconds)
>  {code}



--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Commented] (HIVE-26071) JWT authentication for Thrift over HTTP in HiveMetaStore

2022-04-05 Thread Alexis D. (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-26071?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17517293#comment-17517293
 ] 

Alexis D. commented on HIVE-26071:
----------------------------------

Hi, I understand from this issue that we can protect the Hive metastore with a 
JWT when running in HTTP mode. I have some questions related to this topic.

First question:
 * Would it be possible to run both protocols at the same time (Thrift and HTTP)?

Second question:

      - What about the Authenticator interface for getting, for example, the 
username or groups from the JWT claims? From what I saw (but I am not sure), the 
Authenticator interface is quite coupled with the Hadoop/Kerberos UGI?

      - Is there already a design today that allows something like a 
storage-based authorization implementation, where the authenticator can get 
information about who is authenticated without it being Hadoop-related?

 

Thanks!

> JWT authentication for Thrift over HTTP in HiveMetaStore
> 
>
> Key: HIVE-26071
> URL: https://issues.apache.org/jira/browse/HIVE-26071
> Project: Hive
>  Issue Type: New Feature
>  Components: Standalone Metastore
>Reporter: Sourabh Goyal
>Assignee: Sourabh Goyal
>Priority: Major
>
> HIVE-25575 recently added a support for JWT authentication in HS2. This Jira 
> aims to add the same feature in HMS



--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Work logged] (HIVE-26110) bulk insert into partitioned table creates lots of files in iceberg

2022-04-05 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-26110?focusedWorklogId=752670&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-752670
 ]

ASF GitHub Bot logged work on HIVE-26110:
-----------------------------------------

Author: ASF GitHub Bot
Created on: 05/Apr/22 07:21
Start Date: 05/Apr/22 07:21
Worklog Time Spent: 10m 
  Work Description: szlta commented on code in PR #3174:
URL: https://github.com/apache/hive/pull/3174#discussion_r842445223


##########
ql/src/java/org/apache/hadoop/hive/ql/optimizer/SortedDynPartitionOptimizer.java:
##########
@@ -648,7 +648,12 @@ public ReduceSinkOperator getReduceSinkOp(List<Integer> partitionPositions, List
     ArrayList<ExprNodeDesc> partCols = Lists.newArrayList();
 
     for (Function<List<ExprNodeDesc>, ExprNodeDesc> customSortExpr : customSortExprs) {
-      keyCols.add(customSortExpr.apply(allCols));
+      ExprNodeDesc colExpr = customSortExpr.apply(allCols);
+      // Custom sort expressions are marked as KEYs, which is required for sorting the rows that are going for
+      // a particular reducer instance. They also need to be marked as 'partition' columns for MapReduce shuffle
+      // phase, in order to gather the same keys to the same reducer instances.
+      keyCols.add(colExpr);
+      partCols.add(colExpr);

Review Comment:
   If customSortExprs are present, then we can be sure that partitionPositions 
is empty, as per 
https://github.com/apache/hive/blob/master/ql/src/java/org/apache/hadoop/hive/ql/optimizer/SortedDynPartitionOptimizer.java#L592-L596
   As for dpColNames - I'm not sure why it would matter. With Iceberg tables 
the table schema already contains the partition columns too; it's just that 
Hive doesn't treat these as partition columns, but rather as regular columns.
   I think the schema should be fine: all columns will serve as VALUE (with 
Iceberg we want to write the partition values into the file too, as in some 
cases the spec can have a non-identity type of partition transform), plus the 
ones identified by customSortExpr will additionally be added as KEY for 
sorting purposes (only).





Issue Time Tracking
---

Worklog Id: (was: 752670)
Time Spent: 0.5h  (was: 20m)

> bulk insert into partitioned table creates lots of files in iceberg
> ---
>
> Key: HIVE-26110
> URL: https://issues.apache.org/jira/browse/HIVE-26110
> Project: Hive
>  Issue Type: Bug
>Reporter: Rajesh Balamohan
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> For example, create the web_returns table from tpcds in Iceberg format and try 
> to copy over data from the regular table, along the lines of "insert into 
> web_returns_iceberg as select * from web_returns".
> This inserts the data correctly; however, there are a lot of files present in 
> each partition. IMO, the dynamic sort optimisation isn't working correctly, and 
> this causes records not to be grouped in the final phase.



--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Work logged] (HIVE-26113) Align HMS and metastore tables' schema

2022-04-05 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-26113?focusedWorklogId=752649&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-752649
 ]

ASF GitHub Bot logged work on HIVE-26113:
-----------------------------------------

Author: ASF GitHub Bot
Created on: 05/Apr/22 06:47
Start Date: 05/Apr/22 06:47
Worklog Time Spent: 10m 
  Work Description: pvary commented on PR #3175:
URL: https://github.com/apache/hive/pull/3175#issuecomment-1088327446

   Also, there are some qtests (sysdb.q) which check the number of tables, etc. 
This change might make them fail.




Issue Time Tracking
---

Worklog Id: (was: 752649)
Time Spent: 0.5h  (was: 20m)

> Align HMS and metastore tables' schema
> ---
>
> Key: HIVE-26113
> URL: https://issues.apache.org/jira/browse/HIVE-26113
> Project: Hive
>  Issue Type: Bug
>  Components: Metastore
>Affects Versions: 4.0.0-alpha-2
>Reporter: Alessandro Solimando
>Assignee: Alessandro Solimando
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0-alpha-2
>
>  Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> HMS tables should be in sync with those exposed by the Hive metastore via _sysdb_.
> At the moment there are some discrepancies for the existing tables; the 
> present ticket aims at bridging this gap.



--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Work logged] (HIVE-26113) Align HMS and metastore tables' schema

2022-04-05 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-26113?focusedWorklogId=752647&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-752647
 ]

ASF GitHub Bot logged work on HIVE-26113:
-----------------------------------------

Author: ASF GitHub Bot
Created on: 05/Apr/22 06:45
Start Date: 05/Apr/22 06:45
Worklog Time Spent: 10m 
  Work Description: pvary commented on PR #3175:
URL: https://github.com/apache/hive/pull/3175#issuecomment-1088326712

   Might be a different story, but I think it would be good to have some tests 
in place where we can at least run a single query against all of the tables on 
all of the supported databases. I am a bit concerned that we write wrong SQL 
and do not run any test against it.




Issue Time Tracking
---

Worklog Id: (was: 752647)
Time Spent: 20m  (was: 10m)

> Align HMS and metastore tables' schema
> ---
>
> Key: HIVE-26113
> URL: https://issues.apache.org/jira/browse/HIVE-26113
> Project: Hive
>  Issue Type: Bug
>  Components: Metastore
>Affects Versions: 4.0.0-alpha-2
>Reporter: Alessandro Solimando
>Assignee: Alessandro Solimando
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0-alpha-2
>
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> HMS tables should be in sync with those exposed by the Hive metastore via _sysdb_.
> At the moment there are some discrepancies for the existing tables; the 
> present ticket aims at bridging this gap.



--
This message was sent by Atlassian Jira
(v8.20.1#820001)