[jira] [Work logged] (HIVE-27150) Drop single partition can also support direct sql

2023-03-27 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-27150?focusedWorklogId=853290&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-853290
 ]

ASF GitHub Bot logged work on HIVE-27150:
-

Author: ASF GitHub Bot
Created on: 28/Mar/23 05:37
Start Date: 28/Mar/23 05:37
Worklog Time Spent: 10m 
  Work Description: saihemanth-cloudera commented on code in PR #4123:
URL: https://github.com/apache/hive/pull/4123#discussion_r1150050461


##
standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/ObjectStore.java:
##
@@ -3101,6 +3100,22 @@ public boolean dropPartition(String catName, String dbName, String tableName,
     return success;
   }
 
+  @Override
+  public boolean dropPartition(String catName, String dbName, String tableName, String partName)
+      throws MetaException, NoSuchObjectException, InvalidObjectException, InvalidInputException {
+    boolean success = false;
+    try {
+      openTransaction();
+      dropPartitionsInternal(catName, dbName, tableName, Arrays.asList(partName), true, true);

Review Comment:
   cc @VenuReddy2103 





Issue Time Tracking
---

Worklog Id: (was: 853290)
Time Spent: 1.5h  (was: 1h 20m)

> Drop single partition can also support direct sql
> -
>
> Key: HIVE-27150
> URL: https://issues.apache.org/jira/browse/HIVE-27150
> Project: Hive
>  Issue Type: Improvement
>  Components: Hive
>Reporter: Wechar
>Assignee: Wechar
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 1.5h
>  Remaining Estimate: 0h
>
> *Background:*
> [HIVE-6980|https://issues.apache.org/jira/browse/HIVE-6980] added direct 
> SQL support to drop_partitions; drop_partition can reuse this substantial improvement.
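The idea behind the patch can be sketched as follows. This is a hypothetical, simplified model, not the real ObjectStore class: a single-partition drop wraps its partition name in a one-element list and delegates to the existing multi-partition path, inheriting the direct-SQL fast path that HIVE-6980 added for drop_partitions.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch -- method names mirror ObjectStore's, but this is a stub.
class PartitionStoreSketch {
    int droppedCount = 0; // counts partitions handed to the (stubbed) batch path

    // Stand-in for ObjectStore.dropPartitionsInternal: in the real metastore this
    // is where direct SQL (with a JDO fallback) does the heavy lifting.
    boolean dropPartitionsInternal(String catName, String dbName, String tableName,
                                   List<String> partNames,
                                   boolean allowSql, boolean allowJdo) {
        droppedCount += partNames.size();
        return true;
    }

    // Single-partition overload: reuse the batch path instead of a separate JDO walk.
    boolean dropPartition(String catName, String dbName, String tableName, String partName) {
        return dropPartitionsInternal(catName, dbName, tableName,
                Arrays.asList(partName), true, true);
    }

    public static void main(String[] args) {
        PartitionStoreSketch store = new PartitionStoreSketch();
        boolean ok = store.dropPartition("hive", "default", "sales", "ds=2023-03-27");
        System.out.println(ok + ", partitions dropped: " + store.droppedCount);
    }
}
```

The design point is that only one code path needs the direct-SQL optimization; every caller, batch or single, then benefits from it.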



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Work logged] (HIVE-27180) Remove JsonSerde from hcatalog, Upgrade should update changed FQN for JsonSerDe in HMS DB

2023-03-27 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-27180?focusedWorklogId=853288&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-853288
 ]

ASF GitHub Bot logged work on HIVE-27180:
-

Author: ASF GitHub Bot
Created on: 28/Mar/23 04:47
Start Date: 28/Mar/23 04:47
Worklog Time Spent: 10m 
  Work Description: rtrivedi12 commented on code in PR #4159:
URL: https://github.com/apache/hive/pull/4159#discussion_r1150021419


##
standalone-metastore/metastore-server/src/main/sql/oracle/upgrade-3.2.0-to-4.0.0-alpha-1.oracle.sql:
##
@@ -149,6 +149,9 @@ CREATE INDEX CTLG_NAME_DBS ON DBS(CTLG_NAME);
 

Issue Time Tracking
---

Worklog Id: (was: 853288)
Time Spent: 1h  (was: 50m)

> Remove JsonSerde from hcatalog, Upgrade should update changed FQN for 
> JsonSerDe in HMS DB 
> --
>
> Key: HIVE-27180
> URL: https://issues.apache.org/jira/browse/HIVE-27180
> Project: Hive
>  Issue Type: Sub-task
>  Components: Hive
>Reporter: Riju Trivedi
>Assignee: Riju Trivedi
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 1h
>  Remaining Estimate: 0h
>
> As the HCatalog JsonSerDe delegates to the "serde2" implementation, remove 
> org.apache.hive.hcatalog.data.JsonSerDe from hive-hcatalog and fix tests 
> to use the new SerDe class, org.apache.hadoop.hive.serde2.JsonSerDe. The Hive 
> upgrade schema script can update the SERDES table to point at the new class 
> name, so existing tables continue to work automatically.
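The FQN rewrite described above can be modeled as below. The real change is a SQL UPDATE against the SERDES table in each upgrade-3.2.0-to-4.0.0 script; this hedged sketch expresses the same mapping in plain Java so its intent is explicit. The `migrate` helper name is hypothetical; the two class names come from the issue text.

```java
// Hypothetical model of the SLIB rewrite the upgrade scripts perform.
class SerdeFqnMigration {
    static final String OLD_FQN = "org.apache.hive.hcatalog.data.JsonSerDe";
    static final String NEW_FQN = "org.apache.hadoop.hive.serde2.JsonSerDe";

    // Rewrite a stored SLIB value; anything other than the removed HCatalog class
    // passes through untouched, so existing tables keep working automatically.
    static String migrate(String slib) {
        return OLD_FQN.equals(slib) ? NEW_FQN : slib;
    }

    public static void main(String[] args) {
        System.out.println(migrate(OLD_FQN));                 // rewritten to the serde2 class
        System.out.println(migrate("some.other.SerDeClass")); // unchanged
    }
}
```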



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Work logged] (HIVE-27180) Remove JsonSerde from hcatalog, Upgrade should update changed FQN for JsonSerDe in HMS DB

2023-03-27 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-27180?focusedWorklogId=853287&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-853287
 ]

ASF GitHub Bot logged work on HIVE-27180:
-

Author: ASF GitHub Bot
Created on: 28/Mar/23 04:47
Start Date: 28/Mar/23 04:47
Worklog Time Spent: 10m 
  Work Description: rtrivedi12 commented on code in PR #4159:
URL: https://github.com/apache/hive/pull/4159#discussion_r1150021322


##
standalone-metastore/metastore-server/src/main/sql/mssql/upgrade-3.2.0-to-4.0.0-alpha-1.mssql.sql:
##
@@ -176,6 +176,9 @@ ALTER TABLE COMPACTION_QUEUE ADD CQ_COMMIT_TIME bigint NULL;
 

Issue Time Tracking
---

Worklog Id: (was: 853287)
Time Spent: 50m  (was: 40m)

> Remove JsonSerde from hcatalog, Upgrade should update changed FQN for 
> JsonSerDe in HMS DB 
> --
>
> Key: HIVE-27180
> URL: https://issues.apache.org/jira/browse/HIVE-27180
> Project: Hive
>  Issue Type: Sub-task
>  Components: Hive
>Reporter: Riju Trivedi
>Assignee: Riju Trivedi
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 50m
>  Remaining Estimate: 0h
>
> As the HCatalog JsonSerDe delegates to the "serde2" implementation, remove 
> org.apache.hive.hcatalog.data.JsonSerDe from hive-hcatalog and fix tests 
> to use the new SerDe class, org.apache.hadoop.hive.serde2.JsonSerDe. The Hive 
> upgrade schema script can update the SERDES table to point at the new class 
> name, so existing tables continue to work automatically.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Work logged] (HIVE-27180) Remove JsonSerde from hcatalog, Upgrade should update changed FQN for JsonSerDe in HMS DB

2023-03-27 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-27180?focusedWorklogId=853286&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-853286
 ]

ASF GitHub Bot logged work on HIVE-27180:
-

Author: ASF GitHub Bot
Created on: 28/Mar/23 04:46
Start Date: 28/Mar/23 04:46
Worklog Time Spent: 10m 
  Work Description: rtrivedi12 commented on code in PR #4159:
URL: https://github.com/apache/hive/pull/4159#discussion_r1150021190


##
standalone-metastore/metastore-server/src/main/sql/derby/upgrade-3.2.0-to-4.0.0-alpha-1.derby.sql:
##
@@ -152,6 +152,9 @@ ALTER TABLE COMPACTION_QUEUE ADD CQ_COMMIT_TIME bigint;
 

Issue Time Tracking
---

Worklog Id: (was: 853286)
Time Spent: 40m  (was: 0.5h)

> Remove JsonSerde from hcatalog, Upgrade should update changed FQN for 
> JsonSerDe in HMS DB 
> --
>
> Key: HIVE-27180
> URL: https://issues.apache.org/jira/browse/HIVE-27180
> Project: Hive
>  Issue Type: Sub-task
>  Components: Hive
>Reporter: Riju Trivedi
>Assignee: Riju Trivedi
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 40m
>  Remaining Estimate: 0h
>
> As the HCatalog JsonSerDe delegates to the "serde2" implementation, remove 
> org.apache.hive.hcatalog.data.JsonSerDe from hive-hcatalog and fix tests 
> to use the new SerDe class, org.apache.hadoop.hive.serde2.JsonSerDe. The Hive 
> upgrade schema script can update the SERDES table to point at the new class 
> name, so existing tables continue to work automatically.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Work logged] (HIVE-27179) HS2 WebUI throws NPE when JspFactory loaded from jetty-runner

2023-03-27 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-27179?focusedWorklogId=853285&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-853285
 ]

ASF GitHub Bot logged work on HIVE-27179:
-

Author: ASF GitHub Bot
Created on: 28/Mar/23 04:13
Start Date: 28/Mar/23 04:13
Worklog Time Spent: 10m 
  Work Description: sonarcloud[bot] commented on PR #4164:
URL: https://github.com/apache/hive/pull/4164#issuecomment-1486190580

   Kudos, SonarCloud Quality Gate passed!
   
   [0 Bugs](https://sonarcloud.io/project/issues?id=apache_hive&pullRequest=4164&resolved=false&types=BUG) (rating A)
   [0 Vulnerabilities](https://sonarcloud.io/project/issues?id=apache_hive&pullRequest=4164&resolved=false&types=VULNERABILITY) (rating A)
   [0 Security Hotspots](https://sonarcloud.io/project/security_hotspots?id=apache_hive&pullRequest=4164&resolved=false&types=SECURITY_HOTSPOT) (rating A)
   [0 Code Smells](https://sonarcloud.io/project/issues?id=apache_hive&pullRequest=4164&resolved=false&types=CODE_SMELL) (rating A)
   No Coverage information
   No Duplication information
   




Issue Time Tracking
---

Worklog Id: (was: 853285)
Time Spent: 20m  (was: 10m)

> HS2 WebUI throws NPE when JspFactory loaded from jetty-runner
> -
>
> Key: HIVE-27179
> URL: https://issues.apache.org/jira/browse/HIVE-27179
> Project: Hive
>  Issue Type: Bug
>  Components: HiveServer2
>Reporter: Zhihua Deng
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> In HIVE-17088, we resolved an NPE thrown from the HS2 WebUI by introducing 
> javax.servlet.jsp-api. This works as expected when the javax.servlet.jsp-api 
> jar takes precedence over the jetty-runner jar, but in some environments the 
> ordering differs and an NPE is still thrown when opening the HS2 web UI:
> {noformat}
> java.lang.NullPointerException 
> at 
> org.apache.hive.generated.hiveserver2.hiveserver2_jsp._jspService(hiveserver2_jsp.java:286)
>  
> at org.apache.jasper.runtime.HttpJspBase.service(HttpJspBase.java:71) 
> at javax.servlet.http.HttpServlet.service(HttpServlet.java:790) 
> at 
> org.eclipse.jetty.servlet.ServletHolder$NotAsync.service(ServletHolder.java:1443)
>  
> at org.eclipse.jetty.servlet.ServletHolder.handle(ServletHolder.java:791) 
> at 
> org.eclipse.jetty.servlet.ServletHandler$ChainEnd.doFilter(ServletHandler.java:1626)
> ...{noformat}
> The jetty-runner JspFactory.getDefaultFactory() just returns null.
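The failure mode quoted above can be illustrated with a self-contained sketch. The class and method names here are stand-ins, not the real javax.servlet.jsp API, and whether the actual PR fixes it this way is an assumption: if the JspFactory implementation that wins the classpath race leaves the default factory null, every JSP service call dereferences null; a guard that installs a working factory before pages are served avoids the NPE.

```java
// Stand-in model of JspFactory default-factory resolution; not the servlet API.
class JspFactoryGuardDemo {
    static Object defaultFactory = null; // jetty-runner's loader leaves this null

    static Object getDefaultFactory() { return defaultFactory; }
    static void setDefaultFactory(Object f) { defaultFactory = f; }

    // Hypothetical guard: install a working factory only if none was registered.
    static void installFallbackIfMissing() {
        if (getDefaultFactory() == null) {
            setDefaultFactory(new Object()); // stand-in for Jasper's JspFactoryImpl
        }
    }

    public static void main(String[] args) {
        installFallbackIfMissing();
        // Page-service code can now call getDefaultFactory() without an NPE.
        System.out.println(getDefaultFactory() != null);
    }
}
```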



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (HIVE-27179) HS2 WebUI throws NPE when JspFactory loaded from jetty-runner

2023-03-27 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-27179?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated HIVE-27179:
--
Labels: pull-request-available  (was: )

> HS2 WebUI throws NPE when JspFactory loaded from jetty-runner
> -
>
> Key: HIVE-27179
> URL: https://issues.apache.org/jira/browse/HIVE-27179
> Project: Hive
>  Issue Type: Bug
>  Components: HiveServer2
>Reporter: Zhihua Deng
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> In HIVE-17088, we resolved an NPE thrown from the HS2 WebUI by introducing 
> javax.servlet.jsp-api. This works as expected when the javax.servlet.jsp-api 
> jar takes precedence over the jetty-runner jar, but in some environments the 
> ordering differs and an NPE is still thrown when opening the HS2 web UI:
> {noformat}
> java.lang.NullPointerException 
> at 
> org.apache.hive.generated.hiveserver2.hiveserver2_jsp._jspService(hiveserver2_jsp.java:286)
>  
> at org.apache.jasper.runtime.HttpJspBase.service(HttpJspBase.java:71) 
> at javax.servlet.http.HttpServlet.service(HttpServlet.java:790) 
> at 
> org.eclipse.jetty.servlet.ServletHolder$NotAsync.service(ServletHolder.java:1443)
>  
> at org.eclipse.jetty.servlet.ServletHolder.handle(ServletHolder.java:791) 
> at 
> org.eclipse.jetty.servlet.ServletHandler$ChainEnd.doFilter(ServletHandler.java:1626)
> ...{noformat}
> The jetty-runner JspFactory.getDefaultFactory() just returns null.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Work logged] (HIVE-27179) HS2 WebUI throws NPE when JspFactory loaded from jetty-runner

2023-03-27 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-27179?focusedWorklogId=853283&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-853283
 ]

ASF GitHub Bot logged work on HIVE-27179:
-

Author: ASF GitHub Bot
Created on: 28/Mar/23 03:21
Start Date: 28/Mar/23 03:21
Worklog Time Spent: 10m 
  Work Description: dengzhhu653 opened a new pull request, #4164:
URL: https://github.com/apache/hive/pull/4164

   
   
   
   ### What changes were proposed in this pull request?
   
   
   
   ### Why are the changes needed?
   When the jetty-runner jar wins over the javax.servlet.jsp-api jar for loading 
JspFactory, an NPE is thrown when opening the HS2 home page.
   
   
   
   ### Does this PR introduce _any_ user-facing change?
   No
   
   
   
   ### How was this patch tested?
   Applied and compiled the changes, placed the jetty-runner jar at the front of 
the CLASSPATH, and restarted the affected HS2; the NPE is gone.
   
   




Issue Time Tracking
---

Worklog Id: (was: 853283)
Remaining Estimate: 0h
Time Spent: 10m

> HS2 WebUI throws NPE when JspFactory loaded from jetty-runner
> -
>
> Key: HIVE-27179
> URL: https://issues.apache.org/jira/browse/HIVE-27179
> Project: Hive
>  Issue Type: Bug
>  Components: HiveServer2
>Reporter: Zhihua Deng
>Priority: Major
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> In HIVE-17088, we resolved an NPE thrown from the HS2 WebUI by introducing 
> javax.servlet.jsp-api. This works as expected when the javax.servlet.jsp-api 
> jar takes precedence over the jetty-runner jar, but in some environments the 
> ordering differs and an NPE is still thrown when opening the HS2 web UI:
> {noformat}
> java.lang.NullPointerException 
> at 
> org.apache.hive.generated.hiveserver2.hiveserver2_jsp._jspService(hiveserver2_jsp.java:286)
>  
> at org.apache.jasper.runtime.HttpJspBase.service(HttpJspBase.java:71) 
> at javax.servlet.http.HttpServlet.service(HttpServlet.java:790) 
> at 
> org.eclipse.jetty.servlet.ServletHolder$NotAsync.service(ServletHolder.java:1443)
>  
> at org.eclipse.jetty.servlet.ServletHolder.handle(ServletHolder.java:791) 
> at 
> org.eclipse.jetty.servlet.ServletHandler$ChainEnd.doFilter(ServletHandler.java:1626)
> ...{noformat}
> The jetty-runner JspFactory.getDefaultFactory() just returns null.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Work logged] (HIVE-27180) Remove JsonSerde from hcatalog, Upgrade should update changed FQN for JsonSerDe in HMS DB

2023-03-27 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-27180?focusedWorklogId=853281&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-853281
 ]

ASF GitHub Bot logged work on HIVE-27180:
-

Author: ASF GitHub Bot
Created on: 28/Mar/23 03:10
Start Date: 28/Mar/23 03:10
Worklog Time Spent: 10m 
  Work Description: nrg4878 commented on code in PR #4159:
URL: https://github.com/apache/hive/pull/4159#discussion_r1149977911


##
standalone-metastore/metastore-server/src/main/sql/mssql/upgrade-3.2.0-to-4.0.0-alpha-1.mssql.sql:
##
@@ -176,6 +176,9 @@ ALTER TABLE COMPACTION_QUEUE ADD CQ_COMMIT_TIME bigint NULL;
 

Issue Time Tracking
---

Worklog Id: (was: 853281)
Time Spent: 0.5h  (was: 20m)

> Remove JsonSerde from hcatalog, Upgrade should update changed FQN for 
> JsonSerDe in HMS DB 
> --
>
> Key: HIVE-27180
> URL: https://issues.apache.org/jira/browse/HIVE-27180
> Project: Hive
>  Issue Type: Sub-task
>  Components: Hive
>Reporter: Riju Trivedi
>Assignee: Riju Trivedi
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> As the HCatalog JsonSerDe delegates to the "serde2" implementation, remove 
> org.apache.hive.hcatalog.data.JsonSerDe from hive-hcatalog and fix tests 
> to use the new SerDe class, org.apache.hadoop.hive.serde2.JsonSerDe. The Hive 
> upgrade schema script can update the SERDES table to point at the new class 
> name, so existing tables continue to work automatically.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (HIVE-27183) Iceberg: Table information is loaded multiple times

2023-03-27 Thread Rajesh Balamohan (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-27183?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Rajesh Balamohan updated HIVE-27183:

Description: 
HMS::getTable invokes "HiveIcebergMetaHook::postGetTable", which internally 
loads the Iceberg table again.

If this reload isn't needed, or is needed only for show-create-table, do not 
load the table again.

 

Note: loadTable appears to be invoked around 6 times during the entire planning 
phase (semantic analyzer, stats, etc.). Attached the snapshot for reference.

 
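The repeated loads could be avoided by memoizing the table per query, as in this hypothetical sketch (not Hive/Iceberg code). IcebergTableUtil.getTable already consults a per-query state via Optional.orElseGet, as visible in the trace below; the sketch reduces that idea to its essentials: the ~6 loadTable calls during planning collapse into one real metadata read.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;

// Hypothetical per-query memoization of table metadata loads.
class TableLoadCache {
    int realLoads = 0; // how many times the expensive loader actually ran
    private final Map<String, String> cache = new HashMap<>();

    // loader stands in for the expensive Catalogs.loadTable metadata read.
    String getTable(String name, Function<String, String> loader) {
        return cache.computeIfAbsent(name, n -> {
            realLoads++;
            return loader.apply(n);
        });
    }

    public static void main(String[] args) {
        TableLoadCache cache = new TableLoadCache();
        for (int i = 0; i < 6; i++) {
            // planner components (semantic analyzer, stats, ...) ask repeatedly
            cache.getTable("db.tbl", n -> "metadata-for-" + n);
        }
        System.out.println("real loads: " + cache.realLoads); // prints "real loads: 1"
    }
}
```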
{noformat}
    at jdk.internal.misc.Unsafe.park(java.base@11.0.18/Native Method)
    - parking to wait for  <0x00066f84eef0> (a 
java.util.concurrent.CompletableFuture$Signaller)
    at 
java.util.concurrent.locks.LockSupport.park(java.base@11.0.18/LockSupport.java:194)
    at 
java.util.concurrent.CompletableFuture$Signaller.block(java.base@11.0.18/CompletableFuture.java:1796)
    at 
java.util.concurrent.ForkJoinPool.managedBlock(java.base@11.0.18/ForkJoinPool.java:3128)
    at 
java.util.concurrent.CompletableFuture.waitingGet(java.base@11.0.18/CompletableFuture.java:1823)
    at 
java.util.concurrent.CompletableFuture.get(java.base@11.0.18/CompletableFuture.java:1998)
    at org.apache.hadoop.util.functional.FutureIO.awaitFuture(FutureIO.java:77)
    at 
org.apache.iceberg.hadoop.HadoopInputFile.newStream(HadoopInputFile.java:196)
    at org.apache.iceberg.TableMetadataParser.read(TableMetadataParser.java:263)
    at org.apache.iceberg.TableMetadataParser.read(TableMetadataParser.java:258)
    at 
org.apache.iceberg.BaseMetastoreTableOperations.lambda$refreshFromMetadataLocation$0(BaseMetastoreTableOperations.java:177)
    at 
org.apache.iceberg.BaseMetastoreTableOperations$$Lambda$609/0x000840e18040.apply(Unknown
 Source)
    at 
org.apache.iceberg.BaseMetastoreTableOperations.lambda$refreshFromMetadataLocation$1(BaseMetastoreTableOperations.java:191)
    at 
org.apache.iceberg.BaseMetastoreTableOperations$$Lambda$610/0x000840e18440.run(Unknown
 Source)
    at org.apache.iceberg.util.Tasks$Builder.runTaskWithRetry(Tasks.java:404)
    at org.apache.iceberg.util.Tasks$Builder.runSingleThreaded(Tasks.java:214)
    at org.apache.iceberg.util.Tasks$Builder.run(Tasks.java:198)
    at org.apache.iceberg.util.Tasks$Builder.run(Tasks.java:190)
    at 
org.apache.iceberg.BaseMetastoreTableOperations.refreshFromMetadataLocation(BaseMetastoreTableOperations.java:191)
    at 
org.apache.iceberg.BaseMetastoreTableOperations.refreshFromMetadataLocation(BaseMetastoreTableOperations.java:176)
    at 
org.apache.iceberg.BaseMetastoreTableOperations.refreshFromMetadataLocation(BaseMetastoreTableOperations.java:171)
    at 
org.apache.iceberg.hive.HiveTableOperations.doRefresh(HiveTableOperations.java:153)
    at 
org.apache.iceberg.BaseMetastoreTableOperations.refresh(BaseMetastoreTableOperations.java:96)
    at 
org.apache.iceberg.BaseMetastoreTableOperations.current(BaseMetastoreTableOperations.java:79)
    at 
org.apache.iceberg.BaseMetastoreCatalog.loadTable(BaseMetastoreCatalog.java:44)
    at org.apache.iceberg.mr.Catalogs.loadTable(Catalogs.java:115)
    at org.apache.iceberg.mr.Catalogs.loadTable(Catalogs.java:105)
    at 
org.apache.iceberg.mr.hive.IcebergTableUtil.lambda$getTable$1(IcebergTableUtil.java:99)
    at 
org.apache.iceberg.mr.hive.IcebergTableUtil$$Lambda$552/0x000840d59840.apply(Unknown
 Source)
    at 
org.apache.iceberg.mr.hive.IcebergTableUtil.lambda$getTable$4(IcebergTableUtil.java:111)
    at 
org.apache.iceberg.mr.hive.IcebergTableUtil$$Lambda$557/0x000840d58c40.get(Unknown
 Source)
    at java.util.Optional.orElseGet(java.base@11.0.18/Optional.java:369)
    at 
org.apache.iceberg.mr.hive.IcebergTableUtil.getTable(IcebergTableUtil.java:108)
    at 
org.apache.iceberg.mr.hive.IcebergTableUtil.getTable(IcebergTableUtil.java:69)
    at 
org.apache.iceberg.mr.hive.IcebergTableUtil.getTable(IcebergTableUtil.java:73)
    at 
org.apache.iceberg.mr.hive.HiveIcebergMetaHook.postGetTable(HiveIcebergMetaHook.java:931)
    at 
org.apache.hadoop.hive.metastore.HiveMetaStoreClient.executePostGetTableHook(HiveMetaStoreClient.java:2638)
    at 
org.apache.hadoop.hive.metastore.HiveMetaStoreClient.getTable(HiveMetaStoreClient.java:2624)
    at 
org.apache.hadoop.hive.ql.metadata.SessionHiveMetaStoreClient.getTable(SessionHiveMetaStoreClient.java:267)
    at jdk.internal.reflect.GeneratedMethodAccessor137.invoke(Unknown Source)
    at 
jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(java.base@11.0.18/DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(java.base@11.0.18/Method.java:566)
    at 
org.apache.hadoop.hive.metastore.RetryingMetaStoreClient.invoke(RetryingMetaStoreClient.java:216)
    at com.sun.proxy.$Proxy56.getTable(Unknown Source)
    at jdk.internal.reflect.GeneratedMethodAccessor137.invoke(Unknown Source)
    at 

[jira] [Updated] (HIVE-27183) Iceberg: Table information is loaded multiple times

2023-03-27 Thread Rajesh Balamohan (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-27183?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Rajesh Balamohan updated HIVE-27183:

Attachment: Screenshot 2023-03-28 at 8.13.52 AM.png

> Iceberg: Table information is loaded multiple times
> ---
>
> Key: HIVE-27183
> URL: https://issues.apache.org/jira/browse/HIVE-27183
> Project: Hive
>  Issue Type: Improvement
>Reporter: Rajesh Balamohan
>Priority: Major
>  Labels: performance
> Attachments: Screenshot 2023-03-28 at 8.13.52 AM.png, 
> hs2_iceberg_load.html
>
>
> HMS::getTable invokes "HiveIcebergMetaHook::postGetTable", which internally 
> loads the Iceberg table again.
> If this reload isn't needed, or is needed only for show-create-table, do not 
> load the table again.
>  
> Note: loadTable appears to be invoked around 6 times during the entire 
> planning phase (semantic analyzer, stats, etc.). Attached the snapshot for reference.
>  
> {noformat}
>     at jdk.internal.misc.Unsafe.park(java.base@11.0.18/Native Method)
>     - parking to wait for  <0x00066f84eef0> (a 
> java.util.concurrent.CompletableFuture$Signaller)
>     at 
> java.util.concurrent.locks.LockSupport.park(java.base@11.0.18/LockSupport.java:194)
>     at 
> java.util.concurrent.CompletableFuture$Signaller.block(java.base@11.0.18/CompletableFuture.java:1796)
>     at 
> java.util.concurrent.ForkJoinPool.managedBlock(java.base@11.0.18/ForkJoinPool.java:3128)
>     at 
> java.util.concurrent.CompletableFuture.waitingGet(java.base@11.0.18/CompletableFuture.java:1823)
>     at 
> java.util.concurrent.CompletableFuture.get(java.base@11.0.18/CompletableFuture.java:1998)
>     at 
> org.apache.hadoop.util.functional.FutureIO.awaitFuture(FutureIO.java:77)
>     at 
> org.apache.iceberg.hadoop.HadoopInputFile.newStream(HadoopInputFile.java:196)
>     at 
> org.apache.iceberg.TableMetadataParser.read(TableMetadataParser.java:263)
>     at 
> org.apache.iceberg.TableMetadataParser.read(TableMetadataParser.java:258)
>     at 
> org.apache.iceberg.BaseMetastoreTableOperations.lambda$refreshFromMetadataLocation$0(BaseMetastoreTableOperations.java:177)
>     at 
> org.apache.iceberg.BaseMetastoreTableOperations$$Lambda$609/0x000840e18040.apply(Unknown
>  Source)
>     at 
> org.apache.iceberg.BaseMetastoreTableOperations.lambda$refreshFromMetadataLocation$1(BaseMetastoreTableOperations.java:191)
>     at 
> org.apache.iceberg.BaseMetastoreTableOperations$$Lambda$610/0x000840e18440.run(Unknown
>  Source)
>     at org.apache.iceberg.util.Tasks$Builder.runTaskWithRetry(Tasks.java:404)
>     at org.apache.iceberg.util.Tasks$Builder.runSingleThreaded(Tasks.java:214)
>     at org.apache.iceberg.util.Tasks$Builder.run(Tasks.java:198)
>     at org.apache.iceberg.util.Tasks$Builder.run(Tasks.java:190)
>     at 
> org.apache.iceberg.BaseMetastoreTableOperations.refreshFromMetadataLocation(BaseMetastoreTableOperations.java:191)
>     at 
> org.apache.iceberg.BaseMetastoreTableOperations.refreshFromMetadataLocation(BaseMetastoreTableOperations.java:176)
>     at 
> org.apache.iceberg.BaseMetastoreTableOperations.refreshFromMetadataLocation(BaseMetastoreTableOperations.java:171)
>     at 
> org.apache.iceberg.hive.HiveTableOperations.doRefresh(HiveTableOperations.java:153)
>     at 
> org.apache.iceberg.BaseMetastoreTableOperations.refresh(BaseMetastoreTableOperations.java:96)
>     at 
> org.apache.iceberg.BaseMetastoreTableOperations.current(BaseMetastoreTableOperations.java:79)
>     at 
> org.apache.iceberg.BaseMetastoreCatalog.loadTable(BaseMetastoreCatalog.java:44)
>     at org.apache.iceberg.mr.Catalogs.loadTable(Catalogs.java:115)
>     at org.apache.iceberg.mr.Catalogs.loadTable(Catalogs.java:105)
>     at 
> org.apache.iceberg.mr.hive.IcebergTableUtil.lambda$getTable$1(IcebergTableUtil.java:99)
>     at 
> org.apache.iceberg.mr.hive.IcebergTableUtil$$Lambda$552/0x000840d59840.apply(Unknown
>  Source)
>     at 
> org.apache.iceberg.mr.hive.IcebergTableUtil.lambda$getTable$4(IcebergTableUtil.java:111)
>     at 
> org.apache.iceberg.mr.hive.IcebergTableUtil$$Lambda$557/0x000840d58c40.get(Unknown
>  Source)
>     at java.util.Optional.orElseGet(java.base@11.0.18/Optional.java:369)
>     at 
> org.apache.iceberg.mr.hive.IcebergTableUtil.getTable(IcebergTableUtil.java:108)
>     at 
> org.apache.iceberg.mr.hive.IcebergTableUtil.getTable(IcebergTableUtil.java:69)
>     at 
> org.apache.iceberg.mr.hive.IcebergTableUtil.getTable(IcebergTableUtil.java:73)
>     at 
> org.apache.iceberg.mr.hive.HiveIcebergMetaHook.postGetTable(HiveIcebergMetaHook.java:931)
>     at 
> org.apache.hadoop.hive.metastore.HiveMetaStoreClient.executePostGetTableHook(HiveMetaStoreClient.java:2638)
>     at 
> org.apache.hadoop.hive.metastore.HiveMetaStoreClient.getTable(HiveMetaStoreClient.java:2624)
>     at 
> 

[jira] [Updated] (HIVE-27183) Iceberg: Table information is loaded multiple times

2023-03-27 Thread Rajesh Balamohan (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-27183?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Rajesh Balamohan updated HIVE-27183:

Attachment: hs2_iceberg_load.html

> Iceberg: Table information is loaded multiple times
> ---
>
> Key: HIVE-27183
> URL: https://issues.apache.org/jira/browse/HIVE-27183
> Project: Hive
>  Issue Type: Improvement
>Reporter: Rajesh Balamohan
>Priority: Major
>  Labels: performance
> Attachments: hs2_iceberg_load.html
>
>
> HMS::getTable invokes "HiveIcebergMetaHook::postGetTable", which internally 
> loads the Iceberg table again.
> If this reload isn't needed, or is needed only for show-create-table, do not 
> load the table again.
> {noformat}
>     at jdk.internal.misc.Unsafe.park(java.base@11.0.18/Native Method)
>     - parking to wait for  <0x00066f84eef0> (a 
> java.util.concurrent.CompletableFuture$Signaller)
>     at 
> java.util.concurrent.locks.LockSupport.park(java.base@11.0.18/LockSupport.java:194)
>     at 
> java.util.concurrent.CompletableFuture$Signaller.block(java.base@11.0.18/CompletableFuture.java:1796)
>     at 
> java.util.concurrent.ForkJoinPool.managedBlock(java.base@11.0.18/ForkJoinPool.java:3128)
>     at 
> java.util.concurrent.CompletableFuture.waitingGet(java.base@11.0.18/CompletableFuture.java:1823)
>     at 
> java.util.concurrent.CompletableFuture.get(java.base@11.0.18/CompletableFuture.java:1998)
>     at 
> org.apache.hadoop.util.functional.FutureIO.awaitFuture(FutureIO.java:77)
>     at 
> org.apache.iceberg.hadoop.HadoopInputFile.newStream(HadoopInputFile.java:196)
>     at 
> org.apache.iceberg.TableMetadataParser.read(TableMetadataParser.java:263)
>     at 
> org.apache.iceberg.TableMetadataParser.read(TableMetadataParser.java:258)
>     at 
> org.apache.iceberg.BaseMetastoreTableOperations.lambda$refreshFromMetadataLocation$0(BaseMetastoreTableOperations.java:177)
>     at 
> org.apache.iceberg.BaseMetastoreTableOperations$$Lambda$609/0x000840e18040.apply(Unknown
>  Source)
>     at 
> org.apache.iceberg.BaseMetastoreTableOperations.lambda$refreshFromMetadataLocation$1(BaseMetastoreTableOperations.java:191)
>     at 
> org.apache.iceberg.BaseMetastoreTableOperations$$Lambda$610/0x000840e18440.run(Unknown
>  Source)
>     at org.apache.iceberg.util.Tasks$Builder.runTaskWithRetry(Tasks.java:404)
>     at org.apache.iceberg.util.Tasks$Builder.runSingleThreaded(Tasks.java:214)
>     at org.apache.iceberg.util.Tasks$Builder.run(Tasks.java:198)
>     at org.apache.iceberg.util.Tasks$Builder.run(Tasks.java:190)
>     at org.apache.iceberg.BaseMetastoreTableOperations.refreshFromMetadataLocation(BaseMetastoreTableOperations.java:191)
>     at org.apache.iceberg.BaseMetastoreTableOperations.refreshFromMetadataLocation(BaseMetastoreTableOperations.java:176)
>     at org.apache.iceberg.BaseMetastoreTableOperations.refreshFromMetadataLocation(BaseMetastoreTableOperations.java:171)
>     at org.apache.iceberg.hive.HiveTableOperations.doRefresh(HiveTableOperations.java:153)
>     at org.apache.iceberg.BaseMetastoreTableOperations.refresh(BaseMetastoreTableOperations.java:96)
>     at org.apache.iceberg.BaseMetastoreTableOperations.current(BaseMetastoreTableOperations.java:79)
>     at org.apache.iceberg.BaseMetastoreCatalog.loadTable(BaseMetastoreCatalog.java:44)
>     at org.apache.iceberg.mr.Catalogs.loadTable(Catalogs.java:115)
>     at org.apache.iceberg.mr.Catalogs.loadTable(Catalogs.java:105)
>     at org.apache.iceberg.mr.hive.IcebergTableUtil.lambda$getTable$1(IcebergTableUtil.java:99)
>     at org.apache.iceberg.mr.hive.IcebergTableUtil$$Lambda$552/0x000840d59840.apply(Unknown Source)
>     at org.apache.iceberg.mr.hive.IcebergTableUtil.lambda$getTable$4(IcebergTableUtil.java:111)
>     at org.apache.iceberg.mr.hive.IcebergTableUtil$$Lambda$557/0x000840d58c40.get(Unknown Source)
>     at java.util.Optional.orElseGet(java.base@11.0.18/Optional.java:369)
>     at org.apache.iceberg.mr.hive.IcebergTableUtil.getTable(IcebergTableUtil.java:108)
>     at org.apache.iceberg.mr.hive.IcebergTableUtil.getTable(IcebergTableUtil.java:69)
>     at org.apache.iceberg.mr.hive.IcebergTableUtil.getTable(IcebergTableUtil.java:73)
>     at org.apache.iceberg.mr.hive.HiveIcebergMetaHook.postGetTable(HiveIcebergMetaHook.java:931)
>     at org.apache.hadoop.hive.metastore.HiveMetaStoreClient.executePostGetTableHook(HiveMetaStoreClient.java:2638)
>     at org.apache.hadoop.hive.metastore.HiveMetaStoreClient.getTable(HiveMetaStoreClient.java:2624)
>     at org.apache.hadoop.hive.ql.metadata.SessionHiveMetaStoreClient.getTable(SessionHiveMetaStoreClient.java:267)
>     at jdk.internal.reflect.GeneratedMethodAccessor137.invoke(Unknown Source)
>     at 
> 

[jira] [Work logged] (HIVE-26956) Improv find_in_set function

2023-03-27 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-26956?focusedWorklogId=853276&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-853276
 ]

ASF GitHub Bot logged work on HIVE-26956:
-

Author: ASF GitHub Bot
Created on: 28/Mar/23 00:20
Start Date: 28/Mar/23 00:20
Worklog Time Spent: 10m 
  Work Description: github-actions[bot] closed pull request #3961: 
HIVE-26956: Improve find_in_set function
URL: https://github.com/apache/hive/pull/3961




Issue Time Tracking
---

Worklog Id: (was: 853276)
Time Spent: 1h 10m  (was: 1h)

> Improv find_in_set function
> ---
>
> Key: HIVE-26956
> URL: https://issues.apache.org/jira/browse/HIVE-26956
> Project: Hive
>  Issue Type: Improvement
>Reporter: Bingye Chen
>Assignee: Bingye Chen
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 1h 10m
>  Remaining Estimate: 0h
>
> Improv find_in_set function
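
For context, the semantics of the UDF being optimized — as documented for Hive's find_in_set — can be sketched in plain Java. This is an illustrative re-implementation of the documented behavior, not the code from PR #3961:

```java
public class FindInSet {
    /**
     * Illustrative find_in_set(str, strList): returns the 1-based position
     * of the first comma-delimited token of strList equal to str, 0 if
     * there is no match or str itself contains a comma, and null if
     * either argument is null.
     */
    public static Integer findInSet(String str, String strList) {
        if (str == null || strList == null) {
            return null;
        }
        if (str.indexOf(',') >= 0) {
            return 0; // a string containing a comma can never equal a single token
        }
        int pos = 1;   // 1-based token position
        int start = 0; // start index of the current token
        for (int i = 0; i <= strList.length(); i++) {
            if (i == strList.length() || strList.charAt(i) == ',') {
                if (strList.substring(start, i).equals(str)) {
                    return pos;
                }
                pos++;
                start = i + 1;
            }
        }
        return 0;
    }

    public static void main(String[] args) {
        System.out.println(findInSet("ab", "abc,b,ab,c,def")); // prints 3
        System.out.println(findInSet("z", "abc,b,ab,c,def"));  // prints 0
    }
}
```

A single left-to-right scan like this avoids allocating a token array per call, which is the kind of change an "improve find_in_set" patch would target.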



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Work logged] (HIVE-26997) Iceberg: Vectorization gets disabled at runtime in merge-into statements

2023-03-27 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-26997?focusedWorklogId=853274&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-853274
 ]

ASF GitHub Bot logged work on HIVE-26997:
-

Author: ASF GitHub Bot
Created on: 28/Mar/23 00:14
Start Date: 28/Mar/23 00:14
Worklog Time Spent: 10m 
  Work Description: sonarcloud[bot] commented on PR #4162:
URL: https://github.com/apache/hive/pull/4162#issuecomment-1486029917

   Kudos, SonarCloud Quality Gate passed!
   0 Bugs (rated A), 0 Vulnerabilities (rated A), 0 Security Hotspots (rated A), 0 Code Smells (rated A).
   No Coverage information. No Duplication information.
   




Issue Time Tracking
---

Worklog Id: (was: 853274)
Time Spent: 1h  (was: 50m)

> Iceberg: Vectorization gets disabled at runtime in merge-into statements
> 
>
> Key: HIVE-26997
> URL: https://issues.apache.org/jira/browse/HIVE-26997
> Project: Hive
>  Issue Type: Improvement
>  Components: Iceberg integration
>Reporter: Rajesh Balamohan
>Assignee: Zsolt Miskolczi
>Priority: Major
>  Labels: pull-request-available
> Attachments: explain_merge_into.txt
>
>  Time Spent: 1h
>  Remaining Estimate: 0h
>
> *Query:*
> Think of "ssv" table as a table containing trickle feed data in the following 
> query. "store_sales_delete_1" is the destination table.
>  
> {noformat}
> MERGE INTO tpcds_1000_iceberg_mor_v4.store_sales_delete_1 t USING 
> tpcds_1000_update.ssv s ON (t.ss_item_sk = s.ss_item_sk
>                                                                               
>                 AND t.ss_customer_sk=s.ss_customer_sk
>                                                                               
>                 AND t.ss_sold_date_sk = "2451181"
>                                                                               
>                 AND ((Floor((s.ss_item_sk) / 1000) * 1000) BETWEEN 1000 AND 
> 2000)
>                                                                               
>                 AND s.ss_ext_discount_amt < 0.0) WHEN 

[jira] [Work logged] (HIVE-26905) Backport HIVE-25173 to 3.2.0: Exclude pentaho-aggdesigner-algorithm from upgrade-acid build.

2023-03-27 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-26905?focusedWorklogId=853273&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-853273
 ]

ASF GitHub Bot logged work on HIVE-26905:
-

Author: ASF GitHub Bot
Created on: 27/Mar/23 23:56
Start Date: 27/Mar/23 23:56
Worklog Time Spent: 10m 
  Work Description: cnauroth commented on PR #4163:
URL: https://github.com/apache/hive/pull/4163#issuecomment-1486012078

   Hello @zabetak . You previously approved this change in #3911 :
   
   https://github.com/apache/hive/pull/3911#pullrequestreview-1237668110
   
   However, I just realized it was not actually merged. Could you please take 
another look? Thank you.




Issue Time Tracking
---

Worklog Id: (was: 853273)
Time Spent: 1h  (was: 50m)

> Backport HIVE-25173 to 3.2.0: Exclude pentaho-aggdesigner-algorithm from 
> upgrade-acid build.
> 
>
> Key: HIVE-26905
> URL: https://issues.apache.org/jira/browse/HIVE-26905
> Project: Hive
>  Issue Type: Bug
>  Components: Build Infrastructure
>Reporter: Chris Nauroth
>Assignee: Chris Nauroth
>Priority: Major
>  Labels: hive-3.2.0-must, pull-request-available
>  Time Spent: 1h
>  Remaining Estimate: 0h
>
> In the current branch-3, upgrade-acid has a dependency on an old hive-exec 
> version that has a transitive dependency to 
> org.pentaho:pentaho-aggdesigner-algorithm. This artifact is no longer 
> available in commonly supported Maven repositories, which causes a build 
> failure. We can safely exclude the dependency, as was originally done in 
> HIVE-25173.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Work logged] (HIVE-26905) Backport HIVE-25173 to 3.2.0: Exclude pentaho-aggdesigner-algorithm from upgrade-acid build.

2023-03-27 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-26905?focusedWorklogId=853272&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-853272
 ]

ASF GitHub Bot logged work on HIVE-26905:
-

Author: ASF GitHub Bot
Created on: 27/Mar/23 23:54
Start Date: 27/Mar/23 23:54
Worklog Time Spent: 10m 
  Work Description: cnauroth opened a new pull request, #4163:
URL: https://github.com/apache/hive/pull/4163

   ### What changes were proposed in this pull request?
   
   Exclude pentaho-aggdesigner-algorithm from upgrade-acid build.
   
   ### Why are the changes needed?
   
   In the current branch-3, upgrade-acid has a dependency on an old hive-exec 
version that has a transitive dependency to 
org.pentaho:pentaho-aggdesigner-algorithm. This artifact is no longer available 
in commonly supported Maven repositories, which causes a build failure. We can 
safely exclude the dependency, as was originally done in 
[HIVE-25173](https://issues.apache.org/jira/browse/HIVE-25173).
   
   Differences from the master patch branch are:
   1. On master, this applied to the pre-upgrade sub-module. This sub-module 
doesn't exist in branch-3, so the patch was rebased to the parent upgrade-acid 
module.
   2. Additionally, the pom.xml code had changed quite a bit on master. This is 
just applying the equivalent exclusion from the HIVE-25173 diff: 
a1d4c8a6b3cf8465ac1ae074748a8f5a04bb473f.
   
   ### Does this PR introduce _any_ user-facing change?
   
   No.
   
   ### How was this patch tested?
   
   I can run a full local build from branch-3 after applying this patch.
   
   ```
   mvn -B -T 8 clean install -Pitests -DskipTests
   ```
   
   Prior to this patch, my build failed while trying to download the 
org.pentaho:pentaho-aggdesigner-algorithm artifact.
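
   For readers following along, such an exclusion takes the standard Maven form. The `hive-exec` coordinates below are illustrative; the exact dependency block in the upgrade-acid pom.xml is what the linked HIVE-25173 diff shows:

   ```xml
   <dependency>
     <groupId>org.apache.hive</groupId>
     <artifactId>hive-exec</artifactId>
     <version>${hive.version}</version>
     <exclusions>
       <!-- No longer resolvable from commonly supported Maven repositories -->
       <exclusion>
         <groupId>org.pentaho</groupId>
         <artifactId>pentaho-aggdesigner-algorithm</artifactId>
       </exclusion>
     </exclusions>
   </dependency>
   ```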




Issue Time Tracking
---

Worklog Id: (was: 853272)
Time Spent: 50m  (was: 40m)

> Backport HIVE-25173 to 3.2.0: Exclude pentaho-aggdesigner-algorithm from 
> upgrade-acid build.
> 
>
> Key: HIVE-26905
> URL: https://issues.apache.org/jira/browse/HIVE-26905
> Project: Hive
>  Issue Type: Bug
>  Components: Build Infrastructure
>Reporter: Chris Nauroth
>Assignee: Chris Nauroth
>Priority: Major
>  Labels: hive-3.2.0-must, pull-request-available
>  Time Spent: 50m
>  Remaining Estimate: 0h
>
> In the current branch-3, upgrade-acid has a dependency on an old hive-exec 
> version that has a transitive dependency to 
> org.pentaho:pentaho-aggdesigner-algorithm. This artifact is no longer 
> available in commonly supported Maven repositories, which causes a build 
> failure. We can safely exclude the dependency, as was originally done in 
> HIVE-25173.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Work logged] (HIVE-22383) `alterPartitions` is invoked twice during dynamic partition load causing runtime delay

2023-03-27 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-22383?focusedWorklogId=853232&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-853232
 ]

ASF GitHub Bot logged work on HIVE-22383:
-

Author: ASF GitHub Bot
Created on: 27/Mar/23 19:07
Start Date: 27/Mar/23 19:07
Worklog Time Spent: 10m 
  Work Description: sonarcloud[bot] commented on PR #4161:
URL: https://github.com/apache/hive/pull/4161#issuecomment-1485719817

   Kudos, SonarCloud Quality Gate passed!
   0 Bugs (rated A), 0 Vulnerabilities (rated A), 0 Security Hotspots (rated A), 3 Code Smells (rated A).
   No Coverage information. No Duplication information.
   




Issue Time Tracking
---

Worklog Id: (was: 853232)
Time Spent: 20m  (was: 10m)

> `alterPartitions` is invoked twice during dynamic partition load causing 
> runtime delay
> --
>
> Key: HIVE-22383
> URL: https://issues.apache.org/jira/browse/HIVE-22383
> Project: Hive
>  Issue Type: Bug
>Reporter: Rajesh Balamohan
>Assignee: Dmitriy Fingerman
>Priority: Major
>  Labels: performance, pull-request-available
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> First invocation in {{Hive::loadDynamicPartitions}}
> https://github.com/apache/hive/blob/master/ql/src/java/org/apache/hadoop/hive/ql/metadata/Hive.java#L2978
> https://github.com/apache/hive/blob/master/ql/src/java/org/apache/hadoop/hive/ql/metadata/Hive.java#L2638
> Second invocation in {{BasicStatsTask::aggregateStats}}
> https://github.com/apache/hive/blob/master/ql/src/java/org/apache/hadoop/hive/ql/stats/BasicStatsTask.java#L335
> This leads to good amount of delay in dynamic partition loading.
>  



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Work logged] (HIVE-26997) Iceberg: Vectorization gets disabled at runtime in merge-into statements

2023-03-27 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-26997?focusedWorklogId=853229&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-853229
 ]

ASF GitHub Bot logged work on HIVE-26997:
-

Author: ASF GitHub Bot
Created on: 27/Mar/23 18:35
Start Date: 27/Mar/23 18:35
Worklog Time Spent: 10m 
  Work Description: deniskuzZ opened a new pull request, #4162:
URL: https://github.com/apache/hive/pull/4162

   
   
   
   ### What changes were proposed in this pull request?
   
   Fixed non-vectorization cause
   
   ### Why are the changes needed?
   
   Performance
   
   ### Does this PR introduce _any_ user-facing change?
   
   
   
   ### How was this patch tested?
   
   




Issue Time Tracking
---

Worklog Id: (was: 853229)
Time Spent: 50m  (was: 40m)

> Iceberg: Vectorization gets disabled at runtime in merge-into statements
> 
>
> Key: HIVE-26997
> URL: https://issues.apache.org/jira/browse/HIVE-26997
> Project: Hive
>  Issue Type: Improvement
>  Components: Iceberg integration
>Reporter: Rajesh Balamohan
>Assignee: Zsolt Miskolczi
>Priority: Major
>  Labels: pull-request-available
> Attachments: explain_merge_into.txt
>
>  Time Spent: 50m
>  Remaining Estimate: 0h
>
> *Query:*
> Think of "ssv" table as a table containing trickle feed data in the following 
> query. "store_sales_delete_1" is the destination table.
>  
> {noformat}
> MERGE INTO tpcds_1000_iceberg_mor_v4.store_sales_delete_1 t USING 
> tpcds_1000_update.ssv s ON (t.ss_item_sk = s.ss_item_sk
>                                                                               
>                 AND t.ss_customer_sk=s.ss_customer_sk
>                                                                               
>                 AND t.ss_sold_date_sk = "2451181"
>                                                                               
>                 AND ((Floor((s.ss_item_sk) / 1000) * 1000) BETWEEN 1000 AND 
> 2000)
>                                                                               
>                 AND s.ss_ext_discount_amt < 0.0) WHEN matched
> AND t.ss_ext_discount_amt IS NULL THEN
> UPDATE
> SET ss_ext_discount_amt = 0.0 WHEN NOT matched THEN
> INSERT (ss_sold_time_sk,
>         ss_item_sk,
>         ss_customer_sk,
>         ss_cdemo_sk,
>         ss_hdemo_sk,
>         ss_addr_sk,
>         ss_store_sk,
>         ss_promo_sk,
>         ss_ticket_number,
>         ss_quantity,
>         ss_wholesale_cost,
>         ss_list_price,
>         ss_sales_price,
>         ss_ext_discount_amt,
>         ss_ext_sales_price,
>         ss_ext_wholesale_cost,
>         ss_ext_list_price,
>         ss_ext_tax,
>         ss_coupon_amt,
>         ss_net_paid,
>         ss_net_paid_inc_tax,
>         ss_net_profit,
>         ss_sold_date_sk)
> VALUES (s.ss_sold_time_sk,
>         s.ss_item_sk,
>         s.ss_customer_sk,
>         s.ss_cdemo_sk,
>         s.ss_hdemo_sk,
>         s.ss_addr_sk,
>         s.ss_store_sk,
>         s.ss_promo_sk,
>         s.ss_ticket_number,
>         s.ss_quantity,
>         s.ss_wholesale_cost,
>         s.ss_list_price,
>         s.ss_sales_price,
>         s.ss_ext_discount_amt,
>         s.ss_ext_sales_price,
>         s.ss_ext_wholesale_cost,
>         s.ss_ext_list_price,
>         s.ss_ext_tax,
>         s.ss_coupon_amt,
>         s.ss_net_paid,
>         s.ss_net_paid_inc_tax,
>         s.ss_net_profit,
>         "2451181")
>  {noformat}
>  
>  
> *Issue:*
>  # Map phase is not getting vectorized due to "PARTITION_{_}SPEC{_}_ID" column
> {noformat}
> Map notVectorizedReason: Select expression for SELECT operator: Virtual 
> column PARTITION__SPEC__ID is not supported {noformat}
>  
> 2. "Reducer 2" stage isn't vectorized. 
> {noformat}
> Reduce notVectorizedReason: exception: java.lang.RuntimeException: Full Outer 
> Small Table Key Mapping duplicate column 0 in ordered column map {0=(value 
> column: 30, type info: int), 1=(value column: 31, type info: int)} when 
> adding value column 53, type into int stack trace: 
> org.apache.hadoop.hive.ql.exec.vector.VectorColumnOrderedMap.add(VectorColumnOrderedMap.java:102),
>  
> org.apache.hadoop.hive.ql.exec.vector.VectorColumnSourceMapping.add(VectorColumnSourceMapping.java:41),
>  
> org.apache.hadoop.hive.ql.optimizer.physical.Vectorizer.canSpecializeMapJoin(Vectorizer.java:3865),
>  
> org.apache.hadoop.hive.ql.optimizer.physical.Vectorizer.validateAndVectorizeOperator(Vectorizer.java:5246),
>  
> org.apache.hadoop.hive.ql.optimizer.physical.Vectorizer.doProcessChild(Vectorizer.java:988),
>  
> org.apache.hadoop.hive.ql.optimizer.physical.Vectorizer.doProcessChildren(Vectorizer.java:874),
>  
> 

[jira] [Work logged] (HIVE-22383) `alterPartitions` is invoked twice during dynamic partition load causing runtime delay

2023-03-27 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-22383?focusedWorklogId=853218&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-853218
 ]

ASF GitHub Bot logged work on HIVE-22383:
-

Author: ASF GitHub Bot
Created on: 27/Mar/23 17:02
Start Date: 27/Mar/23 17:02
Worklog Time Spent: 10m 
  Work Description: difin opened a new pull request, #4161:
URL: https://github.com/apache/hive/pull/4161

   
   
   ### What changes were proposed in this pull request?
   
   
   
   ### Why are the changes needed?
   
   
   
   ### Does this PR introduce _any_ user-facing change?
   
   
   
   ### How was this patch tested?
   
   




Issue Time Tracking
---

Worklog Id: (was: 853218)
Remaining Estimate: 0h
Time Spent: 10m

> `alterPartitions` is invoked twice during dynamic partition load causing 
> runtime delay
> --
>
> Key: HIVE-22383
> URL: https://issues.apache.org/jira/browse/HIVE-22383
> Project: Hive
>  Issue Type: Bug
>Reporter: Rajesh Balamohan
>Assignee: Dmitriy Fingerman
>Priority: Major
>  Labels: performance
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> First invocation in {{Hive::loadDynamicPartitions}}
> https://github.com/apache/hive/blob/master/ql/src/java/org/apache/hadoop/hive/ql/metadata/Hive.java#L2978
> https://github.com/apache/hive/blob/master/ql/src/java/org/apache/hadoop/hive/ql/metadata/Hive.java#L2638
> Second invocation in {{BasicStatsTask::aggregateStats}}
> https://github.com/apache/hive/blob/master/ql/src/java/org/apache/hadoop/hive/ql/stats/BasicStatsTask.java#L335
> This leads to good amount of delay in dynamic partition loading.
>  



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (HIVE-22383) `alterPartitions` is invoked twice during dynamic partition load causing runtime delay

2023-03-27 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-22383?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated HIVE-22383:
--
Labels: performance pull-request-available  (was: performance)

> `alterPartitions` is invoked twice during dynamic partition load causing 
> runtime delay
> --
>
> Key: HIVE-22383
> URL: https://issues.apache.org/jira/browse/HIVE-22383
> Project: Hive
>  Issue Type: Bug
>Reporter: Rajesh Balamohan
>Assignee: Dmitriy Fingerman
>Priority: Major
>  Labels: performance, pull-request-available
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> First invocation in {{Hive::loadDynamicPartitions}}
> https://github.com/apache/hive/blob/master/ql/src/java/org/apache/hadoop/hive/ql/metadata/Hive.java#L2978
> https://github.com/apache/hive/blob/master/ql/src/java/org/apache/hadoop/hive/ql/metadata/Hive.java#L2638
> Second invocation in {{BasicStatsTask::aggregateStats}}
> https://github.com/apache/hive/blob/master/ql/src/java/org/apache/hadoop/hive/ql/stats/BasicStatsTask.java#L335
> This leads to good amount of delay in dynamic partition loading.
>  



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Work logged] (HIVE-27135) AcidUtils#getHdfsDirSnapshots() throws FNFE when a directory is removed in HDFS

2023-03-27 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-27135?focusedWorklogId=853214&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-853214
 ]

ASF GitHub Bot logged work on HIVE-27135:
-

Author: ASF GitHub Bot
Created on: 27/Mar/23 16:52
Start Date: 27/Mar/23 16:52
Worklog Time Spent: 10m 
  Work Description: sonarcloud[bot] commented on PR #4114:
URL: https://github.com/apache/hive/pull/4114#issuecomment-1485490382

   Kudos, SonarCloud Quality Gate passed!
   0 Bugs (rated A), 0 Vulnerabilities (rated A), 0 Security Hotspots (rated A), 0 Code Smells (rated A).
   No Coverage information. No Duplication information.
   




Issue Time Tracking
---

Worklog Id: (was: 853214)
Time Spent: 5h 10m  (was: 5h)

> AcidUtils#getHdfsDirSnapshots() throws FNFE when a directory is removed in 
> HDFS
> ---
>
> Key: HIVE-27135
> URL: https://issues.apache.org/jira/browse/HIVE-27135
> Project: Hive
>  Issue Type: Bug
>Reporter: Dayakar M
>Assignee: Dayakar M
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 5h 10m
>  Remaining Estimate: 0h
>
> AcidUtils#getHdfsDirSnapshots() throws FileNotFoundException when a directory 
> is removed in HDFS while fetching HDFS Snapshots.
> Below testcode can be used to reproduce this issue.
> {code:java}
>  @Test
>   public void 
> testShouldNotThrowFNFEWhenHiveStagingDirectoryIsRemovedWhileFetchingHDFSSnapshots()
>  throws Exception {
> MockFileSystem fs = new MockFileSystem(new HiveConf(),
> new MockFile("mock:/tbl/part1/.hive-staging_dir/-ext-10002", 500, new 
> byte[0]),
> new MockFile("mock:/tbl/part2/.hive-staging_dir", 500, new byte[0]),
> new MockFile("mock:/tbl/part1/_tmp_space.db", 500, new byte[0]),
> new MockFile("mock:/tbl/part1/delta_1_1/bucket--", 500, new 
> byte[0]));
> Path path = new MockPath(fs, "/tbl");
> Path stageDir = new MockPath(fs, "mock:/tbl/part1/.hive-staging_dir");
> FileSystem mockFs = spy(fs);
> Mockito.doThrow(new 
> 
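
The defensive pattern under discussion — tolerating a directory that disappears between the parent listing and the per-directory visit — can be sketched with plain java.nio. This is a generic illustration of the idea, not the actual AcidUtils change; the helper name and structure are assumptions:

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.NoSuchFileException;
import java.nio.file.NotDirectoryException;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;

public class SnapshotScan {
    /**
     * Lists the files one level below each child directory of root,
     * skipping any child that is removed (or turns out not to be a
     * directory) after the parent listing — the race that otherwise
     * surfaces as a FileNotFoundException mid-scan.
     */
    public static List<Path> listSurvivingFiles(Path root) throws IOException {
        List<Path> result = new ArrayList<>();
        try (DirectoryStream<Path> children = Files.newDirectoryStream(root)) {
            for (Path child : children) {
                try (DirectoryStream<Path> files = Files.newDirectoryStream(child)) {
                    for (Path f : files) {
                        result.add(f);
                    }
                } catch (NoSuchFileException | NotDirectoryException e) {
                    // Concurrently removed (e.g. a .hive-staging dir): skip it.
                }
            }
        }
        return result;
    }

    public static void main(String[] args) throws IOException {
        Path root = Files.createTempDirectory("snap");
        Path part = Files.createDirectory(root.resolve("part1"));
        Files.createFile(part.resolve("bucket_0"));
        Files.createFile(root.resolve("stray_file")); // not a directory: skipped
        System.out.println(listSurvivingFiles(root).size()); // prints 1
    }
}
```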

[jira] [Work logged] (HIVE-27180) Remove JsonSerde from hcatalog, Upgrade should update changed FQN for JsonSerDe in HMS DB

2023-03-27 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-27180?focusedWorklogId=853211&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-853211
 ]

ASF GitHub Bot logged work on HIVE-27180:
-

Author: ASF GitHub Bot
Created on: 27/Mar/23 16:50
Start Date: 27/Mar/23 16:50
Worklog Time Spent: 10m 
  Work Description: sonarcloud[bot] commented on PR #4159:
URL: https://github.com/apache/hive/pull/4159#issuecomment-1485485988

   Kudos, SonarCloud Quality Gate passed!
   0 Bugs (rated A), 0 Vulnerabilities (rated A), 0 Security Hotspots (rated A), 0 Code Smells (rated A).
   No Coverage information. No Duplication information.
   




Issue Time Tracking
---

Worklog Id: (was: 853211)
Time Spent: 20m  (was: 10m)

> Remove JsonSerde from hcatalog, Upgrade should update changed FQN for 
> JsonSerDe in HMS DB 
> --
>
> Key: HIVE-27180
> URL: https://issues.apache.org/jira/browse/HIVE-27180
> Project: Hive
>  Issue Type: Sub-task
>  Components: Hive
>Reporter: Riju Trivedi
>Assignee: Riju Trivedi
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> As HCatalog JsonSerDe uses the "serde2" version as a backend, remove 
> org.apache.hive.hcatalog.data.JsonSerDe from hive-hcatalog. Fix tests 
> to use the new SerDe class org.apache.hadoop.hive.serde2.JsonSerDe. The Hive 
> upgrade schema script can update the SERDES table to alter the class name to 
> the new one; the old tables will then work automatically.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (HIVE-27180) Remove JsonSerde from hcatalog, Upgrade should update changed FQN for JsonSerDe in HMS DB

2023-03-27 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-27180?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated HIVE-27180:
--
Labels: pull-request-available  (was: )

> Remove JsonSerde from hcatalog, Upgrade should update changed FQN for 
> JsonSerDe in HMS DB 
> --
>
> Key: HIVE-27180
> URL: https://issues.apache.org/jira/browse/HIVE-27180
> Project: Hive
>  Issue Type: Sub-task
>  Components: Hive
>Reporter: Riju Trivedi
>Assignee: Riju Trivedi
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> As HCatalog JsonSerDe uses the "serde2" version as a backend, remove 
> org.apache.hive.hcatalog.data.JsonSerDe from hive-hcatalog. Fix tests 
> to use the new SerDe class org.apache.hadoop.hive.serde2.JsonSerDe. The Hive 
> upgrade schema script can update the SERDES table to alter the class name to 
> the new one; the old tables will then work automatically.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Work logged] (HIVE-27180) Remove JsonSerde from hcatalog, Upgrade should update changed FQN for JsonSerDe in HMS DB

2023-03-27 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-27180?focusedWorklogId=853196&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-853196
 ]

ASF GitHub Bot logged work on HIVE-27180:
-

Author: ASF GitHub Bot
Created on: 27/Mar/23 15:50
Start Date: 27/Mar/23 15:50
Worklog Time Spent: 10m 
  Work Description: rtrivedi12 opened a new pull request, #4159:
URL: https://github.com/apache/hive/pull/4159

   …nged FQN for JsonSerDe in HMS DB
   
   HIVE-18545 makes HCatalog JsonSerDe use the "hive.serde2" version as a 
backend; there are no feature differences between these implementations. This 
change will fix tests to use the new JsonSerDe class and remove JsonSerDe from 
hive-contrib. The Hive upgrade schema script should automatically update the 
Hive table schema to rename the SerDe package.
   
   ### What changes were proposed in this pull request?
   1. Fixed tests to use new SerDe class 
'org.apache.hadoop.hive.serde2.JsonSerDe'
   2. Removed JsonSerDe from hive-hcatalog.
   3. Schema upgrade scripts update the SLIB column value in the SERDES table 
from "org.apache.hive.hcatalog.data.JsonSerDe" to 
"org.apache.hadoop.hive.serde2.JsonSerDe"
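   As a rough sketch of step 3, the upgrade statement could look like the 
following (hypothetical; the real upgrade scripts are per-RDBMS, and identifier 
quoting and exact syntax differ by database):

```sql
-- Sketch only: rewrite the stored SerDe FQN so that existing tables
-- resolve to the serde2 implementation after the upgrade.
UPDATE SERDES
   SET SLIB = 'org.apache.hadoop.hive.serde2.JsonSerDe'
 WHERE SLIB = 'org.apache.hive.hcatalog.data.JsonSerDe';
```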
   
   
   ### Why are the changes needed?
   Removing redundant code to make JsonSerDe a first-class SerDe in Hive
   Better user experience after the upgrade (avoids ClassNotFoundException)
   
   
   ### Does this PR introduce _any_ user-facing change?
'No'
   
   
   ### How was this patch tested?
   mvn test -Dtest=TestMiniLlapLocalCliDriver 
-Dqfile=json_serde1.q,json_serde_qualified_types.q,json_serde_tsformat.q,parquet_mixed_partition_formats2.q,temp_table_parquet_mixed_partition_formats2.q
   




Issue Time Tracking
---

Worklog Id: (was: 853196)
Remaining Estimate: 0h
Time Spent: 10m

> Remove JsonSerde from hcatalog, Upgrade should update changed FQN for 
> JsonSerDe in HMS DB 
> --
>
> Key: HIVE-27180
> URL: https://issues.apache.org/jira/browse/HIVE-27180
> Project: Hive
>  Issue Type: Sub-task
>  Components: Hive
>Reporter: Riju Trivedi
>Assignee: Riju Trivedi
>Priority: Major
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> As HCatalog JsonSerDe uses the "serde2" version as a backend, remove 
> org.apache.hive.hcatalog.data.JsonSerDe from hive-hcatalog. Fix tests 
> to use the new SerDe class org.apache.hadoop.hive.serde2.JsonSerDe. The Hive 
> upgrade schema script can update the SERDES table to alter the class name to 
> the new one; the old tables will then work automatically.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (HIVE-27180) Remove JsonSerde from hcatalog, Upgrade should update changed FQN for JsonSerDe in HMS DB

2023-03-27 Thread Riju Trivedi (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-27180?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Riju Trivedi updated HIVE-27180:

Description: As HCatalog JsonSerDe uses the "serde2" version as a backend, 
remove org.apache.hive.hcatalog.data.JsonSerDe from hive-hcatalog. Fix tests 
to use the new SerDe class org.apache.hadoop.hive.serde2.JsonSerDe. The Hive 
upgrade schema script can update the SERDES table to alter the class name to 
the new one; the old tables will then work automatically.

> Remove JsonSerde from hcatalog, Upgrade should update changed FQN for 
> JsonSerDe in HMS DB 
> --
>
> Key: HIVE-27180
> URL: https://issues.apache.org/jira/browse/HIVE-27180
> Project: Hive
>  Issue Type: Sub-task
>  Components: Hive
>Reporter: Riju Trivedi
>Assignee: Riju Trivedi
>Priority: Major
>
> As HCatalog JsonSerDe uses the "serde2" version as a backend, remove 
> org.apache.hive.hcatalog.data.JsonSerDe from hive-hcatalog. Fix tests 
> to use the new SerDe class org.apache.hadoop.hive.serde2.JsonSerDe. The Hive 
> upgrade schema script can update the SERDES table to alter the class name to 
> the new one; the old tables will then work automatically.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (HIVE-27182) tez_union_with_udf.q with TestMiniTezCliDriver is flaky

2023-03-27 Thread Ayush Saxena (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-27182?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ayush Saxena updated HIVE-27182:

Description: 
Looks like a memory issue:

{noformat}
< Caused by: org.apache.hive.com.esotericsoftware.kryo.KryoException: 
java.lang.OutOfMemoryError: GC overhead limit exceeded
< Serialization trace:
< genericUDF (org.apache.hadoop.hive.ql.plan.ExprNodeGenericFuncDesc)
< colExprMap (org.apache.hadoop.hive.ql.plan.SelectDesc)
< conf (org.apache.hadoop.hive.ql.exec.vector.VectorSelectOperator)
< childOperators (org.apache.hadoop.hive.ql.exec.vector.VectorLimitOperator)
< childOperators (org.apache.hadoop.hive.ql.exec.TableScanOperator)
{noformat}

Ref: 
http://ci.hive.apache.org/job/hive-precommit/job/PR-4155/2/testReport/junit/org.apache.hadoop.hive.cli/TestMiniTezCliDriver/Testing___split_20___PostProcess___testCliDriver_tez_union_with_udf_/

  was:
Looks like a memory issue:

{noformat}
< Caused by: org.apache.hive.com.esotericsoftware.kryo.KryoException: 
java.lang.OutOfMemoryError: GC overhead limit exceeded
< Serialization trace:
< genericUDF (org.apache.hadoop.hive.ql.plan.ExprNodeGenericFuncDesc)
< colExprMap (org.apache.hadoop.hive.ql.plan.SelectDesc)
< conf (org.apache.hadoop.hive.ql.exec.vector.VectorSelectOperator)
< childOperators (org.apache.hadoop.hive.ql.exec.vector.VectorLimitOperator)
< childOperators (org.apache.hadoop.hive.ql.exec.TableScanOperator)
{noformat}



> tez_union_with_udf.q with TestMiniTezCliDriver is flaky
> ---
>
> Key: HIVE-27182
> URL: https://issues.apache.org/jira/browse/HIVE-27182
> Project: Hive
>  Issue Type: Improvement
>Reporter: Ayush Saxena
>Priority: Major
>
> Looks like a memory issue:
> {noformat}
> < Caused by: org.apache.hive.com.esotericsoftware.kryo.KryoException: 
> java.lang.OutOfMemoryError: GC overhead limit exceeded
> < Serialization trace:
> < genericUDF (org.apache.hadoop.hive.ql.plan.ExprNodeGenericFuncDesc)
> < colExprMap (org.apache.hadoop.hive.ql.plan.SelectDesc)
> < conf (org.apache.hadoop.hive.ql.exec.vector.VectorSelectOperator)
> < childOperators (org.apache.hadoop.hive.ql.exec.vector.VectorLimitOperator)
> < childOperators (org.apache.hadoop.hive.ql.exec.TableScanOperator)
> {noformat}
> Ref: 
> http://ci.hive.apache.org/job/hive-precommit/job/PR-4155/2/testReport/junit/org.apache.hadoop.hive.cli/TestMiniTezCliDriver/Testing___split_20___PostProcess___testCliDriver_tez_union_with_udf_/



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Assigned] (HIVE-27181) Remove RegexSerDe from hive-contrib, Upgrade should update changed FQN for RegexSerDe in HMS DB

2023-03-27 Thread Riju Trivedi (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-27181?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Riju Trivedi reassigned HIVE-27181:
---

Assignee: Riju Trivedi

> Remove RegexSerDe from hive-contrib, Upgrade should update changed FQN for 
> RegexSerDe in HMS DB
> ---
>
> Key: HIVE-27181
> URL: https://issues.apache.org/jira/browse/HIVE-27181
> Project: Hive
>  Issue Type: Sub-task
>  Components: Hive
>Reporter: Riju Trivedi
>Assignee: Riju Trivedi
>Priority: Major
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Assigned] (HIVE-27180) Remove JsonSerde from hcatalog, Upgrade should update changed FQN for JsonSerDe in HMS DB

2023-03-27 Thread Riju Trivedi (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-27180?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Riju Trivedi reassigned HIVE-27180:
---


> Remove JsonSerde from hcatalog, Upgrade should update changed FQN for 
> JsonSerDe in HMS DB 
> --
>
> Key: HIVE-27180
> URL: https://issues.apache.org/jira/browse/HIVE-27180
> Project: Hive
>  Issue Type: Sub-task
>  Components: Hive
>Reporter: Riju Trivedi
>Assignee: Riju Trivedi
>Priority: Major
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (HIVE-26655) VectorUDAFBloomFilterMerge should take care of safe batch handling when working in parallel

2023-03-27 Thread Jira


 [ 
https://issues.apache.org/jira/browse/HIVE-26655?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

László Bodor updated HIVE-26655:

Fix Version/s: 4.0.0

> VectorUDAFBloomFilterMerge should take care of safe batch handling when 
> working in parallel
> ---
>
> Key: HIVE-26655
> URL: https://issues.apache.org/jira/browse/HIVE-26655
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Sungwoo Park
>Assignee: László Bodor
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>  Time Spent: 50m
>  Remaining Estimate: 0h
>
> When tested with 100GB ORC tables, the number of rows returned by query 17 is 
> not stable. It returns fewer rows than the correct result (55 rows).
>  



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Resolved] (HIVE-26655) VectorUDAFBloomFilterMerge should take care of safe batch handling when working in parallel

2023-03-27 Thread Jira


 [ 
https://issues.apache.org/jira/browse/HIVE-26655?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

László Bodor resolved HIVE-26655.
-
Resolution: Fixed

> VectorUDAFBloomFilterMerge should take care of safe batch handling when 
> working in parallel
> ---
>
> Key: HIVE-26655
> URL: https://issues.apache.org/jira/browse/HIVE-26655
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Sungwoo Park
>Assignee: László Bodor
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>  Time Spent: 50m
>  Remaining Estimate: 0h
>
> When tested with 100GB ORC tables, the number of rows returned by query 17 is 
> not stable. It returns fewer rows than the correct result (55 rows).
>  



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (HIVE-26655) VectorUDAFBloomFilterMerge should take care of safe batch handling when working in parallel

2023-03-27 Thread Jira


[ 
https://issues.apache.org/jira/browse/HIVE-26655?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17705347#comment-17705347
 ] 

László Bodor commented on HIVE-26655:
-

fix merged to master, thanks [~ayushtkn] for the review, and thanks [~glapark] 
for reporting this correctness issue!

> VectorUDAFBloomFilterMerge should take care of safe batch handling when 
> working in parallel
> ---
>
> Key: HIVE-26655
> URL: https://issues.apache.org/jira/browse/HIVE-26655
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Sungwoo Park
>Assignee: László Bodor
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 50m
>  Remaining Estimate: 0h
>
> When tested with 100GB ORC tables, the number of rows returned by query 17 is 
> not stable. It returns fewer rows than the correct result (55 rows).
>  



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Work logged] (HIVE-26655) VectorUDAFBloomFilterMerge should take care of safe batch handling when working in parallel

2023-03-27 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-26655?focusedWorklogId=853158&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-853158
 ]

ASF GitHub Bot logged work on HIVE-26655:
-

Author: ASF GitHub Bot
Created on: 27/Mar/23 13:21
Start Date: 27/Mar/23 13:21
Worklog Time Spent: 10m 
  Work Description: abstractdog merged PR #4158:
URL: https://github.com/apache/hive/pull/4158




Issue Time Tracking
---

Worklog Id: (was: 853158)
Time Spent: 50m  (was: 40m)

> VectorUDAFBloomFilterMerge should take care of safe batch handling when 
> working in parallel
> ---
>
> Key: HIVE-26655
> URL: https://issues.apache.org/jira/browse/HIVE-26655
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Sungwoo Park
>Assignee: László Bodor
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 50m
>  Remaining Estimate: 0h
>
> When tested with 100GB ORC tables, the number of rows returned by query 17 is 
> not stable. It returns fewer rows than the correct result (55 rows).
>  



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Work logged] (HIVE-26400) Provide docker images for Hive

2023-03-27 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-26400?focusedWorklogId=853157&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-853157
 ]

ASF GitHub Bot logged work on HIVE-26400:
-

Author: ASF GitHub Bot
Created on: 27/Mar/23 13:18
Start Date: 27/Mar/23 13:18
Worklog Time Spent: 10m 
  Work Description: dengzhhu653 commented on PR #3448:
URL: https://github.com/apache/hive/pull/3448#issuecomment-1485078827

   > I love this initiative. Can we get more eyes on it?
   > 
   > I have 2 comments about it:
   > 
   > 1. HMS can work together with MySQL, but too many times we found bugs 
with MySQL which gave us a lot of headaches. Is it possible to change to 
Postgres?
   > 2. I think we should ask for a Docker account to push the image to the 
repository whenever we have a new build or release.
   > 
   > What is the remaining part of this task to make it happen?
   
   Thank you @TuroczyX for the comments.
   1. The backing DB has been changed to Postgres or embedded Derby;
   2. This is the remaining part; we want to track it in the future after this 
task has finished.




Issue Time Tracking
---

Worklog Id: (was: 853157)
Time Spent: 10.5h  (was: 10h 20m)

> Provide docker images for Hive
> --
>
> Key: HIVE-26400
> URL: https://issues.apache.org/jira/browse/HIVE-26400
> Project: Hive
>  Issue Type: Sub-task
>  Components: Build Infrastructure
>Reporter: Zhihua Deng
>Assignee: Zhihua Deng
>Priority: Blocker
>  Labels: hive-4.0.0-must, pull-request-available
>  Time Spent: 10.5h
>  Remaining Estimate: 0h
>
> Make Apache Hive able to run inside a Docker container in pseudo-distributed 
> mode, with MySQL/Derby as its backing database, and provide the following:
>  * Quick-start/debugging/preparing a test env for Hive;
>  * Tools to build a target image with a specified version of Hive and its 
> dependencies;
>  * Images that can be used as the basis for a Kubernetes operator.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Work logged] (HIVE-27135) AcidUtils#getHdfsDirSnapshots() throws FNFE when a directory is removed in HDFS

2023-03-27 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-27135?focusedWorklogId=853155&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-853155
 ]

ASF GitHub Bot logged work on HIVE-27135:
-

Author: ASF GitHub Bot
Created on: 27/Mar/23 12:07
Start Date: 27/Mar/23 12:07
Worklog Time Spent: 10m 
  Work Description: sonarcloud[bot] commented on PR #4114:
URL: https://github.com/apache/hive/pull/4114#issuecomment-1485028689

   Kudos, SonarCloud Quality Gate passed! Badge summary: 0 Bugs (A), 0 
Vulnerabilities (A), 0 Security Hotspots (A), 1 Code Smell (A); no coverage or 
duplication information. Full report on sonarcloud.io for apache_hive, PR 4114.
   




Issue Time Tracking
---

Worklog Id: (was: 853155)
Time Spent: 5h  (was: 4h 50m)

> AcidUtils#getHdfsDirSnapshots() throws FNFE when a directory is removed in 
> HDFS
> ---
>
> Key: HIVE-27135
> URL: https://issues.apache.org/jira/browse/HIVE-27135
> Project: Hive
>  Issue Type: Bug
>Reporter: Dayakar M
>Assignee: Dayakar M
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 5h
>  Remaining Estimate: 0h
>
> AcidUtils#getHdfsDirSnapshots() throws FileNotFoundException when a directory 
> is removed in HDFS while fetching HDFS Snapshots.
> The test code below can be used to reproduce this issue.
> {code:java}
>  @Test
>   public void 
> testShouldNotThrowFNFEWhenHiveStagingDirectoryIsRemovedWhileFetchingHDFSSnapshots()
>  throws Exception {
> MockFileSystem fs = new MockFileSystem(new HiveConf(),
> new MockFile("mock:/tbl/part1/.hive-staging_dir/-ext-10002", 500, new 
> byte[0]),
> new MockFile("mock:/tbl/part2/.hive-staging_dir", 500, new byte[0]),
> new MockFile("mock:/tbl/part1/_tmp_space.db", 500, new byte[0]),
> new MockFile("mock:/tbl/part1/delta_1_1/bucket--", 500, new 
> byte[0]));
> Path path = new MockPath(fs, "/tbl");
> Path stageDir = new MockPath(fs, "mock:/tbl/part1/.hive-staging_dir");
> FileSystem mockFs = spy(fs);
> Mockito.doThrow(new 
> 

[jira] [Work logged] (HIVE-26400) Provide docker images for Hive

2023-03-27 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-26400?focusedWorklogId=853153&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-853153
 ]

ASF GitHub Bot logged work on HIVE-26400:
-

Author: ASF GitHub Bot
Created on: 27/Mar/23 11:59
Start Date: 27/Mar/23 11:59
Worklog Time Spent: 10m 
  Work Description: TuroczyX commented on PR #3448:
URL: https://github.com/apache/hive/pull/3448#issuecomment-1485015881

   I love this initiative. Can we get more eyes on it? 
   
   I have 2 comments about it:
   1. HMS can work together with MySQL, but too many times we found bugs with 
MySQL which gave us a lot of headaches. Is it possible to change to Postgres?
   2. I think we should ask for a Docker account to push the image to the 
repository whenever we have a new build or release.
   
   
   What is the remaining part of this task to make it happen?




Issue Time Tracking
---

Worklog Id: (was: 853153)
Time Spent: 10h 20m  (was: 10h 10m)

> Provide docker images for Hive
> --
>
> Key: HIVE-26400
> URL: https://issues.apache.org/jira/browse/HIVE-26400
> Project: Hive
>  Issue Type: Sub-task
>  Components: Build Infrastructure
>Reporter: Zhihua Deng
>Assignee: Zhihua Deng
>Priority: Blocker
>  Labels: hive-4.0.0-must, pull-request-available
>  Time Spent: 10h 20m
>  Remaining Estimate: 0h
>
> Make Apache Hive able to run inside a Docker container in pseudo-distributed 
> mode, with MySQL/Derby as its backing database, and provide the following:
>  * Quick-start/debugging/preparing a test env for Hive;
>  * Tools to build a target image with a specified version of Hive and its 
> dependencies;
>  * Images that can be used as the basis for a Kubernetes operator.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Work logged] (HIVE-26655) VectorUDAFBloomFilterMerge should take care of safe batch handling when working in parallel

2023-03-27 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-26655?focusedWorklogId=853152&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-853152
 ]

ASF GitHub Bot logged work on HIVE-26655:
-

Author: ASF GitHub Bot
Created on: 27/Mar/23 11:50
Start Date: 27/Mar/23 11:50
Worklog Time Spent: 10m 
  Work Description: abstractdog commented on code in PR #4158:
URL: https://github.com/apache/hive/pull/4158#discussion_r1149197845


##
ql/src/java/org/apache/hadoop/hive/ql/exec/vector/expressions/aggregates/VectorUDAFBloomFilterMerge.java:
##
@@ -602,4 +602,11 @@ public void assignRowColumn(VectorizedRowBatch batch, int batchIndex, int column
     Aggregation bfAgg = (Aggregation) agg;
     outputColVector.setVal(batchIndex, bfAgg.bfBytes, 0, bfAgg.bfBytes.length);
   }
+
+  /**
+   * Let's clone the batch when we're working in parallel, see HIVE-26655.
+   */
+  public boolean batchNeedsClone() {
+    return numThreads > 0;
+  }

Review Comment:
   Still needed, yes: thread=1 means the executor starts processing the bloom 
filter on one thread asynchronously while the main thread is fetching the next 
one.
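The point in this exchange can be illustrated outside Hive: even with a single executor thread, the worker consumes the batch asynchronously while the producer refills it, so each submitted task must get its own copy. A minimal, hypothetical Java sketch of the clone-before-submit pattern (all names invented for illustration):

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

public class CloneDemo {
    public static void main(String[] args) throws Exception {
        // One worker thread, like numThreads = 1 in the discussion above.
        ExecutorService pool = Executors.newFixedThreadPool(1);
        List<Integer> results = Collections.synchronizedList(new ArrayList<>());
        int[] batch = new int[1]; // stand-in for a reused VectorizedRowBatch
        for (int i = 0; i < 3; i++) {
            batch[0] = i;
            // The fix in miniature: hand the worker its own copy of the batch,
            // so the producer can safely refill 'batch' for the next iteration.
            int[] copy = batch.clone();
            pool.submit(() -> results.add(copy[0]));
        }
        pool.shutdown();
        pool.awaitTermination(10, TimeUnit.SECONDS);
        List<Integer> sorted = new ArrayList<>(results);
        Collections.sort(sorted);
        // Sharing 'batch' directly instead of a clone could let late writes
        // leak into earlier submissions.
        System.out.println(sorted); // [0, 1, 2]
    }
}
```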





Issue Time Tracking
---

Worklog Id: (was: 853152)
Time Spent: 40m  (was: 0.5h)

> VectorUDAFBloomFilterMerge should take care of safe batch handling when 
> working in parallel
> ---
>
> Key: HIVE-26655
> URL: https://issues.apache.org/jira/browse/HIVE-26655
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Sungwoo Park
>Assignee: László Bodor
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 40m
>  Remaining Estimate: 0h
>
> When tested with 100GB ORC tables, the number of rows returned by query 17 is 
> not stable. It returns fewer rows than the correct result (55 rows).
>  



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Work logged] (HIVE-26655) VectorUDAFBloomFilterMerge should take care of safe batch handling when working in parallel

2023-03-27 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-26655?focusedWorklogId=853151&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-853151
 ]

ASF GitHub Bot logged work on HIVE-26655:
-

Author: ASF GitHub Bot
Created on: 27/Mar/23 11:48
Start Date: 27/Mar/23 11:48
Worklog Time Spent: 10m 
  Work Description: ayushtkn commented on code in PR #4158:
URL: https://github.com/apache/hive/pull/4158#discussion_r1149193455


##
ql/src/java/org/apache/hadoop/hive/ql/exec/vector/expressions/aggregates/VectorUDAFBloomFilterMerge.java:
##
@@ -602,4 +602,11 @@ public void assignRowColumn(VectorizedRowBatch batch, int batchIndex, int column
     Aggregation bfAgg = (Aggregation) agg;
     outputColVector.setVal(batchIndex, bfAgg.bfBytes, 0, bfAgg.bfBytes.length);
   }
+
+  /**
+   * Let's clone the batch when we're working in parallel, see HIVE-26655.
+   */
+  public boolean batchNeedsClone() {
+    return numThreads > 0;
+  }

Review Comment:
   Just a quick pass: if the thread count is 1, do you still need a clone? For 
parallel execution it should be more than one thread, or is there some catch?





Issue Time Tracking
---

Worklog Id: (was: 853151)
Time Spent: 0.5h  (was: 20m)

> VectorUDAFBloomFilterMerge should take care of safe batch handling when 
> working in parallel
> ---
>
> Key: HIVE-26655
> URL: https://issues.apache.org/jira/browse/HIVE-26655
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Sungwoo Park
>Assignee: László Bodor
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> When tested with 100GB ORC tables, the number of rows returned by query 17 is 
> not stable. It returns fewer rows than the correct result (55 rows).
>  



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Work logged] (HIVE-27177) Add alter table...Convert to Iceberg command

2023-03-27 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-27177?focusedWorklogId=853135&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-853135
 ]

ASF GitHub Bot logged work on HIVE-27177:
-

Author: ASF GitHub Bot
Created on: 27/Mar/23 11:03
Start Date: 27/Mar/23 11:03
Worklog Time Spent: 10m 
  Work Description: sonarcloud[bot] commented on PR #4155:
URL: https://github.com/apache/hive/pull/4155#issuecomment-1484942204

   Kudos, SonarCloud Quality Gate passed! Badge summary: 0 Bugs (A), 0 
Vulnerabilities (A), 0 Security Hotspots (A), 4 Code Smells (A); no coverage 
or duplication information. Full report on sonarcloud.io for apache_hive, PR 
4155.
   




Issue Time Tracking
---

Worklog Id: (was: 853135)
Time Spent: 0.5h  (was: 20m)

> Add alter table...Convert to Iceberg command
> 
>
> Key: HIVE-27177
> URL: https://issues.apache.org/jira/browse/HIVE-27177
> Project: Hive
>  Issue Type: Improvement
>Reporter: Ayush Saxena
>Assignee: Ayush Saxena
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> Add an alter table ... convert to Iceberg [TBLPROPERTIES('','')] command to 
> convert existing external tables to Iceberg tables
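Going by the description, usage might look like the following (hypothetical sketch; the table name and properties are placeholders, and the final grammar is defined by the patch):

```sql
-- Hypothetical examples of the proposed command
ALTER TABLE web_logs CONVERT TO ICEBERG;
ALTER TABLE web_logs CONVERT TO ICEBERG TBLPROPERTIES ('key1'='value1');
```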



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Work logged] (HIVE-27135) AcidUtils#getHdfsDirSnapshots() throws FNFE when a directory is removed in HDFS

2023-03-27 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-27135?focusedWorklogId=853133&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-853133
 ]

ASF GitHub Bot logged work on HIVE-27135:
-

Author: ASF GitHub Bot
Created on: 27/Mar/23 10:52
Start Date: 27/Mar/23 10:52
Worklog Time Spent: 10m 
  Work Description: mdayakar commented on code in PR #4114:
URL: https://github.com/apache/hive/pull/4114#discussion_r1149137172


##
ql/src/java/org/apache/hadoop/hive/ql/io/AcidUtils.java:
##
@@ -1538,32 +1538,36 @@ private static HdfsDirSnapshot addToSnapshot(Map<Path, HdfsDirSnapshot> dirToSna
   public static Map<Path, HdfsDirSnapshot> getHdfsDirSnapshots(final FileSystem fs, final Path path)
       throws IOException {
     Map<Path, HdfsDirSnapshot> dirToSnapshots = new HashMap<>();
-    RemoteIterator<LocatedFileStatus> itr = FileUtils.listFiles(fs, path, true, acidHiddenFileFilter);
-    while (itr.hasNext()) {
-      FileStatus fStatus = itr.next();
-      Path fPath = fStatus.getPath();
-      if (fStatus.isDirectory() && acidTempDirFilter.accept(fPath)) {
-        addToSnapshot(dirToSnapshots, fPath);
-      } else {
-        Path parentDirPath = fPath.getParent();
-        if (acidTempDirFilter.accept(parentDirPath)) {
-          while (isChildOfDelta(parentDirPath, path)) {
-            // Some cases there are other directory layers between the delta and the datafiles
-            // (export-import mm table, insert with union all to mm table, skewed tables).
-            // But it does not matter for the AcidState, we just need the deltas and the data files
-            // So build the snapshot with the files inside the delta directory
-            parentDirPath = parentDirPath.getParent();
-          }
-          HdfsDirSnapshot dirSnapshot = addToSnapshot(dirToSnapshots, parentDirPath);
-          // We're not filtering out the metadata file and acid format file,
-          // as they represent parts of a valid snapshot
-          // We're not using the cached values downstream, but we can potentially optimize more in a follow-up task
-          if (fStatus.getPath().toString().contains(MetaDataFile.METADATA_FILE)) {
-            dirSnapshot.addMetadataFile(fStatus);
-          } else if (fStatus.getPath().toString().contains(OrcAcidVersion.ACID_FORMAT)) {
-            dirSnapshot.addOrcAcidFormatFile(fStatus);
-          } else {
-            dirSnapshot.addFile(fStatus);
+    Deque<RemoteIterator<LocatedFileStatus>> stack = new ArrayDeque<>();
+    stack.push(FileUtils.listLocatedStatusIterator(fs, path, acidHiddenFileFilter));
+    while (!stack.isEmpty()) {
+      RemoteIterator<LocatedFileStatus> itr = stack.pop();
+      while (itr.hasNext()) {
+        FileStatus fStatus = itr.next();
+        Path fPath = fStatus.getPath();
+        if (fStatus.isDirectory()) {
+          stack.push(FileUtils.listLocatedStatusIterator(fs, fPath, acidHiddenFileFilter));

Review Comment:
   No, `addToSnapshot(dirToSnapshots, fPath)` needs to be called if a folder 
contains a file, which is handled in the else branch. The same logic exists in the 
existing code.
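The hunk above replaces one recursive bulk listing (which fails wholesale if any directory vanishes mid-walk) with a stack of per-directory iterators. A standalone sketch of the same strategy with `java.nio.file` follows; `SnapshotWalker` and `listResilient` are illustrative names, not Hive APIs, and the point is only that a concurrently deleted directory skips one subtree instead of aborting the whole snapshot:

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.NoSuchFileException;
import java.nio.file.Path;
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

public class SnapshotWalker {

  // Iterative, per-directory listing: each directory gets its own iterator,
  // so a directory removed between discovery and listing only loses its own
  // subtree instead of failing the entire traversal with FNFE.
  public static List<Path> listResilient(Path root) throws IOException {
    List<Path> files = new ArrayList<>();
    Deque<Path> stack = new ArrayDeque<>();
    stack.push(root);
    while (!stack.isEmpty()) {
      Path dir = stack.pop();
      try (DirectoryStream<Path> entries = Files.newDirectoryStream(dir)) {
        for (Path entry : entries) {
          if (Files.isDirectory(entry)) {
            stack.push(entry); // descend later via its own iterator
          } else {
            files.add(entry);
          }
        }
      } catch (NoSuchFileException e) {
        // Directory removed concurrently (e.g. a .hive-staging dir being
        // cleaned up): skip it instead of failing the whole snapshot.
      }
    }
    return files;
  }

  public static void main(String[] args) throws IOException {
    Path root = Files.createTempDirectory("snap");
    Files.createDirectories(root.resolve("delta_1_1"));
    Files.write(root.resolve("delta_1_1").resolve("bucket_00000"), new byte[]{1});
    System.out.println(listResilient(root).size()); // 1
  }
}
```

The same shape holds for HDFS: swap `DirectoryStream` for `RemoteIterator<LocatedFileStatus>` and `NoSuchFileException` for `FileNotFoundException`.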





Issue Time Tracking
---

Worklog Id: (was: 853133)
Time Spent: 4h 50m  (was: 4h 40m)

> AcidUtils#getHdfsDirSnapshots() throws FNFE when a directory is removed in 
> HDFS
> ---
>
> Key: HIVE-27135
> URL: https://issues.apache.org/jira/browse/HIVE-27135
> Project: Hive
>  Issue Type: Bug
>Reporter: Dayakar M
>Assignee: Dayakar M
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 4h 50m
>  Remaining Estimate: 0h
>
> AcidUtils#getHdfsDirSnapshots() throws FileNotFoundException when a directory 
> is removed in HDFS while fetching HDFS Snapshots.
> Below testcode can be used to reproduce this issue.
> {code:java}
>  @Test
>   public void 
> testShouldNotThrowFNFEWhenHiveStagingDirectoryIsRemovedWhileFetchingHDFSSnapshots()
>  throws Exception {
> MockFileSystem fs = new MockFileSystem(new HiveConf(),
> new MockFile("mock:/tbl/part1/.hive-staging_dir/-ext-10002", 500, new 
> byte[0]),
> new MockFile("mock:/tbl/part2/.hive-staging_dir", 500, new byte[0]),
> new MockFile("mock:/tbl/part1/_tmp_space.db", 500, new byte[0]),
> new MockFile("mock:/tbl/part1/delta_1_1/bucket--", 500, new 
> byte[0]));
> Path path = new MockPath(fs, "/tbl");
> Path stageDir = new MockPath(fs, "mock:/tbl/part1/.hive-staging_dir");
> FileSystem mockFs = spy(fs);
> Mockito.doThrow(new 
> FileNotFoundException("")).when(mockFs).listLocatedStatus(eq(stageDir));
> try {
>   Map<Path, AcidUtils.HdfsDirSnapshot> hdfsDirSnapshots = 
> AcidUtils.getHdfsDirSnapshots(mockFs, path);
>   Assert.assertEquals(1, hdfsDirSnapshots.size());
> }
> catch (FileNotFoundException fnf) {
>   fail("Should not throw FileNotFoundException when a 

[jira] [Work logged] (HIVE-27135) AcidUtils#getHdfsDirSnapshots() throws FNFE when a directory is removed in HDFS

2023-03-27 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-27135?focusedWorklogId=853132&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-853132
 ]

ASF GitHub Bot logged work on HIVE-27135:
-

Author: ASF GitHub Bot
Created on: 27/Mar/23 10:52
Start Date: 27/Mar/23 10:52
Worklog Time Spent: 10m 
  Work Description: mdayakar commented on code in PR #4114:
URL: https://github.com/apache/hive/pull/4114#discussion_r1149137172


##
ql/src/java/org/apache/hadoop/hive/ql/io/AcidUtils.java:
##
@@ -1538,32 +1538,36 @@ private static HdfsDirSnapshot addToSnapshot(Map<Path, HdfsDirSnapshot> dirToSnapshots,
   public static Map<Path, HdfsDirSnapshot> getHdfsDirSnapshots(final FileSystem fs, final Path path)
       throws IOException {
     Map<Path, HdfsDirSnapshot> dirToSnapshots = new HashMap<>();
-    RemoteIterator<LocatedFileStatus> itr = FileUtils.listFiles(fs, path, true, acidHiddenFileFilter);
-    while (itr.hasNext()) {
-      FileStatus fStatus = itr.next();
-      Path fPath = fStatus.getPath();
-      if (fStatus.isDirectory() && acidTempDirFilter.accept(fPath)) {
-        addToSnapshot(dirToSnapshots, fPath);
-      } else {
-        Path parentDirPath = fPath.getParent();
-        if (acidTempDirFilter.accept(parentDirPath)) {
-          while (isChildOfDelta(parentDirPath, path)) {
-            // Some cases there are other directory layers between the delta and the datafiles
-            // (export-import mm table, insert with union all to mm table, skewed tables).
-            // But it does not matter for the AcidState, we just need the deltas and the data files
-            // So build the snapshot with the files inside the delta directory
-            parentDirPath = parentDirPath.getParent();
-          }
-          HdfsDirSnapshot dirSnapshot = addToSnapshot(dirToSnapshots, parentDirPath);
-          // We're not filtering out the metadata file and acid format file,
-          // as they represent parts of a valid snapshot
-          // We're not using the cached values downstream, but we can potentially optimize more in a follow-up task
-          if (fStatus.getPath().toString().contains(MetaDataFile.METADATA_FILE)) {
-            dirSnapshot.addMetadataFile(fStatus);
-          } else if (fStatus.getPath().toString().contains(OrcAcidVersion.ACID_FORMAT)) {
-            dirSnapshot.addOrcAcidFormatFile(fStatus);
-          } else {
-            dirSnapshot.addFile(fStatus);
+    Deque<RemoteIterator<LocatedFileStatus>> stack = new ArrayDeque<>();
+    stack.push(FileUtils.listLocatedStatusIterator(fs, path, acidHiddenFileFilter));
+    while (!stack.isEmpty()) {
+      RemoteIterator<LocatedFileStatus> itr = stack.pop();
+      while (itr.hasNext()) {
+        FileStatus fStatus = itr.next();
+        Path fPath = fStatus.getPath();
+        if (fStatus.isDirectory()) {
+          stack.push(FileUtils.listLocatedStatusIterator(fs, fPath, acidHiddenFileFilter));

Review Comment:
   No, `addToSnapshot(dirToSnapshots, fPath)` needs to be added if a folder contains 
a file, which is handled in the else branch. The same logic exists in the existing code.





Issue Time Tracking
---

Worklog Id: (was: 853132)
Time Spent: 4h 40m  (was: 4.5h)

> AcidUtils#getHdfsDirSnapshots() throws FNFE when a directory is removed in 
> HDFS
> ---
>
> Key: HIVE-27135
> URL: https://issues.apache.org/jira/browse/HIVE-27135
> Project: Hive
>  Issue Type: Bug
>Reporter: Dayakar M
>Assignee: Dayakar M
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 4h 40m
>  Remaining Estimate: 0h
>
> AcidUtils#getHdfsDirSnapshots() throws FileNotFoundException when a directory 
> is removed in HDFS while fetching HDFS Snapshots.
> Below testcode can be used to reproduce this issue.
> {code:java}
>  @Test
>   public void 
> testShouldNotThrowFNFEWhenHiveStagingDirectoryIsRemovedWhileFetchingHDFSSnapshots()
>  throws Exception {
> MockFileSystem fs = new MockFileSystem(new HiveConf(),
> new MockFile("mock:/tbl/part1/.hive-staging_dir/-ext-10002", 500, new 
> byte[0]),
> new MockFile("mock:/tbl/part2/.hive-staging_dir", 500, new byte[0]),
> new MockFile("mock:/tbl/part1/_tmp_space.db", 500, new byte[0]),
> new MockFile("mock:/tbl/part1/delta_1_1/bucket--", 500, new 
> byte[0]));
> Path path = new MockPath(fs, "/tbl");
> Path stageDir = new MockPath(fs, "mock:/tbl/part1/.hive-staging_dir");
> FileSystem mockFs = spy(fs);
> Mockito.doThrow(new 
> FileNotFoundException("")).when(mockFs).listLocatedStatus(eq(stageDir));
> try {
>   Map<Path, AcidUtils.HdfsDirSnapshot> hdfsDirSnapshots = 
> AcidUtils.getHdfsDirSnapshots(mockFs, path);
>   Assert.assertEquals(1, hdfsDirSnapshots.size());
> }
> catch (FileNotFoundException fnf) {
>   fail("Should not throw FileNotFoundException when a directory 

[jira] [Updated] (HIVE-27179) HS2 WebUI throws NPE when JspFactory loaded from jetty-runner

2023-03-27 Thread Zhihua Deng (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-27179?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhihua Deng updated HIVE-27179:
---
Description: 
In HIVE-17088 we resolved an NPE thrown from the HS2 WebUI by introducing 
javax.servlet.jsp-api. It works as expected when the javax.servlet.jsp-api jar 
precedes the jetty-runner jar on the classpath, but in some environments it 
still throws an NPE when opening the HS2 web UI:
{noformat}
java.lang.NullPointerException 
at 
org.apache.hive.generated.hiveserver2.hiveserver2_jsp._jspService(hiveserver2_jsp.java:286)
 
at org.apache.jasper.runtime.HttpJspBase.service(HttpJspBase.java:71) 
at javax.servlet.http.HttpServlet.service(HttpServlet.java:790) 
at 
org.eclipse.jetty.servlet.ServletHolder$NotAsync.service(ServletHolder.java:1443)
 
at org.eclipse.jetty.servlet.ServletHolder.handle(ServletHolder.java:791) 
at 
org.eclipse.jetty.servlet.ServletHandler$ChainEnd.doFilter(ServletHandler.java:1626)
...{noformat}
The jetty-runner JspFactory.getDefaultFactory() just returns null.

  was:
In HIVE-17088{*},{*} we resolved a NPE thrown from HS2 WebUI by introducing 

javax.servlet.jsp-api. It works as expected when the javax.servlet.jsp-api jar 
prevails jetty-runner jar, but things can be different in some environments, it 
still throws NPE when opening the HS2 web:
{noformat}
java.lang.NullPointerException at 
org.apache.hive.generated.hiveserver2.hiveserver2_jsp._jspService(hiveserver2_jsp.java:286)
 at org.apache.jasper.runtime.HttpJspBase.service(HttpJspBase.java:71) at 
javax.servlet.http.HttpServlet.service(HttpServlet.java:790) at 
org.eclipse.jetty.servlet.ServletHolder$NotAsync.service(ServletHolder.java:1443)
 at org.eclipse.jetty.servlet.ServletHolder.handle(ServletHolder.java:791) at 
org.eclipse.jetty.servlet.ServletHandler$ChainEnd.doFilter(ServletHandler.java:1626)
...{noformat}
The jetty-runner JspFactory.getDefaultFactory() just returns null.


> HS2 WebUI throws NPE when JspFactory loaded from jetty-runner
> -
>
> Key: HIVE-27179
> URL: https://issues.apache.org/jira/browse/HIVE-27179
> Project: Hive
>  Issue Type: Bug
>  Components: HiveServer2
>Reporter: Zhihua Deng
>Priority: Major
>
> In HIVE-17088 we resolved an NPE thrown from the HS2 WebUI by introducing 
> javax.servlet.jsp-api. It works as expected when the javax.servlet.jsp-api 
> jar precedes the jetty-runner jar on the classpath, but in some 
> environments it still throws an NPE when opening the HS2 web UI:
> {noformat}
> java.lang.NullPointerException 
> at 
> org.apache.hive.generated.hiveserver2.hiveserver2_jsp._jspService(hiveserver2_jsp.java:286)
>  
> at org.apache.jasper.runtime.HttpJspBase.service(HttpJspBase.java:71) 
> at javax.servlet.http.HttpServlet.service(HttpServlet.java:790) 
> at 
> org.eclipse.jetty.servlet.ServletHolder$NotAsync.service(ServletHolder.java:1443)
>  
> at org.eclipse.jetty.servlet.ServletHolder.handle(ServletHolder.java:791) 
> at 
> org.eclipse.jetty.servlet.ServletHandler$ChainEnd.doFilter(ServletHandler.java:1626)
> ...{noformat}
> The jetty-runner JspFactory.getDefaultFactory() just returns null.
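The failure mode described here is a static factory lookup returning null when the "wrong" jar wins on the classpath. The sketch below shows the defensive pattern of guarding such a lookup and installing a default; `PageFactory` is a toy stand-in for `javax.servlet.jsp.JspFactory`, not the real API, and the guard mirrors the kind of fix this issue calls for rather than the actual patch:

```java
public class FactoryGuard {

  // Toy analogue of JspFactory: a static holder the container is *supposed*
  // to initialize, but which may stay null depending on classpath order.
  static class PageFactory {
    private static PageFactory defaultFactory;
    static PageFactory getDefaultFactory() { return defaultFactory; }
    static void setDefaultFactory(PageFactory f) { defaultFactory = f; }
    String render() { return "ok"; }
  }

  static String serviceJsp() {
    PageFactory factory = PageFactory.getDefaultFactory();
    if (factory == null) {
      // Without this guard, the generated _jspService code dereferences
      // null, which is the NPE in the stack trace above.
      PageFactory.setDefaultFactory(new PageFactory());
      factory = PageFactory.getDefaultFactory();
    }
    return factory.render();
  }

  public static void main(String[] args) {
    System.out.println(serviceJsp()); // ok
  }
}
```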





[jira] [Work logged] (HIVE-27158) Store hive columns stats in puffin files for iceberg tables

2023-03-27 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-27158?focusedWorklogId=853129&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-853129
 ]

ASF GitHub Bot logged work on HIVE-27158:
-

Author: ASF GitHub Bot
Created on: 27/Mar/23 10:43
Start Date: 27/Mar/23 10:43
Worklog Time Spent: 10m 
  Work Description: deniskuzZ commented on code in PR #4131:
URL: https://github.com/apache/hive/pull/4131#discussion_r1149127885


##
ql/src/java/org/apache/hadoop/hive/ql/stats/StatsUtils.java:
##
@@ -1069,8 +1069,12 @@ public static List<ColStatistics> getTableColumnStats(
     }
     if (fetchColStats && !colStatsToRetrieve.isEmpty()) {
       try {
-        List<ColumnStatisticsObj> colStat = Hive.get().getTableColumnStatistics(
-            dbName, tabName, colStatsToRetrieve, false);
+        List<ColumnStatisticsObj> colStat;
+        if (table != null && table.isNonNative() && table.getStorageHandler().canProvideColStatistics(table)) {

Review Comment:
   Metastore mode for Iceberg is no longer supported, right? Have we disabled 
the relevant stats-persist logic?
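The check under review (`table.isNonNative() && storageHandler.canProvideColStatistics(table)`) is a capability probe: ask the storage handler whether it can serve column stats and fall back to the metastore otherwise. A minimal self-contained sketch of that dispatch; `ColStats` and `StatsProvider` are stand-ins, not Hive classes:

```java
import java.util.List;
import java.util.Optional;

public class StatsResolver {

  record ColStats(String column, long numDistinct) {}

  // Plays the storage-handler role: it may or may not be able to serve stats.
  interface StatsProvider {
    boolean canProvideColStatistics();
    List<ColStats> getColStatistics();
  }

  // Prefer handler-provided stats when the capability probe succeeds;
  // otherwise keep the metastore as the default source.
  static List<ColStats> resolve(Optional<StatsProvider> handler, List<ColStats> metastoreStats) {
    return handler
        .filter(StatsProvider::canProvideColStatistics)
        .map(StatsProvider::getColStatistics)
        .orElse(metastoreStats);
  }

  public static void main(String[] args) {
    List<ColStats> fromPuffin = List.of(new ColStats("id", 42));
    List<ColStats> fromHms = List.of(new ColStats("id", 7));

    StatsProvider puffinBacked = new StatsProvider() {
      public boolean canProvideColStatistics() { return true; }
      public List<ColStats> getColStatistics() { return fromPuffin; }
    };

    // Handler can serve stats -> its numbers win.
    System.out.println(resolve(Optional.of(puffinBacked), fromHms).get(0).numDistinct());
    // No handler (native table) -> metastore numbers.
    System.out.println(resolve(Optional.empty(), fromHms).get(0).numDistinct());
  }
}
```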





Issue Time Tracking
---

Worklog Id: (was: 853129)
Time Spent: 6h 20m  (was: 6h 10m)

> Store hive columns stats in puffin files for iceberg tables
> ---
>
> Key: HIVE-27158
> URL: https://issues.apache.org/jira/browse/HIVE-27158
> Project: Hive
>  Issue Type: Improvement
>Reporter: Simhadri Govindappa
>Assignee: Simhadri Govindappa
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 6h 20m
>  Remaining Estimate: 0h
>






[jira] [Work logged] (HIVE-27158) Store hive columns stats in puffin files for iceberg tables

2023-03-27 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-27158?focusedWorklogId=853128&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-853128
 ]

ASF GitHub Bot logged work on HIVE-27158:
-

Author: ASF GitHub Bot
Created on: 27/Mar/23 10:39
Start Date: 27/Mar/23 10:39
Worklog Time Spent: 10m 
  Work Description: deniskuzZ commented on code in PR #4131:
URL: https://github.com/apache/hive/pull/4131#discussion_r1149123408


##
iceberg/iceberg-handler/src/main/java/org/apache/iceberg/mr/hive/HiveIcebergStorageHandler.java:
##
@@ -349,6 +365,96 @@ public Map getBasicStatistics(Partish 
partish) {
 return stats;
   }
 
+
+  @Override
+  public boolean canSetColStatistics() {
+String statsSource = HiveConf.getVar(conf, 
HiveConf.ConfVars.HIVE_USE_STATS_FROM).toLowerCase();
+return statsSource.equals(PUFFIN);
+  }
+
+  @Override
+  public boolean 
canProvideColStatistics(org.apache.hadoop.hive.ql.metadata.Table tbl) {
+
+org.apache.hadoop.hive.ql.metadata.Table hmsTable = tbl;
+TableDesc tableDesc = Utilities.getTableDesc(hmsTable);
+Table table = Catalogs.loadTable(conf, tableDesc.getProperties());
+if (table.currentSnapshot() != null) {
+  String statsSource = HiveConf.getVar(conf, 
HiveConf.ConfVars.HIVE_COL_STATS_SOURCE).toLowerCase();
+  String statsPath = table.location() + STATS + table.name() + 
table.currentSnapshot().snapshotId();
+  if (statsSource.equals(PUFFIN)) {
+try (FileSystem fs = new Path(table.location()).getFileSystem(conf)) {
+  if (fs.exists(new Path(statsPath))) {
+return true;
+  }
+} catch (IOException e) {
+  LOG.warn(e.getMessage());
+}
+  }
+}
+return false;
+  }
+
+  @Override
+  public List<ColumnStatisticsObj> getColStatistics(org.apache.hadoop.hive.ql.metadata.Table tbl) {
+
+org.apache.hadoop.hive.ql.metadata.Table hmsTable = tbl;
+TableDesc tableDesc = Utilities.getTableDesc(hmsTable);
+Table table = Catalogs.loadTable(conf, tableDesc.getProperties());
+String statsSource = HiveConf.getVar(conf, 
HiveConf.ConfVars.HIVE_COL_STATS_SOURCE).toLowerCase();
+switch (statsSource) {
+  case ICEBERG:
+// Place holder for iceberg stats
+break;
+  case PUFFIN:
+String snapshotId = table.name() + 
table.currentSnapshot().snapshotId();
+String statsPath = table.location() + STATS + snapshotId;
+LOG.info("Using stats from puffin file at:" + statsPath);
+try (PuffinReader reader = 
Puffin.read(table.io().newInputFile(statsPath)).build()) {
+  BlobMetadata blobMetadata = reader.fileMetadata().blobs().get(0);
+      Map<BlobMetadata, List<ColumnStatistics>> collect =
+          Streams.stream(reader.readAll(ImmutableList.of(blobMetadata))).collect(Collectors.toMap(Pair::first,
+              blobMetadataByteBufferPair -> SerializationUtils.deserialize(
+                  ByteBuffers.toByteArray(blobMetadataByteBufferPair.second()))));
+
+  return 
collect.entrySet().stream().iterator().next().getValue().get(0).getStatsObj();
+} catch (IOException e) {
+  LOG.info(String.valueOf(e));
+}
+break;
+  default:
+// fall back to metastore
+}
+return null;
+  }
+
+
+  @Override
+  public boolean setColStatistics(org.apache.hadoop.hive.ql.metadata.Table table,
+      List<ColumnStatistics> colStats) {
+TableDesc tableDesc = Utilities.getTableDesc(table);
+Table tbl = Catalogs.loadTable(conf, tableDesc.getProperties());
+String snapshotId = tbl.name() + tbl.currentSnapshot().snapshotId();
+byte[] serializeColStats = SerializationUtils.serialize((Serializable) 
colStats);
+
+try (PuffinWriter writer = 
Puffin.write(tbl.io().newOutputFile(tbl.location() + STATS + snapshotId))

Review Comment:
   Looks like not; per the `getColStatsForPartCol` comment, the metastore 
currently does not store column stats for the partition column. Also, the HMS 
column stats table is called `TAB_COL_STATS`, which doesn't track per-partition 
stats.
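The hunk above persists column stats into a Puffin blob as Java-serialized bytes (`SerializationUtils.serialize`) and reads them back on the snapshot-keyed stats path. Below is a standalone sketch of just that serialize/deserialize round trip with plain `java.io` object streams; `FakeColStats` is an illustrative stand-in, not a Hive type:

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.util.List;

public class StatsBlobRoundTrip {

  static class FakeColStats implements Serializable {
    final String column;
    final long nulls;
    FakeColStats(String column, long nulls) { this.column = column; this.nulls = nulls; }
  }

  // Serialize to a byte[]; in the PR this payload becomes the Puffin blob.
  static byte[] serialize(Serializable obj) throws IOException {
    ByteArrayOutputStream bos = new ByteArrayOutputStream();
    try (ObjectOutputStream oos = new ObjectOutputStream(bos)) {
      oos.writeObject(obj);
    }
    return bos.toByteArray();
  }

  // Deserialize the blob payload back into the stats object graph.
  @SuppressWarnings("unchecked")
  static <T> T deserialize(byte[] bytes) throws IOException, ClassNotFoundException {
    try (ObjectInputStream ois = new ObjectInputStream(new ByteArrayInputStream(bytes))) {
      return (T) ois.readObject();
    }
  }

  public static void main(String[] args) throws Exception {
    List<FakeColStats> stats = List.of(new FakeColStats("id", 3L));
    byte[] blob = serialize((Serializable) stats);
    List<FakeColStats> back = deserialize(blob);
    System.out.println(back.get(0).column + ":" + back.get(0).nulls); // id:3
  }
}
```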
   





Issue Time Tracking
---

Worklog Id: (was: 853128)
Time Spent: 6h 10m  (was: 6h)

> Store hive columns stats in puffin files for iceberg tables
> ---
>
> Key: HIVE-27158
> URL: https://issues.apache.org/jira/browse/HIVE-27158
> Project: Hive
>  Issue Type: Improvement
>Reporter: Simhadri Govindappa
>Assignee: Simhadri Govindappa
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 6h 10m
>  Remaining Estimate: 0h
>






[jira] [Work logged] (HIVE-27158) Store hive columns stats in puffin files for iceberg tables

2023-03-27 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-27158?focusedWorklogId=853127&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-853127
 ]

ASF GitHub Bot logged work on HIVE-27158:
-

Author: ASF GitHub Bot
Created on: 27/Mar/23 10:38
Start Date: 27/Mar/23 10:38
Worklog Time Spent: 10m 
  Work Description: deniskuzZ commented on code in PR #4131:
URL: https://github.com/apache/hive/pull/4131#discussion_r1149123408


##
iceberg/iceberg-handler/src/main/java/org/apache/iceberg/mr/hive/HiveIcebergStorageHandler.java:
##
@@ -349,6 +365,96 @@ public Map<String, String> getBasicStatistics(Partish partish) {
 return stats;
   }
 
+
+  @Override
+  public boolean canSetColStatistics() {
+String statsSource = HiveConf.getVar(conf, 
HiveConf.ConfVars.HIVE_USE_STATS_FROM).toLowerCase();
+return statsSource.equals(PUFFIN);
+  }
+
+  @Override
+  public boolean 
canProvideColStatistics(org.apache.hadoop.hive.ql.metadata.Table tbl) {
+
+org.apache.hadoop.hive.ql.metadata.Table hmsTable = tbl;
+TableDesc tableDesc = Utilities.getTableDesc(hmsTable);
+Table table = Catalogs.loadTable(conf, tableDesc.getProperties());
+if (table.currentSnapshot() != null) {
+  String statsSource = HiveConf.getVar(conf, 
HiveConf.ConfVars.HIVE_COL_STATS_SOURCE).toLowerCase();
+  String statsPath = table.location() + STATS + table.name() + 
table.currentSnapshot().snapshotId();
+  if (statsSource.equals(PUFFIN)) {
+try (FileSystem fs = new Path(table.location()).getFileSystem(conf)) {
+  if (fs.exists(new Path(statsPath))) {
+return true;
+  }
+} catch (IOException e) {
+  LOG.warn(e.getMessage());
+}
+  }
+}
+return false;
+  }
+
+  @Override
+  public List<ColumnStatisticsObj> getColStatistics(org.apache.hadoop.hive.ql.metadata.Table tbl) {
+
+org.apache.hadoop.hive.ql.metadata.Table hmsTable = tbl;
+TableDesc tableDesc = Utilities.getTableDesc(hmsTable);
+Table table = Catalogs.loadTable(conf, tableDesc.getProperties());
+String statsSource = HiveConf.getVar(conf, 
HiveConf.ConfVars.HIVE_COL_STATS_SOURCE).toLowerCase();
+switch (statsSource) {
+  case ICEBERG:
+// Place holder for iceberg stats
+break;
+  case PUFFIN:
+String snapshotId = table.name() + 
table.currentSnapshot().snapshotId();
+String statsPath = table.location() + STATS + snapshotId;
+LOG.info("Using stats from puffin file at:" + statsPath);
+try (PuffinReader reader = 
Puffin.read(table.io().newInputFile(statsPath)).build()) {
+  BlobMetadata blobMetadata = reader.fileMetadata().blobs().get(0);
+      Map<BlobMetadata, List<ColumnStatistics>> collect =
+          Streams.stream(reader.readAll(ImmutableList.of(blobMetadata))).collect(Collectors.toMap(Pair::first,
+              blobMetadataByteBufferPair -> SerializationUtils.deserialize(
+                  ByteBuffers.toByteArray(blobMetadataByteBufferPair.second()))));
+
+  return 
collect.entrySet().stream().iterator().next().getValue().get(0).getStatsObj();
+} catch (IOException e) {
+  LOG.info(String.valueOf(e));
+}
+break;
+  default:
+// fall back to metastore
+}
+return null;
+  }
+
+
+  @Override
+  public boolean setColStatistics(org.apache.hadoop.hive.ql.metadata.Table table,
+      List<ColumnStatistics> colStats) {
+TableDesc tableDesc = Utilities.getTableDesc(table);
+Table tbl = Catalogs.loadTable(conf, tableDesc.getProperties());
+String snapshotId = tbl.name() + tbl.currentSnapshot().snapshotId();
+byte[] serializeColStats = SerializationUtils.serialize((Serializable) 
colStats);
+
+try (PuffinWriter writer = 
Puffin.write(tbl.io().newOutputFile(tbl.location() + STATS + snapshotId))

Review Comment:
   Looks like not; per the `getColStatsForPartCol` comment, the metastore 
currently does not store column stats for the partition column. Also, the HMS 
column stats table is called `TAB_COL_STATS`, which doesn't track per-partition 
stats.
   





Issue Time Tracking
---

Worklog Id: (was: 853127)
Time Spent: 6h  (was: 5h 50m)

> Store hive columns stats in puffin files for iceberg tables
> ---
>
> Key: HIVE-27158
> URL: https://issues.apache.org/jira/browse/HIVE-27158
> Project: Hive
>  Issue Type: Improvement
>Reporter: Simhadri Govindappa
>Assignee: Simhadri Govindappa
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 6h
>  Remaining Estimate: 0h
>






[jira] [Work logged] (HIVE-27158) Store hive columns stats in puffin files for iceberg tables

2023-03-27 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-27158?focusedWorklogId=853126&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-853126
 ]

ASF GitHub Bot logged work on HIVE-27158:
-

Author: ASF GitHub Bot
Created on: 27/Mar/23 10:28
Start Date: 27/Mar/23 10:28
Worklog Time Spent: 10m 
  Work Description: deniskuzZ commented on code in PR #4131:
URL: https://github.com/apache/hive/pull/4131#discussion_r1149111504


##
ql/src/java/org/apache/hadoop/hive/ql/stats/ColStatsProcessor.java:
##
@@ -218,6 +218,9 @@ public int persistColumnStats(Hive db, Table tbl) throws HiveException, MetaException {
       }
 
       start = System.currentTimeMillis();
+      if(tbl != null && tbl.isNonNative() && tbl.getStorageHandler().canSetColStatistics()){
+        tbl.getStorageHandler().setColStatistics(tbl, colStats);

Review Comment:
   When is this code invoked, is it by the auto-gather thread? If yes, what 
if several snapshots were generated between the runs? 
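The timing concern raised here (stats written for one snapshot, then read against a newer one) is why the PR keys the stats file name by snapshot id. A standalone sketch of that keying, with `SnapshotStats` as a made-up stand-in rather than a Hive or Iceberg type:

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;

public class SnapshotStats {
  // Stats blob keyed by the snapshot it was computed for, mirroring the
  // table.name() + snapshotId naming used for the Puffin file path.
  private final Map<Long, String> statsBySnapshot = new HashMap<>();

  void write(long snapshotId, String statsBlob) {
    statsBySnapshot.put(snapshotId, statsBlob);
  }

  // Only stats computed for exactly the current snapshot are returned;
  // stats from older snapshots show up as missing rather than being
  // silently reused as if they were fresh.
  Optional<String> readFor(long currentSnapshotId) {
    return Optional.ofNullable(statsBySnapshot.get(currentSnapshotId));
  }

  public static void main(String[] args) {
    SnapshotStats store = new SnapshotStats();
    store.write(100L, "colstats-for-snapshot-100");
    System.out.println(store.readFor(100L).isPresent()); // true
    System.out.println(store.readFor(101L).isPresent()); // false: a later commit has no stats yet
  }
}
```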








Issue Time Tracking
---

Worklog Id: (was: 853126)
Time Spent: 5h 50m  (was: 5h 40m)

> Store hive columns stats in puffin files for iceberg tables
> ---
>
> Key: HIVE-27158
> URL: https://issues.apache.org/jira/browse/HIVE-27158
> Project: Hive
>  Issue Type: Improvement
>Reporter: Simhadri Govindappa
>Assignee: Simhadri Govindappa
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 5h 50m
>  Remaining Estimate: 0h
>






[jira] [Work logged] (HIVE-27158) Store hive columns stats in puffin files for iceberg tables

2023-03-27 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-27158?focusedWorklogId=853125&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-853125
 ]

ASF GitHub Bot logged work on HIVE-27158:
-

Author: ASF GitHub Bot
Created on: 27/Mar/23 10:25
Start Date: 27/Mar/23 10:25
Worklog Time Spent: 10m 
  Work Description: deniskuzZ commented on code in PR #4131:
URL: https://github.com/apache/hive/pull/4131#discussion_r1149109172


##
ql/src/java/org/apache/hadoop/hive/ql/stats/ColStatsProcessor.java:
##
@@ -218,6 +218,9 @@ public int persistColumnStats(Hive db, Table tbl) throws HiveException, MetaException {
       }
 
       start = System.currentTimeMillis();
+      if(tbl != null && tbl.isNonNative() && tbl.getStorageHandler().canSetColStatistics()){

Review Comment:
   nit: space





Issue Time Tracking
---

Worklog Id: (was: 853125)
Time Spent: 5h 40m  (was: 5.5h)

> Store hive columns stats in puffin files for iceberg tables
> ---
>
> Key: HIVE-27158
> URL: https://issues.apache.org/jira/browse/HIVE-27158
> Project: Hive
>  Issue Type: Improvement
>Reporter: Simhadri Govindappa
>Assignee: Simhadri Govindappa
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 5h 40m
>  Remaining Estimate: 0h
>






[jira] [Work logged] (HIVE-26655) VectorUDAFBloomFilterMerge should take care of safe batch handling when working in parallel

2023-03-27 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-26655?focusedWorklogId=853124&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-853124
 ]

ASF GitHub Bot logged work on HIVE-26655:
-

Author: ASF GitHub Bot
Created on: 27/Mar/23 10:15
Start Date: 27/Mar/23 10:15
Worklog Time Spent: 10m 
  Work Description: sonarcloud[bot] commented on PR #4158:
URL: https://github.com/apache/hive/pull/4158#issuecomment-1484880192

   Kudos, SonarCloud Quality Gate passed!  [![Quality Gate passed](https://sonarsource.github.io/sonarcloud-github-static-resources/v2/checks/QualityGateBadge/passed-16px.png 'Quality Gate passed')](https://sonarcloud.io/dashboard?id=apache_hive&pullRequest=4158)
   
   [![Bug](https://sonarsource.github.io/sonarcloud-github-static-resources/v2/common/bug-16px.png 'Bug')](https://sonarcloud.io/project/issues?id=apache_hive&pullRequest=4158&resolved=false&types=BUG) [![A](https://sonarsource.github.io/sonarcloud-github-static-resources/v2/checks/RatingBadge/A-16px.png 'A')](https://sonarcloud.io/project/issues?id=apache_hive&pullRequest=4158&resolved=false&types=BUG) [0 Bugs](https://sonarcloud.io/project/issues?id=apache_hive&pullRequest=4158&resolved=false&types=BUG)
   [![Vulnerability](https://sonarsource.github.io/sonarcloud-github-static-resources/v2/common/vulnerability-16px.png 'Vulnerability')](https://sonarcloud.io/project/issues?id=apache_hive&pullRequest=4158&resolved=false&types=VULNERABILITY) [![A](https://sonarsource.github.io/sonarcloud-github-static-resources/v2/checks/RatingBadge/A-16px.png 'A')](https://sonarcloud.io/project/issues?id=apache_hive&pullRequest=4158&resolved=false&types=VULNERABILITY) [0 Vulnerabilities](https://sonarcloud.io/project/issues?id=apache_hive&pullRequest=4158&resolved=false&types=VULNERABILITY)
   [![Security Hotspot](https://sonarsource.github.io/sonarcloud-github-static-resources/v2/common/security_hotspot-16px.png 'Security Hotspot')](https://sonarcloud.io/project/security_hotspots?id=apache_hive&pullRequest=4158&resolved=false&types=SECURITY_HOTSPOT) [![A](https://sonarsource.github.io/sonarcloud-github-static-resources/v2/checks/RatingBadge/A-16px.png 'A')](https://sonarcloud.io/project/security_hotspots?id=apache_hive&pullRequest=4158&resolved=false&types=SECURITY_HOTSPOT) [0 Security Hotspots](https://sonarcloud.io/project/security_hotspots?id=apache_hive&pullRequest=4158&resolved=false&types=SECURITY_HOTSPOT)
   [![Code Smell](https://sonarsource.github.io/sonarcloud-github-static-resources/v2/common/code_smell-16px.png 'Code Smell')](https://sonarcloud.io/project/issues?id=apache_hive&pullRequest=4158&resolved=false&types=CODE_SMELL) [![A](https://sonarsource.github.io/sonarcloud-github-static-resources/v2/checks/RatingBadge/A-16px.png 'A')](https://sonarcloud.io/project/issues?id=apache_hive&pullRequest=4158&resolved=false&types=CODE_SMELL) [5 Code Smells](https://sonarcloud.io/project/issues?id=apache_hive&pullRequest=4158&resolved=false&types=CODE_SMELL)
   
   [![No Coverage information](https://sonarsource.github.io/sonarcloud-github-static-resources/v2/checks/CoverageChart/NoCoverageInfo-16px.png 'No Coverage information')](https://sonarcloud.io/component_measures?id=apache_hive&pullRequest=4158&metric=coverage&view=list) No Coverage information
   [![No Duplication information](https://sonarsource.github.io/sonarcloud-github-static-resources/v2/checks/Duplications/NoDuplicationInfo-16px.png 'No Duplication information')](https://sonarcloud.io/component_measures?id=apache_hive&pullRequest=4158&metric=duplicated_lines_density&view=list) No Duplication information




Issue Time Tracking
---

Worklog Id: (was: 853124)
Time Spent: 20m  (was: 10m)

> VectorUDAFBloomFilterMerge should take care of safe batch handling when 
> working in parallel
> ---
>
> Key: HIVE-26655
> URL: https://issues.apache.org/jira/browse/HIVE-26655
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Sungwoo Park
>Assignee: László Bodor
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> When tested with 100GB ORC tables, the number of rows returned by query 17 is 
> not stable. It returns fewer rows than the correct result (55 rows).
>  
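HIVE-26655 is about merging partial bloom filters produced by parallel workers; if the merge loses bits, the semijoin filter silently rejects valid keys, which is how a query can return fewer rows than expected. The standalone sketch below shows the OR-merge semantics with a toy `BitSet`-backed filter (`TinyBloom` is illustrative, not Hive's vectorized implementation):

```java
import java.util.BitSet;

public class TinyBloom {
  private final BitSet bits = new BitSet(1 << 10);

  // Two cheap hash probes into a 1024-bit table; real filters use more.
  private int[] probes(long key) {
    int h1 = Long.hashCode(key) & 1023;
    int h2 = Long.hashCode(key * 0x9E3779B97F4A7C15L) & 1023;
    return new int[]{h1, h2};
  }

  public void add(long key) {
    for (int p : probes(key)) bits.set(p);
  }

  public boolean mightContain(long key) {
    for (int p : probes(key)) if (!bits.get(p)) return false;
    return true;
  }

  // The merge must OR in *all* of the other filter's bits atomically with
  // respect to concurrent merges; a lost update here drops keys added on
  // another thread, i.e. the unsafe-batch-handling bug class in the issue.
  public synchronized void merge(TinyBloom other) {
    bits.or(other.bits);
  }

  public static void main(String[] args) {
    TinyBloom a = new TinyBloom();
    TinyBloom b = new TinyBloom();
    a.add(17L);   // key seen by worker A
    b.add(42L);   // key seen by worker B
    a.merge(b);
    // After a correct merge, both workers' keys survive.
    System.out.println(a.mightContain(17L) && a.mightContain(42L)); // true
  }
}
```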





[jira] [Work logged] (HIVE-27158) Store hive columns stats in puffin files for iceberg tables

2023-03-27 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-27158?focusedWorklogId=853121&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-853121
 ]

ASF GitHub Bot logged work on HIVE-27158:
-

Author: ASF GitHub Bot
Created on: 27/Mar/23 09:30
Start Date: 27/Mar/23 09:30
Worklog Time Spent: 10m 
  Work Description: deniskuzZ commented on code in PR #4131:
URL: https://github.com/apache/hive/pull/4131#discussion_r1149037383


##
iceberg/iceberg-handler/src/main/java/org/apache/iceberg/mr/hive/HiveIcebergStorageHandler.java:
##
@@ -349,6 +365,96 @@ public Map<String, String> getBasicStatistics(Partish partish) {
 return stats;
   }
 
+
+  @Override
+  public boolean canSetColStatistics() {
+String statsSource = HiveConf.getVar(conf, 
HiveConf.ConfVars.HIVE_USE_STATS_FROM).toLowerCase();
+return statsSource.equals(PUFFIN);
+  }
+
+  @Override
+  public boolean 
canProvideColStatistics(org.apache.hadoop.hive.ql.metadata.Table tbl) {
+
+org.apache.hadoop.hive.ql.metadata.Table hmsTable = tbl;
+TableDesc tableDesc = Utilities.getTableDesc(hmsTable);
+Table table = Catalogs.loadTable(conf, tableDesc.getProperties());
+if (table.currentSnapshot() != null) {
+  String statsSource = HiveConf.getVar(conf, 
HiveConf.ConfVars.HIVE_COL_STATS_SOURCE).toLowerCase();
+  String statsPath = table.location() + STATS + table.name() + 
table.currentSnapshot().snapshotId();
+  if (statsSource.equals(PUFFIN)) {
+try (FileSystem fs = new Path(table.location()).getFileSystem(conf)) {
+  if (fs.exists(new Path(statsPath))) {
+return true;
+  }
+} catch (IOException e) {
+  LOG.warn(e.getMessage());
+}
+  }
+}
+return false;
+  }
+
+  @Override
+  public List<ColumnStatisticsObj> getColStatistics(org.apache.hadoop.hive.ql.metadata.Table tbl) {
+
+org.apache.hadoop.hive.ql.metadata.Table hmsTable = tbl;
+TableDesc tableDesc = Utilities.getTableDesc(hmsTable);
+Table table = Catalogs.loadTable(conf, tableDesc.getProperties());
+String statsSource = HiveConf.getVar(conf, 
HiveConf.ConfVars.HIVE_COL_STATS_SOURCE).toLowerCase();
+switch (statsSource) {
+  case ICEBERG:
+// Place holder for iceberg stats
+break;
+  case PUFFIN:
+String snapshotId = table.name() + 
table.currentSnapshot().snapshotId();
+String statsPath = table.location() + STATS + snapshotId;
+LOG.info("Using stats from puffin file at:" + statsPath);
+try (PuffinReader reader = 
Puffin.read(table.io().newInputFile(statsPath)).build()) {
+  BlobMetadata blobMetadata = reader.fileMetadata().blobs().get(0);
+      Map<BlobMetadata, List<ColumnStatistics>> collect =
+          Streams.stream(reader.readAll(ImmutableList.of(blobMetadata))).collect(Collectors.toMap(Pair::first,
+              blobMetadataByteBufferPair -> SerializationUtils.deserialize(
+                  ByteBuffers.toByteArray(blobMetadataByteBufferPair.second()))));
+
+  return 
collect.entrySet().stream().iterator().next().getValue().get(0).getStatsObj();
+} catch (IOException e) {
+  LOG.info(String.valueOf(e));
+}
+break;
+  default:
+// fall back to metastore
+}
+return null;
+  }
+
+
+  @Override
+  public boolean setColStatistics(org.apache.hadoop.hive.ql.metadata.Table 
table,
+      List<ColumnStatistics> colStats) {
+TableDesc tableDesc = Utilities.getTableDesc(table);
+Table tbl = Catalogs.loadTable(conf, tableDesc.getProperties());
+String snapshotId = tbl.name() + tbl.currentSnapshot().snapshotId();
+byte[] serializeColStats = SerializationUtils.serialize((Serializable) 
colStats);
+
+try (PuffinWriter writer = 
Puffin.write(tbl.io().newOutputFile(tbl.location() + STATS + snapshotId))

Review Comment:
   What about partition-level stats? If the table is partitioned, you need to append the partition stats as well.
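A minimal sketch of the direction this comment points at, assuming one stats blob per partition. The `blobName` helper and its partition-spec argument are hypothetical illustrations, not part of the patch under review:

```java
// Hypothetical helper: derive a distinct blob name per partition so that
// partition-level stats could be stored next to the table-level blob.
public class PartitionBlobNames {
  static String blobName(String tableName, long snapshotId, String partitionSpec) {
    // Table-level stats use "<table>-<snapshot>"; partition-level stats
    // append the partition spec, e.g. "web_sales-123/ws_sold_date=2023-03-27".
    return tableName + "-" + snapshotId
        + (partitionSpec == null ? "" : "/" + partitionSpec);
  }

  public static void main(String[] args) {
    System.out.println(blobName("web_sales", 123L, null));
    System.out.println(blobName("web_sales", 123L, "ws_sold_date=2023-03-27"));
  }
}
```

The real patch would attach such names via the Puffin `Blob` API; this only shows how a per-partition naming scheme could keep both levels of stats addressable in one file.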





Issue Time Tracking
---

Worklog Id: (was: 853121)
Time Spent: 5.5h  (was: 5h 20m)

> Store hive columns stats in puffin files for iceberg tables
> ---
>
> Key: HIVE-27158
> URL: https://issues.apache.org/jira/browse/HIVE-27158
> Project: Hive
>  Issue Type: Improvement
>Reporter: Simhadri Govindappa
>Assignee: Simhadri Govindappa
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 5.5h
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Work logged] (HIVE-27158) Store hive columns stats in puffin files for iceberg tables

2023-03-27 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-27158?focusedWorklogId=853120=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-853120
 ]

ASF GitHub Bot logged work on HIVE-27158:
-

Author: ASF GitHub Bot
Created on: 27/Mar/23 09:25
Start Date: 27/Mar/23 09:25
Worklog Time Spent: 10m 
  Work Description: deniskuzZ commented on code in PR #4131:
URL: https://github.com/apache/hive/pull/4131#discussion_r1149037383


##
iceberg/iceberg-handler/src/main/java/org/apache/iceberg/mr/hive/HiveIcebergStorageHandler.java:
##
@@ -349,6 +365,96 @@ public Map<String, String> getBasicStatistics(Partish partish) {
 return stats;
   }
 
+
+  @Override
+  public boolean canSetColStatistics() {
+String statsSource = HiveConf.getVar(conf, 
HiveConf.ConfVars.HIVE_USE_STATS_FROM).toLowerCase();
+return statsSource.equals(PUFFIN);
+  }
+
+  @Override
+  public boolean 
canProvideColStatistics(org.apache.hadoop.hive.ql.metadata.Table tbl) {
+
+org.apache.hadoop.hive.ql.metadata.Table hmsTable = tbl;
+TableDesc tableDesc = Utilities.getTableDesc(hmsTable);
+Table table = Catalogs.loadTable(conf, tableDesc.getProperties());
+if (table.currentSnapshot() != null) {
+  String statsSource = HiveConf.getVar(conf, 
HiveConf.ConfVars.HIVE_COL_STATS_SOURCE).toLowerCase();
+  String statsPath = table.location() + STATS + table.name() + 
table.currentSnapshot().snapshotId();
+  if (statsSource.equals(PUFFIN)) {
+try (FileSystem fs = new Path(table.location()).getFileSystem(conf)) {
+  if (fs.exists(new Path(statsPath))) {
+return true;
+  }
+} catch (IOException e) {
+  LOG.warn(e.getMessage());
+}
+  }
+}
+return false;
+  }
+
+  @Override
+  public List<ColumnStatisticsObj> getColStatistics(org.apache.hadoop.hive.ql.metadata.Table tbl) {
+
+org.apache.hadoop.hive.ql.metadata.Table hmsTable = tbl;
+TableDesc tableDesc = Utilities.getTableDesc(hmsTable);
+Table table = Catalogs.loadTable(conf, tableDesc.getProperties());
+String statsSource = HiveConf.getVar(conf, 
HiveConf.ConfVars.HIVE_COL_STATS_SOURCE).toLowerCase();
+switch (statsSource) {
+  case ICEBERG:
+// Place holder for iceberg stats
+break;
+  case PUFFIN:
+String snapshotId = table.name() + 
table.currentSnapshot().snapshotId();
+String statsPath = table.location() + STATS + snapshotId;
+LOG.info("Using stats from puffin file at:" + statsPath);
+try (PuffinReader reader = 
Puffin.read(table.io().newInputFile(statsPath)).build()) {
+  BlobMetadata blobMetadata = reader.fileMetadata().blobs().get(0);
+  Map<BlobMetadata, List<ColumnStatistics>> collect =
+  
Streams.stream(reader.readAll(ImmutableList.of(blobMetadata))).collect(Collectors.toMap(Pair::first,
+  blobMetadataByteBufferPair -> SerializationUtils.deserialize(
+      ByteBuffers.toByteArray(blobMetadataByteBufferPair.second()))));
+
+  return 
collect.entrySet().stream().iterator().next().getValue().get(0).getStatsObj();
+} catch (IOException e) {
+  LOG.info(String.valueOf(e));
+}
+break;
+  default:
+// fall back to metastore
+}
+return null;
+  }
+
+
+  @Override
+  public boolean setColStatistics(org.apache.hadoop.hive.ql.metadata.Table 
table,
+      List<ColumnStatistics> colStats) {
+TableDesc tableDesc = Utilities.getTableDesc(table);
+Table tbl = Catalogs.loadTable(conf, tableDesc.getProperties());
+String snapshotId = tbl.name() + tbl.currentSnapshot().snapshotId();
+byte[] serializeColStats = SerializationUtils.serialize((Serializable) 
colStats);
+
+try (PuffinWriter writer = 
Puffin.write(tbl.io().newOutputFile(tbl.location() + STATS + snapshotId))

Review Comment:
   what about partition-level stats?





Issue Time Tracking
---

Worklog Id: (was: 853120)
Time Spent: 5h 20m  (was: 5h 10m)

> Store hive columns stats in puffin files for iceberg tables
> ---
>
> Key: HIVE-27158
> URL: https://issues.apache.org/jira/browse/HIVE-27158
> Project: Hive
>  Issue Type: Improvement
>Reporter: Simhadri Govindappa
>Assignee: Simhadri Govindappa
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 5h 20m
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Work logged] (HIVE-27158) Store hive columns stats in puffin files for iceberg tables

2023-03-27 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-27158?focusedWorklogId=853119=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-853119
 ]

ASF GitHub Bot logged work on HIVE-27158:
-

Author: ASF GitHub Bot
Created on: 27/Mar/23 09:24
Start Date: 27/Mar/23 09:24
Worklog Time Spent: 10m 
  Work Description: deniskuzZ commented on code in PR #4131:
URL: https://github.com/apache/hive/pull/4131#discussion_r1149035181


##
iceberg/iceberg-handler/src/main/java/org/apache/iceberg/mr/hive/HiveIcebergStorageHandler.java:
##
@@ -349,6 +365,96 @@ public Map<String, String> getBasicStatistics(Partish partish) {
 return stats;
   }
 
+
+  @Override
+  public boolean canSetColStatistics() {
+String statsSource = HiveConf.getVar(conf, 
HiveConf.ConfVars.HIVE_USE_STATS_FROM).toLowerCase();
+return statsSource.equals(PUFFIN);
+  }
+
+  @Override
+  public boolean 
canProvideColStatistics(org.apache.hadoop.hive.ql.metadata.Table tbl) {
+
+org.apache.hadoop.hive.ql.metadata.Table hmsTable = tbl;
+TableDesc tableDesc = Utilities.getTableDesc(hmsTable);
+Table table = Catalogs.loadTable(conf, tableDesc.getProperties());
+if (table.currentSnapshot() != null) {
+  String statsSource = HiveConf.getVar(conf, 
HiveConf.ConfVars.HIVE_COL_STATS_SOURCE).toLowerCase();
+  String statsPath = table.location() + STATS + table.name() + 
table.currentSnapshot().snapshotId();
+  if (statsSource.equals(PUFFIN)) {
+try (FileSystem fs = new Path(table.location()).getFileSystem(conf)) {
+  if (fs.exists(new Path(statsPath))) {
+return true;
+  }
+} catch (IOException e) {
+  LOG.warn(e.getMessage());
+}
+  }
+}
+return false;
+  }
+
+  @Override
+  public List<ColumnStatisticsObj> getColStatistics(org.apache.hadoop.hive.ql.metadata.Table tbl) {
+
+org.apache.hadoop.hive.ql.metadata.Table hmsTable = tbl;
+TableDesc tableDesc = Utilities.getTableDesc(hmsTable);
+Table table = Catalogs.loadTable(conf, tableDesc.getProperties());
+String statsSource = HiveConf.getVar(conf, 
HiveConf.ConfVars.HIVE_COL_STATS_SOURCE).toLowerCase();
+switch (statsSource) {
+  case ICEBERG:
+// Place holder for iceberg stats
+break;
+  case PUFFIN:
+String snapshotId = table.name() + 
table.currentSnapshot().snapshotId();
+String statsPath = table.location() + STATS + snapshotId;
+LOG.info("Using stats from puffin file at:" + statsPath);
+try (PuffinReader reader = 
Puffin.read(table.io().newInputFile(statsPath)).build()) {
+  BlobMetadata blobMetadata = reader.fileMetadata().blobs().get(0);
+  Map<BlobMetadata, List<ColumnStatistics>> collect =
+  
Streams.stream(reader.readAll(ImmutableList.of(blobMetadata))).collect(Collectors.toMap(Pair::first,
+  blobMetadataByteBufferPair -> SerializationUtils.deserialize(
+      ByteBuffers.toByteArray(blobMetadataByteBufferPair.second()))));
+
+  return 
collect.entrySet().stream().iterator().next().getValue().get(0).getStatsObj();
+} catch (IOException e) {
+  LOG.info(String.valueOf(e));
+}
+break;
+  default:
+// fall back to metastore
+}
+return null;
+  }
+
+
+  @Override
+  public boolean setColStatistics(org.apache.hadoop.hive.ql.metadata.Table 
table,
+      List<ColumnStatistics> colStats) {
+TableDesc tableDesc = Utilities.getTableDesc(table);
+Table tbl = Catalogs.loadTable(conf, tableDesc.getProperties());
+String snapshotId = tbl.name() + tbl.currentSnapshot().snapshotId();
+byte[] serializeColStats = SerializationUtils.serialize((Serializable) 
colStats);
+
+try (PuffinWriter writer = 
Puffin.write(tbl.io().newOutputFile(tbl.location() + STATS + snapshotId))
+.createdBy("Hive").build()) {
+  writer.add(
+  new Blob(
+  tbl.name() + "-" + snapshotId,
+  ImmutableList.of(1),
+  tbl.currentSnapshot().snapshotId(),
+  tbl.currentSnapshot().sequenceNumber(),
+  ByteBuffer.wrap(serializeColStats),
+  PuffinCompressionCodec.NONE,
+  ImmutableMap.of()));
+  writer.finish();
+} catch (IOException e) {
+  LOG.info(String.valueOf(e));

Review Comment:
   do not swallow exception
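The pattern the reviewer asks for can be sketched in plain Java. `writeStats` here is a hypothetical stand-in for the Puffin-writing block, shown only to contrast propagating an `IOException` with logging it and silently continuing:

```java
import java.io.IOException;
import java.io.UncheckedIOException;

public class StatsWriteExample {
  // Stand-in for the try-with-resources block around PuffinWriter.
  static boolean writeStats(boolean ioFails) {
    try {
      if (ioFails) {
        throw new IOException("simulated failure while writing the stats file");
      }
      return true; // write succeeded
    } catch (IOException e) {
      // Rethrow as an unchecked exception so callers see the failure,
      // instead of logging at INFO and returning a misleading result.
      throw new UncheckedIOException(e);
    }
  }

  public static void main(String[] args) {
    System.out.println(writeStats(false));
  }
}
```

With this shape the caller can still choose to catch and fall back, but the failure is no longer invisible.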





Issue Time Tracking
---

Worklog Id: (was: 853119)
Time Spent: 5h 10m  (was: 5h)

> Store hive columns stats in puffin files for iceberg tables
> ---
>
> Key: HIVE-27158
> URL: https://issues.apache.org/jira/browse/HIVE-27158
> Project: Hive
>  Issue Type: Improvement
>Reporter: Simhadri Govindappa
>Assignee: Simhadri Govindappa
>Priority: Major
>  Labels: pull-request-available
>   

[jira] [Work logged] (HIVE-26655) VectorUDAFBloomFilterMerge should take care of safe batch handling when working in parallel

2023-03-27 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-26655?focusedWorklogId=853118=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-853118
 ]

ASF GitHub Bot logged work on HIVE-26655:
-

Author: ASF GitHub Bot
Created on: 27/Mar/23 09:23
Start Date: 27/Mar/23 09:23
Worklog Time Spent: 10m 
  Work Description: abstractdog opened a new pull request, #4158:
URL: https://github.com/apache/hive/pull/4158

   
   ### What changes were proposed in this pull request?
   
   
   
   ### Why are the changes needed?
   
   
   
   ### Does this PR introduce _any_ user-facing change?
   
   
   
   ### How was this patch tested?
   
   




Issue Time Tracking
---

Worklog Id: (was: 853118)
Remaining Estimate: 0h
Time Spent: 10m

> VectorUDAFBloomFilterMerge should take care of safe batch handling when 
> working in parallel
> ---
>
> Key: HIVE-26655
> URL: https://issues.apache.org/jira/browse/HIVE-26655
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Sungwoo Park
>Assignee: László Bodor
>Priority: Major
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> When tested with 100GB ORC tables, the number of rows returned by query 17 is 
> not stable. It returns fewer rows than the correct result (55 rows).
>  



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (HIVE-26655) VectorUDAFBloomFilterMerge should take care of safe batch handling when working in parallel

2023-03-27 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-26655?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated HIVE-26655:
--
Labels: pull-request-available  (was: )

> VectorUDAFBloomFilterMerge should take care of safe batch handling when 
> working in parallel
> ---
>
> Key: HIVE-26655
> URL: https://issues.apache.org/jira/browse/HIVE-26655
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Sungwoo Park
>Assignee: László Bodor
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> When tested with 100GB ORC tables, the number of rows returned by query 17 is 
> not stable. It returns fewer rows than the correct result (55 rows).
>  



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Work logged] (HIVE-27158) Store hive columns stats in puffin files for iceberg tables

2023-03-27 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-27158?focusedWorklogId=853117=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-853117
 ]

ASF GitHub Bot logged work on HIVE-27158:
-

Author: ASF GitHub Bot
Created on: 27/Mar/23 09:22
Start Date: 27/Mar/23 09:22
Worklog Time Spent: 10m 
  Work Description: deniskuzZ commented on code in PR #4131:
URL: https://github.com/apache/hive/pull/4131#discussion_r1149033188


##
iceberg/iceberg-handler/src/main/java/org/apache/iceberg/mr/hive/HiveIcebergStorageHandler.java:
##
@@ -349,6 +365,96 @@ public Map<String, String> getBasicStatistics(Partish partish) {
 return stats;
   }
 
+
+  @Override
+  public boolean canSetColStatistics() {
+String statsSource = HiveConf.getVar(conf, 
HiveConf.ConfVars.HIVE_USE_STATS_FROM).toLowerCase();
+return statsSource.equals(PUFFIN);
+  }
+
+  @Override
+  public boolean 
canProvideColStatistics(org.apache.hadoop.hive.ql.metadata.Table tbl) {
+
+org.apache.hadoop.hive.ql.metadata.Table hmsTable = tbl;
+TableDesc tableDesc = Utilities.getTableDesc(hmsTable);
+Table table = Catalogs.loadTable(conf, tableDesc.getProperties());
+if (table.currentSnapshot() != null) {
+  String statsSource = HiveConf.getVar(conf, 
HiveConf.ConfVars.HIVE_COL_STATS_SOURCE).toLowerCase();
+  String statsPath = table.location() + STATS + table.name() + 
table.currentSnapshot().snapshotId();
+  if (statsSource.equals(PUFFIN)) {
+try (FileSystem fs = new Path(table.location()).getFileSystem(conf)) {
+  if (fs.exists(new Path(statsPath))) {
+return true;
+  }
+} catch (IOException e) {
+  LOG.warn(e.getMessage());
+}
+  }
+}
+return false;
+  }
+
+  @Override
+  public List<ColumnStatisticsObj> getColStatistics(org.apache.hadoop.hive.ql.metadata.Table tbl) {
+
+org.apache.hadoop.hive.ql.metadata.Table hmsTable = tbl;
+TableDesc tableDesc = Utilities.getTableDesc(hmsTable);
+Table table = Catalogs.loadTable(conf, tableDesc.getProperties());
+String statsSource = HiveConf.getVar(conf, 
HiveConf.ConfVars.HIVE_COL_STATS_SOURCE).toLowerCase();
+switch (statsSource) {
+  case ICEBERG:
+// Place holder for iceberg stats
+break;
+  case PUFFIN:
+String snapshotId = table.name() + 
table.currentSnapshot().snapshotId();
+String statsPath = table.location() + STATS + snapshotId;
+LOG.info("Using stats from puffin file at:" + statsPath);
+try (PuffinReader reader = 
Puffin.read(table.io().newInputFile(statsPath)).build()) {
+  BlobMetadata blobMetadata = reader.fileMetadata().blobs().get(0);
+  Map<BlobMetadata, List<ColumnStatistics>> collect =
+  
Streams.stream(reader.readAll(ImmutableList.of(blobMetadata))).collect(Collectors.toMap(Pair::first,
+  blobMetadataByteBufferPair -> SerializationUtils.deserialize(
+      ByteBuffers.toByteArray(blobMetadataByteBufferPair.second()))));
+
+  return 
collect.entrySet().stream().iterator().next().getValue().get(0).getStatsObj();
+} catch (IOException e) {
+  LOG.info(String.valueOf(e));
+}
+break;
+  default:
+// fall back to metastore
+}
+return null;
+  }
+
+
+  @Override
+  public boolean setColStatistics(org.apache.hadoop.hive.ql.metadata.Table 
table,
+      List<ColumnStatistics> colStats) {
+TableDesc tableDesc = Utilities.getTableDesc(table);
+Table tbl = Catalogs.loadTable(conf, tableDesc.getProperties());
+String snapshotId = tbl.name() + tbl.currentSnapshot().snapshotId();
+byte[] serializeColStats = SerializationUtils.serialize((Serializable) 
colStats);

Review Comment:
   should we check for NULL here?
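The null guard the reviewer suggests can be sketched independently of Hive. `serializeColStats` below is a hypothetical stand-in that only demonstrates failing fast before serialization, not the real `SerializationUtils` call:

```java
import java.util.List;
import java.util.Objects;

public class NullGuardExample {
  static byte[] serializeColStats(List<String> colStats) {
    // Fail fast with a clear message instead of letting the
    // downstream serialize call throw a bare NullPointerException.
    Objects.requireNonNull(colStats, "colStats must not be null");
    return String.join(",", colStats).getBytes();
  }

  public static void main(String[] args) {
    System.out.println(serializeColStats(List.of("ss_item_sk", "ss_sold_date_sk")).length);
  }
}
```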





Issue Time Tracking
---

Worklog Id: (was: 853117)
Time Spent: 5h  (was: 4h 50m)

> Store hive columns stats in puffin files for iceberg tables
> ---
>
> Key: HIVE-27158
> URL: https://issues.apache.org/jira/browse/HIVE-27158
> Project: Hive
>  Issue Type: Improvement
>Reporter: Simhadri Govindappa
>Assignee: Simhadri Govindappa
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 5h
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Work logged] (HIVE-27158) Store hive columns stats in puffin files for iceberg tables

2023-03-27 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-27158?focusedWorklogId=853115=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-853115
 ]

ASF GitHub Bot logged work on HIVE-27158:
-

Author: ASF GitHub Bot
Created on: 27/Mar/23 09:20
Start Date: 27/Mar/23 09:20
Worklog Time Spent: 10m 
  Work Description: deniskuzZ commented on code in PR #4131:
URL: https://github.com/apache/hive/pull/4131#discussion_r1149030014


##
iceberg/iceberg-handler/src/main/java/org/apache/iceberg/mr/hive/HiveIcebergStorageHandler.java:
##
@@ -349,6 +365,96 @@ public Map<String, String> getBasicStatistics(Partish partish) {
 return stats;
   }
 
+
+  @Override
+  public boolean canSetColStatistics() {
+String statsSource = HiveConf.getVar(conf, 
HiveConf.ConfVars.HIVE_USE_STATS_FROM).toLowerCase();
+return statsSource.equals(PUFFIN);
+  }
+
+  @Override
+  public boolean 
canProvideColStatistics(org.apache.hadoop.hive.ql.metadata.Table tbl) {
+
+org.apache.hadoop.hive.ql.metadata.Table hmsTable = tbl;
+TableDesc tableDesc = Utilities.getTableDesc(hmsTable);
+Table table = Catalogs.loadTable(conf, tableDesc.getProperties());
+if (table.currentSnapshot() != null) {
+  String statsSource = HiveConf.getVar(conf, 
HiveConf.ConfVars.HIVE_COL_STATS_SOURCE).toLowerCase();
+  String statsPath = table.location() + STATS + table.name() + 
table.currentSnapshot().snapshotId();
+  if (statsSource.equals(PUFFIN)) {
+try (FileSystem fs = new Path(table.location()).getFileSystem(conf)) {
+  if (fs.exists(new Path(statsPath))) {
+return true;
+  }
+} catch (IOException e) {
+  LOG.warn(e.getMessage());
+}
+  }
+}
+return false;
+  }
+
+  @Override
+  public List<ColumnStatisticsObj> getColStatistics(org.apache.hadoop.hive.ql.metadata.Table tbl) {
+
+org.apache.hadoop.hive.ql.metadata.Table hmsTable = tbl;
+TableDesc tableDesc = Utilities.getTableDesc(hmsTable);
+Table table = Catalogs.loadTable(conf, tableDesc.getProperties());
+String statsSource = HiveConf.getVar(conf, 
HiveConf.ConfVars.HIVE_COL_STATS_SOURCE).toLowerCase();
+switch (statsSource) {
+  case ICEBERG:
+// Place holder for iceberg stats
+break;
+  case PUFFIN:
+String snapshotId = table.name() + 
table.currentSnapshot().snapshotId();
+String statsPath = table.location() + STATS + snapshotId;
+LOG.info("Using stats from puffin file at:" + statsPath);
+try (PuffinReader reader = 
Puffin.read(table.io().newInputFile(statsPath)).build()) {
+  BlobMetadata blobMetadata = reader.fileMetadata().blobs().get(0);
+  Map<BlobMetadata, List<ColumnStatistics>> collect =
+  
Streams.stream(reader.readAll(ImmutableList.of(blobMetadata))).collect(Collectors.toMap(Pair::first,
+  blobMetadataByteBufferPair -> SerializationUtils.deserialize(
+      ByteBuffers.toByteArray(blobMetadataByteBufferPair.second()))));
+
+  return 
collect.entrySet().stream().iterator().next().getValue().get(0).getStatsObj();

Review Comment:
   why do you need a map in the first place if you are only interested in the 
first value?
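The reviewer's point can be sketched in plain Java: when only the first deserialized value matters, take it straight off the stream instead of materializing a `Map`. The `(String, String)` entries are hypothetical stand-ins for the `(BlobMetadata, ByteBuffer)` pairs returned by `PuffinReader.readAll`:

```java
import java.util.Map;
import java.util.stream.Stream;

public class FirstValueExample {
  static String firstValue(Stream<Map.Entry<String, String>> pairs) {
    // No intermediate Map: findFirst short-circuits after one element.
    return pairs.findFirst()
        .map(Map.Entry::getValue)
        .orElseThrow(() -> new IllegalStateException("no blobs in stats file"));
  }

  public static void main(String[] args) {
    System.out.println(firstValue(Stream.of(Map.entry("blob-0", "stats"))));
  }
}
```

Besides being shorter, this avoids `Collectors.toMap`, which would also throw on duplicate blob keys.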





Issue Time Tracking
---

Worklog Id: (was: 853115)
Time Spent: 4h 40m  (was: 4.5h)

> Store hive columns stats in puffin files for iceberg tables
> ---
>
> Key: HIVE-27158
> URL: https://issues.apache.org/jira/browse/HIVE-27158
> Project: Hive
>  Issue Type: Improvement
>Reporter: Simhadri Govindappa
>Assignee: Simhadri Govindappa
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 4h 40m
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Work logged] (HIVE-27158) Store hive columns stats in puffin files for iceberg tables

2023-03-27 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-27158?focusedWorklogId=853116=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-853116
 ]

ASF GitHub Bot logged work on HIVE-27158:
-

Author: ASF GitHub Bot
Created on: 27/Mar/23 09:20
Start Date: 27/Mar/23 09:20
Worklog Time Spent: 10m 
  Work Description: deniskuzZ commented on code in PR #4131:
URL: https://github.com/apache/hive/pull/4131#discussion_r1149031483


##
iceberg/iceberg-handler/src/main/java/org/apache/iceberg/mr/hive/HiveIcebergStorageHandler.java:
##
@@ -349,6 +365,96 @@ public Map<String, String> getBasicStatistics(Partish partish) {
 return stats;
   }
 
+
+  @Override
+  public boolean canSetColStatistics() {
+String statsSource = HiveConf.getVar(conf, 
HiveConf.ConfVars.HIVE_USE_STATS_FROM).toLowerCase();
+return statsSource.equals(PUFFIN);
+  }
+
+  @Override
+  public boolean 
canProvideColStatistics(org.apache.hadoop.hive.ql.metadata.Table tbl) {
+
+org.apache.hadoop.hive.ql.metadata.Table hmsTable = tbl;
+TableDesc tableDesc = Utilities.getTableDesc(hmsTable);
+Table table = Catalogs.loadTable(conf, tableDesc.getProperties());
+if (table.currentSnapshot() != null) {
+  String statsSource = HiveConf.getVar(conf, 
HiveConf.ConfVars.HIVE_COL_STATS_SOURCE).toLowerCase();
+  String statsPath = table.location() + STATS + table.name() + 
table.currentSnapshot().snapshotId();
+  if (statsSource.equals(PUFFIN)) {
+try (FileSystem fs = new Path(table.location()).getFileSystem(conf)) {
+  if (fs.exists(new Path(statsPath))) {
+return true;
+  }
+} catch (IOException e) {
+  LOG.warn(e.getMessage());
+}
+  }
+}
+return false;
+  }
+
+  @Override
+  public List<ColumnStatisticsObj> getColStatistics(org.apache.hadoop.hive.ql.metadata.Table tbl) {
+
+org.apache.hadoop.hive.ql.metadata.Table hmsTable = tbl;
+TableDesc tableDesc = Utilities.getTableDesc(hmsTable);
+Table table = Catalogs.loadTable(conf, tableDesc.getProperties());
+String statsSource = HiveConf.getVar(conf, 
HiveConf.ConfVars.HIVE_COL_STATS_SOURCE).toLowerCase();
+switch (statsSource) {
+  case ICEBERG:
+// Place holder for iceberg stats
+break;
+  case PUFFIN:
+String snapshotId = table.name() + 
table.currentSnapshot().snapshotId();
+String statsPath = table.location() + STATS + snapshotId;
+LOG.info("Using stats from puffin file at:" + statsPath);
+try (PuffinReader reader = 
Puffin.read(table.io().newInputFile(statsPath)).build()) {
+  BlobMetadata blobMetadata = reader.fileMetadata().blobs().get(0);
+  Map<BlobMetadata, List<ColumnStatistics>> collect =
+  
Streams.stream(reader.readAll(ImmutableList.of(blobMetadata))).collect(Collectors.toMap(Pair::first,
+  blobMetadataByteBufferPair -> SerializationUtils.deserialize(
+      ByteBuffers.toByteArray(blobMetadataByteBufferPair.second()))));
+
+  return 
collect.entrySet().stream().iterator().next().getValue().get(0).getStatsObj();
+} catch (IOException e) {
+  LOG.info(String.valueOf(e));

Review Comment:
   why are we swallowing exception here?





Issue Time Tracking
---

Worklog Id: (was: 853116)
Time Spent: 4h 50m  (was: 4h 40m)

> Store hive columns stats in puffin files for iceberg tables
> ---
>
> Key: HIVE-27158
> URL: https://issues.apache.org/jira/browse/HIVE-27158
> Project: Hive
>  Issue Type: Improvement
>Reporter: Simhadri Govindappa
>Assignee: Simhadri Govindappa
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 4h 50m
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Work logged] (HIVE-27158) Store hive columns stats in puffin files for iceberg tables

2023-03-27 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-27158?focusedWorklogId=853114=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-853114
 ]

ASF GitHub Bot logged work on HIVE-27158:
-

Author: ASF GitHub Bot
Created on: 27/Mar/23 09:19
Start Date: 27/Mar/23 09:19
Worklog Time Spent: 10m 
  Work Description: deniskuzZ commented on code in PR #4131:
URL: https://github.com/apache/hive/pull/4131#discussion_r1149030014


##
iceberg/iceberg-handler/src/main/java/org/apache/iceberg/mr/hive/HiveIcebergStorageHandler.java:
##
@@ -349,6 +365,96 @@ public Map<String, String> getBasicStatistics(Partish partish) {
 return stats;
   }
 
+
+  @Override
+  public boolean canSetColStatistics() {
+String statsSource = HiveConf.getVar(conf, 
HiveConf.ConfVars.HIVE_USE_STATS_FROM).toLowerCase();
+return statsSource.equals(PUFFIN);
+  }
+
+  @Override
+  public boolean 
canProvideColStatistics(org.apache.hadoop.hive.ql.metadata.Table tbl) {
+
+org.apache.hadoop.hive.ql.metadata.Table hmsTable = tbl;
+TableDesc tableDesc = Utilities.getTableDesc(hmsTable);
+Table table = Catalogs.loadTable(conf, tableDesc.getProperties());
+if (table.currentSnapshot() != null) {
+  String statsSource = HiveConf.getVar(conf, 
HiveConf.ConfVars.HIVE_COL_STATS_SOURCE).toLowerCase();
+  String statsPath = table.location() + STATS + table.name() + 
table.currentSnapshot().snapshotId();
+  if (statsSource.equals(PUFFIN)) {
+try (FileSystem fs = new Path(table.location()).getFileSystem(conf)) {
+  if (fs.exists(new Path(statsPath))) {
+return true;
+  }
+} catch (IOException e) {
+  LOG.warn(e.getMessage());
+}
+  }
+}
+return false;
+  }
+
+  @Override
+  public List<ColumnStatisticsObj> getColStatistics(org.apache.hadoop.hive.ql.metadata.Table tbl) {
+
+org.apache.hadoop.hive.ql.metadata.Table hmsTable = tbl;
+TableDesc tableDesc = Utilities.getTableDesc(hmsTable);
+Table table = Catalogs.loadTable(conf, tableDesc.getProperties());
+String statsSource = HiveConf.getVar(conf, 
HiveConf.ConfVars.HIVE_COL_STATS_SOURCE).toLowerCase();
+switch (statsSource) {
+  case ICEBERG:
+// Place holder for iceberg stats
+break;
+  case PUFFIN:
+String snapshotId = table.name() + 
table.currentSnapshot().snapshotId();
+String statsPath = table.location() + STATS + snapshotId;
+LOG.info("Using stats from puffin file at:" + statsPath);
+try (PuffinReader reader = 
Puffin.read(table.io().newInputFile(statsPath)).build()) {
+  BlobMetadata blobMetadata = reader.fileMetadata().blobs().get(0);
+  Map<BlobMetadata, List<ColumnStatistics>> collect =
+  
Streams.stream(reader.readAll(ImmutableList.of(blobMetadata))).collect(Collectors.toMap(Pair::first,
+  blobMetadataByteBufferPair -> SerializationUtils.deserialize(
+      ByteBuffers.toByteArray(blobMetadataByteBufferPair.second()))));
+
+  return 
collect.entrySet().stream().iterator().next().getValue().get(0).getStatsObj();

Review Comment:
   why do you need a map in the first place?





Issue Time Tracking
---

Worklog Id: (was: 853114)
Time Spent: 4.5h  (was: 4h 20m)

> Store hive columns stats in puffin files for iceberg tables
> ---
>
> Key: HIVE-27158
> URL: https://issues.apache.org/jira/browse/HIVE-27158
> Project: Hive
>  Issue Type: Improvement
>Reporter: Simhadri Govindappa
>Assignee: Simhadri Govindappa
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 4.5h
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Work logged] (HIVE-27158) Store hive columns stats in puffin files for iceberg tables

2023-03-27 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-27158?focusedWorklogId=853113=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-853113
 ]

ASF GitHub Bot logged work on HIVE-27158:
-

Author: ASF GitHub Bot
Created on: 27/Mar/23 09:17
Start Date: 27/Mar/23 09:17
Worklog Time Spent: 10m 
  Work Description: deniskuzZ commented on code in PR #4131:
URL: https://github.com/apache/hive/pull/4131#discussion_r1149027518


##
iceberg/iceberg-handler/src/main/java/org/apache/iceberg/mr/hive/HiveIcebergStorageHandler.java:
##
@@ -349,6 +365,96 @@ public Map<String, String> getBasicStatistics(Partish partish) {
 return stats;
   }
 
+
+  @Override
+  public boolean canSetColStatistics() {
+String statsSource = HiveConf.getVar(conf, 
HiveConf.ConfVars.HIVE_USE_STATS_FROM).toLowerCase();
+return statsSource.equals(PUFFIN);
+  }
+
+  @Override
+  public boolean 
canProvideColStatistics(org.apache.hadoop.hive.ql.metadata.Table tbl) {
+
+org.apache.hadoop.hive.ql.metadata.Table hmsTable = tbl;
+TableDesc tableDesc = Utilities.getTableDesc(hmsTable);
+Table table = Catalogs.loadTable(conf, tableDesc.getProperties());
+if (table.currentSnapshot() != null) {
+  String statsSource = HiveConf.getVar(conf, 
HiveConf.ConfVars.HIVE_COL_STATS_SOURCE).toLowerCase();
+  String statsPath = table.location() + STATS + table.name() + 
table.currentSnapshot().snapshotId();
+  if (statsSource.equals(PUFFIN)) {
+try (FileSystem fs = new Path(table.location()).getFileSystem(conf)) {
+  if (fs.exists(new Path(statsPath))) {
+return true;
+  }
+} catch (IOException e) {
+  LOG.warn(e.getMessage());
+}
+  }
+}
+return false;
+  }
+
+  @Override
+  public List<ColumnStatisticsObj> getColStatistics(org.apache.hadoop.hive.ql.metadata.Table tbl) {
+
+org.apache.hadoop.hive.ql.metadata.Table hmsTable = tbl;
+TableDesc tableDesc = Utilities.getTableDesc(hmsTable);
+Table table = Catalogs.loadTable(conf, tableDesc.getProperties());
+String statsSource = HiveConf.getVar(conf, 
HiveConf.ConfVars.HIVE_COL_STATS_SOURCE).toLowerCase();
+switch (statsSource) {
+  case ICEBERG:
+// Place holder for iceberg stats
+break;
+  case PUFFIN:
+String snapshotId = table.name() + 
table.currentSnapshot().snapshotId();
+String statsPath = table.location() + STATS + snapshotId;
+LOG.info("Using stats from puffin file at:" + statsPath);
+try (PuffinReader reader = 
Puffin.read(table.io().newInputFile(statsPath)).build()) {
+  BlobMetadata blobMetadata = reader.fileMetadata().blobs().get(0);
+  Map<BlobMetadata, List<ColumnStatistics>> collect =
+  
Streams.stream(reader.readAll(ImmutableList.of(blobMetadata))).collect(Collectors.toMap(Pair::first,

Review Comment:
   why do you need to wrap with ImmutableList.of(blobMetadata)?





Issue Time Tracking
---

Worklog Id: (was: 853113)
Time Spent: 4h 20m  (was: 4h 10m)

> Store hive columns stats in puffin files for iceberg tables
> ---
>
> Key: HIVE-27158
> URL: https://issues.apache.org/jira/browse/HIVE-27158
> Project: Hive
>  Issue Type: Improvement
>Reporter: Simhadri Govindappa
>Assignee: Simhadri Govindappa
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 4h 20m
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Work logged] (HIVE-27158) Store hive columns stats in puffin files for iceberg tables

2023-03-27 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-27158?focusedWorklogId=853112=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-853112
 ]

ASF GitHub Bot logged work on HIVE-27158:
-

Author: ASF GitHub Bot
Created on: 27/Mar/23 09:16
Start Date: 27/Mar/23 09:16
Worklog Time Spent: 10m 
  Work Description: deniskuzZ commented on code in PR #4131:
URL: https://github.com/apache/hive/pull/4131#discussion_r1149025872


##
iceberg/iceberg-handler/src/main/java/org/apache/iceberg/mr/hive/HiveIcebergStorageHandler.java:
##
@@ -349,6 +365,96 @@ public Map<String, String> getBasicStatistics(Partish partish) {
 return stats;
   }
 
+
+  @Override
+  public boolean canSetColStatistics() {
+String statsSource = HiveConf.getVar(conf, 
HiveConf.ConfVars.HIVE_USE_STATS_FROM).toLowerCase();
+return statsSource.equals(PUFFIN);
+  }
+
+  @Override
+  public boolean 
canProvideColStatistics(org.apache.hadoop.hive.ql.metadata.Table tbl) {
+
+org.apache.hadoop.hive.ql.metadata.Table hmsTable = tbl;
+TableDesc tableDesc = Utilities.getTableDesc(hmsTable);
+Table table = Catalogs.loadTable(conf, tableDesc.getProperties());
+if (table.currentSnapshot() != null) {
+  String statsSource = HiveConf.getVar(conf, 
HiveConf.ConfVars.HIVE_COL_STATS_SOURCE).toLowerCase();
+  String statsPath = table.location() + STATS + table.name() + 
table.currentSnapshot().snapshotId();
+  if (statsSource.equals(PUFFIN)) {
+try (FileSystem fs = new Path(table.location()).getFileSystem(conf)) {
+  if (fs.exists(new Path(statsPath))) {
+return true;
+  }
+} catch (IOException e) {
+  LOG.warn(e.getMessage());
+}
+  }
+}
+return false;
+  }
+
+  @Override
+  public List<ColumnStatisticsObj> getColStatistics(org.apache.hadoop.hive.ql.metadata.Table tbl) {
+
+org.apache.hadoop.hive.ql.metadata.Table hmsTable = tbl;
+TableDesc tableDesc = Utilities.getTableDesc(hmsTable);
+Table table = Catalogs.loadTable(conf, tableDesc.getProperties());
+String statsSource = HiveConf.getVar(conf, 
HiveConf.ConfVars.HIVE_COL_STATS_SOURCE).toLowerCase();
+switch (statsSource) {
+  case ICEBERG:
+// Place holder for iceberg stats
+break;
+  case PUFFIN:
+String snapshotId = table.name() + 
table.currentSnapshot().snapshotId();
+String statsPath = table.location() + STATS + snapshotId;
+LOG.info("Using stats from puffin file at:" + statsPath);
+try (PuffinReader reader = 
Puffin.read(table.io().newInputFile(statsPath)).build()) {
+  BlobMetadata blobMetadata = reader.fileMetadata().blobs().get(0);

Review Comment:
   could it be empty (IndexOutOfBoundsException)?
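   The concern is that `reader.fileMetadata().blobs().get(0)` throws `IndexOutOfBoundsException` when the puffin file carries no blobs. A minimal, self-contained sketch of the guard, using a plain `List<String>` as a stand-in for the real Iceberg `BlobMetadata` list:

```java
import java.util.Collections;
import java.util.List;
import java.util.Optional;

public class FirstBlobDemo {
    // Stand-in for reader.fileMetadata().blobs(): return the first blob's
    // metadata only when the list is non-empty, instead of a bare get(0)
    // that throws IndexOutOfBoundsException on an empty puffin file.
    static Optional<String> firstBlob(List<String> blobs) {
        return blobs.isEmpty() ? Optional.empty() : Optional.of(blobs.get(0));
    }

    public static void main(String[] args) {
        System.out.println(firstBlob(Collections.emptyList()));
        System.out.println(firstBlob(List.of("apache-datasketches-theta-v1")));
    }
}
```

   Callers then decide explicitly what "no stats blob" means (fall back to the metastore, return no column stats) rather than surfacing a runtime exception.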





Issue Time Tracking
---

Worklog Id: (was: 853112)
Time Spent: 4h 10m  (was: 4h)

> Store hive columns stats in puffin files for iceberg tables
> ---
>
> Key: HIVE-27158
> URL: https://issues.apache.org/jira/browse/HIVE-27158
> Project: Hive
>  Issue Type: Improvement
>Reporter: Simhadri Govindappa
>Assignee: Simhadri Govindappa
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 4h 10m
>  Remaining Estimate: 0h
>






[jira] [Work logged] (HIVE-27158) Store hive columns stats in puffin files for iceberg tables

2023-03-27 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-27158?focusedWorklogId=853111&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-853111
 ]

ASF GitHub Bot logged work on HIVE-27158:
-

Author: ASF GitHub Bot
Created on: 27/Mar/23 09:13
Start Date: 27/Mar/23 09:13
Worklog Time Spent: 10m 
  Work Description: deniskuzZ commented on code in PR #4131:
URL: https://github.com/apache/hive/pull/4131#discussion_r1149023043


##
iceberg/iceberg-handler/src/main/java/org/apache/iceberg/mr/hive/HiveIcebergStorageHandler.java:
##
@@ -349,6 +365,96 @@ public Map<String, String> getBasicStatistics(Partish partish) {
 return stats;
   }
 
+
+  @Override
+  public boolean canSetColStatistics() {
+String statsSource = HiveConf.getVar(conf, 
HiveConf.ConfVars.HIVE_USE_STATS_FROM).toLowerCase();
+return statsSource.equals(PUFFIN);
+  }
+
+  @Override
+  public boolean 
canProvideColStatistics(org.apache.hadoop.hive.ql.metadata.Table tbl) {
+
+org.apache.hadoop.hive.ql.metadata.Table hmsTable = tbl;
+TableDesc tableDesc = Utilities.getTableDesc(hmsTable);
+Table table = Catalogs.loadTable(conf, tableDesc.getProperties());
+if (table.currentSnapshot() != null) {
+  String statsSource = HiveConf.getVar(conf, 
HiveConf.ConfVars.HIVE_COL_STATS_SOURCE).toLowerCase();
+  String statsPath = table.location() + STATS + table.name() + 
table.currentSnapshot().snapshotId();
+  if (statsSource.equals(PUFFIN)) {
+try (FileSystem fs = new Path(table.location()).getFileSystem(conf)) {
+  if (fs.exists(new Path(statsPath))) {
+return true;
+  }
+} catch (IOException e) {
+  LOG.warn(e.getMessage());
+}
+  }
+}
+return false;
+  }
+
+  @Override
+  public List<ColStatistics> getColStatistics(org.apache.hadoop.hive.ql.metadata.Table tbl) {
+
+org.apache.hadoop.hive.ql.metadata.Table hmsTable = tbl;
+TableDesc tableDesc = Utilities.getTableDesc(hmsTable);
+Table table = Catalogs.loadTable(conf, tableDesc.getProperties());
+String statsSource = HiveConf.getVar(conf, 
HiveConf.ConfVars.HIVE_COL_STATS_SOURCE).toLowerCase();
+switch (statsSource) {
+  case ICEBERG:
+// Place holder for iceberg stats
+break;
+  case PUFFIN:
+String snapshotId = table.name() + 
table.currentSnapshot().snapshotId();
+String statsPath = table.location() + STATS + snapshotId;
+LOG.info("Using stats from puffin file at:" + statsPath);

Review Comment:
   debug level?
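   For reference, a deferred-logging sketch. `java.util.logging` is used here only so the example is self-contained; Hive itself logs through SLF4J, where the equivalent is `LOG.debug("Using stats from puffin file at: {}", statsPath)` with a placeholder instead of string concatenation.

```java
import java.util.logging.Level;
import java.util.logging.Logger;

public class StatsLoggingDemo {
    private static final Logger LOG = Logger.getLogger(StatsLoggingDemo.class.getName());

    // Message construction kept in one place so it can be exercised
    // independently of the logger configuration.
    static String statsMessage(String statsPath) {
        return "Using stats from puffin file at: " + statsPath;
    }

    public static void main(String[] args) {
        String statsPath = "s3://bucket/db/tbl/stats/tbl42"; // illustrative value
        // Per-query paths are diagnostic detail, so log below INFO; the
        // supplier form defers building the string unless FINE is enabled.
        LOG.log(Level.FINE, () -> statsMessage(statsPath));
    }
}
```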





Issue Time Tracking
---

Worklog Id: (was: 853111)
Time Spent: 4h  (was: 3h 50m)

> Store hive columns stats in puffin files for iceberg tables
> ---
>
> Key: HIVE-27158
> URL: https://issues.apache.org/jira/browse/HIVE-27158
> Project: Hive
>  Issue Type: Improvement
>Reporter: Simhadri Govindappa
>Assignee: Simhadri Govindappa
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 4h
>  Remaining Estimate: 0h
>






[jira] [Work logged] (HIVE-27158) Store hive columns stats in puffin files for iceberg tables

2023-03-27 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-27158?focusedWorklogId=853110&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-853110
 ]

ASF GitHub Bot logged work on HIVE-27158:
-

Author: ASF GitHub Bot
Created on: 27/Mar/23 09:13
Start Date: 27/Mar/23 09:13
Worklog Time Spent: 10m 
  Work Description: deniskuzZ commented on code in PR #4131:
URL: https://github.com/apache/hive/pull/4131#discussion_r1149022350


##
iceberg/iceberg-handler/src/main/java/org/apache/iceberg/mr/hive/HiveIcebergStorageHandler.java:
##
@@ -349,6 +365,96 @@ public Map<String, String> getBasicStatistics(Partish partish) {
 return stats;
   }
 
+
+  @Override
+  public boolean canSetColStatistics() {
+String statsSource = HiveConf.getVar(conf, 
HiveConf.ConfVars.HIVE_USE_STATS_FROM).toLowerCase();
+return statsSource.equals(PUFFIN);
+  }
+
+  @Override
+  public boolean 
canProvideColStatistics(org.apache.hadoop.hive.ql.metadata.Table tbl) {
+
+org.apache.hadoop.hive.ql.metadata.Table hmsTable = tbl;
+TableDesc tableDesc = Utilities.getTableDesc(hmsTable);
+Table table = Catalogs.loadTable(conf, tableDesc.getProperties());
+if (table.currentSnapshot() != null) {
+  String statsSource = HiveConf.getVar(conf, 
HiveConf.ConfVars.HIVE_COL_STATS_SOURCE).toLowerCase();
+  String statsPath = table.location() + STATS + table.name() + 
table.currentSnapshot().snapshotId();
+  if (statsSource.equals(PUFFIN)) {
+try (FileSystem fs = new Path(table.location()).getFileSystem(conf)) {
+  if (fs.exists(new Path(statsPath))) {
+return true;
+  }
+} catch (IOException e) {
+  LOG.warn(e.getMessage());
+}
+  }
+}
+return false;
+  }
+
+  @Override
+  public List<ColStatistics> getColStatistics(org.apache.hadoop.hive.ql.metadata.Table tbl) {
+
+org.apache.hadoop.hive.ql.metadata.Table hmsTable = tbl;
+TableDesc tableDesc = Utilities.getTableDesc(hmsTable);
+Table table = Catalogs.loadTable(conf, tableDesc.getProperties());
+String statsSource = HiveConf.getVar(conf, 
HiveConf.ConfVars.HIVE_COL_STATS_SOURCE).toLowerCase();
+switch (statsSource) {
+  case ICEBERG:
+// Place holder for iceberg stats
+break;
+  case PUFFIN:
+String snapshotId = table.name() + 
table.currentSnapshot().snapshotId();
+String statsPath = table.location() + STATS + snapshotId;

Review Comment:
   could we extract path construction into a helper method?



##
iceberg/iceberg-handler/src/main/java/org/apache/iceberg/mr/hive/HiveIcebergStorageHandler.java:
##
@@ -349,6 +365,96 @@ public Map<String, String> getBasicStatistics(Partish partish) {
 return stats;
   }
 
+
+  @Override
+  public boolean canSetColStatistics() {
+String statsSource = HiveConf.getVar(conf, 
HiveConf.ConfVars.HIVE_USE_STATS_FROM).toLowerCase();
+return statsSource.equals(PUFFIN);
+  }
+
+  @Override
+  public boolean 
canProvideColStatistics(org.apache.hadoop.hive.ql.metadata.Table tbl) {
+
+org.apache.hadoop.hive.ql.metadata.Table hmsTable = tbl;
+TableDesc tableDesc = Utilities.getTableDesc(hmsTable);
+Table table = Catalogs.loadTable(conf, tableDesc.getProperties());
+if (table.currentSnapshot() != null) {
+  String statsSource = HiveConf.getVar(conf, 
HiveConf.ConfVars.HIVE_COL_STATS_SOURCE).toLowerCase();
+  String statsPath = table.location() + STATS + table.name() + 
table.currentSnapshot().snapshotId();
+  if (statsSource.equals(PUFFIN)) {
+try (FileSystem fs = new Path(table.location()).getFileSystem(conf)) {
+  if (fs.exists(new Path(statsPath))) {
+return true;
+  }
+} catch (IOException e) {
+  LOG.warn(e.getMessage());
+}
+  }
+}
+return false;
+  }
+
+  @Override
+  public List<ColStatistics> getColStatistics(org.apache.hadoop.hive.ql.metadata.Table tbl) {
+
+org.apache.hadoop.hive.ql.metadata.Table hmsTable = tbl;
+TableDesc tableDesc = Utilities.getTableDesc(hmsTable);
+Table table = Catalogs.loadTable(conf, tableDesc.getProperties());
+String statsSource = HiveConf.getVar(conf, 
HiveConf.ConfVars.HIVE_COL_STATS_SOURCE).toLowerCase();
+switch (statsSource) {
+  case ICEBERG:
+// Place holder for iceberg stats
+break;
+  case PUFFIN:
+String snapshotId = table.name() + 
table.currentSnapshot().snapshotId();
+String statsPath = table.location() + STATS + snapshotId;

Review Comment:
   could we extract path construction into a helper method and reuse?
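   A sketch of such a helper. The `STATS` value of `"/stats/"` and the location/name/snapshot arguments are illustrative assumptions; the real code derives them from the Iceberg `Table`, and the point is only that the concatenation lives in one place:

```java
public class StatsPathDemo {
    // Assumption: STATS is a path segment like "/stats/". The PR builds
    // table.location() + STATS + table.name() + snapshotId in several
    // places; this helper would replace each of them.
    private static final String STATS = "/stats/";

    static String statsPath(String tableLocation, String tableName, long snapshotId) {
        return tableLocation + STATS + tableName + snapshotId;
    }

    public static void main(String[] args) {
        System.out.println(statsPath("s3://bucket/db/tbl", "tbl", 42L));
    }
}
```

   Centralizing the path also means a later change to the naming scheme (say, adding a file extension) touches one method instead of every call site.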





Issue Time Tracking
---

Worklog Id: (was: 853110)
Time Spent: 3h 50m  (was: 3h 40m)

> Store hive columns stats in puffin files for iceberg tables
> ---
>
> Key: HIVE-27158
> URL: 

[jira] [Work logged] (HIVE-27158) Store hive columns stats in puffin files for iceberg tables

2023-03-27 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-27158?focusedWorklogId=853108&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-853108
 ]

ASF GitHub Bot logged work on HIVE-27158:
-

Author: ASF GitHub Bot
Created on: 27/Mar/23 09:09
Start Date: 27/Mar/23 09:09
Worklog Time Spent: 10m 
  Work Description: deniskuzZ commented on code in PR #4131:
URL: https://github.com/apache/hive/pull/4131#discussion_r1149015944


##
iceberg/iceberg-handler/src/main/java/org/apache/iceberg/mr/hive/HiveIcebergStorageHandler.java:
##
@@ -349,6 +365,96 @@ public Map<String, String> getBasicStatistics(Partish partish) {
 return stats;
   }
 
+
+  @Override
+  public boolean canSetColStatistics() {
+String statsSource = HiveConf.getVar(conf, 
HiveConf.ConfVars.HIVE_USE_STATS_FROM).toLowerCase();
+return statsSource.equals(PUFFIN);
+  }
+
+  @Override
+  public boolean 
canProvideColStatistics(org.apache.hadoop.hive.ql.metadata.Table tbl) {
+
+org.apache.hadoop.hive.ql.metadata.Table hmsTable = tbl;
+TableDesc tableDesc = Utilities.getTableDesc(hmsTable);
+Table table = Catalogs.loadTable(conf, tableDesc.getProperties());
+if (table.currentSnapshot() != null) {
+  String statsSource = HiveConf.getVar(conf, 
HiveConf.ConfVars.HIVE_COL_STATS_SOURCE).toLowerCase();
+  String statsPath = table.location() + STATS + table.name() + 
table.currentSnapshot().snapshotId();
+  if (statsSource.equals(PUFFIN)) {
+try (FileSystem fs = new Path(table.location()).getFileSystem(conf)) {
+  if (fs.exists(new Path(statsPath))) {
+return true;
+  }
+} catch (IOException e) {
+  LOG.warn(e.getMessage());
+}
+  }
+}
+return false;
+  }
+
+  @Override
+  public List<ColStatistics> getColStatistics(org.apache.hadoop.hive.ql.metadata.Table tbl) {
+
+org.apache.hadoop.hive.ql.metadata.Table hmsTable = tbl;
+TableDesc tableDesc = Utilities.getTableDesc(hmsTable);
+Table table = Catalogs.loadTable(conf, tableDesc.getProperties());
+String statsSource = HiveConf.getVar(conf, 
HiveConf.ConfVars.HIVE_COL_STATS_SOURCE).toLowerCase();

Review Comment:
   enum?
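   A sketch of the suggested enum; the name `ColStatsSource` and its constants are illustrative, mirroring the three string values the config currently accepts:

```java
import java.util.Locale;

public class StatsSourceDemo {
    // Illustrative enum for the three accepted values of
    // hive.col.stats.source; the PR compares raw lower-cased strings.
    enum ColStatsSource {
        METASTORE, PUFFIN, ICEBERG;

        static ColStatsSource from(String value) {
            return valueOf(value.toUpperCase(Locale.ROOT));
        }
    }

    public static void main(String[] args) {
        // An unrecognized value now fails fast in valueOf instead of
        // silently matching none of the equals() branches.
        switch (ColStatsSource.from("puffin")) {
            case PUFFIN:
                System.out.println("reading column stats from puffin files");
                break;
            default:
                System.out.println("other source");
        }
    }
}
```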





Issue Time Tracking
---

Worklog Id: (was: 853108)
Time Spent: 3.5h  (was: 3h 20m)

> Store hive columns stats in puffin files for iceberg tables
> ---
>
> Key: HIVE-27158
> URL: https://issues.apache.org/jira/browse/HIVE-27158
> Project: Hive
>  Issue Type: Improvement
>Reporter: Simhadri Govindappa
>Assignee: Simhadri Govindappa
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 3.5h
>  Remaining Estimate: 0h
>






[jira] [Work logged] (HIVE-27158) Store hive columns stats in puffin files for iceberg tables

2023-03-27 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-27158?focusedWorklogId=853109&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-853109
 ]

ASF GitHub Bot logged work on HIVE-27158:
-

Author: ASF GitHub Bot
Created on: 27/Mar/23 09:09
Start Date: 27/Mar/23 09:09
Worklog Time Spent: 10m 
  Work Description: deniskuzZ commented on code in PR #4131:
URL: https://github.com/apache/hive/pull/4131#discussion_r1149016455


##
iceberg/iceberg-handler/src/main/java/org/apache/iceberg/mr/hive/HiveIcebergStorageHandler.java:
##
@@ -349,6 +365,96 @@ public Map<String, String> getBasicStatistics(Partish partish) {
 return stats;
   }
 
+
+  @Override
+  public boolean canSetColStatistics() {
+String statsSource = HiveConf.getVar(conf, 
HiveConf.ConfVars.HIVE_USE_STATS_FROM).toLowerCase();
+return statsSource.equals(PUFFIN);
+  }
+
+  @Override
+  public boolean 
canProvideColStatistics(org.apache.hadoop.hive.ql.metadata.Table tbl) {
+
+org.apache.hadoop.hive.ql.metadata.Table hmsTable = tbl;
+TableDesc tableDesc = Utilities.getTableDesc(hmsTable);
+Table table = Catalogs.loadTable(conf, tableDesc.getProperties());
+if (table.currentSnapshot() != null) {
+  String statsSource = HiveConf.getVar(conf, 
HiveConf.ConfVars.HIVE_COL_STATS_SOURCE).toLowerCase();
+  String statsPath = table.location() + STATS + table.name() + 
table.currentSnapshot().snapshotId();
+  if (statsSource.equals(PUFFIN)) {
+try (FileSystem fs = new Path(table.location()).getFileSystem(conf)) {
+  if (fs.exists(new Path(statsPath))) {
+return true;
+  }
+} catch (IOException e) {
+  LOG.warn(e.getMessage());
+}
+  }
+}
+return false;
+  }
+
+  @Override
+  public List<ColStatistics> getColStatistics(org.apache.hadoop.hive.ql.metadata.Table tbl) {
+
+org.apache.hadoop.hive.ql.metadata.Table hmsTable = tbl;

Review Comment:
   why is this needed?





Issue Time Tracking
---

Worklog Id: (was: 853109)
Time Spent: 3h 40m  (was: 3.5h)

> Store hive columns stats in puffin files for iceberg tables
> ---
>
> Key: HIVE-27158
> URL: https://issues.apache.org/jira/browse/HIVE-27158
> Project: Hive
>  Issue Type: Improvement
>Reporter: Simhadri Govindappa
>Assignee: Simhadri Govindappa
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 3h 40m
>  Remaining Estimate: 0h
>






[jira] [Work logged] (HIVE-27178) Backport of HIVE-23321 to branch-3

2023-03-27 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-27178?focusedWorklogId=853107&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-853107
 ]

ASF GitHub Bot logged work on HIVE-27178:
-

Author: ASF GitHub Bot
Created on: 27/Mar/23 09:08
Start Date: 27/Mar/23 09:08
Worklog Time Spent: 10m 
  Work Description: amanraj2520 commented on PR #4157:
URL: https://github.com/apache/hive/pull/4157#issuecomment-1484779655

   @vihangk1 Can you please approve and merge this. This is a part of the fix 
for sysdb.q. The other part is fixed in #4156 




Issue Time Tracking
---

Worklog Id: (was: 853107)
Time Spent: 20m  (was: 10m)

> Backport of HIVE-23321 to branch-3
> --
>
> Key: HIVE-27178
> URL: https://issues.apache.org/jira/browse/HIVE-27178
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Aman Raj
>Assignee: Aman Raj
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> Current branch-3 fails with the diff in select count(*) from 
> skewed_string_list and select count(*) from skewed_string_list_values. 
> Jenkins run : [jenkins / hive-precommit / PR-4156 / #1 
> (apache.org)|http://ci.hive.apache.org/blue/organizations/jenkins/hive-precommit/detail/PR-4156/1/tests/]
> Diff : 
> Client Execution succeeded but contained differences (error code = 1) after 
> executing sysdb.q 
> 3740d3739
> < hdfs://### HDFS PATH ### default public ROLE
> 4036c4035
> < 3
> ---
> > 6
> 4045c4044
> < 3
> ---
> > 6
>  
> This ticket tries to fix this diff. Please read the description of this 
> ticket for the exact reason.





[jira] [Work logged] (HIVE-27158) Store hive columns stats in puffin files for iceberg tables

2023-03-27 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-27158?focusedWorklogId=853106&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-853106
 ]

ASF GitHub Bot logged work on HIVE-27158:
-

Author: ASF GitHub Bot
Created on: 27/Mar/23 09:07
Start Date: 27/Mar/23 09:07
Worklog Time Spent: 10m 
  Work Description: deniskuzZ commented on code in PR #4131:
URL: https://github.com/apache/hive/pull/4131#discussion_r1148989152


##
common/src/java/org/apache/hadoop/hive/conf/HiveConf.java:
##
@@ -2207,6 +2207,8 @@ public static enum ConfVars {
 "Whether to use codec pool in ORC. Disable if there are bugs with 
codec reuse."),
 HIVE_USE_STATS_FROM("hive.use.stats.from","iceberg","Use stats from 
iceberg table snapshot for query " +
 "planning. This has three values metastore, puffin and iceberg"),
+HIVE_COL_STATS_SOURCE("hive.col.stats.source","metastore","Use stats from 
puffin file for  query " +

Review Comment:
   - do we support `iceberg` mode? 
   - please update the description - "Use stats from the selected source for 
query planning" 
   
   what's the difference between `HIVE_USE_STATS_FROM` && 
`HIVE_COL_STATS_SOURCE `? it doesn't seem to be a generic config, should we add 
an ICEBERG prefix?





Issue Time Tracking
---

Worklog Id: (was: 853106)
Time Spent: 3h 20m  (was: 3h 10m)

> Store hive columns stats in puffin files for iceberg tables
> ---
>
> Key: HIVE-27158
> URL: https://issues.apache.org/jira/browse/HIVE-27158
> Project: Hive
>  Issue Type: Improvement
>Reporter: Simhadri Govindappa
>Assignee: Simhadri Govindappa
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 3h 20m
>  Remaining Estimate: 0h
>






[jira] [Work started] (HIVE-26655) VectorUDAFBloomFilterMerge should take care of safe batch handling when working in parallel

2023-03-27 Thread Jira


 [ 
https://issues.apache.org/jira/browse/HIVE-26655?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Work on HIVE-26655 started by László Bodor.
---
> VectorUDAFBloomFilterMerge should take care of safe batch handling when 
> working in parallel
> ---
>
> Key: HIVE-26655
> URL: https://issues.apache.org/jira/browse/HIVE-26655
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Sungwoo Park
>Assignee: László Bodor
>Priority: Major
>
> When tested with 100GB ORC tables, the number of rows returned by query 17 is 
> not stable. It returns fewer rows than the correct result (55 rows).
>  





[jira] [Updated] (HIVE-27057) Revert "HIVE-21741 Backport HIVE-20221 & related fix HIVE-20833 to branch-3"

2023-03-27 Thread Aman Raj (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-27057?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aman Raj updated HIVE-27057:

Summary: Revert "HIVE-21741 Backport HIVE-20221 & related fix HIVE-20833 to 
branch-3"  (was: Test fix for sysdb.q)

> Revert "HIVE-21741 Backport HIVE-20221 & related fix HIVE-20833 to branch-3"
> 
>
> Key: HIVE-27057
> URL: https://issues.apache.org/jira/browse/HIVE-27057
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Aman Raj
>Assignee: Aman Raj
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> The sysdb test fails with the following error:
> h4. Error
> Client Execution succeeded but contained differences (error code = 1) after 
> executing sysdb.q 
> 3803,3807c3803,3807
> < COLUMN_STATS_ACCURATE org.apache.derby.impl.jdbc.EmbedClob@125b285b
> < COLUMN_STATS_ACCURATE org.apache.derby.impl.jdbc.EmbedClob@471246f3
> < COLUMN_STATS_ACCURATE org.apache.derby.impl.jdbc.EmbedClob@57c013
> < COLUMN_STATS_ACCURATE org.apache.derby.impl.jdbc.EmbedClob@59f1d7ac
> < COLUMN_STATS_ACCURATE org.apache.derby.impl.jdbc.EmbedClob@71a0
> ---
> > COLUMN_STATS_ACCURATE 
> > \{"BASIC_STATS":"true","COLUMN_STATS":{"c_boolean":"true","c_float":"true","c_int":"true","key":"true","value":"true"}}
> > COLUMN_STATS_ACCURATE 
> > \{"BASIC_STATS":"true","COLUMN_STATS":{"c_boolean":"true","c_float":"true","c_int":"true","key":"true","value":"true"}}
> > COLUMN_STATS_ACCURATE 
> > \{"BASIC_STATS":"true","COLUMN_STATS":{"key":"true","value":"true"}}
> > COLUMN_STATS_ACCURATE 
> > \{"BASIC_STATS":"true","COLUMN_STATS":{"key":"true","value":"true"}}
> > COLUMN_STATS_ACCURATE 
> > \{"BASIC_STATS":"true","COLUMN_STATS":{"key":"true","value":"true"}}





[jira] [Updated] (HIVE-27057) Test fix for sysdb.q

2023-03-27 Thread Aman Raj (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-27057?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aman Raj updated HIVE-27057:

Parent: HIVE-26836
Issue Type: Sub-task  (was: Test)

> Test fix for sysdb.q
> 
>
> Key: HIVE-27057
> URL: https://issues.apache.org/jira/browse/HIVE-27057
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Aman Raj
>Assignee: Aman Raj
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> The sysdb test fails with the following error:
> h4. Error
> Client Execution succeeded but contained differences (error code = 1) after 
> executing sysdb.q 
> 3803,3807c3803,3807
> < COLUMN_STATS_ACCURATE org.apache.derby.impl.jdbc.EmbedClob@125b285b
> < COLUMN_STATS_ACCURATE org.apache.derby.impl.jdbc.EmbedClob@471246f3
> < COLUMN_STATS_ACCURATE org.apache.derby.impl.jdbc.EmbedClob@57c013
> < COLUMN_STATS_ACCURATE org.apache.derby.impl.jdbc.EmbedClob@59f1d7ac
> < COLUMN_STATS_ACCURATE org.apache.derby.impl.jdbc.EmbedClob@71a0
> ---
> > COLUMN_STATS_ACCURATE 
> > \{"BASIC_STATS":"true","COLUMN_STATS":{"c_boolean":"true","c_float":"true","c_int":"true","key":"true","value":"true"}}
> > COLUMN_STATS_ACCURATE 
> > \{"BASIC_STATS":"true","COLUMN_STATS":{"c_boolean":"true","c_float":"true","c_int":"true","key":"true","value":"true"}}
> > COLUMN_STATS_ACCURATE 
> > \{"BASIC_STATS":"true","COLUMN_STATS":{"key":"true","value":"true"}}
> > COLUMN_STATS_ACCURATE 
> > \{"BASIC_STATS":"true","COLUMN_STATS":{"key":"true","value":"true"}}
> > COLUMN_STATS_ACCURATE 
> > \{"BASIC_STATS":"true","COLUMN_STATS":{"key":"true","value":"true"}}





[jira] [Work logged] (HIVE-21741) Backport HIVE-20221 & related fix HIVE-20833 to branch-3: Increase column width for partition_params

2023-03-27 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-21741?focusedWorklogId=853105&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-853105
 ]

ASF GitHub Bot logged work on HIVE-21741:
-

Author: ASF GitHub Bot
Created on: 27/Mar/23 09:03
Start Date: 27/Mar/23 09:03
Worklog Time Spent: 10m 
  Work Description: amanraj2520 commented on PR #4156:
URL: https://github.com/apache/hive/pull/4156#issuecomment-1484772726

   @vihangk1 As suggested by you, this revert fixed the BASIC_STATS printed in 
a json string issue. But there is another failure in the sysdb.q file which is 
why I have raised https://github.com/apache/hive/pull/4157. First we should 
merge this PR (https://github.com/apache/hive/pull/4157) and revert this. I 
have tested in my local. It is working fine. Can you please approve and merge 
the #4157 PR.




Issue Time Tracking
---

Worklog Id: (was: 853105)
Time Spent: 50m  (was: 40m)

> Backport HIVE-20221 & related fix HIVE-20833 to branch-3: Increase column 
> width for partition_params
> 
>
> Key: HIVE-21741
> URL: https://issues.apache.org/jira/browse/HIVE-21741
> Project: Hive
>  Issue Type: Bug
>  Components: Metastore, Standalone Metastore
>Affects Versions: 3.1.1
>Reporter: David Lavati
>Assignee: David Lavati
>Priority: Major
>  Labels: pull-request-available
> Fix For: 3.2.0
>
> Attachments: HIVE-21741.01.branch-3.patch, 
> HIVE-21741.01.branch-3.patch, HIVE-21741.02.branch-3.patch, 
> HIVE-21741.branch-3.patch
>
>  Time Spent: 50m
>  Remaining Estimate: 0h
>
> This is an umbrella for backporting HIVE-20221 & the related fix of 
> HIVE-20833 to branch-3.





[jira] [Work logged] (HIVE-27158) Store hive columns stats in puffin files for iceberg tables

2023-03-27 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-27158?focusedWorklogId=853104&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-853104
 ]

ASF GitHub Bot logged work on HIVE-27158:
-

Author: ASF GitHub Bot
Created on: 27/Mar/23 09:03
Start Date: 27/Mar/23 09:03
Worklog Time Spent: 10m 
  Work Description: deniskuzZ commented on code in PR #4131:
URL: https://github.com/apache/hive/pull/4131#discussion_r1149008394


##
iceberg/iceberg-handler/src/main/java/org/apache/iceberg/mr/hive/HiveIcebergStorageHandler.java:
##
@@ -349,6 +365,96 @@ public Map<String, String> getBasicStatistics(Partish partish) {
 return stats;
   }
 
+
+  @Override
+  public boolean canSetColStatistics() {
+String statsSource = HiveConf.getVar(conf, 
HiveConf.ConfVars.HIVE_USE_STATS_FROM).toLowerCase();
+return statsSource.equals(PUFFIN);
+  }
+
+  @Override
+  public boolean 
canProvideColStatistics(org.apache.hadoop.hive.ql.metadata.Table tbl) {
+
+org.apache.hadoop.hive.ql.metadata.Table hmsTable = tbl;
+TableDesc tableDesc = Utilities.getTableDesc(hmsTable);
+Table table = Catalogs.loadTable(conf, tableDesc.getProperties());
+if (table.currentSnapshot() != null) {
+  String statsSource = HiveConf.getVar(conf, 
HiveConf.ConfVars.HIVE_COL_STATS_SOURCE).toLowerCase();
+  String statsPath = table.location() + STATS + table.name() + 
table.currentSnapshot().snapshotId();
+  if (statsSource.equals(PUFFIN)) {
+try (FileSystem fs = new Path(table.location()).getFileSystem(conf)) {
+  if (fs.exists(new Path(statsPath))) {

Review Comment:
   can we create a path variable just once?
   
   Path statsPath = new Path(...);
   try (FileSystem fs = statsPath.getFileSystem(conf)) {
   return fs.exists(statsPath);
   }
   
   





Issue Time Tracking
---

Worklog Id: (was: 853104)
Time Spent: 3h 10m  (was: 3h)

> Store hive columns stats in puffin files for iceberg tables
> ---
>
> Key: HIVE-27158
> URL: https://issues.apache.org/jira/browse/HIVE-27158
> Project: Hive
>  Issue Type: Improvement
>Reporter: Simhadri Govindappa
>Assignee: Simhadri Govindappa
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 3h 10m
>  Remaining Estimate: 0h
>






[jira] [Work logged] (HIVE-27158) Store hive columns stats in puffin files for iceberg tables

2023-03-27 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-27158?focusedWorklogId=853102&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-853102
 ]

ASF GitHub Bot logged work on HIVE-27158:
-

Author: ASF GitHub Bot
Created on: 27/Mar/23 09:02
Start Date: 27/Mar/23 09:02
Worklog Time Spent: 10m 
  Work Description: deniskuzZ commented on code in PR #4131:
URL: https://github.com/apache/hive/pull/4131#discussion_r1149008394


##
iceberg/iceberg-handler/src/main/java/org/apache/iceberg/mr/hive/HiveIcebergStorageHandler.java:
##
@@ -349,6 +365,96 @@ public Map<String, String> getBasicStatistics(Partish partish) {
 return stats;
   }
 
+
+  @Override
+  public boolean canSetColStatistics() {
+String statsSource = HiveConf.getVar(conf, 
HiveConf.ConfVars.HIVE_USE_STATS_FROM).toLowerCase();
+return statsSource.equals(PUFFIN);
+  }
+
+  @Override
+  public boolean 
canProvideColStatistics(org.apache.hadoop.hive.ql.metadata.Table tbl) {
+
+org.apache.hadoop.hive.ql.metadata.Table hmsTable = tbl;
+TableDesc tableDesc = Utilities.getTableDesc(hmsTable);
+Table table = Catalogs.loadTable(conf, tableDesc.getProperties());
+if (table.currentSnapshot() != null) {
+  String statsSource = HiveConf.getVar(conf, 
HiveConf.ConfVars.HIVE_COL_STATS_SOURCE).toLowerCase();
+  String statsPath = table.location() + STATS + table.name() + 
table.currentSnapshot().snapshotId();
+  if (statsSource.equals(PUFFIN)) {
+try (FileSystem fs = new Path(table.location()).getFileSystem(conf)) {
+  if (fs.exists(new Path(statsPath))) {

Review Comment:
   can we create a path variable just once?
   
   Path statsPath = new Path(...);
   FileSystem fs = statsPath.getFileSystem(conf);
   return fs.exists(statsPath);
   
   





Issue Time Tracking
---

Worklog Id: (was: 853102)
Time Spent: 3h  (was: 2h 50m)

> Store hive columns stats in puffin files for iceberg tables
> ---
>
> Key: HIVE-27158
> URL: https://issues.apache.org/jira/browse/HIVE-27158
> Project: Hive
>  Issue Type: Improvement
>Reporter: Simhadri Govindappa
>Assignee: Simhadri Govindappa
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 3h
>  Remaining Estimate: 0h
>






[jira] [Work logged] (HIVE-27178) Backport of HIVE-23321 to branch-3

2023-03-27 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-27178?focusedWorklogId=853101&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-853101
 ]

ASF GitHub Bot logged work on HIVE-27178:
-

Author: ASF GitHub Bot
Created on: 27/Mar/23 09:01
Start Date: 27/Mar/23 09:01
Worklog Time Spent: 10m 
  Work Description: amanraj2520 opened a new pull request, #4157:
URL: https://github.com/apache/hive/pull/4157

   JIRA link : https://issues.apache.org/jira/browse/HIVE-27178
   
   Jenkins build link having the failure - 
http://ci.hive.apache.org/blue/organizations/jenkins/hive-precommit/detail/PR-4156/1/tests
   
   
   
   ### What changes were proposed in this pull request?
   
   
   
   ### Why are the changes needed?
   
   
   
   ### Does this PR introduce _any_ user-facing change?
   
   
   
   ### How was this patch tested?
   
   




Issue Time Tracking
---

Worklog Id: (was: 853101)
Remaining Estimate: 0h
Time Spent: 10m

> Backport of HIVE-23321 to branch-3
> --
>
> Key: HIVE-27178
> URL: https://issues.apache.org/jira/browse/HIVE-27178
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Aman Raj
>Assignee: Aman Raj
>Priority: Major
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> Current branch-3 fails with the diff in select count(*) from 
> skewed_string_list and select count(*) from skewed_string_list_values. 
> Jenkins run : [jenkins / hive-precommit / PR-4156 / #1 
> (apache.org)|http://ci.hive.apache.org/blue/organizations/jenkins/hive-precommit/detail/PR-4156/1/tests/]
> Diff : 
> Client Execution succeeded but contained differences (error code = 1) after 
> executing sysdb.q 
> 3740d3739
> < hdfs://### HDFS PATH ### default public ROLE
> 4036c4035
> < 3
> ---
> > 6
> 4045c4044
> < 3
> ---
> > 6
>  
> This ticket tries to fix this diff. Please read the description of this 
> ticket for the exact reason.





[jira] [Updated] (HIVE-27178) Backport of HIVE-23321 to branch-3

2023-03-27 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-27178?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated HIVE-27178:
--
Labels: pull-request-available  (was: )

> Backport of HIVE-23321 to branch-3
> --
>
> Key: HIVE-27178
> URL: https://issues.apache.org/jira/browse/HIVE-27178
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Aman Raj
>Assignee: Aman Raj
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> Current branch-3 fails with the diff in select count(*) from 
> skewed_string_list and select count(*) from skewed_string_list_values. 
> Jenkins run : [jenkins / hive-precommit / PR-4156 / #1 
> (apache.org)|http://ci.hive.apache.org/blue/organizations/jenkins/hive-precommit/detail/PR-4156/1/tests/]
> Diff : 
> Client Execution succeeded but contained differences (error code = 1) after 
> executing sysdb.q 
> 3740d3739
> < hdfs://### HDFS PATH ### default public ROLE
> 4036c4035
> < 3
> ---
> > 6
> 4045c4044
> < 3
> ---
> > 6
>  
> This ticket tries to fix this diff. Please read the description of this 
> ticket for the exact reason.





[jira] [Work logged] (HIVE-27158) Store hive columns stats in puffin files for iceberg tables

2023-03-27 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-27158?focusedWorklogId=853100&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-853100
 ]

ASF GitHub Bot logged work on HIVE-27158:
-

Author: ASF GitHub Bot
Created on: 27/Mar/23 08:58
Start Date: 27/Mar/23 08:58
Worklog Time Spent: 10m 
  Work Description: deniskuzZ commented on code in PR #4131:
URL: https://github.com/apache/hive/pull/4131#discussion_r1149003439


##
iceberg/iceberg-handler/src/main/java/org/apache/iceberg/mr/hive/HiveIcebergStorageHandler.java:
##
@@ -349,6 +365,96 @@ public Map<String, String> getBasicStatistics(Partish 
partish) {
 return stats;
   }
 
+
+  @Override
+  public boolean canSetColStatistics() {
+String statsSource = HiveConf.getVar(conf, 
HiveConf.ConfVars.HIVE_USE_STATS_FROM).toLowerCase();
+return statsSource.equals(PUFFIN);
+  }
+
+  @Override
+  public boolean 
canProvideColStatistics(org.apache.hadoop.hive.ql.metadata.Table tbl) {
+
+org.apache.hadoop.hive.ql.metadata.Table hmsTable = tbl;
+TableDesc tableDesc = Utilities.getTableDesc(hmsTable);
+Table table = Catalogs.loadTable(conf, tableDesc.getProperties());
+if (table.currentSnapshot() != null) {
+  String statsSource = HiveConf.getVar(conf, 
HiveConf.ConfVars.HIVE_COL_STATS_SOURCE).toLowerCase();
+  String statsPath = table.location() + STATS + table.name() + 
table.currentSnapshot().snapshotId();
+  if (statsSource.equals(PUFFIN)) {

Review Comment:
   PUFFIN.equals(statsSource), maybe even better to create enum
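The point behind this suggestion can be sketched as follows; the class name, the enum, and its fallback behavior are illustrative assumptions, not Hive's actual API:

```java
public class StatsSourceCheck {

    // Suggestion 1: constant-first equals never throws an NPE,
    // even when the config value resolves to null.
    static final String PUFFIN = "puffin";

    static boolean isPuffin(String statsSource) {
        return PUFFIN.equalsIgnoreCase(statsSource);
    }

    // Suggestion 2: an enum makes the set of valid sources explicit
    // and centralizes handling of null/unknown values.
    enum ColStatsSource {
        METASTORE, PUFFIN, ICEBERG;

        static ColStatsSource fromConf(String value) {
            if (value == null) {
                return METASTORE; // assumed fallback, not the patch's behavior
            }
            try {
                return valueOf(value.trim().toUpperCase());
            } catch (IllegalArgumentException e) {
                return METASTORE;
            }
        }
    }

    public static void main(String[] args) {
        assert isPuffin("puffin");
        assert !isPuffin(null); // no NPE with constant-first equals
        assert ColStatsSource.fromConf("puffin") == ColStatsSource.PUFFIN;
        assert ColStatsSource.fromConf(null) == ColStatsSource.METASTORE;
        System.out.println("ok");
    }
}
```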





Issue Time Tracking
---

Worklog Id: (was: 853100)
Time Spent: 2h 50m  (was: 2h 40m)

> Store hive columns stats in puffin files for iceberg tables
> ---
>
> Key: HIVE-27158
> URL: https://issues.apache.org/jira/browse/HIVE-27158
> Project: Hive
>  Issue Type: Improvement
>Reporter: Simhadri Govindappa
>Assignee: Simhadri Govindappa
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 2h 50m
>  Remaining Estimate: 0h
>






[jira] [Work logged] (HIVE-27158) Store hive columns stats in puffin files for iceberg tables

2023-03-27 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-27158?focusedWorklogId=853099&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-853099
 ]

ASF GitHub Bot logged work on HIVE-27158:
-

Author: ASF GitHub Bot
Created on: 27/Mar/23 08:57
Start Date: 27/Mar/23 08:57
Worklog Time Spent: 10m 
  Work Description: deniskuzZ commented on code in PR #4131:
URL: https://github.com/apache/hive/pull/4131#discussion_r1148999605


##
iceberg/iceberg-handler/src/main/java/org/apache/iceberg/mr/hive/HiveIcebergStorageHandler.java:
##
@@ -349,6 +365,96 @@ public Map<String, String> getBasicStatistics(Partish 
partish) {
 return stats;
   }
 
+
+  @Override
+  public boolean canSetColStatistics() {
+String statsSource = HiveConf.getVar(conf, 
HiveConf.ConfVars.HIVE_USE_STATS_FROM).toLowerCase();
+return statsSource.equals(PUFFIN);
+  }
+
+  @Override
+  public boolean 
canProvideColStatistics(org.apache.hadoop.hive.ql.metadata.Table tbl) {
+
+org.apache.hadoop.hive.ql.metadata.Table hmsTable = tbl;
+TableDesc tableDesc = Utilities.getTableDesc(hmsTable);
+Table table = Catalogs.loadTable(conf, tableDesc.getProperties());
+if (table.currentSnapshot() != null) {
+  String statsSource = HiveConf.getVar(conf, 
HiveConf.ConfVars.HIVE_COL_STATS_SOURCE).toLowerCase();
+  String statsPath = table.location() + STATS + table.name() + 
table.currentSnapshot().snapshotId();

Review Comment:
   should we wrap the table name with some delimiter like '-', e.g. 
`STATS-customers-122121`? Is there an extension for the puffin file?
   btw, declare these vars under the PUFFIN code block
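A minimal sketch of the naming idea raised here; the "/stats/" directory, the '-' delimiter, and the ".puffin" extension are the reviewer's suggestions and guesses, not the layout actually used by the patch:

```java
public class PuffinStatsPathSketch {

    // Build a stats file path with explicit delimiters so the table name
    // and snapshot id stay unambiguous (illustrative helper, not Hive API).
    static String statsFilePath(String tableLocation, String tableName, long snapshotId) {
        return tableLocation + "/stats/" + tableName + "-" + snapshotId + ".puffin";
    }

    public static void main(String[] args) {
        String p = statsFilePath("s3://bucket/db/customers", "customers", 122121L);
        assert p.equals("s3://bucket/db/customers/stats/customers-122121.puffin");
        System.out.println(p);
    }
}
```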





Issue Time Tracking
---

Worklog Id: (was: 853099)
Time Spent: 2h 40m  (was: 2.5h)

> Store hive columns stats in puffin files for iceberg tables
> ---
>
> Key: HIVE-27158
> URL: https://issues.apache.org/jira/browse/HIVE-27158
> Project: Hive
>  Issue Type: Improvement
>Reporter: Simhadri Govindappa
>Assignee: Simhadri Govindappa
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 2h 40m
>  Remaining Estimate: 0h
>






[jira] [Work logged] (HIVE-27158) Store hive columns stats in puffin files for iceberg tables

2023-03-27 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-27158?focusedWorklogId=853098&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-853098
 ]

ASF GitHub Bot logged work on HIVE-27158:
-

Author: ASF GitHub Bot
Created on: 27/Mar/23 08:56
Start Date: 27/Mar/23 08:56
Worklog Time Spent: 10m 
  Work Description: deniskuzZ commented on code in PR #4131:
URL: https://github.com/apache/hive/pull/4131#discussion_r1148999605


##
iceberg/iceberg-handler/src/main/java/org/apache/iceberg/mr/hive/HiveIcebergStorageHandler.java:
##
@@ -349,6 +365,96 @@ public Map<String, String> getBasicStatistics(Partish 
partish) {
 return stats;
   }
 
+
+  @Override
+  public boolean canSetColStatistics() {
+String statsSource = HiveConf.getVar(conf, 
HiveConf.ConfVars.HIVE_USE_STATS_FROM).toLowerCase();
+return statsSource.equals(PUFFIN);
+  }
+
+  @Override
+  public boolean 
canProvideColStatistics(org.apache.hadoop.hive.ql.metadata.Table tbl) {
+
+org.apache.hadoop.hive.ql.metadata.Table hmsTable = tbl;
+TableDesc tableDesc = Utilities.getTableDesc(hmsTable);
+Table table = Catalogs.loadTable(conf, tableDesc.getProperties());
+if (table.currentSnapshot() != null) {
+  String statsSource = HiveConf.getVar(conf, 
HiveConf.ConfVars.HIVE_COL_STATS_SOURCE).toLowerCase();
+  String statsPath = table.location() + STATS + table.name() + 
table.currentSnapshot().snapshotId();

Review Comment:
   should we wrap the table name with some delimiter like '-', e.g. 
`STATS-customers-122121`? Is there an extension for the puffin file?





Issue Time Tracking
---

Worklog Id: (was: 853098)
Time Spent: 2.5h  (was: 2h 20m)

> Store hive columns stats in puffin files for iceberg tables
> ---
>
> Key: HIVE-27158
> URL: https://issues.apache.org/jira/browse/HIVE-27158
> Project: Hive
>  Issue Type: Improvement
>Reporter: Simhadri Govindappa
>Assignee: Simhadri Govindappa
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 2.5h
>  Remaining Estimate: 0h
>






[jira] [Assigned] (HIVE-27178) Backport of HIVE-23321 to branch-3

2023-03-27 Thread Aman Raj (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-27178?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aman Raj reassigned HIVE-27178:
---


> Backport of HIVE-23321 to branch-3
> --
>
> Key: HIVE-27178
> URL: https://issues.apache.org/jira/browse/HIVE-27178
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Aman Raj
>Assignee: Aman Raj
>Priority: Major
>
> Current branch-3 fails with the diff in select count(*) from 
> skewed_string_list and select count(*) from skewed_string_list_values. 
> Jenkins run : [jenkins / hive-precommit / PR-4156 / #1 
> (apache.org)|http://ci.hive.apache.org/blue/organizations/jenkins/hive-precommit/detail/PR-4156/1/tests/]
> Diff : 
> Client Execution succeeded but contained differences (error code = 1) after 
> executing sysdb.q 
> 3740d3739
> < hdfs://### HDFS PATH ### default public ROLE
> 4036c4035
> < 3
> ---
> > 6
> 4045c4044
> < 3
> ---
> > 6
>  
> This ticket tries to fix this diff. Please read the description of this 
> ticket for the exact reason.





[jira] [Work logged] (HIVE-27158) Store hive columns stats in puffin files for iceberg tables

2023-03-27 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-27158?focusedWorklogId=853097&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-853097
 ]

ASF GitHub Bot logged work on HIVE-27158:
-

Author: ASF GitHub Bot
Created on: 27/Mar/23 08:55
Start Date: 27/Mar/23 08:55
Worklog Time Spent: 10m 
  Work Description: deniskuzZ commented on code in PR #4131:
URL: https://github.com/apache/hive/pull/4131#discussion_r1148999605


##
iceberg/iceberg-handler/src/main/java/org/apache/iceberg/mr/hive/HiveIcebergStorageHandler.java:
##
@@ -349,6 +365,96 @@ public Map<String, String> getBasicStatistics(Partish 
partish) {
 return stats;
   }
 
+
+  @Override
+  public boolean canSetColStatistics() {
+String statsSource = HiveConf.getVar(conf, 
HiveConf.ConfVars.HIVE_USE_STATS_FROM).toLowerCase();
+return statsSource.equals(PUFFIN);
+  }
+
+  @Override
+  public boolean 
canProvideColStatistics(org.apache.hadoop.hive.ql.metadata.Table tbl) {
+
+org.apache.hadoop.hive.ql.metadata.Table hmsTable = tbl;
+TableDesc tableDesc = Utilities.getTableDesc(hmsTable);
+Table table = Catalogs.loadTable(conf, tableDesc.getProperties());
+if (table.currentSnapshot() != null) {
+  String statsSource = HiveConf.getVar(conf, 
HiveConf.ConfVars.HIVE_COL_STATS_SOURCE).toLowerCase();
+  String statsPath = table.location() + STATS + table.name() + 
table.currentSnapshot().snapshotId();

Review Comment:
   should we wrap the table name with some delimiter like '-'?





Issue Time Tracking
---

Worklog Id: (was: 853097)
Time Spent: 2h 20m  (was: 2h 10m)

> Store hive columns stats in puffin files for iceberg tables
> ---
>
> Key: HIVE-27158
> URL: https://issues.apache.org/jira/browse/HIVE-27158
> Project: Hive
>  Issue Type: Improvement
>Reporter: Simhadri Govindappa
>Assignee: Simhadri Govindappa
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 2h 20m
>  Remaining Estimate: 0h
>






[jira] [Work logged] (HIVE-27158) Store hive columns stats in puffin files for iceberg tables

2023-03-27 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-27158?focusedWorklogId=853096=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-853096
 ]

ASF GitHub Bot logged work on HIVE-27158:
-

Author: ASF GitHub Bot
Created on: 27/Mar/23 08:53
Start Date: 27/Mar/23 08:53
Worklog Time Spent: 10m 
  Work Description: deniskuzZ commented on code in PR #4131:
URL: https://github.com/apache/hive/pull/4131#discussion_r1148996651


##
iceberg/iceberg-handler/src/main/java/org/apache/iceberg/mr/hive/HiveIcebergStorageHandler.java:
##
@@ -349,6 +365,96 @@ public Map<String, String> getBasicStatistics(Partish 
partish) {
 return stats;
   }
 
+
+  @Override
+  public boolean canSetColStatistics() {
+String statsSource = HiveConf.getVar(conf, 
HiveConf.ConfVars.HIVE_USE_STATS_FROM).toLowerCase();
+return statsSource.equals(PUFFIN);
+  }
+
+  @Override
+  public boolean 
canProvideColStatistics(org.apache.hadoop.hive.ql.metadata.Table tbl) {
+
+org.apache.hadoop.hive.ql.metadata.Table hmsTable = tbl;

Review Comment:
   why do we need a local var?





Issue Time Tracking
---

Worklog Id: (was: 853096)
Time Spent: 2h 10m  (was: 2h)

> Store hive columns stats in puffin files for iceberg tables
> ---
>
> Key: HIVE-27158
> URL: https://issues.apache.org/jira/browse/HIVE-27158
> Project: Hive
>  Issue Type: Improvement
>Reporter: Simhadri Govindappa
>Assignee: Simhadri Govindappa
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 2h 10m
>  Remaining Estimate: 0h
>






[jira] [Work logged] (HIVE-27158) Store hive columns stats in puffin files for iceberg tables

2023-03-27 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-27158?focusedWorklogId=853095&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-853095
 ]

ASF GitHub Bot logged work on HIVE-27158:
-

Author: ASF GitHub Bot
Created on: 27/Mar/23 08:51
Start Date: 27/Mar/23 08:51
Worklog Time Spent: 10m 
  Work Description: deniskuzZ commented on code in PR #4131:
URL: https://github.com/apache/hive/pull/4131#discussion_r1148993843


##
iceberg/iceberg-handler/src/main/java/org/apache/iceberg/mr/hive/HiveIcebergStorageHandler.java:
##
@@ -349,6 +365,96 @@ public Map<String, String> getBasicStatistics(Partish 
partish) {
 return stats;
   }
 
+
+  @Override
+  public boolean canSetColStatistics() {
+String statsSource = HiveConf.getVar(conf, 
HiveConf.ConfVars.HIVE_USE_STATS_FROM).toLowerCase();
+return statsSource.equals(PUFFIN);

Review Comment:
   what if statsSource is undefined? Could we get an NPE?





Issue Time Tracking
---

Worklog Id: (was: 853095)
Time Spent: 2h  (was: 1h 50m)

> Store hive columns stats in puffin files for iceberg tables
> ---
>
> Key: HIVE-27158
> URL: https://issues.apache.org/jira/browse/HIVE-27158
> Project: Hive
>  Issue Type: Improvement
>Reporter: Simhadri Govindappa
>Assignee: Simhadri Govindappa
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 2h
>  Remaining Estimate: 0h
>






[jira] [Work logged] (HIVE-27158) Store hive columns stats in puffin files for iceberg tables

2023-03-27 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-27158?focusedWorklogId=853094&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-853094
 ]

ASF GitHub Bot logged work on HIVE-27158:
-

Author: ASF GitHub Bot
Created on: 27/Mar/23 08:49
Start Date: 27/Mar/23 08:49
Worklog Time Spent: 10m 
  Work Description: deniskuzZ commented on code in PR #4131:
URL: https://github.com/apache/hive/pull/4131#discussion_r1148989152


##
common/src/java/org/apache/hadoop/hive/conf/HiveConf.java:
##
@@ -2207,6 +2207,8 @@ public static enum ConfVars {
 "Whether to use codec pool in ORC. Disable if there are bugs with 
codec reuse."),
 HIVE_USE_STATS_FROM("hive.use.stats.from","iceberg","Use stats from 
iceberg table snapshot for query " +
 "planning. This has three values metastore, puffin and iceberg"),
+HIVE_COL_STATS_SOURCE("hive.col.stats.source","metastore","Use stats from 
puffin file for  query " +

Review Comment:
   - do we support `iceberg` mode? 
   - please update the description - "Use stats from selected source for query 
planning" 



##
common/src/java/org/apache/hadoop/hive/conf/HiveConf.java:
##
@@ -2207,6 +2207,8 @@ public static enum ConfVars {
 "Whether to use codec pool in ORC. Disable if there are bugs with 
codec reuse."),
 HIVE_USE_STATS_FROM("hive.use.stats.from","iceberg","Use stats from 
iceberg table snapshot for query " +
 "planning. This has three values metastore, puffin and iceberg"),
+HIVE_COL_STATS_SOURCE("hive.col.stats.source","metastore","Use stats from 
puffin file for  query " +

Review Comment:
   - do we support `iceberg` mode? 
   - please update the description - "Use stats from the selected source for 
query planning" 





Issue Time Tracking
---

Worklog Id: (was: 853094)
Time Spent: 1h 50m  (was: 1h 40m)

> Store hive columns stats in puffin files for iceberg tables
> ---
>
> Key: HIVE-27158
> URL: https://issues.apache.org/jira/browse/HIVE-27158
> Project: Hive
>  Issue Type: Improvement
>Reporter: Simhadri Govindappa
>Assignee: Simhadri Govindappa
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 1h 50m
>  Remaining Estimate: 0h
>






[jira] [Updated] (HIVE-26655) VectorUDAFBloomFilterMerge should take care of safe batch handling when working in parallel

2023-03-27 Thread Jira


 [ 
https://issues.apache.org/jira/browse/HIVE-26655?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

László Bodor updated HIVE-26655:

Summary: VectorUDAFBloomFilterMerge should take care of safe batch handling 
when working in parallel  (was: TPC-DS query 17 returns wrong results)

> VectorUDAFBloomFilterMerge should take care of safe batch handling when 
> working in parallel
> ---
>
> Key: HIVE-26655
> URL: https://issues.apache.org/jira/browse/HIVE-26655
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Sungwoo Park
>Assignee: László Bodor
>Priority: Major
>
> When tested with 100GB ORC tables, the number of rows returned by query 17 is 
> not stable. It returns fewer rows than the correct result (55 rows).
>  





[jira] [Work logged] (HIVE-27158) Store hive columns stats in puffin files for iceberg tables

2023-03-27 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-27158?focusedWorklogId=853070&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-853070
 ]

ASF GitHub Bot logged work on HIVE-27158:
-

Author: ASF GitHub Bot
Created on: 27/Mar/23 08:00
Start Date: 27/Mar/23 08:00
Worklog Time Spent: 10m 
  Work Description: sonarcloud[bot] commented on PR #4131:
URL: https://github.com/apache/hive/pull/4131#issuecomment-1484679758

   Kudos, SonarCloud Quality Gate passed!  [Quality Gate 
passed](https://sonarcloud.io/dashboard?id=apache_hive&pullRequest=4131)
   
   [0 
Bugs](https://sonarcloud.io/project/issues?id=apache_hive&pullRequest=4131&resolved=false&types=BUG)
   [0 
Vulnerabilities](https://sonarcloud.io/project/issues?id=apache_hive&pullRequest=4131&resolved=false&types=VULNERABILITY)
   [0 Security 
Hotspots](https://sonarcloud.io/project/security_hotspots?id=apache_hive&pullRequest=4131&resolved=false&types=SECURITY_HOTSPOT)
   [6 Code 
Smells](https://sonarcloud.io/project/issues?id=apache_hive&pullRequest=4131&resolved=false&types=CODE_SMELL)
   
   No Coverage information
   No Duplication information
   
   




Issue Time Tracking
---

Worklog Id: (was: 853070)
Time Spent: 1h 40m  (was: 1.5h)

> Store hive columns stats in puffin files for iceberg tables
> ---
>
> Key: HIVE-27158
> URL: https://issues.apache.org/jira/browse/HIVE-27158
> Project: Hive
>  Issue Type: Improvement
>Reporter: Simhadri Govindappa
>Assignee: Simhadri Govindappa
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 1h 40m
>  Remaining Estimate: 0h
>






[jira] [Commented] (HIVE-27168) Use basename of the datatype when fetching partition metadata using partition filters

2023-03-27 Thread Sourabh Badhya (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-27168?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17705212#comment-17705212
 ] 

Sourabh Badhya commented on HIVE-27168:
---

Thanks [~kokila19] , [~rkirtir] , [~akshatm] , [~InvisibleProgrammer] , 
[~veghlaci05] , [~dkuzmenko] for the reviews.

> Use basename of the datatype when fetching partition metadata using partition 
> filters
> -
>
> Key: HIVE-27168
> URL: https://issues.apache.org/jira/browse/HIVE-27168
> Project: Hive
>  Issue Type: Bug
>Reporter: Sourabh Badhya
>Assignee: Sourabh Badhya
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>  Time Spent: 3h
>  Remaining Estimate: 0h
>
> While fetching partition metadata using partition filters, we use the column 
> type of the table directly. However, char/varchar types can contain extra 
> information such as length of the char/varchar column and hence it skips 
> fetching partition metadata due to this extra information.
> Solution: Use the basename of the column type while deciding on whether 
> partition pruning can be done on the partitioned column.
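The fix described above can be sketched in a few lines; the helper name is illustrative and not the actual HMS method:

```java
public class BaseTypeName {

    // Strip the parameterization from a column type so "varchar(64)" and
    // "char(10)" compare as their base types during partition pruning checks.
    static String baseTypeName(String colType) {
        int paren = colType.indexOf('(');
        String base = paren < 0 ? colType : colType.substring(0, paren);
        return base.trim().toLowerCase();
    }

    public static void main(String[] args) {
        assert baseTypeName("varchar(64)").equals("varchar");
        assert baseTypeName("char(10)").equals("char");
        assert baseTypeName("string").equals("string");
        System.out.println("ok");
    }
}
```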





[jira] [Work logged] (HIVE-21741) Backport HIVE-20221 & related fix HIVE-20833 to branch-3: Increase column width for partition_params

2023-03-27 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-21741?focusedWorklogId=853062&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-853062
 ]

ASF GitHub Bot logged work on HIVE-21741:
-

Author: ASF GitHub Bot
Created on: 27/Mar/23 07:17
Start Date: 27/Mar/23 07:17
Worklog Time Spent: 10m 
  Work Description: amanraj2520 opened a new pull request, #4156:
URL: https://github.com/apache/hive/pull/4156

   …anch-3: Increase column width for partition_params (David Lavati via Alan 
Gates)"
   
   This reverts commit e3d5abda
   
   
   
   ### What changes were proposed in this pull request?
   
   
   
   ### Why are the changes needed?
   
   
   
   ### Does this PR introduce _any_ user-facing change?
   
   
   
   ### How was this patch tested?
   
   




Issue Time Tracking
---

Worklog Id: (was: 853062)
Time Spent: 40m  (was: 0.5h)

> Backport HIVE-20221 & related fix HIVE-20833 to branch-3: Increase column 
> width for partition_params
> 
>
> Key: HIVE-21741
> URL: https://issues.apache.org/jira/browse/HIVE-21741
> Project: Hive
>  Issue Type: Bug
>  Components: Metastore, Standalone Metastore
>Affects Versions: 3.1.1
>Reporter: David Lavati
>Assignee: David Lavati
>Priority: Major
>  Labels: pull-request-available
> Fix For: 3.2.0
>
> Attachments: HIVE-21741.01.branch-3.patch, 
> HIVE-21741.01.branch-3.patch, HIVE-21741.02.branch-3.patch, 
> HIVE-21741.branch-3.patch
>
>  Time Spent: 40m
>  Remaining Estimate: 0h
>
> This is an umbrella for backporting HIVE-20221 & the related fix of 
> HIVE-20833 to branch-3.





[jira] [Work logged] (HIVE-27174) Disable sysdb.q test

2023-03-27 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-27174?focusedWorklogId=853056&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-853056
 ]

ASF GitHub Bot logged work on HIVE-27174:
-

Author: ASF GitHub Bot
Created on: 27/Mar/23 06:45
Start Date: 27/Mar/23 06:45
Worklog Time Spent: 10m 
  Work Description: amanraj2520 commented on PR #4152:
URL: https://github.com/apache/hive/pull/4152#issuecomment-1484590603

   @vihangk1 I see that in #HIVE-21741 there were some changes related to 
INDEX_PARAMS, from varchar to Clob. My hunch is that this issue stems from 
that change somewhere. Any luck on your side?




Issue Time Tracking
---

Worklog Id: (was: 853056)
Time Spent: 20m  (was: 10m)

> Disable sysdb.q test
> 
>
> Key: HIVE-27174
> URL: https://issues.apache.org/jira/browse/HIVE-27174
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Aman Raj
>Assignee: Aman Raj
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> h3. What changes were proposed in this pull request?
> Disabled sysdb.q test. The test is failing because of diff in 
> BASIC_COLUMN_STATS json string.
> Client Execution succeeded but contained differences (error code = 1) after 
> executing sysdb.q
> 3803,3807c3803,3807
> < COLUMN_STATS_ACCURATE org.apache.derby.impl.jdbc.EmbedClob@125b285b
> < COLUMN_STATS_ACCURATE org.apache.derby.impl.jdbc.EmbedClob@471246f3
> < COLUMN_STATS_ACCURATE org.apache.derby.impl.jdbc.EmbedClob@57c013
> < COLUMN_STATS_ACCURATE org.apache.derby.impl.jdbc.EmbedClob@59f1d7ac
> < COLUMN_STATS_ACCURATE org.apache.derby.impl.jdbc.EmbedClob@71a0
> ---
> {quote}COLUMN_STATS_ACCURATE 
> \{"BASIC_STATS":"true","COLUMN_STATS":{"c_boolean":"true","c_float":"true","c_int":"true","key":"true","value":"true"}}
> COLUMN_STATS_ACCURATE 
> \{"BASIC_STATS":"true","COLUMN_STATS":{"c_boolean":"true","c_float":"true","c_int":"true","key":"true","value":"true"}}
> COLUMN_STATS_ACCURATE 
> \{"BASIC_STATS":"true","COLUMN_STATS":{"key":"true","value":"true"}}
> COLUMN_STATS_ACCURATE 
> \{"BASIC_STATS":"true","COLUMN_STATS":{"key":"true","value":"true"}}
> COLUMN_STATS_ACCURATE {"BASIC_STATS":"true","COLUMN_STATS":
> {quote}
> h3. Why are the changes needed?
> There is no issue in the test. The current code prints the COL_STATS as an 
> Object instead of a json string. Not sure why this is the case. Tried a lot of 
> ways, but it seems this is not fixable at the moment, so disabling it for 
> now. Note that this test was disabled in the Hive 3.1.3 release, so there 
> should not be any issue in disabling it here.
>  
>  
> Created a followup ticket to fix this test that can be taken up later - 
> [HIVE-27057] Test fix for sysdb.q - ASF JIRA (apache.org)





[jira] [Work logged] (HIVE-27150) Drop single partition can also support direct sql

2023-03-27 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-27150?focusedWorklogId=853049&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-853049
 ]

ASF GitHub Bot logged work on HIVE-27150:
-

Author: ASF GitHub Bot
Created on: 27/Mar/23 06:08
Start Date: 27/Mar/23 06:08
Worklog Time Spent: 10m 
  Work Description: saihemanth-cloudera commented on code in PR #4123:
URL: https://github.com/apache/hive/pull/4123#discussion_r1148824344


##
standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/ObjectStore.java:
##
@@ -3101,6 +3100,22 @@ public boolean dropPartition(String catName, String 
dbName, String tableName,
 return success;
   }
 
+  @Override
+  public boolean dropPartition(String catName, String dbName, String 
tableName, String partName)
+  throws MetaException, NoSuchObjectException, InvalidObjectException, 
InvalidInputException {
+boolean success = false;
+try {
+  openTransaction();
+  dropPartitionsInternal(catName, dbName, tableName, 
Arrays.asList(partName), true, true);

Review Comment:
   I don't think this would improve the performance by any means. Consider 
dropping 10k partitions: each partition drop would have to touch the same set 
of tables in the underlying db to update the records, so it makes sense to 
batch them and implement that with direct SQL. But for a single partition, 
which touches that same set of tables either way, I don't think it makes sense 
to implement this feature.
   For example, [HIVE-26035](https://issues.apache.org/jira/browse/HIVE-26035) 
(see the details in the jira) proved that implementing direct SQL actually 
improved the performance by running against benchmark tests.
   Similarly, can you provide any evidence that this patch also has an edge by 
running those tests? (You might have to add some tests, e.g. dropping 10K+ 
single partitions.)
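The batching argument made here can be illustrated with a small sketch: direct SQL collapses N per-partition round trips into one parameterized DELETE. The table and column names (PARTITIONS, TBL_ID, PART_NAME) mirror the HMS schema but are used purely for illustration, and the helper only builds the statement:

```java
import java.util.Collections;

public class DirectSqlDropSketch {

    // Build one parameterized DELETE covering all partitions at once,
    // instead of issuing one statement (and one round trip) per partition.
    static String buildDropPartitionsSql(int partitionCount) {
        String placeholders = String.join(",", Collections.nCopies(partitionCount, "?"));
        return "DELETE FROM PARTITIONS WHERE TBL_ID = ? AND PART_NAME IN (" + placeholders + ")";
    }

    public static void main(String[] args) {
        String sql = buildDropPartitionsSql(3);
        assert sql.equals("DELETE FROM PARTITIONS WHERE TBL_ID = ? AND PART_NAME IN (?,?,?)");
        System.out.println(sql);
    }
}
```

For a single partition the IN list degenerates to one placeholder, which is why the reviewer doubts the batching win applies there.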





Issue Time Tracking
---

Worklog Id: (was: 853049)
Time Spent: 1h 20m  (was: 1h 10m)

> Drop single partition can also support direct sql
> -
>
> Key: HIVE-27150
> URL: https://issues.apache.org/jira/browse/HIVE-27150
> Project: Hive
>  Issue Type: Improvement
>  Components: Hive
>Reporter: Wechar
>Assignee: Wechar
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 1h 20m
>  Remaining Estimate: 0h
>
> *Background:*
> [HIVE-6980|https://issues.apache.org/jira/browse/HIVE-6980] supports direct 
> sql for drop_partitions, we can reuse this huge improvement in drop_partition.


