[jira] [Work logged] (HIVE-27150) Drop single partition can also support direct sql
[ https://issues.apache.org/jira/browse/HIVE-27150?focusedWorklogId=853290&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-853290 ]

ASF GitHub Bot logged work on HIVE-27150:
-----------------------------------------

    Author: ASF GitHub Bot
    Created on: 28/Mar/23 05:37
    Start Date: 28/Mar/23 05:37
    Worklog Time Spent: 10m

Work Description: saihemanth-cloudera commented on code in PR #4123:
URL: https://github.com/apache/hive/pull/4123#discussion_r1150050461

standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/ObjectStore.java:

@@ -3101,6 +3100,22 @@ public boolean dropPartition(String catName, String dbName, String tableName,
     return success;
   }

+  @Override
+  public boolean dropPartition(String catName, String dbName, String tableName, String partName)
+      throws MetaException, NoSuchObjectException, InvalidObjectException, InvalidInputException {
+    boolean success = false;
+    try {
+      openTransaction();
+      dropPartitionsInternal(catName, dbName, tableName, Arrays.asList(partName), true, true);

Review Comment: cc @VenuReddy2103

Issue Time Tracking
-------------------

    Worklog Id: (was: 853290)
    Time Spent: 1.5h  (was: 1h 20m)

> Drop single partition can also support direct sql
> -------------------------------------------------
>
>                 Key: HIVE-27150
>                 URL: https://issues.apache.org/jira/browse/HIVE-27150
>             Project: Hive
>          Issue Type: Improvement
>          Components: Hive
>            Reporter: Wechar
>            Assignee: Wechar
>            Priority: Major
>              Labels: pull-request-available
>          Time Spent: 1.5h
>  Remaining Estimate: 0h
>
> *Background:*
> [HIVE-6980|https://issues.apache.org/jira/browse/HIVE-6980] added direct-SQL
> support for drop_partitions; we can reuse that improvement in drop_partition.

--
This message was sent by Atlassian Jira
(v8.20.10#820010)
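For context on what "direct sql" means here: the HIVE-6980-style path bypasses the DataNucleus ORM and issues statements straight against the HMS backing database. The sketch below is illustrative only — the table and column names follow the published HMS schema (PARTITIONS, PARTITION_PARAMS, PARTITION_KEY_VALS, PART_COL_STATS), but the database/table/partition values and the exact statement set are assumptions, not the SQL the patch actually generates:

```sql
-- Illustrative sketch, not the patch's actual SQL: resolve the partition's
-- PART_ID, then delete dependent rows before the partition row itself
-- (child tables first, to satisfy foreign-key constraints).
SELECT P."PART_ID"
  FROM "PARTITIONS" P
  JOIN "TBLS" T ON P."TBL_ID" = T."TBL_ID"
  JOIN "DBS"  D ON T."DB_ID"  = D."DB_ID"
 WHERE D."NAME" = 'default'               -- assumed database name
   AND T."TBL_NAME" = 'web_logs'          -- assumed table name
   AND P."PART_NAME" = 'ds=2023-03-28';   -- assumed partition name

DELETE FROM "PARTITION_PARAMS"   WHERE "PART_ID" = :partId;
DELETE FROM "PARTITION_KEY_VALS" WHERE "PART_ID" = :partId;
DELETE FROM "PART_COL_STATS"     WHERE "PART_ID" = :partId;
DELETE FROM "PARTITIONS"         WHERE "PART_ID" = :partId;
```

The win over the ORM path is that a single round trip per table replaces per-object fetch/delete cycles, which is why reusing the drop_partitions machinery for a single partition is attractive.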
[jira] [Work logged] (HIVE-27180) Remove JsonSerde from hcatalog, Upgrade should update changed FQN for JsonSerDe in HMS DB
[ https://issues.apache.org/jira/browse/HIVE-27180?focusedWorklogId=853288&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-853288 ]

ASF GitHub Bot logged work on HIVE-27180:
-----------------------------------------

    Author: ASF GitHub Bot
    Created on: 28/Mar/23 04:47
    Start Date: 28/Mar/23 04:47
    Worklog Time Spent: 10m

Work Description: rtrivedi12 commented on code in PR #4159:
URL: https://github.com/apache/hive/pull/4159#discussion_r1150021419

standalone-metastore/metastore-server/src/main/sql/oracle/upgrade-3.2.0-to-4.0.0-alpha-1.oracle.sql:

@@ -149,6 +149,9 @@ CREATE INDEX CTLG_NAME_DBS ON DBS(CTLG_NAME);

Issue Time Tracking
-------------------

    Worklog Id: (was: 853288)
    Time Spent: 1h  (was: 50m)

> Remove JsonSerde from hcatalog, Upgrade should update changed FQN for
> JsonSerDe in HMS DB
> ---------------------------------------------------------------------
>
>                 Key: HIVE-27180
>                 URL: https://issues.apache.org/jira/browse/HIVE-27180
>             Project: Hive
>          Issue Type: Sub-task
>          Components: Hive
>            Reporter: Riju Trivedi
>            Assignee: Riju Trivedi
>            Priority: Major
>              Labels: pull-request-available
>          Time Spent: 1h
>  Remaining Estimate: 0h
>
> As the HCatalog JsonSerDe uses the serde2 implementation as its backend,
> remove *org.apache.hive.hcatalog.data.JsonSerDe* from hive-hcatalog and fix
> tests to use the new SerDe class org.apache.hadoop.hive.serde2.JsonSerDe.
> The Hive upgrade schema script can update the SERDES table to the new class
> name, so existing tables keep working automatically.
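The HMS-side portion of such an upgrade boils down to one UPDATE per supported backing database. A minimal sketch of what that statement could look like — identifier quoting differs between the oracle/mssql/derby scripts, so treat this as illustrative rather than a copy of the actual script line:

```sql
-- Sketch: repoint SerDe references from the removed HCatalog class to the
-- serde2 implementation, so tables created before the upgrade resolve the
-- new class automatically.
UPDATE "SERDES"
   SET "SLIB" = 'org.apache.hadoop.hive.serde2.JsonSerDe'
 WHERE "SLIB" = 'org.apache.hive.hcatalog.data.JsonSerDe';
```

Because the HMS stores SerDes by fully qualified class name in the SLIB column, a metadata-only rewrite like this is enough; no table data changes.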
[jira] [Work logged] (HIVE-27180) Remove JsonSerde from hcatalog, Upgrade should update changed FQN for JsonSerDe in HMS DB
[ https://issues.apache.org/jira/browse/HIVE-27180?focusedWorklogId=853287&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-853287 ]

ASF GitHub Bot logged work on HIVE-27180:
-----------------------------------------

    Author: ASF GitHub Bot
    Created on: 28/Mar/23 04:47
    Start Date: 28/Mar/23 04:47
    Worklog Time Spent: 10m

Work Description: rtrivedi12 commented on code in PR #4159:
URL: https://github.com/apache/hive/pull/4159#discussion_r1150021322

standalone-metastore/metastore-server/src/main/sql/mssql/upgrade-3.2.0-to-4.0.0-alpha-1.mssql.sql:

@@ -176,6 +176,9 @@ ALTER TABLE COMPACTION_QUEUE ADD CQ_COMMIT_TIME bigint NULL;

Issue Time Tracking
-------------------

    Worklog Id: (was: 853287)
    Time Spent: 50m  (was: 40m)
[jira] [Work logged] (HIVE-27180) Remove JsonSerde from hcatalog, Upgrade should update changed FQN for JsonSerDe in HMS DB
[ https://issues.apache.org/jira/browse/HIVE-27180?focusedWorklogId=853286&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-853286 ]

ASF GitHub Bot logged work on HIVE-27180:
-----------------------------------------

    Author: ASF GitHub Bot
    Created on: 28/Mar/23 04:46
    Start Date: 28/Mar/23 04:46
    Worklog Time Spent: 10m

Work Description: rtrivedi12 commented on code in PR #4159:
URL: https://github.com/apache/hive/pull/4159#discussion_r1150021190

standalone-metastore/metastore-server/src/main/sql/derby/upgrade-3.2.0-to-4.0.0-alpha-1.derby.sql:

@@ -152,6 +152,9 @@ ALTER TABLE COMPACTION_QUEUE ADD CQ_COMMIT_TIME bigint;

Issue Time Tracking
-------------------

    Worklog Id: (was: 853286)
    Time Spent: 40m  (was: 0.5h)
[jira] [Work logged] (HIVE-27179) HS2 WebUI throws NPE when JspFactory loaded from jetty-runner
[ https://issues.apache.org/jira/browse/HIVE-27179?focusedWorklogId=853285&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-853285 ]

ASF GitHub Bot logged work on HIVE-27179:
-----------------------------------------

    Author: ASF GitHub Bot
    Created on: 28/Mar/23 04:13
    Start Date: 28/Mar/23 04:13
    Worklog Time Spent: 10m

Work Description: sonarcloud[bot] commented on PR #4164:
URL: https://github.com/apache/hive/pull/4164#issuecomment-1486190580

Kudos, SonarCloud Quality Gate passed!
0 Bugs, 0 Vulnerabilities, 0 Security Hotspots, 0 Code Smells.
No Coverage information. No Duplication information.

Issue Time Tracking
-------------------

    Worklog Id: (was: 853285)
    Time Spent: 20m  (was: 10m)

> HS2 WebUI throws NPE when JspFactory loaded from jetty-runner
> -------------------------------------------------------------
>
>                 Key: HIVE-27179
>                 URL: https://issues.apache.org/jira/browse/HIVE-27179
>             Project: Hive
>          Issue Type: Bug
>          Components: HiveServer2
>            Reporter: Zhihua Deng
>            Priority: Major
>              Labels: pull-request-available
>          Time Spent: 20m
>  Remaining Estimate: 0h
>
> In HIVE-17088 we resolved an NPE thrown from the HS2 WebUI by introducing
> javax.servlet.jsp-api. It works as expected when the javax.servlet.jsp-api
> jar precedes the jetty-runner jar, but in some environments the order is
> reversed and an NPE is still thrown when opening the HS2 web UI:
> {noformat}
> java.lang.NullPointerException
>   at org.apache.hive.generated.hiveserver2.hiveserver2_jsp._jspService(hiveserver2_jsp.java:286)
>   at org.apache.jasper.runtime.HttpJspBase.service(HttpJspBase.java:71)
>   at javax.servlet.http.HttpServlet.service(HttpServlet.java:790)
>   at org.eclipse.jetty.servlet.ServletHolder$NotAsync.service(ServletHolder.java:1443)
>   at org.eclipse.jetty.servlet.ServletHolder.handle(ServletHolder.java:791)
>   at org.eclipse.jetty.servlet.ServletHandler$ChainEnd.doFilter(ServletHandler.java:1626)
> ...{noformat}
> The jetty-runner JspFactory.getDefaultFactory() just returns null.
[jira] [Updated] (HIVE-27179) HS2 WebUI throws NPE when JspFactory loaded from jetty-runner
[ https://issues.apache.org/jira/browse/HIVE-27179?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

ASF GitHub Bot updated HIVE-27179:
----------------------------------
    Labels: pull-request-available  (was: )
[jira] [Work logged] (HIVE-27179) HS2 WebUI throws NPE when JspFactory loaded from jetty-runner
[ https://issues.apache.org/jira/browse/HIVE-27179?focusedWorklogId=853283&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-853283 ]

ASF GitHub Bot logged work on HIVE-27179:
-----------------------------------------

    Author: ASF GitHub Bot
    Created on: 28/Mar/23 03:21
    Start Date: 28/Mar/23 03:21
    Worklog Time Spent: 10m

Work Description: dengzhhu653 opened a new pull request, #4164:
URL: https://github.com/apache/hive/pull/4164

### What changes were proposed in this pull request?

### Why are the changes needed?
When the jetty-runner jar wins over the javax.servlet.jsp-api jar for loading
JspFactory, an NPE is thrown when opening the HS2 home page.

### Does this PR introduce _any_ user-facing change?
No

### How was this patch tested?
Applied and compiled the changes, placed the jetty-runner jar ahead on the
CLASSPATH, and restarted the affected HS2; the NPE is gone.

Issue Time Tracking
-------------------

    Worklog Id: (was: 853283)
    Remaining Estimate: 0h
    Time Spent: 10m
[jira] [Work logged] (HIVE-27180) Remove JsonSerde from hcatalog, Upgrade should update changed FQN for JsonSerDe in HMS DB
[ https://issues.apache.org/jira/browse/HIVE-27180?focusedWorklogId=853281&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-853281 ]

ASF GitHub Bot logged work on HIVE-27180:
-----------------------------------------

    Author: ASF GitHub Bot
    Created on: 28/Mar/23 03:10
    Start Date: 28/Mar/23 03:10
    Worklog Time Spent: 10m

Work Description: nrg4878 commented on code in PR #4159:
URL: https://github.com/apache/hive/pull/4159#discussion_r1149977911

standalone-metastore/metastore-server/src/main/sql/mssql/upgrade-3.2.0-to-4.0.0-alpha-1.mssql.sql:

@@ -176,6 +176,9 @@ ALTER TABLE COMPACTION_QUEUE ADD CQ_COMMIT_TIME bigint NULL;

Issue Time Tracking
-------------------

    Worklog Id: (was: 853281)
    Time Spent: 0.5h  (was: 20m)
[jira] [Updated] (HIVE-27183) Iceberg: Table information is loaded multiple times
[ https://issues.apache.org/jira/browse/HIVE-27183?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Rajesh Balamohan updated HIVE-27183:
------------------------------------
    Description:

HMS::getTable invokes "HiveIcebergMetaHook::postGetTable", which internally
loads the Iceberg table again. If this isn't needed, or is needed only for
show-create-table, do not load the table again.

Note: it looks like loadTable is invoked around 6 times during the entire
planning (semantic analysis, stats, etc.). Attached the snapshot for reference.

{noformat}
at jdk.internal.misc.Unsafe.park(java.base@11.0.18/Native Method)
- parking to wait for <0x00066f84eef0> (a java.util.concurrent.CompletableFuture$Signaller)
at java.util.concurrent.locks.LockSupport.park(java.base@11.0.18/LockSupport.java:194)
at java.util.concurrent.CompletableFuture$Signaller.block(java.base@11.0.18/CompletableFuture.java:1796)
at java.util.concurrent.ForkJoinPool.managedBlock(java.base@11.0.18/ForkJoinPool.java:3128)
at java.util.concurrent.CompletableFuture.waitingGet(java.base@11.0.18/CompletableFuture.java:1823)
at java.util.concurrent.CompletableFuture.get(java.base@11.0.18/CompletableFuture.java:1998)
at org.apache.hadoop.util.functional.FutureIO.awaitFuture(FutureIO.java:77)
at org.apache.iceberg.hadoop.HadoopInputFile.newStream(HadoopInputFile.java:196)
at org.apache.iceberg.TableMetadataParser.read(TableMetadataParser.java:263)
at org.apache.iceberg.TableMetadataParser.read(TableMetadataParser.java:258)
at org.apache.iceberg.BaseMetastoreTableOperations.lambda$refreshFromMetadataLocation$0(BaseMetastoreTableOperations.java:177)
at org.apache.iceberg.BaseMetastoreTableOperations$$Lambda$609/0x000840e18040.apply(Unknown Source)
at org.apache.iceberg.BaseMetastoreTableOperations.lambda$refreshFromMetadataLocation$1(BaseMetastoreTableOperations.java:191)
at org.apache.iceberg.BaseMetastoreTableOperations$$Lambda$610/0x000840e18440.run(Unknown Source)
at org.apache.iceberg.util.Tasks$Builder.runTaskWithRetry(Tasks.java:404)
at org.apache.iceberg.util.Tasks$Builder.runSingleThreaded(Tasks.java:214)
at org.apache.iceberg.util.Tasks$Builder.run(Tasks.java:198)
at org.apache.iceberg.util.Tasks$Builder.run(Tasks.java:190)
at org.apache.iceberg.BaseMetastoreTableOperations.refreshFromMetadataLocation(BaseMetastoreTableOperations.java:191)
at org.apache.iceberg.BaseMetastoreTableOperations.refreshFromMetadataLocation(BaseMetastoreTableOperations.java:176)
at org.apache.iceberg.BaseMetastoreTableOperations.refreshFromMetadataLocation(BaseMetastoreTableOperations.java:171)
at org.apache.iceberg.hive.HiveTableOperations.doRefresh(HiveTableOperations.java:153)
at org.apache.iceberg.BaseMetastoreTableOperations.refresh(BaseMetastoreTableOperations.java:96)
at org.apache.iceberg.BaseMetastoreTableOperations.current(BaseMetastoreTableOperations.java:79)
at org.apache.iceberg.BaseMetastoreCatalog.loadTable(BaseMetastoreCatalog.java:44)
at org.apache.iceberg.mr.Catalogs.loadTable(Catalogs.java:115)
at org.apache.iceberg.mr.Catalogs.loadTable(Catalogs.java:105)
at org.apache.iceberg.mr.hive.IcebergTableUtil.lambda$getTable$1(IcebergTableUtil.java:99)
at org.apache.iceberg.mr.hive.IcebergTableUtil$$Lambda$552/0x000840d59840.apply(Unknown Source)
at org.apache.iceberg.mr.hive.IcebergTableUtil.lambda$getTable$4(IcebergTableUtil.java:111)
at org.apache.iceberg.mr.hive.IcebergTableUtil$$Lambda$557/0x000840d58c40.get(Unknown Source)
at java.util.Optional.orElseGet(java.base@11.0.18/Optional.java:369)
at org.apache.iceberg.mr.hive.IcebergTableUtil.getTable(IcebergTableUtil.java:108)
at org.apache.iceberg.mr.hive.IcebergTableUtil.getTable(IcebergTableUtil.java:69)
at org.apache.iceberg.mr.hive.IcebergTableUtil.getTable(IcebergTableUtil.java:73)
at org.apache.iceberg.mr.hive.HiveIcebergMetaHook.postGetTable(HiveIcebergMetaHook.java:931)
at org.apache.hadoop.hive.metastore.HiveMetaStoreClient.executePostGetTableHook(HiveMetaStoreClient.java:2638)
at org.apache.hadoop.hive.metastore.HiveMetaStoreClient.getTable(HiveMetaStoreClient.java:2624)
at org.apache.hadoop.hive.ql.metadata.SessionHiveMetaStoreClient.getTable(SessionHiveMetaStoreClient.java:267)
at jdk.internal.reflect.GeneratedMethodAccessor137.invoke(Unknown Source)
at jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(java.base@11.0.18/DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(java.base@11.0.18/Method.java:566)
at org.apache.hadoop.hive.metastore.RetryingMetaStoreClient.invoke(RetryingMetaStoreClient.java:216)
at com.sun.proxy.$Proxy56.getTable(Unknown Source)
at jdk.internal.reflect.GeneratedMethodAccessor137.invoke(Unknown Source)
at
[jira] [Updated] (HIVE-27183) Iceberg: Table information is loaded multiple times
[ https://issues.apache.org/jira/browse/HIVE-27183?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Rajesh Balamohan updated HIVE-27183:
------------------------------------
    Attachment: Screenshot 2023-03-28 at 8.13.52 AM.png

> Iceberg: Table information is loaded multiple times
> ---------------------------------------------------
>
>                 Key: HIVE-27183
>                 URL: https://issues.apache.org/jira/browse/HIVE-27183
>             Project: Hive
>          Issue Type: Improvement
>            Reporter: Rajesh Balamohan
>            Priority: Major
>              Labels: performance
>         Attachments: Screenshot 2023-03-28 at 8.13.52 AM.png,
>                      hs2_iceberg_load.html
[jira] [Updated] (HIVE-27183) Iceberg: Table information is loaded multiple times
[ https://issues.apache.org/jira/browse/HIVE-27183?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Rajesh Balamohan updated HIVE-27183:
------------------------------------
    Attachment: hs2_iceberg_load.html

> Iceberg: Table information is loaded multiple times
> ---------------------------------------------------
>
>                 Key: HIVE-27183
>                 URL: https://issues.apache.org/jira/browse/HIVE-27183
>             Project: Hive
>          Issue Type: Improvement
>            Reporter: Rajesh Balamohan
>            Priority: Major
>              Labels: performance
>         Attachments: hs2_iceberg_load.html
[jira] [Work logged] (HIVE-26956) Improve find_in_set function
[ https://issues.apache.org/jira/browse/HIVE-26956?focusedWorklogId=853276&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-853276 ]

ASF GitHub Bot logged work on HIVE-26956:
-----------------------------------------

    Author: ASF GitHub Bot
    Created on: 28/Mar/23 00:20
    Start Date: 28/Mar/23 00:20
    Worklog Time Spent: 10m

Work Description: github-actions[bot] closed pull request #3961: HIVE-26956: Improve find_in_set function
URL: https://github.com/apache/hive/pull/3961

Issue Time Tracking
-------------------

    Worklog Id: (was: 853276)
    Time Spent: 1h 10m  (was: 1h)

> Improve find_in_set function
> ----------------------------
>
>                 Key: HIVE-26956
>                 URL: https://issues.apache.org/jira/browse/HIVE-26956
>             Project: Hive
>          Issue Type: Improvement
>            Reporter: Bingye Chen
>            Assignee: Bingye Chen
>            Priority: Major
>              Labels: pull-request-available
>          Time Spent: 1h 10m
>  Remaining Estimate: 0h
>
> Improve the find_in_set function.
[jira] [Work logged] (HIVE-26997) Iceberg: Vectorization gets disabled at runtime in merge-into statements
[ https://issues.apache.org/jira/browse/HIVE-26997?focusedWorklogId=853274&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-853274 ] ASF GitHub Bot logged work on HIVE-26997: - Author: ASF GitHub Bot Created on: 28/Mar/23 00:14 Start Date: 28/Mar/23 00:14 Worklog Time Spent: 10m Work Description: sonarcloud[bot] commented on PR #4162: URL: https://github.com/apache/hive/pull/4162#issuecomment-1486029917 Kudos, SonarCloud Quality Gate passed! 0 Bugs, 0 Vulnerabilities, 0 Security Hotspots, 0 Code Smells. No Coverage information. No Duplication information. Dashboard: https://sonarcloud.io/dashboard?id=apache_hive&pullRequest=4162 Issue Time Tracking --- Worklog Id: (was: 853274) Time Spent: 1h (was: 50m)
> Iceberg: Vectorization gets disabled at runtime in merge-into statements
> ------------------------------------------------------------------------
>
> Key: HIVE-26997
> URL: https://issues.apache.org/jira/browse/HIVE-26997
> Project: Hive
> Issue Type: Improvement
> Components: Iceberg integration
> Reporter: Rajesh Balamohan
> Assignee: Zsolt Miskolczi
> Priority: Major
> Labels: pull-request-available
> Attachments: explain_merge_into.txt
>
> Time Spent: 1h
> Remaining Estimate: 0h
>
> *Query:*
> Think of the "ssv" table as a table containing trickle feed data in the following query. "store_sales_delete_1" is the destination table.
>
> {noformat}
> MERGE INTO tpcds_1000_iceberg_mor_v4.store_sales_delete_1 t USING
>   tpcds_1000_update.ssv s ON (t.ss_item_sk = s.ss_item_sk
>     AND t.ss_customer_sk=s.ss_customer_sk
>     AND t.ss_sold_date_sk = "2451181"
>     AND ((Floor((s.ss_item_sk) / 1000) * 1000) BETWEEN 1000 AND 2000)
>     AND s.ss_ext_discount_amt < 0.0) WHEN
[jira] [Work logged] (HIVE-26905) Backport HIVE-25173 to 3.2.0: Exclude pentaho-aggdesigner-algorithm from upgrade-acid build.
[ https://issues.apache.org/jira/browse/HIVE-26905?focusedWorklogId=853273&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-853273 ] ASF GitHub Bot logged work on HIVE-26905: - Author: ASF GitHub Bot Created on: 27/Mar/23 23:56 Start Date: 27/Mar/23 23:56 Worklog Time Spent: 10m Work Description: cnauroth commented on PR #4163: URL: https://github.com/apache/hive/pull/4163#issuecomment-1486012078 Hello @zabetak. You previously approved this change in #3911: https://github.com/apache/hive/pull/3911#pullrequestreview-1237668110 However, I just realized it was not actually merged. Could you please take another look? Thank you. Issue Time Tracking --- Worklog Id: (was: 853273) Time Spent: 1h (was: 50m) > Backport HIVE-25173 to 3.2.0: Exclude pentaho-aggdesigner-algorithm from > upgrade-acid build. > > > Key: HIVE-26905 > URL: https://issues.apache.org/jira/browse/HIVE-26905 > Project: Hive > Issue Type: Bug > Components: Build Infrastructure >Reporter: Chris Nauroth >Assignee: Chris Nauroth >Priority: Major > Labels: hive-3.2.0-must, pull-request-available > Time Spent: 1h > Remaining Estimate: 0h > > In the current branch-3, upgrade-acid has a dependency on an old hive-exec > version that has a transitive dependency on > org.pentaho:pentaho-aggdesigner-algorithm. This artifact is no longer > available in commonly supported Maven repositories, which causes a build > failure. We can safely exclude the dependency, as was originally done in > HIVE-25173. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Work logged] (HIVE-26905) Backport HIVE-25173 to 3.2.0: Exclude pentaho-aggdesigner-algorithm from upgrade-acid build.
[ https://issues.apache.org/jira/browse/HIVE-26905?focusedWorklogId=853272&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-853272 ] ASF GitHub Bot logged work on HIVE-26905: - Author: ASF GitHub Bot Created on: 27/Mar/23 23:54 Start Date: 27/Mar/23 23:54 Worklog Time Spent: 10m Work Description: cnauroth opened a new pull request, #4163: URL: https://github.com/apache/hive/pull/4163 ### What changes were proposed in this pull request? Exclude pentaho-aggdesigner-algorithm from upgrade-acid build. ### Why are the changes needed? In the current branch-3, upgrade-acid has a dependency on an old hive-exec version that has a transitive dependency on org.pentaho:pentaho-aggdesigner-algorithm. This artifact is no longer available in commonly supported Maven repositories, which causes a build failure. We can safely exclude the dependency, as was originally done in [HIVE-25173](https://issues.apache.org/jira/browse/HIVE-25173). Differences from the master branch patch are: 1. On master, this applied to the pre-upgrade sub-module. This sub-module doesn't exist in branch-3, so the patch was rebased to the parent upgrade-acid module. 2. Additionally, the pom.xml code had changed quite a bit on master. This is just applying the equivalent exclusion from the HIVE-25173 diff: a1d4c8a6b3cf8465ac1ae074748a8f5a04bb473f. ### Does this PR introduce _any_ user-facing change? No. ### How was this patch tested? I can run a full local build from branch-3 after applying this patch. ``` mvn -B -T 8 clean install -Pitests -DskipTests ``` Prior to this patch, my build failed while trying to download the org.pentaho:pentaho-aggdesigner-algorithm artifact. Issue Time Tracking --- Worklog Id: (was: 853272) Time Spent: 50m (was: 40m) > Backport HIVE-25173 to 3.2.0: Exclude pentaho-aggdesigner-algorithm from > upgrade-acid build. 
> > > Key: HIVE-26905 > URL: https://issues.apache.org/jira/browse/HIVE-26905 > Project: Hive > Issue Type: Bug > Components: Build Infrastructure >Reporter: Chris Nauroth >Assignee: Chris Nauroth >Priority: Major > Labels: hive-3.2.0-must, pull-request-available > Time Spent: 50m > Remaining Estimate: 0h > > In the current branch-3, upgrade-acid has a dependency on an old hive-exec > version that has a transitive dependency to > org.pentaho:pentaho-aggdesigner-algorithm. This artifact is no longer > available in commonly supported Maven repositories, which causes a build > failure. We can safely exclude the dependency, as was originally done in > HIVE-25173. -- This message was sent by Atlassian Jira (v8.20.10#820010)
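For reference, the exclusion that HIVE-25173 introduced, and that this backport carries into branch-3's upgrade-acid module, takes roughly the following shape in a pom.xml. This is a sketch only: the surrounding dependency coordinates and the version property are illustrative and not copied from the actual diff.

```xml
<dependency>
  <groupId>org.apache.hive</groupId>
  <artifactId>hive-exec</artifactId>
  <version>${hive-exec.version}</version> <!-- hypothetical property name -->
  <exclusions>
    <exclusion>
      <!-- Transitive dependency no longer resolvable from commonly
           supported Maven repositories; excluding it avoids the build
           failure described above. -->
      <groupId>org.pentaho</groupId>
      <artifactId>pentaho-aggdesigner-algorithm</artifactId>
    </exclusion>
  </exclusions>
</dependency>
```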
[jira] [Work logged] (HIVE-22383) `alterPartitions` is invoked twice during dynamic partition load causing runtime delay
[ https://issues.apache.org/jira/browse/HIVE-22383?focusedWorklogId=853232&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-853232 ] ASF GitHub Bot logged work on HIVE-22383: - Author: ASF GitHub Bot Created on: 27/Mar/23 19:07 Start Date: 27/Mar/23 19:07 Worklog Time Spent: 10m Work Description: sonarcloud[bot] commented on PR #4161: URL: https://github.com/apache/hive/pull/4161#issuecomment-1485719817 Kudos, SonarCloud Quality Gate passed! 0 Bugs, 0 Vulnerabilities, 0 Security Hotspots, 3 Code Smells. No Coverage information. No Duplication information. Dashboard: https://sonarcloud.io/dashboard?id=apache_hive&pullRequest=4161 Issue Time Tracking --- Worklog Id: (was: 853232) Time Spent: 20m (was: 10m)
> `alterPartitions` is invoked twice during dynamic partition load causing runtime delay
> --------------------------------------------------------------------------------------
>
> Key: HIVE-22383
> URL: https://issues.apache.org/jira/browse/HIVE-22383
> Project: Hive
> Issue Type: Bug
> Reporter: Rajesh Balamohan
> Assignee: Dmitriy Fingerman
> Priority: Major
> Labels: performance, pull-request-available
> Time Spent: 20m
> Remaining Estimate: 0h
>
> First invocation in {{Hive::loadDynamicPartitions}}
> https://github.com/apache/hive/blob/master/ql/src/java/org/apache/hadoop/hive/ql/metadata/Hive.java#L2978
> https://github.com/apache/hive/blob/master/ql/src/java/org/apache/hadoop/hive/ql/metadata/Hive.java#L2638
> Second invocation in
{{BasicStatsTask::aggregateStats}} > https://github.com/apache/hive/blob/master/ql/src/java/org/apache/hadoop/hive/ql/stats/BasicStatsTask.java#L335 > This leads to a good amount of delay in dynamic partition loading. > -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Work logged] (HIVE-26997) Iceberg: Vectorization gets disabled at runtime in merge-into statements
[ https://issues.apache.org/jira/browse/HIVE-26997?focusedWorklogId=853229&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-853229 ] ASF GitHub Bot logged work on HIVE-26997: - Author: ASF GitHub Bot Created on: 27/Mar/23 18:35 Start Date: 27/Mar/23 18:35 Worklog Time Spent: 10m Work Description: deniskuzZ opened a new pull request, #4162: URL: https://github.com/apache/hive/pull/4162 ### What changes were proposed in this pull request? Fixed non-vectorization cause ### Why are the changes needed? Performance ### Does this PR introduce _any_ user-facing change? ### How was this patch tested? Issue Time Tracking --- Worklog Id: (was: 853229) Time Spent: 50m (was: 40m) > Iceberg: Vectorization gets disabled at runtime in merge-into statements > > > Key: HIVE-26997 > URL: https://issues.apache.org/jira/browse/HIVE-26997 > Project: Hive > Issue Type: Improvement > Components: Iceberg integration >Reporter: Rajesh Balamohan >Assignee: Zsolt Miskolczi >Priority: Major > Labels: pull-request-available > Attachments: explain_merge_into.txt > > Time Spent: 50m > Remaining Estimate: 0h > > *Query:* > Think of "ssv" table as a table containing trickle feed data in the following > query. "store_sales_delete_1" is the destination table. 
>
> {noformat}
> MERGE INTO tpcds_1000_iceberg_mor_v4.store_sales_delete_1 t USING
>   tpcds_1000_update.ssv s ON (t.ss_item_sk = s.ss_item_sk
>     AND t.ss_customer_sk=s.ss_customer_sk
>     AND t.ss_sold_date_sk = "2451181"
>     AND ((Floor((s.ss_item_sk) / 1000) * 1000) BETWEEN 1000 AND 2000)
>     AND s.ss_ext_discount_amt < 0.0) WHEN matched
>     AND t.ss_ext_discount_amt IS NULL THEN
> UPDATE
>   SET ss_ext_discount_amt = 0.0 WHEN NOT matched THEN
> INSERT (ss_sold_time_sk, ss_item_sk, ss_customer_sk, ss_cdemo_sk,
>         ss_hdemo_sk, ss_addr_sk, ss_store_sk, ss_promo_sk,
>         ss_ticket_number, ss_quantity, ss_wholesale_cost, ss_list_price,
>         ss_sales_price, ss_ext_discount_amt, ss_ext_sales_price,
>         ss_ext_wholesale_cost, ss_ext_list_price, ss_ext_tax,
>         ss_coupon_amt, ss_net_paid, ss_net_paid_inc_tax, ss_net_profit,
>         ss_sold_date_sk)
> VALUES (s.ss_sold_time_sk, s.ss_item_sk, s.ss_customer_sk, s.ss_cdemo_sk,
>         s.ss_hdemo_sk, s.ss_addr_sk, s.ss_store_sk, s.ss_promo_sk,
>         s.ss_ticket_number, s.ss_quantity, s.ss_wholesale_cost,
>         s.ss_list_price, s.ss_sales_price, s.ss_ext_discount_amt,
>         s.ss_ext_sales_price, s.ss_ext_wholesale_cost, s.ss_ext_list_price,
>         s.ss_ext_tax, s.ss_coupon_amt, s.ss_net_paid,
>         s.ss_net_paid_inc_tax, s.ss_net_profit, "2451181")
> {noformat}
>
> *Issue:*
> 1. Map phase is not getting vectorized due to the "PARTITION__SPEC__ID" column
> {noformat}
> Map notVectorizedReason: Select expression for SELECT operator: Virtual column PARTITION__SPEC__ID is not supported
> {noformat}
>
> 2. "Reducer 2" stage isn't vectorized.
> {noformat} > Reduce notVectorizedReason: exception: java.lang.RuntimeException: Full Outer > Small Table Key Mapping duplicate column 0 in ordered column map {0=(value > column: 30, type info: int), 1=(value column: 31, type info: int)} when > adding value column 53, type into int stack trace: > org.apache.hadoop.hive.ql.exec.vector.VectorColumnOrderedMap.add(VectorColumnOrderedMap.java:102), > > org.apache.hadoop.hive.ql.exec.vector.VectorColumnSourceMapping.add(VectorColumnSourceMapping.java:41), > > org.apache.hadoop.hive.ql.optimizer.physical.Vectorizer.canSpecializeMapJoin(Vectorizer.java:3865), > > org.apache.hadoop.hive.ql.optimizer.physical.Vectorizer.validateAndVectorizeOperator(Vectorizer.java:5246), > > org.apache.hadoop.hive.ql.optimizer.physical.Vectorizer.doProcessChild(Vectorizer.java:988), > > org.apache.hadoop.hive.ql.optimizer.physical.Vectorizer.doProcessChildren(Vectorizer.java:874), > >
[jira] [Work logged] (HIVE-22383) `alterPartitions` is invoked twice during dynamic partition load causing runtime delay
[ https://issues.apache.org/jira/browse/HIVE-22383?focusedWorklogId=853218&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-853218 ] ASF GitHub Bot logged work on HIVE-22383: - Author: ASF GitHub Bot Created on: 27/Mar/23 17:02 Start Date: 27/Mar/23 17:02 Worklog Time Spent: 10m Work Description: difin opened a new pull request, #4161: URL: https://github.com/apache/hive/pull/4161 ### What changes were proposed in this pull request? ### Why are the changes needed? ### Does this PR introduce _any_ user-facing change? ### How was this patch tested? Issue Time Tracking --- Worklog Id: (was: 853218) Remaining Estimate: 0h Time Spent: 10m > `alterPartitions` is invoked twice during dynamic partition load causing > runtime delay > -- > > Key: HIVE-22383 > URL: https://issues.apache.org/jira/browse/HIVE-22383 > Project: Hive > Issue Type: Bug >Reporter: Rajesh Balamohan >Assignee: Dmitriy Fingerman >Priority: Major > Labels: performance > Time Spent: 10m > Remaining Estimate: 0h > > First invocation in {{Hive::loadDynamicPartitions}} > https://github.com/apache/hive/blob/master/ql/src/java/org/apache/hadoop/hive/ql/metadata/Hive.java#L2978 > https://github.com/apache/hive/blob/master/ql/src/java/org/apache/hadoop/hive/ql/metadata/Hive.java#L2638 > Second invocation in {{BasicStatsTask::aggregateStats}} > https://github.com/apache/hive/blob/master/ql/src/java/org/apache/hadoop/hive/ql/stats/BasicStatsTask.java#L335 > This leads to a good amount of delay in dynamic partition loading. > -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Updated] (HIVE-22383) `alterPartitions` is invoked twice during dynamic partition load causing runtime delay
[ https://issues.apache.org/jira/browse/HIVE-22383?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated HIVE-22383: -- Labels: performance pull-request-available (was: performance) > `alterPartitions` is invoked twice during dynamic partition load causing > runtime delay > -- > > Key: HIVE-22383 > URL: https://issues.apache.org/jira/browse/HIVE-22383 > Project: Hive > Issue Type: Bug >Reporter: Rajesh Balamohan >Assignee: Dmitriy Fingerman >Priority: Major > Labels: performance, pull-request-available > Time Spent: 10m > Remaining Estimate: 0h > > First invocation in {{Hive::loadDynamicPartitions}} > https://github.com/apache/hive/blob/master/ql/src/java/org/apache/hadoop/hive/ql/metadata/Hive.java#L2978 > https://github.com/apache/hive/blob/master/ql/src/java/org/apache/hadoop/hive/ql/metadata/Hive.java#L2638 > Second invocation in {{BasicStatsTask::aggregateStats}} > https://github.com/apache/hive/blob/master/ql/src/java/org/apache/hadoop/hive/ql/stats/BasicStatsTask.java#L335 > This leads to good amount of delay in dynamic partition loading. > -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Work logged] (HIVE-27135) AcidUtils#getHdfsDirSnapshots() throws FNFE when a directory is removed in HDFS
[ https://issues.apache.org/jira/browse/HIVE-27135?focusedWorklogId=853214&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-853214 ] ASF GitHub Bot logged work on HIVE-27135: - Author: ASF GitHub Bot Created on: 27/Mar/23 16:52 Start Date: 27/Mar/23 16:52 Worklog Time Spent: 10m Work Description: sonarcloud[bot] commented on PR #4114: URL: https://github.com/apache/hive/pull/4114#issuecomment-1485490382 Kudos, SonarCloud Quality Gate passed! 0 Bugs, 0 Vulnerabilities, 0 Security Hotspots, 0 Code Smells. No Coverage information. No Duplication information. Dashboard: https://sonarcloud.io/dashboard?id=apache_hive&pullRequest=4114 Issue Time Tracking --- Worklog Id: (was: 853214) Time Spent: 5h 10m (was: 5h)
> AcidUtils#getHdfsDirSnapshots() throws FNFE when a directory is removed in HDFS
> -------------------------------------------------------------------------------
>
> Key: HIVE-27135
> URL: https://issues.apache.org/jira/browse/HIVE-27135
> Project: Hive
> Issue Type: Bug
> Reporter: Dayakar M
> Assignee: Dayakar M
> Priority: Major
> Labels: pull-request-available
> Time Spent: 5h 10m
> Remaining Estimate: 0h
>
> AcidUtils#getHdfsDirSnapshots() throws FileNotFoundException when a directory is removed in HDFS while fetching HDFS snapshots.
> The test code below can be used to reproduce this issue.
> {code:java}
> @Test
> public void testShouldNotThrowFNFEWhenHiveStagingDirectoryIsRemovedWhileFetchingHDFSSnapshots()
>     throws Exception {
>   MockFileSystem fs = new MockFileSystem(new HiveConf(),
>       new MockFile("mock:/tbl/part1/.hive-staging_dir/-ext-10002", 500, new byte[0]),
>       new MockFile("mock:/tbl/part2/.hive-staging_dir", 500, new byte[0]),
>       new MockFile("mock:/tbl/part1/_tmp_space.db", 500, new byte[0]),
>       new MockFile("mock:/tbl/part1/delta_1_1/bucket--", 500, new byte[0]));
>   Path path = new MockPath(fs, "/tbl");
>   Path stageDir = new MockPath(fs, "mock:/tbl/part1/.hive-staging_dir");
>   FileSystem mockFs = spy(fs);
>   Mockito.doThrow(new
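The defensive pattern the fix implies (skip entries that vanish between listing and stat instead of failing the whole snapshot) can be sketched outside Hive with plain java.nio. This is an illustrative sketch, not the actual AcidUtils#getHdfsDirSnapshots() code; the `DirSnapshot` class and `snapshot` method are hypothetical names:

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.NoSuchFileException;
import java.nio.file.Path;
import java.util.Map;
import java.util.TreeMap;

public class DirSnapshot {
    /**
     * Collect a (path -> size) snapshot under root, tolerating files or
     * directories removed concurrently (e.g. a .hive-staging dir being
     * cleaned up) by skipping them rather than propagating the error.
     */
    public static Map<String, Long> snapshot(Path root) throws IOException {
        Map<String, Long> result = new TreeMap<>();
        try (DirectoryStream<Path> entries = Files.newDirectoryStream(root)) {
            for (Path p : entries) {
                try {
                    if (Files.isDirectory(p)) {
                        result.putAll(snapshot(p));    // recurse into subdirs
                    } else {
                        result.put(p.toString(), Files.size(p));
                    }
                } catch (NoSuchFileException e) {
                    // Entry disappeared between listing and stat: skip it.
                }
            }
        }
        return result;
    }

    public static void main(String[] args) throws IOException {
        Path tmp = Files.createTempDirectory("snap");
        Files.writeString(tmp.resolve("a.txt"), "hello");
        System.out.println(snapshot(tmp).size()); // prints 1
    }
}
```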
[jira] [Work logged] (HIVE-27180) Remove JsonSerde from hcatalog, Upgrade should update changed FQN for JsonSerDe in HMS DB
[ https://issues.apache.org/jira/browse/HIVE-27180?focusedWorklogId=853211&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-853211 ] ASF GitHub Bot logged work on HIVE-27180: - Author: ASF GitHub Bot Created on: 27/Mar/23 16:50 Start Date: 27/Mar/23 16:50 Worklog Time Spent: 10m Work Description: sonarcloud[bot] commented on PR #4159: URL: https://github.com/apache/hive/pull/4159#issuecomment-1485485988 Kudos, SonarCloud Quality Gate passed! 0 Bugs, 0 Vulnerabilities, 0 Security Hotspots, 0 Code Smells. No Coverage information. No Duplication information. Dashboard: https://sonarcloud.io/dashboard?id=apache_hive&pullRequest=4159 Issue Time Tracking --- Worklog Id: (was: 853211) Time Spent: 20m (was: 10m)
> Remove JsonSerde from hcatalog, Upgrade should update changed FQN for JsonSerDe in HMS DB
> -----------------------------------------------------------------------------------------
>
> Key: HIVE-27180
> URL: https://issues.apache.org/jira/browse/HIVE-27180
> Project: Hive
> Issue Type: Sub-task
> Components: Hive
> Reporter: Riju Trivedi
> Assignee: Riju Trivedi
> Priority: Major
> Labels: pull-request-available
> Time Spent: 20m
> Remaining Estimate: 0h
>
> As Hcatalog JsonSerDe uses the "serde2" version as a back end, Remove
> *org.apache.hive.hcatalog.data.JsonSerDe* from hive-hcatalog. Fix tests
> to use the new SerDe class org.apache.hadoop.hive.serde2.JsonSerDe. Hive
> Upgrade schema script can update the SERDES table to alter the class name to
> the new class name, the old tables would work automatically.
--
This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Updated] (HIVE-27180) Remove JsonSerde from hcatalog, Upgrade should update changed FQN for JsonSerDe in HMS DB
[ https://issues.apache.org/jira/browse/HIVE-27180?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated HIVE-27180: -- Labels: pull-request-available (was: ) > Remove JsonSerde from hcatalog, Upgrade should update changed FQN for > JsonSerDe in HMS DB > -- > > Key: HIVE-27180 > URL: https://issues.apache.org/jira/browse/HIVE-27180 > Project: Hive > Issue Type: Sub-task > Components: Hive >Reporter: Riju Trivedi >Assignee: Riju Trivedi >Priority: Major > Labels: pull-request-available > Time Spent: 10m > Remaining Estimate: 0h > > As Hcatalog JsonSerDe uses the "serde2" version as a back end, Remove > *org.apache.hive.hcatalog.data.JsonSerDe* from hive-hcatalog. Fix tests > to use the new Serde class org.apache.hadoop.hive.serde2.JsonSerDe. Hive > Upgrade schema script can update the SERDES table to alter the class name to > the new class name, the old tables would work automatically. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Work logged] (HIVE-27180) Remove JsonSerde from hcatalog, Upgrade should update changed FQN for JsonSerDe in HMS DB
[ https://issues.apache.org/jira/browse/HIVE-27180?focusedWorklogId=853196&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-853196 ] ASF GitHub Bot logged work on HIVE-27180: - Author: ASF GitHub Bot Created on: 27/Mar/23 15:50 Start Date: 27/Mar/23 15:50 Worklog Time Spent: 10m Work Description: rtrivedi12 opened a new pull request, #4159: URL: https://github.com/apache/hive/pull/4159 …nged FQN for JsonSerDe in HMS DB HIVE-18545 makes the Hcatalog JsonSerDe use the "hive.serde2" version as a back end; there are no feature differences between these implementations. This change will fix tests to use the new JsonSerDe class and remove JsonSerDe from hive-hcatalog. The Hive upgrade schema script should automatically update the hive table schema to rename the serde package. ### What changes were proposed in this pull request? 1. Fixed tests to use new SerDe class 'org.apache.hadoop.hive.serde2.JsonSerDe' 2. Removed JsonSerDe from hive-hcatalog. 3. Schema upgrade scripts update the SLIB column value in the SERDES table from "org.apache.hive.hcatalog.data.JsonSerDe" to "org.apache.hadoop.hive.serde2.JsonSerDe" ### Why are the changes needed? Removes redundant code, making JsonSerDe a first-class SerDe in Hive. Better user experience after the upgrade (avoids CNF) ### Does this PR introduce _any_ user-facing change? 'No' ### How was this patch tested? 
mvn test -Dtest=TestMiniLlapLocalCliDriver -Dqfile=json_serde1.q,json_serde_qualified_types.q,json_serde_tsformat.q,parquet_mixed_partition_formats2.q,temp_table_parquet_mixed_partition_formats2.q Issue Time Tracking --- Worklog Id: (was: 853196) Remaining Estimate: 0h Time Spent: 10m > Remove JsonSerde from hcatalog, Upgrade should update changed FQN for > JsonSerDe in HMS DB > -- > > Key: HIVE-27180 > URL: https://issues.apache.org/jira/browse/HIVE-27180 > Project: Hive > Issue Type: Sub-task > Components: Hive >Reporter: Riju Trivedi >Assignee: Riju Trivedi >Priority: Major > Time Spent: 10m > Remaining Estimate: 0h > > As Hcatalog JsonSerDe uses the "serde2" version as a back end, Remove > *org.apache.hive.hcatalog.data.JsonSerDe* from hive-hcatalog. Fix tests > to use the new Serde class org.apache.hadoop.hive.serde2.JsonSerDe. Hive > Upgrade schema script can update the SERDES table to alter the class name to > the new class name, the old tables would work automatically. -- This message was sent by Atlassian Jira (v8.20.10#820010)
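A schema-upgrade statement of the kind item 3 of the PR description outlines would plausibly boil down to a single UPDATE of this shape. This is a sketch, not a line from the actual upgrade scripts; identifier quoting and casing differ across the metastore backing databases (e.g. Oracle vs. Postgres vs. Derby):

```sql
-- Rewrite the stored SerDe FQN so existing tables resolve the new class.
UPDATE "SERDES"
   SET "SLIB" = 'org.apache.hadoop.hive.serde2.JsonSerDe'
 WHERE "SLIB" = 'org.apache.hive.hcatalog.data.JsonSerDe';
```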
[jira] [Updated] (HIVE-27180) Remove JsonSerde from hcatalog, Upgrade should update changed FQN for JsonSerDe in HMS DB
[ https://issues.apache.org/jira/browse/HIVE-27180?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Riju Trivedi updated HIVE-27180: Description: As Hcatalog JsonSerDe uses the "serde2" version as a back end, Remove *org.apache.hive.hcatalog.data.JsonSerDe* from hive-hcatalog. Fix tests to use the new Serde class org.apache.hadoop.hive.serde2.JsonSerDe. Hive Upgrade schema script can update the SERDES table to alter the class name to the new class name, the old tables would work automatically. > Remove JsonSerde from hcatalog, Upgrade should update changed FQN for > JsonSerDe in HMS DB > -- > > Key: HIVE-27180 > URL: https://issues.apache.org/jira/browse/HIVE-27180 > Project: Hive > Issue Type: Sub-task > Components: Hive >Reporter: Riju Trivedi >Assignee: Riju Trivedi >Priority: Major > > As Hcatalog JsonSerDe uses the "serde2" version as a back end, Remove > *org.apache.hive.hcatalog.data.JsonSerDe* from hive-hcatalog. Fix tests > to use the new Serde class org.apache.hadoop.hive.serde2.JsonSerDe. Hive > Upgrade schema script can update the SERDES table to alter the class name to > the new class name, the old tables would work automatically. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Updated] (HIVE-27182) tez_union_with_udf.q with TestMiniTezCliDriver is flaky
[ https://issues.apache.org/jira/browse/HIVE-27182?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ayush Saxena updated HIVE-27182: Description: Looks like memory issue: {noformat} < Caused by: org.apache.hive.com.esotericsoftware.kryo.KryoException: java.lang.OutOfMemoryError: GC overhead limit exceeded < Serialization trace: < genericUDF (org.apache.hadoop.hive.ql.plan.ExprNodeGenericFuncDesc) < colExprMap (org.apache.hadoop.hive.ql.plan.SelectDesc) < conf (org.apache.hadoop.hive.ql.exec.vector.VectorSelectOperator) < childOperators (org.apache.hadoop.hive.ql.exec.vector.VectorLimitOperator) < childOperators (org.apache.hadoop.hive.ql.exec.TableScanOperator) {noformat} Ref: http://ci.hive.apache.org/job/hive-precommit/job/PR-4155/2/testReport/junit/org.apache.hadoop.hive.cli/TestMiniTezCliDriver/Testing___split_20___PostProcess___testCliDriver_tez_union_with_udf_/ was: Looks like memory issue: {noformat} < Caused by: org.apache.hive.com.esotericsoftware.kryo.KryoException: java.lang.OutOfMemoryError: GC overhead limit exceeded < Serialization trace: < genericUDF (org.apache.hadoop.hive.ql.plan.ExprNodeGenericFuncDesc) < colExprMap (org.apache.hadoop.hive.ql.plan.SelectDesc) < conf (org.apache.hadoop.hive.ql.exec.vector.VectorSelectOperator) < childOperators (org.apache.hadoop.hive.ql.exec.vector.VectorLimitOperator) < childOperators (org.apache.hadoop.hive.ql.exec.TableScanOperator) {noformat} > tez_union_with_udf.q with TestMiniTezCliDriver is flaky > --- > > Key: HIVE-27182 > URL: https://issues.apache.org/jira/browse/HIVE-27182 > Project: Hive > Issue Type: Improvement >Reporter: Ayush Saxena >Priority: Major > > Looks like memory issue: > {noformat} > < Caused by: org.apache.hive.com.esotericsoftware.kryo.KryoException: > java.lang.OutOfMemoryError: GC overhead limit exceeded > < Serialization trace: > < genericUDF (org.apache.hadoop.hive.ql.plan.ExprNodeGenericFuncDesc) > < colExprMap (org.apache.hadoop.hive.ql.plan.SelectDesc) > < conf 
(org.apache.hadoop.hive.ql.exec.vector.VectorSelectOperator) > < childOperators (org.apache.hadoop.hive.ql.exec.vector.VectorLimitOperator) > < childOperators (org.apache.hadoop.hive.ql.exec.TableScanOperator) > {noformat} > Ref: > http://ci.hive.apache.org/job/hive-precommit/job/PR-4155/2/testReport/junit/org.apache.hadoop.hive.cli/TestMiniTezCliDriver/Testing___split_20___PostProcess___testCliDriver_tez_union_with_udf_/ -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Assigned] (HIVE-27181) Remove RegexSerDe from hive-contrib, Upgrade should update changed FQN for RegexSerDe in HMS DB
[ https://issues.apache.org/jira/browse/HIVE-27181?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Riju Trivedi reassigned HIVE-27181: --- Assignee: Riju Trivedi > Remove RegexSerDe from hive-contrib, Upgrade should update changed FQN for > RegexSerDe in HMS DB > --- > > Key: HIVE-27181 > URL: https://issues.apache.org/jira/browse/HIVE-27181 > Project: Hive > Issue Type: Sub-task > Components: Hive >Reporter: Riju Trivedi >Assignee: Riju Trivedi >Priority: Major > -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Assigned] (HIVE-27180) Remove JsonSerde from hcatalog, Upgrade should update changed FQN for JsonSerDe in HMS DB
[ https://issues.apache.org/jira/browse/HIVE-27180?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Riju Trivedi reassigned HIVE-27180: --- > Remove JsonSerde from hcatalog, Upgrade should update changed FQN for > JsonSerDe in HMS DB > -- > > Key: HIVE-27180 > URL: https://issues.apache.org/jira/browse/HIVE-27180 > Project: Hive > Issue Type: Sub-task > Components: Hive >Reporter: Riju Trivedi >Assignee: Riju Trivedi >Priority: Major > -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Updated] (HIVE-26655) VectorUDAFBloomFilterMerge should take care of safe batch handling when working in parallel
[ https://issues.apache.org/jira/browse/HIVE-26655?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] László Bodor updated HIVE-26655: Fix Version/s: 4.0.0 > VectorUDAFBloomFilterMerge should take care of safe batch handling when > working in parallel > --- > > Key: HIVE-26655 > URL: https://issues.apache.org/jira/browse/HIVE-26655 > Project: Hive > Issue Type: Sub-task >Reporter: Sungwoo Park >Assignee: László Bodor >Priority: Major > Labels: pull-request-available > Fix For: 4.0.0 > > Time Spent: 50m > Remaining Estimate: 0h > > When tested with 100GB ORC tables, the number of rows returned by query 17 is > not stable. It returns fewer rows than the correct result (55 rows). > -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Resolved] (HIVE-26655) VectorUDAFBloomFilterMerge should take care of safe batch handling when working in parallel
[ https://issues.apache.org/jira/browse/HIVE-26655?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] László Bodor resolved HIVE-26655. - Resolution: Fixed > VectorUDAFBloomFilterMerge should take care of safe batch handling when > working in parallel > --- > > Key: HIVE-26655 > URL: https://issues.apache.org/jira/browse/HIVE-26655 > Project: Hive > Issue Type: Sub-task >Reporter: Sungwoo Park >Assignee: László Bodor >Priority: Major > Labels: pull-request-available > Fix For: 4.0.0 > > Time Spent: 50m > Remaining Estimate: 0h > > When tested with 100GB ORC tables, the number of rows returned by query 17 is > not stable. It returns fewer rows than the correct result (55 rows). > -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Commented] (HIVE-26655) VectorUDAFBloomFilterMerge should take care of safe batch handling when working in parallel
[ https://issues.apache.org/jira/browse/HIVE-26655?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17705347#comment-17705347 ] László Bodor commented on HIVE-26655: - fix merged to master, thanks [~ayushtkn] for the review, and thanks [~glapark] for reporting this correctness issue! > VectorUDAFBloomFilterMerge should take care of safe batch handling when > working in parallel > --- > > Key: HIVE-26655 > URL: https://issues.apache.org/jira/browse/HIVE-26655 > Project: Hive > Issue Type: Sub-task >Reporter: Sungwoo Park >Assignee: László Bodor >Priority: Major > Labels: pull-request-available > Time Spent: 50m > Remaining Estimate: 0h > > When tested with 100GB ORC tables, the number of rows returned by query 17 is > not stable. It returns fewer rows than the correct result (55 rows). > -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Work logged] (HIVE-26655) VectorUDAFBloomFilterMerge should take care of safe batch handling when working in parallel
[ https://issues.apache.org/jira/browse/HIVE-26655?focusedWorklogId=853158&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-853158 ] ASF GitHub Bot logged work on HIVE-26655: - Author: ASF GitHub Bot Created on: 27/Mar/23 13:21 Start Date: 27/Mar/23 13:21 Worklog Time Spent: 10m Work Description: abstractdog merged PR #4158: URL: https://github.com/apache/hive/pull/4158 Issue Time Tracking --- Worklog Id: (was: 853158) Time Spent: 50m (was: 40m) > VectorUDAFBloomFilterMerge should take care of safe batch handling when > working in parallel > --- > > Key: HIVE-26655 > URL: https://issues.apache.org/jira/browse/HIVE-26655 > Project: Hive > Issue Type: Sub-task >Reporter: Sungwoo Park >Assignee: László Bodor >Priority: Major > Labels: pull-request-available > Time Spent: 50m > Remaining Estimate: 0h > > When tested with 100GB ORC tables, the number of rows returned by query 17 is > not stable. It returns fewer rows than the correct result (55 rows). > -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Work logged] (HIVE-26400) Provide docker images for Hive
[ https://issues.apache.org/jira/browse/HIVE-26400?focusedWorklogId=853157=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-853157 ] ASF GitHub Bot logged work on HIVE-26400: - Author: ASF GitHub Bot Created on: 27/Mar/23 13:18 Start Date: 27/Mar/23 13:18 Worklog Time Spent: 10m Work Description: dengzhhu653 commented on PR #3448: URL: https://github.com/apache/hive/pull/3448#issuecomment-1485078827 > I love this initiative. Can we get more eyes on it? > > I have 2 comments about it: > > 1. HMS can work together with MySQL but to many times we found bugs with MySQL which gave us a lot of headache. Is it possible to change for Postgre? > 2. I think we should ask a docker account to push the image to the repository as we have a new build or new release. > > What is the remaining part of this task to make it happens? Thank you @TuroczyX for the comments. 1. Have changed the back db to Postgres or embedded Derby; 2. This is the remaining part, want to track it in the future after this task has finished. Issue Time Tracking --- Worklog Id: (was: 853157) Time Spent: 10.5h (was: 10h 20m) > Provide docker images for Hive > -- > > Key: HIVE-26400 > URL: https://issues.apache.org/jira/browse/HIVE-26400 > Project: Hive > Issue Type: Sub-task > Components: Build Infrastructure >Reporter: Zhihua Deng >Assignee: Zhihua Deng >Priority: Blocker > Labels: hive-4.0.0-must, pull-request-available > Time Spent: 10.5h > Remaining Estimate: 0h > > Make Apache Hive be able to run inside docker container in pseudo-distributed > mode, with MySQL/Derby as its back database, provide the following: > * Quick-start/Debugging/Prepare a test env for Hive; > * Tools to build target image with specified version of Hive and its > dependencies; > * Images can be used as the basis for the Kubernetes operator. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Work logged] (HIVE-27135) AcidUtils#getHdfsDirSnapshots() throws FNFE when a directory is removed in HDFS
[ https://issues.apache.org/jira/browse/HIVE-27135?focusedWorklogId=853155&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-853155 ] ASF GitHub Bot logged work on HIVE-27135: - Author: ASF GitHub Bot Created on: 27/Mar/23 12:07 Start Date: 27/Mar/23 12:07 Worklog Time Spent: 10m Work Description: sonarcloud[bot] commented on PR #4114: URL: https://github.com/apache/hive/pull/4114#issuecomment-1485028689 Kudos, SonarCloud Quality Gate passed! 0 Bugs, 0 Vulnerabilities, 0 Security Hotspots, 1 Code Smell; no coverage or duplication information. Issue Time Tracking --- Worklog Id: (was: 853155) Time Spent: 5h (was: 4h 50m) > AcidUtils#getHdfsDirSnapshots() throws FNFE when a directory is removed in > HDFS > --- > > Key: HIVE-27135 > URL: https://issues.apache.org/jira/browse/HIVE-27135 > Project: Hive > Issue Type: Bug >Reporter: Dayakar M >Assignee: Dayakar M >Priority: Major > Labels: pull-request-available > Time Spent: 5h > Remaining Estimate: 0h > > AcidUtils#getHdfsDirSnapshots() throws FileNotFoundException when a directory > is removed in HDFS while fetching HDFS Snapshots. > Below testcode can be used to reproduce this issue. 
> {code:java} > @Test > public void > testShouldNotThrowFNFEWhenHiveStagingDirectoryIsRemovedWhileFetchingHDFSSnapshots() > throws Exception { > MockFileSystem fs = new MockFileSystem(new HiveConf(), > new MockFile("mock:/tbl/part1/.hive-staging_dir/-ext-10002", 500, new > byte[0]), > new MockFile("mock:/tbl/part2/.hive-staging_dir", 500, new byte[0]), > new MockFile("mock:/tbl/part1/_tmp_space.db", 500, new byte[0]), > new MockFile("mock:/tbl/part1/delta_1_1/bucket--", 500, new > byte[0])); > Path path = new MockPath(fs, "/tbl"); > Path stageDir = new MockPath(fs, "mock:/tbl/part1/.hive-staging_dir"); > FileSystem mockFs = spy(fs); > Mockito.doThrow(new >
[jira] [Work logged] (HIVE-26400) Provide docker images for Hive
[ https://issues.apache.org/jira/browse/HIVE-26400?focusedWorklogId=853153=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-853153 ] ASF GitHub Bot logged work on HIVE-26400: - Author: ASF GitHub Bot Created on: 27/Mar/23 11:59 Start Date: 27/Mar/23 11:59 Worklog Time Spent: 10m Work Description: TuroczyX commented on PR #3448: URL: https://github.com/apache/hive/pull/3448#issuecomment-1485015881 I love this initiative. Can we get more eyes on it? I have 2 comments about it: 1. HMS can work together with MySQL but to many times we found bugs with MySQL which gave us a lot of headache. Is it possible to change for Postgre? 2. I think we should ask a docker account to push the image to the repository as we have a new build or new release. What is the remaining part of this task to make it happens? Issue Time Tracking --- Worklog Id: (was: 853153) Time Spent: 10h 20m (was: 10h 10m) > Provide docker images for Hive > -- > > Key: HIVE-26400 > URL: https://issues.apache.org/jira/browse/HIVE-26400 > Project: Hive > Issue Type: Sub-task > Components: Build Infrastructure >Reporter: Zhihua Deng >Assignee: Zhihua Deng >Priority: Blocker > Labels: hive-4.0.0-must, pull-request-available > Time Spent: 10h 20m > Remaining Estimate: 0h > > Make Apache Hive be able to run inside docker container in pseudo-distributed > mode, with MySQL/Derby as its back database, provide the following: > * Quick-start/Debugging/Prepare a test env for Hive; > * Tools to build target image with specified version of Hive and its > dependencies; > * Images can be used as the basis for the Kubernetes operator. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Work logged] (HIVE-26655) VectorUDAFBloomFilterMerge should take care of safe batch handling when working in parallel
[ https://issues.apache.org/jira/browse/HIVE-26655?focusedWorklogId=853152=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-853152 ] ASF GitHub Bot logged work on HIVE-26655: - Author: ASF GitHub Bot Created on: 27/Mar/23 11:50 Start Date: 27/Mar/23 11:50 Worklog Time Spent: 10m Work Description: abstractdog commented on code in PR #4158: URL: https://github.com/apache/hive/pull/4158#discussion_r1149197845 ## ql/src/java/org/apache/hadoop/hive/ql/exec/vector/expressions/aggregates/VectorUDAFBloomFilterMerge.java: ## @@ -602,4 +602,11 @@ public void assignRowColumn(VectorizedRowBatch batch, int batchIndex, int column Aggregation bfAgg = (Aggregation) agg; outputColVector.setVal(batchIndex, bfAgg.bfBytes, 0, bfAgg.bfBytes.length); } + + /** + * Let's clone the batch when we're working in parallel, see HIVE-26655. + */ + public boolean batchNeedsClone() { +return numThreads > 0; + } Review Comment: still need yes, thread=1 means the executor start processing the bloomfilter on 1 thread async while the main thread is fetching the next one Issue Time Tracking --- Worklog Id: (was: 853152) Time Spent: 40m (was: 0.5h) > VectorUDAFBloomFilterMerge should take care of safe batch handling when > working in parallel > --- > > Key: HIVE-26655 > URL: https://issues.apache.org/jira/browse/HIVE-26655 > Project: Hive > Issue Type: Sub-task >Reporter: Sungwoo Park >Assignee: László Bodor >Priority: Major > Labels: pull-request-available > Time Spent: 40m > Remaining Estimate: 0h > > When tested with 100GB ORC tables, the number of rows returned by query 17 is > not stable. It returns fewer rows than the correct result (55 rows). > -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Work logged] (HIVE-26655) VectorUDAFBloomFilterMerge should take care of safe batch handling when working in parallel
[ https://issues.apache.org/jira/browse/HIVE-26655?focusedWorklogId=853151=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-853151 ] ASF GitHub Bot logged work on HIVE-26655: - Author: ASF GitHub Bot Created on: 27/Mar/23 11:48 Start Date: 27/Mar/23 11:48 Worklog Time Spent: 10m Work Description: ayushtkn commented on code in PR #4158: URL: https://github.com/apache/hive/pull/4158#discussion_r1149193455 ## ql/src/java/org/apache/hadoop/hive/ql/exec/vector/expressions/aggregates/VectorUDAFBloomFilterMerge.java: ## @@ -602,4 +602,11 @@ public void assignRowColumn(VectorizedRowBatch batch, int batchIndex, int column Aggregation bfAgg = (Aggregation) agg; outputColVector.setVal(batchIndex, bfAgg.bfBytes, 0, bfAgg.bfBytes.length); } + + /** + * Let's clone the batch when we're working in parallel, see HIVE-26655. + */ + public boolean batchNeedsClone() { +return numThreads > 0; + } Review Comment: just a quick pass, if thread count is 1, you still need a clone? for parallel execution it should be more than 1 thread? or some catch? Issue Time Tracking --- Worklog Id: (was: 853151) Time Spent: 0.5h (was: 20m) > VectorUDAFBloomFilterMerge should take care of safe batch handling when > working in parallel > --- > > Key: HIVE-26655 > URL: https://issues.apache.org/jira/browse/HIVE-26655 > Project: Hive > Issue Type: Sub-task >Reporter: Sungwoo Park >Assignee: László Bodor >Priority: Major > Labels: pull-request-available > Time Spent: 0.5h > Remaining Estimate: 0h > > When tested with 100GB ORC tables, the number of rows returned by query 17 is > not stable. It returns fewer rows than the correct result (55 rows). > -- This message was sent by Atlassian Jira (v8.20.10#820010)
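The batchNeedsClone() exchange above turns on a handoff race: the operator pipeline keeps refilling the same batch buffers while an async worker may still be reading them, which is why even numThreads == 1 needs a defensive copy. A minimal, self-contained sketch of the pattern — the `Batch` class is a hypothetical stand-in, not Hive's VectorizedRowBatch:

```java
import java.util.Arrays;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class BatchHandoff {
    // Hypothetical stand-in for a vectorized row batch: a reusable int buffer.
    static class Batch {
        final int[] values;
        Batch(int[] v) { this.values = v; }
        Batch copy() { return new Batch(Arrays.copyOf(values, values.length)); }
    }

    public static void main(String[] args) throws Exception {
        ExecutorService pool = Executors.newSingleThreadExecutor();
        CountDownLatch refilled = new CountDownLatch(1);

        int[] buffer = {1, 2, 3};
        Batch live = new Batch(buffer);   // the producer keeps reusing this buffer
        Batch handedOff = live.copy();    // clone taken at handoff (batchNeedsClone() == true)

        Future<Integer> sum = pool.submit(() -> {
            refilled.await();             // force the worker to read only after the refill below
            int s = 0;
            for (int v : handedOff.values) s += v;
            return s;
        });

        Arrays.fill(buffer, 0);           // producer refills the batch for the next rows
        refilled.countDown();

        System.out.println(sum.get());    // 6: the clone is unaffected by the refill
        pool.shutdown();
    }
}
```

Without the `copy()` at handoff the worker in this sketch would sum the zeroed buffer and return 0; a single async thread still overlaps with the producer, which matches the reviewer's conclusion that the guard is `numThreads > 0` rather than `> 1`.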
[jira] [Work logged] (HIVE-27177) Add alter table...Convert to Iceberg command
[ https://issues.apache.org/jira/browse/HIVE-27177?focusedWorklogId=853135&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-853135 ] ASF GitHub Bot logged work on HIVE-27177: - Author: ASF GitHub Bot Created on: 27/Mar/23 11:03 Start Date: 27/Mar/23 11:03 Worklog Time Spent: 10m Work Description: sonarcloud[bot] commented on PR #4155: URL: https://github.com/apache/hive/pull/4155#issuecomment-1484942204 Kudos, SonarCloud Quality Gate passed! 0 Bugs, 0 Vulnerabilities, 0 Security Hotspots, 4 Code Smells; no coverage or duplication information. Issue Time Tracking --- Worklog Id: (was: 853135) Time Spent: 0.5h (was: 20m) > Add alter table...Convert to Iceberg command > > > Key: HIVE-27177 > URL: https://issues.apache.org/jira/browse/HIVE-27177 > Project: Hive > Issue Type: Improvement >Reporter: Ayush Saxena >Assignee: Ayush Saxena >Priority: Major > Labels: pull-request-available > Time Spent: 0.5h > Remaining Estimate: 0h > > Add an alter table convert to Iceberg [TBLPROPERTIES('','')] to > convert existing external tables to iceberg tables -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Work logged] (HIVE-27135) AcidUtils#getHdfsDirSnapshots() throws FNFE when a directory is removed in HDFS
[ https://issues.apache.org/jira/browse/HIVE-27135?focusedWorklogId=853133=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-853133 ] ASF GitHub Bot logged work on HIVE-27135: - Author: ASF GitHub Bot Created on: 27/Mar/23 10:52 Start Date: 27/Mar/23 10:52 Worklog Time Spent: 10m Work Description: mdayakar commented on code in PR #4114: URL: https://github.com/apache/hive/pull/4114#discussion_r1149137172 ## ql/src/java/org/apache/hadoop/hive/ql/io/AcidUtils.java: ## @@ -1538,32 +1538,36 @@ private static HdfsDirSnapshot addToSnapshot(Map dirToSna public static Map getHdfsDirSnapshots(final FileSystem fs, final Path path) throws IOException { Map dirToSnapshots = new HashMap<>(); -RemoteIterator itr = FileUtils.listFiles(fs, path, true, acidHiddenFileFilter); -while (itr.hasNext()) { - FileStatus fStatus = itr.next(); - Path fPath = fStatus.getPath(); - if (fStatus.isDirectory() && acidTempDirFilter.accept(fPath)) { -addToSnapshot(dirToSnapshots, fPath); - } else { -Path parentDirPath = fPath.getParent(); -if (acidTempDirFilter.accept(parentDirPath)) { - while (isChildOfDelta(parentDirPath, path)) { -// Some cases there are other directory layers between the delta and the datafiles -// (export-import mm table, insert with union all to mm table, skewed tables). 
-// But it does not matter for the AcidState, we just need the deltas and the data files -// So build the snapshot with the files inside the delta directory -parentDirPath = parentDirPath.getParent(); - } - HdfsDirSnapshot dirSnapshot = addToSnapshot(dirToSnapshots, parentDirPath); - // We're not filtering out the metadata file and acid format file, - // as they represent parts of a valid snapshot - // We're not using the cached values downstream, but we can potentially optimize more in a follow-up task - if (fStatus.getPath().toString().contains(MetaDataFile.METADATA_FILE)) { -dirSnapshot.addMetadataFile(fStatus); - } else if (fStatus.getPath().toString().contains(OrcAcidVersion.ACID_FORMAT)) { -dirSnapshot.addOrcAcidFormatFile(fStatus); - } else { -dirSnapshot.addFile(fStatus); +Deque> stack = new ArrayDeque<>(); +stack.push(FileUtils.listLocatedStatusIterator(fs, path, acidHiddenFileFilter)); +while (!stack.isEmpty()) { + RemoteIterator itr = stack.pop(); + while (itr.hasNext()) { +FileStatus fStatus = itr.next(); +Path fPath = fStatus.getPath(); +if (fStatus.isDirectory()) { + stack.push(FileUtils.listLocatedStatusIterator(fs, fPath, acidHiddenFileFilter)); Review Comment: No, `addToSnapshot(dirToSnapshots, fPath) ` need to call if a folder contains a file, which is taken care in else part. Same logic exists in the existing code. Issue Time Tracking --- Worklog Id: (was: 853133) Time Spent: 4h 50m (was: 4h 40m) > AcidUtils#getHdfsDirSnapshots() throws FNFE when a directory is removed in > HDFS > --- > > Key: HIVE-27135 > URL: https://issues.apache.org/jira/browse/HIVE-27135 > Project: Hive > Issue Type: Bug >Reporter: Dayakar M >Assignee: Dayakar M >Priority: Major > Labels: pull-request-available > Time Spent: 4h 50m > Remaining Estimate: 0h > > AcidUtils#getHdfsDirSnapshots() throws FileNotFoundException when a directory > is removed in HDFS while fetching HDFS Snapshots. > Below testcode can be used to reproduce this issue. 
> {code:java} > @Test > public void > testShouldNotThrowFNFEWhenHiveStagingDirectoryIsRemovedWhileFetchingHDFSSnapshots() > throws Exception { > MockFileSystem fs = new MockFileSystem(new HiveConf(), > new MockFile("mock:/tbl/part1/.hive-staging_dir/-ext-10002", 500, new > byte[0]), > new MockFile("mock:/tbl/part2/.hive-staging_dir", 500, new byte[0]), > new MockFile("mock:/tbl/part1/_tmp_space.db", 500, new byte[0]), > new MockFile("mock:/tbl/part1/delta_1_1/bucket--", 500, new > byte[0])); > Path path = new MockPath(fs, "/tbl"); > Path stageDir = new MockPath(fs, "mock:/tbl/part1/.hive-staging_dir"); > FileSystem mockFs = spy(fs); > Mockito.doThrow(new > FileNotFoundException("")).when(mockFs).listLocatedStatus(eq(stageDir)); > try { > Map hdfsDirSnapshots = > AcidUtils.getHdfsDirSnapshots(mockFs, path); > Assert.assertEquals(1, hdfsDirSnapshots.size()); > } > catch (FileNotFoundException fnf) { > fail("Should not throw FileNotFoundException when a
[jira] [Work logged] (HIVE-27135) AcidUtils#getHdfsDirSnapshots() throws FNFE when a directory is removed in HDFS
[ https://issues.apache.org/jira/browse/HIVE-27135?focusedWorklogId=853132=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-853132 ] ASF GitHub Bot logged work on HIVE-27135: - Author: ASF GitHub Bot Created on: 27/Mar/23 10:52 Start Date: 27/Mar/23 10:52 Worklog Time Spent: 10m Work Description: mdayakar commented on code in PR #4114: URL: https://github.com/apache/hive/pull/4114#discussion_r1149137172 ## ql/src/java/org/apache/hadoop/hive/ql/io/AcidUtils.java: ## @@ -1538,32 +1538,36 @@ private static HdfsDirSnapshot addToSnapshot(Map dirToSna public static Map getHdfsDirSnapshots(final FileSystem fs, final Path path) throws IOException { Map dirToSnapshots = new HashMap<>(); -RemoteIterator itr = FileUtils.listFiles(fs, path, true, acidHiddenFileFilter); -while (itr.hasNext()) { - FileStatus fStatus = itr.next(); - Path fPath = fStatus.getPath(); - if (fStatus.isDirectory() && acidTempDirFilter.accept(fPath)) { -addToSnapshot(dirToSnapshots, fPath); - } else { -Path parentDirPath = fPath.getParent(); -if (acidTempDirFilter.accept(parentDirPath)) { - while (isChildOfDelta(parentDirPath, path)) { -// Some cases there are other directory layers between the delta and the datafiles -// (export-import mm table, insert with union all to mm table, skewed tables). 
-// But it does not matter for the AcidState, we just need the deltas and the data files -// So build the snapshot with the files inside the delta directory -parentDirPath = parentDirPath.getParent(); - } - HdfsDirSnapshot dirSnapshot = addToSnapshot(dirToSnapshots, parentDirPath); - // We're not filtering out the metadata file and acid format file, - // as they represent parts of a valid snapshot - // We're not using the cached values downstream, but we can potentially optimize more in a follow-up task - if (fStatus.getPath().toString().contains(MetaDataFile.METADATA_FILE)) { -dirSnapshot.addMetadataFile(fStatus); - } else if (fStatus.getPath().toString().contains(OrcAcidVersion.ACID_FORMAT)) { -dirSnapshot.addOrcAcidFormatFile(fStatus); - } else { -dirSnapshot.addFile(fStatus); +Deque> stack = new ArrayDeque<>(); +stack.push(FileUtils.listLocatedStatusIterator(fs, path, acidHiddenFileFilter)); +while (!stack.isEmpty()) { + RemoteIterator itr = stack.pop(); + while (itr.hasNext()) { +FileStatus fStatus = itr.next(); +Path fPath = fStatus.getPath(); +if (fStatus.isDirectory()) { + stack.push(FileUtils.listLocatedStatusIterator(fs, fPath, acidHiddenFileFilter)); Review Comment: No, `addToSnapshot(dirToSnapshots, fPath) ` need to add if a folder contains a file which is taken care in else part. Same logic exists in the existing code. Issue Time Tracking --- Worklog Id: (was: 853132) Time Spent: 4h 40m (was: 4.5h) > AcidUtils#getHdfsDirSnapshots() throws FNFE when a directory is removed in > HDFS > --- > > Key: HIVE-27135 > URL: https://issues.apache.org/jira/browse/HIVE-27135 > Project: Hive > Issue Type: Bug >Reporter: Dayakar M >Assignee: Dayakar M >Priority: Major > Labels: pull-request-available > Time Spent: 4h 40m > Remaining Estimate: 0h > > AcidUtils#getHdfsDirSnapshots() throws FileNotFoundException when a directory > is removed in HDFS while fetching HDFS Snapshots. > Below testcode can be used to reproduce this issue. 
> {code:java} > @Test > public void > testShouldNotThrowFNFEWhenHiveStagingDirectoryIsRemovedWhileFetchingHDFSSnapshots() > throws Exception { > MockFileSystem fs = new MockFileSystem(new HiveConf(), > new MockFile("mock:/tbl/part1/.hive-staging_dir/-ext-10002", 500, new > byte[0]), > new MockFile("mock:/tbl/part2/.hive-staging_dir", 500, new byte[0]), > new MockFile("mock:/tbl/part1/_tmp_space.db", 500, new byte[0]), > new MockFile("mock:/tbl/part1/delta_1_1/bucket--", 500, new > byte[0])); > Path path = new MockPath(fs, "/tbl"); > Path stageDir = new MockPath(fs, "mock:/tbl/part1/.hive-staging_dir"); > FileSystem mockFs = spy(fs); > Mockito.doThrow(new > FileNotFoundException("")).when(mockFs).listLocatedStatus(eq(stageDir)); > try { > Map hdfsDirSnapshots = > AcidUtils.getHdfsDirSnapshots(mockFs, path); > Assert.assertEquals(1, hdfsDirSnapshots.size()); > } > catch (FileNotFoundException fnf) { > fail("Should not throw FileNotFoundException when a directory
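The patch discussed in this review replaces the recursive FileUtils.listFiles() walk with an explicit Deque of per-directory iterators, so a directory that disappears between discovery and listing can simply yield no entries instead of failing the whole snapshot. A rough java.nio sketch of the same shape — local-filesystem stand-ins for HDFS FileSystem/RemoteIterator, not the actual AcidUtils code:

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.Iterator;
import java.util.List;

public class TolerantWalk {

    // List one directory's entries; if the directory vanished between discovery
    // and listing, return an empty list instead of propagating the exception.
    static List<Path> children(Path dir) {
        List<Path> out = new ArrayList<>();
        try (DirectoryStream<Path> ds = Files.newDirectoryStream(dir)) {
            for (Path p : ds) out.add(p);
        } catch (IOException gone) {
            // directory removed mid-scan: treat as empty, as the patch intends
        }
        return out;
    }

    // Iterative traversal with an explicit stack of per-directory iterators,
    // mirroring the Deque-of-iterators shape in the patch above: the current
    // level is drained fully, pushing child-directory iterators as it goes.
    static List<Path> listFiles(Path root) {
        List<Path> files = new ArrayList<>();
        Deque<Iterator<Path>> stack = new ArrayDeque<>();
        stack.push(children(root).iterator());
        while (!stack.isEmpty()) {
            Iterator<Path> itr = stack.pop();
            while (itr.hasNext()) {
                Path p = itr.next();
                if (Files.isDirectory(p)) {
                    stack.push(children(p).iterator()); // descend after this level
                } else {
                    files.add(p);
                }
            }
        }
        return files;
    }

    public static void main(String[] args) throws IOException {
        Path root = Files.createTempDirectory("snap");
        Files.createDirectories(root.resolve("part1/delta_1_1"));
        Files.createFile(root.resolve("part1/delta_1_1/bucket_0"));
        Files.createFile(root.resolve("part1/data_file"));
        // Directories are traversed, only files are collected.
        System.out.println(listFiles(root).size());
    }
}
```

The point mdayakar makes in the thread carries over: a directory is only recorded when a file inside it is reached, so an empty or vanished directory contributes nothing to the snapshot.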
[jira] [Updated] (HIVE-27179) HS2 WebUI throws NPE when JspFactory loaded from jetty-runner
[ https://issues.apache.org/jira/browse/HIVE-27179?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zhihua Deng updated HIVE-27179: --- Description: In HIVE-17088, we resolved a NPE thrown from HS2 WebUI by introducing javax.servlet.jsp-api. It works as expected when the javax.servlet.jsp-api jar prevails jetty-runner jar, but things can be different in some environments, it still throws NPE when opening the HS2 web: {noformat} java.lang.NullPointerException at org.apache.hive.generated.hiveserver2.hiveserver2_jsp._jspService(hiveserver2_jsp.java:286) at org.apache.jasper.runtime.HttpJspBase.service(HttpJspBase.java:71) at javax.servlet.http.HttpServlet.service(HttpServlet.java:790) at org.eclipse.jetty.servlet.ServletHolder$NotAsync.service(ServletHolder.java:1443) at org.eclipse.jetty.servlet.ServletHolder.handle(ServletHolder.java:791) at org.eclipse.jetty.servlet.ServletHandler$ChainEnd.doFilter(ServletHandler.java:1626) ...{noformat} The jetty-runner JspFactory.getDefaultFactory() just returns null. 
> HS2 WebUI throws NPE when JspFactory loaded from jetty-runner
> -
>
> Key: HIVE-27179
> URL: https://issues.apache.org/jira/browse/HIVE-27179
> Project: Hive
> Issue Type: Bug
> Components: HiveServer2
> Reporter: Zhihua Deng
> Priority: Major
>
> In HIVE-17088 we resolved an NPE thrown from the HS2 WebUI by introducing
> javax.servlet.jsp-api. It works as expected when the javax.servlet.jsp-api
> jar takes precedence over the jetty-runner jar, but the ordering can differ
> in some environments, and opening the HS2 web UI still throws an NPE:
> {noformat}
> java.lang.NullPointerException
>   at org.apache.hive.generated.hiveserver2.hiveserver2_jsp._jspService(hiveserver2_jsp.java:286)
>   at org.apache.jasper.runtime.HttpJspBase.service(HttpJspBase.java:71)
>   at javax.servlet.http.HttpServlet.service(HttpServlet.java:790)
>   at org.eclipse.jetty.servlet.ServletHolder$NotAsync.service(ServletHolder.java:1443)
>   at org.eclipse.jetty.servlet.ServletHolder.handle(ServletHolder.java:791)
>   at org.eclipse.jetty.servlet.ServletHandler$ChainEnd.doFilter(ServletHandler.java:1626)
> ...{noformat}
> The jetty-runner JspFactory.getDefaultFactory() simply returns null.
-- This message was sent by Atlassian Jira (v8.20.10#820010)
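The NPE above arises because `JspFactory.getDefaultFactory()` is a static registry that stays null until a JSP implementation registers itself; when the jetty-runner classes win the classpath race, nothing ever registers one, and the generated JSP servlet dereferences the null result. A self-contained sketch of that static-registry failure mode — the `FactoryRegistry` class below is a hypothetical stand-in for `javax.servlet.jsp.JspFactory`, not Hive or Jetty code:

```java
// Hypothetical stand-in for javax.servlet.jsp.JspFactory's static default-factory registry.
class FactoryRegistry {
    private static FactoryRegistry defaultFactory; // stays null until an implementation registers one

    static FactoryRegistry getDefaultFactory() {
        return defaultFactory;
    }

    static void setDefaultFactory(FactoryRegistry factory) {
        defaultFactory = factory;
    }
}

public class JspFactoryNpeDemo {
    public static void main(String[] args) {
        // No implementation has registered yet, so a caller that chains off the
        // result (as the generated _jspService method does) would hit an NPE.
        System.out.println(FactoryRegistry.getDefaultFactory() == null); // prints true

        // Registering an implementation eagerly at startup avoids the NPE.
        FactoryRegistry.setDefaultFactory(new FactoryRegistry());
        System.out.println(FactoryRegistry.getDefaultFactory() == null); // prints false
    }
}
```

One known workaround in servlet containers is to register the implementation eagerly at startup (e.g. `JspFactory.setDefaultFactory(new JspFactoryImpl())` from Jasper); whether that is the fix Hive ultimately takes depends on which jars are actually on the classpath.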
[jira] [Work logged] (HIVE-27158) Store hive columns stats in puffin files for iceberg tables
[ https://issues.apache.org/jira/browse/HIVE-27158?focusedWorklogId=853129&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-853129 ] ASF GitHub Bot logged work on HIVE-27158: - Author: ASF GitHub Bot Created on: 27/Mar/23 10:43 Start Date: 27/Mar/23 10:43 Worklog Time Spent: 10m
Work Description: deniskuzZ commented on code in PR #4131: URL: https://github.com/apache/hive/pull/4131#discussion_r1149127885
## ql/src/java/org/apache/hadoop/hive/ql/stats/StatsUtils.java:
## @@ -1069,8 +1069,12 @@ public static List<ColStatistics> getTableColumnStats(
     }
     if (fetchColStats && !colStatsToRetrieve.isEmpty()) {
       try {
-        List<ColumnStatisticsObj> colStat = Hive.get().getTableColumnStatistics(
-            dbName, tabName, colStatsToRetrieve, false);
+        List<ColumnStatisticsObj> colStat;
+        if (table != null && table.isNonNative() && table.getStorageHandler().canProvideColStatistics(table)) {
Review Comment: metastore mode for Iceberg is no longer supported, right? Have we disabled the relevant stats-persist logic?
Issue Time Tracking --- Worklog Id: (was: 853129) Time Spent: 6h 20m (was: 6h 10m)
> Store hive columns stats in puffin files for iceberg tables
> ---
>
> Key: HIVE-27158
> URL: https://issues.apache.org/jira/browse/HIVE-27158
> Project: Hive
> Issue Type: Improvement
> Reporter: Simhadri Govindappa
> Assignee: Simhadri Govindappa
> Priority: Major
> Labels: pull-request-available
> Time Spent: 6h 20m
> Remaining Estimate: 0h
>
-- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Work logged] (HIVE-27158) Store hive columns stats in puffin files for iceberg tables
[ https://issues.apache.org/jira/browse/HIVE-27158?focusedWorklogId=853128&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-853128 ] ASF GitHub Bot logged work on HIVE-27158: - Author: ASF GitHub Bot Created on: 27/Mar/23 10:39 Start Date: 27/Mar/23 10:39 Worklog Time Spent: 10m
Work Description: deniskuzZ commented on code in PR #4131: URL: https://github.com/apache/hive/pull/4131#discussion_r1149123408
## iceberg/iceberg-handler/src/main/java/org/apache/iceberg/mr/hive/HiveIcebergStorageHandler.java:
## @@ -349,6 +365,96 @@ public Map<String, String> getBasicStatistics(Partish partish) {
     return stats;
   }
+
+  @Override
+  public boolean canSetColStatistics() {
+    String statsSource = HiveConf.getVar(conf, HiveConf.ConfVars.HIVE_USE_STATS_FROM).toLowerCase();
+    return statsSource.equals(PUFFIN);
+  }
+
+  @Override
+  public boolean canProvideColStatistics(org.apache.hadoop.hive.ql.metadata.Table tbl) {
+    org.apache.hadoop.hive.ql.metadata.Table hmsTable = tbl;
+    TableDesc tableDesc = Utilities.getTableDesc(hmsTable);
+    Table table = Catalogs.loadTable(conf, tableDesc.getProperties());
+    if (table.currentSnapshot() != null) {
+      String statsSource = HiveConf.getVar(conf, HiveConf.ConfVars.HIVE_COL_STATS_SOURCE).toLowerCase();
+      String statsPath = table.location() + STATS + table.name() + table.currentSnapshot().snapshotId();
+      if (statsSource.equals(PUFFIN)) {
+        try (FileSystem fs = new Path(table.location()).getFileSystem(conf)) {
+          if (fs.exists(new Path(statsPath))) {
+            return true;
+          }
+        } catch (IOException e) {
+          LOG.warn(e.getMessage());
+        }
+      }
+    }
+    return false;
+  }
+
+  @Override
+  public List<ColumnStatisticsObj> getColStatistics(org.apache.hadoop.hive.ql.metadata.Table tbl) {
+    org.apache.hadoop.hive.ql.metadata.Table hmsTable = tbl;
+    TableDesc tableDesc = Utilities.getTableDesc(hmsTable);
+    Table table = Catalogs.loadTable(conf, tableDesc.getProperties());
+    String statsSource = HiveConf.getVar(conf, HiveConf.ConfVars.HIVE_COL_STATS_SOURCE).toLowerCase();
+    switch (statsSource) {
+      case ICEBERG:
+        // Place holder for iceberg stats
+        break;
+      case PUFFIN:
+        String snapshotId = table.name() + table.currentSnapshot().snapshotId();
+        String statsPath = table.location() + STATS + snapshotId;
+        LOG.info("Using stats from puffin file at:" + statsPath);
+        try (PuffinReader reader = Puffin.read(table.io().newInputFile(statsPath)).build()) {
+          BlobMetadata blobMetadata = reader.fileMetadata().blobs().get(0);
+          Map<BlobMetadata, List<ColumnStatistics>> collect =
+              Streams.stream(reader.readAll(ImmutableList.of(blobMetadata))).collect(Collectors.toMap(Pair::first,
+                  blobMetadataByteBufferPair -> SerializationUtils.deserialize(
+                      ByteBuffers.toByteArray(blobMetadataByteBufferPair.second()))));
+          return collect.entrySet().stream().iterator().next().getValue().get(0).getStatsObj();
+        } catch (IOException e) {
+          LOG.info(String.valueOf(e));
+        }
+        break;
+      default:
+        // fall back to metastore
+    }
+    return null;
+  }
+
+  @Override
+  public boolean setColStatistics(org.apache.hadoop.hive.ql.metadata.Table table,
+      List<ColumnStatistics> colStats) {
+    TableDesc tableDesc = Utilities.getTableDesc(table);
+    Table tbl = Catalogs.loadTable(conf, tableDesc.getProperties());
+    String snapshotId = tbl.name() + tbl.currentSnapshot().snapshotId();
+    byte[] serializeColStats = SerializationUtils.serialize((Serializable) colStats);
+
+    try (PuffinWriter writer = Puffin.write(tbl.io().newOutputFile(tbl.location() + STATS + snapshotId))
Review Comment: looks like not, per the `getColStatsForPartCol` comment: currently, the metastore does not store column stats for the partition column. Also, the HMS column stats table is called `TAB_COL_STATS`, which doesn't track per-partition stats.
Issue Time Tracking --- Worklog Id: (was: 853128) Time Spent: 6h 10m (was: 6h) > Store hive columns stats in puffin files for iceberg tables > --- > > Key: HIVE-27158 > URL: https://issues.apache.org/jira/browse/HIVE-27158 > Project: Hive > Issue Type: Improvement >Reporter: Simhadri Govindappa >Assignee: Simhadri Govindappa >Priority: Major > Labels: pull-request-available > Time Spent: 6h 10m > Remaining Estimate: 0h > -- This message was sent by Atlassian Jira (v8.20.10#820010)
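The patch discussed above serializes the column-statistics list with commons-lang `SerializationUtils.serialize`, wraps the bytes in a `ByteBuffer`, and stores them as a Puffin blob; reading reverses the steps with `ByteBuffers.toByteArray` and `SerializationUtils.deserialize`. The round-trip itself is plain Java serialization. A self-contained sketch using only the JDK — the payload is a stand-in list of strings, and the helpers below merely mimic what the commons-lang/Iceberg utilities do:

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.nio.ByteBuffer;
import java.util.ArrayList;
import java.util.List;

public class StatsBlobRoundTrip {

    // Equivalent of SerializationUtils.serialize: object -> byte[]
    static byte[] serialize(Serializable obj) throws IOException {
        ByteArrayOutputStream bos = new ByteArrayOutputStream();
        try (ObjectOutputStream oos = new ObjectOutputStream(bos)) {
            oos.writeObject(obj);
        }
        return bos.toByteArray();
    }

    // Equivalent of SerializationUtils.deserialize: byte[] -> object
    @SuppressWarnings("unchecked")
    static <T> T deserialize(byte[] bytes) throws IOException, ClassNotFoundException {
        try (ObjectInputStream ois = new ObjectInputStream(new ByteArrayInputStream(bytes))) {
            return (T) ois.readObject();
        }
    }

    // Equivalent of ByteBuffers.toByteArray: copy the buffer's remaining bytes
    static byte[] toByteArray(ByteBuffer buffer) {
        byte[] out = new byte[buffer.remaining()];
        buffer.duplicate().get(out);
        return out;
    }

    public static void main(String[] args) throws Exception {
        // Stand-in for the column-statistics list; the real code serializes the
        // statistics objects, which are likewise Serializable.
        List<String> colStats = new ArrayList<>(List.of("ncol=3", "ndv=42"));

        ByteBuffer blob = ByteBuffer.wrap(serialize((Serializable) colStats)); // write side
        List<String> restored = deserialize(toByteArray(blob));                // read side

        System.out.println(restored); // prints [ncol=3, ndv=42]
    }
}
```

Note that this ties the blob format to Java serialization of Hive's stats classes, which is one reason the reviewers probe how snapshots and partitions are keyed: the blob is opaque to any reader that lacks those classes.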
[jira] [Work logged] (HIVE-27158) Store hive columns stats in puffin files for iceberg tables
[ https://issues.apache.org/jira/browse/HIVE-27158?focusedWorklogId=853126&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-853126 ] ASF GitHub Bot logged work on HIVE-27158: - Author: ASF GitHub Bot Created on: 27/Mar/23 10:28 Start Date: 27/Mar/23 10:28 Worklog Time Spent: 10m
Work Description: deniskuzZ commented on code in PR #4131: URL: https://github.com/apache/hive/pull/4131#discussion_r1149111504
## ql/src/java/org/apache/hadoop/hive/ql/stats/ColStatsProcessor.java:
## @@ -218,6 +218,9 @@ public int persistColumnStats(Hive db, Table tbl) throws HiveException, MetaException {
     }
     start = System.currentTimeMillis();
+    if(tbl != null && tbl.isNonNative() && tbl.getStorageHandler().canSetColStatistics()){
+      tbl.getStorageHandler().setColStatistics(tbl, colStats);
Review Comment: when this code is getting invoked, is it by the auto-gather thread? If yes, what if there were several snapshots generated between the runs?
Issue Time Tracking --- Worklog Id: (was: 853126) Time Spent: 5h 50m (was: 5h 40m)
> Store hive columns stats in puffin files for iceberg tables
> ---
>
> Key: HIVE-27158
> URL: https://issues.apache.org/jira/browse/HIVE-27158
> Project: Hive
> Issue Type: Improvement
> Reporter: Simhadri Govindappa
> Assignee: Simhadri Govindappa
> Priority: Major
> Labels: pull-request-available
> Time Spent: 5h 50m
> Remaining Estimate: 0h
>
-- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Work logged] (HIVE-27158) Store hive columns stats in puffin files for iceberg tables
[ https://issues.apache.org/jira/browse/HIVE-27158?focusedWorklogId=853125&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-853125 ] ASF GitHub Bot logged work on HIVE-27158: - Author: ASF GitHub Bot Created on: 27/Mar/23 10:25 Start Date: 27/Mar/23 10:25 Worklog Time Spent: 10m
Work Description: deniskuzZ commented on code in PR #4131: URL: https://github.com/apache/hive/pull/4131#discussion_r1149109172
## ql/src/java/org/apache/hadoop/hive/ql/stats/ColStatsProcessor.java:
## @@ -218,6 +218,9 @@ public int persistColumnStats(Hive db, Table tbl) throws HiveException, MetaException {
     }
     start = System.currentTimeMillis();
+    if(tbl != null && tbl.isNonNative() && tbl.getStorageHandler().canSetColStatistics()){
Review Comment: nit: space
Issue Time Tracking --- Worklog Id: (was: 853125) Time Spent: 5h 40m (was: 5.5h)
> Store hive columns stats in puffin files for iceberg tables
> ---
>
> Key: HIVE-27158
> URL: https://issues.apache.org/jira/browse/HIVE-27158
> Project: Hive
> Issue Type: Improvement
> Reporter: Simhadri Govindappa
> Assignee: Simhadri Govindappa
> Priority: Major
> Labels: pull-request-available
> Time Spent: 5h 40m
> Remaining Estimate: 0h
>
-- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Work logged] (HIVE-26655) VectorUDAFBloomFilterMerge should take care of safe batch handling when working in parallel
[ https://issues.apache.org/jira/browse/HIVE-26655?focusedWorklogId=853124&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-853124 ] ASF GitHub Bot logged work on HIVE-26655: - Author: ASF GitHub Bot Created on: 27/Mar/23 10:15 Start Date: 27/Mar/23 10:15 Worklog Time Spent: 10m
Work Description: sonarcloud[bot] commented on PR #4158: URL: https://github.com/apache/hive/pull/4158#issuecomment-1484880192
Kudos, SonarCloud Quality Gate passed!
- 0 Bugs (rated A)
- 0 Vulnerabilities (rated A)
- 0 Security Hotspots (rated A)
- 5 Code Smells (rated A)
- No Coverage information
- No Duplication information
Issue Time Tracking --- Worklog Id: (was: 853124) Time Spent: 20m (was: 10m)
> VectorUDAFBloomFilterMerge should take care of safe batch handling when
> working in parallel
> ---
>
> Key: HIVE-26655
> URL: https://issues.apache.org/jira/browse/HIVE-26655
> Project: Hive
> Issue Type: Sub-task
> Reporter: Sungwoo Park
> Assignee: László Bodor
> Priority: Major
> Labels: pull-request-available
> Time Spent: 20m
> Remaining Estimate: 0h
>
> When tested with 100GB ORC tables, the number of rows returned by query 17 is
> not stable. It returns fewer rows than the correct result (55 rows).
>
-- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Work logged] (HIVE-27158) Store hive columns stats in puffin files for iceberg tables
[ https://issues.apache.org/jira/browse/HIVE-27158?focusedWorklogId=853121&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-853121 ] ASF GitHub Bot logged work on HIVE-27158: - Author: ASF GitHub Bot Created on: 27/Mar/23 09:30 Start Date: 27/Mar/23 09:30 Worklog Time Spent: 10m
Work Description: deniskuzZ commented on code in PR #4131: URL: https://github.com/apache/hive/pull/4131#discussion_r1149037383
## iceberg/iceberg-handler/src/main/java/org/apache/iceberg/mr/hive/HiveIcebergStorageHandler.java:
## @@ -349,6 +365,96 @@ public Map<String, String> getBasicStatistics(Partish partish) {
     return stats;
   }
+
+  @Override
+  public boolean canSetColStatistics() {
+    String statsSource = HiveConf.getVar(conf, HiveConf.ConfVars.HIVE_USE_STATS_FROM).toLowerCase();
+    return statsSource.equals(PUFFIN);
+  }
+
+  @Override
+  public boolean canProvideColStatistics(org.apache.hadoop.hive.ql.metadata.Table tbl) {
+    org.apache.hadoop.hive.ql.metadata.Table hmsTable = tbl;
+    TableDesc tableDesc = Utilities.getTableDesc(hmsTable);
+    Table table = Catalogs.loadTable(conf, tableDesc.getProperties());
+    if (table.currentSnapshot() != null) {
+      String statsSource = HiveConf.getVar(conf, HiveConf.ConfVars.HIVE_COL_STATS_SOURCE).toLowerCase();
+      String statsPath = table.location() + STATS + table.name() + table.currentSnapshot().snapshotId();
+      if (statsSource.equals(PUFFIN)) {
+        try (FileSystem fs = new Path(table.location()).getFileSystem(conf)) {
+          if (fs.exists(new Path(statsPath))) {
+            return true;
+          }
+        } catch (IOException e) {
+          LOG.warn(e.getMessage());
+        }
+      }
+    }
+    return false;
+  }
+
+  @Override
+  public List<ColumnStatisticsObj> getColStatistics(org.apache.hadoop.hive.ql.metadata.Table tbl) {
+    org.apache.hadoop.hive.ql.metadata.Table hmsTable = tbl;
+    TableDesc tableDesc = Utilities.getTableDesc(hmsTable);
+    Table table = Catalogs.loadTable(conf, tableDesc.getProperties());
+    String statsSource = HiveConf.getVar(conf, HiveConf.ConfVars.HIVE_COL_STATS_SOURCE).toLowerCase();
+    switch (statsSource) {
+      case ICEBERG:
+        // Place holder for iceberg stats
+        break;
+      case PUFFIN:
+        String snapshotId = table.name() + table.currentSnapshot().snapshotId();
+        String statsPath = table.location() + STATS + snapshotId;
+        LOG.info("Using stats from puffin file at:" + statsPath);
+        try (PuffinReader reader = Puffin.read(table.io().newInputFile(statsPath)).build()) {
+          BlobMetadata blobMetadata = reader.fileMetadata().blobs().get(0);
+          Map<BlobMetadata, List<ColumnStatistics>> collect =
+              Streams.stream(reader.readAll(ImmutableList.of(blobMetadata))).collect(Collectors.toMap(Pair::first,
+                  blobMetadataByteBufferPair -> SerializationUtils.deserialize(
+                      ByteBuffers.toByteArray(blobMetadataByteBufferPair.second()))));
+          return collect.entrySet().stream().iterator().next().getValue().get(0).getStatsObj();
+        } catch (IOException e) {
+          LOG.info(String.valueOf(e));
+        }
+        break;
+      default:
+        // fall back to metastore
+    }
+    return null;
+  }
+
+  @Override
+  public boolean setColStatistics(org.apache.hadoop.hive.ql.metadata.Table table,
+      List<ColumnStatistics> colStats) {
+    TableDesc tableDesc = Utilities.getTableDesc(table);
+    Table tbl = Catalogs.loadTable(conf, tableDesc.getProperties());
+    String snapshotId = tbl.name() + tbl.currentSnapshot().snapshotId();
+    byte[] serializeColStats = SerializationUtils.serialize((Serializable) colStats);
+
+    try (PuffinWriter writer = Puffin.write(tbl.io().newOutputFile(tbl.location() + STATS + snapshotId))
Review Comment: what about partition-level stats? if the table is partitioned you need to append the partition stats as well
Issue Time Tracking --- Worklog Id: (was: 853121) Time Spent: 5.5h (was: 5h 20m)
> Store hive columns stats in puffin files for iceberg tables
> ---
>
> Key: HIVE-27158
> URL: https://issues.apache.org/jira/browse/HIVE-27158
> Project: Hive
> Issue Type: Improvement
> Reporter: Simhadri Govindappa
> Assignee: Simhadri Govindappa
> Priority: Major
> Labels: pull-request-available
> Time Spent: 5.5h
> Remaining Estimate: 0h
>
-- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Work logged] (HIVE-27158) Store hive columns stats in puffin files for iceberg tables
[ https://issues.apache.org/jira/browse/HIVE-27158?focusedWorklogId=853119&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-853119 ] ASF GitHub Bot logged work on HIVE-27158: - Author: ASF GitHub Bot Created on: 27/Mar/23 09:24 Start Date: 27/Mar/23 09:24 Worklog Time Spent: 10m
Work Description: deniskuzZ commented on code in PR #4131: URL: https://github.com/apache/hive/pull/4131#discussion_r1149035181
## iceberg/iceberg-handler/src/main/java/org/apache/iceberg/mr/hive/HiveIcebergStorageHandler.java:
## @@ -349,6 +365,96 @@ public Map<String, String> getBasicStatistics(Partish partish) {
     return stats;
   }
+
+  @Override
+  public boolean canSetColStatistics() {
+    String statsSource = HiveConf.getVar(conf, HiveConf.ConfVars.HIVE_USE_STATS_FROM).toLowerCase();
+    return statsSource.equals(PUFFIN);
+  }
+
+  @Override
+  public boolean canProvideColStatistics(org.apache.hadoop.hive.ql.metadata.Table tbl) {
+    org.apache.hadoop.hive.ql.metadata.Table hmsTable = tbl;
+    TableDesc tableDesc = Utilities.getTableDesc(hmsTable);
+    Table table = Catalogs.loadTable(conf, tableDesc.getProperties());
+    if (table.currentSnapshot() != null) {
+      String statsSource = HiveConf.getVar(conf, HiveConf.ConfVars.HIVE_COL_STATS_SOURCE).toLowerCase();
+      String statsPath = table.location() + STATS + table.name() + table.currentSnapshot().snapshotId();
+      if (statsSource.equals(PUFFIN)) {
+        try (FileSystem fs = new Path(table.location()).getFileSystem(conf)) {
+          if (fs.exists(new Path(statsPath))) {
+            return true;
+          }
+        } catch (IOException e) {
+          LOG.warn(e.getMessage());
+        }
+      }
+    }
+    return false;
+  }
+
+  @Override
+  public List<ColumnStatisticsObj> getColStatistics(org.apache.hadoop.hive.ql.metadata.Table tbl) {
+    org.apache.hadoop.hive.ql.metadata.Table hmsTable = tbl;
+    TableDesc tableDesc = Utilities.getTableDesc(hmsTable);
+    Table table = Catalogs.loadTable(conf, tableDesc.getProperties());
+    String statsSource = HiveConf.getVar(conf, HiveConf.ConfVars.HIVE_COL_STATS_SOURCE).toLowerCase();
+    switch (statsSource) {
+      case ICEBERG:
+        // Place holder for iceberg stats
+        break;
+      case PUFFIN:
+        String snapshotId = table.name() + table.currentSnapshot().snapshotId();
+        String statsPath = table.location() + STATS + snapshotId;
+        LOG.info("Using stats from puffin file at:" + statsPath);
+        try (PuffinReader reader = Puffin.read(table.io().newInputFile(statsPath)).build()) {
+          BlobMetadata blobMetadata = reader.fileMetadata().blobs().get(0);
+          Map<BlobMetadata, List<ColumnStatistics>> collect =
+              Streams.stream(reader.readAll(ImmutableList.of(blobMetadata))).collect(Collectors.toMap(Pair::first,
+                  blobMetadataByteBufferPair -> SerializationUtils.deserialize(
+                      ByteBuffers.toByteArray(blobMetadataByteBufferPair.second()))));
+          return collect.entrySet().stream().iterator().next().getValue().get(0).getStatsObj();
+        } catch (IOException e) {
+          LOG.info(String.valueOf(e));
+        }
+        break;
+      default:
+        // fall back to metastore
+    }
+    return null;
+  }
+
+  @Override
+  public boolean setColStatistics(org.apache.hadoop.hive.ql.metadata.Table table,
+      List<ColumnStatistics> colStats) {
+    TableDesc tableDesc = Utilities.getTableDesc(table);
+    Table tbl = Catalogs.loadTable(conf, tableDesc.getProperties());
+    String snapshotId = tbl.name() + tbl.currentSnapshot().snapshotId();
+    byte[] serializeColStats = SerializationUtils.serialize((Serializable) colStats);
+
+    try (PuffinWriter writer = Puffin.write(tbl.io().newOutputFile(tbl.location() + STATS + snapshotId))
+        .createdBy("Hive").build()) {
+      writer.add(
+          new Blob(
+              tbl.name() + "-" + snapshotId,
+              ImmutableList.of(1),
+              tbl.currentSnapshot().snapshotId(),
+              tbl.currentSnapshot().sequenceNumber(),
+              ByteBuffer.wrap(serializeColStats),
+              PuffinCompressionCodec.NONE,
+              ImmutableMap.of()));
+      writer.finish();
+    } catch (IOException e) {
+      LOG.info(String.valueOf(e));
Review Comment: do not swallow exception
Issue Time Tracking --- Worklog Id: (was: 853119) Time Spent: 5h 10m (was: 5h)
> Store hive columns stats in puffin files for iceberg tables
> ---
>
> Key: HIVE-27158
> URL:
https://issues.apache.org/jira/browse/HIVE-27158 > Project: Hive > Issue Type: Improvement >Reporter: Simhadri Govindappa >Assignee: Simhadri Govindappa >Priority: Major > Labels: pull-request-available >
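The "do not swallow exception" review comment above points at a `catch (IOException e) { LOG.info(...) }` that silently turns a failed stats write into an apparently successful call. The usual remedy is to rethrow, for example wrapped in `UncheckedIOException`, so the failure propagates with its cause attached. A small self-contained sketch of the two patterns (method names hypothetical, not the Hive code):

```java
import java.io.IOException;
import java.io.UncheckedIOException;

public class RethrowDemo {

    // Swallowing: the caller cannot tell the write failed.
    static boolean writeSwallowing() {
        try {
            throw new IOException("disk full");
        } catch (IOException e) {
            System.err.println(e.getMessage()); // logged and forgotten
        }
        return true; // reports success despite the failure
    }

    // Rethrowing: the failure propagates with its cause attached.
    static boolean writeRethrowing() {
        try {
            throw new IOException("disk full");
        } catch (IOException e) {
            throw new UncheckedIOException("failed to write stats file", e);
        }
    }

    public static void main(String[] args) {
        System.out.println(writeSwallowing()); // prints true, masking the error
        try {
            writeRethrowing();
        } catch (UncheckedIOException e) {
            System.out.println(e.getCause().getMessage()); // prints disk full
        }
    }
}
```

With the rethrowing variant, a caller such as the stats persister can no longer mistake a lost Puffin file for a committed one.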
[jira] [Work logged] (HIVE-26655) VectorUDAFBloomFilterMerge should take care of safe batch handling when working in parallel
[ https://issues.apache.org/jira/browse/HIVE-26655?focusedWorklogId=853118&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-853118 ] ASF GitHub Bot logged work on HIVE-26655: - Author: ASF GitHub Bot Created on: 27/Mar/23 09:23 Start Date: 27/Mar/23 09:23 Worklog Time Spent: 10m
Work Description: abstractdog opened a new pull request, #4158: URL: https://github.com/apache/hive/pull/4158
### What changes were proposed in this pull request?
### Why are the changes needed?
### Does this PR introduce _any_ user-facing change?
### How was this patch tested?
Issue Time Tracking --- Worklog Id: (was: 853118) Remaining Estimate: 0h Time Spent: 10m
> VectorUDAFBloomFilterMerge should take care of safe batch handling when
> working in parallel
> ---
>
> Key: HIVE-26655
> URL: https://issues.apache.org/jira/browse/HIVE-26655
> Project: Hive
> Issue Type: Sub-task
> Reporter: Sungwoo Park
> Assignee: László Bodor
> Priority: Major
> Time Spent: 10m
> Remaining Estimate: 0h
>
> When tested with 100GB ORC tables, the number of rows returned by query 17 is
> not stable. It returns fewer rows than the correct result (55 rows).
>
-- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Updated] (HIVE-26655) VectorUDAFBloomFilterMerge should take care of safe batch handling when working in parallel
[ https://issues.apache.org/jira/browse/HIVE-26655?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated HIVE-26655: -- Labels: pull-request-available (was: ) > VectorUDAFBloomFilterMerge should take care of safe batch handling when > working in parallel > --- > > Key: HIVE-26655 > URL: https://issues.apache.org/jira/browse/HIVE-26655 > Project: Hive > Issue Type: Sub-task >Reporter: Sungwoo Park >Assignee: László Bodor >Priority: Major > Labels: pull-request-available > Time Spent: 10m > Remaining Estimate: 0h > > When tested with 100GB ORC tables, the number of rows returned by query 17 is > not stable. It returns fewer rows than the correct result (55 rows). > -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Work logged] (HIVE-27158) Store hive columns stats in puffin files for iceberg tables
[ https://issues.apache.org/jira/browse/HIVE-27158?focusedWorklogId=853117=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-853117 ] ASF GitHub Bot logged work on HIVE-27158: - Author: ASF GitHub Bot Created on: 27/Mar/23 09:22 Start Date: 27/Mar/23 09:22 Worklog Time Spent: 10m Work Description: deniskuzZ commented on code in PR #4131: URL: https://github.com/apache/hive/pull/4131#discussion_r1149033188 ## iceberg/iceberg-handler/src/main/java/org/apache/iceberg/mr/hive/HiveIcebergStorageHandler.java: ## @@ -349,6 +365,96 @@ public Map getBasicStatistics(Partish partish) { return stats; } + + @Override + public boolean canSetColStatistics() { +String statsSource = HiveConf.getVar(conf, HiveConf.ConfVars.HIVE_USE_STATS_FROM).toLowerCase(); +return statsSource.equals(PUFFIN); + } + + @Override + public boolean canProvideColStatistics(org.apache.hadoop.hive.ql.metadata.Table tbl) { + +org.apache.hadoop.hive.ql.metadata.Table hmsTable = tbl; +TableDesc tableDesc = Utilities.getTableDesc(hmsTable); +Table table = Catalogs.loadTable(conf, tableDesc.getProperties()); +if (table.currentSnapshot() != null) { + String statsSource = HiveConf.getVar(conf, HiveConf.ConfVars.HIVE_COL_STATS_SOURCE).toLowerCase(); + String statsPath = table.location() + STATS + table.name() + table.currentSnapshot().snapshotId(); + if (statsSource.equals(PUFFIN)) { +try (FileSystem fs = new Path(table.location()).getFileSystem(conf)) { + if (fs.exists(new Path(statsPath))) { +return true; + } +} catch (IOException e) { + LOG.warn(e.getMessage()); +} + } +} +return false; + } + + @Override + public List getColStatistics(org.apache.hadoop.hive.ql.metadata.Table tbl) { + +org.apache.hadoop.hive.ql.metadata.Table hmsTable = tbl; +TableDesc tableDesc = Utilities.getTableDesc(hmsTable); +Table table = Catalogs.loadTable(conf, tableDesc.getProperties()); +String statsSource = HiveConf.getVar(conf, HiveConf.ConfVars.HIVE_COL_STATS_SOURCE).toLowerCase(); +switch (statsSource) { + 
case ICEBERG: +// Place holder for iceberg stats +break; + case PUFFIN: +String snapshotId = table.name() + table.currentSnapshot().snapshotId(); +String statsPath = table.location() + STATS + snapshotId; +LOG.info("Using stats from puffin file at:" + statsPath); +try (PuffinReader reader = Puffin.read(table.io().newInputFile(statsPath)).build()) { + BlobMetadata blobMetadata = reader.fileMetadata().blobs().get(0); + Map<BlobMetadata, List<ColumnStatistics>> collect = + Streams.stream(reader.readAll(ImmutableList.of(blobMetadata))).collect(Collectors.toMap(Pair::first, + blobMetadataByteBufferPair -> SerializationUtils.deserialize( + ByteBuffers.toByteArray(blobMetadataByteBufferPair.second())))); + + return collect.entrySet().stream().iterator().next().getValue().get(0).getStatsObj(); +} catch (IOException e) { + LOG.info(String.valueOf(e)); +} +break; + default: +// fall back to metastore +} +return null; + } + + + @Override + public boolean setColStatistics(org.apache.hadoop.hive.ql.metadata.Table table, + List<ColumnStatistics> colStats) { +TableDesc tableDesc = Utilities.getTableDesc(table); +Table tbl = Catalogs.loadTable(conf, tableDesc.getProperties()); +String snapshotId = tbl.name() + tbl.currentSnapshot().snapshotId(); +byte[] serializeColStats = SerializationUtils.serialize((Serializable) colStats); Review Comment: should we check for NULL here? Issue Time Tracking --- Worklog Id: (was: 853117) Time Spent: 5h (was: 4h 50m) > Store hive columns stats in puffin files for iceberg tables > --- > > Key: HIVE-27158 > URL: https://issues.apache.org/jira/browse/HIVE-27158 > Project: Hive > Issue Type: Improvement >Reporter: Simhadri Govindappa >Assignee: Simhadri Govindappa >Priority: Major > Labels: pull-request-available > Time Spent: 5h > Remaining Estimate: 0h > -- This message was sent by Atlassian Jira (v8.20.10#820010)
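The null-check question above concerns `SerializationUtils.serialize((Serializable) colStats)`, which would throw on a null list. A self-contained sketch of one possible guard, using plain `java.io` serialization as a stand-in for commons-lang's `SerializationUtils` (the names and the skip-on-empty policy are assumptions, not the PR's code):

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;

public class NullSafeSerde {
    // Hypothetical null-safe version of the flagged serialize call: a null or
    // empty stats list yields null, signalling "nothing to persist" to the
    // caller instead of a NullPointerException inside serialization.
    static byte[] serialize(List<? extends Serializable> colStats) {
        if (colStats == null || colStats.isEmpty()) {
            return null; // skip writing a puffin blob entirely
        }
        ByteArrayOutputStream bos = new ByteArrayOutputStream();
        try (ObjectOutputStream oos = new ObjectOutputStream(bos)) {
            oos.writeObject(new ArrayList<>(colStats)); // ArrayList is Serializable
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bos.toByteArray();
    }

    public static void main(String[] args) {
        System.out.println(serialize(null));                    // null
        System.out.println(serialize(List.of("x")).length > 0); // true
    }
}
```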
[jira] [Work logged] (HIVE-27158) Store hive columns stats in puffin files for iceberg tables
[ https://issues.apache.org/jira/browse/HIVE-27158?focusedWorklogId=853115=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-853115 ] ASF GitHub Bot logged work on HIVE-27158: - Author: ASF GitHub Bot Created on: 27/Mar/23 09:20 Start Date: 27/Mar/23 09:20 Worklog Time Spent: 10m Work Description: deniskuzZ commented on code in PR #4131: URL: https://github.com/apache/hive/pull/4131#discussion_r1149030014 ## iceberg/iceberg-handler/src/main/java/org/apache/iceberg/mr/hive/HiveIcebergStorageHandler.java: ## @@ -349,6 +365,96 @@ public Map getBasicStatistics(Partish partish) { return stats; } + + @Override + public boolean canSetColStatistics() { +String statsSource = HiveConf.getVar(conf, HiveConf.ConfVars.HIVE_USE_STATS_FROM).toLowerCase(); +return statsSource.equals(PUFFIN); + } + + @Override + public boolean canProvideColStatistics(org.apache.hadoop.hive.ql.metadata.Table tbl) { + +org.apache.hadoop.hive.ql.metadata.Table hmsTable = tbl; +TableDesc tableDesc = Utilities.getTableDesc(hmsTable); +Table table = Catalogs.loadTable(conf, tableDesc.getProperties()); +if (table.currentSnapshot() != null) { + String statsSource = HiveConf.getVar(conf, HiveConf.ConfVars.HIVE_COL_STATS_SOURCE).toLowerCase(); + String statsPath = table.location() + STATS + table.name() + table.currentSnapshot().snapshotId(); + if (statsSource.equals(PUFFIN)) { +try (FileSystem fs = new Path(table.location()).getFileSystem(conf)) { + if (fs.exists(new Path(statsPath))) { +return true; + } +} catch (IOException e) { + LOG.warn(e.getMessage()); +} + } +} +return false; + } + + @Override + public List getColStatistics(org.apache.hadoop.hive.ql.metadata.Table tbl) { + +org.apache.hadoop.hive.ql.metadata.Table hmsTable = tbl; +TableDesc tableDesc = Utilities.getTableDesc(hmsTable); +Table table = Catalogs.loadTable(conf, tableDesc.getProperties()); +String statsSource = HiveConf.getVar(conf, HiveConf.ConfVars.HIVE_COL_STATS_SOURCE).toLowerCase(); +switch (statsSource) { + 
case ICEBERG: +// Place holder for iceberg stats +break; + case PUFFIN: +String snapshotId = table.name() + table.currentSnapshot().snapshotId(); +String statsPath = table.location() + STATS + snapshotId; +LOG.info("Using stats from puffin file at:" + statsPath); +try (PuffinReader reader = Puffin.read(table.io().newInputFile(statsPath)).build()) { + BlobMetadata blobMetadata = reader.fileMetadata().blobs().get(0); + Map> collect = + Streams.stream(reader.readAll(ImmutableList.of(blobMetadata))).collect(Collectors.toMap(Pair::first, + blobMetadataByteBufferPair -> SerializationUtils.deserialize( + ByteBuffers.toByteArray(blobMetadataByteBufferPair.second(); + + return collect.entrySet().stream().iterator().next().getValue().get(0).getStatsObj(); Review Comment: why do you need a map in the first place if you are only interested in the first value? Issue Time Tracking --- Worklog Id: (was: 853115) Time Spent: 4h 40m (was: 4.5h) > Store hive columns stats in puffin files for iceberg tables > --- > > Key: HIVE-27158 > URL: https://issues.apache.org/jira/browse/HIVE-27158 > Project: Hive > Issue Type: Improvement >Reporter: Simhadri Govindappa >Assignee: Simhadri Govindappa >Priority: Major > Labels: pull-request-available > Time Spent: 4h 40m > Remaining Estimate: 0h > -- This message was sent by Atlassian Jira (v8.20.10#820010)
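The reviewer's "why do you need a map" point can be sketched without any Iceberg dependency: when only the first (blobMetadata, payload) pair is consumed, collecting every pair into a `Map` and then iterating its entry set is wasted work. Below, `Map.Entry` stands in for Iceberg's `Pair` (an assumption for illustration only).

```java
import java.util.AbstractMap.SimpleEntry;
import java.util.List;
import java.util.Map;

public class FirstPair {
    // Take the first pair's value directly instead of materializing a Map
    // keyed by blob metadata that is immediately discarded.
    static String firstPayload(Iterable<? extends Map.Entry<String, String>> pairs) {
        for (Map.Entry<String, String> pair : pairs) {
            return pair.getValue(); // first pair wins; no Map needed
        }
        return null; // empty result: let the caller fall back to the metastore
    }

    public static void main(String[] args) {
        List<Map.Entry<String, String>> pairs = List.of(
            new SimpleEntry<>("blob-0", "col-stats"),
            new SimpleEntry<>("blob-1", "other"));
        System.out.println(firstPayload(pairs)); // col-stats
    }
}
```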
[jira] [Work logged] (HIVE-27158) Store hive columns stats in puffin files for iceberg tables
[ https://issues.apache.org/jira/browse/HIVE-27158?focusedWorklogId=853116=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-853116 ] ASF GitHub Bot logged work on HIVE-27158: - Author: ASF GitHub Bot Created on: 27/Mar/23 09:20 Start Date: 27/Mar/23 09:20 Worklog Time Spent: 10m Work Description: deniskuzZ commented on code in PR #4131: URL: https://github.com/apache/hive/pull/4131#discussion_r1149031483 ## iceberg/iceberg-handler/src/main/java/org/apache/iceberg/mr/hive/HiveIcebergStorageHandler.java: ## @@ -349,6 +365,96 @@ public Map getBasicStatistics(Partish partish) { return stats; } + + @Override + public boolean canSetColStatistics() { +String statsSource = HiveConf.getVar(conf, HiveConf.ConfVars.HIVE_USE_STATS_FROM).toLowerCase(); +return statsSource.equals(PUFFIN); + } + + @Override + public boolean canProvideColStatistics(org.apache.hadoop.hive.ql.metadata.Table tbl) { + +org.apache.hadoop.hive.ql.metadata.Table hmsTable = tbl; +TableDesc tableDesc = Utilities.getTableDesc(hmsTable); +Table table = Catalogs.loadTable(conf, tableDesc.getProperties()); +if (table.currentSnapshot() != null) { + String statsSource = HiveConf.getVar(conf, HiveConf.ConfVars.HIVE_COL_STATS_SOURCE).toLowerCase(); + String statsPath = table.location() + STATS + table.name() + table.currentSnapshot().snapshotId(); + if (statsSource.equals(PUFFIN)) { +try (FileSystem fs = new Path(table.location()).getFileSystem(conf)) { + if (fs.exists(new Path(statsPath))) { +return true; + } +} catch (IOException e) { + LOG.warn(e.getMessage()); +} + } +} +return false; + } + + @Override + public List getColStatistics(org.apache.hadoop.hive.ql.metadata.Table tbl) { + +org.apache.hadoop.hive.ql.metadata.Table hmsTable = tbl; +TableDesc tableDesc = Utilities.getTableDesc(hmsTable); +Table table = Catalogs.loadTable(conf, tableDesc.getProperties()); +String statsSource = HiveConf.getVar(conf, HiveConf.ConfVars.HIVE_COL_STATS_SOURCE).toLowerCase(); +switch (statsSource) { + 
case ICEBERG: +// Place holder for iceberg stats +break; + case PUFFIN: +String snapshotId = table.name() + table.currentSnapshot().snapshotId(); +String statsPath = table.location() + STATS + snapshotId; +LOG.info("Using stats from puffin file at:" + statsPath); +try (PuffinReader reader = Puffin.read(table.io().newInputFile(statsPath)).build()) { + BlobMetadata blobMetadata = reader.fileMetadata().blobs().get(0); + Map> collect = + Streams.stream(reader.readAll(ImmutableList.of(blobMetadata))).collect(Collectors.toMap(Pair::first, + blobMetadataByteBufferPair -> SerializationUtils.deserialize( + ByteBuffers.toByteArray(blobMetadataByteBufferPair.second(); + + return collect.entrySet().stream().iterator().next().getValue().get(0).getStatsObj(); +} catch (IOException e) { + LOG.info(String.valueOf(e)); Review Comment: why are we swallowing exception here? Issue Time Tracking --- Worklog Id: (was: 853116) Time Spent: 4h 50m (was: 4h 40m) > Store hive columns stats in puffin files for iceberg tables > --- > > Key: HIVE-27158 > URL: https://issues.apache.org/jira/browse/HIVE-27158 > Project: Hive > Issue Type: Improvement >Reporter: Simhadri Govindappa >Assignee: Simhadri Govindappa >Priority: Major > Labels: pull-request-available > Time Spent: 4h 50m > Remaining Estimate: 0h > -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Work logged] (HIVE-27158) Store hive columns stats in puffin files for iceberg tables
[ https://issues.apache.org/jira/browse/HIVE-27158?focusedWorklogId=853114=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-853114 ] ASF GitHub Bot logged work on HIVE-27158: - Author: ASF GitHub Bot Created on: 27/Mar/23 09:19 Start Date: 27/Mar/23 09:19 Worklog Time Spent: 10m Work Description: deniskuzZ commented on code in PR #4131: URL: https://github.com/apache/hive/pull/4131#discussion_r1149030014 ## iceberg/iceberg-handler/src/main/java/org/apache/iceberg/mr/hive/HiveIcebergStorageHandler.java: ## @@ -349,6 +365,96 @@ public Map getBasicStatistics(Partish partish) { return stats; } + + @Override + public boolean canSetColStatistics() { +String statsSource = HiveConf.getVar(conf, HiveConf.ConfVars.HIVE_USE_STATS_FROM).toLowerCase(); +return statsSource.equals(PUFFIN); + } + + @Override + public boolean canProvideColStatistics(org.apache.hadoop.hive.ql.metadata.Table tbl) { + +org.apache.hadoop.hive.ql.metadata.Table hmsTable = tbl; +TableDesc tableDesc = Utilities.getTableDesc(hmsTable); +Table table = Catalogs.loadTable(conf, tableDesc.getProperties()); +if (table.currentSnapshot() != null) { + String statsSource = HiveConf.getVar(conf, HiveConf.ConfVars.HIVE_COL_STATS_SOURCE).toLowerCase(); + String statsPath = table.location() + STATS + table.name() + table.currentSnapshot().snapshotId(); + if (statsSource.equals(PUFFIN)) { +try (FileSystem fs = new Path(table.location()).getFileSystem(conf)) { + if (fs.exists(new Path(statsPath))) { +return true; + } +} catch (IOException e) { + LOG.warn(e.getMessage()); +} + } +} +return false; + } + + @Override + public List getColStatistics(org.apache.hadoop.hive.ql.metadata.Table tbl) { + +org.apache.hadoop.hive.ql.metadata.Table hmsTable = tbl; +TableDesc tableDesc = Utilities.getTableDesc(hmsTable); +Table table = Catalogs.loadTable(conf, tableDesc.getProperties()); +String statsSource = HiveConf.getVar(conf, HiveConf.ConfVars.HIVE_COL_STATS_SOURCE).toLowerCase(); +switch (statsSource) { + 
case ICEBERG: +// Place holder for iceberg stats +break; + case PUFFIN: +String snapshotId = table.name() + table.currentSnapshot().snapshotId(); +String statsPath = table.location() + STATS + snapshotId; +LOG.info("Using stats from puffin file at:" + statsPath); +try (PuffinReader reader = Puffin.read(table.io().newInputFile(statsPath)).build()) { + BlobMetadata blobMetadata = reader.fileMetadata().blobs().get(0); + Map> collect = + Streams.stream(reader.readAll(ImmutableList.of(blobMetadata))).collect(Collectors.toMap(Pair::first, + blobMetadataByteBufferPair -> SerializationUtils.deserialize( + ByteBuffers.toByteArray(blobMetadataByteBufferPair.second(); + + return collect.entrySet().stream().iterator().next().getValue().get(0).getStatsObj(); Review Comment: why do you need a map in the first place? Issue Time Tracking --- Worklog Id: (was: 853114) Time Spent: 4.5h (was: 4h 20m) > Store hive columns stats in puffin files for iceberg tables > --- > > Key: HIVE-27158 > URL: https://issues.apache.org/jira/browse/HIVE-27158 > Project: Hive > Issue Type: Improvement >Reporter: Simhadri Govindappa >Assignee: Simhadri Govindappa >Priority: Major > Labels: pull-request-available > Time Spent: 4.5h > Remaining Estimate: 0h > -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Work logged] (HIVE-27158) Store hive columns stats in puffin files for iceberg tables
[ https://issues.apache.org/jira/browse/HIVE-27158?focusedWorklogId=853113=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-853113 ] ASF GitHub Bot logged work on HIVE-27158: - Author: ASF GitHub Bot Created on: 27/Mar/23 09:17 Start Date: 27/Mar/23 09:17 Worklog Time Spent: 10m Work Description: deniskuzZ commented on code in PR #4131: URL: https://github.com/apache/hive/pull/4131#discussion_r1149027518 ## iceberg/iceberg-handler/src/main/java/org/apache/iceberg/mr/hive/HiveIcebergStorageHandler.java: ## @@ -349,6 +365,96 @@ public Map getBasicStatistics(Partish partish) { return stats; } + + @Override + public boolean canSetColStatistics() { +String statsSource = HiveConf.getVar(conf, HiveConf.ConfVars.HIVE_USE_STATS_FROM).toLowerCase(); +return statsSource.equals(PUFFIN); + } + + @Override + public boolean canProvideColStatistics(org.apache.hadoop.hive.ql.metadata.Table tbl) { + +org.apache.hadoop.hive.ql.metadata.Table hmsTable = tbl; +TableDesc tableDesc = Utilities.getTableDesc(hmsTable); +Table table = Catalogs.loadTable(conf, tableDesc.getProperties()); +if (table.currentSnapshot() != null) { + String statsSource = HiveConf.getVar(conf, HiveConf.ConfVars.HIVE_COL_STATS_SOURCE).toLowerCase(); + String statsPath = table.location() + STATS + table.name() + table.currentSnapshot().snapshotId(); + if (statsSource.equals(PUFFIN)) { +try (FileSystem fs = new Path(table.location()).getFileSystem(conf)) { + if (fs.exists(new Path(statsPath))) { +return true; + } +} catch (IOException e) { + LOG.warn(e.getMessage()); +} + } +} +return false; + } + + @Override + public List getColStatistics(org.apache.hadoop.hive.ql.metadata.Table tbl) { + +org.apache.hadoop.hive.ql.metadata.Table hmsTable = tbl; +TableDesc tableDesc = Utilities.getTableDesc(hmsTable); +Table table = Catalogs.loadTable(conf, tableDesc.getProperties()); +String statsSource = HiveConf.getVar(conf, HiveConf.ConfVars.HIVE_COL_STATS_SOURCE).toLowerCase(); +switch (statsSource) { + 
case ICEBERG: +// Place holder for iceberg stats +break; + case PUFFIN: +String snapshotId = table.name() + table.currentSnapshot().snapshotId(); +String statsPath = table.location() + STATS + snapshotId; +LOG.info("Using stats from puffin file at:" + statsPath); +try (PuffinReader reader = Puffin.read(table.io().newInputFile(statsPath)).build()) { + BlobMetadata blobMetadata = reader.fileMetadata().blobs().get(0); + Map> collect = + Streams.stream(reader.readAll(ImmutableList.of(blobMetadata))).collect(Collectors.toMap(Pair::first, Review Comment: why do you need to wrap with ImmutableList.of(blobMetadata) ? Issue Time Tracking --- Worklog Id: (was: 853113) Time Spent: 4h 20m (was: 4h 10m) > Store hive columns stats in puffin files for iceberg tables > --- > > Key: HIVE-27158 > URL: https://issues.apache.org/jira/browse/HIVE-27158 > Project: Hive > Issue Type: Improvement >Reporter: Simhadri Govindappa >Assignee: Simhadri Govindappa >Priority: Major > Labels: pull-request-available > Time Spent: 4h 20m > Remaining Estimate: 0h > -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Work logged] (HIVE-27158) Store hive columns stats in puffin files for iceberg tables
[ https://issues.apache.org/jira/browse/HIVE-27158?focusedWorklogId=853112=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-853112 ] ASF GitHub Bot logged work on HIVE-27158: - Author: ASF GitHub Bot Created on: 27/Mar/23 09:16 Start Date: 27/Mar/23 09:16 Worklog Time Spent: 10m Work Description: deniskuzZ commented on code in PR #4131: URL: https://github.com/apache/hive/pull/4131#discussion_r1149025872 ## iceberg/iceberg-handler/src/main/java/org/apache/iceberg/mr/hive/HiveIcebergStorageHandler.java: ## @@ -349,6 +365,96 @@ public Map getBasicStatistics(Partish partish) { return stats; } + + @Override + public boolean canSetColStatistics() { +String statsSource = HiveConf.getVar(conf, HiveConf.ConfVars.HIVE_USE_STATS_FROM).toLowerCase(); +return statsSource.equals(PUFFIN); + } + + @Override + public boolean canProvideColStatistics(org.apache.hadoop.hive.ql.metadata.Table tbl) { + +org.apache.hadoop.hive.ql.metadata.Table hmsTable = tbl; +TableDesc tableDesc = Utilities.getTableDesc(hmsTable); +Table table = Catalogs.loadTable(conf, tableDesc.getProperties()); +if (table.currentSnapshot() != null) { + String statsSource = HiveConf.getVar(conf, HiveConf.ConfVars.HIVE_COL_STATS_SOURCE).toLowerCase(); + String statsPath = table.location() + STATS + table.name() + table.currentSnapshot().snapshotId(); + if (statsSource.equals(PUFFIN)) { +try (FileSystem fs = new Path(table.location()).getFileSystem(conf)) { + if (fs.exists(new Path(statsPath))) { +return true; + } +} catch (IOException e) { + LOG.warn(e.getMessage()); +} + } +} +return false; + } + + @Override + public List getColStatistics(org.apache.hadoop.hive.ql.metadata.Table tbl) { + +org.apache.hadoop.hive.ql.metadata.Table hmsTable = tbl; +TableDesc tableDesc = Utilities.getTableDesc(hmsTable); +Table table = Catalogs.loadTable(conf, tableDesc.getProperties()); +String statsSource = HiveConf.getVar(conf, HiveConf.ConfVars.HIVE_COL_STATS_SOURCE).toLowerCase(); +switch (statsSource) { + 
case ICEBERG: +// Place holder for iceberg stats +break; + case PUFFIN: +String snapshotId = table.name() + table.currentSnapshot().snapshotId(); +String statsPath = table.location() + STATS + snapshotId; +LOG.info("Using stats from puffin file at:" + statsPath); +try (PuffinReader reader = Puffin.read(table.io().newInputFile(statsPath)).build()) { + BlobMetadata blobMetadata = reader.fileMetadata().blobs().get(0); Review Comment: could it be empty (IndexOutOfBoundsException)? Issue Time Tracking --- Worklog Id: (was: 853112) Time Spent: 4h 10m (was: 4h) > Store hive columns stats in puffin files for iceberg tables > --- > > Key: HIVE-27158 > URL: https://issues.apache.org/jira/browse/HIVE-27158 > Project: Hive > Issue Type: Improvement >Reporter: Simhadri Govindappa >Assignee: Simhadri Govindappa >Priority: Major > Labels: pull-request-available > Time Spent: 4h 10m > Remaining Estimate: 0h > -- This message was sent by Atlassian Jira (v8.20.10#820010)
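The `IndexOutOfBoundsException` concern above is about `reader.fileMetadata().blobs().get(0)`: a puffin file with an empty blob list would blow up. A minimal guard, sketched over a plain `List<String>` rather than Iceberg's `BlobMetadata` (the fall-back-to-null policy is an assumption):

```java
import java.util.List;

public class BlobGuard {
    // Guard the blobs().get(0) access: an empty blob list should mean
    // "no stats available", not an IndexOutOfBoundsException.
    static String firstBlob(List<String> blobs) {
        if (blobs == null || blobs.isEmpty()) {
            return null; // caller falls back to metastore stats
        }
        return blobs.get(0);
    }

    public static void main(String[] args) {
        System.out.println(firstBlob(List.of()));           // null
        System.out.println(firstBlob(List.of("stats-v1"))); // stats-v1
    }
}
```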
[jira] [Work logged] (HIVE-27158) Store hive columns stats in puffin files for iceberg tables
[ https://issues.apache.org/jira/browse/HIVE-27158?focusedWorklogId=853111=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-853111 ] ASF GitHub Bot logged work on HIVE-27158: - Author: ASF GitHub Bot Created on: 27/Mar/23 09:13 Start Date: 27/Mar/23 09:13 Worklog Time Spent: 10m Work Description: deniskuzZ commented on code in PR #4131: URL: https://github.com/apache/hive/pull/4131#discussion_r1149023043 ## iceberg/iceberg-handler/src/main/java/org/apache/iceberg/mr/hive/HiveIcebergStorageHandler.java: ## @@ -349,6 +365,96 @@ public Map getBasicStatistics(Partish partish) { return stats; } + + @Override + public boolean canSetColStatistics() { +String statsSource = HiveConf.getVar(conf, HiveConf.ConfVars.HIVE_USE_STATS_FROM).toLowerCase(); +return statsSource.equals(PUFFIN); + } + + @Override + public boolean canProvideColStatistics(org.apache.hadoop.hive.ql.metadata.Table tbl) { + +org.apache.hadoop.hive.ql.metadata.Table hmsTable = tbl; +TableDesc tableDesc = Utilities.getTableDesc(hmsTable); +Table table = Catalogs.loadTable(conf, tableDesc.getProperties()); +if (table.currentSnapshot() != null) { + String statsSource = HiveConf.getVar(conf, HiveConf.ConfVars.HIVE_COL_STATS_SOURCE).toLowerCase(); + String statsPath = table.location() + STATS + table.name() + table.currentSnapshot().snapshotId(); + if (statsSource.equals(PUFFIN)) { +try (FileSystem fs = new Path(table.location()).getFileSystem(conf)) { + if (fs.exists(new Path(statsPath))) { +return true; + } +} catch (IOException e) { + LOG.warn(e.getMessage()); +} + } +} +return false; + } + + @Override + public List getColStatistics(org.apache.hadoop.hive.ql.metadata.Table tbl) { + +org.apache.hadoop.hive.ql.metadata.Table hmsTable = tbl; +TableDesc tableDesc = Utilities.getTableDesc(hmsTable); +Table table = Catalogs.loadTable(conf, tableDesc.getProperties()); +String statsSource = HiveConf.getVar(conf, HiveConf.ConfVars.HIVE_COL_STATS_SOURCE).toLowerCase(); +switch (statsSource) { + 
case ICEBERG: +// Place holder for iceberg stats +break; + case PUFFIN: +String snapshotId = table.name() + table.currentSnapshot().snapshotId(); +String statsPath = table.location() + STATS + snapshotId; +LOG.info("Using stats from puffin file at:" + statsPath); Review Comment: debug level? Issue Time Tracking --- Worklog Id: (was: 853111) Time Spent: 4h (was: 3h 50m) > Store hive columns stats in puffin files for iceberg tables > --- > > Key: HIVE-27158 > URL: https://issues.apache.org/jira/browse/HIVE-27158 > Project: Hive > Issue Type: Improvement >Reporter: Simhadri Govindappa >Assignee: Simhadri Govindappa >Priority: Major > Labels: pull-request-available > Time Spent: 4h > Remaining Estimate: 0h > -- This message was sent by Atlassian Jira (v8.20.10#820010)
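The "debug level?" remark targets `LOG.info("Using stats from puffin file at:" + statsPath)`: a per-query path message floods INFO logs. A sketch of the suggestion using `java.util.logging` (Hive itself uses SLF4J; this stand-in only illustrates the level choice and the guard that avoids building the message string when debug is disabled):

```java
import java.util.ArrayList;
import java.util.List;
import java.util.logging.Handler;
import java.util.logging.Level;
import java.util.logging.LogRecord;
import java.util.logging.Logger;

public class DebugLogDemo {
    static final List<String> CAPTURED = new ArrayList<>();
    static final Logger LOG = Logger.getLogger("iceberg.stats");

    public static void main(String[] args) {
        LOG.setUseParentHandlers(false);
        LOG.setLevel(Level.INFO); // production default: debug disabled
        LOG.addHandler(new Handler() {
            @Override public void publish(LogRecord r) { CAPTURED.add(r.getMessage()); }
            @Override public void flush() { }
            @Override public void close() { }
        });
        // Per the review: per-query messages go to debug (FINE), and the
        // level guard skips the string concatenation when debug is off.
        if (LOG.isLoggable(Level.FINE)) {
            LOG.fine("Using stats from puffin file at: " + "/tbl/stats/snap-1");
        }
        System.out.println(CAPTURED.size()); // 0: debug message suppressed
    }
}
```

With SLF4J the equivalent would be parameterized logging, `LOG.debug("Using stats from puffin file at: {}", statsPath)`, which defers message construction without an explicit guard.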
[jira] [Work logged] (HIVE-27158) Store hive columns stats in puffin files for iceberg tables
[ https://issues.apache.org/jira/browse/HIVE-27158?focusedWorklogId=853110=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-853110 ] ASF GitHub Bot logged work on HIVE-27158: - Author: ASF GitHub Bot Created on: 27/Mar/23 09:13 Start Date: 27/Mar/23 09:13 Worklog Time Spent: 10m Work Description: deniskuzZ commented on code in PR #4131: URL: https://github.com/apache/hive/pull/4131#discussion_r1149022350 ## iceberg/iceberg-handler/src/main/java/org/apache/iceberg/mr/hive/HiveIcebergStorageHandler.java: ## @@ -349,6 +365,96 @@ public Map getBasicStatistics(Partish partish) { return stats; } + + @Override + public boolean canSetColStatistics() { +String statsSource = HiveConf.getVar(conf, HiveConf.ConfVars.HIVE_USE_STATS_FROM).toLowerCase(); +return statsSource.equals(PUFFIN); + } + + @Override + public boolean canProvideColStatistics(org.apache.hadoop.hive.ql.metadata.Table tbl) { + +org.apache.hadoop.hive.ql.metadata.Table hmsTable = tbl; +TableDesc tableDesc = Utilities.getTableDesc(hmsTable); +Table table = Catalogs.loadTable(conf, tableDesc.getProperties()); +if (table.currentSnapshot() != null) { + String statsSource = HiveConf.getVar(conf, HiveConf.ConfVars.HIVE_COL_STATS_SOURCE).toLowerCase(); + String statsPath = table.location() + STATS + table.name() + table.currentSnapshot().snapshotId(); + if (statsSource.equals(PUFFIN)) { +try (FileSystem fs = new Path(table.location()).getFileSystem(conf)) { + if (fs.exists(new Path(statsPath))) { +return true; + } +} catch (IOException e) { + LOG.warn(e.getMessage()); +} + } +} +return false; + } + + @Override + public List getColStatistics(org.apache.hadoop.hive.ql.metadata.Table tbl) { + +org.apache.hadoop.hive.ql.metadata.Table hmsTable = tbl; +TableDesc tableDesc = Utilities.getTableDesc(hmsTable); +Table table = Catalogs.loadTable(conf, tableDesc.getProperties()); +String statsSource = HiveConf.getVar(conf, HiveConf.ConfVars.HIVE_COL_STATS_SOURCE).toLowerCase(); +switch (statsSource) { + 
case ICEBERG: +// Place holder for iceberg stats +break; + case PUFFIN: +String snapshotId = table.name() + table.currentSnapshot().snapshotId(); +String statsPath = table.location() + STATS + snapshotId; Review Comment: could we extract path construction into a helper method? ## iceberg/iceberg-handler/src/main/java/org/apache/iceberg/mr/hive/HiveIcebergStorageHandler.java: ## @@ -349,6 +365,96 @@ public Map getBasicStatistics(Partish partish) { return stats; } + + @Override + public boolean canSetColStatistics() { +String statsSource = HiveConf.getVar(conf, HiveConf.ConfVars.HIVE_USE_STATS_FROM).toLowerCase(); +return statsSource.equals(PUFFIN); + } + + @Override + public boolean canProvideColStatistics(org.apache.hadoop.hive.ql.metadata.Table tbl) { + +org.apache.hadoop.hive.ql.metadata.Table hmsTable = tbl; +TableDesc tableDesc = Utilities.getTableDesc(hmsTable); +Table table = Catalogs.loadTable(conf, tableDesc.getProperties()); +if (table.currentSnapshot() != null) { + String statsSource = HiveConf.getVar(conf, HiveConf.ConfVars.HIVE_COL_STATS_SOURCE).toLowerCase(); + String statsPath = table.location() + STATS + table.name() + table.currentSnapshot().snapshotId(); + if (statsSource.equals(PUFFIN)) { +try (FileSystem fs = new Path(table.location()).getFileSystem(conf)) { + if (fs.exists(new Path(statsPath))) { +return true; + } +} catch (IOException e) { + LOG.warn(e.getMessage()); +} + } +} +return false; + } + + @Override + public List getColStatistics(org.apache.hadoop.hive.ql.metadata.Table tbl) { + +org.apache.hadoop.hive.ql.metadata.Table hmsTable = tbl; +TableDesc tableDesc = Utilities.getTableDesc(hmsTable); +Table table = Catalogs.loadTable(conf, tableDesc.getProperties()); +String statsSource = HiveConf.getVar(conf, HiveConf.ConfVars.HIVE_COL_STATS_SOURCE).toLowerCase(); +switch (statsSource) { + case ICEBERG: +// Place holder for iceberg stats +break; + case PUFFIN: +String snapshotId = table.name() + table.currentSnapshot().snapshotId(); +String 
statsPath = table.location() + STATS + snapshotId; Review Comment: could we extract path construction into a helper method and reuse? Issue Time Tracking --- Worklog Id: (was: 853110) Time Spent: 3h 50m (was: 3h 40m) > Store hive columns stats in puffin files for iceberg tables > --- > > Key: HIVE-27158 > URL:
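The helper-method suggestion above addresses the stats path being assembled inline in at least three places (`canProvideColStatistics`, `getColStatistics`, `setColStatistics`). A sketch of such a helper; the `/stats/` literal stands in for the handler's `STATS` constant and the method name is an assumption:

```java
public class StatsPaths {
    // Hypothetical helper centralizing the duplicated path construction:
    // <table location>/stats/<table name><snapshot id>
    static String statsPath(String tableLocation, String tableName, long snapshotId) {
        return tableLocation + "/stats/" + tableName + snapshotId;
    }

    public static void main(String[] args) {
        System.out.println(statsPath("s3://warehouse/db/tbl", "db.tbl", 42L));
        // s3://warehouse/db/tbl/stats/db.tbl42
    }
}
```

Beyond deduplication, a single helper guarantees that the writer and the readers can never drift apart on the path format.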
[jira] [Work logged] (HIVE-27158) Store hive columns stats in puffin files for iceberg tables
[ https://issues.apache.org/jira/browse/HIVE-27158?focusedWorklogId=853108=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-853108 ] ASF GitHub Bot logged work on HIVE-27158: - Author: ASF GitHub Bot Created on: 27/Mar/23 09:09 Start Date: 27/Mar/23 09:09 Worklog Time Spent: 10m Work Description: deniskuzZ commented on code in PR #4131: URL: https://github.com/apache/hive/pull/4131#discussion_r1149015944 ## iceberg/iceberg-handler/src/main/java/org/apache/iceberg/mr/hive/HiveIcebergStorageHandler.java: ## @@ -349,6 +365,96 @@ public Map getBasicStatistics(Partish partish) { return stats; } + + @Override + public boolean canSetColStatistics() { +String statsSource = HiveConf.getVar(conf, HiveConf.ConfVars.HIVE_USE_STATS_FROM).toLowerCase(); +return statsSource.equals(PUFFIN); + } + + @Override + public boolean canProvideColStatistics(org.apache.hadoop.hive.ql.metadata.Table tbl) { + +org.apache.hadoop.hive.ql.metadata.Table hmsTable = tbl; +TableDesc tableDesc = Utilities.getTableDesc(hmsTable); +Table table = Catalogs.loadTable(conf, tableDesc.getProperties()); +if (table.currentSnapshot() != null) { + String statsSource = HiveConf.getVar(conf, HiveConf.ConfVars.HIVE_COL_STATS_SOURCE).toLowerCase(); + String statsPath = table.location() + STATS + table.name() + table.currentSnapshot().snapshotId(); + if (statsSource.equals(PUFFIN)) { +try (FileSystem fs = new Path(table.location()).getFileSystem(conf)) { + if (fs.exists(new Path(statsPath))) { +return true; + } +} catch (IOException e) { + LOG.warn(e.getMessage()); +} + } +} +return false; + } + + @Override + public List getColStatistics(org.apache.hadoop.hive.ql.metadata.Table tbl) { + +org.apache.hadoop.hive.ql.metadata.Table hmsTable = tbl; +TableDesc tableDesc = Utilities.getTableDesc(hmsTable); +Table table = Catalogs.loadTable(conf, tableDesc.getProperties()); +String statsSource = HiveConf.getVar(conf, HiveConf.ConfVars.HIVE_COL_STATS_SOURCE).toLowerCase(); Review Comment: enum? 
Issue Time Tracking --- Worklog Id: (was: 853108) Time Spent: 3.5h (was: 3h 20m) > Store hive columns stats in puffin files for iceberg tables > --- > > Key: HIVE-27158 > URL: https://issues.apache.org/jira/browse/HIVE-27158 > Project: Hive > Issue Type: Improvement >Reporter: Simhadri Govindappa >Assignee: Simhadri Govindappa >Priority: Major > Labels: pull-request-available > Time Spent: 3.5h > Remaining Estimate: 0h > -- This message was sent by Atlassian Jira (v8.20.10#820010)
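The "enum?" comment above suggests replacing the repeated lowercase string comparisons on the config value with a typed constant. A self-contained sketch of what that could look like; the enum name, its constants, and the fall-back-to-metastore default are assumptions for illustration:

```java
public class StatsSourceDemo {
    // Hypothetical enum for the stats-source config value, replacing
    // statsSource.equals(PUFFIN)-style string comparisons in each method.
    enum StatsSource {
        METASTORE, PUFFIN, ICEBERG;

        static StatsSource from(String configValue) {
            try {
                return valueOf(configValue.trim().toUpperCase());
            } catch (IllegalArgumentException e) {
                return METASTORE; // unknown value: fall back to HMS stats
            }
        }
    }

    public static void main(String[] args) {
        System.out.println(StatsSource.from("puffin")); // PUFFIN
        System.out.println(StatsSource.from("bogus"));  // METASTORE
    }
}
```

An enum also lets the existing `switch (statsSource)` dispatch become exhaustive, so adding a new source is a compile-time reminder rather than a silent fall-through.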
[jira] [Work logged] (HIVE-27158) Store hive columns stats in puffin files for iceberg tables
[ https://issues.apache.org/jira/browse/HIVE-27158?focusedWorklogId=853109=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-853109 ] ASF GitHub Bot logged work on HIVE-27158: - Author: ASF GitHub Bot Created on: 27/Mar/23 09:09 Start Date: 27/Mar/23 09:09 Worklog Time Spent: 10m Work Description: deniskuzZ commented on code in PR #4131: URL: https://github.com/apache/hive/pull/4131#discussion_r1149016455 ## iceberg/iceberg-handler/src/main/java/org/apache/iceberg/mr/hive/HiveIcebergStorageHandler.java: ## @@ -349,6 +365,96 @@ public Map getBasicStatistics(Partish partish) { return stats; } + + @Override + public boolean canSetColStatistics() { +String statsSource = HiveConf.getVar(conf, HiveConf.ConfVars.HIVE_USE_STATS_FROM).toLowerCase(); +return statsSource.equals(PUFFIN); + } + + @Override + public boolean canProvideColStatistics(org.apache.hadoop.hive.ql.metadata.Table tbl) { + +org.apache.hadoop.hive.ql.metadata.Table hmsTable = tbl; +TableDesc tableDesc = Utilities.getTableDesc(hmsTable); +Table table = Catalogs.loadTable(conf, tableDesc.getProperties()); +if (table.currentSnapshot() != null) { + String statsSource = HiveConf.getVar(conf, HiveConf.ConfVars.HIVE_COL_STATS_SOURCE).toLowerCase(); + String statsPath = table.location() + STATS + table.name() + table.currentSnapshot().snapshotId(); + if (statsSource.equals(PUFFIN)) { +try (FileSystem fs = new Path(table.location()).getFileSystem(conf)) { + if (fs.exists(new Path(statsPath))) { +return true; + } +} catch (IOException e) { + LOG.warn(e.getMessage()); +} + } +} +return false; + } + + @Override + public List getColStatistics(org.apache.hadoop.hive.ql.metadata.Table tbl) { + +org.apache.hadoop.hive.ql.metadata.Table hmsTable = tbl; Review Comment: why is this needed? 
Issue Time Tracking --- Worklog Id: (was: 853109) Time Spent: 3h 40m (was: 3.5h) > Store hive columns stats in puffin files for iceberg tables > --- > > Key: HIVE-27158 > URL: https://issues.apache.org/jira/browse/HIVE-27158 > Project: Hive > Issue Type: Improvement >Reporter: Simhadri Govindappa >Assignee: Simhadri Govindappa >Priority: Major > Labels: pull-request-available > Time Spent: 3h 40m > Remaining Estimate: 0h > -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Work logged] (HIVE-27178) Backport of HIVE-23321 to branch-3
[ https://issues.apache.org/jira/browse/HIVE-27178?focusedWorklogId=853107=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-853107 ] ASF GitHub Bot logged work on HIVE-27178: - Author: ASF GitHub Bot Created on: 27/Mar/23 09:08 Start Date: 27/Mar/23 09:08 Worklog Time Spent: 10m Work Description: amanraj2520 commented on PR #4157: URL: https://github.com/apache/hive/pull/4157#issuecomment-1484779655 @vihangk1 Can you please approve and merge this. This is a part of the fix for sysdb.q. The other part is fixed in #4156 Issue Time Tracking --- Worklog Id: (was: 853107) Time Spent: 20m (was: 10m) > Backport of HIVE-23321 to branch-3 > -- > > Key: HIVE-27178 > URL: https://issues.apache.org/jira/browse/HIVE-27178 > Project: Hive > Issue Type: Sub-task >Reporter: Aman Raj >Assignee: Aman Raj >Priority: Major > Labels: pull-request-available > Time Spent: 20m > Remaining Estimate: 0h > > Current branch-3 fails with the diff in select count(*) from > skewed_string_list and select count(*) from skewed_string_list_values. > Jenkins run : [jenkins / hive-precommit / PR-4156 / #1 > (apache.org)|http://ci.hive.apache.org/blue/organizations/jenkins/hive-precommit/detail/PR-4156/1/tests/] > Diff : > Client Execution succeeded but contained differences (error code = 1) after > executing sysdb.q > 3740d3739 > < hdfs://### HDFS PATH ### default public ROLE > 4036c4035 > < 3 > --- > > 6 > 4045c4044 > < 3 > --- > > 6 > > This ticket tries to fix this diff. Please read the description of this > ticket for the exact reason. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Work logged] (HIVE-27158) Store hive columns stats in puffin files for iceberg tables
[ https://issues.apache.org/jira/browse/HIVE-27158?focusedWorklogId=853106=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-853106 ] ASF GitHub Bot logged work on HIVE-27158: - Author: ASF GitHub Bot Created on: 27/Mar/23 09:07 Start Date: 27/Mar/23 09:07 Worklog Time Spent: 10m Work Description: deniskuzZ commented on code in PR #4131: URL: https://github.com/apache/hive/pull/4131#discussion_r1148989152 ## common/src/java/org/apache/hadoop/hive/conf/HiveConf.java: ## @@ -2207,6 +2207,8 @@ public static enum ConfVars { "Whether to use codec pool in ORC. Disable if there are bugs with codec reuse."), HIVE_USE_STATS_FROM("hive.use.stats.from","iceberg","Use stats from iceberg table snapshot for query " + "planning. This has three values metastore, puffin and iceberg"), +HIVE_COL_STATS_SOURCE("hive.col.stats.source","metastore","Use stats from puffin file for query " + Review Comment: - do we support `iceberg` mode? - please update the description - "Use stats from the selected source for query planning" what's the difference between `HIVE_USE_STATS_FROM` && `HIVE_COL_STATS_SOURCE `? it doesn't seem to be a generic config, should we add an ICEBERG prefix? Issue Time Tracking --- Worklog Id: (was: 853106) Time Spent: 3h 20m (was: 3h 10m) > Store hive columns stats in puffin files for iceberg tables > --- > > Key: HIVE-27158 > URL: https://issues.apache.org/jira/browse/HIVE-27158 > Project: Hive > Issue Type: Improvement >Reporter: Simhadri Govindappa >Assignee: Simhadri Govindappa >Priority: Major > Labels: pull-request-available > Time Spent: 3h 20m > Remaining Estimate: 0h > -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Work started] (HIVE-26655) VectorUDAFBloomFilterMerge should take care of safe batch handling when working in parallel
[ https://issues.apache.org/jira/browse/HIVE-26655?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Work on HIVE-26655 started by László Bodor. --- > VectorUDAFBloomFilterMerge should take care of safe batch handling when > working in parallel > --- > > Key: HIVE-26655 > URL: https://issues.apache.org/jira/browse/HIVE-26655 > Project: Hive > Issue Type: Sub-task >Reporter: Sungwoo Park >Assignee: László Bodor >Priority: Major > > When tested with 100GB ORC tables, the number of rows returned by query 17 is > not stable. It returns fewer rows than the correct result (55 rows). > -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Updated] (HIVE-27057) Revert "HIVE-21741 Backport HIVE-20221 & related fix HIVE-20833 to branch-3"
[ https://issues.apache.org/jira/browse/HIVE-27057?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Aman Raj updated HIVE-27057: Summary: Revert "HIVE-21741 Backport HIVE-20221 & related fix HIVE-20833 to branch-3" (was: Test fix for sysdb.q) > Revert "HIVE-21741 Backport HIVE-20221 & related fix HIVE-20833 to branch-3" > > > Key: HIVE-27057 > URL: https://issues.apache.org/jira/browse/HIVE-27057 > Project: Hive > Issue Type: Sub-task >Reporter: Aman Raj >Assignee: Aman Raj >Priority: Major > Labels: pull-request-available > Time Spent: 20m > Remaining Estimate: 0h > > The sysdb test fails with the following error: > h4. Error > Client Execution succeeded but contained differences (error code = 1) after > executing sysdb.q > 3803,3807c3803,3807 > < COLUMN_STATS_ACCURATE org.apache.derby.impl.jdbc.EmbedClob@125b285b > < COLUMN_STATS_ACCURATE org.apache.derby.impl.jdbc.EmbedClob@471246f3 > < COLUMN_STATS_ACCURATE org.apache.derby.impl.jdbc.EmbedClob@57c013 > < COLUMN_STATS_ACCURATE org.apache.derby.impl.jdbc.EmbedClob@59f1d7ac > < COLUMN_STATS_ACCURATE org.apache.derby.impl.jdbc.EmbedClob@71a0 > --- > > COLUMN_STATS_ACCURATE > > \{"BASIC_STATS":"true","COLUMN_STATS":{"c_boolean":"true","c_float":"true","c_int":"true","key":"true","value":"true"}} > > COLUMN_STATS_ACCURATE > > \{"BASIC_STATS":"true","COLUMN_STATS":{"c_boolean":"true","c_float":"true","c_int":"true","key":"true","value":"true"}} > > COLUMN_STATS_ACCURATE > > \{"BASIC_STATS":"true","COLUMN_STATS":{"key":"true","value":"true"}} > > COLUMN_STATS_ACCURATE > > \{"BASIC_STATS":"true","COLUMN_STATS":{"key":"true","value":"true"}} > > COLUMN_STATS_ACCURATE > > \{"BASIC_STATS":"true","COLUMN_STATS":{"key":"true","value":"true"}} -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Updated] (HIVE-27057) Test fix for sysdb.q
[ https://issues.apache.org/jira/browse/HIVE-27057?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Aman Raj updated HIVE-27057: Parent: HIVE-26836 Issue Type: Sub-task (was: Test) > Test fix for sysdb.q > > > Key: HIVE-27057 > URL: https://issues.apache.org/jira/browse/HIVE-27057 > Project: Hive > Issue Type: Sub-task >Reporter: Aman Raj >Assignee: Aman Raj >Priority: Major > Labels: pull-request-available > Time Spent: 20m > Remaining Estimate: 0h > > The sysdb test fails with the following error: > h4. Error > Client Execution succeeded but contained differences (error code = 1) after > executing sysdb.q > 3803,3807c3803,3807 > < COLUMN_STATS_ACCURATE org.apache.derby.impl.jdbc.EmbedClob@125b285b > < COLUMN_STATS_ACCURATE org.apache.derby.impl.jdbc.EmbedClob@471246f3 > < COLUMN_STATS_ACCURATE org.apache.derby.impl.jdbc.EmbedClob@57c013 > < COLUMN_STATS_ACCURATE org.apache.derby.impl.jdbc.EmbedClob@59f1d7ac > < COLUMN_STATS_ACCURATE org.apache.derby.impl.jdbc.EmbedClob@71a0 > --- > > COLUMN_STATS_ACCURATE > > \{"BASIC_STATS":"true","COLUMN_STATS":{"c_boolean":"true","c_float":"true","c_int":"true","key":"true","value":"true"}} > > COLUMN_STATS_ACCURATE > > \{"BASIC_STATS":"true","COLUMN_STATS":{"c_boolean":"true","c_float":"true","c_int":"true","key":"true","value":"true"}} > > COLUMN_STATS_ACCURATE > > \{"BASIC_STATS":"true","COLUMN_STATS":{"key":"true","value":"true"}} > > COLUMN_STATS_ACCURATE > > \{"BASIC_STATS":"true","COLUMN_STATS":{"key":"true","value":"true"}} > > COLUMN_STATS_ACCURATE > > \{"BASIC_STATS":"true","COLUMN_STATS":{"key":"true","value":"true"}} -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Work logged] (HIVE-21741) Backport HIVE-20221 & related fix HIVE-20833 to branch-3: Increase column width for partition_params
[ https://issues.apache.org/jira/browse/HIVE-21741?focusedWorklogId=853105=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-853105 ] ASF GitHub Bot logged work on HIVE-21741: - Author: ASF GitHub Bot Created on: 27/Mar/23 09:03 Start Date: 27/Mar/23 09:03 Worklog Time Spent: 10m Work Description: amanraj2520 commented on PR #4156: URL: https://github.com/apache/hive/pull/4156#issuecomment-1484772726 @vihangk1 As suggested by you, this revert fixed the BASIC_STATS printed in a json string issue. But there is another failure in the sysdb.q file which is why I have raised https://github.com/apache/hive/pull/4157. First we should merge this PR (https://github.com/apache/hive/pull/4157) and revert this. I have tested in my local. It is working fine. Can you please approve and merge the #4157 PR. Issue Time Tracking --- Worklog Id: (was: 853105) Time Spent: 50m (was: 40m) > Backport HIVE-20221 & related fix HIVE-20833 to branch-3: Increase column > width for partition_params > > > Key: HIVE-21741 > URL: https://issues.apache.org/jira/browse/HIVE-21741 > Project: Hive > Issue Type: Bug > Components: Metastore, Standalone Metastore >Affects Versions: 3.1.1 >Reporter: David Lavati >Assignee: David Lavati >Priority: Major > Labels: pull-request-available > Fix For: 3.2.0 > > Attachments: HIVE-21741.01.branch-3.patch, > HIVE-21741.01.branch-3.patch, HIVE-21741.02.branch-3.patch, > HIVE-21741.branch-3.patch > > Time Spent: 50m > Remaining Estimate: 0h > > This is an umbrella for backporting HIVE-20221 & the related fix of > HIVE-20833 to branch-3. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Work logged] (HIVE-27158) Store hive columns stats in puffin files for iceberg tables
[ https://issues.apache.org/jira/browse/HIVE-27158?focusedWorklogId=853104=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-853104 ] ASF GitHub Bot logged work on HIVE-27158: - Author: ASF GitHub Bot Created on: 27/Mar/23 09:03 Start Date: 27/Mar/23 09:03 Worklog Time Spent: 10m Work Description: deniskuzZ commented on code in PR #4131: URL: https://github.com/apache/hive/pull/4131#discussion_r1149008394 ## iceberg/iceberg-handler/src/main/java/org/apache/iceberg/mr/hive/HiveIcebergStorageHandler.java: ## @@ -349,6 +365,96 @@ public Map getBasicStatistics(Partish partish) { return stats; } + + @Override + public boolean canSetColStatistics() { +String statsSource = HiveConf.getVar(conf, HiveConf.ConfVars.HIVE_USE_STATS_FROM).toLowerCase(); +return statsSource.equals(PUFFIN); + } + + @Override + public boolean canProvideColStatistics(org.apache.hadoop.hive.ql.metadata.Table tbl) { + +org.apache.hadoop.hive.ql.metadata.Table hmsTable = tbl; +TableDesc tableDesc = Utilities.getTableDesc(hmsTable); +Table table = Catalogs.loadTable(conf, tableDesc.getProperties()); +if (table.currentSnapshot() != null) { + String statsSource = HiveConf.getVar(conf, HiveConf.ConfVars.HIVE_COL_STATS_SOURCE).toLowerCase(); + String statsPath = table.location() + STATS + table.name() + table.currentSnapshot().snapshotId(); + if (statsSource.equals(PUFFIN)) { +try (FileSystem fs = new Path(table.location()).getFileSystem(conf)) { + if (fs.exists(new Path(statsPath))) { Review Comment: can we create a path variable just once? 
Path statsPath = new Path(...); try (FileSystem fs = statsPath.getFileSystem(conf)) { return fs.exists(statsPath); } Issue Time Tracking --- Worklog Id: (was: 853104) Time Spent: 3h 10m (was: 3h) > Store hive columns stats in puffin files for iceberg tables > --- > > Key: HIVE-27158 > URL: https://issues.apache.org/jira/browse/HIVE-27158 > Project: Hive > Issue Type: Improvement >Reporter: Simhadri Govindappa >Assignee: Simhadri Govindappa >Priority: Major > Labels: pull-request-available > Time Spent: 3h 10m > Remaining Estimate: 0h > -- This message was sent by Atlassian Jira (v8.20.10#820010)
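The refactor suggested in the review above, build the path once and reuse the same variable for the lookup and the existence check, can be sketched without a Hadoop dependency using `java.nio.file` as an analogue of Hadoop's `Path`/`FileSystem`. The `statsFileExists` method and the `stats-<table>-<snapshotId>` file name are illustrative assumptions, not code from the PR.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;

public class StatsPathCheck {
    // Analogue of the suggested refactor: construct the stats path exactly once,
    // then reuse that one variable instead of repeating the path construction.
    static boolean statsFileExists(Path tableLocation, String tableName, long snapshotId) {
        Path statsFile = tableLocation.resolve("stats-" + tableName + "-" + snapshotId);
        return Files.exists(statsFile); // single path object, no duplicated construction
    }

    public static void main(String[] args) throws IOException {
        Path dir = Files.createTempDirectory("tbl");
        Files.createFile(dir.resolve("stats-customers-42"));
        System.out.println(statsFileExists(dir, "customers", 42)); // true
        System.out.println(statsFileExists(dir, "customers", 43)); // false
    }
}
```

In the actual handler the same shape applies: derive `statsPath` once from the table location and snapshot, then pass that variable to both `getFileSystem(conf)` and `exists(...)`.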
[jira] [Work logged] (HIVE-27158) Store hive columns stats in puffin files for iceberg tables
[ https://issues.apache.org/jira/browse/HIVE-27158?focusedWorklogId=853102=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-853102 ] ASF GitHub Bot logged work on HIVE-27158: - Author: ASF GitHub Bot Created on: 27/Mar/23 09:02 Start Date: 27/Mar/23 09:02 Worklog Time Spent: 10m Work Description: deniskuzZ commented on code in PR #4131: URL: https://github.com/apache/hive/pull/4131#discussion_r1149008394 ## iceberg/iceberg-handler/src/main/java/org/apache/iceberg/mr/hive/HiveIcebergStorageHandler.java: ## @@ -349,6 +365,96 @@ public Map getBasicStatistics(Partish partish) { return stats; } + + @Override + public boolean canSetColStatistics() { +String statsSource = HiveConf.getVar(conf, HiveConf.ConfVars.HIVE_USE_STATS_FROM).toLowerCase(); +return statsSource.equals(PUFFIN); + } + + @Override + public boolean canProvideColStatistics(org.apache.hadoop.hive.ql.metadata.Table tbl) { + +org.apache.hadoop.hive.ql.metadata.Table hmsTable = tbl; +TableDesc tableDesc = Utilities.getTableDesc(hmsTable); +Table table = Catalogs.loadTable(conf, tableDesc.getProperties()); +if (table.currentSnapshot() != null) { + String statsSource = HiveConf.getVar(conf, HiveConf.ConfVars.HIVE_COL_STATS_SOURCE).toLowerCase(); + String statsPath = table.location() + STATS + table.name() + table.currentSnapshot().snapshotId(); + if (statsSource.equals(PUFFIN)) { +try (FileSystem fs = new Path(table.location()).getFileSystem(conf)) { + if (fs.exists(new Path(statsPath))) { Review Comment: can we create a path variable just once? 
Path statsPath = new Path(...); FileSystem fs = statsPath.getFileSystem(conf); return fs.exists(statsPath); Issue Time Tracking --- Worklog Id: (was: 853102) Time Spent: 3h (was: 2h 50m) > Store hive columns stats in puffin files for iceberg tables > --- > > Key: HIVE-27158 > URL: https://issues.apache.org/jira/browse/HIVE-27158 > Project: Hive > Issue Type: Improvement >Reporter: Simhadri Govindappa >Assignee: Simhadri Govindappa >Priority: Major > Labels: pull-request-available > Time Spent: 3h > Remaining Estimate: 0h > -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Work logged] (HIVE-27178) Backport of HIVE-23321 to branch-3
[ https://issues.apache.org/jira/browse/HIVE-27178?focusedWorklogId=853101=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-853101 ] ASF GitHub Bot logged work on HIVE-27178: - Author: ASF GitHub Bot Created on: 27/Mar/23 09:01 Start Date: 27/Mar/23 09:01 Worklog Time Spent: 10m Work Description: amanraj2520 opened a new pull request, #4157: URL: https://github.com/apache/hive/pull/4157 JIRA link : https://issues.apache.org/jira/browse/HIVE-27178 Jenkins build link having the failure - http://ci.hive.apache.org/blue/organizations/jenkins/hive-precommit/detail/PR-4156/1/tests ### What changes were proposed in this pull request? ### Why are the changes needed? ### Does this PR introduce _any_ user-facing change? ### How was this patch tested? Issue Time Tracking --- Worklog Id: (was: 853101) Remaining Estimate: 0h Time Spent: 10m > Backport of HIVE-23321 to branch-3 > -- > > Key: HIVE-27178 > URL: https://issues.apache.org/jira/browse/HIVE-27178 > Project: Hive > Issue Type: Sub-task >Reporter: Aman Raj >Assignee: Aman Raj >Priority: Major > Time Spent: 10m > Remaining Estimate: 0h > > Current branch-3 fails with the diff in select count(*) from > skewed_string_list and select count(*) from skewed_string_list_values. > Jenkins run : [jenkins / hive-precommit / PR-4156 / #1 > (apache.org)|http://ci.hive.apache.org/blue/organizations/jenkins/hive-precommit/detail/PR-4156/1/tests/] > Diff : > Client Execution succeeded but contained differences (error code = 1) after > executing sysdb.q > 3740d3739 > < hdfs://### HDFS PATH ### default public ROLE > 4036c4035 > < 3 > --- > > 6 > 4045c4044 > < 3 > --- > > 6 > > This ticket tries to fix this diff. Please read the description of this > ticket for the exact reason. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Updated] (HIVE-27178) Backport of HIVE-23321 to branch-3
[ https://issues.apache.org/jira/browse/HIVE-27178?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated HIVE-27178: -- Labels: pull-request-available (was: ) > Backport of HIVE-23321 to branch-3 > -- > > Key: HIVE-27178 > URL: https://issues.apache.org/jira/browse/HIVE-27178 > Project: Hive > Issue Type: Sub-task >Reporter: Aman Raj >Assignee: Aman Raj >Priority: Major > Labels: pull-request-available > Time Spent: 10m > Remaining Estimate: 0h > > Current branch-3 fails with the diff in select count(*) from > skewed_string_list and select count(*) from skewed_string_list_values. > Jenkins run : [jenkins / hive-precommit / PR-4156 / #1 > (apache.org)|http://ci.hive.apache.org/blue/organizations/jenkins/hive-precommit/detail/PR-4156/1/tests/] > Diff : > Client Execution succeeded but contained differences (error code = 1) after > executing sysdb.q > 3740d3739 > < hdfs://### HDFS PATH ### default public ROLE > 4036c4035 > < 3 > --- > > 6 > 4045c4044 > < 3 > --- > > 6 > > This ticket tries to fix this diff. Please read the description of this > ticket for the exact reason. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Work logged] (HIVE-27158) Store hive columns stats in puffin files for iceberg tables
[ https://issues.apache.org/jira/browse/HIVE-27158?focusedWorklogId=853100=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-853100 ] ASF GitHub Bot logged work on HIVE-27158: - Author: ASF GitHub Bot Created on: 27/Mar/23 08:58 Start Date: 27/Mar/23 08:58 Worklog Time Spent: 10m Work Description: deniskuzZ commented on code in PR #4131: URL: https://github.com/apache/hive/pull/4131#discussion_r1149003439 ## iceberg/iceberg-handler/src/main/java/org/apache/iceberg/mr/hive/HiveIcebergStorageHandler.java: ## @@ -349,6 +365,96 @@ public Map getBasicStatistics(Partish partish) { return stats; } + + @Override + public boolean canSetColStatistics() { +String statsSource = HiveConf.getVar(conf, HiveConf.ConfVars.HIVE_USE_STATS_FROM).toLowerCase(); +return statsSource.equals(PUFFIN); + } + + @Override + public boolean canProvideColStatistics(org.apache.hadoop.hive.ql.metadata.Table tbl) { + +org.apache.hadoop.hive.ql.metadata.Table hmsTable = tbl; +TableDesc tableDesc = Utilities.getTableDesc(hmsTable); +Table table = Catalogs.loadTable(conf, tableDesc.getProperties()); +if (table.currentSnapshot() != null) { + String statsSource = HiveConf.getVar(conf, HiveConf.ConfVars.HIVE_COL_STATS_SOURCE).toLowerCase(); + String statsPath = table.location() + STATS + table.name() + table.currentSnapshot().snapshotId(); + if (statsSource.equals(PUFFIN)) { Review Comment: PUFFIN.equals(statsSource), maybe even better to create enum Issue Time Tracking --- Worklog Id: (was: 853100) Time Spent: 2h 50m (was: 2h 40m) > Store hive columns stats in puffin files for iceberg tables > --- > > Key: HIVE-27158 > URL: https://issues.apache.org/jira/browse/HIVE-27158 > Project: Hive > Issue Type: Improvement >Reporter: Simhadri Govindappa >Assignee: Simhadri Govindappa >Priority: Major > Labels: pull-request-available > Time Spent: 2h 50m > Remaining Estimate: 0h > -- This message was sent by Atlassian Jira (v8.20.10#820010)
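The reviewer's two suggestions above (constant-first `PUFFIN.equals(statsSource)` and, better, an enum) can be combined into one null-safe parse. This is a hedged sketch: `StatsSource` and `fromConf` are hypothetical names, not part of the PR, and the fallback to `METASTORE` is an assumed default.

```java
import java.util.Locale;

public class StatsSourceDemo {
    enum StatsSource {
        METASTORE, PUFFIN, ICEBERG;

        // Null-safe parse: an unset or unrecognized config value falls back to
        // METASTORE instead of throwing, so no NPE and no case-sensitivity bugs.
        static StatsSource fromConf(String value) {
            if (value == null) {
                return METASTORE;
            }
            try {
                return valueOf(value.trim().toUpperCase(Locale.ROOT));
            } catch (IllegalArgumentException e) {
                return METASTORE;
            }
        }
    }

    public static void main(String[] args) {
        System.out.println(StatsSource.fromConf("puffin")); // PUFFIN
        System.out.println(StatsSource.fromConf(null));     // METASTORE
        System.out.println(StatsSource.fromConf("bogus"));  // METASTORE
    }
}
```

With an enum, call sites compare with `==` (`fromConf(conf) == StatsSource.PUFFIN`), which cannot NPE and makes the set of supported modes explicit.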
[jira] [Work logged] (HIVE-27158) Store hive columns stats in puffin files for iceberg tables
[ https://issues.apache.org/jira/browse/HIVE-27158?focusedWorklogId=853099=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-853099 ] ASF GitHub Bot logged work on HIVE-27158: - Author: ASF GitHub Bot Created on: 27/Mar/23 08:57 Start Date: 27/Mar/23 08:57 Worklog Time Spent: 10m Work Description: deniskuzZ commented on code in PR #4131: URL: https://github.com/apache/hive/pull/4131#discussion_r1148999605 ## iceberg/iceberg-handler/src/main/java/org/apache/iceberg/mr/hive/HiveIcebergStorageHandler.java: ## @@ -349,6 +365,96 @@ public Map getBasicStatistics(Partish partish) { return stats; } + + @Override + public boolean canSetColStatistics() { +String statsSource = HiveConf.getVar(conf, HiveConf.ConfVars.HIVE_USE_STATS_FROM).toLowerCase(); +return statsSource.equals(PUFFIN); + } + + @Override + public boolean canProvideColStatistics(org.apache.hadoop.hive.ql.metadata.Table tbl) { + +org.apache.hadoop.hive.ql.metadata.Table hmsTable = tbl; +TableDesc tableDesc = Utilities.getTableDesc(hmsTable); +Table table = Catalogs.loadTable(conf, tableDesc.getProperties()); +if (table.currentSnapshot() != null) { + String statsSource = HiveConf.getVar(conf, HiveConf.ConfVars.HIVE_COL_STATS_SOURCE).toLowerCase(); + String statsPath = table.location() + STATS + table.name() + table.currentSnapshot().snapshotId(); Review Comment: should we wrap table name with some delimiters like '-'? `STATS-customers-122121` is there some extension for the puffin file? 
btw, declare these vars under the PUFFIN code block Issue Time Tracking --- Worklog Id: (was: 853099) Time Spent: 2h 40m (was: 2.5h) > Store hive columns stats in puffin files for iceberg tables > --- > > Key: HIVE-27158 > URL: https://issues.apache.org/jira/browse/HIVE-27158 > Project: Hive > Issue Type: Improvement >Reporter: Simhadri Govindappa >Assignee: Simhadri Govindappa >Priority: Major > Labels: pull-request-available > Time Spent: 2h 40m > Remaining Estimate: 0h > -- This message was sent by Atlassian Jira (v8.20.10#820010)
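The question above about wrapping the table name with delimiters is not just cosmetic: plain concatenation of name and snapshot id can make two different (table, snapshot) pairs produce the same file name. A small sketch, with hypothetical helper names and an assumed `stats-<table>-<snapshotId>` scheme:

```java
public class StatsNameDemo {
    // Plain concatenation, as in the PR snippet: prefix + name + snapshotId.
    static String undelimited(String name, long snapshotId) {
        return "stats" + name + snapshotId;
    }

    // Delimited variant suggested in the review: stats-<table>-<snapshotId>.
    static String delimited(String name, long snapshotId) {
        return String.join("-", "stats", name, Long.toString(snapshotId));
    }

    public static void main(String[] args) {
        // Two distinct (table, snapshot) pairs collide without delimiters...
        System.out.println(undelimited("t1", 23).equals(undelimited("t12", 3))); // true
        // ...but stay distinct once delimited.
        System.out.println(delimited("t1", 23).equals(delimited("t12", 3)));     // false
    }
}
```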
[jira] [Work logged] (HIVE-27158) Store hive columns stats in puffin files for iceberg tables
[ https://issues.apache.org/jira/browse/HIVE-27158?focusedWorklogId=853098=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-853098 ] ASF GitHub Bot logged work on HIVE-27158: - Author: ASF GitHub Bot Created on: 27/Mar/23 08:56 Start Date: 27/Mar/23 08:56 Worklog Time Spent: 10m Work Description: deniskuzZ commented on code in PR #4131: URL: https://github.com/apache/hive/pull/4131#discussion_r1148999605 ## iceberg/iceberg-handler/src/main/java/org/apache/iceberg/mr/hive/HiveIcebergStorageHandler.java: ## @@ -349,6 +365,96 @@ public Map getBasicStatistics(Partish partish) { return stats; } + + @Override + public boolean canSetColStatistics() { +String statsSource = HiveConf.getVar(conf, HiveConf.ConfVars.HIVE_USE_STATS_FROM).toLowerCase(); +return statsSource.equals(PUFFIN); + } + + @Override + public boolean canProvideColStatistics(org.apache.hadoop.hive.ql.metadata.Table tbl) { + +org.apache.hadoop.hive.ql.metadata.Table hmsTable = tbl; +TableDesc tableDesc = Utilities.getTableDesc(hmsTable); +Table table = Catalogs.loadTable(conf, tableDesc.getProperties()); +if (table.currentSnapshot() != null) { + String statsSource = HiveConf.getVar(conf, HiveConf.ConfVars.HIVE_COL_STATS_SOURCE).toLowerCase(); + String statsPath = table.location() + STATS + table.name() + table.currentSnapshot().snapshotId(); Review Comment: should we wrap table name with some delimiters like '-'? `STATS-customers-122121` is there some extension for the puffin file? Issue Time Tracking --- Worklog Id: (was: 853098) Time Spent: 2.5h (was: 2h 20m) > Store hive columns stats in puffin files for iceberg tables > --- > > Key: HIVE-27158 > URL: https://issues.apache.org/jira/browse/HIVE-27158 > Project: Hive > Issue Type: Improvement >Reporter: Simhadri Govindappa >Assignee: Simhadri Govindappa >Priority: Major > Labels: pull-request-available > Time Spent: 2.5h > Remaining Estimate: 0h > -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Assigned] (HIVE-27178) Backport of HIVE-23321 to branch-3
[ https://issues.apache.org/jira/browse/HIVE-27178?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Aman Raj reassigned HIVE-27178: --- > Backport of HIVE-23321 to branch-3 > -- > > Key: HIVE-27178 > URL: https://issues.apache.org/jira/browse/HIVE-27178 > Project: Hive > Issue Type: Sub-task >Reporter: Aman Raj >Assignee: Aman Raj >Priority: Major > > Current branch-3 fails with the diff in select count(*) from > skewed_string_list and select count(*) from skewed_string_list_values. > Jenkins run : [jenkins / hive-precommit / PR-4156 / #1 > (apache.org)|http://ci.hive.apache.org/blue/organizations/jenkins/hive-precommit/detail/PR-4156/1/tests/] > Diff : > Client Execution succeeded but contained differences (error code = 1) after > executing sysdb.q > 3740d3739 > < hdfs://### HDFS PATH ### default public ROLE > 4036c4035 > < 3 > --- > > 6 > 4045c4044 > < 3 > --- > > 6 > > This ticket tries to fix this diff. Please read the description of this > ticket for the exact reason. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Work logged] (HIVE-27158) Store hive columns stats in puffin files for iceberg tables
[ https://issues.apache.org/jira/browse/HIVE-27158?focusedWorklogId=853097=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-853097 ] ASF GitHub Bot logged work on HIVE-27158: - Author: ASF GitHub Bot Created on: 27/Mar/23 08:55 Start Date: 27/Mar/23 08:55 Worklog Time Spent: 10m Work Description: deniskuzZ commented on code in PR #4131: URL: https://github.com/apache/hive/pull/4131#discussion_r1148999605 ## iceberg/iceberg-handler/src/main/java/org/apache/iceberg/mr/hive/HiveIcebergStorageHandler.java: ## @@ -349,6 +365,96 @@ public Map getBasicStatistics(Partish partish) { return stats; } + + @Override + public boolean canSetColStatistics() { +String statsSource = HiveConf.getVar(conf, HiveConf.ConfVars.HIVE_USE_STATS_FROM).toLowerCase(); +return statsSource.equals(PUFFIN); + } + + @Override + public boolean canProvideColStatistics(org.apache.hadoop.hive.ql.metadata.Table tbl) { + +org.apache.hadoop.hive.ql.metadata.Table hmsTable = tbl; +TableDesc tableDesc = Utilities.getTableDesc(hmsTable); +Table table = Catalogs.loadTable(conf, tableDesc.getProperties()); +if (table.currentSnapshot() != null) { + String statsSource = HiveConf.getVar(conf, HiveConf.ConfVars.HIVE_COL_STATS_SOURCE).toLowerCase(); + String statsPath = table.location() + STATS + table.name() + table.currentSnapshot().snapshotId(); Review Comment: should we wrap table name with some delimiters like '-'? Issue Time Tracking --- Worklog Id: (was: 853097) Time Spent: 2h 20m (was: 2h 10m) > Store hive columns stats in puffin files for iceberg tables > --- > > Key: HIVE-27158 > URL: https://issues.apache.org/jira/browse/HIVE-27158 > Project: Hive > Issue Type: Improvement >Reporter: Simhadri Govindappa >Assignee: Simhadri Govindappa >Priority: Major > Labels: pull-request-available > Time Spent: 2h 20m > Remaining Estimate: 0h > -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Work logged] (HIVE-27158) Store hive columns stats in puffin files for iceberg tables
[ https://issues.apache.org/jira/browse/HIVE-27158?focusedWorklogId=853096=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-853096 ] ASF GitHub Bot logged work on HIVE-27158: - Author: ASF GitHub Bot Created on: 27/Mar/23 08:53 Start Date: 27/Mar/23 08:53 Worklog Time Spent: 10m Work Description: deniskuzZ commented on code in PR #4131: URL: https://github.com/apache/hive/pull/4131#discussion_r1148996651 ## iceberg/iceberg-handler/src/main/java/org/apache/iceberg/mr/hive/HiveIcebergStorageHandler.java: ## @@ -349,6 +365,96 @@ public Map getBasicStatistics(Partish partish) { return stats; } + + @Override + public boolean canSetColStatistics() { +String statsSource = HiveConf.getVar(conf, HiveConf.ConfVars.HIVE_USE_STATS_FROM).toLowerCase(); +return statsSource.equals(PUFFIN); + } + + @Override + public boolean canProvideColStatistics(org.apache.hadoop.hive.ql.metadata.Table tbl) { + +org.apache.hadoop.hive.ql.metadata.Table hmsTable = tbl; Review Comment: why do we need local var? Issue Time Tracking --- Worklog Id: (was: 853096) Time Spent: 2h 10m (was: 2h) > Store hive columns stats in puffin files for iceberg tables > --- > > Key: HIVE-27158 > URL: https://issues.apache.org/jira/browse/HIVE-27158 > Project: Hive > Issue Type: Improvement >Reporter: Simhadri Govindappa >Assignee: Simhadri Govindappa >Priority: Major > Labels: pull-request-available > Time Spent: 2h 10m > Remaining Estimate: 0h > -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Work logged] (HIVE-27158) Store hive columns stats in puffin files for iceberg tables
[ https://issues.apache.org/jira/browse/HIVE-27158?focusedWorklogId=853095=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-853095 ] ASF GitHub Bot logged work on HIVE-27158: - Author: ASF GitHub Bot Created on: 27/Mar/23 08:51 Start Date: 27/Mar/23 08:51 Worklog Time Spent: 10m Work Description: deniskuzZ commented on code in PR #4131: URL: https://github.com/apache/hive/pull/4131#discussion_r1148993843 ## iceberg/iceberg-handler/src/main/java/org/apache/iceberg/mr/hive/HiveIcebergStorageHandler.java: ## @@ -349,6 +365,96 @@ public Map getBasicStatistics(Partish partish) { return stats; } + + @Override + public boolean canSetColStatistics() { +String statsSource = HiveConf.getVar(conf, HiveConf.ConfVars.HIVE_USE_STATS_FROM).toLowerCase(); +return statsSource.equals(PUFFIN); Review Comment: what if statsSource is undefined, could we get NPE? Issue Time Tracking --- Worklog Id: (was: 853095) Time Spent: 2h (was: 1h 50m) > Store hive columns stats in puffin files for iceberg tables > --- > > Key: HIVE-27158 > URL: https://issues.apache.org/jira/browse/HIVE-27158 > Project: Hive > Issue Type: Improvement >Reporter: Simhadri Govindappa >Assignee: Simhadri Govindappa >Priority: Major > Labels: pull-request-available > Time Spent: 2h > Remaining Estimate: 0h > -- This message was sent by Atlassian Jira (v8.20.10#820010)
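On the NPE question above: `HiveConf.getVar` normally returns the configured default, so a null may not be reachable in practice; but if `statsSource` ever were null, `statsSource.equals(PUFFIN)` would throw while the constant-first order would not. A minimal demonstration, with a local `PUFFIN` constant standing in for the handler's:

```java
public class NullSafeEquals {
    static final String PUFFIN = "puffin"; // stand-in for the handler's constant

    public static void main(String[] args) {
        String statsSource = null; // simulate an undefined config value

        // Constant-first comparison is null-safe: evaluates to false, no NPE.
        System.out.println(PUFFIN.equals(statsSource)); // false

        // The order used in the PR would throw on a null receiver.
        try {
            statsSource.equals(PUFFIN);
        } catch (NullPointerException e) {
            System.out.println("NPE, as the reviewer suspected");
        }
    }
}
```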
[jira] [Work logged] (HIVE-27158) Store hive columns stats in puffin files for iceberg tables
[ https://issues.apache.org/jira/browse/HIVE-27158?focusedWorklogId=853094=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-853094 ] ASF GitHub Bot logged work on HIVE-27158: - Author: ASF GitHub Bot Created on: 27/Mar/23 08:49 Start Date: 27/Mar/23 08:49 Worklog Time Spent: 10m Work Description: deniskuzZ commented on code in PR #4131: URL: https://github.com/apache/hive/pull/4131#discussion_r1148989152 ## common/src/java/org/apache/hadoop/hive/conf/HiveConf.java: ## @@ -2207,6 +2207,8 @@ public static enum ConfVars { "Whether to use codec pool in ORC. Disable if there are bugs with codec reuse."), HIVE_USE_STATS_FROM("hive.use.stats.from","iceberg","Use stats from iceberg table snapshot for query " + "planning. This has three values metastore, puffin and iceberg"), +HIVE_COL_STATS_SOURCE("hive.col.stats.source","metastore","Use stats from puffin file for query " + Review Comment: - do we support `iceberg` mode? - please update the description - "Use stats from selected source for query planning" ## common/src/java/org/apache/hadoop/hive/conf/HiveConf.java: ## @@ -2207,6 +2207,8 @@ public static enum ConfVars { "Whether to use codec pool in ORC. Disable if there are bugs with codec reuse."), HIVE_USE_STATS_FROM("hive.use.stats.from","iceberg","Use stats from iceberg table snapshot for query " + "planning. This has three values metastore, puffin and iceberg"), +HIVE_COL_STATS_SOURCE("hive.col.stats.source","metastore","Use stats from puffin file for query " + Review Comment: - do we support `iceberg` mode? 
- please update the description - "Use stats from the selected source for query planning" Issue Time Tracking --- Worklog Id: (was: 853094) Time Spent: 1h 50m (was: 1h 40m) > Store hive columns stats in puffin files for iceberg tables > --- > > Key: HIVE-27158 > URL: https://issues.apache.org/jira/browse/HIVE-27158 > Project: Hive > Issue Type: Improvement >Reporter: Simhadri Govindappa >Assignee: Simhadri Govindappa >Priority: Major > Labels: pull-request-available > Time Spent: 1h 50m > Remaining Estimate: 0h > -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Updated] (HIVE-26655) VectorUDAFBloomFilterMerge should take care of safe batch handling when working in parallel
[ https://issues.apache.org/jira/browse/HIVE-26655?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] László Bodor updated HIVE-26655: Summary: VectorUDAFBloomFilterMerge should take care of safe batch handling when working in parallel (was: TPC-DS query 17 returns wrong results) > VectorUDAFBloomFilterMerge should take care of safe batch handling when > working in parallel > --- > > Key: HIVE-26655 > URL: https://issues.apache.org/jira/browse/HIVE-26655 > Project: Hive > Issue Type: Sub-task >Reporter: Sungwoo Park >Assignee: László Bodor >Priority: Major > > When tested with 100GB ORC tables, the number of rows returned by query 17 is > not stable. It returns fewer rows than the correct result (55 rows). > -- This message was sent by Atlassian Jira (v8.20.10#820010)
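The renamed summary points at unsafe concurrent batch handling during bloom-filter merging. As an illustration of the underlying hazard — this is not the actual VectorUDAFBloomFilterMerge code — a plain read-OR-write merge of bitset words from several threads can lose set bits (a lost update), which makes the filter reject rows it should pass and return fewer rows than expected. One safe pattern is a CAS loop per word:

```java
import java.util.concurrent.atomic.AtomicLongArray;

public class SafeBloomMerge {
    // OR-merge the source words into the target. The compare-and-set loop
    // retries when another thread updated the word in between, so no thread's
    // bits are silently overwritten.
    static void mergeInto(AtomicLongArray target, long[] source) {
        for (int i = 0; i < source.length; i++) {
            long prev;
            do {
                prev = target.get(i);
            } while (!target.compareAndSet(i, prev, prev | source[i]));
        }
    }

    public static void main(String[] args) {
        AtomicLongArray target = new AtomicLongArray(2);
        mergeInto(target, new long[] {0b1010L, 1L});
        mergeInto(target, new long[] {0b0101L, 2L});
        System.out.println(target.get(0)); // 15
        System.out.println(target.get(1)); // 3
    }
}
```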
[jira] [Work logged] (HIVE-27158) Store hive columns stats in puffin files for iceberg tables
[ https://issues.apache.org/jira/browse/HIVE-27158?focusedWorklogId=853070=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-853070 ] ASF GitHub Bot logged work on HIVE-27158: - Author: ASF GitHub Bot Created on: 27/Mar/23 08:00 Start Date: 27/Mar/23 08:00 Worklog Time Spent: 10m Work Description: sonarcloud[bot] commented on PR #4131: URL: https://github.com/apache/hive/pull/4131#issuecomment-1484679758 Kudos, SonarCloud Quality Gate passed! [![Quality Gate passed](https://sonarsource.github.io/sonarcloud-github-static-resources/v2/checks/QualityGateBadge/passed-16px.png 'Quality Gate passed')](https://sonarcloud.io/dashboard?id=apache_hive=4131) [![Bug](https://sonarsource.github.io/sonarcloud-github-static-resources/v2/common/bug-16px.png 'Bug')](https://sonarcloud.io/project/issues?id=apache_hive=4131=false=BUG) [![A](https://sonarsource.github.io/sonarcloud-github-static-resources/v2/checks/RatingBadge/A-16px.png 'A')](https://sonarcloud.io/project/issues?id=apache_hive=4131=false=BUG) [0 Bugs](https://sonarcloud.io/project/issues?id=apache_hive=4131=false=BUG) [![Vulnerability](https://sonarsource.github.io/sonarcloud-github-static-resources/v2/common/vulnerability-16px.png 'Vulnerability')](https://sonarcloud.io/project/issues?id=apache_hive=4131=false=VULNERABILITY) [![A](https://sonarsource.github.io/sonarcloud-github-static-resources/v2/checks/RatingBadge/A-16px.png 'A')](https://sonarcloud.io/project/issues?id=apache_hive=4131=false=VULNERABILITY) [0 Vulnerabilities](https://sonarcloud.io/project/issues?id=apache_hive=4131=false=VULNERABILITY) [![Security Hotspot](https://sonarsource.github.io/sonarcloud-github-static-resources/v2/common/security_hotspot-16px.png 'Security Hotspot')](https://sonarcloud.io/project/security_hotspots?id=apache_hive=4131=false=SECURITY_HOTSPOT) [![A](https://sonarsource.github.io/sonarcloud-github-static-resources/v2/checks/RatingBadge/A-16px.png 
'A')](https://sonarcloud.io/project/security_hotspots?id=apache_hive=4131=false=SECURITY_HOTSPOT) [0 Security Hotspots](https://sonarcloud.io/project/security_hotspots?id=apache_hive=4131=false=SECURITY_HOTSPOT) [![Code Smell](https://sonarsource.github.io/sonarcloud-github-static-resources/v2/common/code_smell-16px.png 'Code Smell')](https://sonarcloud.io/project/issues?id=apache_hive=4131=false=CODE_SMELL) [![A](https://sonarsource.github.io/sonarcloud-github-static-resources/v2/checks/RatingBadge/A-16px.png 'A')](https://sonarcloud.io/project/issues?id=apache_hive=4131=false=CODE_SMELL) [6 Code Smells](https://sonarcloud.io/project/issues?id=apache_hive=4131=false=CODE_SMELL) [![No Coverage information](https://sonarsource.github.io/sonarcloud-github-static-resources/v2/checks/CoverageChart/NoCoverageInfo-16px.png 'No Coverage information')](https://sonarcloud.io/component_measures?id=apache_hive=4131=coverage=list) No Coverage information [![No Duplication information](https://sonarsource.github.io/sonarcloud-github-static-resources/v2/checks/Duplications/NoDuplicationInfo-16px.png 'No Duplication information')](https://sonarcloud.io/component_measures?id=apache_hive=4131=duplicated_lines_density=list) No Duplication information Issue Time Tracking --- Worklog Id: (was: 853070) Time Spent: 1h 40m (was: 1.5h) > Store hive columns stats in puffin files for iceberg tables > --- > > Key: HIVE-27158 > URL: https://issues.apache.org/jira/browse/HIVE-27158 > Project: Hive > Issue Type: Improvement >Reporter: Simhadri Govindappa >Assignee: Simhadri Govindappa >Priority: Major > Labels: pull-request-available > Time Spent: 1h 40m > Remaining Estimate: 0h > -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Commented] (HIVE-27168) Use basename of the datatype when fetching partition metadata using partition filters
[ https://issues.apache.org/jira/browse/HIVE-27168?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17705212#comment-17705212 ] Sourabh Badhya commented on HIVE-27168: --- Thanks [~kokila19] , [~rkirtir] , [~akshatm] , [~InvisibleProgrammer] , [~veghlaci05] , [~dkuzmenko] for the reviews. > Use basename of the datatype when fetching partition metadata using partition > filters > - > > Key: HIVE-27168 > URL: https://issues.apache.org/jira/browse/HIVE-27168 > Project: Hive > Issue Type: Bug >Reporter: Sourabh Badhya >Assignee: Sourabh Badhya >Priority: Major > Labels: pull-request-available > Fix For: 4.0.0 > > Time Spent: 3h > Remaining Estimate: 0h > > While fetching partition metadata using partition filters, we use the column > type of the table directly. However, char/varchar types can contain extra > information such as the length of the char/varchar column, and because of this > extra information the fetch of partition metadata is skipped. > Solution: Use the basename of the column type when deciding whether > partition pruning can be done on the partitioned column. -- This message was sent by Atlassian Jira (v8.20.10#820010)
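The fix described above reduces a parameterized type such as `char(10)` or `varchar(64)` to its base name before deciding whether partition pruning applies. A minimal sketch of that idea — a hypothetical helper, not the actual Hive code — is:

```java
import java.util.Locale;

public class TypeBaseName {
    // Strip type parameters such as length/precision so that, e.g.,
    // "varchar(64)" and "VARCHAR(32)" both compare as the base name "varchar".
    static String baseName(String typeName) {
        int paren = typeName.indexOf('(');
        String base = paren >= 0 ? typeName.substring(0, paren) : typeName;
        return base.trim().toLowerCase(Locale.ROOT);
    }

    public static void main(String[] args) {
        System.out.println(baseName("varchar(64)")); // varchar
        System.out.println(baseName("char(10)"));    // char
        System.out.println(baseName("string"));      // string
    }
}
```

With base names, a pruning check like "is this column type prunable?" no longer fails on length-qualified char/varchar columns.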
[jira] [Work logged] (HIVE-21741) Backport HIVE-20221 & related fix HIVE-20833 to branch-3: Increase column width for partition_params
[ https://issues.apache.org/jira/browse/HIVE-21741?focusedWorklogId=853062&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-853062 ] ASF GitHub Bot logged work on HIVE-21741: - Author: ASF GitHub Bot Created on: 27/Mar/23 07:17 Start Date: 27/Mar/23 07:17 Worklog Time Spent: 10m Work Description: amanraj2520 opened a new pull request, #4156: URL: https://github.com/apache/hive/pull/4156 …anch-3: Increase column width for partition_params (David Lavati via Alan Gates)" This reverts commit e3d5abda ### What changes were proposed in this pull request? ### Why are the changes needed? ### Does this PR introduce _any_ user-facing change? ### How was this patch tested? Issue Time Tracking --- Worklog Id: (was: 853062) Time Spent: 40m (was: 0.5h) > Backport HIVE-20221 & related fix HIVE-20833 to branch-3: Increase column > width for partition_params > > > Key: HIVE-21741 > URL: https://issues.apache.org/jira/browse/HIVE-21741 > Project: Hive > Issue Type: Bug > Components: Metastore, Standalone Metastore >Affects Versions: 3.1.1 >Reporter: David Lavati >Assignee: David Lavati >Priority: Major > Labels: pull-request-available > Fix For: 3.2.0 > > Attachments: HIVE-21741.01.branch-3.patch, > HIVE-21741.01.branch-3.patch, HIVE-21741.02.branch-3.patch, > HIVE-21741.branch-3.patch > > Time Spent: 40m > Remaining Estimate: 0h > > This is an umbrella for backporting HIVE-20221 & the related fix of > HIVE-20833 to branch-3. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Work logged] (HIVE-27174) Disable sysdb.q test
[ https://issues.apache.org/jira/browse/HIVE-27174?focusedWorklogId=853056&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-853056 ] ASF GitHub Bot logged work on HIVE-27174: - Author: ASF GitHub Bot Created on: 27/Mar/23 06:45 Start Date: 27/Mar/23 06:45 Worklog Time Spent: 10m Work Description: amanraj2520 commented on PR #4152: URL: https://github.com/apache/hive/pull/4152#issuecomment-1484590603 @vihangk1 I see that in HIVE-21741 there were some changes related to INDEX_PARAMS, from varchar to Clob. My hunch is that this issue stems from those changes. Any luck on your side? Issue Time Tracking --- Worklog Id: (was: 853056) Time Spent: 20m (was: 10m) > Disable sysdb.q test > > > Key: HIVE-27174 > URL: https://issues.apache.org/jira/browse/HIVE-27174 > Project: Hive > Issue Type: Sub-task >Reporter: Aman Raj >Assignee: Aman Raj >Priority: Major > Labels: pull-request-available > Time Spent: 20m > Remaining Estimate: 0h > > h3. What changes were proposed in this pull request? > Disabled the sysdb.q test. The test is failing because of a diff in the > BASIC_COLUMN_STATS json string. 
> Client Execution succeeded but contained differences (error code = 1) after > executing sysdb.q > 3803,3807c3803,3807 > < COLUMN_STATS_ACCURATE org.apache.derby.impl.jdbc.EmbedClob@125b285b > < COLUMN_STATS_ACCURATE org.apache.derby.impl.jdbc.EmbedClob@471246f3 > < COLUMN_STATS_ACCURATE org.apache.derby.impl.jdbc.EmbedClob@57c013 > < COLUMN_STATS_ACCURATE org.apache.derby.impl.jdbc.EmbedClob@59f1d7ac > < COLUMN_STATS_ACCURATE org.apache.derby.impl.jdbc.EmbedClob@71a0 > — > {quote}COLUMN_STATS_ACCURATE > \{"BASIC_STATS":"true","COLUMN_STATS":{"c_boolean":"true","c_float":"true","c_int":"true","key":"true","value":"true"}} > COLUMN_STATS_ACCURATE > \{"BASIC_STATS":"true","COLUMN_STATS":{"c_boolean":"true","c_float":"true","c_int":"true","key":"true","value":"true"}} > COLUMN_STATS_ACCURATE > \{"BASIC_STATS":"true","COLUMN_STATS":{"key":"true","value":"true"}} > COLUMN_STATS_ACCURATE > \{"BASIC_STATS":"true","COLUMN_STATS":{"key":"true","value":"true"}} > COLUMN_STATS_ACCURATE {"BASIC_STATS":"true","COLUMN_STATS": > {quote} > h3. Why are the changes needed? > There is no issue in the test itself. The current code prints the COL_STATS as an > Object instead of a json string. Not sure why this is the case. Tried a lot of > approaches, but it seems this is not fixable at the moment. So, disabling it for > now. Note that in the Hive 3.1.3 release this test was disabled, so there should > not be any issue in disabling it here. > > > Created a follow-up ticket to fix this test that can be taken up later - > [HIVE-27057] Test fix for sysdb.q - ASF JIRA (apache.org) -- This message was sent by Atlassian Jira (v8.20.10#820010)
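The `EmbedClob@...` values in the diff above are the default identity `toString()` of `java.sql.Clob` instances; the JSON only appears if the CLOB contents are actually materialized. A minimal sketch of reading a CLOB into a `String` — using an in-memory `SerialClob` for illustration rather than a real Derby connection — looks like this:

```java
import java.sql.Clob;
import java.sql.SQLException;
import javax.sql.rowset.serial.SerialClob;

public class ClobToString {
    // Materialize the CLOB's contents instead of relying on toString(),
    // which yields "EmbedClob@..."-style identity strings as in the diff above.
    // Note: CLOB positions are 1-based per the JDBC API.
    static String clobToString(Clob clob) {
        try {
            return clob.getSubString(1, (int) clob.length());
        } catch (SQLException e) {
            throw new RuntimeException(e);
        }
    }

    // Helper to build an in-memory CLOB for demonstration purposes.
    static Clob makeClob(String s) {
        try {
            return new SerialClob(s.toCharArray());
        } catch (SQLException e) {
            throw new RuntimeException(e);
        }
    }

    public static void main(String[] args) {
        Clob clob = makeClob("{\"BASIC_STATS\":\"true\"}");
        System.out.println(clobToString(clob));
    }
}
```

Whether the actual sysdb fix belongs in the query layer or the test harness is the open question the follow-up ticket tracks; the sketch only shows the standard JDBC pattern.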
[jira] [Work logged] (HIVE-27150) Drop single partition can also support direct sql
[ https://issues.apache.org/jira/browse/HIVE-27150?focusedWorklogId=853049&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-853049 ] ASF GitHub Bot logged work on HIVE-27150: - Author: ASF GitHub Bot Created on: 27/Mar/23 06:08 Start Date: 27/Mar/23 06:08 Worklog Time Spent: 10m Work Description: saihemanth-cloudera commented on code in PR #4123: URL: https://github.com/apache/hive/pull/4123#discussion_r1148824344 ## standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/ObjectStore.java: ## @@ -3101,6 +3100,22 @@ public boolean dropPartition(String catName, String dbName, String tableName, return success; } + @Override + public boolean dropPartition(String catName, String dbName, String tableName, String partName) + throws MetaException, NoSuchObjectException, InvalidObjectException, InvalidInputException { +boolean success = false; +try { + openTransaction(); + dropPartitionsInternal(catName, dbName, tableName, Arrays.asList(partName), true, true); Review Comment: I don't think this would improve the performance by any means. Consider dropping 10k partitions: each partition drop has to access the same number of tables in the underlying DB to update the records, so it makes sense to batch them and implement the batch with direct SQL. But since a single partition drop accesses that same number of tables either way, I don't think it makes sense to implement this feature for one partition. For example, [HIVE-26035](https://issues.apache.org/jira/browse/HIVE-26035) (see the details in the jira) proved that implementing direct SQL actually improved performance by running against benchmark tests. Similarly, can you provide evidence that this patch also has an edge by running those tests? (You would probably have to add some tests, e.g. dropping 10K+ single partitions.) 
Issue Time Tracking --- Worklog Id: (was: 853049) Time Spent: 1h 20m (was: 1h 10m) > Drop single partition can also support direct sql > - > > Key: HIVE-27150 > URL: https://issues.apache.org/jira/browse/HIVE-27150 > Project: Hive > Issue Type: Improvement > Components: Hive >Reporter: Wechar >Assignee: Wechar >Priority: Major > Labels: pull-request-available > Time Spent: 1h 20m > Remaining Estimate: 0h > > *Background:* > [HIVE-6980|https://issues.apache.org/jira/browse/HIVE-6980] supports direct > sql for drop_partitions, we can reuse this huge improvement in drop_partition. -- This message was sent by Atlassian Jira (v8.20.10#820010)
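The reviewer's argument above is that direct SQL pays off by collapsing many partition drops into few statements, while a single drop touches the same set of metastore tables either way. A toy sketch of that batching idea follows; the table and column names are simplified stand-ins for the HMS schema, not the real ObjectStore SQL:

```java
import java.util.Collections;
import java.util.List;

public class BatchedDrop {
    // Build one parameterized DELETE covering all partition names, so N
    // partitions cost one statement/round trip instead of N.
    static String batchedDeleteSql(List<String> partNames) {
        String placeholders = String.join(", ", Collections.nCopies(partNames.size(), "?"));
        return "DELETE FROM PARTITIONS WHERE PART_NAME IN (" + placeholders + ")";
    }

    public static void main(String[] args) {
        System.out.println(batchedDeleteSql(List.of("ds=2023-03-27", "ds=2023-03-28")));
        // DELETE FROM PARTITIONS WHERE PART_NAME IN (?, ?)
    }
}
```

For a single name, the IN-list degenerates to one placeholder — the same work a non-batched delete does, which is the reviewer's point about the missing benefit.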