[jira] [Created] (HIVE-27228) Add missing upgrade SQL statements after CQ_NUMBER_OF_BUCKETS column being introduced in HIVE-26719
Sourabh Badhya created HIVE-27228: - Summary: Add missing upgrade SQL statements after CQ_NUMBER_OF_BUCKETS column being introduced in HIVE-26719 Key: HIVE-27228 URL: https://issues.apache.org/jira/browse/HIVE-27228 Project: Hive Issue Type: Bug Reporter: Sourabh Badhya Assignee: Sourabh Badhya HIVE-26719 introduced the CQ_NUMBER_OF_BUCKETS column in the COMPACTION_QUEUE and COMPLETED_COMPACTIONS tables. However, the corresponding upgrade SQL statements are missing for these columns (see the sketch below). The new column is also not reflected in the COMPACTIONS view in the information schema. -- This message was sent by Atlassian Jira (v8.20.10#820010)
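For illustration only, the missing upgrade DDL would presumably look something like the following; the column type, the column name on COMPLETED_COMPACTIONS, and the per-database-vendor syntax are assumptions rather than the actual upgrade-script contents.
{code:sql}
-- minimal sketch, assuming an integer column; real scripts exist per DB vendor
ALTER TABLE COMPACTION_QUEUE ADD CQ_NUMBER_OF_BUCKETS INTEGER;
-- column name on COMPLETED_COMPACTIONS assumed here
ALTER TABLE COMPLETED_COMPACTIONS ADD CC_NUMBER_OF_BUCKETS INTEGER;
{code}
The COMPACTIONS view in the information schema would additionally need to be recreated so it exposes the new column.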
[jira] [Created] (HIVE-27227) Provide config to re-enable partitions discovery on external tables
Taraka Rama Rao Lethavadla created HIVE-27227: - Summary: Provide config to re-enable partitions discovery on external tables Key: HIVE-27227 URL: https://issues.apache.org/jira/browse/HIVE-27227 Project: Hive Issue Type: Improvement Components: Hive Reporter: Taraka Rama Rao Lethavadla HIVE-25039 disabled the discovery.partitions config for external tables by default. Now, if someone wants to turn the feature back on (knowing the risk), they have to set the config to true on every newly created table. Another use case: if a user wants to enable this feature (knowing the risk) for all existing tables, they have to execute an alter table command for every table, which is very cumbersome when there are a lot of tables (see the sketch below). -- This message was sent by Atlassian Jira (v8.20.10#820010)
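For reference, a minimal sketch of the per-table workaround users currently need; the table name is hypothetical, and the property name is the one referenced in the ticket.
{code:sql}
-- newly created external table
CREATE EXTERNAL TABLE my_ext_table (id int)
PARTITIONED BY (ds string)
TBLPROPERTIES ('discovery.partitions'='true');

-- existing external table
ALTER TABLE my_ext_table SET TBLPROPERTIES ('discovery.partitions'='true');
{code}
A global or database-level switch, as the ticket proposes, would avoid repeating this for every table.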
[jira] [Created] (HIVE-27226) FullOuterJoin with filter expressions is not computed correctly
Seonggon Namgung created HIVE-27226: --- Summary: FullOuterJoin with filter expressions is not computed correctly Key: HIVE-27226 URL: https://issues.apache.org/jira/browse/HIVE-27226 Project: Hive Issue Type: Bug Reporter: Seonggon Namgung I tested many OuterJoin queries as an extension of HIVE-27138, and I found that Hive returns incorrect results for a query containing a FullOuterJoin with filter expressions. In a nutshell, all JoinOperators that run on the Tez engine return incorrect results for OuterJoin queries, and one of the reasons for the incorrect computation comes from CommonJoinOperator, which is the base of all JoinOperators. I attached the queries and configuration that I used at the bottom of the document. I am still inspecting these problems, and I will share an update when I find another cause. Any comments and opinions would be appreciated. First of all, I observed that current Hive ignores filter expressions contained in MapJoinOperator. For example, the attached result of query1 shows that MapJoinOperator performs an inner join, not a full outer join. This problem stems from the removal of filterMap. When converting JoinOperator to MapJoinOperator, ConvertJoinMapJoin#convertJoinDynamicPartitionedHashJoin() removes the filterMap of MapJoinOperator. Because MapJoinOperator does not evaluate filter expressions if filterMap is null, this change makes MapJoinOperator ignore filter expressions, and it always joins tables regardless of whether they satisfy the filter expressions or not. To solve this problem, I disabled FullOuterMapJoinOptimization and applied the patch for HIVE-27138, which prevents an NPE. (The patch is available at the following link: LINK.) The rest of this document uses this modified Hive, but most of the problems also happen with current Hive. The second problem I found is that Hive returns the same left-null or right-null rows multiple times when it uses MapJoinOperator or CommonMergeJoinOperator. This is caused by the logic of the current CommonJoinOperator. Both JoinOperators join tables in two steps. First, they create RowContainers, each of which is a group of rows that come from one table and share the same key. Second, they call CommonJoinOperator#checkAndGenObject() with the created RowContainers. This method checks the filterTag of each row in the RowContainers and forwards the joined row if all filter conditions are met. For OuterJoin, checkAndGenObject() forwards non-matching rows if there is no matching row in the RowContainer. The problem happens when there are multiple RowContainers for the same key and table. For example, suppose that there are two left RowContainers and one right RowContainer. If none of the rows in the two left RowContainers satisfies the filter condition, then checkAndGenObject() will forward a Left-Null row for each right row. Because checkAndGenObject() is called with each left RowContainer, there will be two duplicated Left-Null rows for every right row. In the case of MapJoinOperator, it always creates a singleton RowContainer for the big table. Therefore, it always produces duplicated non-matching rows. CommonMergeJoinOperator also creates multiple RowContainers for the big table, whose size is hive.join.emit.interval. In the experiment below, I also set hive.join.shortcut.unmatched.rows=false and hive.exec.reducers.max=1 to disable the specialized algorithm for an OuterJoin of two tables and to force calling checkAndGenObject() before all rows with the same key are gathered.
I didn't observe this problem when using VectorMapJoinOperator, and I will check whether the problem can also be reproduced with VectorMapJoinOperator. I think the second problem is not limited to FullOuterJoin, but I couldn't find such a query as of now. This will also be added to this issue if I can write a query that reproduces the second problem without FullOuterJoin. I also found that Hive returns a wrong result for query2 even when I used VectorMapJoinOperator. I am still inspecting this problem and I will add an update on it when I find out the reason. Experiment:
{code:java}
Configuration

set hive.optimize.shared.work=false;

-- Std MapJoin
set hive.auto.convert.join=true;
set hive.vectorized.execution.enabled=false;

-- Vec MapJoin
set hive.auto.convert.join=true;
set hive.vectorized.execution.enabled=true;

-- MergeJoin
set hive.auto.convert.join=false;
set hive.vectorized.execution.enabled=false;
set hive.join.shortcut.unmatched.rows=false;
set hive.join.emit.interval=1;
set hive.exec.reducers.max=1;

Queries

-- Query 1
DROP TABLE IF EXISTS a;
CREATE TABLE a (key string, value string);
INSERT INTO a VALUES (1, 1), (1, 2), (2, 1);
SELECT * FROM a FULL OUTER JOIN a b ON a.key = b.key AND a.key < 0;

-- Query 2
DROP TABLE IF EXISTS b;
CREATE TABLE b (key string, value string);
INSERT INTO b VALUES (1, 0), (1, 1);
SELECT * FROM b FULL OUTE
[jira] [Created] (HIVE-27225) Speedup build by skipping SBOM generation by default
Stamatis Zampetakis created HIVE-27225: -- Summary: Speedup build by skipping SBOM generation by default Key: HIVE-27225 URL: https://issues.apache.org/jira/browse/HIVE-27225 Project: Hive Issue Type: Improvement Components: Build Infrastructure Reporter: Stamatis Zampetakis Assignee: Stamatis Zampetakis A full build of Hive locally in my environment takes ~15 minutes.
{noformat}
mvn clean install -DskipTests -Pitests
[INFO] BUILD SUCCESS
[INFO]
[INFO] Total time: 14:15 min
{noformat}
Profiling the build shows that we are spending roughly 30% of CPU in the org.cyclonedx.maven plugin, which is used to generate SBOM artifacts (HIVE-26912). The SBOM generation does not need to run in every single build and probably only needs to be active during the release build. To speed up everyday builds I propose to activate the cyclonedx plugin only in the dist (release) profile. After this change, the default build drops from 14 minutes to 8.
{noformat}
mvn clean install -DskipTests -Pitests
[INFO]
[INFO] BUILD SUCCESS
[INFO]
[INFO] Total time: 08:19 min
{noformat}
-- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Created] (HIVE-27224) Enhance drop table/partition command
Taraka Rama Rao Lethavadla created HIVE-27224: - Summary: Enhance drop table/partition command Key: HIVE-27224 URL: https://issues.apache.org/jira/browse/HIVE-27224 Project: Hive Issue Type: Improvement Components: Hive, Standalone Metastore Reporter: Taraka Rama Rao Lethavadla {*}Problem Statement{*}: If the table has a large number of partitions, the drop table command takes a long time to finish. To improve the command we have the following proposals:
* Perform all the queries (HMS->DB) issued by drop table in batches (not just those against the partitions table), so that the query does not fail with exceptions like "transaction id not found" or other timeout issues, since this is directly proportional to backend database performance.
* Display what action is happening as part of drop table, so that the user knows which step is taking more time and how many steps have completed so far. We should have loggers (at least DEBUG level) in clients reporting how many partitions/batches have been processed and the current iteration, to estimate an approximate timeout for such a large HMS operation.
* Support a retry option: if for some reason the drop table command fails partway through, the next run should proceed with the remaining operations instead of failing due to missing/stale entries.
-- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Created] (HIVE-27223) Show Compactions failing with NPE
Ayush Saxena created HIVE-27223: --- Summary: Show Compactions failing with NPE Key: HIVE-27223 URL: https://issues.apache.org/jira/browse/HIVE-27223 Project: Hive Issue Type: Bug Reporter: Ayush Saxena Assignee: Ayush Saxena
{noformat}
java.lang.NullPointerException: null
    at java.io.DataOutputStream.writeBytes(DataOutputStream.java:274) ~[?:?]
    at org.apache.hadoop.hive.ql.ddl.process.show.compactions.ShowCompactionsOperation.writeRow(ShowCompactionsOperation.java:135)
    at org.apache.hadoop.hive.ql.ddl.process.show.compactions.ShowCompactionsOperation.execute(ShowCompactionsOperation.java:57)
    at org.apache.hadoop.hive.ql.ddl.DDLTask.execute(DDLTask.java:84)
    at org.apache.hadoop.hive.ql.exec.Task.executeTask(Task.java:213)
    at org.apache.hadoop.hive.ql.exec.TaskRunner.runSequential(TaskRunner.java:105)
    at org.apache.hadoop.hive.ql.Executor.launchTask(Executor.java:360)
{noformat}
-- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Created] (HIVE-27222) New functionality to show compactions information for a specific table/partition of a given database
Taraka Rama Rao Lethavadla created HIVE-27222: - Summary: New functionality to show compactions information for a specific table/partition of a given database Key: HIVE-27222 URL: https://issues.apache.org/jira/browse/HIVE-27222 Project: Hive Issue Type: Improvement Components: Hive Reporter: Taraka Rama Rao Lethavadla As per the current implementation, the show compactions command lists compaction details of all the partitions and tables of all the databases in a single output. If a user happens to have 100, 1000 or even more databases/tables/partitions, parsing the show compactions output to check the details of a specific table/partition is difficult. So the proposal is to support something like {noformat} show compactions `db`.`table`[.`partition`]{noformat} (a usage sketch follows below). -- This message was sent by Atlassian Jira (v8.20.10#820010)
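A hypothetical invocation of the proposed syntax; the database, table and partition names below are made up purely for illustration.
{code:sql}
-- all compactions of a single table
SHOW COMPACTIONS `sales_db`.`orders`;

-- compactions of one partition of that table
SHOW COMPACTIONS `sales_db`.`orders`.`ds=2023-04-01`;
{code}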
[jira] [Created] (HIVE-27221) Backport of HIVE-25726: Upgrade velocity to 2.3 due to CVE-2020-13936
Apoorva Aggarwal created HIVE-27221: --- Summary: Backport of HIVE-25726: Upgrade velocity to 2.3 due to CVE-2020-13936 Key: HIVE-27221 URL: https://issues.apache.org/jira/browse/HIVE-27221 Project: Hive Issue Type: Sub-task Reporter: Apoorva Aggarwal Assignee: Apoorva Aggarwal Fix For: 3.2.0 -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Created] (HIVE-27220) Backport Upgrade commons,httpclient,jackson,jetty,log4j binaries from branch-3.1
Apoorva Aggarwal created HIVE-27220: --- Summary: Backport Upgrade commons,httpclient,jackson,jetty,log4j binaries from branch-3.1 Key: HIVE-27220 URL: https://issues.apache.org/jira/browse/HIVE-27220 Project: Hive Issue Type: Sub-task Reporter: Apoorva Aggarwal Fix For: 3.2.0 -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Created] (HIVE-27219) Backport of HIVE-25616: Hive-24741 backport to 3.1
Apoorva Aggarwal created HIVE-27219: --- Summary: Backport of HIVE-25616: Hive-24741 backport to 3.1 Key: HIVE-27219 URL: https://issues.apache.org/jira/browse/HIVE-27219 Project: Hive Issue Type: Sub-task Reporter: Apoorva Aggarwal Assignee: Apoorva Aggarwal Fix For: 3.2.0 -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Created] (HIVE-27218) Hive-3 set hive.materializedview.rewriting default to false
Yi Zhang created HIVE-27218: --- Summary: Hive-3 set hive.materializedview.rewriting default to false Key: HIVE-27218 URL: https://issues.apache.org/jira/browse/HIVE-27218 Project: Hive Issue Type: Improvement Components: Hive Affects Versions: 3.1.3, 3.1.2, 3.1.0 Reporter: Yi Zhang https://issues.apache.org/jira/browse/HIVE-19973 switched the hive.materializedview.rewriting default from false to true. However, users have observed high latency at query compilation when they have a large number of databases (5k): each call to a remote metastore DB adds up, pushing per-query compilation time into minutes. Since Hive 4 has improvements in HIVE-21631 and HIVE-21344 whose backport is unlikely, the suggestion is to turn this off by default in Hive 3 (a session-level workaround is sketched below). -- This message was sent by Atlassian Jira (v8.20.10#820010)
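Until the default changes, affected users can presumably turn the rewriting off themselves, either per session or in hive-site.xml; a minimal sketch using the property named in the ticket:
{code:sql}
-- disable materialized-view-based query rewriting for the current session
SET hive.materializedview.rewriting=false;
{code}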
[jira] [Created] (HIVE-27217) addWriteNotificationLogInBatch can silently fail
John Sherman created HIVE-27217: --- Summary: addWriteNotificationLogInBatch can silently fail Key: HIVE-27217 URL: https://issues.apache.org/jira/browse/HIVE-27217 Project: Hive Issue Type: Bug Components: HiveServer2 Reporter: John Sherman Assignee: John Sherman While debugging an issue, I noticed that addWriteNotificationLogInBatch in Hive.java can fail silently if the TApplicationException thrown is not TApplicationException.UNKNOWN_METHOD or TApplicationException.WRONG_METHOD_NAME. https://github.com/apache/hive/blob/40a7d689e51d02fa9b324553fd1810d0ad043080/ql/src/java/org/apache/hadoop/hive/ql/metadata/Hive.java#L3359-L3381 Failures to write to the notification log can be very difficult to debug; we should rethrow the exception so that the failure is clearly visible. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Created] (HIVE-27216) Upgrade postgresql to 42.5.1 from 9.x
Aman Raj created HIVE-27216: --- Summary: Upgrade postgresql to 42.5.1 from 9.x Key: HIVE-27216 URL: https://issues.apache.org/jira/browse/HIVE-27216 Project: Hive Issue Type: Sub-task Reporter: Aman Raj Assignee: Aman Raj This ticket involves a partial cherry-pick of HIVE-23965 and complete cherry-picks of HIVE-26253 and HIVE-26914. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Created] (HIVE-27215) On DB with defaultTableType property, create external table with transactional property as true creates a managed table
Venugopal Reddy K created HIVE-27215: Summary: On DB with defaultTableType property, create external table with transactional property as true creates a managed table Key: HIVE-27215 URL: https://issues.apache.org/jira/browse/HIVE-27215 Project: Hive Issue Type: Bug Reporter: Venugopal Reddy K *Description:* On a database created with the defaultTableType property, creating an external table with the transactional property set to true creates a managed table. *Steps to reproduce:* Create a database with the db property defaultTableType set to either external or acid. Then create an external table with transactional set to true (or with transactional set to true and transactional_properties=insert_only). The table is created as a managed table.
{code:java}
0: jdbc:hive2://localhost:1> create database mydbext with dbproperties('defaultTableType'='external');
0: jdbc:hive2://localhost:1> use mydbext;
0: jdbc:hive2://localhost:1> create external table test_ext_txn(i string) stored as orc tblproperties('transactional'='true');
0: jdbc:hive2://localhost:1> desc formatted test_ext_txn;
+------------------------------+------------------------------------------------------+------------------------------------------------------------+
| col_name | data_type | comment |
+------------------------------+------------------------------------------------------+------------------------------------------------------------+
| i | string | |
| | NULL | NULL |
| # Detailed Table Information | NULL | NULL |
| Database: | mydbext | NULL |
| OwnerType: | USER | NULL |
| Owner: | hive | NULL |
| CreateTime: | Mon Apr 03 23:24:07 IST 2023 | NULL |
| LastAccessTime: | UNKNOWN | NULL |
| Retention: | 0 | NULL |
| Location: | file:/tmp/warehouse/managed/mydbext.db/test_ext_txn | NULL |
| Table Type: | MANAGED_TABLE | NULL |
| Table Parameters: | NULL | NULL |
| | COLUMN_STATS_ACCURATE | {\"BASIC_STATS\":\"true\",\"COLUMN_STATS\":{\"i\":\"true\"}} |
| | bucketing_version | 2 |
| | numFiles | 0 |
| | numRows | 0 |
| | rawDataSize | 0 |
| | totalSize | 0 |
| | transactional | true |
| | transactional_properties | default |
| | transient_lastDdlTime | 168057 |
| | NULL | NULL |
| # Storage Information | NULL | NULL
[jira] [Created] (HIVE-27214) Backport of HIVE-24414: Backport HIVE-19662 to branch-3.1
Diksha created HIVE-27214: - Summary: Backport of HIVE-24414: Backport HIVE-19662 to branch-3.1 Key: HIVE-27214 URL: https://issues.apache.org/jira/browse/HIVE-27214 Project: Hive Issue Type: Sub-task Reporter: Diksha -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Created] (HIVE-27213) parquet logical decimal type to INT32 is not working while compute statastics
KIRTI RUGE created HIVE-27213: - Summary: parquet logical decimal type to INT32 is not working while compute statastics Key: HIVE-27213 URL: https://issues.apache.org/jira/browse/HIVE-27213 Project: Hive Issue Type: Improvement Reporter: KIRTI RUGE Attachments: test.parquet [^test.parquet] Steps to reproduce: dfs ${system:test.dfs.mkdir} hdfs:///tmp/dwxtest/ws_sold_date_sk=2451825; dfs -copyFromLocal ../../data/files/dwxtest.parquet hdfs:///tmp/dwxtest/ws_sold_date_sk=2451825; dfs -ls hdfs:///tmp/dwxtest/ws_sold_date_sk=2451825/; CREATE EXTERNAL TABLE `web_sales`( `ws_sold_time_sk` int, `ws_ship_date_sk` int, `ws_item_sk` int, `ws_bill_customer_sk` int, `ws_bill_cdemo_sk` int, `ws_bill_hdemo_sk` int, `ws_bill_addr_sk` int, `ws_ship_customer_sk` int, `ws_ship_cdemo_sk` int, `ws_ship_hdemo_sk` int, `ws_ship_addr_sk` int, `ws_web_page_sk` int, `ws_web_site_sk` int, `ws_ship_mode_sk` int, `ws_warehouse_sk` int, `ws_promo_sk` int, `ws_order_number` bigint, `ws_quantity` int, `ws_wholesale_cost` decimal(7,2), `ws_list_price` decimal(7,2), `ws_sales_price` decimal(7,2), `ws_ext_discount_amt` decimal(7,2), `ws_ext_sales_price` decimal(7,2), `ws_ext_wholesale_cost` decimal(7,2), `ws_ext_list_price` decimal(7,2), `ws_ext_tax` decimal(7,2), `ws_coupon_amt` decimal(7,2), `ws_ext_ship_cost` decimal(7,2), `ws_net_paid` decimal(7,2), `ws_net_paid_inc_tax` decimal(7,2), `ws_net_paid_inc_ship` decimal(7,2), `ws_net_paid_inc_ship_tax` decimal(7,2), `ws_net_profit` decimal(7,2)) PARTITIONED BY ( `ws_sold_date_sk` int) ROW FORMAT SERDE 'org.apache.hadoop.hive.ql.io.parquet.serde.ParquetHiveSerDe' STORED AS PARQUET LOCATION 'hdfs:///tmp/dwxtest/'; MSCK REPAIR TABLE web_sales; analyze table web_sales compute statistics for columns; Error Stack: {noformat} analyze table web_sales compute statistics for columns; ], TaskAttempt 3 failed, info=[Error: Error while running task ( failure ) : attempt_1678779198717__2_00_52_3:java.lang.RuntimeException: java.lang.RuntimeException: java.io.IOException: org.apache.parquet.io.ParquetDecodingException: Can not read value at 0 in block -1 in file s3a://nfqe-tpcds-test/spark-tpcds/sf1000-parquet/useDecimal=true,useDate=true,filterNull=false/web_sales/ws_sold_date_sk=2451825/part-00796-788bef86-2748-4e21-a464-b34c7e646c94-cfcafd2c-2abd-4067-8aea-f58cb1021b35.c000.snappy.parquet at org.apache.hadoop.hive.ql.exec.tez.TezProcessor.initializeAndRunProcessor(TezProcessor.java:351) at org.apache.hadoop.hive.ql.exec.tez.TezProcessor.run(TezProcessor.java:280) at org.apache.tez.runtime.LogicalIOProcessorRuntimeTask.run(LogicalIOProcessorRuntimeTask.java:374) at org.apache.tez.runtime.task.TaskRunner2Callable$1.run(TaskRunner2Callable.java:84) at org.apache.tez.runtime.task.TaskRunner2Callable$1.run(TaskRunner2Callable.java:70) at java.base/java.security.AccessController.doPrivileged(Native Method) at java.base/javax.security.auth.Subject.doAs(Subject.java:423) at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1899) at org.apache.tez.runtime.task.TaskRunner2Callable.callInternal(TaskRunner2Callable.java:70) at org.apache.tez.runtime.task.TaskRunner2Callable.callInternal(TaskRunner2Callable.java:40) at org.apache.tez.common.CallableWithNdc.call(CallableWithNdc.java:36) at org.apache.hadoop.hive.llap.daemon.impl.StatsRecordingThreadPool$WrappedCallable.call(StatsRecordingThreadPool.java:118) at java.base/java.util.concurrent.FutureTask.run(FutureTask.java:264) at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128) at 
java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628) at java.base/java.lang.Thread.run(Thread.java:829) Caused by: java.lang.RuntimeException: java.io.IOException: org.apache.parquet.io.ParquetDecodingException: Can not read value at 0 in block -1 in file s3a://nfqe-tpcds-test/spark-tpcds/sf1000-parquet/useDecimal=true,useDate=true,filterNull=false/web_sales/ws_sold_date_sk=2451825/part-00796-788bef86-2748-4e21-a464-b34c7e646c94-cfcafd2c-2abd-4067-8aea-f58cb1021b35.c000.snappy.parquet at org.apache.hadoop.mapred.split.TezGroupedSplitsInputFormat$TezGroupedSplitsRecordReader.initNextRecordReader(TezGroupedSplitsInputFormat.java:206) at org.apache.hadoop.mapred.split.TezGroupedSplitsInputFormat$TezGroupedSplitsRecordReader.(TezGroupedSplitsInputFormat.java:145) at org.apache.hadoop.mapred.split.TezGroupedSplitsInputFormat.getRecordReader(TezGroupedSplitsInputFormat.java:111) at org.apache.tez.mapreduce.lib.MRReaderMapred.setupOldRecordReader(MRReaderMapred.java:164) at org.apache.tez.mapreduce.lib.MRReaderMapred.setSplit(MR
[jira] [Created] (HIVE-27212) Backport of HIVE-24316: Upgrade ORC from 1.5.6 to 1.5.8 in branch-3.1
Diksha created HIVE-27212: - Summary: Backport of HIVE-24316: Upgrade ORC from 1.5.6 to 1.5.8 in branch-3.1 Key: HIVE-27212 URL: https://issues.apache.org/jira/browse/HIVE-27212 Project: Hive Issue Type: Task Reporter: Diksha Backport of HIVE-24316: Upgrade ORC from 1.5.6 to 1.5.8 in branch-3.1 -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Created] (HIVE-27211) Backport HIVE-22453: Describe table unnecessarily fetches partitions
Nikhil Gupta created HIVE-27211: --- Summary: Backport HIVE-22453: Describe table unnecessarily fetches partitions Key: HIVE-27211 URL: https://issues.apache.org/jira/browse/HIVE-27211 Project: Hive Issue Type: Sub-task Affects Versions: 3.1.2 Reporter: Nikhil Gupta Fix For: 3.2.0 -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Created] (HIVE-27210) Backport HIVE-23338: Bump jackson version to 2.10.0
Nikhil Gupta created HIVE-27210: --- Summary: Backport HIVE-23338: Bump jackson version to 2.10.0 Key: HIVE-27210 URL: https://issues.apache.org/jira/browse/HIVE-27210 Project: Hive Issue Type: Sub-task Reporter: Nikhil Gupta Fix For: 3.2.0 -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Created] (HIVE-27209) Backport HIVE-24569: LLAP daemon leaks file descriptors/log4j appenders
Nikhil Gupta created HIVE-27209: --- Summary: Backport HIVE-24569: LLAP daemon leaks file descriptors/log4j appenders Key: HIVE-27209 URL: https://issues.apache.org/jira/browse/HIVE-27209 Project: Hive Issue Type: Sub-task Components: llap Affects Versions: 2.2.0 Reporter: Nikhil Gupta -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Created] (HIVE-27208) Iceberg: Add support for rename table
Ayush Saxena created HIVE-27208: --- Summary: Iceberg: Add support for rename table Key: HIVE-27208 URL: https://issues.apache.org/jira/browse/HIVE-27208 Project: Hive Issue Type: Improvement Reporter: Ayush Saxena Assignee: Ayush Saxena Add support for renaming iceberg tables. -- This message was sent by Atlassian Jira (v8.20.10#820010)
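A sketch of the statement this ticket would enable, using the standard ALTER TABLE ... RENAME TO syntax with hypothetical Iceberg table names.
{code:sql}
ALTER TABLE ice_orders RENAME TO ice_orders_v2;
{code}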
[jira] [Created] (HIVE-27207) Backport of HIVE-26530: HS2 OOM-OperationManager.queryIdOperation does not properly clean up multiple queryIds
Aman Raj created HIVE-27207: --- Summary: Backport of HIVE-26530: HS2 OOM-OperationManager.queryIdOperation does not properly clean up multiple queryIds Key: HIVE-27207 URL: https://issues.apache.org/jira/browse/HIVE-27207 Project: Hive Issue Type: Sub-task Reporter: Aman Raj Assignee: Aman Raj HIVE-26530 was already part of Hive-3.1.3 release so it should be backported to branch-3 -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Created] (HIVE-27206) Backport of HIVE-20179
Aman Raj created HIVE-27206: --- Summary: Backport of HIVE-20179 Key: HIVE-27206 URL: https://issues.apache.org/jira/browse/HIVE-27206 Project: Hive Issue Type: Sub-task Reporter: Aman Raj Assignee: Aman Raj HIVE-20179 was already part of Hive 3.1.3 release so it makes sense to backport this to branch-3 -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Created] (HIVE-27205) Update jackson-databind for CVE fix for CVE-2022-42003
Diksha created HIVE-27205: - Summary: Update jackson-databind for CVE fix for CVE-2022-42003 Key: HIVE-27205 URL: https://issues.apache.org/jira/browse/HIVE-27205 Project: Hive Issue Type: Task Reporter: Diksha Assignee: Diksha Update jackson-databind for CVE fix for CVE-2022-42003 -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Created] (HIVE-27204) Upgrade jettison to 1.5.2 to fix CVE-2022-45685
Aman Raj created HIVE-27204: --- Summary: Upgrade jettison to 1.5.2 to fix CVE-2022-45685 Key: HIVE-27204 URL: https://issues.apache.org/jira/browse/HIVE-27204 Project: Hive Issue Type: Bug Reporter: Aman Raj Assignee: Aman Raj Upgrade jettison to 1.5.2 to fix CVE-2022-45685 -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Created] (HIVE-27203) Add compaction pending Qtest for Insert-only, Partitioned, Clustered ACID, and combination Tables
Akshat Mathur created HIVE-27203: Summary: Add compaction pending Qtest for Insert-only, Partitioned, Clustered ACID, and combination Tables Key: HIVE-27203 URL: https://issues.apache.org/jira/browse/HIVE-27203 Project: Hive Issue Type: Test Reporter: Akshat Mathur Assignee: Akshat Mathur Improve Qtest coverage for compaction use cases for ACID tables (a sketch of one such scenario follows below):
# Partitioned Tables (Major & Minor)
# Insert-Only Clustered (Major & Minor)
# Insert-Only Partitioned (Major & Minor)
# Insert-Only Clustered and Partitioned (Major & Minor)
-- This message was sent by Atlassian Jira (v8.20.10#820010)
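As an illustration of the kind of scenario these qtests would cover, a minimal sketch for the insert-only partitioned case; the table, partition and values are made up, while the DDL and compaction syntax is standard Hive.
{code:sql}
CREATE TABLE mm_part (id int) PARTITIONED BY (ds string)
TBLPROPERTIES ('transactional'='true', 'transactional_properties'='insert_only');
INSERT INTO mm_part PARTITION (ds='2023-04-01') VALUES (1), (2);
INSERT INTO mm_part PARTITION (ds='2023-04-01') VALUES (3);
ALTER TABLE mm_part PARTITION (ds='2023-04-01') COMPACT 'minor';
SHOW COMPACTIONS;
{code}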
[jira] [Created] (HIVE-27202) Disable flaky test TestJdbcWithMiniLlapRow#testComplexQuery
Vihang Karajgaonkar created HIVE-27202: -- Summary: Disable flaky test TestJdbcWithMiniLlapRow#testComplexQuery Key: HIVE-27202 URL: https://issues.apache.org/jira/browse/HIVE-27202 Project: Hive Issue Type: Test Reporter: Vihang Karajgaonkar Assignee: Vihang Karajgaonkar TestJdbcWithMiniLlapRow#testComplexQuery is flaky and should be disabled. http://ci.hive.apache.org/job/hive-flaky-check/634/ -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Created] (HIVE-27201) Inconsistency between session Hive and thread-local Hive may cause HS2 deadlock
Zhihua Deng created HIVE-27201: -- Summary: Inconsistency between session Hive and thread-local Hive may cause HS2 deadlock Key: HIVE-27201 URL: https://issues.apache.org/jira/browse/HIVE-27201 Project: Hive Issue Type: Bug Components: HiveServer2 Reporter: Zhihua Deng Assignee: Zhihua Deng A HiveServer2 server handler can switch to processing an operation from another session; in such a case the Hive cached in the ThreadLocal is not the same as the Hive in the SessionState and can be referenced by another session. If two handlers swap their sessions to process DatabaseMetaData requests, and the HiveMetastoreClientFactory obtains the Hive via Hive.get(), there is a chance that a deadlock can happen. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Created] (HIVE-27200) Backport HIVE-24928 In case of non-native tables use basic statistics from HiveStorageHandler
Yi Zhang created HIVE-27200: --- Summary: Backport HIVE-24928 In case of non-native tables use basic statistics from HiveStorageHandler Key: HIVE-27200 URL: https://issues.apache.org/jira/browse/HIVE-27200 Project: Hive Issue Type: Improvement Components: StorageHandler Reporter: Yi Zhang This is to backport HIVE-24928 so that, for HiveStorageHandler tables, 'ANALYZE TABLE ... COMPUTE STATISTICS' can use the storage handler to provide basic stats with BasicStatsNoJobTask. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Created] (HIVE-27199) Read TIMESTAMP WITH LOCAL TIME ZONE columns from text files using custom formats
Stamatis Zampetakis created HIVE-27199: -- Summary: Read TIMESTAMP WITH LOCAL TIME ZONE columns from text files using custom formats Key: HIVE-27199 URL: https://issues.apache.org/jira/browse/HIVE-27199 Project: Hive Issue Type: Improvement Components: Serializers/Deserializers Affects Versions: 4.0.0-alpha-2 Reporter: Stamatis Zampetakis Assignee: Stamatis Zampetakis Timestamp values come in many flavors and formats, and there is no single representation that can satisfy everyone, especially when such values are stored in plain text/csv files. HIVE-9298 added a special SERDE property, {{timestamp.formats}}, that allows providing custom timestamp patterns to correctly parse TIMESTAMP values coming from files. However, when the column type is TIMESTAMP WITH LOCAL TIME ZONE (LTZ) it is not possible to use a custom pattern, so when the built-in Hive parser does not match the expected format a NULL value is returned. Consider a text file, F1, with the following values:
{noformat}
2016-05-03 12:26:34
2016-05-03T12:26:34
{noformat}
and a table with a column declared as LTZ.
{code:sql}
CREATE TABLE ts_table (ts TIMESTAMP WITH LOCAL TIME ZONE);
LOAD DATA LOCAL INPATH './F1' INTO TABLE ts_table;
SELECT * FROM ts_table;
2016-05-03 12:26:34.0 US/Pacific
NULL
{code}
In order to give more flexibility to the users relying on the TIMESTAMP WITH LOCAL TIME ZONE datatype, and to align the behavior with the TIMESTAMP type, this JIRA aims to reuse the {{timestamp.formats}} property for both TIMESTAMP types (a usage sketch follows below). The work here focuses exclusively on simple text files, but the same could be done for other SERDEs such as JSON etc. -- This message was sent by Atlassian Jira (v8.20.10#820010)
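A sketch of how the property could be used once this change lands, mirroring the existing behaviour for plain TIMESTAMP columns; the SerDe class is Hive's default text SerDe and the pattern string is only illustrative.
{code:sql}
CREATE TABLE ts_table (ts TIMESTAMP WITH LOCAL TIME ZONE)
ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe'
WITH SERDEPROPERTIES ("timestamp.formats"="yyyy-MM-dd'T'HH:mm:ss");
LOAD DATA LOCAL INPATH './F1' INTO TABLE ts_table;
-- with the custom pattern both rows of F1 should parse instead of the second returning NULL
SELECT * FROM ts_table;
{code}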
[jira] [Created] (HIVE-27198) Delete directly aborted transactions instead of select and loading ids
Mahesh Raju Somalaraju created HIVE-27198: - Summary: Delete directly aborted transactions instead of select and loading ids Key: HIVE-27198 URL: https://issues.apache.org/jira/browse/HIVE-27198 Project: Hive Issue Type: Improvement Reporter: Mahesh Raju Somalaraju Assignee: Mahesh Raju Somalaraju When cleaning aborted transactions, we can delete the txns directly instead of selecting and processing them. Method name: cleanEmptyAbortedAndCommittedTxns
Current code:
String s = "SELECT \"TXN_ID\" FROM \"TXNS\" WHERE " + "\"TXN_ID\" NOT IN (SELECT \"TC_TXNID\" FROM \"TXN_COMPONENTS\") AND " + " (\"TXN_STATE\" = " + TxnStatus.ABORTED + " OR \"TXN_STATE\" = " + TxnStatus.COMMITTED + ") AND " + " \"TXN_ID\" < " + lowWaterMark;
Proposed code:
String s = "DELETE \"TXN_ID\" FROM \"TXNS\" WHERE " + "\"TXN_ID\" NOT IN (SELECT \"TC_TXNID\" FROM \"TXN_COMPONENTS\") AND " + " (\"TXN_STATE\" = " + TxnStatus.ABORTED + " OR \"TXN_STATE\" = " + TxnStatus.COMMITTED + ") AND " + " \"TXN_ID\" < " + lowWaterMark;
The SELECT should be eliminated and the DELETE should work with the WHERE clause instead of the IN clause built from the loaded ids; there is no reason to load the ids into memory and then generate a huge SQL statement. Batching is also not necessary here, we can delete the records directly (the effective statement is sketched below). -- This message was sent by Atlassian Jira (v8.20.10#820010)
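For clarity, the effective SQL of the proposed change would presumably be a single statement along these lines; note that a plain DELETE takes no column list, and the TxnStatus constants and lowWaterMark are substituted by the Java code (shown here as placeholders).
{code:sql}
-- sketch only; <ABORTED>, <COMMITTED> and <lowWaterMark> are placeholders
DELETE FROM "TXNS"
WHERE "TXN_ID" NOT IN (SELECT "TC_TXNID" FROM "TXN_COMPONENTS")
  AND ("TXN_STATE" = <ABORTED> OR "TXN_STATE" = <COMMITTED>)
  AND "TXN_ID" < <lowWaterMark>;
{code}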
[jira] [Created] (HIVE-27197) Iceberg: Support Iceberg version travel by reference name
zhangbutao created HIVE-27197: - Summary: Iceberg: Support Iceberg version travel by reference name Key: HIVE-27197 URL: https://issues.apache.org/jira/browse/HIVE-27197 Project: Hive Issue Type: Improvement Components: Iceberg integration Reporter: zhangbutao This ticket is inspired by https://github.com/apache/iceberg/pull/6575 -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Created] (HIVE-27196) Upgrade jettison version to 1.5.4 due to CVEs
Mahesh Raju Somalaraju created HIVE-27196: - Summary: Upgrade jettison version to 1.5.4 due to CVEs Key: HIVE-27196 URL: https://issues.apache.org/jira/browse/HIVE-27196 Project: Hive Issue Type: Improvement Reporter: Mahesh Raju Somalaraju Assignee: Mahesh Raju Somalaraju [CVE-2023-1436|https://www.cve.org/CVERecord?id=CVE-2023-1436] [CWE-400|https://cwe.mitre.org/data/definitions/400.html] We need to update the jettison version to 1.5.4 due to the above CVE issues; version 1.5.4 has no known CVEs. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Created] (HIVE-27195) Drop table if Exists . fails during authorization for temporary tables
Riju Trivedi created HIVE-27195: --- Summary: Drop table if Exists . fails during authorization for temporary tables Key: HIVE-27195 URL: https://issues.apache.org/jira/browse/HIVE-27195 Project: Hive Issue Type: Bug Reporter: Riju Trivedi Assignee: Riju Trivedi https://issues.apache.org/jira/browse/HIVE-20051 handles skipping authorization for temporary tables. But still, drop table if exists fails with HiveAccessControlException. Steps to repro:
{code:java}
use test;
CREATE TEMPORARY TABLE temp_table (id int);
drop table if exists test.temp_table;

Error: Error while compiling statement: FAILED: HiveAccessControlException Permission denied: user [rtrivedi] does not have [DROP] privilege on [test/temp_table] (state=42000,code=4)
{code}
-- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Created] (HIVE-27194) Support expression in limit and offset clauses
vamshi kolanu created HIVE-27194: Summary: Support expression in limit and offset clauses Key: HIVE-27194 URL: https://issues.apache.org/jira/browse/HIVE-27194 Project: Hive Issue Type: Task Components: Hive Reporter: vamshi kolanu Assignee: vamshi kolanu As part of this task, support expressions in both limit and offset clauses. Currently, these clauses only support integer literals. For example, the following expressions will be supported after this change:
1. select key from (select * from src limit (1+2*3)) q1;
2. select key from (select * from src limit (1+2*3) offset (3*4*5)) q1;
-- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Created] (HIVE-27193) Database names starting with '@' cause error during ALTER/DROP table.
Oliver Schiller created HIVE-27193: -- Summary: Database names starting with '@' cause error during ALTER/DROP table. Key: HIVE-27193 URL: https://issues.apache.org/jira/browse/HIVE-27193 Project: Hive Issue Type: Bug Components: Metastore, Standalone Metastore Affects Versions: 4.0.0-alpha-2 Reporter: Oliver Schiller The creation of a database whose name starts with '@' is supported:
{code:java}
create database `@test`;{code}
The creation of a table in this database works:
{code:java}
create table `@test`.testtable (c1 integer);{code}
However, dropping or altering the table results in an error:
{code:java}
drop table `@test`.testtable;
FAILED: SemanticException Unable to fetch table testtable. @test is prepended with the catalog marker but does not appear to have a catalog name in it
Error: Error while compiling statement: FAILED: SemanticException Unable to fetch table testtable. @test is prepended with the catalog marker but does not appear to have a catalog name in it (state=42000,code=4)

alter table `@test`.testtable add columns (c2 integer);
FAILED: SemanticException Unable to fetch table testtable. @test is prepended with the catalog marker but does not appear to have a catalog name in it
Error: Error while compiling statement: FAILED: SemanticException Unable to fetch table testtable. @test is prepended with the catalog marker but does not appear to have a catalog name in it (state=42000,code=4)
{code}
Relevant snippet of stack trace:
{code:java}
org.apache.hadoop.hive.metastore.api.MetaException: @TEST is prepended with the catalog marker but does not appear to have a catalog name in it
    at org.apache.hadoop.hive.metastore.utils.MetaStoreUtils.parseDbName(MetaStoreUtils.java:1031)
    at org.apache.hadoop.hive.ql.metadata.SessionHiveMetaStoreClient.getTempTable(SessionHiveMetaStoreClient.java:651)
    at org.apache.hadoop.hive.ql.metadata.SessionHiveMetaStoreClient.getTable(SessionHiveMetaStoreClient.java:279)
    at org.apache.hadoop.hive.ql.metadata.SessionHiveMetaStoreClient.getTable(SessionHiveMetaStoreClient.java:273)
    at org.apache.hadoop.hive.ql.metadata.SessionHiveMetaStoreClient.getTable(SessionHiveMetaStoreClient.java:258)
    at org.apache.hadoop.hive.metastore.HiveMetaStoreClient.dropTable(HiveMetaStoreClient.java:1982)
    at org.apache.hadoop.hive.metastore.HiveMetaStoreClient.dropTable(HiveMetaStoreClient.java:1957)
    ...{code}
My suspicion is that this is caused by the implementation of getTempTable and how it is called. The method getTempTable calls parseDbName assuming that the given dbname might be prefixed with a catalog name. I'm wondering whether this is correct at this layer. From poking around a bit, it appears to me that the catalog name is typically prepended when making the actual thrift call. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Created] (HIVE-27192) Use normal import instead of shaded import in TestSchemaToolCatalogOps.java
Zoltán Rátkai created HIVE-27192: Summary: Use normal import instead of shaded import in TestSchemaToolCatalogOps.java Key: HIVE-27192 URL: https://issues.apache.org/jira/browse/HIVE-27192 Project: Hive Issue Type: Improvement Reporter: Zoltán Rátkai Assignee: Zoltán Rátkai -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Created] (HIVE-27191) Cleaner is blocked by orphaned entries in MHL table
Simhadri Govindappa created HIVE-27191: -- Summary: Cleaner is blocked by orphaned entries in MHL table Key: HIVE-27191 URL: https://issues.apache.org/jira/browse/HIVE-27191 Project: Hive Issue Type: Improvement Reporter: Simhadri Govindappa Assignee: Simhadri Govindappa The following mhl_txnids do not exist in the TXNS table; as a result, the cleaner gets blocked and many entries are stuck in the ready-for-cleaning state. The cleaner should periodically check for such entries and remove them from the MHL table to prevent it from being blocked (a cleanup sketch follows below).
{noformat}
postgres=# select mhl_txnid from min_history_level where not exists (select 1 from txns where txn_id = mhl_txnid);
 mhl_txnid
---
 43708080
 43708088
 43679962
 43680464
 43680352
 43680392
 43680424
 43680436
 43680471
 43680475
 43680483
 43622677
 43708083
 43708084
 43678157
 43680482
 43680484
 43622745
 43622750
 43706829
 43707261
(21 rows){noformat}
-- This message was sent by Atlassian Jira (v8.20.10#820010)
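A sketch of the manual cleanup implied by the diagnostic query above, i.e. what the cleaner would perform automatically; shown for a PostgreSQL metastore backend.
{code:sql}
-- remove MIN_HISTORY_LEVEL entries whose transaction no longer exists in TXNS
DELETE FROM min_history_level AS mhl
WHERE NOT EXISTS (SELECT 1 FROM txns t WHERE t.txn_id = mhl.mhl_txnid);
{code}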
[jira] [Created] (HIVE-27190) Implement col stats cache for hive iceberg table
Simhadri Govindappa created HIVE-27190: -- Summary: Implement col stats cache for hive iceberg table Key: HIVE-27190 URL: https://issues.apache.org/jira/browse/HIVE-27190 Project: Hive Issue Type: Improvement Reporter: Simhadri Govindappa -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Created] (HIVE-27189) Remove duplicate info log in Hive.isSubDIr
shuyouZZ created HIVE-27189: --- Summary: Remove duplicate info log in Hive.isSubDIr Key: HIVE-27189 URL: https://issues.apache.org/jira/browse/HIVE-27189 Project: Hive Issue Type: Improvement Reporter: shuyouZZ -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Created] (HIVE-27188) Explore usage of FilterApi.in(C column, Set values) in Parquet instead of nested OR
Rajesh Balamohan created HIVE-27188: --- Summary: Explore usage of FilterApi.in(C column, Set values) in Parquet instead of nested OR Key: HIVE-27188 URL: https://issues.apache.org/jira/browse/HIVE-27188 Project: Hive Issue Type: Improvement Reporter: Rajesh Balamohan Following query can throw stackoverflow exception with "Xss256K". Currently it generates nested OR filter [https://github.com/apache/hive/blob/master/ql/src/java/org/apache/hadoop/hive/ql/io/parquet/FilterPredicateLeafBuilder.java#L43-L52] Instead, need to explore the possibility of using {color:#de350b}FilterApi.in(C column, Set values) {color:#172b4d}in parquet{color}{color} {noformat} drop table if exists test; create external table test (i int) stored as parquet; insert into test values (1),(2),(3); select count(*) from test where i in (1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51, 52, 53, 54, 55, 56, 57, 58, 59, 60, 61, 62, 63, 64, 65, 66, 67, 68, 69, 70, 71, 72, 73, 74, 75, 76, 77, 78, 79, 80, 81, 82, 83, 84, 85, 86, 87, 88, 89, 90, 91, 92, 93, 94, 95, 96, 97, 98, 99, 100, 101, 102, 103, 104, 105, 106, 107, 108, 109, 110, 111, 112, 113, 114, 115, 116, 117, 118, 119, 120, 121, 122, 123, 124, 125, 126, 127, 128, 129, 130, 131, 132, 133, 134, 135, 136, 137, 138, 139, 140, 141, 142, 143, 144, 145, 146, 147, 148, 149, 150, 151, 152, 153, 154, 155, 156, 157, 158, 159, 160, 161, 162, 163, 164, 165, 166, 167, 168, 169, 170, 171, 172, 173, 174, 175, 176, 177, 178, 179, 180, 181, 182, 183, 184, 185, 186, 187, 188, 189, 190, 191, 192, 193, 194, 195, 196, 197, 198, 199, 200, 201, 202, 203, 204, 205, 206, 207, 208, 209, 210, 211, 212, 213, 214, 215, 216, 217, 218, 219, 220, 221, 222, 223, 224, 225, 226, 227, 228, 229, 230, 231, 232, 233, 234, 235, 236, 237, 238, 239, 240, 241, 242, 243); {noformat} -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Created] (HIVE-27187) Incremental rebuild of materialized view stored by iceberg
Krisztian Kasa created HIVE-27187: - Summary: Incremental rebuild of materialized view stored by iceberg Key: HIVE-27187 URL: https://issues.apache.org/jira/browse/HIVE-27187 Project: Hive Issue Type: Improvement Components: Iceberg integration, Materialized views Reporter: Krisztian Kasa Assignee: Krisztian Kasa Currently, the incremental rebuild of a materialized view stored by Iceberg whose definition query contains an aggregate operator is transformed into an insert overwrite statement containing a union operator, provided the source tables contain insert operations only. One branch of the union scans the view, the other produces the delta. This can be improved further: transform the statement into a multi-insert statement representing a merge statement, to insert new aggregations and update existing ones. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Created] (HIVE-27186) A persistent property store
Henri Biestro created HIVE-27186: Summary: A persistent property store Key: HIVE-27186 URL: https://issues.apache.org/jira/browse/HIVE-27186 Project: Hive Issue Type: Improvement Components: Metastore Affects Versions: 4.0.0-alpha-2 Reporter: Henri Biestro WHAT A persistent property store usable as a support facility for any metadata augmentation feature. WHY When adding new meta-data oriented features, we usually need to persist information linking the feature data and the HiveMetaStore objects it applies to. Any information related to a database, a table or the cluster - like statistics for example or any operational data state or data (think rolling backup) - fall in this use-case. Typically, accommodating such a feature requires modifying the Metastore database schema by adding or altering a table. It also usually implies modifying the thrift APIs to expose such meta-data to consumers. The proposed feature wants to solve the persistence and query/transport for these types of use-cases by exposing a 'key/(meta)value' store exposed as a property system. HOW A property-value model is the simple and generic exposed API. To provision for several usage scenarios, the model entry point is a 'namespace' that qualifies the feature-component property manager. For example, 'stats' could be the namespace for all properties related to the 'statistics' feature. The namespace identifies a manager that handles property-groups persisted as property-maps. For instance, all statistics pertaining to a given table would be collocated in the same property-group. As such, all properties (say number of 'unique_values' per columns) for a given HMS table 'relation0' would all be stored and persisted in the same property-map instance. Property-maps may be decorated by an (optional) schema that may declare the name and value-type of allowed properties (and their optional default value). Each property is addressed by a name, a path uniquely identifying the property in a given property map. The manager also handles transforming property-map names to the property-map keys used to persist them in the DB. The API provides inserting/updating properties in bulk transactionally. It also provides selection/projection to help reduce the volume of exchange between client/server; selection can use (JEXL expression) predicates to filter maps. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Created] (HIVE-27185) Iceberg: Cache iceberg table while loading for stats
Ayush Saxena created HIVE-27185: --- Summary: Iceberg: Cache iceberg table while loading for stats Key: HIVE-27185 URL: https://issues.apache.org/jira/browse/HIVE-27185 Project: Hive Issue Type: Improvement Reporter: Ayush Saxena Assignee: Ayush Saxena Presently, for stats, Hive loads the iceberg table multiple times via different routes. Cache it to avoid reading/loading the iceberg table multiple times. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Created] (HIVE-27184) Add class name profiling option in ProfileServlet
Rajesh Balamohan created HIVE-27184: --- Summary: Add class name profiling option in ProfileServlet Key: HIVE-27184 URL: https://issues.apache.org/jira/browse/HIVE-27184 Project: Hive Issue Type: Improvement Components: HiveServer2 Reporter: Rajesh Balamohan With async-profiler "-e classname.method", it is possible to profile specific events. Currently ProfileServlet supports events like cpu, alloc, lock etc. It would be good to enhance it to support method-name profiling as well. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Created] (HIVE-27183) Iceberg: Table information is loaded multiple times
Rajesh Balamohan created HIVE-27183: --- Summary: Iceberg: Table information is loaded multiple times Key: HIVE-27183 URL: https://issues.apache.org/jira/browse/HIVE-27183 Project: Hive Issue Type: Improvement Reporter: Rajesh Balamohan HMS::getTable invokes "HiveIcebergMetaHook::postGetTable" which internally loads iceberg table again. If this isn't needed or needed only for show-create-table, do not load the table again. {noformat} at jdk.internal.misc.Unsafe.park(java.base@11.0.18/Native Method) - parking to wait for <0x00066f84eef0> (a java.util.concurrent.CompletableFuture$Signaller) at java.util.concurrent.locks.LockSupport.park(java.base@11.0.18/LockSupport.java:194) at java.util.concurrent.CompletableFuture$Signaller.block(java.base@11.0.18/CompletableFuture.java:1796) at java.util.concurrent.ForkJoinPool.managedBlock(java.base@11.0.18/ForkJoinPool.java:3128) at java.util.concurrent.CompletableFuture.waitingGet(java.base@11.0.18/CompletableFuture.java:1823) at java.util.concurrent.CompletableFuture.get(java.base@11.0.18/CompletableFuture.java:1998) at org.apache.hadoop.util.functional.FutureIO.awaitFuture(FutureIO.java:77) at org.apache.iceberg.hadoop.HadoopInputFile.newStream(HadoopInputFile.java:196) at org.apache.iceberg.TableMetadataParser.read(TableMetadataParser.java:263) at org.apache.iceberg.TableMetadataParser.read(TableMetadataParser.java:258) at org.apache.iceberg.BaseMetastoreTableOperations.lambda$refreshFromMetadataLocation$0(BaseMetastoreTableOperations.java:177) at org.apache.iceberg.BaseMetastoreTableOperations$$Lambda$609/0x000840e18040.apply(Unknown Source) at org.apache.iceberg.BaseMetastoreTableOperations.lambda$refreshFromMetadataLocation$1(BaseMetastoreTableOperations.java:191) at org.apache.iceberg.BaseMetastoreTableOperations$$Lambda$610/0x000840e18440.run(Unknown Source) at org.apache.iceberg.util.Tasks$Builder.runTaskWithRetry(Tasks.java:404) at org.apache.iceberg.util.Tasks$Builder.runSingleThreaded(Tasks.java:214) at org.apache.iceberg.util.Tasks$Builder.run(Tasks.java:198) at org.apache.iceberg.util.Tasks$Builder.run(Tasks.java:190) at org.apache.iceberg.BaseMetastoreTableOperations.refreshFromMetadataLocation(BaseMetastoreTableOperations.java:191) at org.apache.iceberg.BaseMetastoreTableOperations.refreshFromMetadataLocation(BaseMetastoreTableOperations.java:176) at org.apache.iceberg.BaseMetastoreTableOperations.refreshFromMetadataLocation(BaseMetastoreTableOperations.java:171) at org.apache.iceberg.hive.HiveTableOperations.doRefresh(HiveTableOperations.java:153) at org.apache.iceberg.BaseMetastoreTableOperations.refresh(BaseMetastoreTableOperations.java:96) at org.apache.iceberg.BaseMetastoreTableOperations.current(BaseMetastoreTableOperations.java:79) at org.apache.iceberg.BaseMetastoreCatalog.loadTable(BaseMetastoreCatalog.java:44) at org.apache.iceberg.mr.Catalogs.loadTable(Catalogs.java:115) at org.apache.iceberg.mr.Catalogs.loadTable(Catalogs.java:105) at org.apache.iceberg.mr.hive.IcebergTableUtil.lambda$getTable$1(IcebergTableUtil.java:99) at org.apache.iceberg.mr.hive.IcebergTableUtil$$Lambda$552/0x000840d59840.apply(Unknown Source) at org.apache.iceberg.mr.hive.IcebergTableUtil.lambda$getTable$4(IcebergTableUtil.java:111) at org.apache.iceberg.mr.hive.IcebergTableUtil$$Lambda$557/0x000840d58c40.get(Unknown Source) at java.util.Optional.orElseGet(java.base@11.0.18/Optional.java:369) at org.apache.iceberg.mr.hive.IcebergTableUtil.getTable(IcebergTableUtil.java:108) at 
org.apache.iceberg.mr.hive.IcebergTableUtil.getTable(IcebergTableUtil.java:69) at org.apache.iceberg.mr.hive.IcebergTableUtil.getTable(IcebergTableUtil.java:73) at org.apache.iceberg.mr.hive.HiveIcebergMetaHook.postGetTable(HiveIcebergMetaHook.java:931) at org.apache.hadoop.hive.metastore.HiveMetaStoreClient.executePostGetTableHook(HiveMetaStoreClient.java:2638) at org.apache.hadoop.hive.metastore.HiveMetaStoreClient.getTable(HiveMetaStoreClient.java:2624) at org.apache.hadoop.hive.ql.metadata.SessionHiveMetaStoreClient.getTable(SessionHiveMetaStoreClient.java:267) at jdk.internal.reflect.GeneratedMethodAccessor137.invoke(Unknown Source) at jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(java.base@11.0.18/DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(java.base@11.0.18/Method.java:566) at org.apache.hadoop.hive.metastore.RetryingMetaStoreClient.invoke(RetryingMetaStoreClient.java:216) at com.sun.proxy.$Proxy56.getTable(Unknown Source) at jdk.internal.reflect.GeneratedMethodAccessor137.invoke(Unknown Source) at jdk.internal.reflect.DelegatingMethodAccessorImpl.inv
[jira] [Created] (HIVE-27182) tez_union_with_udf.q with TestMiniTezCliDriver is flaky
Ayush Saxena created HIVE-27182: --- Summary: tez_union_with_udf.q with TestMiniTezCliDriver is flaky Key: HIVE-27182 URL: https://issues.apache.org/jira/browse/HIVE-27182 Project: Hive Issue Type: Improvement Reporter: Ayush Saxena Looks like a memory issue:
{noformat}
< Caused by: org.apache.hive.com.esotericsoftware.kryo.KryoException: java.lang.OutOfMemoryError: GC overhead limit exceeded
< Serialization trace:
< genericUDF (org.apache.hadoop.hive.ql.plan.ExprNodeGenericFuncDesc)
< colExprMap (org.apache.hadoop.hive.ql.plan.SelectDesc)
< conf (org.apache.hadoop.hive.ql.exec.vector.VectorSelectOperator)
< childOperators (org.apache.hadoop.hive.ql.exec.vector.VectorLimitOperator)
< childOperators (org.apache.hadoop.hive.ql.exec.TableScanOperator)
{noformat}
-- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Created] (HIVE-27181) Remove RegexSerDe from hive-contrib, Upgrade should update changed FQN for RegexSerDe in HMS DB
Riju Trivedi created HIVE-27181: --- Summary: Remove RegexSerDe from hive-contrib, Upgrade should update changed FQN for RegexSerDe in HMS DB Key: HIVE-27181 URL: https://issues.apache.org/jira/browse/HIVE-27181 Project: Hive Issue Type: Sub-task Components: Hive Reporter: Riju Trivedi -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Created] (HIVE-27180) Remove JsonSerde from hcatalog, Upgrade should update changed FQN for JsonSerDe in HMS DB
Riju Trivedi created HIVE-27180: --- Summary: Remove JsonSerde from hcatalog, Upgrade should update changed FQN for JsonSerDe in HMS DB Key: HIVE-27180 URL: https://issues.apache.org/jira/browse/HIVE-27180 Project: Hive Issue Type: Sub-task Components: Hive Reporter: Riju Trivedi Assignee: Riju Trivedi -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Created] (HIVE-27179) HS2 WebUI throws NPE when JspFactory loaded from jetty-runner
Zhihua Deng created HIVE-27179: -- Summary: HS2 WebUI throws NPE when JspFactory loaded from jetty-runner Key: HIVE-27179 URL: https://issues.apache.org/jira/browse/HIVE-27179 Project: Hive Issue Type: Bug Components: HiveServer2 Reporter: Zhihua Deng In HIVE-17088 we resolved an NPE thrown from the HS2 WebUI by introducing javax.servlet.jsp-api. It works as expected when the javax.servlet.jsp-api jar takes precedence over the jetty-runner jar, but things can be different in some environments; the WebUI still throws an NPE when opening the HS2 web page:
{noformat}
java.lang.NullPointerException
    at org.apache.hive.generated.hiveserver2.hiveserver2_jsp._jspService(hiveserver2_jsp.java:286)
    at org.apache.jasper.runtime.HttpJspBase.service(HttpJspBase.java:71)
    at javax.servlet.http.HttpServlet.service(HttpServlet.java:790)
    at org.eclipse.jetty.servlet.ServletHolder$NotAsync.service(ServletHolder.java:1443)
    at org.eclipse.jetty.servlet.ServletHolder.handle(ServletHolder.java:791)
    at org.eclipse.jetty.servlet.ServletHandler$ChainEnd.doFilter(ServletHandler.java:1626)
    ...{noformat}
The jetty-runner JspFactory.getDefaultFactory() just returns null. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Created] (HIVE-27178) Backport of HIVE-23321 to branch-3
Aman Raj created HIVE-27178: --- Summary: Backport of HIVE-23321 to branch-3 Key: HIVE-27178 URL: https://issues.apache.org/jira/browse/HIVE-27178 Project: Hive Issue Type: Sub-task Reporter: Aman Raj Assignee: Aman Raj Current branch-3 fails with a diff in select count(*) from skewed_string_list and select count(*) from skewed_string_list_values. Jenkins run: [jenkins / hive-precommit / PR-4156 / #1 (apache.org)|http://ci.hive.apache.org/blue/organizations/jenkins/hive-precommit/detail/PR-4156/1/tests/] Diff:
Client Execution succeeded but contained differences (error code = 1) after executing sysdb.q
3740d3739
< hdfs://### HDFS PATH ### default public ROLE
4036c4035
< 3
---
> 6
4045c4044
< 3
---
> 6
This ticket tries to fix this diff. Please read the description of this ticket for the exact reason. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Created] (HIVE-27177) Add alter table...Convert to Iceberg command
Ayush Saxena created HIVE-27177: --- Summary: Add alter table...Convert to Iceberg command Key: HIVE-27177 URL: https://issues.apache.org/jira/browse/HIVE-27177 Project: Hive Issue Type: Improvement Reporter: Ayush Saxena Assignee: Ayush Saxena Add an alter table ... convert to Iceberg [TBLPROPERTIES('','')] command to convert existing external tables to iceberg tables (see the sketch below). -- This message was sent by Atlassian Jira (v8.20.10#820010)
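A hypothetical example of the command being proposed; the table name and the table property shown are illustrative only.
{code:sql}
ALTER TABLE ext_orders CONVERT TO ICEBERG TBLPROPERTIES ('format-version'='2');
{code}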
[jira] [Created] (HIVE-27176) EXPLAIN SKEW
László Bodor created HIVE-27176: --- Summary: EXPLAIN SKEW Key: HIVE-27176 URL: https://issues.apache.org/jira/browse/HIVE-27176 Project: Hive Issue Type: Improvement Reporter: László Bodor Thinking about a new explain feature, which is actually not an explain, but rather a set of analytical queries: considering a very complicated and large SQL statement (the one below is a simple one, just for example's sake):
{code}
SELECT a FROM (SELECT b ... JOIN c on b.x = c.y) d JOIN e ON d.v = e.w
{code}
EXPLAIN SKEW should run a query like:
{code}
SELECT "b", "x", x, count distinct(b.x) as count order by count desc limit 50
UNION ALL
SELECT "c", "y", y, count distinct(c.y) as count order by count desc limit 50
UNION ALL
SELECT "d", "v", v, count distinct(d.v) as count order by count desc limit 50
UNION ALL
SELECT "e", "w", w, count distinct(e.w) as count order by count desc limit 50
{code}
collecting some cardinality info about all the join columns found in the query, so the result might look like:
{code}
table_name  column_name  column_value   count
b           "x"          x_skew_value1  100431234
b           "x"          x_skew_value2  234
c           "y"          y_skew_value1  35
c           "y"          x_skew_value2  45
c           "y"          x_skew_value3  42
...
{code}
This doesn't solve the problem, but it shows data skew immediately for further analysis. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Created] (HIVE-27175) Fix TestJdbcDriver2#testSelectExecAsync2
Vihang Karajgaonkar created HIVE-27175: -- Summary: Fix TestJdbcDriver2#testSelectExecAsync2 Key: HIVE-27175 URL: https://issues.apache.org/jira/browse/HIVE-27175 Project: Hive Issue Type: Sub-task Reporter: Vihang Karajgaonkar Assignee: Vihang Karajgaonkar TestJdbcDriver2#testSelectExecAsync2 is failing on branch-3. We need to backport HIVE-20897 to fix it. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Created] (HIVE-27174) Disable sysdb.q test
Aman Raj created HIVE-27174: --- Summary: Disable sysdb.q test Key: HIVE-27174 URL: https://issues.apache.org/jira/browse/HIVE-27174 Project: Hive Issue Type: Sub-task Reporter: Aman Raj Assignee: Aman Raj h3. What changes were proposed in this pull request? Disable the sysdb.q test. The test is failing because of a diff in the BASIC_COLUMN_STATS json string:
Client Execution succeeded but contained differences (error code = 1) after executing sysdb.q
3803,3807c3803,3807
< COLUMN_STATS_ACCURATE org.apache.derby.impl.jdbc.EmbedClob@125b285b
< COLUMN_STATS_ACCURATE org.apache.derby.impl.jdbc.EmbedClob@471246f3
< COLUMN_STATS_ACCURATE org.apache.derby.impl.jdbc.EmbedClob@57c013
< COLUMN_STATS_ACCURATE org.apache.derby.impl.jdbc.EmbedClob@59f1d7ac
< COLUMN_STATS_ACCURATE org.apache.derby.impl.jdbc.EmbedClob@71a0
---
> COLUMN_STATS_ACCURATE {"BASIC_STATS":"true","COLUMN_STATS":{"c_boolean":"true","c_float":"true","c_int":"true","key":"true","value":"true"}}
> COLUMN_STATS_ACCURATE {"BASIC_STATS":"true","COLUMN_STATS":{"c_boolean":"true","c_float":"true","c_int":"true","key":"true","value":"true"}}
> COLUMN_STATS_ACCURATE {"BASIC_STATS":"true","COLUMN_STATS":{"key":"true","value":"true"}}
> COLUMN_STATS_ACCURATE {"BASIC_STATS":"true","COLUMN_STATS":{"key":"true","value":"true"}}
> COLUMN_STATS_ACCURATE {"BASIC_STATS":"true","COLUMN_STATS":
h3. Why are the changes needed? There is no issue in the test itself. The current code prints COL_STATS as an Object instead of a json string; it is not clear why this is the case. Several approaches were tried, but this does not seem fixable at the moment, so the test is being disabled for now. Note that this test was also disabled in the Hive 3.1.3 release, so there should not be any issue in disabling it here. A followup ticket has been created to fix this test, which can be taken up later - -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Created] (HIVE-27173) Add method for Spark to be able to trigger DML events
Naveen Gangam created HIVE-27173: Summary: Add method for Spark to be able to trigger DML events Key: HIVE-27173 URL: https://issues.apache.org/jira/browse/HIVE-27173 Project: Hive Issue Type: Improvement Reporter: Naveen Gangam Spark currently uses Hive.java from Hive as a convenient way to avoid having to deal with the HMS client and the Thrift objects directly. Hive has support for DML events (it can generate events on DML operations) but does not expose a public method to do so; it only has a private method that takes Hive objects such as Table. It would be nice to have a public method that works with more primitive datatypes. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Created] (HIVE-27172) Add the HMS client connection timeout config
Wechar created HIVE-27172: - Summary: Add the HMS client connection timeout config Key: HIVE-27172 URL: https://issues.apache.org/jira/browse/HIVE-27172 Project: Hive Issue Type: New Feature Components: Hive Reporter: Wechar Assignee: Wechar Currently {{HiveMetastoreClient}} uses {{CLIENT_SOCKET_TIMEOUT}} as both the socket timeout and the connection timeout, which makes it inconvenient for users to set a smaller connection timeout. -- This message was sent by Atlassian Jira (v8.20.10#820010)
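For HIVE-27172 above, a hedged sketch of the intent, assuming a hypothetical name for the new property (only the existing socket timeout setting is real today, and it currently governs both connect and read behaviour):
{code:sql}
-- Existing knob: used today as both the connect timeout and the socket (read) timeout.
SET hive.metastore.client.socket.timeout=600s;

-- Hypothetical new knob proposed by this ticket, so a short connect timeout
-- does not also cut off long-running metastore calls.
SET hive.metastore.client.connection.timeout=10s;
{code}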
[jira] [Created] (HIVE-27171) Backport HIVE-20680 to branch-3
Vihang Karajgaonkar created HIVE-27171: -- Summary: Backport HIVE-20680 to branch-3 Key: HIVE-27171 URL: https://issues.apache.org/jira/browse/HIVE-27171 Project: Hive Issue Type: Sub-task Reporter: Vihang Karajgaonkar Assignee: Vihang Karajgaonkar We need to backport HIVE-26836 to fix the TestReplicationScenariosAcrossInstances on branch-3 -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Created] (HIVE-27170) facing issues while using tez 0.9.2 as execution engine to hive 2.3.9
vikran created HIVE-27170: - Summary: facing issues while using tez 0.9.2 as execution engine to hive 2.3.9 Key: HIVE-27170 URL: https://issues.apache.org/jira/browse/HIVE-27170 Project: Hive Issue Type: Bug Components: Hive, Tez Affects Versions: 2.3.9 Reporter: vikran Fix For: 2.3.9 Attachments: hive-site.txt, hive_error_in_yarn.txt, tez-site.txt Hi Team, I am using the following versions: Hive 2.3.9, Tez 0.9.2, Spark 3.3.2 (hive-site.xml and tez-site.xml attached). I have added the Tez jars and files as well as the Hive jars into the /apps/tez HDFS directory. When I start Hive in the CLI and run a query, I get the error below:
hive> INSERT INTO emp1.employee values(7,'scott',23,'M');
Query ID = azureuser_20230324061903_97928963-410d-44a0-aa47-a83cdc24ce88
Total jobs = 1
Launching Job 1 out of 1
*FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.tez.TezTask*
I have attached the complete error log from the AppMaster. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Created] (HIVE-27169) New Locked List to prevent configuration change at runtime without throwing error
Raghav Aggarwal created HIVE-27169: -- Summary: New Locked List to prevent configuration change at runtime without throwing error Key: HIVE-27169 URL: https://issues.apache.org/jira/browse/HIVE-27169 Project: Hive Issue Type: Improvement Affects Versions: 4.0.0-alpha-2 Reporter: Raghav Aggarwal Assignee: Raghav Aggarwal _*AIM*_ Create a new locked list called {{hive.conf.locked.list}} which contains a comma-separated list of configurations that cannot be changed at runtime. If someone tries to change one of them at runtime, a WARN log is emitted on Beeline itself. _*How is it different from the Restricted List?*_ When running an HQL file or at runtime, if a configuration present in the restricted list gets updated, an error is thrown and further execution of the HQL file stops. With the locked list, the configuration update only produces a WARN log on Beeline and execution of the HQL file continues. _*Why is it required?*_ In organisations, admins want to enforce some configs which users shouldn't be able to change at runtime, without affecting users' existing HQL scripts. This locked list is useful because it does not allow users to change the value of particular configs, and it also does not stop the execution of HQL scripts. {_}*NOTE*{_}: {{hive.conf.locked.list}} can only be set at cluster level, and after that the Hive service needs to be restarted. -- This message was sent by Atlassian Jira (v8.20.10#820010)
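For HIVE-27169 above, a sketch of the intended behaviour from a Beeline session, assuming {{hive.conf.locked.list}} has been set in hive-site.xml at cluster level and the service restarted; the config placed in the list is just an example:
{code:sql}
-- hive-site.xml (cluster level): hive.conf.locked.list=hive.exec.dynamic.partition.mode
SET hive.exec.dynamic.partition.mode=nonstrict;
-- Expected: only a WARN is printed on Beeline, the value stays unchanged,
-- and the rest of the HQL script keeps running (the restricted list would abort instead).
{code}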
[jira] [Created] (HIVE-27168) Use basename of the datatype when fetching partition metadata using partition filters
Sourabh Badhya created HIVE-27168: - Summary: Use basename of the datatype when fetching partition metadata using partition filters Key: HIVE-27168 URL: https://issues.apache.org/jira/browse/HIVE-27168 Project: Hive Issue Type: Bug Reporter: Sourabh Badhya Assignee: Sourabh Badhya While fetching partition metadata using partition filters, we use the column type of the table directly. However, char/varchar types can carry extra information such as the length of the column (e.g. varchar(64)), and because of this extra information the partition metadata fetch via filters is skipped. Solution: Use the base name of the column type when deciding whether partition pruning can be done on the partitioned column. -- This message was sent by Atlassian Jira (v8.20.10#820010)
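For HIVE-27168 above, an illustrative repro sketch (table, column, and values are made up): the partition column's stored type carries a length qualifier, while only its base name should matter when deciding whether the filter can be pushed to the metastore:
{code:sql}
-- Partition column declared as varchar(64); its base type name is simply "varchar".
CREATE EXTERNAL TABLE sales (id BIGINT) PARTITIONED BY (region VARCHAR(64)) STORED AS ORC;

-- With the fix, this partition filter can still be used for metastore-side pruning,
-- because the decision is made on the base type name rather than on "varchar(64)".
SELECT count(*) FROM sales WHERE region = 'EMEA';
{code}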
[jira] [Created] (HIVE-27167) Upgrade guava version in standalone-metastore and storage-api module
Raghav Aggarwal created HIVE-27167: -- Summary: Upgrade guava version in standalone-metastore and storage-api module Key: HIVE-27167 URL: https://issues.apache.org/jira/browse/HIVE-27167 Project: Hive Issue Type: Improvement Components: Standalone Metastore, storage-api Affects Versions: 4.0.0-alpha-2 Reporter: Raghav Aggarwal Assignee: Raghav Aggarwal The guava version in standalone-metastore and storage-api (i.e. 19.0) is not in sync with the parent pom.xml (i.e. 22.0). -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Created] (HIVE-27166) Introduce Apache Commons DBUtils to handle boilerplate code
KIRTI RUGE created HIVE-27166: - Summary: Introduce Apache Commons DBUtils to handle boilerplate code Key: HIVE-27166 URL: https://issues.apache.org/jira/browse/HIVE-27166 Project: Hive Issue Type: Improvement Reporter: KIRTI RUGE Apache Commons DbUtils is a small library that makes working with JDBC a lot easier. The current scope of this Jira is to introduce the latest Apache Commons DbUtils version for the applicable methods in the TxnHandler and CompactionTxnHandler classes. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Created] (HIVE-27165) PART_COL_STATS metastore query not hitting the index
Hongdan Zhu created HIVE-27165: -- Summary: PART_COL_STATS metastore query not hitting the index Key: HIVE-27165 URL: https://issues.apache.org/jira/browse/HIVE-27165 Project: Hive Issue Type: Improvement Reporter: Hongdan Zhu The query located here: [https://github.com/apache/hive/blob/master/standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/txn/CompactionTxnHandler.java#L1029-L1032] is not hitting an index. The index contains CAT_NAME whereas this query does not. This was a change made in Hive 3.0, I think. -- This message was sent by Atlassian Jira (v8.20.10#820010)
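For HIVE-27165 above, an illustrative sketch of the mismatch (this is not the exact statement from CompactionTxnHandler, and the column list is an assumption): a predicate on PART_COL_STATS that omits CAT_NAME cannot use an index that includes CAT_NAME, so adding the catalog to the filter would let the backing database use the index:
{code:sql}
-- Misses the index: no CAT_NAME in the predicate.
DELETE FROM PART_COL_STATS WHERE DB_NAME = ? AND TABLE_NAME = ? AND PARTITION_NAME = ?;

-- Can use the index: catalog name included in the filter.
DELETE FROM PART_COL_STATS WHERE CAT_NAME = ? AND DB_NAME = ? AND TABLE_NAME = ? AND PARTITION_NAME = ?;
{code}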
[jira] [Created] (HIVE-27164) Create Temp Txn Table As Select is failing at tablePath validation
Naresh P R created HIVE-27164: - Summary: Create Temp Txn Table As Select is failing at tablePath validation Key: HIVE-27164 URL: https://issues.apache.org/jira/browse/HIVE-27164 Project: Hive Issue Type: Bug Components: HiveServer2, Metastore Reporter: Naresh P R Attachments: mm_cttas.q After HIVE-25303, every CTAS makes a HiveMetaStore$HMSHandler#translate_table_dryrun() call to fetch the table location, which fails with the following exception for temp tables if MetastoreDefaultTransformer is set.
{code:java}
2023-03-17 16:41:23,390 INFO org.apache.hadoop.hive.metastore.MetastoreDefaultTransformer: [pool-6-thread-196]: Starting translation for CreateTable for processor HMSClient-@localhost with [EXTWRITE, EXTREAD, HIVEBUCKET2, HIVEFULLACIDREAD, HIVEFULLACIDWRITE, HIVECACHEINVALIDATE, HIVEMANAGESTATS, HIVEMANAGEDINSERTWRITE, HIVEMANAGEDINSERTREAD, HIVESQL, HIVEMQT, HIVEONLYMQTWRITE] on table test_temp
2023-03-17 16:41:23,392 ERROR org.apache.hadoop.hive.metastore.RetryingHMSHandler: [pool-6-thread-196]: MetaException(message:Illegal location for managed table, it has to be within database's managed location)
    at org.apache.hadoop.hive.metastore.MetastoreDefaultTransformer.validateTablePaths(MetastoreDefaultTransformer.java:886)
    at org.apache.hadoop.hive.metastore.MetastoreDefaultTransformer.transformCreateTable(MetastoreDefaultTransformer.java:666)
    at org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.translate_table_dryrun(HiveMetaStore.java:2164)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
{code}
I am able to reproduce this issue on apache upstream using the attached testcase. There are multiple ways to fix this issue:
* Have the temp txn table path under the db's managed location path. This will help with encryption zone tables as well.
* Skip the location check for temp tables in MetastoreDefaultTransformer#validateTablePaths()
-- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Created] (HIVE-27163) Column stats not getting published after an insert query into an external table with custom location
Taraka Rama Rao Lethavadla created HIVE-27163: - Summary: Column stats not getting published after an insert query into an external table with custom location Key: HIVE-27163 URL: https://issues.apache.org/jira/browse/HIVE-27163 Project: Hive Issue Type: Bug Components: Hive Reporter: Taraka Rama Rao Lethavadla Test case details are below. *test.q*
{noformat}
set hive.stats.column.autogather=true;
set hive.stats.autogather=true;
dfs ${system:test.dfs.mkdir} ${system:test.tmp.dir}/test;
create external table test_custom(age int, name string) stored as orc location '/tmp/test';
insert into test_custom select 1, 'test';
desc formatted test_custom age;
{noformat}
*test.q.out*
{noformat}
A masked pattern was here
PREHOOK: type: CREATETABLE
A masked pattern was here
PREHOOK: Output: database:default
PREHOOK: Output: default@test_custom
A masked pattern was here
POSTHOOK: type: CREATETABLE
A masked pattern was here
POSTHOOK: Output: database:default
POSTHOOK: Output: default@test_custom
PREHOOK: query: insert into test_custom select 1, 'test'
PREHOOK: type: QUERY
PREHOOK: Input: _dummy_database@_dummy_table
PREHOOK: Output: default@test_custom
POSTHOOK: query: insert into test_custom select 1, 'test'
POSTHOOK: type: QUERY
POSTHOOK: Input: _dummy_database@_dummy_table
POSTHOOK: Output: default@test_custom
POSTHOOK: Lineage: test_custom.age SIMPLE []
POSTHOOK: Lineage: test_custom.name SIMPLE []
PREHOOK: query: desc formatted test_custom age
PREHOOK: type: DESCTABLE
PREHOOK: Input: default@test_custom
POSTHOOK: query: desc formatted test_custom age
POSTHOOK: type: DESCTABLE
POSTHOOK: Input: default@test_custom
col_name        age
data_type       int
min
max
num_nulls
distinct_count
avg_col_len
max_col_len
num_trues
num_falses
bit_vector
comment         from deserializer
{noformat}
As we can see from the desc formatted output, the column stats were not populated. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Created] (HIVE-27162) Unify HiveUnixTimestampSqlOperator and HiveToUnixTimestampSqlOperator
Stamatis Zampetakis created HIVE-27162: -- Summary: Unify HiveUnixTimestampSqlOperator and HiveToUnixTimestampSqlOperator Key: HIVE-27162 URL: https://issues.apache.org/jira/browse/HIVE-27162 Project: Hive Issue Type: Task Components: CBO Reporter: Stamatis Zampetakis The two classes below both represent the {{unix_timestamp}} operator and have identical implementations. * https://github.com/apache/hive/blob/master/ql/src/java/org/apache/hadoop/hive/ql/optimizer/calcite/reloperators/HiveUnixTimestampSqlOperator.java * https://github.com/apache/hive/blob/master/ql/src/java/org/apache/hadoop/hive/ql/optimizer/calcite/reloperators/HiveToUnixTimestampSqlOperator.java Probably there is a way to use one or the other and not both; having two ways of representing the same thing can bring various problems in query planning and it also leads to code duplication. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Created] (HIVE-27161) MetaException when executing CTAS query in Druid storage handler
Stamatis Zampetakis created HIVE-27161: -- Summary: MetaException when executing CTAS query in Druid storage handler Key: HIVE-27161 URL: https://issues.apache.org/jira/browse/HIVE-27161 Project: Hive Issue Type: Bug Components: Druid integration Affects Versions: 4.0.0-alpha-2 Reporter: Stamatis Zampetakis Any kind of CTAS query targeting the Druid storage handler fails with the following exception: {noformat} org.apache.hadoop.hive.ql.metadata.HiveException: MetaException(message:LOCATION may not be specified for Druid) at org.apache.hadoop.hive.ql.metadata.Hive.createTable(Hive.java:1347) ~[hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT] at org.apache.hadoop.hive.ql.metadata.Hive.createTable(Hive.java:1352) ~[hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT] at org.apache.hadoop.hive.ql.ddl.table.create.CreateTableOperation.createTableNonReplaceMode(CreateTableOperation.java:158) ~[hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT] at org.apache.hadoop.hive.ql.ddl.table.create.CreateTableOperation.execute(CreateTableOperation.java:116) ~[hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT] at org.apache.hadoop.hive.ql.ddl.DDLTask.execute(DDLTask.java:84) ~[hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT] at org.apache.hadoop.hive.ql.exec.Task.executeTask(Task.java:214) ~[hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT] at org.apache.hadoop.hive.ql.exec.TaskRunner.runSequential(TaskRunner.java:105) ~[hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT] at org.apache.hadoop.hive.ql.Executor.launchTask(Executor.java:354) ~[hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT] at org.apache.hadoop.hive.ql.Executor.launchTasks(Executor.java:327) ~[hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT] at org.apache.hadoop.hive.ql.Executor.runTasks(Executor.java:244) ~[hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT] at org.apache.hadoop.hive.ql.Executor.execute(Executor.java:105) ~[hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT] at org.apache.hadoop.hive.ql.Driver.execute(Driver.java:367) ~[hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT] at org.apache.hadoop.hive.ql.Driver.runInternal(Driver.java:205) ~[hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT] at org.apache.hadoop.hive.ql.Driver.run(Driver.java:154) ~[hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT] at org.apache.hadoop.hive.ql.Driver.run(Driver.java:149) ~[hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT] at org.apache.hadoop.hive.ql.reexec.ReExecDriver.run(ReExecDriver.java:185) ~[hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT] at org.apache.hadoop.hive.ql.reexec.ReExecDriver.run(ReExecDriver.java:228) ~[hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT] at org.apache.hadoop.hive.cli.CliDriver.processLocalCmd(CliDriver.java:257) ~[hive-cli-4.0.0-SNAPSHOT.jar:?] at org.apache.hadoop.hive.cli.CliDriver.processCmd1(CliDriver.java:201) ~[hive-cli-4.0.0-SNAPSHOT.jar:?] at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:127) ~[hive-cli-4.0.0-SNAPSHOT.jar:?] at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:425) ~[hive-cli-4.0.0-SNAPSHOT.jar:?] at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:356) ~[hive-cli-4.0.0-SNAPSHOT.jar:?] 
at org.apache.hadoop.hive.ql.dataset.QTestDatasetHandler.initDataset(QTestDatasetHandler.java:86) ~[hive-it-util-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT] at org.apache.hadoop.hive.ql.dataset.QTestDatasetHandler.beforeTest(QTestDatasetHandler.java:190) ~[hive-it-util-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT] at org.apache.hadoop.hive.ql.qoption.QTestOptionDispatcher.beforeTest(QTestOptionDispatcher.java:79) ~[hive-it-util-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT] at org.apache.hadoop.hive.ql.QTestUtil.cliInit(QTestUtil.java:607) ~[hive-it-util-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT] at org.apache.hadoop.hive.cli.control.CoreCliDriver.runTest(CoreCliDriver.java:112) ~[hive-it-util-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT] at org.apache.hadoop.hive.cli.control.CliAdapter.runTest(CliAdapter.java:157) ~[hive-it-util-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT] at org.apache.hadoop.hive.cli.TestMiniDruidCliDriver.testCliDriver(TestMiniDruidCliDriver.java:60) ~[test-classes/:?] at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) ~[?:1.8.0_261] at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) ~[?:1.8.0_261] at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) ~[?:1.8.0_261] at java.lang.reflect.Method.invoke(Method.java:498) ~[?:1.8.0_261] at org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:59) ~[junit-4.13.2.jar:4.13.2] at
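For reference on HIVE-27161 above, a CTAS of the kind that hits this error might look like the sketch below (a hedged example along the lines of the usual Druid qtests; the table, columns, and properties are illustrative):
{code:sql}
-- Hypothetical repro: any CTAS targeting the Druid storage handler currently fails with
-- "MetaException(message:LOCATION may not be specified for Druid)".
CREATE TABLE druid_ctas
STORED BY 'org.apache.hadoop.hive.druid.DruidStorageHandler'
TBLPROPERTIES ('druid.segment.granularity' = 'DAY')
AS
SELECT CAST(ctimestamp1 AS timestamp with local time zone) AS `__time`, cstring1
FROM alltypesorc
WHERE ctimestamp1 IS NOT NULL;
{code}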
[jira] [Created] (HIVE-27160) Iceberg: Optimise delete (entire) data from table
Denys Kuzmenko created HIVE-27160: - Summary: Iceberg: Optimise delete (entire) data from table Key: HIVE-27160 URL: https://issues.apache.org/jira/browse/HIVE-27160 Project: Hive Issue Type: Task Reporter: Denys Kuzmenko Currently, in MOR mode, Hive creates "positional delete" files during deletes. With "Delete from ", the entire dataset in the table or partition is written out as "positional delete" files. During read operations, all these files are read back, causing a huge delay. Proposal: apply the "truncate" optimization in the case of "delete *". -- This message was sent by Atlassian Jira (v8.20.10#820010)
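For HIVE-27160 above, a hedged sketch of the two paths with a made-up table name:
{code:sql}
-- Today (MOR): this writes positional delete files covering every row in the table,
-- all of which must be read back and applied on subsequent scans.
DELETE FROM web_logs;

-- Proposed optimisation: when the delete has no predicate (or covers whole partitions),
-- treat it like a truncate of the affected data instead of writing delete files.
TRUNCATE TABLE web_logs;
{code}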
[jira] [Created] (HIVE-27159) Filters are not pushed down for decimal format in Parquet
Rajesh Balamohan created HIVE-27159: --- Summary: Filters are not pushed down for decimal format in Parquet Key: HIVE-27159 URL: https://issues.apache.org/jira/browse/HIVE-27159 Project: Hive Issue Type: Improvement Reporter: Rajesh Balamohan Decimal filters are not created and pushed down to Parquet readers. This causes latency and unwanted row processing during query execution: an exception is thrown at runtime and far more rows are processed than necessary. E.g. Q13.
{noformat}
Parquet: (Map 1)
INFO  : Task Execution Summary
INFO  : ------------------------------------------------------------------------------------------
INFO  :   VERTICES   DURATION(ms)   CPU_TIME(ms)   GC_TIME(ms)   INPUT_RECORDS   OUTPUT_RECORDS
INFO  : ------------------------------------------------------------------------------------------
INFO  :      Map 1       31254.00              0             0     549,181,950              133
INFO  :      Map 3           0.00              0             0          73,049              365
INFO  :      Map 4        2027.00              0             0       6,000,000        1,689,919
INFO  :      Map 5           0.00              0             0           7,200            1,440
INFO  :      Map 6         517.00              0             0       1,920,800          493,920
INFO  :      Map 7           0.00              0             0           1,002            1,002
INFO  :  Reducer 2       18716.00              0             0             133                0
INFO  : ------------------------------------------------------------------------------------------

ORC:
INFO  : Task Execution Summary
INFO  : ------------------------------------------------------------------------------------------
INFO  :   VERTICES   DURATION(ms)   CPU_TIME(ms)   GC_TIME(ms)   INPUT_RECORDS   OUTPUT_RECORDS
INFO  : ------------------------------------------------------------------------------------------
INFO  :      Map 1        6556.00              0             0     267,146,063              152
INFO  :      Map 3           0.00              0             0          10,000              365
INFO  :      Map 4        2014.00              0             0       6,000,000        1,689,919
INFO  :      Map 5           0.00              0             0           7,200            1,440
INFO  :      Map 6         504.00              0             0       1,920,800          493,920
INFO  :  Reducer 2        3159.00              0             0             152                0
INFO  : ------------------------------------------------------------------------------------------
{noformat}
{noformat}
Map 1
    Map Operator Tree:
        TableScan
          alias: store_sales
          filterExpr: (ss_hdemo_sk is not null and ss_addr_sk is not null and ss_cdemo_sk is not null and ss_store_sk is not null and ((ss_sales_price >= 100) or (ss_sales_price <= 150) or (ss_sales_price >= 50) or (ss_sales_price <= 100) or (ss_sales_price >= 150) or (ss_sales_price <= 200)) and ((ss_net_profit >= 100) or (ss_net_profit <= 200) or (ss_net_profit >= 150) or (ss_net_profit <= 300) or (ss_net_profit >= 50) or (ss_net_profit <= 250))) (type: boolean)
          probeDecodeDetails: cacheKey:HASH_MAP_MAPJOIN_112_container, bigKeyColName:ss_hdemo_sk, smallTablePos:1, keyRatio:5.042575832290721E-6
          Statistics: Num rows: 2750380056 Data size: 1321831086472 Basic stats: COMPLETE Column stats: COMPLETE
          Filter Operator
            predicate: (ss_hdemo_sk is not null and ss_addr_sk is not null and ss_cdemo_sk is not null and ss_store_sk is not null and ((ss_sales_price >= 100) or (ss_sales_price <= 150) or (ss_sales_price >= 50) or (ss_sales_price <= 100) or (ss_sales_price >= 150) or (ss_sales_price <= 200)) and ((ss_net_profit >= 100) or (ss_net_profit <= 200) or (ss_net_profit >= 150) or (ss_net_profit <= 300) or (ss_net_profit >= 50) or (ss_net_profit <= 250))) (type: boolean)
            Statistics: Num rows: 2500252205 Data size: 1201619783884 Basic stats: COMPLETE Column stats: COMPLETE
            Select Operator
              expressions: ss_cdemo_sk (type: bigint), ss_hdemo_sk (type: bigint), ss_addr_sk (type: bigint), ss_store_sk (type: bigint), ss_quantity (type: int), ss_ext_sales_price (type: decimal(7,2)), ss_ext_wholesale_cost (type: decimal(7,2)), ss_sold_date_sk (type: bigint), ss_net_profit BETWEEN 100 AND 200 (type: boolean), ss_net_profit BETWEEN 150 AND 300 (type: boolean), ss_net_profit BETWEEN 50 AND 250 (type: boolean), ss_sales_price BETWEEN 100 AND 150 (type: boolean), ss_sales_price BETWEEN 50 AN
[jira] [Created] (HIVE-27158) Store hive columns stats in puffin files for iceberg tables
Simhadri Govindappa created HIVE-27158: -- Summary: Store hive columns stats in puffin files for iceberg tables Key: HIVE-27158 URL: https://issues.apache.org/jira/browse/HIVE-27158 Project: Hive Issue Type: Improvement Reporter: Simhadri Govindappa Assignee: Simhadri Govindappa -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Created] (HIVE-27157) AssertionError when inferring return type for unix_timestamp function
Stamatis Zampetakis created HIVE-27157: -- Summary: AssertionError when inferring return type for unix_timestamp function Key: HIVE-27157 URL: https://issues.apache.org/jira/browse/HIVE-27157 Project: Hive Issue Type: Bug Components: CBO Affects Versions: 4.0.0-alpha-2 Reporter: Stamatis Zampetakis Assignee: Stamatis Zampetakis Any attempt to derive the return data type for the {{unix_timestamp}} function results into the following assertion error. {noformat} java.lang.AssertionError: typeName.allowsPrecScale(true, false): BIGINT at org.apache.calcite.sql.type.BasicSqlType.checkPrecScale(BasicSqlType.java:65) at org.apache.calcite.sql.type.BasicSqlType.(BasicSqlType.java:81) at org.apache.calcite.sql.type.SqlTypeFactoryImpl.createSqlType(SqlTypeFactoryImpl.java:67) at org.apache.calcite.sql.fun.SqlAbstractTimeFunction.inferReturnType(SqlAbstractTimeFunction.java:78) at org.apache.calcite.rex.RexBuilder.deriveReturnType(RexBuilder.java:278) {noformat} due to a faulty implementation of type inference for the respective operators: * [https://github.com/apache/hive/blob/52360151dc43904217e812efde1069d6225e9570/ql/src/java/org/apache/hadoop/hive/ql/optimizer/calcite/reloperators/HiveUnixTimestampSqlOperator.java] * [https://github.com/apache/hive/blob/52360151dc43904217e812efde1069d6225e9570/ql/src/java/org/apache/hadoop/hive/ql/optimizer/calcite/reloperators/HiveToUnixTimestampSqlOperator.java] Although at this stage in master it is not possible to reproduce the problem with an actual SQL query the buggy implementation must be fixed since slight changes in the code/CBO rules may lead to methods relying on {{{}SqlOperator.inferReturnType{}}}. Note that in older versions of Hive it is possible to hit the AssertionError in various ways. For example in Hive 3.1.3 (and older), the error may come from [HiveRelDecorrelator|https://github.com/apache/hive/blob/4df4d75bf1e16fe0af75aad0b4179c34c07fc975/ql/src/java/org/apache/hadoop/hive/ql/optimizer/calcite/rules/HiveRelDecorrelator.java#L1933] in the presence of sub-queries. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Created] (HIVE-27156) Wrong results when CAST timestamp literal with timezone to TIMESTAMP
Stamatis Zampetakis created HIVE-27156: -- Summary: Wrong results when CAST timestamp literal with timezone to TIMESTAMP Key: HIVE-27156 URL: https://issues.apache.org/jira/browse/HIVE-27156 Project: Hive Issue Type: Bug Components: HiveServer2 Affects Versions: 4.0.0-alpha-2 Reporter: Stamatis Zampetakis Assignee: Stamatis Zampetakis Casting a timestamp literal with an invalid timezone to the TIMESTAMP datatype results into a timestamp with the time part truncated to midnight (00:00:00). *Case I* {code:sql} select cast('2020-06-28 22:17:33.123456 Europe/Amsterd' as timestamp); {code} +Actual+ |2020-06-28 00:00:00| +Expected+ |NULL/ERROR/2020-06-28 22:17:33.123456| *Case II* {code:sql} select cast('2020-06-28 22:17:33.123456 Invalid/Zone' as timestamp); {code} +Actual+ |2020-06-28 00:00:00| +Expected+ |NULL/ERROR/2020-06-28 22:17:33.123456| The existing documentation does not cover what should be the output in the cases above: * https://cwiki.apache.org/confluence/display/hive/languagemanual+types#LanguageManualTypes-TimestampstimestampTimestamps * https://cwiki.apache.org/confluence/display/Hive/Different+TIMESTAMP+types *Case III* Another subtle but important case is the following where the timestamp literal has a valid timezone but we are attempting a cast to a datatype that does not store the timezone. {code:sql} select cast('2020-06-28 22:17:33.123456 Europe/Amsterdam' as timestamp); {code} +Actual+ |2020-06-28 22:17:33.123456| The correctness of the last result is debatable since someone would expect a NULL or ERROR. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Created] (HIVE-27155) Iceberg: Vectorize virtual columns
Denys Kuzmenko created HIVE-27155: - Summary: Iceberg: Vectorize virtual columns Key: HIVE-27155 URL: https://issues.apache.org/jira/browse/HIVE-27155 Project: Hive Issue Type: Task Reporter: Denys Kuzmenko Vectorization gets disabled at runtime with the following reason: {code} Select expression for SELECT operator: Virtual column PARTITION__SPEC__ID is not supported {code} -- This message was sent by Atlassian Jira (v8.20.10#820010)
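For HIVE-27155 above, a minimal example of the kind of query affected (the table name is illustrative; PARTITION__SPEC__ID is the Iceberg virtual column named in the message):
{code:sql}
-- Selecting an Iceberg virtual column currently knocks the query out of vectorized execution.
SELECT PARTITION__SPEC__ID, count(*)
FROM ice_orders
GROUP BY PARTITION__SPEC__ID;
{code}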
[jira] [Created] (HIVE-27154) Fix testBootstrapReplLoadRetryAfterFailureForPartitions
Vihang Karajgaonkar created HIVE-27154: -- Summary: Fix testBootstrapReplLoadRetryAfterFailureForPartitions Key: HIVE-27154 URL: https://issues.apache.org/jira/browse/HIVE-27154 Project: Hive Issue Type: Sub-task Reporter: Vihang Karajgaonkar Assignee: Vihang Karajgaonkar `testBootstrapReplLoadRetryAfterFailureForPartitions` has been failing on branch-3 http://ci.hive.apache.org/blue/organizations/jenkins/hive-precommit/detail/PR-4067/12/tests -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Created] (HIVE-27153) Revert "HIVE-20182: Backport HIVE-20067 to branch-3"
Aman Raj created HIVE-27153: --- Summary: Revert "HIVE-20182: Backport HIVE-20067 to branch-3" Key: HIVE-27153 URL: https://issues.apache.org/jira/browse/HIVE-27153 Project: Hive Issue Type: Sub-task Reporter: Aman Raj Assignee: Aman Raj The mm_all.q test is failing because of this commit. This commit was not validated before committing. There is no stack trace for this exception. Link to the exception : [http://ci.hive.apache.org/blue/organizations/jenkins/hive-precommit/detail/PR-4126/2/tests] {code:java} java.lang.AssertionError: Client execution failed with error code = 1 running "insert into table part_mm_n0 partition(key_mm=455) select key from intermediate_n0" fname=mm_all.q See ./ql/target/tmp/log/hive.log or ./itests/qtest/target/tmp/log/hive.log, or check ./ql/target/surefire-reports or ./itests/qtest/target/surefire-reports/ for specific test cases logs.at org.junit.Assert.fail(Assert.java:88)at org.apache.hadoop.hive.ql.QTestUtil.failed(QTestUtil.java:2232) at org.apache.hadoop.hive.cli.control.CoreCliDriver.runTest(CoreCliDriver.java:180) at org.apache.hadoop.hive.cli.control.CliAdapter.runTest(CliAdapter.java:104) at org.apache.hadoop.hive.cli.split1.TestMiniLlapCliDriver.testCliDriver(TestMiniLlapCliDriver.java:62) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:498) {code} -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Created] (HIVE-27152) Revert "Constant UDF is not pushed to JDBCStorage Handler"
Aman Raj created HIVE-27152: --- Summary: Revert "Constant UDF is not pushed to JDBCStorage Handler" Key: HIVE-27152 URL: https://issues.apache.org/jira/browse/HIVE-27152 Project: Hive Issue Type: Sub-task Reporter: Aman Raj Assignee: Aman Raj current_date_timestamp.q - This change was committed in HIVE-21388 without validation. The failure is because again Hive is not able to parse explain cbo select current_timestamp() from alltypesorc Exception stack trace : {code:java} 2023-03-16 04:06:17 Completed running task attempt: attempt_1678964507586_0001_175_01_00_02023-03-16 04:06:17 Completed Dag: dag_1678964507586_0001_175TRACE StatusLogger Log4jLoggerFactory.getContext() found anchor class org.apache.hadoop.hive.ql.exec.OperatorTRACE StatusLogger Log4jLoggerFactory.getContext() found anchor class org.apache.hadoop.hive.ql.stats.fs.FSStatsPublisherTRACE StatusLogger Log4jLoggerFactory.getContext() found anchor class org.apache.hadoop.hive.ql.stats.fs.FSStatsAggregatorNoViableAltException(24@[]) at org.apache.hadoop.hive.ql.parse.HiveParser.explainStatement(HiveParser.java:1512) at org.apache.hadoop.hive.ql.parse.HiveParser.statement(HiveParser.java:1407) at org.apache.hadoop.hive.ql.parse.ParseDriver.parse(ParseDriver.java:230) at org.apache.hadoop.hive.ql.parse.ParseUtils.parse(ParseUtils.java:79) at org.apache.hadoop.hive.ql.parse.ParseUtils.parse(ParseUtils.java:72) at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:617)at org.apache.hadoop.hive.ql.Driver.compileInternal(Driver.java:1854) at org.apache.hadoop.hive.ql.Driver.compileAndRespond(Driver.java:1801) at org.apache.hadoop.hive.ql.Driver.compileAndRespond(Driver.java:1796) at org.apache.hadoop.hive.ql.reexec.ReExecDriver.compileAndRespond(ReExecDriver.java:126) at org.apache.hadoop.hive.ql.reexec.ReExecDriver.run(ReExecDriver.java:214) at org.apache.hadoop.hive.cli.CliDriver.processLocalCmd(CliDriver.java:239) at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:188) at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:402) at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:335) at org.apache.hadoop.hive.ql.QTestUtil.executeClientInternal(QTestUtil.java:1474) at org.apache.hadoop.hive.ql.QTestUtil.executeClient(QTestUtil.java:1448) at org.apache.hadoop.hive.cli.control.CoreCliDriver.runTest(CoreCliDriver.java:177) at org.apache.hadoop.hive.cli.control.CliAdapter.runTest(CliAdapter.java:104) at org.apache.hadoop.hive.cli.split12.TestMiniLlapLocalCliDriver.testCliDriver(TestMiniLlapLocalCliDriver.java:62) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:498) at org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:47) at org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12) at org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:44) at org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17) at org.apache.hadoop.hive.cli.control.CliAdapter$2$1.evaluate(CliAdapter.java:92) at org.junit.rules.RunRules.evaluate(RunRules.java:20) Attachments {code} -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Created] (HIVE-27151) Revert "HIVE-21685 Wrong simplification in query with multiple IN clauses"
Aman Raj created HIVE-27151: --- Summary: Revert "HIVE-21685 Wrong simplification in query with multiple IN clauses" Key: HIVE-27151 URL: https://issues.apache.org/jira/browse/HIVE-27151 Project: Hive Issue Type: Sub-task Reporter: Aman Raj Assignee: Aman Raj The multi_in_clause.q fails because Hive is not able to parse explain cbo select * from very_simple_table_for_in_test where name IN('g','r') AND name IN('a','b') If we want this to work, I am able to do it in my local. We have 2 options : a. Either revert HIVE-21685 since this scenario was not validated back then before adding this test. b. This fix was present in https://issues.apache.org/jira/browse/HIVE-20718 but to cherry pick this we need to cherry pick https://issues.apache.org/jira/browse/HIVE-17040 since HIVE-20718 has a lot of merge conflicts with HIVE-17040. But after cherry picking these we have other failures to fix. I am reverting this ticket for now. Exception stacktrace : {code:java} 2023-03-16 12:33:11 Completed running task attempt: attempt_1678994907903_0001_185_01_00_02023-03-16 12:33:11 Completed Dag: dag_1678994907903_0001_185TRACE StatusLogger Log4jLoggerFactory.getContext() found anchor class org.apache.hadoop.hive.ql.exec.OperatorTRACE StatusLogger Log4jLoggerFactory.getContext() found anchor class org.apache.hadoop.hive.ql.stats.fs.FSStatsPublisherTRACE StatusLogger Log4jLoggerFactory.getContext() found anchor class org.apache.hadoop.hive.ql.stats.fs.FSStatsAggregatorNoViableAltException(24@[]) at org.apache.hadoop.hive.ql.parse.HiveParser.explainStatement(HiveParser.java:1512) at org.apache.hadoop.hive.ql.parse.HiveParser.statement(HiveParser.java:1407) at org.apache.hadoop.hive.ql.parse.ParseDriver.parse(ParseDriver.java:230) at org.apache.hadoop.hive.ql.parse.ParseUtils.parse(ParseUtils.java:79) at org.apache.hadoop.hive.ql.parse.ParseUtils.parse(ParseUtils.java:72) at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:617)at org.apache.hadoop.hive.ql.Driver.compileInternal(Driver.java:1854) at org.apache.hadoop.hive.ql.Driver.compileAndRespond(Driver.java:1801) at org.apache.hadoop.hive.ql.Driver.compileAndRespond(Driver.java:1796) at org.apache.hadoop.hive.ql.reexec.ReExecDriver.compileAndRespond(ReExecDriver.java:126) at org.apache.hadoop.hive.ql.reexec.ReExecDriver.run(ReExecDriver.java:214) at org.apache.hadoop.hive.cli.CliDriver.processLocalCmd(CliDriver.java:239) at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:188) at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:402) at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:335) at org.apache.hadoop.hive.ql.QTestUtil.executeClientInternal(QTestUtil.java:1474) at org.apache.hadoop.hive.ql.QTestUtil.executeClient(QTestUtil.java:1448) at org.apache.hadoop.hive.cli.control.CoreCliDriver.runTest(CoreCliDriver.java:177) at org.apache.hadoop.hive.cli.control.CliAdapter.runTest(CliAdapter.java:104) at org.apache.hadoop.hive.cli.split12.TestMiniLlapLocalCliDriver.testCliDriver(TestMiniLlapLocalCliDriver.java:62) {code} -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Created] (HIVE-27150) Drop single partition can also support direct sql
Wechar created HIVE-27150: - Summary: Drop single partition can also support direct sql Key: HIVE-27150 URL: https://issues.apache.org/jira/browse/HIVE-27150 Project: Hive Issue Type: Improvement Components: Hive Reporter: Wechar Assignee: Wechar *Background:* [HIVE-6980|https://issues.apache.org/jira/browse/HIVE-6980] added direct SQL support for drop_partitions; we can reuse this huge improvement in drop_partition as well. -- This message was sent by Atlassian Jira (v8.20.10#820010)
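For HIVE-27150 above, single-partition drops are a very common operation; a hedged example with a made-up table and partition spec:
{code:sql}
-- A routine maintenance statement that ultimately drops one partition in the metastore.
-- HIVE-6980 added a direct SQL fast path for bulk drop_partitions; this ticket proposes
-- reusing that path when only a single partition is dropped.
ALTER TABLE web_logs DROP IF EXISTS PARTITION (dt = '2023-03-01');
{code}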
[jira] [Created] (HIVE-27149) StorageHandler PPD query planning statistics not adjusted for residualPredicate
Yi Zhang created HIVE-27149: --- Summary: StorageHandler PPD query planning statistics not adjusted for residualPredicate Key: HIVE-27149 URL: https://issues.apache.org/jira/browse/HIVE-27149 Project: Hive Issue Type: Bug Components: StorageHandler Affects Versions: 4.0.0-alpha-2 Reporter: Yi Zhang In StorageHandler PPD, filter predicates can be pushed down to storage and trimmed to a subset residualPredicate. However, at query planning time the filter-based statistics only consider the 'final' residual predicates, when in fact the pushedPredicates should also be considered. This affects reducer parallelism (more reducers than needed). -- This message was sent by Atlassian Jira (v8.20.10#820010)
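For HIVE-27149 above, a hedged illustration (the storage-handler table, columns, and predicates are made up): when only part of a pushed filter can be handled by the storage layer, the remainder comes back as the residual predicate, and planning should account for both parts when estimating row counts:
{code:sql}
-- Suppose the handler can push the equality on id but not the LIKE on payload.
-- The LIKE then becomes the residualPredicate evaluated by Hive's own Filter operator,
-- while the row-count estimate should still reflect the pushed id = 42 filter.
SELECT *
FROM hbase_events              -- external table backed by a storage handler
WHERE id = 42                  -- pushed down to the storage handler
  AND payload LIKE '%error%';  -- residual predicate, evaluated in Hive
{code}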
[jira] [Created] (HIVE-27148) Disable TestJdbcGenericUDTFGetSplits
Vihang Karajgaonkar created HIVE-27148: -- Summary: Disable TestJdbcGenericUDTFGetSplits Key: HIVE-27148 URL: https://issues.apache.org/jira/browse/HIVE-27148 Project: Hive Issue Type: Sub-task Components: Tests Reporter: Vihang Karajgaonkar TestJdbcGenericUDTFGetSplits is flaky and intermittently fails. http://ci.hive.apache.org/job/hive-flaky-check/614/ -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Created] (HIVE-27147) HS2 is not accessible to clients via zookeeper when hostname used is not FQDN
Venugopal Reddy K created HIVE-27147: Summary: HS2 is not accessible to clients via zookeeper when hostname used is not FQDN Key: HIVE-27147 URL: https://issues.apache.org/jira/browse/HIVE-27147 Project: Hive Issue Type: Bug Reporter: Venugopal Reddy K HS2 is not accessible to clients via ZooKeeper when the hostname used during registration comes from InetAddress.getHostName() with JDK 11. This issue is happening due to a change in behavior in JDK 11: [https://stackoverflow.com/questions/61898627/inetaddress-getlocalhost-gethostname-different-behavior-between-jdk-11-and-j] -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Created] (HIVE-27146) Re-enable orc_merge*.q tests for TestMiniSparkOnYarnCliDriver
Vihang Karajgaonkar created HIVE-27146: -- Summary: Re-enable orc_merge*.q tests for TestMiniSparkOnYarnCliDriver Key: HIVE-27146 URL: https://issues.apache.org/jira/browse/HIVE-27146 Project: Hive Issue Type: Test Reporter: Vihang Karajgaonkar It was found that the q.out files for these tests fail with a diff in the replication factor of the files. The tests only fail on the CI job, so it is possible that this is due to some test environment issue. The tests also fail on the 3.1.3 release. E.g. orc_merge4.q fails with the error below; the other tests fail with the same difference in replication factor.
{code:java}
40c40
< -rw-r--r-- 1 ### USER ### ### GROUP ### 2530 ### HDFS DATE ### hdfs://### HDFS PATH ###
---
> -rw-r--r-- 3 ### USER ### ### GROUP ### 2530 ### HDFS DATE ### hdfs://### HDFS PATH ###
66c66
< -rw-r--r-- 1 ### USER ### ### GROUP ### 2530 ### HDFS DATE ### hdfs://### HDFS PATH ###
---
> -rw-r--r-- 3 ### USER ### ### GROUP ### 2530 ### HDFS DATE ### hdfs://### HDFS PATH ###
68c68
< -rw-r--r-- 1 ### USER ### ### GROUP ### 2530 ### HDFS DATE ### hdfs://### HDFS PATH ###
---
> -rw-r--r-- 3 ### USER ### ### GROUP ### 2530 ### HDFS DATE ### hdfs://### HDFS PATH ###
{code}
-- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Created] (HIVE-27145) Use StrictMath for remaining Math functions as followup of HIVE-23133
Himanshu Mishra created HIVE-27145: -- Summary: Use StrictMath for remaining Math functions as followup of HIVE-23133 Key: HIVE-27145 URL: https://issues.apache.org/jira/browse/HIVE-27145 Project: Hive Issue Type: Task Components: UDF Reporter: Himanshu Mishra Assignee: Himanshu Mishra [HIVE-23133|https://issues.apache.org/jira/browse/HIVE-23133] started using {{StrictMath}} for the {{cos, exp, log}} UDFs to fix QTests that were failing because results vary based on hardware when the Math library is used. Follow it up by using {{StrictMath}} for the other Math functions that can be affected by the underlying hardware in the same way, namely {{sin, tan, asin, acos, atan, sqrt, pow, cbrt}}. [JDK-4477961|https://bugs.openjdk.org/browse/JDK-4477961] (in Java 9) changed the radians and degrees calculation, leading to QTest failures when tests are run on Java 9+; fix such tests as well. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Created] (HIVE-27144) Alter table partitions need not DBNotificationListener for external tables
Rajesh Balamohan created HIVE-27144: --- Summary: Alter table partitions need not DBNotificationListener for external tables Key: HIVE-27144 URL: https://issues.apache.org/jira/browse/HIVE-27144 Project: Hive Issue Type: Improvement Components: HiveServer2 Reporter: Rajesh Balamohan DBNotificationListener for external tables may not be needed. Even for "analyze table blah compute statistics for columns" for external partitioned tables, it invokes DBNotificationListener for all partitions. {noformat} at org.datanucleus.store.query.Query.execute(Query.java:1726) at org.datanucleus.api.jdo.JDOQuery.executeInternal(JDOQuery.java:374) at org.datanucleus.api.jdo.JDOQuery.execute(JDOQuery.java:216) at org.apache.hadoop.hive.metastore.ObjectStore.addNotificationEvent(ObjectStore.java:11774) at jdk.internal.reflect.GeneratedMethodAccessor135.invoke(Unknown Source) at jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(java.base@11.0.18/DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(java.base@11.0.18/Method.java:566) at org.apache.hadoop.hive.metastore.RawStoreProxy.invoke(RawStoreProxy.java:97) at com.sun.proxy.$Proxy33.addNotificationEvent(Unknown Source) at org.apache.hive.hcatalog.listener.DbNotificationListener.process(DbNotificationListener.java:1308) at org.apache.hive.hcatalog.listener.DbNotificationListener.onAlterPartition(DbNotificationListener.java:458) at org.apache.hadoop.hive.metastore.MetaStoreListenerNotifier$14.notify(MetaStoreListenerNotifier.java:161) at org.apache.hadoop.hive.metastore.MetaStoreListenerNotifier.notifyEvent(MetaStoreListenerNotifier.java:328) at org.apache.hadoop.hive.metastore.MetaStoreListenerNotifier.notifyEvent(MetaStoreListenerNotifier.java:390) at org.apache.hadoop.hive.metastore.HiveAlterHandler.alterPartitions(HiveAlterHandler.java:863) at org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.alter_partitions_with_environment_context(HiveMetaStore.java:6253) at org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.alter_partitions_req(HiveMetaStore.java:6201) at jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(java.base@11.0.18/Native Method) at jdk.internal.reflect.NativeMethodAccessorImpl.invoke(java.base@11.0.18/NativeMethodAccessorImpl.java:62) at jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(java.base@11.0.18/DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(java.base@11.0.18/Method.java:566) at org.apache.hadoop.hive.metastore.RetryingHMSHandler.invokeInternal(RetryingHMSHandler.java:160) at org.apache.hadoop.hive.metastore.RetryingHMSHandler.invoke(RetryingHMSHandler.java:121) at com.sun.proxy.$Proxy34.alter_partitions_req(Unknown Source) at org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$Processor$alter_partitions_req.getResult(ThriftHiveMetastore.java:21532) at org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$Processor$alter_partitions_req.getResult(ThriftHiveMetastore.java:21511) at org.apache.thrift.ProcessFunction.process(ProcessFunction.java:38) at org.apache.thrift.TBaseProcessor.process(TBaseProcessor.java:38) at org.apache.hadoop.hive.metastore.security.HadoopThriftAuthBridge$Server$TUGIAssumingProcessor$1.run(HadoopThriftAuthBridge.java:652) at org.apache.hadoop.hive.metastore.security.HadoopThriftAuthBridge$Server$TUGIAssumingProcessor$1.run(HadoopThriftAuthBridge.java:647) at java.security.AccessController.doPrivileged(java.base@11.0.18/Native Method) {noformat} -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Created] (HIVE-27143) Improve HCatStorer move task
Yi Zhang created HIVE-27143: --- Summary: Improve HCatStorer move task Key: HIVE-27143 URL: https://issues.apache.org/jira/browse/HIVE-27143 Project: Hive Issue Type: Improvement Components: HCatalog Affects Versions: 3.1.3 Reporter: Yi Zhang moveTask in HCatalog is inefficient: it does two iterations (dryRun and execution) and is sequential. This can be improved. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Created] (HIVE-27142) Map Join not working as expected when joining non-native tables with native tables
Syed Shameerur Rahman created HIVE-27142: Summary: Map Join not working as expected when joining non-native tables with native tables Key: HIVE-27142 URL: https://issues.apache.org/jira/browse/HIVE-27142 Project: Hive Issue Type: Bug Affects Versions: All Versions Reporter: Syed Shameerur Rahman Assignee: Syed Shameerur Rahman Fix For: 4.0.0 *1. Issue :* When *_hive.auto.convert.join=true_* and the underlying query joins a large non-native Hive table with a small native Hive table, the map join happens on the wrong side, i.e. in the map tasks which process the small native Hive table. This can lead to OOM when the non-native table is really large and only a few map tasks are spawned to scan the small native Hive table. *2. Why is this happening ?* This happens due to improper stats collection/computation for non-native Hive tables. Since the data of a non-native Hive table is actually stored in a location Hive does not know about, and the temporary path visible to Hive while creating the non-native table does not hold the actual data, the stats collection logic tends to under-estimate the data/row count and hence causes the map join to happen on the wrong side. *3. Potential Solutions* 3.1 Set *_hive.auto.convert.join=false._* This can have a negative impact on the query if it performs multiple joins, i.e. one join with non-native tables and another join where both tables are native. 3.2 Compute stats for the non-native table by firing the ANALYZE TABLE <> command before joining native and non-native tables. The user may or may not choose to do this. 3.3 Don't collect/estimate stats for non-native Hive tables by default (preferred solution). -- This message was sent by Atlassian Jira (v8.20.10#820010)
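For HIVE-27142 above, a hedged sketch of the problem shape and the listed workarounds (table names are made up; "non-native" here means backed by a storage handler):
{code:sql}
-- Joining a large non-native (storage-handler-backed) table with a small native table.
-- Under-estimated stats on the non-native side can make it end up as the hash (small) side
-- of the map join, i.e. the join runs in the map tasks scanning dim_small.
SELECT f.*, d.name
FROM kafka_events f                 -- large, non-native
JOIN dim_small d ON f.key = d.key;  -- small, native

-- Workaround 3.1: disable automatic map join conversion for the session.
SET hive.auto.convert.join=false;

-- Workaround 3.2 (per the ticket): compute stats for the non-native table before joining.
ANALYZE TABLE kafka_events COMPUTE STATISTICS;
{code}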
[jira] [Created] (HIVE-27141) Iceberg: Add more iceberg table metadata
zhangbutao created HIVE-27141: - Summary: Iceberg: Add more iceberg table metadata Key: HIVE-27141 URL: https://issues.apache.org/jira/browse/HIVE-27141 Project: Hive Issue Type: Improvement Components: Iceberg integration Affects Versions: 4.0.0-alpha-2 Reporter: zhangbutao -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Created] (HIVE-27140) Set HADOOP_PROXY_USER cause hiveMetaStoreClient close everytime
chenruotao created HIVE-27140: - Summary: Set HADOOP_PROXY_USER cause hiveMetaStoreClient close everytime Key: HIVE-27140 URL: https://issues.apache.org/jira/browse/HIVE-27140 Project: Hive Issue Type: Bug Components: Metastore Affects Versions: 3.1.2, 2.3.8 Reporter: chenruotao Assignee: chenruotao -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Created] (HIVE-27139) Log details when hiveserver2.sh doing sanity check with the process id
Zhihua Deng created HIVE-27139: -- Summary: Log details when hiveserver2.sh doing sanity check with the process id Key: HIVE-27139 URL: https://issues.apache.org/jira/browse/HIVE-27139 Project: Hive Issue Type: Improvement Components: HiveServer2 Reporter: Zhihua Deng Since HIVE-22193, HiveServer2 always persists the process id into a file. When some other process reuses the same pid, restarting HiveServer2 fails. In that case, log the details of the offending process, and delete the old pid file when HiveServer2 is decommissioned. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Created] (HIVE-27138) MapJoinOperator throws NPE when computing OuterJoin with filter expressions on small table
Seonggon Namgung created HIVE-27138: --- Summary: MapJoinOperator throws NPE when computing OuterJoin with filter expressions on small table Key: HIVE-27138 URL: https://issues.apache.org/jira/browse/HIVE-27138 Project: Hive Issue Type: Bug Reporter: Seonggon Namgung Assignee: Seonggon Namgung Hive throws an NPE when running mapjoin_filter_on_outerjoin.q using the Tez engine. (I used TestMiniLlapCliDriver.) The NPE is thrown by CommonJoinOperator.getFilterTag(), which just retrieves the last object from the given list. To the best of my knowledge, if Hive selects MapJoin to perform the join operation, the filterTag should be computed and appended to a row before the row is passed to MapJoinOperator. In the case of the MapReduce engine, this is done by HashTableSinkOperator. However, I cannot find any logic preparing the filterTag for small tables when Hive uses the Tez engine. I think there are 2 available options: 1. Don't use MapJoinOperator if a small table has a filter expression. 2. Add new logic that computes and passes the filterTag to MapJoinOperator. I am working on the second option and am ready to discuss it. I would be grateful for any opinions about this issue. -- This message was sent by Atlassian Jira (v8.20.10#820010)
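For HIVE-27138 above, a minimal illustration of the pattern involved (table names are made up; mapjoin_filter_on_outerjoin.q in the Hive test suite exercises similar shapes):
{code:sql}
-- With map join conversion enabled, the small side carries a join-time filter (d.active = true),
-- so its rows need a filterTag before they reach MapJoinOperator; on Tez nothing prepares it today.
SET hive.auto.convert.join=true;

SELECT *
FROM big_fact f
LEFT OUTER JOIN small_dim d
  ON f.dim_id = d.id AND d.active = true;
{code}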
[jira] [Created] (HIVE-27137) Remove HIVE_IN_TEST_ICEBERG flag
Zsolt Miskolczi created HIVE-27137: -- Summary: Remove HIVE_IN_TEST_ICEBERG flag Key: HIVE-27137 URL: https://issues.apache.org/jira/browse/HIVE-27137 Project: Hive Issue Type: Improvement Components: Iceberg integration Reporter: Zsolt Miskolczi Remove the HIVE_IN_TEST_ICEBERG flag from the production code. Remove code snippet from TxnHandler and update unit tests which are expecting the exception. {{ if (lc.isSetOperationType() && lc.getOperationType() == DataOperationType.UNSET && ((MetastoreConf.getBoolVar(conf, ConfVars.HIVE_IN_TEST) || MetastoreConf.getBoolVar(conf, ConfVars.HIVE_IN_TEZ_TEST)) && !MetastoreConf.getBoolVar(conf, ConfVars.HIVE_IN_TEST_ICEBERG))) { }} -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Created] (HIVE-27136) Backport HIVE-27129 to branch-3
Junlin Zeng created HIVE-27136: -- Summary: Backport HIVE-27129 to branch-3 Key: HIVE-27136 URL: https://issues.apache.org/jira/browse/HIVE-27136 Project: Hive Issue Type: Improvement Reporter: Junlin Zeng Assignee: Junlin Zeng -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Created] (HIVE-27135) Cleaner fails with FileNotFoundException
Dayakar M created HIVE-27135: Summary: Cleaner fails with FileNotFoundException Key: HIVE-27135 URL: https://issues.apache.org/jira/browse/HIVE-27135 Project: Hive Issue Type: Bug Reporter: Dayakar M Assignee: Dayakar M The compaction fails when the Cleaner tried to remove a missing directory from HDFS. {code:java} 2023-03-06 07:45:48,331 ERROR org.apache.hadoop.hive.ql.txn.compactor.Cleaner: [Cleaner-executor-thread-12]: Caught exception when cleaning, unable to complete cleaning of id:39762523,dbname:ramas04_hk_ch,tableName:wsinvoicepart,partName:null,state:,type:MINOR,enqueueTime:0,start:0,properties:null,runAs:hive,tooManyAborts:false,hasOldAbort:false,highestWriteId:989,errorMessage:null,workerId: null,initiatorId: null java.io.FileNotFoundException: File hdfs://OnPrem-P-Se-DL-01/warehouse/tablespace/managed/hive/ramas04_hk_ch.db/wsinvoicepart/.hive-staging_hive_2023-03-06_07-45-23_120_4659605113266849995-73550 does not exist. at org.apache.hadoop.hdfs.DistributedFileSystem$DirListingIterator.(DistributedFileSystem.java:1275) at org.apache.hadoop.hdfs.DistributedFileSystem$DirListingIterator.(DistributedFileSystem.java:1249) at org.apache.hadoop.hdfs.DistributedFileSystem$25.doCall(DistributedFileSystem.java:1194) at org.apache.hadoop.hdfs.DistributedFileSystem$25.doCall(DistributedFileSystem.java:1190) at org.apache.hadoop.fs.FileSystemLinkResolver.resolve(FileSystemLinkResolver.java:81) at org.apache.hadoop.hdfs.DistributedFileSystem.listLocatedStatus(DistributedFileSystem.java:1208) at org.apache.hadoop.fs.FileSystem.listLocatedStatus(FileSystem.java:2144) at org.apache.hadoop.fs.FileSystem$5.handleFileStat(FileSystem.java:2332) at org.apache.hadoop.fs.FileSystem$5.hasNext(FileSystem.java:2309) at org.apache.hadoop.util.functional.RemoteIterators$WrappingRemoteIterator.sourceHasNext(RemoteIterators.java:432) at org.apache.hadoop.util.functional.RemoteIterators$FilteringRemoteIterator.fetch(RemoteIterators.java:581) at org.apache.hadoop.util.functional.RemoteIterators$FilteringRemoteIterator.hasNext(RemoteIterators.java:602) at org.apache.hadoop.hive.ql.io.AcidUtils.getHdfsDirSnapshots(AcidUtils.java:1435) at org.apache.hadoop.hive.ql.txn.compactor.Cleaner.removeFiles(Cleaner.java:287) at org.apache.hadoop.hive.ql.txn.compactor.Cleaner.clean(Cleaner.java:214) at org.apache.hadoop.hive.ql.txn.compactor.Cleaner.lambda$run$0(Cleaner.java:114) at org.apache.hadoop.hive.ql.txn.compactor.CompactorUtil$ThrowingRunnable.lambda$unchecked$0(CompactorUtil.java:54) at java.util.concurrent.CompletableFuture$AsyncRun.run(CompletableFuture.java:1640) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) at java.lang.Thread.run(Thread.java:750){code} h4. This issue got fixed as a part of HIVE-26481 but here its not fixed completely. [Here|https://github.com/apache/hive/blob/master/ql/src/java/org/apache/hadoop/hive/ql/io/AcidUtils.java#L1541] FileUtils.listFiles() API which returns a RemoteIterator. So while iterating over, it checks if it is a directory and recursive listing then it will try to list files from that directory but if that directory is removed by other thread/task then it throws FileNotFoundException. Here the directory which got removed is the .staging directory which needs to be excluded through by using passed filter. 
So here we can use the _*org.apache.hadoop.hive.common.FileUtils#listStatusRecursively()*_ [API|https://github.com/apache/hive/blob/master/common/src/java/org/apache/hadoop/hive/common/FileUtils.java#L372], which applies the filter before listing the files from a directory and thereby avoids the FileNotFoundException. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Created] (HIVE-27134) SharedWorkOptimizer merges TableScan operators that have different DPP parents
Sungwoo Park created HIVE-27134: --- Summary: SharedWorkOptimizer merges TableScan operators that have different DPP parents Key: HIVE-27134 URL: https://issues.apache.org/jira/browse/HIVE-27134 Project: Hive Issue Type: Sub-task Reporter: Sungwoo Park -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Created] (HIVE-27133) Round off limit value greater than int_max to int_max;
vamshi kolanu created HIVE-27133: Summary: Round off limit value greater than int_max to int_max; Key: HIVE-27133 URL: https://issues.apache.org/jira/browse/HIVE-27133 Project: Hive Issue Type: Task Reporter: vamshi kolanu Assignee: vamshi kolanu Currently when the limit has a bigint value, it fails with the following error. As part of this task, we are going to round off any value greater than int_max to int_max. select string_col from alltypes order by 1 limit 9223372036854775807 java.lang.NumberFormatException: For input string: "9223372036854775807" at java.lang.NumberFormatException.forInputString(NumberFormatException.java:65) at java.lang.Integer.parseInt(Integer.java:583) at java.lang.Integer.(Integer.java:867) at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.doPhase1(SemanticAnalyzer.java:1803) at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.doPhase1(SemanticAnalyzer.java:1911) at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.doPhase1(SemanticAnalyzer.java:1911) at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genResolvedParseTree(SemanticAnalyzer.java:12616) at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.analyzeInternal(SemanticAnalyzer.java:12718) at org.apache.hadoop.hive.ql.parse.CalcitePlanner.analyzeInternal(CalcitePlanner.java:450) at org.apache.hadoop.hive.ql.parse.BaseSemanticAnalyzer.analyze(BaseSemanticAnalyzer.java:299) at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:650) at org.apache.hadoop.hive.ql.Driver.compileInternal(Driver.java:1503) at org.apache.hadoop.hive.ql.Driver.compileAndRespond(Driver.java:1450) at org.apache.hadoop.hive.ql.Driver.compileAndRespond(Driver.java:1445) at org.apache.hadoop.hive.ql.reexec.ReExecDriver.compileAndRespond(ReExecDriver.java:126) at org.apache.hive.service.cli.operation.SQLOperation.prepare(SQLOperation.java:200) at org.apache.hive.service.cli.operation.SQLOperation.runInternal(SQLOperation.java:265) at org.apache.hive.service.cli.operation.Operation.run(Operation.java:274) at org.apache.hive.service.cli.session.HiveSessionImpl.executeStatementInternal(HiveSessionImpl.java:565) at org.apache.hive.service.cli.session.HiveSessionImpl.executeStatementAsync(HiveSessionImpl.java:551) at org.apache.hive.service.cli.CLIService.executeStatementAsync(CLIService.java:315) at org.apache.hive.service.cli.thrift.ThriftCLIService.ExecuteStatement(ThriftCLIService.java:567) at org.apache.hive.service.rpc.thrift.TCLIService$Processor$ExecuteStatement.getResult(TCLIService.java:1557) at org.apache.hive.service.rpc.thrift.TCLIService$Processor$ExecuteStatement.getResult(TCLIService.java:1542) at org.apache.thrift.ProcessFunction.process(ProcessFunction.java:39) at org.apache.thrift.TBaseProcessor.process(TBaseProcessor.java:39) at org.apache.hive.service.auth.TSetIpAddressProcessor.process(TSetIpAddressProcessor.java:56) at org.apache.thrift.server.TThreadPoolServer$WorkerProcess.run(TThreadPoolServer.java:286) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) at java.lang.Thread.run(Thread.java:748) -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Created] (HIVE-27132) backport HIVE-12371 to branch-3 hive-jdbc use global Driver loginTimeout
shalk created HIVE-27132: Summary: backport HIVE-12371 to branch-3 hive-jdbc use global Driver loginTimeout Key: HIVE-27132 URL: https://issues.apache.org/jira/browse/HIVE-27132 Project: Hive Issue Type: Improvement Reporter: shalk -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Created] (HIVE-27131) Remove empty module shims/scheduler
Stamatis Zampetakis created HIVE-27131: -- Summary: Remove empty module shims/scheduler Key: HIVE-27131 URL: https://issues.apache.org/jira/browse/HIVE-27131 Project: Hive Issue Type: Task Components: Shims Reporter: Stamatis Zampetakis Assignee: Stamatis Zampetakis The module has nothing more than a plain pom.xml file and the latter does not seem to do anything special apart from bundling up together some optional dependencies. There is no source code, no tests, and no reason for the module to exist. At some point it used to contain a few classes but these were removed progressively (e.g., HIVE-22398) leaving back an empty module. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Created] (HIVE-27130) Add metrics to report the size of data replicated/copied to target
Amit Saonerkar created HIVE-27130: - Summary: Add metrics to report the size of data replicated/copied to target Key: HIVE-27130 URL: https://issues.apache.org/jira/browse/HIVE-27130 Project: Hive Issue Type: Improvement Components: HiveServer2 Affects Versions: 3.2.0 Reporter: Amit Saonerkar The corresponding CDPD Jira is [CDPD-45872|https://jira.cloudera.com/browse/CDPD-45872] -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Created] (HIVE-27129) Enhanced support to Hive Client http support
Junlin Zeng created HIVE-27129: -- Summary: Enhanced support to Hive Client http support Key: HIVE-27129 URL: https://issues.apache.org/jira/browse/HIVE-27129 Project: Hive Issue Type: Improvement Reporter: Junlin Zeng Assignee: Junlin Zeng Currently we support using HTTP for the Hive metastore connection. However, we do not support custom headers or a default trust store. This ticket tracks the work to improve the HTTP journey. -- This message was sent by Atlassian Jira (v8.20.10#820010)