[jira] [Updated] (HIVE-28090) correct desc of `hive.metastore.disallow.incompatible.col.type.changes`
[ https://issues.apache.org/jira/browse/HIVE-28090?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated HIVE-28090: -- Labels: pull-request-available (was: ) > correct desc of `hive.metastore.disallow.incompatible.col.type.changes` > --- > > Key: HIVE-28090 > URL: https://issues.apache.org/jira/browse/HIVE-28090 > Project: Hive > Issue Type: Improvement >Reporter: ming95 >Priority: Major > Labels: pull-request-available > -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Created] (HIVE-28090) correct desc of `hive.metastore.disallow.incompatible.col.type.changes`
ming95 created HIVE-28090: - Summary: correct desc of `hive.metastore.disallow.incompatible.col.type.changes` Key: HIVE-28090 URL: https://issues.apache.org/jira/browse/HIVE-28090 Project: Hive Issue Type: Improvement Reporter: ming95 -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Created] (HIVE-28089) Adding feature about MSCK/Analyze commands showing a warning in console for Small files.
Hongdan Zhu created HIVE-28089: -- Summary: Adding feature about MSCK/Analyze commands showing a warning in console for Small files. Key: HIVE-28089 URL: https://issues.apache.org/jira/browse/HIVE-28089 Project: Hive Issue Type: Improvement Reporter: Hongdan Zhu MSCK/Analyze commands should show a warning in the console for a table/partition if (totalSize / numFiles) is less than avgFileSize. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Assigned] (HIVE-28089) Adding feature about MSCK/Analyze commands showing a warning in console for Small files.
[ https://issues.apache.org/jira/browse/HIVE-28089?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hongdan Zhu reassigned HIVE-28089: -- Assignee: Hongdan Zhu
> Adding feature about MSCK/Analyze commands showing a warning in console for
> Small files.
>
> Key: HIVE-28089
> URL: https://issues.apache.org/jira/browse/HIVE-28089
> Project: Hive
> Issue Type: Improvement
> Reporter: Hongdan Zhu
> Assignee: Hongdan Zhu
> Priority: Major
>
> MSCK/Analyze commands should show a warning in the console for a
> table/partition if (totalSize / numFiles) is less than avgFileSize.
-- This message was sent by Atlassian Jira (v8.20.10#820010)
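The proposed check is a simple ratio comparison. A minimal sketch of the idea follows; the class and method names and the threshold parameter are illustrative only and are not taken from any actual Hive patch:

```java
public class SmallFileWarning {

    // Hypothetical helper: warn when the average file size of a table or
    // partition falls below a configured threshold. In Hive, totalSize and
    // numFiles would come from table/partition statistics.
    static boolean shouldWarn(long totalSize, long numFiles, long avgFileSizeThreshold) {
        if (numFiles == 0) {
            return false; // nothing to report for an empty table/partition
        }
        return (totalSize / numFiles) < avgFileSizeThreshold;
    }

    public static void main(String[] args) {
        long mb = 1024L * 1024;
        // 100 MB spread over 1000 files -> ~102 KB per file, below a 16 MB threshold
        System.out.println(shouldWarn(100 * mb, 1000, 16 * mb));  // true: warn
        // 1 GB in 4 files -> 256 MB per file, well above the threshold
        System.out.println(shouldWarn(1024 * mb, 4, 16 * mb));    // false: no warning
    }
}
```

MSCK/Analyze would evaluate this per table or per partition after gathering statistics and print the warning to the console when it returns true.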
[jira] [Commented] (HIVE-28021) Iceberg: Attempting to create a table with a percent symbol fails
[ https://issues.apache.org/jira/browse/HIVE-28021?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17820123#comment-17820123 ] Tim Thorpe commented on HIVE-28021: --- Thanks [~zhangbutao] for merging it!
> Iceberg: Attempting to create a table with a percent symbol fails
> -
>
> Key: HIVE-28021
> URL: https://issues.apache.org/jira/browse/HIVE-28021
> Project: Hive
> Issue Type: Bug
> Components: Iceberg integration
> Affects Versions: 4.0.0-beta-1
> Reporter: Tim Thorpe
> Assignee: Tim Thorpe
> Priority: Minor
> Labels: pull-request-available
> Fix For: 4.1.0
>
> This occurred while attempting to test creating a table "[|]#&%_@"."[|]#&%_@"
> The stack trace is as follows:
>
> {code:java}
> java.util.UnknownFormatConversionException: Conversion = '_'
> org.apache.hadoop.hive.ql.metadata.HiveException: java.util.UnknownFormatConversionException: Conversion = '_'
>         at org.apache.hadoop.hive.ql.metadata.Hive.createTable(Hive.java:1383) ~[hive-exec-4.0.0-beta-1.jar:4.0.0-beta-1]
>         at org.apache.hadoop.hive.ql.metadata.Hive.createTable(Hive.java:1388) ~[hive-exec-4.0.0-beta-1.jar:4.0.0-beta-1]
>         at org.apache.hadoop.hive.ql.metadata.Hive.createTable(Hive.java:1278) ~[hive-exec-4.0.0-beta-1.jar:4.0.0-beta-1]
> …
> Caused by: java.util.UnknownFormatConversionException: Conversion = '_'
>         at java.util.Formatter.checkText(Formatter.java:2590) ~[?:1.8.0]
>         at java.util.Formatter.parse(Formatter.java:2566) ~[?:1.8.0]
>         at java.util.Formatter.format(Formatter.java:2512) ~[?:1.8.0]
>         at java.util.Formatter.format(Formatter.java:2466) ~[?:1.8.0]
>         at java.lang.String.format(String.java:4268) ~[?:2.9 (05-29-2023)]
>         at org.apache.iceberg.relocated.com.google.common.util.concurrent.ThreadFactoryBuilder.format(ThreadFactoryBuilder.java:186) ~[hive-iceberg-handler-4.0.0-beta-1.jar:4.0.0-beta-1]
>         at org.apache.iceberg.relocated.com.google.common.util.concurrent.ThreadFactoryBuilder.setNameFormat(ThreadFactoryBuilder.java:73) ~[hive-iceberg-handler-4.0.0-beta-1.jar:4.0.0-beta-1]
>         at org.apache.iceberg.hive.MetastoreLock.<init>(MetastoreLock.java:129) ~[hive-iceberg-handler-4.0.0-beta-1.jar:4.0.0-beta-1] {code}
>
> This was fixed by making a change to
> [https://github.com/apache/hive/blob/branch-4.0.0-beta-1/iceberg/iceberg-catalog/src/main/java/org/apache/iceberg/hive/MetastoreLock.java#L129]
>
> {code:java}
> -.setNameFormat("iceberg-hive-lock-heartbeat-" + fullName + "-%d")
> +.setNameFormat("iceberg-hive-lock-heartbeat-" + fullName.replace("%", "%%") + "-%d"){code}
-- This message was sent by Atlassian Jira (v8.20.10#820010)
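The root cause is that the heartbeat thread-name string, which embeds the table's full name, is passed to guava's `ThreadFactoryBuilder.setNameFormat` and is therefore treated as a `String.format` pattern; a `%` in the table name then becomes an invalid conversion. The failure and the `%%` escaping fix can be reproduced with plain `String.format`, independent of Hive (a standalone demo; the table name below is illustrative):

```java
import java.util.UnknownFormatConversionException;

public class FormatEscapeDemo {
    public static void main(String[] args) {
        // A table name containing '%' followed by '_' forms an invalid format conversion.
        String fullName = "db.[|]#&%_@";
        try {
            // This mirrors how the thread-name pattern is used as a format string.
            String.format("iceberg-hive-lock-heartbeat-" + fullName + "-%d", 0);
        } catch (UnknownFormatConversionException e) {
            System.out.println("failed: " + e.getMessage()); // prints: failed: Conversion = '_'
        }
        // Escaping '%' as '%%' keeps the name literal while leaving '%d' usable
        // for the per-thread counter.
        String safe = "iceberg-hive-lock-heartbeat-" + fullName.replace("%", "%%") + "-%d";
        System.out.println(String.format(safe, 0)); // prints: iceberg-hive-lock-heartbeat-db.[|]#&%_@-0
    }
}
```

This is why the one-line patch escapes `fullName` before concatenating it into the format string rather than changing the thread factory itself.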
[jira] [Assigned] (HIVE-27829) New command to display current connections on HS2 and HMS instances
[ https://issues.apache.org/jira/browse/HIVE-27829?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Riju Trivedi reassigned HIVE-27829: --- Assignee: Riju Trivedi
> New command to display current connections on HS2 and HMS instances
> ---
>
> Key: HIVE-27829
> URL: https://issues.apache.org/jira/browse/HIVE-27829
> Project: Hive
> Issue Type: New Feature
> Components: Hive, HiveServer2, Standalone Metastore
> Reporter: Taraka Rama Rao Lethavadla
> Assignee: Riju Trivedi
> Priority: Major
>
> We need a command to list current connections to HS2/HMS instances.
> It could be similar to {*}show processlist{*} (MySQL), {*}select * from
> pg_stat_activity{*} (PostgreSQL), or {*}show compactions{*} (Hive), showing
> current connections to the HiveServer2/HMS instances.
> This command can help in troubleshooting issues with the Hive service. One can
> gauge the load on a given HS2/HMS instance with this command and identify
> inappropriate connections in order to terminate them.
>
> We could even extend this command to show connections between an HMS instance
> and the backend database, to troubleshoot issues between HMS and the backend database.
-- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Resolved] (HIVE-27405) Throw out the detail error Invalid partition name to the clients
[ https://issues.apache.org/jira/browse/HIVE-27405?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Butao Zhang resolved HIVE-27405. Fix Version/s: 4.1.0 Resolution: Fixed Fix has been merged into master branch. Thanks [~tarak271] for the contribution! Thanks [~okumin] for the review!
> Throw out the detail error Invalid partition name to the clients
>
> Key: HIVE-27405
> URL: https://issues.apache.org/jira/browse/HIVE-27405
> Project: Hive
> Issue Type: Improvement
> Components: HiveServer2
> Reporter: Taraka Rama Rao Lethavadla
> Assignee: Taraka Rama Rao Lethavadla
> Priority: Major
> Labels: pull-request-available
> Fix For: 4.1.0
>
> When running MSCK on a table, if there is a directory that doesn't match the
> partition format, the query fails. However, the detailed error information is
> not surfaced to the client.
>
> {noformat}
> MSCK table db_1.table_1;
> Error: Error while compiling statement: FAILED: Execution Error, return code
> 1 from org.apache.hadoop.hive.ql.ddl.DDLTask (state=08S01,code=1){noformat}
> From the logs, we can see the detailed error, which would help the customer
> fix the incorrect partition name without raising a case.
> {noformat}
> 2023-04-11 12:42:18,012 WARN org.apache.hadoop.hive.metastore.Msck:
> [6fd73a49-32eb-48e3-907e-00a1a71979e8 HiveServer2-Handler-Pool:
> Thread-30852428]: Failed to run metacheck:
> org.apache.hadoop.hive.metastore.utils.MetastoreException: Invalid partition
> name hdfs://warehouse/db_1.db/table_1/part{noformat}
> This error should be printed on the client side, so that users can fix it
> by themselves without digging through the server logs.
-- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Resolved] (HIVE-28021) Iceberg: Attempting to create a table with a percent symbol fails
[ https://issues.apache.org/jira/browse/HIVE-28021?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Butao Zhang resolved HIVE-28021. Fix Version/s: 4.1.0 Target Version/s: (was: 4.0.0) Resolution: Fixed -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Commented] (HIVE-28021) Iceberg: Attempting to create a table with a percent symbol fails
[ https://issues.apache.org/jira/browse/HIVE-28021?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17820092#comment-17820092 ] Butao Zhang commented on HIVE-28021: Fix has been merged into master branch. Thanks [~tthorpeca] for the contribution!!! -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Updated] (HIVE-28021) Iceberg: Attempting to create a table with a percent symbol fails
[ https://issues.apache.org/jira/browse/HIVE-28021?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Butao Zhang updated HIVE-28021: --- Summary: Iceberg: Attempting to create a table with a percent symbol fails (was: Attempting to create a table with a percent symbol fails) -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Updated] (HIVE-27898) For ICEBERG partitioned table, after turning on vectorisation-read, all non-partitioned columns may be empty (location based table).
[ https://issues.apache.org/jira/browse/HIVE-27898?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] yongzhi.shao updated HIVE-27898: Summary: For ICEBERG partitioned table, after turning on vectorisation-read, all non-partitioned columns may be empty (location based table). (was: For ICEBERG partitioned table, after turning on vectorisation, all non-partitioned columns may be empty (location based table).)
> For ICEBERG partitioned table, after turning on vectorisation-read, all
> non-partitioned columns may be empty (location based table).
>
> Key: HIVE-27898
> URL: https://issues.apache.org/jira/browse/HIVE-27898
> Project: Hive
> Issue Type: Bug
> Components: Iceberg integration
> Reporter: yongzhi.shao
> Priority: Critical
> Attachments: hive-site.xml
>
> Currently, we found that when using the HIVE4-BETA1 version, if we query an
> ICEBERG table from a subquery, we get no data back.
> I cross-validated with HIVE3 on TEZ, and HIVE3 does not have this problem
> when querying ICEBERG.
> {code:java}
> --spark3.4.1+iceberg 1.4.2
> CREATE TABLE datacenter.dwd.b_std_trade (
>   uni_order_id STRING,
>   data_from BIGINT,
>   partner STRING,
>   plat_code STRING,
>   order_id STRING,
>   uni_shop_id STRING,
>   uni_id STRING,
>   guide_id STRING,
>   shop_id STRING,
>   plat_account STRING,
>   total_fee DOUBLE,
>   item_discount_fee DOUBLE,
>   trade_discount_fee DOUBLE,
>   adjust_fee DOUBLE,
>   post_fee DOUBLE,
>   discount_rate DOUBLE,
>   payment_no_postfee DOUBLE,
>   payment DOUBLE,
>   pay_time STRING,
>   product_num BIGINT,
>   order_status STRING,
>   is_refund STRING,
>   refund_fee DOUBLE,
>   insert_time STRING,
>   created STRING,
>   endtime STRING,
>   modified STRING,
>   trade_type STRING,
>   receiver_name STRING,
>   receiver_country STRING,
>   receiver_state STRING,
>   receiver_city STRING,
>   receiver_district STRING,
>   receiver_town STRING,
>   receiver_address STRING,
>   receiver_mobile STRING,
>   trade_source STRING,
>   delivery_type STRING,
>   consign_time STRING,
>   orders_num BIGINT,
>   is_presale BIGINT,
>   presale_status STRING,
>   first_fee_paytime STRING,
>   last_fee_paytime STRING,
>   first_paid_fee DOUBLE,
>   tenant STRING,
>   tidb_modified STRING,
>   step_paid_fee DOUBLE,
>   seller_flag STRING,
>   is_used_store_card BIGINT,
>   store_card_used DOUBLE,
>   store_card_basic_used DOUBLE,
>   store_card_expand_used DOUBLE,
>   order_promotion_num BIGINT,
>   item_promotion_num BIGINT,
>   buyer_remark STRING,
>   seller_remark STRING,
>   trade_business_type STRING)
> USING iceberg
> PARTITIONED BY (uni_shop_id, truncate(4, created))
> LOCATION '/iceberg-catalog/warehouse/dwd/b_std_trade'
> TBLPROPERTIES (
>   'current-snapshot-id' = '7217819472703702905',
>   'format' = 'iceberg/orc',
>   'format-version' = '1',
>   'hive.stored-as' = 'iceberg',
>   'read.orc.vectorization.enabled' = 'true',
>   'sort-order' = 'uni_shop_id ASC NULLS FIRST, created ASC NULLS FIRST',
>   'write.distribution-mode' = 'hash',
>   'write.format.default' = 'orc',
>   'write.metadata.delete-after-commit.enabled' = 'true',
>   'write.metadata.previous-versions-max' = '3',
>   'write.orc.bloom.filter.columns' = 'order_id',
>   'write.orc.compression-codec' = 'zstd')
> --hive-iceberg
> CREATE EXTERNAL TABLE iceberg_dwd.b_std_trade
> STORED BY 'org.apache.iceberg.mr.hive.HiveIcebergStorageHandler'
> LOCATION 'hdfs:///iceberg-catalog/warehouse/dwd/b_std_trade'
> TBLPROPERTIES ('iceberg.catalog'='location_based_table','engine.hive.enabled'='true');
> select * from iceberg_dwd.b_std_trade
> where uni_shop_id = 'TEST|1' limit 10; --10 rows
> select *
> from (
>   select * from iceberg_dwd.b_std_trade
>   where uni_shop_id = 'TEST|1' limit 10
> ) t1; --10 rows
> select uni_shop_id
> from (
>   select * from iceberg_dwd.b_std_trade
>   where uni_shop_id = 'TEST|1' limit 10
> ) t1; --0 rows
> select uni_shop_id
> from (
>   select uni_shop_id as uni_shop_id from iceberg_dwd.b_std_trade
>   where uni_shop_id = 'TEST|1' limit 10
> ) t1; --0 rows
> --hive-orc
> select uni_shop_id
> from (
>   select * from iceberg_dwd.trade_test
>   where uni_shop_id = 'TEST|1' limit 10
> ) t1; --10 ROWS{code}
-- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Comment Edited] (HIVE-28088) CalcitePlanner fails to resolve column names on materialized CTEs
[ https://issues.apache.org/jira/browse/HIVE-28088?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17820058#comment-17820058 ] okumin edited comment on HIVE-28088 at 2/23/24 1:18 PM:
|| ||Hive w/ CBO||Hive w/o CBO||Spark SQL||Trino||PostgreSQL||MySQL||
|(1) Sub-query|(key, col_1)|(key, col1)|(key, key)|(key, key)|(key, key)|Error|
|(2) Sub-query + top-level join|(key, col_1, key, col_1)|Error|Error|Error|Error|Error|
|(3) CTE|(key, col_1)|(key, col_1)|(key, key)|(key, key)|(key, key)|Error|
|(4) CTE + top-level join|(key, col_1, key, col_1)|Error|Error|Error|Error|Error|
|(5) Materialized CTE|(key)|(key)|N/A|N/A|N/A|N/A|
|(6) Materialized CTE + top-level join|Error|(key, key)|N/A|N/A|N/A|N/A|
If we assume Spark SQL, Trino, or PostgreSQL are correct, (1), (3), and (5) should return two keys. (2), (4), and (6) should fail.

was (Author: okumin):
|| ||Hive w/ CBO||Hive w/o CBO||Spark SQL||Trino||PostgreSQL||MySQL||
|(1) Sub-query|(key, col_1)|(key, col1)|(key, key)|(key, key)|(key, key)|Error|
|(2) Sub-query + top-level join|(key, col_1, key, col_1)|Error|Error|Error|Error|Error|
|(3) CTE|(key, col_1)|(key, col_1)|(key, key)|(key, key)|(key, key)|Error|
|(4) CTE + top-level join|(key, col_1, key, col_1)|Error|Error|Error|Error|Error|
|(5) Materialized CTE|(key)|(key)|N/A|N/A|N/A|N/A|
|(6) Materialized CTE + top-level join|Error|(key, key)|N/A|N/A|N/A|N/A|
If we assume Spark SQL, Trino, or PostgreSQL, (1), (3), and (5) should return two keys. (2), (4), and (6) should fail.

> CalcitePlanner fails to resolve column names on materialized CTEs
> -
>
> Key: HIVE-28088
> URL: https://issues.apache.org/jira/browse/HIVE-28088
> Project: Hive
> Issue Type: Bug
> Components: CBO
> Affects Versions: 4.0.0-beta-1
> Reporter: okumin
> Assignee: okumin
> Priority: Major
>
> We found a case where CBO fails when it refers to a materialized CTE.
> These are queries to reproduce the issue.
> {code:java}
> DROP TABLE IF EXISTS test;
> CREATE TABLE test (key STRING);
> set hive.optimize.cte.materialize.threshold=1;
> set hive.optimize.cte.materialize.full.aggregate.only=false;
> EXPLAIN CBO WITH materialized_cte AS (
>   SELECT a.key, b.key
>   FROM test a
>   JOIN test b ON (a.key = b.key)
> )
> SELECT *
> FROM materialized_cte c
> JOIN materialized_cte d ON (c.key = d.key); {code}
> CBO fails.
> {code:java}
> +--+
> | Explain |
> +--+
> +--+ {code}
> Error log on HiveServer2.
> {code:java}
> 2024-02-23T04:43:19,065 ERROR [d145a95b-8840-4d2b-9869-a186b4bd82ef HiveServer2-Handler-Pool: Thread-555] parse.CalcitePlanner: CBO failed, skipping CBO.
> org.apache.hadoop.hive.ql.optimizer.calcite.CalciteSemanticException: Could not resolve column name
>         at org.apache.hadoop.hive.ql.parse.type.RexNodeExprFactory.getPosition(RexNodeExprFactory.java:184) ~[hive-exec-4.0.0-beta-2-SNAPSHOT.jar:4.0.0-beta-2-SNAPSHOT]
>         at org.apache.hadoop.hive.ql.parse.type.RexNodeExprFactory.createColumnRefExpr(RexNodeExprFactory.java:161) ~[hive-exec-4.0.0-beta-2-SNAPSHOT.jar:4.0.0-beta-2-SNAPSHOT]
>         at org.apache.hadoop.hive.ql.parse.type.RexNodeExprFactory.createColumnRefExpr(RexNodeExprFactory.java:97) ~[hive-exec-4.0.0-beta-2-SNAPSHOT.jar:4.0.0-beta-2-SNAPSHOT]
>         at org.apache.hadoop.hive.ql.parse.type.JoinCondTypeCheckProcFactory$JoinCondDefaultExprProcessor.processQualifiedColRef(JoinCondTypeCheckProcFactory.java:188) ~[hive-exec-4.0.0-beta-2-SNAPSHOT.jar:4.0.0-beta-2-SNAPSHOT]
>         at org.apache.hadoop.hive.ql.parse.type.TypeCheckProcFactory$DefaultExprProcessor.process(TypeCheckProcFactory.java:1414) ~[hive-exec-4.0.0-beta-2-SNAPSHOT.jar:4.0.0-beta-2-SNAPSHOT]
>         at org.apache.hadoop.hive.ql.lib.CostLessRuleDispatcher.dispatch(CostLessRuleDispatcher.java:66) ~[hive-exec-4.0.0-beta-2-SNAPSHOT.jar:4.0.0-beta-2-SNAPSHOT]
>         at org.apache.hadoop.hive.ql.lib.DefaultGraphWalker.dispatchAndReturn(DefaultGraphWalker.java:105) ~[hive-exec-4.0.0-beta-2-SNAPSHOT.jar:4.0.0-beta-2-SNAPSHOT]
>         at org.apache.hadoop.hive.ql.lib.DefaultGraphWalker.dispatch(DefaultGraphWalker.java:89) ~[hive-exec-4.0.0-beta-2-SNAPSHOT.jar:4.0.0-beta-2-SNAPSHOT]
>         at org.apache.hadoop.hive.ql.lib.ExpressionWalker.walk(ExpressionWalker.java:101) ~[hive-exec-4.0.0-beta-2-SNAPSHOT.jar:4.0.0-beta-2-SNAPSHOT]
>         at org.apache.hadoop.hive.ql.lib.DefaultGraphWalker.startWalking(DefaultGraphWalker.java:120) ~[hive-exec-4.0.0-beta-2-SNAPSHOT.jar:4.0.0-beta-2-SNAPSHOT]
>         at org.apache.hadoop.hive.ql.parse.type.TypeCheckProcFactory.genExprNode(TypeCheckProcFactory.java:231) ~[hive-exec-4.0.0-beta-2-SNAPSHOT.jar:4.0.0-beta-2-SNAPSHOT]
>         at
[jira] [Commented] (HIVE-28088) CalcitePlanner fails to resolve column names on materialized CTEs
[ https://issues.apache.org/jira/browse/HIVE-28088?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17820058#comment-17820058 ] okumin commented on HIVE-28088: ---
|| ||Hive w/ CBO||Hive w/o CBO||Spark SQL||Trino||PostgreSQL||MySQL||
|(1) Sub-query|(key, col_1)|(key, col1)|(key, key)|(key, key)|(key, key)|Error|
|(2) Sub-query + top-level join|(key, col_1, key, col_1)|Error|Error|Error|Error|Error|
|(3) CTE|(key, col_1)|(key, col_1)|(key, key)|(key, key)|(key, key)|Error|
|(4) CTE + top-level join|(key, col_1, key, col_1)|Error|Error|Error|Error|Error|
|(5) Materialized CTE|(key)|(key)|N/A|N/A|N/A|N/A|
|(6) Materialized CTE + top-level join|Error|(key, key)|N/A|N/A|N/A|N/A|
If we assume Spark SQL, Trino, or PostgreSQL, (1), (3), and (5) should return two keys. (2), (4), and (6) should fail.
> CalcitePlanner fails to resolve column names on materialized CTEs
> -
>
> Key: HIVE-28088
> URL: https://issues.apache.org/jira/browse/HIVE-28088
> Project: Hive
> Issue Type: Bug
> Components: CBO
> Affects Versions: 4.0.0-beta-1
> Reporter: okumin
> Assignee: okumin
> Priority: Major
>
> We found a case where CBO fails when it refers to a materialized CTE.
> These are queries to reproduce the issue.
> {code:java}
> DROP TABLE IF EXISTS test;
> CREATE TABLE test (key STRING);
> set hive.optimize.cte.materialize.threshold=1;
> set hive.optimize.cte.materialize.full.aggregate.only=false;
> EXPLAIN CBO WITH materialized_cte AS (
>   SELECT a.key, b.key
>   FROM test a
>   JOIN test b ON (a.key = b.key)
> )
> SELECT *
> FROM materialized_cte c
> JOIN materialized_cte d ON (c.key = d.key); {code}
> CBO fails.
> {code:java}
> +--+
> | Explain |
> +--+
> +--+ {code}
> Error log on HiveServer2.
> {code:java}
> 2024-02-23T04:43:19,065 ERROR [d145a95b-8840-4d2b-9869-a186b4bd82ef HiveServer2-Handler-Pool: Thread-555] parse.CalcitePlanner: CBO failed, skipping CBO.
> org.apache.hadoop.hive.ql.optimizer.calcite.CalciteSemanticException: Could not resolve column name
>         at org.apache.hadoop.hive.ql.parse.type.RexNodeExprFactory.getPosition(RexNodeExprFactory.java:184) ~[hive-exec-4.0.0-beta-2-SNAPSHOT.jar:4.0.0-beta-2-SNAPSHOT]
>         at org.apache.hadoop.hive.ql.parse.type.RexNodeExprFactory.createColumnRefExpr(RexNodeExprFactory.java:161) ~[hive-exec-4.0.0-beta-2-SNAPSHOT.jar:4.0.0-beta-2-SNAPSHOT]
>         at org.apache.hadoop.hive.ql.parse.type.RexNodeExprFactory.createColumnRefExpr(RexNodeExprFactory.java:97) ~[hive-exec-4.0.0-beta-2-SNAPSHOT.jar:4.0.0-beta-2-SNAPSHOT]
>         at org.apache.hadoop.hive.ql.parse.type.JoinCondTypeCheckProcFactory$JoinCondDefaultExprProcessor.processQualifiedColRef(JoinCondTypeCheckProcFactory.java:188) ~[hive-exec-4.0.0-beta-2-SNAPSHOT.jar:4.0.0-beta-2-SNAPSHOT]
>         at org.apache.hadoop.hive.ql.parse.type.TypeCheckProcFactory$DefaultExprProcessor.process(TypeCheckProcFactory.java:1414) ~[hive-exec-4.0.0-beta-2-SNAPSHOT.jar:4.0.0-beta-2-SNAPSHOT]
>         at org.apache.hadoop.hive.ql.lib.CostLessRuleDispatcher.dispatch(CostLessRuleDispatcher.java:66) ~[hive-exec-4.0.0-beta-2-SNAPSHOT.jar:4.0.0-beta-2-SNAPSHOT]
>         at org.apache.hadoop.hive.ql.lib.DefaultGraphWalker.dispatchAndReturn(DefaultGraphWalker.java:105) ~[hive-exec-4.0.0-beta-2-SNAPSHOT.jar:4.0.0-beta-2-SNAPSHOT]
>         at org.apache.hadoop.hive.ql.lib.DefaultGraphWalker.dispatch(DefaultGraphWalker.java:89) ~[hive-exec-4.0.0-beta-2-SNAPSHOT.jar:4.0.0-beta-2-SNAPSHOT]
>         at org.apache.hadoop.hive.ql.lib.ExpressionWalker.walk(ExpressionWalker.java:101) ~[hive-exec-4.0.0-beta-2-SNAPSHOT.jar:4.0.0-beta-2-SNAPSHOT]
>         at org.apache.hadoop.hive.ql.lib.DefaultGraphWalker.startWalking(DefaultGraphWalker.java:120) ~[hive-exec-4.0.0-beta-2-SNAPSHOT.jar:4.0.0-beta-2-SNAPSHOT]
>         at org.apache.hadoop.hive.ql.parse.type.TypeCheckProcFactory.genExprNode(TypeCheckProcFactory.java:231) ~[hive-exec-4.0.0-beta-2-SNAPSHOT.jar:4.0.0-beta-2-SNAPSHOT]
>         at org.apache.hadoop.hive.ql.parse.type.RexNodeTypeCheck.genExprNodeJoinCond(RexNodeTypeCheck.java:60) ~[hive-exec-4.0.0-beta-2-SNAPSHOT.jar:4.0.0-beta-2-SNAPSHOT]
>         at org.apache.hadoop.hive.ql.parse.CalcitePlanner$CalcitePlannerAction.genJoinRelNode(CalcitePlanner.java:2656) ~[hive-exec-4.0.0-beta-2-SNAPSHOT.jar:4.0.0-beta-2-SNAPSHOT]
>         at org.apache.hadoop.hive.ql.parse.CalcitePlanner$CalcitePlannerAction.genJoinLogicalPlan(CalcitePlanner.java:2888) ~[hive-exec-4.0.0-beta-2-SNAPSHOT.jar:4.0.0-beta-2-SNAPSHOT]
>         at org.apache.hadoop.hive.ql.parse.CalcitePlanner$CalcitePlannerAction.genLogicalPlan(CalcitePlanner.java:5048) ~[hive-exec-4.0.0-beta-2-SNAPSHOT.jar:4.0.0-beta-2-SNAPSHOT]
>         at
[jira] [Comment Edited] (HIVE-28088) CalcitePlanner fails to resolve column names on materialized CTEs
[ https://issues.apache.org/jira/browse/HIVE-28088?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17819917#comment-17819917 ] okumin edited comment on HIVE-28088 at 2/23/24 1:11 PM:
I found that (2), (4), and (6) generate different results with CBO than without CBO. Also, (3) vs (5) and (4) vs (6) are not consistent, which means CTE materialization can change a result when there are duplicated columns.
|| ||CBO||Non-CBO||
|(1) Sub-query|(key, col_1)|(key, col1)|
|(2) Sub-query + top-level join|(key, col_1, key, col_1)|Error|
|(3) CTE|(key, col_1)|(key, col_1)|
|(4) CTE + top-level join|(key, col_1, key, col_1)|Error|
|(5) Materialized CTE|(key)|(key)|
|(6) Materialized CTE + top-level join|Error|(key, key)|
Test queries.
{code:java}
DROP TABLE IF EXISTS test1;
DROP TABLE IF EXISTS test2;
CREATE TABLE test1 (key STRING);
CREATE TABLE test2 (key STRING);
-- (1)
SELECT * FROM (SELECT a.key, b.key FROM test1 a JOIN test2 b ON (a.key = b.key)) c;
-- (2)
SELECT *
FROM (SELECT a.key, b.key FROM test1 a JOIN test2 b ON (a.key = b.key)) c
JOIN (SELECT a.key, b.key FROM test2 a JOIN test2 b ON (a.key = b.key)) d ON c.key = d.key;
-- (3)
WITH cte AS (
  SELECT a.key, b.key FROM test1 a JOIN test2 b ON (a.key = b.key)
)
SELECT * FROM cte;
-- (4)
WITH cte AS (
  SELECT a.key, b.key FROM test1 a JOIN test2 b ON (a.key = b.key)
)
SELECT * FROM cte c JOIN cte d ON (c.key = d.key);
set hive.optimize.cte.materialize.threshold=1;
set hive.optimize.cte.materialize.full.aggregate.only=false;
-- (5)
WITH materialized_cte AS (
  SELECT a.key, b.key FROM test1 a JOIN test2 b ON (a.key = b.key)
)
SELECT * FROM materialized_cte;
-- (6)
WITH materialized_cte AS (
  SELECT a.key, b.key FROM test1 a JOIN test2 b ON (a.key = b.key)
)
SELECT * FROM materialized_cte c JOIN materialized_cte d ON (c.key = d.key);{code}
With CBO + `hive.cbo.fallback.strategy=NEVER;`.
{code:java}
(1)
+--------+----------+
| c.key  | c._col1  |
+--------+----------+
+--------+----------+
(2)
+--------+----------+--------+----------+
| c.key  | c._col1  | d.key  | d._col1  |
+--------+----------+--------+----------+
+--------+----------+--------+----------+
(3)
+----------+------------+
| cte.key  | cte._col1  |
+----------+------------+
+----------+------------+
(4)
+--------+----------+--------+----------+
| c.key  | c._col1  | d.key  | d._col1  |
+--------+----------+--------+----------+
+--------+----------+--------+----------+
(5)
+-----------------------+
| materialized_cte.key  |
+-----------------------+
+-----------------------+
(6)
Error: Error while compiling statement: FAILED: CalciteSemanticException Could not resolve column name (state=42000,code=4)
{code}
Without CBO.
{code:java}
(1)
+--------+----------+
| c.key  | c._col1  |
+--------+----------+
(2)
Error: Error while compiling statement: FAILED: SemanticException [Error 10007]: Ambiguous column reference key in c (state=42000,code=10007)
(3)
+----------+------------+
| cte.key  | cte._col1  |
+----------+------------+
+----------+------------+
(4)
Error: Error while compiling statement: FAILED: SemanticException [Error 10007]: Ambiguous column reference key in c (state=42000,code=10007)
(5)
+-----------------------+
| materialized_cte.key  |
+-----------------------+
+-----------------------+
(6)
+--------+--------+
| c.key  | d.key  |
+--------+--------+
+--------+--------+
{code}
[jira] [Comment Edited] (HIVE-28088) CalcitePlanner fails to resolve column names on materialized CTEs
[ https://issues.apache.org/jira/browse/HIVE-28088?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17820054#comment-17820054 ] okumin edited comment on HIVE-28088 at 2/23/24 1:11 PM: I tested them with various databases.
Spark SQL 3.4.1.
{code:java}
(1)
key key
(2)
[AMBIGUOUS_REFERENCE] Reference `c`.`key` is ambiguous, could be: [`c`.`key`, `c`.`key`].; line 4 pos 3
(3)
key key
(4)
[AMBIGUOUS_REFERENCE] Reference `c`.`key` is ambiguous, could be: [`c`.`key`, `c`.`key`].; line 6 pos 15
{code}
Trino 413.
{code:java}
(1)
 key | key
-----+-----
(0 rows)
(2)
Query 20240223_125426_00032_p9au2 failed: line 5:4: Column 'c.key' is ambiguous
(3)
 key | key
-----+-----
(0 rows)
(4)
Query 20240223_125426_00034_p9au2 failed: line 7:16: Column 'c.key' is ambiguous
{code}
PostgreSQL 16.2.
{code:java}
(1)
 key | key
-----+-----
(0 rows)
(2)
ERROR: column reference "key" is ambiguous
LINE 4: ON c.key = d.key;
(3)
 key | key
-----+-----
(0 rows)
(4)
ERROR: column reference "key" is ambiguous
LINE 6: JOIN cte d ON (c.key = d.key);
{code}
MySQL 8.3.
{code:java}
(1) ERROR 1060 (42S21): Duplicate column name 'key'
(2) ERROR 1060 (42S21): Duplicate column name 'key'
(3) ERROR 1060 (42S21): Duplicate column name 'key'
(4) ERROR 1060 (42S21): Duplicate column name 'key'
{code}
> CalcitePlanner fails to resolve column names on materialized CTEs > - > > Key: HIVE-28088 > URL: https://issues.apache.org/jira/browse/HIVE-28088 > Project: Hive > Issue Type: Bug > Components: CBO >Affects Versions: 4.0.0-beta-1 >Reporter: okumin >Assignee: okumin >Priority: Major > > We found a case where CBO fails when it refers to a materialized CTE. > These are queries to reproduce the issue. > {code:java} > DROP TABLE IF EXISTS test; > CREATE TABLE test (key STRING); > set hive.optimize.cte.materialize.threshold=1; > set hive.optimize.cte.materialize.full.aggregate.only=false; > EXPLAIN CBO WITH materialized_cte AS ( > SELECT a.key, b.key > FROM test a > JOIN test b ON (a.key = b.key) > ) > SELECT * > FROM materialized_cte c > JOIN materialized_cte d ON (c.key = d.key); {code} > CBO fails. > {code:java} > +--+ > | Explain | > +--+ > +--+ {code} > Error log on HiveServer2. > {code:java} > 2024-02-23T04:43:19,065 ERROR [d145a95b-8840-4d2b-9869-a186b4bd82ef > HiveServer2-Handler-Pool: Thread-555] parse.CalcitePlanner: CBO failed, > skipping CBO.
> org.apache.hadoop.hive.ql.optimizer.calcite.CalciteSemanticException: Could > not resolve column name > at > org.apache.hadoop.hive.ql.parse.type.RexNodeExprFactory.getPosition(RexNodeExprFactory.java:184) > ~[hive-exec-4.0.0-beta-2-SNAPSHOT.jar:4.0.0-beta-2-SNAPSHOT] > at > org.apache.hadoop.hive.ql.parse.type.RexNodeExprFactory.createColumnRefExpr(RexNodeExprFactory.java:161) > ~[hive-exec-4.0.0-beta-2-SNAPSHOT.jar:4.0.0-beta-2-SNAPSHOT] > at > org.apache.hadoop.hive.ql.parse.type.RexNodeExprFactory.createColumnRefExpr(RexNodeExprFactory.java:97) > ~[hive-exec-4.0.0-beta-2-SNAPSHOT.jar:4.0.0-beta-2-SNAPSHOT] > at > org.apache.hadoop.hive.ql.parse.type.JoinCondTypeCheckProcFactory$JoinCondDefaultExprProcessor.processQualifiedColRef(JoinCondTypeCheckProcFactory.java:188) > ~[hive-exec-4.0.0-beta-2-SNAPSHOT.jar:4.0.0-beta-2-SNAPSHOT] > at > org.apache.hadoop.hive.ql.parse.type.TypeCheckProcFactory$DefaultExprProcessor.process(TypeCheckProcFactory.java:1414) > ~[hive-exec-4.0.0-beta-2-SNAPSHOT.jar:4.0.0-beta-2-SNAPSHOT] > at > org.apache.hadoop.hive.ql.lib.CostLessRuleDispatcher.dispatch(CostLessRuleDispatcher.java:66) >
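For comparison with the engines tested in the comment above, the same duplicated-column scenario can be replayed in SQLite via Python's sqlite3 module. SQLite is not covered in the ticket; this is an extra illustrative data point showing a third behavior: the engine neither renames the duplicated output columns (as Hive's CBO does) nor rejects them (as MySQL does).

```python
# Duplicated-column scenario from HIVE-28088, replayed in SQLite.
# SQLite was not tested in the ticket; this is an illustrative sketch.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE test1 (key TEXT)")
conn.execute("CREATE TABLE test2 (key TEXT)")

# The join projects a.key and b.key; SQLite keeps both result columns
# under the same name "key" instead of renaming or raising an error.
cur = conn.execute(
    "SELECT a.key, b.key FROM test1 a JOIN test2 b ON (a.key = b.key)"
)
print([d[0] for d in cur.description])  # ['key', 'key']
```

Explicit aliases (`SELECT a.key AS a_key, b.key AS b_key`) sidestep the ambiguity in every engine listed.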
[jira] [Commented] (HIVE-28088) CalcitePlanner fails to resolve column names on materialized CTEs
[ https://issues.apache.org/jira/browse/HIVE-28088?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17820054#comment-17820054 ] okumin commented on HIVE-28088: --- I tested them with various databases.
Spark SQL 3.4.1.
{code:java}
(1)
key key
(2)
[AMBIGUOUS_REFERENCE] Reference `c`.`key` is ambiguous, could be: [`c`.`key`, `c`.`key`].; line 4 pos 3
(3)
key key
(4)
[AMBIGUOUS_REFERENCE] Reference `c`.`key` is ambiguous, could be: [`c`.`key`, `c`.`key`].; line 6 pos 15
{code}
Trino 413.
{code:java}
(1)
 key | key
-----+-----
(0 rows)
(2)
Query 20240223_125426_00032_p9au2 failed: line 5:4: Column 'c.key' is ambiguous
(3)
 key | key
-----+-----
(0 rows)
(4)
Query 20240223_125426_00034_p9au2 failed: line 7:16: Column 'c.key' is ambiguous
{code}
PostgreSQL 16.2.
{code:java}
(1)
 key | key
-----+-----
(0 rows)
(2)
ERROR: column reference "key" is ambiguous
LINE 4: ON c.key = d.key;
(3)
 key | key
-----+-----
(0 rows)
(4)
ERROR: column reference "key" is ambiguous
LINE 6: JOIN cte d ON (c.key = d.key);
{code}
MySQL 8.3.
{code:java}
(1) ERROR 1060 (42S21): Duplicate column name 'key'
(2) ERROR 1060 (42S21): Duplicate column name 'key'
(3) ERROR 1060 (42S21): Duplicate column name 'key'
(4) ERROR 1060 (42S21): Duplicate column name 'key'
{code}
> CalcitePlanner fails to resolve column names on materialized CTEs > - > > Key: HIVE-28088 > URL: https://issues.apache.org/jira/browse/HIVE-28088 > Project: Hive > Issue Type: Bug > Components: CBO >Affects Versions: 4.0.0-beta-1 >Reporter: okumin >Assignee: okumin >Priority: Major > > We found a case where CBO fails when it refers to a materialized CTE. > These are queries to reproduce the issue.
> {code:java} > DROP TABLE IF EXISTS test; > CREATE TABLE test (key STRING); > set hive.optimize.cte.materialize.threshold=1; > set hive.optimize.cte.materialize.full.aggregate.only=false; > EXPLAIN CBO WITH materialized_cte AS ( > SELECT a.key, b.key > FROM test a > JOIN test b ON (a.key = b.key) > ) > SELECT * > FROM materialized_cte c > JOIN materialized_cte d ON (c.key = d.key); {code} > CBO fails. > {code:java} > +--+ > | Explain | > +--+ > +--+ {code} > Error log on HiveServer2. > {code:java} > 2024-02-23T04:43:19,065 ERROR [d145a95b-8840-4d2b-9869-a186b4bd82ef > HiveServer2-Handler-Pool: Thread-555] parse.CalcitePlanner: CBO failed, > skipping CBO. > org.apache.hadoop.hive.ql.optimizer.calcite.CalciteSemanticException: Could > not resolve column name > at > org.apache.hadoop.hive.ql.parse.type.RexNodeExprFactory.getPosition(RexNodeExprFactory.java:184) > ~[hive-exec-4.0.0-beta-2-SNAPSHOT.jar:4.0.0-beta-2-SNAPSHOT] > at > org.apache.hadoop.hive.ql.parse.type.RexNodeExprFactory.createColumnRefExpr(RexNodeExprFactory.java:161) > ~[hive-exec-4.0.0-beta-2-SNAPSHOT.jar:4.0.0-beta-2-SNAPSHOT] > at > org.apache.hadoop.hive.ql.parse.type.RexNodeExprFactory.createColumnRefExpr(RexNodeExprFactory.java:97) > ~[hive-exec-4.0.0-beta-2-SNAPSHOT.jar:4.0.0-beta-2-SNAPSHOT] > at > org.apache.hadoop.hive.ql.parse.type.JoinCondTypeCheckProcFactory$JoinCondDefaultExprProcessor.processQualifiedColRef(JoinCondTypeCheckProcFactory.java:188) > ~[hive-exec-4.0.0-beta-2-SNAPSHOT.jar:4.0.0-beta-2-SNAPSHOT] > at > org.apache.hadoop.hive.ql.parse.type.TypeCheckProcFactory$DefaultExprProcessor.process(TypeCheckProcFactory.java:1414) > ~[hive-exec-4.0.0-beta-2-SNAPSHOT.jar:4.0.0-beta-2-SNAPSHOT] > at > org.apache.hadoop.hive.ql.lib.CostLessRuleDispatcher.dispatch(CostLessRuleDispatcher.java:66) > ~[hive-exec-4.0.0-beta-2-SNAPSHOT.jar:4.0.0-beta-2-SNAPSHOT] > at > org.apache.hadoop.hive.ql.lib.DefaultGraphWalker.dispatchAndReturn(DefaultGraphWalker.java:105) > 
~[hive-exec-4.0.0-beta-2-SNAPSHOT.jar:4.0.0-beta-2-SNAPSHOT] > at > org.apache.hadoop.hive.ql.lib.DefaultGraphWalker.dispatch(DefaultGraphWalker.java:89) > ~[hive-exec-4.0.0-beta-2-SNAPSHOT.jar:4.0.0-beta-2-SNAPSHOT] > at > org.apache.hadoop.hive.ql.lib.ExpressionWalker.walk(ExpressionWalker.java:101) > ~[hive-exec-4.0.0-beta-2-SNAPSHOT.jar:4.0.0-beta-2-SNAPSHOT] > at > org.apache.hadoop.hive.ql.lib.DefaultGraphWalker.startWalking(DefaultGraphWalker.java:120) > ~[hive-exec-4.0.0-beta-2-SNAPSHOT.jar:4.0.0-beta-2-SNAPSHOT] > at > org.apache.hadoop.hive.ql.parse.type.TypeCheckProcFactory.genExprNode(TypeCheckProcFactory.java:231) > ~[hive-exec-4.0.0-beta-2-SNAPSHOT.jar:4.0.0-beta-2-SNAPSHOT] > at > org.apache.hadoop.hive.ql.parse.type.RexNodeTypeCheck.genExprNodeJoinCond(RexNodeTypeCheck.java:60) > ~[hive-exec-4.0.0-beta-2-SNAPSHOT.jar:4.0.0-beta-2-SNAPSHOT] > at >
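In the CBO results earlier in this thread, Hive silently renames the second duplicated column to `_col1` rather than raising an error. A minimal sketch of one positional renaming scheme that reproduces that output shape — an illustration for discussion, not Hive's actual implementation:

```python
def dedupe_columns(names):
    """Rename duplicated output column names to positional _col<N>
    placeholders, keeping the first occurrence unchanged. Reproduces
    the (key, _col1) shape seen in the CBO results; illustrative
    only, not Hive's actual code."""
    seen = set()
    out = []
    for pos, name in enumerate(names):
        if name in seen:
            name = f"_col{pos}"  # positional placeholder, e.g. _col1
        seen.add(name)
        out.append(name)
    return out

print(dedupe_columns(["key", "key"]))  # ['key', '_col1']
```

The inconsistency in the ticket is that this renaming happens for plain and materialized CTEs at different points, so the same query resolves differently depending on materialization.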
[jira] [Updated] (HIVE-27898) For ICEBERG partitioned table, after turning on vectorisation, all non-partitioned columns may be empty (location-based table).
[ https://issues.apache.org/jira/browse/HIVE-27898?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] yongzhi.shao updated HIVE-27898: Summary: For ICEBERG partitioned table, after turning on vectorisation, all non-partitioned columns may be empty (location-based table). (was: For ICEBERG partitioned table, after turning on vectorisation, all non-partitioned columns may be empty.) > For ICEBERG partitioned table, after turning on vectorisation, all > non-partitioned columns may be empty (location-based table). > --- > > Key: HIVE-27898 > URL: https://issues.apache.org/jira/browse/HIVE-27898 > Project: Hive > Issue Type: Bug > Components: Iceberg integration >Reporter: yongzhi.shao >Priority: Critical > Attachments: hive-site.xml > > > We found that, when using the HIVE4-BETA1 version, querying an ICEBERG > table in a subquery returns no data. > I have used HIVE3-TEZ for cross-validation, and HIVE3 does not have this > problem when querying ICEBERG.
> {code:java} > --spark3.4.1+iceberg 1.4.2 > CREATE TABLE datacenter.dwd.b_std_trade ( > uni_order_id STRING, > data_from BIGINT, > partner STRING, > plat_code STRING, > order_id STRING, > uni_shop_id STRING, > uni_id STRING, > guide_id STRING, > shop_id STRING, > plat_account STRING, > total_fee DOUBLE, > item_discount_fee DOUBLE, > trade_discount_fee DOUBLE, > adjust_fee DOUBLE, > post_fee DOUBLE, > discount_rate DOUBLE, > payment_no_postfee DOUBLE, > payment DOUBLE, > pay_time STRING, > product_num BIGINT, > order_status STRING, > is_refund STRING, > refund_fee DOUBLE, > insert_time STRING, > created STRING, > endtime STRING, > modified STRING, > trade_type STRING, > receiver_name STRING, > receiver_country STRING, > receiver_state STRING, > receiver_city STRING, > receiver_district STRING, > receiver_town STRING, > receiver_address STRING, > receiver_mobile STRING, > trade_source STRING, > delivery_type STRING, > consign_time STRING, > orders_num BIGINT, > is_presale BIGINT, > presale_status STRING, > first_fee_paytime STRING, > last_fee_paytime STRING, > first_paid_fee DOUBLE, > tenant STRING, > tidb_modified STRING, > step_paid_fee DOUBLE, > seller_flag STRING, > is_used_store_card BIGINT, > store_card_used DOUBLE, > store_card_basic_used DOUBLE, > store_card_expand_used DOUBLE, > order_promotion_num BIGINT, > item_promotion_num BIGINT, > buyer_remark STRING, > seller_remark STRING, > trade_business_type STRING) > USING iceberg > PARTITIONED BY (uni_shop_id, truncate(4, created)) > LOCATION '/iceberg-catalog/warehouse/dwd/b_std_trade' > TBLPROPERTIES ( > 'current-snapshot-id' = '7217819472703702905', > 'format' = 'iceberg/orc', > 'format-version' = '1', > 'hive.stored-as' = 'iceberg', > 'read.orc.vectorization.enabled' = 'true', > 'sort-order' = 'uni_shop_id ASC NULLS FIRST, created ASC NULLS FIRST', > 'write.distribution-mode' = 'hash', > 'write.format.default' = 'orc', > 'write.metadata.delete-after-commit.enabled' = 'true', > 
'write.metadata.previous-versions-max' = '3', > 'write.orc.bloom.filter.columns' = 'order_id', > 'write.orc.compression-codec' = 'zstd') > --hive-iceberg > CREATE EXTERNAL TABLE iceberg_dwd.b_std_trade > STORED BY 'org.apache.iceberg.mr.hive.HiveIcebergStorageHandler' > LOCATION 'hdfs:///iceberg-catalog/warehouse/dwd/b_std_trade' > TBLPROPERTIES > ('iceberg.catalog'='location_based_table','engine.hive.enabled'='true'); > select * from iceberg_dwd.b_std_trade > where uni_shop_id = 'TEST|1' limit 10 --10 rows > select * > from ( > select * from iceberg_dwd.b_std_trade > where uni_shop_id = 'TEST|1' limit 10 > ) t1; --10 rows > select uni_shop_id > from ( > select * from iceberg_dwd.b_std_trade > where uni_shop_id = 'TEST|1' limit 10 > ) t1; --0 rows > select uni_shop_id > from ( > select uni_shop_id as uni_shop_id from iceberg_dwd.b_std_trade > where uni_shop_id = 'TEST|1' limit 10 > ) t1; --0 rows > --hive-orc > select uni_shop_id > from ( > select * from iceberg_dwd.trade_test > where uni_shop_id = 'TEST|1' limit 10 > ) t1;--10 ROWS{code} > -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Updated] (HIVE-27898) For ICEBERG partitioned table, after turning on vectorisation, all non-partitioned columns may be empty.
[ https://issues.apache.org/jira/browse/HIVE-27898?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] yongzhi.shao updated HIVE-27898: Summary: For ICEBERG partitioned table, after turning on vectorisation, all non-partitioned columns may be empty. (was: For ICEBERG partitioned table, after turning on vectorisation, all non-partitioned columns will be empty.) > For ICEBERG partitioned table, after turning on vectorisation, all > non-partitioned columns may be empty. > - > > Key: HIVE-27898 > URL: https://issues.apache.org/jira/browse/HIVE-27898 > Project: Hive > Issue Type: Bug > Components: Iceberg integration >Reporter: yongzhi.shao >Priority: Critical > Attachments: hive-site.xml > > -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Updated] (HIVE-27898) For ICEBERG partitioned table, after turning on vectorisation, all non-partitioned columns will be empty.
[ https://issues.apache.org/jira/browse/HIVE-27898?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] yongzhi.shao updated HIVE-27898: Summary: For ICEBERG partitioned table, after turning on vectorisation, all non-partitioned columns will be empty. (was: When CBO is disabled, HIVE cannot query ICEBERG tables in nested subqueries properly) > For ICEBERG partitioned table, after turning on vectorisation, all > non-partitioned columns will be empty. > - > > Key: HIVE-27898 > URL: https://issues.apache.org/jira/browse/HIVE-27898 > Project: Hive > Issue Type: Bug > Components: Iceberg integration >Reporter: yongzhi.shao >Priority: Critical > Attachments: hive-site.xml > > -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Updated] (HIVE-27556) Add Unit Test for KafkaStorageHandlerInfo
[ https://issues.apache.org/jira/browse/HIVE-27556?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] László Bodor updated HIVE-27556: Fix Version/s: 4.0.0 > Add Unit Test for KafkaStorageHandlerInfo > - > > Key: HIVE-27556 > URL: https://issues.apache.org/jira/browse/HIVE-27556 > Project: Hive > Issue Type: Test > Components: kafka integration, StorageHandler >Reporter: Kokila N >Assignee: Kokila N >Priority: Major > Labels: pull-request-available > Fix For: 4.0.0 > > > Adding unit tests for KafkaStorageHandlerInfo. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Commented] (HIVE-27556) Add Unit Test for KafkaStorageHandlerInfo
[ https://issues.apache.org/jira/browse/HIVE-27556?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17820008#comment-17820008 ] László Bodor commented on HIVE-27556: - merged to master, thanks [~kokila19] for the patch and [~akshatm] for the review! > Add Unit Test for KafkaStorageHandlerInfo > - > > Key: HIVE-27556 > URL: https://issues.apache.org/jira/browse/HIVE-27556 > Project: Hive > Issue Type: Test > Components: kafka integration, StorageHandler >Reporter: Kokila N >Assignee: Kokila N >Priority: Major > Labels: pull-request-available > Fix For: 4.0.0 > > > Adding unit tests for KafkaStorageHandlerInfo. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Resolved] (HIVE-27556) Add Unit Test for KafkaStorageHandlerInfo
[ https://issues.apache.org/jira/browse/HIVE-27556?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] László Bodor resolved HIVE-27556. - Resolution: Fixed > Add Unit Test for KafkaStorageHandlerInfo > - > > Key: HIVE-27556 > URL: https://issues.apache.org/jira/browse/HIVE-27556 > Project: Hive > Issue Type: Test > Components: kafka integration, StorageHandler >Reporter: Kokila N >Assignee: Kokila N >Priority: Major > Labels: pull-request-available > Fix For: 4.0.0 > > > Adding unit tests for KafkaStorageHandlerInfo. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Resolved] (HIVE-27692) Explore removing the always task from embedded HMS
[ https://issues.apache.org/jira/browse/HIVE-27692?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zhihua Deng resolved HIVE-27692. Fix Version/s: 4.0.0 Resolution: Fixed Fix has been merged. Thank you Naveen for the review! > Explore removing the always task from embedded HMS > -- > > Key: HIVE-27692 > URL: https://issues.apache.org/jira/browse/HIVE-27692 > Project: Hive > Issue Type: Improvement > Components: Standalone Metastore >Reporter: Zhihua Deng >Assignee: Zhihua Deng >Priority: Major > Labels: pull-request-available > Fix For: 4.0.0 > > > The always tasks now run only in the leader HMS; the properties for > configuring the leader should belong only to HMS, and other engines such as > Spark/Impala don't need to know them. In most cases, an engine only cares > about the properties for connecting to HMS, e.g. hive.metastore.uris. > Every time a new app uses an embedded Metastore, it starts the HMS > always tasks by default. Imagine we have hundreds of apps: hundreds of > copies of the same tasks would be running, putting an extra burden on the > underlying databases, such as flooding queries and exhausting connection limits. > I think we can remove the always tasks from the embedded Metastore; they > will be taken care of by the standalone Metastore, since a standalone > Metastore should be present in a production environment. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Commented] (HIVE-28086) Clean up older version staging-dir when versioning is enabled at storage like s3
[ https://issues.apache.org/jira/browse/HIVE-28086?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17819984#comment-17819984 ] Butao Zhang commented on HIVE-28086: I have tested *hive.exec.stagingdir*, and I think it can be set dynamically. *Loading data to table testdb.teststage from hdfs://127.0.0.1:8028/tmp/hive/.hive-staging_hive_2024-02-23_17-05-18_098_5916756629075483734-4/-ext-1*
{code:java}
0: jdbc:hive2://127.0.0.1:1/default> create table teststage(id int);
0: jdbc:hive2://127.0.0.1:1/default> set hive.exec.stagingdir=/tmp/hive/.hive-staging;
0: jdbc:hive2://127.0.0.1:10004/default> insert into teststage values(123);
...
INFO : Starting task [Stage-2:DEPENDENCY_COLLECTION] in serial mode
INFO : Starting task [Stage-0:MOVE] in serial mode
INFO : Loading data to table testdb.teststage from hdfs://127.0.0.1:8028/tmp/hive/.hive-staging_hive_2024-02-23_17-05-18_098_5916756629075483734-4/-ext-1
INFO : Completed executing command(queryId=hive_20240223170518_68ee2a14-3268-4b34-bd01-ed0fe48a02ea); Time taken: 5.977 seconds
1 row affected (6.535 seconds)
{code}
> Clean up older version staging-dir when versioning is enabled at storage like > s3 > > > Key: HIVE-28086 > URL: https://issues.apache.org/jira/browse/HIVE-28086 > Project: Hive > Issue Type: Improvement > Components: Hive >Reporter: Taraka Rama Rao Lethavadla >Priority: Major > > When running Hive using AWS S3 Storage *with versioning* for managed / > external table directories, a staging directory is created as per > {{hive.exec.stagingdir=.hive-staging}} under the table location. > The directory is deleted after the job is completed. > AWS S3 offers an option to enable versioning. When enabled, the Hive staging > directories will be "deleted" but a copy will be kept.
This requires manual > cleanup, and over time the effort to remove these directories becomes too much > work. > For Spark jobs, this is easily worked around by setting: > {noformat} > spark.hadoop.hive.exec.stagingdir=s3a:///tmp/hive/.hive-staging > {noformat} > However, that is not an option in Hive because {{hive.exec.stagingdir}} is > meant to be a relative path to the table directory. > Options to solve this: > It would be helpful to allow the staging directory to be configurable like > these properties: > {noformat} > hive.exec.scratchdir=/tmp/hive > hive.exec.local.scratchdir=/tmp/hive > tez.staging-dir=/tmp/${user.name}/staging > {noformat} > That will allow customers to configure a location without versioning and > avoid this usability issue. > Or: > use the storage API (e.g. S3) to delete the old versions of the staging > directory along with the actual directory. -- This message was sent by Atlassian Jira (v8.20.10#820010)
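The ticket's second option — using the storage API to purge the versioned leftovers — can be sketched as a filtering step over an S3 ListObjectVersions-style response. The bucket layout, the `.hive-staging` key marker, and the surrounding boto3 calls are assumptions for illustration; only the pure selection logic is shown:

```python
def stale_staging_versions(response, staging_marker=".hive-staging"):
    """Select versions to delete permanently from an S3
    ListObjectVersions-style response dict: old (non-latest) object
    versions plus delete markers, for keys under a Hive staging dir.
    Fetching the response (e.g. via boto3 list_object_versions) and
    performing the deletion are left to the caller."""
    doomed = []
    # Old object versions left behind after Hive "deleted" the staging dir.
    for v in response.get("Versions", []):
        if staging_marker in v["Key"] and not v.get("IsLatest", False):
            doomed.append({"Key": v["Key"], "VersionId": v["VersionId"]})
    # The delete markers themselves, so the keys disappear completely.
    for m in response.get("DeleteMarkers", []):
        if staging_marker in m["Key"]:
            doomed.append({"Key": m["Key"], "VersionId": m["VersionId"]})
    return doomed
```

The returned list matches the shape boto3's `s3.delete_objects(Bucket=..., Delete={"Objects": ...})` expects; pagination and the actual delete call are intentionally left out of this sketch.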