[jira] [Updated] (HIVE-28090) correct desc of `hive.metastore.disallow.incompatible.col.type.changes`

2024-02-23 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-28090?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated HIVE-28090:
--
Labels: pull-request-available  (was: )

> correct desc of `hive.metastore.disallow.incompatible.col.type.changes`
> ---
>
> Key: HIVE-28090
> URL: https://issues.apache.org/jira/browse/HIVE-28090
> Project: Hive
>  Issue Type: Improvement
>Reporter: ming95
>Priority: Major
>  Labels: pull-request-available
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (HIVE-28090) correct desc of `hive.metastore.disallow.incompatible.col.type.changes`

2024-02-23 Thread zzzzming95 (Jira)
ming95 created HIVE-28090:
-

 Summary: correct desc of 
`hive.metastore.disallow.incompatible.col.type.changes`
 Key: HIVE-28090
 URL: https://issues.apache.org/jira/browse/HIVE-28090
 Project: Hive
  Issue Type: Improvement
Reporter: ming95








[jira] [Created] (HIVE-28089) Adding feature about MSCK/Analyze commands showing a warning in console for Small files.

2024-02-23 Thread Hongdan Zhu (Jira)
Hongdan Zhu created HIVE-28089:
--

 Summary: Adding feature about MSCK/Analyze commands showing a 
warning in console for Small files.
 Key: HIVE-28089
 URL: https://issues.apache.org/jira/browse/HIVE-28089
 Project: Hive
  Issue Type: Improvement
Reporter: Hongdan Zhu


MSCK/Analyze commands should show a warning in the console for a 
table/partition if (totalSize / numFiles) is less than avgFileSize.
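The proposed check can be sketched as follows; the class and method names are hypothetical for illustration, not taken from any patch:

```java
// Hypothetical sketch of the proposed small-files check (names are illustrative).
public class SmallFilesCheck {

    /** Warn when the average file size of a table/partition is below a threshold. */
    static boolean shouldWarn(long totalSize, long numFiles, long avgFileSizeThreshold) {
        if (numFiles == 0) {
            return false; // nothing to report for an empty table/partition
        }
        return (totalSize / numFiles) < avgFileSizeThreshold;
    }

    public static void main(String[] args) {
        // 1 GB spread across 10,000 files is roughly 105 KB per file,
        // well below a 32 MB threshold, so a warning would be printed.
        if (shouldWarn(1L << 30, 10_000, 32L << 20)) {
            System.out.println("WARNING: table/partition contains many small files");
        }
    }
}
```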





[jira] [Assigned] (HIVE-28089) Adding feature about MSCK/Analyze commands showing a warning in console for Small files.

2024-02-23 Thread Hongdan Zhu (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-28089?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hongdan Zhu reassigned HIVE-28089:
--

Assignee: Hongdan Zhu

> Adding feature about MSCK/Analyze commands showing a warning in console for 
> Small files.
> 
>
> Key: HIVE-28089
> URL: https://issues.apache.org/jira/browse/HIVE-28089
> Project: Hive
>  Issue Type: Improvement
>Reporter: Hongdan Zhu
>Assignee: Hongdan Zhu
>Priority: Major
>
> MSCK/Analyze commands should show a warning in the console for a 
> table/partition if (totalSize / numFiles) is less than avgFileSize.





[jira] [Commented] (HIVE-28021) Iceberg: Attempting to create a table with a percent symbol fails

2024-02-23 Thread Tim Thorpe (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-28021?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17820123#comment-17820123
 ] 

Tim Thorpe commented on HIVE-28021:
---

Thanks [~zhangbutao] for merging it!

> Iceberg: Attempting to create a table with a percent symbol fails
> -
>
> Key: HIVE-28021
> URL: https://issues.apache.org/jira/browse/HIVE-28021
> Project: Hive
>  Issue Type: Bug
>  Components: Iceberg integration
>Affects Versions: 4.0.0-beta-1
>Reporter: Tim Thorpe
>Assignee: Tim Thorpe
>Priority: Minor
>  Labels: pull-request-available
> Fix For: 4.1.0
>
>
> This occurred while attempting to test creating a table 
> "[|]#&%_@"."[|]#&%_@"
> The stack trace is as follows:
>  
> {code:java}
> java.util.UnknownFormatConversionException: Conversion = '_'
> org.apache.hadoop.hive.ql.metadata.HiveException: 
> java.util.UnknownFormatConversionException: Conversion = '_'
>         at 
> org.apache.hadoop.hive.ql.metadata.Hive.createTable(Hive.java:1383) 
> ~[hive-exec-4.0.0-beta-1.jar:4.0.0-beta-1]
>         at 
> org.apache.hadoop.hive.ql.metadata.Hive.createTable(Hive.java:1388) 
> ~[hive-exec-4.0.0-beta-1.jar:4.0.0-beta-1]
>         at 
> org.apache.hadoop.hive.ql.metadata.Hive.createTable(Hive.java:1278) 
> ~[hive-exec-4.0.0-beta-1.jar:4.0.0-beta-1]
>         …
> Caused by: java.util.UnknownFormatConversionException: Conversion = '_'
>         at java.util.Formatter.checkText(Formatter.java:2590) ~[?:1.8.0]
>         at java.util.Formatter.parse(Formatter.java:2566) ~[?:1.8.0]
>         at java.util.Formatter.format(Formatter.java:2512) ~[?:1.8.0]
>         at java.util.Formatter.format(Formatter.java:2466) ~[?:1.8.0]
>         at java.lang.String.format(String.java:4268) ~[?:2.9 (05-29-2023)]
>         at 
> org.apache.iceberg.relocated.com.google.common.util.concurrent.ThreadFactoryBuilder.format(ThreadFactoryBuilder.java:186)
>  ~[hive-iceberg-handler-4.0.0-beta-1.jar:4.0.0-beta-1]
>         at 
> org.apache.iceberg.relocated.com.google.common.util.concurrent.ThreadFactoryBuilder.setNameFormat(ThreadFactoryBuilder.java:73)
>  ~[hive-iceberg-handler-4.0.0-beta-1.jar:4.0.0-beta-1]
>         at 
> org.apache.iceberg.hive.MetastoreLock.(MetastoreLock.java:129) 
> ~[hive-iceberg-handler-4.0.0-beta-1.jar:4.0.0-beta-1] {code}
>  
>  
> This was fixed by making a change to 
> [https://github.com/apache/hive/blob/branch-4.0.0-beta-1/iceberg/iceberg-catalog/src/main/java/org/apache/iceberg/hive/MetastoreLock.java#L129]
>  
> {code:java}
> -.setNameFormat("iceberg-hive-lock-heartbeat-" + 
> fullName + "-%d")
> +.setNameFormat("iceberg-hive-lock-heartbeat-" + 
> fullName.replace("%", "%%") + "-%d"){code}
>  
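The root cause is easy to reproduce outside Hive: `java.util.Formatter` treats any `%` in the name pattern as the start of a conversion, so a table name containing `%` breaks `String.format`. A minimal standalone sketch (the table name here is made up for illustration):

```java
import java.util.UnknownFormatConversionException;

public class PercentEscapeDemo {
    public static void main(String[] args) {
        String fullName = "db.[|]#&%_@"; // illustrative table name containing '%'

        // Unescaped: '%_' is parsed as a format conversion and throws.
        try {
            String.format("iceberg-hive-lock-heartbeat-" + fullName + "-%d", 0);
        } catch (UnknownFormatConversionException e) {
            System.out.println("thrown: " + e.getMessage()); // Conversion = '_'
        }

        // Fixed: doubling '%' turns it into a literal percent sign.
        String safe = String.format(
            "iceberg-hive-lock-heartbeat-" + fullName.replace("%", "%%") + "-%d", 0);
        System.out.println(safe); // iceberg-hive-lock-heartbeat-db.[|]#&%_@-0
    }
}
```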





[jira] [Assigned] (HIVE-27829) New command to display current connections on HS2 and HMS instances

2024-02-23 Thread Riju Trivedi (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-27829?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Riju Trivedi reassigned HIVE-27829:
---

Assignee: Riju Trivedi

> New command to display current connections on HS2 and HMS instances
> ---
>
> Key: HIVE-27829
> URL: https://issues.apache.org/jira/browse/HIVE-27829
> Project: Hive
>  Issue Type: New Feature
>  Components: Hive, HiveServer2, Standalone Metastore
>Reporter: Taraka Rama Rao Lethavadla
>Assignee: Riju Trivedi
>Priority: Major
>
> We would need a command to list current connections to HS2/HMS instances.
> It could work like {*}show processlist{*} (MySQL), {*}select * from 
> pg_stat_activity{*} (PostgreSQL), or {*}show compactions{*} (Hive), showing 
> current connections to the HiveServer2/HMS instances.
> This command can help in troubleshooting issues with the Hive service. One can 
> gauge the load on a given HS2/HMS instance and identify inappropriate 
> connections in order to terminate them.
>  
> We can even extend this command to show connections between an HMS instance 
> and the backend database, to troubleshoot issues between HMS and the backend 
> database.





[jira] [Resolved] (HIVE-27405) Throw out the detail error Invalid partition name to the clients

2024-02-23 Thread Butao Zhang (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-27405?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Butao Zhang resolved HIVE-27405.

Fix Version/s: 4.1.0
   Resolution: Fixed

Fix has been merged into master branch.

Thanks [~tarak271] for the contribution!

Thanks [~okumin] for the review!

> Throw out the detail error Invalid partition name to the clients
> 
>
> Key: HIVE-27405
> URL: https://issues.apache.org/jira/browse/HIVE-27405
> Project: Hive
>  Issue Type: Improvement
>  Components: HiveServer2
>Reporter: Taraka Rama Rao Lethavadla
>Assignee: Taraka Rama Rao Lethavadla
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.1.0
>
>
> When running MSCK on a table, if there is a directory that doesn't match the 
> partition format, the query fails. However, the detailed error information is 
> not surfaced to the client.
>  
> {noformat}
> > MSCK table db_1.table_1;
> Error: Error while compiling statement: FAILED: Execution Error, return code 
> 1 from org.apache.hadoop.hive.ql.ddl.DDLTask (state=08S01,code=1){noformat}
> From the logs, we are able to see the detailed error, which helps the customer 
> fix the incorrect partition name without raising a support case.
> {noformat}
> 2023-04-11 12:42:18,012 WARN org.apache.hadoop.hive.metastore.Msck: 
> [6fd73a49-32eb-48e3-907e-00a1a71979e8 HiveServer2-Handler-Pool: 
> Thread-30852428]: Failed to run metacheck:
> org.apache.hadoop.hive.metastore.utils.MetastoreException: Invalid partition 
> name hdfs://warehouse/db_1.db/table_1/part{noformat}
> This error should be surfaced to the client side, so that users can fix it 
> themselves without having to check the logs.
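The gist of the request can be illustrated with a minimal sketch; the class, method, and message strings below are simplified stand-ins, not the actual HiveServer2 code path. The idea is to include the cause's message in the error returned to the client instead of only a generic return code:

```java
public class SurfaceErrorDemo {
    /** Simplified stand-in for org.apache.hadoop.hive.metastore.utils.MetastoreException. */
    static class MetastoreException extends Exception {
        MetastoreException(String message) { super(message); }
    }

    static String runMsck(String badPath) {
        try {
            throw new MetastoreException("Invalid partition name " + badPath);
        } catch (MetastoreException e) {
            // Before: the client only saw "Execution Error, return code 1".
            // After: propagate the underlying message so users can self-diagnose.
            return "FAILED: Execution Error: " + e.getMessage();
        }
    }

    public static void main(String[] args) {
        System.out.println(runMsck("hdfs://warehouse/db_1.db/table_1/part"));
    }
}
```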





[jira] [Resolved] (HIVE-28021) Iceberg: Attempting to create a table with a percent symbol fails

2024-02-23 Thread Butao Zhang (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-28021?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Butao Zhang resolved HIVE-28021.

   Fix Version/s: 4.1.0
Target Version/s:   (was: 4.0.0)
  Resolution: Fixed

> Iceberg: Attempting to create a table with a percent symbol fails
> -
>
> Key: HIVE-28021
> URL: https://issues.apache.org/jira/browse/HIVE-28021
> Project: Hive
>  Issue Type: Bug
>  Components: Iceberg integration
>Affects Versions: 4.0.0-beta-1
>Reporter: Tim Thorpe
>Assignee: Tim Thorpe
>Priority: Minor
>  Labels: pull-request-available
> Fix For: 4.1.0
>
>





[jira] [Commented] (HIVE-28021) Iceberg: Attempting to create a table with a percent symbol fails

2024-02-23 Thread Butao Zhang (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-28021?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17820092#comment-17820092
 ] 

Butao Zhang commented on HIVE-28021:


Fix has been merged into master branch.

Thanks [~tthorpeca] for the contribution!!!

> Iceberg: Attempting to create a table with a percent symbol fails
> -
>
> Key: HIVE-28021
> URL: https://issues.apache.org/jira/browse/HIVE-28021
> Project: Hive
>  Issue Type: Bug
>  Components: Iceberg integration
>Affects Versions: 4.0.0-beta-1
>Reporter: Tim Thorpe
>Assignee: Tim Thorpe
>Priority: Minor
>  Labels: pull-request-available
>





[jira] [Updated] (HIVE-28021) Iceberg: Attempting to create a table with a percent symbol fails

2024-02-23 Thread Butao Zhang (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-28021?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Butao Zhang updated HIVE-28021:
---
Summary: Iceberg: Attempting to create a table with a percent symbol fails  
(was: Attempting to create a table with a percent symbol fails)

> Iceberg: Attempting to create a table with a percent symbol fails
> -
>
> Key: HIVE-28021
> URL: https://issues.apache.org/jira/browse/HIVE-28021
> Project: Hive
>  Issue Type: Bug
>  Components: Iceberg integration
>Affects Versions: 4.0.0-beta-1
>Reporter: Tim Thorpe
>Assignee: Tim Thorpe
>Priority: Minor
>  Labels: pull-request-available
>





[jira] [Updated] (HIVE-27898) For ICEBERG partitioned table, after turning on vectorisation-read, all non-partitioned columns may will be empty(location based table).

2024-02-23 Thread yongzhi.shao (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-27898?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

yongzhi.shao updated HIVE-27898:

Summary: For ICEBERG partitioned table, after turning on 
vectorisation-read, all non-partitioned columns may will be empty(location 
based table).  (was: For ICEBERG partitioned table, after turning on 
vectorisation, all non-partitioned columns may will be empty(location based 
table).)

> For ICEBERG partitioned table, after turning on vectorisation-read, all 
> non-partitioned columns may will be empty(location based table).
> 
>
> Key: HIVE-27898
> URL: https://issues.apache.org/jira/browse/HIVE-27898
> Project: Hive
>  Issue Type: Bug
>  Components: Iceberg integration
>Reporter: yongzhi.shao
>Priority: Critical
> Attachments: hive-site.xml
>
>
> We found that when using the HIVE4-BETA1 version, if we use an ICEBERG table 
> in a subquery, we get no data back.
> I used HIVE3 on TEZ for cross-validation, and HIVE3 does not have this 
> problem when querying ICEBERG.
> {code:java}
> --spark3.4.1+iceberg 1.4.2
> CREATE TABLE datacenter.dwd.b_std_trade (
>   uni_order_id STRING,
>   data_from BIGINT,
>   partner STRING,
>   plat_code STRING,
>   order_id STRING,
>   uni_shop_id STRING,
>   uni_id STRING,
>   guide_id STRING,
>   shop_id STRING,
>   plat_account STRING,
>   total_fee DOUBLE,
>   item_discount_fee DOUBLE,
>   trade_discount_fee DOUBLE,
>   adjust_fee DOUBLE,
>   post_fee DOUBLE,
>   discount_rate DOUBLE,
>   payment_no_postfee DOUBLE,
>   payment DOUBLE,
>   pay_time STRING,
>   product_num BIGINT,
>   order_status STRING,
>   is_refund STRING,
>   refund_fee DOUBLE,
>   insert_time STRING,
>   created STRING,
>   endtime STRING,
>   modified STRING,
>   trade_type STRING,
>   receiver_name STRING,
>   receiver_country STRING,
>   receiver_state STRING,
>   receiver_city STRING,
>   receiver_district STRING,
>   receiver_town STRING,
>   receiver_address STRING,
>   receiver_mobile STRING,
>   trade_source STRING,
>   delivery_type STRING,
>   consign_time STRING,
>   orders_num BIGINT,
>   is_presale BIGINT,
>   presale_status STRING,
>   first_fee_paytime STRING,
>   last_fee_paytime STRING,
>   first_paid_fee DOUBLE,
>   tenant STRING,
>   tidb_modified STRING,
>   step_paid_fee DOUBLE,
>   seller_flag STRING,
>   is_used_store_card BIGINT,
>   store_card_used DOUBLE,
>   store_card_basic_used DOUBLE,
>   store_card_expand_used DOUBLE,
>   order_promotion_num BIGINT,
>   item_promotion_num BIGINT,
>   buyer_remark STRING,
>   seller_remark STRING,
>   trade_business_type STRING)
> USING iceberg
> PARTITIONED BY (uni_shop_id, truncate(4, created))
> LOCATION '/iceberg-catalog/warehouse/dwd/b_std_trade'
> TBLPROPERTIES (
>   'current-snapshot-id' = '7217819472703702905',
>   'format' = 'iceberg/orc',
>   'format-version' = '1',
>   'hive.stored-as' = 'iceberg',
>   'read.orc.vectorization.enabled' = 'true',
>   'sort-order' = 'uni_shop_id ASC NULLS FIRST, created ASC NULLS FIRST',
>   'write.distribution-mode' = 'hash',
>   'write.format.default' = 'orc',
>   'write.metadata.delete-after-commit.enabled' = 'true',
>   'write.metadata.previous-versions-max' = '3',
>   'write.orc.bloom.filter.columns' = 'order_id',
>   'write.orc.compression-codec' = 'zstd')
> --hive-iceberg
>  CREATE EXTERNAL TABLE iceberg_dwd.b_std_trade 
>  STORED BY 'org.apache.iceberg.mr.hive.HiveIcebergStorageHandler' 
> LOCATION 'hdfs:///iceberg-catalog/warehouse/dwd/b_std_trade'
> TBLPROPERTIES 
> ('iceberg.catalog'='location_based_table','engine.hive.enabled'='true');
> select * from iceberg_dwd.b_std_trade 
> where uni_shop_id = 'TEST|1' limit 10  --10 rows
> select *
> from ( 
> select * from iceberg_dwd.b_std_trade 
> where uni_shop_id = 'TEST|1' limit 10
> ) t1;   --10 rows
> select uni_shop_id
> from ( 
> select * from iceberg_dwd.b_std_trade 
> where uni_shop_id = 'TEST|1' limit 10
> ) t1;  --0 rows
> select uni_shop_id
> from ( 
> select uni_shop_id as uni_shop_id from iceberg_dwd.b_std_trade 
> where uni_shop_id = 'TEST|1' limit 10
> ) t1;  --0 rows
> --hive-orc
> select uni_shop_id
> from ( 
> select * from iceberg_dwd.trade_test 
> where uni_shop_id = 'TEST|1' limit 10
> ) t1;--10 ROWS{code}
>  





[jira] [Comment Edited] (HIVE-28088) CalcitePlanner fails to resolve column names on materialized CTEs

2024-02-23 Thread okumin (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-28088?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17820058#comment-17820058
 ] 

okumin edited comment on HIVE-28088 at 2/23/24 1:18 PM:


|| ||Hive w/ CBO||Hive w/o CBO||Spark SQL||Trino||PostgreSQL||MySQL||
|(1) Sub-query|(key, col_1)|(key, col1)|(key, key)|(key, key)|(key, key)|Error|
|(2) Sub-query + top-level join|(key, col_1, key, 
col_1)|Error|Error|Error|Error|Error|
|(3) CTE|(key, col_1)|(key, col_1)|(key, key)|(key, key)|(key, key)|Error|
|(4) CTE + top-level join|(key, col_1, key, 
col_1)|Error|Error|Error|Error|Error|
|(5) Materialized CTE|(key)|(key)|N/A|N/A|N/A|N/A|
|(6) Materialized CTE + top-level join|Error|(key, key)|N/A|N/A|N/A|N/A|

If we assume Spark SQL, Trino, or PostgreSQL are correct, (1), (3), and (5) 
should return two keys. (2), (4), and (6) should fail.


was (Author: okumin):
|| ||Hive w/ CBO||Hive w/o CBO||Spark SQL||Trino||PostgreSQL||MySQL||
|(1) Sub-query|(key, col_1)|(key, col1)|(key, key)|(key, key)|(key, key)|Error|
|(2) Sub-query + top-level join|(key, col_1, key, 
col_1)|Error|Error|Error|Error|Error|
|(3) CTE|(key, col_1)|(key, col_1)|(key, key)|(key, key)|(key, key)|Error|
|(4) CTE + top-level join|(key, col_1, key, 
col_1)|Error|Error|Error|Error|Error|
|(5) Materialized CTE|(key)|(key)|N/A|N/A|N/A|N/A|
|(6) Materialized CTE + top-level join|Error|(key, key)|N/A|N/A|N/A|N/A|

If we assume Spark SQL, Trino, or PostgreSQL, (1), (3), and (5) should return 
two keys. (2), (4), and (6) should fail.

> CalcitePlanner fails to resolve column names on materialized CTEs
> -
>
> Key: HIVE-28088
> URL: https://issues.apache.org/jira/browse/HIVE-28088
> Project: Hive
>  Issue Type: Bug
>  Components: CBO
>Affects Versions: 4.0.0-beta-1
>Reporter: okumin
>Assignee: okumin
>Priority: Major
>
> We found a case where CBO fails when it refers to a materialized CTE.
> These are queries to reproduce the issue.
> {code:java}
> DROP TABLE IF EXISTS test;
> CREATE TABLE test (key STRING);
> set hive.optimize.cte.materialize.threshold=1;
> set hive.optimize.cte.materialize.full.aggregate.only=false;
> EXPLAIN CBO WITH materialized_cte AS (
>   SELECT a.key, b.key
>   FROM test a
>   JOIN test b ON (a.key = b.key)
> )
> SELECT *
> FROM materialized_cte c
> JOIN materialized_cte d ON (c.key = d.key); {code}
> CBO fails.
> {code:java}
> +--+
> | Explain  |
> +--+
> +--+ {code}
> Error log on HiveServer2.
> {code:java}
> 2024-02-23T04:43:19,065 ERROR [d145a95b-8840-4d2b-9869-a186b4bd82ef 
> HiveServer2-Handler-Pool: Thread-555] parse.CalcitePlanner: CBO failed, 
> skipping CBO. 
> org.apache.hadoop.hive.ql.optimizer.calcite.CalciteSemanticException: Could 
> not resolve column name
>      at 
> org.apache.hadoop.hive.ql.parse.type.RexNodeExprFactory.getPosition(RexNodeExprFactory.java:184)
>  ~[hive-exec-4.0.0-beta-2-SNAPSHOT.jar:4.0.0-beta-2-SNAPSHOT]
>      at 
> org.apache.hadoop.hive.ql.parse.type.RexNodeExprFactory.createColumnRefExpr(RexNodeExprFactory.java:161)
>  ~[hive-exec-4.0.0-beta-2-SNAPSHOT.jar:4.0.0-beta-2-SNAPSHOT]
>      at 
> org.apache.hadoop.hive.ql.parse.type.RexNodeExprFactory.createColumnRefExpr(RexNodeExprFactory.java:97)
>  ~[hive-exec-4.0.0-beta-2-SNAPSHOT.jar:4.0.0-beta-2-SNAPSHOT]
>      at 
> org.apache.hadoop.hive.ql.parse.type.JoinCondTypeCheckProcFactory$JoinCondDefaultExprProcessor.processQualifiedColRef(JoinCondTypeCheckProcFactory.java:188)
>  ~[hive-exec-4.0.0-beta-2-
> SNAPSHOT.jar:4.0.0-beta-2-SNAPSHOT]
>      at 
> org.apache.hadoop.hive.ql.parse.type.TypeCheckProcFactory$DefaultExprProcessor.process(TypeCheckProcFactory.java:1414)
>  ~[hive-exec-4.0.0-beta-2-SNAPSHOT.jar:4.0.0-beta-2-SNAPSHOT]
>      at 
> org.apache.hadoop.hive.ql.lib.CostLessRuleDispatcher.dispatch(CostLessRuleDispatcher.java:66)
>  ~[hive-exec-4.0.0-beta-2-SNAPSHOT.jar:4.0.0-beta-2-SNAPSHOT]
>      at 
> org.apache.hadoop.hive.ql.lib.DefaultGraphWalker.dispatchAndReturn(DefaultGraphWalker.java:105)
>  ~[hive-exec-4.0.0-beta-2-SNAPSHOT.jar:4.0.0-beta-2-SNAPSHOT]
>      at 
> org.apache.hadoop.hive.ql.lib.DefaultGraphWalker.dispatch(DefaultGraphWalker.java:89)
>  ~[hive-exec-4.0.0-beta-2-SNAPSHOT.jar:4.0.0-beta-2-SNAPSHOT]
>      at 
> org.apache.hadoop.hive.ql.lib.ExpressionWalker.walk(ExpressionWalker.java:101)
>  ~[hive-exec-4.0.0-beta-2-SNAPSHOT.jar:4.0.0-beta-2-SNAPSHOT]
>      at 
> org.apache.hadoop.hive.ql.lib.DefaultGraphWalker.startWalking(DefaultGraphWalker.java:120)
>  ~[hive-exec-4.0.0-beta-2-SNAPSHOT.jar:4.0.0-beta-2-SNAPSHOT]
>      at 
> org.apache.hadoop.hive.ql.parse.type.TypeCheckProcFactory.genExprNode(TypeCheckProcFactory.java:231)
>  ~[hive-exec-4.0.0-beta-2-SNAPSHOT.jar:4.0.0-beta-2-SNAPSHOT]
>      at 
> 

[jira] [Commented] (HIVE-28088) CalcitePlanner fails to resolve column names on materialized CTEs

2024-02-23 Thread okumin (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-28088?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17820058#comment-17820058
 ] 

okumin commented on HIVE-28088:
---

|| ||Hive w/ CBO||Hive w/o CBO||Spark SQL||Trino||PostgreSQL||MySQL||
|(1) Sub-query|(key, col_1)|(key, col1)|(key, key)|(key, key)|(key, key)|Error|
|(2) Sub-query + top-level join|(key, col_1, key, 
col_1)|Error|Error|Error|Error|Error|
|(3) CTE|(key, col_1)|(key, col_1)|(key, key)|(key, key)|(key, key)|Error|
|(4) CTE + top-level join|(key, col_1, key, 
col_1)|Error|Error|Error|Error|Error|
|(5) Materialized CTE|(key)|(key)|N/A|N/A|N/A|N/A|
|(6) Materialized CTE + top-level join|Error|(key, key)|N/A|N/A|N/A|N/A|

If we assume Spark SQL, Trino, or PostgreSQL, (1), (3), and (5) should return 
two keys. (2), (4), and (6) should fail.

> CalcitePlanner fails to resolve column names on materialized CTEs
> -
>
> Key: HIVE-28088
> URL: https://issues.apache.org/jira/browse/HIVE-28088
> Project: Hive
>  Issue Type: Bug
>  Components: CBO
>Affects Versions: 4.0.0-beta-1
>Reporter: okumin
>Assignee: okumin
>Priority: Major
>
> We found a case where CBO fails when it refers to a materialized CTE.
> These are queries to reproduce the issue.
> {code:java}
> DROP TABLE IF EXISTS test;
> CREATE TABLE test (key STRING);
> set hive.optimize.cte.materialize.threshold=1;
> set hive.optimize.cte.materialize.full.aggregate.only=false;
> EXPLAIN CBO WITH materialized_cte AS (
>   SELECT a.key, b.key
>   FROM test a
>   JOIN test b ON (a.key = b.key)
> )
> SELECT *
> FROM materialized_cte c
> JOIN materialized_cte d ON (c.key = d.key); {code}
> CBO fails.
> {code:java}
> +--+
> | Explain  |
> +--+
> +--+ {code}
> Error log on HiveServer2.
> {code:java}
> 2024-02-23T04:43:19,065 ERROR [d145a95b-8840-4d2b-9869-a186b4bd82ef 
> HiveServer2-Handler-Pool: Thread-555] parse.CalcitePlanner: CBO failed, 
> skipping CBO. 
> org.apache.hadoop.hive.ql.optimizer.calcite.CalciteSemanticException: Could 
> not resolve column name
>      at 
> org.apache.hadoop.hive.ql.parse.type.RexNodeExprFactory.getPosition(RexNodeExprFactory.java:184)
>  ~[hive-exec-4.0.0-beta-2-SNAPSHOT.jar:4.0.0-beta-2-SNAPSHOT]
>      at 
> org.apache.hadoop.hive.ql.parse.type.RexNodeExprFactory.createColumnRefExpr(RexNodeExprFactory.java:161)
>  ~[hive-exec-4.0.0-beta-2-SNAPSHOT.jar:4.0.0-beta-2-SNAPSHOT]
>      at 
> org.apache.hadoop.hive.ql.parse.type.RexNodeExprFactory.createColumnRefExpr(RexNodeExprFactory.java:97)
>  ~[hive-exec-4.0.0-beta-2-SNAPSHOT.jar:4.0.0-beta-2-SNAPSHOT]
>      at 
> org.apache.hadoop.hive.ql.parse.type.JoinCondTypeCheckProcFactory$JoinCondDefaultExprProcessor.processQualifiedColRef(JoinCondTypeCheckProcFactory.java:188)
>  ~[hive-exec-4.0.0-beta-2-
> SNAPSHOT.jar:4.0.0-beta-2-SNAPSHOT]
>      at 
> org.apache.hadoop.hive.ql.parse.type.TypeCheckProcFactory$DefaultExprProcessor.process(TypeCheckProcFactory.java:1414)
>  ~[hive-exec-4.0.0-beta-2-SNAPSHOT.jar:4.0.0-beta-2-SNAPSHOT]
>      at 
> org.apache.hadoop.hive.ql.lib.CostLessRuleDispatcher.dispatch(CostLessRuleDispatcher.java:66)
>  ~[hive-exec-4.0.0-beta-2-SNAPSHOT.jar:4.0.0-beta-2-SNAPSHOT]
>      at 
> org.apache.hadoop.hive.ql.lib.DefaultGraphWalker.dispatchAndReturn(DefaultGraphWalker.java:105)
>  ~[hive-exec-4.0.0-beta-2-SNAPSHOT.jar:4.0.0-beta-2-SNAPSHOT]
>      at 
> org.apache.hadoop.hive.ql.lib.DefaultGraphWalker.dispatch(DefaultGraphWalker.java:89)
>  ~[hive-exec-4.0.0-beta-2-SNAPSHOT.jar:4.0.0-beta-2-SNAPSHOT]
>      at 
> org.apache.hadoop.hive.ql.lib.ExpressionWalker.walk(ExpressionWalker.java:101)
>  ~[hive-exec-4.0.0-beta-2-SNAPSHOT.jar:4.0.0-beta-2-SNAPSHOT]
>      at 
> org.apache.hadoop.hive.ql.lib.DefaultGraphWalker.startWalking(DefaultGraphWalker.java:120)
>  ~[hive-exec-4.0.0-beta-2-SNAPSHOT.jar:4.0.0-beta-2-SNAPSHOT]
>      at 
> org.apache.hadoop.hive.ql.parse.type.TypeCheckProcFactory.genExprNode(TypeCheckProcFactory.java:231)
>  ~[hive-exec-4.0.0-beta-2-SNAPSHOT.jar:4.0.0-beta-2-SNAPSHOT]
>      at 
> org.apache.hadoop.hive.ql.parse.type.RexNodeTypeCheck.genExprNodeJoinCond(RexNodeTypeCheck.java:60)
>  ~[hive-exec-4.0.0-beta-2-SNAPSHOT.jar:4.0.0-beta-2-SNAPSHOT]
>      at 
> org.apache.hadoop.hive.ql.parse.CalcitePlanner$CalcitePlannerAction.genJoinRelNode(CalcitePlanner.java:2656)
>  ~[hive-exec-4.0.0-beta-2-SNAPSHOT.jar:4.0.0-beta-2-SNAPSHOT]
>      at 
> org.apache.hadoop.hive.ql.parse.CalcitePlanner$CalcitePlannerAction.genJoinLogicalPlan(CalcitePlanner.java:2888)
>  ~[hive-exec-4.0.0-beta-2-SNAPSHOT.jar:4.0.0-beta-2-SNAPSHOT]
>      at 
> org.apache.hadoop.hive.ql.parse.CalcitePlanner$CalcitePlannerAction.genLogicalPlan(CalcitePlanner.java:5048)
>  ~[hive-exec-4.0.0-beta-2-SNAPSHOT.jar:4.0.0-beta-2-SNAPSHOT]
>      at 
> 

[jira] [Comment Edited] (HIVE-28088) CalcitePlanner fails to resolve column names on materialized CTEs

2024-02-23 Thread okumin (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-28088?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17819917#comment-17819917
 ] 

okumin edited comment on HIVE-28088 at 2/23/24 1:11 PM:


I found that (2), (4), and (6) generated different results with and without CBO.

Also, (3) vs (5) and (4) vs (6) are not consistent, which means CTE 
materialization could change the result when there are duplicated columns.
|| ||CBO||Non-CBO||
|(1) Sub-query|(key, col_1)|(key, col_1)|
|(2) Sub-query + top-level join|(key, col_1, key, col_1)|Error|
|(3) CTE|(key, col_1)|(key, col_1)|
|(4) CTE + top-level join|(key, col_1, key, col_1)|Error|
|(5) Materialized CTE|(key)|(key)|
|(6) Materialized CTE + top-level join|Error|(key, key)|

Test queries.
{code:java}
DROP TABLE IF EXISTS test1;
DROP TABLE IF EXISTS test2;
CREATE TABLE test1 (key STRING);
CREATE TABLE test2 (key STRING);

-- (1)
SELECT *
FROM (SELECT a.key, b.key FROM test1 a JOIN test2 b ON (a.key = b.key)) c;

-- (2)
SELECT *
FROM (SELECT a.key, b.key FROM test1 a JOIN test2 b ON (a.key = b.key)) c
JOIN (SELECT a.key, b.key FROM test2 a JOIN test2 b ON (a.key = b.key)) d
ON c.key = d.key;

-- (3)
WITH cte AS (
  SELECT a.key, b.key FROM test1 a JOIN test2 b ON (a.key = b.key)
)
SELECT * FROM cte;

-- (4)
WITH cte AS (
  SELECT a.key, b.key FROM test1 a JOIN test2 b ON (a.key = b.key)
)
SELECT *
FROM cte c
JOIN cte d ON (c.key = d.key);

set hive.optimize.cte.materialize.threshold=1;
set hive.optimize.cte.materialize.full.aggregate.only=false;

-- (5)
WITH materialized_cte AS (
  SELECT a.key, b.key FROM test1 a JOIN test2 b ON (a.key = b.key)
)
SELECT * FROM materialized_cte;

-- (6)
WITH materialized_cte AS (
  SELECT a.key, b.key FROM test1 a JOIN test2 b ON (a.key = b.key)
)
SELECT *
FROM materialized_cte c
JOIN materialized_cte d ON (c.key = d.key);{code}
With CBO + `hive.cbo.fallback.strategy=NEVER;`.
{code:java}
(1)
+--------+----------+
| c.key  | c._col1  |
+--------+----------+
+--------+----------+

(2)
+--------+----------+--------+----------+
| c.key  | c._col1  | d.key  | d._col1  |
+--------+----------+--------+----------+
+--------+----------+--------+----------+

(3)
+----------+------------+
| cte.key  | cte._col1  |
+----------+------------+
+----------+------------+

(4)
+--------+----------+--------+----------+
| c.key  | c._col1  | d.key  | d._col1  |
+--------+----------+--------+----------+
+--------+----------+--------+----------+

(5)
+-----------------------+
| materialized_cte.key  |
+-----------------------+
+-----------------------+

(6)
Error: Error while compiling statement: FAILED: CalciteSemanticException Could 
not resolve column name (state=42000,code=4){code}
Without CBO.
{code:java}
(1)
+--------+----------+
| c.key  | c._col1  |
+--------+----------+

(2)
Error: Error while compiling statement: FAILED: SemanticException [Error 
10007]: Ambiguous column reference key in c (state=42000,code=10007)

(3)
+----------+------------+
| cte.key  | cte._col1  |
+----------+------------+
+----------+------------+

(4)
Error: Error while compiling statement: FAILED: SemanticException [Error 
10007]: Ambiguous column reference key in c (state=42000,code=10007)

(5)
+-----------------------+
| materialized_cte.key  |
+-----------------------+
+-----------------------+

(6)
+--------+--------+
| c.key  | d.key  |
+--------+--------+
+--------+--------+
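Whichever planner is used, the ambiguity goes away once the duplicated output columns are given distinct aliases. A minimal sketch of that workaround against the tables above (the aliases a_key/b_key are illustrative, not part of the reported queries):
{code:java}
WITH cte AS (
  SELECT a.key AS a_key, b.key AS b_key
  FROM test1 a JOIN test2 b ON (a.key = b.key)
)
SELECT *
FROM cte c
JOIN cte d ON (c.a_key = d.a_key);{code}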



[jira] [Comment Edited] (HIVE-28088) CalcitePlanner fails to resolve column names on materialized CTEs

2024-02-23 Thread okumin (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-28088?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17820054#comment-17820054
 ] 

okumin edited comment on HIVE-28088 at 2/23/24 1:11 PM:


I tested them with various databases.

Spark SQL 3.4.1.
{code:java}
(1)
key    key

(2)
[AMBIGUOUS_REFERENCE] Reference `c`.`key` is ambiguous, could be: [`c`.`key`, 
`c`.`key`].; line 4 pos 3

(3)
key key

(4)
[AMBIGUOUS_REFERENCE] Reference `c`.`key` is ambiguous, could be: [`c`.`key`, 
`c`.`key`].; line 6 pos 15{code}
Trino 413.
{code:java}
(1)
 key | key 
-----+-----
(0 rows)

(2)
Query 20240223_125426_00032_p9au2 failed: line 5:4: Column 'c.key' is ambiguous

(3)
 key | key 
-----+-----
(0 rows)

(4)
Query 20240223_125426_00034_p9au2 failed: line 7:16: Column 'c.key' is 
ambiguous {code}
PostgreSQL 16.2.
{code:java}
(1)
 key | key 
-----+-----
(0 rows)

(2)
ERROR:  column reference "key" is ambiguous
LINE 4: ON c.key = d.key;

(3)
 key | key 
-----+-----
(0 rows)

(4)
ERROR:  column reference "key" is ambiguous
LINE 6: JOIN cte d ON (c.key = d.key);{code}
MySQL 8.3.
{code:java}
(1)
ERROR 1060 (42S21): Duplicate column name 'key'

(2)
ERROR 1060 (42S21): Duplicate column name 'key'

(3)
ERROR 1060 (42S21): Duplicate column name 'key'

(4)
ERROR 1060 (42S21): Duplicate column name 'key'{code}



> CalcitePlanner fails to resolve column names on materialized CTEs
> -
>
> Key: HIVE-28088
> URL: https://issues.apache.org/jira/browse/HIVE-28088
> Project: Hive
>  Issue Type: Bug
>  Components: CBO
>Affects Versions: 4.0.0-beta-1
>Reporter: okumin
>Assignee: okumin
>Priority: Major
>
> We found a case where CBO fails when it refers to a materialized CTE.
> These are queries to reproduce the issue.
> {code:java}
> DROP TABLE IF EXISTS test;
> CREATE TABLE test (key STRING);
> set hive.optimize.cte.materialize.threshold=1;
> set hive.optimize.cte.materialize.full.aggregate.only=false;
> EXPLAIN CBO WITH materialized_cte AS (
>   SELECT a.key, b.key
>   FROM test a
>   JOIN test b ON (a.key = b.key)
> )
> SELECT *
> FROM materialized_cte c
> JOIN materialized_cte d ON (c.key = d.key); {code}
> CBO fails.
> {code:java}
> +----------+
> | Explain  |
> +----------+
> +----------+ {code}
> Error log on HiveServer2.
> {code:java}
> 2024-02-23T04:43:19,065 ERROR [d145a95b-8840-4d2b-9869-a186b4bd82ef 
> HiveServer2-Handler-Pool: Thread-555] parse.CalcitePlanner: CBO failed, 
> skipping CBO. 
> org.apache.hadoop.hive.ql.optimizer.calcite.CalciteSemanticException: Could 
> not resolve column name
>      at 
> org.apache.hadoop.hive.ql.parse.type.RexNodeExprFactory.getPosition(RexNodeExprFactory.java:184)
>  ~[hive-exec-4.0.0-beta-2-SNAPSHOT.jar:4.0.0-beta-2-SNAPSHOT]
>      at 
> org.apache.hadoop.hive.ql.parse.type.RexNodeExprFactory.createColumnRefExpr(RexNodeExprFactory.java:161)
>  ~[hive-exec-4.0.0-beta-2-SNAPSHOT.jar:4.0.0-beta-2-SNAPSHOT]
>      at 
> org.apache.hadoop.hive.ql.parse.type.RexNodeExprFactory.createColumnRefExpr(RexNodeExprFactory.java:97)
>  ~[hive-exec-4.0.0-beta-2-SNAPSHOT.jar:4.0.0-beta-2-SNAPSHOT]
>      at 
> org.apache.hadoop.hive.ql.parse.type.JoinCondTypeCheckProcFactory$JoinCondDefaultExprProcessor.processQualifiedColRef(JoinCondTypeCheckProcFactory.java:188)
>  ~[hive-exec-4.0.0-beta-2-
> SNAPSHOT.jar:4.0.0-beta-2-SNAPSHOT]
>      at 
> org.apache.hadoop.hive.ql.parse.type.TypeCheckProcFactory$DefaultExprProcessor.process(TypeCheckProcFactory.java:1414)
>  ~[hive-exec-4.0.0-beta-2-SNAPSHOT.jar:4.0.0-beta-2-SNAPSHOT]
>      at 
> org.apache.hadoop.hive.ql.lib.CostLessRuleDispatcher.dispatch(CostLessRuleDispatcher.java:66)
>  

[jira] [Commented] (HIVE-28088) CalcitePlanner fails to resolve column names on materialized CTEs

2024-02-23 Thread okumin (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-28088?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17820054#comment-17820054
 ] 

okumin commented on HIVE-28088:
---

I tested them with various databases.

Spark SQL 3.4.1.
{code:java}
(1)
key    key

(2)
[AMBIGUOUS_REFERENCE] Reference `c`.`key` is ambiguous, could be: [`c`.`key`, 
`c`.`key`].; line 4 pos 3

(3)
key key

(4)
[AMBIGUOUS_REFERENCE] Reference `c`.`key` is ambiguous, could be: [`c`.`key`, 
`c`.`key`].; line 6 pos 15{code}
Trino 413.
{code:java}
(1)
 key | key 
-----+-----
(0 rows)

(2)
Query 20240223_125426_00032_p9au2 failed: line 5:4: Column 'c.key' is ambiguous

(3)
 key | key 
-----+-----
(0 rows)

(4)
Query 20240223_125426_00034_p9au2 failed: line 7:16: Column 'c.key' is 
ambiguous {code}
PostgreSQL 16.2.
{code:java}
(1)
 key | key 
-----+-----
(0 rows)

(2)
ERROR:  column reference "key" is ambiguous
LINE 4: ON c.key = d.key;

(3)
 key | key 
-----+-----
(0 rows)

(4)
ERROR:  column reference "key" is ambiguous
LINE 6: JOIN cte d ON (c.key = d.key);{code}
MySQL 8.3.
{code:java}
(1)
ERROR 1060 (42S21): Duplicate column name 'key'

(2)
ERROR 1060 (42S21): Duplicate column name 'key'

(3)
ERROR 1060 (42S21): Duplicate column name 'key'

(4)
ERROR 1060 (42S21): Duplicate column name 'key'{code}
 

> CalcitePlanner fails to resolve column names on materialized CTEs
> -
>
> Key: HIVE-28088
> URL: https://issues.apache.org/jira/browse/HIVE-28088
> Project: Hive
>  Issue Type: Bug
>  Components: CBO
>Affects Versions: 4.0.0-beta-1
>Reporter: okumin
>Assignee: okumin
>Priority: Major
>
> We found a case where CBO fails when it refers to a materialized CTE.
> These are queries to reproduce the issue.
> {code:java}
> DROP TABLE IF EXISTS test;
> CREATE TABLE test (key STRING);
> set hive.optimize.cte.materialize.threshold=1;
> set hive.optimize.cte.materialize.full.aggregate.only=false;
> EXPLAIN CBO WITH materialized_cte AS (
>   SELECT a.key, b.key
>   FROM test a
>   JOIN test b ON (a.key = b.key)
> )
> SELECT *
> FROM materialized_cte c
> JOIN materialized_cte d ON (c.key = d.key); {code}
> CBO fails.
> {code:java}
> +----------+
> | Explain  |
> +----------+
> +----------+ {code}
> Error log on HiveServer2.
> {code:java}
> 2024-02-23T04:43:19,065 ERROR [d145a95b-8840-4d2b-9869-a186b4bd82ef 
> HiveServer2-Handler-Pool: Thread-555] parse.CalcitePlanner: CBO failed, 
> skipping CBO. 
> org.apache.hadoop.hive.ql.optimizer.calcite.CalciteSemanticException: Could 
> not resolve column name
>      at 
> org.apache.hadoop.hive.ql.parse.type.RexNodeExprFactory.getPosition(RexNodeExprFactory.java:184)
>  ~[hive-exec-4.0.0-beta-2-SNAPSHOT.jar:4.0.0-beta-2-SNAPSHOT]
>      at 
> org.apache.hadoop.hive.ql.parse.type.RexNodeExprFactory.createColumnRefExpr(RexNodeExprFactory.java:161)
>  ~[hive-exec-4.0.0-beta-2-SNAPSHOT.jar:4.0.0-beta-2-SNAPSHOT]
>      at 
> org.apache.hadoop.hive.ql.parse.type.RexNodeExprFactory.createColumnRefExpr(RexNodeExprFactory.java:97)
>  ~[hive-exec-4.0.0-beta-2-SNAPSHOT.jar:4.0.0-beta-2-SNAPSHOT]
>      at 
> org.apache.hadoop.hive.ql.parse.type.JoinCondTypeCheckProcFactory$JoinCondDefaultExprProcessor.processQualifiedColRef(JoinCondTypeCheckProcFactory.java:188)
>  ~[hive-exec-4.0.0-beta-2-
> SNAPSHOT.jar:4.0.0-beta-2-SNAPSHOT]
>      at 
> org.apache.hadoop.hive.ql.parse.type.TypeCheckProcFactory$DefaultExprProcessor.process(TypeCheckProcFactory.java:1414)
>  ~[hive-exec-4.0.0-beta-2-SNAPSHOT.jar:4.0.0-beta-2-SNAPSHOT]
>      at 
> org.apache.hadoop.hive.ql.lib.CostLessRuleDispatcher.dispatch(CostLessRuleDispatcher.java:66)
>  ~[hive-exec-4.0.0-beta-2-SNAPSHOT.jar:4.0.0-beta-2-SNAPSHOT]
>      at 
> org.apache.hadoop.hive.ql.lib.DefaultGraphWalker.dispatchAndReturn(DefaultGraphWalker.java:105)
>  ~[hive-exec-4.0.0-beta-2-SNAPSHOT.jar:4.0.0-beta-2-SNAPSHOT]
>      at 
> org.apache.hadoop.hive.ql.lib.DefaultGraphWalker.dispatch(DefaultGraphWalker.java:89)
>  ~[hive-exec-4.0.0-beta-2-SNAPSHOT.jar:4.0.0-beta-2-SNAPSHOT]
>      at 
> org.apache.hadoop.hive.ql.lib.ExpressionWalker.walk(ExpressionWalker.java:101)
>  ~[hive-exec-4.0.0-beta-2-SNAPSHOT.jar:4.0.0-beta-2-SNAPSHOT]
>      at 
> org.apache.hadoop.hive.ql.lib.DefaultGraphWalker.startWalking(DefaultGraphWalker.java:120)
>  ~[hive-exec-4.0.0-beta-2-SNAPSHOT.jar:4.0.0-beta-2-SNAPSHOT]
>      at 
> org.apache.hadoop.hive.ql.parse.type.TypeCheckProcFactory.genExprNode(TypeCheckProcFactory.java:231)
>  ~[hive-exec-4.0.0-beta-2-SNAPSHOT.jar:4.0.0-beta-2-SNAPSHOT]
>      at 
> org.apache.hadoop.hive.ql.parse.type.RexNodeTypeCheck.genExprNodeJoinCond(RexNodeTypeCheck.java:60)
>  ~[hive-exec-4.0.0-beta-2-SNAPSHOT.jar:4.0.0-beta-2-SNAPSHOT]
>      at 
> 

[jira] [Updated] (HIVE-27898) For ICEBERG partitioned table, after turning on vectorisation, all non-partitioned columns may be empty (location-based table).

2024-02-23 Thread yongzhi.shao (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-27898?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

yongzhi.shao updated HIVE-27898:

Summary: For ICEBERG partitioned table, after turning on vectorisation, all 
non-partitioned columns may be empty (location-based table).  (was: For 
ICEBERG partitioned table, after turning on vectorisation, all non-partitioned 
columns may be empty.)

> For ICEBERG partitioned table, after turning on vectorisation, all 
> non-partitioned columns may be empty (location-based table).
> ---
>
> Key: HIVE-27898
> URL: https://issues.apache.org/jira/browse/HIVE-27898
> Project: Hive
>  Issue Type: Bug
>  Components: Iceberg integration
>Reporter: yongzhi.shao
>Priority: Critical
> Attachments: hive-site.xml
>
>
> We found that, when using the Hive 4.0.0-beta-1 release, if we use an ICEBERG 
> table in a subquery, we can't get any data back.
> I used Hive 3 on Tez for cross-validation, and Hive 3 does not have this 
> problem when querying ICEBERG.
> {code:java}
> --spark3.4.1+iceberg 1.4.2
> CREATE TABLE datacenter.dwd.b_std_trade (
>   uni_order_id STRING,
>   data_from BIGINT,
>   partner STRING,
>   plat_code STRING,
>   order_id STRING,
>   uni_shop_id STRING,
>   uni_id STRING,
>   guide_id STRING,
>   shop_id STRING,
>   plat_account STRING,
>   total_fee DOUBLE,
>   item_discount_fee DOUBLE,
>   trade_discount_fee DOUBLE,
>   adjust_fee DOUBLE,
>   post_fee DOUBLE,
>   discount_rate DOUBLE,
>   payment_no_postfee DOUBLE,
>   payment DOUBLE,
>   pay_time STRING,
>   product_num BIGINT,
>   order_status STRING,
>   is_refund STRING,
>   refund_fee DOUBLE,
>   insert_time STRING,
>   created STRING,
>   endtime STRING,
>   modified STRING,
>   trade_type STRING,
>   receiver_name STRING,
>   receiver_country STRING,
>   receiver_state STRING,
>   receiver_city STRING,
>   receiver_district STRING,
>   receiver_town STRING,
>   receiver_address STRING,
>   receiver_mobile STRING,
>   trade_source STRING,
>   delivery_type STRING,
>   consign_time STRING,
>   orders_num BIGINT,
>   is_presale BIGINT,
>   presale_status STRING,
>   first_fee_paytime STRING,
>   last_fee_paytime STRING,
>   first_paid_fee DOUBLE,
>   tenant STRING,
>   tidb_modified STRING,
>   step_paid_fee DOUBLE,
>   seller_flag STRING,
>   is_used_store_card BIGINT,
>   store_card_used DOUBLE,
>   store_card_basic_used DOUBLE,
>   store_card_expand_used DOUBLE,
>   order_promotion_num BIGINT,
>   item_promotion_num BIGINT,
>   buyer_remark STRING,
>   seller_remark STRING,
>   trade_business_type STRING)
> USING iceberg
> PARTITIONED BY (uni_shop_id, truncate(4, created))
> LOCATION '/iceberg-catalog/warehouse/dwd/b_std_trade'
> TBLPROPERTIES (
>   'current-snapshot-id' = '7217819472703702905',
>   'format' = 'iceberg/orc',
>   'format-version' = '1',
>   'hive.stored-as' = 'iceberg',
>   'read.orc.vectorization.enabled' = 'true',
>   'sort-order' = 'uni_shop_id ASC NULLS FIRST, created ASC NULLS FIRST',
>   'write.distribution-mode' = 'hash',
>   'write.format.default' = 'orc',
>   'write.metadata.delete-after-commit.enabled' = 'true',
>   'write.metadata.previous-versions-max' = '3',
>   'write.orc.bloom.filter.columns' = 'order_id',
>   'write.orc.compression-codec' = 'zstd')
> --hive-iceberg
>  CREATE EXTERNAL TABLE iceberg_dwd.b_std_trade 
>  STORED BY 'org.apache.iceberg.mr.hive.HiveIcebergStorageHandler' 
> LOCATION 'hdfs:///iceberg-catalog/warehouse/dwd/b_std_trade'
> TBLPROPERTIES 
> ('iceberg.catalog'='location_based_table','engine.hive.enabled'='true');
> select * from iceberg_dwd.b_std_trade 
> where uni_shop_id = 'TEST|1' limit 10  --10 rows
> select *
> from ( 
> select * from iceberg_dwd.b_std_trade 
> where uni_shop_id = 'TEST|1' limit 10
> ) t1;   --10 rows
> select uni_shop_id
> from ( 
> select * from iceberg_dwd.b_std_trade 
> where uni_shop_id = 'TEST|1' limit 10
> ) t1;  --0 rows
> select uni_shop_id
> from ( 
> select uni_shop_id as uni_shop_id from iceberg_dwd.b_std_trade 
> where uni_shop_id = 'TEST|1' limit 10
> ) t1;  --0 rows
> --hive-orc
> select uni_shop_id
> from ( 
> select * from iceberg_dwd.trade_test 
> where uni_shop_id = 'TEST|1' limit 10
> ) t1;--10 ROWS{code}
>  
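One way to check whether the vectorized read path is the trigger, and to work around it in the meantime, is to turn vectorization off for the session. This is only a mitigation sketch using the standard {{hive.vectorized.execution.enabled}} setting, not a fix proposed in this thread:
{code:java}
set hive.vectorized.execution.enabled=false;

select uni_shop_id
from (
  select * from iceberg_dwd.b_std_trade
  where uni_shop_id = 'TEST|1' limit 10
) t1;{code}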



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (HIVE-27898) For ICEBERG partitioned table, after turning on vectorisation, all non-partitioned columns may be empty.

2024-02-23 Thread yongzhi.shao (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-27898?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

yongzhi.shao updated HIVE-27898:

Summary: For ICEBERG partitioned table, after turning on vectorisation, all 
non-partitioned columns may be empty.  (was: For ICEBERG partitioned 
table, after turning on vectorisation, all non-partitioned columns will be 
empty.)

> For ICEBERG partitioned table, after turning on vectorisation, all 
> non-partitioned columns may be empty.
> -
>
> Key: HIVE-27898
> URL: https://issues.apache.org/jira/browse/HIVE-27898
> Project: Hive
>  Issue Type: Bug
>  Components: Iceberg integration
>Reporter: yongzhi.shao
>Priority: Critical
> Attachments: hive-site.xml
>
>
> We found that, when using the Hive 4.0.0-beta-1 release, if we use an ICEBERG 
> table in a subquery, we can't get any data back.
> I used Hive 3 on Tez for cross-validation, and Hive 3 does not have this 
> problem when querying ICEBERG.
> {code:java}
> --spark3.4.1+iceberg 1.4.2
> CREATE TABLE datacenter.dwd.b_std_trade (
>   uni_order_id STRING,
>   data_from BIGINT,
>   partner STRING,
>   plat_code STRING,
>   order_id STRING,
>   uni_shop_id STRING,
>   uni_id STRING,
>   guide_id STRING,
>   shop_id STRING,
>   plat_account STRING,
>   total_fee DOUBLE,
>   item_discount_fee DOUBLE,
>   trade_discount_fee DOUBLE,
>   adjust_fee DOUBLE,
>   post_fee DOUBLE,
>   discount_rate DOUBLE,
>   payment_no_postfee DOUBLE,
>   payment DOUBLE,
>   pay_time STRING,
>   product_num BIGINT,
>   order_status STRING,
>   is_refund STRING,
>   refund_fee DOUBLE,
>   insert_time STRING,
>   created STRING,
>   endtime STRING,
>   modified STRING,
>   trade_type STRING,
>   receiver_name STRING,
>   receiver_country STRING,
>   receiver_state STRING,
>   receiver_city STRING,
>   receiver_district STRING,
>   receiver_town STRING,
>   receiver_address STRING,
>   receiver_mobile STRING,
>   trade_source STRING,
>   delivery_type STRING,
>   consign_time STRING,
>   orders_num BIGINT,
>   is_presale BIGINT,
>   presale_status STRING,
>   first_fee_paytime STRING,
>   last_fee_paytime STRING,
>   first_paid_fee DOUBLE,
>   tenant STRING,
>   tidb_modified STRING,
>   step_paid_fee DOUBLE,
>   seller_flag STRING,
>   is_used_store_card BIGINT,
>   store_card_used DOUBLE,
>   store_card_basic_used DOUBLE,
>   store_card_expand_used DOUBLE,
>   order_promotion_num BIGINT,
>   item_promotion_num BIGINT,
>   buyer_remark STRING,
>   seller_remark STRING,
>   trade_business_type STRING)
> USING iceberg
> PARTITIONED BY (uni_shop_id, truncate(4, created))
> LOCATION '/iceberg-catalog/warehouse/dwd/b_std_trade'
> TBLPROPERTIES (
>   'current-snapshot-id' = '7217819472703702905',
>   'format' = 'iceberg/orc',
>   'format-version' = '1',
>   'hive.stored-as' = 'iceberg',
>   'read.orc.vectorization.enabled' = 'true',
>   'sort-order' = 'uni_shop_id ASC NULLS FIRST, created ASC NULLS FIRST',
>   'write.distribution-mode' = 'hash',
>   'write.format.default' = 'orc',
>   'write.metadata.delete-after-commit.enabled' = 'true',
>   'write.metadata.previous-versions-max' = '3',
>   'write.orc.bloom.filter.columns' = 'order_id',
>   'write.orc.compression-codec' = 'zstd')
> --hive-iceberg
>  CREATE EXTERNAL TABLE iceberg_dwd.b_std_trade 
>  STORED BY 'org.apache.iceberg.mr.hive.HiveIcebergStorageHandler' 
> LOCATION 'hdfs:///iceberg-catalog/warehouse/dwd/b_std_trade'
> TBLPROPERTIES 
> ('iceberg.catalog'='location_based_table','engine.hive.enabled'='true');
> select * from iceberg_dwd.b_std_trade 
> where uni_shop_id = 'TEST|1' limit 10  --10 rows
> select *
> from ( 
> select * from iceberg_dwd.b_std_trade 
> where uni_shop_id = 'TEST|1' limit 10
> ) t1;   --10 rows
> select uni_shop_id
> from ( 
> select * from iceberg_dwd.b_std_trade 
> where uni_shop_id = 'TEST|1' limit 10
> ) t1;  --0 rows
> select uni_shop_id
> from ( 
> select uni_shop_id as uni_shop_id from iceberg_dwd.b_std_trade 
> where uni_shop_id = 'TEST|1' limit 10
> ) t1;  --0 rows
> --hive-orc
> select uni_shop_id
> from ( 
> select * from iceberg_dwd.trade_test 
> where uni_shop_id = 'TEST|1' limit 10
> ) t1;--10 ROWS{code}
>  





[jira] [Updated] (HIVE-27898) For ICEBERG partitioned table, after turning on vectorisation, all non-partitioned columns will be empty.

2024-02-23 Thread yongzhi.shao (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-27898?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

yongzhi.shao updated HIVE-27898:

Summary: For ICEBERG partitioned table, after turning on vectorisation, all 
non-partitioned columns will be empty.  (was: When CBO is disabled, HIVE cannot 
query ICEBERG tables in nested subqueries properly)

> For ICEBERG partitioned table, after turning on vectorisation, all 
> non-partitioned columns will be empty.
> -
>
> Key: HIVE-27898
> URL: https://issues.apache.org/jira/browse/HIVE-27898
> Project: Hive
>  Issue Type: Bug
>  Components: Iceberg integration
>Reporter: yongzhi.shao
>Priority: Critical
> Attachments: hive-site.xml
>
>
> We found that, when using the Hive 4.0.0-beta-1 release, if we use an ICEBERG 
> table in a subquery, we can't get any data back.
> I used Hive 3 on Tez for cross-validation, and Hive 3 does not have this 
> problem when querying ICEBERG.
> {code:java}
> --spark3.4.1+iceberg 1.4.2
> CREATE TABLE datacenter.dwd.b_std_trade (
>   uni_order_id STRING,
>   data_from BIGINT,
>   partner STRING,
>   plat_code STRING,
>   order_id STRING,
>   uni_shop_id STRING,
>   uni_id STRING,
>   guide_id STRING,
>   shop_id STRING,
>   plat_account STRING,
>   total_fee DOUBLE,
>   item_discount_fee DOUBLE,
>   trade_discount_fee DOUBLE,
>   adjust_fee DOUBLE,
>   post_fee DOUBLE,
>   discount_rate DOUBLE,
>   payment_no_postfee DOUBLE,
>   payment DOUBLE,
>   pay_time STRING,
>   product_num BIGINT,
>   order_status STRING,
>   is_refund STRING,
>   refund_fee DOUBLE,
>   insert_time STRING,
>   created STRING,
>   endtime STRING,
>   modified STRING,
>   trade_type STRING,
>   receiver_name STRING,
>   receiver_country STRING,
>   receiver_state STRING,
>   receiver_city STRING,
>   receiver_district STRING,
>   receiver_town STRING,
>   receiver_address STRING,
>   receiver_mobile STRING,
>   trade_source STRING,
>   delivery_type STRING,
>   consign_time STRING,
>   orders_num BIGINT,
>   is_presale BIGINT,
>   presale_status STRING,
>   first_fee_paytime STRING,
>   last_fee_paytime STRING,
>   first_paid_fee DOUBLE,
>   tenant STRING,
>   tidb_modified STRING,
>   step_paid_fee DOUBLE,
>   seller_flag STRING,
>   is_used_store_card BIGINT,
>   store_card_used DOUBLE,
>   store_card_basic_used DOUBLE,
>   store_card_expand_used DOUBLE,
>   order_promotion_num BIGINT,
>   item_promotion_num BIGINT,
>   buyer_remark STRING,
>   seller_remark STRING,
>   trade_business_type STRING)
> USING iceberg
> PARTITIONED BY (uni_shop_id, truncate(4, created))
> LOCATION '/iceberg-catalog/warehouse/dwd/b_std_trade'
> TBLPROPERTIES (
>   'current-snapshot-id' = '7217819472703702905',
>   'format' = 'iceberg/orc',
>   'format-version' = '1',
>   'hive.stored-as' = 'iceberg',
>   'read.orc.vectorization.enabled' = 'true',
>   'sort-order' = 'uni_shop_id ASC NULLS FIRST, created ASC NULLS FIRST',
>   'write.distribution-mode' = 'hash',
>   'write.format.default' = 'orc',
>   'write.metadata.delete-after-commit.enabled' = 'true',
>   'write.metadata.previous-versions-max' = '3',
>   'write.orc.bloom.filter.columns' = 'order_id',
>   'write.orc.compression-codec' = 'zstd')
> --hive-iceberg
>  CREATE EXTERNAL TABLE iceberg_dwd.b_std_trade 
>  STORED BY 'org.apache.iceberg.mr.hive.HiveIcebergStorageHandler' 
> LOCATION 'hdfs:///iceberg-catalog/warehouse/dwd/b_std_trade'
> TBLPROPERTIES 
> ('iceberg.catalog'='location_based_table','engine.hive.enabled'='true');
> select * from iceberg_dwd.b_std_trade 
> where uni_shop_id = 'TEST|1' limit 10  --10 rows
> select *
> from ( 
> select * from iceberg_dwd.b_std_trade 
> where uni_shop_id = 'TEST|1' limit 10
> ) t1;   --10 rows
> select uni_shop_id
> from ( 
> select * from iceberg_dwd.b_std_trade 
> where uni_shop_id = 'TEST|1' limit 10
> ) t1;  --0 rows
> select uni_shop_id
> from ( 
> select uni_shop_id as uni_shop_id from iceberg_dwd.b_std_trade 
> where uni_shop_id = 'TEST|1' limit 10
> ) t1;  --0 rows
> --hive-orc
> select uni_shop_id
> from ( 
> select * from iceberg_dwd.trade_test 
> where uni_shop_id = 'TEST|1' limit 10
> ) t1;--10 ROWS{code}
>  





[jira] [Updated] (HIVE-27556) Add Unit Test for KafkaStorageHandlerInfo

2024-02-23 Thread Jira


 [ 
https://issues.apache.org/jira/browse/HIVE-27556?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

László Bodor updated HIVE-27556:

Fix Version/s: 4.0.0

> Add Unit Test for KafkaStorageHandlerInfo
> -
>
> Key: HIVE-27556
> URL: https://issues.apache.org/jira/browse/HIVE-27556
> Project: Hive
>  Issue Type: Test
>  Components: kafka integration, StorageHandler
>Reporter: Kokila N
>Assignee: Kokila N
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>
> Adding unit tests for KafkaStorageHandlerInfo.





[jira] [Commented] (HIVE-27556) Add Unit Test for KafkaStorageHandlerInfo

2024-02-23 Thread Jira


[ 
https://issues.apache.org/jira/browse/HIVE-27556?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17820008#comment-17820008
 ] 

László Bodor commented on HIVE-27556:
-

merged to master, thanks [~kokila19] for the patch and [~akshatm] for the 
review!

> Add Unit Test for KafkaStorageHandlerInfo
> -
>
> Key: HIVE-27556
> URL: https://issues.apache.org/jira/browse/HIVE-27556
> Project: Hive
>  Issue Type: Test
>  Components: kafka integration, StorageHandler
>Reporter: Kokila N
>Assignee: Kokila N
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>
> Adding unit tests for KafkaStorageHandlerInfo.





[jira] [Resolved] (HIVE-27556) Add Unit Test for KafkaStorageHandlerInfo

2024-02-23 Thread Jira


 [ 
https://issues.apache.org/jira/browse/HIVE-27556?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

László Bodor resolved HIVE-27556.
-
Resolution: Fixed

> Add Unit Test for KafkaStorageHandlerInfo
> -
>
> Key: HIVE-27556
> URL: https://issues.apache.org/jira/browse/HIVE-27556
> Project: Hive
>  Issue Type: Test
>  Components: kafka integration, StorageHandler
>Reporter: Kokila N
>Assignee: Kokila N
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>
> Adding unit tests for KafkaStorageHandlerInfo.





[jira] [Resolved] (HIVE-27692) Explore removing the always task from embedded HMS

2024-02-23 Thread Zhihua Deng (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-27692?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhihua Deng resolved HIVE-27692.

Fix Version/s: 4.0.0
   Resolution: Fixed

Fix has been merged. Thank you Naveen for the review!

> Explore removing the always task from embedded HMS
> --
>
> Key: HIVE-27692
> URL: https://issues.apache.org/jira/browse/HIVE-27692
> Project: Hive
>  Issue Type: Improvement
>  Components: Standalone Metastore
>Reporter: Zhihua Deng
>Assignee: Zhihua Deng
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>
> The always tasks now run in the leader HMS, and the properties for 
> configuring the leader should belong only to HMS; other engines such as 
> Spark/Impala don't need to know these properties. In most cases, an 
> engine only cares about the properties for connecting to HMS, e.g., 
> hive.metastore.uris.
> Every time a new app uses an embedded Metastore, it starts the HMS 
> always tasks by default. Imagine we have hundreds of apps: hundreds of 
> copies of the same tasks would be running, putting extra burden on the 
> underlying databases, e.g., flooding queries and hitting connection limits.
> I think we can remove the always tasks from the embedded Metastore; they 
> will be taken care of by the standalone Metastore, as a standalone 
> Metastore should be present in a production environment.
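For illustration only: a client app that should not run housekeeping can point at the remote HMS instead of embedding one. The property below is the standard client-side setting; the host name is a placeholder, not taken from this thread.
{code:java}
<!-- hive-site.xml on the client side (host is hypothetical) -->
<property>
  <name>hive.metastore.uris</name>
  <value>thrift://metastore-host:9083</value>
</property>{code}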





[jira] [Commented] (HIVE-28086) Clean up older version staging-dir when versioning is enabled at storage like s3

2024-02-23 Thread Butao Zhang (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-28086?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17819984#comment-17819984
 ] 

Butao Zhang commented on HIVE-28086:


I have tested *hive.exec.stagingdir*, and I think it can be set dynamically.

*Loading data to table testdb.teststage from 
hdfs://127.0.0.1:8028/tmp/hive/.hive-staging_hive_2024-02-23_17-05-18_098_5916756629075483734-4/-ext-1*
{code:java}
0: jdbc:hive2://127.0.0.1:1/default> create table teststage(id int);
0: jdbc:hive2://127.0.0.1:1/default> set 
hive.exec.stagingdir=/tmp/hive/.hive-staging; 
0: jdbc:hive2://127.0.0.1:10004/default> insert into teststage values(123);
...
INFO  : Starting task [Stage-2:DEPENDENCY_COLLECTION] in serial mode
INFO  : Starting task [Stage-0:MOVE] in serial mode
INFO  : Loading data to table testdb.teststage from 
hdfs://127.0.0.1:8028/tmp/hive/.hive-staging_hive_2024-02-23_17-05-18_098_5916756629075483734-4/-ext-1
INFO  : Completed executing 
command(queryId=hive_20240223170518_68ee2a14-3268-4b34-bd01-ed0fe48a02ea); Time 
taken: 5.977 seconds
1 row affected (6.535 seconds)
{code}
 

> Clean up older version staging-dir when versioning is enabled at storage like 
> s3
> 
>
> Key: HIVE-28086
> URL: https://issues.apache.org/jira/browse/HIVE-28086
> Project: Hive
>  Issue Type: Improvement
>  Components: Hive
>Reporter: Taraka Rama Rao Lethavadla
>Priority: Major
>
> When running Hive using AWS S3 Storage *with versioning* for managed / 
> external table directories, a staging directory is created as per 
> {{hive.exec.stagingdir=.hive-staging}} under the table location.
> The directory is deleted after the job is completed
> AWS S3 offers an option to enable versioning.  When enabled, the Hive staging 
> directories will be "deleted", but a copy will be kept.  This requires manual 
> cleanup, and over time the effort to remove these directories becomes too 
> much work.
> For Spark jobs, this is easily worked out by setting:
> {noformat}
> spark.hadoop.hive.exec.stagingdir=s3a:///tmp/hive/.hive-staging 
> {noformat}
> However, that is not an option in Hive because {{hive.exec.stagingdir}} is 
> meant to be a relative path under the table directory.
> Options to solve:
> It would be helpful to allow the staging directory to be configurable like 
> these properties:
> {noformat}
> hive.exec.scratchdir=/tmp/hive
> hive.exec.local.scratchdir=/tmp/hive
> tez.staging-dir=/tmp/${user.name}/staging 
> {noformat}
> That will allow customers to configure a location without versioning and 
> avoid this usability issue.  
> or
> use the storage API (e.g., S3) to delete the old versions of the staging 
> directory along with the actual directory.


