[jira] [Commented] (HIVE-27898) HIVE4 can't use ICEBERG table in subqueries
[ https://issues.apache.org/jira/browse/HIVE-27898?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17790440#comment-17790440 ]

zhangbutao commented on HIVE-27898:
---
Just a reminder: once HIVE-27912 is fixed, you can get the snapshot package, which includes Iceberg, from http://ci.hive.apache.org/job/hive-nightly/

> HIVE4 can't use ICEBERG table in subqueries
> ---
>
> Key: HIVE-27898
> URL: https://issues.apache.org/jira/browse/HIVE-27898
> Project: Hive
> Issue Type: Bug
> Components: Iceberg integration
> Affects Versions: 4.0.0-beta-1
> Reporter: yongzhi.shao
> Priority: Critical
>
> Currently, we found that, when using the HIVE4-BETA1 version, if we use an ICEBERG table in a subquery, we can't get any data in the end.
> I have used HIVE3-TEZ for cross-validation, and HIVE3 does not have this problem when querying ICEBERG.
> {code:java}
> --spark3.4.1+iceberg 1.4.2
> CREATE TABLE datacenter.dwd.b_std_trade (
>   uni_order_id STRING,
>   data_from BIGINT,
>   partner STRING,
>   plat_code STRING,
>   order_id STRING,
>   uni_shop_id STRING,
>   uni_id STRING,
>   guide_id STRING,
>   shop_id STRING,
>   plat_account STRING,
>   total_fee DOUBLE,
>   item_discount_fee DOUBLE,
>   trade_discount_fee DOUBLE,
>   adjust_fee DOUBLE,
>   post_fee DOUBLE,
>   discount_rate DOUBLE,
>   payment_no_postfee DOUBLE,
>   payment DOUBLE,
>   pay_time STRING,
>   product_num BIGINT,
>   order_status STRING,
>   is_refund STRING,
>   refund_fee DOUBLE,
>   insert_time STRING,
>   created STRING,
>   endtime STRING,
>   modified STRING,
>   trade_type STRING,
>   receiver_name STRING,
>   receiver_country STRING,
>   receiver_state STRING,
>   receiver_city STRING,
>   receiver_district STRING,
>   receiver_town STRING,
>   receiver_address STRING,
>   receiver_mobile STRING,
>   trade_source STRING,
>   delivery_type STRING,
>   consign_time STRING,
>   orders_num BIGINT,
>   is_presale BIGINT,
>   presale_status STRING,
>   first_fee_paytime STRING,
>   last_fee_paytime STRING,
>   first_paid_fee DOUBLE,
>   tenant STRING,
>   tidb_modified STRING,
>   step_paid_fee DOUBLE,
>   seller_flag STRING,
>   is_used_store_card BIGINT,
>   store_card_used DOUBLE,
>   store_card_basic_used DOUBLE,
>   store_card_expand_used DOUBLE,
>   order_promotion_num BIGINT,
>   item_promotion_num BIGINT,
>   buyer_remark STRING,
>   seller_remark STRING,
>   trade_business_type STRING)
> USING iceberg
> PARTITIONED BY (uni_shop_id, truncate(4, created))
> LOCATION '/iceberg-catalog/warehouse/dwd/b_std_trade'
> TBLPROPERTIES (
>   'current-snapshot-id' = '7217819472703702905',
>   'format' = 'iceberg/orc',
>   'format-version' = '1',
>   'hive.stored-as' = 'iceberg',
>   'read.orc.vectorization.enabled' = 'true',
>   'sort-order' = 'uni_shop_id ASC NULLS FIRST, created ASC NULLS FIRST',
>   'write.distribution-mode' = 'hash',
>   'write.format.default' = 'orc',
>   'write.metadata.delete-after-commit.enabled' = 'true',
>   'write.metadata.previous-versions-max' = '3',
>   'write.orc.bloom.filter.columns' = 'order_id',
>   'write.orc.compression-codec' = 'zstd')
>
> --hive-iceberg
> CREATE EXTERNAL TABLE iceberg_dwd.b_std_trade
> STORED BY 'org.apache.iceberg.mr.hive.HiveIcebergStorageHandler'
> LOCATION 'hdfs:///iceberg-catalog/warehouse/dwd/b_std_trade'
> TBLPROPERTIES ('iceberg.catalog'='location_based_table','engine.hive.enabled'='true');
>
> select * from iceberg_dwd.b_std_trade
> where uni_shop_id = 'TEST|1' limit 10;  --10 rows
>
> select *
> from (
>   select * from iceberg_dwd.b_std_trade
>   where uni_shop_id = 'TEST|1' limit 10
> ) t1;  --10 rows
>
> select uni_shop_id
> from (
>   select * from iceberg_dwd.b_std_trade
>   where uni_shop_id = 'TEST|1' limit 10
> ) t1;  --0 rows
>
> select uni_shop_id
> from (
>   select uni_shop_id as uni_shop_id from iceberg_dwd.b_std_trade
>   where uni_shop_id = 'TEST|1' limit 10
> ) t1;  --0 rows
>
> --hive-orc
> select uni_shop_id
> from (
>   select * from iceberg_dwd.trade_test
>   where uni_shop_id = 'TEST|1' limit 10
> ) t1;  --10 ROWS
> {code}

-- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Updated] (HIVE-27912) Include Iceberg module in nightly builds
[ https://issues.apache.org/jira/browse/HIVE-27912?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

zhangbutao updated HIVE-27912:
---
Description:
[http://ci.hive.apache.org/job/hive-nightly/]
HIVE-25715 added nightly builds, which give users a chance to test the snapshot binary package. But the builds didn't contain the Iceberg module; it would be good to include it, so that users such as the reporter of HIVE-27898 can use the snapshot binary package to test their queries.

> Include Iceberg module in nightly builds
> ---
>
> Key: HIVE-27912
> URL: https://issues.apache.org/jira/browse/HIVE-27912
> Project: Hive
> Issue Type: Improvement
> Reporter: zhangbutao
> Assignee: zhangbutao
> Priority: Major
[jira] [Created] (HIVE-27912) Include Iceberg module in nightly builds
zhangbutao created HIVE-27912:
---
Summary: Include Iceberg module in nightly builds
Key: HIVE-27912
URL: https://issues.apache.org/jira/browse/HIVE-27912
Project: Hive
Issue Type: Improvement
Reporter: zhangbutao
[jira] [Assigned] (HIVE-27912) Include Iceberg module in nightly builds
[ https://issues.apache.org/jira/browse/HIVE-27912?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

zhangbutao reassigned HIVE-27912:
---
Assignee: zhangbutao

> Include Iceberg module in nightly builds
> ---
>
> Key: HIVE-27912
> URL: https://issues.apache.org/jira/browse/HIVE-27912
> Project: Hive
> Issue Type: Improvement
> Reporter: zhangbutao
> Assignee: zhangbutao
> Priority: Major
[jira] [Commented] (HIVE-27898) HIVE4 can't use ICEBERG table in subqueries
[ https://issues.apache.org/jira/browse/HIVE-27898?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17790424#comment-17790424 ]

zhangbutao commented on HIVE-27898:
---
I think you can try the master branch to see if it is OK; we have fixed some issues on the master branch. BTW, I think we will release a new Hive 4 version soon, and then you can use the newly released version.
[jira] [Commented] (HIVE-27898) HIVE4 can't use ICEBERG table in subqueries
[ https://issues.apache.org/jira/browse/HIVE-27898?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17790407#comment-17790407 ]

zhangbutao commented on HIVE-27898:
---
I still cannot reproduce your issue:

Spark 3.5.0, Hadoop 3.3.1, Tez 0.10.2, Hive 4 master code, Iceberg 1.4.2

{code:java}
// Spark side
/data/spark-3.5.0-bin-hadoop3/bin/spark-sql \
  --master local \
  --deploy-mode client \
  --conf spark.sql.catalog.hadoop_prod=org.apache.iceberg.spark.SparkCatalog \
  --conf spark.sql.catalog.hadoop_prod.type=hadoop \
  --conf spark.sql.catalog.hadoop_prod.warehouse=hdfs://localhost:8028/tmp/testiceberg;

CREATE TABLE IF NOT EXISTS hadoop_prod.default.test_data_04 (
  id string, name string
) USING iceberg
PARTITIONED BY (name)
TBLPROPERTIES (
  'read.orc.vectorization.enabled'='true',
  'write.format.default'='orc',
  'write.orc.bloom.filter.columns'='id',
  'write.orc.compression-codec'='zstd',
  'write.metadata.previous-versions-max'='3',
  'write.metadata.delete-after-commit.enabled'='true');

insert into hadoop_prod.default.test_data_04(id,name) values('1','a'),('2','b');
{code}
{code:java}
// HS2 side
CREATE EXTERNAL TABLE test_data_04
STORED BY 'org.apache.iceberg.mr.hive.HiveIcebergStorageHandler'
LOCATION 'hdfs://localhost:8028/tmp/testiceberg/default/test_data_04'
TBLPROPERTIES ('iceberg.catalog'='location_based_table','engine.hive.enabled'='true');

select id from (select * from test_data_04 limit 10) s1;
+-----+
| id  |
+-----+
| 1   |
| 2   |
+-----+
select id from (select * from test_data_04) s1;
+-----+
| id  |
+-----+
| 1   |
| 2   |
+-----+
{code}
[jira] [Commented] (HIVE-27910) Hive on Spark -- should work?
[ https://issues.apache.org/jira/browse/HIVE-27910?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17789770#comment-17789770 ]

zhangbutao commented on HIVE-27910:
---
[https://cwiki.apache.org/confluence/display/Hive/Hive+on+Spark]
Yep, I think we need to add a note to remind users that Hive on Spark is no longer supported since Hive 4.

> Hive on Spark -- should work?
> ---
>
> Key: HIVE-27910
> URL: https://issues.apache.org/jira/browse/HIVE-27910
> Project: Hive
> Issue Type: Task
> Affects Versions: 3.1.1
> Reporter: Alexander Petrossian (PAF)
> Priority: Major
>
> I wanted to test this:
> [https://cwiki.apache.org/confluence/display/Hive/Hive+on+Spark]
> Our admins installed:
> {code:java}
> hive --version
> Hive 3.1.0.3.1.0.0-78
> Git git://ctr-e138-1518143905142-586755-01-15.hwx.site/grid/0/jenkins/workspace/HDP-parallel-centos7/SOURCES/hive -r 56673b027117d8cb3400675b1680a4d992360808
> {code}
> Trying:
> {code:java}
> set hive.execution.engine=spark;
> SELECT ...;
> {code}
> Getting:
> {code:java}
> ERROR : FAILED: Execution Error, return code 30041 from org.apache.hadoop.hive.ql.exec.spark.SparkTask. Failed to create Spark client for Spark session 139c0453-459f-4511-b7fc-eab28e78fe0c
> {code}
> Could it be that Spark support in Hive was somehow dropped?
> Or could it be some [simple?] configuration issue?
[jira] [Commented] (HIVE-27910) Hive on Spark -- should work?
[ https://issues.apache.org/jira/browse/HIVE-27910?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17789765#comment-17789765 ]

zhangbutao commented on HIVE-27910:
---
Hive on Spark was removed in the Hive 4 version. IMO, for Hive 3.1.1, no one maintains this feature anymore. I would suggest you use Hive on Tez, which is actively maintained by the Hive community. Thanks.
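The engine switch suggested above can be sketched as a session-level setting. This is only a sketch: it assumes Tez is already installed and configured on the cluster, and the query below is a placeholder, not from the report.

{code:sql}
-- Switch this session's execution engine from Spark to Tez.
-- hive.execution.engine is a standard Hive property; in Hive 4,
-- Tez is the supported engine.
set hive.execution.engine=tez;

-- Placeholder query: any subsequent query now runs as a Tez DAG.
select count(*) from some_table;
{code}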
[jira] [Comment Edited] (HIVE-27900) hive can not read iceberg-parquet table
[ https://issues.apache.org/jira/browse/HIVE-27900?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17789289#comment-17789289 ]

zhangbutao edited a comment on HIVE-27900 at 11/24/23 2:12 AM:
---
Do you use the Tez shuffle handler? [https://tez.apache.org/shuffle-handler.html]
If you remove the ORC vectorized-read property from the Parquet table, does the query then succeed?

was (Author: zhangbutao): If you remove the ORC vectorized-read property from the Parquet table, does the query then succeed?

> hive can not read iceberg-parquet table
> ---
>
> Key: HIVE-27900
> URL: https://issues.apache.org/jira/browse/HIVE-27900
> Project: Hive
> Issue Type: Bug
> Components: Iceberg integration
> Affects Versions: 4.0.0-beta-1
> Reporter: yongzhi.shao
> Priority: Major
>
> We found that, using the HIVE4-BETA version, we could not query an Iceberg-Parquet table with vectorized execution turned on.
> {code:java}
> --spark-sql(3.4.1+iceberg 1.4.2)
> CREATE TABLE local.test.b_qqd_shop_rfm_parquet_snappy (
>   a string, b string, c string)
> USING iceberg
> LOCATION '/iceberg-catalog/warehouse/test/b_qqd_shop_rfm_parquet_snappy'
> TBLPROPERTIES (
>   'current-snapshot-id' = '5138351937447353683',
>   'format' = 'iceberg/parquet',
>   'format-version' = '2',
>   'read.orc.vectorization.enabled' = 'true',
>   'write.format.default' = 'parquet',
>   'write.metadata.delete-after-commit.enabled' = 'true',
>   'write.metadata.previous-versions-max' = '3',
>   'write.parquet.compression-codec' = 'snappy');
>
> --hive-sql
> CREATE EXTERNAL TABLE iceberg_dwd.b_qqd_shop_rfm_parquet_snappy
> STORED BY 'org.apache.iceberg.mr.hive.HiveIcebergStorageHandler'
> LOCATION 'hdfs://xxx/iceberg-catalog/warehouse/test/b_qqd_shop_rfm_parquet_snappy/'
> TBLPROPERTIES ('iceberg.catalog'='location_based_table','engine.hive.enabled'='true');
>
> set hive.default.fileformat=orc;
> set hive.default.fileformat.managed=orc;
> create table test_parquet_as_orc as select * from b_qqd_shop_rfm_parquet_snappy limit 100;
>
> , TaskAttempt 2 failed, info=[Error: Node: /xxx..xx.xx: Error while running task ( failure ) :
> attempt_1696729618575_69586_1_00_00_2:java.lang.RuntimeException: java.lang.RuntimeException: org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime Error while processing row
>     at org.apache.hadoop.hive.ql.exec.tez.TezProcessor.initializeAndRunProcessor(TezProcessor.java:348)
>     at org.apache.hadoop.hive.ql.exec.tez.TezProcessor.run(TezProcessor.java:276)
>     at org.apache.tez.runtime.LogicalIOProcessorRuntimeTask.run(LogicalIOProcessorRuntimeTask.java:381)
>     at org.apache.tez.runtime.task.TaskRunner2Callable$1.run(TaskRunner2Callable.java:82)
>     at org.apache.tez.runtime.task.TaskRunner2Callable$1.run(TaskRunner2Callable.java:69)
>     at java.security.AccessController.doPrivileged(Native Method)
>     at javax.security.auth.Subject.doAs(Subject.java:422)
>     at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1899)
>     at org.apache.tez.runtime.task.TaskRunner2Callable.callInternal(TaskRunner2Callable.java:69)
>     at org.apache.tez.runtime.task.TaskRunner2Callable.callInternal(TaskRunner2Callable.java:39)
>     at org.apache.tez.common.CallableWithNdc.call(CallableWithNdc.java:36)
>     at com.google.common.util.concurrent.TrustedListenableFutureTask$TrustedFutureInterruptibleTask.runInterruptibly(TrustedListenableFutureTask.java:131)
>     at com.google.common.util.concurrent.InterruptibleTask.run(InterruptibleTask.java:76)
>     at com.google.common.util.concurrent.TrustedListenableFutureTask.run(TrustedListenableFutureTask.java:82)
>     at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
>     at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
>     at java.lang.Thread.run(Thread.java:750)
> Caused by: java.lang.RuntimeException: org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime Error while processing row
>     at org.apache.hadoop.hive.ql.exec.tez.MapRecordSource.processRow(MapRecordSource.java:110)
>     at org.apache.hadoop.hive.ql.exec.tez.MapRecordSource.pushRecord(MapRecordSource.java:83)
>     at org.apache.hadoop.hive.ql.exec.tez.MapRecordProcessor.run(MapRecordProcessor.java:414)
>     at org.apache.hadoop.hive.ql.exec.tez.TezProcessor.initializeAndRunProcessor(TezProcessor.java:293)
>     ... 16 more
> Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime Error while processing row
>     at org.apache.hadoop.hive.ql.exec.vector.VectorMapOperator.process(VectorMapOperator.java:993)
>     at org.apache.hadoop.hive.ql.exec.tez.MapRecordSource.processRow(MapRecordSource.java:101)
>     ... 19 more
> Caused by: org.apache.hadoop.hive.ql.metadat
[jira] [Commented] (HIVE-27900) hive can not read iceberg-parquet table
[ https://issues.apache.org/jira/browse/HIVE-27900?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17789289#comment-17789289 ]

zhangbutao commented on HIVE-27900:
---
If you remove the ORC vectorized-read property from the Parquet table, does the query then succeed?
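The property removal asked about above could be sketched as follows. This is a sketch only: it uses Spark SQL's standard ALTER TABLE ... UNSET TBLPROPERTIES syntax against the table name from the report, and assumes the change is made on the Spark side that owns the table.

{code:sql}
-- Drop the ORC vectorization property that was set on this Parquet-format
-- Iceberg table, then retry the failing CTAS from Hive.
ALTER TABLE local.test.b_qqd_shop_rfm_parquet_snappy
  UNSET TBLPROPERTIES ('read.orc.vectorization.enabled');
{code}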
[jira] [Commented] (HIVE-27901) Hive's performance for querying the Iceberg table is very poor.
[ https://issues.apache.org/jira/browse/HIVE-27901?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17789013#comment-17789013 ]

zhangbutao commented on HIVE-27901:
---
I think this ticket looks something like https://issues.apache.org/jira/browse/HIVE-27883. Currently, some optimization properties, such as merging/splitting data, cannot be used on Iceberg tables, as Iceberg has its own optimization properties. For this ticket, it seems that the ORC table gets more tasks than the Iceberg table, so the ORC table can run faster. Maybe you can try tuning the property _read.split.target-size=67108864;_ see [https://iceberg.apache.org/docs/latest/configuration/#read-properties]. read.split.target-size defaults to 134217728. But I am not sure whether this is a good way to optimize your query, as I cannot reproduce and delve into your problem.

> Hive's performance for querying the Iceberg table is very poor.
> ---
>
> Key: HIVE-27901
> URL: https://issues.apache.org/jira/browse/HIVE-27901
> Project: Hive
> Issue Type: Bug
> Components: Iceberg integration
> Affects Versions: 4.0.0-beta-1
> Reporter: yongzhi.shao
> Priority: Major
> Attachments: image-2023-11-22-18-32-28-344.png, image-2023-11-22-18-33-01-885.png, image-2023-11-22-18-33-32-915.png
>
> I am using HIVE4.0.0-BETA for testing. BTW, I found that the performance of HIVE reading an ICEBERG table is still very slow. How should I deal with this problem?
> I count a 7-billion-row table and compare the performance difference between HIVE reading the ICEBERG-ORC and ORC tables respectively.
> We use ICEBERG 1.4.2, ICEBERG-ORC with ZSTD compression enabled, and ORC with SNAPPY compression. HADOOP version 3.1.1 (native zstd not supported).
> {code:java}
> --inner orc table( set hive default format = orc )
> set hive.default.fileformat=orc;
> set hive.default.fileformat.managed=orc;
> create table if not exists iceberg_dwd.orc_inner_table as select * from iceberg_dwd.b_std_trade;
> {code}
>
> !image-2023-11-22-18-32-28-344.png!
> !image-2023-11-22-18-33-01-885.png!
> Also, I have another question. The Submit Plan statistic is clearly incorrect. Is this something that needs to be fixed?
> !image-2023-11-22-18-33-32-915.png!
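The split-size tuning suggested in the comment above could be sketched as a table-property change. This is a sketch only: read.split.target-size is a documented Iceberg read property (default 134217728 bytes), the table name is taken from the report, and the actual task-count effect depends on file layout and the engine's split planning.

{code:sql}
-- Halve the target split size so scans of this Iceberg table are planned
-- into roughly twice as many splits (and hence more parallel tasks).
ALTER TABLE iceberg_dwd.b_std_trade
  SET TBLPROPERTIES ('read.split.target-size'='67108864');
{code}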
[jira] [Commented] (HIVE-27898) HIVE4 can't use ICEBERG table in subqueries
[ https://issues.apache.org/jira/browse/HIVE-27898?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17789008#comment-17789008 ] zhangbutao commented on HIVE-27898: --- Please provide a simpler test to help others reproduce this issue. 1) Can we create a simpler table with just a few columns? Table *_datacenter.dwd.b_std_trade_* has too many columns. 2) Can we insert just a few rows to help reproduce this issue? > HIVE4 can't use ICEBERG table in subqueries > --- > > Key: HIVE-27898 > URL: https://issues.apache.org/jira/browse/HIVE-27898 > Project: Hive > Issue Type: Bug > Components: Iceberg integration >Affects Versions: 4.0.0-beta-1 >Reporter: yongzhi.shao >Priority: Critical > > Currently, we found that when using HIVE4-BETA1 version, if we use ICEBERG > table in the subquery, we can't get any data in the end. > I have used HIVE3-TEZ for cross validation and HIVE3 does not have this > problem when querying ICEBERG. > {code:java} > --spark3.4.1+iceberg 1.4.2 > CREATE TABLE datacenter.dwd.b_std_trade ( > uni_order_id STRING, > data_from BIGINT, > partner STRING, > plat_code STRING, > order_id STRING, > uni_shop_id STRING, > uni_id STRING, > guide_id STRING, > shop_id STRING, > plat_account STRING, > total_fee DOUBLE, > item_discount_fee DOUBLE, > trade_discount_fee DOUBLE, > adjust_fee DOUBLE, > post_fee DOUBLE, > discount_rate DOUBLE, > payment_no_postfee DOUBLE, > payment DOUBLE, > pay_time STRING, > product_num BIGINT, > order_status STRING, > is_refund STRING, > refund_fee DOUBLE, > insert_time STRING, > created STRING, > endtime STRING, > modified STRING, > trade_type STRING, > receiver_name STRING, > receiver_country STRING, > receiver_state STRING, > receiver_city STRING, > receiver_district STRING, > receiver_town STRING, > receiver_address STRING, > receiver_mobile STRING, > trade_source STRING, > delivery_type STRING, > consign_time STRING, > orders_num BIGINT, > is_presale BIGINT, > presale_status STRING, > 
first_fee_paytime STRING, > last_fee_paytime STRING, > first_paid_fee DOUBLE, > tenant STRING, > tidb_modified STRING, > step_paid_fee DOUBLE, > seller_flag STRING, > is_used_store_card BIGINT, > store_card_used DOUBLE, > store_card_basic_used DOUBLE, > store_card_expand_used DOUBLE, > order_promotion_num BIGINT, > item_promotion_num BIGINT, > buyer_remark STRING, > seller_remark STRING, > trade_business_type STRING) > USING iceberg > PARTITIONED BY (uni_shop_id, truncate(4, created)) > LOCATION '/iceberg-catalog/warehouse/dwd/b_std_trade' > TBLPROPERTIES ( > 'current-snapshot-id' = '7217819472703702905', > 'format' = 'iceberg/orc', > 'format-version' = '1', > 'hive.stored-as' = 'iceberg', > 'read.orc.vectorization.enabled' = 'true', > 'sort-order' = 'uni_shop_id ASC NULLS FIRST, created ASC NULLS FIRST', > 'write.distribution-mode' = 'hash', > 'write.format.default' = 'orc', > 'write.metadata.delete-after-commit.enabled' = 'true', > 'write.metadata.previous-versions-max' = '3', > 'write.orc.bloom.filter.columns' = 'order_id', > 'write.orc.compression-codec' = 'zstd') > --hive-iceberg > CREATE EXTERNAL TABLE iceberg_dwd.b_std_trade > STORED BY 'org.apache.iceberg.mr.hive.HiveIcebergStorageHandler' > LOCATION 'hdfs:///iceberg-catalog/warehouse/dwd/b_std_trade' > TBLPROPERTIES > ('iceberg.catalog'='location_based_table','engine.hive.enabled'='true'); > select * from iceberg_dwd.b_std_trade > where uni_shop_id = 'TEST|1' limit 10 --10 rows > select * > from ( > select * from iceberg_dwd.b_std_trade > where uni_shop_id = 'TEST|1' limit 10 > ) t1; --10 rows > select uni_shop_id > from ( > select * from iceberg_dwd.b_std_trade > where uni_shop_id = 'TEST|1' limit 10 > ) t1; --0 rows > select uni_shop_id > from ( > select uni_shop_id as uni_shop_id from iceberg_dwd.b_std_trade > where uni_shop_id = 'TEST|1' limit 10 > ) t1; --0 rows > --hive-orc > select uni_shop_id > from ( > select * from iceberg_dwd.trade_test > where uni_shop_id = 'TEST|1' limit 10 > ) t1;--10 
ROWS{code} > -- This message was sent by Atlassian Jira (v8.20.10#820010)
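A smaller reproduction along the lines requested in the comment above might look like the following. This is only a sketch: the two-column table, its rows, and the name `trade_mini` are illustrative, not taken from the report.

```sql
-- Hypothetical minimal table (name, columns, and rows are illustrative).
CREATE EXTERNAL TABLE iceberg_dwd.trade_mini (uni_shop_id STRING, payment DOUBLE)
STORED BY 'org.apache.iceberg.mr.hive.HiveIcebergStorageHandler';

INSERT INTO iceberg_dwd.trade_mini VALUES ('TEST|1', 1.0), ('TEST|1', 2.0), ('OTHER', 3.0);

-- Expected: 2 rows. Per the report, projecting a single column through the
-- LIMIT subquery is what returns 0 rows on 4.0.0-beta-1.
SELECT uni_shop_id
FROM (
  SELECT * FROM iceberg_dwd.trade_mini
  WHERE uni_shop_id = 'TEST|1' LIMIT 10
) t1;
```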
[jira] [Commented] (HIVE-27900) hive can not read iceberg-parquet table
[ https://issues.apache.org/jira/browse/HIVE-27900?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17789007#comment-17789007 ] zhangbutao commented on HIVE-27900: --- I cannot reproduce this issue on master code. My env is: 1) Hive master branch: you can compile the Hive code using the cmd: {code:java} mvn clean install -DskipTests -Piceberg -Pdist{code} 2) Tez 0.10.2: I recommend you use 0.10.2 for testing, as 0.10.3 is not released and we cannot be sure that 0.10.3 works well with Hive. 3) Hadoop 3.3.1 BTW, if the table _*local.test.b_qqd_shop_rfm_parquet_snappy*_ is empty, with no data, does the issue still occur in your env? > hive can not read iceberg-parquet table > --- > > Key: HIVE-27900 > URL: https://issues.apache.org/jira/browse/HIVE-27900 > Project: Hive > Issue Type: Bug > Components: Iceberg integration >Affects Versions: 4.0.0-beta-1 >Reporter: yongzhi.shao >Priority: Major > > We found that using HIVE4-BETA version, we could not query the > Iceberg-Parquet table with vectorised execution turned on. 
> {code:java} > --spark-sql(3.4.1+iceberg 1.4.2) > CREATE TABLE local.test.b_qqd_shop_rfm_parquet_snappy ( > a string,b string,c string) > USING iceberg > LOCATION '/iceberg-catalog/warehouse/test/b_qqd_shop_rfm_parquet_snappy' > TBLPROPERTIES ( > 'current-snapshot-id' = '5138351937447353683', > 'format' = 'iceberg/parquet', > 'format-version' = '2', > 'read.orc.vectorization.enabled' = 'true', > 'write.format.default' = 'parquet', > 'write.metadata.delete-after-commit.enabled' = 'true', > 'write.metadata.previous-versions-max' = '3', > 'write.parquet.compression-codec' = 'snappy'); > --hive-sql > CREATE EXTERNAL TABLE iceberg_dwd.b_qqd_shop_rfm_parquet_snappy > STORED BY 'org.apache.iceberg.mr.hive.HiveIcebergStorageHandler' > LOCATION > 'hdfs://xxx/iceberg-catalog/warehouse/test/b_qqd_shop_rfm_parquet_snappy/' > TBLPROPERTIES > ('iceberg.catalog'='location_based_table','engine.hive.enabled'='true'); > set hive.default.fileformat=orc; > set hive.default.fileformat.managed=orc; > create table test_parquet_as_orc as select * from > b_qqd_shop_rfm_parquet_snappy limit 100; > , TaskAttempt 2 failed, info=[Error: Node: /xxx..xx.xx: Error while > running task ( failure ) : > attempt_1696729618575_69586_1_00_00_2:java.lang.RuntimeException: > java.lang.RuntimeException: org.apache.hadoop.hive.ql.metadata.HiveException: > Hive Runtime Error while processing row > at > org.apache.hadoop.hive.ql.exec.tez.TezProcessor.initializeAndRunProcessor(TezProcessor.java:348) > at org.apache.hadoop.hive.ql.exec.tez.TezProcessor.run(TezProcessor.java:276) > at > org.apache.tez.runtime.LogicalIOProcessorRuntimeTask.run(LogicalIOProcessorRuntimeTask.java:381) > at > org.apache.tez.runtime.task.TaskRunner2Callable$1.run(TaskRunner2Callable.java:82) > at > org.apache.tez.runtime.task.TaskRunner2Callable$1.run(TaskRunner2Callable.java:69) > at java.security.AccessController.doPrivileged(Native Method) > at javax.security.auth.Subject.doAs(Subject.java:422) > at > 
org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1899) > at > org.apache.tez.runtime.task.TaskRunner2Callable.callInternal(TaskRunner2Callable.java:69) > at > org.apache.tez.runtime.task.TaskRunner2Callable.callInternal(TaskRunner2Callable.java:39) > at org.apache.tez.common.CallableWithNdc.call(CallableWithNdc.java:36) > at > com.google.common.util.concurrent.TrustedListenableFutureTask$TrustedFutureInterruptibleTask.runInterruptibly(TrustedListenableFutureTask.java:131) > at > com.google.common.util.concurrent.InterruptibleTask.run(InterruptibleTask.java:76) > at > com.google.common.util.concurrent.TrustedListenableFutureTask.run(TrustedListenableFutureTask.java:82) > at > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) > at > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) > at java.lang.Thread.run(Thread.java:750) > Caused by: java.lang.RuntimeException: > org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime Error while > processing row > at > org.apache.hadoop.hive.ql.exec.tez.MapRecordSource.processRow(MapRecordSource.java:110) > at > org.apache.hadoop.hive.ql.exec.tez.MapRecordSource.pushRecord(MapRecordSource.java:83) > at > org.apache.hadoop.hive.ql.exec.tez.MapRecordProcessor.run(MapRecordProcessor.java:414) > at > org.apache.hadoop.hive.ql.exec.tez.TezProcessor.initializeAndRunProcessor(TezProcessor.java:293) > ... 16 more > Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime > Error while processing row > at > org.apache.hadoop.hive.ql.exec.vector.VectorMapOperator.process(VectorMapOperator.java:993) > at > org.apache.hadoop.hive.ql.exec.tez.MapRecordSourc
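Since the report ties the failure to vectorized reading of the Iceberg-Parquet table, one quick diagnostic (an assumption on my part, not a confirmed workaround) is to retry the failing CTAS with vectorization disabled:

```sql
-- Diagnostic only: if this succeeds, the failure is specific to the
-- vectorized Parquet read path.
set hive.vectorized.execution.enabled=false;
set hive.default.fileformat=orc;
set hive.default.fileformat.managed=orc;
create table test_parquet_as_orc as
select * from b_qqd_shop_rfm_parquet_snappy limit 100;
```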
[jira] [Updated] (HIVE-27880) Iceberg: Support creating a branch on an empty table
[ https://issues.apache.org/jira/browse/HIVE-27880?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] zhangbutao updated HIVE-27880: -- Summary: Iceberg: Support creating a branch on an empty table (was: Iceberg: Supports creating a branch on an empty table) > Iceberg: Support creating a branch on an empty table > > > Key: HIVE-27880 > URL: https://issues.apache.org/jira/browse/HIVE-27880 > Project: Hive > Issue Type: Sub-task > Components: Iceberg integration >Reporter: zhangbutao >Assignee: zhangbutao >Priority: Major > Labels: pull-request-available > > After this Iceberg change [https://github.com/apache/iceberg/pull/8072], which > has been included in Iceberg 1.4, we can create a branch on an empty table. Users can > create an empty branch and then write data into the branch. -- This message was sent by Atlassian Jira (v8.20.10#820010)
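A sketch of the intended flow once this is supported (the table and branch names are illustrative, and the DDL/write syntax follows Hive's Iceberg branch support, so it may differ slightly by version):

```sql
-- Table starts empty, with no snapshots.
CREATE EXTERNAL TABLE ice_branch_demo (id INT)
STORED BY 'org.apache.iceberg.mr.hive.HiveIcebergStorageHandler';

-- With the Iceberg 1.4 change, this no longer requires an existing snapshot.
ALTER TABLE ice_branch_demo CREATE BRANCH test_branch;

-- Data can then be written into the empty branch.
INSERT INTO default.ice_branch_demo.branch_test_branch VALUES (1);
```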
[jira] [Updated] (HIVE-27880) Iceberg: Supports creating a branch on an empty table
[ https://issues.apache.org/jira/browse/HIVE-27880?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] zhangbutao updated HIVE-27880: -- Description: After this Iceberg change [https://github.com/apache/iceberg/pull/8072], which has been included in Iceberg 1.4, we can create a branch on an empty table. Users can create an empty branch, and then write data into the branch. (was: After this Iceberg change [https://github.com/apache/iceberg/pull/8072] which has been inclued iceberg1.4, we can create a branch on an empty. Use can create an empty branch, and then write data into the branch.) > Iceberg: Supports creating a branch on an empty table > - > > Key: HIVE-27880 > URL: https://issues.apache.org/jira/browse/HIVE-27880 > Project: Hive > Issue Type: Sub-task > Components: Iceberg integration >Reporter: zhangbutao >Assignee: zhangbutao >Priority: Major > > After this Iceberg change [https://github.com/apache/iceberg/pull/8072], which > has been included in Iceberg 1.4, we can create a branch on an empty table. Users can > create an empty branch, and then write data into the branch. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Updated] (HIVE-27880) Iceberg: Supports creating a branch on an empty table
[ https://issues.apache.org/jira/browse/HIVE-27880?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] zhangbutao updated HIVE-27880: -- Description: After this Iceberg change [https://github.com/apache/iceberg/pull/8072], which has been included in Iceberg 1.4, we can create a branch on an empty table. Users can create an empty branch, and then write data into the branch. > Iceberg: Supports creating a branch on an empty table > - > > Key: HIVE-27880 > URL: https://issues.apache.org/jira/browse/HIVE-27880 > Project: Hive > Issue Type: Sub-task > Components: Iceberg integration >Reporter: zhangbutao >Assignee: zhangbutao >Priority: Major > > After this Iceberg change [https://github.com/apache/iceberg/pull/8072], which > has been included in Iceberg 1.4, we can create a branch on an empty table. Users can > create an empty branch, and then write data into the branch. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Assigned] (HIVE-27880) Iceberg: Supports creating a branch on an empty table
[ https://issues.apache.org/jira/browse/HIVE-27880?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] zhangbutao reassigned HIVE-27880: - Assignee: zhangbutao > Iceberg: Supports creating a branch on an empty table > - > > Key: HIVE-27880 > URL: https://issues.apache.org/jira/browse/HIVE-27880 > Project: Hive > Issue Type: Sub-task > Components: Iceberg integration >Reporter: zhangbutao >Assignee: zhangbutao >Priority: Major > -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Created] (HIVE-27880) Iceberg: Supports creating a branch on an empty table
zhangbutao created HIVE-27880: - Summary: Iceberg: Supports creating a branch on an empty table Key: HIVE-27880 URL: https://issues.apache.org/jira/browse/HIVE-27880 Project: Hive Issue Type: Sub-task Components: Iceberg integration Reporter: zhangbutao -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Updated] (HIVE-27869) Iceberg: Select HadoopTables will fail at HiveIcebergStorageHandler::canProvideColStats
[ https://issues.apache.org/jira/browse/HIVE-27869?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] zhangbutao updated HIVE-27869: -- Description: Step to reproduce:(latest master code) 1) Create path-based HadoopTable by Spark: {code:java} ./spark-3.3.1-bin-hadoop3/bin/spark-sql \--master local \--deploy-mode client \--conf spark.sql.extensions=org.apache.iceberg.spark.extensions.IcebergSparkSessionExtensions \--conf spark.sql.catalog.spark_catalog=org.apache.iceberg.spark.SparkSessionCatalog \--conf spark.sql.catalog.spark_catalog.type=hadoop \--conf spark.sql.catalog.spark_catalog.warehouse=hdfs://localhost:8028/tmp/testiceberg; create table ice_test_001(id int) using iceberg; insert into ice_test_001(id) values(1),(2),(3);{code} 2) Create iceberg table based on the HadoopTable by Hive: {code:java} CREATE EXTERNAL TABLE ice_test_001 STORED BY 'org.apache.iceberg.mr.hive.HiveIcebergStorageHandler' LOCATION 'hdfs://localhost:8028/tmp/testiceberg/default/ice_test_001' TBLPROPERTIES ('iceberg.catalog'='location_based_table'); {code} 3)Select the HadoopTable by Hive // launch tez task to scan data *set hive.fetch.task.conversion=none;* {code:java} jdbc:hive2://localhost:1/default> select * from ice_test_001; Error: Error while compiling statement: FAILED: IllegalArgumentException Pathname /tmp/testiceberg/default/ice_test_001/stats/hdfs:/localhost:8028/tmp/testiceberg/default/ice_test_0018020750642632422610 from hdfs://localhost:8028/tmp/testiceberg/default/ice_test_001/stats/hdfs:/localhost:8028/tmp/testiceberg/default/ice_test_0018020750642632422610 is not a valid DFS filename. 
(state=42000,code=4) {code} Full stacktrace: {code:java} Caused by: java.lang.IllegalArgumentException: Pathname /tmp/testiceberg/default/ice_test_001/stats/hdfs:/localhost:8028/tmp/testiceberg/default/ice_test_0018020750642632422610 from hdfs://localhost:8028/tmp/testiceberg/default/ice_test_001/stats/hdfs:/localhost:8028/tmp/testiceberg/default/ice_test_0018020750642632422610 is not a valid DFS filename. at org.apache.hadoop.hdfs.DistributedFileSystem.getPathName(DistributedFileSystem.java:256) ~[hadoop-hdfs-client-3.3.1.jar:?] at org.apache.hadoop.hdfs.DistributedFileSystem$29.doCall(DistributedFileSystem.java:1752) ~[hadoop-hdfs-client-3.3.1.jar:?] at org.apache.hadoop.hdfs.DistributedFileSystem$29.doCall(DistributedFileSystem.java:1749) ~[hadoop-hdfs-client-3.3.1.jar:?] at org.apache.hadoop.fs.FileSystemLinkResolver.resolve(FileSystemLinkResolver.java:81) ~[hadoop-common-3.3.1.jar:?] at org.apache.hadoop.hdfs.DistributedFileSystem.getFileStatus(DistributedFileSystem.java:1764) ~[hadoop-hdfs-client-3.3.1.jar:?] at org.apache.hadoop.fs.FileSystem.exists(FileSystem.java:1760) ~[hadoop-common-3.3.1.jar:?] 
at org.apache.iceberg.mr.hive.HiveIcebergStorageHandler.canProvideColStats(HiveIcebergStorageHandler.java:540) ~[hive-iceberg-handler-4.0.0-beta-2-SNAPSHOT.jar:4.0.0-beta-2-SNAPSHOT] at org.apache.iceberg.mr.hive.HiveIcebergStorageHandler.canProvideColStatistics(HiveIcebergStorageHandler.java:533) ~[hive-iceberg-handler-4.0.0-beta-2-SNAPSHOT.jar:4.0.0-beta-2-SNAPSHOT] at org.apache.hadoop.hive.ql.stats.StatsUtils.getTableColumnStats(StatsUtils.java:1073) ~[hive-exec-4.0.0-beta-2-SNAPSHOT.jar:4.0.0-beta-2-SNAPSHOT] at org.apache.hadoop.hive.ql.stats.StatsUtils.collectStatistics(StatsUtils.java:302) ~[hive-exec-4.0.0-beta-2-SNAPSHOT.jar:4.0.0-beta-2-SNAPSHOT] at org.apache.hadoop.hive.ql.stats.StatsUtils.collectStatistics(StatsUtils.java:193) ~[hive-exec-4.0.0-beta-2-SNAPSHOT.jar:4.0.0-beta-2-SNAPSHOT] at org.apache.hadoop.hive.ql.stats.StatsUtils.collectStatistics(StatsUtils.java:181) ~[hive-exec-4.0.0-beta-2-SNAPSHOT.jar:4.0.0-beta-2-SNAPSHOT] at org.apache.hadoop.hive.ql.optimizer.stats.annotation.StatsRulesProcFactory$TableScanStatsRule.process(StatsRulesProcFactory.java:173) ~[hive-exec-4.0.0-beta-2-SNAPSHOT.jar:4.0.0-beta-2-SNAPSHOT] at org.apache.hadoop.hive.ql.lib.DefaultRuleDispatcher.dispatch(DefaultRuleDispatcher.java:90) ~[hive-exec-4.0.0-beta-2-SNAPSHOT.jar:4.0.0-beta-2-SNAPSHOT] at org.apache.hadoop.hive.ql.lib.DefaultGraphWalker.dispatchAndReturn(DefaultGraphWalker.java:105) ~[hive-exec-4.0.0-beta-2-SNAPSHOT.jar:4.0.0-beta-2-SNAPSHOT] at org.apache.hadoop.hive.ql.lib.DefaultGraphWalker.dispatch(DefaultGraphWalker.java:89) ~[hive-exec-4.0.0-beta-2-SNAPSHOT.jar:4.0.0-beta-2-SNAPSHOT] at org.apache.hadoop.hive.ql.lib.LevelOrderWalker.walk(LevelOrderWalker.java:148) ~[hive-exec-4.0.0-beta-2-SNAPSHOT.jar:4.0.0-beta-2-SNAPSHOT] at org.apache.hadoop.hive.ql.lib.LevelOrderWalker.startWalking(LevelOrderWalker.java:125) ~[hive-exec-4.0.0-beta-2-SNAPSHOT.jar:4.0.0-beta-2-SNAPSHOT] at org.apache.hadoop.hive.ql.optimizer.stats.annotation.AnnotateWithSt
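The doubled scheme in the reported filename suggests a fully qualified table URI was appended as a relative child of the stats directory; a rough illustration of the symptom (plain string handling, not Hive's actual code):

```python
# Illustration only: joining a fully qualified child URI under the stats
# directory, with the "//" after the scheme collapsed during normalization,
# reproduces the "stats/hdfs:/..." shape from the stack trace.
base = "hdfs://localhost:8028/tmp/testiceberg/default/ice_test_001/stats"
child = "hdfs://localhost:8028/tmp/testiceberg/default/ice_test_001"

joined = base + "/" + child.replace("://", ":/")
print(joined)
# The result embeds "stats/hdfs:/localhost:8028/...", which HDFS rejects
# as "not a valid DFS filename".
```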
[jira] [Updated] (HIVE-27869) Iceberg: Select HadoopTables will fail at HiveIcebergStorageHandler::canProvideColStats
[ https://issues.apache.org/jira/browse/HIVE-27869?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] zhangbutao updated HIVE-27869: -- Description: Step to reproduce: 1) Create path-based HadoopTable by Spark: {code:java} ./spark-3.3.1-bin-hadoop3/bin/spark-sql \--master local \--deploy-mode client \--conf spark.sql.extensions=org.apache.iceberg.spark.extensions.IcebergSparkSessionExtensions \--conf spark.sql.catalog.spark_catalog=org.apache.iceberg.spark.SparkSessionCatalog \--conf spark.sql.catalog.spark_catalog.type=hadoop \--conf spark.sql.catalog.spark_catalog.warehouse=hdfs://localhost:8028/tmp/testiceberg; create table ice_test_001(id int) using iceberg; insert into ice_test_001(id) values(1),(2),(3);{code} 2) Create iceberg table based on the HadoopTable by Hive: {code:java} CREATE EXTERNAL TABLE ice_test_001 STORED BY 'org.apache.iceberg.mr.hive.HiveIcebergStorageHandler' LOCATION 'hdfs://localhost:8028/tmp/testiceberg/default/ice_test_001' TBLPROPERTIES ('iceberg.catalog'='location_based_table'); {code} 3)Select the HadoopTable by Hive // launch tez task to scan data *set hive.fetch.task.conversion=none;* {code:java} jdbc:hive2://localhost:1/default> select * from ice_test_001; Error: Error while compiling statement: FAILED: IllegalArgumentException Pathname /tmp/testiceberg/default/ice_test_001/stats/hdfs:/localhost:8028/tmp/testiceberg/default/ice_test_0018020750642632422610 from hdfs://localhost:8028/tmp/testiceberg/default/ice_test_001/stats/hdfs:/localhost:8028/tmp/testiceberg/default/ice_test_0018020750642632422610 is not a valid DFS filename. 
(state=42000,code=4) {code} Full stacktrace: {code:java} Caused by: java.lang.IllegalArgumentException: Pathname /tmp/testiceberg/default/ice_test_001/stats/hdfs:/localhost:8028/tmp/testiceberg/default/ice_test_0018020750642632422610 from hdfs://localhost:8028/tmp/testiceberg/default/ice_test_001/stats/hdfs:/localhost:8028/tmp/testiceberg/default/ice_test_0018020750642632422610 is not a valid DFS filename. at org.apache.hadoop.hdfs.DistributedFileSystem.getPathName(DistributedFileSystem.java:256) ~[hadoop-hdfs-client-3.3.1.jar:?] at org.apache.hadoop.hdfs.DistributedFileSystem$29.doCall(DistributedFileSystem.java:1752) ~[hadoop-hdfs-client-3.3.1.jar:?] at org.apache.hadoop.hdfs.DistributedFileSystem$29.doCall(DistributedFileSystem.java:1749) ~[hadoop-hdfs-client-3.3.1.jar:?] at org.apache.hadoop.fs.FileSystemLinkResolver.resolve(FileSystemLinkResolver.java:81) ~[hadoop-common-3.3.1.jar:?] at org.apache.hadoop.hdfs.DistributedFileSystem.getFileStatus(DistributedFileSystem.java:1764) ~[hadoop-hdfs-client-3.3.1.jar:?] at org.apache.hadoop.fs.FileSystem.exists(FileSystem.java:1760) ~[hadoop-common-3.3.1.jar:?] 
at org.apache.iceberg.mr.hive.HiveIcebergStorageHandler.canProvideColStats(HiveIcebergStorageHandler.java:540) ~[hive-iceberg-handler-4.0.0-beta-2-SNAPSHOT.jar:4.0.0-beta-2-SNAPSHOT] at org.apache.iceberg.mr.hive.HiveIcebergStorageHandler.canProvideColStatistics(HiveIcebergStorageHandler.java:533) ~[hive-iceberg-handler-4.0.0-beta-2-SNAPSHOT.jar:4.0.0-beta-2-SNAPSHOT] at org.apache.hadoop.hive.ql.stats.StatsUtils.getTableColumnStats(StatsUtils.java:1073) ~[hive-exec-4.0.0-beta-2-SNAPSHOT.jar:4.0.0-beta-2-SNAPSHOT] at org.apache.hadoop.hive.ql.stats.StatsUtils.collectStatistics(StatsUtils.java:302) ~[hive-exec-4.0.0-beta-2-SNAPSHOT.jar:4.0.0-beta-2-SNAPSHOT] at org.apache.hadoop.hive.ql.stats.StatsUtils.collectStatistics(StatsUtils.java:193) ~[hive-exec-4.0.0-beta-2-SNAPSHOT.jar:4.0.0-beta-2-SNAPSHOT] at org.apache.hadoop.hive.ql.stats.StatsUtils.collectStatistics(StatsUtils.java:181) ~[hive-exec-4.0.0-beta-2-SNAPSHOT.jar:4.0.0-beta-2-SNAPSHOT] at org.apache.hadoop.hive.ql.optimizer.stats.annotation.StatsRulesProcFactory$TableScanStatsRule.process(StatsRulesProcFactory.java:173) ~[hive-exec-4.0.0-beta-2-SNAPSHOT.jar:4.0.0-beta-2-SNAPSHOT] at org.apache.hadoop.hive.ql.lib.DefaultRuleDispatcher.dispatch(DefaultRuleDispatcher.java:90) ~[hive-exec-4.0.0-beta-2-SNAPSHOT.jar:4.0.0-beta-2-SNAPSHOT] at org.apache.hadoop.hive.ql.lib.DefaultGraphWalker.dispatchAndReturn(DefaultGraphWalker.java:105) ~[hive-exec-4.0.0-beta-2-SNAPSHOT.jar:4.0.0-beta-2-SNAPSHOT] at org.apache.hadoop.hive.ql.lib.DefaultGraphWalker.dispatch(DefaultGraphWalker.java:89) ~[hive-exec-4.0.0-beta-2-SNAPSHOT.jar:4.0.0-beta-2-SNAPSHOT] at org.apache.hadoop.hive.ql.lib.LevelOrderWalker.walk(LevelOrderWalker.java:148) ~[hive-exec-4.0.0-beta-2-SNAPSHOT.jar:4.0.0-beta-2-SNAPSHOT] at org.apache.hadoop.hive.ql.lib.LevelOrderWalker.startWalking(LevelOrderWalker.java:125) ~[hive-exec-4.0.0-beta-2-SNAPSHOT.jar:4.0.0-beta-2-SNAPSHOT] at 
org.apache.hadoop.hive.ql.optimizer.stats.annotation.AnnotateWithStatistics.transform(A
[jira] [Updated] (HIVE-27869) Iceberg: Select HadoopTables will fail at HiveIcebergStorageHandler::canProvideColStats
[ https://issues.apache.org/jira/browse/HIVE-27869?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] zhangbutao updated HIVE-27869: -- Description: Step to reproduce: 1) Create path-based HadoopTable by Spark: {code:java} ./spark-3.3.1-bin-hadoop3/bin/spark-sql \--master local \--deploy-mode client \--conf spark.sql.extensions=org.apache.iceberg.spark.extensions.IcebergSparkSessionExtensions \--conf spark.sql.catalog.spark_catalog=org.apache.iceberg.spark.SparkSessionCatalog \--conf spark.sql.catalog.spark_catalog.type=hadoop \--conf spark.sql.catalog.spark_catalog.warehouse=hdfs://localhost:8028/tmp/testiceberg; create table ice_test_001(id int) using iceberg; insert into ice_test_001(id) values(1),(2),(3);{code} 2) Create iceberg table based on the HadoopTable by Hive: {code:java} CREATE EXTERNAL TABLE ice_test_001 STORED BY 'org.apache.iceberg.mr.hive.HiveIcebergStorageHandler' LOCATION 'hdfs://localhost:8028/tmp/testiceberg/default/ice_test_001' TBLPROPERTIES ('iceberg.catalog'='location_based_table'); {code} 3)Select the HadoopTable by Hive // launch tez task to scan data *set hive.fetch.task.conversion=none;* {code:java} jdbc:hive2://localhost:10004/default> select * from ice_test_001; Error: Error while compiling statement: FAILED: IllegalArgumentException Pathname /tmp/testiceberg/default/ice_test_001/stats/hdfs:/localhost:8028/tmp/testiceberg/default/ice_test_0018020750642632422610 from hdfs://localhost:8028/tmp/testiceberg/default/ice_test_001/stats/hdfs:/localhost:8028/tmp/testiceberg/default/ice_test_0018020750642632422610 is not a valid DFS filename. 
(state=42000,code=4) {code} Full stacktrace: {code:java} Caused by: java.lang.IllegalArgumentException: Pathname /tmp/testiceberg/default/ice_test_001/stats/hdfs:/localhost:8028/tmp/testiceberg/default/ice_test_0018020750642632422610 from hdfs://localhost:8028/tmp/testiceberg/default/ice_test_001/stats/hdfs:/localhost:8028/tmp/testiceberg/default/ice_test_0018020750642632422610 is not a valid DFS filename. at org.apache.hadoop.hdfs.DistributedFileSystem.getPathName(DistributedFileSystem.java:256) ~[hadoop-hdfs-client-3.3.1.jar:?] at org.apache.hadoop.hdfs.DistributedFileSystem$29.doCall(DistributedFileSystem.java:1752) ~[hadoop-hdfs-client-3.3.1.jar:?] at org.apache.hadoop.hdfs.DistributedFileSystem$29.doCall(DistributedFileSystem.java:1749) ~[hadoop-hdfs-client-3.3.1.jar:?] at org.apache.hadoop.fs.FileSystemLinkResolver.resolve(FileSystemLinkResolver.java:81) ~[hadoop-common-3.3.1.jar:?] at org.apache.hadoop.hdfs.DistributedFileSystem.getFileStatus(DistributedFileSystem.java:1764) ~[hadoop-hdfs-client-3.3.1.jar:?] at org.apache.hadoop.fs.FileSystem.exists(FileSystem.java:1760) ~[hadoop-common-3.3.1.jar:?] 
at org.apache.iceberg.mr.hive.HiveIcebergStorageHandler.canProvideColStats(HiveIcebergStorageHandler.java:540) ~[hive-iceberg-handler-4.0.0-beta-2-SNAPSHOT.jar:4.0.0-beta-2-SNAPSHOT] at org.apache.iceberg.mr.hive.HiveIcebergStorageHandler.canProvideColStatistics(HiveIcebergStorageHandler.java:533) ~[hive-iceberg-handler-4.0.0-beta-2-SNAPSHOT.jar:4.0.0-beta-2-SNAPSHOT] at org.apache.hadoop.hive.ql.stats.StatsUtils.getTableColumnStats(StatsUtils.java:1073) ~[hive-exec-4.0.0-beta-2-SNAPSHOT.jar:4.0.0-beta-2-SNAPSHOT] at org.apache.hadoop.hive.ql.stats.StatsUtils.collectStatistics(StatsUtils.java:302) ~[hive-exec-4.0.0-beta-2-SNAPSHOT.jar:4.0.0-beta-2-SNAPSHOT] at org.apache.hadoop.hive.ql.stats.StatsUtils.collectStatistics(StatsUtils.java:193) ~[hive-exec-4.0.0-beta-2-SNAPSHOT.jar:4.0.0-beta-2-SNAPSHOT] at org.apache.hadoop.hive.ql.stats.StatsUtils.collectStatistics(StatsUtils.java:181) ~[hive-exec-4.0.0-beta-2-SNAPSHOT.jar:4.0.0-beta-2-SNAPSHOT] at org.apache.hadoop.hive.ql.optimizer.stats.annotation.StatsRulesProcFactory$TableScanStatsRule.process(StatsRulesProcFactory.java:173) ~[hive-exec-4.0.0-beta-2-SNAPSHOT.jar:4.0.0-beta-2-SNAPSHOT] at org.apache.hadoop.hive.ql.lib.DefaultRuleDispatcher.dispatch(DefaultRuleDispatcher.java:90) ~[hive-exec-4.0.0-beta-2-SNAPSHOT.jar:4.0.0-beta-2-SNAPSHOT] at org.apache.hadoop.hive.ql.lib.DefaultGraphWalker.dispatchAndReturn(DefaultGraphWalker.java:105) ~[hive-exec-4.0.0-beta-2-SNAPSHOT.jar:4.0.0-beta-2-SNAPSHOT] at org.apache.hadoop.hive.ql.lib.DefaultGraphWalker.dispatch(DefaultGraphWalker.java:89) ~[hive-exec-4.0.0-beta-2-SNAPSHOT.jar:4.0.0-beta-2-SNAPSHOT] at org.apache.hadoop.hive.ql.lib.LevelOrderWalker.walk(LevelOrderWalker.java:148) ~[hive-exec-4.0.0-beta-2-SNAPSHOT.jar:4.0.0-beta-2-SNAPSHOT] at org.apache.hadoop.hive.ql.lib.LevelOrderWalker.startWalking(LevelOrderWalker.java:125) ~[hive-exec-4.0.0-beta-2-SNAPSHOT.jar:4.0.0-beta-2-SNAPSHOT] at 
org.apache.hadoop.hive.ql.optimizer.stats.annotation.AnnotateWithStatistics.transform(A
[jira] [Updated] (HIVE-27869) Iceberg: Select HadoopTables will fail at HiveIcebergStorageHandler::canProvideColStats
[ https://issues.apache.org/jira/browse/HIVE-27869?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] zhangbutao updated HIVE-27869: -- Description: Steps to reproduce: 1) Create path-based HadoopTable by Spark: {code:java} ./spark-3.3.1-bin-hadoop3/bin/spark-sql \--master local \--deploy-mode client \--conf spark.sql.extensions=org.apache.iceberg.spark.extensions.IcebergSparkSessionExtensions \--conf spark.sql.catalog.spark_catalog=org.apache.iceberg.spark.SparkSessionCatalog \--conf spark.sql.catalog.spark_catalog.type=hadoop \--conf spark.sql.catalog.spark_catalog.warehouse=hdfs://localhost:8028/tmp/testiceberg; create table ice_test_001(id int) using iceberg; insert into ice_test_001(id) values(1),(2),(3);{code} 2) Create iceberg table based on the HadoopTable by Hive: {code:java} CREATE EXTERNAL TABLE ice_test_001 STORED BY 'org.apache.iceberg.mr.hive.HiveIcebergStorageHandler' LOCATION 'hdfs://localhost:8028/tmp/testiceberg/default/ice_test_001' TBLPROPERTIES ('iceberg.catalog'='location_based_table'); {code} 3) Select the HadoopTable by Hive *set hive.fetch.task.conversion=none;* {code:java} jdbc:hive2://localhost:10004/default> select * from ice_test_001; Error: Error while compiling statement: FAILED: IllegalArgumentException Pathname /tmp/testiceberg/default/ice_test_001/stats/hdfs:/localhost:8028/tmp/testiceberg/default/ice_test_0018020750642632422610 from hdfs://localhost:8028/tmp/testiceberg/default/ice_test_001/stats/hdfs:/localhost:8028/tmp/testiceberg/default/ice_test_0018020750642632422610 is not a valid DFS filename. (state=42000,code=4) {code} Full stacktrace: {code:java} Caused by: java.lang.IllegalArgumentException: Pathname /tmp/testiceberg/default/ice_test_001/stats/hdfs:/localhost:8028/tmp/testiceberg/default/ice_test_0018020750642632422610 from hdfs://localhost:8028/tmp/testiceberg/default/ice_test_001/stats/hdfs:/localhost:8028/tmp/testiceberg/default/ice_test_0018020750642632422610 is not a valid DFS filename. 
at org.apache.hadoop.hdfs.DistributedFileSystem.getPathName(DistributedFileSystem.java:256) ~[hadoop-hdfs-client-3.3.1.jar:?] at org.apache.hadoop.hdfs.DistributedFileSystem$29.doCall(DistributedFileSystem.java:1752) ~[hadoop-hdfs-client-3.3.1.jar:?] at org.apache.hadoop.hdfs.DistributedFileSystem$29.doCall(DistributedFileSystem.java:1749) ~[hadoop-hdfs-client-3.3.1.jar:?] at org.apache.hadoop.fs.FileSystemLinkResolver.resolve(FileSystemLinkResolver.java:81) ~[hadoop-common-3.3.1.jar:?] at org.apache.hadoop.hdfs.DistributedFileSystem.getFileStatus(DistributedFileSystem.java:1764) ~[hadoop-hdfs-client-3.3.1.jar:?] at org.apache.hadoop.fs.FileSystem.exists(FileSystem.java:1760) ~[hadoop-common-3.3.1.jar:?] at org.apache.iceberg.mr.hive.HiveIcebergStorageHandler.canProvideColStats(HiveIcebergStorageHandler.java:540) ~[hive-iceberg-handler-4.0.0-beta-2-SNAPSHOT.jar:4.0.0-beta-2-SNAPSHOT] at org.apache.iceberg.mr.hive.HiveIcebergStorageHandler.canProvideColStatistics(HiveIcebergStorageHandler.java:533) ~[hive-iceberg-handler-4.0.0-beta-2-SNAPSHOT.jar:4.0.0-beta-2-SNAPSHOT] at org.apache.hadoop.hive.ql.stats.StatsUtils.getTableColumnStats(StatsUtils.java:1073) ~[hive-exec-4.0.0-beta-2-SNAPSHOT.jar:4.0.0-beta-2-SNAPSHOT] at org.apache.hadoop.hive.ql.stats.StatsUtils.collectStatistics(StatsUtils.java:302) ~[hive-exec-4.0.0-beta-2-SNAPSHOT.jar:4.0.0-beta-2-SNAPSHOT] at org.apache.hadoop.hive.ql.stats.StatsUtils.collectStatistics(StatsUtils.java:193) ~[hive-exec-4.0.0-beta-2-SNAPSHOT.jar:4.0.0-beta-2-SNAPSHOT] at org.apache.hadoop.hive.ql.stats.StatsUtils.collectStatistics(StatsUtils.java:181) ~[hive-exec-4.0.0-beta-2-SNAPSHOT.jar:4.0.0-beta-2-SNAPSHOT] at org.apache.hadoop.hive.ql.optimizer.stats.annotation.StatsRulesProcFactory$TableScanStatsRule.process(StatsRulesProcFactory.java:173) ~[hive-exec-4.0.0-beta-2-SNAPSHOT.jar:4.0.0-beta-2-SNAPSHOT] at org.apache.hadoop.hive.ql.lib.DefaultRuleDispatcher.dispatch(DefaultRuleDispatcher.java:90) 
~[hive-exec-4.0.0-beta-2-SNAPSHOT.jar:4.0.0-beta-2-SNAPSHOT] at org.apache.hadoop.hive.ql.lib.DefaultGraphWalker.dispatchAndReturn(DefaultGraphWalker.java:105) ~[hive-exec-4.0.0-beta-2-SNAPSHOT.jar:4.0.0-beta-2-SNAPSHOT] at org.apache.hadoop.hive.ql.lib.DefaultGraphWalker.dispatch(DefaultGraphWalker.java:89) ~[hive-exec-4.0.0-beta-2-SNAPSHOT.jar:4.0.0-beta-2-SNAPSHOT] at org.apache.hadoop.hive.ql.lib.LevelOrderWalker.walk(LevelOrderWalker.java:148) ~[hive-exec-4.0.0-beta-2-SNAPSHOT.jar:4.0.0-beta-2-SNAPSHOT] at org.apache.hadoop.hive.ql.lib.LevelOrderWalker.startWalking(LevelOrderWalker.java:125) ~[hive-exec-4.0.0-beta-2-SNAPSHOT.jar:4.0.0-beta-2-SNAPSHOT] at org.apache.hadoop.hive.ql.optimizer.stats.annotation.AnnotateWithStatistics.transform(AnnotateWithStatistics.java:84) ~[h
[jira] [Assigned] (HIVE-27869) Iceberg: Select HadoopTables will fail at HiveIcebergStorageHandler::canProvideColStats
[ https://issues.apache.org/jira/browse/HIVE-27869?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] zhangbutao reassigned HIVE-27869: - Assignee: zhangbutao > Iceberg: Select HadoopTables will fail at > HiveIcebergStorageHandler::canProvideColStats > > > Key: HIVE-27869 > URL: https://issues.apache.org/jira/browse/HIVE-27869 > Project: Hive > Issue Type: Improvement > Components: Iceberg integration >Reporter: zhangbutao >Assignee: zhangbutao >Priority: Major > > Steps to reproduce: > 1) Create a path-based HadoopTable with Spark: > > {code:java} > ./spark-3.3.1-bin-hadoop3/bin/spark-sql --master local --deploy-mode client > --conf spark.sql.extensions=org.apache.iceberg.spark.extensions.IcebergSparkSessionExtensions > --conf spark.sql.catalog.spark_catalog=org.apache.iceberg.spark.SparkSessionCatalog > --conf spark.sql.catalog.spark_catalog.type=hadoop > --conf spark.sql.catalog.spark_catalog.warehouse=hdfs://localhost:8028/tmp/testiceberg; > create table ice_test_001(id int) using iceberg; > insert into ice_test_001(id) values (1),(2),(3);{code} > > 2) Create an Iceberg table based on the HadoopTable in Hive: > {code:java} > CREATE EXTERNAL TABLE ice_test_001 STORED BY > 'org.apache.iceberg.mr.hive.HiveIcebergStorageHandler' LOCATION > 'hdfs://localhost:8028/tmp/testiceberg/default/ice_test_001' TBLPROPERTIES > ('iceberg.catalog'='location_based_table'); {code} > 3) Select from the HadoopTable in Hive: > *set hive.fetch.task.conversion=none;* > {code:java} > jdbc:hive2://localhost:10004/default> select * from testicedb118.ice_test_001; > Error: Error while compiling statement: FAILED: IllegalArgumentException > Pathname > /tmp/testiceberg/default/ice_test_001/stats/hdfs:/localhost:8028/tmp/testiceberg/default/ice_test_0018020750642632422610 > from > hdfs://localhost:8028/tmp/testiceberg/default/ice_test_001/stats/hdfs:/localhost:8028/tmp/testiceberg/default/ice_test_0018020750642632422610 > is not a valid DFS filename. 
(state=42000,code=4) {code} > Full stacktrace: > {code:java} > Caused by: java.lang.IllegalArgumentException: Pathname > /tmp/testiceberg/default/ice_test_001/stats/hdfs:/localhost:8028/tmp/testiceberg/default/ice_test_0018020750642632422610 > from > hdfs://localhost:8028/tmp/testiceberg/default/ice_test_001/stats/hdfs:/localhost:8028/tmp/testiceberg/default/ice_test_0018020750642632422610 > is not a valid DFS filename. > at > org.apache.hadoop.hdfs.DistributedFileSystem.getPathName(DistributedFileSystem.java:256) > ~[hadoop-hdfs-client-3.3.1.jar:?] > at > org.apache.hadoop.hdfs.DistributedFileSystem$29.doCall(DistributedFileSystem.java:1752) > ~[hadoop-hdfs-client-3.3.1.jar:?] > at > org.apache.hadoop.hdfs.DistributedFileSystem$29.doCall(DistributedFileSystem.java:1749) > ~[hadoop-hdfs-client-3.3.1.jar:?] > at > org.apache.hadoop.fs.FileSystemLinkResolver.resolve(FileSystemLinkResolver.java:81) > ~[hadoop-common-3.3.1.jar:?] > at > org.apache.hadoop.hdfs.DistributedFileSystem.getFileStatus(DistributedFileSystem.java:1764) > ~[hadoop-hdfs-client-3.3.1.jar:?] > at org.apache.hadoop.fs.FileSystem.exists(FileSystem.java:1760) > ~[hadoop-common-3.3.1.jar:?] 
> at > org.apache.iceberg.mr.hive.HiveIcebergStorageHandler.canProvideColStats(HiveIcebergStorageHandler.java:540) > ~[hive-iceberg-handler-4.0.0-beta-2-SNAPSHOT.jar:4.0.0-beta-2-SNAPSHOT] > at > org.apache.iceberg.mr.hive.HiveIcebergStorageHandler.canProvideColStatistics(HiveIcebergStorageHandler.java:533) > ~[hive-iceberg-handler-4.0.0-beta-2-SNAPSHOT.jar:4.0.0-beta-2-SNAPSHOT] > at > org.apache.hadoop.hive.ql.stats.StatsUtils.getTableColumnStats(StatsUtils.java:1073) > ~[hive-exec-4.0.0-beta-2-SNAPSHOT.jar:4.0.0-beta-2-SNAPSHOT] > at > org.apache.hadoop.hive.ql.stats.StatsUtils.collectStatistics(StatsUtils.java:302) > ~[hive-exec-4.0.0-beta-2-SNAPSHOT.jar:4.0.0-beta-2-SNAPSHOT] > at > org.apache.hadoop.hive.ql.stats.StatsUtils.collectStatistics(StatsUtils.java:193) > ~[hive-exec-4.0.0-beta-2-SNAPSHOT.jar:4.0.0-beta-2-SNAPSHOT] > at > org.apache.hadoop.hive.ql.stats.StatsUtils.collectStatistics(StatsUtils.java:181) > ~[hive-exec-4.0.0-beta-2-SNAPSHOT.jar:4.0.0-beta-2-SNAPSHOT] > at > org.apache.hadoop.hive.ql.optimizer.stats.annotation.StatsRulesProcFactory$TableScanStatsRule.process(StatsRulesProcFactory.java:173) > ~[hive-exec-4.0.0-beta-2-SNAPSHOT.jar:4.0.0-beta-2-SNAPSHOT] > at > org.apache.hadoop.hive.ql.lib.DefaultRuleDispatcher.dispatch(DefaultRuleDispatcher.java:90) > ~[hive-exec-4.0.0-beta-2-SNAPSHOT.jar:4.0.0-beta-2-SNAPSHOT] > at > org.apache.hadoop.hive.ql.lib.DefaultGraphWalker.dispatchAndReturn(DefaultGraphWalke
[jira] [Updated] (HIVE-27869) Iceberg: Select HadoopTables will fail at HiveIcebergStorageHandler::canProvideColStats
[ https://issues.apache.org/jira/browse/HIVE-27869?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] zhangbutao updated HIVE-27869: -- Description: Steps to reproduce: 1) Create a path-based HadoopTable with Spark: {code:java} ./spark-3.3.1-bin-hadoop3/bin/spark-sql --master local --deploy-mode client --conf spark.sql.extensions=org.apache.iceberg.spark.extensions.IcebergSparkSessionExtensions --conf spark.sql.catalog.spark_catalog=org.apache.iceberg.spark.SparkSessionCatalog --conf spark.sql.catalog.spark_catalog.type=hadoop --conf spark.sql.catalog.spark_catalog.warehouse=hdfs://localhost:8028/tmp/testiceberg; create table ice_test_001(id int) using iceberg; insert into ice_test_001(id) values (1),(2),(3);{code} 2) Create an Iceberg table based on the HadoopTable in Hive: {code:java} CREATE EXTERNAL TABLE ice_test_001 STORED BY 'org.apache.iceberg.mr.hive.HiveIcebergStorageHandler' LOCATION 'hdfs://localhost:8028/tmp/testiceberg/default/ice_test_001' TBLPROPERTIES ('iceberg.catalog'='location_based_table'); {code} 3) Select from the HadoopTable in Hive: *set hive.fetch.task.conversion=none;* {code:java} jdbc:hive2://localhost:10004/default> select * from testicedb118.ice_test_001; Error: Error while compiling statement: FAILED: IllegalArgumentException Pathname /tmp/testiceberg/default/ice_test_001/stats/hdfs:/localhost:8028/tmp/testiceberg/default/ice_test_0018020750642632422610 from hdfs://localhost:8028/tmp/testiceberg/default/ice_test_001/stats/hdfs:/localhost:8028/tmp/testiceberg/default/ice_test_0018020750642632422610 is not a valid DFS filename. 
at org.apache.hadoop.hdfs.DistributedFileSystem.getPathName(DistributedFileSystem.java:256) ~[hadoop-hdfs-client-3.3.1.jar:?] at org.apache.hadoop.hdfs.DistributedFileSystem$29.doCall(DistributedFileSystem.java:1752) ~[hadoop-hdfs-client-3.3.1.jar:?] at org.apache.hadoop.hdfs.DistributedFileSystem$29.doCall(DistributedFileSystem.java:1749) ~[hadoop-hdfs-client-3.3.1.jar:?] at org.apache.hadoop.fs.FileSystemLinkResolver.resolve(FileSystemLinkResolver.java:81) ~[hadoop-common-3.3.1.jar:?] at org.apache.hadoop.hdfs.DistributedFileSystem.getFileStatus(DistributedFileSystem.java:1764) ~[hadoop-hdfs-client-3.3.1.jar:?] at org.apache.hadoop.fs.FileSystem.exists(FileSystem.java:1760) ~[hadoop-common-3.3.1.jar:?] at org.apache.iceberg.mr.hive.HiveIcebergStorageHandler.canProvideColStats(HiveIcebergStorageHandler.java:540) ~[hive-iceberg-handler-4.0.0-beta-2-SNAPSHOT.jar:4.0.0-beta-2-SNAPSHOT] at org.apache.iceberg.mr.hive.HiveIcebergStorageHandler.canProvideColStatistics(HiveIcebergStorageHandler.java:533) ~[hive-iceberg-handler-4.0.0-beta-2-SNAPSHOT.jar:4.0.0-beta-2-SNAPSHOT] at org.apache.hadoop.hive.ql.stats.StatsUtils.getTableColumnStats(StatsUtils.java:1073) ~[hive-exec-4.0.0-beta-2-SNAPSHOT.jar:4.0.0-beta-2-SNAPSHOT] at org.apache.hadoop.hive.ql.stats.StatsUtils.collectStatistics(StatsUtils.java:302) ~[hive-exec-4.0.0-beta-2-SNAPSHOT.jar:4.0.0-beta-2-SNAPSHOT] at org.apache.hadoop.hive.ql.stats.StatsUtils.collectStatistics(StatsUtils.java:193) ~[hive-exec-4.0.0-beta-2-SNAPSHOT.jar:4.0.0-beta-2-SNAPSHOT] at org.apache.hadoop.hive.ql.stats.StatsUtils.collectStatistics(StatsUtils.java:181) ~[hive-exec-4.0.0-beta-2-SNAPSHOT.jar:4.0.0-beta-2-SNAPSHOT] at org.apache.hadoop.hive.ql.optimizer.stats.annotation.StatsRulesProcFactory$TableScanStatsRule.process(StatsRulesProcFactory.java:173) ~[hive-exec-4.0.0-beta-2-SNAPSHOT.jar:4.0.0-beta-2-SNAPSHOT] at org.apache.hadoop.hive.ql.lib.DefaultRuleDispatcher.dispatch(DefaultRuleDispatcher.java:90) 
~[hive-exec-4.0.0-beta-2-SNAPSHOT.jar:4.0.0-beta-2-SNAPSHOT] at org.apache.hadoop.hive.ql.lib.DefaultGraphWalker.dispatchAndReturn(DefaultGraphWalker.java:105) ~[hive-exec-4.0.0-beta-2-SNAPSHOT.jar:4.0.0-beta-2-SNAPSHOT] at org.apache.hadoop.hive.ql.lib.DefaultGraphWalker.dispatch(DefaultGraphWalker.java:89) ~[hive-exec-4.0.0-beta-2-SNAPSHOT.jar:4.0.0-beta-2-SNAPSHOT] at org.apache.hadoop.hive.ql.lib.LevelOrderWalker.walk(LevelOrderWalker.java:148) ~[hive-exec-4.0.0-beta-2-SNAPSHOT.jar:4.0.0-beta-2-SNAPSHOT] at org.apache.hadoop.hive.ql.lib.LevelOrderWalker.startWalking(LevelOrderWalker.java:125) ~[hive-exec-4.0.0-beta-2-SNAPSHOT.jar:4.0.0-beta-2-SNAPSHOT] at org.apache.hadoop.hive.ql.optimizer.stats.annotation.AnnotateWithStatistics.transform(AnnotateWithStatistics.j
[jira] [Created] (HIVE-27869) Iceberg: Select HadoopTables will fail at HiveIcebergStorageHandler::canProvideColStats
zhangbutao created HIVE-27869: - Summary: Iceberg: Select HadoopTables will fail at HiveIcebergStorageHandler::canProvideColStats Key: HIVE-27869 URL: https://issues.apache.org/jira/browse/HIVE-27869 Project: Hive Issue Type: Improvement Components: Iceberg integration Reporter: zhangbutao Steps to reproduce: 1) Create a path-based HadoopTable with Spark: {code:java} ./spark-3.3.1-bin-hadoop3/bin/spark-sql --master local --deploy-mode client --conf spark.sql.extensions=org.apache.iceberg.spark.extensions.IcebergSparkSessionExtensions --conf spark.sql.catalog.spark_catalog=org.apache.iceberg.spark.SparkSessionCatalog --conf spark.sql.catalog.spark_catalog.type=hadoop --conf spark.sql.catalog.spark_catalog.warehouse=hdfs://localhost:8028/tmp/testiceberg; create table ice_test_001(id int) using iceberg; insert into ice_test_001(id) values (1),(2),(3);{code} 2) Create an Iceberg table based on the HadoopTable in Hive: {code:java} CREATE EXTERNAL TABLE ice_test_001 STORED BY 'org.apache.iceberg.mr.hive.HiveIcebergStorageHandler' LOCATION 'hdfs://localhost:8028/tmp/testiceberg/default/ice_test_001' TBLPROPERTIES ('iceberg.catalog'='location_based_table'); {code} 3) Select from the HadoopTable in Hive: *set hive.fetch.task.conversion=none;* {code:java} jdbc:hive2://localhost:10004/default> select * from testicedb118.ice_test_001; Error: Error while compiling statement: FAILED: IllegalArgumentException Pathname /tmp/testiceberg/default/ice_test_001/stats/hdfs:/localhost:8028/tmp/testiceberg/default/ice_test_0018020750642632422610 from hdfs://localhost:8028/tmp/testiceberg/default/ice_test_001/stats/hdfs:/localhost:8028/tmp/testiceberg/default/ice_test_0018020750642632422610 is not a valid DFS filename. 
(state=42000,code=4) {code} Full stacktrace: {code:java} Caused by: java.lang.IllegalArgumentException: Pathname /tmp/testiceberg/default/ice_test_001/stats/hdfs:/localhost:8028/tmp/testiceberg/default/ice_test_0018020750642632422610 from hdfs://localhost:8028/tmp/testiceberg/default/ice_test_001/stats/hdfs:/localhost:8028/tmp/testiceberg/default/ice_test_0018020750642632422610 is not a valid DFS filename. at org.apache.hadoop.hdfs.DistributedFileSystem.getPathName(DistributedFileSystem.java:256) ~[hadoop-hdfs-client-3.3.1.jar:?] at org.apache.hadoop.hdfs.DistributedFileSystem$29.doCall(DistributedFileSystem.java:1752) ~[hadoop-hdfs-client-3.3.1.jar:?] at org.apache.hadoop.hdfs.DistributedFileSystem$29.doCall(DistributedFileSystem.java:1749) ~[hadoop-hdfs-client-3.3.1.jar:?] at org.apache.hadoop.fs.FileSystemLinkResolver.resolve(FileSystemLinkResolver.java:81) ~[hadoop-common-3.3.1.jar:?] at org.apache.hadoop.hdfs.DistributedFileSystem.getFileStatus(DistributedFileSystem.java:1764) ~[hadoop-hdfs-client-3.3.1.jar:?] at org.apache.hadoop.fs.FileSystem.exists(FileSystem.java:1760) ~[hadoop-common-3.3.1.jar:?] 
at org.apache.iceberg.mr.hive.HiveIcebergStorageHandler.canProvideColStats(HiveIcebergStorageHandler.java:540) ~[hive-iceberg-handler-4.0.0-beta-2-SNAPSHOT.jar:4.0.0-beta-2-SNAPSHOT] at org.apache.iceberg.mr.hive.HiveIcebergStorageHandler.canProvideColStatistics(HiveIcebergStorageHandler.java:533) ~[hive-iceberg-handler-4.0.0-beta-2-SNAPSHOT.jar:4.0.0-beta-2-SNAPSHOT] at org.apache.hadoop.hive.ql.stats.StatsUtils.getTableColumnStats(StatsUtils.java:1073) ~[hive-exec-4.0.0-beta-2-SNAPSHOT.jar:4.0.0-beta-2-SNAPSHOT] at org.apache.hadoop.hive.ql.stats.StatsUtils.collectStatistics(StatsUtils.java:302) ~[hive-exec-4.0.0-beta-2-SNAPSHOT.jar:4.0.0-beta-2-SNAPSHOT] at org.apache.hadoop.hive.ql.stats.StatsUtils.collectStatistics(StatsUtils.java:193) ~[hive-exec-4.0.0-beta-2-SNAPSHOT.jar:4.0.0-beta-2-SNAPSHOT] at org.apache.hadoop.hive.ql.stats.StatsUtils.collectStatistics(StatsUtils.java:181) ~[hive-exec-4.0.0-beta-2-SNAPSHOT.jar:4.0.0-beta-2-SNAPSHOT] at org.apache.hadoop.hive.ql.optimizer.stats.annotation.StatsRulesProcFactory$TableScanStatsRule.process(StatsRulesProcFactory.java:173) ~[hive-exec-4.0.0-beta-2-SNAPSHOT.jar:4.0.0-beta-2-SNAPSHOT] at org.apache.hadoop.hive.ql.lib.DefaultRuleDispatcher.dispatch(DefaultRuleDispatcher.java:90) ~[hive-exec-4.0.0-beta-2-SNAPSHOT.jar:4.0.0-beta-2-SNAPSHOT] at org.apache.hadoop.hive.ql.lib.DefaultGraphWalker.dispatchAndReturn(DefaultGraphWalker.java:105) ~[hive-exec-4.0.0-beta-2-SNAPSHOT.jar:4.0.0-beta-2-SNAPSHOT] at org.apache.hadoop.hive.ql.lib.DefaultGraphWalker.dispatch(DefaultGraphWalker.java:89) ~[hive-exec-4.0.0-beta-2-SNAPSHOT.jar:4.0.0-beta-2-SNAPSHOT] at org.apache.hadoop.hive.ql.lib.LevelOrderWalker.walk(LevelOrderWalker.java:148) ~[hive-exec-4.0.0-beta-2-SNAPSHOT.jar:4.0.0-beta-2-SNAPSHOT] at org.apache.hadoop.hive.ql.lib.LevelOrderWalker.startWalking(
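The shape of the malformed name in the error (a fully-qualified `hdfs://` location reappearing, with its double slash collapsed, as a child of the table's stats directory) can be reproduced with plain string handling. The helper below is a hypothetical sketch of that mechanism, not the actual handler code:

```java
public class StatsPathDemo {
    // Hypothetical sketch: appending an already fully-qualified snapshot
    // location under the table's stats directory. Path normalization
    // collapses the "//" after the scheme, yielding the "hdfs:/" fragment
    // seen in the IllegalArgumentException above.
    static String joinAsChild(String statsDir, String qualifiedLocation) {
        // child paths get their duplicate slashes collapsed,
        // so "hdfs://host" degrades to "hdfs:/host"
        return statsDir + "/" + qualifiedLocation.replace("//", "/");
    }

    public static void main(String[] args) {
        String statsDir =
            "hdfs://localhost:8028/tmp/testiceberg/default/ice_test_001/stats";
        String snapshotLoc =
            "hdfs://localhost:8028/tmp/testiceberg/default/ice_test_001"
                + "8020750642632422610";
        // prints a name of the same invalid shape as in the stack trace
        System.out.println(joinAsChild(statsDir, snapshotLoc));
    }
}
```

This matches the error text, where the scheme survives at the front of the outer path but appears as the invalid single-slash `hdfs:/...` segment inside it.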
[jira] [Updated] (HIVE-27819) Iceberg: Upgrade iceberg version to 1.4.2
[ https://issues.apache.org/jira/browse/HIVE-27819?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] zhangbutao updated HIVE-27819: -- Summary: Iceberg: Upgrade iceberg version to 1.4.2 (was: Iceberg: Upgrade iceberg version to 1.4.1) > Iceberg: Upgrade iceberg version to 1.4.2 > - > > Key: HIVE-27819 > URL: https://issues.apache.org/jira/browse/HIVE-27819 > Project: Hive > Issue Type: Improvement > Components: Iceberg integration >Reporter: zhangbutao >Assignee: zhangbutao >Priority: Major > Labels: pull-request-available > > Iceberg latest version 1.4.2 has been released out. we need upgrade iceberg > depdency from 1.3.9 to 1.4.2. Meantime, we should port some Hive catalog > changes from Iceberg repo to Hive repo. > [https://iceberg.apache.org/releases/#142-release] > -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Updated] (HIVE-27819) Iceberg: Upgrade iceberg version to 1.4.2
[ https://issues.apache.org/jira/browse/HIVE-27819?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] zhangbutao updated HIVE-27819: -- Description: Iceberg latest version 1.4.2 has been released out. we need upgrade iceberg depdency from 1.3.0 to 1.4.2. Meantime, we should port some Hive catalog changes from Iceberg repo to Hive repo. [https://iceberg.apache.org/releases/#142-release] was: Iceberg latest version 1.4.2 has been released out. we need upgrade iceberg depdency from 1.3.9 to 1.4.2. Meantime, we should port some Hive catalog changes from Iceberg repo to Hive repo. [https://iceberg.apache.org/releases/#142-release] > Iceberg: Upgrade iceberg version to 1.4.2 > - > > Key: HIVE-27819 > URL: https://issues.apache.org/jira/browse/HIVE-27819 > Project: Hive > Issue Type: Improvement > Components: Iceberg integration >Reporter: zhangbutao >Assignee: zhangbutao >Priority: Major > Labels: pull-request-available > > Iceberg latest version 1.4.2 has been released out. we need upgrade iceberg > depdency from 1.3.0 to 1.4.2. Meantime, we should port some Hive catalog > changes from Iceberg repo to Hive repo. > [https://iceberg.apache.org/releases/#142-release] > -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Updated] (HIVE-27819) Iceberg: Upgrade iceberg version to 1.4.1
[ https://issues.apache.org/jira/browse/HIVE-27819?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] zhangbutao updated HIVE-27819: -- Description: Iceberg latest version 1.4.2 has been released out. we need upgrade iceberg depdency from 1.3.9 to 1.4.2. Meantime, we should port some Hive catalog changes from Iceberg repo to Hive repo. [https://iceberg.apache.org/releases/#142-release] was: Iceberg latest version 1.4.2 has been released out. we need upgrade iceberg depdency from 1.3.9 to 1.4.2. Meantime, we should port some Hive catalog changes from Iceberg repo to Hive repo. [https://iceberg.apache.org/releases/#142-release|https://iceberg.apache.org/releases/#141-release] > Iceberg: Upgrade iceberg version to 1.4.1 > - > > Key: HIVE-27819 > URL: https://issues.apache.org/jira/browse/HIVE-27819 > Project: Hive > Issue Type: Improvement > Components: Iceberg integration >Reporter: zhangbutao >Assignee: zhangbutao >Priority: Major > Labels: pull-request-available > > Iceberg latest version 1.4.2 has been released out. we need upgrade iceberg > depdency from 1.3.9 to 1.4.2. Meantime, we should port some Hive catalog > changes from Iceberg repo to Hive repo. > [https://iceberg.apache.org/releases/#142-release] > -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Updated] (HIVE-27819) Iceberg: Upgrade iceberg version to 1.4.1
[ https://issues.apache.org/jira/browse/HIVE-27819?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] zhangbutao updated HIVE-27819: -- Description: Iceberg latest version 1.4.2 has been released out. we need upgrade iceberg depdency from 1.3.9 to 1.4.2. Meantime, we should port some Hive catalog changes from Iceberg repo to Hive repo. [https://iceberg.apache.org/releases/#142-release|https://iceberg.apache.org/releases/#141-release] was: Iceberg latest version 1.4.1 has been released out. we need upgrade iceberg depdency from 1.3.1 to 1.4.1. Meantime, we should port some Hive catalog changes from Iceberg repo to Hive repo. [https://iceberg.apache.org/releases/#141-release] > Iceberg: Upgrade iceberg version to 1.4.1 > - > > Key: HIVE-27819 > URL: https://issues.apache.org/jira/browse/HIVE-27819 > Project: Hive > Issue Type: Improvement > Components: Iceberg integration >Reporter: zhangbutao >Assignee: zhangbutao >Priority: Major > Labels: pull-request-available > > Iceberg latest version 1.4.2 has been released out. we need upgrade iceberg > depdency from 1.3.9 to 1.4.2. Meantime, we should port some Hive catalog > changes from Iceberg repo to Hive repo. > [https://iceberg.apache.org/releases/#142-release|https://iceberg.apache.org/releases/#141-release] > -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Commented] (HIVE-26192) JDBC data connector queries occur exception at cbo stage
[ https://issues.apache.org/jira/browse/HIVE-26192?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17781230#comment-17781230 ] zhangbutao commented on HIVE-26192: --- [~ngangam] Thanks for letting me know about this issue. If I understand correctly, we should change the code as follows when the JDBC connector distinguishes between schema and database, e.g. PostgreSQL and Oracle. getCatalogName() can remain null because for PG the database name must be specified in the JDBC URL, e.g. {*}jdbc:postgresql://localhost:5432/testpgdb{*}, so the value from getCatalogName() is no longer needed and has no effect on the PG connection. Users can then select a specific schema by setting the schema name in the property "connector.remoteDbName". I have tested this change locally and it works as expected. {code:java} diff --git a/standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/dataconnector/jdbc/PostgreSQLConnectorProvider.java b/standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/dataconnector/jdbc/PostgreSQLConnectorProvider.java index b79bee452d..79a505e6a9 100644 --- a/standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/dataconnector/jdbc/PostgreSQLConnectorProvider.java +++ b/standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/dataconnector/jdbc/PostgreSQLConnectorProvider.java @@ -36,11 +36,11 @@ public PostgreSQLConnectorProvider(String dbName, DataConnector dataConn) { } @Override protected String getCatalogName() { - return scoped_db; + return null; } @Override protected String getDatabaseName() { - return null; + return scoped_db; } {code} Do I understand your question correctly? If we agree on this approach, I can submit a PR to fix it. Thanks. 
> JDBC data connector queries occur exception at cbo stage > - > > Key: HIVE-26192 > URL: https://issues.apache.org/jira/browse/HIVE-26192 > Project: Hive > Issue Type: Bug >Affects Versions: 4.0.0-alpha-2 >Reporter: zhangbutao >Assignee: zhangbutao >Priority: Major > Labels: pull-request-available > Fix For: 4.0.0-alpha-2 > > Time Spent: 1.5h > Remaining Estimate: 0h > > If you do a select query qtest with jdbc data connector, you will see > exception at cbo stage: > {code:java} > [ERROR] Failures: > [ERROR] TestMiniLlapCliDriver.testCliDriver:62 Client execution failed with > error code = 4 > running > select * from country > fname=dataconnector_mysql.qSee ./ql/target/tmp/log/hive.log or > ./itests/qtest/target/tmp/log/hive.log, or check ./ql/target/surefire-reports > or ./itests/qtest/target/surefire-reports/ for specific test cases logs. > org.apache.hadoop.hive.ql.parse.SemanticException: Table qtestDB.country was > not found in the database > at > org.apache.hadoop.hive.ql.parse.CalcitePlanner$CalcitePlannerAction.genTableLogicalPlan(CalcitePlanner.java:3078) > at > org.apache.hadoop.hive.ql.parse.CalcitePlanner$CalcitePlannerAction.genLogicalPlan(CalcitePlanner.java:5048) > at > org.apache.hadoop.hive.ql.parse.CalcitePlanner$CalcitePlannerAction.apply(CalcitePlanner.java:1665) > at > org.apache.hadoop.hive.ql.parse.CalcitePlanner$CalcitePlannerAction.apply(CalcitePlanner.java:1605) > at > org.apache.calcite.tools.Frameworks.lambda$withPlanner$0(Frameworks.java:131) > at > org.apache.calcite.prepare.CalcitePrepareImpl.perform(CalcitePrepareImpl.java:914) > at > org.apache.calcite.tools.Frameworks.withPrepare(Frameworks.java:180) > at > org.apache.calcite.tools.Frameworks.withPlanner(Frameworks.java:126) > at > org.apache.hadoop.hive.ql.parse.CalcitePlanner.logicalPlan(CalcitePlanner.java:1357) > at > org.apache.hadoop.hive.ql.parse.CalcitePlanner.genOPTree(CalcitePlanner.java:567) > at > 
org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.analyzeInternal(SemanticAnalyzer.java:12587) > at > org.apache.hadoop.hive.ql.parse.CalcitePlanner.analyzeInternal(CalcitePlanner.java:460) > at > org.apache.hadoop.hive.ql.parse.BaseSemanticAnalyzer.analyze(BaseSemanticAnalyzer.java:317) > at org.apache.hadoop.hive.ql.Compiler.analyze(Compiler.java:224) > at org.apache.hadoop.hive.ql.Compiler.compile(Compiler.java:106) > at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:500) > at org.apache.hadoop.hive.ql.Driver.compileInternal(Driver.java:452) > at org.apache.hadoop.hive.ql.Driver.compileAndRespond(Driver.java:416) > at org.apache.hadoop.hive.ql.Driver.compileAndRespond(Driver.java:410) > at > org.apache.hadoop.hive.ql.reexec.ReExecDriver.compileAndRespo
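The catalog/schema distinction discussed in the comment above shows up in how JDBC metadata lookups take separate catalog and schema arguments. Below is a minimal sketch (the helper name is hypothetical) of which argument the scoped database name would feed for PostgreSQL, where the database itself is fixed by the JDBC URL:

```java
public class PgScopeDemo {
    // Hypothetical helper mirroring the provider methods in the diff above:
    // for PostgreSQL the database is chosen by the JDBC URL
    // (e.g. jdbc:postgresql://localhost:5432/testpgdb), so the scoped name
    // from "connector.remoteDbName" must be passed as the schema pattern of
    // DatabaseMetaData.getTables(catalog, schemaPattern, ...), not as catalog.
    static String[] tableLookupArgs(String scopedDb) {
        String catalog = null;           // unused for PG; the URL fixes the database
        String schemaPattern = scopedDb; // the remote schema to federate
        return new String[] { catalog, schemaPattern };
    }

    public static void main(String[] args) {
        String[] a = tableLookupArgs("public");
        System.out.println(a[0] + " / " + a[1]);
    }
}
```

For a connector where "database" and "catalog" coincide (e.g. MySQL), the two return values would be swapped, which is exactly what the proposed diff changes for the PostgreSQL provider.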
[jira] [Commented] (HIVE-9260) Implement the bloom filter for the ParquetSerde
[ https://issues.apache.org/jira/browse/HIVE-9260?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17779740#comment-17779740 ] zhangbutao commented on HIVE-9260: -- [~Ferd] Do you plan to finish this ticket? I think a bloom filter would be very useful for accelerating Parquet table queries. > Implement the bloom filter for the ParquetSerde > --- > > Key: HIVE-9260 > URL: https://issues.apache.org/jira/browse/HIVE-9260 > Project: Hive > Issue Type: Sub-task > Components: Serializers/Deserializers >Reporter: Ferdinand Xu >Assignee: Ferdinand Xu >Priority: Major > Attachments: HIVE-9260.patch > > > Implement the bloom filter for Parquet -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Updated] (HIVE-27826) Upgrade to Parquet 1.13.1
[ https://issues.apache.org/jira/browse/HIVE-27826?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] zhangbutao updated HIVE-27826: -- Attachment: mvn_dependency_tree.text > Upgrade to Parquet 1.13.1 > - > > Key: HIVE-27826 > URL: https://issues.apache.org/jira/browse/HIVE-27826 > Project: Hive > Issue Type: Improvement > Components: Parquet >Reporter: zhangbutao >Assignee: zhangbutao >Priority: Major > Labels: pull-request-available > Attachments: mvn_dependency_tree.text > > > Upgrade parquet to 1.13.1. Apache Iceberg also use this latest parquet > version. > [https://github.com/apache/iceberg/pull/7301] > -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Assigned] (HIVE-27826) Upgrade to Parquet 1.13.1
[ https://issues.apache.org/jira/browse/HIVE-27826?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] zhangbutao reassigned HIVE-27826: - Assignee: zhangbutao > Upgrade to Parquet 1.13.1 > - > > Key: HIVE-27826 > URL: https://issues.apache.org/jira/browse/HIVE-27826 > Project: Hive > Issue Type: Improvement > Components: Parquet >Reporter: zhangbutao >Assignee: zhangbutao >Priority: Major > > Upgrade parquet to 1.13.1. Apache Iceberg also use this latest parquet > version. > [https://github.com/apache/iceberg/pull/7301] > -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Updated] (HIVE-27826) Upgrade to Parquet 1.13.1
[ https://issues.apache.org/jira/browse/HIVE-27826?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] zhangbutao updated HIVE-27826: -- Description: Upgrade parquet to 1.13.1. Apache Iceberg also use this latest parquet version. [https://github.com/apache/iceberg/pull/7301] was:Upgrade parquet to 1.13.1. Apache Iceberg also use this parquet version. > Upgrade to Parquet 1.13.1 > - > > Key: HIVE-27826 > URL: https://issues.apache.org/jira/browse/HIVE-27826 > Project: Hive > Issue Type: Improvement > Components: Parquet >Reporter: zhangbutao >Priority: Major > > Upgrade parquet to 1.13.1. Apache Iceberg also use this latest parquet > version. > [https://github.com/apache/iceberg/pull/7301] > -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Created] (HIVE-27826) Upgrade to Parquet 1.13.1
zhangbutao created HIVE-27826: - Summary: Upgrade to Parquet 1.13.1 Key: HIVE-27826 URL: https://issues.apache.org/jira/browse/HIVE-27826 Project: Hive Issue Type: Improvement Components: Parquet Reporter: zhangbutao Upgrade parquet to 1.13.1. Apache Iceberg also use this parquet version. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Resolved] (HIVE-27776) Iceberg: Upgrade iceberg version to 1.4.0
[ https://issues.apache.org/jira/browse/HIVE-27776?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] zhangbutao resolved HIVE-27776. --- Resolution: Duplicate > Iceberg: Upgrade iceberg version to 1.4.0 > - > > Key: HIVE-27776 > URL: https://issues.apache.org/jira/browse/HIVE-27776 > Project: Hive > Issue Type: Improvement > Components: Iceberg integration >Reporter: zhangbutao >Priority: Major > > [https://mvnrepository.com/artifact/org.apache.iceberg/iceberg-core/1.4.0] > Iceberg 1.4.0 has been released, and we need to upgrade the Iceberg dependency from > 1.3.1 to 1.4.0. Meanwhile, we should port some Hive catalog changes from the > Iceberg repo to the Hive repo. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Commented] (HIVE-27776) Iceberg: Upgrade iceberg version to 1.4.0
[ https://issues.apache.org/jira/browse/HIVE-27776?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17779010#comment-17779010 ] zhangbutao commented on HIVE-27776: --- Superseded by https://issues.apache.org/jira/browse/HIVE-27819 > Iceberg: Upgrade iceberg version to 1.4.0 > - > > Key: HIVE-27776 > URL: https://issues.apache.org/jira/browse/HIVE-27776 > Project: Hive > Issue Type: Improvement > Components: Iceberg integration >Reporter: zhangbutao >Priority: Major > > [https://mvnrepository.com/artifact/org.apache.iceberg/iceberg-core/1.4.0] > Iceberg 1.4.0 has been released, and we need to upgrade the Iceberg dependency from > 1.3.1 to 1.4.0. Meanwhile, we should port some Hive catalog changes from the > Iceberg repo to the Hive repo. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Assigned] (HIVE-27819) Iceberg: Upgrade iceberg version to 1.4.1
[ https://issues.apache.org/jira/browse/HIVE-27819?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] zhangbutao reassigned HIVE-27819: - Assignee: zhangbutao > Iceberg: Upgrade iceberg version to 1.4.1 > - > > Key: HIVE-27819 > URL: https://issues.apache.org/jira/browse/HIVE-27819 > Project: Hive > Issue Type: Improvement > Components: Iceberg integration >Reporter: zhangbutao >Assignee: zhangbutao >Priority: Major > > The latest Iceberg version, 1.4.1, has been released. We need to upgrade the Iceberg > dependency from 1.3.1 to 1.4.1. Meanwhile, we should port some Hive catalog > changes from the Iceberg repo to the Hive repo. > [https://iceberg.apache.org/releases/#141-release] > -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Created] (HIVE-27819) Iceberg: Upgrade iceberg version to 1.4.1
zhangbutao created HIVE-27819: - Summary: Iceberg: Upgrade iceberg version to 1.4.1 Key: HIVE-27819 URL: https://issues.apache.org/jira/browse/HIVE-27819 Project: Hive Issue Type: Improvement Components: Iceberg integration Reporter: zhangbutao The latest Iceberg version, 1.4.1, has been released. We need to upgrade the Iceberg dependency from 1.3.1 to 1.4.1. Meanwhile, we should port some Hive catalog changes from the Iceberg repo to the Hive repo. [https://iceberg.apache.org/releases/#141-release] -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Comment Edited] (HIVE-27814) Support VIEWs in the metadata federation
[ https://issues.apache.org/jira/browse/HIVE-27814?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=1561#comment-1561 ] zhangbutao edited comment on HIVE-27814 at 10/20/23 4:43 AM: - [~ngangam] All the data connectors we have implemented are JDBC-based, and I think there is no problem adding a JDBC VIEW as an HMS remote table. I can't think of a connector that should be excluded. Let's implement it first. was (Author: zhangbutao): [~ngangam] All the data connectors we have implemented are JDBC type, and I think it is no problem to add the jdbc VIEW as hms remote table. I can't think of connctor which should exclude. Let's implement it first. > Support VIEWs in the metadata federation > > > Key: HIVE-27814 > URL: https://issues.apache.org/jira/browse/HIVE-27814 > Project: Hive > Issue Type: Sub-task > Components: Standalone Metastore >Affects Versions: 4.0.0-beta-1 >Reporter: Naveen Gangam >Assignee: Naveen Gangam >Priority: Major > > Currently, we only federate the TABLE type objects from the remote > datasource. We should be able to pull in VIEW type objects as well. > It appears we can currently create a JDBC-storage handler based table in Hive > that points to a view in the remote DB server. I do not see a reason to not > include this in the list of federated objects we pull in. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Commented] (HIVE-27814) Support VIEWs in the metadata federation
[ https://issues.apache.org/jira/browse/HIVE-27814?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=1561#comment-1561 ] zhangbutao commented on HIVE-27814: --- [~ngangam] All the data connectors we have implemented are JDBC type, and I think there is no problem with adding a JDBC VIEW as an HMS remote table. I can't think of a connector that should be excluded. Let's implement it first. > Support VIEWs in the metadata federation > > > Key: HIVE-27814 > URL: https://issues.apache.org/jira/browse/HIVE-27814 > Project: Hive > Issue Type: Sub-task > Components: Standalone Metastore >Affects Versions: 4.0.0-beta-1 >Reporter: Naveen Gangam >Assignee: Naveen Gangam >Priority: Major > > Currently, we only federate the TABLE type objects from the remote > datasource. We should be able to pull in VIEW type objects as well. > It appears we can currently create a JDBC-storage handler based table in Hive > that points to a view in the remote DB server. I do not see a reason to not > include this in the list of federated objects we pull in. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Assigned] (HIVE-27793) Iceberg: Support setting current snapshot with SnapshotRef
[ https://issues.apache.org/jira/browse/HIVE-27793?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] zhangbutao reassigned HIVE-27793: - Assignee: zhangbutao > Iceberg: Support setting current snapshot with SnapshotRef > -- > > Key: HIVE-27793 > URL: https://issues.apache.org/jira/browse/HIVE-27793 > Project: Hive > Issue Type: Improvement > Components: Iceberg integration >Reporter: zhangbutao >Assignee: zhangbutao >Priority: Major > > [https://iceberg.apache.org/docs/latest/spark-procedures/#set_current_snapshot] > Spark supports setting current snapshot using snapshotId or snapshotRef. We > can refer to this to implement setting current snapshot with > SnapshotRef(branch or tag). -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Created] (HIVE-27793) Iceberg: Support setting current snapshot with SnapshotRef
zhangbutao created HIVE-27793: - Summary: Iceberg: Support setting current snapshot with SnapshotRef Key: HIVE-27793 URL: https://issues.apache.org/jira/browse/HIVE-27793 Project: Hive Issue Type: Improvement Components: Iceberg integration Reporter: zhangbutao [https://iceberg.apache.org/docs/latest/spark-procedures/#set_current_snapshot] Spark supports setting current snapshot using snapshotId or snapshotRef. We can refer to this to implement setting current snapshot with SnapshotRef(branch or tag). -- This message was sent by Atlassian Jira (v8.20.10#820010)
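For readers comparing engines: Spark already exposes the ref-based form through its set_current_snapshot procedure, while Hive's existing ALTER TABLE ... EXECUTE SET_CURRENT_SNAPSHOT takes a snapshot id. A hedged sketch of the two, where the ref-based Hive variant is hypothetical syntax (the final grammar is whatever the patch lands on):

{code:sql}
-- Spark: set the current snapshot by id, or by ref (branch or tag)
CALL my_catalog.system.set_current_snapshot(table => 'db.tbl', snapshot_id => 5781947118336215154);
CALL my_catalog.system.set_current_snapshot(table => 'db.tbl', ref => 'branch1');

-- Hive today: by snapshot id only
ALTER TABLE tbl EXECUTE SET_CURRENT_SNAPSHOT(5781947118336215154);

-- Proposed (hypothetical syntax): accept a branch or tag name as well
ALTER TABLE tbl EXECUTE SET_CURRENT_SNAPSHOT('branch1');
{code}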
[jira] [Commented] (HIVE-27597) Implement JDBC Connector for HiveServer
[ https://issues.apache.org/jira/browse/HIVE-27597?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17773426#comment-17773426 ] zhangbutao commented on HIVE-27597: --- [~ngangam] I probably still don't have edit permission, and I can't find the edit button. Could you please check it again? Thanks. [https://cwiki.apache.org/confluence/display/Hive/Data+Connectors+in+Hive] > Implement JDBC Connector for HiveServer > > > Key: HIVE-27597 > URL: https://issues.apache.org/jira/browse/HIVE-27597 > Project: Hive > Issue Type: Sub-task > Components: Hive >Reporter: Naveen Gangam >Assignee: Naveen Gangam >Priority: Major > Labels: pull-request-available > > The initial idea of having a thrift based connector, that would enable Hive > Metastore to use thrift APIs to interact with another metastore from another > cluster, has some limitations. Features like column masking support become a > challenge as we may bypass the authz controls on the remote cluster. > Instead, if we could federate a query from one instance of HS2 to another > instance of HS2 over JDBC, we would address the above concerns. This will > at least give us the ability to access tables across cluster boundaries. -- This message was sent by Atlassian Jira (v8.20.10#820010)
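As a sketch of how HS2-to-HS2 federation could look on top of the existing data connector DDL (the connector TYPE name 'hiveserver2' below is hypothetical; the CREATE CONNECTOR / CREATE REMOTE DATABASE shape follows the existing JDBC connectors documented in the wiki linked above):

{code:sql}
-- Existing JDBC connector pattern
CREATE CONNECTOR mysql_conn
TYPE 'mysql'
URL 'jdbc:mysql://remotehost:3306'
WITH DCPROPERTIES ('hive.sql.dbcp.username'='user', 'hive.sql.dbcp.password'='passwd');

CREATE REMOTE DATABASE fed_db USING mysql_conn
WITH DBPROPERTIES ('connector.remoteDbName'='testdb');

-- Hypothetical HS2-over-JDBC connector, so authz (e.g. column masking)
-- is enforced by the remote HS2 rather than bypassed at the metastore level
CREATE CONNECTOR hs2_conn
TYPE 'hiveserver2'
URL 'jdbc:hive2://remote-hs2:10000';
{code}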
[jira] [Assigned] (HIVE-27780) Implement direct SQL for get_all_functions
[ https://issues.apache.org/jira/browse/HIVE-27780?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] zhangbutao reassigned HIVE-27780: - Assignee: zhangbutao > Implement direct SQL for get_all_functions > -- > > Key: HIVE-27780 > URL: https://issues.apache.org/jira/browse/HIVE-27780 > Project: Hive > Issue Type: Improvement > Components: Standalone Metastore >Reporter: zhangbutao >Assignee: zhangbutao >Priority: Major > -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Created] (HIVE-27780) Implement direct SQL for get_all_functions
zhangbutao created HIVE-27780: - Summary: Implement direct SQL for get_all_functions Key: HIVE-27780 URL: https://issues.apache.org/jira/browse/HIVE-27780 Project: Hive Issue Type: Improvement Components: Standalone Metastore Reporter: zhangbutao -- This message was sent by Atlassian Jira (v8.20.10#820010)
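For context, get_all_functions would otherwise go through the JDO/DataNucleus path; a direct-SQL path issues one query against the backing RDBMS. A sketch of the kind of statement involved, assuming the standard HMS backing schema (FUNCS joined to DBS); the actual patch may select different columns or add catalog filtering:

{code:sql}
SELECT "FUNCS"."FUNC_NAME", "DBS"."NAME" AS "DB_NAME", "FUNCS"."CLASS_NAME",
       "FUNCS"."OWNER_NAME", "FUNCS"."FUNC_TYPE", "FUNCS"."CREATE_TIME"
FROM "FUNCS"
INNER JOIN "DBS" ON "FUNCS"."DB_ID" = "DBS"."DB_ID";
{code}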
[jira] [Created] (HIVE-27776) Iceberg: Upgrade iceberg version to 1.4.0
zhangbutao created HIVE-27776: - Summary: Iceberg: Upgrade iceberg version to 1.4.0 Key: HIVE-27776 URL: https://issues.apache.org/jira/browse/HIVE-27776 Project: Hive Issue Type: Improvement Components: Iceberg integration Reporter: zhangbutao [https://mvnrepository.com/artifact/org.apache.iceberg/iceberg-core/1.4.0] Iceberg 1.4.0 has been released, and we need to upgrade the Iceberg dependency from 1.3.1 to 1.4.0. Meanwhile, we should port some Hive catalog changes from the Iceberg repo to the Hive repo. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Updated] (HIVE-27729) Iceberg: Check Iceberg type in AlterTableExecuteAnalyzer
[ https://issues.apache.org/jira/browse/HIVE-27729?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] zhangbutao updated HIVE-27729: -- Description: If we execute ROLLBACK and other commands (expire snapshot, fast_forward, etc.) on a *non-iceberg table,* it will throw an NPE. We need to check the iceberg type in _AlterTableExecuteAnalyzer_ to throw a better exception. {code:java} // create a non-iceberg table create table non_ice (id int);{code} {code:java} // execute rollback ALTER TABLE non_ice EXECUTE ROLLBACK('2022-09-26 00:00:00');{code} {code:java} ERROR : Failed java.lang.NullPointerException: null at org.apache.hadoop.hive.ql.metadata.Hive.alterTableExecuteOperation(Hive.java:6772) ~[hive-exec-4.0.0-beta-2-SNAPSHOT.jar:4.0.0-beta-2-SNAPSHOT] at org.apache.hadoop.hive.ql.ddl.table.execute.AlterTableExecuteOperation.execute(AlterTableExecuteOperation.java:37) ~[hive-exec-4.0.0-beta-2-SNAPSHOT.jar:4.0.0-beta-2-SNAPSHOT] at org.apache.hadoop.hive.ql.ddl.DDLTask.execute(DDLTask.java:84) ~[hive-exec-4.0.0-beta-2-SNAPSHOT.jar:4.0.0-beta-2-SNAPSHOT] at org.apache.hadoop.hive.ql.exec.Task.executeTask(Task.java:214) ~[hive-exec-4.0.0-beta-2-SNAPSHOT.jar:4.0.0-beta-2-SNAPSHOT] at org.apache.hadoop.hive.ql.exec.TaskRunner.runSequential(TaskRunner.java:105) ~[hive-exec-4.0.0-beta-2-SNAPSHOT.jar:4.0.0-beta-2-SNAPSHOT] at org.apache.hadoop.hive.ql.Executor.launchTask(Executor.java:354) ~[hive-exec-4.0.0-beta-2-SNAPSHOT.jar:4.0.0-beta-2-SNAPSHOT] at org.apache.hadoop.hive.ql.Executor.launchTasks(Executor.java:327) ~[hive-exec-4.0.0-beta-2-SNAPSHOT.jar:4.0.0-beta-2-SNAPSHOT] at org.apache.hadoop.hive.ql.Executor.runTasks(Executor.java:244) ~[hive-exec-4.0.0-beta-2-SNAPSHOT.jar:4.0.0-beta-2-SNAPSHOT] at org.apache.hadoop.hive.ql.Executor.execute(Executor.java:105) ~[hive-exec-4.0.0-beta-2-SNAPSHOT.jar:4.0.0-beta-2-SNAPSHOT] at org.apache.hadoop.hive.ql.Driver.execute(Driver.java:367) ~[hive-exec-4.0.0-beta-2-SNAPSHOT.jar:4.0.0-beta-2-SNAPSHOT] at
org.apache.hadoop.hive.ql.Driver.runInternal(Driver.java:205) ~[hive-exec-4.0.0-beta-2-SNAPSHOT.jar:4.0.0-beta-2-SNAPSHOT] at org.apache.hadoop.hive.ql.Driver.run(Driver.java:154) ~[hive-exec-4.0.0-beta-2-SNAPSHOT.jar:4.0.0-beta-2-SNAPSHOT] at org.apache.hadoop.hive.ql.Driver.run(Driver.java:149) ~[hive-exec-4.0.0-beta-2-SNAPSHOT.jar:4.0.0-beta-2-SNAPSHOT] at org.apache.hadoop.hive.ql.reexec.ReExecDriver.run(ReExecDriver.java:185) ~[hive-exec-4.0.0-beta-2-SNAPSHOT.jar:4.0.0-beta-2-SNAPSHOT] at org.apache.hive.service.cli.operation.SQLOperation.runQuery(SQLOperation.java:236) ~[hive-service-4.0.0-beta-2-SNAPSHOT.jar:4.0.0-beta-2-SNAPSHOT] at org.apache.hive.service.cli.operation.SQLOperation.access$500(SQLOperation.java:90) ~[hive-service-4.0.0-beta-2-SNAPSHOT.jar:4.0.0-beta-2-SNAPSHOT] at org.apache.hive.service.cli.operation.SQLOperation$BackgroundWork$1.run(SQLOperation.java:336) ~[hive-service-4.0.0-beta-2-SNAPSHOT.jar:4.0.0-beta-2-SNAPSHOT] at java.security.AccessController.doPrivileged(Native Method) ~[?:1.8.0_291] at javax.security.auth.Subject.doAs(Subject.java:422) ~[?:1.8.0_291] at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1878) ~[hadoop-common-3.3.1.jar:?] at org.apache.hive.service.cli.operation.SQLOperation$BackgroundWork.run(SQLOperation.java:356) ~[hive-service-4.0.0-beta-2-SNAPSHOT.jar:4.0.0-beta-2-SNAPSHOT] at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) ~[?:1.8.0_291] at java.util.concurrent.FutureTask.run(FutureTask.java:266) ~[?:1.8.0_291] at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) ~[?:1.8.0_291] at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) ~[?:1.8.0_291] at java.lang.Thread.run(Thread.java:748) ~[?:1.8.0_291] ERROR : DDLTask failed, DDL Operation: class org.apache.hadoop.hive.ql.ddl.table.execute.AlterTableExecuteOperation {code} was: If we execute ROLLBACK and other cmd(expire snashot & fast_forward. 
etc) on a *non-iceberg table,* it will throw NPE. We need to check iceberg type in _AlterTableExecuteAnalyzer_ to throw a better exception. {code:java} ERROR : Failed java.lang.NullPointerException: null at org.apache.hadoop.hive.ql.metadata.Hive.alterTableExecuteOperation(Hive.java:6772) ~[hive-exec-4.0.0-beta-2-SNAPSHOT.jar:4.0.0-beta-2-SNAPSHOT] at org.apache.hadoop.hive.ql.ddl.table.execute.AlterTableExecuteOperation.execute(AlterTableExecuteOperation.java:37) ~[hive-exec-4.0.0-beta-2-SNAPSHOT.jar:4.0.0-beta-2-SNAPSHOT] at org.apache.hadoop.hive.ql.ddl.DDLTask.execute(DDLTask.java:84) ~[hive-exec-4.0.0-beta-2-SNAPSHOT.jar:4.0.0-beta-2-SNAPSHOT] at org.apache.hadoop.hive.q
[jira] [Assigned] (HIVE-27729) Iceberg: Check Iceberg type in AlterTableExecuteAnalyzer
[ https://issues.apache.org/jira/browse/HIVE-27729?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] zhangbutao reassigned HIVE-27729: - Assignee: zhangbutao > Iceberg: Check Iceberg type in AlterTableExecuteAnalyzer > > > Key: HIVE-27729 > URL: https://issues.apache.org/jira/browse/HIVE-27729 > Project: Hive > Issue Type: Improvement > Components: Iceberg integration >Reporter: zhangbutao >Assignee: zhangbutao >Priority: Major > > If we execute ROLLBACK and other commands (expire snapshot, fast_forward, etc.) on a > *non-iceberg table,* it will throw an NPE. We need to check the iceberg type in > _AlterTableExecuteAnalyzer_ to throw a better exception. > > {code:java} > ERROR : Failed > java.lang.NullPointerException: null > at > org.apache.hadoop.hive.ql.metadata.Hive.alterTableExecuteOperation(Hive.java:6772) > ~[hive-exec-4.0.0-beta-2-SNAPSHOT.jar:4.0.0-beta-2-SNAPSHOT] > at > org.apache.hadoop.hive.ql.ddl.table.execute.AlterTableExecuteOperation.execute(AlterTableExecuteOperation.java:37) > ~[hive-exec-4.0.0-beta-2-SNAPSHOT.jar:4.0.0-beta-2-SNAPSHOT] > at org.apache.hadoop.hive.ql.ddl.DDLTask.execute(DDLTask.java:84) > ~[hive-exec-4.0.0-beta-2-SNAPSHOT.jar:4.0.0-beta-2-SNAPSHOT] > at org.apache.hadoop.hive.ql.exec.Task.executeTask(Task.java:214) > ~[hive-exec-4.0.0-beta-2-SNAPSHOT.jar:4.0.0-beta-2-SNAPSHOT] > at > org.apache.hadoop.hive.ql.exec.TaskRunner.runSequential(TaskRunner.java:105) > ~[hive-exec-4.0.0-beta-2-SNAPSHOT.jar:4.0.0-beta-2-SNAPSHOT] > at org.apache.hadoop.hive.ql.Executor.launchTask(Executor.java:354) > ~[hive-exec-4.0.0-beta-2-SNAPSHOT.jar:4.0.0-beta-2-SNAPSHOT] > at org.apache.hadoop.hive.ql.Executor.launchTasks(Executor.java:327) > ~[hive-exec-4.0.0-beta-2-SNAPSHOT.jar:4.0.0-beta-2-SNAPSHOT] > at org.apache.hadoop.hive.ql.Executor.runTasks(Executor.java:244) > ~[hive-exec-4.0.0-beta-2-SNAPSHOT.jar:4.0.0-beta-2-SNAPSHOT] > at org.apache.hadoop.hive.ql.Executor.execute(Executor.java:105) > 
~[hive-exec-4.0.0-beta-2-SNAPSHOT.jar:4.0.0-beta-2-SNAPSHOT] > at org.apache.hadoop.hive.ql.Driver.execute(Driver.java:367) > ~[hive-exec-4.0.0-beta-2-SNAPSHOT.jar:4.0.0-beta-2-SNAPSHOT] > at org.apache.hadoop.hive.ql.Driver.runInternal(Driver.java:205) > ~[hive-exec-4.0.0-beta-2-SNAPSHOT.jar:4.0.0-beta-2-SNAPSHOT] > at org.apache.hadoop.hive.ql.Driver.run(Driver.java:154) > ~[hive-exec-4.0.0-beta-2-SNAPSHOT.jar:4.0.0-beta-2-SNAPSHOT] > at org.apache.hadoop.hive.ql.Driver.run(Driver.java:149) > ~[hive-exec-4.0.0-beta-2-SNAPSHOT.jar:4.0.0-beta-2-SNAPSHOT] > at > org.apache.hadoop.hive.ql.reexec.ReExecDriver.run(ReExecDriver.java:185) > ~[hive-exec-4.0.0-beta-2-SNAPSHOT.jar:4.0.0-beta-2-SNAPSHOT] > at > org.apache.hive.service.cli.operation.SQLOperation.runQuery(SQLOperation.java:236) > ~[hive-service-4.0.0-beta-2-SNAPSHOT.jar:4.0.0-beta-2-SNAPSHOT] > at > org.apache.hive.service.cli.operation.SQLOperation.access$500(SQLOperation.java:90) > ~[hive-service-4.0.0-beta-2-SNAPSHOT.jar:4.0.0-beta-2-SNAPSHOT] > at > org.apache.hive.service.cli.operation.SQLOperation$BackgroundWork$1.run(SQLOperation.java:336) > ~[hive-service-4.0.0-beta-2-SNAPSHOT.jar:4.0.0-beta-2-SNAPSHOT] > at java.security.AccessController.doPrivileged(Native Method) > ~[?:1.8.0_291] > at javax.security.auth.Subject.doAs(Subject.java:422) ~[?:1.8.0_291] > at > org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1878) > ~[hadoop-common-3.3.1.jar:?] 
> at > org.apache.hive.service.cli.operation.SQLOperation$BackgroundWork.run(SQLOperation.java:356) > ~[hive-service-4.0.0-beta-2-SNAPSHOT.jar:4.0.0-beta-2-SNAPSHOT] > at > java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) > ~[?:1.8.0_291] > at java.util.concurrent.FutureTask.run(FutureTask.java:266) > ~[?:1.8.0_291] > at > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) > ~[?:1.8.0_291] > at > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) > ~[?:1.8.0_291] > at java.lang.Thread.run(Thread.java:748) ~[?:1.8.0_291] > ERROR : DDLTask failed, DDL Operation: class > org.apache.hadoop.hive.ql.ddl.table.execute.AlterTableExecuteOperation > {code} > > -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Created] (HIVE-27729) Iceberg: Check Iceberg type in AlterTableExecuteAnalyzer
zhangbutao created HIVE-27729: - Summary: Iceberg: Check Iceberg type in AlterTableExecuteAnalyzer Key: HIVE-27729 URL: https://issues.apache.org/jira/browse/HIVE-27729 Project: Hive Issue Type: Improvement Components: Iceberg integration Reporter: zhangbutao If we execute ROLLBACK and other commands (expire snapshot, fast_forward, etc.) on a *non-iceberg table,* it will throw an NPE. We need to check the iceberg type in _AlterTableExecuteAnalyzer_ to throw a better exception. {code:java} ERROR : Failed java.lang.NullPointerException: null at org.apache.hadoop.hive.ql.metadata.Hive.alterTableExecuteOperation(Hive.java:6772) ~[hive-exec-4.0.0-beta-2-SNAPSHOT.jar:4.0.0-beta-2-SNAPSHOT] at org.apache.hadoop.hive.ql.ddl.table.execute.AlterTableExecuteOperation.execute(AlterTableExecuteOperation.java:37) ~[hive-exec-4.0.0-beta-2-SNAPSHOT.jar:4.0.0-beta-2-SNAPSHOT] at org.apache.hadoop.hive.ql.ddl.DDLTask.execute(DDLTask.java:84) ~[hive-exec-4.0.0-beta-2-SNAPSHOT.jar:4.0.0-beta-2-SNAPSHOT] at org.apache.hadoop.hive.ql.exec.Task.executeTask(Task.java:214) ~[hive-exec-4.0.0-beta-2-SNAPSHOT.jar:4.0.0-beta-2-SNAPSHOT] at org.apache.hadoop.hive.ql.exec.TaskRunner.runSequential(TaskRunner.java:105) ~[hive-exec-4.0.0-beta-2-SNAPSHOT.jar:4.0.0-beta-2-SNAPSHOT] at org.apache.hadoop.hive.ql.Executor.launchTask(Executor.java:354) ~[hive-exec-4.0.0-beta-2-SNAPSHOT.jar:4.0.0-beta-2-SNAPSHOT] at org.apache.hadoop.hive.ql.Executor.launchTasks(Executor.java:327) ~[hive-exec-4.0.0-beta-2-SNAPSHOT.jar:4.0.0-beta-2-SNAPSHOT] at org.apache.hadoop.hive.ql.Executor.runTasks(Executor.java:244) ~[hive-exec-4.0.0-beta-2-SNAPSHOT.jar:4.0.0-beta-2-SNAPSHOT] at org.apache.hadoop.hive.ql.Executor.execute(Executor.java:105) ~[hive-exec-4.0.0-beta-2-SNAPSHOT.jar:4.0.0-beta-2-SNAPSHOT] at org.apache.hadoop.hive.ql.Driver.execute(Driver.java:367) ~[hive-exec-4.0.0-beta-2-SNAPSHOT.jar:4.0.0-beta-2-SNAPSHOT] at org.apache.hadoop.hive.ql.Driver.runInternal(Driver.java:205) 
~[hive-exec-4.0.0-beta-2-SNAPSHOT.jar:4.0.0-beta-2-SNAPSHOT] at org.apache.hadoop.hive.ql.Driver.run(Driver.java:154) ~[hive-exec-4.0.0-beta-2-SNAPSHOT.jar:4.0.0-beta-2-SNAPSHOT] at org.apache.hadoop.hive.ql.Driver.run(Driver.java:149) ~[hive-exec-4.0.0-beta-2-SNAPSHOT.jar:4.0.0-beta-2-SNAPSHOT] at org.apache.hadoop.hive.ql.reexec.ReExecDriver.run(ReExecDriver.java:185) ~[hive-exec-4.0.0-beta-2-SNAPSHOT.jar:4.0.0-beta-2-SNAPSHOT] at org.apache.hive.service.cli.operation.SQLOperation.runQuery(SQLOperation.java:236) ~[hive-service-4.0.0-beta-2-SNAPSHOT.jar:4.0.0-beta-2-SNAPSHOT] at org.apache.hive.service.cli.operation.SQLOperation.access$500(SQLOperation.java:90) ~[hive-service-4.0.0-beta-2-SNAPSHOT.jar:4.0.0-beta-2-SNAPSHOT] at org.apache.hive.service.cli.operation.SQLOperation$BackgroundWork$1.run(SQLOperation.java:336) ~[hive-service-4.0.0-beta-2-SNAPSHOT.jar:4.0.0-beta-2-SNAPSHOT] at java.security.AccessController.doPrivileged(Native Method) ~[?:1.8.0_291] at javax.security.auth.Subject.doAs(Subject.java:422) ~[?:1.8.0_291] at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1878) ~[hadoop-common-3.3.1.jar:?] at org.apache.hive.service.cli.operation.SQLOperation$BackgroundWork.run(SQLOperation.java:356) ~[hive-service-4.0.0-beta-2-SNAPSHOT.jar:4.0.0-beta-2-SNAPSHOT] at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) ~[?:1.8.0_291] at java.util.concurrent.FutureTask.run(FutureTask.java:266) ~[?:1.8.0_291] at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) ~[?:1.8.0_291] at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) ~[?:1.8.0_291] at java.lang.Thread.run(Thread.java:748) ~[?:1.8.0_291] ERROR : DDLTask failed, DDL Operation: class org.apache.hadoop.hive.ql.ddl.table.execute.AlterTableExecuteOperation {code} -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Assigned] (HIVE-27711) Allow creating a branch from tag name
[ https://issues.apache.org/jira/browse/HIVE-27711?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] zhangbutao reassigned HIVE-27711: - Assignee: zhangbutao > Allow creating a branch from tag name > - > > Key: HIVE-27711 > URL: https://issues.apache.org/jira/browse/HIVE-27711 > Project: Hive > Issue Type: Sub-task >Reporter: Ayush Saxena >Assignee: zhangbutao >Priority: Major > > Allow creating a branch from a tag name. > If a tag is already there, we should be able to create a branch with the same > snapshot id that corresponds to that tag. -- This message was sent by Atlassian Jira (v8.20.10#820010)
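To make the intent concrete, a hedged sketch (all syntax below is illustrative; the FOR TAG form is exactly what this issue would introduce, and its final spelling may differ):

{code:sql}
-- Today: a tag pins a snapshot, but branching from it means looking up
-- the tag's snapshot id manually
ALTER TABLE tbl CREATE TAG tag1;
ALTER TABLE tbl CREATE BRANCH branch1 FOR SYSTEM_VERSION AS OF 5781947118336215154;

-- Proposed (hypothetical syntax): create the branch directly from the tag
-- name, at the same snapshot id the tag points to
ALTER TABLE tbl CREATE BRANCH branch1 FOR TAG AS OF 'tag1';
{code}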
[jira] [Commented] (HIVE-27689) Iceberg: Remove unused iceberg property
[ https://issues.apache.org/jira/browse/HIVE-27689?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17764995#comment-17764995 ] zhangbutao commented on HIVE-27689: --- PR https://github.com/apache/hive/pull/4681 > Iceberg: Remove unused iceberg property > -- > > Key: HIVE-27689 > URL: https://issues.apache.org/jira/browse/HIVE-27689 > Project: Hive > Issue Type: Improvement > Components: Iceberg integration >Reporter: zhangbutao >Assignee: zhangbutao >Priority: Minor > -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Created] (HIVE-27689) Iceberg: Remove unused iceberg property
zhangbutao created HIVE-27689: - Summary: Iceberg: Remove unused iceberg property Key: HIVE-27689 URL: https://issues.apache.org/jira/browse/HIVE-27689 Project: Hive Issue Type: Improvement Components: Iceberg integration Reporter: zhangbutao -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Assigned] (HIVE-27689) Iceberg: Remove unused iceberg property
[ https://issues.apache.org/jira/browse/HIVE-27689?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] zhangbutao reassigned HIVE-27689: - Assignee: zhangbutao > Iceberg: Remove unused iceberg property > -- > > Key: HIVE-27689 > URL: https://issues.apache.org/jira/browse/HIVE-27689 > Project: Hive > Issue Type: Improvement > Components: Iceberg integration >Reporter: zhangbutao >Assignee: zhangbutao >Priority: Minor > -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Assigned] (HIVE-27651) Upgrade hbase version
[ https://issues.apache.org/jira/browse/HIVE-27651?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] zhangbutao reassigned HIVE-27651: - Assignee: zhangbutao > Upgrade hbase version > - > > Key: HIVE-27651 > URL: https://issues.apache.org/jira/browse/HIVE-27651 > Project: Hive > Issue Type: Bug >Reporter: Ayush Saxena >Assignee: zhangbutao >Priority: Major > > Upgrade the hbase version in hive; currently we are using a legacy alpha-4 > version, so move it to the latest. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Updated] (HIVE-27593) Iceberg: Keep iceberg properties in sync with hms properties
[ https://issues.apache.org/jira/browse/HIVE-27593?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] zhangbutao updated HIVE-27593: -- Description: In HIVE-26596, as we have not implemented the full COW mode, we enforced *mor* mode for iceberg v2 tables in the two scenarios: # create a v2 iceberg table; the delete mode will be set to *mor* if not specified # upgrade a v1 table to v2; the delete mode will be set to mor In HS2, we check the mode (cow/mor) from hms table properties instead of *iceberg* {*}properties (metadata json file){*}, and HIVE-26596 only changes hms table properties. Therefore, it is ok for HS2 to operate on an iceberg table by checking the cow/mor mode from hms properties, but others like Spark operate on the iceberg table by checking cow/mor from {*}iceberg properties (metadata json file){*}. Before we implement the full COW mode, we need to keep iceberg properties in sync with hms properties so that users have the same experience on multiple engines (HS2 & Spark). was: In HIVE-26596, as we have not implement all COW mode, we enforced *mor* mode for iceberg v2 table in the tow scenarios: # create a v2 iceberg table, the delete mode will be set *mor* if not specified # upgrage v1 table to v2, and the delete mode will be set mor In HS2, we check the mode(cow/mor) from hms table properties instead of *iceberg* {*}properties(metadata json file){*}, and HIVE-26596 only change hms table properties. Therefore, it is ok for HS2 to operate iceberg table by checking cow/mor mode from hms properties, but for others like Spark, they operate the iceberg table by checking cow/mor from {*}iceberg properties(metadata json file){*}. Before we implement all COW mode, we need keep iceberg properties in sync with hms properties to ** make the users have the same experience on multiple engines(HS2 & Spark). 
> Iceberg: Keep iceberg properties in sync with hms properties > > > Key: HIVE-27593 > URL: https://issues.apache.org/jira/browse/HIVE-27593 > Project: Hive > Issue Type: Improvement > Components: Iceberg integration >Reporter: zhangbutao >Assignee: zhangbutao >Priority: Major > > In HIVE-26596, as we have not implemented the full COW mode, we enforced *mor* mode > for iceberg v2 tables in the two scenarios: > # create a v2 iceberg table; the delete mode will be set to *mor* if not > specified > # upgrade a v1 table to v2; the delete mode will be set to mor > > In HS2, we check the mode (cow/mor) from hms table properties instead of > *iceberg* {*}properties (metadata json file){*}, and HIVE-26596 only changes > hms table properties. Therefore, it is ok for HS2 to operate on an iceberg table > by checking the cow/mor mode from hms properties, but others like Spark > operate on the iceberg table by checking cow/mor from {*}iceberg > properties (metadata json file){*}. > Before we implement the full COW mode, we need to keep iceberg properties in sync > with hms properties so that users have the same experience on multiple > engines (HS2 & Spark). -- This message was sent by Atlassian Jira (v8.20.10#820010)
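The MoR enforcement in question comes down to the standard Iceberg write-mode table properties; per this issue they need to land in the Iceberg metadata.json, not only in the HMS TBLPROPERTIES, so that Spark resolves the same mode. A sketch using the standard property keys:

{code:sql}
CREATE TABLE ice_v2 (id int) STORED BY ICEBERG
TBLPROPERTIES ('format-version'='2',
               'write.delete.mode'='merge-on-read',
               'write.update.mode'='merge-on-read',
               'write.merge.mode'='merge-on-read');
{code}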
[jira] [Updated] (HIVE-27593) Iceberg: Keep iceberg properties in sync with hms properties
[ https://issues.apache.org/jira/browse/HIVE-27593?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] zhangbutao updated HIVE-27593: -- Description: In HIVE-26596, as we have not implemented the full COW mode, we enforced *mor* mode for iceberg v2 tables in the two scenarios: # create a v2 iceberg table; the delete mode will be set to *mor* if not specified # upgrade a v1 table to v2; the delete mode will be set to mor In HS2, we check the mode (cow/mor) from hms table properties instead of *iceberg* {*}properties (metadata json file){*}, and HIVE-26596 only changes hms table properties. Therefore, it is ok for HS2 to operate on an iceberg table by checking the cow/mor mode from hms properties, but others like Spark operate on the iceberg table by checking cow/mor from {*}iceberg properties (metadata json file){*}. Before we implement the full COW mode, we need to keep iceberg properties in sync with hms properties so that users have the same experience on multiple engines (HS2 & Spark). > Iceberg: Keep iceberg properties in sync with hms properties > > > Key: HIVE-27593 > URL: https://issues.apache.org/jira/browse/HIVE-27593 > Project: Hive > Issue Type: Improvement > Components: Iceberg integration >Reporter: zhangbutao >Assignee: zhangbutao >Priority: Major > > In HIVE-26596, as we have not implemented the full COW mode, we enforced *mor* mode > for iceberg v2 tables in the two scenarios: > # create a v2 iceberg table; the delete mode will be set to *mor* if not > specified > # upgrade a v1 table to v2; the delete mode will be set to mor > > In HS2, we check the mode (cow/mor) from hms table properties instead of > *iceberg* {*}properties (metadata json file){*}, and HIVE-26596 only changes > hms table properties. Therefore, it is ok for HS2 to operate on an iceberg table > by checking the cow/mor mode from hms properties, but others like Spark > operate on the iceberg table by checking cow/mor from {*}iceberg > properties (metadata json file){*}. 
> Before we implement the full COW mode, we need to keep iceberg properties in sync > with hms properties so that users have the same experience on multiple > engines (HS2 & Spark). -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Created] (HIVE-27593) Iceberg: Keep iceberg properties in sync with hms properties
zhangbutao created HIVE-27593: - Summary: Iceberg: Keep iceberg properties in sync with hms properties Key: HIVE-27593 URL: https://issues.apache.org/jira/browse/HIVE-27593 Project: Hive Issue Type: Improvement Components: Iceberg integration Reporter: zhangbutao -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Assigned] (HIVE-27593) Iceberg: Keep iceberg properties in sync with hms properties
[ https://issues.apache.org/jira/browse/HIVE-27593?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] zhangbutao reassigned HIVE-27593: - Assignee: zhangbutao > Iceberg: Keep iceberg properties in sync with hms properties > > > Key: HIVE-27593 > URL: https://issues.apache.org/jira/browse/HIVE-27593 > Project: Hive > Issue Type: Improvement > Components: Iceberg integration >Reporter: zhangbutao >Assignee: zhangbutao >Priority: Major > -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Created] (HIVE-27565) Fix NPE when dropping table in HiveQueryLifeTimeHook::checkAndRollbackCTAS
zhangbutao created HIVE-27565: - Summary: Fix NPE when dropping table in HiveQueryLifeTimeHook::checkAndRollbackCTAS Key: HIVE-27565 URL: https://issues.apache.org/jira/browse/HIVE-27565 Project: Hive Issue Type: Bug Reporter: zhangbutao If we drop an iceberg table that is used by a materialized view, HiveQueryLifeTimeHook::checkAndRollbackCTAS will throw an NPE. Steps to repro: * create an iceberg table: create table test_ice1 (id int) stored by iceberg; * create a materialized view: create materialized view ice_mat1 as select * from test_ice1; * drop the iceberg table: drop table test_ice1; {code:java} at java.lang.reflect.Method.invoke(Method.java:498) ~[?:1.8.0_291] at org.apache.hadoop.hive.metastore.HiveMetaStoreClient$SynchronizedHandler.invoke(HiveMetaStoreClient.java:4462) ~[hive-exec-4.0.0-beta-1-SNAPSHOT.jar:4.0.0-beta-1-SNAPSHOT] at com.sun.proxy.$Proxy47.dropTable(Unknown Source) ~[?:?] at org.apache.hadoop.hive.ql.metadata.Hive.dropTable(Hive.java:1500) ~[hive-exec-4.0.0-beta-1-SNAPSHOT.jar:4.0.0-beta-1-SNAPSHOT] ... 26 more ERROR : FAILED: Execution Error, return code 4 from org.apache.hadoop.hive.ql.ddl.DDLTask. 
MetaException(message:Cannot drop table as it is used in the following materialized views [testdbpr.ice_mat1] ) WARN : Failed when invoking query after execution hook java.lang.RuntimeException: Not able to check whether the CTAS table directory exists due to: at org.apache.hadoop.hive.ql.HiveQueryLifeTimeHook.checkAndRollbackCTAS(HiveQueryLifeTimeHook.java:84) ~[hive-exec-4.0.0-beta-1-SNAPSHOT.jar:4.0.0-beta-1-SNAPSHOT] at org.apache.hadoop.hive.ql.HiveQueryLifeTimeHook.afterExecution(HiveQueryLifeTimeHook.java:65) ~[hive-exec-4.0.0-beta-1-SNAPSHOT.jar:4.0.0-beta-1-SNAPSHOT] at org.apache.hadoop.hive.ql.HookRunner.runAfterExecutionHook(HookRunner.java:185) ~[hive-exec-4.0.0-beta-1-SNAPSHOT.jar:4.0.0-beta-1-SNAPSHOT] at org.apache.hadoop.hive.ql.Executor.cleanUp(Executor.java:525) ~[hive-exec-4.0.0-beta-1-SNAPSHOT.jar:4.0.0-beta-1-SNAPSHOT] at org.apache.hadoop.hive.ql.Executor.execute(Executor.java:118) ~[hive-exec-4.0.0-beta-1-SNAPSHOT.jar:4.0.0-beta-1-SNAPSHOT] at org.apache.hadoop.hive.ql.Driver.execute(Driver.java:367) ~[hive-exec-4.0.0-beta-1-SNAPSHOT.jar:4.0.0-beta-1-SNAPSHOT] at org.apache.hadoop.hive.ql.Driver.runInternal(Driver.java:205) ~[hive-exec-4.0.0-beta-1-SNAPSHOT.jar:4.0.0-beta-1-SNAPSHOT] at org.apache.hadoop.hive.ql.Driver.run(Driver.java:154) ~[hive-exec-4.0.0-beta-1-SNAPSHOT.jar:4.0.0-beta-1-SNAPSHOT] at org.apache.hadoop.hive.ql.Driver.run(Driver.java:149) ~[hive-exec-4.0.0-beta-1-SNAPSHOT.jar:4.0.0-beta-1-SNAPSHOT] at org.apache.hadoop.hive.ql.reexec.ReExecDriver.run(ReExecDriver.java:185) ~[hive-exec-4.0.0-beta-1-SNAPSHOT.jar:4.0.0-beta-1-SNAPSHOT] at org.apache.hive.service.cli.operation.SQLOperation.runQuery(SQLOperation.java:236) ~[hive-service-4.0.0-beta-1-SNAPSHOT.jar:4.0.0-beta-1-SNAPSHOT] at org.apache.hive.service.cli.operation.SQLOperation.access$500(SQLOperation.java:90) ~[hive-service-4.0.0-beta-1-SNAPSHOT.jar:4.0.0-beta-1-SNAPSHOT] at org.apache.hive.service.cli.operation.SQLOperation$BackgroundWork$1.run(SQLOperation.java:336) 
~[hive-service-4.0.0-beta-1-SNAPSHOT.jar:4.0.0-beta-1-SNAPSHOT] at java.security.AccessController.doPrivileged(Native Method) ~[?:1.8.0_291] at javax.security.auth.Subject.doAs(Subject.java:422) ~[?:1.8.0_291] at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1878) ~[hadoop-common-3.3.1.jar:?] at org.apache.hive.service.cli.operation.SQLOperation$BackgroundWork.run(SQLOperation.java:356) ~[hive-service-4.0.0-beta-1-SNAPSHOT.jar:4.0.0-beta-1-SNAPSHOT] at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) ~[?:1.8.0_291] at java.util.concurrent.FutureTask.run(FutureTask.java:266) ~[?:1.8.0_291] at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) ~[?:1.8.0_291] at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) ~[?:1.8.0_291] at java.lang.Thread.run(Thread.java:748) ~[?:1.8.0_291] Caused by: java.lang.NullPointerException at org.apache.hadoop.hive.ql.HiveQueryLifeTimeHook.checkAndRollbackCTAS(HiveQueryLifeTimeHook.java:79) ~[hive-exec-4.0.0-beta-1-SNAPSHOT.jar:4.0.0-beta-1-SNAPSHOT] ... 21 more INFO : Completed executing command(queryId=hive_20230804145734_08837e22-5ff0-4b56-a0cf-69b0414171dd); Time taken: 0.073 seconds Error: Error while compiling statement: FAILED: Execution Error, return code 4 from org.apache.hadoop.hive.ql.ddl.DDLTask. MetaException(message:Cannot drop table as it is used in the follo
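The repro steps from the description, collected into a single script:

{code:sql}
create table test_ice1 (id int) stored by iceberg;
create materialized view ice_mat1 as select * from test_ice1;
-- the DROP is correctly rejected (the table is used by ice_mat1), but the
-- after-execution hook then NPEs in checkAndRollbackCTAS
drop table test_ice1;
{code}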
[jira] [Assigned] (HIVE-27565) Fix NPE when dropping table in HiveQueryLifeTimeHook::checkAndRollbackCTAS
[ https://issues.apache.org/jira/browse/HIVE-27565?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] zhangbutao reassigned HIVE-27565:
-
Assignee: zhangbutao

> Fix NPE when dropping table in HiveQueryLifeTimeHook::checkAndRollbackCTAS
> --
>
> Key: HIVE-27565
> URL: https://issues.apache.org/jira/browse/HIVE-27565
> Project: Hive
> Issue Type: Bug
> Reporter: zhangbutao
> Assignee: zhangbutao
> Priority: Major
>
> If dropping an Iceberg table that is used by a materialized view, HiveQueryLifeTimeHook::checkAndRollbackCTAS throws an NPE.
>
> Steps to reproduce:
> * create an Iceberg table:
> create table test_ice1 (id int) stored by iceberg;
> * create a materialized view:
> create materialized view ice_mat1 as select * from test_ice1;
> * drop the Iceberg table:
> drop table test_ice1;
>
> {code:java}
> at java.lang.reflect.Method.invoke(Method.java:498) ~[?:1.8.0_291]
> at org.apache.hadoop.hive.metastore.HiveMetaStoreClient$SynchronizedHandler.invoke(HiveMetaStoreClient.java:4462) ~[hive-exec-4.0.0-beta-1-SNAPSHOT.jar:4.0.0-beta-1-SNAPSHOT]
> at com.sun.proxy.$Proxy47.dropTable(Unknown Source) ~[?:?]
> at org.apache.hadoop.hive.ql.metadata.Hive.dropTable(Hive.java:1500) ~[hive-exec-4.0.0-beta-1-SNAPSHOT.jar:4.0.0-beta-1-SNAPSHOT]
> ... 26 more
> ERROR : FAILED: Execution Error, return code 4 from org.apache.hadoop.hive.ql.ddl.DDLTask. 
MetaException(message:Cannot drop table as it is used in the following materialized views [testdbpr.ice_mat1] )
> WARN : Failed when invoking query after execution hook
> java.lang.RuntimeException: Not able to check whether the CTAS table directory exists due to:
> at org.apache.hadoop.hive.ql.HiveQueryLifeTimeHook.checkAndRollbackCTAS(HiveQueryLifeTimeHook.java:84) ~[hive-exec-4.0.0-beta-1-SNAPSHOT.jar:4.0.0-beta-1-SNAPSHOT]
> at org.apache.hadoop.hive.ql.HiveQueryLifeTimeHook.afterExecution(HiveQueryLifeTimeHook.java:65) ~[hive-exec-4.0.0-beta-1-SNAPSHOT.jar:4.0.0-beta-1-SNAPSHOT]
> at org.apache.hadoop.hive.ql.HookRunner.runAfterExecutionHook(HookRunner.java:185) ~[hive-exec-4.0.0-beta-1-SNAPSHOT.jar:4.0.0-beta-1-SNAPSHOT]
> at org.apache.hadoop.hive.ql.Executor.cleanUp(Executor.java:525) ~[hive-exec-4.0.0-beta-1-SNAPSHOT.jar:4.0.0-beta-1-SNAPSHOT]
> at org.apache.hadoop.hive.ql.Executor.execute(Executor.java:118) ~[hive-exec-4.0.0-beta-1-SNAPSHOT.jar:4.0.0-beta-1-SNAPSHOT]
> at org.apache.hadoop.hive.ql.Driver.execute(Driver.java:367) ~[hive-exec-4.0.0-beta-1-SNAPSHOT.jar:4.0.0-beta-1-SNAPSHOT]
> at org.apache.hadoop.hive.ql.Driver.runInternal(Driver.java:205) ~[hive-exec-4.0.0-beta-1-SNAPSHOT.jar:4.0.0-beta-1-SNAPSHOT]
> at org.apache.hadoop.hive.ql.Driver.run(Driver.java:154) ~[hive-exec-4.0.0-beta-1-SNAPSHOT.jar:4.0.0-beta-1-SNAPSHOT]
> at org.apache.hadoop.hive.ql.Driver.run(Driver.java:149) ~[hive-exec-4.0.0-beta-1-SNAPSHOT.jar:4.0.0-beta-1-SNAPSHOT]
> at org.apache.hadoop.hive.ql.reexec.ReExecDriver.run(ReExecDriver.java:185) ~[hive-exec-4.0.0-beta-1-SNAPSHOT.jar:4.0.0-beta-1-SNAPSHOT]
> at org.apache.hive.service.cli.operation.SQLOperation.runQuery(SQLOperation.java:236) ~[hive-service-4.0.0-beta-1-SNAPSHOT.jar:4.0.0-beta-1-SNAPSHOT]
> at org.apache.hive.service.cli.operation.SQLOperation.access$500(SQLOperation.java:90) ~[hive-service-4.0.0-beta-1-SNAPSHOT.jar:4.0.0-beta-1-SNAPSHOT]
> at org.apache.hive.service.cli.operation.SQLOperation$BackgroundWork$1.run(SQLOperation.java:336) ~[hive-service-4.0.0-beta-1-SNAPSHOT.jar:4.0.0-beta-1-SNAPSHOT]
> at java.security.AccessController.doPrivileged(Native Method) ~[?:1.8.0_291]
> at javax.security.auth.Subject.doAs(Subject.java:422) ~[?:1.8.0_291]
> at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1878) ~[hadoop-common-3.3.1.jar:?]
> at org.apache.hive.service.cli.operation.SQLOperation$BackgroundWork.run(SQLOperation.java:356) ~[hive-service-4.0.0-beta-1-SNAPSHOT.jar:4.0.0-beta-1-SNAPSHOT]
> at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) ~[?:1.8.0_291]
> at java.util.concurrent.FutureTask.run(FutureTask.java:266) ~[?:1.8.0_291]
> at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) ~[?:1.8.0_291]
> at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) ~[?:1.8.0_291]
> at java.lang.Thread.run(Thread.java:748) ~[?:1.8.0_291]
> Caused by: java.lang.NullPointerException
> at org.apache.hadoop.hive.ql.HiveQueryLifeTimeHook.checkAndRollbac
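The NPE above comes from the after-execution hook dereferencing CTAS-specific state that is null for a non-CTAS statement such as this failed DROP TABLE. A minimal, hypothetical sketch of the failure pattern and its null guard (not the actual Hive code; class and field names are illustrative):

```java
// Simplified illustration of the HIVE-27565 pattern: an after-execution hook
// that assumes CTAS state exists will NPE on a non-CTAS statement (e.g. a
// failed DROP TABLE). A null guard makes the hook a no-op in that case.
class CtasRollbackSketch {
    static final class CtasInfo {
        final String tableLocation;
        CtasInfo(String tableLocation) { this.tableLocation = tableLocation; }
    }

    // Returns true only when a rollback check was actually performed.
    static boolean checkAndRollbackCtas(CtasInfo ctas) {
        if (ctas == null || ctas.tableLocation == null) {
            return false; // not a CTAS query: nothing to check, and no NPE
        }
        // ...here the real hook would test whether the CTAS directory exists
        // and remove it because the query failed
        return true;
    }

    public static void main(String[] args) {
        System.out.println(checkAndRollbackCtas(null));
        System.out.println(checkAndRollbackCtas(new CtasInfo("/warehouse/t1")));
    }
}
```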
[jira] [Updated] (HIVE-27565) Fix NPE when dropping table in HiveQueryLifeTimeHook::checkAndRollbackCTAS
[ https://issues.apache.org/jira/browse/HIVE-27565?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] zhangbutao updated HIVE-27565:
--
Description:
If dropping an Iceberg table that is used by a materialized view, HiveQueryLifeTimeHook::checkAndRollbackCTAS throws an NPE.

Steps to reproduce:
* create an Iceberg table:
create table test_ice1 (id int) stored by iceberg;
* create a materialized view:
create materialized view ice_mat1 as select * from test_ice1;
* drop the Iceberg table:
drop table test_ice1;

{code:java}
at java.lang.reflect.Method.invoke(Method.java:498) ~[?:1.8.0_291]
at org.apache.hadoop.hive.metastore.HiveMetaStoreClient$SynchronizedHandler.invoke(HiveMetaStoreClient.java:4462) ~[hive-exec-4.0.0-beta-1-SNAPSHOT.jar:4.0.0-beta-1-SNAPSHOT]
at com.sun.proxy.$Proxy47.dropTable(Unknown Source) ~[?:?]
at org.apache.hadoop.hive.ql.metadata.Hive.dropTable(Hive.java:1500) ~[hive-exec-4.0.0-beta-1-SNAPSHOT.jar:4.0.0-beta-1-SNAPSHOT]
... 26 more
ERROR : FAILED: Execution Error, return code 4 from org.apache.hadoop.hive.ql.ddl.DDLTask. 
MetaException(message:Cannot drop table as it is used in the following materialized views [testdbpr.ice_mat1] )
WARN : Failed when invoking query after execution hook
java.lang.RuntimeException: Not able to check whether the CTAS table directory exists due to:
	at org.apache.hadoop.hive.ql.HiveQueryLifeTimeHook.checkAndRollbackCTAS(HiveQueryLifeTimeHook.java:84) ~[hive-exec-4.0.0-beta-1-SNAPSHOT.jar:4.0.0-beta-1-SNAPSHOT]
	at org.apache.hadoop.hive.ql.HiveQueryLifeTimeHook.afterExecution(HiveQueryLifeTimeHook.java:65) ~[hive-exec-4.0.0-beta-1-SNAPSHOT.jar:4.0.0-beta-1-SNAPSHOT]
	at org.apache.hadoop.hive.ql.HookRunner.runAfterExecutionHook(HookRunner.java:185) ~[hive-exec-4.0.0-beta-1-SNAPSHOT.jar:4.0.0-beta-1-SNAPSHOT]
	at org.apache.hadoop.hive.ql.Executor.cleanUp(Executor.java:525) ~[hive-exec-4.0.0-beta-1-SNAPSHOT.jar:4.0.0-beta-1-SNAPSHOT]
	at org.apache.hadoop.hive.ql.Executor.execute(Executor.java:118) ~[hive-exec-4.0.0-beta-1-SNAPSHOT.jar:4.0.0-beta-1-SNAPSHOT]
	at org.apache.hadoop.hive.ql.Driver.execute(Driver.java:367) ~[hive-exec-4.0.0-beta-1-SNAPSHOT.jar:4.0.0-beta-1-SNAPSHOT]
	at org.apache.hadoop.hive.ql.Driver.runInternal(Driver.java:205) ~[hive-exec-4.0.0-beta-1-SNAPSHOT.jar:4.0.0-beta-1-SNAPSHOT]
	at org.apache.hadoop.hive.ql.Driver.run(Driver.java:154) ~[hive-exec-4.0.0-beta-1-SNAPSHOT.jar:4.0.0-beta-1-SNAPSHOT]
	at org.apache.hadoop.hive.ql.Driver.run(Driver.java:149) ~[hive-exec-4.0.0-beta-1-SNAPSHOT.jar:4.0.0-beta-1-SNAPSHOT]
	at org.apache.hadoop.hive.ql.reexec.ReExecDriver.run(ReExecDriver.java:185) ~[hive-exec-4.0.0-beta-1-SNAPSHOT.jar:4.0.0-beta-1-SNAPSHOT]
	at org.apache.hive.service.cli.operation.SQLOperation.runQuery(SQLOperation.java:236) ~[hive-service-4.0.0-beta-1-SNAPSHOT.jar:4.0.0-beta-1-SNAPSHOT]
	at org.apache.hive.service.cli.operation.SQLOperation.access$500(SQLOperation.java:90) ~[hive-service-4.0.0-beta-1-SNAPSHOT.jar:4.0.0-beta-1-SNAPSHOT]
	at org.apache.hive.service.cli.operation.SQLOperation$BackgroundWork$1.run(SQLOperation.java:336) ~[hive-service-4.0.0-beta-1-SNAPSHOT.jar:4.0.0-beta-1-SNAPSHOT]
	at java.security.AccessController.doPrivileged(Native Method) ~[?:1.8.0_291]
	at javax.security.auth.Subject.doAs(Subject.java:422) ~[?:1.8.0_291]
	at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1878) ~[hadoop-common-3.3.1.jar:?]
	at org.apache.hive.service.cli.operation.SQLOperation$BackgroundWork.run(SQLOperation.java:356) ~[hive-service-4.0.0-beta-1-SNAPSHOT.jar:4.0.0-beta-1-SNAPSHOT]
	at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) ~[?:1.8.0_291]
	at java.util.concurrent.FutureTask.run(FutureTask.java:266) ~[?:1.8.0_291]
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) ~[?:1.8.0_291]
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) ~[?:1.8.0_291]
	at java.lang.Thread.run(Thread.java:748) ~[?:1.8.0_291]
Caused by: java.lang.NullPointerException
	at org.apache.hadoop.hive.ql.HiveQueryLifeTimeHook.checkAndRollbackCTAS(HiveQueryLifeTimeHook.java:79) ~[hive-exec-4.0.0-beta-1-SNAPSHOT.jar:4.0.0-beta-1-SNAPSHOT]
	... 21 more
INFO : Completed executing command(queryId=hive_20230804145734_08837e22-5ff0-4b56-a0cf-69b0414171dd); Time taken: 0.073 seconds
Error: Error while compiling statement: FAILED: Execution Error, return code 4 from org.apache.hadoop.hive.ql.ddl.DDLTask. MetaException(message:Cannot drop table as it is used in the following materialized views [testdbpr.ice_mat1] ) (state=08S01,code=4)
{code}

was: If dropping a iceberg table which is used by a materiali
[jira] [Commented] (HIVE-27553) After upgrading from Hive1 to Hive3, Decimal computation experiences a loss of precision
[ https://issues.apache.org/jira/browse/HIVE-27553?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17749598#comment-17749598 ] zhangbutao commented on HIVE-27553:
---
This issue was caused by HIVE-15331, which emulates the SQL Server decimal behavior.

> After upgrading from Hive1 to Hive3, Decimal computation experiences a loss of precision
>
> Key: HIVE-27553
> URL: https://issues.apache.org/jira/browse/HIVE-27553
> Project: Hive
> Issue Type: Bug
> Components: HiveServer2
> Affects Versions: 3.1.3
> Reporter: ZhengBowen
> Priority: Major
> Attachments: image-2023-07-31-20-40-00-679.png, image-2023-07-31-20-40-35-050.png, image-2023-07-31-20-43-05-379.png, image-2023-07-31-20-43-49-775.png
>
> I can reproduce this bug.
> {quote}
> create table decimal_test(
>   id int,
>   quantity decimal(38,8),
>   cost decimal(38,8)
> ) stored as textfile;
>
> insert into decimal_test values(1,0.8000, 0.00015000);
>
> select quantity * cost from decimal_test;
> {quote}
> 1. The following are the execution results and execution plan on Hive-1.0.1:
> !image-2023-07-31-20-40-00-679.png|width=550,height=230!
> !image-2023-07-31-20-43-05-379.png|width=540,height=144!
> 2. The following are the execution results and execution plan on Hive-3.1.3:
> !image-2023-07-31-20-40-35-050.png|width=538,height=257!
> !image-2023-07-31-20-43-49-775.png|width=533,height=142!
-- This message was sent by Atlassian Jira (v8.20.10#820010)
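For context, a runnable sketch of the SQL Server-style result-type rule that HIVE-15331 emulates, as I understand it: multiplication yields precision p1+p2+1 and scale s1+s2; if the precision exceeds 38, fractional digits are sacrificed first, but at least min(scale, 6) digits of scale are kept. The method name and constants here are illustrative, not Hive's actual API:

```java
// Illustrative sketch of the decimal multiplication result-type rule
// (SQL Server style, emulated by HIVE-15331). Not actual Hive code.
class DecimalTypeRule {
    // Returns {precision, scale} of the product of decimal(p1,s1) * decimal(p2,s2).
    static int[] multiplyResultType(int p1, int s1, int p2, int s2) {
        int precision = p1 + p2 + 1;
        int scale = s1 + s2;
        if (precision > 38) {
            int minScale = Math.min(scale, 6);           // never drop below min(scale, 6)
            scale = Math.max(scale - (precision - 38), minScale);
            precision = 38;
        }
        return new int[] { precision, scale };
    }

    public static void main(String[] args) {
        int[] t = multiplyResultType(38, 8, 38, 8);
        // decimal(38,8) * decimal(38,8) -> decimal(38,6): only six fractional
        // digits survive, so finer-grained products are rounded away.
        System.out.println("decimal(" + t[0] + "," + t[1] + ")");
    }
}
```

This is why the same query can round differently on Hive 3 than on Hive 1, which did not apply this capping rule.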
[jira] [Resolved] (HIVE-27440) Improve data connector cache
[ https://issues.apache.org/jira/browse/HIVE-27440?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] zhangbutao resolved HIVE-27440.
---
Fix Version/s: 4.0.0
Resolution: Fixed

> Improve data connector cache
>
> Key: HIVE-27440
> URL: https://issues.apache.org/jira/browse/HIVE-27440
> Project: Hive
> Issue Type: Sub-task
> Reporter: zhangbutao
> Assignee: zhangbutao
> Priority: Major
> Labels: pull-request-available
> Fix For: 4.0.0
>
> _*DataConnectorProviderFactory*_ uses a HashMap to cache data connector instances, and there is no way to invalidate the cache unless you restart the MetaStore. What is more serious is that if you drop or alter a data connector, the cache does not change, and you may use an invalid data connector the next time.
>
> I think we can improve the data connector cache in two ways:
> * Use Caffeine with a *maximumSize* (e.g. 100) to cache data connectors instead of a HashMap, and set an *expire time* after the last access. We should also close the underlying datasource connection using a {*}Caffeine RemovalListener{*}.
> * After executing Drop or Alter DDL on a data connector, we should *update the cache* to evict that data connector, to avoid using an invalid data connector the next time.
-- This message was sent by Atlassian Jira (v8.20.10#820010)
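The ticket above proposes Caffeine (`Caffeine.newBuilder().maximumSize(100).expireAfterAccess(...).removalListener(...)`). As a stdlib-only illustration of the same two ideas, here is a hand-rolled sketch: a size-bounded, access-ordered cache whose removal hook can close the evicted connector's datasource, plus an explicit `invalidate()` for DROP/ALTER CONNECTOR. All names are illustrative, not Hive's:

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.BiConsumer;

// Stdlib-only stand-in for what the ticket proposes with Caffeine: an LRU cache
// with a maximum size and a removal hook (e.g. close the datasource connection),
// plus explicit invalidation so a stale connector is never handed out again.
class ConnectorCache<K, V> extends LinkedHashMap<K, V> {
    private final int maxSize;
    private final BiConsumer<K, V> onRemoval;

    ConnectorCache(int maxSize, BiConsumer<K, V> onRemoval) {
        super(16, 0.75f, true); // access order, approximating expire-after-access
        this.maxSize = maxSize;
        this.onRemoval = onRemoval;
    }

    @Override
    protected boolean removeEldestEntry(Map.Entry<K, V> eldest) {
        if (size() > maxSize) {
            onRemoval.accept(eldest.getKey(), eldest.getValue()); // e.g. close datasource
            return true;
        }
        return false;
    }

    // Called after DROP/ALTER CONNECTOR so the next lookup rebuilds the provider.
    V invalidate(K key) {
        V v = remove(key);
        if (v != null) onRemoval.accept(key, v);
        return v;
    }
}
```

Caffeine additionally gives time-based expiry and thread safety out of the box, which is why the actual fix uses it rather than a `LinkedHashMap`.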
[jira] [Commented] (HIVE-27440) Improve data connector cache
[ https://issues.apache.org/jira/browse/HIVE-27440?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17743354#comment-17743354 ] zhangbutao commented on HIVE-27440:
---
Fix has been merged! Thanks [~hemanth619] [~akshatm] [~ngangam]

> Improve data connector cache
>
> Key: HIVE-27440
> URL: https://issues.apache.org/jira/browse/HIVE-27440
> Project: Hive
> Issue Type: Sub-task
> Reporter: zhangbutao
> Assignee: zhangbutao
> Priority: Major
> Labels: pull-request-available
>
> _*DataConnectorProviderFactory*_ uses a HashMap to cache data connector instances, and there is no way to invalidate the cache unless you restart the MetaStore. What is more serious is that if you drop or alter a data connector, the cache does not change, and you may use an invalid data connector the next time.
>
> I think we can improve the data connector cache in two ways:
> * Use Caffeine with a *maximumSize* (e.g. 100) to cache data connectors instead of a HashMap, and set an *expire time* after the last access. We should also close the underlying datasource connection using a {*}Caffeine RemovalListener{*}.
> * After executing Drop or Alter DDL on a data connector, we should *update the cache* to evict that data connector, to avoid using an invalid data connector the next time.
-- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Updated] (HIVE-27503) Support query iceberg tag
[ https://issues.apache.org/jira/browse/HIVE-27503?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] zhangbutao updated HIVE-27503:
--
Description:
Support querying an Iceberg tag like this:
{code:java}
select * from db.tbl.tag_tagName;{code}
In addition, an Iceberg tag cannot be written to, and we should throw an exception in the compile stage if users want to write data to an Iceberg tag.

was: Support query iceberg tag like this: {code:java} select * from db.tbl.tag_tagName;{code} In addition, Iceberg tag can not be written data and we should throw exception when compile stage if users want to write data to iceberg tag.

> Support query iceberg tag
> -
>
> Key: HIVE-27503
> URL: https://issues.apache.org/jira/browse/HIVE-27503
> Project: Hive
> Issue Type: Sub-task
> Components: Iceberg integration
> Reporter: zhangbutao
> Assignee: zhangbutao
> Priority: Major
>
> Support querying an Iceberg tag like this:
> {code:java}
> select * from db.tbl.tag_tagName;{code}
>
> In addition, an Iceberg tag cannot be written to, and we should throw an exception in the compile stage if users want to write data to an Iceberg tag.
-- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Created] (HIVE-27503) Support query iceberg tag
zhangbutao created HIVE-27503:
-
Summary: Support query iceberg tag
Key: HIVE-27503
URL: https://issues.apache.org/jira/browse/HIVE-27503
Project: Hive
Issue Type: Sub-task
Components: Iceberg integration
Reporter: zhangbutao

Support querying an Iceberg tag like this:
{code:java}
select * from db.tbl.tag_tagName;{code}
In addition, an Iceberg tag cannot be written to, and we should throw an exception in the compile stage if users want to write data to an Iceberg tag.
-- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Assigned] (HIVE-27503) Support query iceberg tag
[ https://issues.apache.org/jira/browse/HIVE-27503?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] zhangbutao reassigned HIVE-27503:
-
Assignee: zhangbutao

> Support query iceberg tag
> -
>
> Key: HIVE-27503
> URL: https://issues.apache.org/jira/browse/HIVE-27503
> Project: Hive
> Issue Type: Sub-task
> Components: Iceberg integration
> Reporter: zhangbutao
> Assignee: zhangbutao
> Priority: Major
>
> Support querying an Iceberg tag like this:
> {code:java}
> select * from db.tbl.tag_tagName;{code}
>
> In addition, an Iceberg tag cannot be written to, and we should throw an exception in the compile stage if users want to write data to an Iceberg tag.
-- This message was sent by Atlassian Jira (v8.20.10#820010)
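A hypothetical sketch of the compile-stage guard the HIVE-27503 description asks for (not Hive's actual implementation; the parsing here is deliberately naive): reject any statement that would write to a tag reference of the form `db.tbl.tag_<name>`.

```java
// Illustrative compile-stage check: writes to an Iceberg tag reference
// (db.tbl.tag_<name>) are rejected before execution. Names are hypothetical.
class TagWriteGuard {
    static void assertWritable(String tableRef) {
        String lastPart = tableRef.substring(tableRef.lastIndexOf('.') + 1);
        if (lastPart.startsWith("tag_")) {
            throw new UnsupportedOperationException(
                "Writing to an Iceberg tag is not supported: " + tableRef);
        }
    }

    public static void main(String[] args) {
        assertWritable("db.tbl");            // plain table reference: allowed
        try {
            assertWritable("db.tbl.tag_v1"); // tag reference: rejected
        } catch (UnsupportedOperationException e) {
            System.out.println(e.getMessage());
        }
    }
}
```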
[jira] [Created] (HIVE-27440) Improve data connector cache
zhangbutao created HIVE-27440:
-
Summary: Improve data connector cache
Key: HIVE-27440
URL: https://issues.apache.org/jira/browse/HIVE-27440
Project: Hive
Issue Type: Sub-task
Reporter: zhangbutao

_*DataConnectorProviderFactory*_ uses a HashMap to cache data connector instances, and there is no way to invalidate the cache unless you restart the MetaStore. What is more serious is that if you drop or alter a data connector, the cache does not change, and you may use an invalid data connector the next time.

I think we can improve the data connector cache in two ways:
* Use Caffeine with a *maximumSize* (e.g. 100) to cache data connectors instead of a HashMap, and set an *expire time* after the last access. We should also close the underlying datasource connection using a {*}Caffeine RemovalListener{*}.
* After executing Drop or Alter DDL on a data connector, we should *update the cache* to evict that data connector, to avoid using an invalid data connector the next time.
-- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Assigned] (HIVE-27440) Improve data connector cache
[ https://issues.apache.org/jira/browse/HIVE-27440?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] zhangbutao reassigned HIVE-27440:
-
Assignee: zhangbutao

> Improve data connector cache
>
> Key: HIVE-27440
> URL: https://issues.apache.org/jira/browse/HIVE-27440
> Project: Hive
> Issue Type: Sub-task
> Reporter: zhangbutao
> Assignee: zhangbutao
> Priority: Major
>
> _*DataConnectorProviderFactory*_ uses a HashMap to cache data connector instances, and there is no way to invalidate the cache unless you restart the MetaStore. What is more serious is that if you drop or alter a data connector, the cache does not change, and you may use an invalid data connector the next time.
>
> I think we can improve the data connector cache in two ways:
> * Use Caffeine with a *maximumSize* (e.g. 100) to cache data connectors instead of a HashMap, and set an *expire time* after the last access. We should also close the underlying datasource connection using a {*}Caffeine RemovalListener{*}.
> * After executing Drop or Alter DDL on a data connector, we should *update the cache* to evict that data connector, to avoid using an invalid data connector the next time.
-- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Created] (HIVE-27435) Iceberg: Add cache for Hive::createStorageHandler to avoid creating storage handler frequently
zhangbutao created HIVE-27435:
-
Summary: Iceberg: Add cache for Hive::createStorageHandler to avoid creating storage handler frequently
Key: HIVE-27435
URL: https://issues.apache.org/jira/browse/HIVE-27435
Project: Hive
Issue Type: Improvement
Components: Iceberg integration
Reporter: zhangbutao

[https://github.com/apache/hive/pull/4372/files#r1222816743]
Creating or altering an Iceberg table invokes the method _*Hive::createStorageHandler*_ multiple times. We should consider how to add a cache to avoid this.
-- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Updated] (HIVE-27431) Clean invalid properties in test module
[ https://issues.apache.org/jira/browse/HIVE-27431?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] zhangbutao updated HIVE-27431:
--
Summary: Clean invalid properties in test module (was: Clean invalid property in test module)

> Clean invalid properties in test module
> ---
>
> Key: HIVE-27431
> URL: https://issues.apache.org/jira/browse/HIVE-27431
> Project: Hive
> Issue Type: Test
> Components: Test
> Reporter: zhangbutao
> Assignee: zhangbutao
> Priority: Minor
>
> In the *data/conf* module, *hive-site.xml* is used for qtests and unit tests. It keeps many invalid properties, and if you run a test in an IDE, you will see lots of WARN messages:
> {code:java}
> 2023-06-12T01:28:18,074 WARN [main] conf.HiveConf: HiveConf of name hive.mapjoin.max.gc.time.percentage does not exist
> 2023-06-12T01:28:18,074 WARN [main] conf.HiveConf: HiveConf of name hive.llap.io.cache.orc.size does not exist
> 2023-06-12T01:28:18,075 WARN [main] conf.HiveConf: HiveConf of name hive.dummyparam.test.server.specific.config.override does not exist
> 2023-06-12T01:28:18,075 WARN [main] conf.HiveConf: HiveConf of name hive.metastore.metadb.dir does not exist
> 2023-06-12T01:28:18,075 WARN [main] conf.HiveConf: HiveConf of name hive.llap.io.cache.orc.alloc.min does not exist
> 2023-06-12T01:28:18,075 WARN [main] conf.HiveConf: HiveConf of name hive.dummyparam.test.server.specific.config.hivesite does not exist
> 2023-06-12T01:28:18,075 WARN [main] conf.HiveConf: HiveConf of name hive.llap.io.cache.orc.alloc.max does not exist
> 2023-06-12T01:28:18,075 WARN [main] conf.HiveConf: HiveConf of name hive.metastore.client.cache.maxSize does not exist
> 2023-06-12T01:28:18,076 WARN [main] conf.HiveConf: HiveConf of name hive.dummyparam.test.server.specific.config.metastoresite does not exist
> 2023-06-12T01:28:18,076 WARN [main] conf.HiveConf: HiveConf of name hive.metastore.client.cache.recordStats does not exist
> 2023-06-12T01:28:18,076 WARN [main] conf.HiveConf: HiveConf of name hive.llap.io.cache.orc.arena.size does not exist
> 2023-06-12T01:28:18,076 WARN [main] conf.HiveConf: HiveConf of name hive.stats.key.prefix.reserve.length does not exist {code}
-- This message was sent by Atlassian Jira (v8.20.10#820010)
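To locate the stale entries behind the "HiveConf of name ... does not exist" warnings above, one can simply list the property names declared in a `hive-site.xml` and cross-check them against the warned names. A stdlib-only helper sketch (illustrative, not part of Hive):

```java
import java.io.ByteArrayInputStream;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;

// Lists the <name> elements of a Hadoop-style configuration XML, so stale
// property names flagged by HiveConf warnings can be found and removed.
class HiveSiteProperties {
    static List<String> propertyNames(InputStream hiveSite) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder().parse(hiveSite);
        NodeList names = doc.getElementsByTagName("name");
        List<String> out = new ArrayList<>();
        for (int i = 0; i < names.getLength(); i++) {
            out.add(names.item(i).getTextContent().trim());
        }
        return out;
    }

    public static void main(String[] args) throws Exception {
        String xml = "<configuration><property><name>hive.llap.io.cache.orc.size</name>"
                + "<value>1</value></property></configuration>";
        System.out.println(propertyNames(
                new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8))));
    }
}
```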
[jira] [Updated] (HIVE-27431) Clean invalid property in test module
[ https://issues.apache.org/jira/browse/HIVE-27431?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] zhangbutao updated HIVE-27431: -- Description: In *data/conf* module, *hive-site.xml* is used to qtest&test. It keeps many invalid properties, and if you run test in IDE, you will see lots lof WARN: {code:java} 2023-06-12T01:28:18,074 WARN [main] conf.HiveConf: HiveConf of name hive.mapjoin.max.gc.time.percentage does not exist 2023-06-12T01:28:18,074 WARN [main] conf.HiveConf: HiveConf of name hive.llap.io.cache.orc.size does not exist 2023-06-12T01:28:18,075 WARN [main] conf.HiveConf: HiveConf of name hive.dummyparam.test.server.specific.config.override does not exist 2023-06-12T01:28:18,075 WARN [main] conf.HiveConf: HiveConf of name hive.metastore.metadb.dir does not exist 2023-06-12T01:28:18,075 WARN [main] conf.HiveConf: HiveConf of name hive.llap.io.cache.orc.alloc.min does not exist 2023-06-12T01:28:18,075 WARN [main] conf.HiveConf: HiveConf of name hive.dummyparam.test.server.specific.config.hivesite does not exist 2023-06-12T01:28:18,075 WARN [main] conf.HiveConf: HiveConf of name hive.llap.io.cache.orc.alloc.max does not exist 2023-06-12T01:28:18,075 WARN [main] conf.HiveConf: HiveConf of name hive.metastore.client.cache.maxSize does not exist 2023-06-12T01:28:18,076 WARN [main] conf.HiveConf: HiveConf of name hive.dummyparam.test.server.specific.config.metastoresite does not exist 2023-06-12T01:28:18,076 WARN [main] conf.HiveConf: HiveConf of name hive.metastore.client.cache.recordStats does not exist 2023-06-12T01:28:18,076 WARN [main] conf.HiveConf: HiveConf of name hive.llap.io.cache.orc.arena.size does not exist 2023-06-12T01:28:18,076 WARN [main] conf.HiveConf: HiveConf of name hive.stats.key.prefix.reserve.length does not exist {code} was: {code:java} 2023-06-12T01:28:18,074 WARN [main] conf.HiveConf: HiveConf of name hive.mapjoin.max.gc.time.percentage does not exist 2023-06-12T01:28:18,074 WARN [main] conf.HiveConf: HiveConf of 
name hive.llap.io.cache.orc.size does not exist 2023-06-12T01:28:18,075 WARN [main] conf.HiveConf: HiveConf of name hive.dummyparam.test.server.specific.config.override does not exist 2023-06-12T01:28:18,075 WARN [main] conf.HiveConf: HiveConf of name hive.metastore.metadb.dir does not exist 2023-06-12T01:28:18,075 WARN [main] conf.HiveConf: HiveConf of name hive.llap.io.cache.orc.alloc.min does not exist 2023-06-12T01:28:18,075 WARN [main] conf.HiveConf: HiveConf of name hive.dummyparam.test.server.specific.config.hivesite does not exist 2023-06-12T01:28:18,075 WARN [main] conf.HiveConf: HiveConf of name hive.llap.io.cache.orc.alloc.max does not exist 2023-06-12T01:28:18,075 WARN [main] conf.HiveConf: HiveConf of name hive.metastore.client.cache.maxSize does not exist 2023-06-12T01:28:18,076 WARN [main] conf.HiveConf: HiveConf of name hive.dummyparam.test.server.specific.config.metastoresite does not exist 2023-06-12T01:28:18,076 WARN [main] conf.HiveConf: HiveConf of name hive.metastore.client.cache.recordStats does not exist 2023-06-12T01:28:18,076 WARN [main] conf.HiveConf: HiveConf of name hive.llap.io.cache.orc.arena.size does not exist 2023-06-12T01:28:18,076 WARN [main] conf.HiveConf: HiveConf of name hive.stats.key.prefix.reserve.length does not exist {code} > Clean invalid property in test moduel > - > > Key: HIVE-27431 > URL: https://issues.apache.org/jira/browse/HIVE-27431 > Project: Hive > Issue Type: Test > Components: Test >Reporter: zhangbutao >Assignee: zhangbutao >Priority: Minor > > In *data/conf* module, *hive-site.xml* is used to qtest&test. 
It keeps many > invalid properties, and if you run test in IDE, you will see lots lof WARN: > {code:java} > 2023-06-12T01:28:18,074 WARN [main] conf.HiveConf: HiveConf of name > hive.mapjoin.max.gc.time.percentage does not exist > 2023-06-12T01:28:18,074 WARN [main] conf.HiveConf: HiveConf of name > hive.llap.io.cache.orc.size does not exist > 2023-06-12T01:28:18,075 WARN [main] conf.HiveConf: HiveConf of name > hive.dummyparam.test.server.specific.config.override does not exist > 2023-06-12T01:28:18,075 WARN [main] conf.HiveConf: HiveConf of name > hive.metastore.metadb.dir does not exist > 2023-06-12T01:28:18,075 WARN [main] conf.HiveConf: HiveConf of name > hive.llap.io.cache.orc.alloc.min does not exist > 2023-06-12T01:28:18,075 WARN [main] conf.HiveConf: HiveConf of name > hive.dummyparam.test.server.specific.config.hivesite does not exist > 2023-06-12T01:28:18,075 WARN [main] conf.HiveConf: HiveConf of name > hive.llap.io.cache.orc.alloc.max does not exist > 2023-06-12T01:28:18,075 WARN [main] conf.HiveConf: HiveConf of name > hive.metastore.client.cache.maxSize does not exist > 2023-06-12T01:28:18,076 WARN [main] conf.HiveConf: HiveC
[jira] [Created] (HIVE-27431) Clean invalid property in test module
zhangbutao created HIVE-27431: - Summary: Clean invalid property in test moduel Key: HIVE-27431 URL: https://issues.apache.org/jira/browse/HIVE-27431 Project: Hive Issue Type: Test Components: Test Reporter: zhangbutao {code:java} 2023-06-12T01:28:18,074 WARN [main] conf.HiveConf: HiveConf of name hive.mapjoin.max.gc.time.percentage does not exist 2023-06-12T01:28:18,074 WARN [main] conf.HiveConf: HiveConf of name hive.llap.io.cache.orc.size does not exist 2023-06-12T01:28:18,075 WARN [main] conf.HiveConf: HiveConf of name hive.dummyparam.test.server.specific.config.override does not exist 2023-06-12T01:28:18,075 WARN [main] conf.HiveConf: HiveConf of name hive.metastore.metadb.dir does not exist 2023-06-12T01:28:18,075 WARN [main] conf.HiveConf: HiveConf of name hive.llap.io.cache.orc.alloc.min does not exist 2023-06-12T01:28:18,075 WARN [main] conf.HiveConf: HiveConf of name hive.dummyparam.test.server.specific.config.hivesite does not exist 2023-06-12T01:28:18,075 WARN [main] conf.HiveConf: HiveConf of name hive.llap.io.cache.orc.alloc.max does not exist 2023-06-12T01:28:18,075 WARN [main] conf.HiveConf: HiveConf of name hive.metastore.client.cache.maxSize does not exist 2023-06-12T01:28:18,076 WARN [main] conf.HiveConf: HiveConf of name hive.dummyparam.test.server.specific.config.metastoresite does not exist 2023-06-12T01:28:18,076 WARN [main] conf.HiveConf: HiveConf of name hive.metastore.client.cache.recordStats does not exist 2023-06-12T01:28:18,076 WARN [main] conf.HiveConf: HiveConf of name hive.llap.io.cache.orc.arena.size does not exist 2023-06-12T01:28:18,076 WARN [main] conf.HiveConf: HiveConf of name hive.stats.key.prefix.reserve.length does not exist {code} -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Assigned] (HIVE-27431) Clean invalid property in test module
[ https://issues.apache.org/jira/browse/HIVE-27431?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] zhangbutao reassigned HIVE-27431: - Assignee: zhangbutao > Clean invalid property in test moduel > - > > Key: HIVE-27431 > URL: https://issues.apache.org/jira/browse/HIVE-27431 > Project: Hive > Issue Type: Test > Components: Test >Reporter: zhangbutao >Assignee: zhangbutao >Priority: Minor > > {code:java} > 2023-06-12T01:28:18,074 WARN [main] conf.HiveConf: HiveConf of name > hive.mapjoin.max.gc.time.percentage does not exist > 2023-06-12T01:28:18,074 WARN [main] conf.HiveConf: HiveConf of name > hive.llap.io.cache.orc.size does not exist > 2023-06-12T01:28:18,075 WARN [main] conf.HiveConf: HiveConf of name > hive.dummyparam.test.server.specific.config.override does not exist > 2023-06-12T01:28:18,075 WARN [main] conf.HiveConf: HiveConf of name > hive.metastore.metadb.dir does not exist > 2023-06-12T01:28:18,075 WARN [main] conf.HiveConf: HiveConf of name > hive.llap.io.cache.orc.alloc.min does not exist > 2023-06-12T01:28:18,075 WARN [main] conf.HiveConf: HiveConf of name > hive.dummyparam.test.server.specific.config.hivesite does not exist > 2023-06-12T01:28:18,075 WARN [main] conf.HiveConf: HiveConf of name > hive.llap.io.cache.orc.alloc.max does not exist > 2023-06-12T01:28:18,075 WARN [main] conf.HiveConf: HiveConf of name > hive.metastore.client.cache.maxSize does not exist > 2023-06-12T01:28:18,076 WARN [main] conf.HiveConf: HiveConf of name > hive.dummyparam.test.server.specific.config.metastoresite does not exist > 2023-06-12T01:28:18,076 WARN [main] conf.HiveConf: HiveConf of name > hive.metastore.client.cache.recordStats does not exist > 2023-06-12T01:28:18,076 WARN [main] conf.HiveConf: HiveConf of name > hive.llap.io.cache.orc.arena.size does not exist > 2023-06-12T01:28:18,076 WARN [main] conf.HiveConf: HiveConf of name > hive.stats.key.prefix.reserve.length does not exist {code} -- This message was sent by Atlassian Jira 
(v8.20.10#820010)
[jira] [Assigned] (HIVE-27429) Disable flaky test TestCompactionMetrics#testCleanerFailuresCountedCorrectly
[ https://issues.apache.org/jira/browse/HIVE-27429?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] zhangbutao reassigned HIVE-27429: - Assignee: zhangbutao > Disable flaky test TestCompactionMetrics#testCleanerFailuresCountedCorrectly > > > Key: HIVE-27429 > URL: https://issues.apache.org/jira/browse/HIVE-27429 > Project: Hive > Issue Type: Test > Components: Test >Reporter: zhangbutao >Assignee: zhangbutao >Priority: Major > > {code:java} > mvn test -Dtest=TestCompactionMetrics#testCleanerFailuresCountedCorrectly > -pl ql/{code} > [http://ci.hive.apache.org/job/hive-flaky-check/697/testReport/] > > I also have found several PR integration tests failed with the test > _*TestCompactionMetrics#testCleanerFailuresCountedCorrectly*_ > [http://ci.hive.apache.org/blue/organizations/jenkins/hive-precommit/detail/PR-4404/1/tests/] > [http://ci.hive.apache.org/blue/organizations/jenkins/hive-precommit/detail/PR-4402/4/tests/] > -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Created] (HIVE-27429) Disable flaky test TestCompactionMetrics#testCleanerFailuresCountedCorrectly
zhangbutao created HIVE-27429:
-
Summary: Disable flaky test TestCompactionMetrics#testCleanerFailuresCountedCorrectly
Key: HIVE-27429
URL: https://issues.apache.org/jira/browse/HIVE-27429
Project: Hive
Issue Type: Test
Components: Test
Reporter: zhangbutao

{code:java}
mvn test -Dtest=TestCompactionMetrics#testCleanerFailuresCountedCorrectly -pl ql/{code}
[http://ci.hive.apache.org/job/hive-flaky-check/697/testReport/]

I have also found several PR integration test runs that failed with _*TestCompactionMetrics#testCleanerFailuresCountedCorrectly*_:
[http://ci.hive.apache.org/blue/organizations/jenkins/hive-precommit/detail/PR-4404/1/tests/]
[http://ci.hive.apache.org/blue/organizations/jenkins/hive-precommit/detail/PR-4402/4/tests/]
-- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Commented] (HIVE-27360) Iceberg: Don't create the redundant MANAGED location when creating table without EXTERNAL keyword
[ https://issues.apache.org/jira/browse/HIVE-27360?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17731305#comment-17731305 ] zhangbutao commented on HIVE-27360: --- Finally, I think we can reach an agreement about creating an Iceberg table: if we create an Iceberg table that neither has the EXTERNAL keyword nor an explicitly specified location, we should make a check as follows: # Check the value of {*}_MetastoreConf.ConfVars.METASTORE_METADATA_TRANSFORMER_CLASS_{*}; if it is set to a valid value, let it determine the table's type and location in its own way. # If *_MetastoreConf.ConfVars.METASTORE_METADATA_TRANSFORMER_CLASS_* is not set to a valid value, we should make sure the table is of EXTERNAL type and its location is under the EXTERNAL warehouse, but the *purge flag* should be set to true.
> Iceberg: Don't create the redundant MANAGED location when creating table > without EXTERNAL keyword
>
> Key: HIVE-27360 > URL: https://issues.apache.org/jira/browse/HIVE-27360 > Project: Hive > Issue Type: Improvement > Components: Iceberg integration > Reporter: zhangbutao > Assignee: zhangbutao > Priority: Major > Labels: pull-request-available
>
> If you create a managed iceberg table without specifying the location and the database has both a location and a managed_location, the final iceberg table location will be on the database location instead of the managed_location. But you can see that the database managed_location also gets an iceberg table subdirectory, which remains even after the table is dropped.
> We should ensure the managed iceberg table is always on the database managed_location when the database managed_location exists. The direct and simple way is to reuse the created hms table location before committing the iceberg table, to avoid creating a new iceberg location.
>
> Steps to repro:
> 1. set location and managed location properties:
> {code:java}
> set hive.metastore.warehouse.dir=/user/hive/warehouse/hiveicetest;
> set hive.metastore.warehouse.external.dir=/user/hive/warehouse/external/hiveicetest;
> set metastore.metadata.transformer.class=' '; // disable the metastore transformer; this conf can only be set on the metastore server side{code}
> 2. create a database with default location and managed_location:
> {code:java}
> create database testdb;{code}
> {code:java}
> desc database testdb;{code}
> {code:java}
> +----------+----------+--------------------------------------------------------------+-----------------------------------------------------+-------------+-------------+-----------------+----------------+
> | db_name  | comment  | location                                                     | managedlocation                                     | owner_name  | owner_type  | connector_name  | remote_dbname  |
> +----------+----------+--------------------------------------------------------------+-----------------------------------------------------+-------------+-------------+-----------------+----------------+
> | testdb   |          | hdfs://ns/user/hive/warehouse/external/hiveicetest/testdb.db | hdfs://ns/user/hive/warehouse/hiveicetest/testdb.db | hive        | USER        |                 |                |
> +----------+----------+--------------------------------------------------------------+-----------------------------------------------------+-------------+-------------+-----------------+----------------+
> {code}
> 3. create a managed iceberg table without specifying the table location:
> {code:java}
> // the table location will be on: hdfs://ns/user/hive/warehouse/external/hiveicetest/testdb.db/ice01
> create table ice01 (id int) Stored by Iceberg stored as ORC;{code}
> but here you will find two created locations:
> {code:java}
> hdfs://ns/user/hive/warehouse/external/hiveicetest/testdb.db/ice01   // the actual location used by the managed iceberg table
> hdfs://ns/user/hive/warehouse/hiveicetest/testdb.db/ice01            // an empty managed location which is unused
> {code}
> 4. drop the iceberg table; you will find this unused managed location is still there:
> {code:java}
> hdfs://ns/user/hive/warehouse/hiveicetest/testdb.db{code}
>
> We should use the created managed location to avoid creating a new iceberg location.
-- This message was sent by Atlassian Jira (v8.20.10#820010)
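The decision flow proposed in the comment above (defer to the metadata transformer when one is configured, otherwise force EXTERNAL type with the purge flag) can be sketched as follows. This is an illustrative Python sketch, not Hive code; the function and key names (`decide_table_type_and_purge`, the returned dict fields) are hypothetical, and only the conf key `metastore.metadata.transformer.class` comes from the ticket text.

```python
# Illustrative sketch of the proposed check for CREATE TABLE statements that
# have neither the EXTERNAL keyword nor an explicit LOCATION. Hypothetical
# names; the real logic would live in Hive's metastore/Iceberg integration.

def decide_table_type_and_purge(conf: dict) -> dict:
    """Decide how the table should be created when neither EXTERNAL
    nor LOCATION was given in the DDL."""
    transformer = conf.get("metastore.metadata.transformer.class", "").strip()
    if transformer:
        # A valid transformer class is configured: let it determine the
        # table's type and location in its own way.
        return {"decided_by": "transformer", "type": None, "purge": None}
    # No valid transformer: make the table EXTERNAL on the external
    # warehouse, but set the purge flag so DROP TABLE still removes data.
    return {"decided_by": "default", "type": "EXTERNAL", "purge": True}

print(decide_table_type_and_purge({}))
print(decide_table_type_and_purge(
    {"metastore.metadata.transformer.class": "org.example.MyTransformer"}))
```

The second branch mirrors the comment's point that an EXTERNAL-typed table with `purge=true` still behaves like a managed table on drop.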
[jira] [Updated] (HIVE-27360) Iceberg: Don't create the redundant MANAGED location when creating table without EXTERNAL keyword
[ https://issues.apache.org/jira/browse/HIVE-27360?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] zhangbutao updated HIVE-27360: -- Summary: Iceberg: Don't create the redundant MANAGED location when creating table without EXTERNAL keyword (was: Iceberg: Don't create a new iceberg location if hms table already has a default location)
> Iceberg: Don't create the redundant MANAGED location when creating table > without EXTERNAL keyword
>
> Key: HIVE-27360 > URL: https://issues.apache.org/jira/browse/HIVE-27360 > Project: Hive > Issue Type: Improvement > Components: Iceberg integration > Reporter: zhangbutao > Assignee: zhangbutao > Priority: Major > Labels: pull-request-available
>
> If you create a managed iceberg table without specifying the location and the database has both a location and a managed_location, the final iceberg table location will be on the database location instead of the managed_location. But you can see that the database managed_location also gets an iceberg table subdirectory, which remains even after the table is dropped.
> We should ensure the managed iceberg table is always on the database managed_location when the database managed_location exists. The direct and simple way is to reuse the created hms table location before committing the iceberg table, to avoid creating a new iceberg location.
>
> Steps to repro:
> 1. set location and managed location properties:
> {code:java}
> set hive.metastore.warehouse.dir=/user/hive/warehouse/hiveicetest;
> set hive.metastore.warehouse.external.dir=/user/hive/warehouse/external/hiveicetest;
> set metastore.metadata.transformer.class=' '; // disable the metastore transformer; this conf can only be set on the metastore server side{code}
> 2. create a database with default location and managed_location:
> {code:java}
> create database testdb;{code}
> {code:java}
> desc database testdb;{code}
> {code:java}
> +----------+----------+--------------------------------------------------------------+-----------------------------------------------------+-------------+-------------+-----------------+----------------+
> | db_name  | comment  | location                                                     | managedlocation                                     | owner_name  | owner_type  | connector_name  | remote_dbname  |
> +----------+----------+--------------------------------------------------------------+-----------------------------------------------------+-------------+-------------+-----------------+----------------+
> | testdb   |          | hdfs://ns/user/hive/warehouse/external/hiveicetest/testdb.db | hdfs://ns/user/hive/warehouse/hiveicetest/testdb.db | hive        | USER        |                 |                |
> +----------+----------+--------------------------------------------------------------+-----------------------------------------------------+-------------+-------------+-----------------+----------------+
> {code}
> 3. create a managed iceberg table without specifying the table location:
> {code:java}
> // the table location will be on: hdfs://ns/user/hive/warehouse/external/hiveicetest/testdb.db/ice01
> create table ice01 (id int) Stored by Iceberg stored as ORC;{code}
> but here you will find two created locations:
> {code:java}
> hdfs://ns/user/hive/warehouse/external/hiveicetest/testdb.db/ice01   // the actual location used by the managed iceberg table
> hdfs://ns/user/hive/warehouse/hiveicetest/testdb.db/ice01            // an empty managed location which is unused
> {code}
> 4. drop the iceberg table; you will find this unused managed location is still there:
> {code:java}
> hdfs://ns/user/hive/warehouse/hiveicetest/testdb.db{code}
>
> We should use the created managed location to avoid creating a new iceberg location.
-- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Commented] (HIVE-27418) UNION ALL + ORDER BY ordinal works incorrectly for all const queries
[ https://issues.apache.org/jira/browse/HIVE-27418?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17730898#comment-17730898 ] zhangbutao commented on HIVE-27418: --- Hi [~csringhofer], could you provide more info about your hive cluster env, e.g. the Hive & Hadoop & Tez versions? And which execution engine did you use for the test, Tez or MR?
> UNION ALL + ORDER BY ordinal works incorrectly for all const queries
>
> Key: HIVE-27418 > URL: https://issues.apache.org/jira/browse/HIVE-27418 > Project: Hive > Issue Type: Bug > Reporter: Csaba Ringhofer > Priority: Major
>
> For the following query I get results in the wrong order:
> SELECT '1', 'b' UNION ALL SELECT '2', 'a' ORDER BY 2;
> +------+------+
> | _c0  | _c1  |
> +------+------+
> | 1    | b    |
> | 2    | a    |
> +------+------+
> I get correct results if:
> - the column has an alias
> - the same rows come from tables
> - the UNION ALL part of the query is in a sub-query and ORDER BY is run on the sub-query
> Checked with Postgres and Apache Impala, and they apply ORDER BY correctly. (Also note that the ordinal after ORDER BY is not checked, so it could be 20 and Hive doesn't complain.)
-- This message was sent by Atlassian Jira (v8.20.10#820010)
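The expected semantics can be cross-checked against another engine. As a minimal sketch, the same query run through SQLite (via Python's sqlite3 module, here only as an independent reference engine alongside the Postgres/Impala checks mentioned in the report):

```python
import sqlite3

# Cross-check of ORDER BY <ordinal> applied to a UNION ALL of constant rows.
# ORDER BY 2 must sort the whole compound result by its second column.
conn = sqlite3.connect(":memory:")
rows = conn.execute(
    "SELECT '1' AS c0, 'b' AS c1 UNION ALL SELECT '2', 'a' ORDER BY 2"
).fetchall()
conn.close()

# 'a' < 'b', so the row ('2', 'a') must come first.
print(rows)  # [('2', 'a'), ('1', 'b')]
```

This matches the ordering the reporter observed in Postgres and Impala, and is the opposite of the Hive output quoted above.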
[jira] [Updated] (HIVE-27409) Iceberg: table with EXTERNAL type can not use statistics to optimize the query
[ https://issues.apache.org/jira/browse/HIVE-27409?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] zhangbutao updated HIVE-27409: -- Description: We have supported iceberg statistics recently, e.g. _HIVE-24928_ and {_}HIVE-27158{_}, and we can use iceberg stats to optimize some queries, like {_}HIVE-27347{_}. However, in the current hive codebase we prohibit using EXTERNAL table stats; this change was introduced by HIVE-11266. And HIVE-19329 also disabled some optimizations for EXTERNAL tables, whether they are iceberg or not. Therefore, an EXTERNAL type iceberg table can not use stats to optimize a query. In {_}HIVE-24928{_}, we added the method *_HiveStorageHandler::canProvideBasicStatistics()_* to indicate that iceberg has the ability to provide stats. That is to say, although an Iceberg table is regarded as an EXTERNAL table in Hive, it can provide detailed statistics. Therefore, I suggest we check both the table type and the boolean result of *_HiveStorageHandler::canProvideBasicStatistics()_* to determine whether the table can use stats. was: We have supported iceberg statistics recently, e.g. _HIVE-24928_ and {_}HIVE-27158{_}, and we can use iceberg stats to optimize some queries, like {_}HIVE-27347{_}. However, in the current hive codebase we prohibit using EXTERNAL table stats; this change was introduced by HIVE-11266. Therefore, an EXTERNAL type iceberg table can not use stats to optimize a query. In {_}HIVE-24928{_}, we added the method *_HiveStorageHandler::canProvideBasicStatistics()_* to indicate that iceberg has the ability to provide stats. That is to say, although an Iceberg table is regarded as an EXTERNAL table in Hive, it can provide detailed statistics. Therefore, I suggest we check both the table type and the boolean result of *_HiveStorageHandler::canProvideBasicStatistics()_* to determine whether the table can use stats.
> Iceberg: table with EXTERNAL type can not use statistics to optimize the query
>
> Key: HIVE-27409 > URL: https://issues.apache.org/jira/browse/HIVE-27409 > Project: Hive > Issue Type: Improvement > Components: Iceberg integration > Reporter: zhangbutao > Assignee: zhangbutao > Priority: Minor > Labels: pull-request-available
>
> We have supported iceberg statistics recently, e.g. _HIVE-24928_ and {_}HIVE-27158{_}, and we can use iceberg stats to optimize some queries, like {_}HIVE-27347{_}.
> However, in the current hive codebase we prohibit using EXTERNAL table stats; this change was introduced by HIVE-11266. And HIVE-19329 also disabled some optimizations for EXTERNAL tables, whether they are iceberg or not. Therefore, an EXTERNAL type iceberg table can not use stats to optimize a query.
> In {_}HIVE-24928{_}, we added the method *_HiveStorageHandler::canProvideBasicStatistics()_* to indicate that iceberg has the ability to provide stats. That is to say, although an Iceberg table is regarded as an EXTERNAL table in Hive, it can provide detailed statistics.
> Therefore, I suggest we check both the table type and the boolean result of *_HiveStorageHandler::canProvideBasicStatistics()_* to determine whether the table can use stats.
-- This message was sent by Atlassian Jira (v8.20.10#820010)
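The suggested eligibility check (allow stats for managed tables as before, and for EXTERNAL tables only when the storage handler reports it can provide basic statistics) can be sketched as follows. This is an illustrative Python sketch: only the method name `canProvideBasicStatistics` comes from the ticket; the class and function names here are hypothetical stand-ins for Hive's Java types.

```python
# Hypothetical sketch of the combined check proposed in the ticket.
# StorageHandler stands in for Hive's HiveStorageHandler; can_use_stats
# stands in for the stats-eligibility check in the optimizer.

class StorageHandler:
    def __init__(self, provides_basic_stats: bool):
        self.provides_basic_stats = provides_basic_stats

    def can_provide_basic_statistics(self) -> bool:
        # Mirrors HiveStorageHandler::canProvideBasicStatistics(),
        # which Iceberg's handler overrides to return true.
        return self.provides_basic_stats

def can_use_stats(table_type: str, handler=None) -> bool:
    if table_type != "EXTERNAL_TABLE":
        return True  # managed tables may use stats as before
    # EXTERNAL table: eligible only if its handler (e.g. Iceberg's)
    # declares it can provide basic statistics.
    return handler is not None and handler.can_provide_basic_statistics()

print(can_use_stats("EXTERNAL_TABLE", StorageHandler(True)))   # True
print(can_use_stats("EXTERNAL_TABLE", None))                   # False
```

With this check, an EXTERNAL Iceberg table becomes eligible for stats-based optimization while plain EXTERNAL tables keep the behavior introduced by HIVE-11266.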
[jira] [Created] (HIVE-27409) Iceberg: table with EXTERNAL type can not use statistics to optimize the query
zhangbutao created HIVE-27409: - Summary: Iceberg: table with EXTERNAL type can not use statistics to optimize the query Key: HIVE-27409 URL: https://issues.apache.org/jira/browse/HIVE-27409 Project: Hive Issue Type: Improvement Components: Iceberg integration Reporter: zhangbutao We have supported iceberg statistics recently, e.g. _HIVE-24928_ and {_}HIVE-27158{_}, and we can use iceberg stats to optimize some queries, like {_}HIVE-27347{_}. However, in the current hive codebase we prohibit using EXTERNAL table stats; this change was introduced by HIVE-11266. Therefore, an EXTERNAL type iceberg table can not use stats to optimize a query. In {_}HIVE-24928{_}, we added the method *_HiveStorageHandler::canProvideBasicStatistics()_* to indicate that iceberg has the ability to provide stats. That is to say, although an Iceberg table is regarded as an EXTERNAL table in Hive, it can provide detailed statistics. Therefore, I suggest we check both the table type and the boolean result of *_HiveStorageHandler::canProvideBasicStatistics()_* to determine whether the table can use stats. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Assigned] (HIVE-27409) Iceberg: table with EXTERNAL type can not use statistics to optimize the query
[ https://issues.apache.org/jira/browse/HIVE-27409?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] zhangbutao reassigned HIVE-27409: - Assignee: zhangbutao
> Iceberg: table with EXTERNAL type can not use statistics to optimize the query
>
> Key: HIVE-27409 > URL: https://issues.apache.org/jira/browse/HIVE-27409 > Project: Hive > Issue Type: Improvement > Components: Iceberg integration > Reporter: zhangbutao > Assignee: zhangbutao > Priority: Minor
>
> We have supported iceberg statistics recently, e.g. _HIVE-24928_ and {_}HIVE-27158{_}, and we can use iceberg stats to optimize some queries, like {_}HIVE-27347{_}.
> However, in the current hive codebase we prohibit using EXTERNAL table stats; this change was introduced by HIVE-11266. Therefore, an EXTERNAL type iceberg table can not use stats to optimize a query.
> In {_}HIVE-24928{_}, we added the method *_HiveStorageHandler::canProvideBasicStatistics()_* to indicate that iceberg has the ability to provide stats. That is to say, although an Iceberg table is regarded as an EXTERNAL table in Hive, it can provide detailed statistics.
> Therefore, I suggest we check both the table type and the boolean result of *_HiveStorageHandler::canProvideBasicStatistics()_* to determine whether the table can use stats.
-- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Updated] (HIVE-27360) Iceberg: Don't create a new iceberg location if hms table already has a default location
[ https://issues.apache.org/jira/browse/HIVE-27360?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] zhangbutao updated HIVE-27360: -- Description: If you create a managed iceberg table without specifying the location and the database has both a location and a managed_location, the final iceberg table location will be on the database location instead of the managed_location. But you can see that the database managed_location also gets an iceberg table subdirectory, which remains even after the table is dropped. We should ensure the managed iceberg table is always on the database managed_location when the database managed_location exists. The direct and simple way is to reuse the created hms table location before committing the iceberg table, to avoid creating a new iceberg location.
Steps to repro:
1. set location and managed location properties:
{code:java}
set hive.metastore.warehouse.dir=/user/hive/warehouse/hiveicetest;
set hive.metastore.warehouse.external.dir=/user/hive/warehouse/external/hiveicetest;
set metastore.metadata.transformer.class=' '; // disable the metastore transformer; this conf can only be set on the metastore server side{code}
2. create a database with default location and managed_location:
{code:java}
create database testdb;{code}
{code:java}
desc database testdb;{code}
{code:java}
+----------+----------+--------------------------------------------------------------+-----------------------------------------------------+-------------+-------------+-----------------+----------------+
| db_name  | comment  | location                                                     | managedlocation                                     | owner_name  | owner_type  | connector_name  | remote_dbname  |
+----------+----------+--------------------------------------------------------------+-----------------------------------------------------+-------------+-------------+-----------------+----------------+
| testdb   |          | hdfs://ns/user/hive/warehouse/external/hiveicetest/testdb.db | hdfs://ns/user/hive/warehouse/hiveicetest/testdb.db | hive        | USER        |                 |                |
+----------+----------+--------------------------------------------------------------+-----------------------------------------------------+-------------+-------------+-----------------+----------------+
{code}
3. create a managed iceberg table without specifying the table location:
{code:java}
// the table location will be on: hdfs://ns/user/hive/warehouse/external/hiveicetest/testdb.db/ice01
create table ice01 (id int) Stored by Iceberg stored as ORC;{code}
but here you will find two created locations:
{code:java}
hdfs://ns/user/hive/warehouse/external/hiveicetest/testdb.db/ice01   // the actual location used by the managed iceberg table
hdfs://ns/user/hive/warehouse/hiveicetest/testdb.db/ice01            // an empty managed location which is unused
{code}
4. drop the iceberg table; you will find this unused managed location is still there:
{code:java}
hdfs://ns/user/hive/warehouse/hiveicetest/testdb.db{code}
We should use the created managed location to avoid creating a new iceberg location.
was: If you create a managed iceberg table without specifying the location and the database has both a location and a managed_location, the final iceberg table location will be on the database location instead of the managed_location. But you can see that the database managed_location also gets an iceberg table subdirectory, which remains even after the table is dropped. We should ensure the managed iceberg table is always on the database managed_location when the database managed_location exists. The direct and simple way is to reuse the created hms table location before committing the iceberg table, to avoid creating a new iceberg location.
Steps to repro:
1. set location and managed location properties:
{code:java}
set hive.metastore.warehouse.dir=/user/hive/warehouse/hiveicetest;
set hive.metastore.warehouse.external.dir=/user/hive/warehouse/external/hiveicetest;
{code}
2.
create a database with default location and managed_location: {code:java} create database testdb;{code} {code:java} desc database testdb;{code} {code:java} +--+--+++-+-+-++ | db_name | comment | location | managedlocation | owner_name | owner_type | connector_name | remote_dbname | +--+--+++-+-+-++ | testdb | | hdfs://ns/user/hive/warehouse/external/hiveicetest/testdb.db | hdfs://ns/user/hive/warehouse/hiveicetest/testdb.db | hive | USER
[jira] [Assigned] (HIVE-27364) StorageHandler: Skip to create staging directory for non-native table
[ https://issues.apache.org/jira/browse/HIVE-27364?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] zhangbutao reassigned HIVE-27364: - Assignee: zhangbutao > StorageHandler: Skip to create staging directory for non-native table > - > > Key: HIVE-27364 > URL: https://issues.apache.org/jira/browse/HIVE-27364 > Project: Hive > Issue Type: Improvement > Components: StorageHandler >Reporter: zhangbutao >Assignee: zhangbutao >Priority: Major > -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Created] (HIVE-27364) StorageHandler: Skip to create staging directory for non-native table
zhangbutao created HIVE-27364: - Summary: StorageHandler: Skip to create staging directory for non-native table Key: HIVE-27364 URL: https://issues.apache.org/jira/browse/HIVE-27364 Project: Hive Issue Type: Improvement Components: StorageHandler Reporter: zhangbutao -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Commented] (HIVE-27360) Iceberg: Don't create a new iceberg location if hms table already has a default location
[ https://issues.apache.org/jira/browse/HIVE-27360?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17724109#comment-17724109 ] zhangbutao commented on HIVE-27360: --- PR available: https://github.com/apache/hive/pull/4341
> Iceberg: Don't create a new iceberg location if hms table already has a > default location
>
> Key: HIVE-27360 > URL: https://issues.apache.org/jira/browse/HIVE-27360 > Project: Hive > Issue Type: Improvement > Components: Iceberg integration > Reporter: zhangbutao > Assignee: zhangbutao > Priority: Major
>
> If you create a managed iceberg table without specifying the location and the database has both a location and a managed_location, the final iceberg table location will be on the database location instead of the managed_location. But you can see that the database managed_location also gets an iceberg table subdirectory, which remains even after the table is dropped.
> We should ensure the managed iceberg table is always on the database managed_location when the database managed_location exists. The direct and simple way is to reuse the created hms table location before committing the iceberg table, to avoid creating a new iceberg location.
>
> Steps to repro:
> 1. set location and managed location properties:
> {code:java}
> set hive.metastore.warehouse.dir=/user/hive/warehouse/hiveicetest;
> set hive.metastore.warehouse.external.dir=/user/hive/warehouse/external/hiveicetest;
> {code}
> 2. create a database with default location and managed_location:
> {code:java}
> create database testdb;{code}
> {code:java}
> desc database testdb;{code}
> {code:java}
> +----------+----------+--------------------------------------------------------------+-----------------------------------------------------+-------------+-------------+-----------------+----------------+
> | db_name  | comment  | location                                                     | managedlocation                                     | owner_name  | owner_type  | connector_name  | remote_dbname  |
> +----------+----------+--------------------------------------------------------------+-----------------------------------------------------+-------------+-------------+-----------------+----------------+
> | testdb   |          | hdfs://ns/user/hive/warehouse/external/hiveicetest/testdb.db | hdfs://ns/user/hive/warehouse/hiveicetest/testdb.db | hive        | USER        |                 |                |
> +----------+----------+--------------------------------------------------------------+-----------------------------------------------------+-------------+-------------+-----------------+----------------+
> {code}
> 3. create a managed iceberg table without specifying the table location:
> {code:java}
> // the table location will be on: hdfs://ns/user/hive/warehouse/external/hiveicetest/testdb.db/ice01
> create table ice01 (id int) Stored by Iceberg stored as ORC;{code}
> but here you will find two created locations:
> {code:java}
> hdfs://ns/user/hive/warehouse/external/hiveicetest/testdb.db/ice01   // the actual location used by the managed iceberg table
> hdfs://ns/user/hive/warehouse/hiveicetest/testdb.db/ice01            // an empty managed location which is unused
> {code}
> 4. drop the iceberg table; you will find this unused managed location is still there:
> {code:java}
> hdfs://ns/user/hive/warehouse/hiveicetest/testdb.db{code}
>
> We should use the created managed location to avoid creating a new iceberg location.
-- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Commented] (HIVE-27360) Iceberg: Don't create a new iceberg location if hms table already has a default location
[ https://issues.apache.org/jira/browse/HIVE-27360?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17724108#comment-17724108 ] zhangbutao commented on HIVE-27360: --- [~ayushtkn] Thanks for the quick comment! In this ticket's description, the hmsTbl_managed_location is actually created automatically by *_HMSHandler::create_database_core_* based on the database location & managed_location if the table location is not specified, and then *_HiveIcebergMetaHook::commitCreateTable_* alters the hms location from the created managed location to the external location based on {*}_HiveCatalog::defaultWarehouseLocation_{*}. So if we drop the table, the initially created hmsTbl_managed_location will not be deleted and becomes a dangling directory. Note that before Hive4 this was not a problem, as a database only had a single location. In the PR, I reused the created hmsTbl_managed_location to avoid creating a new iceberg location as well as to eliminate the dangling directory. Do you think we should always keep an iceberg table as an external table? Imo, users usually create an external table with the keyword *external*, like '_create *external* table ice01 (id int) Stored by Iceberg stored as ORC_', and the table should be on the managed_location if the location is not specified and the external keyword is absent.
> Iceberg: Don't create a new iceberg location if hms table already has a > default location
>
> Key: HIVE-27360 > URL: https://issues.apache.org/jira/browse/HIVE-27360 > Project: Hive > Issue Type: Improvement > Components: Iceberg integration > Reporter: zhangbutao > Assignee: zhangbutao > Priority: Major
>
> If you create a managed iceberg table without specifying the location and the database has both a location and a managed_location, the final iceberg table location will be on the database location instead of the managed_location. But you can see that the database managed_location also gets an iceberg table subdirectory, which remains even after the table is dropped.
> We should ensure the managed iceberg table is always on the database managed_location when the database managed_location exists. The direct and simple way is to reuse the created hms table location before committing the iceberg table, to avoid creating a new iceberg location.
>
> Steps to repro:
> 1. set location and managed location properties:
> {code:java}
> set hive.metastore.warehouse.dir=/user/hive/warehouse/hiveicetest;
> set hive.metastore.warehouse.external.dir=/user/hive/warehouse/external/hiveicetest;
> {code}
> 2. create a database with default location and managed_location:
> {code:java}
> create database testdb;{code}
> {code:java}
> desc database testdb;{code}
> {code:java}
> +----------+----------+--------------------------------------------------------------+-----------------------------------------------------+-------------+-------------+-----------------+----------------+
> | db_name  | comment  | location                                                     | managedlocation                                     | owner_name  | owner_type  | connector_name  | remote_dbname  |
> +----------+----------+--------------------------------------------------------------+-----------------------------------------------------+-------------+-------------+-----------------+----------------+
> | testdb   |          | hdfs://ns/user/hive/warehouse/external/hiveicetest/testdb.db | hdfs://ns/user/hive/warehouse/hiveicetest/testdb.db | hive        | USER        |                 |                |
> +----------+----------+--------------------------------------------------------------+-----------------------------------------------------+-------------+-------------+-----------------+----------------+
> {code}
> 3. create a managed iceberg table without specifying the table location:
> {code:java}
> // the table location will be on: hdfs://ns/user/hive/warehouse/external/hiveicetest/testdb.db/ice01
> create table ice01 (id int) Stored by Iceberg stored as ORC;{code}
> but here you will find two created locations:
> {code:java}
> hdfs://ns/user/hive/warehouse/external/hiveicetest/testdb.db/ice01   // the actual location used by the managed iceberg table
> hdfs://ns/user/hive/warehouse/hiveicetest/testdb.db/ice01            // an empty managed location which is unused
> {code}
> 4. drop the iceberg table; you will find this unused managed location is still there:
> {code:java}
> hdfs://ns/user/hive/warehouse/hiveicetest/testdb.db{code}
>
> We should use the created managed location to avoid creating a new iceberg location.
-- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Updated] (HIVE-27360) Iceberg: Don't create a new iceberg location if hms table already has a default location
[ https://issues.apache.org/jira/browse/HIVE-27360?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] zhangbutao updated HIVE-27360: -- Description: If you create a managed iceberg table without specifying the location and the database has both a location and a managed_location, the final iceberg table location will be on the database location instead of the managed_location. But you can see that the database managed_location also gets an iceberg table subdirectory, which remains even after the table is dropped. We should ensure the managed iceberg table is always on the database managed_location when the database managed_location exists. The direct and simple way is to reuse the created hms table location before committing the iceberg table, to avoid creating a new iceberg location.
Steps to repro:
1. set location and managed location properties:
{code:java}
set hive.metastore.warehouse.dir=/user/hive/warehouse/hiveicetest;
set hive.metastore.warehouse.external.dir=/user/hive/warehouse/external/hiveicetest;
set metastore.metadata.transformer.class=' '; // disable the metastore transformer; this conf can only be set on the metastore server side{code}
2. create a database with default location and managed_location:
{code:java}
create database testdb;{code}
{code:java}
desc database testdb;{code}
{code:java}
+----------+----------+--------------------------------------------------------------+-----------------------------------------------------+-------------+-------------+-----------------+----------------+
| db_name  | comment  | location                                                     | managedlocation                                     | owner_name  | owner_type  | connector_name  | remote_dbname  |
+----------+----------+--------------------------------------------------------------+-----------------------------------------------------+-------------+-------------+-----------------+----------------+
| testdb   |          | hdfs://ns/user/hive/warehouse/external/hiveicetest/testdb.db | hdfs://ns/user/hive/warehouse/hiveicetest/testdb.db | hive        | USER        |                 |                |
+----------+----------+--------------------------------------------------------------+-----------------------------------------------------+-------------+-------------+-----------------+----------------+
{code}
3. create a managed iceberg table without specifying the table location:
{code:java}
// the table location will be on: hdfs://ns/user/hive/warehouse/external/hiveicetest/testdb.db/ice01
create table ice01 (id int) Stored by Iceberg stored as ORC;{code}
but here you will find two created locations:
{code:java}
hdfs://ns/user/hive/warehouse/external/hiveicetest/testdb.db/ice01   // the actual location used by the managed iceberg table
hdfs://ns/user/hive/warehouse/hiveicetest/testdb.db/ice01            // an empty managed location which is unused
{code}
4. drop the iceberg table; you will find this unused managed location is still there:
{code:java}
hdfs://ns/user/hive/warehouse/hiveicetest/testdb.db{code}
We should use the created managed location to avoid creating a new iceberg location.
was: If you create a managed iceberg table without specifying the location and the database has both a location and a managed_location, the final iceberg table location will be on the database location instead of the managed_location. But you can see that the database managed_location also gets an iceberg table subdirectory, which remains even after the table is dropped. We should ensure the managed iceberg table is always on the database managed_location when the database managed_location exists. The direct and simple way is to reuse the created hms table location before committing the iceberg table, to avoid creating a new iceberg location.
Steps to repro:
1. set location and managed location properties:
{code:java}
set hive.metastore.warehouse.dir=/user/hive/warehouse/hiveicetest;
set hive.metastore.warehouse.external.dir=/user/hive/warehouse/external/hiveicetest;
{code}
2.
create a database with default location and managed_location: {code:java} create database testdb;{code} {code:java} desc database testdb;{code} {code:java} +--+--+++-+-+-++ | db_name | comment | location | managedlocation | owner_name | owner_type | connector_name | remote_dbname | +--+--+++-+-+-++ | testdb | | hdfs://ns/user/hive/warehouse/external/hiveicetest/testdb.db | hdfs://ns/user/hive/warehouse/hiveicetest/testdb.db | hive | USER | | +--+--++---
[jira] [Updated] (HIVE-27360) Iceberg: Don't create a new iceberg location if hms table already has a default location
[ https://issues.apache.org/jira/browse/HIVE-27360?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] zhangbutao updated HIVE-27360: -- Description: If you create a managed iceberg table without specifying the location and the database has both a location and a managed_location, the final iceberg table location will be on the database location instead of the managed_location. But you can see that the database managed_location also gets an iceberg table subdirectory, which remains even after the table is dropped. We should ensure the managed iceberg table is always on the database managed_location when the database managed_location exists. The direct and simple way is to reuse the created hms table location before committing the iceberg table, to avoid creating a new iceberg location.
Steps to repro:
1. set location and managed location properties:
{code:java}
set hive.metastore.warehouse.dir=/user/hive/warehouse/hiveicetest;
set hive.metastore.warehouse.external.dir=/user/hive/warehouse/external/hiveicetest;
{code}
2. create a database with default location and managed_location:
{code:java}
create database testdb;{code}
{code:java}
desc database testdb;{code}
{code:java}
+----------+----------+--------------------------------------------------------------+-----------------------------------------------------+-------------+-------------+-----------------+----------------+
| db_name  | comment  | location                                                     | managedlocation                                     | owner_name  | owner_type  | connector_name  | remote_dbname  |
+----------+----------+--------------------------------------------------------------+-----------------------------------------------------+-------------+-------------+-----------------+----------------+
| testdb   |          | hdfs://ns/user/hive/warehouse/external/hiveicetest/testdb.db | hdfs://ns/user/hive/warehouse/hiveicetest/testdb.db | hive        | USER        |                 |                |
+----------+----------+--------------------------------------------------------------+-----------------------------------------------------+-------------+-------------+-----------------+----------------+
{code}
3. create a managed iceberg table without specifying the table location:
{code:java}
// the table location will be on: hdfs://ns/user/hive/warehouse/external/hiveicetest/testdb.db/ice01
create table ice01 (id int) Stored by Iceberg stored as ORC;{code}
but here you will find two created locations:
{code:java}
hdfs://ns/user/hive/warehouse/external/hiveicetest/testdb.db/ice01   // the actual location used by the managed iceberg table
hdfs://ns/user/hive/warehouse/hiveicetest/testdb.db                  // an empty managed location which is unused
{code}
4. drop the iceberg table; you will find this unused managed location is still there:
{code:java}
hdfs://ns/user/hive/warehouse/hiveicetest/testdb.db{code}
We should use the created managed location to avoid creating a new iceberg location.
was: If you create a managed iceberg table without specifying the location and the database has both a location and a managed_location, the final iceberg table location will be on the database location instead of the managed_location. But you can see that the database managed_location also gets an iceberg table subdirectory, which remains even after the table is dropped. We should ensure the managed iceberg table is always on the database managed_location when the database managed_location exists. The direct and simple way is to reuse the created hms table location before committing the iceberg table, to avoid creating a new iceberg location.
Steps to repro:
1. set location and managed location properties:
{code:java}
set hive.metastore.warehouse.dir=/user/hive/warehouse/hiveicetest;
set hive.metastore.warehouse.external.dir=/user/hive/warehouse/external/hiveicetest;
{code}
2.
create a database with default location and managed_location: {code:java} create database testdb;{code} {code:java} desc database testdb;{code} {code:java} +--+--+++-+-+-++ | db_name | comment | location | managedlocation | owner_name | owner_type | connector_name | remote_dbname | +--+--+++-+-+-++ | testdb | | hdfs://ns/user/hive/warehouse/external/hiveicetest/testdb.db | hdfs://ns/user/hive/warehouse/hiveicetest/testdb.db | hive | USER | | +--+--++---
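A possible workaround until this is fixed, sketched from the repro above, is to pin the table to the managed directory with an explicit LOCATION clause (assuming Hive accepts an explicit LOCATION for this table; the path is the managed location from step 2 and is illustrative only):

```sql
-- illustrative workaround sketch: place the Iceberg table under the
-- database managed_location explicitly instead of relying on the default
CREATE TABLE testdb.ice01 (id INT)
STORED BY ICEBERG
STORED AS ORC
LOCATION 'hdfs://ns/user/hive/warehouse/hiveicetest/testdb.db/ice01';
```

This only sidesteps the default-location logic; the actual fix proposed in the issue is to reuse the HMS-created location at commit time.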
[jira] [Updated] (HIVE-27360) Iceberg: Don't create a new iceberg location if hms table already has a default location
[ https://issues.apache.org/jira/browse/HIVE-27360?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] zhangbutao updated HIVE-27360: -- Description: If you create a managed iceberg table without specifying the location, and the database has both a location and a managed_location, the final iceberg table location will be on the database location instead of the managed_location. In addition, the database managed_location gains an iceberg table subdirectory which remains even after the table is dropped. We should ensure the managed iceberg table is always on the database managed_location when the database managed_location exists. The direct and simple way is to use the hms table location created before committing the iceberg table, so that no new iceberg location is created. > Iceberg: Don't create a new iceberg location if hms table already has a > default location > - > > Key: HIVE-27360 > URL: https://issues.apache.org/jira/browse/HIVE-27360 > Project: Hive > Issue Type: Improvement > Components: Iceberg integration >Reporter: zhangbutao >Assignee: zhangbutao >Priority: Major > > If you create a managed iceberg table without specifying the location, and the > database has both a location and a managed_location, the final iceberg table > location will be on the database location instead of the managed_location. In > addition, the database managed_location gains an iceberg table subdirectory > which remains even after the table is dropped. > We should ensure the managed iceberg table is always on the database > managed_location when the database managed_location exists. The direct > and simple way is to use the hms table location created before > committing the iceberg table, so that no new iceberg location is created. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Assigned] (HIVE-27360) Iceberg: Don't create a new iceberg location if hms table already has a default location
[ https://issues.apache.org/jira/browse/HIVE-27360?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] zhangbutao reassigned HIVE-27360: - Assignee: zhangbutao > Iceberg: Don't create a new iceberg location if hms table already has a > default location > - > > Key: HIVE-27360 > URL: https://issues.apache.org/jira/browse/HIVE-27360 > Project: Hive > Issue Type: Improvement > Components: Iceberg integration >Reporter: zhangbutao >Assignee: zhangbutao >Priority: Major > -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Created] (HIVE-27360) Iceberg: Don't create a new iceberg location if hms table already has a default location
zhangbutao created HIVE-27360: - Summary: Iceberg: Don't create a new iceberg location if hms table already has a default location Key: HIVE-27360 URL: https://issues.apache.org/jira/browse/HIVE-27360 Project: Hive Issue Type: Improvement Components: Iceberg integration Reporter: zhangbutao -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Commented] (HIVE-27317) Temporary (local) session files cleanup improvements
[ https://issues.apache.org/jira/browse/HIVE-27317?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17719385#comment-17719385 ] zhangbutao commented on HIVE-27317: --- Hi [~sercan.tekin] Please create a GitHub pull request at [https://github.com/apache/hive/pulls], as patch-based review has not been used for a long time. > Temporary (local) session files cleanup improvements > > > Key: HIVE-27317 > URL: https://issues.apache.org/jira/browse/HIVE-27317 > Project: Hive > Issue Type: Improvement >Reporter: Sercan Tekin >Assignee: Sercan Tekin >Priority: Major > Attachments: HIVE-27317.patch > > > When a Hive session is killed, the shutdown hook has no chance to clean up tmp > files. > There is a Hive service to clean residual files > https://issues.apache.org/jira/browse/HIVE-13429, and later on its execution > was scheduled inside HS2 https://issues.apache.org/jira/browse/HIVE-15068 to > make sure no temp file is left behind. But this service cleans up only > HDFS temp files; there are still residual files/dirs in the > *HiveConf.ConfVars.LOCALSCRATCHDIR* location, as follows: > {code:java} > > ll /tmp/user/97c4ef50-5e80-480e-a6f0-4f779050852b* > drwx-- 2 user user 4096 Oct 29 10:09 97c4ef50-5e80-480e-a6f0-4f779050852b > -rw--- 1 user user 0 Oct 29 10:09 > 97c4ef50-5e80-480e-a6f0-4f779050852b10571819313894728966.pipeout > -rw--- 1 user user 0 Oct 29 10:09 > 97c4ef50-5e80-480e-a6f0-4f779050852b16013956055489853961.pipeout > -rw--- 1 user user 0 Oct 29 10:09 > 97c4ef50-5e80-480e-a6f0-4f779050852b4383913570068173450.pipeout > -rw--- 1 user user 0 Oct 29 10:09 > 97c4ef50-5e80-480e-a6f0-4f779050852b889740171428672108.pipeout {code} -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Assigned] (HIVE-27302) Iceberg: Support write to iceberg branch
[ https://issues.apache.org/jira/browse/HIVE-27302?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] zhangbutao reassigned HIVE-27302: - Assignee: zhangbutao > Iceberg: Support write to iceberg branch > --- > > Key: HIVE-27302 > URL: https://issues.apache.org/jira/browse/HIVE-27302 > Project: Hive > Issue Type: Sub-task > Components: Iceberg integration >Reporter: zhangbutao >Assignee: zhangbutao >Priority: Major > > This feature depends on the Iceberg 1.2.0 interface: > [https://github.com/apache/iceberg/pull/5234] -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Created] (HIVE-27302) Iceberg: Support write to iceberg branch
zhangbutao created HIVE-27302: - Summary: Iceberg: Support write to iceberg branch Key: HIVE-27302 URL: https://issues.apache.org/jira/browse/HIVE-27302 Project: Hive Issue Type: Sub-task Components: Iceberg integration Reporter: zhangbutao This feature depends on the Iceberg 1.2.0 interface: [https://github.com/apache/iceberg/pull/5234] -- This message was sent by Atlassian Jira (v8.20.10#820010)
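For readers unfamiliar with the feature, writing to an Iceberg branch from Hive could look like the following sketch (the table and branch names are hypothetical, and the exact syntax is whatever the implementation of HIVE-27302 settles on; this mirrors the `table.branch_<name>` reference style used elsewhere in the ecosystem):

```sql
-- hypothetical sketch: insert into an existing branch "branch1"
-- of Iceberg table testdb.ice01 instead of the main branch
INSERT INTO testdb.ice01.branch_branch1 VALUES (1), (2);
```

The underlying capability comes from the Iceberg 1.2.0 branch-commit API referenced in the issue.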
[jira] [Commented] (HIVE-27273) Iceberg: Upgrade iceberg to 1.2.1
[ https://issues.apache.org/jira/browse/HIVE-27273?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17715386#comment-17715386 ] zhangbutao commented on HIVE-27273: --- PR https://github.com/apache/hive/pull/4252 > Iceberg: Upgrade iceberg to 1.2.1 > -- > > Key: HIVE-27273 > URL: https://issues.apache.org/jira/browse/HIVE-27273 > Project: Hive > Issue Type: Improvement > Components: Iceberg integration >Reporter: zhangbutao >Assignee: zhangbutao >Priority: Major > > [https://iceberg.apache.org/releases/#121-release] Iceberg 1.2.1 (including > 1.2.0) has many improvements, e.g. _branch commit_ and the > _{{position_deletes}} metadata table._ -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Assigned] (HIVE-27273) Iceberg: Upgrade iceberg to 1.2.1
[ https://issues.apache.org/jira/browse/HIVE-27273?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] zhangbutao reassigned HIVE-27273: - Assignee: zhangbutao > Iceberg: Upgrade iceberg to 1.2.1 > -- > > Key: HIVE-27273 > URL: https://issues.apache.org/jira/browse/HIVE-27273 > Project: Hive > Issue Type: Improvement > Components: Iceberg integration >Reporter: zhangbutao >Assignee: zhangbutao >Priority: Major > > [https://iceberg.apache.org/releases/#121-release] Iceberg 1.2.1 (including > 1.2.0) has many improvements, e.g. _branch commit_ and the > _{{position_deletes}} metadata table._ -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Created] (HIVE-27273) Iceberg: Upgrade iceberg to 1.2.1
zhangbutao created HIVE-27273: - Summary: Iceberg: Upgrade iceberg to 1.2.1 Key: HIVE-27273 URL: https://issues.apache.org/jira/browse/HIVE-27273 Project: Hive Issue Type: Improvement Components: Iceberg integration Reporter: zhangbutao [https://iceberg.apache.org/releases/#121-release] Iceberg 1.2.1 (including 1.2.0) has many improvements, e.g. _branch commit_ and the _{{position_deletes}} metadata table._ -- This message was sent by Atlassian Jira (v8.20.10#820010)
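For context, the `position_deletes` metadata table mentioned above exposes the positional delete files recorded against an Iceberg v2 table. A query of this shape shows the idea (a Spark-SQL-style sketch; the table name is illustrative and engine support for this metadata table varies):

```sql
-- illustrative: list positional deletes (data file path + row position)
-- recorded against Iceberg table testdb.ice01
SELECT file_path, pos
FROM testdb.ice01.position_deletes;
```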