[jira] [Commented] (HIVE-27898) HIVE4 can't use ICEBERG table in subqueries

2023-11-28 Thread zhangbutao (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-27898?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17790440#comment-17790440
 ] 

zhangbutao commented on HIVE-27898:
---

Just a reminder: once HIVE-27912 is fixed, you can get a snapshot package that 
includes Iceberg from http://ci.hive.apache.org/job/hive-nightly/
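For reference, grabbing and unpacking it would look roughly like the sketch 
below; the artifact name/path on the Jenkins job is an assumption, so check the 
job page for the real one:
{code:java}
# hypothetical sketch -- the artifact path/name below is assumed, not verified;
# browse http://ci.hive.apache.org/job/hive-nightly/ for the actual artifact
wget http://ci.hive.apache.org/job/hive-nightly/lastSuccessfulBuild/artifact/apache-hive-4.0.0-SNAPSHOT-bin.tar.gz
tar -xzf apache-hive-4.0.0-SNAPSHOT-bin.tar.gz
{code}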

> HIVE4 can't use ICEBERG table in subqueries
> ---
>
> Key: HIVE-27898
> URL: https://issues.apache.org/jira/browse/HIVE-27898
> Project: Hive
>  Issue Type: Bug
>  Components: Iceberg integration
>Affects Versions: 4.0.0-beta-1
>Reporter: yongzhi.shao
>Priority: Critical
>
> We found that when using the HIVE4-BETA1 version, if we use an ICEBERG table 
> in a subquery, we can't get any data back.
> I used HIVE3-TEZ for cross-validation, and HIVE3 does not have this problem 
> when querying ICEBERG.
> {code:java}
> --spark3.4.1+iceberg 1.4.2
> CREATE TABLE datacenter.dwd.b_std_trade (
>   uni_order_id STRING,
>   data_from BIGINT,
>   partner STRING,
>   plat_code STRING,
>   order_id STRING,
>   uni_shop_id STRING,
>   uni_id STRING,
>   guide_id STRING,
>   shop_id STRING,
>   plat_account STRING,
>   total_fee DOUBLE,
>   item_discount_fee DOUBLE,
>   trade_discount_fee DOUBLE,
>   adjust_fee DOUBLE,
>   post_fee DOUBLE,
>   discount_rate DOUBLE,
>   payment_no_postfee DOUBLE,
>   payment DOUBLE,
>   pay_time STRING,
>   product_num BIGINT,
>   order_status STRING,
>   is_refund STRING,
>   refund_fee DOUBLE,
>   insert_time STRING,
>   created STRING,
>   endtime STRING,
>   modified STRING,
>   trade_type STRING,
>   receiver_name STRING,
>   receiver_country STRING,
>   receiver_state STRING,
>   receiver_city STRING,
>   receiver_district STRING,
>   receiver_town STRING,
>   receiver_address STRING,
>   receiver_mobile STRING,
>   trade_source STRING,
>   delivery_type STRING,
>   consign_time STRING,
>   orders_num BIGINT,
>   is_presale BIGINT,
>   presale_status STRING,
>   first_fee_paytime STRING,
>   last_fee_paytime STRING,
>   first_paid_fee DOUBLE,
>   tenant STRING,
>   tidb_modified STRING,
>   step_paid_fee DOUBLE,
>   seller_flag STRING,
>   is_used_store_card BIGINT,
>   store_card_used DOUBLE,
>   store_card_basic_used DOUBLE,
>   store_card_expand_used DOUBLE,
>   order_promotion_num BIGINT,
>   item_promotion_num BIGINT,
>   buyer_remark STRING,
>   seller_remark STRING,
>   trade_business_type STRING)
> USING iceberg
> PARTITIONED BY (uni_shop_id, truncate(4, created))
> LOCATION '/iceberg-catalog/warehouse/dwd/b_std_trade'
> TBLPROPERTIES (
>   'current-snapshot-id' = '7217819472703702905',
>   'format' = 'iceberg/orc',
>   'format-version' = '1',
>   'hive.stored-as' = 'iceberg',
>   'read.orc.vectorization.enabled' = 'true',
>   'sort-order' = 'uni_shop_id ASC NULLS FIRST, created ASC NULLS FIRST',
>   'write.distribution-mode' = 'hash',
>   'write.format.default' = 'orc',
>   'write.metadata.delete-after-commit.enabled' = 'true',
>   'write.metadata.previous-versions-max' = '3',
>   'write.orc.bloom.filter.columns' = 'order_id',
>   'write.orc.compression-codec' = 'zstd')
> --hive-iceberg
>  CREATE EXTERNAL TABLE iceberg_dwd.b_std_trade 
>  STORED BY 'org.apache.iceberg.mr.hive.HiveIcebergStorageHandler' 
> LOCATION 'hdfs:///iceberg-catalog/warehouse/dwd/b_std_trade'
> TBLPROPERTIES 
> ('iceberg.catalog'='location_based_table','engine.hive.enabled'='true');
> select * from iceberg_dwd.b_std_trade 
> where uni_shop_id = 'TEST|1' limit 10  --10 rows
> select *
> from ( 
> select * from iceberg_dwd.b_std_trade 
> where uni_shop_id = 'TEST|1' limit 10
> ) t1;   --10 rows
> select uni_shop_id
> from ( 
> select * from iceberg_dwd.b_std_trade 
> where uni_shop_id = 'TEST|1' limit 10
> ) t1;  --0 rows
> select uni_shop_id
> from ( 
> select uni_shop_id as uni_shop_id from iceberg_dwd.b_std_trade 
> where uni_shop_id = 'TEST|1' limit 10
> ) t1;  --0 rows
> --hive-orc
> select uni_shop_id
> from ( 
> select * from iceberg_dwd.trade_test 
> where uni_shop_id = 'TEST|1' limit 10
> ) t1;--10 ROWS{code}
>  





[jira] [Updated] (HIVE-27912) Include Iceberg module in nightly builds

2023-11-28 Thread zhangbutao (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-27912?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

zhangbutao updated HIVE-27912:
--
Description: 
[http://ci.hive.apache.org/job/hive-nightly/]

HIVE-25715 added nightly builds, which give users a chance to test the 
snapshot binary package.

But the builds don't contain the Iceberg module. It would be good to include 
it, so that users such as the reporter of HIVE-27898 can use the snapshot 
binary package to test their queries.

> Include Iceberg module in nightly builds
> 
>
> Key: HIVE-27912
> URL: https://issues.apache.org/jira/browse/HIVE-27912
> Project: Hive
>  Issue Type: Improvement
>Reporter: zhangbutao
>Assignee: zhangbutao
>Priority: Major
>
> [http://ci.hive.apache.org/job/hive-nightly/]
> HIVE-25715 added nightly builds, which give users a chance to test the 
> snapshot binary package.
> But the builds don't contain the Iceberg module. It would be good to include 
> it, so that users such as the reporter of HIVE-27898 can use the snapshot 
> binary package to test their queries.





[jira] [Created] (HIVE-27912) Include Iceberg module in nightly builds

2023-11-28 Thread zhangbutao (Jira)
zhangbutao created HIVE-27912:
-

 Summary: Include Iceberg module in nightly builds
 Key: HIVE-27912
 URL: https://issues.apache.org/jira/browse/HIVE-27912
 Project: Hive
  Issue Type: Improvement
Reporter: zhangbutao








[jira] [Assigned] (HIVE-27912) Include Iceberg module in nightly builds

2023-11-28 Thread zhangbutao (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-27912?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

zhangbutao reassigned HIVE-27912:
-

Assignee: zhangbutao

> Include Iceberg module in nightly builds
> 
>
> Key: HIVE-27912
> URL: https://issues.apache.org/jira/browse/HIVE-27912
> Project: Hive
>  Issue Type: Improvement
>Reporter: zhangbutao
>Assignee: zhangbutao
>Priority: Major
>






[jira] [Commented] (HIVE-27898) HIVE4 can't use ICEBERG table in subqueries

2023-11-28 Thread zhangbutao (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-27898?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17790424#comment-17790424
 ] 

zhangbutao commented on HIVE-27898:
---

I think you can try the master branch to see if it is OK; we have fixed some 
related issues there.

BTW, I think we will release a new Hive 4 version soon, and then you can use 
the newly released version.
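For reference, building a snapshot binary from master with the Iceberg module 
could look roughly like this sketch (the output path under packaging/target/ is 
the usual Maven layout, stated here as an assumption):
{code:java}
# sketch: build Hive master with the Iceberg module included
git clone https://github.com/apache/hive.git && cd hive
mvn clean install -DskipTests -Piceberg -Pdist
# the binary tarball should land under packaging/target/ (assumed layout)
ls packaging/target/apache-hive-*-bin.tar.gz
{code}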

> HIVE4 can't use ICEBERG table in subqueries
> ---
>
> Key: HIVE-27898
> URL: https://issues.apache.org/jira/browse/HIVE-27898
> Project: Hive
>  Issue Type: Bug
>  Components: Iceberg integration
>Affects Versions: 4.0.0-beta-1
>Reporter: yongzhi.shao
>Priority: Critical
>
> We found that when using the HIVE4-BETA1 version, if we use an ICEBERG table 
> in a subquery, we can't get any data back.
> I used HIVE3-TEZ for cross-validation, and HIVE3 does not have this problem 
> when querying ICEBERG.
> {code:java}
> --spark3.4.1+iceberg 1.4.2
> CREATE TABLE datacenter.dwd.b_std_trade (
>   uni_order_id STRING,
>   data_from BIGINT,
>   partner STRING,
>   plat_code STRING,
>   order_id STRING,
>   uni_shop_id STRING,
>   uni_id STRING,
>   guide_id STRING,
>   shop_id STRING,
>   plat_account STRING,
>   total_fee DOUBLE,
>   item_discount_fee DOUBLE,
>   trade_discount_fee DOUBLE,
>   adjust_fee DOUBLE,
>   post_fee DOUBLE,
>   discount_rate DOUBLE,
>   payment_no_postfee DOUBLE,
>   payment DOUBLE,
>   pay_time STRING,
>   product_num BIGINT,
>   order_status STRING,
>   is_refund STRING,
>   refund_fee DOUBLE,
>   insert_time STRING,
>   created STRING,
>   endtime STRING,
>   modified STRING,
>   trade_type STRING,
>   receiver_name STRING,
>   receiver_country STRING,
>   receiver_state STRING,
>   receiver_city STRING,
>   receiver_district STRING,
>   receiver_town STRING,
>   receiver_address STRING,
>   receiver_mobile STRING,
>   trade_source STRING,
>   delivery_type STRING,
>   consign_time STRING,
>   orders_num BIGINT,
>   is_presale BIGINT,
>   presale_status STRING,
>   first_fee_paytime STRING,
>   last_fee_paytime STRING,
>   first_paid_fee DOUBLE,
>   tenant STRING,
>   tidb_modified STRING,
>   step_paid_fee DOUBLE,
>   seller_flag STRING,
>   is_used_store_card BIGINT,
>   store_card_used DOUBLE,
>   store_card_basic_used DOUBLE,
>   store_card_expand_used DOUBLE,
>   order_promotion_num BIGINT,
>   item_promotion_num BIGINT,
>   buyer_remark STRING,
>   seller_remark STRING,
>   trade_business_type STRING)
> USING iceberg
> PARTITIONED BY (uni_shop_id, truncate(4, created))
> LOCATION '/iceberg-catalog/warehouse/dwd/b_std_trade'
> TBLPROPERTIES (
>   'current-snapshot-id' = '7217819472703702905',
>   'format' = 'iceberg/orc',
>   'format-version' = '1',
>   'hive.stored-as' = 'iceberg',
>   'read.orc.vectorization.enabled' = 'true',
>   'sort-order' = 'uni_shop_id ASC NULLS FIRST, created ASC NULLS FIRST',
>   'write.distribution-mode' = 'hash',
>   'write.format.default' = 'orc',
>   'write.metadata.delete-after-commit.enabled' = 'true',
>   'write.metadata.previous-versions-max' = '3',
>   'write.orc.bloom.filter.columns' = 'order_id',
>   'write.orc.compression-codec' = 'zstd')
> --hive-iceberg
>  CREATE EXTERNAL TABLE iceberg_dwd.b_std_trade 
>  STORED BY 'org.apache.iceberg.mr.hive.HiveIcebergStorageHandler' 
> LOCATION 'hdfs:///iceberg-catalog/warehouse/dwd/b_std_trade'
> TBLPROPERTIES 
> ('iceberg.catalog'='location_based_table','engine.hive.enabled'='true');
> select * from iceberg_dwd.b_std_trade 
> where uni_shop_id = 'TEST|1' limit 10  --10 rows
> select *
> from ( 
> select * from iceberg_dwd.b_std_trade 
> where uni_shop_id = 'TEST|1' limit 10
> ) t1;   --10 rows
> select uni_shop_id
> from ( 
> select * from iceberg_dwd.b_std_trade 
> where uni_shop_id = 'TEST|1' limit 10
> ) t1;  --0 rows
> select uni_shop_id
> from ( 
> select uni_shop_id as uni_shop_id from iceberg_dwd.b_std_trade 
> where uni_shop_id = 'TEST|1' limit 10
> ) t1;  --0 rows
> --hive-orc
> select uni_shop_id
> from ( 
> select * from iceberg_dwd.trade_test 
> where uni_shop_id = 'TEST|1' limit 10
> ) t1;--10 ROWS{code}
>  





[jira] [Commented] (HIVE-27898) HIVE4 can't use ICEBERG table in subqueries

2023-11-27 Thread zhangbutao (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-27898?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17790407#comment-17790407
 ] 

zhangbutao commented on HIVE-27898:
---

I still cannot reproduce your issue. My environment:

Spark 3.5.0

Hadoop 3.3.1

Tez 0.10.2

Hive 4 master code

Iceberg 1.4.2

 
{code:java}
//Spark side 

/data/spark-3.5.0-bin-hadoop3/bin/spark-sql \
--master local \
--deploy-mode client \
--conf spark.sql.catalog.hadoop_prod=org.apache.iceberg.spark.SparkCatalog \
--conf spark.sql.catalog.hadoop_prod.type=hadoop \
--conf spark.sql.catalog.hadoop_prod.warehouse=hdfs://localhost:8028/tmp/testiceberg;


CREATE TABLE IF NOT EXISTS hadoop_prod.default.test_data_04 (
id string,name string
)
using iceberg
PARTITIONED BY (name)
TBLPROPERTIES 
('read.orc.vectorization.enabled'='true','write.format.default'='orc','write.orc.bloom.filter.columns'='id','write.orc.compression-codec'='zstd','write.metadata.previous-versions-max'='3','write.metadata.delete-after-commit.enabled'='true');


insert into hadoop_prod.default.test_data_04(id,name) 
values('1','a'),('2','b');{code}
 

 
{code:java}
// HS2 side

CREATE EXTERNAL TABLE test_data_04
STORED BY 'org.apache.iceberg.mr.hive.HiveIcebergStorageHandler' 
LOCATION 'hdfs://localhost:8028/tmp/testiceberg/default/test_data_04'
TBLPROPERTIES 
('iceberg.catalog'='location_based_table','engine.hive.enabled'='true');

 
select id from (select * from test_data_04 limit 10) s1;
+-----+
| id  |
+-----+
| 1   |
| 2   |
+-----+

select id from (select * from test_data_04) s1;
+-----+
| id  |
+-----+
| 1   |
| 2   |
+-----+
{code}
 

 

 

> HIVE4 can't use ICEBERG table in subqueries
> ---
>
> Key: HIVE-27898
> URL: https://issues.apache.org/jira/browse/HIVE-27898
> Project: Hive
>  Issue Type: Bug
>  Components: Iceberg integration
>Affects Versions: 4.0.0-beta-1
>Reporter: yongzhi.shao
>Priority: Critical
>
> We found that when using the HIVE4-BETA1 version, if we use an ICEBERG table 
> in a subquery, we can't get any data back.
> I used HIVE3-TEZ for cross-validation, and HIVE3 does not have this problem 
> when querying ICEBERG.
> {code:java}
> --spark3.4.1+iceberg 1.4.2
> CREATE TABLE datacenter.dwd.b_std_trade (
>   uni_order_id STRING,
>   data_from BIGINT,
>   partner STRING,
>   plat_code STRING,
>   order_id STRING,
>   uni_shop_id STRING,
>   uni_id STRING,
>   guide_id STRING,
>   shop_id STRING,
>   plat_account STRING,
>   total_fee DOUBLE,
>   item_discount_fee DOUBLE,
>   trade_discount_fee DOUBLE,
>   adjust_fee DOUBLE,
>   post_fee DOUBLE,
>   discount_rate DOUBLE,
>   payment_no_postfee DOUBLE,
>   payment DOUBLE,
>   pay_time STRING,
>   product_num BIGINT,
>   order_status STRING,
>   is_refund STRING,
>   refund_fee DOUBLE,
>   insert_time STRING,
>   created STRING,
>   endtime STRING,
>   modified STRING,
>   trade_type STRING,
>   receiver_name STRING,
>   receiver_country STRING,
>   receiver_state STRING,
>   receiver_city STRING,
>   receiver_district STRING,
>   receiver_town STRING,
>   receiver_address STRING,
>   receiver_mobile STRING,
>   trade_source STRING,
>   delivery_type STRING,
>   consign_time STRING,
>   orders_num BIGINT,
>   is_presale BIGINT,
>   presale_status STRING,
>   first_fee_paytime STRING,
>   last_fee_paytime STRING,
>   first_paid_fee DOUBLE,
>   tenant STRING,
>   tidb_modified STRING,
>   step_paid_fee DOUBLE,
>   seller_flag STRING,
>   is_used_store_card BIGINT,
>   store_card_used DOUBLE,
>   store_card_basic_used DOUBLE,
>   store_card_expand_used DOUBLE,
>   order_promotion_num BIGINT,
>   item_promotion_num BIGINT,
>   buyer_remark STRING,
>   seller_remark STRING,
>   trade_business_type STRING)
> USING iceberg
> PARTITIONED BY (uni_shop_id, truncate(4, created))
> LOCATION '/iceberg-catalog/warehouse/dwd/b_std_trade'
> TBLPROPERTIES (
>   'current-snapshot-id' = '7217819472703702905',
>   'format' = 'iceberg/orc',
>   'format-version' = '1',
>   'hive.stored-as' = 'iceberg',
>   'read.orc.vectorization.enabled' = 'true',
>   'sort-order' = 'uni_shop_id ASC NULLS FIRST, created ASC NULLS FIRST',
>   'write.distribution-mode' = 'hash',
>   'write.format.default' = 'orc',
>   'write.metadata.delete-after-commit.enabled' = 'true',
>   'write.metadata.previous-versions-max' = '3',
>   'write.orc.bloom.filter.columns' = 'order_id',
>   'write.orc.compression-codec' = 'zstd')
> --hive-iceberg
>  CREATE EXTERNAL TABLE iceberg_dwd.b_std_trade 
>  STORED BY 'org.apache.iceberg.mr.hive.HiveIcebergStorageHandler' 
> LOCATION 'hdfs:///iceberg-catalog/warehouse/dwd/b_std_trade'
> TBLPROPERTIES 
> ('iceberg.catalog'='location_based_table','engine.hive.enabled'='true');
> select * from iceberg_dwd.b_std_trade 
> where uni_shop_id = 'TEST|1' limit 10  --10 rows
> select *
> from ( 
> select * from iceber

[jira] [Commented] (HIVE-27910) Hive on Spark -- should work?

2023-11-25 Thread zhangbutao (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-27910?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17789770#comment-17789770
 ] 

zhangbutao commented on HIVE-27910:
---

[https://cwiki.apache.org/confluence/display/Hive/Hive+on+Spark] yep, I think 
we need to add a note to this page reminding users that Hive on Spark is no 
longer supported as of Hive 4.

> Hive on Spark -- should work?
> -
>
> Key: HIVE-27910
> URL: https://issues.apache.org/jira/browse/HIVE-27910
> Project: Hive
>  Issue Type: Task
>Affects Versions: 3.1.1
>Reporter: Alexander Petrossian (PAF)
>Priority: Major
>
> I wanted to test this
> [https://cwiki.apache.org/confluence/display/Hive/Hive+on+Spark]
> Where our admins installed 
> {code:java}
> hive --version
> Hive 3.1.0.3.1.0.0-78
> Git 
> git://ctr-e138-1518143905142-586755-01-15.hwx.site/grid/0/jenkins/workspace/HDP-parallel-centos7/SOURCES/hive
>  -r 56673b027117d8cb3400675b1680a4d992360808 {code}
> Trying
> {code:java}
> set hive.execution.engine=spark;
> SELECT ...; {code}
> Getting
> {code:java}
> ERROR : FAILED: Execution Error, return code 30041 from 
> org.apache.hadoop.hive.ql.exec.spark.SparkTask. Failed to create Spark client 
> for Spark session 139c0453-459f-4511-b7fc-eab28e78fe0c {code}
> Could it be that Spark support in Hive was somehow dropped?
> Or could it be some [simple?] configuration issue?





[jira] [Commented] (HIVE-27910) Hive on Spark -- should work?

2023-11-25 Thread zhangbutao (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-27910?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17789765#comment-17789765
 ] 

zhangbutao commented on HIVE-27910:
---

Hive on Spark was removed in Hive 4, and IMO no one maintains this feature for 
Hive 3.1.1 anymore.

I would suggest you use Hive on Tez, which is actively maintained by the Hive 
community.
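For example, switching an existing session over is just:
{code:java}
-- move the session onto Tez, the engine Hive 4 actually supports
set hive.execution.engine=tez;
SELECT ...;
{code}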

Thanks.

> Hive on Spark -- should work?
> -
>
> Key: HIVE-27910
> URL: https://issues.apache.org/jira/browse/HIVE-27910
> Project: Hive
>  Issue Type: Task
>Affects Versions: 3.1.1
>Reporter: Alexander Petrossian (PAF)
>Priority: Major
>
> I wanted to test this
> [https://cwiki.apache.org/confluence/display/Hive/Hive+on+Spark]
> Where our admins installed 
> {code:java}
> hive --version
> Hive 3.1.0.3.1.0.0-78
> Git 
> git://ctr-e138-1518143905142-586755-01-15.hwx.site/grid/0/jenkins/workspace/HDP-parallel-centos7/SOURCES/hive
>  -r 56673b027117d8cb3400675b1680a4d992360808 {code}
> Trying
> {code:java}
> set hive.execution.engine=spark;
> SELECT ...; {code}
> Getting
> {code:java}
> ERROR : FAILED: Execution Error, return code 30041 from 
> org.apache.hadoop.hive.ql.exec.spark.SparkTask. Failed to create Spark client 
> for Spark session 139c0453-459f-4511-b7fc-eab28e78fe0c {code}
> Could it be that Spark support in Hive was somehow dropped?
> Or could it be some [simple?] configuration issue?





[jira] [Comment Edited] (HIVE-27900) hive can not read iceberg-parquet table

2023-11-23 Thread zhangbutao (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-27900?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17789289#comment-17789289
 ] 

zhangbutao edited comment on HIVE-27900 at 11/24/23 2:12 AM:
-

Do you use the Tez shuffle handler? 
[https://tez.apache.org/shuffle-handler.html]

Also: if you remove the ORC vectorised-reads property from the Parquet table, 
does the query then succeed?
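A sketch of that check; the ALTER TABLE form is Spark SQL, and the session 
toggle is the Hive-side alternative:
{code:java}
-- Spark side: drop the stray ORC property from the Parquet table
ALTER TABLE local.test.b_qqd_shop_rfm_parquet_snappy
UNSET TBLPROPERTIES ('read.orc.vectorization.enabled');

-- or, Hive side: turn off vectorized execution for the session instead
set hive.vectorized.execution.enabled=false;
{code}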


was (Author: zhangbutao):
If you remove the ORC vectorised-reads property from the Parquet table, does 
the query then succeed?

> hive can not read iceberg-parquet table
> ---
>
> Key: HIVE-27900
> URL: https://issues.apache.org/jira/browse/HIVE-27900
> Project: Hive
>  Issue Type: Bug
>  Components: Iceberg integration
>Affects Versions: 4.0.0-beta-1
>Reporter: yongzhi.shao
>Priority: Major
>
> We found that, when using the HIVE4-BETA version, we could not query the 
> Iceberg-Parquet table with vectorised execution turned on.
> {code:java}
> --spark-sql(3.4.1+iceberg 1.4.2)
> CREATE TABLE local.test.b_qqd_shop_rfm_parquet_snappy (
> a string,b string,c string)
> USING iceberg
> LOCATION '/iceberg-catalog/warehouse/test/b_qqd_shop_rfm_parquet_snappy'
> TBLPROPERTIES (
>   'current-snapshot-id' = '5138351937447353683',
>   'format' = 'iceberg/parquet',
>   'format-version' = '2',
>   'read.orc.vectorization.enabled' = 'true',
>   'write.format.default' = 'parquet',
>   'write.metadata.delete-after-commit.enabled' = 'true',
>   'write.metadata.previous-versions-max' = '3',
>   'write.parquet.compression-codec' = 'snappy');
> --hive-sql
> CREATE EXTERNAL TABLE iceberg_dwd.b_qqd_shop_rfm_parquet_snappy
> STORED BY 'org.apache.iceberg.mr.hive.HiveIcebergStorageHandler'
> LOCATION 
> 'hdfs://xxx/iceberg-catalog/warehouse/test/b_qqd_shop_rfm_parquet_snappy/'
> TBLPROPERTIES 
> ('iceberg.catalog'='location_based_table','engine.hive.enabled'='true');
> set hive.default.fileformat=orc;
> set hive.default.fileformat.managed=orc;
> create table test_parquet_as_orc as select * from 
> b_qqd_shop_rfm_parquet_snappy limit 100;
> , TaskAttempt 2 failed, info=[Error: Node: /xxx..xx.xx: Error while 
> running task ( failure ) : 
> attempt_1696729618575_69586_1_00_00_2:java.lang.RuntimeException: 
> java.lang.RuntimeException: org.apache.hadoop.hive.ql.metadata.HiveException: 
> Hive Runtime Error while processing row
> at 
> org.apache.hadoop.hive.ql.exec.tez.TezProcessor.initializeAndRunProcessor(TezProcessor.java:348)
> at org.apache.hadoop.hive.ql.exec.tez.TezProcessor.run(TezProcessor.java:276)
> at 
> org.apache.tez.runtime.LogicalIOProcessorRuntimeTask.run(LogicalIOProcessorRuntimeTask.java:381)
> at 
> org.apache.tez.runtime.task.TaskRunner2Callable$1.run(TaskRunner2Callable.java:82)
> at 
> org.apache.tez.runtime.task.TaskRunner2Callable$1.run(TaskRunner2Callable.java:69)
> at java.security.AccessController.doPrivileged(Native Method)
> at javax.security.auth.Subject.doAs(Subject.java:422)
> at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1899)
> at 
> org.apache.tez.runtime.task.TaskRunner2Callable.callInternal(TaskRunner2Callable.java:69)
> at 
> org.apache.tez.runtime.task.TaskRunner2Callable.callInternal(TaskRunner2Callable.java:39)
> at org.apache.tez.common.CallableWithNdc.call(CallableWithNdc.java:36)
> at 
> com.google.common.util.concurrent.TrustedListenableFutureTask$TrustedFutureInterruptibleTask.runInterruptibly(TrustedListenableFutureTask.java:131)
> at 
> com.google.common.util.concurrent.InterruptibleTask.run(InterruptibleTask.java:76)
> at 
> com.google.common.util.concurrent.TrustedListenableFutureTask.run(TrustedListenableFutureTask.java:82)
> at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
> at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
> at java.lang.Thread.run(Thread.java:750)
> Caused by: java.lang.RuntimeException: 
> org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime Error while 
> processing row
> at 
> org.apache.hadoop.hive.ql.exec.tez.MapRecordSource.processRow(MapRecordSource.java:110)
> at 
> org.apache.hadoop.hive.ql.exec.tez.MapRecordSource.pushRecord(MapRecordSource.java:83)
> at 
> org.apache.hadoop.hive.ql.exec.tez.MapRecordProcessor.run(MapRecordProcessor.java:414)
> at 
> org.apache.hadoop.hive.ql.exec.tez.TezProcessor.initializeAndRunProcessor(TezProcessor.java:293)
> ... 16 more
> Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime 
> Error while processing row
> at 
> org.apache.hadoop.hive.ql.exec.vector.VectorMapOperator.process(VectorMapOperator.java:993)
> at 
> org.apache.hadoop.hive.ql.exec.tez.MapRecordSource.processRow(MapRecordSource.java:101)
> ... 19 more
> Caused by: org.apache.hadoop.hive.ql.metadat

[jira] [Commented] (HIVE-27900) hive can not read iceberg-parquet table

2023-11-23 Thread zhangbutao (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-27900?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17789289#comment-17789289
 ] 

zhangbutao commented on HIVE-27900:
---

If you remove the ORC vectorised-reads property from the Parquet table, does 
the query then succeed?

> hive can not read iceberg-parquet table
> ---
>
> Key: HIVE-27900
> URL: https://issues.apache.org/jira/browse/HIVE-27900
> Project: Hive
>  Issue Type: Bug
>  Components: Iceberg integration
>Affects Versions: 4.0.0-beta-1
>Reporter: yongzhi.shao
>Priority: Major
>
> We found that, when using the HIVE4-BETA version, we could not query the 
> Iceberg-Parquet table with vectorised execution turned on.
> {code:java}
> --spark-sql(3.4.1+iceberg 1.4.2)
> CREATE TABLE local.test.b_qqd_shop_rfm_parquet_snappy (
> a string,b string,c string)
> USING iceberg
> LOCATION '/iceberg-catalog/warehouse/test/b_qqd_shop_rfm_parquet_snappy'
> TBLPROPERTIES (
>   'current-snapshot-id' = '5138351937447353683',
>   'format' = 'iceberg/parquet',
>   'format-version' = '2',
>   'read.orc.vectorization.enabled' = 'true',
>   'write.format.default' = 'parquet',
>   'write.metadata.delete-after-commit.enabled' = 'true',
>   'write.metadata.previous-versions-max' = '3',
>   'write.parquet.compression-codec' = 'snappy');
> --hive-sql
> CREATE EXTERNAL TABLE iceberg_dwd.b_qqd_shop_rfm_parquet_snappy
> STORED BY 'org.apache.iceberg.mr.hive.HiveIcebergStorageHandler'
> LOCATION 
> 'hdfs://xxx/iceberg-catalog/warehouse/test/b_qqd_shop_rfm_parquet_snappy/'
> TBLPROPERTIES 
> ('iceberg.catalog'='location_based_table','engine.hive.enabled'='true');
> set hive.default.fileformat=orc;
> set hive.default.fileformat.managed=orc;
> create table test_parquet_as_orc as select * from 
> b_qqd_shop_rfm_parquet_snappy limit 100;
> , TaskAttempt 2 failed, info=[Error: Node: /xxx..xx.xx: Error while 
> running task ( failure ) : 
> attempt_1696729618575_69586_1_00_00_2:java.lang.RuntimeException: 
> java.lang.RuntimeException: org.apache.hadoop.hive.ql.metadata.HiveException: 
> Hive Runtime Error while processing row
> at 
> org.apache.hadoop.hive.ql.exec.tez.TezProcessor.initializeAndRunProcessor(TezProcessor.java:348)
> at org.apache.hadoop.hive.ql.exec.tez.TezProcessor.run(TezProcessor.java:276)
> at 
> org.apache.tez.runtime.LogicalIOProcessorRuntimeTask.run(LogicalIOProcessorRuntimeTask.java:381)
> at 
> org.apache.tez.runtime.task.TaskRunner2Callable$1.run(TaskRunner2Callable.java:82)
> at 
> org.apache.tez.runtime.task.TaskRunner2Callable$1.run(TaskRunner2Callable.java:69)
> at java.security.AccessController.doPrivileged(Native Method)
> at javax.security.auth.Subject.doAs(Subject.java:422)
> at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1899)
> at 
> org.apache.tez.runtime.task.TaskRunner2Callable.callInternal(TaskRunner2Callable.java:69)
> at 
> org.apache.tez.runtime.task.TaskRunner2Callable.callInternal(TaskRunner2Callable.java:39)
> at org.apache.tez.common.CallableWithNdc.call(CallableWithNdc.java:36)
> at 
> com.google.common.util.concurrent.TrustedListenableFutureTask$TrustedFutureInterruptibleTask.runInterruptibly(TrustedListenableFutureTask.java:131)
> at 
> com.google.common.util.concurrent.InterruptibleTask.run(InterruptibleTask.java:76)
> at 
> com.google.common.util.concurrent.TrustedListenableFutureTask.run(TrustedListenableFutureTask.java:82)
> at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
> at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
> at java.lang.Thread.run(Thread.java:750)
> Caused by: java.lang.RuntimeException: 
> org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime Error while 
> processing row
> at 
> org.apache.hadoop.hive.ql.exec.tez.MapRecordSource.processRow(MapRecordSource.java:110)
> at 
> org.apache.hadoop.hive.ql.exec.tez.MapRecordSource.pushRecord(MapRecordSource.java:83)
> at 
> org.apache.hadoop.hive.ql.exec.tez.MapRecordProcessor.run(MapRecordProcessor.java:414)
> at 
> org.apache.hadoop.hive.ql.exec.tez.TezProcessor.initializeAndRunProcessor(TezProcessor.java:293)
> ... 16 more
> Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime 
> Error while processing row
> at 
> org.apache.hadoop.hive.ql.exec.vector.VectorMapOperator.process(VectorMapOperator.java:993)
> at 
> org.apache.hadoop.hive.ql.exec.tez.MapRecordSource.processRow(MapRecordSource.java:101)
> ... 19 more
> Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: 
> java.lang.NullPointerException
> at 
> org.apache.hadoop.hive.ql.exec.vector.reducesink.VectorReduceSinkEmptyKeyOperator.process(VectorReduceSinkEmptyKeyOperator.java:137)
> at org.apache.hadoop.hive.ql.exec.Operator.vectorForward(Operator.java:919)
> 

[jira] [Commented] (HIVE-27901) Hive's performance for querying the Iceberg table is very poor.

2023-11-22 Thread zhangbutao (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-27901?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17789013#comment-17789013
 ] 

zhangbutao commented on HIVE-27901:
---

I think this ticket looks similar to 
https://issues.apache.org/jira/browse/HIVE-27883 . Currently, some optimization 
properties, such as merging/splitting data, cannot be used on Iceberg tables, 
as Iceberg has its own optimization properties.

For this ticket, it seems that the ORC table gets more tasks than the Iceberg 
table, so the ORC table can run faster. Maybe you can try tuning the property: 
_set read.split.target-size=67108864;_

[https://iceberg.apache.org/docs/latest/configuration/#read-properties]
read.split.target-size defaults to 134217728.

But I am not sure whether this is a good way to optimize your query, as I 
cannot reproduce or delve into your problem.
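A sketch of that tuning, assuming the property takes effect for the session's 
Iceberg scans:
{code:java}
-- default read.split.target-size is 134217728 (128 MB); halving it should
-- roughly double the number of scan splits, and hence tasks
set read.split.target-size=67108864;
select count(*) from iceberg_dwd.b_std_trade;
{code}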

> Hive's performance for querying the Iceberg table is very poor.
> ---
>
> Key: HIVE-27901
> URL: https://issues.apache.org/jira/browse/HIVE-27901
> Project: Hive
>  Issue Type: Bug
>  Components: Iceberg integration
>Affects Versions: 4.0.0-beta-1
>Reporter: yongzhi.shao
>Priority: Major
> Attachments: image-2023-11-22-18-32-28-344.png, 
> image-2023-11-22-18-33-01-885.png, image-2023-11-22-18-33-32-915.png
>
>
> I am using HIVE 4.0.0-BETA for testing.
> BTW, I found that the performance of HIVE reading an ICEBERG table is still 
> very slow.
> How should I deal with this problem?
> I ran a count on a 7-billion-row table and compared the performance of HIVE 
> reading ICEBERG-ORC and ORC tables respectively.
> We use ICEBERG 1.4.2, ICEBERG-ORC with ZSTD compression enabled.
> ORC with SNAPPY compression.
> HADOOP version 3.1.1 (native zstd not supported).
> {code:java}
> --spark3.4.1+iceberg 1.4.2
> CREATE TABLE datacenter.dwd.b_std_trade (
>   uni_order_id STRING,
>   data_from BIGINT,
>   partner STRING,
>   plat_code STRING,
>   order_id STRING,
>   uni_shop_id STRING,
>   uni_id STRING,
>   guide_id STRING,
>   shop_id STRING,
>   plat_account STRING,
>   total_fee DOUBLE,
>   item_discount_fee DOUBLE,
>   trade_discount_fee DOUBLE,
>   adjust_fee DOUBLE,
>   post_fee DOUBLE,
>   discount_rate DOUBLE,
>   payment_no_postfee DOUBLE,
>   payment DOUBLE,
>   pay_time STRING,
>   product_num BIGINT,
>   order_status STRING,
>   is_refund STRING,
>   refund_fee DOUBLE,
>   insert_time STRING,
>   created STRING,
>   endtime STRING,
>   modified STRING,
>   trade_type STRING,
>   receiver_name STRING,
>   receiver_country STRING,
>   receiver_state STRING,
>   receiver_city STRING,
>   receiver_district STRING,
>   receiver_town STRING,
>   receiver_address STRING,
>   receiver_mobile STRING,
>   trade_source STRING,
>   delivery_type STRING,
>   consign_time STRING,
>   orders_num BIGINT,
>   is_presale BIGINT,
>   presale_status STRING,
>   first_fee_paytime STRING,
>   last_fee_paytime STRING,
>   first_paid_fee DOUBLE,
>   tenant STRING,
>   tidb_modified STRING,
>   step_paid_fee DOUBLE,
>   seller_flag STRING,
>   is_used_store_card BIGINT,
>   store_card_used DOUBLE,
>   store_card_basic_used DOUBLE,
>   store_card_expand_used DOUBLE,
>   order_promotion_num BIGINT,
>   item_promotion_num BIGINT,
>   buyer_remark STRING,
>   seller_remark STRING,
>   trade_business_type STRING)
> USING iceberg
> PARTITIONED BY (uni_shop_id, truncate(4, created))
> LOCATION '/iceberg-catalog/warehouse/dwd/b_std_trade'
> TBLPROPERTIES (
>   'current-snapshot-id' = '7217819472703702905',
>   'format' = 'iceberg/orc',
>   'format-version' = '1',
>   'hive.stored-as' = 'iceberg',
>   'read.orc.vectorization.enabled' = 'true',
>   'sort-order' = 'uni_shop_id ASC NULLS FIRST, created ASC NULLS FIRST',
>   'write.distribution-mode' = 'hash',
>   'write.format.default' = 'orc',
>   'write.metadata.delete-after-commit.enabled' = 'true',
>   'write.metadata.previous-versions-max' = '3',
>   'write.orc.bloom.filter.columns' = 'order_id',
>   'write.orc.compression-codec' = 'zstd')
> --hive-iceberg
> CREATE EXTERNAL TABLE iceberg_dwd.b_std_trade 
> STORED BY 'org.apache.iceberg.mr.hive.HiveIcebergStorageHandler' 
> LOCATION 'hdfs:///iceberg-catalog/warehouse/dwd/b_std_trade'
> TBLPROPERTIES 
> ('iceberg.catalog'='location_based_table','engine.hive.enabled'='true'); 
> --inner orc table( set hive default format = orc )
> set hive.default.fileformat=orc;
> set hive.default.fileformat.managed=orc;
> create table if not exists iceberg_dwd.orc_inner_table as select * from 
> iceberg_dwd.b_std_trade;{code}
>  
> !image-2023-11-22-18-32-28-344.png!
> !image-2023-11-22-18-33-01-885.png!
> Also, I have another question. The Submit Plan statistic is clearly 
> incorrect. Is this something that needs to be fixed?
> !image-2023-11-22-18-33-32-915.png!
>  




[jira] [Commented] (HIVE-27898) HIVE4 can't use ICEBERG table in subqueries

2023-11-22 Thread zhangbutao (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-27898?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17789008#comment-17789008
 ] 

zhangbutao commented on HIVE-27898:
---

Please provide a simpler test to help others reproduce this issue.

1) Can we create a simpler table with just a few columns? The table 
*_datacenter.dwd.b_std_trade_* has too many columns.

2) Can we insert a few rows to help reproduce this issue?
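For instance, something in this spirit (table name and rows are illustrative, 
not taken from the report):
{code:java}
-- hypothetical minimal repro: a two-column Iceberg table with two rows
CREATE TABLE ice_sub_test (id STRING, name STRING)
STORED BY 'org.apache.iceberg.mr.hive.HiveIcebergStorageHandler'
TBLPROPERTIES ('engine.hive.enabled'='true');

INSERT INTO ice_sub_test VALUES ('1', 'a'), ('2', 'b');

-- expected: 2 rows; the reported bug would return 0
SELECT id FROM (SELECT * FROM ice_sub_test LIMIT 10) t1;
{code}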

> HIVE4 can't use ICEBERG table in subqueries
> ---
>
> Key: HIVE-27898
> URL: https://issues.apache.org/jira/browse/HIVE-27898
> Project: Hive
>  Issue Type: Bug
>  Components: Iceberg integration
>Affects Versions: 4.0.0-beta-1
>Reporter: yongzhi.shao
>Priority: Critical
>
> We found that when using the HIVE4-BETA1 version, if we use an ICEBERG table 
> in a subquery, we can't get any data back.
> I used HIVE3-TEZ for cross-validation, and HIVE3 does not have this problem 
> when querying ICEBERG.
> {code:java}
> --spark3.4.1+iceberg 1.4.2
> CREATE TABLE datacenter.dwd.b_std_trade (
>   uni_order_id STRING,
>   data_from BIGINT,
>   partner STRING,
>   plat_code STRING,
>   order_id STRING,
>   uni_shop_id STRING,
>   uni_id STRING,
>   guide_id STRING,
>   shop_id STRING,
>   plat_account STRING,
>   total_fee DOUBLE,
>   item_discount_fee DOUBLE,
>   trade_discount_fee DOUBLE,
>   adjust_fee DOUBLE,
>   post_fee DOUBLE,
>   discount_rate DOUBLE,
>   payment_no_postfee DOUBLE,
>   payment DOUBLE,
>   pay_time STRING,
>   product_num BIGINT,
>   order_status STRING,
>   is_refund STRING,
>   refund_fee DOUBLE,
>   insert_time STRING,
>   created STRING,
>   endtime STRING,
>   modified STRING,
>   trade_type STRING,
>   receiver_name STRING,
>   receiver_country STRING,
>   receiver_state STRING,
>   receiver_city STRING,
>   receiver_district STRING,
>   receiver_town STRING,
>   receiver_address STRING,
>   receiver_mobile STRING,
>   trade_source STRING,
>   delivery_type STRING,
>   consign_time STRING,
>   orders_num BIGINT,
>   is_presale BIGINT,
>   presale_status STRING,
>   first_fee_paytime STRING,
>   last_fee_paytime STRING,
>   first_paid_fee DOUBLE,
>   tenant STRING,
>   tidb_modified STRING,
>   step_paid_fee DOUBLE,
>   seller_flag STRING,
>   is_used_store_card BIGINT,
>   store_card_used DOUBLE,
>   store_card_basic_used DOUBLE,
>   store_card_expand_used DOUBLE,
>   order_promotion_num BIGINT,
>   item_promotion_num BIGINT,
>   buyer_remark STRING,
>   seller_remark STRING,
>   trade_business_type STRING)
> USING iceberg
> PARTITIONED BY (uni_shop_id, truncate(4, created))
> LOCATION '/iceberg-catalog/warehouse/dwd/b_std_trade'
> TBLPROPERTIES (
>   'current-snapshot-id' = '7217819472703702905',
>   'format' = 'iceberg/orc',
>   'format-version' = '1',
>   'hive.stored-as' = 'iceberg',
>   'read.orc.vectorization.enabled' = 'true',
>   'sort-order' = 'uni_shop_id ASC NULLS FIRST, created ASC NULLS FIRST',
>   'write.distribution-mode' = 'hash',
>   'write.format.default' = 'orc',
>   'write.metadata.delete-after-commit.enabled' = 'true',
>   'write.metadata.previous-versions-max' = '3',
>   'write.orc.bloom.filter.columns' = 'order_id',
>   'write.orc.compression-codec' = 'zstd')
> --hive-iceberg
>  CREATE EXTERNAL TABLE iceberg_dwd.b_std_trade 
>  STORED BY 'org.apache.iceberg.mr.hive.HiveIcebergStorageHandler' 
> LOCATION 'hdfs:///iceberg-catalog/warehouse/dwd/b_std_trade'
> TBLPROPERTIES 
> ('iceberg.catalog'='location_based_table','engine.hive.enabled'='true');
> select * from iceberg_dwd.b_std_trade 
> where uni_shop_id = 'TEST|1' limit 10  --10 rows
> select *
> from ( 
> select * from iceberg_dwd.b_std_trade 
> where uni_shop_id = 'TEST|1' limit 10
> ) t1;   --10 rows
> select uni_shop_id
> from ( 
> select * from iceberg_dwd.b_std_trade 
> where uni_shop_id = 'TEST|1' limit 10
> ) t1;  --0 rows
> select uni_shop_id
> from ( 
> select uni_shop_id as uni_shop_id from iceberg_dwd.b_std_trade 
> where uni_shop_id = 'TEST|1' limit 10
> ) t1;  --0 rows
> --hive-orc
> select uni_shop_id
> from ( 
> select * from iceberg_dwd.trade_test 
> where uni_shop_id = 'TEST|1' limit 10
> ) t1;--10 ROWS{code}
>  





[jira] [Commented] (HIVE-27900) hive can not read iceberg-parquet table

2023-11-22 Thread zhangbutao (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-27900?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17789007#comment-17789007
 ] 

zhangbutao commented on HIVE-27900:
---

I cannot reproduce this issue on master code. My env is:

1) Hive master branch:

    You can compile the Hive code using this command:

{code:java}
mvn clean install -DskipTests -Piceberg -Pdist{code}
2) Tez 0.10.2

   I recommend you test with 0.10.2, as 0.10.3 is not released and we cannot 
be sure 0.10.3 works well with Hive.

3) Hadoop 3.3.1

BTW, if the table _*local.test.b_qqd_shop_rfm_parquet_snappy*_ is empty, with 
no data, does the issue still occur in your env?
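If it helps, an empty variant could be created like this (the table name is 
illustrative):
{code:java}
-- Spark side: same schema and Parquet format, but with no rows written,
-- then retry the failing Hive CTAS against it
CREATE TABLE local.test.b_qqd_shop_rfm_parquet_empty (
a string, b string, c string)
USING iceberg
TBLPROPERTIES ('write.format.default' = 'parquet');
{code}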

> hive can not read iceberg-parquet table
> ---
>
> Key: HIVE-27900
> URL: https://issues.apache.org/jira/browse/HIVE-27900
> Project: Hive
>  Issue Type: Bug
>  Components: Iceberg integration
>Affects Versions: 4.0.0-beta-1
>Reporter: yongzhi.shao
>Priority: Major
>
> We found that, when using the HIVE4-BETA version, we could not query the 
> Iceberg-Parquet table with vectorised execution turned on.
> {code:java}
> --spark-sql(3.4.1+iceberg 1.4.2)
> CREATE TABLE local.test.b_qqd_shop_rfm_parquet_snappy (
> a string,b string,c string)
> USING iceberg
> LOCATION '/iceberg-catalog/warehouse/test/b_qqd_shop_rfm_parquet_snappy'
> TBLPROPERTIES (
>   'current-snapshot-id' = '5138351937447353683',
>   'format' = 'iceberg/parquet',
>   'format-version' = '2',
>   'read.orc.vectorization.enabled' = 'true',
>   'write.format.default' = 'parquet',
>   'write.metadata.delete-after-commit.enabled' = 'true',
>   'write.metadata.previous-versions-max' = '3',
>   'write.parquet.compression-codec' = 'snappy');
> --hive-sql
> CREATE EXTERNAL TABLE iceberg_dwd.b_qqd_shop_rfm_parquet_snappy
> STORED BY 'org.apache.iceberg.mr.hive.HiveIcebergStorageHandler'
> LOCATION 
> 'hdfs://xxx/iceberg-catalog/warehouse/test/b_qqd_shop_rfm_parquet_snappy/'
> TBLPROPERTIES 
> ('iceberg.catalog'='location_based_table','engine.hive.enabled'='true');
> set hive.default.fileformat=orc;
> set hive.default.fileformat.managed=orc;
> create table test_parquet_as_orc as select * from 
> b_qqd_shop_rfm_parquet_snappy limit 100;
> , TaskAttempt 2 failed, info=[Error: Node: /xxx..xx.xx: Error while 
> running task ( failure ) : 
> attempt_1696729618575_69586_1_00_00_2:java.lang.RuntimeException: 
> java.lang.RuntimeException: org.apache.hadoop.hive.ql.metadata.HiveException: 
> Hive Runtime Error while processing row
> at 
> org.apache.hadoop.hive.ql.exec.tez.TezProcessor.initializeAndRunProcessor(TezProcessor.java:348)
> at org.apache.hadoop.hive.ql.exec.tez.TezProcessor.run(TezProcessor.java:276)
> at 
> org.apache.tez.runtime.LogicalIOProcessorRuntimeTask.run(LogicalIOProcessorRuntimeTask.java:381)
> at 
> org.apache.tez.runtime.task.TaskRunner2Callable$1.run(TaskRunner2Callable.java:82)
> at 
> org.apache.tez.runtime.task.TaskRunner2Callable$1.run(TaskRunner2Callable.java:69)
> at java.security.AccessController.doPrivileged(Native Method)
> at javax.security.auth.Subject.doAs(Subject.java:422)
> at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1899)
> at 
> org.apache.tez.runtime.task.TaskRunner2Callable.callInternal(TaskRunner2Callable.java:69)
> at 
> org.apache.tez.runtime.task.TaskRunner2Callable.callInternal(TaskRunner2Callable.java:39)
> at org.apache.tez.common.CallableWithNdc.call(CallableWithNdc.java:36)
> at 
> com.google.common.util.concurrent.TrustedListenableFutureTask$TrustedFutureInterruptibleTask.runInterruptibly(TrustedListenableFutureTask.java:131)
> at 
> com.google.common.util.concurrent.InterruptibleTask.run(InterruptibleTask.java:76)
> at 
> com.google.common.util.concurrent.TrustedListenableFutureTask.run(TrustedListenableFutureTask.java:82)
> at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
> at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
> at java.lang.Thread.run(Thread.java:750)
> Caused by: java.lang.RuntimeException: 
> org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime Error while 
> processing row
> at 
> org.apache.hadoop.hive.ql.exec.tez.MapRecordSource.processRow(MapRecordSource.java:110)
> at 
> org.apache.hadoop.hive.ql.exec.tez.MapRecordSource.pushRecord(MapRecordSource.java:83)
> at 
> org.apache.hadoop.hive.ql.exec.tez.MapRecordProcessor.run(MapRecordProcessor.java:414)
> at 
> org.apache.hadoop.hive.ql.exec.tez.TezProcessor.initializeAndRunProcessor(TezProcessor.java:293)
> ... 16 more
> Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime 
> Error while processing row
> at 
> org.apache.hadoop.hive.ql.exec.vector.VectorMapOperator.process(VectorMapOperator.java:993)
> at 
> org.apache.hadoop.hive.ql.exec.tez.MapRecordSourc

[jira] [Updated] (HIVE-27880) Iceberg: Support creating a branch on an empty table

2023-11-16 Thread zhangbutao (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-27880?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

zhangbutao updated HIVE-27880:
--
Summary: Iceberg: Support creating a branch on an empty table  (was: 
Iceberg: Supports creating a branch on an empty table)

> Iceberg: Support creating a branch on an empty table
> 
>
> Key: HIVE-27880
> URL: https://issues.apache.org/jira/browse/HIVE-27880
> Project: Hive
>  Issue Type: Sub-task
>  Components: Iceberg integration
>Reporter: zhangbutao
>Assignee: zhangbutao
>Priority: Major
>  Labels: pull-request-available
>
> After this Iceberg change [https://github.com/apache/iceberg/pull/8072], which 
> has been included in Iceberg 1.4, we can create a branch on an empty table. 
> Users can create an empty branch and then write data into the branch.





[jira] [Updated] (HIVE-27880) Iceberg: Supports creating a branch on an empty table

2023-11-16 Thread zhangbutao (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-27880?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

zhangbutao updated HIVE-27880:
--
Description: After this Iceberg change 
[https://github.com/apache/iceberg/pull/8072], which has been included in 
Iceberg 1.4, we can create a branch on an empty table. User can create an 
empty branch, and then write data into the branch.  (was: After this Iceberg 
change [https://github.com/apache/iceberg/pull/8072], which has been included 
in Iceberg 1.4, we can create a branch on an empty table. Use can create an 
empty branch, and then write data into the branch.)
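In Hive SQL, the intended flow would look roughly like this sketch (the table 
and branch names are illustrative, and the branch_<name> write form follows 
Hive's Iceberg branch syntax):
{code:java}
-- sketch: create a branch on an empty Iceberg table, then write into it
CREATE TABLE ice_branch_test (id INT) STORED BY ICEBERG;
ALTER TABLE ice_branch_test CREATE BRANCH test_branch;
INSERT INTO default.ice_branch_test.branch_test_branch VALUES (1);
{code}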

> Iceberg: Supports creating a branch on an empty table
> -
>
> Key: HIVE-27880
> URL: https://issues.apache.org/jira/browse/HIVE-27880
> Project: Hive
>  Issue Type: Sub-task
>  Components: Iceberg integration
>Reporter: zhangbutao
>Assignee: zhangbutao
>Priority: Major
>
> After this Iceberg change [https://github.com/apache/iceberg/pull/8072], which 
> has been included in Iceberg 1.4, we can create a branch on an empty table. 
> Users can create an empty branch and then write data into the branch.





[jira] [Updated] (HIVE-27880) Iceberg: Supports creating a branch on an empty table

2023-11-16 Thread zhangbutao (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-27880?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

zhangbutao updated HIVE-27880:
--
Description: After this Iceberg change 
[https://github.com/apache/iceberg/pull/8072], which has been included in 
Iceberg 1.4, we can create a branch on an empty table. Use can create an empty 
branch, and then write data into the branch.

> Iceberg: Supports creating a branch on an empty table
> -
>
> Key: HIVE-27880
> URL: https://issues.apache.org/jira/browse/HIVE-27880
> Project: Hive
>  Issue Type: Sub-task
>  Components: Iceberg integration
>Reporter: zhangbutao
>Assignee: zhangbutao
>Priority: Major
>
> After this Iceberg change [https://github.com/apache/iceberg/pull/8072], which 
> has been included in Iceberg 1.4, we can create a branch on an empty table. 
> Use can create an empty branch, and then write data into the branch.





[jira] [Assigned] (HIVE-27880) Iceberg: Supports creating a branch on an empty table

2023-11-16 Thread zhangbutao (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-27880?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

zhangbutao reassigned HIVE-27880:
-

Assignee: zhangbutao

> Iceberg: Supports creating a branch on an empty table
> -
>
> Key: HIVE-27880
> URL: https://issues.apache.org/jira/browse/HIVE-27880
> Project: Hive
>  Issue Type: Sub-task
>  Components: Iceberg integration
>Reporter: zhangbutao
>Assignee: zhangbutao
>Priority: Major
>






[jira] [Created] (HIVE-27880) Iceberg: Supports creating a branch on an empty table

2023-11-16 Thread zhangbutao (Jira)
zhangbutao created HIVE-27880:
-

 Summary: Iceberg: Supports creating a branch on an empty table
 Key: HIVE-27880
 URL: https://issues.apache.org/jira/browse/HIVE-27880
 Project: Hive
  Issue Type: Sub-task
  Components: Iceberg integration
Reporter: zhangbutao








[jira] [Updated] (HIVE-27869) Iceberg: Select HadoopTables will fail at HiveIcebergStorageHandler::canProvideColStats

2023-11-10 Thread zhangbutao (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-27869?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

zhangbutao updated HIVE-27869:
--
Description: 
Steps to reproduce (latest master code):

1) Create a path-based HadoopTable with Spark:
 
{code:java}
./spark-3.3.1-bin-hadoop3/bin/spark-sql \
--master local \
--deploy-mode client \
--conf spark.sql.extensions=org.apache.iceberg.spark.extensions.IcebergSparkSessionExtensions \
--conf spark.sql.catalog.spark_catalog=org.apache.iceberg.spark.SparkSessionCatalog \
--conf spark.sql.catalog.spark_catalog.type=hadoop \
--conf spark.sql.catalog.spark_catalog.warehouse=hdfs://localhost:8028/tmp/testiceberg;


create table ice_test_001(id int) using iceberg;
insert into ice_test_001(id) values(1),(2),(3);{code}
 
2) Create an Iceberg table based on the HadoopTable in Hive:
{code:java}
CREATE EXTERNAL TABLE ice_test_001 STORED BY 
'org.apache.iceberg.mr.hive.HiveIcebergStorageHandler' LOCATION 
'hdfs://localhost:8028/tmp/testiceberg/default/ice_test_001' TBLPROPERTIES 
('iceberg.catalog'='location_based_table'); {code}
3) Query the HadoopTable from Hive:

// force a tez task to scan the data (instead of a local fetch)
*set hive.fetch.task.conversion=none;*
{code:java}
jdbc:hive2://localhost:10004/default> select * from ice_test_001;
Error: Error while compiling statement: FAILED: IllegalArgumentException 
Pathname 
/tmp/testiceberg/default/ice_test_001/stats/hdfs:/localhost:8028/tmp/testiceberg/default/ice_test_0018020750642632422610
 from 
hdfs://localhost:8028/tmp/testiceberg/default/ice_test_001/stats/hdfs:/localhost:8028/tmp/testiceberg/default/ice_test_0018020750642632422610
 is not a valid DFS filename. (state=42000,code=4) {code}
Full stacktrace:
{code:java}
Caused by: java.lang.IllegalArgumentException: Pathname 
/tmp/testiceberg/default/ice_test_001/stats/hdfs:/localhost:8028/tmp/testiceberg/default/ice_test_0018020750642632422610
 from 
hdfs://localhost:8028/tmp/testiceberg/default/ice_test_001/stats/hdfs:/localhost:8028/tmp/testiceberg/default/ice_test_0018020750642632422610
 is not a valid DFS filename.
        at 
org.apache.hadoop.hdfs.DistributedFileSystem.getPathName(DistributedFileSystem.java:256)
 ~[hadoop-hdfs-client-3.3.1.jar:?]
        at 
org.apache.hadoop.hdfs.DistributedFileSystem$29.doCall(DistributedFileSystem.java:1752)
 ~[hadoop-hdfs-client-3.3.1.jar:?]
        at 
org.apache.hadoop.hdfs.DistributedFileSystem$29.doCall(DistributedFileSystem.java:1749)
 ~[hadoop-hdfs-client-3.3.1.jar:?]
        at 
org.apache.hadoop.fs.FileSystemLinkResolver.resolve(FileSystemLinkResolver.java:81)
 ~[hadoop-common-3.3.1.jar:?]
        at 
org.apache.hadoop.hdfs.DistributedFileSystem.getFileStatus(DistributedFileSystem.java:1764)
 ~[hadoop-hdfs-client-3.3.1.jar:?]
        at org.apache.hadoop.fs.FileSystem.exists(FileSystem.java:1760) 
~[hadoop-common-3.3.1.jar:?]
        at 
org.apache.iceberg.mr.hive.HiveIcebergStorageHandler.canProvideColStats(HiveIcebergStorageHandler.java:540)
 ~[hive-iceberg-handler-4.0.0-beta-2-SNAPSHOT.jar:4.0.0-beta-2-SNAPSHOT]
        at 
org.apache.iceberg.mr.hive.HiveIcebergStorageHandler.canProvideColStatistics(HiveIcebergStorageHandler.java:533)
 ~[hive-iceberg-handler-4.0.0-beta-2-SNAPSHOT.jar:4.0.0-beta-2-SNAPSHOT]
        at 
org.apache.hadoop.hive.ql.stats.StatsUtils.getTableColumnStats(StatsUtils.java:1073)
 ~[hive-exec-4.0.0-beta-2-SNAPSHOT.jar:4.0.0-beta-2-SNAPSHOT]
        at 
org.apache.hadoop.hive.ql.stats.StatsUtils.collectStatistics(StatsUtils.java:302)
 ~[hive-exec-4.0.0-beta-2-SNAPSHOT.jar:4.0.0-beta-2-SNAPSHOT]
        at 
org.apache.hadoop.hive.ql.stats.StatsUtils.collectStatistics(StatsUtils.java:193)
 ~[hive-exec-4.0.0-beta-2-SNAPSHOT.jar:4.0.0-beta-2-SNAPSHOT]
        at 
org.apache.hadoop.hive.ql.stats.StatsUtils.collectStatistics(StatsUtils.java:181)
 ~[hive-exec-4.0.0-beta-2-SNAPSHOT.jar:4.0.0-beta-2-SNAPSHOT]
        at 
org.apache.hadoop.hive.ql.optimizer.stats.annotation.StatsRulesProcFactory$TableScanStatsRule.process(StatsRulesProcFactory.java:173)
 ~[hive-exec-4.0.0-beta-2-SNAPSHOT.jar:4.0.0-beta-2-SNAPSHOT]
        at 
org.apache.hadoop.hive.ql.lib.DefaultRuleDispatcher.dispatch(DefaultRuleDispatcher.java:90)
 ~[hive-exec-4.0.0-beta-2-SNAPSHOT.jar:4.0.0-beta-2-SNAPSHOT]
        at 
org.apache.hadoop.hive.ql.lib.DefaultGraphWalker.dispatchAndReturn(DefaultGraphWalker.java:105)
 ~[hive-exec-4.0.0-beta-2-SNAPSHOT.jar:4.0.0-beta-2-SNAPSHOT]
        at 
org.apache.hadoop.hive.ql.lib.DefaultGraphWalker.dispatch(DefaultGraphWalker.java:89)
 ~[hive-exec-4.0.0-beta-2-SNAPSHOT.jar:4.0.0-beta-2-SNAPSHOT]
        at 
org.apache.hadoop.hive.ql.lib.LevelOrderWalker.walk(LevelOrderWalker.java:148) 
~[hive-exec-4.0.0-beta-2-SNAPSHOT.jar:4.0.0-beta-2-SNAPSHOT]
        at 
org.apache.hadoop.hive.ql.lib.LevelOrderWalker.startWalking(LevelOrderWalker.java:125)
 ~[hive-exec-4.0.0-beta-2-SNAPSHOT.jar:4.0.0-beta-2-SNAPSHOT]
        at 
org.apache.hadoop.hive.ql.optimizer.stats.annotation.AnnotateWithSt

[jira] [Updated] (HIVE-27869) Iceberg: Select HadoopTables will fail at HiveIcebergStorageHandler::canProvideColStats

2023-11-10 Thread zhangbutao (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-27869?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

zhangbutao updated HIVE-27869:
--
Description: 
Steps to reproduce:

1) Create a path-based HadoopTable with Spark:
 
{code:java}
./spark-3.3.1-bin-hadoop3/bin/spark-sql \
--master local \
--deploy-mode client \
--conf spark.sql.extensions=org.apache.iceberg.spark.extensions.IcebergSparkSessionExtensions \
--conf spark.sql.catalog.spark_catalog=org.apache.iceberg.spark.SparkSessionCatalog \
--conf spark.sql.catalog.spark_catalog.type=hadoop \
--conf spark.sql.catalog.spark_catalog.warehouse=hdfs://localhost:8028/tmp/testiceberg;


create table ice_test_001(id int) using iceberg;
insert into ice_test_001(id) values(1),(2),(3);{code}
 
2) Create an Iceberg table based on the HadoopTable in Hive:
{code:java}
CREATE EXTERNAL TABLE ice_test_001 STORED BY 
'org.apache.iceberg.mr.hive.HiveIcebergStorageHandler' LOCATION 
'hdfs://localhost:8028/tmp/testiceberg/default/ice_test_001' TBLPROPERTIES 
('iceberg.catalog'='location_based_table'); {code}
3) Query the HadoopTable from Hive:

// force a tez task to scan the data (instead of a local fetch)
*set hive.fetch.task.conversion=none;*
{code:java}
jdbc:hive2://localhost:10004/default> select * from ice_test_001;
Error: Error while compiling statement: FAILED: IllegalArgumentException 
Pathname 
/tmp/testiceberg/default/ice_test_001/stats/hdfs:/localhost:8028/tmp/testiceberg/default/ice_test_0018020750642632422610
 from 
hdfs://localhost:8028/tmp/testiceberg/default/ice_test_001/stats/hdfs:/localhost:8028/tmp/testiceberg/default/ice_test_0018020750642632422610
 is not a valid DFS filename. (state=42000,code=4) {code}
Full stacktrace:
{code:java}
Caused by: java.lang.IllegalArgumentException: Pathname 
/tmp/testiceberg/default/ice_test_001/stats/hdfs:/localhost:8028/tmp/testiceberg/default/ice_test_0018020750642632422610
 from 
hdfs://localhost:8028/tmp/testiceberg/default/ice_test_001/stats/hdfs:/localhost:8028/tmp/testiceberg/default/ice_test_0018020750642632422610
 is not a valid DFS filename.
        at 
org.apache.hadoop.hdfs.DistributedFileSystem.getPathName(DistributedFileSystem.java:256)
 ~[hadoop-hdfs-client-3.3.1.jar:?]
        at 
org.apache.hadoop.hdfs.DistributedFileSystem$29.doCall(DistributedFileSystem.java:1752)
 ~[hadoop-hdfs-client-3.3.1.jar:?]
        at 
org.apache.hadoop.hdfs.DistributedFileSystem$29.doCall(DistributedFileSystem.java:1749)
 ~[hadoop-hdfs-client-3.3.1.jar:?]
        at 
org.apache.hadoop.fs.FileSystemLinkResolver.resolve(FileSystemLinkResolver.java:81)
 ~[hadoop-common-3.3.1.jar:?]
        at 
org.apache.hadoop.hdfs.DistributedFileSystem.getFileStatus(DistributedFileSystem.java:1764)
 ~[hadoop-hdfs-client-3.3.1.jar:?]
        at org.apache.hadoop.fs.FileSystem.exists(FileSystem.java:1760) 
~[hadoop-common-3.3.1.jar:?]
        at 
org.apache.iceberg.mr.hive.HiveIcebergStorageHandler.canProvideColStats(HiveIcebergStorageHandler.java:540)
 ~[hive-iceberg-handler-4.0.0-beta-2-SNAPSHOT.jar:4.0.0-beta-2-SNAPSHOT]
        at 
org.apache.iceberg.mr.hive.HiveIcebergStorageHandler.canProvideColStatistics(HiveIcebergStorageHandler.java:533)
 ~[hive-iceberg-handler-4.0.0-beta-2-SNAPSHOT.jar:4.0.0-beta-2-SNAPSHOT]
        at 
org.apache.hadoop.hive.ql.stats.StatsUtils.getTableColumnStats(StatsUtils.java:1073)
 ~[hive-exec-4.0.0-beta-2-SNAPSHOT.jar:4.0.0-beta-2-SNAPSHOT]
        at 
org.apache.hadoop.hive.ql.stats.StatsUtils.collectStatistics(StatsUtils.java:302)
 ~[hive-exec-4.0.0-beta-2-SNAPSHOT.jar:4.0.0-beta-2-SNAPSHOT]
        at 
org.apache.hadoop.hive.ql.stats.StatsUtils.collectStatistics(StatsUtils.java:193)
 ~[hive-exec-4.0.0-beta-2-SNAPSHOT.jar:4.0.0-beta-2-SNAPSHOT]
        at 
org.apache.hadoop.hive.ql.stats.StatsUtils.collectStatistics(StatsUtils.java:181)
 ~[hive-exec-4.0.0-beta-2-SNAPSHOT.jar:4.0.0-beta-2-SNAPSHOT]
        at 
org.apache.hadoop.hive.ql.optimizer.stats.annotation.StatsRulesProcFactory$TableScanStatsRule.process(StatsRulesProcFactory.java:173)
 ~[hive-exec-4.0.0-beta-2-SNAPSHOT.jar:4.0.0-beta-2-SNAPSHOT]
        at 
org.apache.hadoop.hive.ql.lib.DefaultRuleDispatcher.dispatch(DefaultRuleDispatcher.java:90)
 ~[hive-exec-4.0.0-beta-2-SNAPSHOT.jar:4.0.0-beta-2-SNAPSHOT]
        at 
org.apache.hadoop.hive.ql.lib.DefaultGraphWalker.dispatchAndReturn(DefaultGraphWalker.java:105)
 ~[hive-exec-4.0.0-beta-2-SNAPSHOT.jar:4.0.0-beta-2-SNAPSHOT]
        at 
org.apache.hadoop.hive.ql.lib.DefaultGraphWalker.dispatch(DefaultGraphWalker.java:89)
 ~[hive-exec-4.0.0-beta-2-SNAPSHOT.jar:4.0.0-beta-2-SNAPSHOT]
        at 
org.apache.hadoop.hive.ql.lib.LevelOrderWalker.walk(LevelOrderWalker.java:148) 
~[hive-exec-4.0.0-beta-2-SNAPSHOT.jar:4.0.0-beta-2-SNAPSHOT]
        at 
org.apache.hadoop.hive.ql.lib.LevelOrderWalker.startWalking(LevelOrderWalker.java:125)
 ~[hive-exec-4.0.0-beta-2-SNAPSHOT.jar:4.0.0-beta-2-SNAPSHOT]
        at 
org.apache.hadoop.hive.ql.optimizer.stats.annotation.AnnotateWithStatistics.transform(A

[jira] [Updated] (HIVE-27869) Iceberg: Select HadoopTables will fail at HiveIcebergStorageHandler::canProvideColStats

2023-11-10 Thread zhangbutao (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-27869?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

zhangbutao updated HIVE-27869:
--
Description: 
Steps to reproduce:

1) Create a path-based HadoopTable with Spark:
 
{code:java}
./spark-3.3.1-bin-hadoop3/bin/spark-sql \
  --master local \
  --deploy-mode client \
  --conf spark.sql.extensions=org.apache.iceberg.spark.extensions.IcebergSparkSessionExtensions \
  --conf spark.sql.catalog.spark_catalog=org.apache.iceberg.spark.SparkSessionCatalog \
  --conf spark.sql.catalog.spark_catalog.type=hadoop \
  --conf spark.sql.catalog.spark_catalog.warehouse=hdfs://localhost:8028/tmp/testiceberg


create table ice_test_001(id int) using iceberg;
insert into ice_test_001(id) values(1),(2),(3);{code}
 
2) Create an Iceberg table on top of the HadoopTable in Hive:
{code:java}
CREATE EXTERNAL TABLE ice_test_001 STORED BY 
'org.apache.iceberg.mr.hive.HiveIcebergStorageHandler' LOCATION 
'hdfs://localhost:8028/tmp/testiceberg/default/ice_test_001' TBLPROPERTIES 
('iceberg.catalog'='location_based_table'); {code}
3) Select from the HadoopTable in Hive:

// launch a Tez task to scan the data instead of using a fetch-only plan
*set hive.fetch.task.conversion=none;*
{code:java}
jdbc:hive2://localhost:10004/default> select * from ice_test_001;
Error: Error while compiling statement: FAILED: IllegalArgumentException 
Pathname 
/tmp/testiceberg/default/ice_test_001/stats/hdfs:/localhost:8028/tmp/testiceberg/default/ice_test_0018020750642632422610
 from 
hdfs://localhost:8028/tmp/testiceberg/default/ice_test_001/stats/hdfs:/localhost:8028/tmp/testiceberg/default/ice_test_0018020750642632422610
 is not a valid DFS filename. (state=42000,code=4) {code}
Full stacktrace:
{code:java}
Caused by: java.lang.IllegalArgumentException: Pathname 
/tmp/testiceberg/default/ice_test_001/stats/hdfs:/localhost:8028/tmp/testiceberg/default/ice_test_0018020750642632422610
 from 
hdfs://localhost:8028/tmp/testiceberg/default/ice_test_001/stats/hdfs:/localhost:8028/tmp/testiceberg/default/ice_test_0018020750642632422610
 is not a valid DFS filename.
        at 
org.apache.hadoop.hdfs.DistributedFileSystem.getPathName(DistributedFileSystem.java:256)
 ~[hadoop-hdfs-client-3.3.1.jar:?]
        at 
org.apache.hadoop.hdfs.DistributedFileSystem$29.doCall(DistributedFileSystem.java:1752)
 ~[hadoop-hdfs-client-3.3.1.jar:?]
        at 
org.apache.hadoop.hdfs.DistributedFileSystem$29.doCall(DistributedFileSystem.java:1749)
 ~[hadoop-hdfs-client-3.3.1.jar:?]
        at 
org.apache.hadoop.fs.FileSystemLinkResolver.resolve(FileSystemLinkResolver.java:81)
 ~[hadoop-common-3.3.1.jar:?]
        at 
org.apache.hadoop.hdfs.DistributedFileSystem.getFileStatus(DistributedFileSystem.java:1764)
 ~[hadoop-hdfs-client-3.3.1.jar:?]
        at org.apache.hadoop.fs.FileSystem.exists(FileSystem.java:1760) 
~[hadoop-common-3.3.1.jar:?]
        at 
org.apache.iceberg.mr.hive.HiveIcebergStorageHandler.canProvideColStats(HiveIcebergStorageHandler.java:540)
 ~[hive-iceberg-handler-4.0.0-beta-2-SNAPSHOT.jar:4.0.0-beta-2-SNAPSHOT]
        at 
org.apache.iceberg.mr.hive.HiveIcebergStorageHandler.canProvideColStatistics(HiveIcebergStorageHandler.java:533)
 ~[hive-iceberg-handler-4.0.0-beta-2-SNAPSHOT.jar:4.0.0-beta-2-SNAPSHOT]
        at 
org.apache.hadoop.hive.ql.stats.StatsUtils.getTableColumnStats(StatsUtils.java:1073)
 ~[hive-exec-4.0.0-beta-2-SNAPSHOT.jar:4.0.0-beta-2-SNAPSHOT]
        at 
org.apache.hadoop.hive.ql.stats.StatsUtils.collectStatistics(StatsUtils.java:302)
 ~[hive-exec-4.0.0-beta-2-SNAPSHOT.jar:4.0.0-beta-2-SNAPSHOT]
        at 
org.apache.hadoop.hive.ql.stats.StatsUtils.collectStatistics(StatsUtils.java:193)
 ~[hive-exec-4.0.0-beta-2-SNAPSHOT.jar:4.0.0-beta-2-SNAPSHOT]
        at 
org.apache.hadoop.hive.ql.stats.StatsUtils.collectStatistics(StatsUtils.java:181)
 ~[hive-exec-4.0.0-beta-2-SNAPSHOT.jar:4.0.0-beta-2-SNAPSHOT]
        at 
org.apache.hadoop.hive.ql.optimizer.stats.annotation.StatsRulesProcFactory$TableScanStatsRule.process(StatsRulesProcFactory.java:173)
 ~[hive-exec-4.0.0-beta-2-SNAPSHOT.jar:4.0.0-beta-2-SNAPSHOT]
        at 
org.apache.hadoop.hive.ql.lib.DefaultRuleDispatcher.dispatch(DefaultRuleDispatcher.java:90)
 ~[hive-exec-4.0.0-beta-2-SNAPSHOT.jar:4.0.0-beta-2-SNAPSHOT]
        at 
org.apache.hadoop.hive.ql.lib.DefaultGraphWalker.dispatchAndReturn(DefaultGraphWalker.java:105)
 ~[hive-exec-4.0.0-beta-2-SNAPSHOT.jar:4.0.0-beta-2-SNAPSHOT]
        at 
org.apache.hadoop.hive.ql.lib.DefaultGraphWalker.dispatch(DefaultGraphWalker.java:89)
 ~[hive-exec-4.0.0-beta-2-SNAPSHOT.jar:4.0.0-beta-2-SNAPSHOT]
        at 
org.apache.hadoop.hive.ql.lib.LevelOrderWalker.walk(LevelOrderWalker.java:148) 
~[hive-exec-4.0.0-beta-2-SNAPSHOT.jar:4.0.0-beta-2-SNAPSHOT]
        at 
org.apache.hadoop.hive.ql.lib.LevelOrderWalker.startWalking(LevelOrderWalker.java:125)
 ~[hive-exec-4.0.0-beta-2-SNAPSHOT.jar:4.0.0-beta-2-SNAPSHOT]
        at 
org.apache.hadoop.hive.ql.optimizer.stats.annotation.AnnotateWithStatistics.transform(A

[jira] [Updated] (HIVE-27869) Iceberg: Select HadoopTables will fail at HiveIcebergStorageHandler::canProvideColStats

2023-11-10 Thread zhangbutao (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-27869?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

zhangbutao updated HIVE-27869:
--
Description: 
Steps to reproduce:

1) Create a path-based HadoopTable with Spark:
 
{code:java}
./spark-3.3.1-bin-hadoop3/bin/spark-sql \
  --master local \
  --deploy-mode client \
  --conf spark.sql.extensions=org.apache.iceberg.spark.extensions.IcebergSparkSessionExtensions \
  --conf spark.sql.catalog.spark_catalog=org.apache.iceberg.spark.SparkSessionCatalog \
  --conf spark.sql.catalog.spark_catalog.type=hadoop \
  --conf spark.sql.catalog.spark_catalog.warehouse=hdfs://localhost:8028/tmp/testiceberg


create table ice_test_001(id int) using iceberg;
insert into ice_test_001(id) values(1),(2),(3);{code}
 
2) Create an Iceberg table on top of the HadoopTable in Hive:
{code:java}
CREATE EXTERNAL TABLE ice_test_001 STORED BY 
'org.apache.iceberg.mr.hive.HiveIcebergStorageHandler' LOCATION 
'hdfs://localhost:8028/tmp/testiceberg/default/ice_test_001' TBLPROPERTIES 
('iceberg.catalog'='location_based_table'); {code}
3) Select from the HadoopTable in Hive:
*set hive.fetch.task.conversion=none;*
{code:java}
jdbc:hive2://localhost:10004/default> select * from ice_test_001;
Error: Error while compiling statement: FAILED: IllegalArgumentException 
Pathname 
/tmp/testiceberg/default/ice_test_001/stats/hdfs:/localhost:8028/tmp/testiceberg/default/ice_test_0018020750642632422610
 from 
hdfs://localhost:8028/tmp/testiceberg/default/ice_test_001/stats/hdfs:/localhost:8028/tmp/testiceberg/default/ice_test_0018020750642632422610
 is not a valid DFS filename. (state=42000,code=4) {code}
Full stacktrace:
{code:java}
Caused by: java.lang.IllegalArgumentException: Pathname 
/tmp/testiceberg/default/ice_test_001/stats/hdfs:/localhost:8028/tmp/testiceberg/default/ice_test_0018020750642632422610
 from 
hdfs://localhost:8028/tmp/testiceberg/default/ice_test_001/stats/hdfs:/localhost:8028/tmp/testiceberg/default/ice_test_0018020750642632422610
 is not a valid DFS filename.
        at 
org.apache.hadoop.hdfs.DistributedFileSystem.getPathName(DistributedFileSystem.java:256)
 ~[hadoop-hdfs-client-3.3.1.jar:?]
        at 
org.apache.hadoop.hdfs.DistributedFileSystem$29.doCall(DistributedFileSystem.java:1752)
 ~[hadoop-hdfs-client-3.3.1.jar:?]
        at 
org.apache.hadoop.hdfs.DistributedFileSystem$29.doCall(DistributedFileSystem.java:1749)
 ~[hadoop-hdfs-client-3.3.1.jar:?]
        at 
org.apache.hadoop.fs.FileSystemLinkResolver.resolve(FileSystemLinkResolver.java:81)
 ~[hadoop-common-3.3.1.jar:?]
        at 
org.apache.hadoop.hdfs.DistributedFileSystem.getFileStatus(DistributedFileSystem.java:1764)
 ~[hadoop-hdfs-client-3.3.1.jar:?]
        at org.apache.hadoop.fs.FileSystem.exists(FileSystem.java:1760) 
~[hadoop-common-3.3.1.jar:?]
        at 
org.apache.iceberg.mr.hive.HiveIcebergStorageHandler.canProvideColStats(HiveIcebergStorageHandler.java:540)
 ~[hive-iceberg-handler-4.0.0-beta-2-SNAPSHOT.jar:4.0.0-beta-2-SNAPSHOT]
        at 
org.apache.iceberg.mr.hive.HiveIcebergStorageHandler.canProvideColStatistics(HiveIcebergStorageHandler.java:533)
 ~[hive-iceberg-handler-4.0.0-beta-2-SNAPSHOT.jar:4.0.0-beta-2-SNAPSHOT]
        at 
org.apache.hadoop.hive.ql.stats.StatsUtils.getTableColumnStats(StatsUtils.java:1073)
 ~[hive-exec-4.0.0-beta-2-SNAPSHOT.jar:4.0.0-beta-2-SNAPSHOT]
        at 
org.apache.hadoop.hive.ql.stats.StatsUtils.collectStatistics(StatsUtils.java:302)
 ~[hive-exec-4.0.0-beta-2-SNAPSHOT.jar:4.0.0-beta-2-SNAPSHOT]
        at 
org.apache.hadoop.hive.ql.stats.StatsUtils.collectStatistics(StatsUtils.java:193)
 ~[hive-exec-4.0.0-beta-2-SNAPSHOT.jar:4.0.0-beta-2-SNAPSHOT]
        at 
org.apache.hadoop.hive.ql.stats.StatsUtils.collectStatistics(StatsUtils.java:181)
 ~[hive-exec-4.0.0-beta-2-SNAPSHOT.jar:4.0.0-beta-2-SNAPSHOT]
        at 
org.apache.hadoop.hive.ql.optimizer.stats.annotation.StatsRulesProcFactory$TableScanStatsRule.process(StatsRulesProcFactory.java:173)
 ~[hive-exec-4.0.0-beta-2-SNAPSHOT.jar:4.0.0-beta-2-SNAPSHOT]
        at 
org.apache.hadoop.hive.ql.lib.DefaultRuleDispatcher.dispatch(DefaultRuleDispatcher.java:90)
 ~[hive-exec-4.0.0-beta-2-SNAPSHOT.jar:4.0.0-beta-2-SNAPSHOT]
        at 
org.apache.hadoop.hive.ql.lib.DefaultGraphWalker.dispatchAndReturn(DefaultGraphWalker.java:105)
 ~[hive-exec-4.0.0-beta-2-SNAPSHOT.jar:4.0.0-beta-2-SNAPSHOT]
        at 
org.apache.hadoop.hive.ql.lib.DefaultGraphWalker.dispatch(DefaultGraphWalker.java:89)
 ~[hive-exec-4.0.0-beta-2-SNAPSHOT.jar:4.0.0-beta-2-SNAPSHOT]
        at 
org.apache.hadoop.hive.ql.lib.LevelOrderWalker.walk(LevelOrderWalker.java:148) 
~[hive-exec-4.0.0-beta-2-SNAPSHOT.jar:4.0.0-beta-2-SNAPSHOT]
        at 
org.apache.hadoop.hive.ql.lib.LevelOrderWalker.startWalking(LevelOrderWalker.java:125)
 ~[hive-exec-4.0.0-beta-2-SNAPSHOT.jar:4.0.0-beta-2-SNAPSHOT]
        at 
org.apache.hadoop.hive.ql.optimizer.stats.annotation.AnnotateWithStatistics.transform(AnnotateWithStatistics.java:84)
 ~[h

[jira] [Assigned] (HIVE-27869) Iceberg: Select HadoopTables will fail at HiveIcebergStorageHandler::canProvideColStats

2023-11-10 Thread zhangbutao (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-27869?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

zhangbutao reassigned HIVE-27869:
-

Assignee: zhangbutao

> Iceberg: Select  HadoopTables will fail at 
> HiveIcebergStorageHandler::canProvideColStats
> 
>
> Key: HIVE-27869
> URL: https://issues.apache.org/jira/browse/HIVE-27869
> Project: Hive
>  Issue Type: Improvement
>  Components: Iceberg integration
>Reporter: zhangbutao
>Assignee: zhangbutao
>Priority: Major
>
> Steps to reproduce:
> 1) Create a path-based HadoopTable with Spark:
>  
> {code:java}
> ./spark-3.3.1-bin-hadoop3/bin/spark-sql \
>   --master local \
>   --deploy-mode client \
>   --conf spark.sql.extensions=org.apache.iceberg.spark.extensions.IcebergSparkSessionExtensions \
>   --conf spark.sql.catalog.spark_catalog=org.apache.iceberg.spark.SparkSessionCatalog \
>   --conf spark.sql.catalog.spark_catalog.type=hadoop \
>   --conf spark.sql.catalog.spark_catalog.warehouse=hdfs://localhost:8028/tmp/testiceberg
> create table ice_test_001(id int) using iceberg;
> insert into ice_test_001(id) values(1),(2),(3);{code}
>  
> 2) Create an Iceberg table on top of the HadoopTable in Hive:
> {code:java}
> CREATE EXTERNAL TABLE ice_test_001 STORED BY 
> 'org.apache.iceberg.mr.hive.HiveIcebergStorageHandler' LOCATION 
> 'hdfs://localhost:8028/tmp/testiceberg/default/ice_test_001' TBLPROPERTIES 
> ('iceberg.catalog'='location_based_table'); {code}
> 3) Select from the HadoopTable in Hive:
> *set hive.fetch.task.conversion=none;*
> {code:java}
> jdbc:hive2://localhost:10004/default> select * from testicedb118.ice_test_001;
> Error: Error while compiling statement: FAILED: IllegalArgumentException 
> Pathname 
> /tmp/testiceberg/default/ice_test_001/stats/hdfs:/localhost:8028/tmp/testiceberg/default/ice_test_0018020750642632422610
>  from 
> hdfs://localhost:8028/tmp/testiceberg/default/ice_test_001/stats/hdfs:/localhost:8028/tmp/testiceberg/default/ice_test_0018020750642632422610
>  is not a valid DFS filename. (state=42000,code=4) {code}
> Full stacktrace:
> {code:java}
> Caused by: java.lang.IllegalArgumentException: Pathname 
> /tmp/testiceberg/default/ice_test_001/stats/hdfs:/localhost:8028/tmp/testiceberg/default/ice_test_0018020750642632422610
>  from 
> hdfs://localhost:8028/tmp/testiceberg/default/ice_test_001/stats/hdfs:/localhost:8028/tmp/testiceberg/default/ice_test_0018020750642632422610
>  is not a valid DFS filename.
>         at 
> org.apache.hadoop.hdfs.DistributedFileSystem.getPathName(DistributedFileSystem.java:256)
>  ~[hadoop-hdfs-client-3.3.1.jar:?]
>         at 
> org.apache.hadoop.hdfs.DistributedFileSystem$29.doCall(DistributedFileSystem.java:1752)
>  ~[hadoop-hdfs-client-3.3.1.jar:?]
>         at 
> org.apache.hadoop.hdfs.DistributedFileSystem$29.doCall(DistributedFileSystem.java:1749)
>  ~[hadoop-hdfs-client-3.3.1.jar:?]
>         at 
> org.apache.hadoop.fs.FileSystemLinkResolver.resolve(FileSystemLinkResolver.java:81)
>  ~[hadoop-common-3.3.1.jar:?]
>         at 
> org.apache.hadoop.hdfs.DistributedFileSystem.getFileStatus(DistributedFileSystem.java:1764)
>  ~[hadoop-hdfs-client-3.3.1.jar:?]
>         at org.apache.hadoop.fs.FileSystem.exists(FileSystem.java:1760) 
> ~[hadoop-common-3.3.1.jar:?]
>         at 
> org.apache.iceberg.mr.hive.HiveIcebergStorageHandler.canProvideColStats(HiveIcebergStorageHandler.java:540)
>  ~[hive-iceberg-handler-4.0.0-beta-2-SNAPSHOT.jar:4.0.0-beta-2-SNAPSHOT]
>         at 
> org.apache.iceberg.mr.hive.HiveIcebergStorageHandler.canProvideColStatistics(HiveIcebergStorageHandler.java:533)
>  ~[hive-iceberg-handler-4.0.0-beta-2-SNAPSHOT.jar:4.0.0-beta-2-SNAPSHOT]
>         at 
> org.apache.hadoop.hive.ql.stats.StatsUtils.getTableColumnStats(StatsUtils.java:1073)
>  ~[hive-exec-4.0.0-beta-2-SNAPSHOT.jar:4.0.0-beta-2-SNAPSHOT]
>         at 
> org.apache.hadoop.hive.ql.stats.StatsUtils.collectStatistics(StatsUtils.java:302)
>  ~[hive-exec-4.0.0-beta-2-SNAPSHOT.jar:4.0.0-beta-2-SNAPSHOT]
>         at 
> org.apache.hadoop.hive.ql.stats.StatsUtils.collectStatistics(StatsUtils.java:193)
>  ~[hive-exec-4.0.0-beta-2-SNAPSHOT.jar:4.0.0-beta-2-SNAPSHOT]
>         at 
> org.apache.hadoop.hive.ql.stats.StatsUtils.collectStatistics(StatsUtils.java:181)
>  ~[hive-exec-4.0.0-beta-2-SNAPSHOT.jar:4.0.0-beta-2-SNAPSHOT]
>         at 
> org.apache.hadoop.hive.ql.optimizer.stats.annotation.StatsRulesProcFactory$TableScanStatsRule.process(StatsRulesProcFactory.java:173)
>  ~[hive-exec-4.0.0-beta-2-SNAPSHOT.jar:4.0.0-beta-2-SNAPSHOT]
>         at 
> org.apache.hadoop.hive.ql.lib.DefaultRuleDispatcher.dispatch(DefaultRuleDispatcher.java:90)
>  ~[hive-exec-4.0.0-beta-2-SNAPSHOT.jar:4.0.0-beta-2-SNAPSHOT]
>         at 
> org.apache.hadoop.hive.ql.lib.DefaultGraphWalker.dispatchAndReturn(DefaultGraphWalke

[jira] [Updated] (HIVE-27869) Iceberg: Select HadoopTables will fail at HiveIcebergStorageHandler::canProvideColStats

2023-11-10 Thread zhangbutao (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-27869?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

zhangbutao updated HIVE-27869:
--
Description: 
Steps to reproduce:

1) Create a path-based HadoopTable with Spark:
 
{code:java}
./spark-3.3.1-bin-hadoop3/bin/spark-sql \
  --master local \
  --deploy-mode client \
  --conf spark.sql.extensions=org.apache.iceberg.spark.extensions.IcebergSparkSessionExtensions \
  --conf spark.sql.catalog.spark_catalog=org.apache.iceberg.spark.SparkSessionCatalog \
  --conf spark.sql.catalog.spark_catalog.type=hadoop \
  --conf spark.sql.catalog.spark_catalog.warehouse=hdfs://localhost:8028/tmp/testiceberg


create table ice_test_001(id int) using iceberg;
insert into ice_test_001(id) values(1),(2),(3);{code}
 
2) Create an Iceberg table on top of the HadoopTable in Hive:
{code:java}
CREATE EXTERNAL TABLE ice_test_001 STORED BY 
'org.apache.iceberg.mr.hive.HiveIcebergStorageHandler' LOCATION 
'hdfs://localhost:8028/tmp/testiceberg/default/ice_test_001' TBLPROPERTIES 
('iceberg.catalog'='location_based_table'); {code}
3) Select from the HadoopTable in Hive:
*set hive.fetch.task.conversion=none;*
{code:java}
jdbc:hive2://localhost:10004/default> select * from testicedb118.ice_test_001;
Error: Error while compiling statement: FAILED: IllegalArgumentException 
Pathname 
/tmp/testiceberg/default/ice_test_001/stats/hdfs:/localhost:8028/tmp/testiceberg/default/ice_test_0018020750642632422610
 from 
hdfs://localhost:8028/tmp/testiceberg/default/ice_test_001/stats/hdfs:/localhost:8028/tmp/testiceberg/default/ice_test_0018020750642632422610
 is not a valid DFS filename. (state=42000,code=4) {code}
Full stacktrace:
{code:java}
Caused by: java.lang.IllegalArgumentException: Pathname 
/tmp/testiceberg/default/ice_test_001/stats/hdfs:/localhost:8028/tmp/testiceberg/default/ice_test_0018020750642632422610
 from 
hdfs://localhost:8028/tmp/testiceberg/default/ice_test_001/stats/hdfs:/localhost:8028/tmp/testiceberg/default/ice_test_0018020750642632422610
 is not a valid DFS filename.
        at 
org.apache.hadoop.hdfs.DistributedFileSystem.getPathName(DistributedFileSystem.java:256)
 ~[hadoop-hdfs-client-3.3.1.jar:?]
        at 
org.apache.hadoop.hdfs.DistributedFileSystem$29.doCall(DistributedFileSystem.java:1752)
 ~[hadoop-hdfs-client-3.3.1.jar:?]
        at 
org.apache.hadoop.hdfs.DistributedFileSystem$29.doCall(DistributedFileSystem.java:1749)
 ~[hadoop-hdfs-client-3.3.1.jar:?]
        at 
org.apache.hadoop.fs.FileSystemLinkResolver.resolve(FileSystemLinkResolver.java:81)
 ~[hadoop-common-3.3.1.jar:?]
        at 
org.apache.hadoop.hdfs.DistributedFileSystem.getFileStatus(DistributedFileSystem.java:1764)
 ~[hadoop-hdfs-client-3.3.1.jar:?]
        at org.apache.hadoop.fs.FileSystem.exists(FileSystem.java:1760) 
~[hadoop-common-3.3.1.jar:?]
        at 
org.apache.iceberg.mr.hive.HiveIcebergStorageHandler.canProvideColStats(HiveIcebergStorageHandler.java:540)
 ~[hive-iceberg-handler-4.0.0-beta-2-SNAPSHOT.jar:4.0.0-beta-2-SNAPSHOT]
        at 
org.apache.iceberg.mr.hive.HiveIcebergStorageHandler.canProvideColStatistics(HiveIcebergStorageHandler.java:533)
 ~[hive-iceberg-handler-4.0.0-beta-2-SNAPSHOT.jar:4.0.0-beta-2-SNAPSHOT]
        at 
org.apache.hadoop.hive.ql.stats.StatsUtils.getTableColumnStats(StatsUtils.java:1073)
 ~[hive-exec-4.0.0-beta-2-SNAPSHOT.jar:4.0.0-beta-2-SNAPSHOT]
        at 
org.apache.hadoop.hive.ql.stats.StatsUtils.collectStatistics(StatsUtils.java:302)
 ~[hive-exec-4.0.0-beta-2-SNAPSHOT.jar:4.0.0-beta-2-SNAPSHOT]
        at 
org.apache.hadoop.hive.ql.stats.StatsUtils.collectStatistics(StatsUtils.java:193)
 ~[hive-exec-4.0.0-beta-2-SNAPSHOT.jar:4.0.0-beta-2-SNAPSHOT]
        at 
org.apache.hadoop.hive.ql.stats.StatsUtils.collectStatistics(StatsUtils.java:181)
 ~[hive-exec-4.0.0-beta-2-SNAPSHOT.jar:4.0.0-beta-2-SNAPSHOT]
        at 
org.apache.hadoop.hive.ql.optimizer.stats.annotation.StatsRulesProcFactory$TableScanStatsRule.process(StatsRulesProcFactory.java:173)
 ~[hive-exec-4.0.0-beta-2-SNAPSHOT.jar:4.0.0-beta-2-SNAPSHOT]
        at 
org.apache.hadoop.hive.ql.lib.DefaultRuleDispatcher.dispatch(DefaultRuleDispatcher.java:90)
 ~[hive-exec-4.0.0-beta-2-SNAPSHOT.jar:4.0.0-beta-2-SNAPSHOT]
        at 
org.apache.hadoop.hive.ql.lib.DefaultGraphWalker.dispatchAndReturn(DefaultGraphWalker.java:105)
 ~[hive-exec-4.0.0-beta-2-SNAPSHOT.jar:4.0.0-beta-2-SNAPSHOT]
        at 
org.apache.hadoop.hive.ql.lib.DefaultGraphWalker.dispatch(DefaultGraphWalker.java:89)
 ~[hive-exec-4.0.0-beta-2-SNAPSHOT.jar:4.0.0-beta-2-SNAPSHOT]
        at 
org.apache.hadoop.hive.ql.lib.LevelOrderWalker.walk(LevelOrderWalker.java:148) 
~[hive-exec-4.0.0-beta-2-SNAPSHOT.jar:4.0.0-beta-2-SNAPSHOT]
        at 
org.apache.hadoop.hive.ql.lib.LevelOrderWalker.startWalking(LevelOrderWalker.java:125)
 ~[hive-exec-4.0.0-beta-2-SNAPSHOT.jar:4.0.0-beta-2-SNAPSHOT]
        at 
org.apache.hadoop.hive.ql.optimizer.stats.annotation.AnnotateWithStatistics.transform(AnnotateWithStatistics.j

[jira] [Created] (HIVE-27869) Iceberg: Select HadoopTables will fail at HiveIcebergStorageHandler::canProvideColStats

2023-11-10 Thread zhangbutao (Jira)
zhangbutao created HIVE-27869:
-

 Summary: Iceberg: Select  HadoopTables will fail at 
HiveIcebergStorageHandler::canProvideColStats
 Key: HIVE-27869
 URL: https://issues.apache.org/jira/browse/HIVE-27869
 Project: Hive
  Issue Type: Improvement
  Components: Iceberg integration
Reporter: zhangbutao


Steps to reproduce:

1) Create a path-based HadoopTable with Spark:
 
{code:java}
./spark-3.3.1-bin-hadoop3/bin/spark-sql \
  --master local \
  --deploy-mode client \
  --conf spark.sql.extensions=org.apache.iceberg.spark.extensions.IcebergSparkSessionExtensions \
  --conf spark.sql.catalog.spark_catalog=org.apache.iceberg.spark.SparkSessionCatalog \
  --conf spark.sql.catalog.spark_catalog.type=hadoop \
  --conf spark.sql.catalog.spark_catalog.warehouse=hdfs://localhost:8028/tmp/testiceberg


create table ice_test_001(id int) using iceberg;
insert into ice_test_001(id) values(1),(2),(3);{code}
 
2) Create an Iceberg table on top of the HadoopTable in Hive:
{code:java}
CREATE EXTERNAL TABLE ice_test_001 STORED BY 
'org.apache.iceberg.mr.hive.HiveIcebergStorageHandler' LOCATION 
'hdfs://localhost:8028/tmp/testiceberg/default/ice_test_001' TBLPROPERTIES 
('iceberg.catalog'='location_based_table'); {code}
3) Select from the HadoopTable in Hive:
*set hive.fetch.task.conversion=none;*
{code:java}
jdbc:hive2://localhost:10004/default> select * from testicedb118.ice_test_001;
Error: Error while compiling statement: FAILED: IllegalArgumentException 
Pathname 
/tmp/testiceberg/default/ice_test_001/stats/hdfs:/localhost:8028/tmp/testiceberg/default/ice_test_0018020750642632422610
 from 
hdfs://localhost:8028/tmp/testiceberg/default/ice_test_001/stats/hdfs:/localhost:8028/tmp/testiceberg/default/ice_test_0018020750642632422610
 is not a valid DFS filename. (state=42000,code=4) {code}
Full stacktrace:
{code:java}
Caused by: java.lang.IllegalArgumentException: Pathname 
/tmp/testiceberg/default/ice_test_001/stats/hdfs:/localhost:8028/tmp/testiceberg/default/ice_test_0018020750642632422610
 from 
hdfs://localhost:8028/tmp/testiceberg/default/ice_test_001/stats/hdfs:/localhost:8028/tmp/testiceberg/default/ice_test_0018020750642632422610
 is not a valid DFS filename.
        at 
org.apache.hadoop.hdfs.DistributedFileSystem.getPathName(DistributedFileSystem.java:256)
 ~[hadoop-hdfs-client-3.3.1.jar:?]
        at 
org.apache.hadoop.hdfs.DistributedFileSystem$29.doCall(DistributedFileSystem.java:1752)
 ~[hadoop-hdfs-client-3.3.1.jar:?]
        at 
org.apache.hadoop.hdfs.DistributedFileSystem$29.doCall(DistributedFileSystem.java:1749)
 ~[hadoop-hdfs-client-3.3.1.jar:?]
        at 
org.apache.hadoop.fs.FileSystemLinkResolver.resolve(FileSystemLinkResolver.java:81)
 ~[hadoop-common-3.3.1.jar:?]
        at 
org.apache.hadoop.hdfs.DistributedFileSystem.getFileStatus(DistributedFileSystem.java:1764)
 ~[hadoop-hdfs-client-3.3.1.jar:?]
        at org.apache.hadoop.fs.FileSystem.exists(FileSystem.java:1760) 
~[hadoop-common-3.3.1.jar:?]
        at 
org.apache.iceberg.mr.hive.HiveIcebergStorageHandler.canProvideColStats(HiveIcebergStorageHandler.java:540)
 ~[hive-iceberg-handler-4.0.0-beta-2-SNAPSHOT.jar:4.0.0-beta-2-SNAPSHOT]
        at 
org.apache.iceberg.mr.hive.HiveIcebergStorageHandler.canProvideColStatistics(HiveIcebergStorageHandler.java:533)
 ~[hive-iceberg-handler-4.0.0-beta-2-SNAPSHOT.jar:4.0.0-beta-2-SNAPSHOT]
        at 
org.apache.hadoop.hive.ql.stats.StatsUtils.getTableColumnStats(StatsUtils.java:1073)
 ~[hive-exec-4.0.0-beta-2-SNAPSHOT.jar:4.0.0-beta-2-SNAPSHOT]
        at 
org.apache.hadoop.hive.ql.stats.StatsUtils.collectStatistics(StatsUtils.java:302)
 ~[hive-exec-4.0.0-beta-2-SNAPSHOT.jar:4.0.0-beta-2-SNAPSHOT]
        at 
org.apache.hadoop.hive.ql.stats.StatsUtils.collectStatistics(StatsUtils.java:193)
 ~[hive-exec-4.0.0-beta-2-SNAPSHOT.jar:4.0.0-beta-2-SNAPSHOT]
        at 
org.apache.hadoop.hive.ql.stats.StatsUtils.collectStatistics(StatsUtils.java:181)
 ~[hive-exec-4.0.0-beta-2-SNAPSHOT.jar:4.0.0-beta-2-SNAPSHOT]
        at 
org.apache.hadoop.hive.ql.optimizer.stats.annotation.StatsRulesProcFactory$TableScanStatsRule.process(StatsRulesProcFactory.java:173)
 ~[hive-exec-4.0.0-beta-2-SNAPSHOT.jar:4.0.0-beta-2-SNAPSHOT]
        at 
org.apache.hadoop.hive.ql.lib.DefaultRuleDispatcher.dispatch(DefaultRuleDispatcher.java:90)
 ~[hive-exec-4.0.0-beta-2-SNAPSHOT.jar:4.0.0-beta-2-SNAPSHOT]
        at 
org.apache.hadoop.hive.ql.lib.DefaultGraphWalker.dispatchAndReturn(DefaultGraphWalker.java:105)
 ~[hive-exec-4.0.0-beta-2-SNAPSHOT.jar:4.0.0-beta-2-SNAPSHOT]
        at 
org.apache.hadoop.hive.ql.lib.DefaultGraphWalker.dispatch(DefaultGraphWalker.java:89)
 ~[hive-exec-4.0.0-beta-2-SNAPSHOT.jar:4.0.0-beta-2-SNAPSHOT]
        at 
org.apache.hadoop.hive.ql.lib.LevelOrderWalker.walk(LevelOrderWalker.java:148) 
~[hive-exec-4.0.0-beta-2-SNAPSHOT.jar:4.0.0-beta-2-SNAPSHOT]
        at 
org.apache.hadoop.hive.ql.lib.LevelOrderWalker.startWalking(

[jira] [Updated] (HIVE-27819) Iceberg: Upgrade iceberg version to 1.4.2

2023-11-02 Thread zhangbutao (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-27819?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

zhangbutao updated HIVE-27819:
--
Summary: Iceberg: Upgrade iceberg version to 1.4.2  (was: Iceberg: Upgrade 
iceberg version to 1.4.1)

> Iceberg: Upgrade iceberg version to 1.4.2
> -
>
> Key: HIVE-27819
> URL: https://issues.apache.org/jira/browse/HIVE-27819
> Project: Hive
>  Issue Type: Improvement
>  Components: Iceberg integration
>Reporter: zhangbutao
>Assignee: zhangbutao
>Priority: Major
>  Labels: pull-request-available
>
> The latest Iceberg version, 1.4.2, has been released. We need to upgrade the 
> Iceberg dependency from 1.3.9 to 1.4.2. Meanwhile, we should port some Hive 
> catalog changes from the Iceberg repo to the Hive repo.
> [https://iceberg.apache.org/releases/#142-release]
>  



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (HIVE-27819) Iceberg: Upgrade iceberg version to 1.4.2

2023-11-02 Thread zhangbutao (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-27819?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

zhangbutao updated HIVE-27819:
--
Description: 
The latest Iceberg version, 1.4.2, has been released. We need to upgrade the 
Iceberg dependency from 1.3.0 to 1.4.2. Meanwhile, we should port some Hive 
catalog changes from the Iceberg repo to the Hive repo.

[https://iceberg.apache.org/releases/#142-release]
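The bump itself would typically be a one-line pom change; a sketch assuming the version is managed through an iceberg.version property in the root pom (the property name is an assumption):
{code:java}
<!-- hypothetical sketch: bump the managed Iceberg version in the root pom -->
<properties>
  <iceberg.version>1.4.2</iceberg.version>
</properties>
{code}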

 

  was:
Iceberg latest version 1.4.2 has been released out. we need upgrade iceberg 
depdency from 1.3.9 to 1.4.2. Meantime, we should port some Hive catalog 
changes from Iceberg repo to Hive repo.

[https://iceberg.apache.org/releases/#142-release]

 


> Iceberg: Upgrade iceberg version to 1.4.2
> -
>
> Key: HIVE-27819
> URL: https://issues.apache.org/jira/browse/HIVE-27819
> Project: Hive
>  Issue Type: Improvement
>  Components: Iceberg integration
>Reporter: zhangbutao
>Assignee: zhangbutao
>Priority: Major
>  Labels: pull-request-available
>
> The latest Iceberg version, 1.4.2, has been released. We need to upgrade the 
> Iceberg dependency from 1.3.0 to 1.4.2. Meanwhile, we should port some Hive 
> catalog changes from the Iceberg repo to the Hive repo.
> [https://iceberg.apache.org/releases/#142-release]
>  



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (HIVE-27819) Iceberg: Upgrade iceberg version to 1.4.1

2023-11-02 Thread zhangbutao (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-27819?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

zhangbutao updated HIVE-27819:
--
Description: 
The latest Iceberg version, 1.4.2, has been released. We need to upgrade the 
Iceberg dependency from 1.3.9 to 1.4.2. Meanwhile, we should port some Hive 
catalog changes from the Iceberg repo to the Hive repo.

[https://iceberg.apache.org/releases/#142-release]

 

  was:
Iceberg latest version 1.4.2 has been released out. we need upgrade iceberg 
depdency from 1.3.9 to 1.4.2. Meantime, we should port some Hive catalog 
changes from Iceberg repo to Hive repo.

[https://iceberg.apache.org/releases/#142-release|https://iceberg.apache.org/releases/#141-release]

 


> Iceberg: Upgrade iceberg version to 1.4.1
> -
>
> Key: HIVE-27819
> URL: https://issues.apache.org/jira/browse/HIVE-27819
> Project: Hive
>  Issue Type: Improvement
>  Components: Iceberg integration
>Reporter: zhangbutao
>Assignee: zhangbutao
>Priority: Major
>  Labels: pull-request-available
>
> The latest Iceberg version, 1.4.2, has been released. We need to upgrade the 
> Iceberg dependency from 1.3.9 to 1.4.2. Meanwhile, we should port some Hive 
> catalog changes from the Iceberg repo to the Hive repo.
> [https://iceberg.apache.org/releases/#142-release]
>  



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (HIVE-27819) Iceberg: Upgrade iceberg version to 1.4.1

2023-11-02 Thread zhangbutao (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-27819?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

zhangbutao updated HIVE-27819:
--
Description: 
The latest Iceberg version, 1.4.2, has been released. We need to upgrade the 
Iceberg dependency from 1.3.9 to 1.4.2. Meanwhile, we should port some Hive 
catalog changes from the Iceberg repo to the Hive repo.

[https://iceberg.apache.org/releases/#142-release|https://iceberg.apache.org/releases/#141-release]

 

  was:
Iceberg latest version 1.4.1 has been released out. we need upgrade iceberg 
depdency from 1.3.1 to 1.4.1. Meantime, we should port some Hive catalog 
changes from Iceberg repo to Hive repo.

[https://iceberg.apache.org/releases/#141-release]

 


> Iceberg: Upgrade iceberg version to 1.4.1
> -
>
> Key: HIVE-27819
> URL: https://issues.apache.org/jira/browse/HIVE-27819
> Project: Hive
>  Issue Type: Improvement
>  Components: Iceberg integration
>Reporter: zhangbutao
>Assignee: zhangbutao
>Priority: Major
>  Labels: pull-request-available
>
> The latest Iceberg version, 1.4.2, has been released. We need to upgrade the 
> Iceberg dependency from 1.3.9 to 1.4.2. Meanwhile, we should port some Hive 
> catalog changes from the Iceberg repo to the Hive repo.
> [https://iceberg.apache.org/releases/#142-release|https://iceberg.apache.org/releases/#141-release]
>  



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (HIVE-26192) JDBC data connector queries occur exception at cbo stage

2023-10-31 Thread zhangbutao (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-26192?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17781230#comment-17781230
 ] 

zhangbutao commented on HIVE-26192:
---

[~ngangam] Thanks for letting me know about this issue.

If I understand correctly, we should change the code as follows for JDBC 
connectors where schema and database have different meanings, e.g. Postgres and 
Oracle. getCatalogName() can remain null, because for PG the database name must 
be specified in the JDBC URL, e.g. {*}jdbc:postgresql://localhost:5432/testpgdb{*}; 
the value from getCatalogName() is therefore no longer needed and has no effect 
on the PG connection.

Users can then select a specific schema by supplying the schema name through the 
property "connector.remoteDbName". I have tested this change locally, and it 
works as expected.
{code:java}
diff --git a/standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/dataconnector/jdbc/PostgreSQLConnectorProvider.java b/standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/dataconnector/jdbc/PostgreSQLConnectorProvider.java
index b79bee452d..79a505e6a9 100644
--- a/standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/dataconnector/jdbc/PostgreSQLConnectorProvider.java
+++ b/standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/dataconnector/jdbc/PostgreSQLConnectorProvider.java
@@ -36,11 +36,11 @@ public PostgreSQLConnectorProvider(String dbName, DataConnector dataConn) {
   }
 
   @Override protected String getCatalogName() {
-    return scoped_db;
+    return null;
   }
 
   @Override protected String getDatabaseName() {
-    return null;
+    return scoped_db;
   }
 {code}
Do I understand your question correctly? If we agree on this approach, I can 
submit a PR to fix it. Thanks.
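For reference, a sketch of the setup this change targets, following Hive 4's data connector DDL; the connector and database names below are made up:
{code:java}
-- Hypothetical names; the DDL shape follows Hive 4's data connector syntax
CREATE CONNECTOR pg_conn
TYPE 'postgres'
URL 'jdbc:postgresql://localhost:5432/testpgdb'
WITH DCPROPERTIES ('hive.sql.dbcp.username'='hive', 'hive.sql.dbcp.password'='hive');

-- With the patch, "connector.remoteDbName" selects the remote *schema*, not a catalog
CREATE REMOTE DATABASE pg_public USING pg_conn
WITH DBPROPERTIES ('connector.remoteDbName'='public');
{code}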

> JDBC data connector queries  occur exception at cbo stage
> -
>
> Key: HIVE-26192
> URL: https://issues.apache.org/jira/browse/HIVE-26192
> Project: Hive
>  Issue Type: Bug
>Affects Versions: 4.0.0-alpha-2
>Reporter: zhangbutao
>Assignee: zhangbutao
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0-alpha-2
>
>  Time Spent: 1.5h
>  Remaining Estimate: 0h
>
> If you run a select query in qtest with a JDBC data connector, you will see an 
> exception at the CBO stage:
> {code:java}
> [ERROR] Failures:
> [ERROR]   TestMiniLlapCliDriver.testCliDriver:62 Client execution failed with 
> error code = 4
> running
> select * from country
> fname=dataconnector_mysql.q
> See ./ql/target/tmp/log/hive.log or 
> ./itests/qtest/target/tmp/log/hive.log, or check ./ql/target/surefire-reports 
> or ./itests/qtest/target/surefire-reports/ for specific test cases logs.
>  org.apache.hadoop.hive.ql.parse.SemanticException: Table qtestDB.country was 
> not found in the database
>         at 
> org.apache.hadoop.hive.ql.parse.CalcitePlanner$CalcitePlannerAction.genTableLogicalPlan(CalcitePlanner.java:3078)
>         at 
> org.apache.hadoop.hive.ql.parse.CalcitePlanner$CalcitePlannerAction.genLogicalPlan(CalcitePlanner.java:5048)
>         at 
> org.apache.hadoop.hive.ql.parse.CalcitePlanner$CalcitePlannerAction.apply(CalcitePlanner.java:1665)
>         at 
> org.apache.hadoop.hive.ql.parse.CalcitePlanner$CalcitePlannerAction.apply(CalcitePlanner.java:1605)
>         at 
> org.apache.calcite.tools.Frameworks.lambda$withPlanner$0(Frameworks.java:131)
>         at 
> org.apache.calcite.prepare.CalcitePrepareImpl.perform(CalcitePrepareImpl.java:914)
>         at 
> org.apache.calcite.tools.Frameworks.withPrepare(Frameworks.java:180)
>         at 
> org.apache.calcite.tools.Frameworks.withPlanner(Frameworks.java:126)
>         at 
> org.apache.hadoop.hive.ql.parse.CalcitePlanner.logicalPlan(CalcitePlanner.java:1357)
>         at 
> org.apache.hadoop.hive.ql.parse.CalcitePlanner.genOPTree(CalcitePlanner.java:567)
>         at 
> org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.analyzeInternal(SemanticAnalyzer.java:12587)
>         at 
> org.apache.hadoop.hive.ql.parse.CalcitePlanner.analyzeInternal(CalcitePlanner.java:460)
>         at 
> org.apache.hadoop.hive.ql.parse.BaseSemanticAnalyzer.analyze(BaseSemanticAnalyzer.java:317)
>         at org.apache.hadoop.hive.ql.Compiler.analyze(Compiler.java:224)
>         at org.apache.hadoop.hive.ql.Compiler.compile(Compiler.java:106)
>         at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:500)
>         at org.apache.hadoop.hive.ql.Driver.compileInternal(Driver.java:452)
>         at org.apache.hadoop.hive.ql.Driver.compileAndRespond(Driver.java:416)
>         at org.apache.hadoop.hive.ql.Driver.compileAndRespond(Driver.java:410)
>         at 
> org.apache.hadoop.hive.ql.reexec.ReExecDriver.compileAndRespo

[jira] [Commented] (HIVE-9260) Implement the bloom filter for the ParquetSerde

2023-10-25 Thread zhangbutao (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-9260?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17779740#comment-17779740
 ] 

zhangbutao commented on HIVE-9260:
--

[~Ferd] Will you continue to finish this ticket? I think a bloom filter would be 
very useful for accelerating Parquet table queries.

> Implement the bloom filter for the ParquetSerde
> ---
>
> Key: HIVE-9260
> URL: https://issues.apache.org/jira/browse/HIVE-9260
> Project: Hive
>  Issue Type: Sub-task
>  Components: Serializers/Deserializers
>Reporter: Ferdinand Xu
>Assignee: Ferdinand Xu
>Priority: Major
> Attachments: HIVE-9260.patch
>
>
> Implement the bloom filter for Parquet



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (HIVE-27826) Upgrade to Parquet 1.13.1

2023-10-25 Thread zhangbutao (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-27826?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

zhangbutao updated HIVE-27826:
--
Attachment: mvn_dependency_tree.text

> Upgrade to Parquet 1.13.1
> -
>
> Key: HIVE-27826
> URL: https://issues.apache.org/jira/browse/HIVE-27826
> Project: Hive
>  Issue Type: Improvement
>  Components: Parquet
>Reporter: zhangbutao
>Assignee: zhangbutao
>Priority: Major
>  Labels: pull-request-available
> Attachments: mvn_dependency_tree.text
>
>
> Upgrade Parquet to 1.13.1. Apache Iceberg also uses this latest Parquet 
> version.
> [https://github.com/apache/iceberg/pull/7301]
>  



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Assigned] (HIVE-27826) Upgrade to Parquet 1.13.1

2023-10-25 Thread zhangbutao (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-27826?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

zhangbutao reassigned HIVE-27826:
-

Assignee: zhangbutao

> Upgrade to Parquet 1.13.1
> -
>
> Key: HIVE-27826
> URL: https://issues.apache.org/jira/browse/HIVE-27826
> Project: Hive
>  Issue Type: Improvement
>  Components: Parquet
>Reporter: zhangbutao
>Assignee: zhangbutao
>Priority: Major
>
> Upgrade Parquet to 1.13.1. Apache Iceberg also uses this latest Parquet 
> version.
> [https://github.com/apache/iceberg/pull/7301]
>  



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (HIVE-27826) Upgrade to Parquet 1.13.1

2023-10-25 Thread zhangbutao (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-27826?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

zhangbutao updated HIVE-27826:
--
Description: 
Upgrade Parquet to 1.13.1. Apache Iceberg also uses this latest Parquet version.

[https://github.com/apache/iceberg/pull/7301]

 

  was:Upgrade parquet to 1.13.1.  Apache Iceberg also use this parquet version.


> Upgrade to Parquet 1.13.1
> -
>
> Key: HIVE-27826
> URL: https://issues.apache.org/jira/browse/HIVE-27826
> Project: Hive
>  Issue Type: Improvement
>  Components: Parquet
>Reporter: zhangbutao
>Priority: Major
>
> Upgrade Parquet to 1.13.1. Apache Iceberg also uses this latest Parquet 
> version.
> [https://github.com/apache/iceberg/pull/7301]
>  



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (HIVE-27826) Upgrade to Parquet 1.13.1

2023-10-25 Thread zhangbutao (Jira)
zhangbutao created HIVE-27826:
-

 Summary: Upgrade to Parquet 1.13.1
 Key: HIVE-27826
 URL: https://issues.apache.org/jira/browse/HIVE-27826
 Project: Hive
  Issue Type: Improvement
  Components: Parquet
Reporter: zhangbutao


Upgrade Parquet to 1.13.1. Apache Iceberg also uses this Parquet version.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Resolved] (HIVE-27776) Iceberg: Upgrade iceberg version to 1.4.0

2023-10-24 Thread zhangbutao (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-27776?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

zhangbutao resolved HIVE-27776.
---
Resolution: Duplicate

> Iceberg: Upgrade iceberg version to 1.4.0
> -
>
> Key: HIVE-27776
> URL: https://issues.apache.org/jira/browse/HIVE-27776
> Project: Hive
>  Issue Type: Improvement
>  Components: Iceberg integration
>Reporter: zhangbutao
>Priority: Major
>
> [https://mvnrepository.com/artifact/org.apache.iceberg/iceberg-core/1.4.0]
> Iceberg 1.4.0 has been released, and we need to upgrade the Iceberg dependency 
> from 1.3.1 to 1.4.0. Meanwhile, we should port some Hive catalog changes from 
> the Iceberg repo to the Hive repo.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (HIVE-27776) Iceberg: Upgrade iceberg version to 1.4.0

2023-10-24 Thread zhangbutao (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-27776?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17779010#comment-17779010
 ] 

zhangbutao commented on HIVE-27776:
---

Superseded by https://issues.apache.org/jira/browse/HIVE-27819

> Iceberg: Upgrade iceberg version to 1.4.0
> -
>
> Key: HIVE-27776
> URL: https://issues.apache.org/jira/browse/HIVE-27776
> Project: Hive
>  Issue Type: Improvement
>  Components: Iceberg integration
>Reporter: zhangbutao
>Priority: Major
>
> [https://mvnrepository.com/artifact/org.apache.iceberg/iceberg-core/1.4.0]
> Iceberg 1.4.0 has been released, and we need to upgrade the Iceberg dependency 
> from 1.3.1 to 1.4.0. Meanwhile, we should port some Hive catalog changes from 
> the Iceberg repo to the Hive repo.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Assigned] (HIVE-27819) Iceberg: Upgrade iceberg version to 1.4.1

2023-10-24 Thread zhangbutao (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-27819?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

zhangbutao reassigned HIVE-27819:
-

Assignee: zhangbutao

> Iceberg: Upgrade iceberg version to 1.4.1
> -
>
> Key: HIVE-27819
> URL: https://issues.apache.org/jira/browse/HIVE-27819
> Project: Hive
>  Issue Type: Improvement
>  Components: Iceberg integration
>Reporter: zhangbutao
>Assignee: zhangbutao
>Priority: Major
>
> The latest Iceberg version, 1.4.1, has been released. We need to upgrade the 
> Iceberg dependency from 1.3.1 to 1.4.1. Meanwhile, we should port some Hive 
> catalog changes from the Iceberg repo to the Hive repo.
> [https://iceberg.apache.org/releases/#141-release]
>  



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (HIVE-27819) Iceberg: Upgrade iceberg version to 1.4.1

2023-10-24 Thread zhangbutao (Jira)
zhangbutao created HIVE-27819:
-

 Summary: Iceberg: Upgrade iceberg version to 1.4.1
 Key: HIVE-27819
 URL: https://issues.apache.org/jira/browse/HIVE-27819
 Project: Hive
  Issue Type: Improvement
  Components: Iceberg integration
Reporter: zhangbutao


The latest Iceberg version, 1.4.1, has been released. We need to upgrade the 
Iceberg dependency from 1.3.1 to 1.4.1. Meanwhile, we should port some Hive 
catalog changes from the Iceberg repo to the Hive repo.

[https://iceberg.apache.org/releases/#141-release]

 



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Comment Edited] (HIVE-27814) Support VIEWs in the metadata federation

2023-10-19 Thread zhangbutao (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-27814?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=1561#comment-1561
 ] 

zhangbutao edited comment on HIVE-27814 at 10/20/23 4:43 AM:
-

[~ngangam] All the data connectors we have implemented are JDBC-based, and I 
think there is no problem with adding a JDBC VIEW as an HMS remote table. I 
can't think of a connector that should be excluded. Let's implement it first.


was (Author: zhangbutao):
[~ngangam] All the data connectors we have implemented are JDBC type, and I 
think it is no problem to add the jdbc VIEW as hms remote table.  I can't think 
of connctor which should exclude. Let's implement it first.

> Support VIEWs in the metadata federation
> 
>
> Key: HIVE-27814
> URL: https://issues.apache.org/jira/browse/HIVE-27814
> Project: Hive
>  Issue Type: Sub-task
>  Components: Standalone Metastore
>Affects Versions: 4.0.0-beta-1
>Reporter: Naveen Gangam
>Assignee: Naveen Gangam
>Priority: Major
>
> Currently, we only federate the TABLE type objects from the remote 
> datasource. We should be able to pull in VIEW type objects as well.
> It appears we can currently create a JDBC-storage handler based table in Hive 
> that points to a view in the remote DB server. I do not see a reason to not 
> include this in the list of federated objects we pull in.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (HIVE-27814) Support VIEWs in the metadata federation

2023-10-19 Thread zhangbutao (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-27814?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=1561#comment-1561
 ] 

zhangbutao commented on HIVE-27814:
---

[~ngangam] All the data connectors we have implemented are JDBC-based, and I 
think there is no problem with adding a JDBC VIEW as an HMS remote table. I 
can't think of a connector that should be excluded. Let's implement it first.

> Support VIEWs in the metadata federation
> 
>
> Key: HIVE-27814
> URL: https://issues.apache.org/jira/browse/HIVE-27814
> Project: Hive
>  Issue Type: Sub-task
>  Components: Standalone Metastore
>Affects Versions: 4.0.0-beta-1
>Reporter: Naveen Gangam
>Assignee: Naveen Gangam
>Priority: Major
>
> Currently, we only federate the TABLE type objects from the remote 
> datasource. We should be able to pull in VIEW type objects as well.
> It appears we can currently create a JDBC-storage handler based table in Hive 
> that points to a view in the remote DB server. I do not see a reason to not 
> include this in the list of federated objects we pull in.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Assigned] (HIVE-27793) Iceberg: Support setting current snapshot with SnapshotRef

2023-10-12 Thread zhangbutao (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-27793?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

zhangbutao reassigned HIVE-27793:
-

Assignee: zhangbutao

> Iceberg: Support setting current snapshot with SnapshotRef
> --
>
> Key: HIVE-27793
> URL: https://issues.apache.org/jira/browse/HIVE-27793
> Project: Hive
>  Issue Type: Improvement
>  Components: Iceberg integration
>Reporter: zhangbutao
>Assignee: zhangbutao
>Priority: Major
>
> [https://iceberg.apache.org/docs/latest/spark-procedures/#set_current_snapshot]
> Spark supports setting the current snapshot using a snapshotId or a snapshotRef. 
> We can refer to this to implement setting the current snapshot with a 
> SnapshotRef (branch or tag).



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (HIVE-27793) Iceberg: Support setting current snapshot with SnapshotRef

2023-10-12 Thread zhangbutao (Jira)
zhangbutao created HIVE-27793:
-

 Summary: Iceberg: Support setting current snapshot with SnapshotRef
 Key: HIVE-27793
 URL: https://issues.apache.org/jira/browse/HIVE-27793
 Project: Hive
  Issue Type: Improvement
  Components: Iceberg integration
Reporter: zhangbutao


[https://iceberg.apache.org/docs/latest/spark-procedures/#set_current_snapshot]

Spark supports setting the current snapshot using a snapshotId or a snapshotRef. 
We can refer to this to implement setting the current snapshot with a 
SnapshotRef (branch or tag).
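A sketch of what the Hive side could look like, extrapolated from the existing ALTER TABLE ... EXECUTE grammar; the ref-based form is hypothetical until implemented:
{code:java}
-- Existing form: set the current snapshot by id
ALTER TABLE ice_tbl EXECUTE SET_CURRENT_SNAPSHOT(7217819472703702905);

-- Proposed form: also accept a branch or tag name (hypothetical grammar)
ALTER TABLE ice_tbl EXECUTE SET_CURRENT_SNAPSHOT('branch1');
{code}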



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (HIVE-27597) Implement JDBC Connector for HiveServer

2023-10-09 Thread zhangbutao (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-27597?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17773426#comment-17773426
 ] 

zhangbutao commented on HIVE-27597:
---

[~ngangam] I probably still don't have edit permission, and I can't find the 
edit button. Could you please check it again? Thanks.

[https://cwiki.apache.org/confluence/display/Hive/Data+Connectors+in+Hive]

 

> Implement JDBC Connector for HiveServer 
> 
>
> Key: HIVE-27597
> URL: https://issues.apache.org/jira/browse/HIVE-27597
> Project: Hive
>  Issue Type: Sub-task
>  Components: Hive
>Reporter: Naveen Gangam
>Assignee: Naveen Gangam
>Priority: Major
>  Labels: pull-request-available
>
> The initial idea of having a thrift based connector, that would enable Hive 
> Metastore to use thrift APIs to interact with another metastore from another 
> cluster, has some limitations. Features like column masking support become a 
> challenge as we may bypass the authz controls on the remote cluster.
> Instead, if we could federate a query from one instance of HS2 to another 
> instance of HS2 over JDBC, we would address the above concerns. This will 
> at least give us the ability to access tables across cluster boundaries.
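As a rough illustration, this kind of federation can already be approximated with the JDBC storage handler; everything below (table, URL, credentials) is made up for illustration, and using database type HIVE for an HS2-to-HS2 link is an assumption about the eventual design:
{code:java}
-- Hypothetical example: expose a table from a remote HS2 via the JDBC storage handler
CREATE EXTERNAL TABLE remote_orders (
  order_id string,
  payment double)
STORED BY 'org.apache.hive.storage.jdbc.JdbcStorageHandler'
TBLPROPERTIES (
  "hive.sql.database.type" = "HIVE",
  "hive.sql.jdbc.driver" = "org.apache.hive.jdbc.HiveDriver",
  "hive.sql.jdbc.url" = "jdbc:hive2://remote-hs2:10000/default",
  "hive.sql.dbcp.username" = "hive",
  "hive.sql.dbcp.password" = "hive",
  "hive.sql.table" = "orders");
{code}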



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Assigned] (HIVE-27780) Implement direct SQL for get_all_functions

2023-10-09 Thread zhangbutao (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-27780?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

zhangbutao reassigned HIVE-27780:
-

Assignee: zhangbutao

> Implement direct SQL for get_all_functions
> --
>
> Key: HIVE-27780
> URL: https://issues.apache.org/jira/browse/HIVE-27780
> Project: Hive
>  Issue Type: Improvement
>  Components: Standalone Metastore
>Reporter: zhangbutao
>Assignee: zhangbutao
>Priority: Major
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (HIVE-27780) Implement direct SQL for get_all_functions

2023-10-09 Thread zhangbutao (Jira)
zhangbutao created HIVE-27780:
-

 Summary: Implement direct SQL for get_all_functions
 Key: HIVE-27780
 URL: https://issues.apache.org/jira/browse/HIVE-27780
 Project: Hive
  Issue Type: Improvement
  Components: Standalone Metastore
Reporter: zhangbutao






--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (HIVE-27776) Iceberg: Upgrade iceberg version to 1.4.0

2023-10-06 Thread zhangbutao (Jira)
zhangbutao created HIVE-27776:
-

 Summary: Iceberg: Upgrade iceberg version to 1.4.0
 Key: HIVE-27776
 URL: https://issues.apache.org/jira/browse/HIVE-27776
 Project: Hive
  Issue Type: Improvement
  Components: Iceberg integration
Reporter: zhangbutao


[https://mvnrepository.com/artifact/org.apache.iceberg/iceberg-core/1.4.0]

Iceberg1.4.0 has been released out, and we need upgrade iceberg depdency from 
1.3.1 to 1.4.0. Meantime, we should port some Hive catalog changes from Iceberg 
repo to Hive repo.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (HIVE-27729) Iceberg: Check Iceberg type in AlterTableExecuteAnalyzer

2023-09-25 Thread zhangbutao (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-27729?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

zhangbutao updated HIVE-27729:
--
Description: 
If we execute ROLLBACK or other commands (expire snapshot, fast_forward, etc.) on a 
*non-Iceberg table,* it will throw an NPE. We need to check the table type in 
_AlterTableExecuteAnalyzer_ and throw a better exception.
{code:java}
//create a non-Iceberg table
create table non_ice (id int);{code}
{code:java}
// execute rollback
ALTER TABLE non_ice EXECUTE ROLLBACK('2022-09-26 00:00:00');{code}
 
{code:java}
ERROR : Failed
java.lang.NullPointerException: null
        at 
org.apache.hadoop.hive.ql.metadata.Hive.alterTableExecuteOperation(Hive.java:6772)
 ~[hive-exec-4.0.0-beta-2-SNAPSHOT.jar:4.0.0-beta-2-SNAPSHOT]
        at 
org.apache.hadoop.hive.ql.ddl.table.execute.AlterTableExecuteOperation.execute(AlterTableExecuteOperation.java:37)
 ~[hive-exec-4.0.0-beta-2-SNAPSHOT.jar:4.0.0-beta-2-SNAPSHOT]
        at org.apache.hadoop.hive.ql.ddl.DDLTask.execute(DDLTask.java:84) 
~[hive-exec-4.0.0-beta-2-SNAPSHOT.jar:4.0.0-beta-2-SNAPSHOT]
        at org.apache.hadoop.hive.ql.exec.Task.executeTask(Task.java:214) 
~[hive-exec-4.0.0-beta-2-SNAPSHOT.jar:4.0.0-beta-2-SNAPSHOT]
        at 
org.apache.hadoop.hive.ql.exec.TaskRunner.runSequential(TaskRunner.java:105) 
~[hive-exec-4.0.0-beta-2-SNAPSHOT.jar:4.0.0-beta-2-SNAPSHOT]
        at org.apache.hadoop.hive.ql.Executor.launchTask(Executor.java:354) 
~[hive-exec-4.0.0-beta-2-SNAPSHOT.jar:4.0.0-beta-2-SNAPSHOT]
        at org.apache.hadoop.hive.ql.Executor.launchTasks(Executor.java:327) 
~[hive-exec-4.0.0-beta-2-SNAPSHOT.jar:4.0.0-beta-2-SNAPSHOT]
        at org.apache.hadoop.hive.ql.Executor.runTasks(Executor.java:244) 
~[hive-exec-4.0.0-beta-2-SNAPSHOT.jar:4.0.0-beta-2-SNAPSHOT]
        at org.apache.hadoop.hive.ql.Executor.execute(Executor.java:105) 
~[hive-exec-4.0.0-beta-2-SNAPSHOT.jar:4.0.0-beta-2-SNAPSHOT]
        at org.apache.hadoop.hive.ql.Driver.execute(Driver.java:367) 
~[hive-exec-4.0.0-beta-2-SNAPSHOT.jar:4.0.0-beta-2-SNAPSHOT]
        at org.apache.hadoop.hive.ql.Driver.runInternal(Driver.java:205) 
~[hive-exec-4.0.0-beta-2-SNAPSHOT.jar:4.0.0-beta-2-SNAPSHOT]
        at org.apache.hadoop.hive.ql.Driver.run(Driver.java:154) 
~[hive-exec-4.0.0-beta-2-SNAPSHOT.jar:4.0.0-beta-2-SNAPSHOT]
        at org.apache.hadoop.hive.ql.Driver.run(Driver.java:149) 
~[hive-exec-4.0.0-beta-2-SNAPSHOT.jar:4.0.0-beta-2-SNAPSHOT]
        at 
org.apache.hadoop.hive.ql.reexec.ReExecDriver.run(ReExecDriver.java:185) 
~[hive-exec-4.0.0-beta-2-SNAPSHOT.jar:4.0.0-beta-2-SNAPSHOT]
        at 
org.apache.hive.service.cli.operation.SQLOperation.runQuery(SQLOperation.java:236)
 ~[hive-service-4.0.0-beta-2-SNAPSHOT.jar:4.0.0-beta-2-SNAPSHOT]
        at 
org.apache.hive.service.cli.operation.SQLOperation.access$500(SQLOperation.java:90)
 ~[hive-service-4.0.0-beta-2-SNAPSHOT.jar:4.0.0-beta-2-SNAPSHOT]
        at 
org.apache.hive.service.cli.operation.SQLOperation$BackgroundWork$1.run(SQLOperation.java:336)
 ~[hive-service-4.0.0-beta-2-SNAPSHOT.jar:4.0.0-beta-2-SNAPSHOT]
        at java.security.AccessController.doPrivileged(Native Method) 
~[?:1.8.0_291]
        at javax.security.auth.Subject.doAs(Subject.java:422) ~[?:1.8.0_291]
        at 
org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1878)
 ~[hadoop-common-3.3.1.jar:?]
        at 
org.apache.hive.service.cli.operation.SQLOperation$BackgroundWork.run(SQLOperation.java:356)
 ~[hive-service-4.0.0-beta-2-SNAPSHOT.jar:4.0.0-beta-2-SNAPSHOT]
        at 
java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) 
~[?:1.8.0_291]
        at java.util.concurrent.FutureTask.run(FutureTask.java:266) 
~[?:1.8.0_291]
        at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) 
~[?:1.8.0_291]
        at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) 
~[?:1.8.0_291]
        at java.lang.Thread.run(Thread.java:748) ~[?:1.8.0_291]
ERROR : DDLTask failed, DDL Operation: class 
org.apache.hadoop.hive.ql.ddl.table.execute.AlterTableExecuteOperation
 {code}
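A minimal sketch of the kind of guard the analyzer could apply; the parameter key "storage_handler" and the message text are assumptions, not the final patch:
{code:java}
import org.apache.hadoop.hive.ql.metadata.Table;
import org.apache.hadoop.hive.ql.parse.SemanticException;

// Hedged sketch: reject non-Iceberg tables up front in AlterTableExecuteAnalyzer
// instead of NPE-ing later in Hive.alterTableExecuteOperation.
final class IcebergTableCheck {
  static void requireIcebergTable(Table table) throws SemanticException {
    String handler = table.getParameters() == null
        ? null : table.getParameters().get("storage_handler");  // assumed key
    if (handler == null || !handler.contains("HiveIcebergStorageHandler")) {
      throw new SemanticException(
          "ALTER TABLE ... EXECUTE is only supported on Iceberg tables: "
              + table.getTableName());
    }
  }
}
{code}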
 

 

  was:
If we execute ROLLBACK and other cmd(expire snashot & fast_forward. etc) on a 
*non-iceberg table,* it will throw NPE. We need to check iceberg type in 
_AlterTableExecuteAnalyzer_ to throw a better exception.

 
{code:java}
ERROR : Failed
java.lang.NullPointerException: null
        at 
org.apache.hadoop.hive.ql.metadata.Hive.alterTableExecuteOperation(Hive.java:6772)
 ~[hive-exec-4.0.0-beta-2-SNAPSHOT.jar:4.0.0-beta-2-SNAPSHOT]
        at 
org.apache.hadoop.hive.ql.ddl.table.execute.AlterTableExecuteOperation.execute(AlterTableExecuteOperation.java:37)
 ~[hive-exec-4.0.0-beta-2-SNAPSHOT.jar:4.0.0-beta-2-SNAPSHOT]
        at org.apache.hadoop.hive.ql.ddl.DDLTask.execute(DDLTask.java:84) 
~[hive-exec-4.0.0-beta-2-SNAPSHOT.jar:4.0.0-beta-2-SNAPSHOT]
        at org.apache.hadoop.hive.q

[jira] [Assigned] (HIVE-27729) Iceberg: Check Iceberg type in AlterTableExecuteAnalyzer

2023-09-25 Thread zhangbutao (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-27729?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

zhangbutao reassigned HIVE-27729:
-

Assignee: zhangbutao

> Iceberg: Check Iceberg type in AlterTableExecuteAnalyzer
> 
>
> Key: HIVE-27729
> URL: https://issues.apache.org/jira/browse/HIVE-27729
> Project: Hive
>  Issue Type: Improvement
>  Components: Iceberg integration
>Reporter: zhangbutao
>Assignee: zhangbutao
>Priority: Major
>
> If we execute ROLLBACK or other commands (expire snapshot, fast_forward, etc.) on a 
> *non-Iceberg table,* it will throw an NPE. We need to check the table type in 
> _AlterTableExecuteAnalyzer_ and throw a better exception.
>  
> {code:java}
> ERROR : Failed
> java.lang.NullPointerException: null
>         at 
> org.apache.hadoop.hive.ql.metadata.Hive.alterTableExecuteOperation(Hive.java:6772)
>  ~[hive-exec-4.0.0-beta-2-SNAPSHOT.jar:4.0.0-beta-2-SNAPSHOT]
>         at 
> org.apache.hadoop.hive.ql.ddl.table.execute.AlterTableExecuteOperation.execute(AlterTableExecuteOperation.java:37)
>  ~[hive-exec-4.0.0-beta-2-SNAPSHOT.jar:4.0.0-beta-2-SNAPSHOT]
>         at org.apache.hadoop.hive.ql.ddl.DDLTask.execute(DDLTask.java:84) 
> ~[hive-exec-4.0.0-beta-2-SNAPSHOT.jar:4.0.0-beta-2-SNAPSHOT]
>         at org.apache.hadoop.hive.ql.exec.Task.executeTask(Task.java:214) 
> ~[hive-exec-4.0.0-beta-2-SNAPSHOT.jar:4.0.0-beta-2-SNAPSHOT]
>         at 
> org.apache.hadoop.hive.ql.exec.TaskRunner.runSequential(TaskRunner.java:105) 
> ~[hive-exec-4.0.0-beta-2-SNAPSHOT.jar:4.0.0-beta-2-SNAPSHOT]
>         at org.apache.hadoop.hive.ql.Executor.launchTask(Executor.java:354) 
> ~[hive-exec-4.0.0-beta-2-SNAPSHOT.jar:4.0.0-beta-2-SNAPSHOT]
>         at org.apache.hadoop.hive.ql.Executor.launchTasks(Executor.java:327) 
> ~[hive-exec-4.0.0-beta-2-SNAPSHOT.jar:4.0.0-beta-2-SNAPSHOT]
>         at org.apache.hadoop.hive.ql.Executor.runTasks(Executor.java:244) 
> ~[hive-exec-4.0.0-beta-2-SNAPSHOT.jar:4.0.0-beta-2-SNAPSHOT]
>         at org.apache.hadoop.hive.ql.Executor.execute(Executor.java:105) 
> ~[hive-exec-4.0.0-beta-2-SNAPSHOT.jar:4.0.0-beta-2-SNAPSHOT]
>         at org.apache.hadoop.hive.ql.Driver.execute(Driver.java:367) 
> ~[hive-exec-4.0.0-beta-2-SNAPSHOT.jar:4.0.0-beta-2-SNAPSHOT]
>         at org.apache.hadoop.hive.ql.Driver.runInternal(Driver.java:205) 
> ~[hive-exec-4.0.0-beta-2-SNAPSHOT.jar:4.0.0-beta-2-SNAPSHOT]
>         at org.apache.hadoop.hive.ql.Driver.run(Driver.java:154) 
> ~[hive-exec-4.0.0-beta-2-SNAPSHOT.jar:4.0.0-beta-2-SNAPSHOT]
>         at org.apache.hadoop.hive.ql.Driver.run(Driver.java:149) 
> ~[hive-exec-4.0.0-beta-2-SNAPSHOT.jar:4.0.0-beta-2-SNAPSHOT]
>         at 
> org.apache.hadoop.hive.ql.reexec.ReExecDriver.run(ReExecDriver.java:185) 
> ~[hive-exec-4.0.0-beta-2-SNAPSHOT.jar:4.0.0-beta-2-SNAPSHOT]
>         at 
> org.apache.hive.service.cli.operation.SQLOperation.runQuery(SQLOperation.java:236)
>  ~[hive-service-4.0.0-beta-2-SNAPSHOT.jar:4.0.0-beta-2-SNAPSHOT]
>         at 
> org.apache.hive.service.cli.operation.SQLOperation.access$500(SQLOperation.java:90)
>  ~[hive-service-4.0.0-beta-2-SNAPSHOT.jar:4.0.0-beta-2-SNAPSHOT]
>         at 
> org.apache.hive.service.cli.operation.SQLOperation$BackgroundWork$1.run(SQLOperation.java:336)
>  ~[hive-service-4.0.0-beta-2-SNAPSHOT.jar:4.0.0-beta-2-SNAPSHOT]
>         at java.security.AccessController.doPrivileged(Native Method) 
> ~[?:1.8.0_291]
>         at javax.security.auth.Subject.doAs(Subject.java:422) ~[?:1.8.0_291]
>         at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1878)
>  ~[hadoop-common-3.3.1.jar:?]
>         at 
> org.apache.hive.service.cli.operation.SQLOperation$BackgroundWork.run(SQLOperation.java:356)
>  ~[hive-service-4.0.0-beta-2-SNAPSHOT.jar:4.0.0-beta-2-SNAPSHOT]
>         at 
> java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) 
> ~[?:1.8.0_291]
>         at java.util.concurrent.FutureTask.run(FutureTask.java:266) 
> ~[?:1.8.0_291]
>         at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
>  ~[?:1.8.0_291]
>         at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
>  ~[?:1.8.0_291]
>         at java.lang.Thread.run(Thread.java:748) ~[?:1.8.0_291]
> ERROR : DDLTask failed, DDL Operation: class 
> org.apache.hadoop.hive.ql.ddl.table.execute.AlterTableExecuteOperation
>  {code}
>  
>  



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (HIVE-27729) Iceberg: Check Iceberg type in AlterTableExecuteAnalyzer

2023-09-25 Thread zhangbutao (Jira)
zhangbutao created HIVE-27729:
-

 Summary: Iceberg: Check Iceberg type in AlterTableExecuteAnalyzer
 Key: HIVE-27729
 URL: https://issues.apache.org/jira/browse/HIVE-27729
 Project: Hive
  Issue Type: Improvement
  Components: Iceberg integration
Reporter: zhangbutao


If we execute ROLLBACK and other commands (expire snapshot & fast_forward, etc.) on a 
*non-iceberg table,* it will throw an NPE. We need to check the Iceberg table type in 
_AlterTableExecuteAnalyzer_ to throw a better exception.

 
{code:java}
ERROR : Failed
java.lang.NullPointerException: null
        at 
org.apache.hadoop.hive.ql.metadata.Hive.alterTableExecuteOperation(Hive.java:6772)
 ~[hive-exec-4.0.0-beta-2-SNAPSHOT.jar:4.0.0-beta-2-SNAPSHOT]
        at 
org.apache.hadoop.hive.ql.ddl.table.execute.AlterTableExecuteOperation.execute(AlterTableExecuteOperation.java:37)
 ~[hive-exec-4.0.0-beta-2-SNAPSHOT.jar:4.0.0-beta-2-SNAPSHOT]
        at org.apache.hadoop.hive.ql.ddl.DDLTask.execute(DDLTask.java:84) 
~[hive-exec-4.0.0-beta-2-SNAPSHOT.jar:4.0.0-beta-2-SNAPSHOT]
        at org.apache.hadoop.hive.ql.exec.Task.executeTask(Task.java:214) 
~[hive-exec-4.0.0-beta-2-SNAPSHOT.jar:4.0.0-beta-2-SNAPSHOT]
        at 
org.apache.hadoop.hive.ql.exec.TaskRunner.runSequential(TaskRunner.java:105) 
~[hive-exec-4.0.0-beta-2-SNAPSHOT.jar:4.0.0-beta-2-SNAPSHOT]
        at org.apache.hadoop.hive.ql.Executor.launchTask(Executor.java:354) 
~[hive-exec-4.0.0-beta-2-SNAPSHOT.jar:4.0.0-beta-2-SNAPSHOT]
        at org.apache.hadoop.hive.ql.Executor.launchTasks(Executor.java:327) 
~[hive-exec-4.0.0-beta-2-SNAPSHOT.jar:4.0.0-beta-2-SNAPSHOT]
        at org.apache.hadoop.hive.ql.Executor.runTasks(Executor.java:244) 
~[hive-exec-4.0.0-beta-2-SNAPSHOT.jar:4.0.0-beta-2-SNAPSHOT]
        at org.apache.hadoop.hive.ql.Executor.execute(Executor.java:105) 
~[hive-exec-4.0.0-beta-2-SNAPSHOT.jar:4.0.0-beta-2-SNAPSHOT]
        at org.apache.hadoop.hive.ql.Driver.execute(Driver.java:367) 
~[hive-exec-4.0.0-beta-2-SNAPSHOT.jar:4.0.0-beta-2-SNAPSHOT]
        at org.apache.hadoop.hive.ql.Driver.runInternal(Driver.java:205) 
~[hive-exec-4.0.0-beta-2-SNAPSHOT.jar:4.0.0-beta-2-SNAPSHOT]
        at org.apache.hadoop.hive.ql.Driver.run(Driver.java:154) 
~[hive-exec-4.0.0-beta-2-SNAPSHOT.jar:4.0.0-beta-2-SNAPSHOT]
        at org.apache.hadoop.hive.ql.Driver.run(Driver.java:149) 
~[hive-exec-4.0.0-beta-2-SNAPSHOT.jar:4.0.0-beta-2-SNAPSHOT]
        at 
org.apache.hadoop.hive.ql.reexec.ReExecDriver.run(ReExecDriver.java:185) 
~[hive-exec-4.0.0-beta-2-SNAPSHOT.jar:4.0.0-beta-2-SNAPSHOT]
        at 
org.apache.hive.service.cli.operation.SQLOperation.runQuery(SQLOperation.java:236)
 ~[hive-service-4.0.0-beta-2-SNAPSHOT.jar:4.0.0-beta-2-SNAPSHOT]
        at 
org.apache.hive.service.cli.operation.SQLOperation.access$500(SQLOperation.java:90)
 ~[hive-service-4.0.0-beta-2-SNAPSHOT.jar:4.0.0-beta-2-SNAPSHOT]
        at 
org.apache.hive.service.cli.operation.SQLOperation$BackgroundWork$1.run(SQLOperation.java:336)
 ~[hive-service-4.0.0-beta-2-SNAPSHOT.jar:4.0.0-beta-2-SNAPSHOT]
        at java.security.AccessController.doPrivileged(Native Method) 
~[?:1.8.0_291]
        at javax.security.auth.Subject.doAs(Subject.java:422) ~[?:1.8.0_291]
        at 
org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1878)
 ~[hadoop-common-3.3.1.jar:?]
        at 
org.apache.hive.service.cli.operation.SQLOperation$BackgroundWork.run(SQLOperation.java:356)
 ~[hive-service-4.0.0-beta-2-SNAPSHOT.jar:4.0.0-beta-2-SNAPSHOT]
        at 
java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) 
~[?:1.8.0_291]
        at java.util.concurrent.FutureTask.run(FutureTask.java:266) 
~[?:1.8.0_291]
        at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) 
~[?:1.8.0_291]
        at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) 
~[?:1.8.0_291]
        at java.lang.Thread.run(Thread.java:748) ~[?:1.8.0_291]
ERROR : DDLTask failed, DDL Operation: class 
org.apache.hadoop.hive.ql.ddl.table.execute.AlterTableExecuteOperation
 {code}
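
A minimal sketch of the kind of guard the description asks for, to run in 
_AlterTableExecuteAnalyzer_ before building the operation. The helper 
isIcebergTable() and the error text are illustrative stand-ins, not Hive's 
actual API:
{code:java}
import org.apache.hadoop.hive.ql.metadata.Table;
import org.apache.hadoop.hive.ql.parse.SemanticException;

class IcebergTypeCheck {
  // Hypothetical helper: a real check could inspect the table's storage
  // handler class or its table_type property instead.
  static boolean isIcebergTable(Table table) {
    return table.getStorageHandler() != null
        && table.getStorageHandler().getClass().getName().toLowerCase().contains("iceberg");
  }

  // Fail with a clear message instead of the NPE shown in the trace above.
  static void checkIcebergTable(Table table) throws SemanticException {
    if (!isIcebergTable(table)) {
      throw new SemanticException(
          "ALTER TABLE ... EXECUTE is only supported on Iceberg tables: "
              + table.getFullyQualifiedName());
    }
  }
}
{code}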
 

 



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Assigned] (HIVE-27711) Allow creating a branch from tag name

2023-09-21 Thread zhangbutao (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-27711?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

zhangbutao reassigned HIVE-27711:
-

Assignee: zhangbutao

> Allow creating a branch from tag name
> -
>
> Key: HIVE-27711
> URL: https://issues.apache.org/jira/browse/HIVE-27711
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Ayush Saxena
>Assignee: zhangbutao
>Priority: Major
>
> Allow creating a branch from a tag name.
> If a tag already exists, we should be able to create a branch with the same 
> snapshot id that the tag points to.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (HIVE-27689) Iceberg: Remove unused iceberg property

2023-09-13 Thread zhangbutao (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-27689?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17764995#comment-17764995
 ] 

zhangbutao commented on HIVE-27689:
---

PR https://github.com/apache/hive/pull/4681

> Iceberg: Remove unused iceberg property
> --
>
> Key: HIVE-27689
> URL: https://issues.apache.org/jira/browse/HIVE-27689
> Project: Hive
>  Issue Type: Improvement
>  Components: Iceberg integration
>Reporter: zhangbutao
>Assignee: zhangbutao
>Priority: Minor
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (HIVE-27689) Iceberg: Remove unused iceberg property

2023-09-13 Thread zhangbutao (Jira)
zhangbutao created HIVE-27689:
-

 Summary: Iceberg: Remove unused iceberg property
 Key: HIVE-27689
 URL: https://issues.apache.org/jira/browse/HIVE-27689
 Project: Hive
  Issue Type: Improvement
  Components: Iceberg integration
Reporter: zhangbutao






--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Assigned] (HIVE-27689) Iceberg: Remove unused iceberg property

2023-09-13 Thread zhangbutao (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-27689?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

zhangbutao reassigned HIVE-27689:
-

Assignee: zhangbutao

> Iceberg: Remove unused iceberg property
> --
>
> Key: HIVE-27689
> URL: https://issues.apache.org/jira/browse/HIVE-27689
> Project: Hive
>  Issue Type: Improvement
>  Components: Iceberg integration
>Reporter: zhangbutao
>Assignee: zhangbutao
>Priority: Minor
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Assigned] (HIVE-27651) Upgrade hbase version

2023-08-31 Thread zhangbutao (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-27651?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

zhangbutao reassigned HIVE-27651:
-

Assignee: zhangbutao

> Upgrade hbase version
> -
>
> Key: HIVE-27651
> URL: https://issues.apache.org/jira/browse/HIVE-27651
> Project: Hive
>  Issue Type: Bug
>Reporter: Ayush Saxena
>Assignee: zhangbutao
>Priority: Major
>
> Upgrade the HBase version in Hive; currently we are using a legacy alpha-4 
> version. Move it to the latest.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (HIVE-27593) Iceberg: Keep iceberg properties in sync with hms properties

2023-08-10 Thread zhangbutao (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-27593?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

zhangbutao updated HIVE-27593:
--
Description: 
In HIVE-26596, as we have not yet implemented the full COW mode, we enforced *mor* mode 
for Iceberg v2 tables in two scenarios:
 # creating a v2 Iceberg table: the delete mode will be set to *mor* if not specified
 # upgrading a v1 table to v2: the delete mode will be set to *mor*

 

In HS2, we check the mode (cow/mor) from the HMS table properties instead of the 
*iceberg* {*}properties (metadata json file){*}, and HIVE-26596 only changed the HMS 
table properties. Therefore, it is fine for HS2 to operate on an Iceberg table by 
checking the cow/mor mode from the HMS properties, but other engines like Spark 
operate on the Iceberg table by checking cow/mor from the {*}iceberg 
properties (metadata json file){*}.

Until we implement the full COW mode, we need to keep the Iceberg properties in sync with 
the HMS properties so that users have the same experience on multiple 
engines (HS2 & Spark).

  was:
In HIVE-26596, as we have not yet implemented the full COW mode, we enforced *mor* mode 
for Iceberg v2 tables in two scenarios:
 # creating a v2 Iceberg table: the delete mode will be set to *mor* if not specified
 # upgrading a v1 table to v2: the delete mode will be set to *mor*

 

In HS2, we check the mode (cow/mor) from the HMS table properties instead of the 
*iceberg* {*}properties (metadata json file){*}, and HIVE-26596 only changed the HMS 
table properties. Therefore, it is fine for HS2 to operate on an Iceberg table by 
checking the cow/mor mode from the HMS properties, but other engines like Spark 
operate on the Iceberg table by checking cow/mor from the {*}iceberg 
properties (metadata json file){*}.

Until we implement the full COW mode, we need to keep the Iceberg properties in sync with 
the HMS properties so that users have the same experience on multiple 
engines (HS2 & Spark).


> Iceberg: Keep iceberg properties in sync with hms properties
> 
>
> Key: HIVE-27593
> URL: https://issues.apache.org/jira/browse/HIVE-27593
> Project: Hive
>  Issue Type: Improvement
>  Components: Iceberg integration
>Reporter: zhangbutao
>Assignee: zhangbutao
>Priority: Major
>
> In HIVE-26596, as we have not yet implemented the full COW mode, we enforced *mor* 
> mode for Iceberg v2 tables in two scenarios:
>  # creating a v2 Iceberg table: the delete mode will be set to *mor* if not 
> specified
>  # upgrading a v1 table to v2: the delete mode will be set to *mor*
>  
> In HS2, we check the mode (cow/mor) from the HMS table properties instead of the 
> *iceberg* {*}properties (metadata json file){*}, and HIVE-26596 only changed 
> the HMS table properties. Therefore, it is fine for HS2 to operate on an Iceberg 
> table by checking the cow/mor mode from the HMS properties, but other engines 
> like Spark operate on the Iceberg table by checking cow/mor from the {*}iceberg 
> properties (metadata json file){*}.
> Until we implement the full COW mode, we need to keep the Iceberg properties in 
> sync with the HMS properties so that users have the same experience on multiple 
> engines (HS2 & Spark).
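
A minimal sketch of the proposed sync, assuming the public Iceberg API (the 
property keys are Iceberg's standard write options; the surrounding wiring is 
illustrative):
{code:java}
import org.apache.iceberg.Table;
import org.apache.iceberg.UpdateProperties;

class DeleteModeSync {
  // After the HMS-side enforcement of mor, push the same values into the
  // Iceberg metadata (a new metadata.json) so Spark and other engines see
  // them too.
  static void syncMorToIceberg(Table icebergTable) {
    UpdateProperties update = icebergTable.updateProperties();
    update.set("write.delete.mode", "merge-on-read");
    update.set("write.update.mode", "merge-on-read");
    update.set("write.merge.mode", "merge-on-read");
    update.commit();
  }
}
{code}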



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (HIVE-27593) Iceberg: Keep iceberg properties in sync with hms properties

2023-08-10 Thread zhangbutao (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-27593?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

zhangbutao updated HIVE-27593:
--
Description: 
In HIVE-26596, as we have not yet implemented the full COW mode, we enforced *mor* mode 
for Iceberg v2 tables in two scenarios:
 # creating a v2 Iceberg table: the delete mode will be set to *mor* if not specified
 # upgrading a v1 table to v2: the delete mode will be set to *mor*

 

In HS2, we check the mode (cow/mor) from the HMS table properties instead of the 
*iceberg* {*}properties (metadata json file){*}, and HIVE-26596 only changed the HMS 
table properties. Therefore, it is fine for HS2 to operate on an Iceberg table by 
checking the cow/mor mode from the HMS properties, but other engines like Spark 
operate on the Iceberg table by checking cow/mor from the {*}iceberg 
properties (metadata json file){*}.

Until we implement the full COW mode, we need to keep the Iceberg properties in sync with 
the HMS properties so that users have the same experience on multiple 
engines (HS2 & Spark).

> Iceberg: Keep iceberg properties in sync with hms properties
> 
>
> Key: HIVE-27593
> URL: https://issues.apache.org/jira/browse/HIVE-27593
> Project: Hive
>  Issue Type: Improvement
>  Components: Iceberg integration
>Reporter: zhangbutao
>Assignee: zhangbutao
>Priority: Major
>
> In HIVE-26596, as we have not yet implemented the full COW mode, we enforced *mor* 
> mode for Iceberg v2 tables in two scenarios:
>  # creating a v2 Iceberg table: the delete mode will be set to *mor* if not 
> specified
>  # upgrading a v1 table to v2: the delete mode will be set to *mor*
>  
> In HS2, we check the mode (cow/mor) from the HMS table properties instead of the 
> *iceberg* {*}properties (metadata json file){*}, and HIVE-26596 only changed 
> the HMS table properties. Therefore, it is fine for HS2 to operate on an Iceberg 
> table by checking the cow/mor mode from the HMS properties, but other engines 
> like Spark operate on the Iceberg table by checking cow/mor from the {*}iceberg 
> properties (metadata json file){*}.
> Until we implement the full COW mode, we need to keep the Iceberg properties in 
> sync with the HMS properties so that users have the same experience on multiple 
> engines (HS2 & Spark).



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (HIVE-27593) Iceberg: Keep iceberg properties in sync with hms properties

2023-08-10 Thread zhangbutao (Jira)
zhangbutao created HIVE-27593:
-

 Summary: Iceberg: Keep iceberg properties in sync with hms 
properties
 Key: HIVE-27593
 URL: https://issues.apache.org/jira/browse/HIVE-27593
 Project: Hive
  Issue Type: Improvement
  Components: Iceberg integration
Reporter: zhangbutao






--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Assigned] (HIVE-27593) Iceberg: Keep iceberg properties in sync with hms properties

2023-08-10 Thread zhangbutao (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-27593?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

zhangbutao reassigned HIVE-27593:
-

Assignee: zhangbutao

> Iceberg: Keep iceberg properties in sync with hms properties
> 
>
> Key: HIVE-27593
> URL: https://issues.apache.org/jira/browse/HIVE-27593
> Project: Hive
>  Issue Type: Improvement
>  Components: Iceberg integration
>Reporter: zhangbutao
>Assignee: zhangbutao
>Priority: Major
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (HIVE-27565) Fix NPE when dropping table in HiveQueryLifeTimeHook::checkAndRollbackCTAS

2023-08-04 Thread zhangbutao (Jira)
zhangbutao created HIVE-27565:
-

 Summary: Fix NPE when dropping table in 
HiveQueryLifeTimeHook::checkAndRollbackCTAS
 Key: HIVE-27565
 URL: https://issues.apache.org/jira/browse/HIVE-27565
 Project: Hive
  Issue Type: Bug
Reporter: zhangbutao


If dropping an Iceberg table which is used by a materialized view, 
HiveQueryLifeTimeHook::checkAndRollbackCTAS will throw an NPE.

 

Steps to repro:
 * create an Iceberg table:

create table test_ice1 (id int) stored by iceberg;
 * create a materialized view:

create materialized view ice_mat1 as select * from test_ice1;
 * drop the Iceberg table:

drop table test_ice1;
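
The fix itself is not shown in this thread; below is a hedged, generic sketch 
of the null guard, where the ctasPath parameter is a hypothetical stand-in for 
however the hook obtains the CTAS target directory (it is null for non-CTAS 
statements such as the failing DROP TABLE, which is the NPE in the trace below):
{code:java}
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

class CtasRollback {
  static void rollbackCtasIfNeeded(Path ctasPath, Configuration conf) throws Exception {
    if (ctasPath == null) {
      return;  // not a CTAS query; nothing to roll back, and no NPE
    }
    FileSystem fs = ctasPath.getFileSystem(conf);
    if (fs.exists(ctasPath)) {
      fs.delete(ctasPath, true);  // remove the partially created table directory
    }
  }
}
{code}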

 
{code:java}
        at java.lang.reflect.Method.invoke(Method.java:498) ~[?:1.8.0_291]
        at 
org.apache.hadoop.hive.metastore.HiveMetaStoreClient$SynchronizedHandler.invoke(HiveMetaStoreClient.java:4462)
 ~[hive-exec-4.0.0-beta-1-SNAPSHOT.jar:4.0.0-beta-1-SNAPSHOT]
        at com.sun.proxy.$Proxy47.dropTable(Unknown Source) ~[?:?]
        at org.apache.hadoop.hive.ql.metadata.Hive.dropTable(Hive.java:1500) 
~[hive-exec-4.0.0-beta-1-SNAPSHOT.jar:4.0.0-beta-1-SNAPSHOT]
        ... 26 more
ERROR : FAILED: Execution Error, return code 4 from 
org.apache.hadoop.hive.ql.ddl.DDLTask. MetaException(message:Cannot drop table 
as it is used in the following materialized views [testdbpr.ice_mat1]
)
WARN  : Failed when invoking query after execution hook
java.lang.RuntimeException: Not able to check whether the CTAS table directory 
exists due to:
        at 
org.apache.hadoop.hive.ql.HiveQueryLifeTimeHook.checkAndRollbackCTAS(HiveQueryLifeTimeHook.java:84)
 ~[hive-exec-4.0.0-beta-1-SNAPSHOT.jar:4.0.0-beta-1-SNAPSHOT]
        at 
org.apache.hadoop.hive.ql.HiveQueryLifeTimeHook.afterExecution(HiveQueryLifeTimeHook.java:65)
 ~[hive-exec-4.0.0-beta-1-SNAPSHOT.jar:4.0.0-beta-1-SNAPSHOT]
        at 
org.apache.hadoop.hive.ql.HookRunner.runAfterExecutionHook(HookRunner.java:185) 
~[hive-exec-4.0.0-beta-1-SNAPSHOT.jar:4.0.0-beta-1-SNAPSHOT]
        at org.apache.hadoop.hive.ql.Executor.cleanUp(Executor.java:525) 
~[hive-exec-4.0.0-beta-1-SNAPSHOT.jar:4.0.0-beta-1-SNAPSHOT]
        at org.apache.hadoop.hive.ql.Executor.execute(Executor.java:118) 
~[hive-exec-4.0.0-beta-1-SNAPSHOT.jar:4.0.0-beta-1-SNAPSHOT]
        at org.apache.hadoop.hive.ql.Driver.execute(Driver.java:367) 
~[hive-exec-4.0.0-beta-1-SNAPSHOT.jar:4.0.0-beta-1-SNAPSHOT]
        at org.apache.hadoop.hive.ql.Driver.runInternal(Driver.java:205) 
~[hive-exec-4.0.0-beta-1-SNAPSHOT.jar:4.0.0-beta-1-SNAPSHOT]
        at org.apache.hadoop.hive.ql.Driver.run(Driver.java:154) 
~[hive-exec-4.0.0-beta-1-SNAPSHOT.jar:4.0.0-beta-1-SNAPSHOT]
        at org.apache.hadoop.hive.ql.Driver.run(Driver.java:149) 
~[hive-exec-4.0.0-beta-1-SNAPSHOT.jar:4.0.0-beta-1-SNAPSHOT]
        at 
org.apache.hadoop.hive.ql.reexec.ReExecDriver.run(ReExecDriver.java:185) 
~[hive-exec-4.0.0-beta-1-SNAPSHOT.jar:4.0.0-beta-1-SNAPSHOT]
        at 
org.apache.hive.service.cli.operation.SQLOperation.runQuery(SQLOperation.java:236)
 ~[hive-service-4.0.0-beta-1-SNAPSHOT.jar:4.0.0-beta-1-SNAPSHOT]
        at 
org.apache.hive.service.cli.operation.SQLOperation.access$500(SQLOperation.java:90)
 ~[hive-service-4.0.0-beta-1-SNAPSHOT.jar:4.0.0-beta-1-SNAPSHOT]
        at 
org.apache.hive.service.cli.operation.SQLOperation$BackgroundWork$1.run(SQLOperation.java:336)
 ~[hive-service-4.0.0-beta-1-SNAPSHOT.jar:4.0.0-beta-1-SNAPSHOT]
        at java.security.AccessController.doPrivileged(Native Method) 
~[?:1.8.0_291]
        at javax.security.auth.Subject.doAs(Subject.java:422) ~[?:1.8.0_291]
        at 
org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1878)
 ~[hadoop-common-3.3.1.jar:?]
        at 
org.apache.hive.service.cli.operation.SQLOperation$BackgroundWork.run(SQLOperation.java:356)
 ~[hive-service-4.0.0-beta-1-SNAPSHOT.jar:4.0.0-beta-1-SNAPSHOT]
        at 
java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) 
~[?:1.8.0_291]
        at java.util.concurrent.FutureTask.run(FutureTask.java:266) 
~[?:1.8.0_291]
        at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) 
~[?:1.8.0_291]
        at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) 
~[?:1.8.0_291]
        at java.lang.Thread.run(Thread.java:748) ~[?:1.8.0_291]
Caused by: java.lang.NullPointerException
        at 
org.apache.hadoop.hive.ql.HiveQueryLifeTimeHook.checkAndRollbackCTAS(HiveQueryLifeTimeHook.java:79)
 ~[hive-exec-4.0.0-beta-1-SNAPSHOT.jar:4.0.0-beta-1-SNAPSHOT]
        ... 21 more
INFO  : Completed executing 
command(queryId=hive_20230804145734_08837e22-5ff0-4b56-a0cf-69b0414171dd); Time 
taken: 0.073 seconds
Error: Error while compiling statement: FAILED: Execution Error, return code 
4 from org.apache.hadoop.hive.ql.ddl.DDLTask. MetaException(message:Cannot 
drop table as it is used in the follo

[jira] [Assigned] (HIVE-27565) Fix NPE when dropping table in HiveQueryLifeTimeHook::checkAndRollbackCTAS

2023-08-04 Thread zhangbutao (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-27565?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

zhangbutao reassigned HIVE-27565:
-

Assignee: zhangbutao

> Fix NPE when dropping table in HiveQueryLifeTimeHook::checkAndRollbackCTAS
> --
>
> Key: HIVE-27565
> URL: https://issues.apache.org/jira/browse/HIVE-27565
> Project: Hive
>  Issue Type: Bug
>Reporter: zhangbutao
>Assignee: zhangbutao
>Priority: Major
>
> If dropping an Iceberg table which is used by a materialized view, 
> HiveQueryLifeTimeHook::checkAndRollbackCTAS will throw an NPE.
>  
> Steps to repro:
>  * create an Iceberg table:
> create table test_ice1 (id int) stored by iceberg;
>  * create a materialized view:
> create materialized view ice_mat1 as select * from test_ice1;
>  * drop the Iceberg table:
> drop table test_ice1;
>  
> {code:java}
>         at java.lang.reflect.Method.invoke(Method.java:498) ~[?:1.8.0_291]
>         at 
> org.apache.hadoop.hive.metastore.HiveMetaStoreClient$SynchronizedHandler.invoke(HiveMetaStoreClient.java:4462)
>  ~[hive-exec-4.0.0-beta-1-SNAPSHOT.jar:4.0.0-beta-1-SNAPSHOT]
>         at com.sun.proxy.$Proxy47.dropTable(Unknown Source) ~[?:?]
>         at org.apache.hadoop.hive.ql.metadata.Hive.dropTable(Hive.java:1500) 
> ~[hive-exec-4.0.0-beta-1-SNAPSHOT.jar:4.0.0-beta-1-SNAPSHOT]
>         ... 26 more
> ERROR : FAILED: Execution Error, return code 4 from 
> org.apache.hadoop.hive.ql.ddl.DDLTask. MetaException(message:Cannot drop 
> table as it is used in the following materialized views [testdbpr.ice_mat1]
> )
> WARN  : Failed when invoking query after execution hook
> java.lang.RuntimeException: Not able to check whether the CTAS table 
> directory exists due to:
>         at 
> org.apache.hadoop.hive.ql.HiveQueryLifeTimeHook.checkAndRollbackCTAS(HiveQueryLifeTimeHook.java:84)
>  ~[hive-exec-4.0.0-beta-1-SNAPSHOT.jar:4.0.0-beta-1-SNAPSHOT]
>         at 
> org.apache.hadoop.hive.ql.HiveQueryLifeTimeHook.afterExecution(HiveQueryLifeTimeHook.java:65)
>  ~[hive-exec-4.0.0-beta-1-SNAPSHOT.jar:4.0.0-beta-1-SNAPSHOT]
>         at 
> org.apache.hadoop.hive.ql.HookRunner.runAfterExecutionHook(HookRunner.java:185)
>  ~[hive-exec-4.0.0-beta-1-SNAPSHOT.jar:4.0.0-beta-1-SNAPSHOT]
>         at org.apache.hadoop.hive.ql.Executor.cleanUp(Executor.java:525) 
> ~[hive-exec-4.0.0-beta-1-SNAPSHOT.jar:4.0.0-beta-1-SNAPSHOT]
>         at org.apache.hadoop.hive.ql.Executor.execute(Executor.java:118) 
> ~[hive-exec-4.0.0-beta-1-SNAPSHOT.jar:4.0.0-beta-1-SNAPSHOT]
>         at org.apache.hadoop.hive.ql.Driver.execute(Driver.java:367) 
> ~[hive-exec-4.0.0-beta-1-SNAPSHOT.jar:4.0.0-beta-1-SNAPSHOT]
>         at org.apache.hadoop.hive.ql.Driver.runInternal(Driver.java:205) 
> ~[hive-exec-4.0.0-beta-1-SNAPSHOT.jar:4.0.0-beta-1-SNAPSHOT]
>         at org.apache.hadoop.hive.ql.Driver.run(Driver.java:154) 
> ~[hive-exec-4.0.0-beta-1-SNAPSHOT.jar:4.0.0-beta-1-SNAPSHOT]
>         at org.apache.hadoop.hive.ql.Driver.run(Driver.java:149) 
> ~[hive-exec-4.0.0-beta-1-SNAPSHOT.jar:4.0.0-beta-1-SNAPSHOT]
>         at 
> org.apache.hadoop.hive.ql.reexec.ReExecDriver.run(ReExecDriver.java:185) 
> ~[hive-exec-4.0.0-beta-1-SNAPSHOT.jar:4.0.0-beta-1-SNAPSHOT]
>         at 
> org.apache.hive.service.cli.operation.SQLOperation.runQuery(SQLOperation.java:236)
>  ~[hive-service-4.0.0-beta-1-SNAPSHOT.jar:4.0.0-beta-1-SNAPSHOT]
>         at 
> org.apache.hive.service.cli.operation.SQLOperation.access$500(SQLOperation.java:90)
>  ~[hive-service-4.0.0-beta-1-SNAPSHOT.jar:4.0.0-beta-1-SNAPSHOT]
>         at 
> org.apache.hive.service.cli.operation.SQLOperation$BackgroundWork$1.run(SQLOperation.java:336)
>  ~[hive-service-4.0.0-beta-1-SNAPSHOT.jar:4.0.0-beta-1-SNAPSHOT]
>         at java.security.AccessController.doPrivileged(Native Method) 
> ~[?:1.8.0_291]
>         at javax.security.auth.Subject.doAs(Subject.java:422) ~[?:1.8.0_291]
>         at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1878)
>  ~[hadoop-common-3.3.1.jar:?]
>         at 
> org.apache.hive.service.cli.operation.SQLOperation$BackgroundWork.run(SQLOperation.java:356)
>  ~[hive-service-4.0.0-beta-1-SNAPSHOT.jar:4.0.0-beta-1-SNAPSHOT]
>         at 
> java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) 
> ~[?:1.8.0_291]
>         at java.util.concurrent.FutureTask.run(FutureTask.java:266) 
> ~[?:1.8.0_291]
>         at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
>  ~[?:1.8.0_291]
>         at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
>  ~[?:1.8.0_291]
>         at java.lang.Thread.run(Thread.java:748) ~[?:1.8.0_291]
> Caused by: java.lang.NullPointerException
>         at 
> org.apache.hadoop.hive.ql.HiveQueryLifeTimeHook.checkAndRollbac

[jira] [Updated] (HIVE-27565) Fix NPE when dropping table in HiveQueryLifeTimeHook::checkAndRollbackCTAS

2023-08-04 Thread zhangbutao (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-27565?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

zhangbutao updated HIVE-27565:
--
Description: 
If dropping an Iceberg table which is used by a materialized view, 
HiveQueryLifeTimeHook::checkAndRollbackCTAS will throw an NPE.

 

Steps to repro:
 * create an Iceberg table:

create table test_ice1 (id int) stored by iceberg;
 * create a materialized view:

create materialized view ice_mat1 as select * from test_ice1;
 * drop the Iceberg table:

drop table test_ice1;

 
{code:java}
        at java.lang.reflect.Method.invoke(Method.java:498) ~[?:1.8.0_291]
        at 
org.apache.hadoop.hive.metastore.HiveMetaStoreClient$SynchronizedHandler.invoke(HiveMetaStoreClient.java:4462)
 ~[hive-exec-4.0.0-beta-1-SNAPSHOT.jar:4.0.0-beta-1-SNAPSHOT]
        at com.sun.proxy.$Proxy47.dropTable(Unknown Source) ~[?:?]
        at org.apache.hadoop.hive.ql.metadata.Hive.dropTable(Hive.java:1500) 
~[hive-exec-4.0.0-beta-1-SNAPSHOT.jar:4.0.0-beta-1-SNAPSHOT]
        ... 26 more
ERROR : FAILED: Execution Error, return code 4 from 
org.apache.hadoop.hive.ql.ddl.DDLTask. MetaException(message:Cannot drop table 
as it is used in the following materialized views [testdbpr.ice_mat1]
)
WARN  : Failed when invoking query after execution hook
java.lang.RuntimeException: Not able to check whether the CTAS table directory 
exists due to:
        at 
org.apache.hadoop.hive.ql.HiveQueryLifeTimeHook.checkAndRollbackCTAS(HiveQueryLifeTimeHook.java:84)
 ~[hive-exec-4.0.0-beta-1-SNAPSHOT.jar:4.0.0-beta-1-SNAPSHOT]
        at 
org.apache.hadoop.hive.ql.HiveQueryLifeTimeHook.afterExecution(HiveQueryLifeTimeHook.java:65)
 ~[hive-exec-4.0.0-beta-1-SNAPSHOT.jar:4.0.0-beta-1-SNAPSHOT]
        at 
org.apache.hadoop.hive.ql.HookRunner.runAfterExecutionHook(HookRunner.java:185) 
~[hive-exec-4.0.0-beta-1-SNAPSHOT.jar:4.0.0-beta-1-SNAPSHOT]
        at org.apache.hadoop.hive.ql.Executor.cleanUp(Executor.java:525) 
~[hive-exec-4.0.0-beta-1-SNAPSHOT.jar:4.0.0-beta-1-SNAPSHOT]
        at org.apache.hadoop.hive.ql.Executor.execute(Executor.java:118) 
~[hive-exec-4.0.0-beta-1-SNAPSHOT.jar:4.0.0-beta-1-SNAPSHOT]
        at org.apache.hadoop.hive.ql.Driver.execute(Driver.java:367) 
~[hive-exec-4.0.0-beta-1-SNAPSHOT.jar:4.0.0-beta-1-SNAPSHOT]
        at org.apache.hadoop.hive.ql.Driver.runInternal(Driver.java:205) 
~[hive-exec-4.0.0-beta-1-SNAPSHOT.jar:4.0.0-beta-1-SNAPSHOT]
        at org.apache.hadoop.hive.ql.Driver.run(Driver.java:154) 
~[hive-exec-4.0.0-beta-1-SNAPSHOT.jar:4.0.0-beta-1-SNAPSHOT]
        at org.apache.hadoop.hive.ql.Driver.run(Driver.java:149) 
~[hive-exec-4.0.0-beta-1-SNAPSHOT.jar:4.0.0-beta-1-SNAPSHOT]
        at 
org.apache.hadoop.hive.ql.reexec.ReExecDriver.run(ReExecDriver.java:185) 
~[hive-exec-4.0.0-beta-1-SNAPSHOT.jar:4.0.0-beta-1-SNAPSHOT]
        at 
org.apache.hive.service.cli.operation.SQLOperation.runQuery(SQLOperation.java:236)
 ~[hive-service-4.0.0-beta-1-SNAPSHOT.jar:4.0.0-beta-1-SNAPSHOT]
        at 
org.apache.hive.service.cli.operation.SQLOperation.access$500(SQLOperation.java:90)
 ~[hive-service-4.0.0-beta-1-SNAPSHOT.jar:4.0.0-beta-1-SNAPSHOT]
        at 
org.apache.hive.service.cli.operation.SQLOperation$BackgroundWork$1.run(SQLOperation.java:336)
 ~[hive-service-4.0.0-beta-1-SNAPSHOT.jar:4.0.0-beta-1-SNAPSHOT]
        at java.security.AccessController.doPrivileged(Native Method) 
~[?:1.8.0_291]
        at javax.security.auth.Subject.doAs(Subject.java:422) ~[?:1.8.0_291]
        at 
org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1878)
 ~[hadoop-common-3.3.1.jar:?]
        at 
org.apache.hive.service.cli.operation.SQLOperation$BackgroundWork.run(SQLOperation.java:356)
 ~[hive-service-4.0.0-beta-1-SNAPSHOT.jar:4.0.0-beta-1-SNAPSHOT]
        at 
java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) 
~[?:1.8.0_291]
        at java.util.concurrent.FutureTask.run(FutureTask.java:266) 
~[?:1.8.0_291]
        at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) 
~[?:1.8.0_291]
        at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) 
~[?:1.8.0_291]
        at java.lang.Thread.run(Thread.java:748) ~[?:1.8.0_291]
Caused by: java.lang.NullPointerException
        at 
org.apache.hadoop.hive.ql.HiveQueryLifeTimeHook.checkAndRollbackCTAS(HiveQueryLifeTimeHook.java:79)
 ~[hive-exec-4.0.0-beta-1-SNAPSHOT.jar:4.0.0-beta-1-SNAPSHOT]
        ... 21 more
INFO  : Completed executing 
command(queryId=hive_20230804145734_08837e22-5ff0-4b56-a0cf-69b0414171dd); Time 
taken: 0.073 seconds
Error: Error while compiling statement: FAILED: Execution Error, return code 
4 from org.apache.hadoop.hive.ql.ddl.DDLTask. MetaException(message:Cannot 
drop table as it is used in the following materialized views [testdbpr.ice_mat1]
) (state=08S01,code=4)
 {code}
 

 

  was:
If dropping an Iceberg table which is used by a materiali

[jira] [Commented] (HIVE-27553) After upgrading from Hive1 to Hive3, Decimal computation experiences a loss of precision

2023-08-01 Thread zhangbutao (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-27553?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17749598#comment-17749598
 ] 

zhangbutao commented on HIVE-27553:
---

This issue was caused by HIVE-15331 which emulated the SQL Server decimal 
behavior. 
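
For context, a hedged sketch of the SQL Server-style result-type rule that 
HIVE-15331 brought in (the constants mirror the max precision of 38 and a 
minimum adjusted scale of 6; the helper itself is illustrative, not Hive's 
exact code):
{code:java}
// Result type of a decimal multiply: precision p1+p2+1, scale s1+s2; when
// the precision overflows 38, the scale is cut back, but at least 6
// fractional digits are kept.
class DecimalTypeRule {
  static int[] multiplyResultType(int p1, int s1, int p2, int s2) {
    final int MAX_PRECISION = 38, MIN_ADJUSTED_SCALE = 6;
    int precision = p1 + p2 + 1;          // worst-case digits of the product
    int scale = s1 + s2;
    if (precision > MAX_PRECISION) {
      int intDigits = precision - scale;  // digits left of the decimal point
      scale = Math.max(MAX_PRECISION - intDigits, Math.min(scale, MIN_ADJUSTED_SCALE));
      precision = MAX_PRECISION;
    }
    return new int[] {precision, scale};
  }
}
// decimal(38,8) * decimal(38,8): precision 77, scale 16 -> decimal(38,6), so
// Hive 3 rounds the product to 6 fractional digits where Hive 1 kept more.
{code}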

> After upgrading from Hive1 to Hive3, Decimal computation experiences a loss 
> of precision
> 
>
> Key: HIVE-27553
> URL: https://issues.apache.org/jira/browse/HIVE-27553
> Project: Hive
>  Issue Type: Bug
>  Components: HiveServer2
>Affects Versions: 3.1.3
>Reporter: ZhengBowen
>Priority: Major
> Attachments: image-2023-07-31-20-40-00-679.png, 
> image-2023-07-31-20-40-35-050.png, image-2023-07-31-20-43-05-379.png, 
> image-2023-07-31-20-43-49-775.png
>
>
> I can reproduce this bug.
> {quote}
> create table decimal_test(
>   id int,
>   quantity decimal(38,8),
>   cost decimal(38,8)
> ) stored as textfile;
>  
> insert into decimal_test values(1, 0.8000, 0.00015000);
>  
> select quantity * cost from decimal_test;
> {quote}
> *1. The following are the execution results and execution plan on Hive-1.0.1:*
> !image-2023-07-31-20-40-00-679.png|width=550,height=230!
> !image-2023-07-31-20-43-05-379.png|width=540,height=144!
> *2. The following are the execution results and execution plan on Hive-3.1.3:*
> !image-2023-07-31-20-40-35-050.png|width=538,height=257!
> !image-2023-07-31-20-43-49-775.png|width=533,height=142!



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Resolved] (HIVE-27440) Improve data connector cache

2023-07-15 Thread zhangbutao (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-27440?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

zhangbutao resolved HIVE-27440.
---
Fix Version/s: 4.0.0
   Resolution: Fixed

> Improve data connector cache
> 
>
> Key: HIVE-27440
> URL: https://issues.apache.org/jira/browse/HIVE-27440
> Project: Hive
>  Issue Type: Sub-task
>Reporter: zhangbutao
>Assignee: zhangbutao
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>
> _*DataConnectorProviderFactory*_ uses a HashMap to cache data connector 
> instances, and there is no way to invalidate the cache unless you restart the 
> MetaStore.
> What is more serious, if you drop or alter a data connector, the 
> cache will not change, and you may use an invalid data connector next time.
>  
> I think we can improve the data connector cache in two aspects:
>  * Use Caffeine with a *maximumSize* (e.g. 100) to cache data connectors instead 
> of a HashMap, and set an *expire time* after the last access. We should also 
> close the underlying datasource connection using a {*}Caffeine 
> RemovalListener{*}.
>  * After executing a Drop or Alter DDL on a data connector, we should *update 
> the cache* to evict that data connector and avoid using an invalid one 
> next time.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (HIVE-27440) Improve data connector cache

2023-07-15 Thread zhangbutao (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-27440?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17743354#comment-17743354
 ] 

zhangbutao commented on HIVE-27440:
---

Fix has been merged! Thanks [~hemanth619] [~akshatm] [~ngangam]

> Improve data connector cache
> 
>
> Key: HIVE-27440
> URL: https://issues.apache.org/jira/browse/HIVE-27440
> Project: Hive
>  Issue Type: Sub-task
>Reporter: zhangbutao
>Assignee: zhangbutao
>Priority: Major
>  Labels: pull-request-available
>
> _*DataConnectorProviderFactory*_ uses a HashMap to cache data connector 
> instances, and there is no way to invalidate the cache unless you restart the 
> MetaStore.
> What is more serious, if you drop or alter a data connector, the 
> cache will not change, and you may use an invalid data connector next time.
>  
> I think we can improve the data connector cache in two aspects:
>  * Use Caffeine with a *maximumSize* (e.g. 100) to cache data connectors instead 
> of a HashMap, and set an *expire time* after the last access. We should also 
> close the underlying datasource connection using a {*}Caffeine 
> RemovalListener{*}.
>  * After executing a Drop or Alter DDL on a data connector, we should *update 
> the cache* to evict that data connector and avoid using an invalid one 
> next time.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (HIVE-27503) Support query iceberg tag

2023-07-13 Thread zhangbutao (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-27503?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

zhangbutao updated HIVE-27503:
--
Description: 
Support querying an Iceberg tag like this: 
{code:java}
select * from db.tbl.tag_tagName;{code}
 

In addition, an Iceberg tag cannot be written to, and we should throw an exception 
in the compile stage if users want to write data to an Iceberg tag. 

  was:
Support querying an Iceberg tag like this: 
{code:java}
select * from db.tbl.tag_tagName;{code}
 

In addition, an Iceberg tag cannot be written to, and we should throw an exception 
when compiling if users want to write data to an Iceberg tag. 


> Support query iceberg tag
> -
>
> Key: HIVE-27503
> URL: https://issues.apache.org/jira/browse/HIVE-27503
> Project: Hive
>  Issue Type: Sub-task
>  Components: Iceberg integration
>Reporter: zhangbutao
>Assignee: zhangbutao
>Priority: Major
>
> Support querying an Iceberg tag like this: 
> {code:java}
> select * from db.tbl.tag_tagName;{code}
>  
> In addition, an Iceberg tag cannot be written to, and we should throw an 
> exception in the compile stage if users want to write data to an Iceberg tag. 



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (HIVE-27503) Support query iceberg tag

2023-07-13 Thread zhangbutao (Jira)
zhangbutao created HIVE-27503:
-

 Summary: Support query iceberg tag
 Key: HIVE-27503
 URL: https://issues.apache.org/jira/browse/HIVE-27503
 Project: Hive
  Issue Type: Sub-task
  Components: Iceberg integration
Reporter: zhangbutao


Support querying an Iceberg tag like this: 
{code:java}
select * from db.tbl.tag_tagName;{code}
 

In addition, an Iceberg tag cannot be written to, and we should throw an exception 
when compiling if users want to write data to an Iceberg tag. 
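
A hedged sketch of the compile-stage rejection; the tag detection below keys 
off the tag_ prefix in the db.tbl.tag_tagName syntax and is illustrative, not 
Hive's actual analyzer code:
{code:java}
import org.apache.hadoop.hive.ql.parse.SemanticException;

class TagWriteCheck {
  // Hypothetical helper: in db.tbl.tag_tagName, the third path component
  // names a tag.
  static boolean resolvesToTag(String tableRef) {
    String[] parts = tableRef.split("\\.");
    return parts.length == 3 && parts[2].startsWith("tag_");
  }

  // Fail fast at compile time instead of letting a write reach a tag.
  static void validateWriteTarget(String tableRef) throws SemanticException {
    if (resolvesToTag(tableRef)) {
      throw new SemanticException("Cannot write to an Iceberg tag: " + tableRef);
    }
  }
}
{code}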



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Assigned] (HIVE-27503) Support query iceberg tag

2023-07-13 Thread zhangbutao (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-27503?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

zhangbutao reassigned HIVE-27503:
-

Assignee: zhangbutao

> Support query iceberg tag
> -
>
> Key: HIVE-27503
> URL: https://issues.apache.org/jira/browse/HIVE-27503
> Project: Hive
>  Issue Type: Sub-task
>  Components: Iceberg integration
>Reporter: zhangbutao
>Assignee: zhangbutao
>Priority: Major
>
> Support querying an Iceberg tag like this: 
> {code:java}
> select * from db.tbl.tag_tagName;{code}
>  
> In addition, an Iceberg tag cannot be written to, and we should throw an 
> exception when compiling if users want to write data to an Iceberg tag. 



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (HIVE-27440) Improve data connector cache

2023-06-14 Thread zhangbutao (Jira)
zhangbutao created HIVE-27440:
-

 Summary: Improve data connector cache
 Key: HIVE-27440
 URL: https://issues.apache.org/jira/browse/HIVE-27440
 Project: Hive
  Issue Type: Sub-task
Reporter: zhangbutao


_*DataConnectorProviderFactory*_ uses a HashMap to cache data connector 
instances, and there is no way to invalidate the cache unless you restart the 
MetaStore.
What is more serious, if you drop or alter a data connector, the cache 
will not change, and you may use an invalid data connector next time.

 

I think we can improve the data connector cache in two aspects (see the sketch 
after this list):
 * Use Caffeine with a *maximumSize* (e.g. 100) to cache data connectors instead 
of a HashMap, and set an *expire time* after the last access. We should also 
close the underlying datasource connection using a {*}Caffeine 
RemovalListener{*}.
 * After executing a Drop or Alter DDL on a data connector, we should *update the 
cache* to evict that data connector and avoid using an invalid one next 
time.
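
A minimal sketch of that cache, assuming Caffeine's standard builder API; 
ConnectorProvider is an illustrative stand-in for the real HMS provider type:
{code:java}
import java.util.concurrent.TimeUnit;
import com.github.benmanes.caffeine.cache.Cache;
import com.github.benmanes.caffeine.cache.Caffeine;
import com.github.benmanes.caffeine.cache.RemovalCause;

public class ConnectorCache {
  interface ConnectorProvider { void close(); }

  private final Cache<String, ConnectorProvider> cache = Caffeine.newBuilder()
      .maximumSize(100)                         // bounded, unlike a plain HashMap
      .expireAfterAccess(10, TimeUnit.MINUTES)  // drop entries idle too long
      .removalListener((String name, ConnectorProvider p, RemovalCause cause) -> {
        if (p != null) {
          p.close();  // release the underlying datasource connection
        }
      })
      .build();

  // Hook for DROP/ALTER DDL so a stale connector is never served again.
  public void invalidate(String connectorName) {
    cache.invalidate(connectorName);
  }
}
{code}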



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Assigned] (HIVE-27440) Improve data connector cache

2023-06-14 Thread zhangbutao (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-27440?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

zhangbutao reassigned HIVE-27440:
-

Assignee: zhangbutao

> Improve data connector cache
> 
>
> Key: HIVE-27440
> URL: https://issues.apache.org/jira/browse/HIVE-27440
> Project: Hive
>  Issue Type: Sub-task
>Reporter: zhangbutao
>Assignee: zhangbutao
>Priority: Major
>
> _*DataConnectorProviderFactory*_ uses a HashMap to cache data connector 
> instances, and there is no way to invalidate the cache unless you restart the 
> MetaStore.
> What is more serious, if you drop or alter a data connector, the 
> cache will not change, and you may use an invalid data connector next time.
>  
> I think we can improve the data connector cache in two aspects:
>  * Use Caffeine with a *maximumSize* (e.g. 100) to cache data connectors instead 
> of a HashMap, and set an *expire time* after the last access. We should also 
> close the underlying datasource connection using a {*}Caffeine 
> RemovalListener{*}.
>  * After executing a Drop or Alter DDL on a data connector, we should *update 
> the cache* to evict that data connector and avoid using an invalid one 
> next time.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (HIVE-27435) Iceberg: Add cache for Hive::createStorageHandler to avoid creating storage handler frequently

2023-06-12 Thread zhangbutao (Jira)
zhangbutao created HIVE-27435:
-

 Summary: Iceberg:  Add cache for Hive::createStorageHandler to 
avoid creating storage handler frequently
 Key: HIVE-27435
 URL: https://issues.apache.org/jira/browse/HIVE-27435
 Project: Hive
  Issue Type: Improvement
  Components: Iceberg integration
Reporter: zhangbutao


[https://github.com/apache/hive/pull/4372/files#r1222816743]

Creating or altering an Iceberg table will invoke the method _*Hive::createStorageHandler*_ 
multiple times. We should consider how to add a cache to avoid this.
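
A hedged sketch of one possible cache, keyed by handler class name; the 
StorageHandler interface is an illustrative stand-in for HiveStorageHandler, 
and whether a shared instance is safe depends on the handler being stateless, 
which is exactly the question the ticket raises:
{code:java}
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

class StorageHandlerCache {
  interface StorageHandler {}

  private static final Map<String, StorageHandler> CACHE = new ConcurrentHashMap<>();

  // One instance per handler class instead of one per createStorageHandler call.
  static StorageHandler getOrCreate(String className) {
    return CACHE.computeIfAbsent(className, StorageHandlerCache::instantiate);
  }

  private static StorageHandler instantiate(String className) {
    try {
      return (StorageHandler) Class.forName(className).getDeclaredConstructor().newInstance();
    } catch (ReflectiveOperationException e) {
      throw new RuntimeException("Failed to create storage handler " + className, e);
    }
  }
}
{code}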



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (HIVE-27431) Clean invalid properties in test module

2023-06-12 Thread zhangbutao (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-27431?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

zhangbutao updated HIVE-27431:
--
Summary: Clean invalid properties in test module  (was: Clean invalid 
property in test module)

> Clean invalid properties in test module
> ---
>
> Key: HIVE-27431
> URL: https://issues.apache.org/jira/browse/HIVE-27431
> Project: Hive
>  Issue Type: Test
>  Components: Test
>Reporter: zhangbutao
>Assignee: zhangbutao
>Priority: Minor
>
> In the *data/conf* module, *hive-site.xml* is used for qtests & tests. It keeps many 
> invalid properties, and if you run tests in an IDE, you will see lots of WARNs: 
> {code:java}
> 2023-06-12T01:28:18,074  WARN [main] conf.HiveConf: HiveConf of name 
> hive.mapjoin.max.gc.time.percentage does not exist
> 2023-06-12T01:28:18,074  WARN [main] conf.HiveConf: HiveConf of name 
> hive.llap.io.cache.orc.size does not exist
> 2023-06-12T01:28:18,075  WARN [main] conf.HiveConf: HiveConf of name 
> hive.dummyparam.test.server.specific.config.override does not exist
> 2023-06-12T01:28:18,075  WARN [main] conf.HiveConf: HiveConf of name 
> hive.metastore.metadb.dir does not exist
> 2023-06-12T01:28:18,075  WARN [main] conf.HiveConf: HiveConf of name 
> hive.llap.io.cache.orc.alloc.min does not exist
> 2023-06-12T01:28:18,075  WARN [main] conf.HiveConf: HiveConf of name 
> hive.dummyparam.test.server.specific.config.hivesite does not exist
> 2023-06-12T01:28:18,075  WARN [main] conf.HiveConf: HiveConf of name 
> hive.llap.io.cache.orc.alloc.max does not exist
> 2023-06-12T01:28:18,075  WARN [main] conf.HiveConf: HiveConf of name 
> hive.metastore.client.cache.maxSize does not exist
> 2023-06-12T01:28:18,076  WARN [main] conf.HiveConf: HiveConf of name 
> hive.dummyparam.test.server.specific.config.metastoresite does not exist
> 2023-06-12T01:28:18,076  WARN [main] conf.HiveConf: HiveConf of name 
> hive.metastore.client.cache.recordStats does not exist
> 2023-06-12T01:28:18,076  WARN [main] conf.HiveConf: HiveConf of name 
> hive.llap.io.cache.orc.arena.size does not exist
> 2023-06-12T01:28:18,076  WARN [main] conf.HiveConf: HiveConf of name 
> hive.stats.key.prefix.reserve.length does not exist {code}



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (HIVE-27431) Clean invalid property in test module

2023-06-12 Thread zhangbutao (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-27431?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

zhangbutao updated HIVE-27431:
--
Description: 
In the *data/conf* module, *hive-site.xml* is used for qtests & tests. It keeps many 
invalid properties, and if you run tests in an IDE, you will see lots of WARNs: 
{code:java}
2023-06-12T01:28:18,074  WARN [main] conf.HiveConf: HiveConf of name 
hive.mapjoin.max.gc.time.percentage does not exist
2023-06-12T01:28:18,074  WARN [main] conf.HiveConf: HiveConf of name 
hive.llap.io.cache.orc.size does not exist
2023-06-12T01:28:18,075  WARN [main] conf.HiveConf: HiveConf of name 
hive.dummyparam.test.server.specific.config.override does not exist
2023-06-12T01:28:18,075  WARN [main] conf.HiveConf: HiveConf of name 
hive.metastore.metadb.dir does not exist
2023-06-12T01:28:18,075  WARN [main] conf.HiveConf: HiveConf of name 
hive.llap.io.cache.orc.alloc.min does not exist
2023-06-12T01:28:18,075  WARN [main] conf.HiveConf: HiveConf of name 
hive.dummyparam.test.server.specific.config.hivesite does not exist
2023-06-12T01:28:18,075  WARN [main] conf.HiveConf: HiveConf of name 
hive.llap.io.cache.orc.alloc.max does not exist
2023-06-12T01:28:18,075  WARN [main] conf.HiveConf: HiveConf of name 
hive.metastore.client.cache.maxSize does not exist
2023-06-12T01:28:18,076  WARN [main] conf.HiveConf: HiveConf of name 
hive.dummyparam.test.server.specific.config.metastoresite does not exist
2023-06-12T01:28:18,076  WARN [main] conf.HiveConf: HiveConf of name 
hive.metastore.client.cache.recordStats does not exist
2023-06-12T01:28:18,076  WARN [main] conf.HiveConf: HiveConf of name 
hive.llap.io.cache.orc.arena.size does not exist
2023-06-12T01:28:18,076  WARN [main] conf.HiveConf: HiveConf of name 
hive.stats.key.prefix.reserve.length does not exist {code}

  was:
{code:java}
2023-06-12T01:28:18,074  WARN [main] conf.HiveConf: HiveConf of name 
hive.mapjoin.max.gc.time.percentage does not exist
2023-06-12T01:28:18,074  WARN [main] conf.HiveConf: HiveConf of name 
hive.llap.io.cache.orc.size does not exist
2023-06-12T01:28:18,075  WARN [main] conf.HiveConf: HiveConf of name 
hive.dummyparam.test.server.specific.config.override does not exist
2023-06-12T01:28:18,075  WARN [main] conf.HiveConf: HiveConf of name 
hive.metastore.metadb.dir does not exist
2023-06-12T01:28:18,075  WARN [main] conf.HiveConf: HiveConf of name 
hive.llap.io.cache.orc.alloc.min does not exist
2023-06-12T01:28:18,075  WARN [main] conf.HiveConf: HiveConf of name 
hive.dummyparam.test.server.specific.config.hivesite does not exist
2023-06-12T01:28:18,075  WARN [main] conf.HiveConf: HiveConf of name 
hive.llap.io.cache.orc.alloc.max does not exist
2023-06-12T01:28:18,075  WARN [main] conf.HiveConf: HiveConf of name 
hive.metastore.client.cache.maxSize does not exist
2023-06-12T01:28:18,076  WARN [main] conf.HiveConf: HiveConf of name 
hive.dummyparam.test.server.specific.config.metastoresite does not exist
2023-06-12T01:28:18,076  WARN [main] conf.HiveConf: HiveConf of name 
hive.metastore.client.cache.recordStats does not exist
2023-06-12T01:28:18,076  WARN [main] conf.HiveConf: HiveConf of name 
hive.llap.io.cache.orc.arena.size does not exist
2023-06-12T01:28:18,076  WARN [main] conf.HiveConf: HiveConf of name 
hive.stats.key.prefix.reserve.length does not exist {code}


> Clean invalid property in test module
> -
>
> Key: HIVE-27431
> URL: https://issues.apache.org/jira/browse/HIVE-27431
> Project: Hive
>  Issue Type: Test
>  Components: Test
>Reporter: zhangbutao
>Assignee: zhangbutao
>Priority: Minor
>
> In the *data/conf* module, *hive-site.xml* is used for qtests & tests. It keeps many 
> invalid properties, and if you run tests in an IDE, you will see lots of WARNs: 
> {code:java}
> 2023-06-12T01:28:18,074  WARN [main] conf.HiveConf: HiveConf of name 
> hive.mapjoin.max.gc.time.percentage does not exist
> 2023-06-12T01:28:18,074  WARN [main] conf.HiveConf: HiveConf of name 
> hive.llap.io.cache.orc.size does not exist
> 2023-06-12T01:28:18,075  WARN [main] conf.HiveConf: HiveConf of name 
> hive.dummyparam.test.server.specific.config.override does not exist
> 2023-06-12T01:28:18,075  WARN [main] conf.HiveConf: HiveConf of name 
> hive.metastore.metadb.dir does not exist
> 2023-06-12T01:28:18,075  WARN [main] conf.HiveConf: HiveConf of name 
> hive.llap.io.cache.orc.alloc.min does not exist
> 2023-06-12T01:28:18,075  WARN [main] conf.HiveConf: HiveConf of name 
> hive.dummyparam.test.server.specific.config.hivesite does not exist
> 2023-06-12T01:28:18,075  WARN [main] conf.HiveConf: HiveConf of name 
> hive.llap.io.cache.orc.alloc.max does not exist
> 2023-06-12T01:28:18,075  WARN [main] conf.HiveConf: HiveConf of name 
> hive.metastore.client.cache.maxSize does not exist
> 2023-06-12T01:28:18,076  WARN [main] conf.HiveConf: HiveC

[jira] [Created] (HIVE-27431) Clean invalid property in test module

2023-06-12 Thread zhangbutao (Jira)
zhangbutao created HIVE-27431:
-

 Summary: Clean invalid property in test module
 Key: HIVE-27431
 URL: https://issues.apache.org/jira/browse/HIVE-27431
 Project: Hive
  Issue Type: Test
  Components: Test
Reporter: zhangbutao


{code:java}
2023-06-12T01:28:18,074  WARN [main] conf.HiveConf: HiveConf of name 
hive.mapjoin.max.gc.time.percentage does not exist
2023-06-12T01:28:18,074  WARN [main] conf.HiveConf: HiveConf of name 
hive.llap.io.cache.orc.size does not exist
2023-06-12T01:28:18,075  WARN [main] conf.HiveConf: HiveConf of name 
hive.dummyparam.test.server.specific.config.override does not exist
2023-06-12T01:28:18,075  WARN [main] conf.HiveConf: HiveConf of name 
hive.metastore.metadb.dir does not exist
2023-06-12T01:28:18,075  WARN [main] conf.HiveConf: HiveConf of name 
hive.llap.io.cache.orc.alloc.min does not exist
2023-06-12T01:28:18,075  WARN [main] conf.HiveConf: HiveConf of name 
hive.dummyparam.test.server.specific.config.hivesite does not exist
2023-06-12T01:28:18,075  WARN [main] conf.HiveConf: HiveConf of name 
hive.llap.io.cache.orc.alloc.max does not exist
2023-06-12T01:28:18,075  WARN [main] conf.HiveConf: HiveConf of name 
hive.metastore.client.cache.maxSize does not exist
2023-06-12T01:28:18,076  WARN [main] conf.HiveConf: HiveConf of name 
hive.dummyparam.test.server.specific.config.metastoresite does not exist
2023-06-12T01:28:18,076  WARN [main] conf.HiveConf: HiveConf of name 
hive.metastore.client.cache.recordStats does not exist
2023-06-12T01:28:18,076  WARN [main] conf.HiveConf: HiveConf of name 
hive.llap.io.cache.orc.arena.size does not exist
2023-06-12T01:28:18,076  WARN [main] conf.HiveConf: HiveConf of name 
hive.stats.key.prefix.reserve.length does not exist {code}



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Assigned] (HIVE-27431) Clean invalid property in test module

2023-06-12 Thread zhangbutao (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-27431?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

zhangbutao reassigned HIVE-27431:
-

Assignee: zhangbutao

> Clean invalid property in test module
> -
>
> Key: HIVE-27431
> URL: https://issues.apache.org/jira/browse/HIVE-27431
> Project: Hive
>  Issue Type: Test
>  Components: Test
>Reporter: zhangbutao
>Assignee: zhangbutao
>Priority: Minor
>
> {code:java}
> 2023-06-12T01:28:18,074  WARN [main] conf.HiveConf: HiveConf of name 
> hive.mapjoin.max.gc.time.percentage does not exist
> 2023-06-12T01:28:18,074  WARN [main] conf.HiveConf: HiveConf of name 
> hive.llap.io.cache.orc.size does not exist
> 2023-06-12T01:28:18,075  WARN [main] conf.HiveConf: HiveConf of name 
> hive.dummyparam.test.server.specific.config.override does not exist
> 2023-06-12T01:28:18,075  WARN [main] conf.HiveConf: HiveConf of name 
> hive.metastore.metadb.dir does not exist
> 2023-06-12T01:28:18,075  WARN [main] conf.HiveConf: HiveConf of name 
> hive.llap.io.cache.orc.alloc.min does not exist
> 2023-06-12T01:28:18,075  WARN [main] conf.HiveConf: HiveConf of name 
> hive.dummyparam.test.server.specific.config.hivesite does not exist
> 2023-06-12T01:28:18,075  WARN [main] conf.HiveConf: HiveConf of name 
> hive.llap.io.cache.orc.alloc.max does not exist
> 2023-06-12T01:28:18,075  WARN [main] conf.HiveConf: HiveConf of name 
> hive.metastore.client.cache.maxSize does not exist
> 2023-06-12T01:28:18,076  WARN [main] conf.HiveConf: HiveConf of name 
> hive.dummyparam.test.server.specific.config.metastoresite does not exist
> 2023-06-12T01:28:18,076  WARN [main] conf.HiveConf: HiveConf of name 
> hive.metastore.client.cache.recordStats does not exist
> 2023-06-12T01:28:18,076  WARN [main] conf.HiveConf: HiveConf of name 
> hive.llap.io.cache.orc.arena.size does not exist
> 2023-06-12T01:28:18,076  WARN [main] conf.HiveConf: HiveConf of name 
> hive.stats.key.prefix.reserve.length does not exist {code}



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Assigned] (HIVE-27429) Disable flaky test TestCompactionMetrics#testCleanerFailuresCountedCorrectly

2023-06-11 Thread zhangbutao (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-27429?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

zhangbutao reassigned HIVE-27429:
-

Assignee: zhangbutao

> Disable flaky test TestCompactionMetrics#testCleanerFailuresCountedCorrectly
> 
>
> Key: HIVE-27429
> URL: https://issues.apache.org/jira/browse/HIVE-27429
> Project: Hive
>  Issue Type: Test
>  Components: Test
>Reporter: zhangbutao
>Assignee: zhangbutao
>Priority: Major
>
> {code:java}
> mvn test -Dtest=TestCompactionMetrics#testCleanerFailuresCountedCorrectly  
> -pl ql/{code}
> [http://ci.hive.apache.org/job/hive-flaky-check/697/testReport/]
>  
> I have also found several PR integration tests that failed with the test 
> _*TestCompactionMetrics#testCleanerFailuresCountedCorrectly*_
> [http://ci.hive.apache.org/blue/organizations/jenkins/hive-precommit/detail/PR-4404/1/tests/]
> [http://ci.hive.apache.org/blue/organizations/jenkins/hive-precommit/detail/PR-4402/4/tests/]
>  



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (HIVE-27429) Disable flaky test TestCompactionMetrics#testCleanerFailuresCountedCorrectly

2023-06-11 Thread zhangbutao (Jira)
zhangbutao created HIVE-27429:
-

 Summary: Disable flaky test 
TestCompactionMetrics#testCleanerFailuresCountedCorrectly
 Key: HIVE-27429
 URL: https://issues.apache.org/jira/browse/HIVE-27429
 Project: Hive
  Issue Type: Test
  Components: Test
Reporter: zhangbutao


{code:java}
mvn test -Dtest=TestCompactionMetrics#testCleanerFailuresCountedCorrectly  -pl 
ql/{code}
[http://ci.hive.apache.org/job/hive-flaky-check/697/testReport/]

 

I have also found several PR integration tests failing with the test 
_*TestCompactionMetrics#testCleanerFailuresCountedCorrectly*_

[http://ci.hive.apache.org/blue/organizations/jenkins/hive-precommit/detail/PR-4404/1/tests/]

[http://ci.hive.apache.org/blue/organizations/jenkins/hive-precommit/detail/PR-4402/4/tests/]
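For context, disabling a flaky JUnit test typically looks like the sketch 
below (hypothetical illustration only; the actual patch may differ):
{code:java}
import org.junit.Ignore;
import org.junit.Test;

public class TestCompactionMetrics {
  // Hypothetical illustration of disabling the flaky test until it is fixed;
  // the real test body is omitted in this sketch.
  @Ignore("HIVE-27429: flaky, see http://ci.hive.apache.org/job/hive-flaky-check/697/")
  @Test
  public void testCleanerFailuresCountedCorrectly() {
  }
}
{code}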

 



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (HIVE-27360) Iceberg: Don't create the redundant MANAGED location when creating table without EXTERNAL keyword

2023-06-10 Thread zhangbutao (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-27360?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17731305#comment-17731305
 ] 

zhangbutao commented on HIVE-27360:
---

Finally, I think we can reach an agreement about creating an iceberg table:

If we create an iceberg table that neither has the EXTERNAL keyword nor an 
explicitly specified location, we should make a check as follows:
 # Check the value of 
{*}_MetastoreConf.ConfVars.METASTORE_METADATA_TRANSFORMER_CLASS_{*}; if it is 
set to a valid value, we let it go its own way to determine the table's type 
and location.
 # If *_MetastoreConf.ConfVars.METASTORE_METADATA_TRANSFORMER_CLASS_* is not 
set to a valid value, we should make sure the table is of EXTERNAL type and 
its location is on the EXTERNAL warehouse, but the *purge flag* should be set 
to true.
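A rough sketch of that check in code (illustrative only; the conf/table 
variables and the surrounding create-table flow are assumptions, not actual 
Hive code):
{code:java}
// Hypothetical sketch of the agreed check, not the actual implementation:
String transformer = MetastoreConf.getVar(conf,
    MetastoreConf.ConfVars.METASTORE_METADATA_TRANSFORMER_CLASS);

if (transformer != null && !transformer.trim().isEmpty()) {
  // 1. A valid transformer is configured: let it determine the table's
  //    type and location on its own.
} else {
  // 2. No valid transformer: make the table EXTERNAL on the external
  //    warehouse, but keep managed drop semantics via the purge flag.
  table.setTableType(TableType.EXTERNAL_TABLE.toString());
  table.putToParameters("EXTERNAL", "TRUE");
  table.putToParameters("external.table.purge", "true");
}
{code}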

> Iceberg: Don't create the redundant MANAGED location when creating table 
> without EXTERNAL keyword
> -
>
> Key: HIVE-27360
> URL: https://issues.apache.org/jira/browse/HIVE-27360
> Project: Hive
>  Issue Type: Improvement
>  Components: Iceberg integration
>Reporter: zhangbutao
>Assignee: zhangbutao
>Priority: Major
>  Labels: pull-request-available
>
> If you create a managed iceberg table without specifying the location, and 
> the database has both a location and a managed_location, the final iceberg 
> table location will be on the database location instead of the 
> managed_location. But you can see that the database managed_location also 
> gets an iceberg table subdirectory, which remains even after the table is 
> dropped.
> We should ensure the managed iceberg table is always on the database 
> managed_location when the managed_location exists. The direct and simple way 
> is to use the created hms table location before committing the iceberg 
> table, to avoid creating a new iceberg location.
>  
> Step to repro:
> 1. set location and managed location properties:
>  
> {code:java}
> set hive.metastore.warehouse.dir=/user/hive/warehouse/hiveicetest;
> set hive.metastore.warehouse.external.dir= 
> /user/hive/warehouse/external/hiveicetest;
> set metastore.metadata.transformer.class=' ';  // disable the metastore 
> transformer; this conf can only be set on the metastore server side{code}
> 2. create a database with default location and managed_location:
>  
> {code:java}
> create database testdb;{code}
>  
> {code:java}
> desc database testdb;{code}
>  
> {code:java}
> +--+--+++-+-+-++
> | db_name  | comment  |                      location                      |  
>                 managedlocation                   | owner_name  | owner_type  
> | connector_name  | remote_dbname  |
> +--+--+++-+-+-++
> | testdb   |          | 
> hdfs://ns/user/hive/warehouse/external/hiveicetest/testdb.db | 
> hdfs://ns/user/hive/warehouse/hiveicetest/testdb.db | hive        | USER      
>   |                   |
> +--+--+++-+-+-++
>  {code}
>  
>  
> 3. create a managed iceberg table without specifying the table location:
> 
> {code:java}
> // the table location will be on: 
> hdfs://ns/user/hive/warehouse/external/hiveicetest/testdb.db/ice01
> create table ice01 (id int) Stored by Iceberg stored as ORC;{code}
> but here you will find two created locations:
> 
> {code:java}
> hdfs://ns/user/hive/warehouse/external/hiveicetest/testdb.db/ice01   // the 
> actual location which is used by the managed iceberg table
> hdfs://ns/user/hive/warehouse/hiveicetest/testdb.db/ice01            // an 
> empty managed location which is unused
> {code}
> 
> 4. drop the iceberg table
> you will find this unused managed location is still there:
> {code:java}
> hdfs://ns/user/hive/warehouse/hiveicetest/testdb.db{code}
>  
>  
> We should use the created managed location to avoid creating a new iceberg 
> location.
>  
>  
>  



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (HIVE-27360) Iceberg: Don't create the redundant MANAGED location when creating table without EXTERNAL keyword

2023-06-10 Thread zhangbutao (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-27360?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

zhangbutao updated HIVE-27360:
--
Summary: Iceberg: Don't create the redundant MANAGED location when creating 
table without EXTERNAL keyword  (was: Iceberg: Don't create a new iceberg 
location if hms table already has a default location )

> Iceberg: Don't create the redundant MANAGED location when creating table 
> without EXTERNAL keyword
> -
>
> Key: HIVE-27360
> URL: https://issues.apache.org/jira/browse/HIVE-27360
> Project: Hive
>  Issue Type: Improvement
>  Components: Iceberg integration
>Reporter: zhangbutao
>Assignee: zhangbutao
>Priority: Major
>  Labels: pull-request-available
>
> If you create a managed iceberg table without specifying the location, and 
> the database has both a location and a managed_location, the final iceberg 
> table location will be on the database location instead of the 
> managed_location. But you can see that the database managed_location also 
> gets an iceberg table subdirectory, which remains even after the table is 
> dropped.
> We should ensure the managed iceberg table is always on the database 
> managed_location when the managed_location exists. The direct and simple way 
> is to use the created hms table location before committing the iceberg 
> table, to avoid creating a new iceberg location.
>  
> Step to repro:
> 1. set location and managed location properties:
>  
> {code:java}
> set hive.metastore.warehouse.dir=/user/hive/warehouse/hiveicetest;
> set hive.metastore.warehouse.external.dir= 
> /user/hive/warehouse/external/hiveicetest;
> set metastore.metadata.transformer.class=' ';  // disable the metastore 
> transformer; this conf can only be set on the metastore server side{code}
> 2. create a database with default location and managed_location:
>  
> {code:java}
> create database testdb;{code}
>  
> {code:java}
> desc database testdb;{code}
>  
> {code:java}
> +--+--+++-+-+-++
> | db_name  | comment  |                      location                      |  
>                 managedlocation                   | owner_name  | owner_type  
> | connector_name  | remote_dbname  |
> +--+--+++-+-+-++
> | testdb   |          | 
> hdfs://ns/user/hive/warehouse/external/hiveicetest/testdb.db | 
> hdfs://ns/user/hive/warehouse/hiveicetest/testdb.db | hive        | USER      
>   |                   |
> +--+--+++-+-+-++
>  {code}
>  
>  
> 3. create a managed iceberg table without specifying the table location:
> 
> {code:java}
> // the table location will be on: 
> hdfs://ns/user/hive/warehouse/external/hiveicetest/testdb.db/ice01
> create table ice01 (id int) Stored by Iceberg stored as ORC;{code}
> but here you will find two created locations:
> 
> {code:java}
> hdfs://ns/user/hive/warehouse/external/hiveicetest/testdb.db/ice01   // the 
> actual location which is used by the managed iceberg table
> hdfs://ns/user/hive/warehouse/hiveicetest/testdb.db/ice01            // an 
> empty managed location which is unused
> {code}
> 
> 4. drop the iceberg table
> you will find this unused managed location is still there:
> {code:java}
> hdfs://ns/user/hive/warehouse/hiveicetest/testdb.db{code}
>  
>  
> We should use the created managed location to avoid creating a new iceberg 
> location.
>  
>  
>  



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (HIVE-27418) UNION ALL + ORDER BY ordinal works incorrectly for all const queries

2023-06-09 Thread zhangbutao (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-27418?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17730898#comment-17730898
 ] 

zhangbutao commented on HIVE-27418:
---

Hi [~csringhofer], could you provide more info about your Hive cluster 
environment, e.g. the Hive, Hadoop, and Tez versions?

And which execution engine did you use for the test: Tez or MR?

> UNION ALL + ORDER BY ordinal works incorrectly for all const queries
> 
>
> Key: HIVE-27418
> URL: https://issues.apache.org/jira/browse/HIVE-27418
> Project: Hive
>  Issue Type: Bug
>Reporter: Csaba Ringhofer
>Priority: Major
>
> For the following query I get results in the wrong order:
> SELECT '1', 'b' UNION ALL SELECT '2', 'a' ORDER BY 2;
> +------+------+
> | _c0  | _c1  |
> +------+------+
> | 1    | b    |
> | 2    | a    |
> +------+------+
> I get correct results if:
> - the column has an alias
> - the same rows come from tables
> - the UNION ALL part of the query is in a sub-query and ORDER BY is run on 
> the sub-query
> Checked with postgres and Apache Impala, and they apply ORDER BY correctly.
> (also note that the ordinal after ORDER BY is not checked, so it could be 20 
> and Hive doesn't complain)
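For reference, the reported statement and the aliasing workaround side by side 
(illustrative; run in the same beeline session):
{code:java}
-- reported: ORDER BY ordinal has no effect on the all-constant UNION ALL
SELECT '1', 'b' UNION ALL SELECT '2', 'a' ORDER BY 2;

-- workaround noted above: alias the columns (or wrap the UNION ALL in a
-- sub-query) and order by the alias
SELECT c0, c1
FROM (SELECT '1' AS c0, 'b' AS c1 UNION ALL SELECT '2' AS c0, 'a' AS c1) t
ORDER BY c1;
{code}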



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (HIVE-27409) Iceberg: table with EXTERNAL type can not use statistics to optimize the query

2023-06-04 Thread zhangbutao (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-27409?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

zhangbutao updated HIVE-27409:
--
Description: 
We have supported iceberg statistics recently, e.g. _HIVE-24928_ and 
{_}HIVE-27158{_}, and we can use iceberg stats to optimize some queries, as in 
{_}HIVE-27347{_}.

However, in the current hive codebase we prohibit using EXTERNAL table stats; 
this change was introduced by HIVE-11266. HIVE-19329 also disabled some 
optimizations for EXTERNAL tables, whether they are iceberg or not. Therefore, 
an EXTERNAL type iceberg table can not use stats to optimize queries.

In {_}HIVE-24928{_} we added the method 
*_HiveStorageHandler::canProvideBasicStatistics()_* to indicate that iceberg 
has the ability to provide stats. That is to say, although an Iceberg table is 
regarded as an EXTERNAL table in Hive, it can provide detailed statistics.

Therefore, I suggest we check both the table type and the boolean result of 
*_HiveStorageHandler::canProvideBasicStatistics()_* to determine whether the 
table can use stats.

  was:
We have supported iceberg statistics recently, e.g. _HIVE-24928_ and 
{_}HIVE-27158{_}, and we can use iceberg stats to optimize some queries, as in 
{_}HIVE-27347{_}.

However, in the current hive codebase we prohibit using EXTERNAL table stats; 
this change was introduced by HIVE-11266. Therefore, an EXTERNAL type iceberg 
table can not use stats to optimize queries.

In {_}HIVE-24928{_} we added the method 
*_HiveStorageHandler::canProvideBasicStatistics()_* to indicate that iceberg 
has the ability to provide stats. That is to say, although an Iceberg table is 
regarded as an EXTERNAL table in Hive, it can provide detailed statistics.

Therefore, I suggest we check both the table type and the boolean result of 
*_HiveStorageHandler::canProvideBasicStatistics()_* to determine whether the 
table can use stats.


> Iceberg: table with EXTERNAL type can not use statistics to optimize the query
> --
>
> Key: HIVE-27409
> URL: https://issues.apache.org/jira/browse/HIVE-27409
> Project: Hive
>  Issue Type: Improvement
>  Components: Iceberg integration
>Reporter: zhangbutao
>Assignee: zhangbutao
>Priority: Minor
>  Labels: pull-request-available
>
> We have supported iceberg statistics recently, e.g. _HIVE-24928_ and 
> {_}HIVE-27158{_}, and we can use iceberg stats to optimize some queries, as 
> in {_}HIVE-27347{_}.
> However, in the current hive codebase we prohibit using EXTERNAL table 
> stats; this change was introduced by HIVE-11266. HIVE-19329 also disabled 
> some optimizations for EXTERNAL tables, whether they are iceberg or not. 
> Therefore, an EXTERNAL type iceberg table can not use stats to optimize 
> queries.
>  
> In {_}HIVE-24928{_} we added the method 
> *_HiveStorageHandler::canProvideBasicStatistics()_* to indicate that iceberg 
> has the ability to provide stats. That is to say, although an Iceberg table 
> is regarded as an EXTERNAL table in Hive, it can provide detailed statistics.
>  
> Therefore, I suggest we check both the table type and the boolean result of 
> *_HiveStorageHandler::canProvideBasicStatistics()_* to determine whether the 
> table can use stats.
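A minimal sketch of that combined check (hypothetical helper; the wiring into 
the actual stats annotation code is assumed and not shown):
{code:java}
// Hypothetical helper combining table type with the storage handler
// capability from HIVE-24928 (sketch only, not the actual patch):
boolean canUseStats(org.apache.hadoop.hive.metastore.api.Table tbl,
    HiveStorageHandler storageHandler) {
  if (!MetaStoreUtils.isExternalTable(tbl)) {
    return true; // managed tables keep the existing behavior
  }
  // EXTERNAL tables may still use stats when the storage handler
  // (e.g. Iceberg) can provide basic statistics itself.
  return storageHandler != null && storageHandler.canProvideBasicStatistics();
}
{code}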



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (HIVE-27409) Iceberg: table with EXTERNAL type can not use statistics to optimize the query

2023-06-04 Thread zhangbutao (Jira)
zhangbutao created HIVE-27409:
-

 Summary: Iceberg: table with EXTERNAL type can not use statistics 
to optimize the query
 Key: HIVE-27409
 URL: https://issues.apache.org/jira/browse/HIVE-27409
 Project: Hive
  Issue Type: Improvement
  Components: Iceberg integration
Reporter: zhangbutao


We have supported iceberg statistics recently, e.g. _HIVE-24928_ and 
{_}HIVE-27158{_}, and we can use iceberg stats to optimize some queries, as in 
{_}HIVE-27347{_}.

However, in the current hive codebase we prohibit using EXTERNAL table stats; 
this change was introduced by HIVE-11266. Therefore, an EXTERNAL type iceberg 
table can not use stats to optimize queries.

In {_}HIVE-24928{_} we added the method 
*_HiveStorageHandler::canProvideBasicStatistics()_* to indicate that iceberg 
has the ability to provide stats. That is to say, although an Iceberg table is 
regarded as an EXTERNAL table in Hive, it can provide detailed statistics.

Therefore, I suggest we check both the table type and the boolean result of 
*_HiveStorageHandler::canProvideBasicStatistics()_* to determine whether the 
table can use stats.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Assigned] (HIVE-27409) Iceberg: table with EXTERNAL type can not use statistics to optimize the query

2023-06-04 Thread zhangbutao (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-27409?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

zhangbutao reassigned HIVE-27409:
-

Assignee: zhangbutao

> Iceberg: table with EXTERNAL type can not use statistics to optimize the query
> --
>
> Key: HIVE-27409
> URL: https://issues.apache.org/jira/browse/HIVE-27409
> Project: Hive
>  Issue Type: Improvement
>  Components: Iceberg integration
>Reporter: zhangbutao
>Assignee: zhangbutao
>Priority: Minor
>
> We have supported iceberg statistics recently, e.g. _HIVE-24928_ and 
> {_}HIVE-27158{_}, and we can use iceberg stats to optimize some queries, as 
> in {_}HIVE-27347{_}.
> However, in the current hive codebase we prohibit using EXTERNAL table 
> stats; this change was introduced by HIVE-11266. Therefore, an EXTERNAL type 
> iceberg table can not use stats to optimize queries.
>  
> In {_}HIVE-24928{_} we added the method 
> *_HiveStorageHandler::canProvideBasicStatistics()_* to indicate that iceberg 
> has the ability to provide stats. That is to say, although an Iceberg table 
> is regarded as an EXTERNAL table in Hive, it can provide detailed statistics.
>  
> Therefore, I suggest we check both the table type and the boolean result of 
> *_HiveStorageHandler::canProvideBasicStatistics()_* to determine whether the 
> table can use stats.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (HIVE-27360) Iceberg: Don't create a new iceberg location if hms table already has a default location

2023-05-22 Thread zhangbutao (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-27360?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

zhangbutao updated HIVE-27360:
--
Description: 
If you create a managed iceberg table without specifying the location, and the 
database has both a location and a managed_location, the final iceberg table 
location will be on the database location instead of the managed_location. But 
you can see that the database managed_location also gets an iceberg table 
subdirectory, which remains even after the table is dropped.

We should ensure the managed iceberg table is always on the database 
managed_location when the managed_location exists. The direct and simple way 
is to use the created hms table location before committing the iceberg table, 
to avoid creating a new iceberg location.

 

Step to repro:

1. set location and managed location properties:

 
{code:java}
set hive.metastore.warehouse.dir=/user/hive/warehouse/hiveicetest;
set hive.metastore.warehouse.external.dir= 
/user/hive/warehouse/external/hiveicetest;
set metastore.metadata.transformer.class=' ';  // disable the metastore 
transformer; this conf can only be set on the metastore server side{code}
2. create a database with default location and managed_location:

 
{code:java}
create database testdb;{code}
 
{code:java}
desc database testdb;{code}
 
{code:java}
+--+--+++-+-+-++
| db_name  | comment  |                      location                      |    
              managedlocation                   | owner_name  | owner_type  | 
connector_name  | remote_dbname  |
+--+--+++-+-+-++
| testdb   |          | 
hdfs://ns/user/hive/warehouse/external/hiveicetest/testdb.db | 
hdfs://ns/user/hive/warehouse/hiveicetest/testdb.db | hive        | USER        
|                   |
+--+--+++-+-+-++
 {code}
 

 

3. create a managed iceberg table without specifying the table location:

{code:java}
// the table location will be on: 
hdfs://ns/user/hive/warehouse/external/hiveicetest/testdb.db/ice01
create table ice01 (id int) Stored by Iceberg stored as ORC;{code}
but here you will find two created locations:

{code:java}
hdfs://ns/user/hive/warehouse/external/hiveicetest/testdb.db/ice01   // the 
actual location which is used by the managed iceberg table
hdfs://ns/user/hive/warehouse/hiveicetest/testdb.db/ice01            // an 
empty managed location which is unused
{code}

4. drop the iceberg table

you will find this unused managed location is still there:
{code:java}
hdfs://ns/user/hive/warehouse/hiveicetest/testdb.db{code}
 

 

We should use the created managed location to avoid creating a new iceberg 
location.

 

 

 

  was:
If you create a managed iceberg table without specifying the location, and the 
database has both a location and a managed_location, the final iceberg table 
location will be on the database location instead of the managed_location. But 
you can see that the database managed_location also gets an iceberg table 
subdirectory, which remains even after the table is dropped.

We should ensure the managed iceberg table is always on the database 
managed_location when the managed_location exists. The direct and simple way 
is to use the created hms table location before committing the iceberg table, 
to avoid creating a new iceberg location.

 

Step to repro:

1. set location and managed location properties:

 
{code:java}
set hive.metastore.warehouse.dir=/user/hive/warehouse/hiveicetest;
set hive.metastore.warehouse.external.dir= 
/user/hive/warehouse/external/hiveicetest;
{code}
2. create a database with default location and managed_location:

 
{code:java}
create database testdb;{code}
 
{code:java}
desc database testdb;{code}
 
{code:java}
+--+--+++-+-+-++
| db_name  | comment  |                      location                      |    
              managedlocation                   | owner_name  | owner_type  | 
connector_name  | remote_dbname  |
+--+--+++-+-+-++
| testdb   |          | 
hdfs://ns/user/hive/warehouse/external/hiveicetest/testdb.db | 
hdfs://ns/user/hive/warehouse/hiveicetest/testdb.db | hive        | USER  

[jira] [Assigned] (HIVE-27364) StorageHandler: Skip creating staging directory for non-native table

2023-05-21 Thread zhangbutao (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-27364?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

zhangbutao reassigned HIVE-27364:
-

Assignee: zhangbutao

> StorageHandler: Skip creating staging directory for non-native table
> -
>
> Key: HIVE-27364
> URL: https://issues.apache.org/jira/browse/HIVE-27364
> Project: Hive
>  Issue Type: Improvement
>  Components: StorageHandler
>Reporter: zhangbutao
>Assignee: zhangbutao
>Priority: Major
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (HIVE-27364) StorageHandler: Skip creating staging directory for non-native table

2023-05-21 Thread zhangbutao (Jira)
zhangbutao created HIVE-27364:
-

 Summary: StorageHandler: Skip creating staging directory for 
non-native table
 Key: HIVE-27364
 URL: https://issues.apache.org/jira/browse/HIVE-27364
 Project: Hive
  Issue Type: Improvement
  Components: StorageHandler
Reporter: zhangbutao






--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (HIVE-27360) Iceberg: Don't create a new iceberg location if hms table already has a default location

2023-05-18 Thread zhangbutao (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-27360?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17724109#comment-17724109
 ] 

zhangbutao commented on HIVE-27360:
---

PR available: https://github.com/apache/hive/pull/4341

> Iceberg: Don't create a new iceberg location if hms table already has a 
> default location 
> -
>
> Key: HIVE-27360
> URL: https://issues.apache.org/jira/browse/HIVE-27360
> Project: Hive
>  Issue Type: Improvement
>  Components: Iceberg integration
>Reporter: zhangbutao
>Assignee: zhangbutao
>Priority: Major
>
> If you create a managed iceberg table without specifying the location, and 
> the database has both a location and a managed_location, the final iceberg 
> table location will be on the database location instead of the 
> managed_location. But you can see that the database managed_location also 
> gets an iceberg table subdirectory, which remains even after the table is 
> dropped.
> We should ensure the managed iceberg table is always on the database 
> managed_location when the managed_location exists. The direct and simple way 
> is to use the created hms table location before committing the iceberg 
> table, to avoid creating a new iceberg location.
>  
> Step to repro:
> 1. set location and managed location properties:
>  
> {code:java}
> set hive.metastore.warehouse.dir=/user/hive/warehouse/hiveicetest;
> set hive.metastore.warehouse.external.dir= 
> /user/hive/warehouse/external/hiveicetest;
> {code}
> 2. create a database with default location and managed_location:
>  
> {code:java}
> create database testdb;{code}
>  
> {code:java}
> desc database testdb;{code}
>  
> {code:java}
> +--+--+++-+-+-++
> | db_name  | comment  |                      location                      |  
>                 managedlocation                   | owner_name  | owner_type  
> | connector_name  | remote_dbname  |
> +--+--+++-+-+-++
> | testdb   |          | 
> hdfs://ns/user/hive/warehouse/external/hiveicetest/testdb.db | 
> hdfs://ns/user/hive/warehouse/hiveicetest/testdb.db | hive        | USER      
>   |                   |
> +--+--+++-+-+-++
>  {code}
>  
>  
> 3. create a managed iceberg table without specifying the table location:
> 
> {code:java}
> // the table location will be on: 
> hdfs://ns/user/hive/warehouse/external/hiveicetest/testdb.db/ice01
> create table ice01 (id int) Stored by Iceberg stored as ORC;{code}
> but here you will find two created locations:
> 
> {code:java}
> hdfs://ns/user/hive/warehouse/external/hiveicetest/testdb.db/ice01   // the 
> actual location which is used by the managed iceberg table
> hdfs://ns/user/hive/warehouse/hiveicetest/testdb.db/ice01            // an 
> empty managed location which is unused
> {code}
> 
> 4. drop the iceberg table
> you will find this unused managed location is still there:
> {code:java}
> hdfs://ns/user/hive/warehouse/hiveicetest/testdb.db{code}
>  
>  
> We should use the created managed location to avoid creating a new iceberg 
> location.
>  
>  
>  



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (HIVE-27360) Iceberg: Don't create a new iceberg location if hms table already has a default location

2023-05-18 Thread zhangbutao (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-27360?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17724108#comment-17724108
 ] 

zhangbutao commented on HIVE-27360:
---

[~ayushtkn] Thanks for the quick comment!

In this ticket's description, the hmsTbl_managed_location is actually created 
automatically by *_HMSHandler::create_database_core_* based on the database 
location & managed_location when the table location is not specified, and then 
*_HiveIcebergMetaHook::commitCreateTable_* alters the hms location from the 
created managed location to the external location based on 
{*}_HiveCatalog::defaultWarehouseLocation_{*}.

So if we drop the table, the initially created hmsTbl_managed_location will 
not be deleted and becomes a dangling directory.

Note that before Hive 4 this was not a problem, as a database had only one 
location.

In the PR, I reused the created hmsTbl_managed_location to avoid creating a 
new iceberg location as well as to eliminate the dangling directory.

Do you think we should always keep an iceberg table as an external table? IMO, 
users usually create an external table with the keyword *external*, like 
'{_}create *external* table ice01 (id int) Stored by Iceberg stored as ORC{_}', 
and the table should be on the managed_location if the location is not 
specified and the keyword external is absent.
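Roughly, the idea from the PR sketched as code (variable names and the 
fallback helper are assumptions; the actual change may differ):
{code:java}
// Hypothetical sketch: prefer the location HMS already assigned to the table
// over deriving a fresh one from HiveCatalog::defaultWarehouseLocation.
String hmsLocation = hmsTable.getSd().getLocation();
String tableLocation;
if (hmsLocation != null && !hmsLocation.isEmpty()) {
  tableLocation = hmsLocation;            // reuse the created managed location
} else {
  tableLocation = defaultWarehouseLocation(tableIdentifier); // assumed fallback
}
{code}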

> Iceberg: Don't create a new iceberg location if hms table already has a 
> default location 
> -
>
> Key: HIVE-27360
> URL: https://issues.apache.org/jira/browse/HIVE-27360
> Project: Hive
>  Issue Type: Improvement
>  Components: Iceberg integration
>Reporter: zhangbutao
>Assignee: zhangbutao
>Priority: Major
>
> If you create a managed iceberg table without specifying the location, and 
> the database has both a location and a managed_location, the final iceberg 
> table location will be on the database location instead of the 
> managed_location. But you can see that the database managed_location also 
> gets an iceberg table subdirectory, which remains even after the table is 
> dropped.
> We should ensure the managed iceberg table is always on the database 
> managed_location when the managed_location exists. The direct and simple way 
> is to use the created hms table location before committing the iceberg 
> table, to avoid creating a new iceberg location.
>  
> Step to repro:
> 1. set location and managed location properties:
>  
> {code:java}
> set hive.metastore.warehouse.dir=/user/hive/warehouse/hiveicetest;
> set hive.metastore.warehouse.external.dir= 
> /user/hive/warehouse/external/hiveicetest;
> {code}
> 2. create a database with default location and managed_location:
>  
> {code:java}
> create database testdb;{code}
>  
> {code:java}
> desc database testdb;{code}
>  
> {code:java}
> +--+--+++-+-+-++
> | db_name  | comment  |                      location                      |  
>                 managedlocation                   | owner_name  | owner_type  
> | connector_name  | remote_dbname  |
> +--+--+++-+-+-++
> | testdb   |          | 
> hdfs://ns/user/hive/warehouse/external/hiveicetest/testdb.db | 
> hdfs://ns/user/hive/warehouse/hiveicetest/testdb.db | hive        | USER      
>   |                   |
> +--+--+++-+-+-++
>  {code}
>  
>  
> 3. create a managed iceberg table without specifying the table location:
> 
> {code:java}
> // the table location will be on: 
> hdfs://ns/user/hive/warehouse/external/hiveicetest/testdb.db/ice01
> create table ice01 (id int) Stored by Iceberg stored as ORC;{code}
> but here you will find two created locations:
> 
> {code:java}
> hdfs://ns/user/hive/warehouse/external/hiveicetest/testdb.db/ice01   // the 
> actual location which is used by the managed iceberg table
> hdfs://ns/user/hive/warehouse/hiveicetest/testdb.db/ice01            // an 
> empty managed location which is unused
> {code}
> 
> 4. drop the iceberg table
> you will find this unused managed location is still there:
> {code:java}
> hdfs://ns/user/hive/warehouse/hiveicetest/testdb.db{code}
>  
>  
> We should use the created managed location to avoid creating a new iceberg 
> location.
>  
>  
>  



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (HIVE-27360) Iceberg: Don't create a new iceberg location if hms table already has a default location

2023-05-18 Thread zhangbutao (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-27360?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

zhangbutao updated HIVE-27360:
--
Description: 
If you create a managed iceberg table without specifying the location, and the 
database has both a location and a managed_location, the final iceberg table 
location will be on the database location instead of the managed_location. But 
you can see that the database managed_location also gets an iceberg table 
subdirectory, which remains even after the table is dropped.

We should ensure the managed iceberg table is always on the database 
managed_location when the managed_location exists. The direct and simple way 
is to use the created hms table location before committing the iceberg table, 
to avoid creating a new iceberg location.

 

Step to repro:

1. set location and managed location properties:

 
{code:java}
set hive.metastore.warehouse.dir=/user/hive/warehouse/hiveicetest;
set hive.metastore.warehouse.external.dir= 
/user/hive/warehouse/external/hiveicetest;
{code}
2. create a database with default location and managed_location:

 
{code:java}
create database testdb;{code}
 
{code:java}
desc database testdb;{code}
 
{code:java}
+--+--+++-+-+-++
| db_name  | comment  |                      location                      |    
              managedlocation                   | owner_name  | owner_type  | 
connector_name  | remote_dbname  |
+--+--+++-+-+-++
| testdb   |          | 
hdfs://ns/user/hive/warehouse/external/hiveicetest/testdb.db | 
hdfs://ns/user/hive/warehouse/hiveicetest/testdb.db | hive        | USER        
|                   |
+--+--+++-+-+-++
 {code}
 

 

3. create a managed iceberg table without specifying the table location:

{code:java}
// the table location will be on: 
hdfs://ns/user/hive/warehouse/external/hiveicetest/testdb.db/ice01
create table ice01 (id int) Stored by Iceberg stored as ORC;{code}
but here you will find two created locations:

{code:java}
hdfs://ns/user/hive/warehouse/external/hiveicetest/testdb.db/ice01   // the 
actual location which is used by the managed iceberg table
hdfs://ns/user/hive/warehouse/hiveicetest/testdb.db/ice01            // an 
empty managed location which is unused
{code}

4. drop the iceberg table

you will find this unused managed location is still there:
{code:java}
hdfs://ns/user/hive/warehouse/hiveicetest/testdb.db{code}
 

 

We should use the created managed location to avoid creating a new iceberg 
location.

 

 

 

  was:
If you create a managed iceberg table without specifying the location, and the 
database has both a location and a managed_location, the final iceberg table 
location will be on the database location instead of the managed_location. But 
you can see that the database managed_location also gets an iceberg table 
subdirectory, which remains even after the table is dropped.

We should ensure the managed iceberg table is always on the database 
managed_location when the managed_location exists. The direct and simple way 
is to use the created hms table location before committing the iceberg table, 
to avoid creating a new iceberg location.

 

Step to repro:

1. set location and managed location properties:

 
{code:java}
set hive.metastore.warehouse.dir=/user/hive/warehouse/hiveicetest;
set hive.metastore.warehouse.external.dir= 
/user/hive/warehouse/external/hiveicetest;
{code}
2. create a database with default location and managed_location:

 
{code:java}
create database testdb;{code}
 
{code:java}
desc database testdb;{code}
 
{code:java}
+--+--+++-+-+-++
| db_name  | comment  |                      location                      |    
              managedlocation                   | owner_name  | owner_type  | 
connector_name  | remote_dbname  |
+--+--+++-+-+-++
| testdb   |          | 
hdfs://ns/user/hive/warehouse/external/hiveicetest/testdb.db | 
hdfs://ns/user/hive/warehouse/hiveicetest/testdb.db | hive        | USER        
|                   |
+--+--++---

[jira] [Updated] (HIVE-27360) Iceberg: Don't create a new iceberg location if hms table already has a default location

2023-05-18 Thread zhangbutao (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-27360?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

zhangbutao updated HIVE-27360:
--
Description: 
If you create a managed iceberg table without specifying the location, and the 
database has both a location and a managed_location, the final iceberg table 
location will be on the database location instead of the managed_location. But 
you can see that the database managed_location also gets an iceberg table 
subdirectory, which remains even after the table is dropped.

We should ensure the managed iceberg table is always on the database 
managed_location when the managed_location exists. The direct and simple way 
is to use the created hms table location before committing the iceberg table, 
to avoid creating a new iceberg location.

 

Step to repro:

1. set location and managed location properties:

 
{code:java}
set hive.metastore.warehouse.dir=/user/hive/warehouse/hiveicetest;
set hive.metastore.warehouse.external.dir= 
/user/hive/warehouse/external/hiveicetest;
{code}
2. create a database with default location and managed_location:

 
{code:java}
create database testdb;{code}
 
{code:java}
desc database testdb;{code}
 
{code:java}
+--+--+++-+-+-++
| db_name  | comment  |                      location                      |    
              managedlocation                   | owner_name  | owner_type  | 
connector_name  | remote_dbname  |
+--+--+++-+-+-++
| testdb   |          | 
hdfs://ns/user/hive/warehouse/external/hiveicetest/testdb.db | 
hdfs://ns/user/hive/warehouse/hiveicetest/testdb.db | hive        | USER        
|                   |
+--+--+++-+-+-++
 {code}
 

 

3. create a managed iceberg table without specifying the table location:

{code:java}
// the table location will be on: 
hdfs://ns/user/hive/warehouse/external/hiveicetest/testdb.db/ice01
create table ice01 (id int) Stored by Iceberg stored as ORC;{code}
but here you will find two created locations:

{code:java}
hdfs://ns/user/hive/warehouse/external/hiveicetest/testdb.db/ice01   // the 
actual location which is used by the managed iceberg table
hdfs://ns/user/hive/warehouse/hiveicetest/testdb.db                  // an 
empty managed location which is unused
{code}

4. drop the iceberg table

you will find this unused managed location is still there:
{code:java}
hdfs://ns/user/hive/warehouse/hiveicetest/testdb.db{code}
 

 

We should use the created managed location to avoid creating a new iceberg 
location.

 

 

 

  was:
If you create a managed iceberg table without specifying the location, and the 
database has both a location and a managed_location, the final iceberg table 
location will be on the database location instead of the managed_location. But 
you can see that the database managed_location also gets an iceberg table 
subdirectory, which remains even after the table is dropped.

We should ensure the managed iceberg table is always on the database 
managed_location when the managed_location exists. The direct and simple way 
is to use the created hms table location before committing the iceberg table, 
to avoid creating a new iceberg location.

 

Step to repro:

1. set location and managed location properties:

 
{code:java}
set hive.metastore.warehouse.dir=/user/hive/warehouse/hiveicetest;
set hive.metastore.warehouse.external.dir= 
/user/hive/warehouse/external/hiveicetest;
{code}
2. create a database with default location and managed_location:

 
{code:java}
create database testdb;{code}
 
{code:java}
desc database testdb;{code}
 
{code:java}
+--+--+++-+-+-++
| db_name  | comment  |                      location                      |    
              managedlocation                   | owner_name  | owner_type  | 
connector_name  | remote_dbname  |
+--+--+++-+-+-++
| testdb   |          | 
hdfs://ns/user/hive/warehouse/external/hiveicetest/testdb.db | 
hdfs://ns/user/hive/warehouse/hiveicetest/testdb.db | hive        | USER        
|                   |
+--+--++---

[jira] [Updated] (HIVE-27360) Iceberg: Don't create a new iceberg location if hms table already has a default location

2023-05-18 Thread zhangbutao (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-27360?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

zhangbutao updated HIVE-27360:
--
Description: 
If you create a managed iceberg table without specifying the location, and the 
database has both a location and a managed_location, the final iceberg table 
location will be on the database location instead of the managed_location. But 
you can see that the database managed_location also gets an iceberg table 
subdirectory, which remains even after the table is dropped.

We should ensure the managed iceberg table is always on the database 
managed_location when the managed_location exists. The direct and simple way 
is to use the created hms table location before committing the iceberg table, 
to avoid creating a new iceberg location.

 

Step to repro:

1. set location and managed location properties:

 
{code:java}
set hive.metastore.warehouse.dir=/user/hive/warehouse/hiveicetest;
set hive.metastore.warehouse.external.dir= 
/user/hive/warehouse/external/hiveicetest;
{code}
2. create a database with default location and managed_location:

 
{code:java}
create database testdb;{code}
 
{code:java}
desc database testdb;{code}
 
{code:java}
+--+--+++-+-+-++
| db_name  | comment  |                      location                      |    
              managedlocation                   | owner_name  | owner_type  | 
connector_name  | remote_dbname  |
+--+--+++-+-+-++
| testdb   |          | 
hdfs://ns/user/hive/warehouse/external/hiveicetest/testdb.db | 
hdfs://ns/user/hive/warehouse/hiveicetest/testdb.db | hive        | USER        
|                   |
+--+--+++-+-+-++
 {code}
 

 

3. create a managed iceberg table without specifying the table location:

{code:java}
// the table location will be on: 
hdfs://ns/user/hive/warehouse/external/hiveicetest/testdb.db/ice01
create table ice01 (id int) Stored by Iceberg stored as ORC;{code}
but here you will find two created locations:

{code:java}
hdfs://ns/user/hive/warehouse/external/hiveicetest/testdb.db/ice01   // the 
actual location which is used by the managed iceberg table
hdfs://ns/user/hive/warehouse/hiveicetest/testdb.db                  // an 
empty managed location which is unused
{code}

4. drop the iceberg table

you will find this unused managed location is still there:
{code:java}
hdfs://ns/user/hive/warehouse/hiveicetest/testdb.db{code}
 

 

 

 

 

  was:
If you create a managed iceberg table without specifying the location, and the 
database has both a location and a managed_location, the final iceberg table 
location will be on the database location instead of the managed_location. But 
you can see that the database managed_location also gets an iceberg table 
subdirectory, which remains even after the table is dropped.

We should ensure the managed iceberg table is always on the database 
managed_location when the managed_location exists. The direct and simple way 
is to use the created hms table location before committing the iceberg table, 
to avoid creating a new iceberg location.


> Iceberg: Don't create a new iceberg location if hms table already has a 
> default location 
> -
>
> Key: HIVE-27360
> URL: https://issues.apache.org/jira/browse/HIVE-27360
> Project: Hive
>  Issue Type: Improvement
>  Components: Iceberg integration
>Reporter: zhangbutao
>Assignee: zhangbutao
>Priority: Major
>
> If you create a managed iceberg table without specifying the location, and 
> the database has both a location and a managed_location, the final iceberg 
> table location will be on the database location instead of the 
> managed_location. But you can see that the database managed_location also 
> gets an iceberg table subdirectory, which remains even after the table is 
> dropped.
> We should ensure the managed iceberg table is always on the database 
> managed_location when the managed_location exists. The direct and simple way 
> is to use the created hms table location before committing the iceberg 
> table, to avoid creating a new iceberg location.
>  
> Step to repro:
> 1. set location and managed location properties:
>  
> {code:java}
> set hive.metastore.warehouse.dir=/user/hive/warehouse/hiveicetest;
> set hive.metastore.warehouse.external.dir= 
> /user/hive/w

[jira] [Updated] (HIVE-27360) Iceberg: Don't create a new iceberg location if hms table already has a default location

2023-05-18 Thread zhangbutao (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-27360?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

zhangbutao updated HIVE-27360:
--
Description: 
If you create a managed iceberg table without specifying the location, and the 
database has both a location and a managed_location, the final iceberg table 
location will be on the database location instead of the managed_location. But 
you can see that the database managed_location also gets an iceberg table 
subdirectory, which remains even after the table is dropped.

We should ensure the managed iceberg table is always on the database 
managed_location when the managed_location exists. The direct and simple way 
is to use the created hms table location before committing the iceberg table, 
to avoid creating a new iceberg location.

> Iceberg: Don't create a new iceberg location if hms table already has a 
> default location 
> -
>
> Key: HIVE-27360
> URL: https://issues.apache.org/jira/browse/HIVE-27360
> Project: Hive
>  Issue Type: Improvement
>  Components: Iceberg integration
>Reporter: zhangbutao
>Assignee: zhangbutao
>Priority: Major
>
> If you create a managed iceberg table without specifying the location, and 
> the database has both a location and a managed_location, the final iceberg 
> table location will be on the database location instead of the 
> managed_location. But you can see that the database managed_location also 
> gets an iceberg table subdirectory, which remains even after the table is 
> dropped.
> We should ensure the managed iceberg table is always on the database 
> managed_location when the managed_location exists. The direct and simple way 
> is to use the created hms table location before committing the iceberg 
> table, to avoid creating a new iceberg location.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Assigned] (HIVE-27360) Iceberg: Don't create a new iceberg location if hms table already has a default location

2023-05-18 Thread zhangbutao (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-27360?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

zhangbutao reassigned HIVE-27360:
-

Assignee: zhangbutao

> Iceberg: Don't create a new iceberg location if hms table already has a 
> default location 
> -
>
> Key: HIVE-27360
> URL: https://issues.apache.org/jira/browse/HIVE-27360
> Project: Hive
>  Issue Type: Improvement
>  Components: Iceberg integration
>Reporter: zhangbutao
>Assignee: zhangbutao
>Priority: Major
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (HIVE-27360) Iceberg: Don't create a new iceberg location if hms table already has a default location

2023-05-18 Thread zhangbutao (Jira)
zhangbutao created HIVE-27360:
-

 Summary: Iceberg: Don't create a new iceberg location if hms table 
already has a default location 
 Key: HIVE-27360
 URL: https://issues.apache.org/jira/browse/HIVE-27360
 Project: Hive
  Issue Type: Improvement
  Components: Iceberg integration
Reporter: zhangbutao






--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (HIVE-27317) Temporary (local) session files cleanup improvements

2023-05-04 Thread zhangbutao (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-27317?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17719385#comment-17719385
 ] 

zhangbutao commented on HIVE-27317:
---

Hi [~sercan.tekin], please create a GitHub pull request at 
[https://github.com/apache/hive/pulls], as patch review has not been used for 
a long time.

 

> Temporary (local) session files cleanup improvements
> 
>
> Key: HIVE-27317
> URL: https://issues.apache.org/jira/browse/HIVE-27317
> Project: Hive
>  Issue Type: Improvement
>Reporter: Sercan Tekin
>Assignee: Sercan Tekin
>Priority: Major
> Attachments: HIVE-27317.patch
>
>
> When a Hive session is killed, there is no chance for the shutdown hook to 
> clean up tmp files.
> There is a Hive service to clean residual files, 
> https://issues.apache.org/jira/browse/HIVE-13429, and later on its execution 
> was scheduled inside HS2, https://issues.apache.org/jira/browse/HIVE-15068, 
> to make sure not to leave any temp file behind. But this service cleans up 
> only HDFS temp files; there are still residual files/dirs in the 
> *HiveConf.ConfVars.LOCALSCRATCHDIR* location, as follows:
> {code:java}
> > ll /tmp/user/97c4ef50-5e80-480e-a6f0-4f779050852b*
> drwx-- 2 user user 4096 Oct 29 10:09 97c4ef50-5e80-480e-a6f0-4f779050852b
> -rw--- 1 user user    0 Oct 29 10:09 
> 97c4ef50-5e80-480e-a6f0-4f779050852b10571819313894728966.pipeout
> -rw--- 1 user user    0 Oct 29 10:09 
> 97c4ef50-5e80-480e-a6f0-4f779050852b16013956055489853961.pipeout
> -rw--- 1 user user    0 Oct 29 10:09 
> 97c4ef50-5e80-480e-a6f0-4f779050852b4383913570068173450.pipeout
> -rw--- 1 user user    0 Oct 29 10:09 
> 97c4ef50-5e80-480e-a6f0-4f779050852b889740171428672108.pipeout {code}
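As an illustration only, a sweep of the local scratch dir could look like the 
sketch below (assumed one-day retention; this is not the attached patch):
{code:java}
import java.io.File;
import java.util.concurrent.TimeUnit;
import org.apache.commons.io.FileUtils;
import org.apache.hadoop.hive.conf.HiveConf;

public class LocalScratchDirSweep {
  // Delete local session leftovers (session dirs and .pipeout files)
  // older than one day from HiveConf.ConfVars.LOCALSCRATCHDIR.
  public static void sweep(HiveConf conf) {
    File scratch = new File(HiveConf.getVar(conf, HiveConf.ConfVars.LOCALSCRATCHDIR));
    long cutoff = System.currentTimeMillis() - TimeUnit.DAYS.toMillis(1);
    File[] stale = scratch.listFiles(f -> f.lastModified() < cutoff);
    if (stale != null) {
      for (File f : stale) {
        FileUtils.deleteQuietly(f); // handles both files and directories
      }
    }
  }
}
{code}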



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Assigned] (HIVE-27302) Iceberg: Support write to iceberg branch

2023-04-27 Thread zhangbutao (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-27302?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

zhangbutao reassigned HIVE-27302:
-

Assignee: zhangbutao

> Iceberg: Support write to iceberg branch
> ---
>
> Key: HIVE-27302
> URL: https://issues.apache.org/jira/browse/HIVE-27302
> Project: Hive
>  Issue Type: Sub-task
>  Components: Iceberg integration
>Reporter: zhangbutao
>Assignee: zhangbutao
>Priority: Major
>
> This feature depends on the Iceberg 1.2.0 interface: 
> [https://github.com/apache/iceberg/pull/5234] 



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (HIVE-27302) Iceberg: Support write to iceberg branch

2023-04-27 Thread zhangbutao (Jira)
zhangbutao created HIVE-27302:
-

 Summary: Iceberg: Support write to iceberg branch
 Key: HIVE-27302
 URL: https://issues.apache.org/jira/browse/HIVE-27302
 Project: Hive
  Issue Type: Sub-task
  Components: Iceberg integration
Reporter: zhangbutao


This feature depends on the Iceberg 1.2.0 interface: 
[https://github.com/apache/iceberg/pull/5234] 
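For context, the branch-commit API from that PR is used roughly like this (a 
sketch; the `table` and `dataFile` variables are assumed to exist):
{code:java}
// Hypothetical usage of the Iceberg 1.2.x branch-commit API (apache/iceberg#5234):
table.newAppend()
    .appendFile(dataFile)      // a previously built DataFile (assumed)
    .toBranch("test_branch")   // commit to the branch instead of main
    .commit();
{code}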



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (HIVE-27273) Iceberg: Upgrade iceberg to 1.2.1

2023-04-22 Thread zhangbutao (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-27273?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17715386#comment-17715386
 ] 

zhangbutao commented on HIVE-27273:
---

PR https://github.com/apache/hive/pull/4252

> Iceberg:  Upgrade iceberg to 1.2.1
> --
>
> Key: HIVE-27273
> URL: https://issues.apache.org/jira/browse/HIVE-27273
> Project: Hive
>  Issue Type: Improvement
>  Components: Iceberg integration
>Reporter: zhangbutao
>Assignee: zhangbutao
>Priority: Major
>
> [https://iceberg.apache.org/releases/#121-release] Iceberg 1.2.1 (including 
> 1.2.0) has lots of improvements, e.g. _branch commit_ and the 
> _{{position_deletes}} metadata table_.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Assigned] (HIVE-27273) Iceberg: Upgrade iceberg to 1.2.1

2023-04-19 Thread zhangbutao (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-27273?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

zhangbutao reassigned HIVE-27273:
-

Assignee: zhangbutao

> Iceberg:  Upgrade iceberg to 1.2.1
> --
>
> Key: HIVE-27273
> URL: https://issues.apache.org/jira/browse/HIVE-27273
> Project: Hive
>  Issue Type: Improvement
>  Components: Iceberg integration
>Reporter: zhangbutao
>Assignee: zhangbutao
>Priority: Major
>
> [https://iceberg.apache.org/releases/#121-release] Iceberg 1.2.1 (including 
> 1.2.0) has lots of improvements, e.g. _branch commit_ and the 
> _{{position_deletes}} metadata table_.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (HIVE-27273) Iceberg: Upgrade iceberg to 1.2.1

2023-04-19 Thread zhangbutao (Jira)
zhangbutao created HIVE-27273:
-

 Summary: Iceberg:  Upgrade iceberg to 1.2.1
 Key: HIVE-27273
 URL: https://issues.apache.org/jira/browse/HIVE-27273
 Project: Hive
  Issue Type: Improvement
  Components: Iceberg integration
Reporter: zhangbutao


[https://iceberg.apache.org/releases/#121-release] Iceberg 1.2.1 (including 
1.2.0) has lots of improvements, e.g. _branch commit_ and the 
_{{position_deletes}} metadata table_.
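For example, the new metadata table can be queried roughly like this 
(illustrative Spark SQL; the table name is an assumption):
{code:java}
-- Hypothetical query against the position_deletes metadata table (Spark SQL):
SELECT file_path, pos
FROM testdb.ice01.position_deletes;
{code}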



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

