[jira] [Comment Edited] (HIVE-27901) Hive's performance for querying the Iceberg table is very poor.

2024-01-11 Thread yongzhi.shao (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-27901?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17805914#comment-17805914
 ] 

yongzhi.shao edited comment on HIVE-27901 at 1/12/24 7:29 AM:
--

[~zhangbutao] ;

Hi. I set read.split.target-size=67108864, but it only improved things a 
little: Hive still takes more than 6 times as long as Spark to read the same 
table.

From the logs printed by TEZ-CONSOLE, the Map tasks take quite a long time, 
which suggests a serious efficiency problem in how Hive reads Iceberg data.
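
For context on the setting mentioned above: 67108864 bytes is 64 MiB, half of 
Iceberg's documented default of 134217728 bytes (128 MiB) for 
read.split.target-size. The sketch below is only rough arithmetic showing how 
the target size relates to the number of planned splits. The 100 GiB data 
volume and the helper name approx_split_count are assumptions made for 
illustration, and the estimate ignores how Iceberg packs small files into 
splits.

```python
import math

MIB = 1024 * 1024


def approx_split_count(total_bytes: int, target_split_bytes: int) -> int:
    """Rough upper-level estimate: one split per target-sized chunk of data."""
    return math.ceil(total_bytes / target_split_bytes)


total_bytes = 100 * 1024 * MIB   # hypothetical 100 GiB of table data
default_target = 128 * MIB       # Iceberg default for read.split.target-size
reduced_target = 67108864        # value used in the comment (64 MiB)

print(approx_split_count(total_bytes, default_target))  # 800
print(approx_split_count(total_bytes, reduced_target))  # 1600
```

Halving the target size roughly doubles the number of splits, which adds 
parallelism but does not make each individual map task read its data any 
faster.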


was (Author: lisoda):
[~zhangbutao] ;

Hi. I reset read.split.target-size=67108864. But it's improved a little bit 
only, Hive still takes more than 6 times as long as Spark to read the same 
table.

> Hive's performance for querying the Iceberg table is very poor.
> ---
>
>                 Key: HIVE-27901
>                 URL: https://issues.apache.org/jira/browse/HIVE-27901
>             Project: Hive
>          Issue Type: Improvement
>          Components: Iceberg integration
>    Affects Versions: 4.0.0-beta-1
>            Reporter: yongzhi.shao
>            Priority: Major
>         Attachments: image-2023-11-22-18-32-28-344.png, 
>                      image-2023-11-22-18-33-01-885.png, image-2023-11-22-18-33-32-915.png
>
>
> I am using Hive 4.0.0-beta for testing.
> I found that Hive is still very slow when reading Iceberg tables.
> How should I deal with this problem?
> I ran a count on a table with 7 billion rows and compared Hive's performance 
> when reading the Iceberg-ORC table and the plain ORC table.
> We use Iceberg 1.4.2; the Iceberg-ORC table has ZSTD compression enabled.
> The plain ORC table uses SNAPPY compression.
> The Hadoop version is 3.1.1 (native zstd not supported).
> {code:java}
> --spark3.4.1+iceberg 1.4.2
> CREATE TABLE datacenter.dwd.b_std_trade (
>   uni_order_id STRING,
>   data_from BIGINT,
>   partner STRING,
>   plat_code STRING,
>   order_id STRING,
>   uni_shop_id STRING,
>   uni_id STRING,
>   guide_id STRING,
>   shop_id STRING,
>   plat_account STRING,
>   total_fee DOUBLE,
>   item_discount_fee DOUBLE,
>   trade_discount_fee DOUBLE,
>   adjust_fee DOUBLE,
>   post_fee DOUBLE,
>   discount_rate DOUBLE,
>   payment_no_postfee DOUBLE,
>   payment DOUBLE,
>   pay_time STRING,
>   product_num BIGINT,
>   order_status STRING,
>   is_refund STRING,
>   refund_fee DOUBLE,
>   insert_time STRING,
>   created STRING,
>   endtime STRING,
>   modified STRING,
>   trade_type STRING,
>   receiver_name STRING,
>   receiver_country STRING,
>   receiver_state STRING,
>   receiver_city STRING,
>   receiver_district STRING,
>   receiver_town STRING,
>   receiver_address STRING,
>   receiver_mobile STRING,
>   trade_source STRING,
>   delivery_type STRING,
>   consign_time STRING,
>   orders_num BIGINT,
>   is_presale BIGINT,
>   presale_status STRING,
>   first_fee_paytime STRING,
>   last_fee_paytime STRING,
>   first_paid_fee DOUBLE,
>   tenant STRING,
>   tidb_modified STRING,
>   step_paid_fee DOUBLE,
>   seller_flag STRING,
>   is_used_store_card BIGINT,
>   store_card_used DOUBLE,
>   store_card_basic_used DOUBLE,
>   store_card_expand_used DOUBLE,
>   order_promotion_num BIGINT,
>   item_promotion_num BIGINT,
>   buyer_remark STRING,
>   seller_remark STRING,
>   trade_business_type STRING)
> USING iceberg
> PARTITIONED BY (uni_shop_id, truncate(4, created))
> LOCATION '/iceberg-catalog/warehouse/dwd/b_std_trade'
> TBLPROPERTIES (
>   'current-snapshot-id' = '7217819472703702905',
>   'format' = 'iceberg/orc',
>   'format-version' = '1',
>   'hive.stored-as' = 'iceberg',
>   'read.orc.vectorization.enabled' = 'true',
>   'sort-order' = 'uni_shop_id ASC NULLS FIRST, created ASC NULLS FIRST',
>   'write.distribution-mode' = 'hash',
>   'write.format.default' = 'orc',
>   'write.metadata.delete-after-commit.enabled' = 'true',
>   'write.metadata.previous-versions-max' = '3',
>   'write.orc.bloom.filter.columns' = 'order_id',
>   'write.orc.compression-codec' = 'zstd')
> --hive-iceberg
> CREATE EXTERNAL TABLE iceberg_dwd.b_std_trade 
> STORED BY 'org.apache.iceberg.mr.hive.HiveIcebergStorageHandler' 
> LOCATION 'hdfs:///iceberg-catalog/warehouse/dwd/b_std_trade'
> TBLPROPERTIES 
> ('iceberg.catalog'='location_based_table','engine.hive.enabled'='true'); 
> --inner orc table( set hive default format = orc )
> set hive.default.fileformat=orc;
> set hive.default.fileformat.managed=orc;
> create table if not exists iceberg_dwd.orc_inner_table as select * from 
> iceberg_dwd.b_std_trade;{code}
>  
> !image-2023-11-22-18-32-28-344.png!
> !image-2023-11-22-18-33-01-885.png!
> Also, I have another question. The Submit Plan statistic is clearly 
> incorrect. Is this something that needs to be fixed?
> !image-2023-11-22-18-33-32-915.png!
>  



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (HIVE-27901) Hive's performance for querying the Iceberg table is very poor.

2024-01-11 Thread yongzhi.shao (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-27901?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17805917#comment-17805917
 ] 

yongzhi.shao commented on HIVE-27901:
-

Can you do a one-time validation using the official release 4.0.0-beta1? (using 
hadoop-location_based_table).

Since version 4.0.0 has not yet been released, I'm not sure if this has been 
improved in version 4.0.0.



[jira] [Comment Edited] (HIVE-27901) Hive's performance for querying the Iceberg table is very poor.

2024-01-11 Thread yongzhi.shao (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-27901?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17805914#comment-17805914
 ] 

yongzhi.shao edited comment on HIVE-27901 at 1/12/24 7:25 AM:
--

[~zhangbutao] ;

Hi. I set read.split.target-size=67108864, but it only improved things a 
little: Hive still takes more than 6 times as long as Spark to read the same 
table.


was (Author: lisoda):
[~zhangbutao] ;

Hi. I set read.split.target-size=67108864. It improved a bit, but Hive 
still takes more than 6 times as long as Spark to read the same table.



[jira] [Commented] (HIVE-27901) Hive's performance for querying the Iceberg table is very poor.

2024-01-11 Thread yongzhi.shao (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-27901?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17805914#comment-17805914
 ] 

yongzhi.shao commented on HIVE-27901:
-

[~zhangbutao] ;

Hi. I set read.split.target-size=67108864. It improved a bit, but Hive 
still takes more than 6 times as long as Spark to read the same table.



[jira] [Commented] (HIVE-27370) SUBSTR UDF return '?' against 4-bytes character

2024-01-11 Thread Ryu Kobayashi (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-27370?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17805902#comment-17805902
 ] 

Ryu Kobayashi commented on HIVE-27370:
--

This issue has not been resolved yet. I have recreated the PR.

> SUBSTR UDF return '?' against 4-bytes character
> ---
>
>                 Key: HIVE-27370
>                 URL: https://issues.apache.org/jira/browse/HIVE-27370
>             Project: Hive
>          Issue Type: Bug
>          Components: UDF
>    Affects Versions: All Versions
>            Reporter: Ryu Kobayashi
>            Assignee: Ryu Kobayashi
>            Priority: Major
>              Labels: pull-request-available
>
> SUBSTR doesn't seem to support 4-byte characters. This also happens on the 
> master branch. It does not happen in vectorized mode, so the problem is 
> specific to non-vectorized mode. An example is below:
> {code:java}
> -- vectorized mode
> create temporary table foo (str string) stored as orc;
> insert into foo values('安佐町大字久地字野𨵱4614番地'), ('あa🤎いiうu');
> SELECT
>   SUBSTR(str, 1, 10) as a1,
>   SUBSTR(str, 10, 3) as a2,
>   SUBSTR(str, -7) as a3,
>   substr(str, 1, 3) as b1,
>   substr(str, 3) as b2,
>   substr(str, -5) as b3
> from foo
> ;
> 安佐町大字久地字野𨵱  𨵱4614番地  安佐町       町大字久地字野𨵱4614番地     614番地
> あa🤎             あa🤎いiうu        あa🤎        🤎いiうu    🤎いiうu {code}
> {code:java}
> -- non-vectorized
> SELECT
>   SUBSTR('安佐町大字久地字野𨵱4614番地', 1, 10) as a1,
>   SUBSTR('安佐町大字久地字野𨵱4614番地', 10, 3) as a2,
>   SUBSTR('安佐町大字久地字野𨵱4614番地', -7) as a3,
>   substr('あa🤎いiうu', 1, 3) as b1,
>   substr('あa🤎いiうu', 3) as b2,
>   substr('あa🤎いiうu', -5) as b3
> ; 
> 安佐町大字久地字野?    �4   ?4614番地     あa?   �いiうu    ?いiうu{code}
>  
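
The '?' output above has the general shape of a surrogate pair being cut in 
half. The sketch below illustrates that class of problem only: a substring 
counted in UTF-16 code units can split a 4-byte character, while one counted 
in code points cannot. Whether Hive's non-vectorized SUBSTR actually indexes 
this way is an assumption; the issue does not say. The helper names are 
hypothetical and only positive 1-based start positions are handled.

```python
def to_utf16_units(text):
    """Split a string into 16-bit UTF-16 code units (2 bytes each)."""
    raw = text.encode("utf-16-le")
    return [raw[i:i + 2] for i in range(0, len(raw), 2)]


def substr_by_code_units(text, start, length):
    """1-based substring counted in UTF-16 code units; may split a pair."""
    units = to_utf16_units(text)[start - 1:start - 1 + length]
    # A lone surrogate cannot be decoded and becomes a replacement character.
    return b"".join(units).decode("utf-16-le", errors="replace")


def substr_by_code_points(text, start, length):
    """1-based substring counted in Unicode code points (Python's native unit)."""
    return text[start - 1:start - 1 + length]


sample = "あa🤎いiうu"
by_points = substr_by_code_points(sample, 1, 3)
by_units = substr_by_code_units(sample, 1, 3)

print(by_points)  # keeps the 4-byte character intact
print(by_units)   # the third unit is only half of the 4-byte character
```

Counting in code points is what the vectorized output in the issue looks 
like; counting in 16-bit units reproduces a broken trailing character.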





[jira] [Updated] (HIVE-27997) Incorrect result for Hive join query with NVL and Map Join

2024-01-11 Thread Mergen (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-27997?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mergen updated HIVE-27997:
--
Description: 
Hive returns an incorrect result when NVL() is used in an ON clause with Map 
Join enabled.

 

STEPS TO REPRODUCE:
{code:java}
-- Step 1: Create a table test_nvl
create table test_nvl(a string);

-- Step 2: Insert null and non-null data into table test_nvl
insert into test_nvl values ('x'), ('y'), (null);
select * from test_nvl;
+-------------+
| test_nvl.a  |
+-------------+
| x           |
| y           |
| NULL        |
+-------------+

-- Step 3: Execute the following query
select x.a, y.a
from test_nvl x
left join test_nvl y
on nvl(x.a, '') = nvl(y.a, '');{code}
 

EXPECTED RESULT:
{code:java}
+-------+-------+
| x.a   | y.a   |
+-------+-------+
| x     | x     |
| y     | y     |
| NULL  | NULL  |
+-------+-------+ {code}
 

ACTUAL RESULT:
{code:java}
+-------+------+
| x.a   | y.a  |
+-------+------+
| x     | x    |
| y     | x    |
| NULL  | x    |
+-------+------+{code}
(Clearly 'y' != 'x' and NULL != 'x', so they should not appear in the same row.)

 

The query works fine with Map Join disabled:
{code:java}
-- Using Merge Join instead.
set hive.auto.convert.join=false;
select x.a, y.a
from test_nvl x
left join test_nvl y
on nvl(x.a, '') = nvl(y.a, '');
+-------+-------+
| x.a   | y.a   |
+-------+-------+
| NULL  | NULL  |
| x     | x     |
| y     | y     |
+-------+-------+ {code}
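
The expected result can be checked against a small reference model of the 
join. This is a minimal sketch of LEFT JOIN semantics with the 
nvl(x.a, '') = nvl(y.a, '') condition, written as a plain nested loop. It is 
not Hive's Map Join implementation, and the function names are hypothetical.

```python
def nvl(value, default):
    """Return default when value is NULL (None), otherwise value."""
    return default if value is None else value


def left_join_on_nvl(left, right):
    """Nested-loop LEFT JOIN on nvl(x, '') = nvl(y, '')."""
    rows = []
    for x in left:
        matches = [y for y in right if nvl(x, "") == nvl(y, "")]
        if matches:
            rows.extend((x, y) for y in matches)
        else:
            # No match on the right side: keep the left row, pad with NULL.
            rows.append((x, None))
    return rows


test_nvl = ["x", "y", None]
result = left_join_on_nvl(test_nvl, test_nvl)
print(result)  # [('x', 'x'), ('y', 'y'), (None, None)]
```

Each left row pairs only with the right row that has the same value after 
NVL, which matches the EXPECTED RESULT and the Merge Join output.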
 



> Incorrect result for Hive join query with NVL and Map Join
> --
>
>                 Key: HIVE-27997
>                 URL: https://issues.apache.org/jira/browse/HIVE-27997
>             Project: Hive
>          Issue Type: Bug
>          Components: Operators
>    Affects Versions: 3.1.3
>            Reporter: Mergen
>            Priority: Major
>


[jira] [Created] (HIVE-27997) Incorrect result for Hive join query with NVL and Map Join

2024-01-11 Thread Mergen (Jira)
Mergen created HIVE-27997:
-

 Summary: Incorrect result for Hive join query with NVL and Map Join
 Key: HIVE-27997
 URL: https://issues.apache.org/jira/browse/HIVE-27997
 Project: Hive
  Issue Type: Bug
  Components: Operators
Affects Versions: 3.1.3
Reporter: Mergen


Hive returns an incorrect result when NVL() is used in an ON clause with Map 
Join enabled.

 

STEPS TO REPRODUCE:

 
{code:java}
-- Step 1: Create a table test_nvl
create table test_nvl(a string);

-- Step 2: Insert null and non-null data into table test_nvl
insert into test_nvl values ('x'), ('y'), (null);
select * from test_nvl;
+-------------+
| test_nvl.a  |
+-------------+
| x           |
| y           |
| NULL        |
+-------------+

-- Step 3: Execute the following query
select x.a, y.a
from test_nvl x
left join test_nvl y
on nvl(x.a, '') = nvl(y.a, '');{code}
 

EXPECTED RESULT:
{code:java}
+-------+-------+
| x.a   | y.a   |
+-------+-------+
| x     | x     |
| y     | y     |
| NULL  | NULL  |
+-------+-------+ {code}
 

ACTUAL RESULT:
{code:java}
+-------+------+
| x.a   | y.a  |
+-------+------+
| x     | x    |
| y     | x    |
| NULL  | x    |
+-------+------+{code}
(Clearly 'y' != 'x' and NULL != 'x', so they should not appear in the same row.)

 

The query works fine with Map Join disabled:

 
{code:java}
-- Using Merge Join instead.
set hive.auto.convert.join=false;
select x.a, y.a
from test_nvl x
left join test_nvl y
on nvl(x.a, '') = nvl(y.a, '');
+-------+-------+
| x.a   | y.a   |
+-------+-------+
| NULL  | NULL  |
| x     | x     |
| y     | y     |
+-------+-------+ {code}
 





[jira] [Commented] (HIVE-27992) Upgrade to tez 0.10.3

2024-01-11 Thread Jira


[ 
https://issues.apache.org/jira/browse/HIVE-27992?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17805794#comment-17805794
 ] 

László Bodor commented on HIVE-27992:
-

http://ci.hive.apache.org/blue/organizations/jenkins/hive-precommit/detail/PR-4991/8
 looks good (only an unrelated silly error), so the used artifacts are ready to 
be promoted as a tez release candidate 
https://repository.apache.org/content/repositories/orgapachetez-1078/org/apache/tez/

> Upgrade to tez 0.10.3
> -
>
>                 Key: HIVE-27992
>                 URL: https://issues.apache.org/jira/browse/HIVE-27992
>             Project: Hive
>          Issue Type: Improvement
>            Reporter: László Bodor
>            Assignee: László Bodor
>            Priority: Major
>              Labels: pull-request-available
>






[jira] [Updated] (HIVE-27996) Revert HIVE-27406 & HIVE-27481

2024-01-11 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-27996?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated HIVE-27996:
--
Labels: pull-request-available  (was: )

> Revert HIVE-27406 & HIVE-27481
> --
>
>                 Key: HIVE-27996
>                 URL: https://issues.apache.org/jira/browse/HIVE-27996
>             Project: Hive
>          Issue Type: Task
>            Reporter: László Végh
>            Priority: Major
>              Labels: pull-request-available
>
> Revert HIVE-27406 & HIVE-27481
>  
> The introduced changes were causing DB incompatibility issues.





[jira] [Created] (HIVE-27996) Revert HIVE-27406 & HIVE-27481

2024-01-11 Thread Jira
László Végh created HIVE-27996:
--

 Summary: Revert HIVE-27406 & HIVE-27481
 Key: HIVE-27996
 URL: https://issues.apache.org/jira/browse/HIVE-27996
 Project: Hive
  Issue Type: Task
Reporter: László Végh


Revert HIVE-27406 & HIVE-27481

 

The introduced changes were causing DB incompatibility issues.





[jira] [Assigned] (HIVE-27972) Set 'tez' as default value in hive.execution.engine

2024-01-11 Thread Jira


 [ 
https://issues.apache.org/jira/browse/HIVE-27972?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

László Bodor reassigned HIVE-27972:
---

Assignee: László Bodor

> Set 'tez' as default value in hive.execution.engine
> ---
>
>                 Key: HIVE-27972
>                 URL: https://issues.apache.org/jira/browse/HIVE-27972
>             Project: Hive
>          Issue Type: Improvement
>            Reporter: László Bodor
>            Assignee: László Bodor
>            Priority: Major
>              Labels: pull-request-available
>
> Maybe this is not the first ticket addressing this, please link if it's a 
> duplicate.
> We need to set this to 'tez' to reflect that we have deprecated 'mr':
> https://github.com/apache/hive/blob/bd16e0098916aa5fc2dede99492c6a240b51e677/common/src/java/org/apache/hadoop/hive/conf/HiveConf.java#L4567
> I'm expecting lots of UT failures because of this, as we're still running 
> those on mr (which might be fine where the actual unit test is not closely 
> related to the execution engine), so we'll see what to do.





[jira] [Updated] (HIVE-27995) Fix inconsistent behavior of LOAD DATA command for partitioned and non-partitioned tables

2024-01-11 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-27995?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated HIVE-27995:
--
Labels: pull-request-available  (was: )

> Fix inconsistent behavior of LOAD DATA command for partitioned and 
> non-partitioned tables
> -
>
>                 Key: HIVE-27995
>                 URL: https://issues.apache.org/jira/browse/HIVE-27995
>             Project: Hive
>          Issue Type: Bug
>            Reporter: Shivangi Jha
>            Assignee: Shivangi Jha
>            Priority: Major
>              Labels: pull-request-available
>
> For partitioned tables, the file-existence check is not performed on 
> HiveServer2 when executing LOAD DATA / LOAD DATA LOCAL commands, so the 
> error is only thrown at runtime, once the job has been launched. 
> {code:java}
> java.io.FileNotFoundException: File file:/ does not exist.{code}
> Non-partitioned tables do not follow this control flow, and the checks are 
> run appropriately at compile time. 
> In case the file does not exist, the user is presented with the error:
> {code:java}
> Invalid path "file:///: No files matching path 
> file:/{code}
> This is inconsistent and error-prone behavior.
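
The consistent behavior asked for here amounts to validating the source path 
before any job is launched, regardless of the kind of target table. The 
sketch below shows that fail-fast pattern in a generic form. It is not Hive 
code: validate_load_path and InvalidPathError are hypothetical names, and the 
local filesystem stands in for whatever storage the command reads from.

```python
import glob
import os
import tempfile


class InvalidPathError(Exception):
    """Raised up front, before any job is launched, when nothing matches."""


def validate_load_path(pattern):
    """Return the files matching pattern, or fail fast if there are none."""
    matches = sorted(p for p in glob.glob(pattern) if os.path.isfile(p))
    if not matches:
        raise InvalidPathError(f"No files matching path {pattern}")
    return matches


with tempfile.TemporaryDirectory() as tmp:
    present = os.path.join(tmp, "data.csv")
    with open(present, "w") as handle:
        handle.write("x\n")

    # Existing file: validation passes and the load could proceed.
    found = validate_load_path(present)

    # Missing file: the error surfaces immediately, not after job launch.
    try:
        validate_load_path(os.path.join(tmp, "missing.csv"))
        failed_fast = False
    except InvalidPathError:
        failed_fast = True
```

Running the same check for partitioned and non-partitioned targets gives the 
user one error message at one point in time.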





[jira] [Updated] (HIVE-27995) Fix inconsistent behavior of LOAD DATA command for partitioned and non-partitioned tables

2024-01-11 Thread Shivangi Jha (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-27995?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shivangi Jha updated HIVE-27995:

Summary: Fix inconsistent behavior of LOAD DATA command for partitioned and 
non-partitioned tables  (was: FIx inconsistent behavior of LOAD DATA command 
for partitioned and non-partitioned tables.)

> Fix inconsistent behavior of LOAD DATA command for partitioned and 
> non-partitioned tables
> -
>
> Key: HIVE-27995
> URL: https://issues.apache.org/jira/browse/HIVE-27995
> Project: Hive
>  Issue Type: Bug
>Reporter: Shivangi Jha
>Assignee: Shivangi Jha
>Priority: Major
>
> For partitioned tables, while executing LOAD DATA / LOAD DATA LOCAL commands,
> the check for file existence is not executed on HiveServer2, which in turn
> causes an error at runtime once the job is launched.
> {code:java}
> java.io.FileNotFoundException: File file:/ does not exist.{code}
> Non-partitioned tables do not follow this control flow, and the checks are
> run appropriately at compile time.
> In case the file does not exist, the user is presented with the error:
> {code:java}
> Invalid path "file:///: No files matching path 
> file:/{code}
> This is inconsistent and error-prone behavior.





[jira] [Created] (HIVE-27995) FIx inconsistent behavior of LOAD DATA command for partitioned and non-partitioned tables.

2024-01-11 Thread Shivangi Jha (Jira)
Shivangi Jha created HIVE-27995:
---

 Summary: FIx inconsistent behavior of LOAD DATA command for 
partitioned and non-partitioned tables.
 Key: HIVE-27995
 URL: https://issues.apache.org/jira/browse/HIVE-27995
 Project: Hive
  Issue Type: Bug
Reporter: Shivangi Jha
Assignee: Shivangi Jha


For partitioned tables, while executing LOAD DATA / LOAD DATA LOCAL commands,
the check for file existence is not executed on HiveServer2, which in turn
causes an error at runtime once the job is launched.
{code:java}
java.io.FileNotFoundException: File file:/ does not exist.{code}

Non-partitioned tables do not follow this control flow, and the checks are run
appropriately at compile time.

In case the file does not exist, the user is presented with the error:
{code:java}
Invalid path "file:///: No files matching path file:/{code}
This is inconsistent and error-prone behavior.
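
As a rough illustration of the consistent behavior this ticket asks for, the Java sketch below runs the same existence check up front, before any job would be launched, regardless of whether the target table is partitioned. The class and method names are hypothetical, and it uses java.nio.file instead of the Hadoop FileSystem API that Hive works with, so it shows the idea only and is not the actual patch.

```java
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

// Hypothetical helper, not part of Hive: fail at "compile time" for every
// LOAD DATA statement instead of letting the job fail later at runtime.
class LoadPathCheckSketch {

    /** Throws if the source path does not exist; otherwise returns normally. */
    static void validateSourceExists(Path source) {
        if (!Files.exists(source)) {
            // Message modeled on the non-partitioned error quoted in the ticket.
            throw new IllegalArgumentException(
                "Invalid path \"" + source + "\": No files matching path " + source);
        }
    }

    public static void main(String[] args) {
        Path source = Paths.get(args.length > 0 ? args[0] : ".");
        // The same call would be made for partitioned and non-partitioned targets.
        validateSourceExists(source);
        System.out.println("Source exists: " + source.toAbsolutePath());
    }
}
```

The point of the sketch is only that the check does not branch on the table type, so both code paths report a missing file in the same way and at the same stage.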





[jira] [Comment Edited] (HIVE-27992) Upgrade to tez 0.10.3

2024-01-11 Thread Jira


[ 
https://issues.apache.org/jira/browse/HIVE-27992?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17805481#comment-17805481
 ] 

László Bodor edited comment on HIVE-27992 at 1/11/24 11:00 AM:
---

testing artifacts, initial issues discovered:
1. some TestNegativeLlapLocalCliDriver failures - UPDATE: TEZ-4506, can be 
masked 
http://ci.hive.apache.org/blue/organizations/jenkins/hive-precommit/detail/PR-4991/7/tests
2. long tez perf cli driver run, need to check older runs for a baseline (~3000s = 50m)
{code}

[2024-01-11T09:14:32.488Z] [INFO] Running 
org.apache.hadoop.hive.cli.TestTezTPCDS30TBPerfCliDriver

[2024-01-11T10:03:28.803Z] [WARNING] Tests run: 200, Failures: 0, Errors: 0, 
Skipped: 4, Time elapsed: 2,932.865 s - in 
org.apache.hadoop.hive.cli.TestTezTPCDS30TBPerfCliDriver
{code}
UPDATE: it's not much worse than before:
{code}
#EARLIER
wget -nv -O - http://ci.hive.apache.org/job/hive-precommit/job/PR-4977/3/testReport/api/json > check-pr.json; \
  jq --arg testname "TestTezTPCDS30TBPerfCliDriver" \
  '.suites[] | select(.name | contains($testname)) | {test:.name , time:.duration} ' check-pr.json
2024-01-11 11:58:42 URL:http://ci.hive.apache.org/job/hive-precommit/job/PR-4977/3/testReport/api/json [37554552] -> "-" [1]
{
  "test": "org.apache.hadoop.hive.cli.TestTezTPCDS30TBPerfCliDriver",
  "time": 2802.562
}

#NOW
wget -nv -O - http://ci.hive.apache.org/job/hive-precommit/job/PR-4991/7/testReport/api/json > check-pr.json; \
  jq --arg testname "TestTezTPCDS30TBPerfCliDriver" \
  '.suites[] | select(.name | contains($testname)) | {test:.name , time:.duration} ' check-pr.json
2024-01-11 11:59:12 URL:http://ci.hive.apache.org/job/hive-precommit/job/PR-4991/7/testReport/api/json [38046027] -> "-" [1]
{
  "test": "org.apache.hadoop.hive.cli.TestTezTPCDS30TBPerfCliDriver",
  "time": 2932.865
}
{code}
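
To put "not much worse" into numbers: the two durations reported above differ by 2932.865 - 2802.562 = 130.303 seconds, which is roughly a 4.6% increase over the earlier run. A minimal Java sketch of that arithmetic follows; the class and method names are made up for illustration.

```java
// Hypothetical helper for comparing two test-suite durations.
class DurationDeltaSketch {

    /** Relative change of current versus baseline, in percent. */
    static double percentChange(double baselineSeconds, double currentSeconds) {
        return (currentSeconds - baselineSeconds) / baselineSeconds * 100.0;
    }

    public static void main(String[] args) {
        double earlier = 2802.562; // PR-4977, run 3 (value from the jq output above)
        double now = 2932.865;     // PR-4991, run 7 (value from the jq output above)
        System.out.printf("delta = %.3f s, change = %.2f%%%n",
            now - earlier, percentChange(earlier, now));
    }
}
```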


was (Author: abstractdog):
testing artifacts, initial issues discovered:
1. some TestNegativeLlapLocalCliDriver failures - UPDATE: TEZ-4506, can be 
masked 
http://ci.hive.apache.org/blue/organizations/jenkins/hive-precommit/detail/PR-4991/7/tests
2. long tez perf cli driver run, need to check older runs for a baseline (~3000s = 50m)
{code}

[2024-01-11T09:14:32.488Z] [INFO] Running 
org.apache.hadoop.hive.cli.TestTezTPCDS30TBPerfCliDriver

[2024-01-11T10:03:28.803Z] [WARNING] Tests run: 200, Failures: 0, Errors: 0, 
Skipped: 4, Time elapsed: 2,932.865 s - in 
org.apache.hadoop.hive.cli.TestTezTPCDS30TBPerfCliDriver
{code}

> Upgrade to tez 0.10.3
> -
>
> Key: HIVE-27992
> URL: https://issues.apache.org/jira/browse/HIVE-27992
> Project: Hive
>  Issue Type: Improvement
>Reporter: László Bodor
>Assignee: László Bodor
>Priority: Major
>  Labels: pull-request-available
>






[jira] [Comment Edited] (HIVE-27992) Upgrade to tez 0.10.3

2024-01-11 Thread Jira


[ 
https://issues.apache.org/jira/browse/HIVE-27992?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17805481#comment-17805481
 ] 

László Bodor edited comment on HIVE-27992 at 1/11/24 10:53 AM:
---

testing artifacts, initial issues discovered:
1. some TestNegativeLlapLocalCliDriver failures - UPDATE: TEZ-4506, can be 
masked 
http://ci.hive.apache.org/blue/organizations/jenkins/hive-precommit/detail/PR-4991/7/tests
2. long tez perf cli driver run, need to check older runs for a baseline (~3000s = 50m)
{code}

[2024-01-11T09:14:32.488Z] [INFO] Running 
org.apache.hadoop.hive.cli.TestTezTPCDS30TBPerfCliDriver

[2024-01-11T10:03:28.803Z] [WARNING] Tests run: 200, Failures: 0, Errors: 0, 
Skipped: 4, Time elapsed: 2,932.865 s - in 
org.apache.hadoop.hive.cli.TestTezTPCDS30TBPerfCliDriver
{code}


was (Author: abstractdog):
testing artifacts, initial issues discovered:
1. some TestNegativeLlapLocalCliDriver failures 
http://ci.hive.apache.org/blue/organizations/jenkins/hive-precommit/detail/PR-4991/7/tests
2. long tez perf cli driver run, need to check older runs for a baseline (~3000s = 50m)
{code}

[2024-01-11T09:14:32.488Z] [INFO] Running 
org.apache.hadoop.hive.cli.TestTezTPCDS30TBPerfCliDriver

[2024-01-11T10:03:28.803Z] [WARNING] Tests run: 200, Failures: 0, Errors: 0, 
Skipped: 4, Time elapsed: 2,932.865 s - in 
org.apache.hadoop.hive.cli.TestTezTPCDS30TBPerfCliDriver
{code}

> Upgrade to tez 0.10.3
> -
>
> Key: HIVE-27992
> URL: https://issues.apache.org/jira/browse/HIVE-27992
> Project: Hive
>  Issue Type: Improvement
>Reporter: László Bodor
>Assignee: László Bodor
>Priority: Major
>  Labels: pull-request-available
>






[jira] [Commented] (HIVE-27992) Upgrade to tez 0.10.3

2024-01-11 Thread Jira


[ 
https://issues.apache.org/jira/browse/HIVE-27992?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17805481#comment-17805481
 ] 

László Bodor commented on HIVE-27992:
-

testing artifacts, initial issues discovered:
1. some TestNegativeLlapLocalCliDriver failures 
http://ci.hive.apache.org/blue/organizations/jenkins/hive-precommit/detail/PR-4991/7/tests
2. long tez perf cli driver run, need to check older runs for a baseline (~3000s = 50m)
{code}

[2024-01-11T09:14:32.488Z] [INFO] Running 
org.apache.hadoop.hive.cli.TestTezTPCDS30TBPerfCliDriver

[2024-01-11T10:03:28.803Z] [WARNING] Tests run: 200, Failures: 0, Errors: 0, 
Skipped: 4, Time elapsed: 2,932.865 s - in 
org.apache.hadoop.hive.cli.TestTezTPCDS30TBPerfCliDriver
{code}

> Upgrade to tez 0.10.3
> -
>
> Key: HIVE-27992
> URL: https://issues.apache.org/jira/browse/HIVE-27992
> Project: Hive
>  Issue Type: Improvement
>Reporter: László Bodor
>Assignee: László Bodor
>Priority: Major
>  Labels: pull-request-available
>






[jira] [Commented] (HIVE-27977) Fix ordering flakiness in TestHplSqlViaBeeLine

2024-01-11 Thread Jira


[ 
https://issues.apache.org/jira/browse/HIVE-27977?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17805473#comment-17805473
 ] 

László Bodor commented on HIVE-27977:
-

merged to master, thanks [~zhangbutao] and [~ayushtkn] for the reviews!

> Fix ordering flakiness in TestHplSqlViaBeeLine
> --
>
> Key: HIVE-27977
> URL: https://issues.apache.org/jira/browse/HIVE-27977
> Project: Hive
>  Issue Type: Improvement
>Reporter: László Bodor
>Assignee: László Bodor
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>
> like:
> {code}
> Output: '+----------------+
> |      _c0       |
> +----------------+
> | Hello Smith!   |
> | Hello Sachin!  |
> +----------------+
> ' should match Hello Sachin!.*Hello Smith!
> {code}
> I found this flakiness after backporting a related patch (HIVE-24730) to
> downstream repos. I'm not sure why it isn't flaky upstream; however,
> selecting records without an ORDER BY is not deterministic by design, so it's
> worth taking care of this.
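
One way to make such an assertion independent of row order, sketched in Java under the assumption that the test only cares which rows come back and not their sequence, is to sort both sides before comparing. The class and method names are hypothetical, and the merged fix may well have taken a different route (for example, adding an ORDER BY to the query).

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

// Hypothetical test helper, not taken from TestHplSqlViaBeeLine.
class UnorderedRowsSketch {

    /** True if both lists contain the same rows with the same multiplicity. */
    static boolean sameRowsIgnoringOrder(List<String> expected, List<String> actual) {
        // Copy first so the caller's lists are left untouched.
        List<String> e = new ArrayList<>(expected);
        List<String> a = new ArrayList<>(actual);
        Collections.sort(e);
        Collections.sort(a);
        return e.equals(a);
    }

    public static void main(String[] args) {
        List<String> expected = Arrays.asList("Hello Sachin!", "Hello Smith!");
        List<String> actual = Arrays.asList("Hello Smith!", "Hello Sachin!");
        System.out.println(sameRowsIgnoringOrder(expected, actual));
    }
}
```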





[jira] [Resolved] (HIVE-27977) Fix ordering flakiness in TestHplSqlViaBeeLine

2024-01-11 Thread Jira


 [ 
https://issues.apache.org/jira/browse/HIVE-27977?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

László Bodor resolved HIVE-27977.
-
Resolution: Fixed

> Fix ordering flakiness in TestHplSqlViaBeeLine
> --
>
> Key: HIVE-27977
> URL: https://issues.apache.org/jira/browse/HIVE-27977
> Project: Hive
>  Issue Type: Improvement
>Reporter: László Bodor
>Assignee: László Bodor
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>
> like:
> {code}
> Output: '+----------------+
> |      _c0       |
> +----------------+
> | Hello Smith!   |
> | Hello Sachin!  |
> +----------------+
> ' should match Hello Sachin!.*Hello Smith!
> {code}
> I found this flakiness after backporting a related patch (HIVE-24730) to
> downstream repos. I'm not sure why it isn't flaky upstream; however,
> selecting records without an ORDER BY is not deterministic by design, so it's
> worth taking care of this.





[jira] [Updated] (HIVE-27977) Fix ordering flakiness in TestHplSqlViaBeeLine

2024-01-11 Thread Jira


 [ 
https://issues.apache.org/jira/browse/HIVE-27977?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

László Bodor updated HIVE-27977:

Fix Version/s: 4.0.0

> Fix ordering flakiness in TestHplSqlViaBeeLine
> --
>
> Key: HIVE-27977
> URL: https://issues.apache.org/jira/browse/HIVE-27977
> Project: Hive
>  Issue Type: Improvement
>Reporter: László Bodor
>Assignee: László Bodor
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>
> like:
> {code}
> Output: '+----------------+
> |      _c0       |
> +----------------+
> | Hello Smith!   |
> | Hello Sachin!  |
> +----------------+
> ' should match Hello Sachin!.*Hello Smith!
> {code}
> I found this flakiness after backporting a related patch (HIVE-24730) to
> downstream repos. I'm not sure why it isn't flaky upstream; however,
> selecting records without an ORDER BY is not deterministic by design, so it's
> worth taking care of this.





[jira] [Commented] (HIVE-26713) StringExpr ArrayIndexOutOfBoundsException with LIKE '%xxx%'

2024-01-11 Thread Ryu Kobayashi (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-26713?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17805444#comment-17805444
 ] 

Ryu Kobayashi commented on HIVE-26713:
--

This issue is not resolved yet. I have recreated the PR.

> StringExpr ArrayIndexOutOfBoundsException with LIKE '%xxx%'
> ---
>
> Key: HIVE-26713
> URL: https://issues.apache.org/jira/browse/HIVE-26713
> Project: Hive
>  Issue Type: Bug
>  Components: storage-api
>Affects Versions: All Versions
>Reporter: Ryu Kobayashi
>Assignee: Ryu Kobayashi
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 40m
>  Remaining Estimate: 0h
>
> When a LIKE '%xxx%' search is performed and the character string contains
> control characters, an array index overflow occurs at the following line.
> https://github.com/apache/hive/blob/master/storage-api/src/java/org/apache/hadoop/hive/ql/exec/vector/expressions/StringExpr.java#L345
> {code:java}
> // input[next] == -1, so (input[next] & MAX_BYTE) == 255
> // shift[255] is out of bounds (ArrayIndexOutOfBoundsException: 255)
> next += shift[input[next] & MAX_BYTE]; {code}
>  
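For illustration, here is a minimal, self-contained Java sketch of a Horspool-style byte search whose shift table has one slot for every unsigned byte value (0 to 255), so a masked byte of -1 (index 255) stays in bounds. The class name, method name and TABLE_SIZE constant are assumptions made for this sketch; it is not the StringExpr code and not necessarily how the PR fixes the problem.

```java
import java.util.Arrays;

// Hypothetical stand-alone search, written only to illustrate the indexing issue.
class HorspoolSketch {

    // One slot per unsigned byte value, so (b & 0xff) is always a valid index.
    private static final int TABLE_SIZE = 256;

    /** Returns the index of the first occurrence of pattern in input, or -1. */
    static int find(byte[] input, byte[] pattern) {
        int m = pattern.length;
        if (m == 0) {
            return 0;
        }
        // Default shift is the full pattern length; bytes that occur in the
        // pattern (except its last position) get a smaller shift.
        int[] shift = new int[TABLE_SIZE];
        Arrays.fill(shift, m);
        for (int i = 0; i < m - 1; i++) {
            shift[pattern[i] & 0xff] = m - 1 - i;
        }
        int pos = 0;
        while (pos + m <= input.length) {
            int j = m - 1;
            while (j >= 0 && input[pos + j] == pattern[j]) {
                j--;
            }
            if (j < 0) {
                return pos;
            }
            // Java bytes are signed, so mask before using the byte as an index.
            pos += shift[input[pos + m - 1] & 0xff];
        }
        return -1;
    }
}
```

With a table of only 255 entries, the lookup in the last statement of the loop would fail for a byte value of -1, which matches the exception in the stack trace below.
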
> Stack trace:
> {code:java}
> TaskAttempt 3 failed, info=[Error: Error while running task ( failure ) : 
> attempt_1665986828766_64791_1_00_00_3:java.lang.RuntimeException: 
> java.lang.RuntimeException: org.apache.hadoop.hive.ql.metadata.HiveException: 
> Hive Runtime Error while processing row 
>     at org.apache.hadoop.hive.ql.exec.tez.TezProcessor.initializeAndRunProcessor(TezProcessor.java:220)
>     at org.apache.hadoop.hive.ql.exec.tez.TezProcessor.run(TezProcessor.java:177)
>     at org.apache.tez.runtime.LogicalIOProcessorRuntimeTask.run(LogicalIOProcessorRuntimeTask.java:479)
>     at org.apache.tez.runtime.task.TaskRunner2Callable$1.run(TaskRunner2Callable.java:73)
>     at org.apache.tez.runtime.task.TaskRunner2Callable$1.run(TaskRunner2Callable.java:61)
>     at java.security.AccessController.doPrivileged(Native Method)
>     at javax.security.auth.Subject.doAs(Subject.java:422)
>     at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1893)
>     at org.apache.tez.runtime.task.TaskRunner2Callable.callInternal(TaskRunner2Callable.java:61)
>     at org.apache.tez.runtime.task.TaskRunner2Callable.callInternal(TaskRunner2Callable.java:37)
>     at org.apache.tez.common.CallableWithNdc.call(CallableWithNdc.java:36)
>     at com.google.common.util.concurrent.TrustedListenableFutureTask$TrustedFutureInterruptibleTask.runInterruptibly(TrustedListenableFutureTask.java:108)
>     at com.google.common.util.concurrent.InterruptibleTask.run(InterruptibleTask.java:41)
>     at com.google.common.util.concurrent.TrustedListenableFutureTask.run(TrustedListenableFutureTask.java:77)
>     at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
>     at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
>     at java.lang.Thread.run(Thread.java:750)
> Caused by: java.lang.RuntimeException: 
> org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime Error while 
> processing row 
>     at org.apache.hadoop.hive.ql.exec.tez.MapRecordSource.processRow(MapRecordSource.java:95)
>     at org.apache.hadoop.hive.ql.exec.tez.MapRecordSource.pushRecord(MapRecordSource.java:70)
>     at org.apache.hadoop.hive.ql.exec.tez.MapRecordProcessor.run(MapRecordProcessor.java:419)
>     at org.apache.hadoop.hive.ql.exec.tez.TezProcessor.initializeAndRunProcessor(TezProcessor.java:194)
>     ... 16 more
> Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime 
> Error while processing row 
>     at org.apache.hadoop.hive.ql.exec.vector.VectorMapOperator.process(VectorMapOperator.java:883)
>     at org.apache.hadoop.hive.ql.exec.tez.MapRecordSource.processRow(MapRecordSource.java:86)
>     ... 19 more
> Caused by: java.lang.ArrayIndexOutOfBoundsException: 255
>     at org.apache.hadoop.hive.ql.exec.vector.expressions.StringExpr$BoyerMooreHorspool.find(StringExpr.java:409)
>     at org.apache.hadoop.hive.ql.exec.vector.expressions.AbstractFilterStringColLikeStringScalar$MiddleChecker.index(AbstractFilterStringColLikeStringScalar.java:314)
>     at org.apache.hadoop.hive.ql.exec.vector.expressions.AbstractFilterStringColLikeStringScalar$MiddleChecker.check(AbstractFilterStringColLikeStringScalar.java:307)
>     at org.apache.hadoop.hive.ql.exec.vector.expressions.AbstractFilterStringColLikeStringScalar.evaluate(AbstractFilterStringColLikeStringScalar.java:115)
>     at org.apache.hadoop.hive.ql.exec.vector.expressions.FilterExprOrExpr.evaluate(FilterExprOrExpr.java:183)
>     at org.apache.hadoop.hive.ql.exec.vector.expressio

[jira] [Commented] (HIVE-26339) HIVE-26047 Related LIKE pattern issues

2024-01-11 Thread Ryu Kobayashi (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-26339?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17805441#comment-17805441
 ] 

Ryu Kobayashi commented on HIVE-26339:
--

This issue is not resolved yet. I have recreated the PR.

> HIVE-26047 Related LIKE pattern issues
> --
>
> Key: HIVE-26339
> URL: https://issues.apache.org/jira/browse/HIVE-26339
> Project: Hive
>  Issue Type: Bug
>Reporter: Ryu Kobayashi
>Assignee: Ryu Kobayashi
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 1h 10m
>  Remaining Estimate: 0h
>
> Fixed https://issues.apache.org/jira/browse/HIVE-26047 without using regular 
> expressions. It was also confirmed that, with the current code, the current 
> regular expression pattern does not support the following LIKE patterns.
> End pattern:
> {code:java}
> %abc\%def {code}
> Start pattern:
> {code:java}
> abc\%def% {code}
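
To show what "without regular expressions" can look like for patterns such as the two above, here is a small Java sketch of a LIKE matcher that tokenizes the pattern by hand and treats a backslash as the escape character, so that \% is a literal percent sign. All names are hypothetical, it compares Java chars rather than bytes, and it is not the approach taken in the PR, which may be quite different.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical regex-free LIKE matcher, for illustration only.
class LikeSketch {

    // Token kinds: a literal character, '_' (exactly one char), '%' (any run of chars).
    private static final int LITERAL = 0;
    private static final int ONE = 1;
    private static final int MANY = 2;

    /** Splits a LIKE pattern into {kind, char} tokens; backslash escapes the next char. */
    static List<int[]> tokenize(String pattern) {
        List<int[]> tokens = new ArrayList<>();
        for (int i = 0; i < pattern.length(); i++) {
            char c = pattern.charAt(i);
            if (c == '\\' && i + 1 < pattern.length()) {
                tokens.add(new int[] {LITERAL, pattern.charAt(++i)});
            } else if (c == '%') {
                tokens.add(new int[] {MANY, 0});
            } else if (c == '_') {
                tokens.add(new int[] {ONE, 0});
            } else {
                // Also covers a trailing backslash, which is kept as a literal.
                tokens.add(new int[] {LITERAL, c});
            }
        }
        return tokens;
    }

    /** Returns true if the whole value matches the LIKE pattern. */
    static boolean like(String value, String pattern) {
        int n = value.length();
        // reachable[j] is true if the tokens consumed so far can match value[0..j).
        boolean[] reachable = new boolean[n + 1];
        reachable[0] = true;
        for (int[] tok : tokenize(pattern)) {
            boolean[] next = new boolean[n + 1];
            if (tok[0] == MANY) {
                // '%' can extend any reachable prefix by zero or more characters.
                boolean seen = false;
                for (int j = 0; j <= n; j++) {
                    seen |= reachable[j];
                    next[j] = seen;
                }
            } else {
                for (int j = 0; j < n; j++) {
                    if (reachable[j] && (tok[0] == ONE || value.charAt(j) == tok[1])) {
                        next[j + 1] = true;
                    }
                }
            }
            reachable = next;
        }
        return reachable[n];
    }
}
```

With this sketch, `like("xxabc%def", "%abc\\%def")` and `like("abc%defzz", "abc\\%def%")` both hold, while values that have some other character in place of the literal percent sign do not match.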



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (HIVE-27939) Many UNION ALL throws SemanticException when trying to remove partition predicates: fail to find child from parent

2024-01-11 Thread Ryu Kobayashi (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-27939?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17805436#comment-17805436
 ] 

Ryu Kobayashi commented on HIVE-27939:
--

I pushed a new PR.

> Many UNION ALL throws SemanticException when trying to remove partition 
> predicates: fail to find child from parent
> --
>
> Key: HIVE-27939
> URL: https://issues.apache.org/jira/browse/HIVE-27939
> Project: Hive
>  Issue Type: Bug
>  Components: Query Processor
>Affects Versions: 4.0.0-beta-1
>Reporter: Ryu Kobayashi
>Assignee: Ryu Kobayashi
>Priority: Major
>  Labels: pull-request-available
> Attachments: ddl.sql, query.sql
>
>
> I found that the fix for HIVE-26779 alone does not resolve the problem when 
> many UNION ALL clauses are used. When we create the tables with [^ddl.sql] and 
> run the query in [^query.sql], we get a SemanticException similar to HIVE-26779.
> {code:java}
> 23/12/07 18:02:01 ERROR ql.Driver: FAILED: SemanticException Exception when 
> trying to remove partition predicates: fail to find child from parent
> org.apache.hadoop.hive.ql.parse.SemanticException: Exception when trying to 
> remove partition predicates: fail to find child from parent
>         at 
> org.apache.hadoop.hive.ql.exec.Operator.removeChildAndAdoptItsChildren(Operator.java:809)
>         at 
> org.apache.hadoop.hive.ql.parse.GenTezUtils.removeUnionOperators(GenTezUtils.java:472)
>         at 
> org.apache.hadoop.hive.ql.parse.TezCompiler.generateTaskTree(TezCompiler.java:691)
>         at 
> org.apache.hadoop.hive.ql.parse.TaskCompiler.compile(TaskCompiler.java:301)
>         at 
> org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.compilePlan(SemanticAnalyzer.java:13054)
>         at 
> org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.analyzeInternal(SemanticAnalyzer.java:13272)
>         at 
> org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.analyzeInternal(SemanticAnalyzer.java:12628)
>         at 
> org.apache.hadoop.hive.ql.parse.BaseSemanticAnalyzer.analyze(BaseSemanticAnalyzer.java:327)
>         at org.apache.hadoop.hive.ql.Compiler.analyze(Compiler.java:224)
>         at org.apache.hadoop.hive.ql.Compiler.compile(Compiler.java:107)
>         at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:519)
>         at org.apache.hadoop.hive.ql.Driver.compileInternal(Driver.java:471)
>         at org.apache.hadoop.hive.ql.Driver.compileAndRespond(Driver.java:436)
>         at org.apache.hadoop.hive.ql.Driver.compileAndRespond(Driver.java:430)
>         at 
> org.apache.hadoop.hive.ql.reexec.ReExecDriver.compileAndRespond(ReExecDriver.java:121)
>         at 
> org.apache.hadoop.hive.ql.reexec.ReExecDriver.run(ReExecDriver.java:227)
>         at 
> org.apache.hadoop.hive.cli.CliDriver.processLocalCmd(CliDriver.java:257)
>         at 
> org.apache.hadoop.hive.cli.CliDriver.processCmd1(CliDriver.java:201)
>         at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:127)
>         at 
> org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:425)
>         at 
> org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:356)
>         at 
> org.apache.hadoop.hive.cli.CliDriver.processReader(CliDriver.java:509)
>         at 
> org.apache.hadoop.hive.cli.CliDriver.processFile(CliDriver.java:525)
>         at 
> org.apache.hadoop.hive.cli.CliDriver.executeDriver(CliDriver.java:843)
>         at org.apache.hadoop.hive.cli.CliDriver.run(CliDriver.java:807)
>         at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:721)
>         at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>         at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>         at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>         at java.lang.reflect.Method.invoke(Method.java:498)
>         at org.apache.hadoop.util.RunJar.run(RunJar.java:323)
>         at org.apache.hadoop.util.RunJar.main(RunJar.java:236){code}


