[jira] [Comment Edited] (HIVE-27901) Hive's performance for querying the Iceberg table is very poor.
[ https://issues.apache.org/jira/browse/HIVE-27901?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17805914#comment-17805914 ]

yongzhi.shao edited comment on HIVE-27901 at 1/12/24 7:29 AM:
--
[~zhangbutao] Hi. I reset _read.split.target-size=67108864_, but it improved only a little: Hive still takes more than 6 times as long as Spark to read the same table.

From the logs printed by the Tez console, the time taken by the Map tasks is quite long, which means there is a serious problem with the efficiency of Hive's Iceberg reads.

was (Author: lisoda):
[~zhangbutao] Hi. I reset _read.split.target-size=67108864_, but it improved only a little: Hive still takes more than 6 times as long as Spark to read the same table.

> Hive's performance for querying the Iceberg table is very poor.
> ---
>
> Key: HIVE-27901
> URL: https://issues.apache.org/jira/browse/HIVE-27901
> Project: Hive
> Issue Type: Improvement
> Components: Iceberg integration
> Affects Versions: 4.0.0-beta-1
> Reporter: yongzhi.shao
> Priority: Major
> Attachments: image-2023-11-22-18-32-28-344.png, image-2023-11-22-18-33-01-885.png, image-2023-11-22-18-33-32-915.png
>
> I am using HIVE 4.0.0-BETA for testing.
> BTW, I found that the performance of Hive reading an Iceberg table is still very slow.
> How should I deal with this problem?
> I ran a count over a 7-billion-row table and compared the performance of Hive reading the same data as an Iceberg-ORC table and as a plain ORC table.
> We use Iceberg 1.4.2; the Iceberg-ORC table has ZSTD compression enabled, the ORC table uses SNAPPY compression.
> Hadoop version is 3.1.1 (native zstd not supported).
> {code:java}
> --spark3.4.1+iceberg 1.4.2
> CREATE TABLE datacenter.dwd.b_std_trade (
>     uni_order_id STRING,
>     data_from BIGINT,
>     partner STRING,
>     plat_code STRING,
>     order_id STRING,
>     uni_shop_id STRING,
>     uni_id STRING,
>     guide_id STRING,
>     shop_id STRING,
>     plat_account STRING,
>     total_fee DOUBLE,
>     item_discount_fee DOUBLE,
>     trade_discount_fee DOUBLE,
>     adjust_fee DOUBLE,
>     post_fee DOUBLE,
>     discount_rate DOUBLE,
>     payment_no_postfee DOUBLE,
>     payment DOUBLE,
>     pay_time STRING,
>     product_num BIGINT,
>     order_status STRING,
>     is_refund STRING,
>     refund_fee DOUBLE,
>     insert_time STRING,
>     created STRING,
>     endtime STRING,
>     modified STRING,
>     trade_type STRING,
>     receiver_name STRING,
>     receiver_country STRING,
>     receiver_state STRING,
>     receiver_city STRING,
>     receiver_district STRING,
>     receiver_town STRING,
>     receiver_address STRING,
>     receiver_mobile STRING,
>     trade_source STRING,
>     delivery_type STRING,
>     consign_time STRING,
>     orders_num BIGINT,
>     is_presale BIGINT,
>     presale_status STRING,
>     first_fee_paytime STRING,
>     last_fee_paytime STRING,
>     first_paid_fee DOUBLE,
>     tenant STRING,
>     tidb_modified STRING,
>     step_paid_fee DOUBLE,
>     seller_flag STRING,
>     is_used_store_card BIGINT,
>     store_card_used DOUBLE,
>     store_card_basic_used DOUBLE,
>     store_card_expand_used DOUBLE,
>     order_promotion_num BIGINT,
>     item_promotion_num BIGINT,
>     buyer_remark STRING,
>     seller_remark STRING,
>     trade_business_type STRING)
> USING iceberg
> PARTITIONED BY (uni_shop_id, truncate(4, created))
> LOCATION '/iceberg-catalog/warehouse/dwd/b_std_trade'
> TBLPROPERTIES (
>     'current-snapshot-id' = '7217819472703702905',
>     'format' = 'iceberg/orc',
>     'format-version' = '1',
>     'hive.stored-as' = 'iceberg',
>     'read.orc.vectorization.enabled' = 'true',
>     'sort-order' = 'uni_shop_id ASC NULLS FIRST, created ASC NULLS FIRST',
>     'write.distribution-mode' = 'hash',
>     'write.format.default' = 'orc',
>     'write.metadata.delete-after-commit.enabled' = 'true',
>     'write.metadata.previous-versions-max' = '3',
>     'write.orc.bloom.filter.columns' = 'order_id',
>     'write.orc.compression-codec' = 'zstd')
>
> --hive-iceberg
> CREATE EXTERNAL TABLE iceberg_dwd.b_std_trade
> STORED BY 'org.apache.iceberg.mr.hive.HiveIcebergStorageHandler'
> LOCATION 'hdfs:///iceberg-catalog/warehouse/dwd/b_std_trade'
> TBLPROPERTIES ('iceberg.catalog'='location_based_table','engine.hive.enabled'='true');
>
> --inner orc table (set hive default format = orc)
> set hive.default.fileformat=orc;
> set hive.default.fileformat.managed=orc;
> create table if not exists iceberg_dwd.orc_inner_table as select * from iceberg_dwd.b_std_trade;{code}
>
> !image-2023-11-22-18-32-28-344.png!
> !image-2023-11-22-18-33-01-885.png!
> Also, I have another question. The Submit Plan statistic is clearly incorrect. Is this something that needs to be fixed?
> !image-2023-11-22-18-33-32-915.png!

-- This message was sent by Atlassian Jira (v8.20.10#820010)
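For reference, the _read.split.target-size_ override discussed in the comment above can be applied as an Iceberg table property. A sketch only: the table name is the one from this issue, and the session-level property name {{iceberg.mr.split.size}} is an assumption based on the iceberg-mr input format config, not something verified in this thread.

{code:java}
-- Sketch: set the Iceberg read split target size to 64 MB (67108864 bytes)
-- as a persistent table property.
ALTER TABLE iceberg_dwd.b_std_trade
SET TBLPROPERTIES ('read.split.target-size' = '67108864');

-- Assumed session-level override for Hive's Iceberg MR reader (unverified here):
SET iceberg.mr.split.size=67108864;
{code}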
[jira] [Commented] (HIVE-27901) Hive's performance for querying the Iceberg table is very poor.
[ https://issues.apache.org/jira/browse/HIVE-27901?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17805917#comment-17805917 ]

yongzhi.shao commented on HIVE-27901:
-
Can you do a one-time validation using the official 4.0.0-beta-1 release (using a hadoop location_based_table)? Since version 4.0.0 has not yet been released, I'm not sure whether this has been improved in 4.0.0.
[jira] [Comment Edited] (HIVE-27901) Hive's performance for querying the Iceberg table is very poor.
[ https://issues.apache.org/jira/browse/HIVE-27901?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17805914#comment-17805914 ]

yongzhi.shao edited comment on HIVE-27901 at 1/12/24 7:25 AM:
--
[~zhangbutao] Hi. I reset _read.split.target-size=67108864_, but it improved only a little: Hive still takes more than 6 times as long as Spark to read the same table.

was (Author: lisoda):
[~zhangbutao] Hi. I reset _read.split.target-size=67108864_. It improved a bit, but Hive still takes more than 6 times as long as Spark to read the same table.
[jira] [Commented] (HIVE-27901) Hive's performance for querying the Iceberg table is very poor.
[ https://issues.apache.org/jira/browse/HIVE-27901?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17805914#comment-17805914 ]

yongzhi.shao commented on HIVE-27901:
-
[~zhangbutao] Hi. I reset _read.split.target-size=67108864_. It improved a bit, but Hive still takes more than 6 times as long as Spark to read the same table.
[jira] [Commented] (HIVE-27370) SUBSTR UDF return '?' against 4-bytes character
[ https://issues.apache.org/jira/browse/HIVE-27370?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17805902#comment-17805902 ]

Ryu Kobayashi commented on HIVE-27370:
--
This issue has still not been resolved. I have recreated the PR.

> SUBSTR UDF return '?' against 4-bytes character
> ---
>
> Key: HIVE-27370
> URL: https://issues.apache.org/jira/browse/HIVE-27370
> Project: Hive
> Issue Type: Bug
> Components: UDF
> Affects Versions: All Versions
> Reporter: Ryu Kobayashi
> Assignee: Ryu Kobayashi
> Priority: Major
> Labels: pull-request-available
>
> SUBSTR doesn't seem to support 4-byte characters. This also happens in the master branch. It does not happen in vectorized mode, so it is a problem specific to non-vectorized mode. An example is below:
> {code:java}
> -- vectorized mode
> create temporary table foo (str string) stored as orc;
> insert into foo values('安佐町大字久地字野𨵱4614番地'), ('あa🤎いiうu');
> SELECT
>   SUBSTR(str, 1, 10) as a1,
>   SUBSTR(str, 10, 3) as a2,
>   SUBSTR(str, -7) as a3,
>   substr(str, 1, 3) as b1,
>   substr(str, 3) as b2,
>   substr(str, -5) as b3
> from foo
> ;
> 安佐町大字久地字野𨵱  𨵱4614番地  安佐町  町大字久地字野𨵱4614番地  614番地
> あa🤎  あa🤎いiうu  あa🤎  🤎いiうu  🤎いiうu {code}
> {code:java}
> -- non-vectorized
> SELECT
>   SUBSTR('安佐町大字久地字野𨵱4614番地', 1, 10) as a1,
>   SUBSTR('安佐町大字久地字野𨵱4614番地', 10, 3) as a2,
>   SUBSTR('安佐町大字久地字野𨵱4614番地', -7) as a3,
>   substr('あa🤎いiうu', 1, 3) as b1,
>   substr('あa🤎いiうu', 3) as b2,
>   substr('あa🤎いiうu', -5) as b3
> ;
> 安佐町大字久地字野? �4 ?4614番地 あa? �いiうu ?いiうu{code}
[jira] [Updated] (HIVE-27997) Incorrect result for Hive join query with NVL and Map Join
[ https://issues.apache.org/jira/browse/HIVE-27997?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Mergen updated HIVE-27997:
--
Description:
Hive returns an incorrect result if there is NVL() in an ON clause with Map Join enabled.

STEPS TO REPRODUCE:
{code:java}
-- Step 1: Create a table test_nvl
create table test_nvl(a string);

-- Step 2: Insert null and non-null data into table test_nvl
insert into test_nvl values ('x'), ('y'), (null);
select * from test_nvl;
+-------------+
| test_nvl.a  |
+-------------+
| x           |
| y           |
| NULL        |
+-------------+

-- Step 3: Execute the following query
select x.a, y.a
from test_nvl x
left join test_nvl y
on nvl(x.a, '') = nvl(y.a, '');{code}

EXPECTED RESULT:
{code:java}
+-------+-------+
| x.a   | y.a   |
+-------+-------+
| x     | x     |
| y     | y     |
| NULL  | NULL  |
+-------+-------+ {code}

ACTUAL RESULT:
{code:java}
+-------+------+
| x.a   | y.a  |
+-------+------+
| x     | x    |
| y     | x    |
| NULL  | x    |
+-------+------+{code}

(Obviously 'y' != 'x' and NULL != 'x', so they should not be on the same line.)

The query works fine with Map Join disabled:
{code:java}
-- Using Merge Join instead.
set hive.auto.convert.join=false;
select x.a, y.a
from test_nvl x
left join test_nvl y
on nvl(x.a, '') = nvl(y.a, '');
+-------+-------+
| x.a   | y.a   |
+-------+-------+
| NULL  | NULL  |
| x     | x     |
| y     | y     |
+-------+-------+ {code}
[jira] [Created] (HIVE-27997) Incorrect result for Hive join query with NVL and Map Join
Mergen created HIVE-27997:
-
Summary: Incorrect result for Hive join query with NVL and Map Join
Key: HIVE-27997
URL: https://issues.apache.org/jira/browse/HIVE-27997
Project: Hive
Issue Type: Bug
Components: Operators
Affects Versions: 3.1.3
Reporter: Mergen

Hive returns incorrect result if there is NVL() in an ON clause with Map Join enabled.
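An alternative formulation worth trying against this bug (a sketch only, not verified here): Hive's null-safe equality operator {{<=>}} expresses the same "treat NULLs as equal" join condition without wrapping both keys in NVL().

{code:java}
-- Hypothetical rewrite of the repro query using null-safe equality;
-- whether this also misbehaves under Map Join has not been verified here.
select x.a, y.a
from test_nvl x
left join test_nvl y
on x.a <=> y.a;
{code}

Note that unlike the NVL() version, {{<=>}} does not conflate NULL with the empty string, so the two conditions differ for rows whose key is actually ''.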
[jira] [Commented] (HIVE-27992) Upgrade to tez 0.10.3
[ https://issues.apache.org/jira/browse/HIVE-27992?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17805794#comment-17805794 ]

László Bodor commented on HIVE-27992:
-
http://ci.hive.apache.org/blue/organizations/jenkins/hive-precommit/detail/PR-4991/8 looks good (only one unrelated, silly error), so the artifacts used there are ready to be promoted as a Tez release candidate: https://repository.apache.org/content/repositories/orgapachetez-1078/org/apache/tez/

> Upgrade to tez 0.10.3
> -
>
> Key: HIVE-27992
> URL: https://issues.apache.org/jira/browse/HIVE-27992
> Project: Hive
> Issue Type: Improvement
> Reporter: László Bodor
> Assignee: László Bodor
> Priority: Major
> Labels: pull-request-available
[jira] [Updated] (HIVE-27996) Revert HIVE-27406 & HIVE-27481
[ https://issues.apache.org/jira/browse/HIVE-27996?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

ASF GitHub Bot updated HIVE-27996:
--
Labels: pull-request-available (was: )

> Revert HIVE-27406 & HIVE-27481
> --
>
> Key: HIVE-27996
> URL: https://issues.apache.org/jira/browse/HIVE-27996
> Project: Hive
> Issue Type: Task
> Reporter: László Végh
> Priority: Major
> Labels: pull-request-available
>
> Revert HIVE-27406 & HIVE-27481
>
> The introduced changes were causing DB incompatibility issues.
[jira] [Created] (HIVE-27996) Revert HIVE-27406 & HIVE-27481
László Végh created HIVE-27996:
--
Summary: Revert HIVE-27406 & HIVE-27481
Key: HIVE-27996
URL: https://issues.apache.org/jira/browse/HIVE-27996
Project: Hive
Issue Type: Task
Reporter: László Végh

Revert HIVE-27406 & HIVE-27481

The introduced changes were causing DB incompatibility issues.
[jira] [Assigned] (HIVE-27972) Set 'tez' as default value in hive.execution.engine
[ https://issues.apache.org/jira/browse/HIVE-27972?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

László Bodor reassigned HIVE-27972:
---
Assignee: László Bodor

> Set 'tez' as default value in hive.execution.engine
> ---
>
> Key: HIVE-27972
> URL: https://issues.apache.org/jira/browse/HIVE-27972
> Project: Hive
> Issue Type: Improvement
> Reporter: László Bodor
> Assignee: László Bodor
> Priority: Major
> Labels: pull-request-available
>
> Maybe this is not the first ticket addressing this; please link it if it's a duplicate.
> We need to set this to 'tez' to reflect that we have deprecated 'mr':
> https://github.com/apache/hive/blob/bd16e0098916aa5fc2dede99492c6a240b51e677/common/src/java/org/apache/hadoop/hive/conf/HiveConf.java#L4567
> I'm expecting lots of UT failures because of this, as we're still running those on mr (which might be fine where the actual unit test is not closely related to the execution engine), so we'll see what to do.
[jira] [Updated] (HIVE-27995) Fix inconsistent behavior of LOAD DATA command for partitioned and non-partitioned tables
[ https://issues.apache.org/jira/browse/HIVE-27995?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

ASF GitHub Bot updated HIVE-27995:
--
Labels: pull-request-available (was: )

> Fix inconsistent behavior of LOAD DATA command for partitioned and non-partitioned tables
> -
>
> Key: HIVE-27995
> URL: https://issues.apache.org/jira/browse/HIVE-27995
> Project: Hive
> Issue Type: Bug
> Reporter: Shivangi Jha
> Assignee: Shivangi Jha
> Priority: Major
> Labels: pull-request-available
>
> For partitioned tables, when executing LOAD DATA / LOAD DATA LOCAL commands, the check for file existence is not executed on HiveServer2, and this in turn throws an error at runtime once the job is launched:
> {code:java}
> java.io.FileNotFoundException: File file:/ does not exist.{code}
> Non-partitioned tables do not follow this control flow, and the checks are run appropriately at compile time. In case the file does not exist, the user is presented with the error:
> {code:java}
> Invalid path "file:///: No files matching path
> file:/{code}
> This is inconsistent and error-prone behavior.
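A minimal sketch of the two code paths described above; the table names and the file path are hypothetical (not taken from the report), and the error behaviors are the ones quoted in this issue:

{code:java}
-- Hypothetical setup: /tmp/no_such_file.txt does not exist.
create table t_plain (a string);
create table t_part (a string) partitioned by (p string);

-- Non-partitioned table: the missing file is caught at compile time
-- ("Invalid path ...: No files matching path ...").
load data local inpath '/tmp/no_such_file.txt' into table t_plain;

-- Partitioned table: compilation passes, and the launched job later fails
-- with java.io.FileNotFoundException.
load data local inpath '/tmp/no_such_file.txt' into table t_part partition (p='1');
{code}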
[jira] [Updated] (HIVE-27995) Fix inconsistent behavior of LOAD DATA command for partitioned and non-partitioned tables
[ https://issues.apache.org/jira/browse/HIVE-27995?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Shivangi Jha updated HIVE-27995:
--
Summary: Fix inconsistent behavior of LOAD DATA command for partitioned and non-partitioned tables (was: FIx inconsistent behavior of LOAD DATA command for partitioned and non-partitioned tables.)
[jira] [Created] (HIVE-27995) FIx inconsistent behavior of LOAD DATA command for partitioned and non-partitioned tables.
Shivangi Jha created HIVE-27995:
---
Summary: FIx inconsistent behavior of LOAD DATA command for partitioned and non-partitioned tables.
Key: HIVE-27995
URL: https://issues.apache.org/jira/browse/HIVE-27995
Project: Hive
Issue Type: Bug
Reporter: Shivangi Jha
Assignee: Shivangi Jha

For partitioned tables, while executing LOAD DATA / LOAD DATA LOCAL commands, the check for file existence is not executed on HiveServer2, and this in turn throws an error at runtime once the job is launched. Non-partitioned tables run the checks at compile time. This is inconsistent and error-prone behavior.
[jira] [Comment Edited] (HIVE-27992) Upgrade to tez 0.10.3
[ https://issues.apache.org/jira/browse/HIVE-27992?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17805481#comment-17805481 ]

László Bodor edited comment on HIVE-27992 at 1/11/24 11:00 AM:
---
Testing artifacts; initial issues discovered:

1. Some TestNegativeLlapLocalCliDriver failures - UPDATE: TEZ-4506, can be masked
http://ci.hive.apache.org/blue/organizations/jenkins/hive-precommit/detail/PR-4991/7/tests

2. Long Tez perf CLI driver run; need to check an older run for a baseline (~3000s = 50m)
{code}
[2024-01-11T09:14:32.488Z] [INFO] Running org.apache.hadoop.hive.cli.TestTezTPCDS30TBPerfCliDriver
[2024-01-11T10:03:28.803Z] [WARNING] Tests run: 200, Failures: 0, Errors: 0, Skipped: 4, Time elapsed: 2,932.865 s - in org.apache.hadoop.hive.cli.TestTezTPCDS30TBPerfCliDriver
{code}
UPDATE: it's not much worse than before:
{code}
#EARLIER
wget -nv -O - http://ci.hive.apache.org/job/hive-precommit/job/PR-4977/3/testReport/api/json > check-pr.json; jq --arg testname "TestTezTPCDS30TBPerfCliDriver" '.suites[] | select(.name | contains($testname)) | {test:.name , time:.duration} ' check-pr.json
2024-01-11 11:58:42 URL:http://ci.hive.apache.org/job/hive-precommit/job/PR-4977/3/testReport/api/json [37554552] -> "-" [1]
{
  "test": "org.apache.hadoop.hive.cli.TestTezTPCDS30TBPerfCliDriver",
  "time": 2802.562
}

#NOW
wget -nv -O - http://ci.hive.apache.org/job/hive-precommit/job/PR-4991/7/testReport/api/json > check-pr.json; jq --arg testname "TestTezTPCDS30TBPerfCliDriver" '.suites[] | select(.name | contains($testname)) | {test:.name , time:.duration} ' check-pr.json
2024-01-11 11:59:12 URL:http://ci.hive.apache.org/job/hive-precommit/job/PR-4991/7/testReport/api/json [38046027] -> "-" [1]
{
  "test": "org.apache.hadoop.hive.cli.TestTezTPCDS30TBPerfCliDriver",
  "time": 2932.865
}
{code}

was (Author: abstractdog):
Testing artifacts; initial issues discovered:

1. Some TestNegativeLlapLocalCliDriver failures - UPDATE: TEZ-4506, can be masked
http://ci.hive.apache.org/blue/organizations/jenkins/hive-precommit/detail/PR-4991/7/tests

2. Long Tez perf CLI driver run; need to check an older run for a baseline (~3000s = 50m)
{code}
[2024-01-11T09:14:32.488Z] [INFO] Running org.apache.hadoop.hive.cli.TestTezTPCDS30TBPerfCliDriver
[2024-01-11T10:03:28.803Z] [WARNING] Tests run: 200, Failures: 0, Errors: 0, Skipped: 4, Time elapsed: 2,932.865 s - in org.apache.hadoop.hive.cli.TestTezTPCDS30TBPerfCliDriver
{code}
[jira] [Comment Edited] (HIVE-27992) Upgrade to tez 0.10.3
[ https://issues.apache.org/jira/browse/HIVE-27992?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17805481#comment-17805481 ] László Bodor edited comment on HIVE-27992 at 1/11/24 10:53 AM: --- testing artifacts, initial issues discovered: 1. some TestNegativeLlapLocalCliDriver failures - UPDATE: TEZ-4506, can be masked http://ci.hive.apache.org/blue/organizations/jenkins/hive-precommit/detail/PR-4991/7/tests 2. long tez perf cli driver run, need to check older for baseline (~3000s = 50m) {code} [2024-01-11T09:14:32.488Z] [INFO] Running org.apache.hadoop.hive.cli.TestTezTPCDS30TBPerfCliDriver [2024-01-11T10:03:28.803Z] [WARNING] Tests run: 200, Failures: 0, Errors: 0, Skipped: 4, Time elapsed: 2,932.865 s - in org.apache.hadoop.hive.cli.TestTezTPCDS30TBPerfCliDriver {code} was (Author: abstractdog): testing artifacts, initial issues discovered: 1. some TestNegativeLlapLocalCliDriver failures http://ci.hive.apache.org/blue/organizations/jenkins/hive-precommit/detail/PR-4991/7/tests 2. long tez perf cli driver run, need to check older for baseline (~3000s = 50m) {code} [2024-01-11T09:14:32.488Z] [INFO] Running org.apache.hadoop.hive.cli.TestTezTPCDS30TBPerfCliDriver [2024-01-11T10:03:28.803Z] [WARNING] Tests run: 200, Failures: 0, Errors: 0, Skipped: 4, Time elapsed: 2,932.865 s - in org.apache.hadoop.hive.cli.TestTezTPCDS30TBPerfCliDriver {code} > Upgrade to tez 0.10.3 > - > > Key: HIVE-27992 > URL: https://issues.apache.org/jira/browse/HIVE-27992 > Project: Hive > Issue Type: Improvement >Reporter: László Bodor >Assignee: László Bodor >Priority: Major > Labels: pull-request-available > -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Commented] (HIVE-27992) Upgrade to tez 0.10.3
[ https://issues.apache.org/jira/browse/HIVE-27992?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17805481#comment-17805481 ] László Bodor commented on HIVE-27992: - testing artifacts, initial issues discovered: 1. some TestNegativeLlapLocalCliDriver failures http://ci.hive.apache.org/blue/organizations/jenkins/hive-precommit/detail/PR-4991/7/tests 2. long tez perf cli driver run, need to check older for baseline (~3000s = 50m) {code} [2024-01-11T09:14:32.488Z] [INFO] Running org.apache.hadoop.hive.cli.TestTezTPCDS30TBPerfCliDriver [2024-01-11T10:03:28.803Z] [WARNING] Tests run: 200, Failures: 0, Errors: 0, Skipped: 4, Time elapsed: 2,932.865 s - in org.apache.hadoop.hive.cli.TestTezTPCDS30TBPerfCliDriver {code} > Upgrade to tez 0.10.3 > - > > Key: HIVE-27992 > URL: https://issues.apache.org/jira/browse/HIVE-27992 > Project: Hive > Issue Type: Improvement >Reporter: László Bodor >Assignee: László Bodor >Priority: Major > Labels: pull-request-available > -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Commented] (HIVE-27977) Fix ordering flakiness in TestHplSqlViaBeeLine
[ https://issues.apache.org/jira/browse/HIVE-27977?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17805473#comment-17805473 ] László Bodor commented on HIVE-27977: - merged to master, thanks [~zhangbutao] and [~ayushtkn] for the reviews! > Fix ordering flakiness in TestHplSqlViaBeeLine > -- > > Key: HIVE-27977 > URL: https://issues.apache.org/jira/browse/HIVE-27977 > Project: Hive > Issue Type: Improvement >Reporter: László Bodor >Assignee: László Bodor >Priority: Major > Labels: pull-request-available > Fix For: 4.0.0 > > > like: > {code} > Output: '++ > | _c0 | > ++ > | Hello Smith! | > | Hello Sachin! | > ++ > ' should match Hello Sachin!.*Hello Smith! > {code} > I found this flakiness after backporting a related patch to downstream repos > (HIVE-24730) > not sure why it isn't flaky upstream; however, selecting records without an ORDER BY > is not deterministic by design, so it's worth taking care of this -- This message was sent by Atlassian Jira (v8.20.10#820010)
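The flakiness described above can be sketched in isolation. This is an illustrative snippet, not Hive's actual test code: the expected pattern requires the two rows in a fixed order, while a SELECT without an ORDER BY may return them either way.

```java
import java.util.regex.Pattern;

public class OrderFlakinessDemo {
    public static void main(String[] args) {
        // The test's expected regex requires "Hello Sachin!" before "Hello Smith!".
        Pattern expected = Pattern.compile("Hello Sachin!.*Hello Smith!", Pattern.DOTALL);

        // Without an ORDER BY, the rows may come back in either order.
        String orderA = "| Hello Sachin! |\n| Hello Smith! |";
        String orderB = "| Hello Smith! |\n| Hello Sachin! |";

        System.out.println(expected.matcher(orderA).find()); // true
        System.out.println(expected.matcher(orderB).find()); // false -> flaky failure
    }
}
```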
[jira] [Resolved] (HIVE-27977) Fix ordering flakiness in TestHplSqlViaBeeLine
[ https://issues.apache.org/jira/browse/HIVE-27977?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] László Bodor resolved HIVE-27977. - Resolution: Fixed > Fix ordering flakiness in TestHplSqlViaBeeLine > -- > > Key: HIVE-27977 > URL: https://issues.apache.org/jira/browse/HIVE-27977 > Project: Hive > Issue Type: Improvement >Reporter: László Bodor >Assignee: László Bodor >Priority: Major > Labels: pull-request-available > Fix For: 4.0.0 > > > like: > {code} > Output: '++ > | _c0 | > ++ > | Hello Smith! | > | Hello Sachin! | > ++ > ' should match Hello Sachin!.*Hello Smith! > {code} > I found this flakiness after backporting a related patch to downstream repos > (HIVE-24730) > not sure why it isn't flaky upstream; however, selecting records without an ORDER BY > is not deterministic by design, so it's worth taking care of this -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Updated] (HIVE-27977) Fix ordering flakiness in TestHplSqlViaBeeLine
[ https://issues.apache.org/jira/browse/HIVE-27977?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] László Bodor updated HIVE-27977: Fix Version/s: 4.0.0 > Fix ordering flakiness in TestHplSqlViaBeeLine > -- > > Key: HIVE-27977 > URL: https://issues.apache.org/jira/browse/HIVE-27977 > Project: Hive > Issue Type: Improvement >Reporter: László Bodor >Assignee: László Bodor >Priority: Major > Labels: pull-request-available > Fix For: 4.0.0 > > > like: > {code} > Output: '++ > | _c0 | > ++ > | Hello Smith! | > | Hello Sachin! | > ++ > ' should match Hello Sachin!.*Hello Smith! > {code} > I found this flakiness after backporting a related patch to downstream repos > (HIVE-24730) > not sure why it isn't flaky upstream; however, selecting records without an ORDER BY > is not deterministic by design, so it's worth taking care of this -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Commented] (HIVE-26713) StringExpr ArrayIndexOutOfBoundsException with LIKE '%xxx%'
[ https://issues.apache.org/jira/browse/HIVE-26713?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17805444#comment-17805444 ] Ryu Kobayashi commented on HIVE-26713: -- This issue is still not resolved. I have recreated the PR. > StringExpr ArrayIndexOutOfBoundsException with LIKE '%xxx%' > --- > > Key: HIVE-26713 > URL: https://issues.apache.org/jira/browse/HIVE-26713 > Project: Hive > Issue Type: Bug > Components: storage-api >Affects Versions: All Versions >Reporter: Ryu Kobayashi >Assignee: Ryu Kobayashi >Priority: Major > Labels: pull-request-available > Time Spent: 40m > Remaining Estimate: 0h > > When a LIKE '%xxx%' search is performed, if the character string contains > control characters, an overflow occurs as follows. > https://github.com/apache/hive/blob/master/storage-api/src/java/org/apache/hadoop/hive/ql/exec/vector/expressions/StringExpr.java#L345 > {code:java} > // input[next] == -1 > // shift[input[next] & MAX_BYTE] == 255 > next += shift[input[next] & MAX_BYTE]; {code} > > Stack trace: > {code:java} > TaskAttempt 3 failed, info=[Error: Error while running task ( failure ) : > attempt_1665986828766_64791_1_00_00_3:java.lang.RuntimeException: > java.lang.RuntimeException: org.apache.hadoop.hive.ql.metadata.HiveException: > Hive Runtime Error while processing row > 2 at > org.apache.hadoop.hive.ql.exec.tez.TezProcessor.initializeAndRunProcessor(TezProcessor.java:220) > 3 at > org.apache.hadoop.hive.ql.exec.tez.TezProcessor.run(TezProcessor.java:177) > 4 at > org.apache.tez.runtime.LogicalIOProcessorRuntimeTask.run(LogicalIOProcessorRuntimeTask.java:479) > 5 at > org.apache.tez.runtime.task.TaskRunner2Callable$1.run(TaskRunner2Callable.java:73) > 6 at > org.apache.tez.runtime.task.TaskRunner2Callable$1.run(TaskRunner2Callable.java:61) > 7 at java.security.AccessController.doPrivileged(Native Method) > 8 at javax.security.auth.Subject.doAs(Subject.java:422) > 9 at > 
org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1893) > 10at > org.apache.tez.runtime.task.TaskRunner2Callable.callInternal(TaskRunner2Callable.java:61) > 11at > org.apache.tez.runtime.task.TaskRunner2Callable.callInternal(TaskRunner2Callable.java:37) > 12at org.apache.tez.common.CallableWithNdc.call(CallableWithNdc.java:36) > 13at > com.google.common.util.concurrent.TrustedListenableFutureTask$TrustedFutureInterruptibleTask.runInterruptibly(TrustedListenableFutureTask.java:108) > 14at > com.google.common.util.concurrent.InterruptibleTask.run(InterruptibleTask.java:41) > 15at > com.google.common.util.concurrent.TrustedListenableFutureTask.run(TrustedListenableFutureTask.java:77) > 16at > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) > 17at > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) > 18at java.lang.Thread.run(Thread.java:750) > 19Caused by: java.lang.RuntimeException: > org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime Error while > processing row > 20at > org.apache.hadoop.hive.ql.exec.tez.MapRecordSource.processRow(MapRecordSource.java:95) > 21at > org.apache.hadoop.hive.ql.exec.tez.MapRecordSource.pushRecord(MapRecordSource.java:70) > 22at > org.apache.hadoop.hive.ql.exec.tez.MapRecordProcessor.run(MapRecordProcessor.java:419) > 23at > org.apache.hadoop.hive.ql.exec.tez.TezProcessor.initializeAndRunProcessor(TezProcessor.java:194) > 24... 16 more > 25Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime > Error while processing row > 26at > org.apache.hadoop.hive.ql.exec.vector.VectorMapOperator.process(VectorMapOperator.java:883) > 27at > org.apache.hadoop.hive.ql.exec.tez.MapRecordSource.processRow(MapRecordSource.java:86) > 28... 
19 more > 29Caused by: java.lang.ArrayIndexOutOfBoundsException: 255 > 30at > org.apache.hadoop.hive.ql.exec.vector.expressions.StringExpr$BoyerMooreHorspool.find(StringExpr.java:409) > 31at > org.apache.hadoop.hive.ql.exec.vector.expressions.AbstractFilterStringColLikeStringScalar$MiddleChecker.index(AbstractFilterStringColLikeStringScalar.java:314) > 32at > org.apache.hadoop.hive.ql.exec.vector.expressions.AbstractFilterStringColLikeStringScalar$MiddleChecker.check(AbstractFilterStringColLikeStringScalar.java:307) > 33at > org.apache.hadoop.hive.ql.exec.vector.expressions.AbstractFilterStringColLikeStringScalar.evaluate(AbstractFilterStringColLikeStringScalar.java:115) > 34at > org.apache.hadoop.hive.ql.exec.vector.expressions.FilterExprOrExpr.evaluate(FilterExprOrExpr.java:183) > 35at > org.apache.hadoop.hive.ql.exec.vector.expressio
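The masking step quoted in the issue body above can be demonstrated standalone. A minimal sketch (not Hive's actual code; the `MAX_BYTE = 0xFF` constant is an assumption based on the quoted snippet) of how a negative Java byte produces a table index near 255:

```java
public class SignedByteMaskDemo {
    // Assumed to mirror the MAX_BYTE constant in the quoted StringExpr snippet.
    static final int MAX_BYTE = 0xFF;

    public static void main(String[] args) {
        // Bytes of control characters or multi-byte encodings are negative in Java.
        byte b = -1;
        int index = b & MAX_BYTE;   // always lands in [0, 255]; here 255
        System.out.println(index);  // prints 255

        // If the Boyer-Moore-Horspool shift table returns a large skip (e.g. 255)
        // for such an index, `next += shift[index]` can jump past the end of the
        // scanned buffer, matching the ArrayIndexOutOfBoundsException reported above.
    }
}
```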
[jira] [Commented] (HIVE-26339) HIVE-26047 Related LIKE pattern issues
[ https://issues.apache.org/jira/browse/HIVE-26339?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17805441#comment-17805441 ] Ryu Kobayashi commented on HIVE-26339: -- This issue is still not resolved. I have recreated the PR. > HIVE-26047 Related LIKE pattern issues > -- > > Key: HIVE-26339 > URL: https://issues.apache.org/jira/browse/HIVE-26339 > Project: Hive > Issue Type: Bug >Reporter: Ryu Kobayashi >Assignee: Ryu Kobayashi >Priority: Major > Labels: pull-request-available > Time Spent: 1h 10m > Remaining Estimate: 0h > > Fixed https://issues.apache.org/jira/browse/HIVE-26047 without using regular > expressions. I also confirmed that the current regular-expression-based code > cannot support the following LIKE patterns. > End pattern > {code:java} > %abc\%def {code} > Start pattern > {code:java} > abc\%def% {code} -- This message was sent by Atlassian Jira (v8.20.10#820010)
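For reference, the intended semantics of the two patterns above (where `\%` escapes a literal percent sign) can be sketched with plain string operations. This illustrates the expected matching behavior only; it is not Hive's implementation:

```java
public class EscapedLikeDemo {
    public static void main(String[] args) {
        // "abc\\%def" is the Java literal for the pattern body abc\%def;
        // unescaping \% yields the literal text "abc%def".
        String literal = "abc\\%def".replace("\\%", "%");

        // End pattern LIKE '%abc\%def': string must end with "abc%def".
        System.out.println("xyzabc%def".endsWith(literal));   // true

        // Start pattern LIKE 'abc\%def%': string must start with "abc%def".
        System.out.println("abc%defxyz".startsWith(literal)); // true
    }
}
```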
[jira] [Commented] (HIVE-27939) Many UNION ALL throws SemanticException when trying to remove partition predicates: fail to find child from parent
[ https://issues.apache.org/jira/browse/HIVE-27939?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17805436#comment-17805436 ] Ryu Kobayashi commented on HIVE-27939: -- I pushed a new PR. > Many UNION ALL throws SemanticException when trying to remove partition > predicates: fail to find child from parent > -- > > Key: HIVE-27939 > URL: https://issues.apache.org/jira/browse/HIVE-27939 > Project: Hive > Issue Type: Bug > Components: Query Processor >Affects Versions: 4.0.0-beta-1 >Reporter: Ryu Kobayashi >Assignee: Ryu Kobayashi >Priority: Major > Labels: pull-request-available > Attachments: ddl.sql, query.sql > > > I found that the ticket for HIVE-26779 alone does not resolve this issue when using many > UNION ALL statements. When we create a DDL with [^ddl.sql] and execute a query with > [^query.sql], we get a SemanticException similar to HIVE-26779. > {code:java} > 23/12/07 18:02:01 ERROR ql.Driver: FAILED: SemanticException Exception when > trying to remove partition predicates: fail to find child from parent > org.apache.hadoop.hive.ql.parse.SemanticException: Exception when trying to > remove partition predicates: fail to find child from parent > at > org.apache.hadoop.hive.ql.exec.Operator.removeChildAndAdoptItsChildren(Operator.java:809) > at > org.apache.hadoop.hive.ql.parse.GenTezUtils.removeUnionOperators(GenTezUtils.java:472) > at > org.apache.hadoop.hive.ql.parse.TezCompiler.generateTaskTree(TezCompiler.java:691) > at > org.apache.hadoop.hive.ql.parse.TaskCompiler.compile(TaskCompiler.java:301) > at > org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.compilePlan(SemanticAnalyzer.java:13054) > at > org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.analyzeInternal(SemanticAnalyzer.java:13272) > at > org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.analyzeInternal(SemanticAnalyzer.java:12628) > at > org.apache.hadoop.hive.ql.parse.BaseSemanticAnalyzer.analyze(BaseSemanticAnalyzer.java:327) > at 
org.apache.hadoop.hive.ql.Compiler.analyze(Compiler.java:224) > at org.apache.hadoop.hive.ql.Compiler.compile(Compiler.java:107) > at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:519) > at org.apache.hadoop.hive.ql.Driver.compileInternal(Driver.java:471) > at org.apache.hadoop.hive.ql.Driver.compileAndRespond(Driver.java:436) > at org.apache.hadoop.hive.ql.Driver.compileAndRespond(Driver.java:430) > at > org.apache.hadoop.hive.ql.reexec.ReExecDriver.compileAndRespond(ReExecDriver.java:121) > at > org.apache.hadoop.hive.ql.reexec.ReExecDriver.run(ReExecDriver.java:227) > at > org.apache.hadoop.hive.cli.CliDriver.processLocalCmd(CliDriver.java:257) > at > org.apache.hadoop.hive.cli.CliDriver.processCmd1(CliDriver.java:201) > at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:127) > at > org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:425) > at > org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:356) > at > org.apache.hadoop.hive.cli.CliDriver.processReader(CliDriver.java:509) > at > org.apache.hadoop.hive.cli.CliDriver.processFile(CliDriver.java:525) > at > org.apache.hadoop.hive.cli.CliDriver.executeDriver(CliDriver.java:843) > at org.apache.hadoop.hive.cli.CliDriver.run(CliDriver.java:807) > at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:721) > at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) > at > sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) > at > sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) > at java.lang.reflect.Method.invoke(Method.java:498) > at org.apache.hadoop.util.RunJar.run(RunJar.java:323) > at org.apache.hadoop.util.RunJar.main(RunJar.java:236){code} -- This message was sent by Atlassian Jira (v8.20.10#820010)