[jira] [Created] (HIVE-27905) SPLIT throws ClassCastException

2023-11-22 Thread okumin (Jira)
okumin created HIVE-27905:
-

 Summary: SPLIT throws ClassCastException
 Key: HIVE-27905
 URL: https://issues.apache.org/jira/browse/HIVE-27905
 Project: Hive
  Issue Type: Improvement
Affects Versions: 4.0.0-beta-1
Reporter: okumin
Assignee: okumin


GenericUDFSplit throws ClassCastException when a non-primitive type is given.
{code:java}
0: jdbc:hive2://hive-hiveserver2:1/defaul> select split(array('a,b,c'), 
',');
Error: Error while compiling statement: FAILED: ClassCastException 
org.apache.hadoop.hive.serde2.objectinspector.StandardConstantListObjectInspector
 cannot be cast to 
org.apache.hadoop.hive.serde2.objectinspector.PrimitiveObjectInspector 
(state=42000,code=4) {code}
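Hive GenericUDF implementations typically validate each argument's ObjectInspector in initialize() and raise a UDFArgumentTypeException, rather than letting an unchecked cast fail at compile time. A minimal plain-Java sketch of that check-before-cast pattern (the types below are stand-ins for illustration, not Hive's ObjectInspector classes):

```java
import java.util.List;

public class SafeCast {
    // Stand-in for the argument validation a GenericUDF would do in initialize():
    // inspect the runtime type before casting, and raise a clear error for
    // unsupported types instead of a ClassCastException.
    static String describe(Object inspector) {
        if (inspector instanceof String) {   // stand-in for PrimitiveObjectInspector
            return "primitive: " + inspector;
        }
        if (inspector instanceof List) {     // stand-in for ListObjectInspector
            throw new IllegalArgumentException(
                "split() expects a primitive (string) argument, got a list");
        }
        return "unsupported: " + inspector.getClass().getName();
    }

    public static void main(String[] args) {
        System.out.println(describe("a,b,c"));
        try {
            describe(List.of("a,b,c"));      // the failing case from the report
        } catch (IllegalArgumentException e) {
            System.out.println(e.getMessage());
        }
    }
}
```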



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (HIVE-27901) Hive's performance for querying the Iceberg table is very poor.

2023-11-22 Thread zhangbutao (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-27901?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17789013#comment-17789013
 ] 

zhangbutao commented on HIVE-27901:
---

I think this ticket looks similar to 
https://issues.apache.org/jira/browse/HIVE-27883 . Currently, some optimization 
properties, such as merge/split settings, cannot be used on Iceberg tables 
because Iceberg has its own optimization properties. 

 

For this ticket, it seems that the ORC table has more tasks than the Iceberg 
table, so the ORC table can run faster. Maybe you can try tuning the property 
_set read.split.target-size=67108864;_

[https://iceberg.apache.org/docs/latest/configuration/#read-properties]
read.split.target-size defaults to 134217728 (128 MiB).

But I am not sure whether this is a good way to optimize your query, as I 
cannot reproduce your problem to delve into it.
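For context on why shrinking read.split.target-size can help: Iceberg plans roughly ceil(fileSize / targetSplitSize) splits per data file, so halving the target from the 128 MiB default to 64 MiB roughly doubles the number of splits, and hence the number of tasks. A small sketch of that arithmetic (the 10 GiB file size is a made-up example, and this ignores Iceberg's lookback/open-file-cost refinements):

```java
public class SplitSizeMath {
    // Approximate number of splits for a file: ceiling of fileSize / targetSplitSize.
    static long splitCount(long fileSize, long targetSplitSize) {
        return (fileSize + targetSplitSize - 1) / targetSplitSize;
    }

    public static void main(String[] args) {
        long file = 10L * 1024 * 1024 * 1024;             // hypothetical 10 GiB of data
        System.out.println(splitCount(file, 134217728L)); // default 128 MiB target -> 80
        System.out.println(splitCount(file, 67108864L));  // halved 64 MiB target -> 160
    }
}
```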

> Hive's performance for querying the Iceberg table is very poor.
> ---
>
> Key: HIVE-27901
> URL: https://issues.apache.org/jira/browse/HIVE-27901
> Project: Hive
>  Issue Type: Bug
>  Components: Iceberg integration
>Affects Versions: 4.0.0-beta-1
>Reporter: yongzhi.shao
>Priority: Major
> Attachments: image-2023-11-22-18-32-28-344.png, 
> image-2023-11-22-18-33-01-885.png, image-2023-11-22-18-33-32-915.png
>
>
> I am using HIVE 4.0.0-BETA for testing.
> BTW, I found that the performance of Hive reading an ICEBERG table is still 
> very slow.
> How should I deal with this problem?
> I ran a count on a 7-billion-row table and compared the performance difference 
> between Hive reading ICEBERG-ORC and ORC tables respectively.
> We use ICEBERG 1.4.2; ICEBERG-ORC with ZSTD compression enabled.
> ORC with SNAPPY compression.
> HADOOP version 3.1.1 (native zstd not supported).
> {code:java}
> --spark3.4.1+iceberg 1.4.2
> CREATE TABLE datacenter.dwd.b_std_trade (
>   uni_order_id STRING,
>   data_from BIGINT,
>   partner STRING,
>   plat_code STRING,
>   order_id STRING,
>   uni_shop_id STRING,
>   uni_id STRING,
>   guide_id STRING,
>   shop_id STRING,
>   plat_account STRING,
>   total_fee DOUBLE,
>   item_discount_fee DOUBLE,
>   trade_discount_fee DOUBLE,
>   adjust_fee DOUBLE,
>   post_fee DOUBLE,
>   discount_rate DOUBLE,
>   payment_no_postfee DOUBLE,
>   payment DOUBLE,
>   pay_time STRING,
>   product_num BIGINT,
>   order_status STRING,
>   is_refund STRING,
>   refund_fee DOUBLE,
>   insert_time STRING,
>   created STRING,
>   endtime STRING,
>   modified STRING,
>   trade_type STRING,
>   receiver_name STRING,
>   receiver_country STRING,
>   receiver_state STRING,
>   receiver_city STRING,
>   receiver_district STRING,
>   receiver_town STRING,
>   receiver_address STRING,
>   receiver_mobile STRING,
>   trade_source STRING,
>   delivery_type STRING,
>   consign_time STRING,
>   orders_num BIGINT,
>   is_presale BIGINT,
>   presale_status STRING,
>   first_fee_paytime STRING,
>   last_fee_paytime STRING,
>   first_paid_fee DOUBLE,
>   tenant STRING,
>   tidb_modified STRING,
>   step_paid_fee DOUBLE,
>   seller_flag STRING,
>   is_used_store_card BIGINT,
>   store_card_used DOUBLE,
>   store_card_basic_used DOUBLE,
>   store_card_expand_used DOUBLE,
>   order_promotion_num BIGINT,
>   item_promotion_num BIGINT,
>   buyer_remark STRING,
>   seller_remark STRING,
>   trade_business_type STRING)
> USING iceberg
> PARTITIONED BY (uni_shop_id, truncate(4, created))
> LOCATION '/iceberg-catalog/warehouse/dwd/b_std_trade'
> TBLPROPERTIES (
>   'current-snapshot-id' = '7217819472703702905',
>   'format' = 'iceberg/orc',
>   'format-version' = '1',
>   'hive.stored-as' = 'iceberg',
>   'read.orc.vectorization.enabled' = 'true',
>   'sort-order' = 'uni_shop_id ASC NULLS FIRST, created ASC NULLS FIRST',
>   'write.distribution-mode' = 'hash',
>   'write.format.default' = 'orc',
>   'write.metadata.delete-after-commit.enabled' = 'true',
>   'write.metadata.previous-versions-max' = '3',
>   'write.orc.bloom.filter.columns' = 'order_id',
>   'write.orc.compression-codec' = 'zstd')
> --hive-iceberg
> CREATE EXTERNAL TABLE iceberg_dwd.b_std_trade 
> STORED BY 'org.apache.iceberg.mr.hive.HiveIcebergStorageHandler' 
> LOCATION 'hdfs:///iceberg-catalog/warehouse/dwd/b_std_trade'
> TBLPROPERTIES 
> ('iceberg.catalog'='location_based_table','engine.hive.enabled'='true'); 
> --inner orc table( set hive default format = orc )
> set hive.default.fileformat=orc;
> set hive.default.fileformat.managed=orc;
> create table if not exists iceberg_dwd.orc_inner_table as select * from 
> iceberg_dwd.b_std_trade;{code}
>  
> !image-2023-11-22-18-32-28-344.png!
> !image-2023-11-22-18-33-01-885.png!
> Also, I have another question. The Submit Plan statistic is clearly 
> incorrect. Is this something that needs to be fixed?
> !image-2023-11-22-18-33-32-915.png!
>  




[jira] [Commented] (HIVE-27898) HIVE4 can't use ICEBERG table in subqueries

2023-11-22 Thread zhangbutao (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-27898?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17789008#comment-17789008
 ] 

zhangbutao commented on HIVE-27898:
---

Please provide a simpler test to help others reproduce this issue. 

1) Can we create a simpler table with just a few columns? Table 
*_datacenter.dwd.b_std_trade_* has too many columns.

2) Can we insert a few rows of data to help reproduce this issue? 

> HIVE4 can't use ICEBERG table in subqueries
> ---
>
> Key: HIVE-27898
> URL: https://issues.apache.org/jira/browse/HIVE-27898
> Project: Hive
>  Issue Type: Bug
>  Components: Iceberg integration
>Affects Versions: 4.0.0-beta-1
>Reporter: yongzhi.shao
>Priority: Critical
>
> Currently, we found that when using the HIVE4-BETA1 version, if we use an 
> ICEBERG table in a subquery, we can't get any data in the end.
> I have used HIVE3-TEZ for cross validation, and HIVE3 does not have this 
> problem when querying ICEBERG.
> {code:java}
> --spark3.4.1+iceberg 1.4.2
> CREATE TABLE datacenter.dwd.b_std_trade (
>   uni_order_id STRING,
>   data_from BIGINT,
>   partner STRING,
>   plat_code STRING,
>   order_id STRING,
>   uni_shop_id STRING,
>   uni_id STRING,
>   guide_id STRING,
>   shop_id STRING,
>   plat_account STRING,
>   total_fee DOUBLE,
>   item_discount_fee DOUBLE,
>   trade_discount_fee DOUBLE,
>   adjust_fee DOUBLE,
>   post_fee DOUBLE,
>   discount_rate DOUBLE,
>   payment_no_postfee DOUBLE,
>   payment DOUBLE,
>   pay_time STRING,
>   product_num BIGINT,
>   order_status STRING,
>   is_refund STRING,
>   refund_fee DOUBLE,
>   insert_time STRING,
>   created STRING,
>   endtime STRING,
>   modified STRING,
>   trade_type STRING,
>   receiver_name STRING,
>   receiver_country STRING,
>   receiver_state STRING,
>   receiver_city STRING,
>   receiver_district STRING,
>   receiver_town STRING,
>   receiver_address STRING,
>   receiver_mobile STRING,
>   trade_source STRING,
>   delivery_type STRING,
>   consign_time STRING,
>   orders_num BIGINT,
>   is_presale BIGINT,
>   presale_status STRING,
>   first_fee_paytime STRING,
>   last_fee_paytime STRING,
>   first_paid_fee DOUBLE,
>   tenant STRING,
>   tidb_modified STRING,
>   step_paid_fee DOUBLE,
>   seller_flag STRING,
>   is_used_store_card BIGINT,
>   store_card_used DOUBLE,
>   store_card_basic_used DOUBLE,
>   store_card_expand_used DOUBLE,
>   order_promotion_num BIGINT,
>   item_promotion_num BIGINT,
>   buyer_remark STRING,
>   seller_remark STRING,
>   trade_business_type STRING)
> USING iceberg
> PARTITIONED BY (uni_shop_id, truncate(4, created))
> LOCATION '/iceberg-catalog/warehouse/dwd/b_std_trade'
> TBLPROPERTIES (
>   'current-snapshot-id' = '7217819472703702905',
>   'format' = 'iceberg/orc',
>   'format-version' = '1',
>   'hive.stored-as' = 'iceberg',
>   'read.orc.vectorization.enabled' = 'true',
>   'sort-order' = 'uni_shop_id ASC NULLS FIRST, created ASC NULLS FIRST',
>   'write.distribution-mode' = 'hash',
>   'write.format.default' = 'orc',
>   'write.metadata.delete-after-commit.enabled' = 'true',
>   'write.metadata.previous-versions-max' = '3',
>   'write.orc.bloom.filter.columns' = 'order_id',
>   'write.orc.compression-codec' = 'zstd')
> --hive-iceberg
>  CREATE EXTERNAL TABLE iceberg_dwd.b_std_trade 
>  STORED BY 'org.apache.iceberg.mr.hive.HiveIcebergStorageHandler' 
> LOCATION 'hdfs:///iceberg-catalog/warehouse/dwd/b_std_trade'
> TBLPROPERTIES 
> ('iceberg.catalog'='location_based_table','engine.hive.enabled'='true');
> select * from iceberg_dwd.b_std_trade 
> where uni_shop_id = 'TEST|1' limit 10  --10 rows
> select *
> from ( 
> select * from iceberg_dwd.b_std_trade 
> where uni_shop_id = 'TEST|1' limit 10
> ) t1;   --10 rows
> select uni_shop_id
> from ( 
> select * from iceberg_dwd.b_std_trade 
> where uni_shop_id = 'TEST|1' limit 10
> ) t1;  --0 rows
> select uni_shop_id
> from ( 
> select uni_shop_id as uni_shop_id from iceberg_dwd.b_std_trade 
> where uni_shop_id = 'TEST|1' limit 10
> ) t1;  --0 rows
> --hive-orc
> select uni_shop_id
> from ( 
> select * from iceberg_dwd.trade_test 
> where uni_shop_id = 'TEST|1' limit 10
> ) t1;--10 ROWS{code}
>  





[jira] [Commented] (HIVE-27900) hive can not read iceberg-parquet table

2023-11-22 Thread zhangbutao (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-27900?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17789007#comment-17789007
 ] 

zhangbutao commented on HIVE-27900:
---

I cannot reproduce this issue on master code. My env is:

1) Hive master branch:

    You can compile the Hive code using the cmd:

{code:java}
mvn clean install -DskipTests -Piceberg -Pdist{code}
2) Tez 0.10.2

   I recommend you use 0.10.2 to test, as 0.10.3 is not released. We cannot 
make sure 0.10.3 works well with Hive.

3) Hadoop 3.3.1

BTW, if the table *local.test.b_qqd_shop_rfm_parquet_snappy* is empty, will 
the issue still occur in your env?

> hive can not read iceberg-parquet table
> ---
>
> Key: HIVE-27900
> URL: https://issues.apache.org/jira/browse/HIVE-27900
> Project: Hive
>  Issue Type: Bug
>  Components: Iceberg integration
>Affects Versions: 4.0.0-beta-1
>Reporter: yongzhi.shao
>Priority: Major
>
> We found that, using the HIVE4-BETA version, we could not query the 
> Iceberg-Parquet table with vectorized execution turned on.
> {code:java}
> --spark-sql(3.4.1+iceberg 1.4.2)
> CREATE TABLE local.test.b_qqd_shop_rfm_parquet_snappy (
> a string,b string,c string)
> USING iceberg
> LOCATION '/iceberg-catalog/warehouse/test/b_qqd_shop_rfm_parquet_snappy'
> TBLPROPERTIES (
>   'current-snapshot-id' = '5138351937447353683',
>   'format' = 'iceberg/parquet',
>   'format-version' = '2',
>   'read.orc.vectorization.enabled' = 'true',
>   'write.format.default' = 'parquet',
>   'write.metadata.delete-after-commit.enabled' = 'true',
>   'write.metadata.previous-versions-max' = '3',
>   'write.parquet.compression-codec' = 'snappy');
> --hive-sql
> CREATE EXTERNAL TABLE iceberg_dwd.b_qqd_shop_rfm_parquet_snappy
> STORED BY 'org.apache.iceberg.mr.hive.HiveIcebergStorageHandler'
> LOCATION 
> 'hdfs://xxx/iceberg-catalog/warehouse/test/b_qqd_shop_rfm_parquet_snappy/'
> TBLPROPERTIES 
> ('iceberg.catalog'='location_based_table','engine.hive.enabled'='true');
> set hive.default.fileformat=orc;
> set hive.default.fileformat.managed=orc;
> create table test_parquet_as_orc as select * from 
> b_qqd_shop_rfm_parquet_snappy limit 100;
> , TaskAttempt 2 failed, info=[Error: Node: /xxx..xx.xx: Error while 
> running task ( failure ) : 
> attempt_1696729618575_69586_1_00_00_2:java.lang.RuntimeException: 
> java.lang.RuntimeException: org.apache.hadoop.hive.ql.metadata.HiveException: 
> Hive Runtime Error while processing row
> at 
> org.apache.hadoop.hive.ql.exec.tez.TezProcessor.initializeAndRunProcessor(TezProcessor.java:348)
> at org.apache.hadoop.hive.ql.exec.tez.TezProcessor.run(TezProcessor.java:276)
> at 
> org.apache.tez.runtime.LogicalIOProcessorRuntimeTask.run(LogicalIOProcessorRuntimeTask.java:381)
> at 
> org.apache.tez.runtime.task.TaskRunner2Callable$1.run(TaskRunner2Callable.java:82)
> at 
> org.apache.tez.runtime.task.TaskRunner2Callable$1.run(TaskRunner2Callable.java:69)
> at java.security.AccessController.doPrivileged(Native Method)
> at javax.security.auth.Subject.doAs(Subject.java:422)
> at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1899)
> at 
> org.apache.tez.runtime.task.TaskRunner2Callable.callInternal(TaskRunner2Callable.java:69)
> at 
> org.apache.tez.runtime.task.TaskRunner2Callable.callInternal(TaskRunner2Callable.java:39)
> at org.apache.tez.common.CallableWithNdc.call(CallableWithNdc.java:36)
> at 
> com.google.common.util.concurrent.TrustedListenableFutureTask$TrustedFutureInterruptibleTask.runInterruptibly(TrustedListenableFutureTask.java:131)
> at 
> com.google.common.util.concurrent.InterruptibleTask.run(InterruptibleTask.java:76)
> at 
> com.google.common.util.concurrent.TrustedListenableFutureTask.run(TrustedListenableFutureTask.java:82)
> at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
> at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
> at java.lang.Thread.run(Thread.java:750)
> Caused by: java.lang.RuntimeException: 
> org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime Error while 
> processing row
> at 
> org.apache.hadoop.hive.ql.exec.tez.MapRecordSource.processRow(MapRecordSource.java:110)
> at 
> org.apache.hadoop.hive.ql.exec.tez.MapRecordSource.pushRecord(MapRecordSource.java:83)
> at 
> org.apache.hadoop.hive.ql.exec.tez.MapRecordProcessor.run(MapRecordProcessor.java:414)
> at 
> org.apache.hadoop.hive.ql.exec.tez.TezProcessor.initializeAndRunProcessor(TezProcessor.java:293)
> ... 16 more
> Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime 
> Error while processing row
> at 
> org.apache.hadoop.hive.ql.exec.vector.VectorMapOperator.process(VectorMapOperator.java:993)
> at 
> org.apache.hadoop.hive.ql.exec.tez.MapRecordSourc

[jira] [Commented] (HIVE-27899) Killed speculative execution task attempt should not commit file

2023-11-22 Thread Sungwoo Park (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-27899?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17788971#comment-17788971
 ] 

Sungwoo Park commented on HIVE-27899:
-

Calling canCommit() may not be a complete solution. For example, can we have a 
bad scenario like this?

TaskAttempt#1 calls canCommit(), writes output, and then fails for some reason.
Later TaskAttempt#2 calls canCommit(), writes output, and then completes 
successfully.
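The scenario above can be sketched in miniature: canCommit() grants commit permission to one attempt at a time, but if the permitted attempt fails after writing its output, a later attempt can legitimately be granted permission too, leaving both output files behind unless the failed attempt's file is cleaned up. The class below is a toy model of that hand-off, not Tez's actual implementation:

```java
import java.util.ArrayList;
import java.util.List;

public class CanCommitScenario {
    private String committedAttempt = null;      // AM-side state: who may commit
    final List<String> filesInTempDir = new ArrayList<>();

    // Toy stand-in for Tez's canCommit(): the first caller wins,
    // until that attempt is marked failed.
    synchronized boolean canCommit(String attempt) {
        if (committedAttempt == null || committedAttempt.equals(attempt)) {
            committedAttempt = attempt;
            return true;
        }
        return false;
    }

    synchronized void markFailed(String attempt) {
        if (attempt.equals(committedAttempt)) committedAttempt = null;
    }

    public static void main(String[] args) {
        CanCommitScenario am = new CanCommitScenario();
        // TaskAttempt#1: canCommit() ok, writes output, then fails.
        if (am.canCommit("attempt_1")) am.filesInTempDir.add("out_attempt_1");
        am.markFailed("attempt_1");              // failure does not remove the file
        // TaskAttempt#2: canCommit() ok, writes output, completes.
        if (am.canCommit("attempt_2")) am.filesInTempDir.add("out_attempt_2");
        // Both files are present -> duplicate data unless attempt_1's is cleaned up.
        System.out.println(am.filesInTempDir);
    }
}
```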


> Killed speculative execution task attempt should not commit file
> 
>
> Key: HIVE-27899
> URL: https://issues.apache.org/jira/browse/HIVE-27899
> Project: Hive
>  Issue Type: Bug
>  Components: Tez
>Reporter: Chenyu Zheng
>Assignee: Chenyu Zheng
>Priority: Major
> Attachments: reproduce_bug.md
>
>
> As I mentioned in HIVE-25561, when tez turns on speculative execution, the 
> data file produced by hive may be duplicated. I mentioned in HIVE-25561 that 
> if the speculatively executed task is killed, some data may be submitted 
> unexpectedly. However, after HIVE-25561, there is still a situation that has 
> not been solved. If two task attempts commit file at the same time, the 
> problem of duplicate data files may also occur. Although the probability of 
> this happening is very, very low, it does happen.
>  
> Why?
> There are two key steps:
> (1) FileSinkOperator::closeOp
> TezProcessor::initializeAndRunProcessor --> ... --> FileSinkOperator::closeOp 
> --> fsp.commit
> When the OP is closed, the process of closing the OP will be triggered, and 
> eventually the call to fsp.commit will be triggered.
> (2) removeTempOrDuplicateFiles
> (2.a) Firstly, listStatus the files in the temporary directory.
> (2.b) Secondly, check whether there are multiple incorrect commits, and 
> finally move the correct results to the final directory.
> When speculative execution is enabled, when one attempt of a Task is 
> completed, other attempts will be killed. However, the AM only sends the kill 
> event and does not ensure that all cleanup actions are completed; that is, 
> closeOp may be executed between 2.a and 2.b. Therefore, 
> removeTempOrDuplicateFiles will not delete the file generated by the killed 
> attempt.
> How?
> The problem is that both speculatively executed task attempts commit the 
> file. This will not happen in the Tez examples because they first call 
> canCommit, which guarantees that one and only one task attempt commits 
> successfully. If one task attempt executes canCommit successfully, the other 
> one will be blocked by canCommit until it receives a kill signal.
> For details, see: 
> [https://github.com/apache/tez/blob/51d6f53967110e2b91b6d90b46f8e16bdc062091/tez-mapreduce/src/main/java/org/apache/tez/mapreduce/processor/SimpleMRProcessor.java#L70]
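The 2.a/2.b window described above is a check-then-act race: the duplicate check walks a directory snapshot taken before the killed attempt's closeOp commits its file, so that file is never examined. A toy model of the race (not Hive's actual removeTempOrDuplicateFiles):

```java
import java.util.ArrayList;
import java.util.List;

public class ListThenCheckRace {
    // Number of files now in the directory that the snapshot-based check
    // never examined.
    static int unexamined(List<String> snapshot, List<String> dirNow) {
        List<String> extra = new ArrayList<>(dirNow);
        extra.removeAll(snapshot);
        return extra.size();
    }

    public static void main(String[] args) {
        List<String> tempDir = new ArrayList<>();
        tempDir.add("000000_0");                           // successful attempt's output
        // (2.a) listStatus: snapshot of the directory contents.
        List<String> snapshot = new ArrayList<>(tempDir);
        // The killed attempt's closeOp runs in the window and commits its file.
        tempDir.add("000000_1");
        // (2.b) the duplicate check walks only the snapshot.
        System.out.println(unexamined(snapshot, tempDir) + " file(s) escaped the check");
    }
}
```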





[jira] [Created] (HIVE-27904) Add fixed-length column serde

2023-11-22 Thread Jiamin Wang (Jira)
Jiamin Wang created HIVE-27904:
--

 Summary: Add fixed-length column serde
 Key: HIVE-27904
 URL: https://issues.apache.org/jira/browse/HIVE-27904
 Project: Hive
  Issue Type: New Feature
  Components: Serializers/Deserializers
Reporter: Jiamin Wang


Hive does not support setting the column delimiter to a space. Some systems 
require storing files in a fixed-length format. I am thinking that maybe we 
can add this feature. I can submit the code; tables would be created like this:

{code:sql}
CREATE TABLE fixed_length_table (
  column1 STRING,
  column2 STRING,
  column3 STRING
)
ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.fixed.FixedLengthTextSerDe'
WITH SERDEPROPERTIES (
  "field.lengths"="10,5,8"
)
STORED AS TEXTFILE;
{code}
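As for what the proposed serde would do per row, here is a sketch of the field-splitting logic implied by field.lengths="10,5,8". The serde class and property names come from the proposal above; the class below is purely illustrative, not an actual implementation:

```java
public class FixedLengthSplit {
    // Split one text row into fixed-width fields, as field.lengths would
    // configure; a short row simply yields truncated trailing fields.
    static String[] split(String line, int[] lengths) {
        String[] fields = new String[lengths.length];
        int pos = 0;
        for (int i = 0; i < lengths.length; i++) {
            int end = Math.min(pos + lengths[i], line.length());
            fields[i] = line.substring(pos, end);
            pos = end;
        }
        return fields;
    }

    public static void main(String[] args) {
        // 10 + 5 + 8 = 23 characters per row (made-up sample data).
        String row = "customer01AB123order001";
        String[] f = split(row, new int[]{10, 5, 8});
        System.out.println(f[0] + "|" + f[1] + "|" + f[2]);
    }
}
```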





[jira] [Updated] (HIVE-27901) Hive's performance for querying the Iceberg table is very poor.

2023-11-22 Thread yongzhi.shao (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-27901?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

yongzhi.shao updated HIVE-27901:

Description: 
I am using HIVE 4.0.0-BETA for testing.

BTW, I found that the performance of Hive reading an ICEBERG table is still 
very slow.

How should I deal with this problem?

I ran a count on a 7-billion-row table and compared the performance difference 
between Hive reading ICEBERG-ORC and ORC tables respectively.

We use ICEBERG 1.4.2; ICEBERG-ORC with ZSTD compression enabled.

ORC with SNAPPY compression.

HADOOP version 3.1.1 (native zstd not supported).
{code:java}
--spark3.4.1+iceberg 1.4.2
CREATE TABLE datacenter.dwd.b_std_trade (
  uni_order_id STRING,
  data_from BIGINT,
  partner STRING,
  plat_code STRING,
  order_id STRING,
  uni_shop_id STRING,
  uni_id STRING,
  guide_id STRING,
  shop_id STRING,
  plat_account STRING,
  total_fee DOUBLE,
  item_discount_fee DOUBLE,
  trade_discount_fee DOUBLE,
  adjust_fee DOUBLE,
  post_fee DOUBLE,
  discount_rate DOUBLE,
  payment_no_postfee DOUBLE,
  payment DOUBLE,
  pay_time STRING,
  product_num BIGINT,
  order_status STRING,
  is_refund STRING,
  refund_fee DOUBLE,
  insert_time STRING,
  created STRING,
  endtime STRING,
  modified STRING,
  trade_type STRING,
  receiver_name STRING,
  receiver_country STRING,
  receiver_state STRING,
  receiver_city STRING,
  receiver_district STRING,
  receiver_town STRING,
  receiver_address STRING,
  receiver_mobile STRING,
  trade_source STRING,
  delivery_type STRING,
  consign_time STRING,
  orders_num BIGINT,
  is_presale BIGINT,
  presale_status STRING,
  first_fee_paytime STRING,
  last_fee_paytime STRING,
  first_paid_fee DOUBLE,
  tenant STRING,
  tidb_modified STRING,
  step_paid_fee DOUBLE,
  seller_flag STRING,
  is_used_store_card BIGINT,
  store_card_used DOUBLE,
  store_card_basic_used DOUBLE,
  store_card_expand_used DOUBLE,
  order_promotion_num BIGINT,
  item_promotion_num BIGINT,
  buyer_remark STRING,
  seller_remark STRING,
  trade_business_type STRING)
USING iceberg
PARTITIONED BY (uni_shop_id, truncate(4, created))
LOCATION '/iceberg-catalog/warehouse/dwd/b_std_trade'
TBLPROPERTIES (
  'current-snapshot-id' = '7217819472703702905',
  'format' = 'iceberg/orc',
  'format-version' = '1',
  'hive.stored-as' = 'iceberg',
  'read.orc.vectorization.enabled' = 'true',
  'sort-order' = 'uni_shop_id ASC NULLS FIRST, created ASC NULLS FIRST',
  'write.distribution-mode' = 'hash',
  'write.format.default' = 'orc',
  'write.metadata.delete-after-commit.enabled' = 'true',
  'write.metadata.previous-versions-max' = '3',
  'write.orc.bloom.filter.columns' = 'order_id',
  'write.orc.compression-codec' = 'zstd')



--hive-iceberg
CREATE EXTERNAL TABLE iceberg_dwd.b_std_trade 
STORED BY 'org.apache.iceberg.mr.hive.HiveIcebergStorageHandler' 
LOCATION 'hdfs:///iceberg-catalog/warehouse/dwd/b_std_trade'
TBLPROPERTIES 
('iceberg.catalog'='location_based_table','engine.hive.enabled'='true'); 

--inner orc table( set hive default format = orc )
set hive.default.fileformat=orc;
set hive.default.fileformat.managed=orc;
create table if not exists iceberg_dwd.orc_inner_table as select * from 
iceberg_dwd.b_std_trade;{code}
 

!image-2023-11-22-18-32-28-344.png!

!image-2023-11-22-18-33-01-885.png!

Also, I have another question. The Submit Plan statistic is clearly incorrect. 
Is this something that needs to be fixed?

!image-2023-11-22-18-33-32-915.png!

 

  was:
I am using HIVE4.0.0-BETA for testing.

BTW,I found that the performance of HIVE reading ICEBERG table is still very 
slow.

How should I deal with this problem?

I count a 7 billion table and compare the performance difference between HIVE 
reading ICEBERG-ORC and ORC table respectively.

We use ICEBERG 1.4.2, ICEBERG-ORC with ZSTD compression enabled.

ORC with SNAPPY compression.

HADOOP version 3.1.1 (native zstd not supported).
{code:java}
--spark3.4.1+iceberg 1.4.2
CREATE TABLE datacenter.dwd.b_std_trade (
  uni_order_id STRING,
  data_from BIGINT,
  partner STRING,
  plat_code STRING,
  order_id STRING,
  uni_shop_id STRING,
  uni_id STRING,
  guide_id STRING,
  shop_id STRING,
  plat_account STRING,
  total_fee DOUBLE,
  item_discount_fee DOUBLE,
  trade_discount_fee DOUBLE,
  adjust_fee DOUBLE,
  post_fee DOUBLE,
  discount_rate DOUBLE,
  payment_no_postfee DOUBLE,
  payment DOUBLE,
  pay_time STRING,
  product_num BIGINT,
  order_status STRING,
  is_refund STRING,
  refund_fee DOUBLE,
  insert_time STRING,
  created STRING,
  endtime STRING,
  modified STRING,
  trade_type STRING,
  receiver_name STRING,
  receiver_country STRING,
  receiver_state STRING,
  receiver_city STRING,
  receiver_district STRING,
  receiver_town STRING,
  receiver_address STRING,
  receiver_mobile STRING,
  trade_source STRING,
  delivery_type STRING,
  consign_time STRING,
  orders_num BIGINT,
  is_presale BIGINT,
  presale_status STRING,
  first_fee_paytime STRING,
  last_fee_paytim

[jira] [Updated] (HIVE-27901) Hive's performance for querying the Iceberg table is very poor.

2023-11-22 Thread yongzhi.shao (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-27901?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

yongzhi.shao updated HIVE-27901:

Description: 
I am using HIVE 4.0.0-BETA for testing.

BTW, I found that the performance of Hive reading an ICEBERG table is still 
very slow.

How should I deal with this problem?

I ran a count on a 7-billion-row table and compared the performance difference 
between Hive reading ICEBERG-ORC and ORC tables respectively.

We use ICEBERG 1.4.2; ICEBERG-ORC with ZSTD compression enabled.

ORC with SNAPPY compression.

HADOOP version 3.1.1 (native zstd not supported).
{code:java}
--spark3.4.1+iceberg 1.4.2
CREATE TABLE datacenter.dwd.b_std_trade (
  uni_order_id STRING,
  data_from BIGINT,
  partner STRING,
  plat_code STRING,
  order_id STRING,
  uni_shop_id STRING,
  uni_id STRING,
  guide_id STRING,
  shop_id STRING,
  plat_account STRING,
  total_fee DOUBLE,
  item_discount_fee DOUBLE,
  trade_discount_fee DOUBLE,
  adjust_fee DOUBLE,
  post_fee DOUBLE,
  discount_rate DOUBLE,
  payment_no_postfee DOUBLE,
  payment DOUBLE,
  pay_time STRING,
  product_num BIGINT,
  order_status STRING,
  is_refund STRING,
  refund_fee DOUBLE,
  insert_time STRING,
  created STRING,
  endtime STRING,
  modified STRING,
  trade_type STRING,
  receiver_name STRING,
  receiver_country STRING,
  receiver_state STRING,
  receiver_city STRING,
  receiver_district STRING,
  receiver_town STRING,
  receiver_address STRING,
  receiver_mobile STRING,
  trade_source STRING,
  delivery_type STRING,
  consign_time STRING,
  orders_num BIGINT,
  is_presale BIGINT,
  presale_status STRING,
  first_fee_paytime STRING,
  last_fee_paytime STRING,
  first_paid_fee DOUBLE,
  tenant STRING,
  tidb_modified STRING,
  step_paid_fee DOUBLE,
  seller_flag STRING,
  is_used_store_card BIGINT,
  store_card_used DOUBLE,
  store_card_basic_used DOUBLE,
  store_card_expand_used DOUBLE,
  order_promotion_num BIGINT,
  item_promotion_num BIGINT,
  buyer_remark STRING,
  seller_remark STRING,
  trade_business_type STRING)
USING iceberg
PARTITIONED BY (uni_shop_id, truncate(4, created))
LOCATION '/iceberg-catalog/warehouse/dwd/b_std_trade'
TBLPROPERTIES (
  'current-snapshot-id' = '7217819472703702905',
  'format' = 'iceberg/orc',
  'format-version' = '1',
  'hive.stored-as' = 'iceberg',
  'read.orc.vectorization.enabled' = 'true',
  'sort-order' = 'uni_shop_id ASC NULLS FIRST, created ASC NULLS FIRST',
  'write.distribution-mode' = 'hash',
  'write.format.default' = 'orc',
  'write.metadata.delete-after-commit.enabled' = 'true',
  'write.metadata.previous-versions-max' = '3',
  'write.orc.bloom.filter.columns' = 'order_id',
  'write.orc.compression-codec' = 'zstd')



--hive-iceberg
CREATE EXTERNAL TABLE iceberg_dwd.b_std_trade 
STORED BY 'org.apache.iceberg.mr.hive.HiveIcebergStorageHandler' 
LOCATION 'hdfs:///iceberg-catalog/warehouse/dwd/b_std_trade'
TBLPROPERTIES 
('iceberg.catalog'='location_based_table','engine.hive.enabled'='true'); 

--inner orc table( set hive default format = orc )
create table if not exists iceberg_dwd.orc_inner_table as select * from 
iceberg_dwd.b_std_trade;{code}
 

!image-2023-11-22-18-32-28-344.png!

!image-2023-11-22-18-33-01-885.png!

Also, I have another question. The Submit Plan statistic is clearly incorrect. 
Is this something that needs to be fixed?

!image-2023-11-22-18-33-32-915.png!

 

  was:
I am using HIVE4.0.0-BETA for testing.

BTW,I found that the performance of HIVE reading ICEBERG table is still very 
slow.

How should I deal with this problem?

I count a 7 billion table and compare the performance difference between HIVE 
reading ICEBERG-ORC and ORC table respectively.

We use ICEBERG 1.4.2, ICEBERG-ORC with ZSTD compression enabled.

ORC with SNAPPY compression.

HADOOP version 3.1.1 (native zstd not supported).

!image-2023-11-22-18-32-28-344.png!

!image-2023-11-22-18-33-01-885.png!

Also, I have another question. The Submit Plan statistic is clearly incorrect. 
Is this something that needs to be fixed?

!image-2023-11-22-18-33-32-915.png!

 


> Hive's performance for querying the Iceberg table is very poor.
> ---
>
> Key: HIVE-27901
> URL: https://issues.apache.org/jira/browse/HIVE-27901
> Project: Hive
>  Issue Type: Bug
>  Components: Iceberg integration
>Affects Versions: 4.0.0-beta-1
>Reporter: yongzhi.shao
>Priority: Major
> Attachments: image-2023-11-22-18-32-28-344.png, 
> image-2023-11-22-18-33-01-885.png, image-2023-11-22-18-33-32-915.png
>
>
> I am using HIVE4.0.0-BETA for testing.
> BTW,I found that the performance of HIVE reading ICEBERG table is still very 
> slow.
> How should I deal with this problem?
> I count a 7 billion table and compare the performance difference between HIVE 
> reading ICEBERG-ORC and ORC table respectively.
> We use ICEBERG 1.4.

[jira] [Updated] (HIVE-27898) HIVE4 can't use ICEBERG table in subqueries

2023-11-22 Thread yongzhi.shao (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-27898?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

yongzhi.shao updated HIVE-27898:

Description: 
Currently, we found that when using the HIVE4-BETA1 version, if we use an 
ICEBERG table in a subquery, we can't get any data in the end.

I have used HIVE3-TEZ for cross validation, and HIVE3 does not have this 
problem when querying ICEBERG.
{code:java}
--spark3.4.1+iceberg 1.4.2
CREATE TABLE datacenter.dwd.b_std_trade (
  uni_order_id STRING,
  data_from BIGINT,
  partner STRING,
  plat_code STRING,
  order_id STRING,
  uni_shop_id STRING,
  uni_id STRING,
  guide_id STRING,
  shop_id STRING,
  plat_account STRING,
  total_fee DOUBLE,
  item_discount_fee DOUBLE,
  trade_discount_fee DOUBLE,
  adjust_fee DOUBLE,
  post_fee DOUBLE,
  discount_rate DOUBLE,
  payment_no_postfee DOUBLE,
  payment DOUBLE,
  pay_time STRING,
  product_num BIGINT,
  order_status STRING,
  is_refund STRING,
  refund_fee DOUBLE,
  insert_time STRING,
  created STRING,
  endtime STRING,
  modified STRING,
  trade_type STRING,
  receiver_name STRING,
  receiver_country STRING,
  receiver_state STRING,
  receiver_city STRING,
  receiver_district STRING,
  receiver_town STRING,
  receiver_address STRING,
  receiver_mobile STRING,
  trade_source STRING,
  delivery_type STRING,
  consign_time STRING,
  orders_num BIGINT,
  is_presale BIGINT,
  presale_status STRING,
  first_fee_paytime STRING,
  last_fee_paytime STRING,
  first_paid_fee DOUBLE,
  tenant STRING,
  tidb_modified STRING,
  step_paid_fee DOUBLE,
  seller_flag STRING,
  is_used_store_card BIGINT,
  store_card_used DOUBLE,
  store_card_basic_used DOUBLE,
  store_card_expand_used DOUBLE,
  order_promotion_num BIGINT,
  item_promotion_num BIGINT,
  buyer_remark STRING,
  seller_remark STRING,
  trade_business_type STRING)
USING iceberg
PARTITIONED BY (uni_shop_id, truncate(4, created))
LOCATION '/iceberg-catalog/warehouse/dwd/b_std_trade'
TBLPROPERTIES (
  'current-snapshot-id' = '7217819472703702905',
  'format' = 'iceberg/orc',
  'format-version' = '1',
  'hive.stored-as' = 'iceberg',
  'read.orc.vectorization.enabled' = 'true',
  'sort-order' = 'uni_shop_id ASC NULLS FIRST, created ASC NULLS FIRST',
  'write.distribution-mode' = 'hash',
  'write.format.default' = 'orc',
  'write.metadata.delete-after-commit.enabled' = 'true',
  'write.metadata.previous-versions-max' = '3',
  'write.orc.bloom.filter.columns' = 'order_id',
  'write.orc.compression-codec' = 'zstd')



--hive-iceberg

 CREATE EXTERNAL TABLE iceberg_dwd.b_std_trade 
 STORED BY 'org.apache.iceberg.mr.hive.HiveIcebergStorageHandler' 
LOCATION 'hdfs:///iceberg-catalog/warehouse/dwd/b_std_trade'
TBLPROPERTIES 
('iceberg.catalog'='location_based_table','engine.hive.enabled'='true');


select * from iceberg_dwd.b_std_trade 
where uni_shop_id = 'TEST|1' limit 10  --10 rows


select *
from ( 
select * from iceberg_dwd.b_std_trade 
where uni_shop_id = 'TEST|1' limit 10
) t1;   --10 rows


select uni_shop_id
from ( 
select * from iceberg_dwd.b_std_trade 
where uni_shop_id = 'TEST|1' limit 10
) t1;  --0 rows


select uni_shop_id
from ( 
select uni_shop_id as uni_shop_id from iceberg_dwd.b_std_trade 
where uni_shop_id = 'TEST|1' limit 10
) t1;  --0 rows


--orc
select uni_shop_id
from ( 
select * from iceberg_dwd.trade_test 
where uni_shop_id = 'TEST|1' limit 10
) t1;--10 ROWS{code}
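The working and failing shapes above can be compared with EXPLAIN; a diagnostic sketch (not a fix, just a way to see where the projection of uni_shop_id is lost after the LIMIT):

```sql
-- Plan of the failing shape (returns 0 rows):
explain
select uni_shop_id
from (
  select * from iceberg_dwd.b_std_trade
  where uni_shop_id = 'TEST|1' limit 10
) t1;

-- Plan of the working shape (returns 10 rows):
explain
select *
from (
  select * from iceberg_dwd.b_std_trade
  where uni_shop_id = 'TEST|1' limit 10
) t1;
```

If the two plans differ in the operators above the LIMIT, that difference points at where the rows are dropped.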
 

  was:
We found that, when using the HIVE4-BETA1 version, querying an ICEBERG table 
inside a subquery returns no data.

I cross-validated with HIVE3 on TEZ; HIVE3 does not have this problem when 
querying ICEBERG.
{code:java}
--iceberg
select * from iceberg_dwd.b_std_trade 
where uni_shop_id = 'TEST|1' limit 10  --10 rows


select *
from ( 
select * from iceberg_dwd.b_std_trade 
where uni_shop_id = 'TEST|1' limit 10
) t1;   --10 rows


select uni_shop_id
from ( 
select * from iceberg_dwd.b_std_trade 
where uni_shop_id = 'TEST|1' limit 10
) t1;  --0 rows


select uni_shop_id
from ( 
select uni_shop_id as uni_shop_id from iceberg_dwd.b_std_trade 
where uni_shop_id = 'TEST|1' limit 10
) t1;  --0 rows


--orc
select uni_shop_id
from ( 
select * from iceberg_dwd.trade_test 
where uni_shop_id = 'TEST|1' limit 10
) t1;--10 ROWS{code}
 


> HIVE4 can't use ICEBERG table in subqueries
> ---
>
> Key: HIVE-27898
> URL: https://issues.apache.org/jira/browse/HIVE-27898
> Project: Hive
>  Issue Type: Bug
>  Components: Iceberg integration
>Affects Versions: 4.0.0-beta-1
>Reporter: yongzhi.shao
>Priority: Critical
>
> We found that, when using the HIVE4-BETA1 version, querying an ICEBERG 
> table inside a subquery returns no data.
> I cross-validated with HIVE3 on TEZ; HIVE3 does not have this 
> problem when querying

[jira] [Updated] (HIVE-27898) HIVE4 can't use ICEBERG table in subqueries

2023-11-22 Thread yongzhi.shao (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-27898?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

yongzhi.shao updated HIVE-27898:

Description: 
We found that, when using the HIVE4-BETA1 version, querying an ICEBERG table 
inside a subquery returns no data.

I cross-validated with HIVE3 on TEZ; HIVE3 does not have this problem when 
querying ICEBERG.
{code:java}
--spark3.4.1+iceberg 1.4.2
CREATE TABLE datacenter.dwd.b_std_trade (
  uni_order_id STRING,
  data_from BIGINT,
  partner STRING,
  plat_code STRING,
  order_id STRING,
  uni_shop_id STRING,
  uni_id STRING,
  guide_id STRING,
  shop_id STRING,
  plat_account STRING,
  total_fee DOUBLE,
  item_discount_fee DOUBLE,
  trade_discount_fee DOUBLE,
  adjust_fee DOUBLE,
  post_fee DOUBLE,
  discount_rate DOUBLE,
  payment_no_postfee DOUBLE,
  payment DOUBLE,
  pay_time STRING,
  product_num BIGINT,
  order_status STRING,
  is_refund STRING,
  refund_fee DOUBLE,
  insert_time STRING,
  created STRING,
  endtime STRING,
  modified STRING,
  trade_type STRING,
  receiver_name STRING,
  receiver_country STRING,
  receiver_state STRING,
  receiver_city STRING,
  receiver_district STRING,
  receiver_town STRING,
  receiver_address STRING,
  receiver_mobile STRING,
  trade_source STRING,
  delivery_type STRING,
  consign_time STRING,
  orders_num BIGINT,
  is_presale BIGINT,
  presale_status STRING,
  first_fee_paytime STRING,
  last_fee_paytime STRING,
  first_paid_fee DOUBLE,
  tenant STRING,
  tidb_modified STRING,
  step_paid_fee DOUBLE,
  seller_flag STRING,
  is_used_store_card BIGINT,
  store_card_used DOUBLE,
  store_card_basic_used DOUBLE,
  store_card_expand_used DOUBLE,
  order_promotion_num BIGINT,
  item_promotion_num BIGINT,
  buyer_remark STRING,
  seller_remark STRING,
  trade_business_type STRING)
USING iceberg
PARTITIONED BY (uni_shop_id, truncate(4, created))
LOCATION '/iceberg-catalog/warehouse/dwd/b_std_trade'
TBLPROPERTIES (
  'current-snapshot-id' = '7217819472703702905',
  'format' = 'iceberg/orc',
  'format-version' = '1',
  'hive.stored-as' = 'iceberg',
  'read.orc.vectorization.enabled' = 'true',
  'sort-order' = 'uni_shop_id ASC NULLS FIRST, created ASC NULLS FIRST',
  'write.distribution-mode' = 'hash',
  'write.format.default' = 'orc',
  'write.metadata.delete-after-commit.enabled' = 'true',
  'write.metadata.previous-versions-max' = '3',
  'write.orc.bloom.filter.columns' = 'order_id',
  'write.orc.compression-codec' = 'zstd')



--hive-iceberg

 CREATE EXTERNAL TABLE iceberg_dwd.b_std_trade 
 STORED BY 'org.apache.iceberg.mr.hive.HiveIcebergStorageHandler' 
LOCATION 'hdfs:///iceberg-catalog/warehouse/dwd/b_std_trade'
TBLPROPERTIES 
('iceberg.catalog'='location_based_table','engine.hive.enabled'='true');


select * from iceberg_dwd.b_std_trade 
where uni_shop_id = 'TEST|1' limit 10  --10 rows


select *
from ( 
select * from iceberg_dwd.b_std_trade 
where uni_shop_id = 'TEST|1' limit 10
) t1;   --10 rows


select uni_shop_id
from ( 
select * from iceberg_dwd.b_std_trade 
where uni_shop_id = 'TEST|1' limit 10
) t1;  --0 rows


select uni_shop_id
from ( 
select uni_shop_id as uni_shop_id from iceberg_dwd.b_std_trade 
where uni_shop_id = 'TEST|1' limit 10
) t1;  --0 rows


--hive-orc
select uni_shop_id
from ( 
select * from iceberg_dwd.trade_test 
where uni_shop_id = 'TEST|1' limit 10
) t1;--10 ROWS{code}
 

  was:
We found that, when using the HIVE4-BETA1 version, querying an ICEBERG table 
inside a subquery returns no data.

I cross-validated with HIVE3 on TEZ; HIVE3 does not have this problem when 
querying ICEBERG.
{code:java}
--spark3.4.1+iceberg 1.4.2
CREATE TABLE datacenter.dwd.b_std_trade (
  uni_order_id STRING,
  data_from BIGINT,
  partner STRING,
  plat_code STRING,
  order_id STRING,
  uni_shop_id STRING,
  uni_id STRING,
  guide_id STRING,
  shop_id STRING,
  plat_account STRING,
  total_fee DOUBLE,
  item_discount_fee DOUBLE,
  trade_discount_fee DOUBLE,
  adjust_fee DOUBLE,
  post_fee DOUBLE,
  discount_rate DOUBLE,
  payment_no_postfee DOUBLE,
  payment DOUBLE,
  pay_time STRING,
  product_num BIGINT,
  order_status STRING,
  is_refund STRING,
  refund_fee DOUBLE,
  insert_time STRING,
  created STRING,
  endtime STRING,
  modified STRING,
  trade_type STRING,
  receiver_name STRING,
  receiver_country STRING,
  receiver_state STRING,
  receiver_city STRING,
  receiver_district STRING,
  receiver_town STRING,
  receiver_address STRING,
  receiver_mobile STRING,
  trade_source STRING,
  delivery_type STRING,
  consign_time STRING,
  orders_num BIGINT,
  is_presale BIGINT,
  presale_status STRING,
  first_fee_paytime STRING,
  last_fee_paytime STRING,
  first_paid_fee DOUBLE,
  tenant STRING,
  tidb_modified STRING,
  step_paid_fee DOUBLE,
  seller_flag STRING,
  is_used_store_card BIGINT,
  store_card_used DOUBLE,
  store_card_basic_used DOUBLE,
  store_card_expand_

[jira] [Updated] (HIVE-27900) hive can not read iceberg-parquet table

2023-11-22 Thread yongzhi.shao (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-27900?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

yongzhi.shao updated HIVE-27900:

Description: 
We found that, using the HIVE4-BETA version, we could not query the 
Iceberg-Parquet table with vectorised execution turned on.
{code:java}
--spark-sql(3.4.1+iceberg 1.4.2)
CREATE TABLE local.test.b_qqd_shop_rfm_parquet_snappy (
a string,b string,c string)
USING iceberg
LOCATION '/iceberg-catalog/warehouse/test/b_qqd_shop_rfm_parquet_snappy'
TBLPROPERTIES (
  'current-snapshot-id' = '5138351937447353683',
  'format' = 'iceberg/parquet',
  'format-version' = '2',
  'read.orc.vectorization.enabled' = 'true',
  'write.format.default' = 'parquet',
  'write.metadata.delete-after-commit.enabled' = 'true',
  'write.metadata.previous-versions-max' = '3',
  'write.parquet.compression-codec' = 'snappy');



--hive-sql
CREATE EXTERNAL TABLE iceberg_dwd.b_qqd_shop_rfm_parquet_snappy
STORED BY 'org.apache.iceberg.mr.hive.HiveIcebergStorageHandler'
LOCATION 
'hdfs://xxx/iceberg-catalog/warehouse/test/b_qqd_shop_rfm_parquet_snappy/'
TBLPROPERTIES 
('iceberg.catalog'='location_based_table','engine.hive.enabled'='true');


set hive.default.fileformat=orc;
set hive.default.fileformat.managed=orc;
create table test_parquet_as_orc as select * from b_qqd_shop_rfm_parquet_snappy 
limit 100;






, TaskAttempt 2 failed, info=[Error: Node: /xxx..xx.xx: Error while 
running task ( failure ) : 
attempt_1696729618575_69586_1_00_00_2:java.lang.RuntimeException: 
java.lang.RuntimeException: org.apache.hadoop.hive.ql.metadata.HiveException: 
Hive Runtime Error while processing row
at 
org.apache.hadoop.hive.ql.exec.tez.TezProcessor.initializeAndRunProcessor(TezProcessor.java:348)
at org.apache.hadoop.hive.ql.exec.tez.TezProcessor.run(TezProcessor.java:276)
at 
org.apache.tez.runtime.LogicalIOProcessorRuntimeTask.run(LogicalIOProcessorRuntimeTask.java:381)
at 
org.apache.tez.runtime.task.TaskRunner2Callable$1.run(TaskRunner2Callable.java:82)
at 
org.apache.tez.runtime.task.TaskRunner2Callable$1.run(TaskRunner2Callable.java:69)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:422)
at 
org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1899)
at 
org.apache.tez.runtime.task.TaskRunner2Callable.callInternal(TaskRunner2Callable.java:69)
at 
org.apache.tez.runtime.task.TaskRunner2Callable.callInternal(TaskRunner2Callable.java:39)
at org.apache.tez.common.CallableWithNdc.call(CallableWithNdc.java:36)
at 
com.google.common.util.concurrent.TrustedListenableFutureTask$TrustedFutureInterruptibleTask.runInterruptibly(TrustedListenableFutureTask.java:131)
at 
com.google.common.util.concurrent.InterruptibleTask.run(InterruptibleTask.java:76)
at 
com.google.common.util.concurrent.TrustedListenableFutureTask.run(TrustedListenableFutureTask.java:82)
at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:750)
Caused by: java.lang.RuntimeException: 
org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime Error while 
processing row
at 
org.apache.hadoop.hive.ql.exec.tez.MapRecordSource.processRow(MapRecordSource.java:110)
at 
org.apache.hadoop.hive.ql.exec.tez.MapRecordSource.pushRecord(MapRecordSource.java:83)
at 
org.apache.hadoop.hive.ql.exec.tez.MapRecordProcessor.run(MapRecordProcessor.java:414)
at 
org.apache.hadoop.hive.ql.exec.tez.TezProcessor.initializeAndRunProcessor(TezProcessor.java:293)
... 16 more
Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime Error 
while processing row
at 
org.apache.hadoop.hive.ql.exec.vector.VectorMapOperator.process(VectorMapOperator.java:993)
at 
org.apache.hadoop.hive.ql.exec.tez.MapRecordSource.processRow(MapRecordSource.java:101)
... 19 more
Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: 
java.lang.NullPointerException
at 
org.apache.hadoop.hive.ql.exec.vector.reducesink.VectorReduceSinkEmptyKeyOperator.process(VectorReduceSinkEmptyKeyOperator.java:137)
at org.apache.hadoop.hive.ql.exec.Operator.vectorForward(Operator.java:919)
at 
org.apache.hadoop.hive.ql.exec.vector.VectorSelectOperator.process(VectorSelectOperator.java:158)
at org.apache.hadoop.hive.ql.exec.Operator.vectorForward(Operator.java:919)
at 
org.apache.hadoop.hive.ql.exec.vector.VectorLimitOperator.process(VectorLimitOperator.java:108)
at org.apache.hadoop.hive.ql.exec.Operator.vectorForward(Operator.java:919)
at 
org.apache.hadoop.hive.ql.exec.TableScanOperator.process(TableScanOperator.java:171)
at 
org.apache.hadoop.hive.ql.exec.vector.VectorMapOperator.deliverVectorizedRowBatch(VectorMapOperator.java:809)
at 
org.apache.hadoop.hive.ql.exec.vector.VectorMapOperator.process(VectorMapOperator.java:878)
... 20 more
Caused by: java.lang.NullPointerEx
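Since the failure appears only with vectorised execution enabled, one possible session-level mitigation can be sketched (an assumption, not a verified fix; hive.vectorized.execution.enabled is a standard Hive setting):

```sql
-- Mitigation sketch (unverified assumption): disable vectorized execution
-- for the session before the CTAS, trading speed for stability.
set hive.vectorized.execution.enabled=false;

create table test_parquet_as_orc as
select * from b_qqd_shop_rfm_parquet_snappy
limit 100;
```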

[jira] [Commented] (HIVE-23354) Remove file size sanity checking from compareTempOrDuplicateFiles

2023-11-22 Thread Chenyu Zheng (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-23354?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17788956#comment-17788956
 ] 

Chenyu Zheng commented on HIVE-23354:
-

[~jfs] [~pvary] [~kuczoram] [~nareshpr] 

Hi, I know speculative execution was disabled for removeTempOrDuplicateFiles in 
this issue.

I found some problems with Hive on Tez when Tez speculative execution is 
enabled. Through debugging, I identified the causes and filed HIVE-25561 and 
HIVE-27899, where I explain the reasons.

       _"That might be problematic when there are speculative execution on the 
way, and the original execution is finished, but the newest/speculative 
execution is still running"_

I think that after HIVE-25561 and HIVE-27899 this is no longer a problem, at 
least on Tez.

After HIVE-25561, if the original execution is killed, the generated file will 
not be committed; it remains a temp file, which removeTempOrDuplicateFiles will 
delete.

After HIVE-25561, if the original execution finishes successfully, its file is 
committed. The speculative task attempt stays stuck until it receives a kill 
signal and never commits its file, so the file it generated remains a temp file 
and will be deleted by removeTempOrDuplicateFiles.

What do you think? Can we re-enable speculative execution? After all, 
speculative execution is crucial in large-scale production environments.
Looking forward to your reply!
 

> Remove file size sanity checking from compareTempOrDuplicateFiles
> -
>
> Key: HIVE-23354
> URL: https://issues.apache.org/jira/browse/HIVE-23354
> Project: Hive
>  Issue Type: Bug
>  Components: HiveServer2
>Reporter: John Sherman
>Assignee: John Sherman
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0-alpha-1
>
> Attachments: HIVE-23354.1.patch, HIVE-23354.2.patch, 
> HIVE-23354.3.patch, HIVE-23354.4.patch, HIVE-23354.5.patch, 
> HIVE-23354.6.patch, HIVE-23354.7.patch
>
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> [https://github.com/apache/hive/blob/cdd55aa319a3440963a886ebfff11cd2a240781d/ql/src/java/org/apache/hadoop/hive/ql/exec/Utilities.java#L1952-L2010]
>  compareTempOrDuplicateFiles uses a combination of attemptId and fileSize to 
> determine which file(s) to keep.
>  I've seen instances where this function throws an exception due to the fact 
> that the newer attemptId file size is less than the older attemptId (thus 
> failing the query).
>  I think this assumption is faulty, due to various factors such as file 
> compression and the order in which values are written. It may be prudent to 
> trust that the newest attemptId is in fact the best choice.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (HIVE-27903) TBLPROPERTIES('history.expire.max-snapshot-age-ms') doesn't work

2023-11-22 Thread Ayush Saxena (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-27903?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17788895#comment-17788895
 ] 

Ayush Saxena commented on HIVE-27903:
-

Should work post: HIVE-27789

If you specify the timestamp via the command, the property won't be used. The 
official Iceberg documentation says as much:

{quote}
if {{older_than}} and {{retain_last}} are omitted, the table's [expiration 
properties|https://iceberg.apache.org/docs/latest/configuration/#table-behavior-properties]
 will be used.
{quote}

So try with RETAIN LAST and this config should take effect. It isn't supposed 
to work when you specify an older_than timestamp, and in your example you 
specified older_than as ('2200-10-10')
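For reference, the Iceberg Spark procedure that the quoted documentation describes takes older_than and retain_last as explicit arguments (Spark SQL, not Hive; the catalog and table names are placeholders from this thread):

```sql
-- When older_than and retain_last are omitted, the table's expiration
-- properties (e.g. history.expire.max-snapshot-age-ms) are used instead.
CALL local.system.expire_snapshots(
  table => 'test.test5d78b6',
  retain_last => 2
);
```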

> TBLPROPERTIES('history.expire.max-snapshot-age-ms') doesn't work
> 
>
> Key: HIVE-27903
> URL: https://issues.apache.org/jira/browse/HIVE-27903
> Project: Hive
>  Issue Type: Improvement
>  Components: Hive
>Affects Versions: 4.0.0-alpha-2
>Reporter: JK Pasimuthu
>Priority: Major
>
> [https://github.com/apache/iceberg/issues/9123]
> The 'history.expire.max-snapshot-age-ms' option doesn't have any effect while 
> expiring snapshots.
>  #  
> CREATE TABLE IF NOT EXISTS test5d78b6 (
> id INT, random1 STRING
> )
> PARTITIONED BY (random2 STRING)
> STORED BY ICEBERG
> TBLPROPERTIES (
> 'write.format.default'='orc',
> 'format-version'='2',
> 'write.orc.compression-codec'='none'
> )
>  # INSERT INTO test5d78b6 SELECT if(isnull(MAX(id)) ,0 , MAX(id) ) +1, 
> uuid(), uuid() FROM test5d78b6
>  # INSERT INTO test5d78b6 SELECT if(isnull(MAX(id)) ,0 , MAX(id) ) +1, 
> uuid(), uuid() FROM test5d78b6
>  # SLEEP for 30 seconds
>  # INSERT INTO test5d78b6 SELECT if(isnull(MAX(id)) ,0 , MAX(id) ) +1, 
> uuid(), uuid() FROM test5d78b6
>  # INSERT INTO test5d78b6 SELECT if(isnull(MAX(id)) ,0 , MAX(id) ) +1, 
> uuid(), uuid() FROM test5d78b6
>  # SELECT (UNIX_TIMESTAMP(CURRENT_TIMESTAMP) - UNIX_TIMESTAMP('2023-10-09 
> 13:23:54.455')) * 1000;
>  # ALTER TABLE test5d78b6 SET 
> tblproperties('history.expire.max-snapshot-age-ms'='54000'); - the elapsed 
> time in ms between the second insert and the current time
>  # ALTER TABLE test5d78b6 EXECUTE expire_snapshots('2200-10-10');
>  # SELECT COUNT(*) FROM default.test5d78b6.snapshots;
> output: 1. It should be 2 rows. The default 1 is retained and all snapshots 
> are expired as usual, so setting the property has no effect.
> Additional Info: the default value for 'history.expire.max-snapshot-age-ms' 
> is 5 days per this link: 
> [https://iceberg.apache.org/docs/1.3.1/configuration/]
> Now, while writing the tests and running them, expiring snapshots just 
> worked fine within a few seconds of the snapshots being created.
> So I'm assuming that this option doesn't have any effect right now. Having 
> said that, I'm thinking about the implications this will have on end users if 
> we fix it.
> The end user may not know about this option at all and will have a tough time 
> figuring out why the snapshots are not getting expired. One option could be 
> to set the default to 0 ms.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (HIVE-27903) TBLPROPERTIES('history.expire.max-snapshot-age-ms') doesn't work

2023-11-22 Thread JK Pasimuthu (Jira)
JK Pasimuthu created HIVE-27903:
---

 Summary: TBLPROPERTIES('history.expire.max-snapshot-age-ms') 
doesn't work
 Key: HIVE-27903
 URL: https://issues.apache.org/jira/browse/HIVE-27903
 Project: Hive
  Issue Type: Improvement
  Components: Hive
Affects Versions: 4.0.0-alpha-2
Reporter: JK Pasimuthu


[https://github.com/apache/iceberg/issues/9123]

The 'history.expire.max-snapshot-age-ms' option doesn't have any effect while 
expiring snapshots.
 #  

CREATE TABLE IF NOT EXISTS test5d78b6 (
id INT, random1 STRING
)
PARTITIONED BY (random2 STRING)
STORED BY ICEBERG
TBLPROPERTIES (
'write.format.default'='orc',
'format-version'='2',
'write.orc.compression-codec'='none'

)
 # INSERT INTO test5d78b6 SELECT if(isnull(MAX(id)) ,0 , MAX(id) ) +1, uuid(), 
uuid() FROM test5d78b6

 # INSERT INTO test5d78b6 SELECT if(isnull(MAX(id)) ,0 , MAX(id) ) +1, uuid(), 
uuid() FROM test5d78b6

 # SLEEP for 30 seconds

 # INSERT INTO test5d78b6 SELECT if(isnull(MAX(id)) ,0 , MAX(id) ) +1, uuid(), 
uuid() FROM test5d78b6

 # INSERT INTO test5d78b6 SELECT if(isnull(MAX(id)) ,0 , MAX(id) ) +1, uuid(), 
uuid() FROM test5d78b6

 # SELECT (UNIX_TIMESTAMP(CURRENT_TIMESTAMP) - UNIX_TIMESTAMP('2023-10-09 
13:23:54.455')) * 1000;

 # ALTER TABLE test5d78b6 SET 
tblproperties('history.expire.max-snapshot-age-ms'='54000'); - the elapsed time 
in ms between the second insert and the current time

 # ALTER TABLE test5d78b6 EXECUTE expire_snapshots('2200-10-10');

 # SELECT COUNT(*) FROM default.test5d78b6.snapshots;

output: 1. It should be 2 rows. The default 1 is retained and all snapshots are 
expired as usual, so setting the property has no effect.

Additional Info: the default value for 'history.expire.max-snapshot-age-ms' is 
5 days per this link: [https://iceberg.apache.org/docs/1.3.1/configuration/]

Now, while writing the tests and running them, expiring snapshots just worked 
fine within a few seconds of the snapshots being created.

So I'm assuming that this option doesn't have any effect right now. Having said 
that, I'm thinking about the implications this will have on end users if we fix 
it.

The end user may not know about this option at all and will have a tough time 
figuring out why the snapshots are not getting expired. One option could be to 
set the default to 0 ms.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (HIVE-27902) Rewrite Update with empty Where clause to IOW

2023-11-22 Thread Denys Kuzmenko (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-27902?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Denys Kuzmenko updated HIVE-27902:
--
Labels: ACID iceberg  (was: )

> Rewrite Update with empty Where clause to IOW
> -
>
> Key: HIVE-27902
> URL: https://issues.apache.org/jira/browse/HIVE-27902
> Project: Hive
>  Issue Type: Improvement
>Affects Versions: 4.0.0-beta-1
>Reporter: Denys Kuzmenko
>Priority: Major
>  Labels: ACID, iceberg
>
> rewrite 
> {code}
> update table mytbl set a = a+5 
> {code}
> with 
> {code}insert overwrite table mytbl as select a+5 from mytbl
> {code}
> note: in case of Iceberg tables it should take care of partition evolution 
> and overwrite all



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (HIVE-27902) Rewrite Update with empty Where clause to IOW

2023-11-22 Thread Denys Kuzmenko (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-27902?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Denys Kuzmenko updated HIVE-27902:
--
Description: 
rewrite 
{code}
update table mytbl set a = a+5 
{code}
with 
{code}insert overwrite table mytbl as select a+5 from mytbl
{code}
note: in case of Iceberg tables it should take care of partition evolution and 
overwrite all
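The intended rewrite can be sketched in runnable HiveQL (a sketch only: INSERT OVERWRITE takes the SELECT directly, without AS, and every non-updated column must be carried through; the column b below is hypothetical):

```sql
-- Original statement (sketch):
--   UPDATE mytbl SET a = a + 5;
-- Equivalent insert-overwrite rewrite, assuming mytbl has columns (a, b);
-- all columns that the UPDATE does not touch must still be selected.
insert overwrite table mytbl
select a + 5, b
from mytbl;
```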

  was:
rewrite 
{code}
update table mytbl set a = a+5 
{code}
with 
{code}insert overwrite table mytbl as select a+5 from mytbl
{code}


> Rewrite Update with empty Where clause to IOW
> -
>
> Key: HIVE-27902
> URL: https://issues.apache.org/jira/browse/HIVE-27902
> Project: Hive
>  Issue Type: Improvement
>Reporter: Denys Kuzmenko
>Priority: Major
>
> rewrite 
> {code}
> update table mytbl set a = a+5 
> {code}
> with 
> {code}insert overwrite table mytbl as select a+5 from mytbl
> {code}
> note: in case of Iceberg tables it should take care of partition evolution 
> and overwrite all



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (HIVE-27902) Rewrite Update with empty Where clause to IOW

2023-11-22 Thread Denys Kuzmenko (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-27902?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Denys Kuzmenko updated HIVE-27902:
--
Affects Version/s: 4.0.0-beta-1

> Rewrite Update with empty Where clause to IOW
> -
>
> Key: HIVE-27902
> URL: https://issues.apache.org/jira/browse/HIVE-27902
> Project: Hive
>  Issue Type: Improvement
>Affects Versions: 4.0.0-beta-1
>Reporter: Denys Kuzmenko
>Priority: Major
>
> rewrite 
> {code}
> update table mytbl set a = a+5 
> {code}
> with 
> {code}insert overwrite table mytbl as select a+5 from mytbl
> {code}
> note: in case of Iceberg tables it should take care of partition evolution 
> and overwrite all



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Resolved] (HIVE-27687) Logger variable should be static final as its creation takes more time in query compilation

2023-11-22 Thread Ramesh Kumar Thangarajan (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-27687?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ramesh Kumar Thangarajan resolved HIVE-27687.
-
Resolution: Fixed

> Logger variable should be static final as its creation takes more time in 
> query compilation
> ---
>
> Key: HIVE-27687
> URL: https://issues.apache.org/jira/browse/HIVE-27687
> Project: Hive
>  Issue Type: Task
>  Components: Hive
>Reporter: Ramesh Kumar Thangarajan
>Assignee: Ramesh Kumar Thangarajan
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
> Attachments: Screenshot 2023-09-12 at 5.03.31 PM.png
>
>
> In query compilation, 
> LoggerFactory.getLogger() seems to take up more time. Some of the serde 
> classes use non static variable for Logger that forces the getLogger() call 
> for each of the class creation.
> Making Logger variable static final will avoid this code path for every serde 
> class construction.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (HIVE-27687) Logger variable should be static final as its creation takes more time in query compilation

2023-11-22 Thread Ramesh Kumar Thangarajan (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-27687?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17788845#comment-17788845
 ] 

Ramesh Kumar Thangarajan commented on HIVE-27687:
-

[~zabetak] Thanks, marked it.

> Logger variable should be static final as its creation takes more time in 
> query compilation
> ---
>
> Key: HIVE-27687
> URL: https://issues.apache.org/jira/browse/HIVE-27687
> Project: Hive
>  Issue Type: Task
>  Components: Hive
>Reporter: Ramesh Kumar Thangarajan
>Assignee: Ramesh Kumar Thangarajan
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
> Attachments: Screenshot 2023-09-12 at 5.03.31 PM.png
>
>
> In query compilation, 
> LoggerFactory.getLogger() seems to take up more time. Some of the serde 
> classes use non static variable for Logger that forces the getLogger() call 
> for each of the class creation.
> Making Logger variable static final will avoid this code path for every serde 
> class construction.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Closed] (HIVE-27687) Logger variable should be static final as its creation takes more time in query compilation

2023-11-22 Thread Ramesh Kumar Thangarajan (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-27687?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ramesh Kumar Thangarajan closed HIVE-27687.
---

> Logger variable should be static final as its creation takes more time in 
> query compilation
> ---
>
> Key: HIVE-27687
> URL: https://issues.apache.org/jira/browse/HIVE-27687
> Project: Hive
>  Issue Type: Task
>  Components: Hive
>Reporter: Ramesh Kumar Thangarajan
>Assignee: Ramesh Kumar Thangarajan
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
> Attachments: Screenshot 2023-09-12 at 5.03.31 PM.png
>
>
> In query compilation, 
> LoggerFactory.getLogger() seems to take up more time. Some of the serde 
> classes use non static variable for Logger that forces the getLogger() call 
> for each of the class creation.
> Making Logger variable static final will avoid this code path for every serde 
> class construction.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (HIVE-27902) Rewrite Update with empty Where clause to IOW

2023-11-22 Thread Denys Kuzmenko (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-27902?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Denys Kuzmenko updated HIVE-27902:
--
Description: 
rewrite 
{code}
update table mytbl set a = a+5 
{code}
with 
{code}insert overwrite table mytbl as select a+5 from mytbl
{code}

  was:
rewrite 
{code}
update table mytbl set a = a+5 
{code}
with 
{code}insert overwrite table mytbl select a+5 from mytbl
{code}


> Rewrite Update with empty Where clause to IOW
> -
>
> Key: HIVE-27902
> URL: https://issues.apache.org/jira/browse/HIVE-27902
> Project: Hive
>  Issue Type: Improvement
>Reporter: Denys Kuzmenko
>Priority: Major
>
> rewrite 
> {code}
> update table mytbl set a = a+5 
> {code}
> with 
> {code}insert overwrite table mytbl as select a+5 from mytbl
> {code}



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (HIVE-27902) Rewrite Update with empty Where clause to IOW

2023-11-22 Thread Denys Kuzmenko (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-27902?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Denys Kuzmenko updated HIVE-27902:
--
Description: 
rewrite 
{code}
update table mytbl set mytbl.a = a+5 
{code}
with 
{code}insert overwrite table mytbl select a+5 from mytbl
{code}

  was:
rewrite 
{code}
update table mytbl set mytbl.a+5 
{code}
with 
{code}insert overwrite table mytbl select a+5 from mytbl
{code}


> Rewrite Update with empty Where clause to IOW
> -
>
> Key: HIVE-27902
> URL: https://issues.apache.org/jira/browse/HIVE-27902
> Project: Hive
>  Issue Type: Improvement
>Reporter: Denys Kuzmenko
>Priority: Major
>
> rewrite 
> {code}
> update table mytbl set mytbl.a = a+5 
> {code}
> with 
> {code}insert overwrite table mytbl select a+5 from mytbl
> {code}



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (HIVE-27902) Rewrite Update with empty Where clause to IOW

2023-11-22 Thread Denys Kuzmenko (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-27902?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Denys Kuzmenko updated HIVE-27902:
--
Description: 
rewrite 
{code}
update table mytbl set a = a+5 
{code}
with 
{code}insert overwrite table mytbl select a+5 from mytbl
{code}

  was:
rewrite 
{code}
update table mytbl set mytbl.a = a+5 
{code}
with 
{code}insert overwrite table mytbl select a+5 from mytbl
{code}


> Rewrite Update with empty Where clause to IOW
> -
>
> Key: HIVE-27902
> URL: https://issues.apache.org/jira/browse/HIVE-27902
> Project: Hive
>  Issue Type: Improvement
>Reporter: Denys Kuzmenko
>Priority: Major
>
> rewrite 
> {code}
> update table mytbl set a = a+5 
> {code}
> with 
> {code}insert overwrite table mytbl select a+5 from mytbl
> {code}



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (HIVE-27902) Rewrite Update with empty Where clause to IOW

2023-11-22 Thread Denys Kuzmenko (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-27902?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Denys Kuzmenko updated HIVE-27902:
--
Description: 
rewrite 
{code}
update table mytbl set mytbl.a+5 
{code}
with 
{code}insert overwrite table mytbl select a+5 from mytbl
{code}

> Rewrite Update with empty Where clause to IOW
> -
>
> Key: HIVE-27902
> URL: https://issues.apache.org/jira/browse/HIVE-27902
> Project: Hive
>  Issue Type: Improvement
>Reporter: Denys Kuzmenko
>Priority: Major
>
> rewrite 
> {code}
> update table mytbl set mytbl.a+5 
> {code}
> with 
> {code}insert overwrite table mytbl select a+5 from mytbl
> {code}



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (HIVE-27687) Logger variable should be static final as its creation takes more time in query compilation

2023-11-22 Thread Ramesh Kumar Thangarajan (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-27687?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ramesh Kumar Thangarajan updated HIVE-27687:

Fix Version/s: 4.0.0

> Logger variable should be static final as its creation takes more time in 
> query compilation
> ---
>
> Key: HIVE-27687
> URL: https://issues.apache.org/jira/browse/HIVE-27687
> Project: Hive
>  Issue Type: Task
>  Components: Hive
>Reporter: Ramesh Kumar Thangarajan
>Assignee: Ramesh Kumar Thangarajan
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
> Attachments: Screenshot 2023-09-12 at 5.03.31 PM.png
>
>
> In query compilation, 
> LoggerFactory.getLogger() seems to take up more time. Some of the serde 
> classes use a non-static variable for the Logger, which forces a getLogger() 
> call for each instance creation.
> Making the Logger variable static final will avoid this code path for every 
> serde class construction.
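The cost pattern the ticket describes can be sketched with a hypothetical stand-in for the logger factory (the class and counter names are illustrative, not Hive's actual code): a non-static field pays the factory lookup once per object, while a static final field pays it once per class load.

```java
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical simulation of LoggerFactory.getLogger(): the counter stands in
// for the lookup cost observed during query compilation.
class LoggerFactorySim {
    static final AtomicInteger lookups = new AtomicInteger();

    static String getLogger(String name) {
        lookups.incrementAndGet();  // one "expensive" lookup
        return name;
    }
}

class NonStaticLogger {
    // anti-pattern: instance field forces a lookup per object created
    final String log = LoggerFactorySim.getLogger("NonStaticLogger");
}

class StaticLogger {
    // fix: one lookup per class, shared by all instances
    static final String LOG = LoggerFactorySim.getLogger("StaticLogger");
}

public class Main {
    public static void main(String[] args) {
        for (int i = 0; i < 1000; i++) new NonStaticLogger();
        int afterNonStatic = LoggerFactorySim.lookups.get();  // 1000 lookups
        for (int i = 0; i < 1000; i++) new StaticLogger();
        int total = LoggerFactorySim.lookups.get();           // 1001 lookups
        System.out.println(afterNonStatic + " " + (total - afterNonStatic));
    }
}
```

Constructing 1000 objects of each class triggers 1000 lookups in the non-static case but only one in the static final case, which is the saving the patch targets.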



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (HIVE-27902) Rewrite Update with empty Where clause to IOW

2023-11-22 Thread Denys Kuzmenko (Jira)
Denys Kuzmenko created HIVE-27902:
-

 Summary: Rewrite Update with empty Where clause to IOW
 Key: HIVE-27902
 URL: https://issues.apache.org/jira/browse/HIVE-27902
 Project: Hive
  Issue Type: Improvement
Reporter: Denys Kuzmenko






--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Reopened] (HIVE-27687) Logger variable should be static final as its creation takes more time in query compilation

2023-11-22 Thread Ramesh Kumar Thangarajan (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-27687?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ramesh Kumar Thangarajan reopened HIVE-27687:
-

> Logger variable should be static final as its creation takes more time in 
> query compilation
> ---
>
> Key: HIVE-27687
> URL: https://issues.apache.org/jira/browse/HIVE-27687
> Project: Hive
>  Issue Type: Task
>  Components: Hive
>Reporter: Ramesh Kumar Thangarajan
>Assignee: Ramesh Kumar Thangarajan
>Priority: Major
>  Labels: pull-request-available
> Attachments: Screenshot 2023-09-12 at 5.03.31 PM.png
>
>
> In query compilation, 
> LoggerFactory.getLogger() seems to take up more time. Some of the serde 
> classes use a non-static variable for the Logger, which forces a getLogger() 
> call for each instance creation.
> Making the Logger variable static final will avoid this code path for every 
> serde class construction.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Resolved] (HIVE-27885) Cast decimal from string with space without digits before dot returns NULL

2023-11-22 Thread Naveen Gangam (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-27885?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Naveen Gangam resolved HIVE-27885.
--
Fix Version/s: 4.0.0
   Resolution: Fixed

Fix has been merged to master. Thank you [~nareshpr] for the patch.

> Cast decimal from string with space without digits before dot returns NULL
> --
>
> Key: HIVE-27885
> URL: https://issues.apache.org/jira/browse/HIVE-27885
> Project: Hive
>  Issue Type: Bug
>Reporter: Naresh P R
>Assignee: Naresh P R
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>
> eg.,
> select cast(". " as decimal(8,4))
> {code:java}
> – Expected output
> 0.
> – Actual output
> NULL
> {code}
> select cast("0. " as decimal(8,4))
> {code:java}
> – Actual output
> 0.
> {code}
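The trimming behavior the ticket expects can be illustrated outside Hive with java.math.BigDecimal. This is a sketch, not Hive's GenericUDF code path, and it assumes the ticket's inputs are the literal strings ". " and "0. " (the digits in the quoted example appear truncated).

```java
import java.math.BigDecimal;

public class Main {
    // Hypothetical helper, not Hive's implementation: trim padding spaces,
    // then supply the zeros so a bare "." parses instead of returning NULL.
    static BigDecimal castToDecimal(String s, int scale) {
        String t = s.trim();
        if (t.startsWith(".")) t = "0" + t;  // ". "  -> "0."
        if (t.endsWith("."))   t = t + "0";  // "0." -> "0.0"
        try {
            return new BigDecimal(t).setScale(scale);
        } catch (NumberFormatException e) {
            return null;  // mirrors Hive returning NULL on unparseable input
        }
    }

    public static void main(String[] args) {
        System.out.println(castToDecimal(". ", 4));   // 0.0000
        System.out.println(castToDecimal("0. ", 4));  // 0.0000
    }
}
```

Once the trailing space is trimmed and a leading/trailing zero supplied, both inputs parse to the same decimal(8,4) value, which is the consistency the fix restores.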



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (HIVE-27900) hive can not read iceberg-parquet table

2023-11-22 Thread yongzhi.shao (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-27900?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

yongzhi.shao updated HIVE-27900:

Description: 
We found that, using the HIVE4-BETA version, we could not query an Iceberg-Parquet 
table with vectorised execution turned on.
{code:java}
CREATE EXTERNAL TABLE iceberg_dwd.b_qqd_shop_rfm_parquet_snappy
STORED BY 'org.apache.iceberg.mr.hive.HiveIcebergStorageHandler'
LOCATION 
'hdfs://xxx/iceberg-catalog/warehouse/test/b_qqd_shop_rfm_parquet_snappy/'
TBLPROPERTIES 
('iceberg.catalog'='location_based_table','engine.hive.enabled'='true');


set hive.default.fileformat=orc;
set hive.default.fileformat.managed=orc;
create table test_parquet_as_orc as select * from b_qqd_shop_rfm_parquet_snappy 
limit 100;






, TaskAttempt 2 failed, info=[Error: Node: /xxx..xx.xx: Error while 
running task ( failure ) : 
attempt_1696729618575_69586_1_00_00_2:java.lang.RuntimeException: 
java.lang.RuntimeException: org.apache.hadoop.hive.ql.metadata.HiveException: 
Hive Runtime Error while processing row
at 
org.apache.hadoop.hive.ql.exec.tez.TezProcessor.initializeAndRunProcessor(TezProcessor.java:348)
at org.apache.hadoop.hive.ql.exec.tez.TezProcessor.run(TezProcessor.java:276)
at 
org.apache.tez.runtime.LogicalIOProcessorRuntimeTask.run(LogicalIOProcessorRuntimeTask.java:381)
at 
org.apache.tez.runtime.task.TaskRunner2Callable$1.run(TaskRunner2Callable.java:82)
at 
org.apache.tez.runtime.task.TaskRunner2Callable$1.run(TaskRunner2Callable.java:69)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:422)
at 
org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1899)
at 
org.apache.tez.runtime.task.TaskRunner2Callable.callInternal(TaskRunner2Callable.java:69)
at 
org.apache.tez.runtime.task.TaskRunner2Callable.callInternal(TaskRunner2Callable.java:39)
at org.apache.tez.common.CallableWithNdc.call(CallableWithNdc.java:36)
at 
com.google.common.util.concurrent.TrustedListenableFutureTask$TrustedFutureInterruptibleTask.runInterruptibly(TrustedListenableFutureTask.java:131)
at 
com.google.common.util.concurrent.InterruptibleTask.run(InterruptibleTask.java:76)
at 
com.google.common.util.concurrent.TrustedListenableFutureTask.run(TrustedListenableFutureTask.java:82)
at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:750)
Caused by: java.lang.RuntimeException: 
org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime Error while 
processing row
at 
org.apache.hadoop.hive.ql.exec.tez.MapRecordSource.processRow(MapRecordSource.java:110)
at 
org.apache.hadoop.hive.ql.exec.tez.MapRecordSource.pushRecord(MapRecordSource.java:83)
at 
org.apache.hadoop.hive.ql.exec.tez.MapRecordProcessor.run(MapRecordProcessor.java:414)
at 
org.apache.hadoop.hive.ql.exec.tez.TezProcessor.initializeAndRunProcessor(TezProcessor.java:293)
... 16 more
Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime Error 
while processing row
at 
org.apache.hadoop.hive.ql.exec.vector.VectorMapOperator.process(VectorMapOperator.java:993)
at 
org.apache.hadoop.hive.ql.exec.tez.MapRecordSource.processRow(MapRecordSource.java:101)
... 19 more
Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: 
java.lang.NullPointerException
at 
org.apache.hadoop.hive.ql.exec.vector.reducesink.VectorReduceSinkEmptyKeyOperator.process(VectorReduceSinkEmptyKeyOperator.java:137)
at org.apache.hadoop.hive.ql.exec.Operator.vectorForward(Operator.java:919)
at 
org.apache.hadoop.hive.ql.exec.vector.VectorSelectOperator.process(VectorSelectOperator.java:158)
at org.apache.hadoop.hive.ql.exec.Operator.vectorForward(Operator.java:919)
at 
org.apache.hadoop.hive.ql.exec.vector.VectorLimitOperator.process(VectorLimitOperator.java:108)
at org.apache.hadoop.hive.ql.exec.Operator.vectorForward(Operator.java:919)
at 
org.apache.hadoop.hive.ql.exec.TableScanOperator.process(TableScanOperator.java:171)
at 
org.apache.hadoop.hive.ql.exec.vector.VectorMapOperator.deliverVectorizedRowBatch(VectorMapOperator.java:809)
at 
org.apache.hadoop.hive.ql.exec.vector.VectorMapOperator.process(VectorMapOperator.java:878)
... 20 more
Caused by: java.lang.NullPointerException
at 
org.apache.hadoop.hive.common.io.NonSyncByteArrayOutputStream.write(NonSyncByteArrayOutputStream.java:110)
at 
org.apache.hadoop.hive.serde2.lazybinary.fast.LazyBinarySerializeWrite.writeString(LazyBinarySerializeWrite.java:280)
at 
org.apache.hadoop.hive.ql.exec.vector.VectorSerializeRow$VectorSerializeStringWriter.serialize(VectorSerializeRow.java:532)
at 
org.apache.hadoop.hive.ql.exec.vector.VectorSerializeRow.serializeWrite(VectorSerializeRow.java:316)
at 
org.apache.hadoop.hive.ql.exec.vector.VectorSerializeRow.serializeWrite(VectorSerializeRow.java:297)

[jira] [Updated] (HIVE-27512) CalciteSemanticException.UnsupportedFeature enum to capital

2023-11-22 Thread Mahesh Raju Somalaraju (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-27512?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mahesh Raju Somalaraju updated HIVE-27512:
--
Status: Patch Available  (was: In Progress)

> CalciteSemanticException.UnsupportedFeature enum to capital
> ---
>
> Key: HIVE-27512
> URL: https://issues.apache.org/jira/browse/HIVE-27512
> Project: Hive
>  Issue Type: Improvement
>Reporter: László Bodor
>Assignee: Mahesh Raju Somalaraju
>Priority: Major
>  Labels: newbie, pull-request-available
>
> https://github.com/apache/hive/blob/3bc62cbc2d42c22dfd55f78ad7b41ec84a71380f/ql/src/java/org/apache/hadoop/hive/ql/optimizer/calcite/CalciteSemanticException.java#L32-L39
> {code}
>   public enum UnsupportedFeature {
> Distinct_without_an_aggreggation, Duplicates_in_RR, 
> Filter_expression_with_non_boolean_return_type,
> Having_clause_without_any_groupby, Invalid_column_reference, 
> Invalid_decimal,
> Less_than_equal_greater_than, Others, Same_name_in_multiple_expressions,
> Schema_less_table, Select_alias_in_having_clause, Select_transform, 
> Subquery,
> Table_sample_clauses, UDTF, Union_type, Unique_join,
> HighPrecissionTimestamp // CALCITE-1690
>   };
> {code}
> this just hurts my eyes, I expect it as DISTINCT_WITHOUT_AN_AGGREGATION ...
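The requested convention would look like the following sketch (a hand-written excerpt, not the merged patch; it also assumes the "aggreggation"/"Precission" spellings get corrected along with the casing):

```java
public class Main {
    // Conventional SCREAMING_SNAKE_CASE constants for the enum the ticket cites.
    enum UnsupportedFeature {
        DISTINCT_WITHOUT_AN_AGGREGATION,
        DUPLICATES_IN_RR,
        FILTER_EXPRESSION_WITH_NON_BOOLEAN_RETURN_TYPE,
        HIGH_PRECISION_TIMESTAMP  // CALCITE-1690
    }

    public static void main(String[] args) {
        // Enum#name() yields the constant exactly as declared.
        System.out.println(UnsupportedFeature.DISTINCT_WITHOUT_AN_AGGREGATION.name());
    }
}
```

Note that renaming enum constants changes the strings produced by name()/valueOf(), so any callers or persisted values relying on the old names would need updating in the same patch.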



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (HIVE-27512) CalciteSemanticException.UnsupportedFeature enum to capital

2023-11-22 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-27512?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated HIVE-27512:
--
Labels: newbie pull-request-available  (was: newbie)

> CalciteSemanticException.UnsupportedFeature enum to capital
> ---
>
> Key: HIVE-27512
> URL: https://issues.apache.org/jira/browse/HIVE-27512
> Project: Hive
>  Issue Type: Improvement
>Reporter: László Bodor
>Assignee: Mahesh Raju Somalaraju
>Priority: Major
>  Labels: newbie, pull-request-available
>
> https://github.com/apache/hive/blob/3bc62cbc2d42c22dfd55f78ad7b41ec84a71380f/ql/src/java/org/apache/hadoop/hive/ql/optimizer/calcite/CalciteSemanticException.java#L32-L39
> {code}
>   public enum UnsupportedFeature {
> Distinct_without_an_aggreggation, Duplicates_in_RR, 
> Filter_expression_with_non_boolean_return_type,
> Having_clause_without_any_groupby, Invalid_column_reference, 
> Invalid_decimal,
> Less_than_equal_greater_than, Others, Same_name_in_multiple_expressions,
> Schema_less_table, Select_alias_in_having_clause, Select_transform, 
> Subquery,
> Table_sample_clauses, UDTF, Union_type, Unique_join,
> HighPrecissionTimestamp // CALCITE-1690
>   };
> {code}
> this just hurts my eyes, I expect it as DISTINCT_WITHOUT_AN_AGGREGATION ...



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Resolved] (HIVE-26618) Add setting to turn on/off removing sections of a query plan known never produces rows

2023-11-22 Thread Krisztian Kasa (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-26618?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Krisztian Kasa resolved HIVE-26618.
---
Resolution: Won't Fix

> Add setting to turn on/off removing sections of a query plan known never 
> produces rows
> --
>
> Key: HIVE-26618
> URL: https://issues.apache.org/jira/browse/HIVE-26618
> Project: Hive
>  Issue Type: Improvement
>  Components: CBO
>Reporter: Krisztian Kasa
>Assignee: Krisztian Kasa
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 50m
>  Remaining Estimate: 0h
>
> HIVE-26524 introduced an optimization to remove sections of a query plan that 
> are known to never produce rows.
> Add a setting to hive conf to turn this optimization on/off.
> When the optimization is turned off, restore the legacy behavior:
> * represent the empty result operator with {{HiveSortLimit}} 0
> * disable {{HiveRemoveEmptySingleRules}}



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (HIVE-27899) Killed speculative execution task attempt should not commit file

2023-11-22 Thread Chenyu Zheng (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-27899?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chenyu Zheng updated HIVE-27899:

Description: 
As I mentioned in HIVE-25561, when tez turns on speculative execution, the data 
file produced by hive may be duplicated. I mentioned in HIVE-25561 that if the 
speculatively executed task is killed, some data may be submitted unexpectedly. 
However, after HIVE-25561, there is still a situation that has not been solved. 
If two task attempts commit file at the same time, the problem of duplicate 
data files may also occur. Although the probability of this happening is very, 
very low, it does happen.

 

Why?
There are two key steps:
(1)FileSinkOperator::closeOp
TezProcessor::initializeAndRunProcessor --> ... --> FileSinkOperator::closeOp 
--> fsp.commit
When the OP is closed, the process of closing the OP will be triggered, and 
eventually the call to fsp.commit will be triggered.

(2)removeTempOrDuplicateFiles
(2.a)Firstly, listStatus the files in the temporary directory. 
(2.b)Secondly check whether there are multiple incorrect commit, and finally 
move the correct results to the final directory.

When speculative execution is enabled, when one attempt of a Task is completed, 
other attempts will be killed. However, AM only sends the kill event and does 
not ensure that all cleanup actions are completed, that is, 
closeOp may be executed between 2.a and 2.b. Therefore, 
removeTempOrDuplicateFiles will not delete the file generated by the kill 
attempt.

How?
The problem is that both speculatively executed tasks commit the file. This 
will not happen in the Tez examples because they will try canCommit, which can 
guarantee that one and only one task attempt commit successfully. If one task 
attempt executes canCommit successfully, the other one will be stuck by 
canCommit until it receives a kill signal.
detail see: 
[https://github.com/apache/tez/blob/51d6f53967110e2b91b6d90b46f8e16bdc062091/tez-mapreduce/src/main/java/org/apache/tez/mapreduce/processor/SimpleMRProcessor.java#L70]

  was:
As I mentioned in HIVE-25561, when tez turns on speculative execution, the data 
file produced by hive may be duplicated. I mentioned in HIVE-25561 that if the 
speculatively executed task is killed, some data may be submitted unexpectedly. 
However, after HIVE-25561, there is still a situation that has not been solved. 
If two task attempts commit file at the same time, the problem of duplicate 
data files may also occur. Although the probability of this happening is very, 
very low, it does happen.

 

Why?
There are two key steps:
(1)FileSinkOperator::closeOp
TezProcessor::initializeAndRunProcessor --> ... --> FileSinkOperator::closeOp 
--> fsp.commit
When the OP is closed, the process of closing the OP will be triggered, and 
eventually the call to fsp.commit will be triggered.

(2)removeTempOrDuplicateFiles
(2.a)Firstly, listStatus the files in the temporary directory. 
(2.b)Secondly check whether there are multiple incorrect commit, and finally 
move the correct results to the final directory.

When speculative execution is enabled, when one attempt of a Task is completed, 
other attempts will be killed. However, AM only sends the kill event and does 
not ensure that all cleanup actions are completed, that is, 
closeOp may be executed between 2.a and 2.b. Therefore, 
removeTempOrDuplicateFiles will not delete the file generated by the kill 
attempt.


How?
The problem is that both speculatively executed tasks commit the file. This 
will not happen in the Tez examples because they will try canCommit, which can 
guarantee that one and only one task attempt commit successfully. If one task 
attempt executes canCommit successfully, the other one will be stuck by 
canCommit until it receives a kill signal.
detail see: 
https://github.com/apache/tez/blob/51d6f53967110e2b91b6d90b46f8e16bdc062091/tez-mapreduce/src/main/java/org/apache/tez/mapreduce/processor/SimpleMRProcessor.java#L70


> Killed speculative execution task attempt should not commit file
> 
>
> Key: HIVE-27899
> URL: https://issues.apache.org/jira/browse/HIVE-27899
> Project: Hive
>  Issue Type: Bug
>  Components: Tez
>Reporter: Chenyu Zheng
>Assignee: Chenyu Zheng
>Priority: Major
> Attachments: reproduce_bug.md
>
>
> As I mentioned in HIVE-25561, when tez turns on speculative execution, the 
> data file produced by hive may be duplicated. I mentioned in HIVE-25561 that 
> if the speculatively executed task is killed, some data may be submitted 
> unexpectedly. However, after HIVE-25561, there is still a situation that has 
> not been solved. If two task attempts commit file at the same time, the 
> problem of duplicate data files may also occur. Although the probability of 
> this

[jira] [Commented] (HIVE-27899) Killed speculative execution task attempt should not commit file

2023-11-22 Thread Chenyu Zheng (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-27899?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17788725#comment-17788725
 ] 

Chenyu Zheng commented on HIVE-27899:
-

The probability of this bug recurring is very, very low, so special code must 
be added to make it reproduce reliably; only then can the correctness of the 
fix be verified. I added a sleep to the relevant code to simulate the stall, 
which greatly increases the probability of reproducing the problem. I've added 
the relevant details in the attachment `reproduce_bug.md`.

> Killed speculative execution task attempt should not commit file
> 
>
> Key: HIVE-27899
> URL: https://issues.apache.org/jira/browse/HIVE-27899
> Project: Hive
>  Issue Type: Bug
>  Components: Tez
>Reporter: Chenyu Zheng
>Assignee: Chenyu Zheng
>Priority: Major
> Attachments: reproduce_bug.md
>
>
> As I mentioned in HIVE-25561, when tez turns on speculative execution, the 
> data file produced by hive may be duplicated. I mentioned in HIVE-25561 that 
> if the speculatively executed task is killed, some data may be submitted 
> unexpectedly. However, after HIVE-25561, there is still a situation that has 
> not been solved. If two task attempts commit file at the same time, the 
> problem of duplicate data files may also occur. Although the probability of 
> this happening is very, very low, it does happen.
>  
> Why?
> There are two key steps:
> (1)FileSinkOperator::closeOp
> TezProcessor::initializeAndRunProcessor --> ... --> FileSinkOperator::closeOp 
> --> fsp.commit
> When the OP is closed, the process of closing the OP will be triggered, and 
> eventually the call to fsp.commit will be triggered.
> (2)removeTempOrDuplicateFiles
> (2.a)Firstly, listStatus the files in the temporary directory. 
> (2.b)Secondly check whether there are multiple incorrect commit, and finally 
> move the correct results to the final directory.
> When speculative execution is enabled, when one attempt of a Task is 
> completed, other attempts will be killed. However, AM only sends the kill 
> event and does not ensure that all cleanup actions are completed, that is, 
> closeOp may be executed between 2.a and 2.b. Therefore, 
> removeTempOrDuplicateFiles will not delete the file generated by the kill 
> attempt.
> How?
> The problem is that both speculatively executed tasks commit the file. This 
> will not happen in the Tez examples because they will try canCommit, which 
> can guarantee that one and only one task attempt commit successfully. If one 
> task attempt executes canCommit successfully, the other one will be stuck by 
> canCommit until it receives a kill signal.
> detail see: 
> https://github.com/apache/tez/blob/51d6f53967110e2b91b6d90b46f8e16bdc062091/tez-mapreduce/src/main/java/org/apache/tez/mapreduce/processor/SimpleMRProcessor.java#L70
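The one-and-only-one-commit guarantee that canCommit provides can be sketched as a compare-and-set shared by all attempts of a task. This is a minimal illustration of the idea, not Tez's actual AM-side protocol or API.

```java
import java.util.concurrent.atomic.AtomicBoolean;

// Hypothetical gate shared by all attempts of one task: exactly one attempt's
// canCommit() returns true, so only that attempt moves its file to the final
// directory; the losing speculative attempt blocks or aborts instead.
class TaskCommitGate {
    private final AtomicBoolean committed = new AtomicBoolean(false);

    boolean canCommit() {
        // compareAndSet succeeds for exactly one caller, even under races
        return committed.compareAndSet(false, true);
    }
}

public class Main {
    public static void main(String[] args) {
        TaskCommitGate gate = new TaskCommitGate();
        boolean attempt0 = gate.canCommit();  // true: wins the race
        boolean attempt1 = gate.canCommit();  // false: must not commit
        System.out.println(attempt0 + " " + attempt1);
    }
}
```

Without such a gate (as in the bug described above), both attempts can reach fsp.commit between steps 2.a and 2.b and leave duplicate files behind.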



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (HIVE-27899) Killed speculative execution task attempt should not commit file

2023-11-22 Thread Chenyu Zheng (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-27899?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chenyu Zheng updated HIVE-27899:

Attachment: reproduce_bug.md

> Killed speculative execution task attempt should not commit file
> 
>
> Key: HIVE-27899
> URL: https://issues.apache.org/jira/browse/HIVE-27899
> Project: Hive
>  Issue Type: Bug
>  Components: Tez
>Reporter: Chenyu Zheng
>Assignee: Chenyu Zheng
>Priority: Major
> Attachments: reproduce_bug.md
>
>
> As I mentioned in HIVE-25561, when tez turns on speculative execution, the 
> data file produced by hive may be duplicated. I mentioned in HIVE-25561 that 
> if the speculatively executed task is killed, some data may be submitted 
> unexpectedly. However, after HIVE-25561, there is still a situation that has 
> not been solved. If two task attempts commit file at the same time, the 
> problem of duplicate data files may also occur. Although the probability of 
> this happening is very, very low, it does happen.
>  
> Why?
> There are two key steps:
> (1)FileSinkOperator::closeOp
> TezProcessor::initializeAndRunProcessor --> ... --> FileSinkOperator::closeOp 
> --> fsp.commit
> When the OP is closed, the process of closing the OP will be triggered, and 
> eventually the call to fsp.commit will be triggered.
> (2)removeTempOrDuplicateFiles
> (2.a)Firstly, listStatus the files in the temporary directory. 
> (2.b)Secondly check whether there are multiple incorrect commit, and finally 
> move the correct results to the final directory.
> When speculative execution is enabled, when one attempt of a Task is 
> completed, other attempts will be killed. However, AM only sends the kill 
> event and does not ensure that all cleanup actions are completed, that is, 
> closeOp may be executed between 2.a and 2.b. Therefore, 
> removeTempOrDuplicateFiles will not delete the file generated by the kill 
> attempt.
> How?
> The problem is that both speculatively executed tasks commit the file. This 
> will not happen in the Tez examples because they will try canCommit, which 
> can guarantee that one and only one task attempt commit successfully. If one 
> task attempt executes canCommit successfully, the other one will be stuck by 
> canCommit until it receives a kill signal.
> detail see: 
> https://github.com/apache/tez/blob/51d6f53967110e2b91b6d90b46f8e16bdc062091/tez-mapreduce/src/main/java/org/apache/tez/mapreduce/processor/SimpleMRProcessor.java#L70



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (HIVE-27892) Hive "insert overwrite table" for multiple partition table issue

2023-11-22 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-27892?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated HIVE-27892:
--
Labels: pull-request-available  (was: )

> Hive "insert overwrite table" for multiple partition table issue
> 
>
> Key: HIVE-27892
> URL: https://issues.apache.org/jira/browse/HIVE-27892
> Project: Hive
>  Issue Type: Bug
>Reporter: Mayank Kunwar
>Assignee: Mayank Kunwar
>Priority: Major
>  Labels: pull-request-available
>
> Authorization is not working for Hive "insert overwrite table" on a 
> multiple-partition table.
> Steps to reproduce the issue:
> 1) CREATE EXTERNAL TABLE Part (eid int, name int)
> PARTITIONED BY (position int, dept int);
> 2) SET hive.exec.dynamic.partition.mode=nonstrict;
> 3) INSERT INTO TABLE PART PARTITION (position,DEPT)
> SELECT 1,1,1,1;
> 4) select * from part;
> Create a test user test123, and grant test123 only Select permission on db 
> default, table Part, and column *.
> 1) insert overwrite table part partition(position=2,DEPT=2) select 2,2;
> This fails as expected.
> 2) insert overwrite table part partition(position,DEPT) select 2,2,2,2;
> This fails as expected.
> 3) insert overwrite table part partition(position=2,DEPT) select 2,2,2;
> But this succeeds with no audit in Ranger, which means no authorization 
> happened when this query was executed.
>  



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (HIVE-27899) Killed speculative execution task attempt should not commit file

2023-11-22 Thread Chenyu Zheng (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-27899?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chenyu Zheng updated HIVE-27899:

Description: 
As I mentioned in HIVE-25561, when tez turns on speculative execution, the data 
file produced by hive may be duplicated. I mentioned in HIVE-25561 that if the 
speculatively executed task is killed, some data may be submitted unexpectedly. 
However, after HIVE-25561, there is still a situation that has not been solved. 
If two task attempts commit file at the same time, the problem of duplicate 
data files may also occur. Although the probability of this happening is very, 
very low, it does happen.

 

Why?
There are two key steps:
(1)FileSinkOperator::closeOp
TezProcessor::initializeAndRunProcessor --> ... --> FileSinkOperator::closeOp 
--> fsp.commit
When the OP is closed, the process of closing the OP will be triggered, and 
eventually the call to fsp.commit will be triggered.

(2)removeTempOrDuplicateFiles
(2.a)Firstly, listStatus the files in the temporary directory. 
(2.b)Secondly check whether there are multiple incorrect commit, and finally 
move the correct results to the final directory.

When speculative execution is enabled, when one attempt of a Task is completed, 
other attempts will be killed. However, AM only sends the kill event and does 
not ensure that all cleanup actions are completed, that is, 
closeOp may be executed between 2.a and 2.b. Therefore, 
removeTempOrDuplicateFiles will not delete the file generated by the kill 
attempt.


How?
The problem is that both speculatively executed tasks commit the file. This 
will not happen in the Tez examples because they will try canCommit, which can 
guarantee that one and only one task attempt commit successfully. If one task 
attempt executes canCommit successfully, the other one will be stuck by 
canCommit until it receives a kill signal.
detail see: 
https://github.com/apache/tez/blob/51d6f53967110e2b91b6d90b46f8e16bdc062091/tez-mapreduce/src/main/java/org/apache/tez/mapreduce/processor/SimpleMRProcessor.java#L70

  was:As I mentioned in HIVE-25561, when tez turns on speculative execution, 
the data file produced by hive may be duplicated. I mentioned in HIVE-25561 
that if the speculatively executed task is killed, some data may be submitted 
unexpectedly. However, after HIVE-25561, there is still a situation that has 
not been solved. If two task attempts commit file at the same time, the problem 
of duplicate data files may also occur. Although the probability of this 
happening is very, very low, it does happen.


> Killed speculative execution task attempt should not commit file
> 
>
> Key: HIVE-27899
> URL: https://issues.apache.org/jira/browse/HIVE-27899
> Project: Hive
>  Issue Type: Bug
>  Components: Tez
>Reporter: Chenyu Zheng
>Assignee: Chenyu Zheng
>Priority: Major
>
> As I mentioned in HIVE-25561, when tez turns on speculative execution, the 
> data file produced by hive may be duplicated. I mentioned in HIVE-25561 that 
> if the speculatively executed task is killed, some data may be submitted 
> unexpectedly. However, after HIVE-25561, there is still a situation that has 
> not been solved. If two task attempts commit file at the same time, the 
> problem of duplicate data files may also occur. Although the probability of 
> this happening is very, very low, it does happen.
>  
> Why?
> There are two key steps:
> (1)FileSinkOperator::closeOp
> TezProcessor::initializeAndRunProcessor --> ... --> FileSinkOperator::closeOp 
> --> fsp.commit
> When the OP is closed, the process of closing the OP will be triggered, and 
> eventually the call to fsp.commit will be triggered.
> (2)removeTempOrDuplicateFiles
> (2.a)Firstly, listStatus the files in the temporary directory. 
> (2.b)Secondly check whether there are multiple incorrect commit, and finally 
> move the correct results to the final directory.
> When speculative execution is enabled, when one attempt of a Task is 
> completed, other attempts will be killed. However, AM only sends the kill 
> event and does not ensure that all cleanup actions are completed, that is, 
> closeOp may be executed between 2.a and 2.b. Therefore, 
> removeTempOrDuplicateFiles will not delete the file generated by the kill 
> attempt.
> How?
> The problem is that both speculatively executed tasks commit the file. This 
> will not happen in the Tez examples because they will try canCommit, which 
> can guarantee that one and only one task attempt commit successfully. If one 
> task attempt executes canCommit successfully, the other one will be stuck by 
> canCommit until it receives a kill signal.
> detail see: 
> https://github.com/apache/tez/blob/51d6f53967110e2b91b6d90b46f8e16bdc062091/tez-mapreduce/src/main/java/org/apache/tez

[jira] [Updated] (HIVE-27899) Killed speculative execution task attempt should not commit file

2023-11-22 Thread Chenyu Zheng (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-27899?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chenyu Zheng updated HIVE-27899:

Description: As I mentioned in HIVE-25561, when tez turns on speculative 
execution, the data file produced by hive may be duplicated. I mentioned in 
HIVE-25561 that if the speculatively executed task is killed, some data may be 
submitted unexpectedly. However, after HIVE-25561, there is still a situation 
that has not been solved. If two task attempts commit file at the same time, 
the problem of duplicate data files may also occur. Although the probability of 
this happening is very, very low, it does happen.

> Killed speculative execution task attempt should not commit file
> 
>
> Key: HIVE-27899
> URL: https://issues.apache.org/jira/browse/HIVE-27899
> Project: Hive
>  Issue Type: Bug
>  Components: Tez
>Reporter: Chenyu Zheng
>Assignee: Chenyu Zheng
>Priority: Major
>
> As I mentioned in HIVE-25561, when tez turns on speculative execution, the 
> data file produced by hive may be duplicated. I mentioned in HIVE-25561 that 
> if the speculatively executed task is killed, some data may be submitted 
> unexpectedly. However, after HIVE-25561, there is still a situation that has 
> not been solved. If two task attempts commit file at the same time, the 
> problem of duplicate data files may also occur. Although the probability of 
> this happening is very, very low, it does happen.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (HIVE-27899) Killed speculative execution task attempt should not commit file

2023-11-22 Thread Chenyu Zheng (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-27899?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chenyu Zheng updated HIVE-27899:

Summary: Killed speculative execution task attempt should not commit file  
(was: Speculative execution task which will be killed should not commit file)

> Killed speculative execution task attempt should not commit file
> 
>
> Key: HIVE-27899
> URL: https://issues.apache.org/jira/browse/HIVE-27899
> Project: Hive
>  Issue Type: Bug
>  Components: Tez
>Reporter: Chenyu Zheng
>Assignee: Chenyu Zheng
>Priority: Major
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (HIVE-27901) Hive's performance for querying the Iceberg table is very poor.

2023-11-22 Thread yongzhi.shao (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-27901?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

yongzhi.shao updated HIVE-27901:

Description: 
I am using HIVE4.0.0-BETA for testing.

BTW, I found that the performance of HIVE reading an ICEBERG table is still very 
slow.

How should I deal with this problem?

I ran a count over a 7-billion-row table and compared the performance of HIVE 
reading the ICEBERG-ORC and ORC versions of the table.

We use ICEBERG 1.4.2; the ICEBERG-ORC table has ZSTD compression enabled.

The ORC table uses SNAPPY compression.

HADOOP version 3.1.1 (native zstd not supported).

!image-2023-11-22-18-32-28-344.png!

!image-2023-11-22-18-33-01-885.png!

Also, I have another question. The Submit Plan statistic is clearly incorrect. 
Is this something that needs to be fixed?

!image-2023-11-22-18-33-32-915.png!

 

  was:
I am using HIVE-4.0.0-BETA for testing.

BTW,I found that the performance of HIVE reading ICEBERG table is still very 
slow.

How should I deal with this problem?

I count a 7 billion table and compare the performance difference between HIVE 
reading ICEBERG-ORC and ORC table respectively.

We use ICEBERG 1.4.2, ICEBERG-ORC with ZSTD compression enabled.

ORC with SNAPPY compression.

HADOOP version 3.1.1 (native zstd not supported).

!image-2023-11-22-18-32-28-344.png!

!image-2023-11-22-18-33-01-885.png!

Also, I have another question. The Submit Plan statistic is clearly incorrect. 
Is this something that needs to be fixed?

!image-2023-11-22-18-33-32-915.png!

 


> Hive's performance for querying the Iceberg table is very poor.
> ---
>
> Key: HIVE-27901
> URL: https://issues.apache.org/jira/browse/HIVE-27901
> Project: Hive
>  Issue Type: Bug
>  Components: Iceberg integration
>Affects Versions: 4.0.0-beta-1
>Reporter: yongzhi.shao
>Priority: Major
> Attachments: image-2023-11-22-18-32-28-344.png, 
> image-2023-11-22-18-33-01-885.png, image-2023-11-22-18-33-32-915.png
>
>
> I am using HIVE4.0.0-BETA for testing.
> I found that the performance of Hive reading an Iceberg table is still very 
> slow.
> How should I deal with this problem?
> I ran a count over a table of 7 billion rows and compared the performance of 
> Hive reading the Iceberg-ORC table and the plain ORC table.
> We use Iceberg 1.4.2; the Iceberg-ORC table has ZSTD compression enabled.
> The ORC table uses SNAPPY compression.
> Hadoop version is 3.1.1 (native zstd not supported).
> !image-2023-11-22-18-32-28-344.png!
> !image-2023-11-22-18-33-01-885.png!
> Also, I have another question: the Submit Plan statistic is clearly 
> incorrect. Is this something that needs to be fixed?
> !image-2023-11-22-18-33-32-915.png!
>  





[jira] [Created] (HIVE-27901) Hive's performance for querying the Iceberg table is very poor.

2023-11-22 Thread yongzhi.shao (Jira)
yongzhi.shao created HIVE-27901:
---

 Summary: Hive's performance for querying the Iceberg table is very 
poor.
 Key: HIVE-27901
 URL: https://issues.apache.org/jira/browse/HIVE-27901
 Project: Hive
  Issue Type: Bug
  Components: Iceberg integration
Affects Versions: 4.0.0-beta-1
Reporter: yongzhi.shao
 Attachments: image-2023-11-22-18-32-28-344.png, 
image-2023-11-22-18-33-01-885.png, image-2023-11-22-18-33-32-915.png

I am using HIVE-4.0.0-BETA for testing.

I found that the performance of Hive reading an Iceberg table is still very 
slow.

How should I deal with this problem?

I ran a count over a table of 7 billion rows and compared the performance of 
Hive reading the Iceberg-ORC table and the plain ORC table.

We use Iceberg 1.4.2; the Iceberg-ORC table has ZSTD compression enabled.

The ORC table uses SNAPPY compression.

Hadoop version is 3.1.1 (native zstd not supported).

!image-2023-11-22-18-32-28-344.png!

!image-2023-11-22-18-33-01-885.png!

Also, I have another question: the Submit Plan statistic is clearly incorrect. 
Is this something that needs to be fixed?

!image-2023-11-22-18-33-32-915.png!

 





[jira] [Created] (HIVE-27900) hive can not read iceberg-parquet table

2023-11-22 Thread yongzhi.shao (Jira)
yongzhi.shao created HIVE-27900:
---

 Summary: hive can not read iceberg-parquet table
 Key: HIVE-27900
 URL: https://issues.apache.org/jira/browse/HIVE-27900
 Project: Hive
  Issue Type: Bug
  Components: Iceberg integration
Affects Versions: 4.0.0-beta-1
Reporter: yongzhi.shao


We found that, using the HIVE4-BETA version, we could not query an 
Iceberg-Parquet table with vectorized execution turned on.
{code:java}
CREATE EXTERNAL TABLE iceberg_dwd.b_qqd_shop_rfm_parquet_snappy
STORED BY 'org.apache.iceberg.mr.hive.HiveIcebergStorageHandler'
LOCATION 
'hdfs://xxx/iceberg-catalog/warehouse/test/b_qqd_shop_rfm_parquet_snappy/'
TBLPROPERTIES 
('iceberg.catalog'='location_based_table','engine.hive.enabled'='true');


set hive.default.fileformat=orc;
set hive.default.fileformat.managed=orc;
create table test_parquet_as_orc as select * from b_qqd_shop_rfm_parquet_snappy 
limit 100;






, TaskAttempt 2 failed, info=[Error: Node: /xxx..xx.xx: Error while running task ( failure ) : attempt_1696729618575_69586_1_00_00_2:java.lang.RuntimeException: java.lang.RuntimeException: org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime Error while processing row
at org.apache.hadoop.hive.ql.exec.tez.TezProcessor.initializeAndRunProcessor(TezProcessor.java:348)
at org.apache.hadoop.hive.ql.exec.tez.TezProcessor.run(TezProcessor.java:276)
at org.apache.tez.runtime.LogicalIOProcessorRuntimeTask.run(LogicalIOProcessorRuntimeTask.java:381)
at org.apache.tez.runtime.task.TaskRunner2Callable$1.run(TaskRunner2Callable.java:82)
at org.apache.tez.runtime.task.TaskRunner2Callable$1.run(TaskRunner2Callable.java:69)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:422)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1899)
at org.apache.tez.runtime.task.TaskRunner2Callable.callInternal(TaskRunner2Callable.java:69)
at org.apache.tez.runtime.task.TaskRunner2Callable.callInternal(TaskRunner2Callable.java:39)
at org.apache.tez.common.CallableWithNdc.call(CallableWithNdc.java:36)
at com.google.common.util.concurrent.TrustedListenableFutureTask$TrustedFutureInterruptibleTask.runInterruptibly(TrustedListenableFutureTask.java:131)
at com.google.common.util.concurrent.InterruptibleTask.run(InterruptibleTask.java:76)
at com.google.common.util.concurrent.TrustedListenableFutureTask.run(TrustedListenableFutureTask.java:82)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:750)
Caused by: java.lang.RuntimeException: org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime Error while processing row
at org.apache.hadoop.hive.ql.exec.tez.MapRecordSource.processRow(MapRecordSource.java:110)
at org.apache.hadoop.hive.ql.exec.tez.MapRecordSource.pushRecord(MapRecordSource.java:83)
at org.apache.hadoop.hive.ql.exec.tez.MapRecordProcessor.run(MapRecordProcessor.java:414)
at org.apache.hadoop.hive.ql.exec.tez.TezProcessor.initializeAndRunProcessor(TezProcessor.java:293)
... 16 more
Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime Error while processing row
at org.apache.hadoop.hive.ql.exec.vector.VectorMapOperator.process(VectorMapOperator.java:993)
at org.apache.hadoop.hive.ql.exec.tez.MapRecordSource.processRow(MapRecordSource.java:101)
... 19 more
Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: java.lang.NullPointerException
at org.apache.hadoop.hive.ql.exec.vector.reducesink.VectorReduceSinkEmptyKeyOperator.process(VectorReduceSinkEmptyKeyOperator.java:137)
at org.apache.hadoop.hive.ql.exec.Operator.vectorForward(Operator.java:919)
at org.apache.hadoop.hive.ql.exec.vector.VectorSelectOperator.process(VectorSelectOperator.java:158)
at org.apache.hadoop.hive.ql.exec.Operator.vectorForward(Operator.java:919)
at org.apache.hadoop.hive.ql.exec.vector.VectorLimitOperator.process(VectorLimitOperator.java:108)
at org.apache.hadoop.hive.ql.exec.Operator.vectorForward(Operator.java:919)
at org.apache.hadoop.hive.ql.exec.TableScanOperator.process(TableScanOperator.java:171)
at org.apache.hadoop.hive.ql.exec.vector.VectorMapOperator.deliverVectorizedRowBatch(VectorMapOperator.java:809)
at org.apache.hadoop.hive.ql.exec.vector.VectorMapOperator.process(VectorMapOperator.java:878)
... 20 more
Caused by: java.lang.NullPointerException
at org.apache.hadoop.hive.common.io.NonSyncByteArrayOutputStream.write(NonSyncByteArrayOutputStream.java:110)
at org.apache.hadoop.hive.serde2.lazybinary.fast.LazyBinarySerializeWrite.writeString(LazyBinarySerializeWrite.java:280)
at org.apache.hadoop.hive.ql.exec.vector.VectorSerializeRow$VectorSerializeStringWriter.serialize(VectorSerializeRow.java:532)
at org.apache.hado

[jira] [Updated] (HIVE-27898) HIVE4 can't use ICEBERG table in subqueries

2023-11-22 Thread yongzhi.shao (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-27898?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

yongzhi.shao updated HIVE-27898:

Description: 
We found that, when using the HIVE4-BETA1 version, if we use an Iceberg table 
in a subquery, we get no data back.

I used HIVE3-TEZ for cross-validation; HIVE3 does not have this problem when 
querying Iceberg.
{code:java}
--iceberg
select * from iceberg_dwd.b_std_trade 
where uni_shop_id = 'TEST|1' limit 10  --10 rows


select *
from ( 
select * from iceberg_dwd.b_std_trade 
where uni_shop_id = 'TEST|1' limit 10
) t1;   --10 rows


select uni_shop_id
from ( 
select * from iceberg_dwd.b_std_trade 
where uni_shop_id = 'TEST|1' limit 10
) t1;  --0 rows


select uni_shop_id
from ( 
select uni_shop_id as uni_shop_id from iceberg_dwd.b_std_trade 
where uni_shop_id = 'TEST|1' limit 10
) t1;  --0 rows


--orc
select uni_shop_id
from ( 
select * from iceberg_dwd.trade_test 
where uni_shop_id = 'TEST|1' limit 10
) t1;--10 ROWS{code}
 

  was:
Currently, we found that when using HIVE4-BETA1 version, if we use ICEBERG 
table in the subquery, we can't get any data in the end.

I have used HIVE3 for cross validation and HIVE3 does not have this problem 
when querying ICEBERG.
{code:java}
--iceberg
select * from iceberg_dwd.b_std_trade 
where uni_shop_id = 'TEST|1' limit 10  --10 rows


select *
from ( 
select * from iceberg_dwd.b_std_trade 
where uni_shop_id = 'TEST|1' limit 10
) t1;   --10 rows


select uni_shop_id
from ( 
select * from iceberg_dwd.b_std_trade 
where uni_shop_id = 'TEST|1' limit 10
) t1;  --0 rows


select uni_shop_id
from ( 
select uni_shop_id as uni_shop_id from iceberg_dwd.b_std_trade 
where uni_shop_id = 'TEST|1' limit 10
) t1;  --0 rows


--orc
select uni_shop_id
from ( 
select * from iceberg_dwd.trade_test 
where uni_shop_id = 'TEST|1' limit 10
) t1;--10 ROWS{code}
 


> HIVE4 can't use ICEBERG table in subqueries
> ---
>
> Key: HIVE-27898
> URL: https://issues.apache.org/jira/browse/HIVE-27898
> Project: Hive
>  Issue Type: Bug
>  Components: Iceberg integration
>Affects Versions: 4.0.0-beta-1
>Reporter: yongzhi.shao
>Priority: Critical
>
> We found that, when using the HIVE4-BETA1 version, if we use an Iceberg 
> table in a subquery, we get no data back.
> I used HIVE3-TEZ for cross-validation; HIVE3 does not have this problem 
> when querying Iceberg.
> {code:java}
> --iceberg
> select * from iceberg_dwd.b_std_trade 
> where uni_shop_id = 'TEST|1' limit 10  --10 rows
> select *
> from ( 
> select * from iceberg_dwd.b_std_trade 
> where uni_shop_id = 'TEST|1' limit 10
> ) t1;   --10 rows
> select uni_shop_id
> from ( 
> select * from iceberg_dwd.b_std_trade 
> where uni_shop_id = 'TEST|1' limit 10
> ) t1;  --0 rows
> select uni_shop_id
> from ( 
> select uni_shop_id as uni_shop_id from iceberg_dwd.b_std_trade 
> where uni_shop_id = 'TEST|1' limit 10
> ) t1;  --0 rows
> --orc
> select uni_shop_id
> from ( 
> select * from iceberg_dwd.trade_test 
> where uni_shop_id = 'TEST|1' limit 10
> ) t1;--10 ROWS{code}
>  





[jira] [Assigned] (HIVE-27899) Speculative execution task which will be killed should not commit file

2023-11-22 Thread Chenyu Zheng (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-27899?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chenyu Zheng reassigned HIVE-27899:
---

  Assignee: Chenyu Zheng
Issue Type: Bug  (was: Improvement)

> Speculative execution task which will be killed should not commit file
> --
>
> Key: HIVE-27899
> URL: https://issues.apache.org/jira/browse/HIVE-27899
> Project: Hive
>  Issue Type: Bug
>  Components: Tez
>Reporter: Chenyu Zheng
>Assignee: Chenyu Zheng
>Priority: Major
>






[jira] [Created] (HIVE-27899) Speculative execution task which will be killed should not commit file

2023-11-22 Thread Chenyu Zheng (Jira)
Chenyu Zheng created HIVE-27899:
---

 Summary: Speculative execution task which will be killed should 
not commit file
 Key: HIVE-27899
 URL: https://issues.apache.org/jira/browse/HIVE-27899
 Project: Hive
  Issue Type: Improvement
  Components: Tez
Reporter: Chenyu Zheng








[jira] [Created] (HIVE-27898) HIVE4 can't use ICEBERG table in subqueries

2023-11-22 Thread yongzhi.shao (Jira)
yongzhi.shao created HIVE-27898:
---

 Summary: HIVE4 can't use ICEBERG table in subqueries
 Key: HIVE-27898
 URL: https://issues.apache.org/jira/browse/HIVE-27898
 Project: Hive
  Issue Type: Bug
  Components: Iceberg integration
Affects Versions: 4.0.0-beta-1
Reporter: yongzhi.shao


We found that, when using the HIVE4-BETA1 version, if we use an Iceberg table 
in a subquery, we get no data back.

I used HIVE3 for cross-validation; HIVE3 does not have this problem when 
querying Iceberg.
{code:java}
--iceberg
select * from iceberg_dwd.b_std_trade 
where uni_shop_id = 'TEST|1' limit 10  --10 rows


select *
from ( 
select * from iceberg_dwd.b_std_trade 
where uni_shop_id = 'TEST|1' limit 10
) t1;   --10 rows


select uni_shop_id
from ( 
select * from iceberg_dwd.b_std_trade 
where uni_shop_id = 'TEST|1' limit 10
) t1;  --0 rows


select uni_shop_id
from ( 
select uni_shop_id as uni_shop_id from iceberg_dwd.b_std_trade 
where uni_shop_id = 'TEST|1' limit 10
) t1;  --0 rows


--orc
select uni_shop_id
from ( 
select * from iceberg_dwd.trade_test 
where uni_shop_id = 'TEST|1' limit 10
) t1;--10 ROWS{code}
 





[jira] [Commented] (HIVE-27687) Logger variable should be static final as its creation takes more time in query compilation

2023-11-22 Thread Stamatis Zampetakis (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-27687?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17788686#comment-17788686
 ] 

Stamatis Zampetakis commented on HIVE-27687:


[~rameshkumar] Please fill in the "Fix Version" field, otherwise this entry 
will never make it into the release notes.

> Logger variable should be static final as its creation takes more time in 
> query compilation
> ---
>
> Key: HIVE-27687
> URL: https://issues.apache.org/jira/browse/HIVE-27687
> Project: Hive
>  Issue Type: Task
>  Components: Hive
>Reporter: Ramesh Kumar Thangarajan
>Assignee: Ramesh Kumar Thangarajan
>Priority: Major
>  Labels: pull-request-available
> Attachments: Screenshot 2023-09-12 at 5.03.31 PM.png
>
>
> In query compilation, LoggerFactory.getLogger() takes a noticeable amount of 
> time. Some of the SerDe classes use a non-static field for the Logger, which 
> forces a getLogger() call on every instance creation.
> Making the Logger field static final avoids this code path for every SerDe 
> class construction.
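
The pattern can be sketched in Python terms (hedged: Hive's fix is in Java and these class names are invented for illustration) as the difference between resolving the logger on every construction and binding it once:

```python
import logging

MODULE_LOG = logging.getLogger("serde")  # analogous to a `static final Logger`

class SerDeWithInstanceLogger:
    def __init__(self):
        # Resolved on every construction -- the per-instance cost the ticket removes.
        self.log = logging.getLogger("serde")

class SerDeWithSharedLogger:
    # Resolved once when the class is defined; shared by all instances.
    log = MODULE_LOG

# Both styles end up with the same logger object; only the lookup cost differs.
assert SerDeWithInstanceLogger().log is SerDeWithSharedLogger.log
```

Since fetching a logger by the same name returns one cached object, the only difference is the repeated lookup on each construction, which is exactly the overhead a static final field eliminates in the Java classes.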





[jira] [Created] (HIVE-27897) Backport of HIVE-22373, HIVE-25553, HIVE-23561, HIVE-24321, HIVE-22856, HIVE-22973, HIVE-21729

2023-11-22 Thread Aman Raj (Jira)
Aman Raj created HIVE-27897:
---

 Summary: Backport of HIVE-22373, HIVE-25553, HIVE-23561, 
HIVE-24321, HIVE-22856, HIVE-22973, HIVE-21729
 Key: HIVE-27897
 URL: https://issues.apache.org/jira/browse/HIVE-27897
 Project: Hive
  Issue Type: Sub-task
Affects Versions: 3.2.0
Reporter: Aman Raj
Assignee: Aman Raj





