[jira] [Created] (HIVE-26978) Stale "Runtime stats" causes poor query planning

2023-01-23 Thread Rajesh Balamohan (Jira)
Rajesh Balamohan created HIVE-26978:
---

 Summary: Stale "Runtime stats" causes poor query planning
 Key: HIVE-26978
 URL: https://issues.apache.org/jira/browse/HIVE-26978
 Project: Hive
  Issue Type: Improvement
  Components: HiveServer2
Reporter: Rajesh Balamohan
 Attachments: Screenshot 2023-01-24 at 10.23.16 AM.png

* Runtime stats can be stored in hiveserver or in metastore via 
"hive.query.reexecution.stats.persist.scope".
 * Though the table is dropped and recreated, it ends up showing old stats via 
"RUNTIME" stats. Here is an example (note that the table is empty, but gets 
datasize and numRows from RUNTIME stats)
 * This causes suboptimal plan for "MERGE INTO" queries by creating CUSTOM_EDGE 
instead of broadcast edge.

!Screenshot 2023-01-24 at 10.23.16 AM.png|width=2053,height=753!

 

 



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (HIVE-26976) Failure in direct SQL processing do not undo the changes made to database before fallback to JDO resulting in stale entries in database forever

2023-01-23 Thread Venugopal Reddy K (Jira)
Venugopal Reddy K created HIVE-26976:


 Summary: Failure in direct SQL processing do not undo the changes 
made to database before fallback to JDO resulting in stale entries in database 
forever
 Key: HIVE-26976
 URL: https://issues.apache.org/jira/browse/HIVE-26976
 Project: Hive
  Issue Type: Bug
  Components: Metastore
Reporter: Venugopal Reddy K
 Attachments: image-2023-01-23-15-23-00-975.png

*[Description]* 

Failure in direct SQL processing do not undo the changes made to database 
before falling back to JDO based processing. Thus results in stale/dangling 
entries in database forever.

For instance, during dropPartitions() direct SQL processing, after dropping 
rows from few tables(like PARTITIONS, PARTITION_PARAMS, PARTITION_KEY_VALS 
etc), it causes typecast exception in dropStorageDescriptors() and fallback to 
JDO processing. But datanucleus JDO processing cannot delete rows from 
remaining tables(i.e., from sds, serdes, sds_params, serde_params, sort_cols, 
bucketing_cols, skewed cols/values/location if any for the partitions) since 
the partition is already in the same transaction during direct SQL processing.

!image-2023-01-23-15-23-00-975.png!

 

 

*[Steps to reproduce]* 

Reproduction steps are described in issue - 
[https://issues.apache.org/jira/browse/HIVE-26860|https://issues.apache.org/jira/browse/HIVE-26860]



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (HIVE-26975) MERGE: Wrong reducer estimate causing smaller files to be created

2023-01-23 Thread Rajesh Balamohan (Jira)
Rajesh Balamohan created HIVE-26975:
---

 Summary: MERGE: Wrong reducer estimate causing smaller files to be 
created
 Key: HIVE-26975
 URL: https://issues.apache.org/jira/browse/HIVE-26975
 Project: Hive
  Issue Type: Improvement
  Components: Iceberg integration
Reporter: Rajesh Balamohan


* "Merge into" estimates wrong number of reducers causing more number of small 
files to be created.* e.g 400+ files in 3+ MB file each.*
 * This can be reproduced by writing data into "store_sales" table in iceberg 
format via another source table (using merge-into).
 ** e.g  Running this few times will create wrong number of reduce tasks 
causing lot of small files to be created in iceberg table.

{noformat}
MERGE INTO store_sales_t t

using ssv s

ON ( t.ss_item_sk = s.ss_item_sk

 AND t.ss_customer_sk = s.ss_customer_sk

 AND t.ss_sold_date_sk = "2451181"

 AND ( ( Floor(( s.ss_item_sk ) / 1000) * 1000 ) BETWEEN 1000 AND 2000 )

 AND s.ss_ext_discount_amt < 0.0 )

WHEN matched AND t.ss_ext_discount_amt IS NULL THEN

  UPDATE SET ss_ext_discount_amt = 0.0

WHEN NOT matched THEN

  INSERT ( ss_sold_time_sk,

   ss_item_sk,

   ss_customer_sk,

   ss_cdemo_sk,

   ss_hdemo_sk,

   ss_addr_sk,

   ss_store_sk,

   ss_promo_sk,

   ss_ticket_number,

   ss_quantity,

   ss_wholesale_cost,

   ss_list_price,

   ss_sales_price,

   ss_ext_discount_amt,

   ss_ext_sales_price,

   ss_ext_wholesale_cost,

   ss_ext_list_price,

   ss_ext_tax,

   ss_coupon_amt,

   ss_net_paid,

   ss_net_paid_inc_tax,

   ss_net_profit,

   ss_sold_date_sk )

  VALUES ( s.ss_sold_time_sk,

   s.ss_item_sk,

   s.ss_customer_sk,

   s.ss_cdemo_sk,

   s.ss_hdemo_sk,

   s.ss_addr_sk,

   s.ss_store_sk,

   s.ss_promo_sk,

   s.ss_ticket_number,

   s.ss_quantity,

   s.ss_wholesale_cost,

   s.ss_list_price,

   s.ss_sales_price,

   s.ss_ext_discount_amt,

   s.ss_ext_sales_price,

   s.ss_ext_wholesale_cost,

   s.ss_ext_list_price,

   s.ss_ext_tax,

   s.ss_coupon_amt,

   s.ss_net_paid,

   s.ss_net_paid_inc_tax,

   s.ss_net_profit,

   "2451181") 

{noformat}



--
This message was sent by Atlassian Jira
(v8.20.10#820010)