[jira] [Created] (HIVE-26978) Stale "Runtime stats" causes poor query planning
Rajesh Balamohan created HIVE-26978: --- Summary: Stale "Runtime stats" causes poor query planning Key: HIVE-26978 URL: https://issues.apache.org/jira/browse/HIVE-26978 Project: Hive Issue Type: Improvement Components: HiveServer2 Reporter: Rajesh Balamohan Attachments: Screenshot 2023-01-24 at 10.23.16 AM.png * Runtime stats can be stored in hiveserver or in metastore via "hive.query.reexecution.stats.persist.scope". * Though the table is dropped and recreated, it ends up showing old stats via "RUNTIME" stats. Here is an example (note that the table is empty, but gets datasize and numRows from RUNTIME stats) * This causes suboptimal plan for "MERGE INTO" queries by creating CUSTOM_EDGE instead of broadcast edge. !Screenshot 2023-01-24 at 10.23.16 AM.png|width=2053,height=753! -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Created] (HIVE-26976) Failure in direct SQL processing do not undo the changes made to database before fallback to JDO resulting in stale entries in database forever
Venugopal Reddy K created HIVE-26976: Summary: Failure in direct SQL processing do not undo the changes made to database before fallback to JDO resulting in stale entries in database forever Key: HIVE-26976 URL: https://issues.apache.org/jira/browse/HIVE-26976 Project: Hive Issue Type: Bug Components: Metastore Reporter: Venugopal Reddy K Attachments: image-2023-01-23-15-23-00-975.png *[Description]* Failure in direct SQL processing do not undo the changes made to database before falling back to JDO based processing. Thus results in stale/dangling entries in database forever. For instance, during dropPartitions() direct SQL processing, after dropping rows from few tables(like PARTITIONS, PARTITION_PARAMS, PARTITION_KEY_VALS etc), it causes typecast exception in dropStorageDescriptors() and fallback to JDO processing. But datanucleus JDO processing cannot delete rows from remaining tables(i.e., from sds, serdes, sds_params, serde_params, sort_cols, bucketing_cols, skewed cols/values/location if any for the partitions) since the partition is already in the same transaction during direct SQL processing. !image-2023-01-23-15-23-00-975.png! *[Steps to reproduce]* Reproduction steps are described in issue - [https://issues.apache.org/jira/browse/HIVE-26860|https://issues.apache.org/jira/browse/HIVE-26860] -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Created] (HIVE-26975) MERGE: Wrong reducer estimate causing smaller files to be created
Rajesh Balamohan created HIVE-26975: --- Summary: MERGE: Wrong reducer estimate causing smaller files to be created Key: HIVE-26975 URL: https://issues.apache.org/jira/browse/HIVE-26975 Project: Hive Issue Type: Improvement Components: Iceberg integration Reporter: Rajesh Balamohan * "Merge into" estimates wrong number of reducers causing more number of small files to be created.* e.g 400+ files in 3+ MB file each.* * This can be reproduced by writing data into "store_sales" table in iceberg format via another source table (using merge-into). ** e.g Running this few times will create wrong number of reduce tasks causing lot of small files to be created in iceberg table. {noformat} MERGE INTO store_sales_t t using ssv s ON ( t.ss_item_sk = s.ss_item_sk AND t.ss_customer_sk = s.ss_customer_sk AND t.ss_sold_date_sk = "2451181" AND ( ( Floor(( s.ss_item_sk ) / 1000) * 1000 ) BETWEEN 1000 AND 2000 ) AND s.ss_ext_discount_amt < 0.0 ) WHEN matched AND t.ss_ext_discount_amt IS NULL THEN UPDATE SET ss_ext_discount_amt = 0.0 WHEN NOT matched THEN INSERT ( ss_sold_time_sk, ss_item_sk, ss_customer_sk, ss_cdemo_sk, ss_hdemo_sk, ss_addr_sk, ss_store_sk, ss_promo_sk, ss_ticket_number, ss_quantity, ss_wholesale_cost, ss_list_price, ss_sales_price, ss_ext_discount_amt, ss_ext_sales_price, ss_ext_wholesale_cost, ss_ext_list_price, ss_ext_tax, ss_coupon_amt, ss_net_paid, ss_net_paid_inc_tax, ss_net_profit, ss_sold_date_sk ) VALUES ( s.ss_sold_time_sk, s.ss_item_sk, s.ss_customer_sk, s.ss_cdemo_sk, s.ss_hdemo_sk, s.ss_addr_sk, s.ss_store_sk, s.ss_promo_sk, s.ss_ticket_number, s.ss_quantity, s.ss_wholesale_cost, s.ss_list_price, s.ss_sales_price, s.ss_ext_discount_amt, s.ss_ext_sales_price, s.ss_ext_wholesale_cost, s.ss_ext_list_price, s.ss_ext_tax, s.ss_coupon_amt, s.ss_net_paid, s.ss_net_paid_inc_tax, s.ss_net_profit, "2451181") {noformat} -- This message was sent by Atlassian Jira (v8.20.10#820010)