[jira] [Created] (HIVE-26901) Add metrics on transactions in replication metrics table
Amit Saonerkar created HIVE-26901:

Summary: Add metrics on transactions in replication metrics table
Key: HIVE-26901
URL: https://issues.apache.org/jira/browse/HIVE-26901
Project: Hive
Issue Type: Improvement
Components: Hive
Reporter: Amit Saonerkar
Assignee: Amit Saonerkar

This is related to the corresponding [https://jira.cloudera.com/browse/CDPD-17985?filter=-1]

We need to enhance the replication metrics table by adding information about transactions during REPL DUMP/LOAD operations. The idea is to give the user a picture of how transactions are progressing during dump and load operations.

-- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Created] (HIVE-26900) Error message not representing the correct line number with a syntax error in a HQL File
Vikram Ahuja created HIVE-26900:

Summary: Error message not representing the correct line number with a syntax error in a HQL File
Key: HIVE-26900
URL: https://issues.apache.org/jira/browse/HIVE-26900
Project: Hive
Issue Type: Bug
Reporter: Vikram Ahuja

When an HQL file contains a syntax error, the error that beeline throws while running the file reports the wrong line number. Both the line number and the position are incorrect. It seems the parser does not account for spaces and newlines and always reports the error on line 1, regardless of which line of the HQL file the error is actually on.
[jira] [Created] (HIVE-26899) Upgrade arrow to 0.11.0 in branch-3
Aman Raj created HIVE-26899:

Summary: Upgrade arrow to 0.11.0 in branch-3
Key: HIVE-26899
URL: https://issues.apache.org/jira/browse/HIVE-26899
Project: Hive
Issue Type: Sub-task
Reporter: Aman Raj
Assignee: Aman Raj
[jira] [Created] (HIVE-26898) Split Notification logging so that we can busy clusters can have better performance
Taraka Rama Rao Lethavadla created HIVE-26898:

Summary: Split Notification logging so that we can busy clusters can have better performance
Key: HIVE-26898
URL: https://issues.apache.org/jira/browse/HIVE-26898
Project: Hive
Issue Type: New Feature
Reporter: Taraka Rama Rao Lethavadla

DDL and DML events are logged into the notification log table and are cleaned up as soon as their TTL expires. On most busy clusters, the notification log keeps growing even though the cleaner is running and continuously cleaning events. This means the rate of Hive DB operations is much higher than the rate at which cleaning happens. As a result, any query on this table becomes a bottleneck at the backend DB, causing slow responses.

The proposal is to split the notification log table into multiple tables, such as:
notification_log_dml - for all DML queries
notification_log_insert - for all insert queries
.. etc.

This reduces the load on that single table, improving the performance of the backend DB as well as Hive.
[jira] [Created] (HIVE-26897) Provide a command/tool to recover data in ACID table when table data got corrupted with invalid/junk delta/delete_delta folders
Taraka Rama Rao Lethavadla created HIVE-26897:

Summary: Provide a command/tool to recover data in ACID table when table data got corrupted with invalid/junk delta/delete_delta folders
Key: HIVE-26897
URL: https://issues.apache.org/jira/browse/HIVE-26897
Project: Hive
Issue Type: New Feature
Reporter: Taraka Rama Rao Lethavadla

Example: a table has the directories below
{noformat}
drwx-- - hive hive 0 2022-11-05 19:43 /data/warehouse/tbl/delete_delta_0080483_0087704_v0973185
drwx-- - hive pdl_prod_nosh_jsin 0 2022-12-05 00:18 /data/warehouse/tbl/delete_delta_0080483_0088384_v507{noformat}

When we read data from this table, we get the errors below
{noformat}
java.util.concurrent.ExecutionException: java.lang.IllegalStateException: Duplicate key null (attempted merging values org.apache.hadoop.hive.ql.io.AcidInputFormat$DeltaFileMetaData@41776cd9 and org.apache.hadoop.hive.ql.io.AcidInputFormat$DeltaFileMetaData@1404a054){noformat}

delete_delta_0080483_0087704_v0973185 and delete_delta_0080483_0088384_v507 were created as part of minor compaction. In general, once a minor compaction completes, the next minor compaction picks a min_writeId greater than the max_writeId of the previous compaction. In this case, both minor-compacted directories have the same min_writeId (i.e. 0080483).

To mitigate the issue, we had to remove those directories manually from HDFS, create a fresh table from the remaining data, drop the actual table, and rename the fresh table to the actual table.

*Proposal*
Create a tool/command to read the data from the corrupted ACID table and recover it before we make any changes to the underlying data, so that we can work around the problem by creating another table with the same data.
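As an illustration of the symptom described above, the following is a minimal sketch (not part of Hive, and not the proposed recovery tool) that scans a list of directory paths and groups delta/delete_delta directories sharing the same min_writeId. The directory-name pattern is an assumption inferred from the two names in the listing, the function names are hypothetical, and a match only marks a candidate for manual inspection, since it cannot tell by itself whether a directory is junk.

```python
import re
from collections import defaultdict

# Assumed name layout, inferred from the listing in this ticket:
#   (delete_)delta_<min_writeId>_<max_writeId>[_v<suffix>]
# Names that do not fit this pattern are ignored by the sketch.
DELTA_DIR_RE = re.compile(r"^(delete_delta|delta)_(\d+)_(\d+)(?:_v(\d+))?$")


def parse_delta_dir(path):
    """Return (kind, min_write_id, max_write_id) for a path, or None."""
    name = path.rstrip("/").rsplit("/", 1)[-1]
    match = DELTA_DIR_RE.match(name)
    if match is None:
        return None
    kind, min_id, max_id, _suffix = match.groups()
    return kind, int(min_id), int(max_id)


def find_shared_min_write_id(paths):
    """Group directories of the same kind that start at the same min_writeId.

    Returns {(kind, min_write_id): [paths]} containing only groups with
    more than one directory.
    """
    groups = defaultdict(list)
    for path in paths:
        parsed = parse_delta_dir(path)
        if parsed is None:
            continue
        kind, min_id, _max_id = parsed
        groups[(kind, min_id)].append(path)
    return {key: dirs for key, dirs in groups.items() if len(dirs) > 1}


if __name__ == "__main__":
    listing = [
        "/data/warehouse/tbl/delete_delta_0080483_0087704_v0973185",
        "/data/warehouse/tbl/delete_delta_0080483_0088384_v507",
    ]
    for (kind, min_id), dirs in find_shared_min_write_id(listing).items():
        print(f"{kind} directories sharing min_writeId {min_id}:")
        for d in dirs:
            print(f"  {d}")
```

The sketch works on path strings only, so it can be fed the output of a directory listing without touching the table data.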
[jira] [Created] (HIVE-26896) Backport of Test fixes for lineage3.q and load_static_ptn_into_bucketed_table.q
Aman Raj created HIVE-26896:

Summary: Backport of Test fixes for lineage3.q and load_static_ptn_into_bucketed_table.q
Key: HIVE-26896
URL: https://issues.apache.org/jira/browse/HIVE-26896
Project: Hive
Issue Type: Sub-task
Reporter: Aman Raj
Assignee: Aman Raj

These tests were fixed in branch-3.1, so this backports them to branch-3.