[jira] [Created] (HIVE-26901) Add metrics on transactions in replication metrics table

2023-01-02 Thread Amit Saonerkar (Jira)
Amit Saonerkar created HIVE-26901:
-

 Summary: Add metrics on transactions in replication metrics table 
 Key: HIVE-26901
 URL: https://issues.apache.org/jira/browse/HIVE-26901
 Project: Hive
  Issue Type: Improvement
  Components: Hive
Reporter: Amit Saonerkar
Assignee: Amit Saonerkar


This is related to the corresponding ticket:
[https://jira.cloudera.com/browse/CDPD-17985?filter=-1]

We need to enhance the replication metrics table by adding information about 
transactions during REPL DUMP/LOAD operations. The idea is to give the user a 
picture of how transactions are progressing during dump and load operations.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (HIVE-26900) Error message not representing the correct line number with a syntax error in a HQL File

2023-01-02 Thread Vikram Ahuja (Jira)
Vikram Ahuja created HIVE-26900:
---

 Summary: Error message not representing the correct line number 
with a syntax error in a HQL File
 Key: HIVE-26900
 URL: https://issues.apache.org/jira/browse/HIVE-26900
 Project: Hive
  Issue Type: Bug
Reporter: Vikram Ahuja


When an HQL file contains a syntax error, the error that beeline throws while 
running the file reports the wrong line number. Both the line number and the 
position are incorrect. It seems the parser does not account for spaces and 
newlines and always reports the error on line 1, regardless of which line of 
the HQL file the error is on.
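
To illustrate the position tracking this report expects, the sketch below maps 
a character offset in a script back to a line and column. It is not the Hive or 
beeline parser code; `locate` and the sample script are hypothetical and only 
show that newlines have to be counted to report the right line.

```python
from typing import Tuple


def locate(script: str, offset: int) -> Tuple[int, int]:
    """Map a character offset to (line, column); line is 1-based, column 0-based."""
    if not 0 <= offset <= len(script):
        raise ValueError("offset out of range")
    before = script[:offset]
    # Every newline before the offset moves the position one line down.
    line = before.count("\n") + 1
    # The column is the distance from the start of the current line.
    column = offset - (before.rfind("\n") + 1)
    return line, column


# Hypothetical script: the misspelled keyword sits on the third line.
script = "SELECT 1;\n\nSELEC 2;\n"
offset = script.index("SELEC 2")
print(locate(script, offset))
```

With this input the misspelled statement is reported on line 3 rather than 
line 1.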





[jira] [Created] (HIVE-26899) Upgrade arrow to 0.11.0 in branch-3

2023-01-02 Thread Aman Raj (Jira)
Aman Raj created HIVE-26899:
---

 Summary: Upgrade arrow to 0.11.0 in branch-3
 Key: HIVE-26899
 URL: https://issues.apache.org/jira/browse/HIVE-26899
 Project: Hive
  Issue Type: Sub-task
Reporter: Aman Raj
Assignee: Aman Raj








[jira] [Created] (HIVE-26898) Split Notification logging so that we can busy clusters can have better performance

2023-01-02 Thread Taraka Rama Rao Lethavadla (Jira)
Taraka Rama Rao Lethavadla created HIVE-26898:
-

 Summary: Split Notification logging so that we can busy clusters 
can have better performance
 Key: HIVE-26898
 URL: https://issues.apache.org/jira/browse/HIVE-26898
 Project: Hive
  Issue Type: New Feature
Reporter: Taraka Rama Rao Lethavadla


DDL and DML events are logged into the notification log table and are cleaned 
up as soon as their TTL expires.

In most busy clusters, the notification log keeps growing even though the 
cleaner is running and continually removing events. This means the rate of Hive 
DB operations is much higher than the rate at which cleaning happens.

As a result, any query on this table becomes a bottleneck at the backend DB, 
causing slow responses.

The proposal is to split the notification log table into multiple tables, such 
as:

notification_log_dml - for all DML queries

notification_log_insert - for all insert queries

etc.

This reduces the load on any single table, improving the performance of the 
backend DB as well as Hive.
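
A minimal sketch of the routing idea, assuming each event carries a category 
string. Only the two split table names come from the proposal; the category 
values, the fallback table name, and `target_table` are hypothetical and are not 
part of the Hive metastore API.

```python
# Split tables named in the proposal, keyed by a hypothetical event category.
TABLE_BY_CATEGORY = {
    "insert": "notification_log_insert",
    "dml": "notification_log_dml",
}

# Assumption: events without a dedicated table stay in one shared table.
FALLBACK_TABLE = "notification_log"


def target_table(category: str) -> str:
    """Pick the table an event would be written to, based on its category."""
    return TABLE_BY_CATEGORY.get(category.lower(), FALLBACK_TABLE)


for category in ("INSERT", "dml", "ddl"):
    print(category, "->", target_table(category))
```

Checking the insert category separately from other DML mirrors the two example 
tables listed above; how the remaining event types would be grouped is left 
open in the proposal.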





[jira] [Created] (HIVE-26897) Provide a command/tool to recover data in ACID table when table data got corrupted with invalid/junk delta/delete_delta folders

2023-01-02 Thread Taraka Rama Rao Lethavadla (Jira)
Taraka Rama Rao Lethavadla created HIVE-26897:
-

 Summary: Provide a command/tool to recover data in ACID table when 
table data got corrupted with invalid/junk delta/delete_delta folders 
 Key: HIVE-26897
 URL: https://issues.apache.org/jira/browse/HIVE-26897
 Project: Hive
  Issue Type: New Feature
Reporter: Taraka Rama Rao Lethavadla


Example: a table has the following directories:
{noformat}
drwx-- - hive hive 0 2022-11-05 19:43 /data/warehouse/tbl/delete_delta_0080483_0087704_v0973185
drwx-- - hive pdl_prod_nosh_jsin 0 2022-12-05 00:18 /data/warehouse/tbl/delete_delta_0080483_0088384_v507
{noformat}
When we read data from this table, we get the following error:
{noformat}
java.util.concurrent.ExecutionException: java.lang.IllegalStateException: 
Duplicate key null (attempted merging values 
org.apache.hadoop.hive.ql.io.AcidInputFormat$DeltaFileMetaData@41776cd9 and 
org.apache.hadoop.hive.ql.io.AcidInputFormat$DeltaFileMetaData@1404a054){noformat}
delete_delta_0080483_0087704_v0973185 and delete_delta_0080483_0088384_v507 
were created as part of minor compaction. In general, once a minor compaction 
completes, the next minor compaction picks a min_writeId greater than the 
max_writeId of the previous compaction. In this case, both minor-compacted 
directories have the same min_writeId (i.e. 0080483).
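
The condition described above can be checked with a small script. This is only 
a sketch: it assumes directory names follow the pattern 
`<prefix>_<min_writeId>_<max_writeId>_v<id>` seen in the listing, and 
`find_same_min_write_id` is a hypothetical helper, not an existing Hive tool. 
It flags candidates for inspection; it does not prove that a directory is 
corrupt.

```python
import re
from collections import defaultdict

# Assumed naming pattern, taken from the directory names in the listing.
DELTA_RE = re.compile(r"^(delete_delta|delta)_(\d+)_(\d+)(?:_v(\d+))?$")


def find_same_min_write_id(dir_names):
    """Group delta directories by (prefix, min_writeId); keep groups with more than one entry."""
    groups = defaultdict(list)
    for name in dir_names:
        match = DELTA_RE.match(name)
        if match is None:
            continue  # not a delta or delete_delta directory
        prefix, min_write_id = match.group(1), int(match.group(2))
        groups[(prefix, min_write_id)].append(name)
    return {key: sorted(names) for key, names in groups.items() if len(names) > 1}


dirs = [
    "delete_delta_0080483_0087704_v0973185",
    "delete_delta_0080483_0088384_v507",
]
for (prefix, min_write_id), names in find_same_min_write_id(dirs).items():
    print(prefix, min_write_id, names)
```

For the two directories in this report, both names end up in a single group 
keyed by min_writeId 80483.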

To mitigate the issue, we had to remove those directories manually from HDFS, 
create a fresh table from it, drop the original table, and rename the fresh 
table to the original name.

*Proposal*

Create a tool/command that reads the data from the corrupted ACID table so the 
data can be recovered before we make any changes to the underlying files. We 
could then work around the problem by creating another table with the same data.





[jira] [Created] (HIVE-26896) Backport of Test fixes for lineage3.q and load_static_ptn_into_bucketed_table.q

2023-01-02 Thread Aman Raj (Jira)
Aman Raj created HIVE-26896:
---

 Summary: Backport of Test fixes for lineage3.q and 
load_static_ptn_into_bucketed_table.q
 Key: HIVE-26896
 URL: https://issues.apache.org/jira/browse/HIVE-26896
 Project: Hive
  Issue Type: Sub-task
Reporter: Aman Raj
Assignee: Aman Raj


These tests were fixed in branch-3.1, so this backports them to branch-3.


