[ 
https://issues.apache.org/jira/browse/HIVE-22911?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Oleksiy Sayankin updated HIVE-22911:
------------------------------------
    Description: 
We found lots of orphan/old/leftover lock entries inside {{HIVE_LOCKS}}. There 
are more than 120k locks in HIVE_LOCKS of MySQL database. We also checked the 
top 3 tables which are related to the existing locks:

 
{code}
mysql> select HL_DB,HL_TABLE, count(*) from HIVE_LOCKS group by 1,2 order by 3 
desc limit 10;
+-----------+------------------------------+----------+
| HL_DB | HL_TABLE | count(*) |
+-----------+------------------------------+----------+
| db1 | table1 | 66984 |
| db1 | table2 | 33208 |
| db1 | table3 | 9315 |
…
{code}

For table “db1. table1”, here are 3 Hive sessions related, and each of the Hive 
session is waiting for 22328 read locks. This is because this table “db1. 
table1” is a huge partition table, and it has more than 200k child partitions. 
I am guessing each of Hive session was trying to do a full table scan on it. I 
group-by based on column {{HL_LAST_HEARTBEAT}} instead, here is the list:

 

{code}
mysql> select cast(FROM_UNIXTIME(HL_LAST_HEARTBEAT/1000) as date) as 
dt,count(*) as cnt from HIVE_LOCKS
    -> group by 1 order by 1;
+------------+--------+
| dt         | cnt    |
+------------+--------+
| 1969-12-31 |      2 |
| 2019-05-20 |     10 |
| 2019-05-21 |      3 |
| 2019-05-23 |      5 |
| 2019-05-24 |      2 |
| 2019-05-25 |      1 |
| 2019-05-29 |      7 |
| 2019-05-30 |      2 |
| 2019-06-11 |     13 |
| 2019-06-28 |      3 |
| 2019-07-02 |      2 |
| 2019-07-04 |      5 |
| 2019-07-09 |      1 |
| 2019-07-15 |      2 |
| 2019-07-16 |      1 |
| 2019-07-18 |      2 |
| 2019-07-20 |      3 |
| 2019-07-29 |      5 |
| 2019-07-30 |      9 |
| 2019-07-31 |      7 |
| 2019-08-02 |      2 |
| 2019-08-06 |      5 |
| 2019-08-07 |     17 |
| 2019-08-08 |      8 |
| 2019-08-09 |      5 |
| 2019-08-21 |      1 |
| 2019-08-22 |     20 |
| 2019-08-23 |      1 |
| 2019-08-26 |      5 |
| 2019-08-27 |     98 |
| 2019-08-28 |      3 |
| 2019-08-29 |      1 |
| 2019-09-02 |      3 |
| 2019-09-04 |      3 |
| 2019-09-05 |    105 |
| 2019-09-06 |      3 |
| 2019-09-07 |      2 |
| 2019-09-09 |      6 |
| 2019-09-12 |      9 |
| 2019-09-13 |      1 |
| 2019-09-17 |      1 |
| 2019-09-24 |      3 |
| 2019-09-26 |      6 |
| 2019-09-27 |      4 |
| 2019-09-30 |      1 |
| 2019-10-01 |      2 |
| 2019-10-03 |      9 |
| 2019-10-04 |      2 |
| 2019-10-06 |      1 |
| 2019-10-08 |      1 |
| 2019-10-09 |      1 |
| 2019-10-10 |      6 |
| 2019-10-11 |      1 |
| 2019-10-16 |     13 |
| 2019-10-17 |      1 |
| 2019-10-18 |      2 |
| 2019-10-19 |      2 |
| 2019-10-21 |     10 |
| 2019-10-22 |      6 |
| 2019-10-28 |      2 |
| 2019-10-29 |      4 |
| 2019-10-30 |      2 |
| 2019-10-31 |      2 |
| 2019-11-05 |      2 |
| 2019-11-06 |      2 |
| 2019-11-11 |      1 |
| 2019-11-13 |      1 |
| 2019-11-14 |      1 |
| 2019-11-21 |      4 |
| 2019-11-26 |      1 |
| 2019-11-27 |      1 |
| 2019-12-05 |      4 |
| 2019-12-06 |      2 |
| 2019-12-12 |      1 |
| 2019-12-14 |      1 |
| 2019-12-15 |      3 |
| 2019-12-16 |      1 |
| 2019-12-17 |      1 |
| 2019-12-18 |      1 |
| 2019-12-19 |      2 |
| 2019-12-20 |      2 |
| 2019-12-23 |      1 |
| 2019-12-27 |      1 |
| 2020-01-07 |      1 |
| 2020-01-08 |     14 |
| 2020-01-09 |      2 |
| 2020-01-12 |    372 |
| 2020-01-14 |      2 |
| 2020-01-15 |      1 |
| 2020-01-20 |     11 |
| 2020-01-21 | 119253 |
| 2020-01-23 |    113 |
| 2020-01-24 |      4 |
| 2020-01-25 |    536 |
| 2020-01-26 |   2132 |
| 2020-01-27 |    396 |
| 2020-01-28 |      1 |
| 2020-01-29 |      3 |
| 2020-01-30 |     11 |
| 2020-01-31 |     11 |
| 2020-02-03 |      2 |
| 2020-02-04 |      4 |
| 2020-02-05 |      5 |
| 2020-02-06 |      8 |
| 2020-02-10 |     32 |
| 2020-02-11 |     15 |
| 2020-02-12 |     14 |
| 2020-02-13 |      1 |
| 2020-02-14 |     92 |
+------------+--------+
109 rows in set (0.16 sec)
{code}

However most of the entries have {{HL_ACQUIRED_AT=NULL}} in {{HIVE_LOCKS}}:
{code}
mysql> SELECT COUNT(*) FROM HIVE_LOCKS WHERE HL_ACQUIRED_AT is null;
+----------+
| COUNT(*) |
+----------+
|   123437 |
+----------+
1 row in set (0.04 sec)
{code}

{code}
mysql> SELECT COUNT(*) FROM HIVE_LOCKS WHERE HL_ACQUIRED_AT is not null;
+----------+
| COUNT(*) |
+----------+
|       97 |
+----------+
1 row in set (0.04 sec)
{code}
 
Hot-fix is to remove orphan/lost locks from {{HIVE_LOCKS}}, but this does not 
solve the problem in the future.

  was:
We found lots of orphan/old/leftover lock entries inside {{HIVE_LOCKS}}. There 
are more than 120k locks in HIVE_LOCKS of MySQL database. We also checked the 
top 3 tables which are related to the existing locks:

 
{code}
mysql> select HL_DB,HL_TABLE, count(*) from HIVE_LOCKS group by 1,2 order by 3 
desc limit 10;
+-----------+------------------------------+----------+
| HL_DB | HL_TABLE | count(*) |
+-----------+------------------------------+----------+
| db1 | table1 | 66984 |
| db1 | table2 | 33208 |
| db1 | table3 | 9315 |
…
{code}

For table “db1. table1”, here are 3 Hive sessions related, and each of the Hive 
session is waiting for 22328 read locks. This is because this table “db1. 
table1” is a huge partition table, and it has more than 200k child partitions. 
I am guessing each of Hive session was trying to do a full table scan on it. I 
group-by based on column {{HL_LAST_HEARTBEAT}} instead, here is the list:

 

{code}
mysql> select cast(FROM_UNIXTIME(HL_LAST_HEARTBEAT/1000) as date) as 
dt,count(*) as cnt from HIVE_LOCKS
    -> group by 1 order by 1;
+------------+--------+
| dt         | cnt    |
+------------+--------+
| 1969-12-31 |      2 |
| 2019-05-20 |     10 |
| 2019-05-21 |      3 |
| 2019-05-23 |      5 |
| 2019-05-24 |      2 |
| 2019-05-25 |      1 |
| 2019-05-29 |      7 |
| 2019-05-30 |      2 |
| 2019-06-11 |     13 |
| 2019-06-28 |      3 |
| 2019-07-02 |      2 |
| 2019-07-04 |      5 |
| 2019-07-09 |      1 |
| 2019-07-15 |      2 |
| 2019-07-16 |      1 |
| 2019-07-18 |      2 |
| 2019-07-20 |      3 |
| 2019-07-29 |      5 |
| 2019-07-30 |      9 |
| 2019-07-31 |      7 |
| 2019-08-02 |      2 |
| 2019-08-06 |      5 |
| 2019-08-07 |     17 |
| 2019-08-08 |      8 |
| 2019-08-09 |      5 |
| 2019-08-21 |      1 |
| 2019-08-22 |     20 |
| 2019-08-23 |      1 |
| 2019-08-26 |      5 |
| 2019-08-27 |     98 |
| 2019-08-28 |      3 |
| 2019-08-29 |      1 |
| 2019-09-02 |      3 |
| 2019-09-04 |      3 |
| 2019-09-05 |    105 |
| 2019-09-06 |      3 |
| 2019-09-07 |      2 |
| 2019-09-09 |      6 |
| 2019-09-12 |      9 |
| 2019-09-13 |      1 |
| 2019-09-17 |      1 |
| 2019-09-24 |      3 |
| 2019-09-26 |      6 |
| 2019-09-27 |      4 |
| 2019-09-30 |      1 |
| 2019-10-01 |      2 |
| 2019-10-03 |      9 |
| 2019-10-04 |      2 |
| 2019-10-06 |      1 |
| 2019-10-08 |      1 |
| 2019-10-09 |      1 |
| 2019-10-10 |      6 |
| 2019-10-11 |      1 |
| 2019-10-16 |     13 |
| 2019-10-17 |      1 |
| 2019-10-18 |      2 |
| 2019-10-19 |      2 |
| 2019-10-21 |     10 |
| 2019-10-22 |      6 |
| 2019-10-28 |      2 |
| 2019-10-29 |      4 |
| 2019-10-30 |      2 |
| 2019-10-31 |      2 |
| 2019-11-05 |      2 |
| 2019-11-06 |      2 |
| 2019-11-11 |      1 |
| 2019-11-13 |      1 |
| 2019-11-14 |      1 |
| 2019-11-21 |      4 |
| 2019-11-26 |      1 |
| 2019-11-27 |      1 |
| 2019-12-05 |      4 |
| 2019-12-06 |      2 |
| 2019-12-12 |      1 |
| 2019-12-14 |      1 |
| 2019-12-15 |      3 |
| 2019-12-16 |      1 |
| 2019-12-17 |      1 |
| 2019-12-18 |      1 |
| 2019-12-19 |      2 |
| 2019-12-20 |      2 |
| 2019-12-23 |      1 |
| 2019-12-27 |      1 |
| 2020-01-07 |      1 |
| 2020-01-08 |     14 |
| 2020-01-09 |      2 |
| 2020-01-12 |    372 |
| 2020-01-14 |      2 |
| 2020-01-15 |      1 |
| 2020-01-20 |     11 |
| 2020-01-21 | 119253 |
| 2020-01-23 |    113 |
| 2020-01-24 |      4 |
| 2020-01-25 |    536 |
| 2020-01-26 |   2132 |
| 2020-01-27 |    396 |
| 2020-01-28 |      1 |
| 2020-01-29 |      3 |
| 2020-01-30 |     11 |
| 2020-01-31 |     11 |
| 2020-02-03 |      2 |
| 2020-02-04 |      4 |
| 2020-02-05 |      5 |
| 2020-02-06 |      8 |
| 2020-02-10 |     32 |
| 2020-02-11 |     15 |
| 2020-02-12 |     14 |
| 2020-02-13 |      1 |
| 2020-02-14 |     92 |
+------------+--------+
109 rows in set (0.16 sec)
{code}

However most of the entries are {{HL_ACQUIRED_AT=NULL}} in {{HIVE_LOCKS}}:
{code}
mysql> SELECT COUNT(*) FROM HIVE_LOCKS WHERE HL_ACQUIRED_AT is null;
+----------+
| COUNT(*) |
+----------+
|   123437 |
+----------+
1 row in set (0.04 sec)
{code}

{code}
mysql> SELECT COUNT(*) FROM HIVE_LOCKS WHERE HL_ACQUIRED_AT is not null;
+----------+
| COUNT(*) |
+----------+
|       97 |
+----------+
1 row in set (0.04 sec)
{code}
 


> Locks entries are left over inside HIVE_LOCKS when using DbTxnManager
> ---------------------------------------------------------------------
>
>                 Key: HIVE-22911
>                 URL: https://issues.apache.org/jira/browse/HIVE-22911
>             Project: Hive
>          Issue Type: Bug
>          Components: Locking, Transactions
>         Environment: Hive-2.3.
>            Reporter: Oleksiy Sayankin
>            Assignee: Oleksiy Sayankin
>            Priority: Critical
>
> We found lots of orphan/old/leftover lock entries inside {{HIVE_LOCKS}}. 
> There are more than 120k locks in HIVE_LOCKS of MySQL database. We also 
> checked the top 3 tables which are related to the existing locks:
>  
> {code}
> mysql> select HL_DB,HL_TABLE, count(*) from HIVE_LOCKS group by 1,2 order by 
> 3 desc limit 10;
> +-----------+------------------------------+----------+
> | HL_DB | HL_TABLE | count(*) |
> +-----------+------------------------------+----------+
> | db1 | table1 | 66984 |
> | db1 | table2 | 33208 |
> | db1 | table3 | 9315 |
> …
> {code}
> For table “db1. table1”, here are 3 Hive sessions related, and each of the 
> Hive session is waiting for 22328 read locks. This is because this table 
> “db1. table1” is a huge partition table, and it has more than 200k child 
> partitions. I am guessing each of Hive session was trying to do a full table 
> scan on it. I group-by based on column {{HL_LAST_HEARTBEAT}} instead, here is 
> the list:
>  
> {code}
> mysql> select cast(FROM_UNIXTIME(HL_LAST_HEARTBEAT/1000) as date) as 
> dt,count(*) as cnt from HIVE_LOCKS
>     -> group by 1 order by 1;
> +------------+--------+
> | dt         | cnt    |
> +------------+--------+
> | 1969-12-31 |      2 |
> | 2019-05-20 |     10 |
> | 2019-05-21 |      3 |
> | 2019-05-23 |      5 |
> | 2019-05-24 |      2 |
> | 2019-05-25 |      1 |
> | 2019-05-29 |      7 |
> | 2019-05-30 |      2 |
> | 2019-06-11 |     13 |
> | 2019-06-28 |      3 |
> | 2019-07-02 |      2 |
> | 2019-07-04 |      5 |
> | 2019-07-09 |      1 |
> | 2019-07-15 |      2 |
> | 2019-07-16 |      1 |
> | 2019-07-18 |      2 |
> | 2019-07-20 |      3 |
> | 2019-07-29 |      5 |
> | 2019-07-30 |      9 |
> | 2019-07-31 |      7 |
> | 2019-08-02 |      2 |
> | 2019-08-06 |      5 |
> | 2019-08-07 |     17 |
> | 2019-08-08 |      8 |
> | 2019-08-09 |      5 |
> | 2019-08-21 |      1 |
> | 2019-08-22 |     20 |
> | 2019-08-23 |      1 |
> | 2019-08-26 |      5 |
> | 2019-08-27 |     98 |
> | 2019-08-28 |      3 |
> | 2019-08-29 |      1 |
> | 2019-09-02 |      3 |
> | 2019-09-04 |      3 |
> | 2019-09-05 |    105 |
> | 2019-09-06 |      3 |
> | 2019-09-07 |      2 |
> | 2019-09-09 |      6 |
> | 2019-09-12 |      9 |
> | 2019-09-13 |      1 |
> | 2019-09-17 |      1 |
> | 2019-09-24 |      3 |
> | 2019-09-26 |      6 |
> | 2019-09-27 |      4 |
> | 2019-09-30 |      1 |
> | 2019-10-01 |      2 |
> | 2019-10-03 |      9 |
> | 2019-10-04 |      2 |
> | 2019-10-06 |      1 |
> | 2019-10-08 |      1 |
> | 2019-10-09 |      1 |
> | 2019-10-10 |      6 |
> | 2019-10-11 |      1 |
> | 2019-10-16 |     13 |
> | 2019-10-17 |      1 |
> | 2019-10-18 |      2 |
> | 2019-10-19 |      2 |
> | 2019-10-21 |     10 |
> | 2019-10-22 |      6 |
> | 2019-10-28 |      2 |
> | 2019-10-29 |      4 |
> | 2019-10-30 |      2 |
> | 2019-10-31 |      2 |
> | 2019-11-05 |      2 |
> | 2019-11-06 |      2 |
> | 2019-11-11 |      1 |
> | 2019-11-13 |      1 |
> | 2019-11-14 |      1 |
> | 2019-11-21 |      4 |
> | 2019-11-26 |      1 |
> | 2019-11-27 |      1 |
> | 2019-12-05 |      4 |
> | 2019-12-06 |      2 |
> | 2019-12-12 |      1 |
> | 2019-12-14 |      1 |
> | 2019-12-15 |      3 |
> | 2019-12-16 |      1 |
> | 2019-12-17 |      1 |
> | 2019-12-18 |      1 |
> | 2019-12-19 |      2 |
> | 2019-12-20 |      2 |
> | 2019-12-23 |      1 |
> | 2019-12-27 |      1 |
> | 2020-01-07 |      1 |
> | 2020-01-08 |     14 |
> | 2020-01-09 |      2 |
> | 2020-01-12 |    372 |
> | 2020-01-14 |      2 |
> | 2020-01-15 |      1 |
> | 2020-01-20 |     11 |
> | 2020-01-21 | 119253 |
> | 2020-01-23 |    113 |
> | 2020-01-24 |      4 |
> | 2020-01-25 |    536 |
> | 2020-01-26 |   2132 |
> | 2020-01-27 |    396 |
> | 2020-01-28 |      1 |
> | 2020-01-29 |      3 |
> | 2020-01-30 |     11 |
> | 2020-01-31 |     11 |
> | 2020-02-03 |      2 |
> | 2020-02-04 |      4 |
> | 2020-02-05 |      5 |
> | 2020-02-06 |      8 |
> | 2020-02-10 |     32 |
> | 2020-02-11 |     15 |
> | 2020-02-12 |     14 |
> | 2020-02-13 |      1 |
> | 2020-02-14 |     92 |
> +------------+--------+
> 109 rows in set (0.16 sec)
> {code}
> However most of the entries have {{HL_ACQUIRED_AT=NULL}} in {{HIVE_LOCKS}}:
> {code}
> mysql> SELECT COUNT(*) FROM HIVE_LOCKS WHERE HL_ACQUIRED_AT is null;
> +----------+
> | COUNT(*) |
> +----------+
> |   123437 |
> +----------+
> 1 row in set (0.04 sec)
> {code}
> {code}
> mysql> SELECT COUNT(*) FROM HIVE_LOCKS WHERE HL_ACQUIRED_AT is not null;
> +----------+
> | COUNT(*) |
> +----------+
> |       97 |
> +----------+
> 1 row in set (0.04 sec)
> {code}
>  
> Hot-fix is to remove orphan/lost locks from {{HIVE_LOCKS}}, but this does not 
> solve the problem in the future.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

Reply via email to