[ https://issues.apache.org/jira/browse/HIVE-22911?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Oleksiy Sayankin updated HIVE-22911: ------------------------------------ Description: We found lots of orphan/old/leftover lock entries inside {{HIVE_LOCKS}}. There are more than 120k locks in HIVE_LOCKS of MySQL database. We also checked the top 3 tables which are related to the existing locks: {code} mysql> select HL_DB,HL_TABLE, count(*) from HIVE_LOCKS group by 1,2 order by 3 desc limit 10; +-----------+------------------------------+----------+ | HL_DB | HL_TABLE | count(*) | +-----------+------------------------------+----------+ | db1 | table1 | 66984 | | db1 | table2 | 33208 | | db1 | table3 | 9315 | … {code} For table “db1. table1”, here are 3 Hive sessions related, and each of the Hive session is waiting for 22328 read locks. This is because this table “db1. table1” is a huge partition table, and it has more than 200k child partitions. I am guessing each of Hive session was trying to do a full table scan on it. I group-by based on column {{HL_LAST_HEARTBEAT}} instead, here is the list: {code} mysql> select cast(FROM_UNIXTIME(HL_LAST_HEARTBEAT/1000) as date) as dt,count(*) as cnt from HIVE_LOCKS -> group by 1 order by 1; +------------+--------+ | dt | cnt | +------------+--------+ | 1969-12-31 | 2 | | 2019-05-20 | 10 | | 2019-05-21 | 3 | | 2019-05-23 | 5 | | 2019-05-24 | 2 | | 2019-05-25 | 1 | | 2019-05-29 | 7 | | 2019-05-30 | 2 | | 2019-06-11 | 13 | | 2019-06-28 | 3 | | 2019-07-02 | 2 | | 2019-07-04 | 5 | | 2019-07-09 | 1 | | 2019-07-15 | 2 | | 2019-07-16 | 1 | | 2019-07-18 | 2 | | 2019-07-20 | 3 | | 2019-07-29 | 5 | | 2019-07-30 | 9 | | 2019-07-31 | 7 | | 2019-08-02 | 2 | | 2019-08-06 | 5 | | 2019-08-07 | 17 | | 2019-08-08 | 8 | | 2019-08-09 | 5 | | 2019-08-21 | 1 | | 2019-08-22 | 20 | | 2019-08-23 | 1 | | 2019-08-26 | 5 | | 2019-08-27 | 98 | | 2019-08-28 | 3 | | 2019-08-29 | 1 | | 2019-09-02 | 3 | | 2019-09-04 | 3 | | 2019-09-05 | 105 | | 2019-09-06 | 3 | | 2019-09-07 | 2 | | 2019-09-09 | 6 | | 2019-09-12 | 9 | | 2019-09-13 | 1 | | 2019-09-17 | 1 | | 2019-09-24 | 3 | | 2019-09-26 | 6 | | 2019-09-27 | 4 | | 2019-09-30 | 1 | | 2019-10-01 | 2 | | 2019-10-03 | 9 | | 2019-10-04 | 2 | | 2019-10-06 | 1 | | 2019-10-08 | 1 | | 2019-10-09 | 1 | | 2019-10-10 | 6 | | 2019-10-11 | 1 | | 2019-10-16 | 13 | | 2019-10-17 | 1 | | 2019-10-18 | 2 | | 2019-10-19 | 2 | | 2019-10-21 | 10 | | 2019-10-22 | 6 | | 2019-10-28 | 2 | | 2019-10-29 | 4 | | 2019-10-30 | 2 | | 2019-10-31 | 2 | | 2019-11-05 | 2 | | 2019-11-06 | 2 | | 2019-11-11 | 1 | | 2019-11-13 | 1 | | 2019-11-14 | 1 | | 2019-11-21 | 4 | | 2019-11-26 | 1 | | 2019-11-27 | 1 | | 2019-12-05 | 4 | | 2019-12-06 | 2 | | 2019-12-12 | 1 | | 2019-12-14 | 1 | | 2019-12-15 | 3 | | 2019-12-16 | 1 | | 2019-12-17 | 1 | | 2019-12-18 | 1 | | 2019-12-19 | 2 | | 2019-12-20 | 2 | | 2019-12-23 | 1 | | 2019-12-27 | 1 | | 2020-01-07 | 1 | | 2020-01-08 | 14 | | 2020-01-09 | 2 | | 2020-01-12 | 372 | | 2020-01-14 | 2 | | 2020-01-15 | 1 | | 2020-01-20 | 11 | | 2020-01-21 | 119253 | | 2020-01-23 | 113 | | 2020-01-24 | 4 | | 2020-01-25 | 536 | | 2020-01-26 | 2132 | | 2020-01-27 | 396 | | 2020-01-28 | 1 | | 2020-01-29 | 3 | | 2020-01-30 | 11 | | 2020-01-31 | 11 | | 2020-02-03 | 2 | | 2020-02-04 | 4 | | 2020-02-05 | 5 | | 2020-02-06 | 8 | | 2020-02-10 | 32 | | 2020-02-11 | 15 | | 2020-02-12 | 14 | | 2020-02-13 | 1 | | 2020-02-14 | 92 | +------------+--------+ 109 rows in set (0.16 sec) {code} However most of the entries have {{HL_ACQUIRED_AT=NULL}} in {{HIVE_LOCKS}}: {code} mysql> SELECT COUNT(*) FROM HIVE_LOCKS WHERE HL_ACQUIRED_AT is null; +----------+ | COUNT(*) | +----------+ | 123437 | +----------+ 1 row in set (0.04 sec) {code} {code} mysql> SELECT COUNT(*) FROM HIVE_LOCKS WHERE HL_ACQUIRED_AT is not null; +----------+ | COUNT(*) | +----------+ | 97 | +----------+ 1 row in set (0.04 sec) {code} Hot-fix is to remove orphan/lost locks from {{HIVE_LOCKS}}, but this does not solve the problem in the future. was: We found lots of orphan/old/leftover lock entries inside {{HIVE_LOCKS}}. There are more than 120k locks in HIVE_LOCKS of MySQL database. We also checked the top 3 tables which are related to the existing locks: {code} mysql> select HL_DB,HL_TABLE, count(*) from HIVE_LOCKS group by 1,2 order by 3 desc limit 10; +-----------+------------------------------+----------+ | HL_DB | HL_TABLE | count(*) | +-----------+------------------------------+----------+ | db1 | table1 | 66984 | | db1 | table2 | 33208 | | db1 | table3 | 9315 | … {code} For table “db1. table1”, here are 3 Hive sessions related, and each of the Hive session is waiting for 22328 read locks. This is because this table “db1. table1” is a huge partition table, and it has more than 200k child partitions. I am guessing each of Hive session was trying to do a full table scan on it. I group-by based on column {{HL_LAST_HEARTBEAT}} instead, here is the list: {code} mysql> select cast(FROM_UNIXTIME(HL_LAST_HEARTBEAT/1000) as date) as dt,count(*) as cnt from HIVE_LOCKS -> group by 1 order by 1; +------------+--------+ | dt | cnt | +------------+--------+ | 1969-12-31 | 2 | | 2019-05-20 | 10 | | 2019-05-21 | 3 | | 2019-05-23 | 5 | | 2019-05-24 | 2 | | 2019-05-25 | 1 | | 2019-05-29 | 7 | | 2019-05-30 | 2 | | 2019-06-11 | 13 | | 2019-06-28 | 3 | | 2019-07-02 | 2 | | 2019-07-04 | 5 | | 2019-07-09 | 1 | | 2019-07-15 | 2 | | 2019-07-16 | 1 | | 2019-07-18 | 2 | | 2019-07-20 | 3 | | 2019-07-29 | 5 | | 2019-07-30 | 9 | | 2019-07-31 | 7 | | 2019-08-02 | 2 | | 2019-08-06 | 5 | | 2019-08-07 | 17 | | 2019-08-08 | 8 | | 2019-08-09 | 5 | | 2019-08-21 | 1 | | 2019-08-22 | 20 | | 2019-08-23 | 1 | | 2019-08-26 | 5 | | 2019-08-27 | 98 | | 2019-08-28 | 3 | | 2019-08-29 | 1 | | 2019-09-02 | 3 | | 2019-09-04 | 3 | | 2019-09-05 | 105 | | 2019-09-06 | 3 | | 2019-09-07 | 2 | | 2019-09-09 | 6 | | 2019-09-12 | 9 | | 2019-09-13 | 1 | | 2019-09-17 | 1 | | 2019-09-24 | 3 | | 2019-09-26 | 6 | | 2019-09-27 | 4 | | 2019-09-30 | 1 | | 2019-10-01 | 2 | | 2019-10-03 | 9 | | 2019-10-04 | 2 | | 2019-10-06 | 1 | | 2019-10-08 | 1 | | 2019-10-09 | 1 | | 2019-10-10 | 6 | | 2019-10-11 | 1 | | 2019-10-16 | 13 | | 2019-10-17 | 1 | | 2019-10-18 | 2 | | 2019-10-19 | 2 | | 2019-10-21 | 10 | | 2019-10-22 | 6 | | 2019-10-28 | 2 | | 2019-10-29 | 4 | | 2019-10-30 | 2 | | 2019-10-31 | 2 | | 2019-11-05 | 2 | | 2019-11-06 | 2 | | 2019-11-11 | 1 | | 2019-11-13 | 1 | | 2019-11-14 | 1 | | 2019-11-21 | 4 | | 2019-11-26 | 1 | | 2019-11-27 | 1 | | 2019-12-05 | 4 | | 2019-12-06 | 2 | | 2019-12-12 | 1 | | 2019-12-14 | 1 | | 2019-12-15 | 3 | | 2019-12-16 | 1 | | 2019-12-17 | 1 | | 2019-12-18 | 1 | | 2019-12-19 | 2 | | 2019-12-20 | 2 | | 2019-12-23 | 1 | | 2019-12-27 | 1 | | 2020-01-07 | 1 | | 2020-01-08 | 14 | | 2020-01-09 | 2 | | 2020-01-12 | 372 | | 2020-01-14 | 2 | | 2020-01-15 | 1 | | 2020-01-20 | 11 | | 2020-01-21 | 119253 | | 2020-01-23 | 113 | | 2020-01-24 | 4 | | 2020-01-25 | 536 | | 2020-01-26 | 2132 | | 2020-01-27 | 396 | | 2020-01-28 | 1 | | 2020-01-29 | 3 | | 2020-01-30 | 11 | | 2020-01-31 | 11 | | 2020-02-03 | 2 | | 2020-02-04 | 4 | | 2020-02-05 | 5 | | 2020-02-06 | 8 | | 2020-02-10 | 32 | | 2020-02-11 | 15 | | 2020-02-12 | 14 | | 2020-02-13 | 1 | | 2020-02-14 | 92 | +------------+--------+ 109 rows in set (0.16 sec) {code} However most of the entries are {{HL_ACQUIRED_AT=NULL}} in {{HIVE_LOCKS}}: {code} mysql> SELECT COUNT(*) FROM HIVE_LOCKS WHERE HL_ACQUIRED_AT is null; +----------+ | COUNT(*) | +----------+ | 123437 | +----------+ 1 row in set (0.04 sec) {code} {code} mysql> SELECT COUNT(*) FROM HIVE_LOCKS WHERE HL_ACQUIRED_AT is not null; +----------+ | COUNT(*) | +----------+ | 97 | +----------+ 1 row in set (0.04 sec) {code} > Locks entries are left over inside HIVE_LOCKS when using DbTxnManager > --------------------------------------------------------------------- > > Key: HIVE-22911 > URL: https://issues.apache.org/jira/browse/HIVE-22911 > Project: Hive > Issue Type: Bug > Components: Locking, Transactions > Environment: Hive-2.3. > Reporter: Oleksiy Sayankin > Assignee: Oleksiy Sayankin > Priority: Critical > > We found lots of orphan/old/leftover lock entries inside {{HIVE_LOCKS}}. > There are more than 120k locks in HIVE_LOCKS of MySQL database. We also > checked the top 3 tables which are related to the existing locks: > > {code} > mysql> select HL_DB,HL_TABLE, count(*) from HIVE_LOCKS group by 1,2 order by > 3 desc limit 10; > +-----------+------------------------------+----------+ > | HL_DB | HL_TABLE | count(*) | > +-----------+------------------------------+----------+ > | db1 | table1 | 66984 | > | db1 | table2 | 33208 | > | db1 | table3 | 9315 | > … > {code} > For table “db1. table1”, here are 3 Hive sessions related, and each of the > Hive session is waiting for 22328 read locks. This is because this table > “db1. table1” is a huge partition table, and it has more than 200k child > partitions. I am guessing each of Hive session was trying to do a full table > scan on it. I group-by based on column {{HL_LAST_HEARTBEAT}} instead, here is > the list: > > {code} > mysql> select cast(FROM_UNIXTIME(HL_LAST_HEARTBEAT/1000) as date) as > dt,count(*) as cnt from HIVE_LOCKS > -> group by 1 order by 1; > +------------+--------+ > | dt | cnt | > +------------+--------+ > | 1969-12-31 | 2 | > | 2019-05-20 | 10 | > | 2019-05-21 | 3 | > | 2019-05-23 | 5 | > | 2019-05-24 | 2 | > | 2019-05-25 | 1 | > | 2019-05-29 | 7 | > | 2019-05-30 | 2 | > | 2019-06-11 | 13 | > | 2019-06-28 | 3 | > | 2019-07-02 | 2 | > | 2019-07-04 | 5 | > | 2019-07-09 | 1 | > | 2019-07-15 | 2 | > | 2019-07-16 | 1 | > | 2019-07-18 | 2 | > | 2019-07-20 | 3 | > | 2019-07-29 | 5 | > | 2019-07-30 | 9 | > | 2019-07-31 | 7 | > | 2019-08-02 | 2 | > | 2019-08-06 | 5 | > | 2019-08-07 | 17 | > | 2019-08-08 | 8 | > | 2019-08-09 | 5 | > | 2019-08-21 | 1 | > | 2019-08-22 | 20 | > | 2019-08-23 | 1 | > | 2019-08-26 | 5 | > | 2019-08-27 | 98 | > | 2019-08-28 | 3 | > | 2019-08-29 | 1 | > | 2019-09-02 | 3 | > | 2019-09-04 | 3 | > | 2019-09-05 | 105 | > | 2019-09-06 | 3 | > | 2019-09-07 | 2 | > | 2019-09-09 | 6 | > | 2019-09-12 | 9 | > | 2019-09-13 | 1 | > | 2019-09-17 | 1 | > | 2019-09-24 | 3 | > | 2019-09-26 | 6 | > | 2019-09-27 | 4 | > | 2019-09-30 | 1 | > | 2019-10-01 | 2 | > | 2019-10-03 | 9 | > | 2019-10-04 | 2 | > | 2019-10-06 | 1 | > | 2019-10-08 | 1 | > | 2019-10-09 | 1 | > | 2019-10-10 | 6 | > | 2019-10-11 | 1 | > | 2019-10-16 | 13 | > | 2019-10-17 | 1 | > | 2019-10-18 | 2 | > | 2019-10-19 | 2 | > | 2019-10-21 | 10 | > | 2019-10-22 | 6 | > | 2019-10-28 | 2 | > | 2019-10-29 | 4 | > | 2019-10-30 | 2 | > | 2019-10-31 | 2 | > | 2019-11-05 | 2 | > | 2019-11-06 | 2 | > | 2019-11-11 | 1 | > | 2019-11-13 | 1 | > | 2019-11-14 | 1 | > | 2019-11-21 | 4 | > | 2019-11-26 | 1 | > | 2019-11-27 | 1 | > | 2019-12-05 | 4 | > | 2019-12-06 | 2 | > | 2019-12-12 | 1 | > | 2019-12-14 | 1 | > | 2019-12-15 | 3 | > | 2019-12-16 | 1 | > | 2019-12-17 | 1 | > | 2019-12-18 | 1 | > | 2019-12-19 | 2 | > | 2019-12-20 | 2 | > | 2019-12-23 | 1 | > | 2019-12-27 | 1 | > | 2020-01-07 | 1 | > | 2020-01-08 | 14 | > | 2020-01-09 | 2 | > | 2020-01-12 | 372 | > | 2020-01-14 | 2 | > | 2020-01-15 | 1 | > | 2020-01-20 | 11 | > | 2020-01-21 | 119253 | > | 2020-01-23 | 113 | > | 2020-01-24 | 4 | > | 2020-01-25 | 536 | > | 2020-01-26 | 2132 | > | 2020-01-27 | 396 | > | 2020-01-28 | 1 | > | 2020-01-29 | 3 | > | 2020-01-30 | 11 | > | 2020-01-31 | 11 | > | 2020-02-03 | 2 | > | 2020-02-04 | 4 | > | 2020-02-05 | 5 | > | 2020-02-06 | 8 | > | 2020-02-10 | 32 | > | 2020-02-11 | 15 | > | 2020-02-12 | 14 | > | 2020-02-13 | 1 | > | 2020-02-14 | 92 | > +------------+--------+ > 109 rows in set (0.16 sec) > {code} > However most of the entries have {{HL_ACQUIRED_AT=NULL}} in {{HIVE_LOCKS}}: > {code} > mysql> SELECT COUNT(*) FROM HIVE_LOCKS WHERE HL_ACQUIRED_AT is null; > +----------+ > | COUNT(*) | > +----------+ > | 123437 | > +----------+ > 1 row in set (0.04 sec) > {code} > {code} > mysql> SELECT COUNT(*) FROM HIVE_LOCKS WHERE HL_ACQUIRED_AT is not null; > +----------+ > | COUNT(*) | > +----------+ > | 97 | > +----------+ > 1 row in set (0.04 sec) > {code} > > Hot-fix is to remove orphan/lost locks from {{HIVE_LOCKS}}, but this does not > solve the problem in the future. -- This message was sent by Atlassian Jira (v8.3.4#803005)