[jira] [Comment Edited] (HIVE-21354) Lock The Entire Table If Majority Of Partitions Are Locked
[ https://issues.apache.org/jira/browse/HIVE-21354?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17083169#comment-17083169 ]

Peter Vary edited comment on HIVE-21354 at 4/14/20, 12:39 PM:
--------------------------------------------------------------

[~belugabehr]: I fear what we have found here is a hornet's nest. Consider:

{code}
0: jdbc:hive2://localhost:10003> explain locks insert into acid_part select * from acid_part where j=1;
+--------------------------------------+
|               Explain                |
+--------------------------------------+
| LOCK INFORMATION:                    |
| default.acid_part -> SHARED_READ     |
| default.acid_part.j=1 -> SHARED_READ |
| default.acid_part -> SHARED_READ     |
+--------------------------------------+
{code}

The "first" table-level lock is not needed (its source is the read, which reads only the {{j=1}} partition), but the "second" table-level lock is needed (its source is the dynamic-partitioning write) :)

So I would create another jira to rationalize which locks are needed and which are not. In this jira we should concentrate on the final filtering / escalation of the locks, with the steps you already suggested:
* Remove partition locks if we have a table-level lock of the same type.
* Remove partition locks and replace them with a table-level lock of the same type if the number of locks of this type is higher than the configured ({{hive.lock.escalation.num}}) value.
* Also, we should remove duplicate locks at the end.

Your thoughts, [~belugabehr], [~dkuzmenko]?

Thanks,
Peter

was (Author: pvary):
[~belugabehr]: I fear what we have found here is a hornet's nest. Consider:

{code}
0: jdbc:hive2://localhost:10003> explain locks insert into acid_part select * from acid_part where j=1;
+--------------------------------------+
|               Explain                |
+--------------------------------------+
| LOCK INFORMATION:                    |
| default.acid_part -> SHARED_READ     |
| default.acid_part.j=1 -> SHARED_READ |
| default.acid_part -> SHARED_READ     |
+--------------------------------------+
{code}

The "first" table-level lock is not needed (its source is the read, which reads only the {{j=1}} partition), but the "second" table-level lock is needed (its source is the dynamic-partitioning write) :)

So I would create another jira to rationalize which locks are needed and which are not. In this jira we should concentrate on the final filtering / escalation of the locks, with the steps you already suggested:
* Remove partition locks if we have a table-level lock of the same type.
* Remove partition locks and replace them with a table-level lock of the same type if the number of locks of this type is higher than the configured ({{hive.lock.escalation.num}}) value.

Your thoughts, [~belugabehr], [~dkuzmenko]?

Thanks,
Peter

> Lock The Entire Table If Majority Of Partitions Are Locked
> ----------------------------------------------------------
>
>                 Key: HIVE-21354
>                 URL: https://issues.apache.org/jira/browse/HIVE-21354
>             Project: Hive
>          Issue Type: Improvement
>          Components: HiveServer2
>    Affects Versions: 4.0.0, 3.2.0
>            Reporter: David Mollitor
>            Assignee: David Mollitor
>            Priority: Major
>
> One of the bottlenecks of any Hive query is the ZooKeeper locking mechanism. When a Hive query interacts with a table which has a lot of partitions, this may put a lot of stress on the ZK system.
> Please add a heuristic that works like this:
> # Count the number of partitions that a query is required to lock
> # Obtain the total number of partitions in the table
> # If the number of partitions accessed by the query is greater than or equal to half the total number of partitions, simply create one ZNode lock at the table level.
> This would improve the performance of many queries, but in particular a {{select count(1) from table}} ... or ... {{select * from table limit 5}} where the table has many partitions.

--
This message was sent by Atlassian Jira
(v8.3.4#803005)
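The three filtering / escalation steps listed in the comment above can be sketched as a small standalone function. This is a hypothetical illustration, not Hive's actual code: locks are modeled as plain `(table, partition, type)` tuples, partition `None` meaning a table-level lock, and the `hive.lock.escalation.num` threshold is passed in as a plain integer.

```python
from collections import Counter

# Hypothetical sketch of the lock filtering / escalation steps discussed
# above; not Hive's actual implementation. A lock is a tuple
# (table, partition-or-None, lock_type); partition None = table-level lock.

def filter_and_escalate(locks, escalation_num):
    # Step: drop exact duplicate locks while preserving order.
    seen, deduped = set(), []
    for lock in locks:
        if lock not in seen:
            seen.add(lock)
            deduped.append(lock)

    # Step: drop partition locks already covered by a table-level lock
    # of the same type.
    table_locks = {(t, typ) for (t, p, typ) in deduped if p is None}
    kept = [(t, p, typ) for (t, p, typ) in deduped
            if p is None or (t, typ) not in table_locks]

    # Step: if a table accumulates more than `escalation_num` partition
    # locks of one type, replace them all with one table-level lock.
    counts = Counter((t, typ) for (t, p, typ) in kept if p is not None)
    result = []
    for t, p, typ in kept:
        if p is not None and counts[(t, typ)] > escalation_num:
            escalated = (t, None, typ)
            if escalated not in result:
                result.append(escalated)
        else:
            result.append((t, p, typ))
    return result

# The EXPLAIN LOCKS output above: two table locks and one partition lock,
# all SHARED_READ, collapse to a single table-level SHARED_READ lock.
locks = [
    ("default.acid_part", None, "SHARED_READ"),
    ("default.acid_part", "j=1", "SHARED_READ"),
    ("default.acid_part", None, "SHARED_READ"),  # duplicate table lock
]
print(filter_and_escalate(locks, 10))
# → [('default.acid_part', None, 'SHARED_READ')]
```

Note the ordering choice in this sketch: deduplication runs first so that duplicate table locks do not inflate the escalation count, whereas the comment lists it last; either order yields the same final set here.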
[jira] [Comment Edited] (HIVE-21354) Lock The Entire Table If Majority Of Partitions Are Locked
[ https://issues.apache.org/jira/browse/HIVE-21354?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17083169#comment-17083169 ]

Peter Vary edited comment on HIVE-21354 at 4/14/20, 12:38 PM:
--------------------------------------------------------------

[~belugabehr]: I fear what we have found here is a hornet's nest. Consider:

{code}
0: jdbc:hive2://localhost:10003> explain locks insert into acid_part select * from acid_part where j=1;
+--------------------------------------+
|               Explain                |
+--------------------------------------+
| LOCK INFORMATION:                    |
| default.acid_part -> SHARED_READ     |
| default.acid_part.j=1 -> SHARED_READ |
| default.acid_part -> SHARED_READ     |
+--------------------------------------+
{code}

The "first" table-level lock is not needed (its source is the read, which reads only the {{j=1}} partition), but the "second" table-level lock is needed (its source is the dynamic-partitioning write) :)

So I would create another jira to rationalize which locks are needed and which are not. In this jira we should concentrate on the final filtering / escalation of the locks, with the steps you already suggested:
* Remove partition locks if we have a table-level lock of the same type.
* Remove partition locks and replace them with a table-level lock of the same type if the number of locks of this type is higher than the configured ({{hive.lock.escalation.num}}) value.

Your thoughts, [~belugabehr], [~dkuzmenko]?

Thanks,
Peter

was (Author: pvary):
[~belugabehr]: I fear what we found here is a hornet's nest. Consider:

{code}
0: jdbc:hive2://localhost:10003> explain locks insert into acid_part select * from acid_part where j=1;
+--------------------------------------+
|               Explain                |
+--------------------------------------+
| LOCK INFORMATION:                    |
| default.acid_part -> SHARED_READ     |
| default.acid_part.j=1 -> SHARED_READ |
| default.acid_part -> SHARED_READ     |
+--------------------------------------+
{code}

The "first" table-level lock is not needed (its source is the read, which reads only the {{j=1}} partition), but the "second" table-level lock is needed (its source is the dynamic-partitioning write) :)

So I would create another jira to rationalize which locks are needed and which are not. In this jira we should concentrate on the final filtering / escalation of the locks, with the steps you already suggested:
* Remove partition locks if we have a table-level lock of the same type.
* Remove partition locks and replace them with a table-level lock of the same type if the number of locks of this type is higher than the configured ({{hive.lock.escalation.num}}) value.

Your thoughts, [~belugabehr], [~dkuzmenko]?

Thanks,
Peter
[jira] [Comment Edited] (HIVE-21354) Lock The Entire Table If Majority Of Partitions Are Locked
[ https://issues.apache.org/jira/browse/HIVE-21354?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17082966#comment-17082966 ]

Denys Kuzmenko edited comment on HIVE-21354 at 4/14/20, 8:17 AM:
-----------------------------------------------------------------

[~belugabehr], [~pvary], checkLock searches locks hierarchically, so if you have conflicting locks on any level (db/table/partition) it's gonna back off and try later. The exception: EXCLUSIVE on a table/partition shouldn't prevent SHARED_READ on a database.

was (Author: dkuzmenko):
[~belugabehr], [~pvary], checkLock searches locks hierarchically, so if you have conflicting locks on any level (db/table/partition) it's gonna back off and try later. The exceptions: EXCLUSIVE on a table shouldn't prevent SHARED_READ on a database; similarly, EXCLUSIVE on a partition should not conflict with SHARED_READ on a database.
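The hierarchical conflict check described above can be illustrated with a minimal sketch. This is a simplified model, not the actual metastore checkLock code: lock paths are tuples like `(db,)`, `(db, table)`, `(db, table, partition)`, a request conflicts with held locks on any ancestor or descendant path, and the one carve-out is the exception Denys mentions (EXCLUSIVE below the database level does not block SHARED_READ on the database itself).

```python
# Simplified illustration of hierarchical lock-conflict checking as
# described above; not Hive's actual checkLock implementation.

SHARED_READ, SHARED_WRITE, EXCLUSIVE = "SHARED_READ", "SHARED_WRITE", "EXCLUSIVE"

def compatible(held, wanted):
    # Minimal compatibility rule: only SHARED_READ vs SHARED_READ coexist.
    return held == SHARED_READ and wanted == SHARED_READ

def can_acquire(existing, request):
    """existing: list of (path, type) held locks; request: (path, type).
    Returns False when the requester should back off and retry later."""
    req_path, req_type = request
    for held_path, held_type in existing:
        # Related = one path is a prefix of the other (ancestor/descendant).
        related = (held_path[:len(req_path)] == req_path or
                   req_path[:len(held_path)] == held_path)
        if not related:
            continue
        # Exception: EXCLUSIVE on a table/partition does not prevent
        # SHARED_READ on the database itself.
        if (len(req_path) == 1 and req_type == SHARED_READ and
                len(held_path) > 1 and held_type == EXCLUSIVE):
            continue
        if not compatible(held_type, req_type):
            return False
    return True

held = [(("default", "web_logs", "date=2015-11-18"), EXCLUSIVE)]
print(can_acquire(held, (("default",), SHARED_READ)))              # db read passes
print(can_acquire(held, (("default", "web_logs"), SHARED_READ)))   # table read backs off
```

The unrelated-path short-circuit is what makes the check hierarchical: a lock on `default.other_table` never interferes with `default.web_logs`, while locks up and down the same db/table/partition chain always get compared.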
[jira] [Comment Edited] (HIVE-21354) Lock The Entire Table If Majority Of Partitions Are Locked
[ https://issues.apache.org/jira/browse/HIVE-21354?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17082966#comment-17082966 ]

Denys Kuzmenko edited comment on HIVE-21354 at 4/14/20, 8:15 AM:
-----------------------------------------------------------------

[~belugabehr], [~pvary], checkLock searches locks hierarchically, so if you have conflicting locks on any level (db/table/partition) it's gonna back off and try later. The exceptions: EXCLUSIVE on a table shouldn't prevent SHARED_READ on a database; similarly, EXCLUSIVE on a partition should not conflict with SHARED_READ on a database.

was (Author: dkuzmenko):
[~belugabehr], [~pvary], checkLock searches locks hierarchically, so if you have conflicting locks on any level (table/partition) it's gonna back off and try later.
[jira] [Comment Edited] (HIVE-21354) Lock The Entire Table If Majority Of Partitions Are Locked
[ https://issues.apache.org/jira/browse/HIVE-21354?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17082966#comment-17082966 ]

Denys Kuzmenko edited comment on HIVE-21354 at 4/14/20, 7:57 AM:
-----------------------------------------------------------------

[~belugabehr], [~pvary], checkLock searches locks hierarchically, so if you have conflicting locks on any level (table/partition) it's gonna back off and try later.

was (Author: dkuzmenko):
[~belugabehr], [~pvary], checkLock searches locks hierarchically, so if you have conflicting locks on any level it's gonna back off and try later.
[jira] [Comment Edited] (HIVE-21354) Lock The Entire Table If Majority Of Partitions Are Locked
[ https://issues.apache.org/jira/browse/HIVE-21354?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17082501#comment-17082501 ]

David Mollitor edited comment on HIVE-21354 at 4/13/20, 5:18 PM:
-----------------------------------------------------------------

[~pvary] I do not think that Hive has any logic that says "if a partition of a table is locked, then the table is locked." I think it does this in a simple way... it comes up with a list of all the required locks; the first one is always the table lock, and the rest are the required partitions. That is to say, it takes an explicit lock on the table... there is no logic for an implicit table lock:

{code:none}
EXPLAIN LOCKS SELECT * FROM web_logs;

LOCK INFORMATION:
default.web_logs -> SHARED_READ
default.web_logs.date=2015-11-18 -> SHARED_READ
default.web_logs.date=2015-11-19 -> SHARED_READ
default.web_logs.date=2015-11-20 -> SHARED_READ
default.web_logs.date=2015-11-21 -> SHARED_READ
{code}

What I would expect is that, for an INSERT into a specific partition or a TRUNCATE partition statement, those queries would take a SHARED_READ lock at the table level and an EXCLUSIVE lock on the specific partitions.

was (Author: belugabehr):
[~pvary] I do not think that Hive has any logic that says "if a partition of a table is locked, then the table is locked." I think it does this in a simple way... it comes up with a list of all the required locks; the first one is always the table lock, and the rest are the required partitions. That is to say, it takes an explicit lock on the table... there is no logic for an implicit table lock:

{code:none}
EXPLAIN LOCKS SELECT * FROM web_logs;

LOCK INFORMATION:
default.web_logs -> SHARED_READ
default.web_logs.date=2015-11-18 -> SHARED_READ
default.web_logs.date=2015-11-19 -> SHARED_READ
default.web_logs.date=2015-11-20 -> SHARED_READ
default.web_logs.date=2015-11-21 -> SHARED_READ
{code}
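The lock-list shape described above (an explicit table-level lock first, followed by one lock per required partition) can be reproduced with a tiny sketch. This is a hypothetical helper for illustration only, not Hive code; the string format mirrors the EXPLAIN LOCKS output shown above.

```python
# Hypothetical sketch of the lock list a partitioned read produces, as
# described above: the explicit table lock always comes first, then one
# entry per partition the query reads. Not Hive's actual code.

def read_lock_list(db, table, partitions):
    locks = [f"{db}.{table} -> SHARED_READ"]
    locks += [f"{db}.{table}.{p} -> SHARED_READ" for p in partitions]
    return locks

for line in read_lock_list("default", "web_logs",
                           ["date=2015-11-18", "date=2015-11-19"]):
    print(line)
# → default.web_logs -> SHARED_READ
# → default.web_logs.date=2015-11-18 -> SHARED_READ
# → default.web_logs.date=2015-11-19 -> SHARED_READ
```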
[jira] [Comment Edited] (HIVE-21354) Lock The Entire Table If Majority Of Partitions Are Locked
[ https://issues.apache.org/jira/browse/HIVE-21354?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17082501#comment-17082501 ]

David Mollitor edited comment on HIVE-21354 at 4/13/20, 5:16 PM:
-----------------------------------------------------------------

[~pvary] I do not think that Hive has any logic that says "if a partition of a table is locked, then the table is locked." I think it does this in a simple way... it comes up with a list of all the required locks; the first one is always the table lock, and the rest are the required partitions. That is to say, it takes an explicit lock on the table... there is no logic for an implicit table lock:

{code:none}
EXPLAIN LOCKS SELECT * FROM web_logs;

LOCK INFORMATION:
default.web_logs -> SHARED_READ
default.web_logs.date=2015-11-18 -> SHARED_READ
default.web_logs.date=2015-11-19 -> SHARED_READ
default.web_logs.date=2015-11-20 -> SHARED_READ
default.web_logs.date=2015-11-21 -> SHARED_READ
{code}

was (Author: belugabehr):
[~pvary] I do not think that Hive has any logic that says "if a partition of a table is locked, then the table is locked." I think it does this in a simple way... it comes up with a list of all the required locks; the first one is always the table lock, and the rest are the required partitions. That is to say, it takes an explicit lock on the table... there is no logic for an implicit table lock:

{code:none}
EXPLAIN LOCKS SELECT * FROM web_logs;

LOCK INFORMATION:
default.web_logs -> SHARED_READ
default.web_logs.date=2015-11-18 -> SHARED_READ
default.web_logs.date=2015-11-19 -> SHARED_READ
default.web_logs.date=2015-11-20 -> SHARED_READ
default.web_logs.date=2015-11-21 -> SHARED_READ
{code}
[jira] [Comment Edited] (HIVE-21354) Lock The Entire Table If Majority Of Partitions Are Locked
[ https://issues.apache.org/jira/browse/HIVE-21354?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17082501#comment-17082501 ]

David Mollitor edited comment on HIVE-21354 at 4/13/20, 5:15 PM:
-----------------------------------------------------------------

[~pvary] I do not think that Hive has any logic that says "if a partition of a table is locked, then the table is locked." I think it does this in a simple way... it comes up with a list of all the required locks; the first one is always the table lock, and the rest are the required partitions. That is to say, it takes an explicit lock on the table... there is no logic for an implicit table lock:

{code:none}
EXPLAIN LOCKS SELECT * FROM web_logs;

LOCK INFORMATION:
default.web_logs -> SHARED_READ
default.web_logs.date=2015-11-18 -> SHARED_READ
default.web_logs.date=2015-11-19 -> SHARED_READ
default.web_logs.date=2015-11-20 -> SHARED_READ
default.web_logs.date=2015-11-21 -> SHARED_READ
{code}

was (Author: belugabehr):
[~pvary] I do not think that Hive has any logic that says "if a partition of a table is locked, then the table is locked." I think it does this in a simple way... it comes up with a list of all the required locks; the first one is always the table lock, and the rest are the required partitions. That is to say, it takes an explicit lock on the table... there is no logic for an implicit table lock:

{code:none}
EXPLAIN LOCKS SELECT * FROM web_logs;

LOCK INFORMATION:
default.web_logs -> SHARED_READ
default.web_logs.date=2015-11-18 -> SHARED_READ
default.web_logs.date=2015-11-19 -> SHARED_READ
default.web_logs.date=2015-11-20 -> SHARED_READ
default.web_logs.date=2015-11-21 -> SHARED_READ
{code}
[jira] [Comment Edited] (HIVE-21354) Lock The Entire Table If Majority Of Partitions Are Locked
[ https://issues.apache.org/jira/browse/HIVE-21354?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17080512#comment-17080512 ]

David Mollitor edited comment on HIVE-21354 at 4/10/20, 3:43 PM:
-----------------------------------------------------------------

[~pvary] Since the queries are always taking the table lock... why do they also bother to take the partition locks?

was (Author: belugabehr):
[~pvary] Since the queries are always taking the table lock... why do they also take the partition locks?
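The majority-of-partitions heuristic proposed in this issue (count the partitions a query must lock, compare against the table's total, and take a single table-level ZNode lock when the query touches at least half of them) can be sketched as follows. This is a hypothetical illustration of the proposal, not an implementation inside Hive; locks are modeled as `(table, partition-or-None, type)` tuples.

```python
# Hypothetical sketch of the heuristic proposed in HIVE-21354: take one
# table-level ZNode lock when a query touches at least half of the
# table's partitions. Not Hive's actual implementation.

def plan_locks(table, requested_partitions, total_partitions, lock_type):
    # Step 1: the caller supplies the partitions the query must lock.
    # Step 2: the caller supplies the table's total partition count.
    # Step 3: escalate when the query covers at least half the table.
    if total_partitions > 0 and len(requested_partitions) >= total_partitions / 2:
        return [(table, None, lock_type)]  # single table-level lock
    return [(table, p, lock_type) for p in requested_partitions]

# A full-table scan over a 1000-partition table would create 1 znode
# instead of 1000, which is the stress reduction the issue is after.
print(plan_locks("default.web_logs", ["p1", "p2", "p3"], 4, "SHARED_READ"))
# → [('default.web_logs', None, 'SHARED_READ')]
```

One design note: the `>=` comparison against `total_partitions / 2` means a query reading exactly half the partitions already escalates, matching the "greater than or equal to half" wording in the issue description.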