[ https://issues.apache.org/jira/browse/HIVE-16321?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Eugene Koifman updated HIVE-16321:
----------------------------------
    Attachment: HIVE-16321.03-branch-2.patch

> Possible deadlock in metastore with Acid enabled
> ------------------------------------------------
>
>                 Key: HIVE-16321
>                 URL: https://issues.apache.org/jira/browse/HIVE-16321
>             Project: Hive
>          Issue Type: Bug
>          Components: Transactions
>   Affects Versions: 1.3.0
>            Reporter: Eugene Koifman
>            Assignee: Eugene Koifman
>            Priority: Blocker
>         Attachments: HIVE-16321.01.branch-2.patch, HIVE-16321.01.patch,
>                      HIVE-16321.02.branch-2.patch, HIVE-16321.02.patch,
>                      HIVE-16321.03-branch-2.patch, HIVE-16321.03.patch
>
>
> TxnStore.MutexAPI is a mechanism by which different Metastore instances can
> coordinate their operations. It uses a JDBC connection to do so.
> In some cases this may lead to deadlock. TxnHandler uses a connection pool
> of fixed size. Suppose you have X simultaneous calls to TxnHandler.lock(),
> where X is >= the size of the pool. These calls take all connections from the
> pool, so when
> {noformat}
> handle = getMutexAPI().acquireLock(MUTEX_KEY.CheckLock.name());
> {noformat}
> is executed in _TxnHandler.checkLock(Connection dbConn, long extLockId)_, each
> caller already holds a connection and waits for a second one. The pool is
> empty, no caller can release its connection, and the system is deadlocked.
> MutexAPI can't use the same connection as the operation it's protecting
> (TxnHandler.checkLock(Connection dbConn, long extLockId) is an example).
> We could make MutexAPI use a separate connection pool (size > 'primary' conn
> pool).
> Or we could make TxnHandler.lock(LockRequest rqst) return immediately after
> enqueueing the lock, with the expectation that the caller will always follow
> up with a call to checkLock(CheckLockRequest rqst).
> cc [~f1sherox]
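
The pool exhaustion described in the issue can be illustrated with a minimal, self-contained sketch. This is not Hive code: the class PoolDeadlockSketch and the method callersThatGetMutexConnection are hypothetical names, a java.util.concurrent.Semaphore stands in for the fixed-size JDBC connection pool, and the separate mutex pool is assumed to have one more slot than the primary pool. The sketch uses the non-blocking tryAcquire() so it reports the would-be deadlock instead of hanging.

```java
import java.util.concurrent.Semaphore;

public class PoolDeadlockSketch {

    /**
     * Simulates poolSize simultaneous lock() calls. Each caller first takes one
     * "connection" (a permit) from the primary pool, then asks for a second one
     * on behalf of the mutex. Returns how many callers obtained that second one.
     */
    static int callersThatGetMutexConnection(int poolSize, boolean separateMutexPool) {
        Semaphore primaryPool = new Semaphore(poolSize);
        // Assumption for illustration: the separate mutex pool is one slot larger
        // than the primary pool; otherwise the mutex shares the primary pool.
        Semaphore mutexPool = separateMutexPool ? new Semaphore(poolSize + 1) : primaryPool;

        // Step 1: every caller takes its own connection; the primary pool is now empty.
        for (int i = 0; i < poolSize; i++) {
            if (!primaryPool.tryAcquire()) {
                throw new IllegalStateException("primary pool exhausted too early");
            }
        }

        // Step 2: every caller, still holding its connection, needs one more for the mutex.
        int succeeded = 0;
        for (int i = 0; i < poolSize; i++) {
            if (mutexPool.tryAcquire()) {   // a blocking acquire() would hang here in the shared case
                succeeded++;
                mutexPool.release();        // mutex work done, connection returned
            }
        }
        return succeeded;
    }

    public static void main(String[] args) {
        System.out.println("shared pool:   " + callersThatGetMutexConnection(4, false));
        System.out.println("separate pool: " + callersThatGetMutexConnection(4, true));
    }
}
```

With a shared pool no caller gets a mutex connection, which in a blocking implementation means every caller waits forever. With a separate pool every caller proceeds, which corresponds to the first fix proposed in the issue.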