[jira] [Updated] (IGNITE-22980) Lock manager may fail and lock waiter simultaneously

Denis Chudov (Jira) Fri, 16 Aug 2024 06:05:05 -0700


     [ 
https://issues.apache.org/jira/browse/IGNITE-22980?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]


Denis Chudov updated IGNITE-22980:
----------------------------------
    Description: 
h3. Motivation

This behavior was hardly anticipated or intended, but currently we can acquire a
lock:
{code:java}
        private void lock() {
            lockMode = intendedLockMode;

            intendedLockMode = null;

            intendedLocks.clear();
        }
{code}
and also make the same waiter fail:
{code:java}
        private void fail(LockException e) {
            ex = e;
        }
{code}
without any restriction (neither an assertion check nor an explicit prohibition).

Scenario:
 * tx1 tries to acquire a lock and finds a conflicting transaction, tx2;
 * the lock manager tries to check the state and the coordinator of tx2;
 * the coordinator of tx2 has left, so TxRecoveryMessage is sent;
 * the primary replica of the commit partition of tx2 is on the same node, so
TxRecoveryMessage is sent locally. It also triggers the tx recovery, so tx2 is
finished and tx cleanup is performed locally. All of this happens in the same
thread, and during tx cleanup the locks of tx2 are released;
 * the release of the locks of tx2 allows the conflicting waiter of tx1 to
acquire a lock;
 * the processing of the conflicting transaction continues and #fail is called
on the same waiter.
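
The scenario above can be reproduced in a heavily simplified, single-threaded
model. This is only a sketch: ReentrantReleaseSketch, LockState, acquireForTx1
and releaseTx2 are hypothetical names, not the real HeapLockManager API. It
assumes only what the scenario states, namely that recovery runs inline in the
thread that already holds the lock manager's monitor.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical single-key model of the race; not the real HeapLockManager. */
class ReentrantReleaseSketch {
    /** Waiter with the same unguarded lock()/fail() pair as quoted above. */
    static class Waiter {
        String intendedLockMode = "X";
        String lockMode;
        Exception ex;

        void lock() {
            lockMode = intendedLockMode;
            intendedLockMode = null;
        }

        void fail(Exception e) {
            ex = e;
        }
    }

    /** State of one lock key: tx2 holds it, tx1 is about to wait on it. */
    static class LockState {
        private boolean heldByTx2 = true;
        private final List<Waiter> waiters = new ArrayList<>();

        synchronized Waiter acquireForTx1(Runnable recovery) {
            Waiter waiter = new Waiter();
            waiters.add(waiter);

            // Conflict with tx2 found. Recovery runs inline, in this thread,
            // while the monitor is still held.
            recovery.run();

            // Conflict processing resumes, unaware that the waiter was granted.
            waiter.fail(new IllegalStateException("conflict with tx2"));

            return waiter;
        }

        // Monitors are reentrant, so the inline recovery gets in immediately.
        synchronized void releaseTx2() {
            heldByTx2 = false;

            for (Waiter w : waiters) {
                w.lock();
            }

            waiters.clear();
        }
    }

    static Waiter run() {
        LockState state = new LockState();

        return state.acquireForTx1(state::releaseTx2);
    }

    public static void main(String[] args) {
        Waiter w = run();

        System.out.println("lockMode=" + w.lockMode + ", failed=" + (w.ex != null));
    }
}
```

In this model the waiter returned by run() ends up with both a granted lock
mode and a stored exception, which is the inconsistent state described in this
ticket.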

There is also another problem: tx recovery shouldn't happen within the
synchronized block of HeapLockManager. It can be moved to another thread pool.
This would also keep the tx recovery, which releases the locks, from granting
the lock to the waiter of tx1 from within that block.
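
A minimal sketch of handing recovery off to another pool, under stated
assumptions: AsyncRecoverySketch, recoverTx2 and the single-thread executor
are hypothetical and only illustrate the idea. The recovery task needs the
same monitor to release the locks of tx2, so it cannot run until the
synchronized block that detected the conflict has exited.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

/** Hypothetical sketch: recovery is handed to a pool instead of run inline. */
class AsyncRecoverySketch {
    private final Object monitor = new Object();

    private boolean tx2Released; // guarded by monitor

    /** Returns true if tx2's locks were released while the caller held the monitor. */
    boolean acquireWithConflict() {
        ExecutorService recoveryPool = Executors.newSingleThreadExecutor();

        try {
            CompletableFuture<Void> recovery;
            boolean releasedInsideBlock;

            synchronized (monitor) {
                // Conflict found: only schedule the recovery here.
                recovery = CompletableFuture.runAsync(this::recoverTx2, recoveryPool);

                // The pool thread needs the monitor to release the locks, so it
                // cannot get in before this block exits.
                releasedInsideBlock = tx2Released;
            }

            recovery.join(); // wait outside the monitor

            return releasedInsideBlock;
        } finally {
            recoveryPool.shutdown();
        }
    }

    /** Stands in for tx recovery plus cleanup, which releases the locks of tx2. */
    private void recoverTx2() {
        synchronized (monitor) {
            tx2Released = true;
        }
    }

    boolean isTx2Released() {
        synchronized (monitor) {
            return tx2Released;
        }
    }

    public static void main(String[] args) {
        AsyncRecoverySketch sketch = new AsyncRecoverySketch();

        System.out.println("released inside block: " + sketch.acquireWithConflict());
        System.out.println("released afterwards: " + sketch.isTx2Released());
    }
}
```

Creating the executor per call only keeps the sketch self-contained; a real
implementation would presumably reuse an existing pool.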
h3. Definition of done
 * Only one of the two methods, either lock() or fail(), can be applied to a
given lock attempt, never both. Keep in mind that a retry attempt may succeed
even though the previous attempt failed.
 * tx recovery is not executed synchronously within the synchronized block of
HeapLockManager.
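
One possible shape for the first item, as a sketch rather than a proposed
patch: GuardedWaiterSketch, Outcome and retry() are hypothetical names. Each
attempt starts as PENDING and can be moved to exactly one terminal outcome. A
retry opens a fresh attempt, so it may still succeed after a failure.

```java
import java.util.concurrent.atomic.AtomicReference;

/** Hypothetical waiter that accepts only one outcome per lock attempt. */
class GuardedWaiterSketch {
    enum Outcome { PENDING, LOCKED, FAILED }

    private final AtomicReference<Outcome> outcome = new AtomicReference<>(Outcome.PENDING);

    private volatile Exception ex;

    /** Grants the lock; returns false if this attempt was already resolved. */
    boolean lock() {
        return outcome.compareAndSet(Outcome.PENDING, Outcome.LOCKED);
    }

    /** Fails the attempt; returns false if this attempt was already resolved. */
    boolean fail(Exception e) {
        if (!outcome.compareAndSet(Outcome.PENDING, Outcome.FAILED)) {
            return false;
        }

        ex = e;

        return true;
    }

    /** Opens a new attempt after a failure, so that a retry may still succeed. */
    boolean retry() {
        if (!outcome.compareAndSet(Outcome.FAILED, Outcome.PENDING)) {
            return false;
        }

        ex = null;

        return true;
    }

    Outcome outcome() {
        return outcome.get();
    }

    public static void main(String[] args) {
        GuardedWaiterSketch waiter = new GuardedWaiterSketch();

        waiter.lock();
        waiter.fail(new IllegalStateException("late failure")); // rejected

        System.out.println(waiter.outcome());
    }
}
```

Returning false is only one way to reject the second call. An assertion or an
exception would match the "assertion check or explicit prohibition" wording
above equally well.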

  was:
h3. Motivation

This behavior was hardly anticipated or intended, but currently we can acquire a
lock:
{code:java}
        private void lock() {
            lockMode = intendedLockMode;

            intendedLockMode = null;

            intendedLocks.clear();
        }
{code}
and also make the same waiter fail:
{code:java}
        private void fail(LockException e) {
            ex = e;
        }
{code}
without any restriction (neither an assertion check nor an explicit prohibition).

Scenario:
 * tx1 tries to acquire a lock and finds a conflicting transaction, tx2;
 * the lock manager tries to check the state and the coordinator of tx2;
 * the coordinator of tx2 has left, so TxRecoveryMessage is sent;
 * the primary replica of the commit partition of tx2 is on the same node, so
TxRecoveryMessage is sent locally. It also triggers the recovery, so tx2 is
finished and cleanup is performed locally. All of this happens in the same
thread, and during tx cleanup the locks of tx2 are released;
 * the release of the locks of tx2 allows the conflicting waiter of tx1 to
acquire a lock;
 * the processing of the conflicting transaction continues and #fail is called
on the same waiter.

There is also another problem: tx recovery shouldn't happen within the
synchronized block of HeapLockManager. It can be moved to another thread pool.
This would also keep the tx recovery, which releases the locks, from granting
the lock to the waiter of tx1 from within that block.
h3. Definition of done
 * Only one of the two methods, either lock() or fail(), can be applied to a
given lock attempt, never both. Keep in mind that a retry attempt may succeed
even though the previous attempt failed.
 * tx recovery is not executed synchronously within the synchronized block of
HeapLockManager.


> Lock manager may fail and lock waiter simultaneously
> ----------------------------------------------------
>
>                 Key: IGNITE-22980
>                 URL: https://issues.apache.org/jira/browse/IGNITE-22980
>             Project: Ignite
>          Issue Type: Bug
>            Reporter: Vladislav Pyatkov
>            Priority: Major
>              Labels: ignite-3



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
