Mikhail Petrov created IGNITE-17731:
---------------------------------------
   Summary: Possible LRT in case of postponed GridDhtLockRequest
       Key: IGNITE-17731
       URL: https://issues.apache.org/jira/browse/IGNITE-17731
   Project: Ignite
Issue Type: Bug
  Reporter: Mikhail Petrov

Let's assume the following scenario:

1. TX coordinator starts a transaction and sends GridDhtLockRequest to "near" nodes.
2. Some GridDhtLockRequest messages are delayed by the network.
3. Not all "near" nodes receive GridDhtLockRequest, and as a result not all of them respond to the TX coordinator.
4. TX coordinator aborts the TX by timeout.
5. The completed TX ID is stored in IgniteTxManager#completedVersHashMap.
6. TX load continues (assume puts in a TX cache), and the record about the completed TX described above is evicted from the map.
7. The GridDhtLockRequest messages from step 2 are finally received by the "near" nodes. They lock keys, start the local TX, and respond to the TX coordinator.

But currently the TX coordinator ignores the GridDhtLockResponse, because the info about the initial TX was evicted, and does nothing. As a result, the near nodes keep holding key locks and keep waiting for the next steps of the TX protocol, which will never happen because the TX was already completed.

As a workaround, the TX can be explicitly KILLED on the near node.

It is proposed to handle this situation and not acquire locks on the near node if the TX coordinator or other cluster nodes know nothing about the TX to which the current lock request belongs.

--
This message was sent by Atlassian Jira
(v8.20.10#820010)
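
The scenario and the proposed check can be illustrated with a small, self-contained model. This is only a sketch under stated assumptions, not Ignite code: `LockRequestSketch`, `Coordinator`, `NearNode`, `TxState` and all their methods are hypothetical names, the bounded `LinkedHashMap` merely stands in for the eviction from IgniteTxManager#completedVersHashMap, and the direct call to `stateOf` stands in for whatever message exchange a real fix would use. The model also assumes the coordinator registers a TX as active before it sends any lock request, so an unknown TX ID can only mean "completed and evicted".

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Set;

/** Hypothetical model of the scenario; none of these types are Ignite classes. */
class LockRequestSketch {
    /** What the coordinator can say about a TX ID. */
    enum TxState { ACTIVE, COMPLETED, UNKNOWN }

    /** Toy TX coordinator that remembers only a bounded number of completed TX IDs. */
    static class Coordinator {
        private final Set<Long> active = new HashSet<>();
        private final Map<Long, Boolean> completed;

        Coordinator(final int maxCompleted) {
            // Insertion-ordered map that drops its eldest entry once it grows past
            // maxCompleted, standing in for the eviction described in step 6.
            completed = new LinkedHashMap<Long, Boolean>() {
                @Override
                protected boolean removeEldestEntry(Map.Entry<Long, Boolean> eldest) {
                    return size() > maxCompleted;
                }
            };
        }

        void start(long txId) {
            active.add(txId);
        }

        /** Commit, rollback, or abort by timeout (step 4). */
        void finish(long txId) {
            active.remove(txId);
            completed.put(txId, Boolean.TRUE);
        }

        TxState stateOf(long txId) {
            if (active.contains(txId)) return TxState.ACTIVE;
            if (completed.containsKey(txId)) return TxState.COMPLETED;
            return TxState.UNKNOWN; // completed long ago and already evicted
        }
    }

    /** Toy near node that applies the proposed check before taking a lock. */
    static class NearNode {
        private final Map<String, Long> locks = new HashMap<>(); // key -> owning TX ID

        /** Returns true only if the lock was acquired (or is already held) for txId. */
        boolean onLockRequest(Coordinator crd, long txId, String key) {
            // Proposed behaviour: refuse the lock unless the TX is still active.
            if (crd.stateOf(txId) != TxState.ACTIVE) return false;
            Long owner = locks.putIfAbsent(key, txId);
            return owner == null || owner == txId;
        }

        boolean holdsLock(String key) {
            return locks.containsKey(key);
        }
    }

    public static void main(String[] args) {
        Coordinator crd = new Coordinator(2);
        NearNode near = new NearNode();

        crd.start(1L);
        crd.finish(1L);                                   // TX 1 aborted by timeout
        for (long id = 2; id <= 3; id++) {                // further load evicts TX 1
            crd.start(id);
            crd.finish(id);
        }

        // The delayed lock request for TX 1 arrives only now.
        System.out.println("state of TX 1: " + crd.stateOf(1L));
        System.out.println("lock granted:  " + near.onLockRequest(crd, 1L, "k"));
    }
}
```

In this model the late request for TX 1 is rejected and the key stays unlocked, so nothing is left waiting for protocol steps that will never come. The cost of such a check in a real cluster would be an extra round trip (or piggybacked state) per lock request, which is a design trade-off the sketch does not attempt to evaluate.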