Mikhail Petrov created IGNITE-17731:
---------------------------------------

             Summary: Possible LRT in case of postponed GridDhtLockRequest
                 Key: IGNITE-17731
                 URL: https://issues.apache.org/jira/browse/IGNITE-17731
             Project: Ignite
          Issue Type: Bug
            Reporter: Mikhail Petrov


Let's assume the following scenario:

1. The TX coordinator starts a transaction and sends GridDhtLockRequest
messages to the "near" nodes.
2. Some GridDhtLockRequest messages are delayed by the network.
3. Not all "near" nodes receive a GridDhtLockRequest, and as a result not all
of them respond to the TX coordinator.
4. The TX coordinator aborts the TX on timeout.
5. The ID of the completed TX is stored in IgniteTxManager#completedVersHashMap.
6. The TX load continues (assume puts into a transactional cache), and the
record of the completed TX described above is evicted from the map.
7. The GridDhtLockRequest messages from step 2 are finally received by the
"near" nodes. They lock the keys, start the local TX, and respond to the TX
coordinator. However, the TX coordinator currently ignores the
GridDhtLockResponse, because the information about the initial TX has been
evicted, and does nothing.
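
Steps 5-7 can be illustrated with a toy model. This is only a sketch, not
Ignite code: CompletedTxLog and LateLockRequestDemo are hypothetical names, and
the only assumption carried over from the scenario is that the record of
completed transactions is size-bounded and drops its oldest entries.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Toy, size-bounded record of completed transactions (hypothetical, not Ignite code). */
class CompletedTxLog {
    private final Map<Long, Boolean> completed;

    CompletedTxLog(final int maxSize) {
        // Insertion-ordered map that drops its eldest entry once maxSize is exceeded.
        this.completed = new LinkedHashMap<Long, Boolean>() {
            @Override
            protected boolean removeEldestEntry(Map.Entry<Long, Boolean> eldest) {
                return size() > maxSize;
            }
        };
    }

    /** Records the outcome of a finished TX (true = committed, false = rolled back). */
    void markCompleted(long txId, boolean committed) {
        completed.put(txId, committed);
    }

    /** Returns true while the record of the TX has not been evicted. */
    boolean knows(long txId) {
        return completed.containsKey(txId);
    }
}

class LateLockRequestDemo {
    /**
     * Aborts TX 0 on timeout, then completes laterTxCount further transactions,
     * and reports whether the aborted TX is still remembered afterwards.
     */
    static boolean abortedTxStillKnown(int maxSize, int laterTxCount) {
        CompletedTxLog log = new CompletedTxLog(maxSize);
        long abortedTx = 0L;
        log.markCompleted(abortedTx, false);

        for (long tx = 1; tx <= laterTxCount; tx++)
            log.markCompleted(tx, true);

        return log.knows(abortedTx);
    }

    public static void main(String[] args) {
        System.out.println(abortedTxStillKnown(3, 2)); // record still present
        System.out.println(abortedTxStillKnown(3, 3)); // record evicted
    }
}
```

Once the record is evicted, a lock response that arrives late can no longer be
matched to the aborted TX, which is the situation described in step 7.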

As a result, the near nodes keep holding the key locks and wait for the next
steps of the TX protocol, which will never happen because the TX has already
completed.

As a workaround, the TX can be explicitly killed on the near node.

It is proposed to handle this situation and not acquire locks on the near node
if neither the TX coordinator nor the other cluster nodes know about the TX to
which the current lock request belongs.
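
A minimal sketch of the proposed guard follows. It is not Ignite code:
GuardedLockHandler is a hypothetical name, and how a node would actually ask
the coordinator or the rest of the cluster about a TX is not specified in this
issue, so that check is abstracted here as a predicate supplied by the caller.

```java
import java.util.Collections;
import java.util.HashSet;
import java.util.Set;
import java.util.function.LongPredicate;

/** Hypothetical lock-request handler that refuses requests for unknown transactions. */
class GuardedLockHandler {
    /** Assumption: answers whether any node in the cluster still knows the TX. */
    private final LongPredicate txKnownToCluster;

    private final Set<String> lockedKeys = new HashSet<>();

    GuardedLockHandler(LongPredicate txKnownToCluster) {
        this.txKnownToCluster = txKnownToCluster;
    }

    /**
     * Acquires locks only if the TX is still known somewhere in the cluster.
     * Returns false, without locking anything, for a postponed request whose TX is gone.
     */
    boolean onLockRequest(long txId, Set<String> keys) {
        if (!txKnownToCluster.test(txId))
            return false;

        lockedKeys.addAll(keys);
        return true;
    }

    /** Read-only view of the keys currently locked by this handler. */
    Set<String> lockedKeys() {
        return Collections.unmodifiableSet(lockedKeys);
    }

    public static void main(String[] args) {
        // Only TX 7 is still active in this toy setup.
        GuardedLockHandler handler = new GuardedLockHandler(id -> id == 7L);

        System.out.println(handler.onLockRequest(7L, Collections.singleton("a")));
        System.out.println(handler.onLockRequest(5L, Collections.singleton("b")));
        System.out.println(handler.lockedKeys());
    }
}
```

With such a check, a lock request for an already completed TX is rejected up
front, so no locks are left waiting for protocol steps that will never arrive.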




--
This message was sent by Atlassian Jira
(v8.20.10#820010)
