Alexander Lapin created IGNITE-13374:
----------------------------------------

             Summary: Initial PME hangs because of multiple blinking nodes
                 Key: IGNITE-13374
                 URL: https://issues.apache.org/jira/browse/IGNITE-13374
             Project: Ignite
          Issue Type: Bug
            Reporter: Alexander Lapin
            Assignee: Alexander Lapin
             Fix For: 2.10


The *root cause* of the issue is a race inside GridDhtPartitionsExchangeFuture 
on the client side between two processes:
 # When the old coordinator fails and a new one takes over, the new coordinator 
sends GridDhtPartitionsSingleRequest messages to all nodes, including clients, 
to restore exchange results. Processing this message on a client includes 
updating the current coordinator reference (the crd field).

 # When the future receives the discovery notification about the old 
coordinator's failure, it should detect the coordinator change and send a 
GridDhtPartitionsSingleMessage to the new coordinator to obtain affinity. 
However, if the crd field has already been updated by the first process, the 
client does not detect the coordinator failure and never sends the 
GridDhtPartitionsSingleMessage to the new coordinator, so the client hangs.
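
The two orderings described above can be illustrated with a minimal, 
self-contained sketch. This is not Ignite code: the class 
CoordinatorRaceSketch, the ClientExchange stand-in, the handler names and the 
node names "nodeA"/"nodeB" are all hypothetical, and the assumption that 
coordinator change is detected by comparing the failed node with crd is a 
simplification of the description above.

```java
import java.util.ArrayList;
import java.util.List;

/** Simplified, hypothetical model of the client-side race; not Ignite code. */
class CoordinatorRaceSketch {
    /** Stand-in for the client-side exchange future. */
    static class ClientExchange {
        /** Current coordinator reference (models the crd field). */
        private String crd;

        /** Coordinators that received a single message from this client. */
        final List<String> singleMessagesSentTo = new ArrayList<>();

        ClientExchange(String initialCrd) {
            crd = initialCrd;
        }

        /** Models processing of a single request sent by the new coordinator. */
        synchronized void onSingleRequest(String sender) {
            crd = sender; // The coordinator reference is updated here.
        }

        /** Models the discovery notification about a failed node. */
        synchronized void onNodeLeft(String failedNode, String newCrd) {
            // Assumption: a coordinator change is detected only while crd
            // still points at the failed node.
            if (failedNode.equals(crd)) {
                crd = newCrd;
                singleMessagesSentTo.add(newCrd);
            }
        }
    }

    /** Replays one ordering; returns true if the client sent its single message. */
    static boolean clientSendsMessage(boolean requestFirst) {
        ClientExchange fut = new ClientExchange("nodeA");

        if (requestFirst) {
            fut.onSingleRequest("nodeB");     // crd becomes nodeB
            fut.onNodeLeft("nodeA", "nodeB"); // failure of nodeA goes unnoticed
        }
        else {
            fut.onNodeLeft("nodeA", "nodeB"); // failure detected, message sent
            fut.onSingleRequest("nodeB");
        }

        return fut.singleMessagesSentTo.contains("nodeB");
    }

    public static void main(String[] args) {
        System.out.println("discovery event first: sent=" + clientSendsMessage(false));
        System.out.println("single request first:  sent=" + clientSendsMessage(true));
    }
}
```

In this model the client sends its message only when the discovery event is 
handled first; when the request from the new coordinator is handled first, no 
message is sent, which corresponds to the hang reported here.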



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
