iiliev2 opened a new pull request, #4899:
URL: https://github.com/apache/activemq-artemis/pull/4899

   In a cluster deployed in Kubernetes, when a node is destroyed, the process is
terminated and the network is shut down before the process has a chance to close
its connections. A new node may then be brought up reusing the old node's IP. If
this happens before the connection TTL expires, it looks to Artemis as if the
connection came back. Yet it is not the same peer: the new node has a different
node ID, etc. This breaks the cluster state, because the old message flow record
is no longer valid.
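The following toy model illustrates this failure mode. It assumes that per-peer state is looked up by network address alone. `AddressKeyedFlows`, `FlowRecord` and the example address are hypothetical and are not Artemis classes. An address-only lookup hands the old node's record to its replacement. Only a comparison of node IDs reveals that the peer has changed.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.UUID;

// Toy model (hypothetical, not Artemis code): per-peer cluster state keyed
// only by the peer's network address.
final class AddressKeyedFlows {
    // Hypothetical stand-in for a message flow record bound to one node.
    record FlowRecord(String address, UUID nodeUUID) {}

    private final Map<String, FlowRecord> byAddress = new HashMap<>();

    void register(String address, UUID nodeUUID) {
        byAddress.put(address, new FlowRecord(address, nodeUUID));
    }

    // The record that would be reused for traffic from `address`; it cannot
    // distinguish a replacement node from the node it replaced.
    FlowRecord lookup(String address) {
        return byAddress.get(address);
    }

    // Identity-aware check: the record only applies if the node id matches.
    boolean isSamePeer(String address, UUID observedNodeUUID) {
        FlowRecord r = byAddress.get(address);
        return r != null && r.nodeUUID().equals(observedNodeUUID);
    }
}
```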
   
   This also solves another, similar issue: if a node goes down and a new one
comes up with a new nodeUUID and the same IP before the cluster connections on
the other nodes time out, those nodes get stuck and list both the old and the
new node in their topologies.
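The topology symptom can be reproduced with a toy map from node ID to address. `ToyTopology`, `nodeUp`, `nodeDown` and `nodeIdsAt` are hypothetical names and are not Artemis' topology API. The sketch assumes only that an entry is removed when its node is reported down. In the scenario above, that report arrives only after the cluster connection times out, so until then two node IDs share one address.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Toy topology (hypothetical, not Artemis code): members keyed by node id.
final class ToyTopology {
    private final Map<String, String> addressByNodeId = new LinkedHashMap<>();

    // A node announces itself; nothing here checks whether another node id
    // is still registered at the same address.
    void nodeUp(String nodeId, String address) {
        addressByNodeId.put(nodeId, address);
    }

    // Only called once the old node's connection is declared dead.
    void nodeDown(String nodeId) {
        addressByNodeId.remove(nodeId);
    }

    // All node ids currently listed at the given address, in insertion order.
    List<String> nodeIdsAt(String address) {
        List<String> ids = new ArrayList<>();
        for (Map.Entry<String, String> e : addressByNodeId.entrySet()) {
            if (e.getValue().equals(address)) {
                ids.add(e.getKey());
            }
        }
        return ids;
    }
}
```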
   
   The changes are grouped into tightly related incremental commits to make it
easier to understand what changed:
   
   1. `Ping` packets include `nodeUUID`
   2. Acceptors and connectors carry `TransportConfiguration`
   3. `RemotingConnectionImpl#doBufferReceived` checks whether the nodeUUID in a
ping mismatches the target and, if so, flags the connection as `unhealthy`;
`ClientSessionFactoryImpl` destroys unhealthy connections (in addition to
connections that have not received any data in time)
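Below is a minimal sketch of the three steps. It assumes a simplified ping wire format: the TTL followed by a length-prefixed UTF-8 nodeUUID. It also assumes the connection already knows which node it expects. `ToyPing`, `ToyConnection` and `ToyConnectionChecker` are hypothetical stand-ins for the roles of `Ping`, `RemotingConnectionImpl` and `ClientSessionFactoryImpl`. They do not reproduce the PR's actual code or encoding, nor how the expected node ID is obtained.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

// Step 1 (hypothetical format): a ping carrying the sender's nodeUUID.
record ToyPing(long connectionTTL, String nodeUUID) {
    byte[] encode() {
        byte[] id = nodeUUID.getBytes(StandardCharsets.UTF_8);
        ByteBuffer buf = ByteBuffer.allocate(Long.BYTES + Integer.BYTES + id.length);
        buf.putLong(connectionTTL).putInt(id.length).put(id);
        return buf.array();
    }

    static ToyPing decode(byte[] bytes) {
        ByteBuffer buf = ByteBuffer.wrap(bytes);
        long ttl = buf.getLong();
        byte[] id = new byte[buf.getInt()];
        buf.get(id);
        return new ToyPing(ttl, new String(id, StandardCharsets.UTF_8));
    }
}

// Step 3a: the connection compares each ping's nodeUUID with the node it
// expects (passed in directly here; an assumption of this sketch).
final class ToyConnection {
    private final String expectedNodeUUID;
    private volatile boolean dataReceived;
    private volatile boolean healthy = true;

    ToyConnection(String expectedNodeUUID) {
        this.expectedNodeUUID = expectedNodeUUID;
    }

    void onPing(ToyPing ping) {
        dataReceived = true;
        if (expectedNodeUUID != null && !expectedNodeUUID.equals(ping.nodeUUID())) {
            healthy = false; // same address, different node: the peer was replaced
        }
    }

    boolean isHealthy() {
        return healthy;
    }

    // Reports whether any data arrived since the last check, then resets it.
    boolean checkDataReceived() {
        boolean received = dataReceived;
        dataReceived = false;
        return received;
    }
}

// Step 3b: the periodic check destroys a connection that is unhealthy, in
// addition to one that received nothing since the previous check.
final class ToyConnectionChecker {
    static boolean shouldDestroy(ToyConnection conn) {
        boolean received = conn.checkDataReceived(); // always reset the flag
        return !conn.isHealthy() || !received;
    }
}
```

Reading and resetting the data-received flag before the health check runs means both conditions are evaluated on every pass. As a result, a replaced peer that keeps answering pings is still torn down.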

