[ https://issues.apache.org/jira/browse/IGNITE-20087?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=18009947#comment-18009947 ]
Roman Puchkovskiy commented on IGNITE-20087:
--------------------------------------------

The first issue (a node is restarted on the same address, so it receives a message that was sent to its previous incarnation) is already solved in IGNITE-25400 and IGNITE-25805. As of now, the behavior is as follows:
# If a message is sent using a ClusterNode to identify the recipient, the ephemeral ID of the node that is actually going to receive the message is verified. If it differs from the ID in the ClusterNode, the message is not delivered; instead, the operation fails with a RecipientLeftException.
# If a message is sent using a consistentId to identify the recipient, we assume that the sender doesn't care about the 'incarnation' (the sender did not provide any information about it), so no ID verification is performed.

The second issue (support for multiple addresses of the same node) will be solved in IGNITE-22369.

Hence I am closing this ticket.

> Account for "nodeId" while sending the message
> ----------------------------------------------
>
>                 Key: IGNITE-20087
>                 URL: https://issues.apache.org/jira/browse/IGNITE-20087
>             Project: Ignite
>          Issue Type: Improvement
>            Reporter: Ivan Bessonov
>            Priority: Major
>              Labels: ignite-3
>
> h3. Disclaimer
> This change will, most likely, break some existing code.
> h3. The problem
> It is safe to assume that a node shouldn't be able to send a message to
> another node that doesn't exist. The way we solve this problem is by sending
> messages only to nodes that have a corresponding {{ClusterNode}} instance,
> which contains the address.
> There are shortcuts in MessagingService: some methods only take the
> "consistentId" parameter, but they really only send the message if there's a
> known ClusterNode instance.
> Internally, to identify the receiver, we use a pair \{consistentId, address}.
> And it seems like this is not a good identifier. The reasons are as follows:
> * A node may reconnect with a different address
> * A node may restart on the same address
> In the first case, everything should still work, but currently it, most
> likely, won't.
> In the second case, it shouldn't work, because we were trying to send the
> data to the "old" node instance, assuming that it is stateful and is aware
> of the possibility of such a message. But it will work.
> h3. The solution
> To resolve these issues, we need to:
> * validate the "nodeId", also known as a "launchId", of the connection, and
> only use it if the actual value matches the expected one. If it doesn't, then
> somebody has an outdated ClusterNode instance, and they should be punished;
> * not use the address as a part of the primary identifier of the channel. This
> may require a substantial reworking of the way we process cluster nodes right
> now, making it similar to the implementation in Ignite 2.

--
This message was sent by Atlassian Jira
(v8.20.10#820010)
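
For illustration only, the two delivery rules described in the comment above can be sketched in Java as follows. This is a minimal sketch under stated assumptions, not the Ignite 3 implementation: NodeRef, the nested RecipientLeftException, nodeStarted, sendToNode and sendByConsistentId are hypothetical stand-ins invented for this example, and throwing IllegalStateException for an unknown consistentId is likewise an assumption.

```java
import java.util.Map;
import java.util.UUID;
import java.util.concurrent.ConcurrentHashMap;

final class IncarnationCheckSketch {
    /** Hypothetical stand-in for ClusterNode: a stable name plus an ephemeral per-launch ID. */
    static final class NodeRef {
        final String consistentId;
        final UUID launchId;

        NodeRef(String consistentId, UUID launchId) {
            this.consistentId = consistentId;
            this.launchId = launchId;
        }
    }

    /** Hypothetical stand-in for the exception named in the comment (not the real class). */
    static final class RecipientLeftException extends RuntimeException {
        RecipientLeftException(String message) {
            super(message);
        }
    }

    /** consistentId -> launch ID of the incarnation currently reachable under that name. */
    private final Map<String, UUID> liveLaunchIds = new ConcurrentHashMap<>();

    /** Records that a node (re)started; a restart replaces the previous launch ID. */
    void nodeStarted(String consistentId, UUID launchId) {
        liveLaunchIds.put(consistentId, launchId);
    }

    /** Rule 1: the sender names a specific incarnation, so the launch ID must match. */
    String sendToNode(NodeRef recipient, String payload) {
        UUID actual = liveLaunchIds.get(recipient.consistentId);
        if (!recipient.launchId.equals(actual)) {
            throw new RecipientLeftException("Stale launch ID for " + recipient.consistentId);
        }
        return deliver(recipient.consistentId, payload);
    }

    /** Rule 2: the sender names only the consistentId, so no launch ID check is made. */
    String sendByConsistentId(String consistentId, String payload) {
        if (!liveLaunchIds.containsKey(consistentId)) {
            // Assumption of this sketch: an unknown node is reported as an error.
            throw new IllegalStateException("Unknown node: " + consistentId);
        }
        return deliver(consistentId, payload);
    }

    /** Pretends to deliver the message and returns a description of the delivery. */
    private static String deliver(String consistentId, String payload) {
        return consistentId + "<-" + payload;
    }

    public static void main(String[] args) {
        IncarnationCheckSketch network = new IncarnationCheckSketch();
        UUID firstLaunch = UUID.randomUUID();
        network.nodeStarted("A", firstLaunch);
        NodeRef staleRef = new NodeRef("A", firstLaunch);

        // The node restarts on the same address and gets a new launch ID.
        network.nodeStarted("A", UUID.randomUUID());

        try {
            network.sendToNode(staleRef, "ping");
        } catch (RecipientLeftException e) {
            System.out.println("Rejected: " + e.getMessage());
        }
        System.out.println(network.sendByConsistentId("A", "ping"));
    }
}
```

In this sketch a sender holding a reference to the previous incarnation is rejected after the restart, while a sender that addresses the node by consistentId alone still reaches the new incarnation.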