[ https://issues.apache.org/jira/browse/IGNITE-26196?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Roman Puchkovskiy updated IGNITE-26196:
---------------------------------------
Description:
When a node sees that another node has left the physical topology, it marks
that node as stale. It never allows a connection to be established with a stale
node. This ensures that no protocol relying on a pairwise connection between
the nodes arrives at an inconsistent state by 'missing' some messages from the
middle of the message stream between them.
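The rule above can be pictured with a minimal Java sketch. It assumes staleness is keyed by a per-incarnation UUID node ID and kept in a plain in-memory set; StaleNodeGuard, onNodeLeft and acceptHandshake are hypothetical names for illustration, not Ignite 3's actual classes or API.

```java
import java.util.Set;
import java.util.UUID;
import java.util.concurrent.ConcurrentHashMap;

/**
 * Hypothetical sketch: remembers IDs of nodes seen leaving the physical
 * topology and refuses handshakes from them. Not Ignite's real API.
 */
class StaleNodeGuard {
    // IDs of node incarnations observed leaving the physical topology.
    private final Set<UUID> staleIds = ConcurrentHashMap.newKeySet();

    /** Called when this node observes that nodeId left the physical topology. */
    void onNodeLeft(UUID nodeId) {
        staleIds.add(nodeId);
    }

    /**
     * Returns true if a connection with remoteNodeId may be established.
     * A stale node is never reconnected: messages from the middle of the
     * stream may have been missed while it was considered gone.
     */
    boolean acceptHandshake(UUID remoteNodeId) {
        return !staleIds.contains(remoteNodeId);
    }
}
```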
The stale node ID storage is persistent, so the following scenario is possible:
# There are nodes A and B
# Node A thinks node B has left (due to a GC pause) and marks B as stale
# Hence, no connection is possible between the nodes
# Node A is restarted
# It still sees node B as stale, so no connections are allowed between the nodes
Item 5 is a problem, and the restriction is unnecessary: node A got a new
identity after restarting, so 'protocol consistency' is no longer at risk.
We should make the stale node ID storage volatile.
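To illustrate the scenario and the proposed fix, here is a minimal Java sketch that replays steps 1-5 with both kinds of storage. It assumes that a restart creates a new node incarnation with a fresh random UUID and that persistent storage can be modeled as a set that outlives the incarnation; NodeIncarnation and RestartScenario are hypothetical names, not Ignite 3 classes.

```java
import java.util.HashSet;
import java.util.Set;
import java.util.UUID;
import java.util.concurrent.ConcurrentHashMap;

/** Hypothetical model of one node incarnation; not Ignite's real classes. */
class NodeIncarnation {
    // Each start gets a fresh ID, so a restarted node is a new identity.
    final UUID id = UUID.randomUUID();
    private final Set<UUID> staleIds;

    /** Volatile storage: staleness lives only as long as this incarnation. */
    static NodeIncarnation startWithVolatileStorage() {
        return new NodeIncarnation(ConcurrentHashMap.newKeySet());
    }

    /** Persistent storage (current behaviour): the set survives restarts. */
    static NodeIncarnation startWithPersistentStorage(Set<UUID> durableSet) {
        return new NodeIncarnation(durableSet);
    }

    private NodeIncarnation(Set<UUID> staleIds) {
        this.staleIds = staleIds;
    }

    void onNodeLeft(UUID nodeId) {
        staleIds.add(nodeId);
    }

    boolean acceptHandshake(UUID remoteId) {
        return !staleIds.contains(remoteId);
    }
}

class RestartScenario {
    /** Replays the scenario; returns whether the restarted A accepts B. */
    static boolean restartedNodeAcceptsB(boolean persistentStorage) {
        Set<UUID> disk = new HashSet<>(); // stands in for durable storage
        UUID nodeB = UUID.randomUUID();   // B keeps running with the same ID

        NodeIncarnation a = persistentStorage
                ? NodeIncarnation.startWithPersistentStorage(disk)
                : NodeIncarnation.startWithVolatileStorage();
        a.onNodeLeft(nodeB); // step 2: A wrongly concludes B left, marks it stale

        // Step 4: A restarts and becomes a new incarnation with a new ID.
        NodeIncarnation restartedA = persistentStorage
                ? NodeIncarnation.startWithPersistentStorage(disk)
                : NodeIncarnation.startWithVolatileStorage();

        return restartedA.acceptHandshake(nodeB); // step 5
    }
}
```

With persistent storage the restarted A still rejects B; with volatile storage it accepts B. In this model, forgetting staleness on restart is safe because the restarted A never exchanged messages with B, so there is no stream whose middle could have been missed.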
> Make node staleness status volatile
> -----------------------------------
>
> Key: IGNITE-26196
> URL: https://issues.apache.org/jira/browse/IGNITE-26196
> Project: Ignite
> Issue Type: Improvement
> Reporter: Roman Puchkovskiy
> Assignee: Roman Puchkovskiy
> Priority: Major
> Labels: ignite-3
--
This message was sent by Atlassian Jira
(v8.20.10#820010)