[ https://issues.apache.org/jira/browse/IGNITE-10898?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Alexey Goncharuk updated IGNITE-10898:
--------------------------------------
    Fix Version/s: 2.8

> Exchange coordinator failover breaks in some cases when node filter is used
> ---------------------------------------------------------------------------
>
>                 Key: IGNITE-10898
>                 URL: https://issues.apache.org/jira/browse/IGNITE-10898
>             Project: Ignite
>          Issue Type: Bug
>            Reporter: Alexey Goncharuk
>            Priority: Critical
>             Fix For: 2.8
>
>
> Currently, if a node does not pass the cache node filter, we do not store this
> cache's affinity on the node unless the node is the coordinator. This, however,
> may fail in the following scenario:
> 1) A node passing the node filter joins the cluster.
> 2) During the join, the coordinator fails, and a new coordinator is selected
> for which the previous exchange is already completed.
> 3) The next coordinator attempts to fetch the affinity, and the joining node
> resends the partitions single message. There are two problems here. First,
> the exchange fast-reply does not wait for the new affinity to be initialized,
> which results in {{IllegalStateException}}. Second, such an attempt to fetch
> affinity may lead either to a deadlock or to an incorrectly fetched affinity
> (basically, the coordinator must be in consensus with the other nodes passing
> the node filter).
> The attached test reproduces the issue.
> I suggest always calculating and keeping affinity on all nodes, even those not
> passing the filter. In this case, there will be no need to fetch and
> recalculate affinity ({{initCoordinatorCaches}} will go away).



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
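The suggestion above (every node calculates and keeps affinity, whether or not it passes the node filter) can be illustrated with a simplified model. This is a hypothetical sketch, not Ignite code: the names `AffinitySketch`, `Node`, `computeAffinity` and `LocalAffinityCache` are invented for illustration, and the partition assignment uses plain rendezvous (highest-random-weight) hashing as a stand-in for Ignite's real affinity function. The point it shows is that, because affinity is a deterministic function of the topology and the node filter, every node that computes it locally ends up with the same result, so a newly elected coordinator (even one that does not pass the filter) already has a copy that agrees with the other nodes and needs no fetch round-trip.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;
import java.util.function.Predicate;

// Hypothetical model of "keep affinity on all nodes"; none of these names are Ignite APIs.
public class AffinitySketch {
    /** A cluster node: an ID plus user attributes that a node filter can inspect. */
    static final class Node {
        final String id;
        final Set<String> attrs;
        Node(String id, Set<String> attrs) { this.id = id; this.attrs = attrs; }
    }

    /**
     * Rendezvous hashing: each partition goes to the filtered node with the highest
     * hash(node, partition). Same inputs on every node give the same output.
     */
    static Map<Integer, String> computeAffinity(List<Node> topology, Predicate<Node> nodeFilter, int parts) {
        List<Node> owners = new ArrayList<>();
        for (Node n : topology)
            if (nodeFilter.test(n)) owners.add(n);

        Map<Integer, String> assignment = new TreeMap<>();
        for (int p = 0; p < parts; p++) {
            String best = null;
            long bestWeight = Long.MIN_VALUE;
            for (Node n : owners) {
                long w = mix(n.id.hashCode() * 31L + p);
                // Break (unlikely) ties by node ID so the choice stays deterministic.
                if (best == null || w > bestWeight || (w == bestWeight && n.id.compareTo(best) < 0)) {
                    best = n.id;
                    bestWeight = w;
                }
            }
            if (best != null) assignment.put(p, best);
        }
        return assignment;
    }

    /** SplitMix64 finalizer, used here only to spread the weights. */
    static long mix(long z) {
        z = (z ^ (z >>> 30)) * 0xbf58476d1ce4e5b9L;
        z = (z ^ (z >>> 27)) * 0x94d049bb133111ebL;
        return z ^ (z >>> 31);
    }

    /** Per-node affinity history keyed by topology version, kept on ALL nodes. */
    static final class LocalAffinityCache {
        private final Map<Long, Map<Integer, String>> byTopVer = new HashMap<>();

        /** Called on every node when an exchange completes, filtered out or not. */
        void onExchange(long topVer, List<Node> topology, Predicate<Node> filter, int parts) {
            byTopVer.put(topVer, computeAffinity(topology, filter, parts));
        }

        /** A node that becomes coordinator reads its own copy instead of fetching one. */
        Map<Integer, String> affinity(long topVer) {
            Map<Integer, String> a = byTopVer.get(topVer);
            if (a == null)
                throw new IllegalStateException("Affinity not initialized for topVer=" + topVer);
            return a;
        }
    }

    public static void main(String[] args) {
        List<Node> topology = List.of(
            new Node("A", Set.of("data")),
            new Node("B", Set.of("data")),
            new Node("C", Set.of()));               // C does not pass the node filter
        Predicate<Node> filter = n -> n.attrs.contains("data");

        // Both B (passes the filter) and C (does not) compute and store affinity.
        LocalAffinityCache onB = new LocalAffinityCache();
        LocalAffinityCache onC = new LocalAffinityCache();
        onB.onExchange(1L, topology, filter, 8);
        onC.onExchange(1L, topology, filter, 8);

        // If coordinator A fails and C is elected, C's local copy already matches B's.
        System.out.println(onC.affinity(1L).equals(onB.affinity(1L))); // prints true
    }
}
```

The sketch deliberately leaves out everything that makes the real problem hard (exchange futures, message ordering, history trimming); it only shows why a locally computed, deterministic affinity removes the need for a coordinator-side fetch such as {{initCoordinatorCaches}}.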