[ https://issues.apache.org/jira/browse/IGNITE-15364?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Amelchev Nikita updated IGNITE-15364:
-------------------------------------
    Release Note: Fixed rebalance issue when historical rebalancing is reassigned after the client node joined the cluster.  (was: Fixed rebalance issue.)

> The rebalancing can be broken if historical rebalancing is reassigned after the client node joined the cluster.
> ---------------------------------------------------------------------------------------------------------------
>
>                 Key: IGNITE-15364
>                 URL: https://issues.apache.org/jira/browse/IGNITE-15364
>             Project: Ignite
>          Issue Type: Bug
>            Reporter: Vyacheslav Koptilin
>            Assignee: Vyacheslav Koptilin
>            Priority: Major
>             Fix For: 2.13
>
>          Time Spent: 1h 40m
>  Remaining Estimate: 0h
>
> It looks like the following scenario can break data consistency after rebalancing:
> - start and activate a cluster of three server nodes;
> - create a cache with two backups and load initial data into it;
> - stop one server node and load additional data into the cache, so that historical rebalancing is triggered once the node rejoins the cluster;
> - restart the node and make sure that historical rebalancing is started from the two other nodes;
> - before rebalancing completes, start a new client node and let it join the cluster. This clears the partition update counters on the server nodes, i.e. _GridDhtPartitionTopologyImpl#cntrMap_ ( * );
> - historical rebalancing from one of the supplier nodes fails;
> - rebalancing is then reassigned, and the restarted node tries to rebalance the missed partitions from the remaining supplier.
> Unfortunately, the update counters for historical rebalancing cannot be calculated properly because of ( * ).
>
> An additional issue found while debugging: _RebalanceReassignExchangeTask_ is skipped under some circumstances:
> {code:java|title=GridCachePartitionExchangeManager.ExchangeWorker#body0}
> else if (lastAffChangedVer.after(exchId.topologyVersion())) {
>     // There is a new exchange which should trigger rebalancing.
>     // This reassignment request can be skipped.
>     if (log.isInfoEnabled()) {
>         log.info("Partitions reassignment request skipped due to affinity was already changed" +
>             " [reassignTopVer=" + exchId.topologyVersion() +
>             ", lastAffChangedTopVer=" + lastAffChangedVer +
>             ']');
>     }
> {code}
> There are cases when the current rebalance is not canceled by a PME that updates only the minor topology version, and the rebalance then triggers _RebalanceReassignExchangeTask_ because of partitions missed on the supplier. That _RebalanceReassignExchangeTask_ is then skipped, since the current minor version is higher than the rebalance topology version. As a result, the demander's copies of the missed partitions remain in the MOVING state until the next PME that triggers another rebalance.

--
This message was sent by Atlassian Jira
(v8.20.1#820001)
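The skip condition in the `ExchangeWorker#body0` excerpt above can be illustrated with a small, self-contained model. This is a hypothetical sketch, not Ignite code: `TopVer` stands in for Ignite's topology version and is assumed to be a (major, minor) pair compared lexicographically, which is how the `after()` call in the excerpt is assumed to order versions; `reassignSkipped` and `partitionsStuckInMoving` are illustrative names. The model shows why a reassignment requested for the rebalance version (3, 0) is dropped once a PME that bumps only the minor version has moved `lastAffChangedVer` to (3, 1), even though that PME did not restart rebalancing.

```java
/**
 * Hypothetical, simplified model of the reassignment skip condition.
 * Not Ignite code: TopVer is an assumed stand-in for the real topology version class.
 */
public class ReassignSkipModel {

    /** Assumed topology version: major bumps on node join/leave, minor on other PMEs. */
    record TopVer(long major, int minor) implements Comparable<TopVer> {
        @Override
        public int compareTo(TopVer o) {
            // Lexicographic order: compare major first, then minor.
            int c = Long.compare(major, o.major);
            return c != 0 ? c : Integer.compare(minor, o.minor);
        }

        /** True if this version is strictly newer than {@code o}. */
        boolean after(TopVer o) {
            return compareTo(o) > 0;
        }
    }

    /** Mirrors the quoted branch: skip if affinity changed after the reassign version. */
    static boolean reassignSkipped(TopVer lastAffChangedVer, TopVer reassignTopVer) {
        return lastAffChangedVer.after(reassignTopVer);
    }

    /**
     * Missed partitions stay MOVING when the reassignment is skipped but no newer
     * exchange actually restarted rebalancing (the minor-only PME case).
     */
    static boolean partitionsStuckInMoving(TopVer lastAffChangedVer, TopVer reassignTopVer,
                                           boolean newerExchangeRestartedRebalance) {
        return reassignSkipped(lastAffChangedVer, reassignTopVer) && !newerExchangeRestartedRebalance;
    }

    public static void main(String[] args) {
        TopVer rebalanceVer = new TopVer(3, 0);   // version the rebalance started on
        TopVer lastAffChanged = new TopVer(3, 1); // later PME that bumped only the minor version
        System.out.println(reassignSkipped(lastAffChanged, rebalanceVer));
        System.out.println(partitionsStuckInMoving(lastAffChanged, rebalanceVer, false));
    }
}
```

Running `main` prints `true` twice: the request is skipped, and because the minor-only PME did not cancel and restart the rebalance, nothing else picks up the missed partitions until another PME triggers a new rebalance.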