Vyacheslav Koptilin created IGNITE-13193:
--------------------------------------------
             Summary: Implement fallback to full partition rebalancing in case historical supplier failed to read all necessary data updates from WAL
                 Key: IGNITE-13193
                 URL: https://issues.apache.org/jira/browse/IGNITE-13193
             Project: Ignite
          Issue Type: Improvement
    Affects Versions: 2.8.1
            Reporter: Vyacheslav Koptilin
            Assignee: Vyacheslav Koptilin


Historical rebalance may fail for several reasons:
1) The WAL on the supplier node is corrupted - in the current implementation, the supplier triggers a failure handler.
2) After iterating over the WAL, the demander node has not received all the updates needed to make the MOVING partition up-to-date (the resulting update counter does not converge with the expected update counter of the OWNING partition) - in the current implementation, the demander silently ignores the missing updates.

Such behavior negatively affects the stability of the cluster: an inappropriate state of the historical WAL is not a reason to fail a supplier node.
A more appropriate way to handle this scenario is:
 - Either try to rebalance the partition historically from another supplier
 - Or use full partition rebalance for the problematic partition

Once a supplier fails to provide data from part of the WAL, the corresponding sequence of checkpoints should be marked as inapplicable for historical rebalance in order to prevent further errors.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
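
The handling proposed in the description can be sketched roughly as follows. This is only an illustrative model in Python, not Ignite code: `Supplier`, `Mode`, `rebalance_partition` and the representation of a WAL as a set of readable update counters are all hypothetical names and simplifications introduced here. The sketch tries each supplier historically, marks a supplier as inapplicable when it cannot cover the whole counter range, and falls back to full rebalance when no supplier can.

```python
from dataclasses import dataclass
from enum import Enum
from typing import List, Optional, Set, Tuple


class Mode(Enum):
    HISTORICAL = "historical"
    FULL = "full"


@dataclass
class Supplier:
    """Hypothetical model of one supplier node's WAL for a single partition."""
    name: str
    # Update counters this supplier can still read from its WAL (assumption:
    # a corrupted or truncated WAL simply shows up as missing counters).
    wal_counters: Set[int]
    # Set once the supplier has failed to serve a historical range, so that
    # later attempts do not retry it.
    inapplicable: bool = False


def rebalance_partition(
    demander_cntr: int, owner_cntr: int, suppliers: List[Supplier]
) -> Tuple[Mode, Optional[str]]:
    """Pick how to bring a MOVING partition up to the OWNING update counter."""
    # Every update in (demander_cntr, owner_cntr] must be delivered.
    needed = set(range(demander_cntr + 1, owner_cntr + 1))

    for supplier in suppliers:
        if supplier.inapplicable:
            continue
        received = needed & supplier.wal_counters
        if received == needed:
            # Counters converge: historical rebalance succeeded.
            return Mode.HISTORICAL, supplier.name
        # Do not fail the node and do not ignore the gap: remember that this
        # supplier's history is unusable and try the next one.
        supplier.inapplicable = True

    # No supplier could provide the full history: use full rebalance.
    return Mode.FULL, None
```

For example, with a demander at counter 5 and an owner at counter 10, a supplier whose WAL only holds updates 6 and 7 is marked inapplicable, and the next supplier holding 6 through 10 serves the partition historically. If no such supplier exists, the function returns `Mode.FULL`.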