[ https://issues.apache.org/jira/browse/IGNITE-20187?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Alexander Lapin updated IGNITE-20187:
-------------------------------------
Description:

h3. Motivation

Prior to the implementation of meta storage compaction and the related node restart updates, a node restored its volatile assignments state through ms.watches starting from APPLIED_REVISION + 1. In other words, after a restart the node was notified of the missed state through *the events*. That is no longer true: the new logic assumes that the node registers its ms.watch starting from APPLIED_REVISION + X + 1 and manually reads the local meta storage state for APPLIED_REVISION + X, along with the related processing. Implementing that process is the essence of this ticket.

h3. Definition of Done

Within the node restart process, TableManager (or a similar component) should manually read the local assignments pending keys (reading assignments stable will be covered in a separate ticket) and schedule the corresponding rebalance.

h3. Implementation Notes

It's possible that the assignments.pending keys will be stale at the moment of processing, so to overcome this the following steps, common to the current rebalance flow, are proposed:
# Start all newly required nodes from {{partition.assignments.pending / partition.assignments.stable}}.
# After the nodes have started successfully, check whether the current node is the leader of the raft group (the leader response must be confirmed for the current term); if it is:
# Read the distributed {{partition.assignments.pending}} key, and if the retrieved revision is less than or equal to the one retrieved by the initial local read, run {{RaftGroupService#changePeersAsync(leaderTerm, peers)}}. {{RaftGroupService#changePeersAsync}} invocations from old terms must be skipped.

It also seems that https://github.com/apache/ignite-3/blob/main/modules/table/tech-notes/rebalance.md should be updated a bit.
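The leadership and revision checks from the Implementation Notes can be sketched as follows. This is a minimal, self-contained Java illustration under stated assumptions, not the Ignite 3 API: {{PendingEntry}}, {{shouldChangePeers}}, and the revision parameters are hypothetical stand-ins for the local and distributed reads of {{partition.assignments.pending}}; step 1 (starting the needed nodes) is assumed to have completed already.

```java
import java.util.List;

/**
 * Sketch of the catch-up decision on restart (hypothetical names, not Ignite API).
 */
public class CatchUpRebalanceSketch {
    /** A meta storage entry: the pending peer set plus the revision it was written at. */
    record PendingEntry(List<String> peers, long revision) {}

    /**
     * Decide whether changePeersAsync(leaderTerm, peers) should be issued.
     *
     * @param isLeader      whether this node is the raft group leader, confirmed for the current term
     * @param localRevision revision of assignments.pending as read locally during restart
     * @param distributed   assignments.pending as read from the distributed meta storage
     */
    static boolean shouldChangePeers(boolean isLeader, long localRevision, PendingEntry distributed) {
        if (!isLeader) {
            return false; // only the current-term leader drives the rebalance
        }
        // If the distributed revision is <= the locally observed one, the locally
        // scheduled rebalance is still the freshest intent and may proceed; otherwise
        // the key was overwritten after the local read and a newer watch event will
        // schedule the up-to-date rebalance instead.
        return distributed.revision() <= localRevision;
    }

    public static void main(String[] args) {
        PendingEntry pending = new PendingEntry(List.of("node-1", "node-2"), 42L);

        // Leader with an up-to-date local read (42 <= 42): proceed.
        System.out.println(shouldChangePeers(true, 42L, pending));
        // Stale local read (distributed revision 42 > local 41): skip.
        System.out.println(shouldChangePeers(true, 41L, pending));
    }
}
```

The term-based guard against stale {{changePeersAsync}} calls is then a matter of passing the confirmed leader term into the call, so that invocations carrying an old term are rejected.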
> Catch-up rebalance on node restart: assignments keys
> ----------------------------------------------------
>
>                 Key: IGNITE-20187
>                 URL: https://issues.apache.org/jira/browse/IGNITE-20187
>             Project: Ignite
>          Issue Type: Improvement
>            Reporter: Alexander Lapin
>            Priority: Major
>              Labels: ignite-3
--
This message was sent by Atlassian Jira
(v8.20.10#820010)