[ 
https://issues.apache.org/jira/browse/IGNITE-20603?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17789996#comment-17789996
 ] 

Vladislav Pyatkov commented on IGNITE-20603:
--------------------------------------------

[~maliev] Thank you for your contribution.
Merged 95107c31be013298108deeeb1322874a9952a40a

> Restore logical topology change event on a node restart
> -------------------------------------------------------
>
>                 Key: IGNITE-20603
>                 URL: https://issues.apache.org/jira/browse/IGNITE-20603
>             Project: Ignite
>          Issue Type: Bug
>            Reporter: Mirza Aliev
>            Assignee: Mirza Aliev
>            Priority: Major
>              Labels: ignite-3
>          Time Spent: 1h 10m
>  Remaining Estimate: 0h
>
> h3. *Motivation*
> It is possible that some events were propagated to {{ms.logicalTopology}},
> but a restart happened while we were updating topologyAugmentationMap and
> other states in {{DistributionZoneManager#createMetastorageTopologyListener}}.
> In that case, either the augmentation that should have been added to
> {{zone.topologyAugmentationMap}} was not added, or nodesAttributes was not
> propagated to MS, and we need to recover this information.
> h3. *Definition of done*
> On a node restart, all states that were going to be updated during a watch
> event in {{DistributionZoneManager#createMetastorageTopologyListener}} must
> be recovered.
> h3. *Implementation notes*
> (outdated, see UPD)
> For every zone, compare {{MS.local.logicalTopology.revision}} with 
> max(maxScUpFromMap, maxScDownFromMap). If {{logicalTopology.revision}} is 
> greater than max(maxScUpFromMap, maxScDownFromMap), that means that some 
> topology changes haven't been propagated to topologyAugmentationMap before 
> restart and appropriate timers haven't been scheduled. To fill the gap in 
> topologyAugmentationMap, compare {{MS.local.logicalTopology}} with 
> {{lastSeenLogicalTopology}} and enhance topologyAugmentationMap with the 
> nodes that did not have time to be propagated to topologyAugmentationMap 
> before restart. {{lastSeenTopology}} is calculated as follows: we read
> {{MS.local.dataNodes}}, take max(scaleUpTriggerKey, scaleDownTriggerKey),
> and retrieve all additions and removals of nodes from the
> topologyAugmentationMap using max(scaleUpTriggerKey, scaleDownTriggerKey)
> as the left bound. We then apply these changes to the map of node
> counters from {{MS.local.dataNodes}} and keep only the nodes with positive
> counters. This is the lastSeenTopology. Comparing it with
> {{MS.local.logicalTopology}} will tell us which nodes were not added or 
> removed and weren't propagated to topologyAugmentationMap before restart. We 
> take these differences and add them to the topologyAugmentationMap. As a 
> revision (key for topologyAugmentationMap) take 
> {{MS.local.logicalTopology.revision}}. It is safe to take this revision, 
> because if some node was added to the {{ms.topology}} after immediate data 
> nodes recalculation, this added node must restore this immediate data nodes' 
> recalculation intent.
> UPD: The implementation notes above are outdated; we implemented a slightly
> different approach. We now save the last handled topology to MS. On restart
> we restore global states from the local metastorage and check whether
> the current ms.logicalTopology differs from the one that was handled in
> DistributionZoneManager#createMetastorageTopologyListener (we compare the
> revisions of these events). If it differs, we repeat the logic from
> DistributionZoneManager#createMetastorageTopologyListener with the new
> logical topology from ms.logicalTopology.
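
The recovery check described in the UPD note can be illustrated with a minimal Java sketch. It is not the merged code: TopologyRecoverySketch, TopologySnapshot, ZoneState, onTopologyChanged and recoverOnRestart are hypothetical names, node names are plain strings, and the sketch assumes metastorage revisions only grow, so a larger revision means an unhandled event.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

/** Hypothetical sketch of the restart check; none of these types are Ignite APIs. */
class TopologyRecoverySketch {
    /** A logical topology together with the metastorage revision that produced it. */
    static final class TopologySnapshot {
        final long revision;
        final Set<String> nodes;

        TopologySnapshot(long revision, Set<String> nodes) {
            this.revision = revision;
            this.nodes = nodes;
        }
    }

    /** Stand-in for the state that the topology listener maintains. */
    static final class ZoneState {
        /** Last topology the listener fully handled (saved to MS in the real approach). */
        TopologySnapshot lastHandled;

        /** Revisions handled so far, kept only to make the sketch observable. */
        final List<Long> handledRevisions = new ArrayList<>();

        ZoneState(TopologySnapshot lastHandled) {
            this.lastHandled = lastHandled;
        }

        /** Stand-in for the listener logic: update states, then record the handled topology. */
        void onTopologyChanged(TopologySnapshot current) {
            handledRevisions.add(current.revision);
            lastHandled = current;
        }

        /**
         * On restart, repeat the listener logic if the current logical topology
         * has a newer revision than the last handled one (assumption: revisions
         * are monotonic). Returns true when a replay happened.
         */
        boolean recoverOnRestart(TopologySnapshot current) {
            if (current.revision > lastHandled.revision) {
                onTopologyChanged(current);
                return true;
            }
            return false;
        }
    }
}
```

Because the handled topology is recorded only after the listener logic completes, a restart in the middle of handling leaves the older revision saved, and the comparison triggers a replay.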



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
