Hello Ignite developers,

I would like to start a discussion about the design of an important improvement: automatic activation of a cluster with the durable store turned on [1]. It will also help us solve an issue with data divergence (e.g. this may happen when half of the cluster goes down and updates are applied to the other half, and then the online and offline parts of the cluster switch).

The idea is to introduce a *BaselineTopology* concept. Simply put, it is just a collection of nodes that are expected to be in the cluster. The user establishes a BaselineTopology (BT) on a cluster of the desired configuration (primarily the desired number of nodes), and after that this topology is persisted.

Once established, the BT represents a "frozen state" of the topology, which means that the affinity function uses it instead of the actual topology. As a result, no rebalancing can happen until the BT is reestablished.

With a BT established it is easy to implement automatic activation: when the nodes of a starting cluster join it one by one, a special listener may trigger cluster activation when the composition of nodes matches the one described by the BaselineTopology.

The API for BaselineTopology manipulation may look like this:

    Ignite::activation::establishBaselineTopology();
    Ignite::activation::establishBaselineTopology(BaselineTopology bltTop);

Both methods will establish the BT and activate the cluster once it is established.

The first one allows the user to establish the BT using the current topology. If any changes happen to the topology during the establishing process, the user will be notified and allowed to proceed or abort the procedure.

The second method allows the use of monitoring and management tools like WebConsole, where the user can prepare a list of nodes, create a BT from them, and send the cluster a command to finally establish it.

From a high level, the BaselineTopology entity contains only a collection of nodes:

    BaselineTopology {
        Collection<TopologyNode> nodes;
    }

*TopologyNode* here contains information about a node: its consistent id and the set of user attributes used to calculate the affinity function.

In order to support data divergence prevention, some kind of versioning must be added to the BT entity so that the cluster can refuse a joining node, but we can clarify this later.

Please provide your feedback and thoughts, and ask any questions about the suggested improvement.

Thanks,
Sergey.

[1] https://issues.apache.org/jira/browse/IGNITE-5851
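
P.S. To make the auto-activation idea a bit more concrete, here is a minimal sketch of how such a listener could behave. Everything in it is hypothetical and is not existing Ignite API: the class names, the use of plain strings as consistent ids, and the rule that activation requires an exact match between the online nodes and the baseline are all assumptions made for illustration. User attributes and BT versioning are left out.

```java
import java.util.Arrays;
import java.util.Collection;
import java.util.Collections;
import java.util.HashSet;
import java.util.Set;

/** Hypothetical sketch only; none of these types exist in Ignite. */
public class BaselineSketch {
    /** Frozen set of consistent ids of the nodes expected in the cluster. */
    static final class BaselineTopology {
        private final Set<String> consistentIds;

        BaselineTopology(Collection<String> consistentIds) {
            this.consistentIds = Collections.unmodifiableSet(new HashSet<>(consistentIds));
        }

        /** Assumption: "matches" means the online set equals the baseline exactly. */
        boolean matches(Set<String> onlineIds) {
            return consistentIds.equals(onlineIds);
        }
    }

    /** Tracks joining nodes and flips to active once they match the baseline. */
    static final class AutoActivationListener {
        private final BaselineTopology baseline;
        private final Set<String> online = new HashSet<>();
        private boolean active;

        AutoActivationListener(BaselineTopology baseline) {
            this.baseline = baseline;
        }

        void onNodeJoined(String consistentId) {
            online.add(consistentId);

            // A real implementation would trigger cluster activation here.
            if (!active && baseline.matches(online))
                active = true;
        }

        boolean isActive() {
            return active;
        }
    }

    public static void main(String[] args) {
        AutoActivationListener lsnr = new AutoActivationListener(
            new BaselineTopology(Arrays.asList("node-A", "node-B", "node-C")));

        lsnr.onNodeJoined("node-A");
        lsnr.onNodeJoined("node-B");
        System.out.println(lsnr.isActive()); // false: node-C has not joined yet

        lsnr.onNodeJoined("node-C");
        System.out.println(lsnr.isActive()); // true: online nodes match the baseline
    }
}
```

Whether a node outside the baseline should block activation, as it does in this sketch, is one of the points open for discussion.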