I would also like to provide more use cases showing how BLT is supposed to work (let me call it this way until we come up with a better name):
1. User creates a new BLT using WebConsole or another tool and "applies" it
to a brand-new cluster.

2. User starts up a brand-new cluster with the desired number of nodes and
activates it. At the moment of activation a BLT is created with all
non-daemon server nodes present in the cluster.

3. User starts up a cluster with a previously prepared BLT -> when the set
of nodes in the cluster matches the BLT, the cluster gets activated
automatically.

4. User has an up-and-running active cluster and starts a few more nodes.
They join the cluster, but no partitions are assigned to them. User
recreates the BLT on the new cluster topology -> partitions are assigned to
the new nodes.

5. User takes nodes out of the cluster (e.g. for maintenance purposes): no
rebalancing happens until the user recreates the BLT on the new cluster
topology.

6. If some parameter reaches a critical level (e.g. the number of backups
for a partition is too low), the coordinator automatically recreates the
BLT and thus triggers rebalancing.

I hope these use cases will help to clarify the purposes of the proposed
feature. (A rough sketch of what this could look like from user code is at
the end of this message.)

On Thu, Aug 3, 2017 at 4:08 PM, Alexey Goncharuk
<alexey.goncha...@gmail.com> wrote:

> My understanding of Baseline Topology is the set of nodes which are
> *expected* to be in the cluster.
> Let me go a little bit further, because BT (or whatever name we choose)
> may and will solve more issues than just auto-activation:
>
> 1) More graceful control over rebalancing than just a rebalance delay. If
> a server is shut down for maintenance and there are enough backup nodes in
> the cluster, there is no need to rebalance.
> 2) A guarantee that there will be no conflicting key-value mappings due to
> incorrect cluster activation. For example, consider a scenario where there
> was a cluster of 10 nodes; the cluster was shut down, the first 5 nodes
> were started and activated and some updates were made, those 5 nodes were
> shut down, the other 5 nodes were started and activated and more updates
> were made, and then the first 5 nodes were started again. Currently, there
> is no way to determine that there was an incompatible topology change,
> which leads to data inconsistency.
> 3) When a cluster is shutting down node by node, we must track the node
> which has 'seen' a partition last and not activate the cluster until all
> nodes are present. Otherwise, again, we may activate too early and see
> outdated values.
>
> I do not want to add any 'faster' hacks here because they will only make
> the issue above appear more likely. Besides, BT should be available in 2.2
> anyway, so there is no need to rush with hacks.
>
> --AG
>
> 2017-08-03 15:09 GMT+03:00 Yakov Zhdanov <yzhda...@apache.org>:
>
> > >The obvious connotation of "minimal set" is a set that cannot be
> > >decreased.
> >
> > >But let's consider the following case: a user has a cluster of 50 nodes
> > >and decides to switch off 3 nodes for maintenance for a while. The user
> > >just does it and then recreates this "minimal node set" with only 47
> > >nodes.
> >
> > >So the initial minimal node set was decreased - something
> > >counter-intuitive to me that may cause confusion as well.
> >
> > That was my point. If I have 50 nodes and 3 backups, I can restart on
> > 48, 49 and 50 nodes without data loss. In the case of 48 and 49, after
> > the cluster gets activated, the missing backups are assigned and
> > rebalancing starts.
> >
> > --Yakov
> >
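
P.S. To make the use cases above a bit more concrete, here is a minimal
sketch of what this could look like from user code. The activation call is
the one that already exists; the setBaselineTopology(...) operation is a
hypothetical placeholder for illustration only - we have not designed any
actual API yet.

    import java.util.Collection;

    import org.apache.ignite.Ignite;
    import org.apache.ignite.Ignition;
    import org.apache.ignite.cluster.ClusterNode;

    public class BltSketch {
        public static void main(String[] args) {
            Ignite ignite = Ignition.start("ignite-config.xml");

            // Use case 2: activate a brand-new cluster. At the moment of
            // activation the BLT would be captured from all non-daemon
            // server nodes present in the cluster.
            ignite.active(true);

            // Use case 4: new nodes have joined but hold no partitions
            // until the user recreates the BLT on the new topology.
            Collection<ClusterNode> servers =
                ignite.cluster().forServers().nodes();

            setBaselineTopology(ignite, servers); // hypothetical operation

            // Use case 5: stopping a node for maintenance does NOT shrink
            // the BLT, so no rebalancing starts until the user recreates
            // the BLT on the new topology.
        }

        // Placeholder for the proposed "recreate BLT" operation; no such
        // method exists today, and the final API is still to be designed.
        static void setBaselineTopology(Ignite ignite,
            Collection<ClusterNode> nodes) {
            throw new UnsupportedOperationException("proposed API");
        }
    }

The key design point the sketch tries to show: joining or stopping nodes
never changes partition assignments by itself; only an explicit (or, in use
case 6, coordinator-driven) recreation of the BLT does.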