I would also like to provide more use cases of how BLT is supposed to work
(let me call it that until we come up with a better name):

   1. The user creates a new BLT using WebConsole or another tool and
   "applies" it to a brand-new cluster.

   2. The user starts up a brand-new cluster with the desired number of
   nodes and activates it. At the moment of activation, a BLT is created
   containing all server non-daemon nodes present in the cluster.

   3. The user starts up a cluster with a previously prepared BLT -> when
   the set of nodes in the cluster matches the BLT, the cluster is
   automatically activated.

   4. The user has an up-and-running active cluster and starts a few more
   nodes. They join the cluster, but no partitions are assigned to them.
   The user recreates the BLT on the new cluster topology -> partitions are
   assigned to the new nodes.

   5. The user takes nodes out of the cluster (e.g. for maintenance): no
   rebalancing happens until the user recreates the BLT on the new cluster
   topology.

   6. If some parameters reach critical levels (e.g. the number of backups
   for a partition is too low), the coordinator automatically recreates the
   BLT and thus triggers rebalancing.


I hope these use cases help clarify the purpose of the proposed feature.

On Thu, Aug 3, 2017 at 4:08 PM, Alexey Goncharuk <alexey.goncha...@gmail.com> wrote:

> My understanding is that Baseline Topology is the set of nodes that are
> *expected* to be in the cluster.
> Let me go a little further, because BT (or whatever name we choose) can
> and will solve more issues than just auto-activation:
>
> 1) More graceful control over rebalancing than a rebalance delay alone. If a
> server is shut down for maintenance and there are enough backup nodes in
> the cluster, there is no need to rebalance.
> 2) A guarantee that there will be no conflicting key-value mappings due to
> incorrect cluster activation. For example, consider a scenario where there
> is a cluster of 10 nodes: the cluster is shut down; the first 5 nodes are
> started, activated, and receive some updates; those 5 nodes are shut down;
> the other 5 nodes are started, activated, and receive some updates; then the
> first 5 nodes are started again. Currently, there is no way to detect that
> an incompatible topology change occurred, which leads to data inconsistency.
> 3) When a cluster is shut down node by node, we must track which node was
> the last to 'see' each partition and not activate the cluster until all
> nodes are present. Otherwise, again, we may activate too early and see
> outdated values.
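
One way to provide the guarantees in points 2 and 3 can be sketched as follows. This is an assumption for illustration, not a mechanism described in this thread: every node keeps a history of baseline node sets, activation waits for the whole baseline, and a rejoining node is accepted only if its history and the cluster's history do not diverge. `History`, `can_activate`, `compatible`, and `extend` are hypothetical names.

```python
from typing import FrozenSet, Tuple

# Hypothetical: every activation that changes the baseline appends the new
# node set to a history that each participating node persists locally.
History = Tuple[FrozenSet[str], ...]


def can_activate(baseline: FrozenSet[str], online: FrozenSet[str]) -> bool:
    # Point 3: activate only when every baseline node is present, so the
    # node that last 'saw' a partition cannot be missing.
    return baseline <= online


def compatible(a: History, b: History) -> bool:
    # Point 2: two histories are compatible only if one is a prefix of the
    # other; diverging branches mean two independent activations accepted
    # updates, so their key-value mappings may conflict.
    n = min(len(a), len(b))
    return a[:n] == b[:n]


def extend(history: History, new_baseline: FrozenSet[str]) -> History:
    # Record a baseline change made at activation time.
    return history + (new_baseline,)
```

In the 10-node scenario from point 2, the first 5 nodes and the other 5 nodes each extend the shared history with a different baseline, so `compatible` returns `False` when the first 5 nodes come back.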
>
> I do not want to add any 'faster' hacks here because they will only make
> the issues above more likely. Besides, BT should be available in 2.2
> anyway, so there is no need to rush with hacks.
>
> --AG
>
> 2017-08-03 15:09 GMT+03:00 Yakov Zhdanov <yzhda...@apache.org>:
>
> > > The obvious connotation of "minimal set" is a set that cannot be decreased.
> >
> > > But let's consider the following case: a user has a cluster of 50 nodes
> > > and decides to switch off 3 nodes for maintenance for a while. OK, the
> > > user just does it and then recreates this "minimal node set" with only
> > > 47 nodes.
> >
> > > So the initial minimal node set was decreased, which is counter-intuitive
> > > to me and may cause confusion as well.
> >
> > That was my point. If I have 50 nodes and 3 backups, I can restart on 48,
> > 49, or 50 nodes without data loss. In the case of 48 or 49 nodes, once the
> > cluster is activated, the missing backups are assigned and rebalancing
> > starts.
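
Yakov's arithmetic can be checked with a small model. This is a sketch under simplifying assumptions: partitions are placed round-robin on consecutive nodes rather than by Ignite's affinity function, and `assign`, `lost`, and `needs_backups` are hypothetical helpers. With 3 backups each partition has 4 copies on distinct nodes, so no partition can lose all of its copies while fewer than 4 nodes are offline.

```python
from typing import Dict, List, Set


def assign(nodes: List[str], partitions: int, backups: int) -> Dict[int, List[str]]:
    # Each partition gets 1 primary + `backups` copies on distinct nodes.
    # Consecutive round-robin placement is a simplification of a real
    # affinity function.
    n = len(nodes)
    return {p: [nodes[(p + i) % n] for i in range(backups + 1)]
            for p in range(partitions)}


def lost(assignment: Dict[int, List[str]], offline: Set[str]) -> List[int]:
    # A partition's data is lost only if every one of its copies is offline.
    return [p for p, owners in assignment.items() if set(owners) <= offline]


def needs_backups(assignment: Dict[int, List[str]], offline: Set[str]) -> List[int]:
    # Partitions that survive but lost some copies: after activation their
    # missing backups are reassigned, which is what starts rebalancing.
    return [p for p, owners in assignment.items()
            if set(owners) & offline and not set(owners) <= offline]


nodes = [f"n{i}" for i in range(50)]
table = assign(nodes, partitions=1024, backups=3)
for restart_on in (48, 49, 50):
    offline = set(nodes[restart_on:])
    print(restart_on, len(lost(table, offline)), len(needs_backups(table, offline)))
```

Restarting on 48 or 49 nodes loses no partition but leaves some of them short of backups; restarting on all 50 needs no rebalancing at all.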
> >
> > --Yakov
> >
>
