> * Based on some sort of policies when the actual cluster topology differs too much from the baseline or when some critical condition happens (e.g., when there are no more backups for a partition)
Good point, Alex! I would even go further. If the cluster is active and under load while nodes keep joining and leaving, then we can have several BTs that it is possible to restart on; the main condition is that the set holds up-to-date copies of all the data partitions. E.g., if you have 4 servers and 3 backups, most probably you have all the data with 2, 3 and, of course, 4 nodes. Makes sense? I would also think of a different name. Topology (for me) also implies the version, but here only the nodes carrying data are important. How about "restart nodes set"? --Yakov
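
To make the "all up-to-date data partitions" condition concrete, here is a minimal sketch in plain Python. It is not Ignite code: the node names, the `owners_by_partition` map (partition id to the set of nodes holding an up-to-date copy), and both function names are hypothetical and only illustrate the check being discussed.

```python
from itertools import combinations


def covers_all_partitions(node_set, owners_by_partition):
    """Return True if every partition has an up-to-date copy on some node in node_set."""
    nodes = set(node_set)
    # A non-empty intersection means at least one owner of the partition is present.
    return all(nodes & owners for owners in owners_by_partition.values())


def restart_node_sets(all_nodes, owners_by_partition):
    """Enumerate every non-empty subset of nodes the cluster could restart on."""
    ordered = sorted(all_nodes)
    return [
        set(candidate)
        for size in range(1, len(ordered) + 1)
        for candidate in combinations(ordered, size)
        if covers_all_partitions(candidate, owners_by_partition)
    ]


servers = {"A", "B", "C", "D"}

# Hypothetical assignment: 4 servers, 3 backups, so each partition lives on all 4 nodes.
three_backups = {p: set(servers) for p in range(4)}

# Hypothetical assignment: 4 servers, 1 backup, owners arranged in a ring.
one_backup = {0: {"A", "B"}, 1: {"B", "C"}, 2: {"C", "D"}, 3: {"D", "A"}}

print(covers_all_partitions({"A", "B"}, three_backups))  # any 2 nodes hold everything
print(covers_all_partitions({"A", "B"}, one_backup))     # partition 2 is missing
print(covers_all_partitions({"A", "C"}, one_backup))     # every partition has an owner
```

Under these assumptions, several different node sets qualify at the same time, which is why a single fixed baseline may be stricter than necessary. Enumerating all subsets is exponential and only meant for illustration; a real check would test one candidate set at a time.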