On Thu, 2017-11-30 at 07:55 +0100, Ulrich Windl wrote:
> > > > Kristoffer Gronlund <kgronl...@suse.com> wrote:
> > > Adam Spiers <aspi...@suse.com> writes:
> > > 
> > > > - The whole cluster is shut down cleanly.
> > > > 
> > > > - The whole cluster is then started up again.  (Side question:
> > > >   what happens if the last node to shut down is not the first
> > > >   to start up?  How will the cluster ensure it has the most
> > > >   recent version of the CIB?  Without that, how would it know
> > > >   whether the last man standing was shut down cleanly or not?)
> > > 
> > > This is my opinion, I don't really know what the "official"
> > > pacemaker stance is: There is no such thing as shutting down a
> > > cluster cleanly. A cluster is a process stretching over multiple
> > > nodes - if they all shut down, the process is gone. When you
> > > start up again, you effectively have a completely new cluster.
> > 
> > Sorry, I don't follow you at all here.  When you start the cluster
> > up again, the cluster config from before the shutdown is still
> > there.  That's very far from being a completely new cluster :-)
> 
> The problem is that you cannot "start the cluster" in pacemaker; you
> can only "start nodes", and the nodes come up one by one. This is in
> contrast (as I had said) to HP Service Guard, where there is a
> "cluster formation timeout": the nodes wait for the specified time
> for the cluster to "form", and then the cluster starts as a whole.
> Of course, that only applies if the whole cluster was down, not if a
> single node was down.

I'm not sure what that would specifically entail, but I'm guessing we
have some of the pieces already:

- Corosync has a wait_for_all option if you want the cluster to be
  unable to have quorum at start-up until every node has joined. I
  don't think you can set a timeout that cancels it, though.

- Pacemaker will wait dc-deadtime for the first DC election to
  complete (if I understand it correctly ...).

- Higher-level tools can start or stop all nodes together (e.g. pcs
  has pcs cluster start/stop --all).
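As a rough, untested sketch of those pieces (the timeout value below
is just a made-up placeholder - adjust for your own cluster):

    # corosync.conf: with votequorum, refuse quorum at start-up
    # until every node has been seen at least once
    quorum {
        provider: corosync_votequorum
        wait_for_all: 1
    }

    # Pacemaker cluster property: how long to wait at start-up for
    # other nodes to show up before the first DC election completes
    pcs property set dc-deadtime=2min

    # Higher-level tools: start or stop every configured node together
    pcs cluster start --all
    pcs cluster stop --all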
> > > When starting up, how is the cluster, at any point, to know if
> > > the cluster it has knowledge of is the "latest" cluster?
> > 
> > That was exactly my question.
> > 
> > > The next node could have a newer version of the CIB which adds
> > > yet more nodes to the cluster.
> > 
> > Yes, exactly.  If the first node to start up was not the last man
> > standing, the CIB history is effectively being forked.  So how is
> > this issue avoided?
> 
> Quorum? "Cluster formation delay"?
> 
> > > The only way to bring up a cluster from being completely stopped
> > > is to treat it as creating a completely new cluster. The first
> > > node to start "creates" the cluster and later nodes join that
> > > cluster.
> > 
> > That's ignoring the cluster config, which persists even when the
> > cluster's down.
> > 
> > But to be clear, you picked a small side question from my original
> > post and answered that. The main questions I had were about
> > startup fencing :-)
-- 
Ken Gaillot <kgail...@redhat.com>
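P.S. Regarding "how does a node know whether its CIB is the latest":
each CIB carries a version tuple in the attributes of its top-level
<cib> element - admin_epoch, epoch, and num_updates - and, as I
understand it, when nodes join, those are compared in that order and
the highest version wins. You can see a node's current tuple with
something like:

    # the opening <cib> tag shows the version attributes
    cibadmin --query | head -n 1
    # e.g. <cib ... admin_epoch="0" epoch="123" num_updates="4" ...>

Of course, that only helps among the nodes that are actually present;
it can't tell anyone about a newer CIB sitting on a node that hasn't
come up yet, which is the fork problem discussed above.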