Re: [ClusterLabs] Antw: Re: questions about startup fencing
On 05/12/17 10:01 +0100, Tomas Jelinek wrote:
> The first attempt to fix the issue was to put nodes into standby mode with
> --lifetime=reboot:
> https://github.com/ClusterLabs/pcs/commit/ea6f37983191776fd46d90f22dc1432e0bfc0b91
>
> This didn't work for several reasons. One of them was that back then there
> was no reliable way to set standby mode with --lifetime=reboot for more than
> one node in a single step. (This may have been fixed in the meantime.) There
> were, however, other serious reasons for not putting the nodes into standby,
> as was explained by Andrew:
> - it [putting the nodes into standby first] means shutdown takes longer (no
>   node stops until all the resources stop)
> - it makes shutdown more complex (== more fragile), e.g...
> - it results in pcs waiting forever for resources to stop
> - if a stop fails and the cluster is configured to start at boot, then the
>   node will get fenced and happily run resources when it returns
>   (because all the nodes are up, so we still have quorum)

Isn't a one-off stop of a cluster, without actually disabling the cluster
software from running at boot, rather self-contradictory? And besides, isn't
this resurrection scenario also possible with the current parallel (and hence
race-prone) stop in such a case anyway?

> - it only potentially benefits resources that have no (or very few)
>   dependants and can stop quicker than it takes pcs to get through its
>   "initiate parallel shutdown" loop (which should be rather fast, since
>   there is no ssh connection setup overhead)
>
> So we ended up with just stopping pacemaker in parallel:
> https://github.com/ClusterLabs/pcs/commit/1ab2dd1b13839df7e5e9809cde25ac1dbae42c3d

--
Jan (Poki)

___
Users mailing list: Users@clusterlabs.org
http://lists.clusterlabs.org/mailman/listinfo/users

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org
Re: [ClusterLabs] Antw: Re: questions about startup fencing
On Tue, 5 Dec 2017 10:05:03 +0100 Tomas Jelinek wrote: > Dne 4.12.2017 v 17:21 Jehan-Guillaume de Rorthais napsal(a): > > On Mon, 4 Dec 2017 16:50:47 +0100 > > Tomas Jelinek wrote: > > > >> Dne 4.12.2017 v 14:21 Jehan-Guillaume de Rorthais napsal(a): > >>> On Mon, 4 Dec 2017 12:31:06 +0100 > >>> Tomas Jelinek wrote: > >>> > Dne 4.12.2017 v 10:36 Jehan-Guillaume de Rorthais napsal(a): > > On Fri, 01 Dec 2017 16:34:08 -0600 > > Ken Gaillot wrote: > > > >> On Thu, 2017-11-30 at 07:55 +0100, Ulrich Windl wrote: > >>> > >>> > Kristoffer Gronlund wrote: > > Adam Spiers writes: > > > >> - The whole cluster is shut down cleanly. > >> > >> - The whole cluster is then started up again. (Side question: > >> what > >> happens if the last node to shut down is not the first to > >> start up? > >> How will the cluster ensure it has the most recent version of > >> the > >> CIB? Without that, how would it know whether the last man > >> standing > >> was shut down cleanly or not?) > > > > This is my opinion, I don't really know what the "official" > > pacemaker > > stance is: There is no such thing as shutting down a cluster > > cleanly. A > > cluster is a process stretching over multiple nodes - if they all > > shut > > down, the process is gone. When you start up again, you > > effectively have > > a completely new cluster. > > Sorry, I don't follow you at all here. When you start the cluster > up > again, the cluster config from before the shutdown is still there. > That's very far from being a completely new cluster :-) > >>> > >>> The problem is you cannot "start the cluster" in pacemaker; you can > >>> only "start nodes". The nodes will come up one by one. As opposed (as > >>> I had said) to HP Sertvice Guard, where there is a "cluster formation > >>> timeout". That is, the nodes wait for the specified time for the > >>> cluster to "form". Then the cluster starts as a whole. Of course that > >>> only applies if the whole cluster was down, not if a single node was > >>> down. 
> >> > >> I'm not sure what that would specifically entail, but I'm guessing we > >> have some of the pieces already: > >> > >> - Corosync has a wait_for_all option if you want the cluster to be > >> unable to have quorum at start-up until every node has joined. I don't > >> think you can set a timeout that cancels it, though. > >> > >> - Pacemaker will wait dc-deadtime for the first DC election to > >> complete. (if I understand it correctly ...) > >> > >> - Higher-level tools can start or stop all nodes together (e.g. pcs has > >> pcs cluster start/stop --all). > > > > Based on this discussion, I have some questions about pcs: > > > > * how is it shutting down the cluster when issuing "pcs cluster stop > > --all"? > > First, it sends a request to each node to stop pacemaker. The requests > are sent in parallel which prevents resources from being moved from node > to node. Once pacemaker stops on all nodes, corosync is stopped on all > nodes in the same manner. > >>> > >>> What if for some external reasons one node is slower (load, network, > >>> whatever) than the others and start reacting ? Sending queries in parallel > >>> doesn't feels safe enough in regard with all the race conditions that can > >>> occurs in the same time. > >>> > >>> Am I missing something ? > >>> > >> > >> If a node gets the request later than others, some resources may be > >> moved to it before it starts shutting down pacemaker as well. Pcs waits > >> for all nodes to shutdown pacemaker before it moves to shutting down > >> corosync. This way, quorum is maintained the whole time pacemaker is > >> shutting down and therefore no services are blocked from stopping due to > >> lack of quorum. > > > > OK, so if admins or RA expect to start in, the same conditions the cluster > > was shut downed, we have to take care of the shutdown ourselves by hands. 
> > Considering disabling the resources before shutting down might be the best
> > option in this situation, as the CRM will take care of switching things
> > off correctly in a proper transition.
>
> My understanding is that pacemaker takes care of switching things off
> correctly in a proper transition on its shutdown. So there should be no
> extra care needed. Pacemaker developers, however, need to confirm that.

Sure, but then the resources would move away from the node if some other
node(s) (with appropriate constraints) are up when the transition is computed.
If you are looking at master resources, this could raise a lot of maste
Re: [ClusterLabs] Antw: Re: questions about startup fencing
On 4.12.2017 23:17, Ken Gaillot wrote:
> On Mon, 2017-12-04 at 22:08 +0300, Andrei Borzenkov wrote:
> > 04.12.2017 18:47, Tomas Jelinek wrote:
> > > On 4.12.2017 16:02, Kristoffer Grönlund wrote:
> > > > Tomas Jelinek writes:
> > > > > > * how is it shutting down the cluster when issuing "pcs cluster
> > > > > > stop --all"?
> > > > >
> > > > > First, it sends a request to each node to stop pacemaker. The
> > > > > requests are sent in parallel which prevents resources from being
> > > > > moved from node to node. Once pacemaker stops on all nodes,
> > > > > corosync is stopped on all nodes in the same manner.
> > > > >
> > > > > > * any race condition possible where the cib will record only one
> > > > > > node up before the last one shut down?
> > > > > > * will the cluster start safely?
> > > >
> > > > That definitely sounds racy to me. The best idea I can think of
> > > > would be to set all nodes except one in standby, and then shut down
> > > > pacemaker everywhere...
> > >
> > > What issues does it solve? Which node should be the one?
> > >
> > > How do you get the nodes out of standby mode on startup?
> >
> > Is --lifetime=reboot valid for cluster properties? It is accepted by
> > crm_attribute and actually stores the value as a transient attribute.
>
> standby is a node attribute, so lifetime does apply normally.

Right, I forgot about this. I was dealing with 'pcs cluster stop --all' back
in January 2015, so I don't remember all the details anymore. However, I was
able to dig out the private email thread where stopping a cluster was
discussed with pacemaker developers, including Andrew Beekhof and David
Vossel.

Originally, pcs stopped nodes in parallel in such a manner that each node
stopped pacemaker and then corosync independently of the other nodes. This
caused a loss of quorum while the cluster was stopping, as nodes hosting
resources which stopped quickly disconnected from corosync sooner than nodes
hosting resources which stopped slowly. With quorum missing, some resources
could not be stopped and the cluster stop failed.
This is covered here: https://bugzilla.redhat.com/show_bug.cgi?id=1180506

The first attempt to fix the issue was to put nodes into standby mode with
--lifetime=reboot:
https://github.com/ClusterLabs/pcs/commit/ea6f37983191776fd46d90f22dc1432e0bfc0b91

This didn't work for several reasons. One of them was that back then there
was no reliable way to set standby mode with --lifetime=reboot for more than
one node in a single step. (This may have been fixed in the meantime.) There
were, however, other serious reasons for not putting the nodes into standby,
as was explained by Andrew:
- it [putting the nodes into standby first] means shutdown takes longer (no
  node stops until all the resources stop)
- it makes shutdown more complex (== more fragile), e.g...
- it results in pcs waiting forever for resources to stop
- if a stop fails and the cluster is configured to start at boot, then the
  node will get fenced and happily run resources when it returns (because
  all the nodes are up, so we still have quorum)
- it only potentially benefits resources that have no (or very few)
  dependants and can stop quicker than it takes pcs to get through its
  "initiate parallel shutdown" loop (which should be rather fast, since
  there is no ssh connection setup overhead)

So we ended up with just stopping pacemaker in parallel:
https://github.com/ClusterLabs/pcs/commit/1ab2dd1b13839df7e5e9809cde25ac1dbae42c3d

I hope this sheds some light on why pcs stops clusters the way it does, and
shows that standby was considered but rejected for good reasons.

Regards,
Tomas
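The design pcs ended up with — stop pacemaker on every node in parallel, wait until pacemaker is down everywhere, and only then stop corosync in the same way — can be sketched roughly as follows. This is an illustration only, not pcs's actual implementation; the node names and the `stop_pacemaker`/`stop_corosync` helpers are hypothetical stand-ins for the requests pcs sends to each node.

```python
from concurrent.futures import ThreadPoolExecutor

def stop_pacemaker(node):
    # Hypothetical stand-in for the per-node request pcs sends.
    return f"{node}: pacemaker stopped"

def stop_corosync(node):
    # Hypothetical stand-in for the per-node request pcs sends.
    return f"{node}: corosync stopped"

def stop_cluster(nodes):
    with ThreadPoolExecutor(max_workers=len(nodes)) as pool:
        # Phase 1: stop pacemaker on all nodes in parallel and wait for
        # *all* of them; corosync stays up everywhere, so quorum is kept
        # while resources are being stopped.
        phase1 = list(pool.map(stop_pacemaker, nodes))
        # Phase 2: only now take down the membership/quorum layer.
        phase2 = list(pool.map(stop_corosync, nodes))
    return phase1 + phase2

print(stop_cluster(["node1", "node2", "node3"]))
```

The point of the two phases is exactly the quorum argument above: no node leaves the corosync membership until every node has finished stopping its resources.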
Re: [ClusterLabs] Antw: Re: questions about startup fencing
On Mon, 2017-12-04 at 22:08 +0300, Andrei Borzenkov wrote:
> 04.12.2017 18:47, Tomas Jelinek wrote:
> > On 4.12.2017 16:02, Kristoffer Grönlund wrote:
> > > Tomas Jelinek writes:
> > > > > * how is it shutting down the cluster when issuing "pcs cluster
> > > > > stop --all"?
> > > >
> > > > First, it sends a request to each node to stop pacemaker. The
> > > > requests are sent in parallel which prevents resources from being
> > > > moved from node to node. Once pacemaker stops on all nodes, corosync
> > > > is stopped on all nodes in the same manner.
> > > >
> > > > > * any race condition possible where the cib will record only one
> > > > > node up before the last one shut down?
> > > > > * will the cluster start safely?
> > >
> > > That definitely sounds racy to me. The best idea I can think of would
> > > be to set all nodes except one in standby, and then shut down
> > > pacemaker everywhere...
> >
> > What issues does it solve? Which node should be the one?
> >
> > How do you get the nodes out of standby mode on startup?
>
> Is --lifetime=reboot valid for cluster properties? It is accepted by
> crm_attribute and actually stores the value as a transient attribute.

standby is a node attribute, so lifetime does apply normally.

--
Ken Gaillot
Re: [ClusterLabs] Antw: Re: questions about startup fencing
04.12.2017 18:47, Tomas Jelinek wrote:
> On 4.12.2017 16:02, Kristoffer Grönlund wrote:
> > Tomas Jelinek writes:
> > > > * how is it shutting down the cluster when issuing "pcs cluster stop
> > > > --all"?
> > >
> > > First, it sends a request to each node to stop pacemaker. The requests
> > > are sent in parallel which prevents resources from being moved from
> > > node to node. Once pacemaker stops on all nodes, corosync is stopped
> > > on all nodes in the same manner.
> > >
> > > > * any race condition possible where the cib will record only one
> > > > node up before the last one shut down?
> > > > * will the cluster start safely?
> >
> > That definitely sounds racy to me. The best idea I can think of would be
> > to set all nodes except one in standby, and then shut down pacemaker
> > everywhere...
>
> What issues does it solve? Which node should be the one?
>
> How do you get the nodes out of standby mode on startup?

Is --lifetime=reboot valid for cluster properties? It is accepted by
crm_attribute and actually stores the value as a transient attribute.
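For reference, the transient (reboot-lifetime) standby attribute discussed above can be set with crm_attribute roughly like this. The node name is a placeholder, and this is a sketch of the idea rather than what pcs runs internally:

```sh
# Put node1 into standby only until its next reboot: the value is stored
# as a transient node attribute, so it does not survive a node restart.
crm_attribute --node node1 --name standby --update on --lifetime reboot

# Clear it again explicitly:
crm_attribute --node node1 --name standby --delete --lifetime reboot
```

As the thread notes, there was (at least back then) no single-step way to do this for several nodes at once, which is one reason pcs abandoned the approach.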
Re: [ClusterLabs] Antw: Re: questions about startup fencing
On Mon, 4 Dec 2017 16:50:47 +0100 Tomas Jelinek wrote: > Dne 4.12.2017 v 14:21 Jehan-Guillaume de Rorthais napsal(a): > > On Mon, 4 Dec 2017 12:31:06 +0100 > > Tomas Jelinek wrote: > > > >> Dne 4.12.2017 v 10:36 Jehan-Guillaume de Rorthais napsal(a): > >>> On Fri, 01 Dec 2017 16:34:08 -0600 > >>> Ken Gaillot wrote: > >>> > On Thu, 2017-11-30 at 07:55 +0100, Ulrich Windl wrote: > > > > > >> Kristoffer Gronlund wrote: > >>> Adam Spiers writes: > >>> > - The whole cluster is shut down cleanly. > > - The whole cluster is then started up again. (Side question: > what > happens if the last node to shut down is not the first to > start up? > How will the cluster ensure it has the most recent version of > the > CIB? Without that, how would it know whether the last man > standing > was shut down cleanly or not?) > >>> > >>> This is my opinion, I don't really know what the "official" > >>> pacemaker > >>> stance is: There is no such thing as shutting down a cluster > >>> cleanly. A > >>> cluster is a process stretching over multiple nodes - if they all > >>> shut > >>> down, the process is gone. When you start up again, you > >>> effectively have > >>> a completely new cluster. > >> > >> Sorry, I don't follow you at all here. When you start the cluster > >> up > >> again, the cluster config from before the shutdown is still there. > >> That's very far from being a completely new cluster :-) > > > > The problem is you cannot "start the cluster" in pacemaker; you can > > only "start nodes". The nodes will come up one by one. As opposed (as > > I had said) to HP Sertvice Guard, where there is a "cluster formation > > timeout". That is, the nodes wait for the specified time for the > > cluster to "form". Then the cluster starts as a whole. Of course that > > only applies if the whole cluster was down, not if a single node was > > down. 
> > > > > I'm not sure what that would specifically entail, but I'm guessing
> > > > > we have some of the pieces already:
> > > > >
> > > > > - Corosync has a wait_for_all option if you want the cluster to be
> > > > > unable to have quorum at start-up until every node has joined. I
> > > > > don't think you can set a timeout that cancels it, though.
> > > > >
> > > > > - Pacemaker will wait dc-deadtime for the first DC election to
> > > > > complete. (if I understand it correctly ...)
> > > > >
> > > > > - Higher-level tools can start or stop all nodes together (e.g.
> > > > > pcs has pcs cluster start/stop --all).
> > > >
> > > > Based on this discussion, I have some questions about pcs:
> > > >
> > > > * how is it shutting down the cluster when issuing "pcs cluster stop
> > > > --all"?
> > >
> > > First, it sends a request to each node to stop pacemaker. The requests
> > > are sent in parallel which prevents resources from being moved from
> > > node to node. Once pacemaker stops on all nodes, corosync is stopped
> > > on all nodes in the same manner.
> >
> > What if for some external reason one node is slower (load, network,
> > whatever) than the others and starts reacting later? Sending queries in
> > parallel doesn't feel safe enough with regard to all the race conditions
> > that can occur at the same time.
> >
> > Am I missing something?
>
> If a node gets the request later than others, some resources may be
> moved to it before it starts shutting down pacemaker as well. Pcs waits
> for all nodes to shut down pacemaker before it moves on to shutting down
> corosync. This way, quorum is maintained the whole time pacemaker is
> shutting down and therefore no services are blocked from stopping due to
> lack of quorum.

OK, so if admins or RAs expect to start in the same conditions the cluster
was shut down in, we have to take care of the shutdown ourselves, by hand.
Considering disabling the resources before shutting down might be the best
option in this situation, as the CRM will take care of switching things off
correctly in a proper transition.
That's fine by me, as a cluster shutdown should be part of a controlled
procedure. I suppose I have to update my online docs now.

Thank you for your answers!

--
Jehan-Guillaume de Rorthais
Dalibo
Re: [ClusterLabs] Antw: Re: questions about startup fencing
On 4.12.2017 16:02, Kristoffer Grönlund wrote:
> Tomas Jelinek writes:
> > > * how is it shutting down the cluster when issuing "pcs cluster stop
> > > --all"?
> >
> > First, it sends a request to each node to stop pacemaker. The requests
> > are sent in parallel which prevents resources from being moved from node
> > to node. Once pacemaker stops on all nodes, corosync is stopped on all
> > nodes in the same manner.
> >
> > > * any race condition possible where the cib will record only one node
> > > up before the last one shut down?
> > > * will the cluster start safely?
>
> That definitely sounds racy to me. The best idea I can think of would be
> to set all nodes except one in standby, and then shut down pacemaker
> everywhere...

What issues does it solve? Which node should be the one?

How do you get the nodes out of standby mode on startup? Sure, 'pcs cluster
start --all' could do that. If it is used to start the cluster, that is. What
if you start the cluster by restarting the nodes? Or by starting corosync and
pacemaker via systemd, without using pcs? Or by any other method? There is no
reliable way to get nodes out of standby/maintenance mode on start, so we
must stick to a simple pacemaker shutdown.

Moreover, even if pcs is used, how do we know a node was put into standby
because the whole cluster was stopped, and not because a user set it to
standby manually for whatever reason?
Re: [ClusterLabs] Antw: Re: questions about startup fencing
On 12/04/2017 04:02 PM, Kristoffer Grönlund wrote:
> Tomas Jelinek writes:
> > > * how is it shutting down the cluster when issuing "pcs cluster stop
> > > --all"?
> >
> > First, it sends a request to each node to stop pacemaker. The requests
> > are sent in parallel which prevents resources from being moved from node
> > to node. Once pacemaker stops on all nodes, corosync is stopped on all
> > nodes in the same manner.
> >
> > > * any race condition possible where the cib will record only one node
> > > up before the last one shut down?
> > > * will the cluster start safely?
>
> That definitely sounds racy to me. The best idea I can think of would be
> to set all nodes except one in standby, and then shut down pacemaker
> everywhere...

Do you really mean standby, or rather maintenance, to keep resources from
switching to the still-alive nodes during shutdown?

Regards,
Klaus
Re: [ClusterLabs] Antw: Re: questions about startup fencing
Tomas Jelinek writes:
> > * how is it shutting down the cluster when issuing "pcs cluster stop
> > --all"?
>
> First, it sends a request to each node to stop pacemaker. The requests
> are sent in parallel which prevents resources from being moved from node
> to node. Once pacemaker stops on all nodes, corosync is stopped on all
> nodes in the same manner.
>
> > * any race condition possible where the cib will record only one node
> > up before the last one shut down?
> > * will the cluster start safely?

That definitely sounds racy to me. The best idea I can think of would be
to set all nodes except one in standby, and then shutdown pacemaker
everywhere...

--
// Kristoffer Grönlund
// kgronl...@suse.com
Re: [ClusterLabs] Antw: Re: questions about startup fencing
On Mon, 4 Dec 2017 12:31:06 +0100 Tomas Jelinek wrote:
> On 4.12.2017 10:36, Jehan-Guillaume de Rorthais wrote:
> > On Fri, 01 Dec 2017 16:34:08 -0600 Ken Gaillot wrote:
> > > On Thu, 2017-11-30 at 07:55 +0100, Ulrich Windl wrote:
> > > > > Kristoffer Gronlund wrote:
> > > > > > Adam Spiers writes:
> > > > > > > - The whole cluster is shut down cleanly.
> > > > > > >
> > > > > > > - The whole cluster is then started up again. (Side question:
> > > > > > > what happens if the last node to shut down is not the first to
> > > > > > > start up? How will the cluster ensure it has the most recent
> > > > > > > version of the CIB? Without that, how would it know whether the
> > > > > > > last man standing was shut down cleanly or not?)
> > > > > >
> > > > > > This is my opinion, I don't really know what the "official"
> > > > > > pacemaker stance is: There is no such thing as shutting down a
> > > > > > cluster cleanly. A cluster is a process stretching over multiple
> > > > > > nodes - if they all shut down, the process is gone. When you
> > > > > > start up again, you effectively have a completely new cluster.
> > > > >
> > > > > Sorry, I don't follow you at all here. When you start the cluster
> > > > > up again, the cluster config from before the shutdown is still
> > > > > there. That's very far from being a completely new cluster :-)
> > > >
> > > > The problem is you cannot "start the cluster" in pacemaker; you can
> > > > only "start nodes". The nodes will come up one by one. As opposed
> > > > (as I had said) to HP Service Guard, where there is a "cluster
> > > > formation timeout". That is, the nodes wait for the specified time
> > > > for the cluster to "form". Then the cluster starts as a whole. Of
> > > > course that only applies if the whole cluster was down, not if a
> > > > single node was down.
> > >
> > > I'm not sure what that would specifically entail, but I'm guessing we
> > > have some of the pieces already:
> > >
> > > - Corosync has a wait_for_all option if you want the cluster to be
> > > unable to have quorum at start-up until every node has joined. I don't
> > > think you can set a timeout that cancels it, though.
> > >
> > > - Pacemaker will wait dc-deadtime for the first DC election to
> > > complete. (if I understand it correctly ...)
> > >
> > > - Higher-level tools can start or stop all nodes together (e.g. pcs
> > > has pcs cluster start/stop --all).
> >
> > Based on this discussion, I have some questions about pcs:
> >
> > * how is it shutting down the cluster when issuing "pcs cluster stop
> > --all"?
>
> First, it sends a request to each node to stop pacemaker. The requests
> are sent in parallel which prevents resources from being moved from node
> to node. Once pacemaker stops on all nodes, corosync is stopped on all
> nodes in the same manner.

What if for some external reason one node is slower (load, network, whatever)
than the others and starts reacting later? Sending queries in parallel doesn't
feel safe enough with regard to all the race conditions that can occur at the
same time.

Am I missing something?
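For completeness, the corosync wait_for_all option quoted in this thread lives in the quorum section of corosync.conf. A minimal illustrative fragment (not a complete configuration) might look like:

```
quorum {
    provider: corosync_votequorum
    # After a full cluster stop, quorum cannot be (re)gained until
    # every node has been seen at least once:
    wait_for_all: 1
}
```

The dc-deadtime wait mentioned alongside it is a Pacemaker cluster property and can be tuned with, e.g., `pcs property set dc-deadtime=2min` (the value here is illustrative).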
Re: [ClusterLabs] Antw: Re: questions about startup fencing
On 4.12.2017 10:36, Jehan-Guillaume de Rorthais wrote:
> Based on this discussion, I have some questions about pcs:
>
> * how is it shutting down the cluster when issuing "pcs cluster stop
> --all"?

First, it sends a request to each node to stop pacemaker. The requests
are sent in parallel, which prevents resources from being moved from node
to node. Once pacemaker stops on all nodes, corosync is stopped on all
nodes in the same manner.

> * any race condition possible where the cib will record only one node
> up before the last one shut down?
> * will the cluster start safely?
>
> IIRC, crmsh does not implement the full cluster shutdown, only one node
> shut down at a time. Is it because Pacemaker has no way to shut down
> the whole cluster by stopping all resources everywhere, forbidding
> failovers in the process?
>
> Is it required to include a bunch of "pcs resource disable " before
> shutting down the cluster?

No.

Regards,
Tomas

> Thanks,
Re: [ClusterLabs] Antw: Re: questions about startup fencing
On Fri, 01 Dec 2017 16:34:08 -0600 Ken Gaillot wrote:
> I'm not sure what that would specifically entail, but I'm guessing we
> have some of the pieces already:
>
> - Corosync has a wait_for_all option if you want the cluster to be
> unable to have quorum at start-up until every node has joined. I don't
> think you can set a timeout that cancels it, though.
>
> - Pacemaker will wait dc-deadtime for the first DC election to
> complete. (If I understand it correctly...)
>
> - Higher-level tools can start or stop all nodes together (e.g. pcs has
> pcs cluster start/stop --all).

Based on this discussion, I have some questions about pcs:

* how is it shutting down the cluster when issuing "pcs cluster stop
--all"?
* any race condition possible where the cib will record only one node up
before the last one shut down?
* will the cluster start safely?

IIRC, crmsh does not implement the full cluster shutdown, only one node
shut down at a time. Is it because Pacemaker has no way to shut down the
whole cluster by stopping all resources everywhere, forbidding failovers
in the process?

Is it required to include a bunch of "pcs resource disable " before
shutting down the cluster?

Thanks,
--
Jehan-Guillaume de Rorthais
Dalibo
Re: [ClusterLabs] Antw: Re: questions about startup fencing
On Thu, 2017-11-30 at 07:55 +0100, Ulrich Windl wrote:
> The problem is you cannot "start the cluster" in pacemaker; you can
> only "start nodes". The nodes will come up one by one. As opposed (as
> I had said) to HP Service Guard, where there is a "cluster formation
> timeout". That is, the nodes wait for the specified time for the
> cluster to "form". Then the cluster starts as a whole. Of course that
> only applies if the whole cluster was down, not if a single node was
> down.

I'm not sure what that would specifically entail, but I'm guessing we
have some of the pieces already:

- Corosync has a wait_for_all option if you want the cluster to be
unable to have quorum at start-up until every node has joined. I don't
think you can set a timeout that cancels it, though.

- Pacemaker will wait dc-deadtime for the first DC election to
complete. (If I understand it correctly...)

- Higher-level tools can start or stop all nodes together (e.g. pcs has
pcs cluster start/stop --all).
--
Ken Gaillot
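For reference, the wait_for_all knob Ken mentions lives in the quorum
section of corosync.conf, and dc-deadtime is an ordinary Pacemaker
cluster property. A sketch (the values shown are illustrative, not
recommendations):

```
# corosync.conf -- quorum section (illustrative)
quorum {
    provider: corosync_votequorum
    wait_for_all: 1      # no quorum at cold start until all nodes join
}

# dc-deadtime is a regular cluster property, settable e.g. via pcs:
#   pcs property set dc-deadtime=20s
```

With wait_for_all enabled, a partition that boots alone cannot gain
quorum (and so cannot start resources) until it has seen every
configured node at least once.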
[ClusterLabs] Antw: Re: questions about startup fencing
> Kristoffer Gronlund wrote:
>> Adam Spiers writes:
>>
>>> - The whole cluster is shut down cleanly.
>>>
>>> - The whole cluster is then started up again. (Side question: what
>>> happens if the last node to shut down is not the first to start up?
>>> How will the cluster ensure it has the most recent version of the
>>> CIB? Without that, how would it know whether the last man standing
>>> was shut down cleanly or not?)
>>
>> This is my opinion, I don't really know what the "official" pacemaker
>> stance is: There is no such thing as shutting down a cluster cleanly.
>> A cluster is a process stretching over multiple nodes - if they all
>> shut down, the process is gone. When you start up again, you
>> effectively have a completely new cluster.
>
> Sorry, I don't follow you at all here. When you start the cluster up
> again, the cluster config from before the shutdown is still there.
> That's very far from being a completely new cluster :-)

The problem is you cannot "start the cluster" in pacemaker; you can only
"start nodes". The nodes will come up one by one. As opposed (as I had
said) to HP Service Guard, where there is a "cluster formation timeout".
That is, the nodes wait for the specified time for the cluster to
"form". Then the cluster starts as a whole. Of course that only applies
if the whole cluster was down, not if a single node was down.

>> When starting up, how is the cluster, at any point, to know if the
>> cluster it has knowledge of is the "latest" cluster?
>
> That was exactly my question.
>
>> The next node could have a newer version of the CIB which adds yet
>> more nodes to the cluster.
>
> Yes, exactly. If the first node to start up was not the last man
> standing, the CIB history is effectively being forked. So how is this
> issue avoided?

Quorum? "Cluster formation delay"?

>> The only way to bring up a cluster from being completely stopped is to
>> treat it as creating a completely new cluster. The first node to start
>> "creates" the cluster and later nodes join that cluster.
>
> That's ignoring the cluster config, which persists even when the
> cluster's down.
>
> But to be clear, you picked a small side question from my original
> post and answered that. The main questions I had were about startup
> fencing :-)
[ClusterLabs] Antw: Re: questions about startup fencing
> Adam Spiers writes:
>
>> - The whole cluster is shut down cleanly.
>>
>> - The whole cluster is then started up again. (Side question: what
>> happens if the last node to shut down is not the first to start up?
>> How will the cluster ensure it has the most recent version of the
>> CIB? Without that, how would it know whether the last man standing
>> was shut down cleanly or not?)
>
> This is my opinion, I don't really know what the "official" pacemaker
> stance is: There is no such thing as shutting down a cluster cleanly.
> A cluster is a process stretching over multiple nodes - if they all
> shut down, the process is gone. When you start up again, you
> effectively have a completely new cluster.
>
> When starting up, how is the cluster, at any point, to know if the
> cluster it has knowledge of is the "latest" cluster? The next node
> could have a newer version of the CIB which adds yet more nodes to the
> cluster.
>
> The only way to bring up a cluster from being completely stopped is to
> treat it as creating a completely new cluster. The first node to start
> "creates" the cluster and later nodes join that cluster.

I think it is (once again) a problem of pacemaker: In HP Service Guard
there was a "cmhaltnode" to halt a node, and a "cmhaltcluster" (AFAIR)
to halt the whole cluster. The other direction was "cmrunnode" and
"cmruncluster" (AFAIR). So when doing it on the cluster level, all nodes
end with the same information (and can start with the "latest")...

> Cheers,
> Kristoffer
>
> --
> // Kristoffer Grönlund
> // kgronl...@suse.com