Re: [Pacemaker] best/proper way to shut down a node for service
On Wed, Jan 23, 2013 at 2:21 PM, Brian J. Murrell br...@interlinx.bc.ca wrote:
> OK. So you have a corosync cluster of nodes with pacemaker managing resources on them, including (of course) STONITH.
>
> What's the best/proper way to shut down a node, say, for maintenance such that pacemaker doesn't go trying to fix that situation and STONITHing it to try to bring it back up, etc.?

If you shut down the cluster on that node, the others won't do anything to it. Nothing special is needed; just reverse whatever you did to start it up (on some systems it's "service openais stop", on others it's "service pacemaker stop" followed by "service cman stop").

> Currently my practice for STONITH is to have it reboot. Maybe it's a better practice to have STONITH configured to just power a node down and not try to power it back up for this exact reason?
>
> Any other suggestions welcome.
>
> Cheers,
> b.

___
Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
http://oss.clusterlabs.org/mailman/listinfo/pacemaker
Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org
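A minimal sketch of "reverse whatever you did to start it up", assuming a system with the `service` wrapper. The function name `stop_commands` and the stack labels `openais` and `cman` are made up for this illustration; the commands themselves are the ones named in the reply above. The function only prints the commands, so they can be reviewed before being run as root on the node that is going down.

```shell
# Print the stop commands for a cluster stack, in reverse start order.
# "stop_commands" and the stack labels are hypothetical names for this sketch.
stop_commands() {
    case "$1" in
        openais)
            # stacks that were started with the openais init script
            echo "service openais stop"
            ;;
        cman)
            # pacemaker was started after cman, so it is stopped first
            printf '%s\n' "service pacemaker stop" "service cman stop"
            ;;
        *)
            echo "unknown stack: $1" >&2
            return 1
            ;;
    esac
}

stop_commands cman
```

Printing rather than executing is a deliberate choice here: which init scripts exist differs between distributions, so the output is meant to be checked against the local setup first.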
Re: [Pacemaker] best/proper way to shut down a node for service
Hi,

On Wed, Jan 23, 2013 at 11:28 PM, Brian J. Murrell br...@interlinx.bc.ca wrote:
> On 13-01-23 03:32 AM, Dan Frincu wrote:
>> Hi,
>
> Hi,
>
>> I usually put the node in standby, which means it can no longer run any resources on it. Both Pacemaker and Corosync continue to run, node provides quorum.
>
> But a node in standby will still be STONITHed if it goes AWOL. I put a node in standby and then yanked its power and its peer started STONITH operations on it.
>
> That's the part I want to avoid.

You have to explain what AWOL means in this context. Even in a 2-node cluster, putting one node in standby, without changing no-quorum-policy to ignore or setting stonith-enabled=false, will just move the resources off that node.

Failure to stop a resource on a node that is going through the shutdown procedure will lead to STONITH. Shutting down Pacemaker and putting the node in standby have the same effect on the resources: both tell them to stop. So, just to emphasize this again: if there is a stop failure, regardless of how you turn off the resource (Pacemaker shutdown, putting the node in standby, telling the resource to move to another node, etc.), the node will be STONITHed.

Now, going back to no-quorum-policy: the default action is stop. So in a 2-node cluster, if you shut down Pacemaker without setting no-quorum-policy to ignore, quorum is lost and the resources on the remaining node stop. If you put the node in standby instead, quorum is still met and this does not happen. Once a node is in standby, stopping pacemaker and corosync on it won't lead to the "node going AWOL" situation you mentioned earlier.

With more than 2 nodes in a cluster, shutting down pacemaker and corosync on one node, or putting it in standby, won't affect quorum, as the other nodes still work.

Either way, choose whatever fits your requirements best. I just added some comments on how this works and what the possible problems are in a 2-node cluster.

HTH,
Dan

> b.

--
Dan Frincu
CCNA, RHCE
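The quorum reasoning in this message can be checked with a little arithmetic. This sketch assumes one vote per node and plain majority quorum (more than half of the votes), with no special two-node handling configured; `quorum_needed` and `has_quorum` are names invented for this example, not Pacemaker or Corosync commands.

```shell
# Votes needed for a majority among N nodes (one vote each): floor(N/2) + 1.
quorum_needed() {
    echo $(( $1 / 2 + 1 ))
}

# has_quorum TOTAL_NODES NODES_STILL_IN_MEMBERSHIP
# Succeeds when the remaining members hold a majority of the votes.
has_quorum() {
    [ "$2" -ge "$(quorum_needed "$1")" ]
}

# 2-node cluster, cluster stack shut down on one node: 1 of 2 votes left.
if has_quorum 2 1; then echo "2 nodes, 1 left: quorate"; else echo "2 nodes, 1 left: quorum lost"; fi

# 2-node cluster, one node in standby: both nodes are still members.
if has_quorum 2 2; then echo "2 nodes, 1 in standby: quorate"; else echo "2 nodes, 1 in standby: quorum lost"; fi

# 3-node cluster, cluster stack shut down on one node: 2 of 3 votes left.
if has_quorum 3 2; then echo "3 nodes, 2 left: quorate"; else echo "3 nodes, 2 left: quorum lost"; fi
```

Only the first case loses quorum, which is the case where, with the default no-quorum-policy of stop, the surviving node stops its resources.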
Re: [Pacemaker] best/proper way to shut down a node for service
Hi,

On Wed, Jan 23, 2013 at 5:21 AM, Brian J. Murrell br...@interlinx.bc.ca wrote:
> OK. So you have a corosync cluster of nodes with pacemaker managing resources on them, including (of course) STONITH.
>
> What's the best/proper way to shut down a node, say, for maintenance such that pacemaker doesn't go trying to fix that situation and STONITHing it to try to bring it back up, etc.?
>
> Currently my practice for STONITH is to have it reboot. Maybe it's a better practice to have STONITH configured to just power a node down and not try to power it back up for this exact reason?
>
> Any other suggestions welcome.

I usually put the node in standby, which means it can no longer run any resources. Both Pacemaker and Corosync continue to run, and the node still provides quorum.

For global cluster maintenance, such as when upgrading to a major software version, maintenance-mode is needed.

HTH,
Dan

> Cheers,
> b.

--
Dan Frincu
CCNA, RHCE
Re: [Pacemaker] best/proper way to shut down a node for service
Hi,

We have a 2-node active/standby PGSQL/DRBD cluster with STONITH. We put one node in standby, then shut down pacemaker on this standby node (service pacemaker stop), wait a few seconds, then do the same with corosync (service corosync stop), wait a few seconds again, and always keep an eye on crm_mon -r on the active node. After that, the standby node's status should be OFFLINE (standby). Then we can safely reboot or shut down this node.

When it has rebooted, we first start DRBD and let it sync completely, then restart corosync (which autostarts pacemaker) with service corosync start. After a few moments the node will show up as standby again in the cluster and you can put it back online with crm node online nodename.

This works very well, and we don't experience any crm hang on the active node like we did when we failed to stop pacemaker and then corosync before the reboot.

You can also put everything in maintenance-mode=true, but then PGSQL isn't monitored (that is, restarted if it shuts down) even on the active node. Therefore we only use maintenance mode if we really do manual steps on PG or update the cluster software.

Greetings from Berlin,
Martin

From: Dan Frincu df.clus...@gmail.com
Reply-To: The Pacemaker cluster resource manager pacemaker@oss.clusterlabs.org
Date: Wednesday, January 23, 2013 9:32 AM
To: The Pacemaker cluster resource manager pacemaker@oss.clusterlabs.org
Subject: Re: [Pacemaker] best/proper way to shut down a node for service

> Hi,
>
> On Wed, Jan 23, 2013 at 5:21 AM, Brian J. Murrell br...@interlinx.bc.ca wrote:
>> OK. So you have a corosync cluster of nodes with pacemaker managing resources on them, including (of course) STONITH.
>>
>> What's the best/proper way to shut down a node, say, for maintenance such that pacemaker doesn't go trying to fix that situation and STONITHing it to try to bring it back up, etc.?
>>
>> Currently my practice for STONITH is to have it reboot. Maybe it's a better practice to have STONITH configured to just power a node down and not try to power it back up for this exact reason?
>>
>> Any other suggestions welcome.
>
> I usually put the node in standby, which means it can no longer run any resources on it. Both Pacemaker and Corosync continue to run, node provides quorum.
>
> For global cluster maintenance, such as when upgrading to a major software version, maintenance-mode is needed.
>
> HTH,
> Dan
>
>> Cheers,
>> b.
>
> --
> Dan Frincu
> CCNA, RHCE
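The sequence Martin describes can be written down as a checklist. This is a sketch that only prints the commands in order and executes nothing. `maintenance_steps` is a hypothetical helper and `node2` is a placeholder node name. `crm node standby` is an assumption about how the node is put in standby, since the message does not name the command. The one-shot `-1` flag on `crm_mon` is also an addition, so that the status check returns instead of staying interactive. The DRBD step is left as a comment because the message does not give the exact command.

```shell
# Print the per-node maintenance sequence described in the message above.
# Nothing is executed; review the output and run each step by hand.
maintenance_steps() {
    node="$1"
    cat <<EOF
crm node standby $node
# on $node:
service pacemaker stop
# wait a few seconds
service corosync stop
# wait a few seconds, then on the active node:
crm_mon -1 -r
# expect $node to show as OFFLINE (standby), then reboot or power it off
# after the reboot, on $node: start DRBD and wait until it is fully synced
service corosync start
# once $node shows up as standby again:
crm node online $node
EOF
}

maintenance_steps node2
```

Keeping the waits and the DRBD step as comments avoids inventing timings or commands the message does not state; how long to wait and how DRBD is started depend on the local setup.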
Re: [Pacemaker] best/proper way to shut down a node for service
I've asked this before; you should be able to search for the question.

Essentially, if pacemaker is shut down gracefully, the remaining nodes are happy to leave it be. Generally I standby the node and then stop openais ...

I have been caught out once when bringing back online a node that was in standby. The logical volumes were somehow active on the node coming back into the cluster. The monitor operations detected this (key point: I believe monitor operations fire even in standby) and shut down resources on the active node as part of the recovery process.

On Wednesday, 23 January 2013, Dan Frincu wrote:
> Hi,
>
> On Wed, Jan 23, 2013 at 5:21 AM, Brian J. Murrell br...@interlinx.bc.ca wrote:
>> OK. So you have a corosync cluster of nodes with pacemaker managing resources on them, including (of course) STONITH.
>>
>> What's the best/proper way to shut down a node, say, for maintenance such that pacemaker doesn't go trying to fix that situation and STONITHing it to try to bring it back up, etc.?
>>
>> Currently my practice for STONITH is to have it reboot. Maybe it's a better practice to have STONITH configured to just power a node down and not try to power it back up for this exact reason?
>>
>> Any other suggestions welcome.
>
> I usually put the node in standby, which means it can no longer run any resources on it. Both Pacemaker and Corosync continue to run, node provides quorum.
>
> For global cluster maintenance, such as when upgrading to a major software version, maintenance-mode is needed.
>
> HTH,
> Dan
>
>> Cheers,
>> b.
>
> --
> Dan Frincu
> CCNA, RHCE
Re: [Pacemaker] best/proper way to shut down a node for service
On 13-01-23 03:32 AM, Dan Frincu wrote:
> Hi,

Hi,

> I usually put the node in standby, which means it can no longer run any resources on it. Both Pacemaker and Corosync continue to run, node provides quorum.

But a node in standby will still be STONITHed if it goes AWOL. I put a node in standby and then yanked its power, and its peer started STONITH operations on it.

That's the part I want to avoid.

b.
Re: [Pacemaker] best/proper way to shut down a node for service
Indeed ... that's the correct behavior, as it was still an active cluster member; it just happens not to be running any resources because it's in standby.

If you shut down openais gracefully and the node shows happily as 'offline' on the remaining node(s), then all will be well.

On 24 January 2013 10:28, Brian J. Murrell br...@interlinx.bc.ca wrote:
> On 13-01-23 03:32 AM, Dan Frincu wrote:
>> Hi,
>
> Hi,
>
>> I usually put the node in standby, which means it can no longer run any resources on it. Both Pacemaker and Corosync continue to run, node provides quorum.
>
> But a node in standby will still be STONITHed if it goes AWOL. I put a node in standby and then yanked its power and its peer started STONITH operations on it.
>
> That's the part I want to avoid.
>
> b.
[Pacemaker] best/proper way to shut down a node for service
OK. So you have a corosync cluster of nodes with pacemaker managing resources on them, including (of course) STONITH.

What's the best/proper way to shut down a node, say, for maintenance such that pacemaker doesn't go trying to fix that situation and STONITHing it to try to bring it back up, etc.?

Currently my practice for STONITH is to have it reboot. Maybe it's a better practice to have STONITH configured to just power a node down and not try to power it back up for this exact reason?

Any other suggestions welcome.

Cheers,
b.