Re: [Pacemaker] best/proper way to shut down a node for service

2013-01-31 Thread Andrew Beekhof
On Wed, Jan 23, 2013 at 2:21 PM, Brian J. Murrell br...@interlinx.bc.ca wrote:
 OK.  So you have a corosync cluster of nodes with pacemaker managing
 resources on them, including (of course) STONITH.

 What's the best/proper way to shut down a node, say, for maintenance
 such that pacemaker doesn't go trying to fix that situation and
 STONITHing it to try to bring it back up, etc.?

If you shut down the cluster on that node, the others won't do anything to it.
Nothing special is needed, just reverse whatever you did to start it
up (on some systems it's service openais stop, on others it's service
pacemaker stop && service cman stop).
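
For illustration, the "reverse whatever you did to start it" advice could be sketched as a small shell function. Only the service commands come from the message above; the stack labels (openais, cman), the run helper and the DRY_RUN switch are assumptions made for this example. With DRY_RUN left at its default of 1 it only prints what it would execute.

```shell
#!/bin/sh
# Sketch only: print (or run) the stop commands for the local cluster stack.
# DRY_RUN=1 (default, an assumption of this example) prints instead of executing.
DRY_RUN="${DRY_RUN:-1}"

run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# stop_cluster_stack STACK, where STACK is "openais" or "cman"
# (hypothetical labels for the two setups named in the message).
stop_cluster_stack() {
    case "$1" in
        openais)
            run service openais stop
            ;;
        cman)
            # Reverse of the start order: pacemaker first, then cman.
            run service pacemaker stop && run service cman stop
            ;;
        *)
            echo "unknown stack: $1" >&2
            return 1
            ;;
    esac
}

stop_cluster_stack cman
```

Run it on the node being serviced, and set DRY_RUN=0 only once the printed commands match how that node's stack was started.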


 Currently my practice for STONITH is to have it reboot.  Maybe it's a
 better practice to have STONITH configured to just power a node down and
 not try to power it back up for this exact reason?

 Any other suggestions welcome.

 Cheers,
 b.


 ___
 Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
 http://oss.clusterlabs.org/mailman/listinfo/pacemaker

 Project Home: http://www.clusterlabs.org
 Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
 Bugs: http://bugs.clusterlabs.org



Re: [Pacemaker] best/proper way to shut down a node for service

2013-01-24 Thread Dan Frincu
Hi,

On Wed, Jan 23, 2013 at 11:28 PM, Brian J. Murrell
br...@interlinx.bc.ca wrote:
 On 13-01-23 03:32 AM, Dan Frincu wrote:
 Hi,

 Hi,

 I usually put the node in standby, which means it can no longer run
 any resources on it. Both Pacemaker and Corosync continue to run, node
 provides quorum.

 But a node in standby will still be STONITHed if it goes AWOL.  I put a
 node in standby and then yanked its power and its peer started STONITH
 operations on it.  That's the part I want to avoid.

You'll have to explain what AWOL means in this context. Even in a 2-node
cluster, putting one node in standby, without changing no-quorum-policy
to ignore or setting stonith-enabled=false, will just move the
resources off that node.

Failure to stop a resource on a node that is going through the shutdown
procedure will lead to STONITH. Shutting down Pacemaker and putting the
node in standby have the same effect on the resources: both tell them
to stop.

So just to emphasize this again, if there is a stop failure,
regardless of how you turn off the resource (Pacemaker shutdown,
putting the node in standby, telling the resource to move to another
node, etc.), that will STONITH the node.

Now, going back to no-quorum-policy: the default action is stop, so in a
2-node cluster, if you shut down Pacemaker without setting
no-quorum-policy to ignore, quorum is lost and the resources on the
remaining node stop. With the node in standby instead, quorum is still
met, so this does not take place.
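
To make the quorum arithmetic concrete, here is a minimal sketch. It assumes one vote per node and a plain majority rule (more than half of the votes); the function names are made up for this example, and special two-node quorum options are not modelled.

```shell
#!/bin/sh
# Sketch only: majority quorum with one vote per node (an assumption).

# Votes needed for quorum: more than half of the total.
quorum_threshold() {
    echo $(( $1 / 2 + 1 ))
}

# has_quorum TOTAL ONLINE: succeeds if ONLINE members hold quorum.
has_quorum() {
    [ "$2" -ge "$(quorum_threshold "$1")" ]
}

report() {
    if has_quorum "$1" "$2"; then
        echo "$1 nodes, $2 online: quorum held"
    else
        echo "$1 nodes, $2 online: quorum lost"
    fi
}

report 2 2   # 2-node cluster, one node in standby (still a member)
report 2 1   # 2-node cluster, Pacemaker stopped on one node
report 3 2   # 3-node cluster, one node stopped or in standby
```

A standby node still counts as an online member here, which is why the first case keeps quorum while the second does not.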

Once a node is in standby, stopping pacemaker and corosync on it
won't lead to the "node going AWOL" situation you mentioned
earlier.

Having more than 2 nodes in a cluster means that shutting down pacemaker
and corosync, or putting the node in standby, won't affect quorum, as
the other nodes still work.

Either way, choose whatever fits your requirement best, I just added
some comments related to how this would work and what would be the
possible problems in a 2-node cluster.

HTH,
Dan



-- 
Dan Frincu
CCNA, RHCE


Re: [Pacemaker] best/proper way to shut down a node for service

2013-01-23 Thread Dan Frincu
Hi,

On Wed, Jan 23, 2013 at 5:21 AM, Brian J. Murrell br...@interlinx.bc.ca wrote:
 OK.  So you have a corosync cluster of nodes with pacemaker managing
 resources on them, including (of course) STONITH.

 What's the best/proper way to shut down a node, say, for maintenance
 such that pacemaker doesn't go trying to fix that situation and
 STONITHing it to try to bring it back up, etc.?

 Currently my practice for STONITH is to have it reboot.  Maybe it's a
 better practice to have STONITH configured to just power a node down and
 not try to power it back up for this exact reason?

 Any other suggestions welcome.

I usually put the node in standby, which means it can no longer run
any resources. Both Pacemaker and Corosync continue to run, and the
node still contributes to quorum.

For global cluster maintenance, such as when upgrading to a major
software version, maintenance-mode is needed.

HTH,
Dan


 Cheers,
 b.



-- 
Dan Frincu
CCNA, RHCE


Re: [Pacemaker] best/proper way to shut down a node for service

2013-01-23 Thread Martin Seener
Hi,

We have a 2-node active/standby PGSQL/DRBD cluster with STONITH. We put one
node in standby, then shut down pacemaker on that standby node
(service pacemaker stop), wait a few seconds, then do the same with corosync
(service corosync stop), wait a few seconds again, and keep an eye on
crm_mon -r on the active node throughout.

After that, the standby node's status should be OFFLINE (standby). Then we can
safely reboot or shut down this node.

Once it has rebooted, we first start DRBD and let it sync completely, then
restart corosync (which autostarts pacemaker) with service corosync start.
After a few moments the node rejoins the cluster in standby, and you can put
it back online with crm node online nodename.

This works very well, and we don't see the crm hang on the active node that
we got when we neglected to stop pacemaker and then corosync before
rebooting.
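
The sequence above could be sketched roughly as follows. The service commands and crm node online are taken from the description; crm node standby is the crmsh counterpart and is my addition, and the node name node2, the PAUSE length, the run helper and the DRY_RUN switch are all assumptions for illustration. It is meant to run on the node being serviced, and by default it only prints the steps.

```shell
#!/bin/sh
# Sketch only: standby, stop pacemaker, stop corosync, and the way back.
# DRY_RUN=1 (default) prints the steps; PAUSE is an arbitrary wait in seconds.
DRY_RUN="${DRY_RUN:-1}"
PAUSE="${PAUSE:-5}"

run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# take_node_down NODE: run on the node that is about to be serviced.
take_node_down() {
    run crm node standby "$1"
    run sleep "$PAUSE"
    run service pacemaker stop
    run sleep "$PAUSE"
    run service corosync stop
}

# bring_node_back NODE: run after the reboot, once DRBD has been started
# and has finished syncing (that part is not automated in this sketch).
bring_node_back() {
    run service corosync start
    run sleep "$PAUSE"
    run crm node online "$1"
}

take_node_down node2
bring_node_back node2
```

Between the two calls you would still check crm_mon -r on the active node by hand, as described above, before rebooting.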

You can also set maintenance-mode=true, but then PGSQL isn't monitored even
on the active node (so it isn't restarted if it goes down). We therefore only
use maintenance mode when we really are doing manual work on PG or updating
the cluster software.
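
As a rough sketch, toggling that property with the crm shell might look like this. The on/off wrapper, the run helper and the DRY_RUN switch are assumptions for this example; by default it only prints the command.

```shell
#!/bin/sh
# Sketch only: switch the cluster-wide maintenance-mode property on or off.
DRY_RUN="${DRY_RUN:-1}"

run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

set_maintenance_mode() {
    case "$1" in
        on)  run crm configure property maintenance-mode=true ;;
        off) run crm configure property maintenance-mode=false ;;
        *)   echo "usage: set_maintenance_mode on|off" >&2; return 1 ;;
    esac
}

set_maintenance_mode on
```

Remember to switch it off again afterwards, since nothing is monitored or recovered while it is set.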

Greetings from Berlin,

Martin



Re: [Pacemaker] best/proper way to shut down a node for service

2013-01-23 Thread David Morton
I've asked this before; you should be able to find it by searching the
archives. Essentially, if pacemaker is shut down gracefully, the remaining
nodes are happy to leave it be.

Generally I put the node in standby and then stop openais. I have been caught
out once bringing a node back online that was in standby: the logical
volumes were somehow active on the node coming back into the cluster, the
monitor operations detected this (the key point: I believe monitor operations
fire even in standby), and resources on the active node were shut down as
part of the recovery process.


Re: [Pacemaker] best/proper way to shut down a node for service

2013-01-23 Thread Brian J. Murrell
On 13-01-23 03:32 AM, Dan Frincu wrote:
 Hi,

Hi,

 I usually put the node in standby, which means it can no longer run
 any resources on it. Both Pacemaker and Corosync continue to run, node
 provides quorum.

But a node in standby will still be STONITHed if it goes AWOL.  I put a
node in standby and then yanked its power and its peer started STONITH
operations on it.  That's the part I want to avoid.

b.





Re: [Pacemaker] best/proper way to shut down a node for service

2013-01-23 Thread David Morton
Indeed, that's the correct behavior: it was still an active cluster
member, it just happened not to be running any resources because it was in
standby. If you shut down openais gracefully and the node shows as 'offline'
on the remaining node(s), then all will be well.
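
A quick way to check for that before pulling the power might look like the sketch below. It reads status text on stdin and looks for the node name on a line marked OFFLINE; the exact crm_mon output layout differs between versions, so the sample text, the function name and the node names are assumptions for illustration.

```shell
#!/bin/sh
# Sketch only: succeed if NODE appears on a line marked OFFLINE in the
# status text read from stdin (the layout of that text is an assumption).
node_is_offline() {
    grep 'OFFLINE' | grep -qwF -- "$1"
}

# Made-up sample resembling a one-shot status listing.
sample='Online: [ node1 ]
OFFLINE: [ node2 ]'

if printf '%s\n' "$sample" | node_is_offline node2; then
    echo "node2 is offline; safe to power down"
fi
```

On a surviving node this would be used as crm_mon -1 | node_is_offline node2, after confirming by eye that the output on your version matches what the function expects.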



[Pacemaker] best/proper way to shut down a node for service

2013-01-22 Thread Brian J. Murrell
OK.  So you have a corosync cluster of nodes with pacemaker managing
resources on them, including (of course) STONITH.

What's the best/proper way to shut down a node, say, for maintenance
such that pacemaker doesn't go trying to fix that situation and
STONITHing it to try to bring it back up, etc.?

Currently my practice for STONITH is to have it reboot.  Maybe it's a
better practice to have STONITH configured to just power a node down and
not try to power it back up for this exact reason?

Any other suggestions welcome.

Cheers,
b.


