Re: [Pacemaker] [DRBD-user] DRBD active/passive on Pacemaker+CMAN cluster unexpectedly performs STONITH when promoting

2014-07-04 Thread Lars Ellenberg
On Thu, Jul 03, 2014 at 04:05:36AM +0200, Giuseppe Ragusa wrote:
> Hi all,
> I deployed a 2 nodes (physical) RHCS Pacemaker cluster on CentOS 6.5 x86_64 
> (fully up-to-date) with:
> 
> cman-3.0.12.1-59.el6_5.2.x86_64
> pacemaker-1.1.10-14.el6_5.3.x86_64
> pcs-0.9.90-2.el6.centos.3.noarch
> qemu-kvm-0.12.1.2-2.415.el6_5.10.x86_64
> qemu-kvm-tools-0.12.1.2-2.415.el6_5.10.x86_64
> drbd-utils-8.9.0-1.el6.x86_64
> drbd-udev-8.9.0-1.el6.x86_64
> drbd-rgmanager-8.9.0-1.el6.x86_64
> drbd-bash-completion-8.9.0-1.el6.x86_64
> drbd-pacemaker-8.9.0-1.el6.x86_64
> drbd-8.9.0-1.el6.x86_64
> drbd-km-2.6.32_431.20.3.el6.x86_64-8.4.5-1.x86_64
> kernel-2.6.32-431.20.3.el6.x86_64
> 
> The aim is to run KVM virtual machines backed by DRBD (8.4.5) in an
> active/passive mode (no dual primary and so no live migration).
>
> Just to err on the side of consistency over HA (and to pave the way
> for a possible dual-primary live-migration-capable setup), I
> configured DRBD for resource-and-stonith with rhcs_fence (that's why I
> installed drbd-rgmanager) as fence-peer handler and stonith devices
> configured in Pacemaker (pcmk-redirect in cluster.conf).
> 
> The setup "almost" works (all seems ok with: "pcs status", "crm_mon
> -Arf1", "corosync-cfgtool -s", "corosync-objctl | grep member") , but
> every time it needs a resource promotion (to Master, i.e. becoming
> primary) it either fails or fences the other node (the one supposed to
> become Slave i.e. secondary) and only then succeeds.
>
> It happens, for example both on initial resource definition (when
> attempting first start) and on node entering standby (when trying to
> automatically move the resources by stopping then starting them).
> 
> I collected a full "pcs cluster report" and I can provide a CIB dump,
> but I will initially paste here an excerpt from my configuration just
> in case it happens to be a simple configuration error that someone can
> spot on the fly ;> (hoping...)
> 
> Keep in mind that the setup has separate redundant network
> connections for LAN (1 Gib/s LACP to switches), Corosync (1 Gib/s
> roundrobin back-to-back) and DRBD (10 Gib/s roundrobin back-to-back)
> and that FQDNs are correctly resolved through /etc/hosts

Make sure your DRBD resources are "Connected UpToDate/UpToDate"
before you let the cluster take over control of who is master.
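
One quick way to check this on each node, before starting the cluster stack
(the resource name is taken from your excerpt; adjust to your setup):

    cat /proc/drbd          # overall state of all DRBD minors
    drbdadm cstate dc_vm    # expect "Connected"
    drbdadm dstate dc_vm    # expect "UpToDate/UpToDate"
    drbdadm role dc_vm      # expect "Secondary/Secondary" before hand-over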

> DRBD:
> 
> /etc/drbd.d/global_common.conf:
> 
> --
> 
> global {
>     usage-count no;
> }
> 
> common {
>     protocol C;
>     disk {
>         on-io-error detach;
>         fencing resource-and-stonith;
>         disk-barrier no;
>         disk-flushes no;
>         al-extents 3389;
>         c-plan-ahead 200;
>         c-fill-target 15M;
>         c-max-rate 100M;
>         c-min-rate 10M;
>     }
>     net {
>         after-sb-0pri discard-zero-changes;
>         after-sb-1pri discard-secondary;
>         after-sb-2pri disconnect;
>         csums-alg sha1;
>         data-integrity-alg sha1;
>         max-buffers 8000;
>         max-epoch-size 8000;
>         unplug-watermark 16;
>         sndbuf-size 0;
>         verify-alg sha1;
>     }
>     startup {
>         wfc-timeout 300;
>         outdated-wfc-timeout 80;
>         degr-wfc-timeout 120;
>     }
>     handlers {
>         fence-peer "/usr/lib/drbd/rhcs_fence";
>     }
> }
> 
> --
> 
> Sample DRBD resource (there are others, similar)
> /etc/drbd.d/dc_vm.res:
> 
> --
> 
> resource dc_vm {
> device  /dev/drbd1;
> disk/dev/VolGroup00/dc_vm;
> meta-disk   internal;
> on cluster1.verolengo.privatelan {
> address ipv4 172.16.200.1:7790;
> }
> on cluster2.verolengo.privatelan {
> address ipv4 172.16.200.2:7790;
> }
> }
> 
> --
> 
> RHCS:
> 
> /etc/cluster/cluster.conf
> 
> --
> 
> 
> [The cluster.conf XML was lost in the archive rendering; only attribute
> fragments survive (transport="udpu" port="5405",
> token_retransmits_before_loss_const="20" rrp_mode="passive" secauth="on"),
> plus the clusternode entries whose fence devices redirect fencing to
> Pacemaker via pcmk-redirect, as described above.]
> 
> --
> 
> Pacemaker:
> 
> PROPERTIES:
> 
> pcs property set default-resource-stickiness=100
> pcs property set no-quorum-policy=ignore
> 
> STONITH:
> 
> pcs stonith create ilocluster1 fence
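
For reference, a Pacemaker master/slave wrapper for a DRBD resource like
dc_vm is typically defined with pcs roughly as follows (names and monitor
intervals are illustrative, not necessarily what is configured here):

    pcs resource create dc_vm_drbd ocf:linbit:drbd drbd_resource=dc_vm \
        op monitor interval=31s role=Master op monitor interval=29s role=Slave
    pcs resource master dc_vm_ms dc_vm_drbd \
        master-max=1 master-node-max=1 clone-max=2 clone-node-max=1 notify=true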

Re: [Pacemaker] Pacemaker 1.1: cloned stonith resources require --force to be added to levels

2014-07-04 Thread Andrew Beekhof

On 4 Jul 2014, at 1:29 pm, Giuseppe Ragusa  wrote:

> >> Hi all,
> >> while creating a cloned stonith resource
> > 
> > Any particular reason you feel the need to clone it?
>  
> In the end, I suppose it's only a "purist mindset" :) because it is a PDU 
> whose power outlets control both nodes, so
> its resource "should be" active (and monitored) on both nodes "independently".
> I understand that it would work anyway if left not cloned and not
> location-constrained, just as regular "dedicated" stonith devices do not
> need to be location-constrained, right?
> 
> >> for multi-level STONITH on a fully-up-to-date CentOS 6.5 
> >> (pacemaker-1.1.10-14.el6_5.3.x86_64):
> >> 
> >> pcs cluster cib stonith_cfg
> >> pcs -f stonith_cfg stonith create pdu1 fence_apc action="off" \
> >> ipaddr="pdu1.verolengo.privatelan" login="cluster" passwd="test" \
> >> pcmk_host_map="cluster1.verolengo.privatelan:3,cluster1.verolengo.privatelan:4,cluster2.verolengo.privatelan:6,cluster2.verolengo.privatelan:7"
> >>  \
> >> pcmk_host_check="static-list" 
> >> pcmk_host_list="cluster1.verolengo.privatelan,cluster2.verolengo.privatelan"
> >>  op monitor interval="240s"
> >> pcs -f stonith_cfg resource clone pdu1 pdu1Clone
> >> pcs -f stonith_cfg stonith level add 2 cluster1.verolengo.privatelan 
> >> pdu1Clone
> >> pcs -f stonith_cfg stonith level add 2 cluster2.verolengo.privatelan 
> >> pdu1Clone
> >> 
> >> 
> >> the last 2 lines do not succeed unless I add the option "--force" and even 
> >> so I still get errors when issuing verify:
> >> 
> >> [root@cluster1 ~]# pcs stonith level verify
> >> Error: pdu1Clone is not a stonith id
> > 
> > If you check, I think you'll find there is no such resource as 'pdu1Clone'.
> > I don't believe pcs lets you decide what the clone name is.
> 
> You're right! (obviously ;> )
> It's been automatically named pdu1-clone
> 
> I suppose that there's still too much crmsh in my memory :)
> 
> Anyway, removing the stonith level (to start from scratch) and using the 
> correct clone name does not change the result:
> 
> [root@cluster1 etc]# pcs -f stonith_cfg stonith level add 2 
> cluster1.verolengo.privatelan pdu1-clone
> Error: pdu1-clone is not a stonith id (use --force to override)

I bet we didn't think of that.
What if you just do:

   pcs -f stonith_cfg stonith level add 2 cluster1.verolengo.privatelan pdu1

Does that work?

> [root@cluster1 etc]# pcs -f stonith_cfg stonith level add 2 
> cluster1.verolengo.privatelan pdu1-clone --force
> [root@cluster1 etc]# pcs -f stonith_cfg stonith level verify
> Error: pdu1-clone is not a stonith id
> [root@cluster1 etc]# echo $?
> 1
> 
> I suppose that I should devise a testing strategy (like pulling iLO cables 
> and causing a stonith to happen) to verify it.
>  
> Many thanks again for your help.
> 
> Regards,
> Giuseppe
> 
> 
> ___
> Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
> http://oss.clusterlabs.org/mailman/listinfo/pacemaker
> 
> Project Home: http://www.clusterlabs.org
> Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
> Bugs: http://bugs.clusterlabs.org



___
Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
http://oss.clusterlabs.org/mailman/listinfo/pacemaker

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org


Re: [Pacemaker] Creating a safe cluster-node shutdown script (for when UPS goes OnBattery+LowBattery)

2014-07-04 Thread Digimer

On 04/07/14 02:16 PM, Giuseppe Ragusa wrote:

Hi all,
I'm trying to create a script as per subject (on CentOS 6.5,
CMAN+Pacemaker, only DRBD+KVM active/passive resources; SNMP-UPS
monitored by NUT).

Ideally I think that each node should stop (disable) all locally-running
VirtualDomain resources (doing so cleanly demotes and then downs the DRBD
resources underneath), then put itself in standby and finally shutdown.

On further startup, manual intervention would be required to unstandby
all nodes and enable resources (nodes already in standby and resources
already disabled before blackout should be manually distinguished).

Is this strategy conceptually safe?

Unfortunately, various searches have turned up no "prior art" :)


I started work on something similar with apcupsd (first I had to make it
work with multiple UPSes, which I did). Then I decided not to actually
implement it, and chose instead to leave it up to an admin to decide
how/when/if to initiate a graceful shutdown.


My rationale was that this placed way too much potential damage in the 
hands of, effectively, a single trigger. One bad bug and you could bring 
down a perfectly fine cluster.


Instead, what I did was ensure that any power event triggered an alert 
email (x2, as both nodes ran the monitoring app). This way, I (and the 
client's admins) would be notified immediately if anything happened. 
Then it was up to us to decide how/if to initiate a graceful shutdown.


One real-world example;

A couple months ago, a client's neighborhood was hit with a prolonged 
power outage. Eventually, we decided to gracefully shut down. However, 
one of the windows VMs had downloaded and prepped to install about 30 
updates (no idea how this happened, except windows). Anyway, the VM took 
more time to shut down than the batteries could support. So half-way 
through, we withdrew one node and powered it off to shed load and gain 
battery runtime. This kind of logic can not reasonably be coded into a 
script.


My $0.02.

--
Digimer
Papers and Projects: https://alteeve.ca/w/
What if the cure for cancer is trapped in the mind of a person without 
access to education?


___
Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
http://oss.clusterlabs.org/mailman/listinfo/pacemaker

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org


Re: [Pacemaker] Pacemaker 1.1: cloned stonith resources require --force to be added to levels

2014-07-04 Thread Giuseppe Ragusa
From: and...@beekhof.net
Date: Fri, 4 Jul 2014 22:50:28 +1000
To: pacemaker@oss.clusterlabs.org
Subject: Re: [Pacemaker] Pacemaker 1.1: cloned stonith resources require
--force to be added to levels

 
On 4 Jul 2014, at 1:29 pm, Giuseppe Ragusa  wrote:
 
> >> Hi all,
> >> while creating a cloned stonith resource
> > 
> > Any particular reason you feel the need to clone it?
>  
> In the end, I suppose it's only a "purist mindset" :) because it is a PDU 
> whose power outlets control both nodes, so
> its resource "should be" active (and monitored) on both nodes "independently".
> I understand that it would work anyway if left not cloned and not
> location-constrained, just as regular "dedicated" stonith devices do not
> need to be location-constrained, right?
> 
> >> for multi-level STONITH on a fully-up-to-date CentOS 6.5 
> >> (pacemaker-1.1.10-14.el6_5.3.x86_64):
> >> 
> >> pcs cluster cib stonith_cfg
> >> pcs -f stonith_cfg stonith create pdu1 fence_apc action="off" \
> >> ipaddr="pdu1.verolengo.privatelan" login="cluster" passwd="test" \
> >> pcmk_host_map="cluster1.verolengo.privatelan:3,cluster1.verolengo.privatelan:4,cluster2.verolengo.privatelan:6,cluster2.verolengo.privatelan:7"
> >>  \
> >> pcmk_host_check="static-list" 
> >> pcmk_host_list="cluster1.verolengo.privatelan,cluster2.verolengo.privatelan"
> >>  op monitor interval="240s"
> >> pcs -f stonith_cfg resource clone pdu1 pdu1Clone
> >> pcs -f stonith_cfg stonith level add 2 cluster1.verolengo.privatelan 
> >> pdu1Clone
> >> pcs -f stonith_cfg stonith level add 2 cluster2.verolengo.privatelan 
> >> pdu1Clone
> >> 
> >> 
> >> the last 2 lines do not succeed unless I add the option "--force" and even 
> >> so I still get errors when issuing verify:
> >> 
> >> [root@cluster1 ~]# pcs stonith level verify
> >> Error: pdu1Clone is not a stonith id
> > 
> > If you check, I think you'll find there is no such resource as 'pdu1Clone'.
> > I don't believe pcs lets you decide what the clone name is.
> 
> You're right! (obviously ;> )
> It's been automatically named pdu1-clone
> 
> I suppose that there's still too much crmsh in my memory :)
> 
> Anyway, removing the stonith level (to start from scratch) and using the 
> correct clone name does not change the result:
> 
> [root@cluster1 etc]# pcs -f stonith_cfg stonith level add 2 
> cluster1.verolengo.privatelan pdu1-clone
> Error: pdu1-clone is not a stonith id (use --force to override)
 
I bet we didn't think of that.
What if you just do:
 
   pcs -f stonith_cfg stonith level add 2 cluster1.verolengo.privatelan pdu1
 
Does that work?
 


Yes, no errors at all and verify successful.

Remember that a full real-world test (to verify actual second-level
functionality in the presence of a first-level failure) is still pending
for both the plain and the cloned setup.
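
A first manual pass will probably be something along these lines (a sketch
only; node names as in the configuration above):

    # check that fencing works at all through the configured levels
    stonith_admin --reboot cluster2.verolengo.privatelan
    # review what the fencer actually did
    stonith_admin --history cluster2.verolengo.privatelan
    # with the iLO cable pulled, the same reboot request should fall
    # through to level 2 (the PDU)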

Apropos: I read in the list archives that stonith resources (being resources,
after all) can themselves cause fencing (!) if they fail (start, monitor,
stop), and that an ad-hoc on-fail setting can be used to prevent that.
Maybe my aforementioned naive testing procedure (pulling the iLO cable) could
provoke that?
Would you suggest configuring such an on-fail option?
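
If so, I suppose it would be attached to the monitor operation, perhaps
something like the following (untested, the exact pcs syntax may differ
between versions, and the right on-fail value is exactly what I am asking
about):

    pcs resource update pdu1 op monitor interval=240s on-fail=restart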

Many thanks again for your help (and all your valuable work, of course!).

Regards,
Giuseppe
  ___
Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
http://oss.clusterlabs.org/mailman/listinfo/pacemaker

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org


Re: [Pacemaker] [DRBD-user] DRBD active/passive on Pacemaker+CMAN cluster unexpectedly performs STONITH when promoting

2014-07-04 Thread Giuseppe Ragusa
> > The setup "almost" works (all seems ok with: "pcs status", "crm_mon
> > -Arf1", "corosync-cfgtool -s", "corosync-objctl | grep member"), but
> > every time it needs a resource promotion (to Master, i.e. becoming
> > primary) it either fails or fences the other node (the one supposed to
> > become Slave i.e. secondary) and only then succeeds.
> >
> > It happens, for example both on initial resource definition (when
> > attempting first start) and on node entering standby (when trying to
> > automatically move the resources by stopping then starting them).
> > 
> > I collected a full "pcs cluster report" and I can provide a CIB dump,
> > but I will initially paste here an excerpt from my configuration just
> > in case it happens to be a simple configuration error that someone can
> > spot on the fly ;> (hoping...)
> > 
> > Keep in mind that the setup has separate redundant network
> > connections for LAN (1 Gib/s LACP to switches), Corosync (1 Gib/s
> > roundrobin back-to-back) and DRBD (10 Gib/s roundrobin back-to-back)
> > and that FQDNs are correctly resolved through /etc/hosts
> 
> Make sure your DRBD resources are "Connected UpToDate/UpToDate"
> before you let the cluster take over control of who is master.

Thanks for your important reminder.

Actually they had been "Connected UpToDate/UpToDate", and I subsequently
demoted them all manually to secondary and then downed them, before eventually
stopping the (manually started) DRBD service.

Only at the end did I start/configure the cluster.

The problem is now resolved, and my improper use of rhcs_fence as the
fence-peer handler seems to have been the culprit (I have now switched to
crm-fence-peer.sh). Still, I do not understand why rhcs_fence was called at
all in the first place (once called, it may have caused unforeseen
consequences, I admit), since the DRBD docs clearly state that a communication
disruption must be involved for the fence-peer handler to be invoked.
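
For reference, the handler pair that the DRBD documentation recommends for
Pacemaker-managed clusters is the following; I switched fence-peer to the
first of them, and the unfence script is its usual companion:

    handlers {
        fence-peer "/usr/lib/drbd/crm-fence-peer.sh";
        after-resync-target "/usr/lib/drbd/crm-unfence-peer.sh";
    }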

Many thanks again.

Regards,
Giuseppe

  ___
Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
http://oss.clusterlabs.org/mailman/listinfo/pacemaker

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org


Re: [Pacemaker] Creating a safe cluster-node shutdown script (for when UPS goes OnBattery+LowBattery)

2014-07-04 Thread Giuseppe Ragusa
> Date: Fri, 4 Jul 2014 23:17:07 +0900
> From: li...@alteeve.ca
> To: pacemaker@oss.clusterlabs.org
> Subject: Re: [Pacemaker] Creating a safe cluster-node shutdown script (for 
> when UPS goes OnBattery+LowBattery)
> 
> On 04/07/14 02:16 PM, Giuseppe Ragusa wrote:
> > Hi all,
> > I'm trying to create a script as per subject (on CentOS 6.5,
> > CMAN+Pacemaker, only DRBD+KVM active/passive resources; SNMP-UPS
> > monitored by NUT).
> >
> > Ideally I think that each node should stop (disable) all locally-running
> > VirtualDomain resources (doing so cleanly demotes and then downs the DRBD
> > resources underneath), then put itself in standby and finally shutdown.
> >
> > On further startup, manual intervention would be required to unstandby
> > all nodes and enable resources (nodes already in standby and resources
> > already disabled before blackout should be manually distinguished).
> >
> > Is this strategy conceptually safe?
> >
> > Unfortunately, various searches have turned up no "prior art" :)
> 
> I started work on something similar with apcupsd (first I had to make it
> work with multiple UPSes, which I did). Then I decided not to actually
> implement it, and chose instead to leave it up to an admin to decide
> how/when/if to initiate a graceful shutdown.
> 
> My rationale was that this placed way too much potential damage in the 
> hands of, effectively, a single trigger. One bad bug and you could bring 
> down a perfectly fine cluster.

Perfectly reasonable; in fact, I was limiting my effort to a single, narrowly
defined case.
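
Roughly what I had in mind, as a sketch only (the resource names below are
invented placeholders, and a real script would of course wait for the
resources to actually stop before going further):

    #!/bin/sh
    # Disable the VM resources; stopping a VirtualDomain resource also
    # cleanly demotes and downs the DRBD resources underneath it.
    for rsc in vm_dc vm_other; do
        pcs resource disable "$rsc"
    done
    # ...poll "pcs status" here until the VMs are really stopped...
    # Then put this node in standby and power it off
    # (the name must match the cluster node name).
    pcs cluster standby "$(hostname)"
    shutdown -h now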

> Instead, what I did was ensure that any power event triggered an alert 
> email (x2, as both nodes ran the monitoring app). This way, I (and the 
> client's admins) would be notified immediately if anything happened. 
> Then it was up to us to decide how/if to initiate a graceful shutdown.

My client's business setup is peculiar too: too big to disregard HA solutions,
but too small to have staff/consultants on call for "secondary" emergencies
(like extended power outages during summer storms etc.).

> One real-world example;
> 
> A couple months ago, a client's neighborhood was hit with a prolonged 
> power outage. Eventually, we decided to gracefully shut down. However, 
> one of the windows VMs had downloaded and prepped to install about 30 
> updates (no idea how this happened, except windows). Anyway, the VM took 
> more time to shut down than the batteries could support. So half-way 
> through, we withdrew one node and powered it off to shed load and gain 
> battery runtime. This kind of logic can not reasonably be coded into a 
> script.

Enlightening tale!

Thinking about it: I suppose that more VM-intensive needs (VDI etc.) would
qualify for VM-specific HA solutions (like oVirt/OpenStack), where VMs could
be treated entirely like physical machines (install UPS agents on the guest
OS and let them go); on a "classic" HA clustering solution, instead, I suppose
the VMs should be server VMs (or treated like that), and even Windows admins
would know multiple ways (interactive, GPO, registry) to ensure controlled
behaviour of update installation (typically "interactive installation during
a maintenance window"). Leaving "install by default on shutdown" enabled does
not speak well for those admins ;>

> My $0.02.
> 
> -- 
> Digimer

Many thanks for your suggestions and shared experiences!

Regards,
Giuseppe

  ___
Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
http://oss.clusterlabs.org/mailman/listinfo/pacemaker

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org