Re: [Linux-HA] Antw: Re: SLES11 SP2 HAE: problematic change for LVM RA

2013-12-04 Thread David Vossel




- Original Message -
> From: "Lars Marowsky-Bree" 
> To: "General Linux-HA mailing list" 
> Sent: Wednesday, December 4, 2013 3:49:17 AM
> Subject: Re: [Linux-HA] Antw: Re: SLES11 SP2 HAE: problematic change for LVM 
> RA
> 
> On 2013-12-04T10:25:58, Ulrich Windl 
> wrote:
> 
> > > You thought it was working, but in fact it wasn't. ;-)
> > "working" meaning "the resource started".
> > "not working" meaning "the resource does not start"
> > 
> > You see I have minimal requirements ;-)
> 
> I'm sorry; we couldn't possibly test all misconfigurations, so this
> one slipped through: we simply didn't expect anyone to set that for a
> non-clustered VG.

Updates have been made to the LVM agent to allow exclusive activation without 
clvmd.

http://www.davidvossel.com/wiki/index.php?title=HA_LVM
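
For anyone wanting a quick idea of what that looks like, a minimal sketch
in crm syntax (resource and VG names are illustrative, and the
lvm.conf/volume_list setup described on the page above is still required;
check "crm ra info ocf:heartbeat:LVM" for the exact parameters your
resource-agents version supports):

primitive lvm.myvg ocf:heartbeat:LVM \
        params volgrpname="myvg" exclusive="true" tag="pacemaker" \
        op monitor interval="30s" timeout="30s"

With exclusive="true" and no clvmd, the agent tags the VG and relies on
volume_list filtering to keep other nodes from activating it.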

-- Vossel

> 
> > >> You could argue that it never should have worked. Anyway: If you want
> > >> to activate a VG on exactly one node you should not need cLVM; only if
> > >> you mean to activate the VG on multiple nodes (as for a cluster file
> > >> system)...
> > > 
> > > You don't need cLVM to activate a VG on exactly one node. Correct. And
> > > you don't. The cluster stack will never activate a resource twice.
> > 
> > Occasionally two safety lines are better than one. We HAD filesystem
> > corruptions due to the cluster doing things it shouldn't do.
> 
> And that's perfectly fine. All you need to do to activate this is
> "vgchange -c y" on the specific volume group, and the exclusive=true
> flag will work just fine.
> 
> > > If you don't want that to happen, exclusive=true is not what you want to
> > > set.
> > That makes sense, but what I don't like is that I have to mess with local
> > lvm.conf files...
> 
> You don't. Just drop exclusive=true, or set the clustered flag on the
> VG.
> 
> You only have to change anything in the lvm.conf if you want to use tags
> for exclusivity protection (I defer to the LVM RA help for how to use
> that, I've never tried it).
> 
> 
> Regards,
> Lars
> 
> --
> Architect Storage/HA
> SUSE LINUX Products GmbH, GF: Jeff Hawn, Jennifer Guild, Felix Imendörffer,
> HRB 21284 (AG Nürnberg)
> "Experience is the name everyone gives to their mistakes." -- Oscar Wilde
> 
___
Linux-HA mailing list
Linux-HA@lists.linux-ha.org
http://lists.linux-ha.org/mailman/listinfo/linux-ha
See also: http://linux-ha.org/ReportingProblems

Re: [Linux-HA] drbd/pacemaker multiple tgt targets, portblock, and race conditions (long-ish)

2013-12-04 Thread Jefferson Ogata

On 2013-11-21 16:34, Jefferson Ogata wrote:

On 2013-11-20 08:35, Jefferson Ogata wrote:

Indeed, using iptables with REJECT and tcp-reset, this seems to piss off
the initiators, creating immediate i/o errors. But one can use DROP on
incoming SYN packets and let established connections drain. I've been
trying to get this to work but am finding that it takes so long for some
connections to drain that something times out. I haven't given up on
this approach, though. Testing this stuff can be tricky because if i make
one mistake, stonith kicks in and i end up having to wait 5-10 minutes
for the machine to reboot and resync its DRBD devices.
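
For the record, the two iptables variants boil down to something like this
(the VIP and port match the config further down; the drain loop is only an
illustration of the idea, not what the tweaked portblock RA does verbatim):

# Hard cut: reset the iSCSI port outright -- initiators see immediate i/o errors.
iptables -I INPUT -d 192.168.1.244 -p tcp --dport 3260 -j REJECT --reject-with tcp-reset

# Softer cut: drop only new connection attempts (SYN), let established ones drain.
iptables -I INPUT -d 192.168.1.244 -p tcp --dport 3260 --syn -j DROP

# Then wait (with some timeout) for the established connections to go away:
while ss -tn state established '( sport = :3260 )' | grep -q 192.168.1.244; do
        sleep 1
done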


Follow-up on this: the original race condition i reported still occurs
with this strategy: if existing TCP connections are allowed to drain by
passing packets from established initiator connections (by blocking only
SYN packets), then the initiator can also send new requests to the
target during the takedown process; the takedown removes LUNs from the
live target and the initiator generates an i/o error if it happens to
try to access a LUN that has been removed before the connection is removed.

This happens because the configuration looks something like this (crm):

group foo portblock vip iSCSITarget:target iSCSILogicalUnit:lun1
iSCSILogicalUnit:lun2 iSCSILogicalUnit:lun3 portunblock

On takedown, if portblock is tweaked to pass packets for existing
connections so they can drain, there's a window while LUNs lun3, lun2,
lun1 are being removed from the target where this race condition occurs.
The connection isn't removed until iSCSITarget runs to stop the target.

A way to handle this that should actually work is to write a new RA that
deletes the connections from the target *before* the LUNs are removed
during takedown. The config would look something like this, then:

group foo portblock vip iSCSITarget:target iSCSILogicalUnit:lun1
iSCSILogicalUnit:lun2 iSCSILogicalUnit:lun3 tgtConnections portunblock

On takedown, then, portunblock will block new incoming connections,
tgtConnections will shut down existing connections and wait for them to
drain, then the LUNs can be safely removed before the target is taken down.

I'll write this RA today and see how that works.
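
For what it's worth, the core of that stop action with tgt amounts to
something like the following (the tid comes from the RA parameters; the awk
parsing of tgtadm's output is an assumption about its format, so treat this
as a sketch rather than the attached RA):

tid=1
# enumerate the sessions/connections on the target and delete them
tgtadm --lld iscsi --mode conn --op show --tid "$tid" |
  awk '/Session:/ {sid=$2} /Connection:/ {print sid, $2}' |
  while read -r sid cid; do
    tgtadm --lld iscsi --mode conn --op delete --tid "$tid" --sid "$sid" --cid "$cid"
  done
# then poll "tgtadm --lld iscsi --mode conn --op show --tid $tid" until empty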


So, this strategy worked. The final RA is attached. The config (crm) 
then looks like this, using the tweaked portblock RA that blocks SYN 
only, the tgtUser RA that adds a tgtd user, and the tweaked iSCSITarget 
RA that doesn't add a user if no password is provided (see previous 
discussion for the latter two RAs). This is a two-node cluster using 
DRBD-backed LVM volume groups and multiple targets. The names have been 
changed to protect the innocent, and the config is simplified to a single 
target for brevity, but it should be clear how to extend it to multiple 
DRBDs/VGs/targets. I've also left out the stonith config.



primitive tgtd lsb:tgtd \
        op monitor interval="10s"
clone clone.tgtd tgtd
primitive user.username ocf:local:tgtUser \
        params username="username" password="password"
clone clone.user.username user.username
order clone.tgtd_before_clone.user.username inf: clone.tgtd:start clone.user.username:start

primitive drbd.pv1 ocf:linbit:drbd \
        params drbd_resource="pv1" \
        op monitor role="Slave" interval="29s" timeout="600s" \
        op monitor role="Master" interval="31s" timeout="600s" \
        op start timeout="240s" op stop timeout="240s"
ms ms.drbd.pv1 drbd.pv1 \
        meta master-max="1" master-node-max="1" clone-max="2" clone-node-max="1" notify="true"
primitive lvm.vg1 ocf:heartbeat:LVM \
        params volgrpname="vg1" \
        op monitor interval="30s" timeout="30s" op start timeout="30s" op stop timeout="30s"

order ms.drbd.pv1_before_lvm.vg1 inf: ms.drbd.pv1:promote lvm.vg1:start
colocation ms.drbd.pv1_with_lvm.vg1 inf: ms.drbd.pv1:Master lvm.vg1

primitive target.1 ocf:local:iSCSITarget \
        params iqn="iqnt1" tid="1" incoming_username="username" implementation="tgt" portals="" \
        op monitor interval="30s" op start timeout="30s" op stop timeout="120s"
primitive lun.1.1 ocf:heartbeat:iSCSILogicalUnit \
        params target_iqn="iqnt1" lun="1" path="/dev/vg1/lv1" \
        additional_parameters="scsi_id=vg1/lv1 mode_page=8:0:18:0x10:0:0xff:0xff:0:0:0xff:0xff:0xff:0xff:0x80:0x14:0:0:0:0:0:0" \
        implementation="tgt" \
        op monitor interval="30s" op start timeout="30s" op stop timeout="120s"
primitive ip.192.168.1.244 ocf:heartbeat:IPaddr \
        params ip="192.168.1.244" cidr_netmask="24" nic="bond0"
primitive portblock.ip.192.168.1.244 ocf:local:portblock \
        params ip="192.168.1.244" action="block" protocol="tcp" portno="3260" syn_only="true" \
        op monitor interval="10s" timeout="10s" depth="0"
primitive tgtfinal.1 ocf:local:tgtFinal \
        params tid="1" \
        op monitor interval="30s" timeout="30s" op stop timeout="60s"
primitive portunblock.ip.192.168.1.244 ocf:local:portblock \
        params ip="192.168.1.244" action="unblock" protocol="tcp" portno="3260" syn_only="true" \
        op monitor interval="10s" timeout="10s" depth="0"


group group.target.1 lvm.vg1 portblock.ip.192.168.1.244 ip.192.168.1.244 targ

Re: [Linux-HA] FYI: resource-agents-3.9.2-40.el6.x86_64 kills heartbeat-3.0.4

2013-12-04 Thread Jefferson Ogata

On 2013-12-02 00:03, Andrew Beekhof wrote:

On Wed, Nov 27, 2013, at 06:15 PM, Jefferson Ogata wrote:

On 2013-11-28 01:55, Andrew Beekhof wrote:

On 28 Nov 2013, at 11:29 am, Jefferson Ogata  wrote:

On 2013-11-28 00:12, Dimitri Maziuk wrote:

Just so you know:

Red Hat's (CentOS, actually) latest build of resource-agents sets $HA_BIN
to /usr/libexec/heartbeat. The daemon in the heartbeat-3.0.4 RPM is
/usr/lib64/heartbeat/heartbeat, so the $HA_BIN/heartbeat binary does not exist.
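
A quick way to confirm the mismatch, plus a possible local stopgap until the
packages agree on a path (the symlink is only a workaround suggestion, not
something either package ships):

# where the daemon actually lives vs. where resource-agents now points $HA_BIN
ls -l /usr/lib64/heartbeat/heartbeat /usr/libexec/heartbeat/heartbeat

# stopgap: make the expected location resolve to the real binary
mkdir -p /usr/libexec/heartbeat
ln -s /usr/lib64/heartbeat/heartbeat /usr/libexec/heartbeat/heartbeat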

(And please hold the "upgrade to pacemaker" comments: I'm hoping if I
wait just a little bit longer I can upgrade to ceph and openstack -- or
retire, whichever comes first ;)


Hey, "upgrading" to pacemaker wouldn't necessarily help. Red Hat broke that last month by 
dropping most of the resource agents they'd initially shipped. (Don't you love "Technology 
Previews"?)


That's the whole point behind the "tech preview" label... it means the software 
is not yet in a form that Red Hat will support and is subject to changes _exactly_ like 
the one made to resource-agents.


Um, yes, i know. That's why i mentioned it.


Ok, sorry, I wasn't sure.


It's nicer, however, when Red Hat takes a conservative position with the
Tech Preview. They could have shipped a minimal set of resource agents
in the first place,


3 years ago we didn't know if pacemaker would _ever_ be supported in
RHEL-6, so stripping out agents wasn't on our radar.
I'm sure the only reason it and the rest of pacemaker shipped at all was
to humor the guy they'd just hired.

It was only at the point that supporting pacemaker in 6.5 became likely
that someone took a look at the full list and had a heart-attack.


so people would have a better idea what they had to
provide on their own end, instead of pulling the rug out with nary a
mention of what they were doing.


Yes, that was not good.
One of the challenges I find at Red Hat is the gaps between when a
decision is made, when we're allowed to talk about it and when customers
find out about it.  As a developer, it's the things we spent
significant time on that first come to mind when writing release notes,
not the 3s it took to remove some files from the spec file - even though
the latter is going to have a bigger effect :-(

We can only say that lessons have been learned and that we will do
better if there is a similar situation next time.


+1 Insightful/Informative/Interesting.
___
Linux-HA mailing list
Linux-HA@lists.linux-ha.org
http://lists.linux-ha.org/mailman/listinfo/linux-ha
See also: http://linux-ha.org/ReportingProblems


Re: [Linux-HA] Antw: Re: SLES11 SP2 HAE: problematic change for LVM RA

2013-12-04 Thread Lars Marowsky-Bree
On 2013-12-04T10:25:58, Ulrich Windl  wrote:

> > You thought it was working, but in fact it wasn't. ;-)
> "working" meaning "the resource started".
> "not working" meaning "the resource does not start"
> 
> You see I have minimal requirements ;-)

I'm sorry; we couldn't possibly test all misconfigurations, so this
one slipped through: we simply didn't expect anyone to set that for a
non-clustered VG.

> >> You could argue that it never should have worked. Anyway: If you want
> >> to activate a VG on exactly one node you should not need cLVM; only if
> >> you mean to activate the VG on multiple nodes (as for a cluster file
> >> system)...
> > 
> > You don't need cLVM to activate a VG on exactly one node. Correct. And
> > you don't. The cluster stack will never activate a resource twice.
> 
> Occasionally two safety lines are better than one. We HAD filesystem
> corruptions due to the cluster doing things it shouldn't do.

And that's perfectly fine. All you need to do to activate this is
"vgchange -c y" on the specific volume group, and the exclusive=true
flag will work just fine.
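
For example (VG name is illustrative; this assumes clvmd is running on the
cluster nodes, since the clustered flag makes LVM go through it):

# mark the VG as clustered so exclusive activation is arbitrated via the DLM
vgchange -c y myvg

# verify -- a clustered VG shows a "c" in the vg_attr field (e.g. "wz--nc")
vgs -o vg_name,vg_attr myvg

The LVM resource itself can then keep exclusive="true" unchanged.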

> > If you don't want that to happen, exclusive=true is not what you want to
> > set.
> That makes sense, but what I don't like is that I have to mess with local
> lvm.conf files...

You don't. Just drop exclusive=true, or set the clustered flag on the
VG.

You only have to change anything in the lvm.conf if you want to use tags
for exclusivity protection (I defer to the LVM RA help for how to use
that, I've never tried it).
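
Roughly, the tag variant means adding something like this to lvm.conf on
each node (tag and VG names are illustrative; the LVM RA metadata describes
the exact requirements, and the initrd usually needs rebuilding afterwards
so the filter also applies at boot):

activation {
    # only the root VG and VGs carrying the cluster tag may be activated locally
    volume_list = [ "rootvg", "@pacemaker" ]
}

The LVM resource then runs with exclusive="true" (plus the tag parameter in
newer agents), so the RA tags the VG before activating it.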


Regards,
Lars

-- 
Architect Storage/HA
SUSE LINUX Products GmbH, GF: Jeff Hawn, Jennifer Guild, Felix Imendörffer, HRB 
21284 (AG Nürnberg)
"Experience is the name everyone gives to their mistakes." -- Oscar Wilde

___
Linux-HA mailing list
Linux-HA@lists.linux-ha.org
http://lists.linux-ha.org/mailman/listinfo/linux-ha
See also: http://linux-ha.org/ReportingProblems


Re: [Linux-HA] Antw: Re: SLES11 SP2 HAE: problematic change for LVM RA

2013-12-04 Thread Ulrich Windl
>>> Lars Marowsky-Bree wrote on 03.12.2013 at 15:34 in message
<20131203143443.gl27...@suse.de>:
> On 2013-12-02T09:22:10, Ulrich Windl 
wrote:
> 
>> >> No!
>> > 
>> > Then it can't work. Exclusive activation only works for clustered volume
>> > groups, since it uses the DLM to protect against the VG being activated
>> > more than once in the cluster.
>> Hi!
>> 
>> Try it with "resource-agents-3.9.4-0.26.84": it works; with 
> "resource-agents-3.9.5-0.6.26.11" it doesn't work ;-)
> 
> You thought it was working, but in fact it wasn't. ;-)

"working" meaning "the resource started".
"not working" meaning "the resource does not start"

You see I have minimal requirements ;-)

> 
> Or at least, not as you expected.
> 
>> You could argue that it never should have worked. Anyway: If you want
>> to activate a VG on exactly one node you should not need cLVM; only if
>> you mean to activate the VG on multiple nodes (as for a cluster file
>> system)...
> 
> You don't need cLVM to activate a VG on exactly one node. Correct. And
> you don't. The cluster stack will never activate a resource twice.

Occasionally two safety lines are better than one. We HAD filesystem
corruptions due to the cluster doing things it shouldn't do.

> 
> You need cLVM if you want LVM2 to enforce that at the LVM2 level -
> because it does this by getting a lock on the VG/LV, since otherwise
> LVM2 has no way of knowing if the VG/LV is currently active somewhere
> else. And this is what "exclusive=true" turns on.
> 
> If you don't want that to happen, exclusive=true is not what you want to
> set.

That makes sense, but what I don't like is that I have to mess with local
lvm.conf files...


All the best,
Ulrich

> 
> 
> Regards,
> Lars
> 
> -- 
> Architect Storage/HA
> SUSE LINUX Products GmbH, GF: Jeff Hawn, Jennifer Guild, Felix Imendörffer,

> HRB 21284 (AG Nürnberg)
> "Experience is the name everyone gives to their mistakes." -- Oscar Wilde
> 


___
Linux-HA mailing list
Linux-HA@lists.linux-ha.org
http://lists.linux-ha.org/mailman/listinfo/linux-ha
See also: http://linux-ha.org/ReportingProblems