[Pacemaker] change sbd watchdog timeout in a running cluster

2013-03-26 Thread emmanuel segura
Hello List

How can I change the sbd watchdog timeout without stopping the cluster?

Thanks

-- 
esta es mi vida e me la vivo hasta que dios quiera
___
Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
http://oss.clusterlabs.org/mailman/listinfo/pacemaker

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org


Re: [Pacemaker] change sbd watchdog timeout in a running cluster

2013-03-26 Thread Lars Marowsky-Bree
On 2013-03-26T15:56:48, emmanuel segura  wrote:

> Hello List
> 
> How can i change the sbd watchdog timeout without stopping the cluster?

Very, very carefully.

Stop the external/sbd resource, so that fencing blocks while you're
doing this.

You can then manually stop the sbd daemon on all nodes (by sending the
node an "exit" message), change the timeout, and restart it.

Once it's running everywhere again, restart the fencing resource.


Regards,
Lars

-- 
Architect Storage/HA
SUSE LINUX Products GmbH, GF: Jeff Hawn, Jennifer Guild, Felix Imendörffer, HRB 
21284 (AG Nürnberg)
"Experience is the name everyone gives to their mistakes." -- Oscar Wilde


___
Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
http://oss.clusterlabs.org/mailman/listinfo/pacemaker

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org


Re: [Pacemaker] change sbd watchdog timeout in a running cluster

2013-03-26 Thread emmanuel segura
Hello Lars

So the procedure should be:

crm resource stop stonith_sbd
sbd -d /dev/sda1 message exit (on every node)
sbd -d /dev/sda1 -1 90 -4 180 create
crm resource start stonith_sbd
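
For example, with placeholder node names node1 and node2 (and the same device and
timeouts as above), the whole sequence might look like this; the exit message is
sent once per node before the device is re-initialised:

crm resource stop stonith_sbd
sbd -d /dev/sda1 message node1 exit
sbd -d /dev/sda1 message node2 exit
sbd -d /dev/sda1 -1 90 -4 180 create
crm resource start stonith_sbd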

Thanks


2013/3/26 Lars Marowsky-Bree 

> On 2013-03-26T15:56:48, emmanuel segura  wrote:
>
> > Hello List
> >
> > How can i change the sbd watchdog timeout without stopping the cluster?
>
> Very, very carefully.
>
> Stop the external/sbd resource, so that fencing blocks while you're
> doing this.
>
> You can then manually stop the sbd daemon on all nodes (by sending the
> node an "exit" message), change the timeout, and restart it.
>
> Once it's running everywhere again, restart the fencing resource.
>
>
> Regards,
> Lars
>
> --
> Architect Storage/HA
> SUSE LINUX Products GmbH, GF: Jeff Hawn, Jennifer Guild, Felix
> Imendörffer, HRB 21284 (AG Nürnberg)
> "Experience is the name everyone gives to their mistakes." -- Oscar Wilde
>
>
> ___
> Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
> http://oss.clusterlabs.org/mailman/listinfo/pacemaker
>
> Project Home: http://www.clusterlabs.org
> Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
> Bugs: http://bugs.clusterlabs.org
>



-- 
esta es mi vida e me la vivo hasta que dios quiera
___
Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
http://oss.clusterlabs.org/mailman/listinfo/pacemaker

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org


Re: [Pacemaker] change sbd watchdog timeout in a running cluster

2013-03-26 Thread Lars Marowsky-Bree
On 2013-03-26T16:48:30, emmanuel segura  wrote:

> Hello Lars
> 
> So the procedura should be:
> 
> crm resource stop stonith_sbd
> sbd -d /dev/sda1 message exit = (on every node)
> sbd -d /dev/sda1 -1 90 -4 180 create
> crm resource start stonith_sbd

Yes. But I wonder why you need such a long timeout. That looks ...
wrong?


Regards,
Lars

-- 
Architect Storage/HA
SUSE LINUX Products GmbH, GF: Jeff Hawn, Jennifer Guild, Felix Imendörffer, HRB 
21284 (AG Nürnberg)
"Experience is the name everyone gives to their mistakes." -- Oscar Wilde


___
Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
http://oss.clusterlabs.org/mailman/listinfo/pacemaker

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org


Re: [Pacemaker] change sbd watchdog timeout in a running cluster

2013-03-26 Thread emmanuel segura
Hello Lars

Because we have a VM (SUSE 11) cluster on an ESX cluster, with a clustered
NetApp as the datastore. Last night we had a NetApp failover; there was no
problem with the other VM servers, but all the VMs in the cluster with
pacemaker+sbd got rebooted.

This is because the watchdog timeout is 5 seconds.

Thanks

2013/3/26 Lars Marowsky-Bree 

> On 2013-03-26T16:48:30, emmanuel segura  wrote:
>
> > Hello Lars
> >
> > So the procedura should be:
> >
> > crm resource stop stonith_sbd
> > sbd -d /dev/sda1 message exit = (on every node)
> > sbd -d /dev/sda1 -1 90 -4 180 create
> > crm resource start stonith_sbd
>
> Yes. But I wonder why you need such a long timeout. That looks ...
> wrong?
>
>
> Regards,
> Lars
>
> --
> Architect Storage/HA
> SUSE LINUX Products GmbH, GF: Jeff Hawn, Jennifer Guild, Felix
> Imendörffer, HRB 21284 (AG Nürnberg)
> "Experience is the name everyone gives to their mistakes." -- Oscar Wilde
>
>
> ___
> Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
> http://oss.clusterlabs.org/mailman/listinfo/pacemaker
>
> Project Home: http://www.clusterlabs.org
> Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
> Bugs: http://bugs.clusterlabs.org
>



-- 
esta es mi vida e me la vivo hasta que dios quiera
___
Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
http://oss.clusterlabs.org/mailman/listinfo/pacemaker

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org


Re: [Pacemaker] change sbd watchdog timeout in a running cluster

2013-03-26 Thread emmanuel segura
Hello Lars

Why do you think the long timeout is wrong?

Do I need to change the stonith-timeout in Pacemaker?

Thanks

2013/3/26 Lars Marowsky-Bree 

> On 2013-03-26T16:48:30, emmanuel segura  wrote:
>
> > Hello Lars
> >
> > So the procedura should be:
> >
> > crm resource stop stonith_sbd
> > sbd -d /dev/sda1 message exit = (on every node)
> > sbd -d /dev/sda1 -1 90 -4 180 create
> > crm resource start stonith_sbd
>
> Yes. But I wonder why you need such a long timeout. That looks ...
> wrong?
>
>
> Regards,
> Lars
>
> --
> Architect Storage/HA
> SUSE LINUX Products GmbH, GF: Jeff Hawn, Jennifer Guild, Felix
> Imendörffer, HRB 21284 (AG Nürnberg)
> "Experience is the name everyone gives to their mistakes." -- Oscar Wilde
>
>
> ___
> Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
> http://oss.clusterlabs.org/mailman/listinfo/pacemaker
>
> Project Home: http://www.clusterlabs.org
> Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
> Bugs: http://bugs.clusterlabs.org
>



-- 
esta es mi vida e me la vivo hasta que dios quiera
___
Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
http://oss.clusterlabs.org/mailman/listinfo/pacemaker

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org


Re: [Pacemaker] change sbd watchdog timeout in a running cluster

2013-03-26 Thread Lars Marowsky-Bree
On 2013-03-26T17:13:34, emmanuel segura  wrote:

> Hello Lars
> 
> Because we have a vm(suse 11) cluster on a esx cluster, as datastore we are
> using a netapp in cluster, the last night we had a netapp failover, no
> problem with other vm servers, but all vm in cluster with pacemaker+sbd get
> has rebooted
> 
> This beacuse the watchdog time is 5 seconds

To protect against that, you should use multiple disks. As long as the
majority of them remains within the latency limits, you will not
experience a fail-over.
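
For example (device paths made up, and the -1/-4 values only illustrative), sbd
takes up to three -d devices, so a three-device setup would be initialised with:

sbd -d /dev/disk/by-id/diskA -d /dev/disk/by-id/diskB -d /dev/disk/by-id/diskC \
    -1 10 -4 20 create

and the devices are typically listed semicolon-separated in SBD_DEVICE in
/etc/sysconfig/sbd.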

Admittedly, 5s is on the short side for these. But 90s for watchdog
means you'll end up with 120+ seconds for msgwait, meaning all
fail-overs will be delayed accordingly. That's not going to be helpful.

And yes, you need to increase stonith-timeout to be approx. 50% larger
than msgwait, at least.
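
With the msgwait of 180 seconds from the command above, that would mean something
on the order of:

crm configure property stonith-timeout=270s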



Regards,
Lars

-- 
Architect Storage/HA
SUSE LINUX Products GmbH, GF: Jeff Hawn, Jennifer Guild, Felix Imendörffer, HRB 
21284 (AG Nürnberg)
"Experience is the name everyone gives to their mistakes." -- Oscar Wilde


___
Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
http://oss.clusterlabs.org/mailman/listinfo/pacemaker

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org


Re: [Pacemaker] change sbd watchdog timeout in a running cluster

2013-03-26 Thread emmanuel segura
Hello Lars

What timeout would you recommend?

Thanks a lot

2013/3/26 Lars Marowsky-Bree 

> On 2013-03-26T17:13:34, emmanuel segura  wrote:
>
> > Hello Lars
> >
> > Because we have a vm(suse 11) cluster on a esx cluster, as datastore we
> are
> > using a netapp in cluster, the last night we had a netapp failover, no
> > problem with other vm servers, but all vm in cluster with pacemaker+sbd
> get
> > has rebooted
> >
> > This beacuse the watchdog time is 5 seconds
>
> To protect against that, you should use multiple disks. As long as the
> majority of them remains within the latency limits, you will not
> experience a fail-over.
>
> Admittedly, 5s is on the short side for these. But 90s for watchdog
> means you'll end up with 120+ seconds for msgwait, meaning all
> fail-overs will be delayed accordingly. That's not going to be helpful.
>
> And yes, you need to increase stonith-timeout to be approx. 50% larger
> than msgwait, at least.
>
>
>
> Regards,
> Lars
>
> --
> Architect Storage/HA
> SUSE LINUX Products GmbH, GF: Jeff Hawn, Jennifer Guild, Felix
> Imendörffer, HRB 21284 (AG Nürnberg)
> "Experience is the name everyone gives to their mistakes." -- Oscar Wilde
>
>
> ___
> Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
> http://oss.clusterlabs.org/mailman/listinfo/pacemaker
>
> Project Home: http://www.clusterlabs.org
> Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
> Bugs: http://bugs.clusterlabs.org
>



-- 
esta es mi vida e me la vivo hasta que dios quiera
___
Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
http://oss.clusterlabs.org/mailman/listinfo/pacemaker

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org


Re: [Pacemaker] DRBD+LVM+NFS problems

2013-03-26 Thread Dennis Jacobfeuerborn

On 26.03.2013 06:14, Vladislav Bogdanov wrote:

26.03.2013 04:23, Dennis Jacobfeuerborn wrote:

I have now reduced the configuration further and removed LVM from the
picture. Still the cluster fails when I set the master node to standby.
What's interesting is that things get fixed when I issue a simple
"cleanup" for the filesystem resource.

This is what my current config looks like:

node nfs1 \
 attributes standby="off"
node nfs2
primitive p_drbd_web1 ocf:linbit:drbd \
 params drbd_resource="web1" \
 op monitor interval="15" role="Master" \
 op monitor interval="30" role="Slave"
primitive p_fs_web1 ocf:heartbeat:Filesystem \
 params device="/dev/drbd0" \
 directory="/srv/nfs/web1" fstype="ext4" \
 op monitor interval="10s"
ms ms_drbd_web1 p_drbd_web1 \
 meta master-max="1" master-node-max="1" \
 clone-max="2" clone-node-max="1" notify="true"
colocation c_web1_on_drbd inf: ms_drbd_web1:Master p_fs_web1


Above means: "colocate ms_drbd_web1:Master with p_fs_web1", or "promote
ms_drbd_web1 where p_fs_web1 is (or "is about to be")".

Probably that is not exactly what you want (although that is also valid,
but uses different logic internally). I usually place resources in a
different order in colocation and order constraints, and that works.


Indeed, I had the colocation semantics backwards. With that change the 
failover works correctly, thanks!
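
For reference, the corrected pair now reads roughly:

colocation c_web1_on_drbd inf: p_fs_web1 ms_drbd_web1:Master
order o_drbd_before_web1 inf: ms_drbd_web1:promote p_fs_web1:start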


I still have problems when using LVM, although these don't seem to be 
pacemaker related. I have defined /dev/drbd0 with a backing device 
/dev/vdb. The problem is that when I create a physical volume on 
/dev/drbd0 and run pvs, the output shows the physical volume on /dev/vdb 
instead. I already disabled caching in /etc/lvm/lvm.conf, prepended a 
filter "r|/dev/vdb.*|", recreated the initramfs and rebooted, but LVM 
still sees the backing device as the physical volume and not the 
actual replicated device /dev/drbd0.


Any idea why LVM is still scanning /dev/vdb for physical volumes despite 
the filter?


Regards,
  Dennis

___
Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
http://oss.clusterlabs.org/mailman/listinfo/pacemaker

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org


Re: [Pacemaker] Linking lib/cib and lib/pengine to each other?

2013-03-26 Thread Andrew Beekhof
On Mon, Mar 25, 2013 at 10:55 PM, Viacheslav Dubrovskyi
 wrote:
> On 23.03.2013 08:27, Viacheslav Dubrovskyi wrote:
>> Hi.
>>
>> I'm building a package for my distributive. Everything is built, but the
>> package does not pass our internal tests. I get errors like this:
>> verify-elf: ERROR: ./usr/lib/libpe_status.so.4.1.0: undefined symbol:
>> get_object_root

Was this the only undefined symbol?
It might be better to remove use of that function instead.

>>
>> It mean, that libpe_status.so not linked with libcib.so where defined
>> get_object_root. I can easy fix it adding
>> libpe_status_la_LIBADD  =  $(top_builddir)/lib/cib/libcib.la
>> in lib/pengine/Makefile.am
>>
>> But for this I need build libcib before lib/pengine. And it's impossible
>> too, because libcib used symbols from lib/pengine. So we have situation,
>> when two library must be linked to each other.
>>
>> And this is very bad because then the in fact it should be one library.
>> Or symbols should be put in a third library, such as common.
>>
>> Can anyone comment on this situation?
> Patch for fix this error.
>
> --
> WBR,
> Viacheslav Dubrovskyi
>
>
> ___
> Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
> http://oss.clusterlabs.org/mailman/listinfo/pacemaker
>
> Project Home: http://www.clusterlabs.org
> Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
> Bugs: http://bugs.clusterlabs.org
>

___
Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
http://oss.clusterlabs.org/mailman/listinfo/pacemaker

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org


Re: [Pacemaker] stonith and avoiding split brain in two nodes cluster

2013-03-26 Thread Andrew Beekhof
On Tue, Mar 26, 2013 at 6:30 PM, Angel L. Mateo  wrote:
> El 25/03/13 20:50, Jacek Konieczny escribió:
>
>> On Mon, 25 Mar 2013 20:01:28 +0100
>> "Angel L. Mateo"  wrote:

 quorum {
 provider: corosync_votequorum
 expected_votes: 2
 two_node: 1
 }

 Corosync will then manage quorum for the two-node cluster and
 Pacemaker
>>>
>>>
>>>I'm using corosync 1.1 which is the one  provided with my
>>> distribution (ubuntu 12.04). I could also use cman.
>>
>>
>> I don't think corosync 1.1 can do that, but I guess in this case cman
>> should be able to provide this functionality.
>>
> Sorry, it's corosync 1.4, not 1.1.
>
>
 can use that. You still need proper fencing to enforce the quorum
 (both for pacemaker and the storage layer – dlm in case you use
 clvmd), but no
 extra quorum node is needed.

>>>I have configured a dlm resource used with clvmd.
>>>
>>>One doubt... With this configuration, how is the split-brain problem
>>> handled?
>>
>>
>> The first node to notice that the other is unreachable will fence (kill)
>> the other, making sure it is the only one operating on the shared data.
>> Even though it is only half of the nodes, the cluster is considered
>> quorate as the other node is known not to be running any cluster
>> resources.
>>
>> When the fenced node reboots its cluster stack starts, but with no
>> quorum until it communicates with the surviving node again. So no
>> cluster services are started there until both nodes communicate
>> properly and the proper quorum is recovered.
>>
> But, will this work with corosync 1.4? Although with corosync 1.4 I
> may not be able to use the quorum configuration you mentioned (I'll try), I have
> configured no-quorum-policy="ignore" so the cluster can still run in the
> case of one node failing. Could this be a problem?

It's essentially required for two-node clusters, as quorum makes no sense there.
Without it the cluster would stop everything (everywhere) when a node
failed (because quorum was lost).

But it also tells Pacemaker it can fence failed nodes (this is a good
thing, as we can't recover the services from a failed node until we're
100% sure the node is powered off).
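
In crm shell terms that boils down to something like:

crm configure property no-quorum-policy=ignore
crm configure property stonith-enabled=true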

___
Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
http://oss.clusterlabs.org/mailman/listinfo/pacemaker

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org


Re: [Pacemaker] DRBD+LVM+NFS problems

2013-03-26 Thread Vladislav Bogdanov
Dennis Jacobfeuerborn  wrote:

>On 26.03.2013 06:14, Vladislav Bogdanov wrote:
>> 26.03.2013 04:23, Dennis Jacobfeuerborn wrote:
>>> I have now reduced the configuration further and removed LVM from
>the
>>> picture. Still the cluster fails when I set the master node to
>standby.
>>> What's interesting is that things get fixed when I issue a simple
>>> "cleanup" for the filesystem resource.
>>>
>>> This is what my current config looks like:
>>>
>>> node nfs1 \
>>>  attributes standby="off"
>>> node nfs2
>>> primitive p_drbd_web1 ocf:linbit:drbd \
>>>  params drbd_resource="web1" \
>>>  op monitor interval="15" role="Master" \
>>>  op monitor interval="30" role="Slave"
>>> primitive p_fs_web1 ocf:heartbeat:Filesystem \
>>>  params device="/dev/drbd0" \
>>>  directory="/srv/nfs/web1" fstype="ext4" \
>>>  op monitor interval="10s"
>>> ms ms_drbd_web1 p_drbd_web1 \
>>>  meta master-max="1" master-node-max="1" \
>>>  clone-max="2" clone-node-max="1" notify="true"
>>> colocation c_web1_on_drbd inf: ms_drbd_web1:Master p_fs_web1
>>
>> Above means: "colocate ms_drbd_web1:Master with p_fs_web1", or
>"promote
>> ms_drbd_web1 where p_fs_web1 is (or "is about to be")".
>>
>> Probably that is not exactly what you want (although that is also
>valid,
>> but uses different logic internally). I usually place resources in a
>> different order in colocation and order constraints, and that works.
>
>Indeed I had the colocation semnatics backwards. With that change the 
>failover work correctly, thanks!
>
>I still have problems when using LVM although these don't seem to be 
>pacemaker related. I have defined /dev/drbd0 with a backing device 
>/dev/vdb. The problem is that when I create a physical volume on 
>/dev/drbd0 and do a pvs the output shows the physical volume on
>/dev/vdb 
>instead. I already disabled caching in /etc/lvm/lvm.conf, prepended a 
>filter "r|/dev/vdb.*|" and recreated the initramfs and reboot but LVM 
>still sees the backing device as the physical volume and not the 
>actually replicated /dev/drbd0.
>
>Any idea why LVM is still scanning /dev/vdb for physical volumes
>despite 
>the filter?
>

That is because /dev contains a lot of symlinks which all point to that vdb device file. 
You'd better explicitly allow only what LVM should scan and reject everything else.
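
For example, a whitelist-style filter in /etc/lvm/lvm.conf (assuming /dev/drbd0 is
the only PV this host should see) might look like:

filter = [ "a|^/dev/drbd0$|", "r|.*|" ]

followed by rebuilding the initramfs again so the early-boot LVM uses the same rules.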

>Regards,
>   Dennis
>
>___
>Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
>http://oss.clusterlabs.org/mailman/listinfo/pacemaker
>
>Project Home: http://www.clusterlabs.org
>Getting started:
>http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
>Bugs: http://bugs.clusterlabs.org



___
Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
http://oss.clusterlabs.org/mailman/listinfo/pacemaker

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org


Re: [Pacemaker] Linking lib/cib and lib/pengine to each other?

2013-03-26 Thread Viacheslav Dubrovskyi
On 26.03.2013 19:41, Andrew Beekhof wrote:
>>> Hi.
>>>
>>> I'm building a package for my distributive. Everything is built, but the
>>> package does not pass our internal tests. I get errors like this:
>>> verify-elf: ERROR: ./usr/lib/libpe_status.so.4.1.0: undefined symbol:
>>> get_object_root
> Was this the only undefined symbol?
For lib/cib -> lib/pengine, yes.
The other undefined symbols come from lib/common and are easily fixed.
> It might be better to remove use of that function instead.
Maybe, but I do not know what the developers had in mind when they did it that way.
For me as a maintainer it is easier to fix the linkage.
>
>>> It mean, that libpe_status.so not linked with libcib.so where defined
>>> get_object_root. I can easy fix it adding
>>> libpe_status_la_LIBADD  =  $(top_builddir)/lib/cib/libcib.la
>>> in lib/pengine/Makefile.am
>>>
>>> But for this I need build libcib before lib/pengine. And it's impossible
>>> too, because libcib used symbols from lib/pengine. So we have situation,
>>> when two library must be linked to each other.
>>>
>>> And this is very bad because then the in fact it should be one library.
>>> Or symbols should be put in a third library, such as common.
>>>
>>> Can anyone comment on this situation?
>> Patch for fix this error.
>>
>>

-- 
WBR,
Viacheslav Dubrovskyi


___
Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
http://oss.clusterlabs.org/mailman/listinfo/pacemaker

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org


[Pacemaker] Problem on creating CIB entry in CRM - shadow cannot be created

2013-03-26 Thread Donna Livingstone
We are attempting to move our RHEL 6.3 pacemaker/drbd environment to a RHEL
6.4 pacemaker environment, and as you can see below we cannot create a shadow
CIB. crm_shadow -w also core dumps. On 6.3 everything works.


Versions are given below.


[root@vccstest1 ~]# crm
crm(live)# cib new ills
INFO: ills shadow CIB created
ERROR: ills: no such shadow CIB
crm(live)#

This did create a shadow.ills in /var/lib/pacemaker/cib ie
[root@vccstest1 cib]# pwd
/var/lib/pacemaker/cib
[root@vccstest1 cib]# ls -al shadow.il*
8 -rw--- 1 root root 6984 Mar 26 10:09 shadow.ills
[root@vccstest1 cib]#

- The same bug is reported here:
http://comments.gmane.org/gmane.linux.highavailability.pacemaker/16636

Also :
[root@vccstest1 cib]# crm_shadow -w
Segmentation fault (core dumped)
[root@vccstest1 cib]#



Versions on 6.4(as loaded from rhn) :

[root@vccstest1 bin]# rpm -qa|grep -i pacemaker
pacemaker-cli-1.1.8-7.el6.x86_64
pacemaker-cts-1.1.8-7.el6.x86_64
pacemaker-libs-devel-1.1.8-7.el6.x86_64
pacemaker-libs-1.1.8-7.el6.x86_64
pacemaker-1.1.8-7.el6.x86_64
drbd-pacemaker-8.3.15-2.el6.x86_64
pacemaker-cluster-libs-1.1.8-7.el6.x86_64
[root@vccstest1 bin]# rpm -qa|grep -i corosync
corosynclib-devel-1.4.1-15.el6.x86_64
corosynclib-1.4.1-15.el6.x86_64
corosync-1.4.1-15.el6.x86_64
[root@vccstest1 bin]# rpm -qa|grep -i drbd
drbd-bash-completion-8.3.15-2.el6.x86_64
drbd-udev-8.3.15-2.el6.x86_64
drbd-rgmanager-8.3.15-2.el6.x86_64
drbd-xen-8.3.15-2.el6.x86_64
drbd-utils-8.3.15-2.el6.x86_64
kmod-drbd-8.3.15_2.6.32_279.14.1-2.el6.x86_64
drbd-pacemaker-8.3.15-2.el6.x86_64
drbd-8.3.15-2.el6.x86_64
drbd-heartbeat-8.3.15-2.el6.x86_64
[root@vccstest1 bin]#  

Versions on 6.3 :

[root@wimpas1 ~]# rpm -qa|grep pacemaker
pacemaker-libs-devel-1.1.7-6.el6.x86_64
pacemaker-cli-1.1.7-6.el6.x86_64
pacemaker-libs-1.1.7-6.el6.x86_64
pacemaker-cluster-libs-1.1.7-6.el6.x86_64
drbd-pacemaker-8.3.15-2.el6.x86_64
pacemaker-1.1.7-6.el6.x86_64
[root@wimpas1 ~]# rpm -qa|grep corosync
corosynclib-1.4.1-7.el6_3.1.x86_64
corosync-1.4.1-7.el6_3.1.x86_64
corosynclib-devel-1.4.1-7.el6_3.1.x86_64
[root@wimpas1 ~]# rpm -qa|grep drbd
drbd-bash-completion-8.3.15-2.el6.x86_64
drbd-udev-8.3.15-2.el6.x86_64
drbd-rgmanager-8.3.15-2.el6.x86_64
drbd-xen-8.3.15-2.el6.x86_64
drbd-utils-8.3.15-2.el6.x86_64
kmod-drbd-8.3.15_2.6.32_279.14.1-2.el6.x86_64
drbd-pacemaker-8.3.15-2.el6.x86_64
drbd-8.3.15-2.el6.x86_64
drbd-heartbeat-8.3.15-2.el6.x86_64
[root@wimpas1 ~]# 

___
Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
http://oss.clusterlabs.org/mailman/listinfo/pacemaker

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org


Re: [Pacemaker] Linking lib/cib and lib/pengine to each other?

2013-03-26 Thread Andrew Beekhof
Give https://github.com/beekhof/pacemaker/commit/53c9122 a try

On Wed, Mar 27, 2013 at 7:43 AM, Viacheslav Dubrovskyi  wrote:
> On 26.03.2013 19:41, Andrew Beekhof wrote:
 Hi.

 I'm building a package for my distributive. Everything is built, but the
 package does not pass our internal tests. I get errors like this:
 verify-elf: ERROR: ./usr/lib/libpe_status.so.4.1.0: undefined symbol:
 get_object_root
>> Was this the only undefined symbol?
> For  lib/cib -> lib/pengine yes.
> Another undefined symbols from lib/common and it's easy fixed.
>> It might be better to remove use of that function instead.
> Maybe yes, but I do not know what the developers were thinking when done so.
> For me as maintainer easy fix linkage.
>>
 It mean, that libpe_status.so not linked with libcib.so where defined
 get_object_root. I can easy fix it adding
 libpe_status_la_LIBADD  =  $(top_builddir)/lib/cib/libcib.la
 in lib/pengine/Makefile.am

 But for this I need build libcib before lib/pengine. And it's impossible
 too, because libcib used symbols from lib/pengine. So we have situation,
 when two library must be linked to each other.

 And this is very bad because then the in fact it should be one library.
 Or symbols should be put in a third library, such as common.

 Can anyone comment on this situation?
>>> Patch for fix this error.
>>>
>>>
>
> --
> WBR,
> Viacheslav Dubrovskyi
>
>
> ___
> Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
> http://oss.clusterlabs.org/mailman/listinfo/pacemaker
>
> Project Home: http://www.clusterlabs.org
> Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
> Bugs: http://bugs.clusterlabs.org

___
Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
http://oss.clusterlabs.org/mailman/listinfo/pacemaker

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org


Re: [Pacemaker] Problem on creating CIB entry in CRM - shadow cannot be created

2013-03-26 Thread Andrew Beekhof
On Wed, Mar 27, 2013 at 8:00 AM, Donna Livingstone
 wrote:
> We are attempting to move our rhel 6.3 pacemaker/drbd  environment to a rhel 
> 6.4 pacemaker environment
> and as you can see below we cannot create a shadow CIB. crm_shadow -w also 
> core dumps. On 6.3 everything works.
>
>
> Versions are given below.
>
>
> [root@vccstest1 ~]# crm
> crm(live)# cib new ills
> INFO: ills shadow CIB created
> ERROR: ills: no such shadow CIB
> crm(live)#
>
> This did create a shadow.ills in /var/lib/pacemaker/cib ie
> [root@vccstest1 cib]# pwd
> /var/lib/pacemaker/cib
> [root@vccstest1 cib]# ls -al shadow.il*
> 8 -rw--- 1 root root 6984 Mar 26 10:09 shadow.ills
> [root@vccstest1 cib]#
>
> - Same bug is reported here :  
> http://comments.gmane.org/gmane.linux.highavailability.pacemaker/16636

Yep, looks like a shell issue (which you can confirm by running
"crm_shadow -c name" by hand).

>
> Also :
> [root@vccstest1 cib]# crm_shadow -w
> Segmentation fault (core dumped)
> [root@vccstest1 cib]#

Looks like it's segfaulting because CIB_shadow is not defined in the
environment (a side effect of "crm cib new" failing).
I'll fix the segfault; in the meantime, perhaps try "crm_shadow -c
name" and then run the crm commands as normal.

>
>
>
> Versions on 6.4(as loaded from rhn) :
>
> [root@vccstest1 bin]# rpm -qa|grep -i pacemaker
> pacemaker-cli-1.1.8-7.el6.x86_64
> pacemaker-cts-1.1.8-7.el6.x86_64
> pacemaker-libs-devel-1.1.8-7.el6.x86_64
> pacemaker-libs-1.1.8-7.el6.x86_64
> pacemaker-1.1.8-7.el6.x86_64
> drbd-pacemaker-8.3.15-2.el6.x86_64
> pacemaker-cluster-libs-1.1.8-7.el6.x86_64
> [root@vccstest1 bin]# rpm -qa|grep -i corosync
> corosynclib-devel-1.4.1-15.el6.x86_64
> corosynclib-1.4.1-15.el6.x86_64
> corosync-1.4.1-15.el6.x86_64
> [root@vccstest1 bin]# rpm -qa|grep -i drbd
> drbd-bash-completion-8.3.15-2.el6.x86_64
> drbd-udev-8.3.15-2.el6.x86_64
> drbd-rgmanager-8.3.15-2.el6.x86_64
> drbd-xen-8.3.15-2.el6.x86_64
> drbd-utils-8.3.15-2.el6.x86_64
> kmod-drbd-8.3.15_2.6.32_279.14.1-2.el6.x86_64
> drbd-pacemaker-8.3.15-2.el6.x86_64
> drbd-8.3.15-2.el6.x86_64
> drbd-heartbeat-8.3.15-2.el6.x86_64
> [root@vccstest1 bin]#
>
> Versions on 6.3 :
>
> [root@wimpas1 ~]# rpm -qa|grep pacemaker
> pacemaker-libs-devel-1.1.7-6.el6.x86_64
> pacemaker-cli-1.1.7-6.el6.x86_64
> pacemaker-libs-1.1.7-6.el6.x86_64
> pacemaker-cluster-libs-1.1.7-6.el6.x86_64
> drbd-pacemaker-8.3.15-2.el6.x86_64
> pacemaker-1.1.7-6.el6.x86_64
> [root@wimpas1 ~]# rpm -qa|grep corosync
> corosynclib-1.4.1-7.el6_3.1.x86_64
> corosync-1.4.1-7.el6_3.1.x86_64
> corosynclib-devel-1.4.1-7.el6_3.1.x86_64
> [root@wimpas1 ~]# rpm -qa|grep drbd
> drbd-bash-completion-8.3.15-2.el6.x86_64
> drbd-udev-8.3.15-2.el6.x86_64
> drbd-rgmanager-8.3.15-2.el6.x86_64
> drbd-xen-8.3.15-2.el6.x86_64
> drbd-utils-8.3.15-2.el6.x86_64
> kmod-drbd-8.3.15_2.6.32_279.14.1-2.el6.x86_64
> drbd-pacemaker-8.3.15-2.el6.x86_64
> drbd-8.3.15-2.el6.x86_64
> drbd-heartbeat-8.3.15-2.el6.x86_64
> [root@wimpas1 ~]#
>
> ___
> Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
> http://oss.clusterlabs.org/mailman/listinfo/pacemaker
>
> Project Home: http://www.clusterlabs.org
> Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
> Bugs: http://bugs.clusterlabs.org

___
Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
http://oss.clusterlabs.org/mailman/listinfo/pacemaker

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org


Re: [Pacemaker] stonith and avoiding split brain in two nodes cluster

2013-03-26 Thread Angel L. Mateo

El 25/03/13 20:50, Jacek Konieczny escribió:

On Mon, 25 Mar 2013 20:01:28 +0100
"Angel L. Mateo"  wrote:

quorum {
provider: corosync_votequorum
expected_votes: 2
two_node: 1
}

Corosync will then manage quorum for the two-node cluster and
Pacemaker


   I'm using corosync 1.1 which is the one  provided with my
distribution (ubuntu 12.04). I could also use cman.


I don't think corosync 1.1 can do that, but I guess in this case cman
should be able to provide this functionality.


Sorry, it's corosync 1.4, not 1.1.


can use that. You still need proper fencing to enforce the quorum
(both for pacemaker and the storage layer – dlm in case you use
clvmd), but no
extra quorum node is needed.


   I have configured a dlm resource used with clvmd.

   One doubt... With this configuration, how is the split-brain problem
handled?


The first node to notice that the other is unreachable will fence (kill)
the other, making sure it is the only one operating on the shared data.
Even though it is only half of the nodes, the cluster is considered
quorate as the other node is known not to be running any cluster
resources.

When the fenced node reboots its cluster stack starts, but with no
quorum until it communicates with the surviving node again. So no
cluster services are started there until both nodes communicate
properly and the proper quorum is recovered.

	But, will this work with corosync 1.4? Although with corosync 1.4 I 
may not be able to use the quorum configuration you mentioned (I'll try), I 
have configured no-quorum-policy="ignore" so the cluster can still run 
in the case of one node failing. Could this be a problem?


--
Angel L. Mateo Martínez
Sección de Telemática
Área de Tecnologías de la Información
y las Comunicaciones Aplicadas (ATICA)
http://www.um.es/atica
Tfo: 868889150
Fax: 86337

___
Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
http://oss.clusterlabs.org/mailman/listinfo/pacemaker

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org


Re: [Pacemaker] DRBD+LVM+NFS problems

2013-03-26 Thread emmanuel segura
Hello Dennis

This constraint is wrong

colocation c_web1_on_drbd inf: ms_drbd_web1:Master p_fs_web1

it should be

colocation c_web1_on_drbd inf: p_fs_web1 ms_drbd_web1:Master

Thanks

2013/3/26 Dennis Jacobfeuerborn 

> I have now reduced the configuration further and removed LVM from the
> picture. Still the cluster fails when I set the master node to standby.
> What's interesting is that things get fixed when I issue a simple
> "cleanup" for the filesystem resource.
>
> This is what my current config looks like:
>
> node nfs1 \
> attributes standby="off"
> node nfs2
> primitive p_drbd_web1 ocf:linbit:drbd \
> params drbd_resource="web1" \
> op monitor interval="15" role="Master" \
> op monitor interval="30" role="Slave"
> primitive p_fs_web1 ocf:heartbeat:Filesystem \
> params device="/dev/drbd0" \
> directory="/srv/nfs/web1" fstype="ext4" \
> op monitor interval="10s"
> ms ms_drbd_web1 p_drbd_web1 \
> meta master-max="1" master-node-max="1" \
> clone-max="2" clone-node-max="1" notify="true"
> colocation c_web1_on_drbd inf: ms_drbd_web1:Master p_fs_web1
> order o_drbd_before_web1 inf: ms_drbd_web1:promote p_fs_web1
> property $id="cib-bootstrap-options" \
> dc-version="1.1.8-7.el6-**394e906" \
> cluster-infrastructure="**classic openais (with plugin)" \
> expected-quorum-votes="2" \
> stonith-enabled="false" \
> no-quorum-policy="ignore" \
> last-lrm-refresh="1364259713" \
> maintenance-mode="false"
> rsc_defaults $id="rsc-options" \
> resource-stickiness="100"
>
> I cannot figure out what is wrong with this configuration.
>
> Regards,
>   Dennis
>
> On 25.03.2013 13:09, Dennis Jacobfeuerborn wrote:
>
>> I just found the following in the dmesg output which might or might not
>> add to understanding the problem:
>>
>> device-mapper: table: 253:2: linear: dm-linear: Device lookup failed
>> device-mapper: ioctl: error adding target to table
>>
>> Regards,
>>Dennis
>>
>> On 25.03.2013 13:04, Dennis Jacobfeuerborn wrote:
>>
>>> Hi,
>>> I'm currently trying create a two node redundant NFS setup on CentOS 6.4
>>> using pacemaker and crmsh.
>>>
>>> I use this Document as a starting poing:
>>> https://www.suse.com/**documentation/sle_ha/**singlehtml/book_sleha_**
>>> techguides/book_sleha_**techguides.html
>>>
>>>
>>>
>>> The first issue is that using these instructions I get the cluster up
>>> and running but the moment I try to stop the pacemaker service on the
>>> current master node several resources just fail and everything goes
>>> pear-shaped.
>>>
>>> Since the problem seems to relate to the nfs bits in the configuration I
>>> removed these in order to get to a minimal working setup and then add
>>> things piece by piece in order to find the source of the problem.
>>>
>>> Now I am at a point where I basically have only
>>> DRBD+LVM+Filesystems+IPAddr2 configured and now LVM seems to act up.
>>>
>>> I can start the cluster and everything is fine but the moment I stop
>>> pacemaker on the master i end up with this as a status:
>>>
>>> ===
>>> Node nfs2: standby
>>> Online: [ nfs1 ]
>>>
>>>   Master/Slave Set: ms_drbd_nfs [p_drbd_nfs]
>>>   Masters: [ nfs1 ]
>>>   Stopped: [ p_drbd_nfs:1 ]
>>>
>>> Failed actions:
>>>  p_lvm_nfs_start_0 (node=nfs1, call=505, rc=1, status=complete):
>>> unknown error
>>> ===
>>>
>>> and in the log on nfs1 I see:
>>> LVM(p_lvm_nfs)[7515]:2013/03/25_12:34:21 ERROR: device-mapper:
>>> reload ioctl on failed: Invalid argument device-mapper: reload ioctl on
>>> failed: Invalid argument 2 logical volume(s) in volume group "nfs" now
>>> active
>>>
>>> However a lvs in this state shows:
>>> [root@nfs1 ~]# lvs
>>>LV  VGAttr  LSize   Pool Origin Data%  Move Log
>>>web1nfs   -wi--   2,00g
>>>web2nfs   -wi--   2,00g
>>>lv_root vg_nfs1.local -wi-ao---   2,45g
>>>lv_swap vg_nfs1.local -wi-ao--- 256,00m
>>>
>>> So the volume group is present.
>>>
>>> My current configuration looks like this:
>>>
>>> node nfs1 \
>>>  attributes standby="off"
>>> node nfs2 \
>>>  attributes standby="on"
>>> primitive p_drbd_nfs ocf:linbit:drbd \
>>>  params drbd_resource="nfs" \
>>>  op monitor interval="15" role="Master" \
>>>  op monitor interval="30" role="Slave"
>>> primitive p_fs_web1 ocf:heartbeat:Filesystem \
>>>  params device="/dev/nfs/web1" \
>>>directory="/srv/nfs/web1" \
>>>fstype="ext4" \
>>>  op monitor interval="10s"
>>> primitive p_fs_web2 ocf:heartbeat:Filesystem \
>>>  params device="/dev/nfs/web2" \
>>>directory="/srv/nfs/web2" \
>>>fstype="ext4" \
>>>  op monitor interval="10s"
>>> primitive p_ip_nfs ocf:heartbeat:IPaddr2 \
>>>  params ip="10.99.0.142" cid

Re: [Pacemaker] OCF Resource agent promote question

2013-03-26 Thread Rainer Brestan
 

Hi Steve,

when Pacemaker does promotion, it has already selected a specific node to become master.

It is far too late in this state to try to update master scores.
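
Master scores are normally maintained earlier, typically from the monitor action; a minimal sketch (assuming the usual OCF shell environment, values purely illustrative) would be:

crm_master -Q -l reboot -v 100   # higher value = preferred for promotion
crm_master -Q -l reboot -D       # or drop this node's preference entirely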

 

But there is another problem with xlog in PostgreSQL.

 

According to some discussion on the PostgreSQL mailing lists, some relevant xlog entries do not go into the xlog counter during redo and/or startup. This is especially true for CHECKPOINT xlog records, where this situation can be easily reproduced.

This can lead to a situation where the replication is up to date, but the slave shows a lower xlog value.

This issue was solved in 9.2.3, where the WAL receiver always counts the end of applied records.

There is also a second annoying issue. The timeline change is replicated to the slaves, but they do not save it anywhere. If a slave starts up again and does not have access to the WAL archive, it cannot start any more. This was also addressed as a patch in the 9.2 branch, but I haven't tested whether it is also fixed in 9.2.3.

 

For data replication, no matter whether PostgreSQL or any other database, you always have two choices:

- Data consistency is the topmost priority: do not go into operation unless everything is fine.

- Availability is the topmost priority: always try to have at least one running instance, even if the data might not be the latest.

 

The current pgsql RA does quite a good job for the first choice.

 

It currently has some limitations.

- After a switchover, whether manual or automatic, it needs some work from maintenance personnel.

- Some failure scenarios (series of faults) lead to no existing master without manual work.

- Geo-redundant replication with multi-site cluster ticket system (booth) does not work.

- If availability or unattended work is the priority, it cannot be used out of the box.

 

But it has a very good structure to be extended for other needs.

 

And this is what I am currently implementing:

extending the RA to support both choices of work and preparing it for a multi-site cluster ticket system.

 

Regards, Rainer


Sent: Tuesday, 26 March 2013, 00:01
From: "Andreas Kurz" 
To: pacemaker@oss.clusterlabs.org
Subject: Re: [Pacemaker] OCF Resource agent promote question

Hi Steve,

On 2013-03-25 18:44, Steven Bambling wrote:
> All,
>
> I'm trying to work on a OCF resource agent that uses postgresql
> streaming replication. I'm running into a few issues that I hope might
> be answered or at least some pointers given to steer me in the right
> direction.

Why are you not using the existing pgsql RA? It is capable of doing
synchronous and asynchronous replication and it is known to work fine.

Best regards,
Andreas

--
Need help with Pacemaker?
http://www.hastexo.com/now


>
> 1. A quick way of obtaining a list of "Online" nodes in the cluster
> that a resource will be able to migrate to. I've accomplished it with
> some grep and see but its not pretty or fast.
>
> # time pcs status | grep Online | sed -e "s/.*\[\(.*\)\]/\1/" | sed 's/ //'
> p1.example.net  p2.example.net
> 
>
> real0m2.797s
> user0m0.084s
> sys0m0.024s
>
> Once I get a list of active/online nodes in the cluster my thinking was
> to use PSQL to get the current xlog location and lag or each of the
> remaining nodes and compare them. If the node has a greater log
> position and/or less lag it will be given a greater master preference.
>
> 2. How to force a monitor/probe before a promote is run on ALL nodes to
> make sure that the master preference is up to date before
> migrating/failing over the resource.
> - I was thinking that maybe during the promote call it could get the log
> location and lag from each of the nodes via an psql call ( like above)
> and then force the resource to a specific node. Is there a way to do
> this and does it sound like a sane idea ?
>
>
> The start of my RA is located here suggestions and comments 100%
> welcome https://github.com/smbambling/pgsqlsr/blob/master/pgsqlsr
>
> v/r
>
> STEVE
>
>
> ___
> Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
> http://oss.clusterlabs.org/mailman/listinfo/pacemaker
>
> Project Home: http://www.clusterlabs.org
> Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
> Bugs: http://bugs.clusterlabs.org
>




___
Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
http://oss.clusterlabs.org/mailman/listinfo/pacemaker

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org




___
Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
http://oss.clusterlabs.org/mailman/listinfo/pacemaker

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org


Re: [Pacemaker] OCF Resource agent promote question

2013-03-26 Thread Steven Bambling
I'm guessing that you are referring to this RA 
https://github.com/ClusterLabs/resource-agents/blob/master/heartbeat/pgsql with 
additions by T. Matsuo.

>From reading the "wiki" ( hopefully I have misinterpreted this :) ) on his 
>Github page it looks like this RA was written to work in a Active/Passive 
>senario that is really suited for 2 nodes.  We are hoping to be able to have 
>N+1 nodes, so that if the master fails the secondary node will still have 
>synchronous partners to replicate with ensuring the data is written in 
>multiple locations.

If I have misinterpreted the use case of this resource, please let me know.  
Also any additional hints or corrects would be much appreciated.

v/r

STEVE



On Mar 25, 2013, at 7:01 PM, Andreas Kurz <andr...@hastexo.com> wrote:

Hi Steve,

On 2013-03-25 18:44, Steven Bambling wrote:
All,

I'm trying to work on a OCF resource agent that uses postgresql
streaming replication.  I'm running into a few issues that I hope might
be answered or at least some pointers given to steer me in the right
direction.

Why are you not using the existing pgsql RA? It is capable of doing
synchronous and asynchronous replication and it is known to work fine.

Best regards,
Andreas

--
Need help with Pacemaker?
http://www.hastexo.com/now



1.  A quick way of obtaining a list of "Online" nodes in the cluster
that a resource will be able to migrate to.  I've accomplished it with
some grep and see but its not pretty or fast.

# time pcs status | grep Online | sed -e "s/.*\[\(.*\)\]/\1/" | sed 's/ //'
p1.example.net  p2.example.net


real0m2.797s
user0m0.084s
sys0m0.024s

Once I get a list of active/online nodes in the cluster my thinking was
to use PSQL to get the current xlog location and lag or each of the
remaining nodes and compare them.  If the node has a greater log
position and/or less lag it will be given a greater master preference.

2.  How to force a monitor/probe before a promote is run on ALL nodes to
make sure that the master preference is up to date before
migrating/failing over the resource.
- I was thinking that maybe during the promote call it could get the log
location and lag from each of the nodes via an psql call ( like above)
and then force the resource to a specific node.  Is there a way to do
this and does it sound like a sane idea ?


The start of my RA is located here suggestions and comments 100%
welcome https://github.com/smbambling/pgsqlsr/blob/master/pgsqlsr

v/r

STEVE


___
Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
http://oss.clusterlabs.org/mailman/listinfo/pacemaker

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org





___
Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
http://oss.clusterlabs.org/mailman/listinfo/pacemaker

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org

___
Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
http://oss.clusterlabs.org/mailman/listinfo/pacemaker

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org


Re: [Pacemaker] OCF Resource agent promote question

2013-03-26 Thread Steven Bambling

On Mar 26, 2013, at 6:32 AM, Rainer Brestan <rainer.bres...@gmx.net> wrote:


Hi Steve,
when Pacemaker does promotion, it has already selected a specific node to 
become master.
It is far too late in this state to try to update master scores.

But there is another problem with xlog in PostgreSQL.

According to some discussion on PostgreSQL mailing lists, not relevant xlog 
entries dont go into the xlog counter during redo and/or start. This is 
specially true for CHECKPOINT xlog records, where this situation can be easely 
reproduced.
This can lead to the situation, where the replication is up to date, but the 
slave shows an lower xlog value.
This issue was solved in 9.2.3, where wal receiver always counts the end of 
applied records.

We are currently testing with 9.2.3.  I'm using the functions 
http://www.databasesoup.com/2012/10/determining-furthest-ahead-replica.html 
along with tweaking a function to get the replay_lag in bytes to have a more 
accurate measurement.
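
The byte-based lag itself comes down to a query along these lines (run on a slave, 
using the 9.2 built-ins; only a sketch of what we tweak):

psql -Atc "SELECT pg_xlog_location_diff(pg_last_xlog_receive_location(), pg_last_xlog_replay_location());"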

There is also a second boring issue. The timeline change is replicated to the 
slaves, but they do not save it anywhere. In case slave starts up again and do 
not have access to the WAL archive, it cannot start any more. This was also 
addressed as patch in 9.2 branch, but i havent test if also fixed in 9.2.3.

After talking with one of the Postgres guys it was recommended that we look at 
an alternative solution to the built-in trigger file, which makes the master 
jump to a new timeline.  We are instead moving the recovery.conf to 
recovery.done via the resource agent and then restarting the postgresql 
service on the "new" master, so that it maintains the original timeline that the 
slaves will recognize.
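
In the agent that boils down to roughly the following (a sketch; $PGDATA stands 
for the data directory):

mv "$PGDATA/recovery.conf" "$PGDATA/recovery.done"
pg_ctl -D "$PGDATA" restart -m fast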

For data replication, no matter if PostgreSQL or any other database, you have 
always two choices of work.
- Data consistency is the top most priority. Dont go in operation, unless 
everything fine.
- Availability is the top most priority. Always try to have at least one 
running instance, even if data might not be latest.

The current pgsql RA does quite a good job for the first choice.

It currently has some limitations.
- After switchover, no matter of manual/automatic, it needs some work from 
maintenance personnel.
- Some failure scenarios of fault series lead to a non existing master without 
manual work.
- Geo-redundant replication with multi-site cluster ticket system (booth) does 
not work.
- If availability or unattended work is the priority, it cannot be used out of 
the box.

But it has a very good structure to be extended for other needs.

And this is what i currently implement.
Extend the RA to support both choices of work and prepare it for a multi-site 
cluster ticket system.

Would you be willing to share your extended RA?  Also, do you run a cluster with 
more than 2 nodes?

v/r

STEVE



Regards, Rainer
Sent: Tuesday, 26 March 2013, 00:01
From: "Andreas Kurz" <andr...@hastexo.com>
To: pacemaker@oss.clusterlabs.org
Subject: Re: [Pacemaker] OCF Resource agent promote question
Hi Steve,

On 2013-03-25 18:44, Steven Bambling wrote:
> All,
>
> I'm trying to work on a OCF resource agent that uses postgresql
> streaming replication. I'm running into a few issues that I hope might
> be answered or at least some pointers given to steer me in the right
> direction.

Why are you not using the existing pgsql RA? It is capable of doing
synchronous and asynchronous replication and it is known to work fine.

Best regards,
Andreas

--
Need help with Pacemaker?
http://www.hastexo.com/now


>
> 1. A quick way of obtaining a list of "Online" nodes in the cluster
> that a resource will be able to migrate to. I've accomplished it with
> some grep and see but its not pretty or fast.
>
> # time pcs status | grep Online | sed -e "s/.*\[\(.*\)\]/\1/" | sed 's/ //'
> p1.example.net 
> > 
> p2.example.net
> >
>
> real0m2.797s
> user0m0.084s
> sys0m0.024s
>
> Once I get a list of active/online nodes in the cluster my thinking was
> to use PSQL to get the current xlog location and lag or each of the
> remaining nodes and compare them. If the node has a greater log
> position and/or less lag it will be given a greater master preference.
>
> 2. How to force a monitor/probe before a promote is run on ALL nodes to
> make sure that the master preference is up to date before
> migrating/failing over the resource.
> - I was thinking that maybe during the promote call it could get the log
> location and lag from each of the nodes via an psql call ( like above)
> and then force the resource to a specific node. Is there a way to do
> this and does it sound like a sane idea ?
>
>
> The start of my RA is located here suggestions and comments 100%
> welcome https://github.com/smbambling/pgsqlsr/blob/master/pgsqlsr
>
> v/r
>
> STEVE
>
>
> _

Re: [Pacemaker] OCF Resource agent promote question

2013-03-26 Thread Rainer Brestan
 

Hi Steve,

The pgsql RA does the same: it compares the last_xlog_replay_location of all nodes for master promotion.

Doing the promote as a restart instead of a promote command, to conserve the timeline ID, is also a configurable option (restart_on_promote) of the current RA.

And the RA is definitely capable of handling more than two instances. It goes through the node_list parameter and performs its actions for every member in the node list.

Originally it may have been planned to have only one slave, but the current implementation does not have this limitation. It has code for synchronous replication with more than two nodes, where some of them fall back into async so that they are not promoted.
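
As an illustration only (node names, address and intervals made up), a three-instance configuration would look roughly like:

primitive p_pgsql ocf:heartbeat:pgsql \
    params rep_mode="sync" node_list="node1 node2 node3" \
        master_ip="192.168.0.100" restart_on_promote="true" \
    op monitor interval="10s" role="Master" \
    op monitor interval="15s" role="Slave"
ms ms_pgsql p_pgsql \
    meta master-max="1" clone-max="3" clone-node-max="1" notify="true"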

 

Of course, I will share the extensions with the community when they are ready for use. And the feature of having more than two instances is not removed. I am not running more than two instances on one site; current usage is to have two instances on one site, two sites, and the master managed by booth. But having more than two instances on one site is also under discussion, just to have no availability interruption when one server is down and the other promotes with a restart.

The implementation is nearly finished; then the stress tests of the failure scenarios begin.

 

Rainer


Sent: Tuesday, 26 March 2013, 11:55
From: "Steven Bambling" 
To: "The Pacemaker cluster resource manager" 
Subject: Re: [Pacemaker] OCF Resource agent promote question


 

On Mar 26, 2013, at 6:32 AM, Rainer Brestan  wrote:
 



 

Hi Steve,

when Pacemaker does promotion, it has already selected a specific node to become master.

It is far too late in this state to try to update master scores.

 

But there is another problem with xlog in PostgreSQL.

 

According to some discussion on PostgreSQL mailing lists, not relevant xlog entries dont go into the xlog counter during redo and/or start. This is specially true for CHECKPOINT xlog records, where this situation can be easely reproduced.

This can lead to the situation, where the replication is up to date, but the slave shows an lower xlog value.

This issue was solved in 9.2.3, where wal receiver always counts the end of applied records.





 
We are currently testing with 9.2.3.  I'm using the functions http://www.databasesoup.com/2012/10/determining-furthest-ahead-replica.html along with tweaking a function to get the replay_lag in bytes to have a more accurate measurement.





 

There is also a second boring issue. The timeline change is replicated to the slaves, but they do not save it anywhere. In case slave starts up again and do not have access to the WAL archive, it cannot start any more. This was also addressed as patch in 9.2 branch, but i havent test if also fixed in 9.2.3.





After talking with one of the Postgres guys it was recommended that we look at an alternative solution to the built in trigger file that will make the master jump to a new timeline.  We are in place moving the recovery.conf to recovery.done via the resource agent and then restarting the the postgresql service on the "new" master so that it maintains the original timeline that the slaves will recognize.   




 

For data replication, no matter if PostgreSQL or any other database, you have always two choices of work.

- Data consistency is the top most priority. Dont go in operation, unless everything fine.

- Availability is the top most priority. Always try to have at least one running instance, even if data might not be latest.

 

The current pgsql RA does quite a good job for the first choice.

 

It currently has some limitations.

- After switchover, no matter of manual/automatic, it needs some work from maintenance personnel.

- Some failure scenarios of fault series lead to a non existing master without manual work.

- Geo-redundant replication with multi-site cluster ticket system (booth) does not work.

- If availability or unattended work is the priority, it cannot be used out of the box.

 

But it has a very good structure to be extended for other needs.

 

And this is what i currently implement.

Extend the RA to support both choices of work and prepare it for a multi-site cluster ticket system.





 
Would you be willing to share your extended RA?  Also do you run a cluster with more then 2 nodes ?

 

v/r

 

STEVE

 

 




 

Regards, Rainer


Sent: Tuesday, 26 March 2013, 00:01
From: "Andreas Kurz" 
To: pacemaker@oss.clusterlabs.org
Subject: Re: [Pacemaker] OCF Resource agent promote question

Hi Steve,

On 2013-03-25 18:44, Steven Bambling wrote:
> All,
>
> I'm trying to work on a OCF resource agent that uses postgresql
> streaming replication. I'm running into a few issues that I hope might
> be answered or at least some pointers given to steer me in the right
> direction.

Why are you not using the existing pgsql RA? It is capable of doing
synchronous and asynchronous replication and it is known to work fine.

Best regards,
Andreas

-

Re: [Pacemaker] OCF Resource agent promote question

2013-03-26 Thread Steven Bambling
Excellent, thanks so much for the clarification.  I'll drop this new RA in and 
see if I can get things working.

STEVE


On Mar 26, 2013, at 7:38 AM, Rainer Brestan <rainer.bres...@gmx.net> wrote:


Hi Steve,
pgsql RA does the same, it compares the last_xlog_replay_location of all nodes 
for master promotion.
Doing a promote as a restart instead of promote command to conserve timeline id 
is also on configurable option (restart_on_promote) of the current RA.
And the RA is definitely capable of having more than two instances. It goes 
through the parameter node_list and doing its actions for every member in the 
node list.
Originally it might be planned only to have only one slave, but the current 
implementation does not have this limitation. It has code for sync replication 
of more than two nodes, when some of them fall back into async to not promote 
them.

Of course, i will share the extension with the community, when they are ready 
for use. And the feature of having more than two instances is not removed. I am 
not running more than two instances on one site, current usage is to have two 
instances on one site and having two sites and manage master by booth. But it 
also under discussion to have more than two instances on one site, just to have 
no availability interruption in case of one server down and the other promote 
with restart.
The implementation is nearly finished, then begins the stress tests of failure 
scenarios.

Rainer
Sent: Tuesday, 26 March 2013, 11:55
From: "Steven Bambling" <smbambl...@arin.net>
To: "The Pacemaker cluster resource manager" <pacemaker@oss.clusterlabs.org>
Subject: Re: [Pacemaker] OCF Resource agent promote question

On Mar 26, 2013, at 6:32 AM, Rainer Brestan wrote:


Hi Steve,
when Pacemaker does promotion, it has already selected a specific node to 
become master.
It is far too late in this state to try to update master scores.

But there is another problem with xlog in PostgreSQL.

According to some discussion on PostgreSQL mailing lists, not relevant xlog 
entries dont go into the xlog counter during redo and/or start. This is 
specially true for CHECKPOINT xlog records, where this situation can be easely 
reproduced.
This can lead to the situation, where the replication is up to date, but the 
slave shows an lower xlog value.
This issue was solved in 9.2.3, where wal receiver always counts the end of 
applied records.

We are currently testing with 9.2.3.  I'm using the functions 
http://www.databasesoup.com/2012/10/determining-furthest-ahead-replica.html 
along with tweaking a function to get the replay_lag in bytes to have a more 
accurate measurement.

There is also a second boring issue. The timeline change is replicated to the 
slaves, but they do not save it anywhere. In case slave starts up again and do 
not have access to the WAL archive, it cannot start any more. This was also 
addressed as patch in 9.2 branch, but i havent test if also fixed in 9.2.3.

After talking with one of the Postgres guys it was recommended that we look at 
an alternative solution to the built in trigger file that will make the master 
jump to a new timeline.  We are in place moving the recovery.conf to 
recovery.done via the resource agent and then restarting the the postgresql 
service on the "new" master so that it maintains the original timeline that the 
slaves will recognize.

For data replication, no matter if PostgreSQL or any other database, you have 
always two choices of work.
- Data consistency is the top most priority. Dont go in operation, unless 
everything fine.
- Availability is the top most priority. Always try to have at least one 
running instance, even if data might not be latest.

The current pgsql RA does quite a good job for the first choice.

It currently has some limitations.
- After switchover, no matter of manual/automatic, it needs some work from 
maintenance personnel.
- Some failure scenarios of fault series lead to a non existing master without 
manual work.
- Geo-redundant replication with multi-site cluster ticket system (booth) does 
not work.
- If availability or unattended work is the priority, it cannot be used out of 
the box.

But it has a very good structure to be extended for other needs.

And this is what i currently implement.
Extend the RA to support both choices of work and prepare it for a multi-site 
cluster ticket system.

Would you be willing to share your extended RA?  Also do you run a cluster with 
more then 2 nodes ?

v/r

STEVE



Regards, Rainer
Sent: Tuesday, 26 March 2013, 00:01
From: "Andreas Kurz"
To: pacemaker@oss.clusterlabs.org
Subject: Re: [Pacemaker] OCF Resource agent promote question
Hi Steve,

On 2013-03-25 18:44, Steven Bambling wrote:
> All,
>
> I'm trying to work on a OCF resource agent that uses postgresql
> streaming replication. I'm running into a few issues that I hope might
> be answered or at least some pointers given to steer me in the right
> direction