Hi Takatoshi, All,

Thanks for your reply. I can see that you have invested significant effort in developing the RA. I spent the last day trying to set it up, but without much success.
My infrastructure is very similar to yours, except that I am currently testing with a single network adapter. Replication works nicely when I start the databases manually, without corosync.

When I start them through corosync, the ping resources start normally, but msPostgresql starts on both nodes in slave mode, and I see "HS:alone". In the wiki you state that if I start on a single node only, PostgreSQL should start in Master mode (PRI), but this is not the case. The recovery.conf file is created immediately, and the logs show no attempt at all to promote the node.

In the postgres logs I see that node1, which is supposed to be the master, tries to connect to the vip-rep IP address, which is NOT brought up, because it depends on the Master role...

Do you have any idea?

My environment:

Debian Squeeze, with backported pacemaker (Version: 1.1.5) - the official pacemaker in Debian is rather old and buggy
Postgres 9.1, streaming replication, sync mode
Node1: psql1, 10.12.1.21
Node2: psql2, 10.12.1.22

Crm config:

node psql1 \
        attributes standby="off"
node psql2 \
        attributes standby="off"
primitive pingCheck ocf:pacemaker:ping \
        params name="default_ping_set" host_list="10.12.1.1" multiplier="100" \
        op start interval="0s" timeout="60s" on-fail="restart" \
        op monitor interval="10s" timeout="60s" on-fail="restart" \
        op stop interval="0s" timeout="60s" on-fail="ignore"
primitive postgresql ocf:heartbeat:pgsql \
        params pgctl="/usr/lib/postgresql/9.1/bin/pg_ctl" psql="/usr/bin/psql" pgdata="/var/lib/postgresql/9.1/main" config="/etc/postgresql/9.1/main/postgresql.conf" pgctldata="/usr/lib/postgresql/9.1/bin/pg_controldata" rep_mode="sync" node_list="psql1 psql2" restore_command="cp /var/lib/postgresql/9.1/main/pg_archive/%f %p" master_ip="10.12.1.28" \
        op start interval="0s" timeout="60s" on-fail="restart" \
        op monitor interval="7s" timeout="60s" on-fail="restart" \
        op monitor interval="2s" role="Master" timeout="60s" on-fail="restart" \
        op promote interval="0s" timeout="60s" on-fail="restart" \
        op demote interval="0s" timeout="60s" on-fail="block" \
        op stop interval="0s" timeout="60s" on-fail="block" \
        op notify interval="0s" timeout="60s"
primitive vip-master ocf:heartbeat:IPaddr2 \
        params ip="10.12.1.20" nic="eth0" cidr_netmask="24" \
        op start interval="0s" timeout="60s" on-fail="restart" \
        op monitor interval="10s" timeout="60s" on-fail="restart" \
        op stop interval="0s" timeout="60s" on-fail="block" \
        meta target-role="Started"
primitive vip-rep ocf:heartbeat:IPaddr2 \
        params ip="10.12.1.28" nic="eth0" cidr_netmask="24" \
        op start interval="0s" timeout="60s" on-fail="restart" \
        op monitor interval="10s" timeout="60s" on-fail="restart" \
        op stop interval="0s" timeout="60s" on-fail="block" \
        meta target-role="Started"
primitive vip-slave ocf:heartbeat:IPaddr2 \
        params ip="10.12.1.27" nic="eth0" cidr_netmask="24" \
        meta resource-stickiness="1" \
        op start interval="0s" timeout="60s" on-fail="restart" \
        op monitor interval="10s" timeout="60s" on-fail="restart" \
        op stop interval="0s" timeout="60s" on-fail="block"
group master-group vip-master vip-rep
ms msPostgresql postgresql \
        meta master-max="1" master-node-max="1" clone-max="2" clone-node-max="1" notify="true" target-role="Master"
clone clnPingCheck pingCheck
location rsc_location-1 vip-slave \
        rule $id="rsc_location-1-rule" 200: pgsql-status eq HS:sync \
        rule $id="rsc_location-1-rule-0" 100: pgsql-status eq PRI \
        rule $id="rsc_location-1-rule-1" -inf: not_defined pgsql-status \
        rule $id="rsc_location-1-rule-2" -inf: pgsql-status ne HS:sync and pgsql-status ne PRI
location rsc_location-2 msPostgresql \
        rule $id="rsc_location-2-rule" $role="master" 200: #uname eq psql1 \
        rule $id="rsc_location-2-rule-0" $role="master" 100: #uname eq psql2 \
        rule $id="rsc_location-2-rule-1" $role="master" -inf: defined fail-count-vip-master \
        rule $id="rsc_location-2-rule-2" $role="master" -inf: defined fail-count-vip-rep \
        rule $id="rsc_location-2-rule-3" -inf: not_defined default_ping_set or default_ping_set lt 100
colocation rsc_colocation-1 inf: msPostgresql clnPingCheck
colocation rsc_colocation-2 inf: master-group msPostgresql:Master
order rsc_order-1 0: clnPingCheck msPostgresql
order rsc_order-2 0: msPostgresql:promote master-group:start symmetrical=false
order rsc_order-3 0: msPostgresql:demote master-group:stop symmetrical=false
property $id="cib-bootstrap-options" \
        dc-version="1.1.5-01e86afaaa6d4a8c4836f68df80ababd6ca3902f" \
        cluster-infrastructure="openais" \
        expected-quorum-votes="2" \
        stonith-enabled="false" \
        no-quorum-policy="ignore"
rsc_defaults $id="rsc-options" \
        resource-stickiness="INFINITY" \
        migration-threshold="1"

Regards,
Attila

-----Original Message-----
From: Takatoshi MATSUO [mailto:matsuo....@gmail.com]
Sent: 2011. november 17. 8:04
To: The Pacemaker cluster resource manager
Subject: Re: [Pacemaker] Postgresql streaming replication failover - RA needed

Hi All

I created an RA for PostgreSQL 9.1 Streaming Replication based on pgsql.

RA
https://github.com/t-matsuo/resource-agents/blob/pgsql91/heartbeat/pgsql

Documents
https://github.com/t-matsuo/resource-agents/wiki

It is almost totally changed from the previous patch
http://lists.linux-ha.org/pipermail/linux-ha-dev/2011-February/018193.html .

It creates recovery.conf and promotes PostgreSQL automatically.
Additionally, it can switch between synchronous and asynchronous replication automatically.

Please try them and comment.

Regards,
Takatoshi MATSUO

2011/11/17 Serge Dubrouski <serge...@gmail.com>:
>
>
> On Wed, Nov 16, 2011 at 12:55 PM, Attila Megyeri <amegy...@minerva-soft.com> wrote:
>>
>> Hi Florian,
>>
>> -----Original Message-----
>> From: Florian Haas [mailto:flor...@hastexo.com]
>> Sent: 2011. november 16.
11:49
>> To: The Pacemaker cluster resource manager
>> Subject: Re: [Pacemaker] Postgresql streaming replication failover - RA needed
>>
>> Hi Attila,
>>
>> On 2011-11-16 10:27, Attila Megyeri wrote:
>> > Hi All,
>> >
>> > We have a two-node postgresql 9.1 system configured using streaming
>> > replication (active/active with a read-only slave).
>> >
>> > We want to automate the failover process and I couldn't really find
>> > a resource agent that could do the job.
>>
>> That is correct; the pgsql resource agent (unlike its mysql
>> counterpart) does not support streaming replication. We've had a
>> contributor submit a patch at one point, but it was somewhat
>> ill-conceived and thus did not make it into the upstream repo. The relevant
>> thread is here:
>>
>> http://lists.linux-ha.org/pipermail/linux-ha-dev/2011-February/018195.html
>>
>> Would you feel comfortable modifying the pgsql resource agent to
>> support replication? If so, we could revisit this issue and
>> potentially add streaming replication support to pgsql.
>>
>>
>> Well, I'm not sure I would be able to make that change. Failover is
>> relatively easy to do, but I really have no idea how to do the failback part.
>
> And that's exactly the reason why I haven't implemented it yet. With
> the way replication is currently done in PostgreSQL there is no easy
> way to switch between roles, or at least I don't know of such a way.
> Implementing just fail-over functionality by creating a trigger file
> on a slave server in the case of failure on the master side doesn't create
> a full master-slave implementation, in my opinion.
>
>>
>> I will definitely have to sort this out somehow, I am just unsure
>> whether I will try to use the repmgr mentioned in the video, or
>> pacemaker with some level of customization...
>>
>> Is the resource agent that you mentioned available somewhere?
>>
>> Thanks.
>> Attila
>>
>>
>>
>> _______________________________________________
>> Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
>> http://oss.clusterlabs.org/mailman/listinfo/pacemaker
>>
>> Project Home: http://www.clusterlabs.org
>> Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
>> Bugs: http://developerbugs.linux-foundation.org/enter_bug.cgi?product=Pacemaker
>
>
>
> --
> Serge Dubrouski.