Hi,

Thank you for your comments.

> First of all thanks for doing this. It's definitely a neat option to
> have. Unfortunately I haven't yet worked with PostgreSQL 9.0. Could
> you please elaborate a little bit more on how it works? What happens if
> master dies? Who is responsible for creating/deleting recovery.conf
> file?

This patch does not set up recovery.conf for the hot standby automatically,
so you need to create it manually before starting the hot standby.
If recovery.conf exists, the resource agent starts PostgreSQL as a slave.
If the master dies, failover to the slave happens correctly.

> Also please be more careful with quoting variables in your if statements.

I checked the script and quoted the variables in the if statements.
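
For example, variable tests in the agent are now written with the variables
quoted, in this style (a generic illustration of the quoting convention,
not an excerpt from the patch; OCF_RESKEY_pgdata is the standard name the
cluster gives the "pgdata" parameter):
---------------------------------------------------------------------------
# Quoting avoids word splitting and "unary operator expected" errors
# when the variable is empty or contains spaces.
if [ -f "$OCF_RESKEY_pgdata/recovery.conf" ]; then
    ocf_log info "recovery.conf exists, starting as a hot standby"
fi
---------------------------------------------------------------------------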




Let me elaborate on how it works.

 **************** Condition *********************************

1. Streaming replication works correctly.
2. Streaming replication works correctly if you swap the roles
   of the primary server and the hot standby server.

   FYI, I wrote my configuration at the end of this post; see
   "Example of configuration". A quick verification example
   follows this list.

 **************** Setup and start ***************************

1. Decide which server you want to be the master.
   In this example I assume that you choose node1.

2. node1# rm /var/lib/pgsql/9.0/data/recovery.conf

3. node1# /etc/init.d/heartbeat start
   Wait for "ONLINE node1".

4. node1# crm configure load update pgsql.crm

5. Wait until the master has started, as shown below.
   node1# crm_mon -1 -fA
   ------------------------------------------------------------
   ============
   Last updated: Tue Feb  8 18:51:00 2011
   Stack: Heartbeat
   Current DC: node1 (0c140f90-7de3-438f-b1b5-3b9722bbde21) - partition with quorum
   Version: 1.0.10-da7075976b5ff0bee71074385f8fd02f296ec8a3
   2 Nodes configured, unknown expected votes
   3 Resources configured.
   ============

   Online: [ node1 ]

    vip    (ocf::heartbeat:IPaddr2):       Started node1
    Master/Slave Set: msPostgresql
        Masters: [ node1 ]
        Stopped: [ postgresql:1 ]
    Clone Set: clnPingd
        Started: [ node1 ]
        Stopped: [ pingd:1 ]

   Node Attributes:
   * Node node1:
       + master-postgresql:0               : 1000
       + pgsql-status                      : PRI
       + ping_set                          : 100

   Migration summary:
   * Node node1:
   ------------------------------------------------------------

   If the start operation fails, please check the PostgreSQL state using "crm_mon -A".
   The "pgsql-status" attribute shows what is going on.

6. Back up node1's data and restore it to node2.
   In my case I use the rsync command as below.

   node2# psql -h 192.168.2.107 -U postgres -c "SELECT pg_start_backup('label')"
   node2# rsync -avr --delete --exclude=postmaster.pid 192.168.2.107:/var/lib/pgsql/9.0/data/ /var/lib/pgsql/9.0/data/
   node2# psql -h 192.168.2.107 -U postgres -c "SELECT pg_stop_backup()"

7. Put recovery.conf in place on node2.
   node2# cp recovery.conf /var/lib/pgsql/9.0/data/
   node2# chown postgres:postgres /var/lib/pgsql/9.0/data/recovery.conf

8. node2# /etc/init.d/heartbeat start

9. Wait until the slave has started, as shown below.
   node2# crm_mon -1 -fA
   ------------------------------------------------------------
   ============
   Last updated: Tue Feb  8 19:41:00 2011
   Stack: Heartbeat
   Current DC: node1 (0c140f90-7de3-438f-b1b5-3b9722bbde21) - partition with quorum
   Version: 1.0.10-da7075976b5ff0bee71074385f8fd02f296ec8a3
   2 Nodes configured, unknown expected votes
   3 Resources configured.
   ============

   Online: [ node1 node2 ]

   vip     (ocf::heartbeat:IPaddr2):       Started node1
    Master/Slave Set: msPostgresql
        Masters: [ node1 ]
        Slaves: [ node2 ]
    Clone Set: clnPingd
        Started: [ node1 node2 ]

   Node Attributes:
   * Node node1:
       + master-postgresql:0               : 1000
       + pgsql-status                      : PRI
       + ping_set                          : 100
   * Node node2:
       + master-postgresql:1               : 100
       + pgsql-status                      : HS
       + ping_set                          : 100

   Migration summary:
   * Node node1:
   * Node node2:
   ------------------------------------------------------------

   If no master exists on the other node, or recovery.conf does not exist
   on node2, the start operation fails.

 ********************** Other Operation ***************************************

1. If the master on node1 dies, node2 becomes the master.
   node1# killall -9 postgres
   node1# crm_mon -1 -fA
   ------------------------------------------------------------
   ============
   Last updated: Tue Feb  8 20:12:34 2011
   Stack: Heartbeat
   Current DC: node1 (0c140f90-7de3-438f-b1b5-3b9722bbde21) - partition with quorum
   Version: 1.0.10-da7075976b5ff0bee71074385f8fd02f296ec8a3
   2 Nodes configured, unknown expected votes
   3 Resources configured.
   ============

   Online: [ node1 node2 ]

   vip     (ocf::heartbeat:IPaddr2):       Started node2
    Master/Slave Set: msPostgresql
        Masters: [ node2 ]
        Stopped: [ postgresql:0 ]
    Clone Set: clnPingd
        Started: [ node1 node2 ]

   Node Attributes:
   * Node node1:
       + master-postgresql:0               : -INFINITY
       + pgsql-status                      : STOP:with errors
       + ping_set                          : 100
   * Node node2:
       + master-postgresql:1               : 1000
       + pgsql-status                      : PRI
       + ping_set                          : 100

   Migration summary:
   * Node node1:
      postgresql:0: migration-threshold=1 fail-count=1
   * Node node2:

   Failed actions:
       postgresql:0_monitor_9000 (node=node1, call=14, rc=7, status=complete): not running
   ------------------------------------------------------------

2. To recover node1, back up node2's data and restore it to node1.

   node1# psql -h 192.168.2.207 -U postgres -c "SELECT pg_start_backup('label')"
   node1# rsync -avr --delete --exclude=postmaster.pid 192.168.2.207:/var/lib/pgsql/9.0/data/ /var/lib/pgsql/9.0/data/
   node1# psql -h 192.168.2.207 -U postgres -c "SELECT pg_stop_backup()"

3. Put recovery.conf in place on node1.
   node1# cp recovery.conf /var/lib/pgsql/9.0/data/
   node1# chown postgres:postgres /var/lib/pgsql/9.0/data/recovery.conf

4. Remove the flag file that suppresses PostgreSQL start.
   node1# rm /var/lib/pgsql/PGSQL.5432.repress_start

5. node1# crm resource cleanup msPostgresql

6. Node1 becomes a slave, as shown below.
   node1# crm_mon -1 -fA
   ------------------------------------------------------------
   ============
   Last updated: Tue Feb  8 20:32:04 2011
   Stack: Heartbeat
   Current DC: node1 (0c140f90-7de3-438f-b1b5-3b9722bbde21) - partition with quorum
   Version: 1.0.10-da7075976b5ff0bee71074385f8fd02f296ec8a3
   2 Nodes configured, unknown expected votes
   3 Resources configured.
   ============

   Online: [ node1 node2 ]

   vip     (ocf::heartbeat:IPaddr2):       Started node2
    Master/Slave Set: msPostgresql
        Masters: [ node2 ]
        Slaves: [ node1 ]
    Clone Set: clnPingd
        Started: [ node1 node2 ]

   Node Attributes:
   * Node node1:
       + master-postgresql:0               : 100
       + pgsql-status                      : HS
       + ping_set                          : 100
   * Node node2:
       + master-postgresql:1               : 1000
       + pgsql-status                      : PRI
       + ping_set                          : 100

   Migration summary:
   * Node node1:
   * Node node2:
   ------------------------------------------------------------



 ********************** Example of configuration ****************************

  *  postgresql.conf (common configuration for node1 and node2)
    - Please create /var/lib/pgsql/9.0/data/pg_archive beforehand
      (example commands follow after this block).
    - To make failover work correctly, set listen_addresses = '*'
      if you also use the virtual IP.
---------------------------------------------------------------------------
listen_addresses = '*'
port = 5432
wal_level = hot_standby
max_wal_senders = 3
wal_keep_segments = 16
archive_mode = on
archive_command = 'cp %p /var/lib/pgsql/9.0/data/pg_archive/%f'
---------------------------------------------------------------------------
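
    For example, the archive directory mentioned above can be created on
    each node as follows (ownership assumes the standard PGDG packaging
    where PostgreSQL runs as the postgres user):

    node1# mkdir /var/lib/pgsql/9.0/data/pg_archive
    node1# chown postgres:postgres /var/lib/pgsql/9.0/data/pg_archive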

  * pg_hba.conf (common configuration for node1 and node2)
    - 192.168.2.0/24 is the LAN used for replication.
    - Node1 has 192.168.2.107 and node2 has 192.168.2.207.
---------------------------------------------------------------------------
local   all             all                                     trust
host    all             all             127.0.0.1/32            trust
host    all             all             ::1/128                 trust
host    all             all             192.168.2.0/24          trust
host    replication     all             192.168.2.0/24          trust
---------------------------------------------------------------------------

  * recovery.conf for node1
---------------------------------------------------------------------------
standby_mode = 'on'
primary_conninfo = 'host=192.168.2.207 port=5432 user=postgres'
restore_command = 'cp /var/lib/pgsql/9.0/data/pg_archive/%f %p'
trigger_file = '/var/lib/pgsql/PGSQL.5432.trigger'
---------------------------------------------------------------------------

  * recovery.conf for node2
---------------------------------------------------------------------------
standby_mode = 'on'
primary_conninfo = 'host=192.168.2.107 port=5432 user=postgres'
restore_command = 'cp /var/lib/pgsql/9.0/data/pg_archive/%f %p'
trigger_file = '/var/lib/pgsql/PGSQL.5432.trigger'
---------------------------------------------------------------------------
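
    For reference, trigger_file is the standard PostgreSQL 9.0 promotion
    mechanism: when that file appears, the hot standby leaves recovery and
    becomes a primary. A manual promotion outside the cluster would look
    like the command below; when the resources are managed by the cluster
    you would not normally create the file by hand.

    node2# touch /var/lib/pgsql/PGSQL.5432.trigger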

  * pgsql.crm for crm command
---------------------------------------------------------------------------
property \
        no-quorum-policy="ignore" \
        stonith-enabled="false" \
        startup-fencing="false"
rsc_defaults \
        resource-stickiness="INFINITY" \
        migration-threshold="1"
ms msPostgresql postgresql \
        meta \
                master-max="1" \
                master-node-max="1" \
                clone-max="2" \
                clone-node-max="1" \
                notify="true"
clone clnPingd pingd \
        meta \
                clone-max="2" \
                clone-node-max="1"
primitive vip ocf:heartbeat:IPaddr2 \
        params \
                ip="172.20.24.127" \
                nic="eth0" \
                cidr_netmask="16" \
        op start   timeout="60s" interval="0s"  on-fail="restart" \
        op monitor timeout="60s" interval="10s" on-fail="restart" \
        op stop    timeout="60s" interval="0s"  on-fail="block"
primitive postgresql ocf:heartbeat:pgsql \
        params \
                pgctl="/usr/pgsql-9.0/bin/pg_ctl" \
                psql="/usr/pgsql-9.0/bin/psql" \
                pgdata="/var/lib/pgsql/9.0/data/" \
                start_opt="-p 5432" \
                rep_mode="async" \
        op start   timeout="3600s" interval="0s"  on-fail="restart" \
        op monitor timeout="60s"   interval="10s" on-fail="restart" \
        op monitor timeout="60s"   interval="9s"  on-fail="restart"
role="Master" \
        op promote timeout="3600s" interval="0s"  on-fail="restart" \
        op demote  timeout="60s"   interval="0s"  on-fail="block" \
        op stop    timeout="60s"   interval="0s"  on-fail="block"
primitive pingd ocf:pacemaker:pingd \
        params \
                name="ping_set" \
                host_list="172.20.24.103" \
                multiplier="100" \
                dampen="0" \
        op start   timeout="60s" interval="0s"  on-fail="restart" \
        op monitor timeout="60s" interval="10s" on-fail="restart" \
        op stop    timeout="60s" interval="0s"  on-fail="ignore"

location rsc_location-1 vip \
        rule -inf: not_defined ping_set or ping_set lt 100
location rsc_location-2 msPostgresql \
        rule -inf: not_defined ping_set or ping_set lt 100 \
        rule -inf: defined fail-count-vip

colocation rsc_colocation-1 inf: vip                 clnPingd
colocation rsc_colocation-2 inf: msPostgresql        clnPingd
colocation rsc_colocation-3 inf: msPostgresql:Master vip

order rsc_order-1 0: clnPingd     msPostgresql
order rsc_order-2 0: msPostgresql vip
order rsc_order-3 0: vip          msPostgresql:promote
---------------------------------------------------------------------------


Regards,
Takatoshi MATSUO

Attachment: pgsql.patch
