On Thu, Feb 10, 2011 at 1:28 AM, Takatoshi MATSUO <matsuo....@gmail.com> wrote:
> Hello Serge
>
>> Hello -
>
>> I'm not sure that I really like this approach.
>
>> 1. It doesn't follow the standard Master/Slave logic of Pacemaker, and that
>> can create a lot of confusion. On start, the RA decides what role shall
>> be started where based not on the Pacemaker configuration but on the existence
>> of a recovery.conf file, which is out of Pacemaker's scope.
>
> I think so, but streaming replication doesn't have an interface,
> such as DRBD's, to check whether its data is up to date.
> I kicked around an idea of using the pg_controldata command to check
> the checkpoint location, but it is not perfect.
> Therefore I adopted this method reluctantly.
>
> Do you have a good idea?

What happens if all nodes have a recovery.conf file? Can we make it a
required attribute for the M/S configuration and then just promote the master
node by creating the trigger file? Obviously recovery.conf should point to a
floating IP address that is tied to the master role.
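Something along these lines, for example (just a sketch; I'm reusing the vip
address, paths and trigger file name from your configuration example below, and
in practice the floating IP would probably need to sit on the replication LAN
with a matching pg_hba.conf entry):

---------------------------------------------------------------------------
standby_mode = 'on'
# floating IP tied to the master role, so every standby follows whichever
# node currently holds it (address reused from the example vip for illustration)
primary_conninfo = 'host=172.20.24.127 port=5432 user=postgres'
restore_command = 'cp /var/lib/pgsql/9.0/data/pg_archive/%f %p'
# promotion then becomes a matter of creating this file on the chosen node
trigger_file = '/var/lib/pgsql/PGSQL.5432.trigger'
---------------------------------------------------------------------------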

>
>> Moving the master
>> role from one node to another cannot be done simply with a Pacemaker
>> command. I mean the master will be moved, but the slave won't be created.
>> Pacemaker will be forced to believe that it still exists, and the
>> administrator will have to do several manual steps to make it real.
>> It's too tricky in my opinion.
>
> Yes, but streaming replication can't transition from primary to
> hot standby; in addition, it doesn't have a function to resync automatically
> if the local data is too old.
> So I have given up on creating the slave automatically at this moment.

Completely stopping a node on a demote operation and forcing Pacemaker
to believe that it's still up is basically incorrect and cannot be
accepted. This approach can create a situation where the administrator
moves the master role from one node to another and then moves it back, and
after that the cluster reports that everything is up while actually it's
all down. Now take the administrator out of the picture and add a pingd
resource: a simple network glitch puts the cluster into an unrecoverable state.

Ideally, the demote operation should stop a master node and then restart it
in hot-standby mode. It's up to the administrator to make sure that no
node with outdated data gets promoted to the master role. One should
follow standard procedures: cluster software shouldn't be configured
to autostart at boot time, and the administrator has to make sure that the
data was refreshed if the node was down for a prolonged time.
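Roughly, demote could then look something like this at the shell level (only a
sketch of the idea, not what the posted RA does; the pg_ctl and data paths are
taken from your configuration example, and recovery.conf.template is a
hypothetical pre-staged copy of recovery.conf):

---------------------------------------------------------------------------
# stop the former master cleanly
su - postgres -c "/usr/pgsql-9.0/bin/pg_ctl -D /var/lib/pgsql/9.0/data -m fast stop"
# put a recovery.conf in place that points at the new master
cp /var/lib/pgsql/recovery.conf.template /var/lib/pgsql/9.0/data/recovery.conf
chown postgres:postgres /var/lib/pgsql/9.0/data/recovery.conf
# start it again; with recovery.conf present it comes up as a hot standby
su - postgres -c "/usr/pgsql-9.0/bin/pg_ctl -D /var/lib/pgsql/9.0/data -w start"
---------------------------------------------------------------------------

This only works if the demoted node's data is still recent enough for the new
master's wal_keep_segments; otherwise the administrator still has to take a
fresh base backup as in your procedure.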

>
> I think what is important is that the primary is running somewhere.
>
>
> I discussed these points with a PostgreSQL developer.
> I hope these functions will be implemented in the future.
>
>> 2. What is the reason for such heavy use of ATTR_UPDATER? What process
>> is the consumer of that pgsql-status attribute? Calling attrd_updater
>> so often makes the RA slow.
>
> It's just an auxiliary function to check complicated states,
> so I don't mind deleting it.

Why not use "ocf_log info" and keep all logging in one place? Using
that additional attribute would make sense if we had a GUI that was
able to refresh such attributes on the fly; otherwise I don't see
much value in it. Tailing ha-log or the messages log file seems simpler
than running cibadmin -Ql and grepping for that attribute.
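Just to make the comparison concrete (the attribute name and value are the ones
from your crm_mon output):

---------------------------------------------------------------------------
# write the state to ha-log / syslog along with everything else the RA logs
ocf_log info "PostgreSQL is running as a hot standby"

# versus publishing it as a node attribute, visible via "crm_mon -A" or
# "cibadmin -Ql | grep pgsql-status"
attrd_updater -n pgsql-status -v HS
---------------------------------------------------------------------------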

>
> But I don't find it slow in my environment.
> Doesn't attrd_updater work synchronously?
>
>> I need some time to build a PostgreSQL 9.0 test cluster and try some more tests.
>
>
>
> Regards,
> Takatoshi MATSUO
>
> 2011/2/10, Serge Dubrouski <serge...@gmail.com>:
>> Hello -
>>
>> I'm not sure that I really like this approach.
>>
>> 1. It doesn't follow the standard Master/Slave logic of Pacemaker, and that
>> can create a lot of confusion. On start, the RA decides what role shall
>> be started where based not on the Pacemaker configuration but on the existence
>> of a recovery.conf file, which is out of Pacemaker's scope. Moving the master
>> role from one node to another cannot be done simply with a Pacemaker
>> command. I mean the master will be moved, but the slave won't be created.
>> Pacemaker will be forced to believe that it still exists, and the
>> administrator will have to do several manual steps to make it real.
>> It's too tricky in my opinion.
>>
>> 2. What is the reason for such heavy use of ATTR_UPDATER? What process
>> is the consumer of that pgsql-status attribute? Calling attrd_updater
>> so often makes the RA slow.
>>
>> I need some time to build a PostgreSQL 9.0 test cluster and try some more
>> tests.
>>
>> On Tue, Feb 8, 2011 at 6:03 AM, Takatoshi MATSUO <matsuo....@gmail.com>
>> wrote:
>>> Hi
>>>
>>> Thank you for comments.
>>>
>>>> First of all thanks for doing this. It's definitely a neat option to
>>>> have. Unfortunately I haven't yet worked with PostgreSQL 9.0. Could
>>>> you please elaborate a little bit more on how it works? What happens if
>>>> the master dies? Who is responsible for creating/deleting the recovery.conf
>>>> file?
>>>
>>> This patch does not set up recovery.conf for hot standby automatically,
>>> so you need to set it up manually before starting hot standby.
>>> If recovery.conf exists, the resource agent will start as a slave.
>>> If the master dies, failover happens correctly.
>>>
>>>> Also please be more careful with quoting variables in your if statements.
>>>
>>> I checked and quoted.
>>>
>>>
>>>
>>>
>>> Let me elaborate.
>>>
>>>  **************** Condition *********************************
>>>
>>> 1. Streaming replication works correctly.
>>> 2. Streaming replication works correctly if you swap
>>>   the primary and hot standby servers.
>>>
>>>   FYI, I wrote my configuration at the end of this post.
>>>       See "Example of configuration".
>>>
>>>  **************** Setup and start ***************************
>>>
>>> 1. Decide which server you want to be the master.
>>>   In this instance I assume that you choose node1.
>>>
>>> 2. node1# rm /var/lib/pgsql/9.0/data/recovery.conf
>>>
>>> 3. node1# /etc/init.d/heartbeat start
>>>   Wait for "ONLINE node1".
>>>
>>> 4. node1# crm configure load update pgsql.crm
>>>
>>> 5. Wait for the master to start, as shown below.
>>>   node1# crm_mon -1 -fA
>>>   ------------------------------------------------------------
>>>   ============
>>>   Last updated: Tue Feb  8 18:51:00 2011
>>>   Stack: Heartbeat
>>>   Current DC: node1 (0c140f90-7de3-438f-b1b5-3b9722bbde21) -
>>> partition with quorum
>>>   Version: 1.0.10-da7075976b5ff0bee71074385f8fd02f296ec8a3
>>>   2 Nodes configured, unknown expected votes
>>>   3 Resources configured.
>>>   ============
>>>
>>>   Online: [ node1 ]
>>>
>>>    vip    (ocf::heartbeat:IPaddr2):       Started node1
>>>    Master/Slave Set: msPostgresql
>>>        Masters: [ node1 ]
>>>        Stopped: [ postgresql:1 ]
>>>    Clone Set: clnPingd
>>>        Started: [ node1 ]
>>>        Stopped: [ pingd:1 ]
>>>
>>>   Node Attributes:
>>>   * Node node1:
>>>       + master-postgresql:0               : 1000
>>>       + pgsql-status                      : PRI
>>>       + ping_set                          : 100
>>>
>>>   Migration summary:
>>>   * Node node1:
>>>   ------------------------------------------------------------
>>>
>>>   If the start operation fails, please check the PostgreSQL state using "crm_mon -A".
>>>   The "pgsql-status" line shows what state PostgreSQL is in.
>>>
>>> 6. Back up node1's data and restore it to node2.
>>>   In my case I use the rsync command as below.
>>>
>>>   node2# psql -h 192.168.2.107 -U postgres -c "SELECT pg_start_backup('label')"
>>>   node2# rsync -avr --delete --exclude=postmaster.pid 192.168.2.107:/var/lib/pgsql/9.0/data/ /var/lib/pgsql/9.0/data/
>>>   node2# psql -h 192.168.2.107 -U postgres -c "SELECT pg_stop_backup()"
>>>
>>> 7. Put recovery.conf in place on node2.
>>>   node2# cp recovery.conf /var/lib/pgsql/9.0/data/
>>>   node2# chown postgres:postgres /var/lib/pgsql/9.0/data/recovery.conf
>>>
>>> 8. node2# /etc/init.d/heartbeat start
>>>
>>> 9. Wait for the slave to start, as shown below.
>>>   node2# crm_mon -1 -fA
>>>   ------------------------------------------------------------
>>>   ============
>>>   Last updated: Tue Feb  8 19:41:00 2011
>>>   Stack: Heartbeat
>>>   Current DC: node1 (0c140f90-7de3-438f-b1b5-3b9722bbde21) -
>>> partition with quorum
>>>   Version: 1.0.10-da7075976b5ff0bee71074385f8fd02f296ec8a3
>>>   2 Nodes configured, unknown expected votes
>>>   3 Resources configured.
>>>   ============
>>>
>>>   Online: [ node1 node2 ]
>>>
>>>   vip     (ocf::heartbeat:IPaddr2):       Started node1
>>>    Master/Slave Set: msPostgresql
>>>        Masters: [ node1 ]
>>>        Slaves: [ node2 ]
>>>    Clone Set: clnPingd
>>>        Started: [ node1 node2 ]
>>>
>>>   Node Attributes:
>>>   * Node node1:
>>>       + master-postgresql:0               : 1000
>>>       + pgsql-status                      : PRI
>>>       + ping_set                          : 100
>>>   * Node node2:
>>>       + master-postgresql:1               : 100
>>>       + pgsql-status                      : HS
>>>       + ping_set                          : 100
>>>
>>>   Migration summary:
>>>   * Node node1:
>>>   * Node node2:
>>>   ------------------------------------------------------------
>>>
>>>   If the master doesn't exist on the other node, or recovery.conf doesn't
>>>   exist on node2, the start operation fails.
>>>
>>>  ********************** Other Operations ***************************************
>>>
>>> 1. If the master on node1 dies, node2 becomes the master.
>>>   node1# killall -9 postgres
>>>   node1# crm_mon -1 -fA
>>>   ------------------------------------------------------------
>>>   ============
>>>   Last updated: Tue Feb  8 20:12:34 2011
>>>   Stack: Heartbeat
>>>   Current DC: node1 (0c140f90-7de3-438f-b1b5-3b9722bbde21) -
>>> partition with quorum
>>>   Version: 1.0.10-da7075976b5ff0bee71074385f8fd02f296ec8a3
>>>   2 Nodes configured, unknown expected votes
>>>   3 Resources configured.
>>>   ============
>>>
>>>   Online: [ node1 node2 ]
>>>
>>>   vip     (ocf::heartbeat:IPaddr2):       Started node2
>>>    Master/Slave Set: msPostgresql
>>>        Masters: [ node2 ]
>>>        Stopped: [ postgresql:0 ]
>>>    Clone Set: clnPingd
>>>        Started: [ node1 node2 ]
>>>
>>>   Node Attributes:
>>>   * Node node1:
>>>       + master-postgresql:0               : -INFINITY
>>>       + pgsql-status                      : STOP:with errors
>>>       + ping_set                          : 100
>>>   * Node node2:
>>>       + master-postgresql:1               : 1000
>>>       + pgsql-status                      : PRI
>>>       + ping_set                          : 100
>>>
>>>   Migration summary:
>>>   * Node node1:
>>>      postgresql:0: migration-threshold=1 fail-count=1
>>>   * Node node2:
>>>
>>>   Failed actions:
>>>       postgresql:0_monitor_9000 (node=node1, call=14, rc=7,
>>> status=complete): not running
>>>   ------------------------------------------------------------
>>>
>>> 2. To recover node1, back up node2's data and restore it to node1.
>>>
>>>   node1# psql -h 192.168.2.207 -U postgres -c "SELECT pg_start_backup('label')"
>>>   node1# rsync -avr --delete --exclude=postmaster.pid 192.168.2.207:/var/lib/pgsql/9.0/data/ /var/lib/pgsql/9.0/data/
>>>   node1# psql -h 192.168.2.207 -U postgres -c "SELECT pg_stop_backup()"
>>>
>>> 3. Put recovery.conf in place on node1.
>>>   node1# cp recovery.conf /var/lib/pgsql/9.0/data/
>>>   node1# chown postgres:postgres /var/lib/pgsql/9.0/data/recovery.conf
>>>
>>> 4. Remove the flag file that suppresses starting.
>>>   node1# rm /var/lib/pgsql/PGSQL.5432.repress_start
>>>
>>> 5. node1# crm resource cleanup msPostgresql
>>>
>>> 6. Node1 becomes a slave, as shown below.
>>>   node1# crm_mon -1 -fA
>>>   ------------------------------------------------------------
>>>   ============
>>>   Last updated: Tue Feb  8 20:32:04 2011
>>>   Stack: Heartbeat
>>>   Current DC: node1 (0c140f90-7de3-438f-b1b5-3b9722bbde21) -
>>> partition with quorum
>>>   Version: 1.0.10-da7075976b5ff0bee71074385f8fd02f296ec8a3
>>>   2 Nodes configured, unknown expected votes
>>>   3 Resources configured.
>>>   ============
>>>
>>>   Online: [ node1 node2 ]
>>>
>>>   vip     (ocf::heartbeat:IPaddr2):       Started node2
>>>    Master/Slave Set: msPostgresql
>>>        Masters: [ node2 ]
>>>        Slaves: [ node1 ]
>>>    Clone Set: clnPingd
>>>        Started: [ node1 node2 ]
>>>
>>>   Node Attributes:
>>>   * Node node1:
>>>       + master-postgresql:0               : 100
>>>       + pgsql-status                      : HS
>>>       + ping_set                          : 100
>>>   * Node node2:
>>>       + master-postgresql:1               : 1000
>>>       + pgsql-status                      : PRI
>>>       + ping_set                          : 100
>>>
>>>   Migration summary:
>>>   * Node node1:
>>>   * Node node2:
>>>   ------------------------------------------------------------
>>>
>>>
>>>
>>>  ********************** Example of configuration ****************************
>>>
>>>  *  postgresql.conf (common configuration for node1 and node2)
>>>    - Please create /var/lib/pgsql/9.0/data/pg_archive (mkdir) beforehand.
>>>    - For failover to work correctly, set listen_addresses = '*'
>>>      if you use a virtual IP as well.
>>> ---------------------------------------------------------------------------
>>> listen_addresses = '*'
>>> port = 5432
>>> wal_level = hot_standby
>>> max_wal_senders = 3
>>> wal_keep_segments = 16
>>> archive_mode = on
>>> archive_command = 'cp %p /var/lib/pgsql/9.0/data/pg_archive/%f'
>>> ---------------------------------------------------------------------------
>>>
>>>  * pg_hba.conf (common configuration for node1 and node2)
>>>    - 192.168.2.0/24 is the LAN used for replication.
>>>    - Node1 has 192.168.2.107 and Node2 has 192.168.2.207.
>>> ---------------------------------------------------------------------------
>>> local   all             all                                     trust
>>> host    all             all             127.0.0.1/32            trust
>>> host    all             all             ::1/128                 trust
>>> host    all             all             192.168.2.0/24          trust
>>> host    replication     all             192.168.2.0/24          trust
>>> ---------------------------------------------------------------------------
>>>
>>>  * recovery.conf for node1
>>> ---------------------------------------------------------------------------
>>> standby_mode = 'on'
>>> primary_conninfo = 'host=192.168.2.207 port=5432 user=postgres'
>>> restore_command = 'cp /var/lib/pgsql/9.0/data/pg_archive/%f %p'
>>> trigger_file = '/var/lib/pgsql/PGSQL.5432.trigger'
>>> ---------------------------------------------------------------------------
>>>
>>>  * recovery.conf for node2
>>> ---------------------------------------------------------------------------
>>> standby_mode = 'on'
>>> primary_conninfo = 'host=192.168.2.107 port=5432 user=postgres'
>>> restore_command = 'cp /var/lib/pgsql/9.0/data/pg_archive/%f %p'
>>> trigger_file = '/var/lib/pgsql/PGSQL.5432.trigger'
>>> ---------------------------------------------------------------------------
>>>
>>>  * pgsql.crm for crm command
>>> ---------------------------------------------------------------------------
>>> property \
>>>        no-quorum-policy="ignore" \
>>>        stonith-enabled="false" \
>>>        startup-fencing="false" \
>>> rsc_defaults \
>>>        resource-stickiness="INFINITY" \
>>>        migration-threshold="1"
>>> ms msPostgresql postgresql \
>>>        meta \
>>>                master-max="1" \
>>>                master-node-max="1" \
>>>                clone-max="2" \
>>>                clone-node-max="1" \
>>>                notify="true"
>>> clone clnPingd pingd \
>>>        meta \
>>>                clone-max="2" \
>>>                clone-node-max="1"
>>> primitive vip ocf:heartbeat:IPaddr2 \
>>>        params \
>>>                ip="172.20.24.127" \
>>>                nic="eth0" \
>>>                cidr_netmask="16" \
>>>        op start   timeout="60s" interval="0s"  on-fail="restart" \
>>>        op monitor timeout="60s" interval="10s" on-fail="restart" \
>>>        op stop    timeout="60s" interval="0s"  on-fail="block"
>>> primitive postgresql ocf:heartbeat:pgsql \
>>>        params \
>>>                pgctl="/usr/pgsql-9.0/bin/pg_ctl" \
>>>                psql="/usr/pgsql-9.0/bin/psql" \
>>>                pgdata="/var/lib/pgsql/9.0/data/" \
>>>                start_opt="-p 5432" \
>>>                rep_mode="async" \
>>>        op start   timeout="3600s" interval="0s"  on-fail="restart" \
>>>        op monitor timeout="60s"   interval="10s" on-fail="restart" \
>>>        op monitor timeout="60s"   interval="9s"  on-fail="restart" role="Master" \
>>>        op promote timeout="3600s" interval="0s"  on-fail="restart" \
>>>        op demote  timeout="60s"   interval="0s"  on-fail="block" \
>>>        op stop    timeout="60s"   interval="0s"  on-fail="block"
>>> primitive pingd ocf:pacemaker:pingd \
>>>        params \
>>>                name="ping_set" \
>>>                host_list="172.20.24.103" \
>>>                multiplier="100" \
>>>                dampen="0" \
>>>        op start   timeout="60s" interval="0s"  on-fail="restart" \
>>>        op monitor timeout="60s" interval="10s" on-fail="restart" \
>>>        op stop    timeout="60s" interval="0s"  on-fail="ignore"
>>>
>>> location rsc_location-1 vip \
>>>        rule -inf: not_defined ping_set or ping_set lt 100
>>> location rsc_location-2 msPostgresql \
>>>        rule -inf: not_defined ping_set or ping_set lt 100 \
>>>        rule -inf: defined fail-count-vip
>>>
>>> colocation rsc_colocation-1 inf: vip                 clnPingd
>>> colocation rsc_colocation-2 inf: msPostgresql        clnPingd
>>> colocation rsc_colocation-3 inf: msPostgresql:Master vip
>>>
>>> order rsc_order-1 0: clnPingd     msPostgresql
>>> order rsc_order-2 0: msPostgresql vip
>>> order rsc_order-3 0: vip          msPostgresql:promote
>>> ---------------------------------------------------------------------------
>>>
>>>
>>> Regards,
>>> Takatoshi MATSUO
>>>
>>>
>>>
>>
>>
>>
>> --
>> Serge Dubrouski.
>>
>



-- 
Serge Dubrouski.
_______________________________________________________
Linux-HA-Dev: Linux-HA-Dev@lists.linux-ha.org
http://lists.linux-ha.org/mailman/listinfo/linux-ha-dev
Home Page: http://linux-ha.org/
