Hello again. Why hasn't anyone answered me? I really need your help!
-------- Forwarded message --------
From: Yulia Shkolnikova <[email protected]>
To: [email protected]
Date: Mon 19 Nov 2012 16:37:21
Subject: [Pacemaker] Problem with monitor
Hello,
I am configuring a master/slave cluster for PostgreSQL 9.1 based on corosync and
pacemaker.
I am following this presentation:
http://schedule2012.rmll.info/IMG/pdf/postgresql-9-0-ha.pdf.
I took the resource agent (pgsql-ms) for master/slave PostgreSQL from this repository:
https://github.com/roidelapluie/puppet-cluster.
My nodes are node1 and node2.
My pacemaker configuration:
node node1
node node2
primitive DBIP ocf:heartbeat:IPaddr2 \
    params nic="eth0" ip="10.76.112.183" cidr_netmask="22" \
    op monitor interval="30s" \
    meta target-role="Started" is-managed="true"
primitive pgsql ocf:inuits:pgsql-ms \
    op monitor interval="5s" role="Master" \
    op monitor interval="10s" role="Slave"
primitive ping ocf:pacemaker:ping \
    params host_list="10.76.112.1" \
    op monitor interval="10s" timeout="10s" \
    op start interval="0" timeout="45s"
group PSQL DBIP
ms pgsql-ms pgsql \
    params pgsqlconfig="/var/lib/pgsql/9.1/data/postgresql.conf" lsb_script="/etc/init.d/postgresql-9.1" pgsqlrecovery="/var/lib/pgsql/9.1/data/recovery.conf" \
    meta clone-max="2" clone-node-max="1" master-max="1" master-node-max="1" notify="true"
clone clone-ping ping \
    meta globally-unique="false"
location connected PSQL \
    rule $id="connected-rule" -inf: not_defined pingd or pingd lte 0
colocation ip_psql inf: PSQL pgsql-ms:Master
property $id="cib-bootstrap-options" \
    dc-version="1.1.7-6.el6-148fccfd5985c5590cc601123c6c16e966b85d14" \
    cluster-infrastructure="openais" \
    expected-quorum-votes="2" \
    stonith-enabled="false" \
    no-quorum-policy="ignore" \
    default-resource-stickiness="INFINITY" \
    last-lrm-refresh="1352470332"
rsc_defaults $id="rsc_defaults-options" \
    migration-threshold="INFINITY" \
    failure-timeout="10" \
    resource-stickiness="INFINITY"
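As far as I understand, Pacemaker tells the recurring operations of a resource apart by their interval, so the Master and Slave monitors need different intervals (5s and 10s here). Below is a rough Python sketch of how that could be checked against a config dump like the one above. The function names (`join_continuations`, `monitor_ops`, `duplicate_intervals`) are made up for illustration, and the check compares interval strings literally, so it would not notice that "5s" and "5000ms" are the same.

```python
import re

# Matches 'op monitor' followed by one or more key="value" attributes.
OP_MONITOR = re.compile(r'op monitor((?:\s+[\w-]+="[^"]*")+)')
ATTR = re.compile(r'([\w-]+)="([^"]*)"')


def join_continuations(text):
    """Merge lines ending in a backslash with the line that follows."""
    return re.sub(r"\\\s*\n\s*", " ", text)


def monitor_ops(config):
    """Return {primitive_name: [(role, interval), ...]} for a crm config dump."""
    result = {}
    for line in join_continuations(config).splitlines():
        words = line.split()
        if len(words) < 3 or words[0] != "primitive":
            continue
        ops = []
        for match in OP_MONITOR.finditer(line):
            attrs = dict(ATTR.findall(match.group(1)))
            # A monitor without role= applies to any role.
            ops.append((attrs.get("role", "any"), attrs.get("interval")))
        result[words[1]] = ops
    return result


def duplicate_intervals(ops):
    """Return the set of interval strings used by more than one monitor op."""
    seen, dups = set(), set()
    for _role, interval in ops:
        if interval in seen:
            dups.add(interval)
        seen.add(interval)
    return dups
```

For the pgsql primitive above this should report no duplicates, so the intervals themselves do not look like the cause.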
Then I tested my cluster:
1) If I switch off the master, the slave becomes the new master as expected.
This works fine and can be repeated many times.
2) But if I stop postgresql (to simulate a failure of postgresql) with the
command "service postgresql-9.1 stop", the following occurs:
Given node1 is master and node2 is slave.
On node1 I run "service postgresql-9.1 stop" and node2 becomes the
master.
Now, on node2 I run "service postgresql-9.1 stop" and node1 becomes the
master again.
At this point the monitoring of my resource on node1 stops, and the following
entry appears in the log:
node1 crmd[1362]: info: process_lrm_event: LRM operation pgsql:0_monitor_10000
(call=33, status=1, cib-update=0, confirmed=true) Cancelled
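To spot such entries without reading the whole log, a small filter along these lines might help. It is only a sketch: the field layout is taken from the single entry above and may differ in other Pacemaker versions, and the names `LRM_EVENT`, `parse_lrm_event` and `is_cancelled_monitor` are made up for illustration.

```python
import re

# Field layout assumed from the one crmd "process_lrm_event" entry quoted above.
LRM_EVENT = re.compile(
    r"LRM operation (?P<rsc>\S+)_(?P<op>[a-z]+)_(?P<interval_ms>\d+)\s+"
    r"\(call=(?P<call>\d+), status=(?P<status>\d+),.*?\)\s+(?P<result>\w+)"
)


def parse_lrm_event(line):
    """Return the fields of an LRM operation log line, or None if no match."""
    match = LRM_EVENT.search(line)
    if match is None:
        return None
    event = match.groupdict()
    for key in ("interval_ms", "call", "status"):
        event[key] = int(event[key])
    return event


def is_cancelled_monitor(event):
    """True for a recurring monitor operation that was logged as Cancelled."""
    return (
        event is not None
        and event["op"] == "monitor"
        and event["interval_ms"] > 0
        and event["result"] == "Cancelled"
    )
```

The log entry has to be joined onto one line before it is passed to `parse_lrm_event`, because the mail client wrapped it here.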
Now if I run "service postgresql-9.1 stop" on node1, pacemaker doesn't notice
that postgresql has stopped, doesn't try to restart it,
and doesn't promote node2 to master.
If I run "crm resource reprobe", the monitor action starts working again.
I cannot understand why the monitor operation stops working. Please help me.
Shkolnikova Yulia.
----------------------------------------------------------------------
_______________________________________________
Pacemaker mailing list: [email protected]
http://oss.clusterlabs.org/mailman/listinfo/pacemaker
Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org