>>> zulucloud <zulucl...@mailbox.org> wrote on 07.10.2015 at 16:12 in
>>> message <5615284e.8050...@mailbox.org>:
> Hi,
> I have a problem I don't understand; maybe someone can give me a hint.
>
> My 2-node cluster (nodes named ali and baba) is configured to run
> mysql, an IP for mysql, and the filesystem resource (on the DRBD
> master) together as a GROUP. After doing some crash tests I ended up
> with the filesystem and mysql running happily on one host (ali) and
> the related IP on the other (baba) ... although the IP is not really
> up and running; crm_mon just SHOWS it as started there. In fact it is
> up nowhere, neither on ali nor on baba.
Then it's most likely a bug in the resource agent. To make sure, try
"crm resource reprobe", be patient for a few seconds afterwards, and
then recheck the displayed status.

> crm_mon shows that pacemaker tried to start it on baba, but gave up
> after fail-count=1000000.

This could mean: multiple start attempts failed, as did stop attempts,
so the cluster thinks the resource might still be running. It looks
very much like a configuration problem to me.

> Q1: Why doesn't pacemaker put the IP on ali, where all the rest of
> its group lives?

See the log files in detail.

> Q2: Why doesn't pacemaker try to start the IP on ali after the max
> failcount has been reached on baba?

Do you have fencing enabled?

> Q3: Why is crm_mon showing the IP as "started" when it's down after
> 1000000 tries?

See above.

> Thanks :) 8-)
>
> config (some parts removed):
> -------------------------------
> node ali
> node baba
>
> primitive res_drbd ocf:linbit:drbd \
>     params drbd_resource="r0" \
>     op stop interval="0" timeout="100" \
>     op start interval="0" timeout="240" \
>     op promote interval="0" timeout="90" \
>     op demote interval="0" timeout="90" \
>     op notify interval="0" timeout="90" \
>     op monitor interval="40" role="Slave" timeout="20" \
>     op monitor interval="20" role="Master" timeout="20"
> primitive res_fs ocf:heartbeat:Filesystem \
>     params device="/dev/drbd0" directory="/drbd_mnt" fstype="ext4" \
>     op monitor interval="30s"
> primitive res_hamysql_ip ocf:heartbeat:IPaddr2 \
>     params ip="XXX.XXX.XXX.224" nic="eth0" cidr_netmask="23" \
>     op monitor interval="10s" timeout="20s" depth="0"
> primitive res_mysql lsb:mysql \
>     op start interval="0" timeout="15" \
>     op stop interval="0" timeout="15" \
>     op monitor start-delay="30" interval="15" time-out="15"
>
> group gr_mysqlgroup res_fs res_mysql res_hamysql_ip \
>     meta target-role="Started"
> ms ms_drbd res_drbd \
>     meta master-max="1" master-node-max="1" clone-max="2" \
>     clone-node-max="1" notify="true"
>
> colocation col_fs_on_drbd_master inf: \
>     res_fs:Started ms_drbd:Master
>
> order ord_drbd_master_then_fs inf: ms_drbd:promote res_fs:start
>
> property $id="cib-bootstrap-options" \
>     dc-version="1.0.9-74392a28b7f31d7ddc86689598bd23114f58978b" \
>     cluster-infrastructure="openais" \
>     stonith-enabled="false" \
>     no-quorum-policy="ignore" \
>     expected-quorum-votes="2" \
>     last-lrm-refresh="1438857246"
>
>
> crm_mon -rnf (some parts removed):
> ---------------------------------
> Node ali: online
>     res_fs          (ocf::heartbeat:Filesystem) Started
>     res_mysql       (lsb:mysql) Started
>     res_drbd:0      (ocf::linbit:drbd) Master
> Node baba: online
>     res_hamysql_ip  (ocf::heartbeat:IPaddr2) Started
>     res_drbd:1      (ocf::linbit:drbd) Slave
>
> Inactive resources:
>
> Migration summary:
> * Node baba:
>     res_hamysql_ip: migration-threshold=1000000 fail-count=1000000
>
> Failed actions:
>     res_hamysql_ip_stop_0 (node=a891vl107s, call=35, rc=1,
>     status=complete): unknown error
>
> corosync.log:
> --------------
> pengine: [1223]: WARN: should_dump_input: Ignoring requirement that
> res_hamysql_ip_stop_0 complete before gr_mysqlgroup_stopped_0:
> unmanaged failed resources cannot prevent shutdown
>
> pengine: [1223]: WARN: should_dump_input: Ignoring requirement that
> res_hamysql_ip_stop_0 complete before gr_mysqlgroup_stopped_0:
> unmanaged failed resources cannot prevent shutdown
>
> Software:
> ----------
> corosync 1.2.1-4
> pacemaker 1.0.9.1+hg15626-1
> drbd8-utils 2:8.3.7-2.1
> (for some reason it's not possible to update at this time)
>
> _______________________________________________
> Users mailing list: Users@clusterlabs.org
> http://clusterlabs.org/mailman/listinfo/users
>
> Project Home: http://www.clusterlabs.org
> Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
> Bugs: http://bugs.clusterlabs.org
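
To make the suggestions above concrete, here is a sketch of the crm
shell commands involved: re-probing, inspecting and clearing the fail
count, and turning fencing on. The stonith example is a placeholder
(agent and parameters must match your hardware); resource and node
names are taken from the config quoted above.

```shell
# Re-probe so the cluster re-reads the actual state of all resources
crm resource reprobe

# Inspect, then clear, the fail count for the IP on baba
crm resource failcount res_hamysql_ip show baba
crm resource cleanup res_hamysql_ip baba

# stonith-enabled="false" in the config above disables fencing, so a
# failed stop cannot be recovered safely. Enabling it requires a
# working stonith resource first, e.g. (placeholder, adapt to your
# hardware):
#   crm configure primitive st_ipmi stonith:external/ipmi params ...
crm configure property stonith-enabled=true
```

Note that these commands only work against a running cluster; nothing
here is a substitute for finding out why the stop action itself
returned rc=1.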