[Linux-HA] node ignored after reboot

Juha Heinanen Wed, 18 Mar 2009 10:06:17 -0700

I set up the example Apache cluster from the document

http://www.clusterlabs.org/wiki/DRBD_HowTo_1.0

but used a MySQL server instead of the Apache server. The crm configuration
of my test cluster looks like this:

node $id="8df8447f-6ecf-41a7-a131-c89fd59a120d" lenny1
node $id="f13aff7b-6c94-43ac-9a24-b118e62d5325" lenny2
primitive drbd0 ocf:heartbeat:drbd \
        params drbd_resource="drbd0" \
        op monitor interval="59s" role="Master" timeout="30s" \
        op monitor interval="60s" role="Slave" timeout="30s"
primitive fs0 ocf:heartbeat:Filesystem \
        params ftype="ext3" directory="/var/lib/mysql" device="/dev/drbd0" \
        meta target-role="Started"
primitive mysql-server lsb:mysql \
        op monitor interval="10s" timeout="30s" start-delay="10s"
primitive virtual-ip ocf:heartbeat:IPaddr2 \
        params ip="192.98.102.10" broadcast="192.98.102.255" nic="eth1" cidr_netmask="24" \
        op monitor interval="21s" timeout="5s"
group mysql-group fs0 mysql-server virtual-ip
ms ms-drbd0 drbd0 \
        meta clone-max="2" notify="true" globally-unique="false" target-role="Started"
colocation mysql-group-on-ms-drbd0 inf: mysql-group ms-drbd0:Master
order ms-drbd0-before-mysql-group inf: ms-drbd0:promote mysql-group:start
property $id="cib-bootstrap-options" \
        dc-version="1.0.2-ec6b0bbee1f3aa72c4c2559997e675db6ab39160" \
        default-resource-stickiness="1"

Initially both nodes were online, with lenny2 as the master. Then I tested
what happens when I reboot lenny1. While lenny1 was powered off, the cluster
looked correct:

# crm_mon -1

============
Last updated: Wed Mar 18 14:12:09 2009
Current DC: lenny2 (f13aff7b-6c94-43ac-9a24-b118e62d5325)
Version: 1.0.2-ec6b0bbee1f3aa72c4c2559997e675db6ab39160
2 Nodes configured.
2 Resources configured.
============

Node: lenny1 (8df8447f-6ecf-41a7-a131-c89fd59a120d): OFFLINE
Node: lenny2 (f13aff7b-6c94-43ac-9a24-b118e62d5325): online

Master/Slave Set: ms-drbd0
    drbd0:0     (ocf::heartbeat:drbd):  Stopped 
    drbd0:1     (ocf::heartbeat:drbd):  Master lenny2
Resource Group: mysql-group
    fs0 (ocf::heartbeat:Filesystem):    Started lenny2
    mysql-server        (lsb:mysql):    Started lenny2
    virtual-ip  (ocf::heartbeat:IPaddr2):       Started lenny2

When I powered lenny1 on again, I expected it to rejoin the cluster once it
was back online, but it was completely ignored.

The log is below. Software versions are heartbeat 2.99.2 and
pacemaker 1.0.2.

Any clues why lenny1 was ignored, so that my very first high-availability
test with heartbeat/pacemaker failed? People on the pacemaker list
suspected ccm, which is part of heartbeat.

-- juha

------------------------------------------

This appeared in syslog when lenny1 was powered off:

r...@lenny2:~# heartbeat[1831]: 2009/03/18_14:12:32 WARN: node lenny1: is dead
heartbeat[1831]: 2009/03/18_14:12:32 info: Link lenny1:eth1 dead.
crmd[1923]: 2009/03/18_14:12:32 notice: crmd_ha_status_callback: Status update: Node lenny1 now has status [dead] (DC=true)
crmd[1923]: 2009/03/18_14:12:32 info: crm_update_peer_proc: lenny1.ais is now offline
crmd[1923]: 2009/03/18_14:12:32 info: te_graph_trigger: Transition 12 is now complete
crmd[1923]: 2009/03/18_14:12:32 info: notify_crmd: Transition 12 status: done - <null>

And this appeared when it was powered on again:

heartbeat[1831]: 2009/03/18_14:12:56 info: Heartbeat restart on node lenny1
heartbeat[1831]: 2009/03/18_14:12:56 info: Link lenny1:eth1 up.
heartbeat[1831]: 2009/03/18_14:12:56 info: Status update for node lenny1: status init
heartbeat[1831]: 2009/03/18_14:12:56 info: Status update for node lenny1: status up
crmd[1923]: 2009/03/18_14:12:56 notice: crmd_ha_status_callback: Status update: Node lenny1 now has status [init] (DC=true)
crmd[1923]: 2009/03/18_14:12:56 info: crm_update_peer_proc: lenny1.ais is now online
crmd[1923]: 2009/03/18_14:12:56 notice: crmd_ha_status_callback: Status update: Node lenny1 now has status [up] (DC=true)
heartbeat[1831]: 2009/03/18_14:13:26 info: Status update for node lenny1: status active
crmd[1923]: 2009/03/18_14:13:26 notice: crmd_ha_status_callback: Status update: Node lenny1 now has status [active] (DC=true)
cib[1919]: 2009/03/18_14:13:26 info: cib_client_status_callback: Status update: Client lenny1/cib now has status [join]
cib[1919]: 2009/03/18_14:13:26 info: crm_update_peer_proc: lenny1.cib is now online
heartbeat[1831]: 2009/03/18_14:13:30 WARN: 1 lost packet(s) for [lenny1] [55:57]
heartbeat[1831]: 2009/03/18_14:13:30 info: No pkts missing from lenny1!
crmd[1923]: 2009/03/18_14:13:30 notice: crmd_client_status_callback: Status update: Client lenny1/crmd now has status [online] (DC=true)
crmd[1923]: 2009/03/18_14:13:30 info: crm_update_peer_proc: lenny1.crmd is now online
heartbeat[1831]: 2009/03/18_14:13:31 WARN: 1 lost packet(s) for [lenny1] [59:61]
heartbeat[1831]: 2009/03/18_14:13:31 info: No pkts missing from lenny1!
crmd[1923]: 2009/03/18_14:13:33 WARN: crmd_ha_msg_callback: Ignoring HA message (op=join_announce) from lenny1: not in our membership list (size=1)
crmd[1923]: 2009/03/18_14:13:43 WARN: crmd_ha_msg_callback: Ignoring HA message (op=vote) from lenny1: not in our membership list (size=1)
cib[1919]: 2009/03/18_14:13:46 WARN: cib_peer_callback: Discarding cib_slave_all message (50) from lenny1: not in our membership
cib[1919]: 2009/03/18_14:13:47 WARN: cib_peer_callback: Discarding cib_replace message (54) from lenny1: not in our membership
cib[1919]: 2009/03/18_14:13:48 WARN: cib_peer_callback: Discarding cib_apply_diff message (58) from lenny1: not in our membership
cib[1919]: 2009/03/18_14:13:50 WARN: cib_peer_callback: Discarding cib_apply_diff message (5c) from lenny1: not in our membership
cib[1919]: 2009/03/18_14:13:51 WARN: cib_peer_callback: Discarding cib_apply_diff message (5e) from lenny1: not in our membership
cib[1919]: 2009/03/18_14:13:51 WARN: cib_peer_callback: Discarding cib_apply_diff message (5f) from lenny1: not in our membership
cib[1919]: 2009/03/18_14:13:51 WARN: cib_peer_callback: Discarding cib_apply_diff message (60) from lenny1: not in our membership
cib[1919]: 2009/03/18_14:13:51 WARN: cib_peer_callback: Discarding cib_apply_diff message (61) from lenny1: not in our membership
cib[1919]: 2009/03/18_14:13:51 WARN: cib_peer_callback: Discarding cib_apply_diff message (62) from lenny1: not in our membership
heartbeat[1831]: 2009/03/18_14:16:01 info: all clients are now paused
cib[1919]: 2009/03/18_14:16:27 info: cib_stats: Processed 32 operations (19062.00us average, 0% utilization) in the last 10min
heartbeat[1831]: 2009/03/18_14:18:02 WARN: Message hist queue is filling up (376 messages in queue)
heartbeat[1831]: 2009/03/18_14:18:03 WARN: Message hist queue is filling up (377 messages in queue)
...
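The key symptom in this log is that heartbeat reports lenny1 as "active" while crmd and cib keep rejecting its messages because their membership list still has size 1. To tally that mismatch in a longer log, here is a minimal sketch of a hypothetical Python helper, `summarize`. It is not part of heartbeat or pacemaker; the regular expressions are assumptions fitted only to the message formats quoted above, and they expect each log record on a single line (not wrapped as in a mail client).

```python
import re
from collections import Counter

# Hypothetical helper, not part of heartbeat/pacemaker. The patterns below are
# assumptions matched only against the syslog formats quoted in this post.
STATUS_RE = re.compile(
    r"heartbeat\[\d+\]: \S+ info: Status update for node (\S+): status (\w+)"
)
REJECT_RE = re.compile(
    r"(crmd|cib)\[\d+\]: \S+ WARN: \w+: "
    r"(?:Ignoring HA message \(op=(\w+)\)|Discarding (\w+) message \(\w+\)) "
    r"from (\S+): not in our membership"
)


def summarize(log_text):
    """Return (last heartbeat status per node, count of rejected peer messages)."""
    last_status = {}      # node -> last status reported by heartbeat
    rejected = Counter()  # (daemon, node, op) -> number of rejected messages
    for line in log_text.splitlines():
        m = STATUS_RE.search(line)
        if m:
            last_status[m.group(1)] = m.group(2)
            continue
        m = REJECT_RE.search(line)
        if m:
            daemon, op_ignored, op_discarded, node = m.groups()
            rejected[(daemon, node, op_ignored or op_discarded)] += 1
    return last_status, rejected


sample = """\
heartbeat[1831]: 2009/03/18_14:13:26 info: Status update for node lenny1: status active
crmd[1923]: 2009/03/18_14:13:33 WARN: crmd_ha_msg_callback: Ignoring HA message (op=join_announce) from lenny1: not in our membership list (size=1)
cib[1919]: 2009/03/18_14:13:48 WARN: cib_peer_callback: Discarding cib_apply_diff message (58) from lenny1: not in our membership
cib[1919]: 2009/03/18_14:13:50 WARN: cib_peer_callback: Discarding cib_apply_diff message (5c) from lenny1: not in our membership
"""

status, rejected = summarize(sample)
print(status)  # {'lenny1': 'active'}
for (daemon, node, op), n in sorted(rejected.items()):
    print(f"{daemon}: {n} x {op} from {node} rejected")
# cib: 2 x cib_apply_diff from lenny1 rejected
# crmd: 1 x join_announce from lenny1 rejected
```

A node whose last heartbeat status is "active" but which still shows up in the rejected counts is exactly the situation described above: the messaging layer sees the node, but the CRM membership does not include it.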
_______________________________________________
Linux-HA mailing list
Linux-HA@lists.linux-ha.org
http://lists.linux-ha.org/mailman/listinfo/linux-ha
See also: http://linux-ha.org/ReportingProblems