Hi,

We have a 2-node cluster based on SLES11, openais (0.80.3-26.8.1) and 
pacemaker (1.0.5-0.5.6). Sometimes the failover from one node (named cuzzonib) 
to the second node (named cuzzonia) fails with the following messages:

Apr 16 13:16:14 cuzzonib lrmd: [6706]: info: Try to stop STONITH resource 
<rsc_id=iRMC_cuzzoniaInstance:0> : Device=external/ipmi
Apr 16 13:16:14 cuzzonib crmd: [18479]: info: process_lrm_event: LRM operation 
iRMC_cuzzoniaInstance:0_stop_0 (call=51, rc=0, cib-update=108, confirmed=true) 
ok
Apr 16 13:16:14 cuzzonib crmd: [18479]: info: match_graph_event: Action 
iRMC_cuzzoniaInstance:0_stop_0 (25) confirmed on cuzzonib (rc=0)

Apr 16 13:16:14 cuzzonib crmd: [18479]: info: te_pseudo_action: Pseudo action 
29 fired and confirmed
Apr 16 13:16:14 cuzzonib crmd: [18479]: info: te_crm_command: Executing 
crm-event (79): do_shutdown on cuzzonib
Apr 16 13:16:14 cuzzonib crmd: [18479]: info: te_crm_command: crm-event (79) is 
a local shutdown

Apr 16 13:16:17 cuzzonib logger: /etc/xen/scripts/xen-hotplug-cleanup: 
XENBUS_PATH=backend/vkbd/4/0
Apr 16 13:16:17 cuzzonib logger: /etc/xen/scripts/xen-hotplug-cleanup: 
XENBUS_PATH=backend/console/4/0
Apr 16 13:16:17 cuzzonib logger: /etc/xen/scripts/xen-hotplug-cleanup: 
XENBUS_PATH=backend/vfb/4/0
Apr 16 13:16:17 cuzzonib logger: /etc/xen/scripts/xen-hotplug-cleanup: 
XENBUS_PATH=backend/vif/4/0
Apr 16 13:16:17 cuzzonib logger: /etc/xen/scripts/block: remove 
XENBUS_PATH=backend/vbd/4/51712
Apr 16 13:16:17 cuzzonib logger: /etc/xen/scripts/block: remove 
XENBUS_PATH=backend/vbd/4/51744
Apr 16 13:16:17 cuzzonib logger: /etc/xen/scripts/xen-hotplug-cleanup: 
XENBUS_PATH=backend/vbd/4/51712
Apr 16 13:16:17 cuzzonib logger: /etc/xen/scripts/xen-hotplug-cleanup: 
XENBUS_PATH=backend/vbd/4/51744

Apr 16 13:16:32 cuzzonib openais[18468]: [crm  ] notice: pcmk_shutdown: Still 
waiting for crmd (pid=18479, seq=6) to terminate..
.
Apr 16 13:16:38 cuzzonib openais[18468]: [TOTEM] The token was lost in the 
OPERATIONAL state.
Apr 16 13:16:38 cuzzonib openais[18468]: [TOTEM] Receive multicast socket recv 
buffer size (262142 bytes).
Apr 16 13:16:38 cuzzonib openais[18468]: [TOTEM] Transmit multicast socket send 
buffer size (262142 bytes).
Apr 16 13:16:38 cuzzonib openais[18468]: [TOTEM] entering GATHER state from 2.
Apr 16 13:16:58 cuzzonib openais[18468]: [TOTEM] entering GATHER state from 0.
Apr 16 13:16:58 cuzzonib openais[18468]: [TOTEM] Creating commit token because 
I am the rep.
Apr 16 13:16:58 cuzzonib openais[18468]: [TOTEM] Saving state aru 14b high seq 
received 14b
Apr 16 13:16:58 cuzzonib openais[18468]: [TOTEM] Storing new sequence id for 
ring bb4
Apr 16 13:16:58 cuzzonib openais[18468]: [TOTEM] entering COMMIT state.
Apr 16 13:16:58 cuzzonib openais[18468]: [TOTEM] entering RECOVERY state.
Apr 16 13:16:58 cuzzonib openais[18468]: [TOTEM] position [0] member 
192.168.10.5:
Apr 16 13:16:58 cuzzonib openais[18468]: [TOTEM] previous ring seq 2992 rep 
192.168.10.3
Apr 16 13:16:58 cuzzonib openais[18468]: [TOTEM] aru 14b high delivered 14b 
received flag 1
Apr 16 13:16:58 cuzzonib openais[18468]: [TOTEM] Did not need to originate any 
messages in recovery.
Apr 16 13:16:58 cuzzonib openais[18468]: [TOTEM] Sending initial ORF token
Apr 16 13:16:58 cuzzonib openais[18468]: [CLM  ] CLM CONFIGURATION CHANGE
Apr 16 13:16:58 cuzzonib openais[18468]: [CLM  ] New Configuration:
Apr 16 13:16:58 cuzzonib openais[18468]: [CLM  ]        r(0) ip(192.168.10.5)

Apr 16 13:16:58 cuzzonib openais[18468]: [CLM  ] Members Left:
Apr 16 13:16:58 cuzzonib crmd: [18479]: notice: ais_dispatch: Membership 2996: 
quorum lost
Apr 16 13:16:58 cuzzonib cib: [18475]: notice: ais_dispatch: Membership 2996: 
quorum lost
Apr 16 13:16:58 cuzzonib crmd: [18479]: info: ais_status_callback: status: 
cuzzonia is now lost (was member)

Apr 16 13:16:58 cuzzonib cib: [18475]: info: crm_update_peer: Node cuzzonia: 
id=51030208 state=lost (new) addr=r(0) ip(192.168.10.3)  votes=1 born=2992 
seen=2992 proc=00000000000000000000000000053312

Afterwards, the second cluster node (cuzzonia) is rebooted.
What could be the cause of this problem?

Regards,
Armin Haussecker







_______________________________________________
Openais mailing list
Openais@lists.linux-foundation.org
https://lists.linux-foundation.org/mailman/listinfo/openais
