Hi,

On Fri, Dec 24, 2010 at 12:05:27PM +0100, Simone Felici wrote:
>
> Hi to all!
>
> I've an issue with my cluster env. First of all my config:
>
> Two Cluster CentOS5.5 Active+Standby with one DRBD partition managing a
> Nagios service, ip, and storage.
> The config files at the bottom.
>
> I'm trying to test fence option to prevent split brain and problems on double
> access on drbd partition.
> Starting on a sane situation, manual switching of the resources or
> simulating kernel-panic, crash of process or whatever, all works well. If
> I try to shutdown the eth1 (192.168.100.0 as well as cross cable to drbd
> mirroring) the active stay as it is, it calls the fence option adding the
> entry to crm config:
> location drbd-fence-by-handler-ServerData ServerData \
> rule $id="drbd-fence-by-handler-rule-ServerData" $role="Master" -inf:
> #uname ne opsview-core01-tn
>
> But the standby node kills the corosync process:
How? Did the corosync process crash (looks like it)? Did you find
any core dumps?

> *** STANDBY NODE LOG ***
> Dec 24 11:00:04 corosync [TOTEM ] Incrementing problem counter for seqid
> 14158 iface 192.168.100.12 to [1 of 10]
> Dec 24 11:00:04 corosync [TOTEM ] Incrementing problem counter for seqid
> 14160 iface 192.168.100.12 to [2 of 10]
> Dec 24 11:00:05 corosync [TOTEM ] Incrementing problem counter for seqid
> 14162 iface 192.168.100.12 to [3 of 10]
> Dec 24 11:00:05 corosync [TOTEM ] Incrementing problem counter for seqid
> 14164 iface 192.168.100.12 to [4 of 10]
> Dec 24 11:00:06 corosync [TOTEM ] Decrementing problem counter for iface
> 192.168.100.12 to [3 of 10]
> Dec 24 11:00:06 corosync [TOTEM ] Incrementing problem counter for seqid
> 14166 iface 192.168.100.12 to [4 of 10]
> Dec 24 11:00:06 corosync [TOTEM ] Incrementing problem counter for seqid
> 14168 iface 192.168.100.12 to [5 of 10]
> Dec 24 11:00:07 corosync [TOTEM ] Incrementing problem counter for seqid
> 14170 iface 192.168.100.12 to [6 of 10]
> Dec 24 11:00:08 corosync [TOTEM ] Incrementing problem counter for seqid
> 14172 iface 192.168.100.12 to [7 of 10]
> Dec 24 11:00:08 corosync [TOTEM ] Decrementing problem counter for iface
> 192.168.100.12 to [6 of 10]
> Dec 24 11:00:08 corosync [TOTEM ] Incrementing problem counter for seqid
> 14174 iface 192.168.100.12 to [7 of 10]
> Dec 24 11:00:09 corosync [TOTEM ] Incrementing problem counter for seqid
> 14176 iface 192.168.100.12 to [8 of 10]
> Dec 24 11:00:09 corosync [TOTEM ] Incrementing problem counter for seqid
> 14178 iface 192.168.100.12 to [9 of 10]
> Dec 24 11:00:10 corosync [TOTEM ] Decrementing problem counter for iface
> 192.168.100.12 to [8 of 10]
> Dec 24 11:00:10 corosync [TOTEM ] Incrementing problem counter for seqid
> 14180 iface 192.168.100.12 to [9 of 10]
> Dec 24 11:00:10 corosync [TOTEM ] Incrementing problem counter for seqid
> 14182 iface 192.168.100.12 to [10 of 10]
> Dec 24 11:00:10 corosync [TOTEM ] Marking seqid 14182 ringid 0 interface
> 192.168.100.12 FAULTY - adminisrtative intervention required.
> Dec 24 11:00:11 corosync [TOTEM ] FAILED TO RECEIVE
> Dec 24 11:00:12 corosync [TOTEM ] FAILED TO RECEIVE
> Dec 24 11:00:12 corosync [TOTEM ] FAILED TO RECEIVE
> Dec 24 11:00:12 corosync [TOTEM ] FAILED TO RECEIVE
> Dec 24 11:00:12 corosync [TOTEM ] FAILED TO RECEIVE
> Dec 24 11:00:12 corosync [TOTEM ] FAILED TO RECEIVE
> Dec 24 11:00:13 corosync [TOTEM ] FAILED TO RECEIVE
> Dec 24 11:00:13 corosync [TOTEM ] FAILED TO RECEIVE
> Dec 24 11:00:13 corosync [TOTEM ] FAILED TO RECEIVE
> Dec 24 11:00:13 corosync [TOTEM ] FAILED TO RECEIVE
> Dec 24 11:00:14 opsview-core02-tn stonithd: [5151]: ERROR: ais_dispatch:
> Receiving message body failed: (2) Library error: No such file or
> directory (2)

At this point the corosync process is no more. Best to send the
backtrace to the openais list.

Thanks,

Dejan

_______________________________________________
Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
http://oss.clusterlabs.org/mailman/listinfo/pacemaker

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://developerbugs.linux-foundation.org/enter_bug.cgi?product=Pacemaker