Hi, On Wed, Feb 27, 2008 at 09:45:06AM +1300, Luke Pascoe wrote: > > How do you induce a failover? > > Right now just the simplest kind, a network failure. I'm simply > disconnecting the LAN interface on one of the hosts. > > It doesn't matter which host I disconnect, 02 reboots. > > > Anything in the logs? Perhaps post them. > > Nothing obvious. Attached are the ha-log and ha-debug files from NODE01 and > NODE02. NODE02 is primary and has the resources, I disconnect eth0 on > NODE01, NODE02 acquires it's resources (there are none) and then promptly > reboots.
Strange. So, it reboots without a warning. These are all logs you found? All I found was: heartbeat[15343]: 2008/02/27_08:40:35 ERROR: Current node [node01] not in configuration! The node names should match what you get from uname -n. The version you are running is 2.0.8 which is before fastfail was introduced. And there's no stonith configured. I don't see how heartbeat could reboot the host. Are you sure there's nothing else rebooting it? Thanks, Dejan > :( > > Regards > > Luke Pascoe > Linux Systems Engineer > Asterisk > > T 09 366 8835 > F 09 302 1772 > M 0274 266649 > E [EMAIL PROTECTED] > W www.asterisk.co.nz > > Level 9, Gen-i Tower, 66 Wyndham Street > PO Box 8804, Auckland, New Zealand > > > Dejan Muhamedagic wrote: >> Hi, >> On Tue, Feb 26, 2008 at 04:35:06PM +1300, Luke Pascoe wrote: >>> Hello >>> >>> I'm trying to do NFS failover in a test environment with an underlying >>> OCFS2 filesystem. This is something that's apparently been done before >>> and certainly HA NFS isn't new. >>> >>> I've followed several HowTos, all of which seem to suggest pretty much >>> the same setup, but I seem to get the same problem no matter how I >>> configure it. >>> >>> Here's the setup: >>> >>> 2 VMWare VMs (NODE01 and NODE02) running RHEL4 U5 x86_64 with a shared >>> fibre channel SAN volume. That volume is OCFS2 formatted and mounted as >>> /data on both hosts. >>> >>> Each host has 2 interfaces. An external facing 10.0.0.0/24 and an >>> internal 192.168.100.0/30 >>> >>> Heartbeat is installed on both nodes and configured identically as >>> follows: >>> >>> =ha.cf= >>> keepalive 2 >>> deadtime 30 >>> warntime 10 >>> initdead 120 >>> bcast eth0 >>> ucast eth1 192.168.100.1 # Obviously on NODE02 this is 192.168.100.2 >>> auto_failback on >>> node NODE02 NODE01 >>> ping 10.0.0.133 # this is an unrelated LAN host >>> respawn hacluster /usr/lib64/heartbeat/ipfail >>> use_logd yes >>> crm off >>> =/ha.cf= >>> >>> =haresources= >>> NODE01 10.199.133.90 nfslock nfs_wrapper >>> =/haresources= >>> >>> BTW, crm is off because I tried it with it on and got EXACTLY the same >>> result. >>> >>> Here's the problem: >>> >>> Both hosts start up just fine, NODE01 picks up all 3 resources and >>> everything's roses. If I induce a failure on NODE01, NODE02 correctly >>> acquires the resources and everything is still roses. HOWEVER, ~30 >>> seconds later NODE02 reboots. Now the odd thing is, it doesn't matter >>> which is the primary node, or which host has the failure or even which >>> has the resources, NODE02 always reboots when there's a failure. >> How do you induce a failover? >>> Even if the resources are started on NODE02 and NODE01 has a failure (ie, >>> everything should stay as it is, no failover required) 30 seconds after >>> the failure NODE02 reboots!!! >>> >>> I've got NTP syncing the time, so it's not a clock issue, and I've tried >>> twiddling just about every setting in the config, to no avail. >>> >>> Any help? Please? >> Anything in the logs? Perhaps post them. >> Thanks, >> Dejan >>> -- >>> Regards >>> >>> Luke Pascoe >>> Linux Systems Engineer >>> Asterisk >>> >>> T 09 366 8835 >>> F 09 302 1772 >>> M 0274 266649 >>> E [EMAIL PROTECTED] >>> W www.asterisk.co.nz >>> >>> Level 9, Gen-i Tower, 66 Wyndham Street >>> PO Box 8804, Auckland, New Zealand >>> _______________________________________________ >>> Linux-HA mailing list >>> Linux-HA@lists.linux-ha.org >>> http://lists.linux-ha.org/mailman/listinfo/linux-ha >>> See also: http://linux-ha.org/ReportingProblems >> _______________________________________________ >> Linux-HA mailing list >> Linux-HA@lists.linux-ha.org >> http://lists.linux-ha.org/mailman/listinfo/linux-ha >> See also: http://linux-ha.org/ReportingProblems > logd[15280]: 2008/02/27_08:40:35 info: logd started with /etc/logd.cf. > logd[15281]: 2008/02/27_08:40:35 info: G_main_add_SignalHandler: Added signal > handler for signal 15 > logd[15280]: 2008/02/27_08:40:35 info: G_main_add_SignalHandler: Added signal > handler for signal 15 > heartbeat[15343]: 2008/02/27_08:40:35 info: Enabling logging daemon > heartbeat[15343]: 2008/02/27_08:40:35 info: logfile and debug file are those > specified in logd config file (default /etc/logd.cf) > heartbeat[15343]: 2008/02/27_08:40:35 ERROR: Current node [node01] not in > configuration! > heartbeat[15343]: 2008/02/27_08:40:35 info: By default, cluster nodes are > named by `uname -n` and must be declared with a 'node' directive in the ha.cf > file. > heartbeat[15343]: 2008/02/27_08:40:35 info: See also: > http://linux-ha.org/ha.cf/NodeDirective > heartbeat[15343]: 2008/02/27_08:40:35 ERROR: Configuration error, heartbeat > not started. > heartbeat[15489]: 2008/02/27_08:42:39 info: Enabling logging daemon > heartbeat[15489]: 2008/02/27_08:42:39 info: logfile and debug file are those > specified in logd config file (default /etc/logd.cf) > heartbeat[15489]: 2008/02/27_08:42:39 info: ************************** > heartbeat[15489]: 2008/02/27_08:42:39 info: Configuration validated. Starting > heartbeat 2.0.8 > heartbeat[15490]: 2008/02/27_08:42:39 info: heartbeat: version 2.0.8 > heartbeat[15490]: 2008/02/27_08:42:39 info: Heartbeat generation: 37 > heartbeat[15490]: 2008/02/27_08:42:39 info: G_main_add_TriggerHandler: Added > signal manual handler > heartbeat[15490]: 2008/02/27_08:42:39 info: G_main_add_TriggerHandler: Added > signal manual handler > heartbeat[15490]: 2008/02/27_08:42:39 info: Removing > /var/run/heartbeat/rsctmp failed, recreating. > heartbeat[15490]: 2008/02/27_08:42:39 info: glib: UDP Broadcast heartbeat > started on port 694 (694) interface eth0 > heartbeat[15490]: 2008/02/27_08:42:39 info: glib: UDP Broadcast heartbeat > closed on port 694 interface eth0 - Status: 1 > heartbeat[15490]: 2008/02/27_08:42:39 info: glib: ucast: write socket > priority set to IPTOS_LOWDELAY on eth1 > heartbeat[15490]: 2008/02/27_08:42:39 info: glib: ucast: bound send socket to > device: eth1 > heartbeat[15490]: 2008/02/27_08:42:39 info: glib: ucast: bound receive socket > to device: eth1 > heartbeat[15490]: 2008/02/27_08:42:39 info: glib: ucast: started on port 694 > interface eth1 to 192.168.100.2 > heartbeat[15490]: 2008/02/27_08:42:39 info: glib: ping heartbeat started. > heartbeat[15490]: 2008/02/27_08:42:39 info: G_main_add_SignalHandler: Added > signal handler for signal 17 > heartbeat[15490]: 2008/02/27_08:42:39 info: Local status now set to: 'up' > heartbeat[15490]: 2008/02/27_08:42:40 info: Link 10.0.0.133:10.0.0.133 up. > heartbeat[15490]: 2008/02/27_08:42:40 info: Status update for node > 10.0.0.133: status ping > heartbeat[15490]: 2008/02/27_08:42:40 info: Link node01:eth0 up. > heartbeat[15490]: 2008/02/27_08:42:45 info: Link node02:eth0 up. > heartbeat[15490]: 2008/02/27_08:42:45 info: Status update for node node02: > status up > heartbeat[15501]: 2008/02/27_08:42:45 debug: notify_world: setting SIGCHLD > Handler to SIG_DFL > heartbeat[15490]: 2008/02/27_08:42:45 info: Link node02:eth1 up. > harc[15501][15505]: 2008/02/27_08:42:45 info: Running /etc/ha.d/rc.d/status > status > heartbeat[15490]: 2008/02/27_08:42:45 debug: get_delnodelist: delnodelist= > heartbeat[15490]: 2008/02/27_08:42:45 info: Comm_now_up(): updating status to > active > heartbeat[15490]: 2008/02/27_08:42:45 info: Local status now set to: 'active' > heartbeat[15490]: 2008/02/27_08:42:45 info: Starting child client > "/usr/lib64/heartbeat/ipfail" (90,90) > heartbeat[15490]: 2008/02/27_08:42:45 WARN: G_CH_dispatch_int: Dispatch > function for read child took too long to execute: 90 ms (> 50 ms) (GSource: > 0x653408) > heartbeat[15508]: 2008/02/27_08:42:45 info: Starting > "/usr/lib64/heartbeat/ipfail" as uid 90 gid 90 (pid 15508) > heartbeat[15490]: 2008/02/27_08:42:45 info: Status update for node node02: > status active > heartbeat[15509]: 2008/02/27_08:42:45 debug: notify_world: setting SIGCHLD > Handler to SIG_DFL > harc[15509][15512]: 2008/02/27_08:42:45 info: Running /etc/ha.d/rc.d/status > status > heartbeat[15490]: 2008/02/27_08:42:56 info: remote resource transition > completed. > heartbeat[15490]: 2008/02/27_08:42:56 info: remote resource transition > completed. > heartbeat[15490]: 2008/02/27_08:42:56 info: Initial resource acquisition > complete (T_RESOURCES(us)) > heartbeat[15515]: 2008/02/27_08:42:56 info: No local resources > [/usr/lib64/heartbeat/ResourceManager listkeys node01] to acquire. > heartbeat[15490]: 2008/02/27_08:44:45 WARN: node 10.0.0.133: is dead > heartbeat[15490]: 2008/02/27_08:44:45 info: Link 10.0.0.133:10.0.0.133 dead. > heartbeat[15592]: 2008/02/27_08:44:45 debug: notify_world: setting SIGCHLD > Handler to SIG_DFL > harc[15592][15595]: 2008/02/27_08:44:45 info: Running /etc/ha.d/rc.d/status > status > heartbeat[15490]: 2008/02/27_08:44:46 info: Link node02:eth0 dead. > heartbeat[15490]: 2008/02/27_08:44:56 info: node01 wants to go standby [all] > heartbeat[15490]: 2008/02/27_08:44:57 info: standby: node02 can take our all > resources > heartbeat[15604]: 2008/02/27_08:44:57 info: give up all HA resources > (standby). > ResourceManager[15614][15622]: 2008/02/27_08:44:57 info: Releasing resource > group: node02 10.0.0.90 nfslock nfs_wrapper > ResourceManager[15614][15632]: 2008/02/27_08:44:57 info: Running > /etc/init.d/nfs_wrapper stop > ResourceManager[15614][15633]: 2008/02/27_08:44:57 debug: Starting > /etc/init.d/nfs_wrapper stop > heartbeat[15490]: 2008/02/27_08:46:22 WARN: node node02: is dead > heartbeat[15490]: 2008/02/27_08:46:22 info: Cancelling pending standby > operation > heartbeat[15490]: 2008/02/27_08:46:22 WARN: No STONITH device configured. > heartbeat[15490]: 2008/02/27_08:46:22 WARN: Shared disks are not protected. > heartbeat[15490]: 2008/02/27_08:46:22 info: Resources being acquired from > node02. > heartbeat[15490]: 2008/02/27_08:46:22 debug: StartNextRemoteRscReq(): child > count 1 > heartbeat[15490]: 2008/02/27_08:46:22 info: Link node02:eth1 dead. > heartbeat[15710]: 2008/02/27_08:46:23 info: No local resources > [/usr/lib64/heartbeat/ResourceManager listkeys node01] to acquire. > heartbeat[15490]: 2008/02/27_08:46:23 debug: StartNextRemoteRscReq(): child > count 1 > heartbeat[15490]: 2008/02/27_08:46:59 info: Link 10.0.0.133:10.0.0.133 up. > heartbeat[15490]: 2008/02/27_08:46:59 WARN: Late heartbeat: Node 10.0.0.133: > interval 164130 ms > heartbeat[15490]: 2008/02/27_08:46:59 info: Status update for node > 10.0.0.133: status ping > ResourceManager[15614][15749]: 2008/02/27_08:47:01 debug: > /etc/init.d/nfs_wrapper stop done. RC=0 > ResourceManager[15614][15765]: 2008/02/27_08:47:01 info: Running > /etc/init.d/nfslock stop > ResourceManager[15614][15766]: 2008/02/27_08:47:02 debug: Starting > /etc/init.d/nfslock stop > ResourceManager[15614][15804]: 2008/02/27_08:47:03 debug: /etc/init.d/nfslock > stop done. RC=0 > ResourceManager[15614][15820]: 2008/02/27_08:47:04 info: Running > /etc/ha.d/resource.d/IPaddr 10.0.0.90 stop > ResourceManager[15614][15821]: 2008/02/27_08:47:04 debug: Starting > /etc/ha.d/resource.d/IPaddr 10.0.0.90 stop > IPaddr[15822][15839]: 2008/02/27_08:47:04 INFO: Success > ResourceManager[15614][15840]: 2008/02/27_08:47:04 debug: > /etc/ha.d/resource.d/IPaddr 10.0.0.90 stop done. RC=0 > heartbeat[15604]: 2008/02/27_08:47:04 info: all HA resource release completed > (standby). > heartbeat[15490]: 2008/02/27_08:47:04 ERROR: Ignored standby message 'done' > from node01 in state 0 > heartbeat[15490]: 2008/02/27_08:47:04 WARN: G_SIG_dispatch: Dispatch function > for SIGCHLD took too long to execute: 40 ms (> 10 ms) (GSource: 0x64ab68) > heartbeat[15841]: 2008/02/27_08:47:04 debug: notify_world: setting SIGCHLD > Handler to SIG_DFL > harc[15841][15844]: 2008/02/27_08:47:04 info: Running /etc/ha.d/rc.d/status > status > mach_down[15847][15862]: 2008/02/27_08:47:05 info: Taking over resource group > 10.0.0.90 > ResourceManager[15863][15871]: 2008/02/27_08:47:05 info: Acquiring resource > group: node02 10.0.0.90 nfslock nfs_wrapper > IPaddr[15883][15900]: 2008/02/27_08:47:06 INFO: Resource is stopped > ResourceManager[15863][15916]: 2008/02/27_08:47:06 info: Running > /etc/ha.d/resource.d/IPaddr 10.0.0.90 start > ResourceManager[15863][15917]: 2008/02/27_08:47:06 debug: Starting > /etc/ha.d/resource.d/IPaddr 10.0.0.90 start > IPaddr[15927][15936]: 2008/02/27_08:47:07 INFO: Using calculated nic for > 10.0.0.90: eth0 > IPaddr[15927][15941]: 2008/02/27_08:47:07 DEBUG: Using calculated netmask for > 10.0.0.90: 255.255.255.192 > IPaddr[15927][15946]: 2008/02/27_08:47:07 DEBUG: Using calculated broadcast > for 10.0.0.90: 10.0.0.127 > IPaddr[15927][15963]: 2008/02/27_08:47:07 INFO: eval /sbin/ifconfig eth0:0 > 10.0.0.90 netmask 255.255.255.192 broadcast 10.0.0.127 > IPaddr[15927][15968]: 2008/02/27_08:47:08 DEBUG: Sending Gratuitous Arp for > 10.0.0.90 on eth0:0 [eth0] > IPaddr[15918][15982]: 2008/02/27_08:47:08 INFO: Success > ResourceManager[15863][15983]: 2008/02/27_08:47:08 debug: > /etc/ha.d/resource.d/IPaddr 10.0.0.90 start done. RC=0 > ResourceManager[15863][16014]: 2008/02/27_08:47:09 info: Running > /etc/init.d/nfslock start > ResourceManager[15863][16015]: 2008/02/27_08:47:09 debug: Starting > /etc/init.d/nfslock start > ResourceManager[15863][16035]: 2008/02/27_08:47:10 debug: /etc/init.d/nfslock > start done. RC=0 > ResourceManager[15863][16070]: 2008/02/27_08:47:10 info: Running > /etc/init.d/nfs_wrapper start > ResourceManager[15863][16071]: 2008/02/27_08:47:10 debug: Starting > /etc/init.d/nfs_wrapper start > ResourceManager[15863][16118]: 2008/02/27_08:47:11 debug: > /etc/init.d/nfs_wrapper start done. RC=0 > mach_down[15847][16119]: 2008/02/27_08:47:11 info: > /usr/lib64/heartbeat/mach_down: nice_failback: foreign resources acquired > mach_down[15847][16123]: 2008/02/27_08:47:11 info: mach_down takeover > complete for node node02. > heartbeat[15490]: 2008/02/27_08:47:11 info: mach_down takeover complete. > logd[23277]: 2008/02/27_08:42:43 info: logd started with /etc/logd.cf. > logd[23280]: 2008/02/27_08:42:44 info: G_main_add_SignalHandler: Added signal > handler for signal 15 > logd[23277]: 2008/02/27_08:42:44 info: G_main_add_SignalHandler: Added signal > handler for signal 15 > heartbeat[23340]: 2008/02/27_08:42:44 info: Enabling logging daemon > heartbeat[23340]: 2008/02/27_08:42:44 info: logfile and debug file are those > specified in logd config file (default /etc/logd.cf) > heartbeat[23340]: 2008/02/27_08:42:44 info: ************************** > heartbeat[23340]: 2008/02/27_08:42:44 info: Configuration validated. Starting > heartbeat 2.0.8 > heartbeat[23341]: 2008/02/27_08:42:44 info: heartbeat: version 2.0.8 > heartbeat[23341]: 2008/02/27_08:42:44 info: Heartbeat generation: 35 > heartbeat[23341]: 2008/02/27_08:42:44 info: G_main_add_TriggerHandler: Added > signal manual handler > heartbeat[23341]: 2008/02/27_08:42:44 info: G_main_add_TriggerHandler: Added > signal manual handler > heartbeat[23341]: 2008/02/27_08:42:44 info: Removing > /var/run/heartbeat/rsctmp failed, recreating. > heartbeat[23341]: 2008/02/27_08:42:44 info: glib: UDP Broadcast heartbeat > started on port 694 (694) interface eth0 > heartbeat[23341]: 2008/02/27_08:42:44 info: glib: UDP Broadcast heartbeat > closed on port 694 interface eth0 - Status: 1 > heartbeat[23341]: 2008/02/27_08:42:44 info: glib: ucast: write socket > priority set to IPTOS_LOWDELAY on eth1 > heartbeat[23341]: 2008/02/27_08:42:44 info: glib: ucast: bound send socket to > device: eth1 > heartbeat[23341]: 2008/02/27_08:42:44 info: glib: ucast: bound receive socket > to device: eth1 > heartbeat[23341]: 2008/02/27_08:42:44 info: glib: ucast: started on port 694 > interface eth1 to 192.168.100.1 > heartbeat[23341]: 2008/02/27_08:42:44 info: glib: ping heartbeat started. > heartbeat[23341]: 2008/02/27_08:42:45 info: G_main_add_SignalHandler: Added > signal handler for signal 17 > heartbeat[23341]: 2008/02/27_08:42:45 info: Local status now set to: 'up' > heartbeat[23341]: 2008/02/27_08:42:46 info: Link node01:eth0 up. > heartbeat[23341]: 2008/02/27_08:42:46 info: Status update for node node01: > status up > heartbeat[23341]: 2008/02/27_08:42:46 info: Link node02:eth0 up. > heartbeat[23341]: 2008/02/27_08:42:46 info: Link 10.0.0.133:10.0.0.133 up. > heartbeat[23341]: 2008/02/27_08:42:46 info: Status update for node > 10.0.0.133: status ping > heartbeat[23341]: 2008/02/27_08:42:46 debug: get_delnodelist: delnodelist= > heartbeat[23352]: 2008/02/27_08:42:46 debug: notify_world: setting SIGCHLD > Handler to SIG_DFL > heartbeat[23341]: 2008/02/27_08:42:46 info: Comm_now_up(): updating status to > active > heartbeat[23341]: 2008/02/27_08:42:46 info: Local status now set to: 'active' > heartbeat[23341]: 2008/02/27_08:42:46 info: Starting child client > "/usr/lib64/heartbeat/ipfail" (90,90) > heartbeat[23341]: 2008/02/27_08:42:47 WARN: G_CH_dispatch_int: Dispatch > function for read child took too long to execute: 90 ms (> 50 ms) (GSource: > 0x653408) > heartbeat[23341]: 2008/02/27_08:42:47 info: Status update for node node01: > status active > heartbeat[23341]: 2008/02/27_08:42:47 debug: StartNextRemoteRscReq(): child > count 1 > heartbeat[23341]: 2008/02/27_08:42:47 info: Link node01:eth1 up. > heartbeat[23356]: 2008/02/27_08:42:47 info: Starting > "/usr/lib64/heartbeat/ipfail" as uid 90 gid 90 (pid 23356) > harc[23352][23357]: 2008/02/27_08:42:47 info: Running /etc/ha.d/rc.d/status > status > heartbeat[23360]: 2008/02/27_08:42:47 debug: notify_world: setting SIGCHLD > Handler to SIG_DFL > harc[23360][23363]: 2008/02/27_08:42:47 info: Running /etc/ha.d/rc.d/status > status > heartbeat[23341]: 2008/02/27_08:42:57 info: local resource transition > completed. > heartbeat[23341]: 2008/02/27_08:42:57 info: Initial resource acquisition > complete (T_RESOURCES(us)) > heartbeat[23341]: 2008/02/27_08:42:57 info: remote resource transition > completed. > IPaddr[23393][23410]: 2008/02/27_08:42:57 INFO: Resource is stopped > heartbeat[23366]: 2008/02/27_08:42:57 info: Local Resource acquisition > completed. > heartbeat[23341]: 2008/02/27_08:42:57 debug: StartNextRemoteRscReq(): child > count 1 > heartbeat[23414]: 2008/02/27_08:42:57 debug: notify_world: setting SIGCHLD > Handler to SIG_DFL > harc[23414][23417]: 2008/02/27_08:42:57 info: Running > /etc/ha.d/rc.d/ip-request-resp ip-request-resp > ip-request-resp[23414][23420]: 2008/02/27_08:42:57 received ip-request-resp > 10.0.0.90 OK yes > ResourceManager[23421][23429]: 2008/02/27_08:42:57 info: Acquiring resource > group: node02 10.0.0.90 nfslock nfs_wrapper > IPaddr[23441][23458]: 2008/02/27_08:42:58 INFO: Resource is stopped > ResourceManager[23421][23474]: 2008/02/27_08:42:58 info: Running > /etc/ha.d/resource.d/IPaddr 10.0.0.90 start > ResourceManager[23421][23475]: 2008/02/27_08:42:58 debug: Starting > /etc/ha.d/resource.d/IPaddr 10.0.0.90 start > IPaddr[23485][23494]: 2008/02/27_08:42:58 INFO: Using calculated nic for > 10.0.0.90: eth0 > IPaddr[23485][23499]: 2008/02/27_08:42:58 DEBUG: Using calculated netmask for > 10.0.0.90: 255.255.255.192 > IPaddr[23485][23504]: 2008/02/27_08:42:58 DEBUG: Using calculated broadcast > for 10.0.0.90: 10.0.0.127 > IPaddr[23485][23521]: 2008/02/27_08:42:58 INFO: eval /sbin/ifconfig eth0:0 > 10.0.0.90 netmask 255.255.255.192 broadcast 10.0.0.127 > IPaddr[23485][23526]: 2008/02/27_08:42:58 DEBUG: Sending Gratuitous Arp for > 10.0.0.90 on eth0:0 [eth0] > IPaddr[23476][23540]: 2008/02/27_08:42:59 INFO: Success > ResourceManager[23421][23541]: 2008/02/27_08:42:59 debug: > /etc/ha.d/resource.d/IPaddr 10.0.0.90 start done. RC=0 > ResourceManager[23421][23572]: 2008/02/27_08:42:59 info: Running > /etc/init.d/nfslock start > ResourceManager[23421][23573]: 2008/02/27_08:42:59 debug: Starting > /etc/init.d/nfslock start > ResourceManager[23421][23593]: 2008/02/27_08:42:59 debug: /etc/init.d/nfslock > start done. RC=0 > ResourceManager[23421][23628]: 2008/02/27_08:43:00 info: Running > /etc/init.d/nfs_wrapper start > ResourceManager[23421][23629]: 2008/02/27_08:43:00 debug: Starting > /etc/init.d/nfs_wrapper start > ResourceManager[23421][23676]: 2008/02/27_08:43:01 debug: > /etc/init.d/nfs_wrapper start done. RC=0 > heartbeat[23341]: 2008/02/27_08:44:46 info: Link node01:eth0 dead. > heartbeat[23341]: 2008/02/27_08:44:58 info: node01 wants to go standby [all] > logd[4645]: 2008/02/27_08:48:48 info: logd started with /etc/logd.cf. > logd[4650]: 2008/02/27_08:48:48 info: G_main_add_SignalHandler: Added signal > handler for signal 15 > logd[4645]: 2008/02/27_08:48:48 info: G_main_add_SignalHandler: Added signal > handler for signal 15 > ... > logd[15280]: 2008/02/27_08:40:35 info: logd started with /etc/logd.cf. > logd[15281]: 2008/02/27_08:40:35 info: G_main_add_SignalHandler: Added signal > handler for signal 15 > logd[15280]: 2008/02/27_08:40:35 info: G_main_add_SignalHandler: Added signal > handler for signal 15 > heartbeat[15343]: 2008/02/27_08:40:35 info: Enabling logging daemon > heartbeat[15343]: 2008/02/27_08:40:35 info: logfile and debug file are those > specified in logd config file (default /etc/logd.cf) > heartbeat[15343]: 2008/02/27_08:40:35 ERROR: Current node [node01] not in > configuration! > heartbeat[15343]: 2008/02/27_08:40:35 info: By default, cluster nodes are > named by `uname -n` and must be declared with a 'node' directive in the ha.cf > file. > heartbeat[15343]: 2008/02/27_08:40:35 info: See also: > http://linux-ha.org/ha.cf/NodeDirective > heartbeat[15343]: 2008/02/27_08:40:35 ERROR: Configuration error, heartbeat > not started. > heartbeat[15489]: 2008/02/27_08:42:39 info: Enabling logging daemon > heartbeat[15489]: 2008/02/27_08:42:39 info: logfile and debug file are those > specified in logd config file (default /etc/logd.cf) > heartbeat[15489]: 2008/02/27_08:42:39 info: ************************** > heartbeat[15489]: 2008/02/27_08:42:39 info: Configuration validated. Starting > heartbeat 2.0.8 > heartbeat[15490]: 2008/02/27_08:42:39 info: heartbeat: version 2.0.8 > heartbeat[15490]: 2008/02/27_08:42:39 info: Heartbeat generation: 37 > heartbeat[15490]: 2008/02/27_08:42:39 info: G_main_add_TriggerHandler: Added > signal manual handler > heartbeat[15490]: 2008/02/27_08:42:39 info: G_main_add_TriggerHandler: Added > signal manual handler > heartbeat[15490]: 2008/02/27_08:42:39 info: Removing > /var/run/heartbeat/rsctmp failed, recreating. > heartbeat[15490]: 2008/02/27_08:42:39 info: glib: UDP Broadcast heartbeat > started on port 694 (694) interface eth0 > heartbeat[15490]: 2008/02/27_08:42:39 info: glib: UDP Broadcast heartbeat > closed on port 694 interface eth0 - Status: 1 > heartbeat[15490]: 2008/02/27_08:42:39 info: glib: ucast: write socket > priority set to IPTOS_LOWDELAY on eth1 > heartbeat[15490]: 2008/02/27_08:42:39 info: glib: ucast: bound send socket to > device: eth1 > heartbeat[15490]: 2008/02/27_08:42:39 info: glib: ucast: bound receive socket > to device: eth1 > heartbeat[15490]: 2008/02/27_08:42:39 info: glib: ucast: started on port 694 > interface eth1 to 192.168.100.2 > heartbeat[15490]: 2008/02/27_08:42:39 info: glib: ping heartbeat started. > heartbeat[15490]: 2008/02/27_08:42:39 info: G_main_add_SignalHandler: Added > signal handler for signal 17 > heartbeat[15490]: 2008/02/27_08:42:39 info: Local status now set to: 'up' > heartbeat[15490]: 2008/02/27_08:42:40 info: Link 10.0.0.133:10.0.0.133 up. > heartbeat[15490]: 2008/02/27_08:42:40 info: Status update for node > 10.0.0.133: status ping > heartbeat[15490]: 2008/02/27_08:42:40 info: Link node01:eth0 up. > heartbeat[15490]: 2008/02/27_08:42:45 info: Link node02:eth0 up. > heartbeat[15490]: 2008/02/27_08:42:45 info: Status update for node node02: > status up > heartbeat[15490]: 2008/02/27_08:42:45 info: Link node02:eth1 up. > harc[15501][15505]: 2008/02/27_08:42:45 info: Running /etc/ha.d/rc.d/status > status > heartbeat[15490]: 2008/02/27_08:42:45 info: Comm_now_up(): updating status to > active > heartbeat[15490]: 2008/02/27_08:42:45 info: Local status now set to: 'active' > heartbeat[15490]: 2008/02/27_08:42:45 info: Starting child client > "/usr/lib64/heartbeat/ipfail" (90,90) > heartbeat[15490]: 2008/02/27_08:42:45 WARN: G_CH_dispatch_int: Dispatch > function for read child took too long to execute: 90 ms (> 50 ms) (GSource: > 0x653408) > heartbeat[15508]: 2008/02/27_08:42:45 info: Starting > "/usr/lib64/heartbeat/ipfail" as uid 90 gid 90 (pid 15508) > heartbeat[15490]: 2008/02/27_08:42:45 info: Status update for node node02: > status active > harc[15509][15512]: 2008/02/27_08:42:45 info: Running /etc/ha.d/rc.d/status > status > heartbeat[15490]: 2008/02/27_08:42:56 info: remote resource transition > completed. > heartbeat[15490]: 2008/02/27_08:42:56 info: remote resource transition > completed. > heartbeat[15490]: 2008/02/27_08:42:56 info: Initial resource acquisition > complete (T_RESOURCES(us)) > heartbeat[15515]: 2008/02/27_08:42:56 info: No local resources > [/usr/lib64/heartbeat/ResourceManager listkeys node01] to acquire. > heartbeat[15490]: 2008/02/27_08:44:45 WARN: node 10.0.0.133: is dead > heartbeat[15490]: 2008/02/27_08:44:45 info: Link 10.0.0.133:10.0.0.133 dead. > harc[15592][15595]: 2008/02/27_08:44:45 info: Running /etc/ha.d/rc.d/status > status > heartbeat[15490]: 2008/02/27_08:44:46 info: Link node02:eth0 dead. > heartbeat[15490]: 2008/02/27_08:44:56 info: node01 wants to go standby [all] > heartbeat[15490]: 2008/02/27_08:44:57 info: standby: node02 can take our all > resources > heartbeat[15604]: 2008/02/27_08:44:57 info: give up all HA resources > (standby). > ResourceManager[15614][15622]: 2008/02/27_08:44:57 info: Releasing resource > group: node02 10.0.0.90 nfslock nfs_wrapper > ResourceManager[15614][15632]: 2008/02/27_08:44:57 info: Running > /etc/init.d/nfs_wrapper stop > heartbeat[15490]: 2008/02/27_08:46:22 WARN: node node02: is dead > heartbeat[15490]: 2008/02/27_08:46:22 info: Cancelling pending standby > operation > heartbeat[15490]: 2008/02/27_08:46:22 WARN: No STONITH device configured. > heartbeat[15490]: 2008/02/27_08:46:22 WARN: Shared disks are not protected. > heartbeat[15490]: 2008/02/27_08:46:22 info: Resources being acquired from > node02. > heartbeat[15490]: 2008/02/27_08:46:22 info: Link node02:eth1 dead. > heartbeat[15710]: 2008/02/27_08:46:23 info: No local resources > [/usr/lib64/heartbeat/ResourceManager listkeys node01] to acquire. > heartbeat[15490]: 2008/02/27_08:46:59 info: Link 10.0.0.133:10.0.0.133 up. > heartbeat[15490]: 2008/02/27_08:46:59 WARN: Late heartbeat: Node 10.0.0.133: > interval 164130 ms > heartbeat[15490]: 2008/02/27_08:46:59 info: Status update for node > 10.0.0.133: status ping > ResourceManager[15614][15765]: 2008/02/27_08:47:01 info: Running > /etc/init.d/nfslock stop > ResourceManager[15614][15820]: 2008/02/27_08:47:04 info: Running > /etc/ha.d/resource.d/IPaddr 10.0.0.90 stop > IPaddr[15822][15839]: 2008/02/27_08:47:04 INFO: Success > heartbeat[15604]: 2008/02/27_08:47:04 info: all HA resource release completed > (standby). > heartbeat[15490]: 2008/02/27_08:47:04 ERROR: Ignored standby message 'done' > from node01 in state 0 > heartbeat[15490]: 2008/02/27_08:47:04 WARN: G_SIG_dispatch: Dispatch function > for SIGCHLD took too long to execute: 40 ms (> 10 ms) (GSource: 0x64ab68) > harc[15841][15844]: 2008/02/27_08:47:04 info: Running /etc/ha.d/rc.d/status > status > mach_down[15847][15862]: 2008/02/27_08:47:05 info: Taking over resource group > 10.0.0.90 > ResourceManager[15863][15871]: 2008/02/27_08:47:05 info: Acquiring resource > group: node02 10.0.0.90 nfslock nfs_wrapper > IPaddr[15883][15900]: 2008/02/27_08:47:06 INFO: Resource is stopped > ResourceManager[15863][15916]: 2008/02/27_08:47:06 info: Running > /etc/ha.d/resource.d/IPaddr 10.0.0.90 start > IPaddr[15927][15936]: 2008/02/27_08:47:07 INFO: Using calculated nic for > 10.0.0.90: eth0 > IPaddr[15927][15941]: 2008/02/27_08:47:07 DEBUG: Using calculated netmask for > 10.0.0.90: 255.255.255.192 > IPaddr[15927][15946]: 2008/02/27_08:47:07 DEBUG: Using calculated broadcast > for 10.0.0.90: 10.0.0.127 > IPaddr[15927][15963]: 2008/02/27_08:47:07 INFO: eval /sbin/ifconfig eth0:0 > 10.0.0.90 netmask 255.255.255.192 broadcast 10.0.0.127 > IPaddr[15927][15968]: 2008/02/27_08:47:08 DEBUG: Sending Gratuitous Arp for > 10.0.0.90 on eth0:0 [eth0] > IPaddr[15918][15982]: 2008/02/27_08:47:08 INFO: Success > ResourceManager[15863][16014]: 2008/02/27_08:47:09 info: Running > /etc/init.d/nfslock start > ResourceManager[15863][16070]: 2008/02/27_08:47:10 info: Running > /etc/init.d/nfs_wrapper start > mach_down[15847][16119]: 2008/02/27_08:47:11 info: > /usr/lib64/heartbeat/mach_down: nice_failback: foreign resources acquired > mach_down[15847][16123]: 2008/02/27_08:47:11 info: mach_down takeover > complete for node node02. > heartbeat[15490]: 2008/02/27_08:47:11 info: mach_down takeover complete. > logd[23277]: 2008/02/27_08:42:43 info: logd started with /etc/logd.cf. > logd[23280]: 2008/02/27_08:42:44 info: G_main_add_SignalHandler: Added signal > handler for signal 15 > logd[23277]: 2008/02/27_08:42:44 info: G_main_add_SignalHandler: Added signal > handler for signal 15 > heartbeat[23340]: 2008/02/27_08:42:44 info: Enabling logging daemon > heartbeat[23340]: 2008/02/27_08:42:44 info: logfile and debug file are those > specified in logd config file (default /etc/logd.cf) > heartbeat[23340]: 2008/02/27_08:42:44 info: ************************** > heartbeat[23340]: 2008/02/27_08:42:44 info: Configuration validated. Starting > heartbeat 2.0.8 > heartbeat[23341]: 2008/02/27_08:42:44 info: heartbeat: version 2.0.8 > heartbeat[23341]: 2008/02/27_08:42:44 info: Heartbeat generation: 35 > heartbeat[23341]: 2008/02/27_08:42:44 info: G_main_add_TriggerHandler: Added > signal manual handler > heartbeat[23341]: 2008/02/27_08:42:44 info: G_main_add_TriggerHandler: Added > signal manual handler > heartbeat[23341]: 2008/02/27_08:42:44 info: Removing > /var/run/heartbeat/rsctmp failed, recreating. > heartbeat[23341]: 2008/02/27_08:42:44 info: glib: UDP Broadcast heartbeat > started on port 694 (694) interface eth0 > heartbeat[23341]: 2008/02/27_08:42:44 info: glib: UDP Broadcast heartbeat > closed on port 694 interface eth0 - Status: 1 > heartbeat[23341]: 2008/02/27_08:42:44 info: glib: ucast: write socket > priority set to IPTOS_LOWDELAY on eth1 > heartbeat[23341]: 2008/02/27_08:42:44 info: glib: ucast: bound send socket to > device: eth1 > heartbeat[23341]: 2008/02/27_08:42:44 info: glib: ucast: bound receive socket > to device: eth1 > heartbeat[23341]: 2008/02/27_08:42:44 info: glib: ucast: started on port 694 > interface eth1 to 192.168.100.1 > heartbeat[23341]: 2008/02/27_08:42:44 info: glib: ping heartbeat started. > heartbeat[23341]: 2008/02/27_08:42:45 info: G_main_add_SignalHandler: Added > signal handler for signal 17 > heartbeat[23341]: 2008/02/27_08:42:45 info: Local status now set to: 'up' > heartbeat[23341]: 2008/02/27_08:42:46 info: Link node01:eth0 up. > heartbeat[23341]: 2008/02/27_08:42:46 info: Status update for node node01: > status up > heartbeat[23341]: 2008/02/27_08:42:46 info: Link node02:eth0 up. > heartbeat[23341]: 2008/02/27_08:42:46 info: Link 10.0.0.133:10.0.0.133 up. > heartbeat[23341]: 2008/02/27_08:42:46 info: Status update for node > 10.0.0.133: status ping > heartbeat[23341]: 2008/02/27_08:42:46 info: Comm_now_up(): updating status to > active > heartbeat[23341]: 2008/02/27_08:42:46 info: Local status now set to: 'active' > heartbeat[23341]: 2008/02/27_08:42:46 info: Starting child client > "/usr/lib64/heartbeat/ipfail" (90,90) > heartbeat[23341]: 2008/02/27_08:42:47 WARN: G_CH_dispatch_int: Dispatch > function for read child took too long to execute: 90 ms (> 50 ms) (GSource: > 0x653408) > heartbeat[23341]: 2008/02/27_08:42:47 info: Status update for node node01: > status active > heartbeat[23341]: 2008/02/27_08:42:47 info: Link node01:eth1 up. > heartbeat[23356]: 2008/02/27_08:42:47 info: Starting > "/usr/lib64/heartbeat/ipfail" as uid 90 gid 90 (pid 23356) > harc[23352][23357]: 2008/02/27_08:42:47 info: Running /etc/ha.d/rc.d/status > status > harc[23360][23363]: 2008/02/27_08:42:47 info: Running /etc/ha.d/rc.d/status > status > heartbeat[23341]: 2008/02/27_08:42:57 info: local resource transition > completed. > heartbeat[23341]: 2008/02/27_08:42:57 info: Initial resource acquisition > complete (T_RESOURCES(us)) > heartbeat[23341]: 2008/02/27_08:42:57 info: remote resource transition > completed. > IPaddr[23393][23410]: 2008/02/27_08:42:57 INFO: Resource is stopped > heartbeat[23366]: 2008/02/27_08:42:57 info: Local Resource acquisition > completed. > harc[23414][23417]: 2008/02/27_08:42:57 info: Running > /etc/ha.d/rc.d/ip-request-resp ip-request-resp > ip-request-resp[23414][23420]: 2008/02/27_08:42:57 received ip-request-resp > 10.0.0.90 OK yes > ResourceManager[23421][23429]: 2008/02/27_08:42:57 info: Acquiring resource > group: node02 10.0.0.90 nfslock nfs_wrapper > IPaddr[23441][23458]: 2008/02/27_08:42:58 INFO: Resource is stopped > ResourceManager[23421][23474]: 2008/02/27_08:42:58 info: Running > /etc/ha.d/resource.d/IPaddr 10.0.0.90 start > IPaddr[23485][23494]: 2008/02/27_08:42:58 INFO: Using calculated nic for > 10.0.0.90: eth0 > IPaddr[23485][23499]: 2008/02/27_08:42:58 DEBUG: Using calculated netmask for > 10.0.0.90: 255.255.255.192 > IPaddr[23485][23504]: 2008/02/27_08:42:58 DEBUG: Using calculated broadcast > for 10.0.0.90: 10.0.0.127 > IPaddr[23485][23521]: 2008/02/27_08:42:58 INFO: eval /sbin/ifconfig eth0:0 > 10.0.0.90 netmask 255.255.255.192 broadcast 10.0.0.127 > IPaddr[23485][23526]: 2008/02/27_08:42:58 DEBUG: Sending Gratuitous Arp for > 10.0.0.90 on eth0:0 [eth0] > IPaddr[23476][23540]: 2008/02/27_08:42:59 INFO: Success > ResourceManager[23421][23572]: 2008/02/27_08:42:59 info: Running > /etc/init.d/nfslock start > ResourceManager[23421][23628]: 2008/02/27_08:43:00 info: Running > /etc/init.d/nfs_wrapper start > heartbeat[23341]: 2008/02/27_08:44:46 info: Link node01:eth0 dead. > heartbeat[23341]: 2008/02/27_08:44:58 info: node01 wants to go standby [all] > logd[4645]: 2008/02/27_08:48:48 info: logd started with /etc/logd.cf. > logd[4650]: 2008/02/27_08:48:48 info: G_main_add_SignalHandler: Added signal > handler for signal 15 > logd[4645]: 2008/02/27_08:48:48 info: G_main_add_SignalHandler: Added signal > handler for signal 15 > ... > _______________________________________________ > Linux-HA mailing list > Linux-HA@lists.linux-ha.org > http://lists.linux-ha.org/mailman/listinfo/linux-ha > See also: http://linux-ha.org/ReportingProblems _______________________________________________ Linux-HA mailing list Linux-HA@lists.linux-ha.org http://lists.linux-ha.org/mailman/listinfo/linux-ha See also: http://linux-ha.org/ReportingProblems