> How do you induce a failover?

Right now just the simplest kind, a network failure. I'm simply disconnecting the LAN interface on one of the hosts.

It doesn't matter which host I disconnect, 02 reboots.

> Anything in the logs? Perhaps post them.

Nothing obvious. Attached are the ha-log and ha-debug files from NODE01 and NODE02. NODE02 is primary and has the resources, I disconnect eth0 on NODE01, NODE02 acquires it's resources (there are none) and then promptly reboots.

:(

Regards

Luke Pascoe
Linux Systems Engineer
Asterisk

T 09 366 8835
F 09 302 1772
M 0274 266649
E [EMAIL PROTECTED]
W www.asterisk.co.nz

Level 9, Gen-i Tower, 66 Wyndham Street
PO Box 8804, Auckland, New Zealand


Dejan Muhamedagic wrote:
Hi,

On Tue, Feb 26, 2008 at 04:35:06PM +1300, Luke Pascoe wrote:
Hello

I'm trying to do NFS failover in a test environment with an underlying OCFS2 filesystem. This is something that's apparently been done before and certainly HA NFS isn't new.

I've followed several HowTos, all of which seem to suggest pretty much the same setup, but I seem to get the same problem no matter how I configure it.

Here's the setup:

2 VMWare VMs (NODE01 and NODE02) running RHEL4 U5 x86_64 with a shared fibre channel SAN volume. That volume is OCFS2 formatted and mounted as /data on both hosts.

Each host has 2 interfaces. An external facing 10.0.0.0/24 and an internal 192.168.100.0/30

Heartbeat is installed on both nodes and configured identically as follows:

=ha.cf=
keepalive 2
deadtime 30
warntime 10
initdead 120
bcast   eth0
ucast eth1 192.168.100.1 # Obviously on NODE02 this is 192.168.100.2
auto_failback on
node NODE02 NODE01
 ping 10.0.0.133 # this is an unrelated LAN host
respawn hacluster /usr/lib64/heartbeat/ipfail
   use_logd yes
 crm off
=/ha.cf=

=haresources=
NODE01 10.199.133.90 nfslock nfs_wrapper
=/haresources=

BTW, crm is off because I tried it with it on and got EXACTLY the same result.

Here's the problem:

Both hosts start up just fine, NODE01 picks up all 3 resources and everything's roses. If I induce a failure on NODE01, NODE02 correctly acquires the resources and everything is still roses. HOWEVER, ~30 seconds later NODE02 reboots. Now the odd thing is, it doesn't matter which is the primary node, or which host has the failure or even which has the resources, NODE02 always reboots when there's a failure.

How do you induce a failover?

Even if the resources are started on NODE02 and NODE01 has a failure (ie, everything should stay as it is, no failover required) 30 seconds after the failure NODE02 reboots!!!

I've got NTP syncing the time, so it's not a clock issue, and I've tried twiddling just about every setting in the config, to no avail.

Any help? Please?

Anything in the logs? Perhaps post them.

Thanks,

Dejan

--
Regards

Luke Pascoe
Linux Systems Engineer
Asterisk

T 09 366 8835
F 09 302 1772
M 0274 266649
E [EMAIL PROTECTED]
W www.asterisk.co.nz

Level 9, Gen-i Tower, 66 Wyndham Street
PO Box 8804, Auckland, New Zealand
_______________________________________________
Linux-HA mailing list
Linux-HA@lists.linux-ha.org
http://lists.linux-ha.org/mailman/listinfo/linux-ha
See also: http://linux-ha.org/ReportingProblems
_______________________________________________
Linux-HA mailing list
Linux-HA@lists.linux-ha.org
http://lists.linux-ha.org/mailman/listinfo/linux-ha
See also: http://linux-ha.org/ReportingProblems

logd[15280]: 2008/02/27_08:40:35 info: logd started with /etc/logd.cf.
logd[15281]: 2008/02/27_08:40:35 info: G_main_add_SignalHandler: Added signal 
handler for signal 15
logd[15280]: 2008/02/27_08:40:35 info: G_main_add_SignalHandler: Added signal 
handler for signal 15
heartbeat[15343]: 2008/02/27_08:40:35 info: Enabling logging daemon 
heartbeat[15343]: 2008/02/27_08:40:35 info: logfile and debug file are those 
specified in logd config file (default /etc/logd.cf)
heartbeat[15343]: 2008/02/27_08:40:35 ERROR: Current node [node01] not in 
configuration!
heartbeat[15343]: 2008/02/27_08:40:35 info: By default, cluster nodes are named 
by `uname -n` and must be declared with a 'node' directive in the ha.cf file.
heartbeat[15343]: 2008/02/27_08:40:35 info: See also: 
http://linux-ha.org/ha.cf/NodeDirective
heartbeat[15343]: 2008/02/27_08:40:35 ERROR: Configuration error, heartbeat not 
started.
heartbeat[15489]: 2008/02/27_08:42:39 info: Enabling logging daemon 
heartbeat[15489]: 2008/02/27_08:42:39 info: logfile and debug file are those 
specified in logd config file (default /etc/logd.cf)
heartbeat[15489]: 2008/02/27_08:42:39 info: **************************
heartbeat[15489]: 2008/02/27_08:42:39 info: Configuration validated. Starting 
heartbeat 2.0.8
heartbeat[15490]: 2008/02/27_08:42:39 info: heartbeat: version 2.0.8
heartbeat[15490]: 2008/02/27_08:42:39 info: Heartbeat generation: 37
heartbeat[15490]: 2008/02/27_08:42:39 info: G_main_add_TriggerHandler: Added 
signal manual handler
heartbeat[15490]: 2008/02/27_08:42:39 info: G_main_add_TriggerHandler: Added 
signal manual handler
heartbeat[15490]: 2008/02/27_08:42:39 info: Removing /var/run/heartbeat/rsctmp 
failed, recreating.
heartbeat[15490]: 2008/02/27_08:42:39 info: glib: UDP Broadcast heartbeat 
started on port 694 (694) interface eth0
heartbeat[15490]: 2008/02/27_08:42:39 info: glib: UDP Broadcast heartbeat 
closed on port 694 interface eth0 - Status: 1
heartbeat[15490]: 2008/02/27_08:42:39 info: glib: ucast: write socket priority 
set to IPTOS_LOWDELAY on eth1
heartbeat[15490]: 2008/02/27_08:42:39 info: glib: ucast: bound send socket to 
device: eth1
heartbeat[15490]: 2008/02/27_08:42:39 info: glib: ucast: bound receive socket 
to device: eth1
heartbeat[15490]: 2008/02/27_08:42:39 info: glib: ucast: started on port 694 
interface eth1 to 192.168.100.2
heartbeat[15490]: 2008/02/27_08:42:39 info: glib: ping heartbeat started.
heartbeat[15490]: 2008/02/27_08:42:39 info: G_main_add_SignalHandler: Added 
signal handler for signal 17
heartbeat[15490]: 2008/02/27_08:42:39 info: Local status now set to: 'up'
heartbeat[15490]: 2008/02/27_08:42:40 info: Link 10.0.0.133:10.0.0.133 up.
heartbeat[15490]: 2008/02/27_08:42:40 info: Status update for node 10.0.0.133: 
status ping
heartbeat[15490]: 2008/02/27_08:42:40 info: Link node01:eth0 up.
heartbeat[15490]: 2008/02/27_08:42:45 info: Link node02:eth0 up.
heartbeat[15490]: 2008/02/27_08:42:45 info: Status update for node node02: 
status up
heartbeat[15501]: 2008/02/27_08:42:45 debug: notify_world: setting SIGCHLD 
Handler to SIG_DFL
heartbeat[15490]: 2008/02/27_08:42:45 info: Link node02:eth1 up.
harc[15501][15505]: 2008/02/27_08:42:45 info: Running /etc/ha.d/rc.d/status 
status
heartbeat[15490]: 2008/02/27_08:42:45 debug: get_delnodelist: delnodelist= 
heartbeat[15490]: 2008/02/27_08:42:45 info: Comm_now_up(): updating status to 
active
heartbeat[15490]: 2008/02/27_08:42:45 info: Local status now set to: 'active'
heartbeat[15490]: 2008/02/27_08:42:45 info: Starting child client 
"/usr/lib64/heartbeat/ipfail" (90,90)
heartbeat[15490]: 2008/02/27_08:42:45 WARN: G_CH_dispatch_int: Dispatch 
function for read child took too long to execute: 90 ms (> 50 ms) (GSource: 
0x653408)
heartbeat[15508]: 2008/02/27_08:42:45 info: Starting 
"/usr/lib64/heartbeat/ipfail" as uid 90  gid 90 (pid 15508)
heartbeat[15490]: 2008/02/27_08:42:45 info: Status update for node node02: 
status active
heartbeat[15509]: 2008/02/27_08:42:45 debug: notify_world: setting SIGCHLD 
Handler to SIG_DFL
harc[15509][15512]: 2008/02/27_08:42:45 info: Running /etc/ha.d/rc.d/status 
status
heartbeat[15490]: 2008/02/27_08:42:56 info: remote resource transition 
completed.
heartbeat[15490]: 2008/02/27_08:42:56 info: remote resource transition 
completed.
heartbeat[15490]: 2008/02/27_08:42:56 info: Initial resource acquisition 
complete (T_RESOURCES(us))
heartbeat[15515]: 2008/02/27_08:42:56 info: No local resources 
[/usr/lib64/heartbeat/ResourceManager listkeys node01] to acquire.
heartbeat[15490]: 2008/02/27_08:44:45 WARN: node 10.0.0.133: is dead
heartbeat[15490]: 2008/02/27_08:44:45 info: Link 10.0.0.133:10.0.0.133 dead.
heartbeat[15592]: 2008/02/27_08:44:45 debug: notify_world: setting SIGCHLD 
Handler to SIG_DFL
harc[15592][15595]: 2008/02/27_08:44:45 info: Running /etc/ha.d/rc.d/status 
status
heartbeat[15490]: 2008/02/27_08:44:46 info: Link node02:eth0 dead.
heartbeat[15490]: 2008/02/27_08:44:56 info: node01 wants to go standby [all]
heartbeat[15490]: 2008/02/27_08:44:57 info: standby: node02 can take our all 
resources
heartbeat[15604]: 2008/02/27_08:44:57 info: give up all HA resources (standby).
ResourceManager[15614][15622]: 2008/02/27_08:44:57 info: Releasing resource 
group: node02 10.0.0.90 nfslock nfs_wrapper
ResourceManager[15614][15632]: 2008/02/27_08:44:57 info: Running 
/etc/init.d/nfs_wrapper  stop
ResourceManager[15614][15633]: 2008/02/27_08:44:57 debug: Starting 
/etc/init.d/nfs_wrapper  stop
heartbeat[15490]: 2008/02/27_08:46:22 WARN: node node02: is dead
heartbeat[15490]: 2008/02/27_08:46:22 info: Cancelling pending standby operation
heartbeat[15490]: 2008/02/27_08:46:22 WARN: No STONITH device configured.
heartbeat[15490]: 2008/02/27_08:46:22 WARN: Shared disks are not protected.
heartbeat[15490]: 2008/02/27_08:46:22 info: Resources being acquired from 
node02.
heartbeat[15490]: 2008/02/27_08:46:22 debug: StartNextRemoteRscReq(): child 
count 1
heartbeat[15490]: 2008/02/27_08:46:22 info: Link node02:eth1 dead.
heartbeat[15710]: 2008/02/27_08:46:23 info: No local resources 
[/usr/lib64/heartbeat/ResourceManager listkeys node01] to acquire.
heartbeat[15490]: 2008/02/27_08:46:23 debug: StartNextRemoteRscReq(): child 
count 1
heartbeat[15490]: 2008/02/27_08:46:59 info: Link 10.0.0.133:10.0.0.133 up.
heartbeat[15490]: 2008/02/27_08:46:59 WARN: Late heartbeat: Node 10.0.0.133: 
interval 164130 ms
heartbeat[15490]: 2008/02/27_08:46:59 info: Status update for node 10.0.0.133: 
status ping
ResourceManager[15614][15749]: 2008/02/27_08:47:01 debug: 
/etc/init.d/nfs_wrapper  stop done. RC=0
ResourceManager[15614][15765]: 2008/02/27_08:47:01 info: Running 
/etc/init.d/nfslock  stop
ResourceManager[15614][15766]: 2008/02/27_08:47:02 debug: Starting 
/etc/init.d/nfslock  stop
ResourceManager[15614][15804]: 2008/02/27_08:47:03 debug: /etc/init.d/nfslock  
stop done. RC=0
ResourceManager[15614][15820]: 2008/02/27_08:47:04 info: Running 
/etc/ha.d/resource.d/IPaddr 10.0.0.90 stop
ResourceManager[15614][15821]: 2008/02/27_08:47:04 debug: Starting 
/etc/ha.d/resource.d/IPaddr 10.0.0.90 stop
IPaddr[15822][15839]: 2008/02/27_08:47:04 INFO:  Success
ResourceManager[15614][15840]: 2008/02/27_08:47:04 debug: 
/etc/ha.d/resource.d/IPaddr 10.0.0.90 stop done. RC=0
heartbeat[15604]: 2008/02/27_08:47:04 info: all HA resource release completed 
(standby).
heartbeat[15490]: 2008/02/27_08:47:04 ERROR: Ignored standby message 'done' 
from node01 in state 0
heartbeat[15490]: 2008/02/27_08:47:04 WARN: G_SIG_dispatch: Dispatch function 
for SIGCHLD took too long to execute: 40 ms (> 10 ms) (GSource: 0x64ab68)
heartbeat[15841]: 2008/02/27_08:47:04 debug: notify_world: setting SIGCHLD 
Handler to SIG_DFL
harc[15841][15844]: 2008/02/27_08:47:04 info: Running /etc/ha.d/rc.d/status 
status
mach_down[15847][15862]: 2008/02/27_08:47:05 info: Taking over resource group 
10.0.0.90
ResourceManager[15863][15871]: 2008/02/27_08:47:05 info: Acquiring resource 
group: node02 10.0.0.90 nfslock nfs_wrapper
IPaddr[15883][15900]: 2008/02/27_08:47:06 INFO:  Resource is stopped
ResourceManager[15863][15916]: 2008/02/27_08:47:06 info: Running 
/etc/ha.d/resource.d/IPaddr 10.0.0.90 start
ResourceManager[15863][15917]: 2008/02/27_08:47:06 debug: Starting 
/etc/ha.d/resource.d/IPaddr 10.0.0.90 start
IPaddr[15927][15936]: 2008/02/27_08:47:07 INFO: Using calculated nic for 
10.0.0.90: eth0
IPaddr[15927][15941]: 2008/02/27_08:47:07 DEBUG: Using calculated netmask for 
10.0.0.90: 255.255.255.192
IPaddr[15927][15946]: 2008/02/27_08:47:07 DEBUG: Using calculated broadcast for 
10.0.0.90: 10.0.0.127
IPaddr[15927][15963]: 2008/02/27_08:47:07 INFO: eval /sbin/ifconfig eth0:0 
10.0.0.90 netmask 255.255.255.192 broadcast 10.0.0.127
IPaddr[15927][15968]: 2008/02/27_08:47:08 DEBUG: Sending Gratuitous Arp for 
10.0.0.90 on eth0:0 [eth0]
IPaddr[15918][15982]: 2008/02/27_08:47:08 INFO:  Success
ResourceManager[15863][15983]: 2008/02/27_08:47:08 debug: 
/etc/ha.d/resource.d/IPaddr 10.0.0.90 start done. RC=0
ResourceManager[15863][16014]: 2008/02/27_08:47:09 info: Running 
/etc/init.d/nfslock  start
ResourceManager[15863][16015]: 2008/02/27_08:47:09 debug: Starting 
/etc/init.d/nfslock  start
ResourceManager[15863][16035]: 2008/02/27_08:47:10 debug: /etc/init.d/nfslock  
start done. RC=0
ResourceManager[15863][16070]: 2008/02/27_08:47:10 info: Running 
/etc/init.d/nfs_wrapper  start
ResourceManager[15863][16071]: 2008/02/27_08:47:10 debug: Starting 
/etc/init.d/nfs_wrapper  start
ResourceManager[15863][16118]: 2008/02/27_08:47:11 debug: 
/etc/init.d/nfs_wrapper  start done. RC=0
mach_down[15847][16119]: 2008/02/27_08:47:11 info: 
/usr/lib64/heartbeat/mach_down: nice_failback: foreign resources acquired
mach_down[15847][16123]: 2008/02/27_08:47:11 info: mach_down takeover complete 
for node node02.
heartbeat[15490]: 2008/02/27_08:47:11 info: mach_down takeover complete.
logd[23277]: 2008/02/27_08:42:43 info: logd started with /etc/logd.cf.
logd[23280]: 2008/02/27_08:42:44 info: G_main_add_SignalHandler: Added signal 
handler for signal 15
logd[23277]: 2008/02/27_08:42:44 info: G_main_add_SignalHandler: Added signal 
handler for signal 15
heartbeat[23340]: 2008/02/27_08:42:44 info: Enabling logging daemon 
heartbeat[23340]: 2008/02/27_08:42:44 info: logfile and debug file are those 
specified in logd config file (default /etc/logd.cf)
heartbeat[23340]: 2008/02/27_08:42:44 info: **************************
heartbeat[23340]: 2008/02/27_08:42:44 info: Configuration validated. Starting 
heartbeat 2.0.8
heartbeat[23341]: 2008/02/27_08:42:44 info: heartbeat: version 2.0.8
heartbeat[23341]: 2008/02/27_08:42:44 info: Heartbeat generation: 35
heartbeat[23341]: 2008/02/27_08:42:44 info: G_main_add_TriggerHandler: Added 
signal manual handler
heartbeat[23341]: 2008/02/27_08:42:44 info: G_main_add_TriggerHandler: Added 
signal manual handler
heartbeat[23341]: 2008/02/27_08:42:44 info: Removing /var/run/heartbeat/rsctmp 
failed, recreating.
heartbeat[23341]: 2008/02/27_08:42:44 info: glib: UDP Broadcast heartbeat 
started on port 694 (694) interface eth0
heartbeat[23341]: 2008/02/27_08:42:44 info: glib: UDP Broadcast heartbeat 
closed on port 694 interface eth0 - Status: 1
heartbeat[23341]: 2008/02/27_08:42:44 info: glib: ucast: write socket priority 
set to IPTOS_LOWDELAY on eth1
heartbeat[23341]: 2008/02/27_08:42:44 info: glib: ucast: bound send socket to 
device: eth1
heartbeat[23341]: 2008/02/27_08:42:44 info: glib: ucast: bound receive socket 
to device: eth1
heartbeat[23341]: 2008/02/27_08:42:44 info: glib: ucast: started on port 694 
interface eth1 to 192.168.100.1
heartbeat[23341]: 2008/02/27_08:42:44 info: glib: ping heartbeat started.
heartbeat[23341]: 2008/02/27_08:42:45 info: G_main_add_SignalHandler: Added 
signal handler for signal 17
heartbeat[23341]: 2008/02/27_08:42:45 info: Local status now set to: 'up'
heartbeat[23341]: 2008/02/27_08:42:46 info: Link node01:eth0 up.
heartbeat[23341]: 2008/02/27_08:42:46 info: Status update for node node01: 
status up
heartbeat[23341]: 2008/02/27_08:42:46 info: Link node02:eth0 up.
heartbeat[23341]: 2008/02/27_08:42:46 info: Link 10.0.0.133:10.0.0.133 up.
heartbeat[23341]: 2008/02/27_08:42:46 info: Status update for node 10.0.0.133: 
status ping
heartbeat[23341]: 2008/02/27_08:42:46 debug: get_delnodelist: delnodelist= 
heartbeat[23352]: 2008/02/27_08:42:46 debug: notify_world: setting SIGCHLD 
Handler to SIG_DFL
heartbeat[23341]: 2008/02/27_08:42:46 info: Comm_now_up(): updating status to 
active
heartbeat[23341]: 2008/02/27_08:42:46 info: Local status now set to: 'active'
heartbeat[23341]: 2008/02/27_08:42:46 info: Starting child client 
"/usr/lib64/heartbeat/ipfail" (90,90)
heartbeat[23341]: 2008/02/27_08:42:47 WARN: G_CH_dispatch_int: Dispatch 
function for read child took too long to execute: 90 ms (> 50 ms) (GSource: 
0x653408)
heartbeat[23341]: 2008/02/27_08:42:47 info: Status update for node node01: 
status active
heartbeat[23341]: 2008/02/27_08:42:47 debug: StartNextRemoteRscReq(): child 
count 1
heartbeat[23341]: 2008/02/27_08:42:47 info: Link node01:eth1 up.
heartbeat[23356]: 2008/02/27_08:42:47 info: Starting 
"/usr/lib64/heartbeat/ipfail" as uid 90  gid 90 (pid 23356)
harc[23352][23357]: 2008/02/27_08:42:47 info: Running /etc/ha.d/rc.d/status 
status
heartbeat[23360]: 2008/02/27_08:42:47 debug: notify_world: setting SIGCHLD 
Handler to SIG_DFL
harc[23360][23363]: 2008/02/27_08:42:47 info: Running /etc/ha.d/rc.d/status 
status
heartbeat[23341]: 2008/02/27_08:42:57 info: local resource transition completed.
heartbeat[23341]: 2008/02/27_08:42:57 info: Initial resource acquisition 
complete (T_RESOURCES(us))
heartbeat[23341]: 2008/02/27_08:42:57 info: remote resource transition 
completed.
IPaddr[23393][23410]: 2008/02/27_08:42:57 INFO:  Resource is stopped
heartbeat[23366]: 2008/02/27_08:42:57 info: Local Resource acquisition 
completed.
heartbeat[23341]: 2008/02/27_08:42:57 debug: StartNextRemoteRscReq(): child 
count 1
heartbeat[23414]: 2008/02/27_08:42:57 debug: notify_world: setting SIGCHLD 
Handler to SIG_DFL
harc[23414][23417]: 2008/02/27_08:42:57 info: Running 
/etc/ha.d/rc.d/ip-request-resp ip-request-resp
ip-request-resp[23414][23420]: 2008/02/27_08:42:57 received ip-request-resp 
10.0.0.90 OK yes
ResourceManager[23421][23429]: 2008/02/27_08:42:57 info: Acquiring resource 
group: node02 10.0.0.90 nfslock nfs_wrapper
IPaddr[23441][23458]: 2008/02/27_08:42:58 INFO:  Resource is stopped
ResourceManager[23421][23474]: 2008/02/27_08:42:58 info: Running 
/etc/ha.d/resource.d/IPaddr 10.0.0.90 start
ResourceManager[23421][23475]: 2008/02/27_08:42:58 debug: Starting 
/etc/ha.d/resource.d/IPaddr 10.0.0.90 start
IPaddr[23485][23494]: 2008/02/27_08:42:58 INFO: Using calculated nic for 
10.0.0.90: eth0
IPaddr[23485][23499]: 2008/02/27_08:42:58 DEBUG: Using calculated netmask for 
10.0.0.90: 255.255.255.192
IPaddr[23485][23504]: 2008/02/27_08:42:58 DEBUG: Using calculated broadcast for 
10.0.0.90: 10.0.0.127
IPaddr[23485][23521]: 2008/02/27_08:42:58 INFO: eval /sbin/ifconfig eth0:0 
10.0.0.90 netmask 255.255.255.192 broadcast 10.0.0.127
IPaddr[23485][23526]: 2008/02/27_08:42:58 DEBUG: Sending Gratuitous Arp for 
10.0.0.90 on eth0:0 [eth0]
IPaddr[23476][23540]: 2008/02/27_08:42:59 INFO:  Success
ResourceManager[23421][23541]: 2008/02/27_08:42:59 debug: 
/etc/ha.d/resource.d/IPaddr 10.0.0.90 start done. RC=0
ResourceManager[23421][23572]: 2008/02/27_08:42:59 info: Running 
/etc/init.d/nfslock  start
ResourceManager[23421][23573]: 2008/02/27_08:42:59 debug: Starting 
/etc/init.d/nfslock  start
ResourceManager[23421][23593]: 2008/02/27_08:42:59 debug: /etc/init.d/nfslock  
start done. RC=0
ResourceManager[23421][23628]: 2008/02/27_08:43:00 info: Running 
/etc/init.d/nfs_wrapper  start
ResourceManager[23421][23629]: 2008/02/27_08:43:00 debug: Starting 
/etc/init.d/nfs_wrapper  start
ResourceManager[23421][23676]: 2008/02/27_08:43:01 debug: 
/etc/init.d/nfs_wrapper  start done. RC=0
heartbeat[23341]: 2008/02/27_08:44:46 info: Link node01:eth0 dead.
heartbeat[23341]: 2008/02/27_08:44:58 info: node01 wants to go standby [all]
logd[4645]: 2008/02/27_08:48:48 info: logd started with /etc/logd.cf.
logd[4650]: 2008/02/27_08:48:48 info: G_main_add_SignalHandler: Added signal 
handler for signal 15
logd[4645]: 2008/02/27_08:48:48 info: G_main_add_SignalHandler: Added signal 
handler for signal 15
...
logd[15280]: 2008/02/27_08:40:35 info: logd started with /etc/logd.cf.
logd[15281]: 2008/02/27_08:40:35 info: G_main_add_SignalHandler: Added signal 
handler for signal 15
logd[15280]: 2008/02/27_08:40:35 info: G_main_add_SignalHandler: Added signal 
handler for signal 15
heartbeat[15343]: 2008/02/27_08:40:35 info: Enabling logging daemon 
heartbeat[15343]: 2008/02/27_08:40:35 info: logfile and debug file are those 
specified in logd config file (default /etc/logd.cf)
heartbeat[15343]: 2008/02/27_08:40:35 ERROR: Current node [node01] not in 
configuration!
heartbeat[15343]: 2008/02/27_08:40:35 info: By default, cluster nodes are named 
by `uname -n` and must be declared with a 'node' directive in the ha.cf file.
heartbeat[15343]: 2008/02/27_08:40:35 info: See also: 
http://linux-ha.org/ha.cf/NodeDirective
heartbeat[15343]: 2008/02/27_08:40:35 ERROR: Configuration error, heartbeat not 
started.
heartbeat[15489]: 2008/02/27_08:42:39 info: Enabling logging daemon 
heartbeat[15489]: 2008/02/27_08:42:39 info: logfile and debug file are those 
specified in logd config file (default /etc/logd.cf)
heartbeat[15489]: 2008/02/27_08:42:39 info: **************************
heartbeat[15489]: 2008/02/27_08:42:39 info: Configuration validated. Starting 
heartbeat 2.0.8
heartbeat[15490]: 2008/02/27_08:42:39 info: heartbeat: version 2.0.8
heartbeat[15490]: 2008/02/27_08:42:39 info: Heartbeat generation: 37
heartbeat[15490]: 2008/02/27_08:42:39 info: G_main_add_TriggerHandler: Added 
signal manual handler
heartbeat[15490]: 2008/02/27_08:42:39 info: G_main_add_TriggerHandler: Added 
signal manual handler
heartbeat[15490]: 2008/02/27_08:42:39 info: Removing /var/run/heartbeat/rsctmp 
failed, recreating.
heartbeat[15490]: 2008/02/27_08:42:39 info: glib: UDP Broadcast heartbeat 
started on port 694 (694) interface eth0
heartbeat[15490]: 2008/02/27_08:42:39 info: glib: UDP Broadcast heartbeat 
closed on port 694 interface eth0 - Status: 1
heartbeat[15490]: 2008/02/27_08:42:39 info: glib: ucast: write socket priority 
set to IPTOS_LOWDELAY on eth1
heartbeat[15490]: 2008/02/27_08:42:39 info: glib: ucast: bound send socket to 
device: eth1
heartbeat[15490]: 2008/02/27_08:42:39 info: glib: ucast: bound receive socket 
to device: eth1
heartbeat[15490]: 2008/02/27_08:42:39 info: glib: ucast: started on port 694 
interface eth1 to 192.168.100.2
heartbeat[15490]: 2008/02/27_08:42:39 info: glib: ping heartbeat started.
heartbeat[15490]: 2008/02/27_08:42:39 info: G_main_add_SignalHandler: Added 
signal handler for signal 17
heartbeat[15490]: 2008/02/27_08:42:39 info: Local status now set to: 'up'
heartbeat[15490]: 2008/02/27_08:42:40 info: Link 10.0.0.133:10.0.0.133 up.
heartbeat[15490]: 2008/02/27_08:42:40 info: Status update for node 10.0.0.133: 
status ping
heartbeat[15490]: 2008/02/27_08:42:40 info: Link node01:eth0 up.
heartbeat[15490]: 2008/02/27_08:42:45 info: Link node02:eth0 up.
heartbeat[15490]: 2008/02/27_08:42:45 info: Status update for node node02: 
status up
heartbeat[15490]: 2008/02/27_08:42:45 info: Link node02:eth1 up.
harc[15501][15505]: 2008/02/27_08:42:45 info: Running /etc/ha.d/rc.d/status 
status
heartbeat[15490]: 2008/02/27_08:42:45 info: Comm_now_up(): updating status to 
active
heartbeat[15490]: 2008/02/27_08:42:45 info: Local status now set to: 'active'
heartbeat[15490]: 2008/02/27_08:42:45 info: Starting child client 
"/usr/lib64/heartbeat/ipfail" (90,90)
heartbeat[15490]: 2008/02/27_08:42:45 WARN: G_CH_dispatch_int: Dispatch 
function for read child took too long to execute: 90 ms (> 50 ms) (GSource: 
0x653408)
heartbeat[15508]: 2008/02/27_08:42:45 info: Starting 
"/usr/lib64/heartbeat/ipfail" as uid 90  gid 90 (pid 15508)
heartbeat[15490]: 2008/02/27_08:42:45 info: Status update for node node02: 
status active
harc[15509][15512]: 2008/02/27_08:42:45 info: Running /etc/ha.d/rc.d/status 
status
heartbeat[15490]: 2008/02/27_08:42:56 info: remote resource transition 
completed.
heartbeat[15490]: 2008/02/27_08:42:56 info: remote resource transition 
completed.
heartbeat[15490]: 2008/02/27_08:42:56 info: Initial resource acquisition 
complete (T_RESOURCES(us))
heartbeat[15515]: 2008/02/27_08:42:56 info: No local resources 
[/usr/lib64/heartbeat/ResourceManager listkeys node01] to acquire.
heartbeat[15490]: 2008/02/27_08:44:45 WARN: node 10.0.0.133: is dead
heartbeat[15490]: 2008/02/27_08:44:45 info: Link 10.0.0.133:10.0.0.133 dead.
harc[15592][15595]: 2008/02/27_08:44:45 info: Running /etc/ha.d/rc.d/status 
status
heartbeat[15490]: 2008/02/27_08:44:46 info: Link node02:eth0 dead.
heartbeat[15490]: 2008/02/27_08:44:56 info: node01 wants to go standby [all]
heartbeat[15490]: 2008/02/27_08:44:57 info: standby: node02 can take our all 
resources
heartbeat[15604]: 2008/02/27_08:44:57 info: give up all HA resources (standby).
ResourceManager[15614][15622]: 2008/02/27_08:44:57 info: Releasing resource 
group: node02 10.0.0.90 nfslock nfs_wrapper
ResourceManager[15614][15632]: 2008/02/27_08:44:57 info: Running 
/etc/init.d/nfs_wrapper  stop
heartbeat[15490]: 2008/02/27_08:46:22 WARN: node node02: is dead
heartbeat[15490]: 2008/02/27_08:46:22 info: Cancelling pending standby operation
heartbeat[15490]: 2008/02/27_08:46:22 WARN: No STONITH device configured.
heartbeat[15490]: 2008/02/27_08:46:22 WARN: Shared disks are not protected.
heartbeat[15490]: 2008/02/27_08:46:22 info: Resources being acquired from 
node02.
heartbeat[15490]: 2008/02/27_08:46:22 info: Link node02:eth1 dead.
heartbeat[15710]: 2008/02/27_08:46:23 info: No local resources 
[/usr/lib64/heartbeat/ResourceManager listkeys node01] to acquire.
heartbeat[15490]: 2008/02/27_08:46:59 info: Link 10.0.0.133:10.0.0.133 up.
heartbeat[15490]: 2008/02/27_08:46:59 WARN: Late heartbeat: Node 10.0.0.133: 
interval 164130 ms
heartbeat[15490]: 2008/02/27_08:46:59 info: Status update for node 10.0.0.133: 
status ping
ResourceManager[15614][15765]: 2008/02/27_08:47:01 info: Running 
/etc/init.d/nfslock  stop
ResourceManager[15614][15820]: 2008/02/27_08:47:04 info: Running 
/etc/ha.d/resource.d/IPaddr 10.0.0.90 stop
IPaddr[15822][15839]: 2008/02/27_08:47:04 INFO:  Success
heartbeat[15604]: 2008/02/27_08:47:04 info: all HA resource release completed 
(standby).
heartbeat[15490]: 2008/02/27_08:47:04 ERROR: Ignored standby message 'done' 
from node01 in state 0
heartbeat[15490]: 2008/02/27_08:47:04 WARN: G_SIG_dispatch: Dispatch function 
for SIGCHLD took too long to execute: 40 ms (> 10 ms) (GSource: 0x64ab68)
harc[15841][15844]: 2008/02/27_08:47:04 info: Running /etc/ha.d/rc.d/status 
status
mach_down[15847][15862]: 2008/02/27_08:47:05 info: Taking over resource group 
10.0.0.90
ResourceManager[15863][15871]: 2008/02/27_08:47:05 info: Acquiring resource 
group: node02 10.0.0.90 nfslock nfs_wrapper
IPaddr[15883][15900]: 2008/02/27_08:47:06 INFO:  Resource is stopped
ResourceManager[15863][15916]: 2008/02/27_08:47:06 info: Running 
/etc/ha.d/resource.d/IPaddr 10.0.0.90 start
IPaddr[15927][15936]: 2008/02/27_08:47:07 INFO: Using calculated nic for 
10.0.0.90: eth0
IPaddr[15927][15941]: 2008/02/27_08:47:07 DEBUG: Using calculated netmask for 
10.0.0.90: 255.255.255.192
IPaddr[15927][15946]: 2008/02/27_08:47:07 DEBUG: Using calculated broadcast for 
10.0.0.90: 10.0.0.127
IPaddr[15927][15963]: 2008/02/27_08:47:07 INFO: eval /sbin/ifconfig eth0:0 
10.0.0.90 netmask 255.255.255.192 broadcast 10.0.0.127
IPaddr[15927][15968]: 2008/02/27_08:47:08 DEBUG: Sending Gratuitous Arp for 
10.0.0.90 on eth0:0 [eth0]
IPaddr[15918][15982]: 2008/02/27_08:47:08 INFO:  Success
ResourceManager[15863][16014]: 2008/02/27_08:47:09 info: Running 
/etc/init.d/nfslock  start
ResourceManager[15863][16070]: 2008/02/27_08:47:10 info: Running 
/etc/init.d/nfs_wrapper  start
mach_down[15847][16119]: 2008/02/27_08:47:11 info: 
/usr/lib64/heartbeat/mach_down: nice_failback: foreign resources acquired
mach_down[15847][16123]: 2008/02/27_08:47:11 info: mach_down takeover complete 
for node node02.
heartbeat[15490]: 2008/02/27_08:47:11 info: mach_down takeover complete.
logd[23277]: 2008/02/27_08:42:43 info: logd started with /etc/logd.cf.
logd[23280]: 2008/02/27_08:42:44 info: G_main_add_SignalHandler: Added signal 
handler for signal 15
logd[23277]: 2008/02/27_08:42:44 info: G_main_add_SignalHandler: Added signal 
handler for signal 15
heartbeat[23340]: 2008/02/27_08:42:44 info: Enabling logging daemon 
heartbeat[23340]: 2008/02/27_08:42:44 info: logfile and debug file are those 
specified in logd config file (default /etc/logd.cf)
heartbeat[23340]: 2008/02/27_08:42:44 info: **************************
heartbeat[23340]: 2008/02/27_08:42:44 info: Configuration validated. Starting 
heartbeat 2.0.8
heartbeat[23341]: 2008/02/27_08:42:44 info: heartbeat: version 2.0.8
heartbeat[23341]: 2008/02/27_08:42:44 info: Heartbeat generation: 35
heartbeat[23341]: 2008/02/27_08:42:44 info: G_main_add_TriggerHandler: Added 
signal manual handler
heartbeat[23341]: 2008/02/27_08:42:44 info: G_main_add_TriggerHandler: Added 
signal manual handler
heartbeat[23341]: 2008/02/27_08:42:44 info: Removing /var/run/heartbeat/rsctmp 
failed, recreating.
heartbeat[23341]: 2008/02/27_08:42:44 info: glib: UDP Broadcast heartbeat 
started on port 694 (694) interface eth0
heartbeat[23341]: 2008/02/27_08:42:44 info: glib: UDP Broadcast heartbeat 
closed on port 694 interface eth0 - Status: 1
heartbeat[23341]: 2008/02/27_08:42:44 info: glib: ucast: write socket priority 
set to IPTOS_LOWDELAY on eth1
heartbeat[23341]: 2008/02/27_08:42:44 info: glib: ucast: bound send socket to 
device: eth1
heartbeat[23341]: 2008/02/27_08:42:44 info: glib: ucast: bound receive socket 
to device: eth1
heartbeat[23341]: 2008/02/27_08:42:44 info: glib: ucast: started on port 694 
interface eth1 to 192.168.100.1
heartbeat[23341]: 2008/02/27_08:42:44 info: glib: ping heartbeat started.
heartbeat[23341]: 2008/02/27_08:42:45 info: G_main_add_SignalHandler: Added 
signal handler for signal 17
heartbeat[23341]: 2008/02/27_08:42:45 info: Local status now set to: 'up'
heartbeat[23341]: 2008/02/27_08:42:46 info: Link node01:eth0 up.
heartbeat[23341]: 2008/02/27_08:42:46 info: Status update for node node01: 
status up
heartbeat[23341]: 2008/02/27_08:42:46 info: Link node02:eth0 up.
heartbeat[23341]: 2008/02/27_08:42:46 info: Link 10.0.0.133:10.0.0.133 up.
heartbeat[23341]: 2008/02/27_08:42:46 info: Status update for node 10.0.0.133: 
status ping
heartbeat[23341]: 2008/02/27_08:42:46 info: Comm_now_up(): updating status to 
active
heartbeat[23341]: 2008/02/27_08:42:46 info: Local status now set to: 'active'
heartbeat[23341]: 2008/02/27_08:42:46 info: Starting child client 
"/usr/lib64/heartbeat/ipfail" (90,90)
heartbeat[23341]: 2008/02/27_08:42:47 WARN: G_CH_dispatch_int: Dispatch 
function for read child took too long to execute: 90 ms (> 50 ms) (GSource: 
0x653408)
heartbeat[23341]: 2008/02/27_08:42:47 info: Status update for node node01: 
status active
heartbeat[23341]: 2008/02/27_08:42:47 info: Link node01:eth1 up.
heartbeat[23356]: 2008/02/27_08:42:47 info: Starting 
"/usr/lib64/heartbeat/ipfail" as uid 90  gid 90 (pid 23356)
harc[23352][23357]: 2008/02/27_08:42:47 info: Running /etc/ha.d/rc.d/status 
status
harc[23360][23363]: 2008/02/27_08:42:47 info: Running /etc/ha.d/rc.d/status 
status
heartbeat[23341]: 2008/02/27_08:42:57 info: local resource transition completed.
heartbeat[23341]: 2008/02/27_08:42:57 info: Initial resource acquisition 
complete (T_RESOURCES(us))
heartbeat[23341]: 2008/02/27_08:42:57 info: remote resource transition 
completed.
IPaddr[23393][23410]: 2008/02/27_08:42:57 INFO:  Resource is stopped
heartbeat[23366]: 2008/02/27_08:42:57 info: Local Resource acquisition 
completed.
harc[23414][23417]: 2008/02/27_08:42:57 info: Running 
/etc/ha.d/rc.d/ip-request-resp ip-request-resp
ip-request-resp[23414][23420]: 2008/02/27_08:42:57 received ip-request-resp 
10.0.0.90 OK yes
ResourceManager[23421][23429]: 2008/02/27_08:42:57 info: Acquiring resource 
group: node02 10.0.0.90 nfslock nfs_wrapper
IPaddr[23441][23458]: 2008/02/27_08:42:58 INFO:  Resource is stopped
ResourceManager[23421][23474]: 2008/02/27_08:42:58 info: Running 
/etc/ha.d/resource.d/IPaddr 10.0.0.90 start
IPaddr[23485][23494]: 2008/02/27_08:42:58 INFO: Using calculated nic for 
10.0.0.90: eth0
IPaddr[23485][23499]: 2008/02/27_08:42:58 DEBUG: Using calculated netmask for 
10.0.0.90: 255.255.255.192
IPaddr[23485][23504]: 2008/02/27_08:42:58 DEBUG: Using calculated broadcast for 
10.0.0.90: 10.0.0.127
IPaddr[23485][23521]: 2008/02/27_08:42:58 INFO: eval /sbin/ifconfig eth0:0 
10.0.0.90 netmask 255.255.255.192 broadcast 10.0.0.127
IPaddr[23485][23526]: 2008/02/27_08:42:58 DEBUG: Sending Gratuitous Arp for 
10.0.0.90 on eth0:0 [eth0]
IPaddr[23476][23540]: 2008/02/27_08:42:59 INFO:  Success
ResourceManager[23421][23572]: 2008/02/27_08:42:59 info: Running 
/etc/init.d/nfslock  start
ResourceManager[23421][23628]: 2008/02/27_08:43:00 info: Running 
/etc/init.d/nfs_wrapper  start
heartbeat[23341]: 2008/02/27_08:44:46 info: Link node01:eth0 dead.
heartbeat[23341]: 2008/02/27_08:44:58 info: node01 wants to go standby [all]
logd[4645]: 2008/02/27_08:48:48 info: logd started with /etc/logd.cf.
logd[4650]: 2008/02/27_08:48:48 info: G_main_add_SignalHandler: Added signal 
handler for signal 15
logd[4645]: 2008/02/27_08:48:48 info: G_main_add_SignalHandler: Added signal 
handler for signal 15
...
_______________________________________________
Linux-HA mailing list
Linux-HA@lists.linux-ha.org
http://lists.linux-ha.org/mailman/listinfo/linux-ha
See also: http://linux-ha.org/ReportingProblems

Reply via email to