> How do you induce a failover?
Right now just the simplest kind, a network failure. I'm simply
disconnecting the LAN interface on one of the hosts.
It doesn't matter which host I disconnect; NODE02 reboots.
> Anything in the logs? Perhaps post them.
Nothing obvious. Attached are the ha-log and ha-debug files from NODE01
and NODE02. NODE02 is primary and has the resources. I disconnect eth0
on NODE01; NODE02 acquires its resources (there are none) and then
promptly reboots.
:(
Regards
Luke Pascoe
Linux Systems Engineer
Asterisk
T 09 366 8835
F 09 302 1772
M 0274 266649
E [EMAIL PROTECTED]
W www.asterisk.co.nz
Level 9, Gen-i Tower, 66 Wyndham Street
PO Box 8804, Auckland, New Zealand
Dejan Muhamedagic wrote:
Hi,
On Tue, Feb 26, 2008 at 04:35:06PM +1300, Luke Pascoe wrote:
Hello
I'm trying to do NFS failover in a test environment with an underlying
OCFS2 filesystem. This is something that's apparently been done before and
certainly HA NFS isn't new.
I've followed several HowTos, all of which seem to suggest pretty much the
same setup, but I seem to get the same problem no matter how I configure
it.
Here's the setup:
2 VMware VMs (NODE01 and NODE02) running RHEL4 U5 x86_64 with a shared
fibre channel SAN volume. That volume is OCFS2 formatted and mounted as
/data on both hosts.
Each host has 2 interfaces: an external-facing 10.0.0.0/24 and an internal
192.168.100.0/30.
Heartbeat is installed on both nodes and configured identically as follows:
=ha.cf=
keepalive 2
deadtime 30
warntime 10
initdead 120
bcast eth0
ucast eth1 192.168.100.1 # Obviously on NODE02 this is 192.168.100.2
auto_failback on
node NODE02 NODE01
ping 10.0.0.133 # this is an unrelated LAN host
respawn hacluster /usr/lib64/heartbeat/ipfail
use_logd yes
crm off
=/ha.cf=
=haresources=
NODE01 10.199.133.90 nfslock nfs_wrapper
=/haresources=
BTW, crm is off because I tried it with it on and got EXACTLY the same
result.
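For anyone comparing their own timers against the ha.cf above, here is a minimal sketch that parses the directives and checks the ordering of the timing values. The function names are made up for illustration, the checks are rules of thumb only (warntime between keepalive and deadtime, initdead at least twice deadtime), and it assumes every timer is given in plain seconds with no "ms" suffix.

```python
# Sketch only: parse_ha_cf and check_timing are hypothetical helper names,
# not part of Heartbeat. Timers are assumed to be plain seconds.
HA_CF = """\
keepalive 2
deadtime 30
warntime 10
initdead 120
bcast eth0
ucast eth1 192.168.100.1 # Obviously on NODE02 this is 192.168.100.2
auto_failback on
node NODE02 NODE01
ping 10.0.0.133 # this is an unrelated LAN host
respawn hacluster /usr/lib64/heartbeat/ipfail
use_logd yes
crm off
"""


def parse_ha_cf(text):
    """Return {directive: [argument string, ...]} for ha.cf-style text."""
    conf = {}
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()  # drop trailing comments
        if not line:
            continue
        parts = line.split(None, 1)
        conf.setdefault(parts[0], []).append(parts[1] if len(parts) > 1 else "")
    return conf


def check_timing(conf):
    """Return a list of warnings about timer ordering (rules of thumb)."""
    t = {}
    for key in ("keepalive", "warntime", "deadtime", "initdead"):
        if key in conf:
            t[key] = float(conf[key][-1])  # last occurrence wins in this sketch
    warnings = []
    if "keepalive" in t and "warntime" in t and t["keepalive"] >= t["warntime"]:
        warnings.append("warntime should exceed keepalive")
    if "warntime" in t and "deadtime" in t and t["warntime"] >= t["deadtime"]:
        warnings.append("deadtime should exceed warntime")
    if "deadtime" in t and "initdead" in t and t["initdead"] < 2 * t["deadtime"]:
        warnings.append("initdead is usually at least 2 x deadtime")
    return warnings


print(check_timing(parse_ha_cf(HA_CF)))  # prints []
```

The posted values pass all three checks, so the timer ordering itself does not look like the cause of the reboot.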
Here's the problem:
Both hosts start up just fine, NODE01 picks up all 3 resources and
everything's roses. If I induce a failure on NODE01, NODE02 correctly
acquires the resources and everything is still roses. HOWEVER, ~30 seconds
later NODE02 reboots. Now the odd thing is, it doesn't matter which is the
primary node, or which host has the failure or even which has the
resources, NODE02 always reboots when there's a failure.
How do you induce a failover?
Even if the resources are started on NODE02 and NODE01 has a failure (ie,
everything should stay as it is, no failover required) 30 seconds after the
failure NODE02 reboots!!!
I've got NTP syncing the time, so it's not a clock issue, and I've tried
twiddling just about every setting in the config, to no avail.
Any help? Please?
Anything in the logs? Perhaps post them.
Thanks,
Dejan
_______________________________________________
Linux-HA mailing list
Linux-HA@lists.linux-ha.org
http://lists.linux-ha.org/mailman/listinfo/linux-ha
See also: http://linux-ha.org/ReportingProblems
logd[15280]: 2008/02/27_08:40:35 info: logd started with /etc/logd.cf.
logd[15281]: 2008/02/27_08:40:35 info: G_main_add_SignalHandler: Added signal
handler for signal 15
logd[15280]: 2008/02/27_08:40:35 info: G_main_add_SignalHandler: Added signal
handler for signal 15
heartbeat[15343]: 2008/02/27_08:40:35 info: Enabling logging daemon
heartbeat[15343]: 2008/02/27_08:40:35 info: logfile and debug file are those
specified in logd config file (default /etc/logd.cf)
heartbeat[15343]: 2008/02/27_08:40:35 ERROR: Current node [node01] not in
configuration!
heartbeat[15343]: 2008/02/27_08:40:35 info: By default, cluster nodes are named
by `uname -n` and must be declared with a 'node' directive in the ha.cf file.
heartbeat[15343]: 2008/02/27_08:40:35 info: See also:
http://linux-ha.org/ha.cf/NodeDirective
heartbeat[15343]: 2008/02/27_08:40:35 ERROR: Configuration error, heartbeat not
started.
heartbeat[15489]: 2008/02/27_08:42:39 info: Enabling logging daemon
heartbeat[15489]: 2008/02/27_08:42:39 info: logfile and debug file are those
specified in logd config file (default /etc/logd.cf)
heartbeat[15489]: 2008/02/27_08:42:39 info: **************************
heartbeat[15489]: 2008/02/27_08:42:39 info: Configuration validated. Starting
heartbeat 2.0.8
heartbeat[15490]: 2008/02/27_08:42:39 info: heartbeat: version 2.0.8
heartbeat[15490]: 2008/02/27_08:42:39 info: Heartbeat generation: 37
heartbeat[15490]: 2008/02/27_08:42:39 info: G_main_add_TriggerHandler: Added
signal manual handler
heartbeat[15490]: 2008/02/27_08:42:39 info: G_main_add_TriggerHandler: Added
signal manual handler
heartbeat[15490]: 2008/02/27_08:42:39 info: Removing /var/run/heartbeat/rsctmp
failed, recreating.
heartbeat[15490]: 2008/02/27_08:42:39 info: glib: UDP Broadcast heartbeat
started on port 694 (694) interface eth0
heartbeat[15490]: 2008/02/27_08:42:39 info: glib: UDP Broadcast heartbeat
closed on port 694 interface eth0 - Status: 1
heartbeat[15490]: 2008/02/27_08:42:39 info: glib: ucast: write socket priority
set to IPTOS_LOWDELAY on eth1
heartbeat[15490]: 2008/02/27_08:42:39 info: glib: ucast: bound send socket to
device: eth1
heartbeat[15490]: 2008/02/27_08:42:39 info: glib: ucast: bound receive socket
to device: eth1
heartbeat[15490]: 2008/02/27_08:42:39 info: glib: ucast: started on port 694
interface eth1 to 192.168.100.2
heartbeat[15490]: 2008/02/27_08:42:39 info: glib: ping heartbeat started.
heartbeat[15490]: 2008/02/27_08:42:39 info: G_main_add_SignalHandler: Added
signal handler for signal 17
heartbeat[15490]: 2008/02/27_08:42:39 info: Local status now set to: 'up'
heartbeat[15490]: 2008/02/27_08:42:40 info: Link 10.0.0.133:10.0.0.133 up.
heartbeat[15490]: 2008/02/27_08:42:40 info: Status update for node 10.0.0.133:
status ping
heartbeat[15490]: 2008/02/27_08:42:40 info: Link node01:eth0 up.
heartbeat[15490]: 2008/02/27_08:42:45 info: Link node02:eth0 up.
heartbeat[15490]: 2008/02/27_08:42:45 info: Status update for node node02:
status up
heartbeat[15501]: 2008/02/27_08:42:45 debug: notify_world: setting SIGCHLD
Handler to SIG_DFL
heartbeat[15490]: 2008/02/27_08:42:45 info: Link node02:eth1 up.
harc[15501][15505]: 2008/02/27_08:42:45 info: Running /etc/ha.d/rc.d/status
status
heartbeat[15490]: 2008/02/27_08:42:45 debug: get_delnodelist: delnodelist=
heartbeat[15490]: 2008/02/27_08:42:45 info: Comm_now_up(): updating status to
active
heartbeat[15490]: 2008/02/27_08:42:45 info: Local status now set to: 'active'
heartbeat[15490]: 2008/02/27_08:42:45 info: Starting child client
"/usr/lib64/heartbeat/ipfail" (90,90)
heartbeat[15490]: 2008/02/27_08:42:45 WARN: G_CH_dispatch_int: Dispatch
function for read child took too long to execute: 90 ms (> 50 ms) (GSource:
0x653408)
heartbeat[15508]: 2008/02/27_08:42:45 info: Starting
"/usr/lib64/heartbeat/ipfail" as uid 90 gid 90 (pid 15508)
heartbeat[15490]: 2008/02/27_08:42:45 info: Status update for node node02:
status active
heartbeat[15509]: 2008/02/27_08:42:45 debug: notify_world: setting SIGCHLD
Handler to SIG_DFL
harc[15509][15512]: 2008/02/27_08:42:45 info: Running /etc/ha.d/rc.d/status
status
heartbeat[15490]: 2008/02/27_08:42:56 info: remote resource transition
completed.
heartbeat[15490]: 2008/02/27_08:42:56 info: remote resource transition
completed.
heartbeat[15490]: 2008/02/27_08:42:56 info: Initial resource acquisition
complete (T_RESOURCES(us))
heartbeat[15515]: 2008/02/27_08:42:56 info: No local resources
[/usr/lib64/heartbeat/ResourceManager listkeys node01] to acquire.
heartbeat[15490]: 2008/02/27_08:44:45 WARN: node 10.0.0.133: is dead
heartbeat[15490]: 2008/02/27_08:44:45 info: Link 10.0.0.133:10.0.0.133 dead.
heartbeat[15592]: 2008/02/27_08:44:45 debug: notify_world: setting SIGCHLD
Handler to SIG_DFL
harc[15592][15595]: 2008/02/27_08:44:45 info: Running /etc/ha.d/rc.d/status
status
heartbeat[15490]: 2008/02/27_08:44:46 info: Link node02:eth0 dead.
heartbeat[15490]: 2008/02/27_08:44:56 info: node01 wants to go standby [all]
heartbeat[15490]: 2008/02/27_08:44:57 info: standby: node02 can take our all
resources
heartbeat[15604]: 2008/02/27_08:44:57 info: give up all HA resources (standby).
ResourceManager[15614][15622]: 2008/02/27_08:44:57 info: Releasing resource
group: node02 10.0.0.90 nfslock nfs_wrapper
ResourceManager[15614][15632]: 2008/02/27_08:44:57 info: Running
/etc/init.d/nfs_wrapper stop
ResourceManager[15614][15633]: 2008/02/27_08:44:57 debug: Starting
/etc/init.d/nfs_wrapper stop
heartbeat[15490]: 2008/02/27_08:46:22 WARN: node node02: is dead
heartbeat[15490]: 2008/02/27_08:46:22 info: Cancelling pending standby operation
heartbeat[15490]: 2008/02/27_08:46:22 WARN: No STONITH device configured.
heartbeat[15490]: 2008/02/27_08:46:22 WARN: Shared disks are not protected.
heartbeat[15490]: 2008/02/27_08:46:22 info: Resources being acquired from
node02.
heartbeat[15490]: 2008/02/27_08:46:22 debug: StartNextRemoteRscReq(): child
count 1
heartbeat[15490]: 2008/02/27_08:46:22 info: Link node02:eth1 dead.
heartbeat[15710]: 2008/02/27_08:46:23 info: No local resources
[/usr/lib64/heartbeat/ResourceManager listkeys node01] to acquire.
heartbeat[15490]: 2008/02/27_08:46:23 debug: StartNextRemoteRscReq(): child
count 1
heartbeat[15490]: 2008/02/27_08:46:59 info: Link 10.0.0.133:10.0.0.133 up.
heartbeat[15490]: 2008/02/27_08:46:59 WARN: Late heartbeat: Node 10.0.0.133:
interval 164130 ms
heartbeat[15490]: 2008/02/27_08:46:59 info: Status update for node 10.0.0.133:
status ping
ResourceManager[15614][15749]: 2008/02/27_08:47:01 debug:
/etc/init.d/nfs_wrapper stop done. RC=0
ResourceManager[15614][15765]: 2008/02/27_08:47:01 info: Running
/etc/init.d/nfslock stop
ResourceManager[15614][15766]: 2008/02/27_08:47:02 debug: Starting
/etc/init.d/nfslock stop
ResourceManager[15614][15804]: 2008/02/27_08:47:03 debug: /etc/init.d/nfslock
stop done. RC=0
ResourceManager[15614][15820]: 2008/02/27_08:47:04 info: Running
/etc/ha.d/resource.d/IPaddr 10.0.0.90 stop
ResourceManager[15614][15821]: 2008/02/27_08:47:04 debug: Starting
/etc/ha.d/resource.d/IPaddr 10.0.0.90 stop
IPaddr[15822][15839]: 2008/02/27_08:47:04 INFO: Success
ResourceManager[15614][15840]: 2008/02/27_08:47:04 debug:
/etc/ha.d/resource.d/IPaddr 10.0.0.90 stop done. RC=0
heartbeat[15604]: 2008/02/27_08:47:04 info: all HA resource release completed
(standby).
heartbeat[15490]: 2008/02/27_08:47:04 ERROR: Ignored standby message 'done'
from node01 in state 0
heartbeat[15490]: 2008/02/27_08:47:04 WARN: G_SIG_dispatch: Dispatch function
for SIGCHLD took too long to execute: 40 ms (> 10 ms) (GSource: 0x64ab68)
heartbeat[15841]: 2008/02/27_08:47:04 debug: notify_world: setting SIGCHLD
Handler to SIG_DFL
harc[15841][15844]: 2008/02/27_08:47:04 info: Running /etc/ha.d/rc.d/status
status
mach_down[15847][15862]: 2008/02/27_08:47:05 info: Taking over resource group
10.0.0.90
ResourceManager[15863][15871]: 2008/02/27_08:47:05 info: Acquiring resource
group: node02 10.0.0.90 nfslock nfs_wrapper
IPaddr[15883][15900]: 2008/02/27_08:47:06 INFO: Resource is stopped
ResourceManager[15863][15916]: 2008/02/27_08:47:06 info: Running
/etc/ha.d/resource.d/IPaddr 10.0.0.90 start
ResourceManager[15863][15917]: 2008/02/27_08:47:06 debug: Starting
/etc/ha.d/resource.d/IPaddr 10.0.0.90 start
IPaddr[15927][15936]: 2008/02/27_08:47:07 INFO: Using calculated nic for
10.0.0.90: eth0
IPaddr[15927][15941]: 2008/02/27_08:47:07 DEBUG: Using calculated netmask for
10.0.0.90: 255.255.255.192
IPaddr[15927][15946]: 2008/02/27_08:47:07 DEBUG: Using calculated broadcast for
10.0.0.90: 10.0.0.127
IPaddr[15927][15963]: 2008/02/27_08:47:07 INFO: eval /sbin/ifconfig eth0:0
10.0.0.90 netmask 255.255.255.192 broadcast 10.0.0.127
IPaddr[15927][15968]: 2008/02/27_08:47:08 DEBUG: Sending Gratuitous Arp for
10.0.0.90 on eth0:0 [eth0]
IPaddr[15918][15982]: 2008/02/27_08:47:08 INFO: Success
ResourceManager[15863][15983]: 2008/02/27_08:47:08 debug:
/etc/ha.d/resource.d/IPaddr 10.0.0.90 start done. RC=0
ResourceManager[15863][16014]: 2008/02/27_08:47:09 info: Running
/etc/init.d/nfslock start
ResourceManager[15863][16015]: 2008/02/27_08:47:09 debug: Starting
/etc/init.d/nfslock start
ResourceManager[15863][16035]: 2008/02/27_08:47:10 debug: /etc/init.d/nfslock
start done. RC=0
ResourceManager[15863][16070]: 2008/02/27_08:47:10 info: Running
/etc/init.d/nfs_wrapper start
ResourceManager[15863][16071]: 2008/02/27_08:47:10 debug: Starting
/etc/init.d/nfs_wrapper start
ResourceManager[15863][16118]: 2008/02/27_08:47:11 debug:
/etc/init.d/nfs_wrapper start done. RC=0
mach_down[15847][16119]: 2008/02/27_08:47:11 info:
/usr/lib64/heartbeat/mach_down: nice_failback: foreign resources acquired
mach_down[15847][16123]: 2008/02/27_08:47:11 info: mach_down takeover complete
for node node02.
heartbeat[15490]: 2008/02/27_08:47:11 info: mach_down takeover complete.
logd[23277]: 2008/02/27_08:42:43 info: logd started with /etc/logd.cf.
logd[23280]: 2008/02/27_08:42:44 info: G_main_add_SignalHandler: Added signal
handler for signal 15
logd[23277]: 2008/02/27_08:42:44 info: G_main_add_SignalHandler: Added signal
handler for signal 15
heartbeat[23340]: 2008/02/27_08:42:44 info: Enabling logging daemon
heartbeat[23340]: 2008/02/27_08:42:44 info: logfile and debug file are those
specified in logd config file (default /etc/logd.cf)
heartbeat[23340]: 2008/02/27_08:42:44 info: **************************
heartbeat[23340]: 2008/02/27_08:42:44 info: Configuration validated. Starting
heartbeat 2.0.8
heartbeat[23341]: 2008/02/27_08:42:44 info: heartbeat: version 2.0.8
heartbeat[23341]: 2008/02/27_08:42:44 info: Heartbeat generation: 35
heartbeat[23341]: 2008/02/27_08:42:44 info: G_main_add_TriggerHandler: Added
signal manual handler
heartbeat[23341]: 2008/02/27_08:42:44 info: G_main_add_TriggerHandler: Added
signal manual handler
heartbeat[23341]: 2008/02/27_08:42:44 info: Removing /var/run/heartbeat/rsctmp
failed, recreating.
heartbeat[23341]: 2008/02/27_08:42:44 info: glib: UDP Broadcast heartbeat
started on port 694 (694) interface eth0
heartbeat[23341]: 2008/02/27_08:42:44 info: glib: UDP Broadcast heartbeat
closed on port 694 interface eth0 - Status: 1
heartbeat[23341]: 2008/02/27_08:42:44 info: glib: ucast: write socket priority
set to IPTOS_LOWDELAY on eth1
heartbeat[23341]: 2008/02/27_08:42:44 info: glib: ucast: bound send socket to
device: eth1
heartbeat[23341]: 2008/02/27_08:42:44 info: glib: ucast: bound receive socket
to device: eth1
heartbeat[23341]: 2008/02/27_08:42:44 info: glib: ucast: started on port 694
interface eth1 to 192.168.100.1
heartbeat[23341]: 2008/02/27_08:42:44 info: glib: ping heartbeat started.
heartbeat[23341]: 2008/02/27_08:42:45 info: G_main_add_SignalHandler: Added
signal handler for signal 17
heartbeat[23341]: 2008/02/27_08:42:45 info: Local status now set to: 'up'
heartbeat[23341]: 2008/02/27_08:42:46 info: Link node01:eth0 up.
heartbeat[23341]: 2008/02/27_08:42:46 info: Status update for node node01:
status up
heartbeat[23341]: 2008/02/27_08:42:46 info: Link node02:eth0 up.
heartbeat[23341]: 2008/02/27_08:42:46 info: Link 10.0.0.133:10.0.0.133 up.
heartbeat[23341]: 2008/02/27_08:42:46 info: Status update for node 10.0.0.133:
status ping
heartbeat[23341]: 2008/02/27_08:42:46 debug: get_delnodelist: delnodelist=
heartbeat[23352]: 2008/02/27_08:42:46 debug: notify_world: setting SIGCHLD
Handler to SIG_DFL
heartbeat[23341]: 2008/02/27_08:42:46 info: Comm_now_up(): updating status to
active
heartbeat[23341]: 2008/02/27_08:42:46 info: Local status now set to: 'active'
heartbeat[23341]: 2008/02/27_08:42:46 info: Starting child client
"/usr/lib64/heartbeat/ipfail" (90,90)
heartbeat[23341]: 2008/02/27_08:42:47 WARN: G_CH_dispatch_int: Dispatch
function for read child took too long to execute: 90 ms (> 50 ms) (GSource:
0x653408)
heartbeat[23341]: 2008/02/27_08:42:47 info: Status update for node node01:
status active
heartbeat[23341]: 2008/02/27_08:42:47 debug: StartNextRemoteRscReq(): child
count 1
heartbeat[23341]: 2008/02/27_08:42:47 info: Link node01:eth1 up.
heartbeat[23356]: 2008/02/27_08:42:47 info: Starting
"/usr/lib64/heartbeat/ipfail" as uid 90 gid 90 (pid 23356)
harc[23352][23357]: 2008/02/27_08:42:47 info: Running /etc/ha.d/rc.d/status
status
heartbeat[23360]: 2008/02/27_08:42:47 debug: notify_world: setting SIGCHLD
Handler to SIG_DFL
harc[23360][23363]: 2008/02/27_08:42:47 info: Running /etc/ha.d/rc.d/status
status
heartbeat[23341]: 2008/02/27_08:42:57 info: local resource transition completed.
heartbeat[23341]: 2008/02/27_08:42:57 info: Initial resource acquisition
complete (T_RESOURCES(us))
heartbeat[23341]: 2008/02/27_08:42:57 info: remote resource transition
completed.
IPaddr[23393][23410]: 2008/02/27_08:42:57 INFO: Resource is stopped
heartbeat[23366]: 2008/02/27_08:42:57 info: Local Resource acquisition
completed.
heartbeat[23341]: 2008/02/27_08:42:57 debug: StartNextRemoteRscReq(): child
count 1
heartbeat[23414]: 2008/02/27_08:42:57 debug: notify_world: setting SIGCHLD
Handler to SIG_DFL
harc[23414][23417]: 2008/02/27_08:42:57 info: Running
/etc/ha.d/rc.d/ip-request-resp ip-request-resp
ip-request-resp[23414][23420]: 2008/02/27_08:42:57 received ip-request-resp
10.0.0.90 OK yes
ResourceManager[23421][23429]: 2008/02/27_08:42:57 info: Acquiring resource
group: node02 10.0.0.90 nfslock nfs_wrapper
IPaddr[23441][23458]: 2008/02/27_08:42:58 INFO: Resource is stopped
ResourceManager[23421][23474]: 2008/02/27_08:42:58 info: Running
/etc/ha.d/resource.d/IPaddr 10.0.0.90 start
ResourceManager[23421][23475]: 2008/02/27_08:42:58 debug: Starting
/etc/ha.d/resource.d/IPaddr 10.0.0.90 start
IPaddr[23485][23494]: 2008/02/27_08:42:58 INFO: Using calculated nic for
10.0.0.90: eth0
IPaddr[23485][23499]: 2008/02/27_08:42:58 DEBUG: Using calculated netmask for
10.0.0.90: 255.255.255.192
IPaddr[23485][23504]: 2008/02/27_08:42:58 DEBUG: Using calculated broadcast for
10.0.0.90: 10.0.0.127
IPaddr[23485][23521]: 2008/02/27_08:42:58 INFO: eval /sbin/ifconfig eth0:0
10.0.0.90 netmask 255.255.255.192 broadcast 10.0.0.127
IPaddr[23485][23526]: 2008/02/27_08:42:58 DEBUG: Sending Gratuitous Arp for
10.0.0.90 on eth0:0 [eth0]
IPaddr[23476][23540]: 2008/02/27_08:42:59 INFO: Success
ResourceManager[23421][23541]: 2008/02/27_08:42:59 debug:
/etc/ha.d/resource.d/IPaddr 10.0.0.90 start done. RC=0
ResourceManager[23421][23572]: 2008/02/27_08:42:59 info: Running
/etc/init.d/nfslock start
ResourceManager[23421][23573]: 2008/02/27_08:42:59 debug: Starting
/etc/init.d/nfslock start
ResourceManager[23421][23593]: 2008/02/27_08:42:59 debug: /etc/init.d/nfslock
start done. RC=0
ResourceManager[23421][23628]: 2008/02/27_08:43:00 info: Running
/etc/init.d/nfs_wrapper start
ResourceManager[23421][23629]: 2008/02/27_08:43:00 debug: Starting
/etc/init.d/nfs_wrapper start
ResourceManager[23421][23676]: 2008/02/27_08:43:01 debug:
/etc/init.d/nfs_wrapper start done. RC=0
heartbeat[23341]: 2008/02/27_08:44:46 info: Link node01:eth0 dead.
heartbeat[23341]: 2008/02/27_08:44:58 info: node01 wants to go standby [all]
logd[4645]: 2008/02/27_08:48:48 info: logd started with /etc/logd.cf.
logd[4650]: 2008/02/27_08:48:48 info: G_main_add_SignalHandler: Added signal
handler for signal 15
logd[4645]: 2008/02/27_08:48:48 info: G_main_add_SignalHandler: Added signal
handler for signal 15
...
logd[15280]: 2008/02/27_08:40:35 info: logd started with /etc/logd.cf.
logd[15281]: 2008/02/27_08:40:35 info: G_main_add_SignalHandler: Added signal
handler for signal 15
logd[15280]: 2008/02/27_08:40:35 info: G_main_add_SignalHandler: Added signal
handler for signal 15
heartbeat[15343]: 2008/02/27_08:40:35 info: Enabling logging daemon
heartbeat[15343]: 2008/02/27_08:40:35 info: logfile and debug file are those
specified in logd config file (default /etc/logd.cf)
heartbeat[15343]: 2008/02/27_08:40:35 ERROR: Current node [node01] not in
configuration!
heartbeat[15343]: 2008/02/27_08:40:35 info: By default, cluster nodes are named
by `uname -n` and must be declared with a 'node' directive in the ha.cf file.
heartbeat[15343]: 2008/02/27_08:40:35 info: See also:
http://linux-ha.org/ha.cf/NodeDirective
heartbeat[15343]: 2008/02/27_08:40:35 ERROR: Configuration error, heartbeat not
started.
heartbeat[15489]: 2008/02/27_08:42:39 info: Enabling logging daemon
heartbeat[15489]: 2008/02/27_08:42:39 info: logfile and debug file are those
specified in logd config file (default /etc/logd.cf)
heartbeat[15489]: 2008/02/27_08:42:39 info: **************************
heartbeat[15489]: 2008/02/27_08:42:39 info: Configuration validated. Starting
heartbeat 2.0.8
heartbeat[15490]: 2008/02/27_08:42:39 info: heartbeat: version 2.0.8
heartbeat[15490]: 2008/02/27_08:42:39 info: Heartbeat generation: 37
heartbeat[15490]: 2008/02/27_08:42:39 info: G_main_add_TriggerHandler: Added
signal manual handler
heartbeat[15490]: 2008/02/27_08:42:39 info: G_main_add_TriggerHandler: Added
signal manual handler
heartbeat[15490]: 2008/02/27_08:42:39 info: Removing /var/run/heartbeat/rsctmp
failed, recreating.
heartbeat[15490]: 2008/02/27_08:42:39 info: glib: UDP Broadcast heartbeat
started on port 694 (694) interface eth0
heartbeat[15490]: 2008/02/27_08:42:39 info: glib: UDP Broadcast heartbeat
closed on port 694 interface eth0 - Status: 1
heartbeat[15490]: 2008/02/27_08:42:39 info: glib: ucast: write socket priority
set to IPTOS_LOWDELAY on eth1
heartbeat[15490]: 2008/02/27_08:42:39 info: glib: ucast: bound send socket to
device: eth1
heartbeat[15490]: 2008/02/27_08:42:39 info: glib: ucast: bound receive socket
to device: eth1
heartbeat[15490]: 2008/02/27_08:42:39 info: glib: ucast: started on port 694
interface eth1 to 192.168.100.2
heartbeat[15490]: 2008/02/27_08:42:39 info: glib: ping heartbeat started.
heartbeat[15490]: 2008/02/27_08:42:39 info: G_main_add_SignalHandler: Added
signal handler for signal 17
heartbeat[15490]: 2008/02/27_08:42:39 info: Local status now set to: 'up'
heartbeat[15490]: 2008/02/27_08:42:40 info: Link 10.0.0.133:10.0.0.133 up.
heartbeat[15490]: 2008/02/27_08:42:40 info: Status update for node 10.0.0.133:
status ping
heartbeat[15490]: 2008/02/27_08:42:40 info: Link node01:eth0 up.
heartbeat[15490]: 2008/02/27_08:42:45 info: Link node02:eth0 up.
heartbeat[15490]: 2008/02/27_08:42:45 info: Status update for node node02:
status up
heartbeat[15490]: 2008/02/27_08:42:45 info: Link node02:eth1 up.
harc[15501][15505]: 2008/02/27_08:42:45 info: Running /etc/ha.d/rc.d/status
status
heartbeat[15490]: 2008/02/27_08:42:45 info: Comm_now_up(): updating status to
active
heartbeat[15490]: 2008/02/27_08:42:45 info: Local status now set to: 'active'
heartbeat[15490]: 2008/02/27_08:42:45 info: Starting child client
"/usr/lib64/heartbeat/ipfail" (90,90)
heartbeat[15490]: 2008/02/27_08:42:45 WARN: G_CH_dispatch_int: Dispatch
function for read child took too long to execute: 90 ms (> 50 ms) (GSource:
0x653408)
heartbeat[15508]: 2008/02/27_08:42:45 info: Starting
"/usr/lib64/heartbeat/ipfail" as uid 90 gid 90 (pid 15508)
heartbeat[15490]: 2008/02/27_08:42:45 info: Status update for node node02:
status active
harc[15509][15512]: 2008/02/27_08:42:45 info: Running /etc/ha.d/rc.d/status
status
heartbeat[15490]: 2008/02/27_08:42:56 info: remote resource transition
completed.
heartbeat[15490]: 2008/02/27_08:42:56 info: remote resource transition
completed.
heartbeat[15490]: 2008/02/27_08:42:56 info: Initial resource acquisition
complete (T_RESOURCES(us))
heartbeat[15515]: 2008/02/27_08:42:56 info: No local resources
[/usr/lib64/heartbeat/ResourceManager listkeys node01] to acquire.
heartbeat[15490]: 2008/02/27_08:44:45 WARN: node 10.0.0.133: is dead
heartbeat[15490]: 2008/02/27_08:44:45 info: Link 10.0.0.133:10.0.0.133 dead.
harc[15592][15595]: 2008/02/27_08:44:45 info: Running /etc/ha.d/rc.d/status
status
heartbeat[15490]: 2008/02/27_08:44:46 info: Link node02:eth0 dead.
heartbeat[15490]: 2008/02/27_08:44:56 info: node01 wants to go standby [all]
heartbeat[15490]: 2008/02/27_08:44:57 info: standby: node02 can take our all
resources
heartbeat[15604]: 2008/02/27_08:44:57 info: give up all HA resources (standby).
ResourceManager[15614][15622]: 2008/02/27_08:44:57 info: Releasing resource
group: node02 10.0.0.90 nfslock nfs_wrapper
ResourceManager[15614][15632]: 2008/02/27_08:44:57 info: Running
/etc/init.d/nfs_wrapper stop
heartbeat[15490]: 2008/02/27_08:46:22 WARN: node node02: is dead
heartbeat[15490]: 2008/02/27_08:46:22 info: Cancelling pending standby operation
heartbeat[15490]: 2008/02/27_08:46:22 WARN: No STONITH device configured.
heartbeat[15490]: 2008/02/27_08:46:22 WARN: Shared disks are not protected.
heartbeat[15490]: 2008/02/27_08:46:22 info: Resources being acquired from
node02.
heartbeat[15490]: 2008/02/27_08:46:22 info: Link node02:eth1 dead.
heartbeat[15710]: 2008/02/27_08:46:23 info: No local resources
[/usr/lib64/heartbeat/ResourceManager listkeys node01] to acquire.
heartbeat[15490]: 2008/02/27_08:46:59 info: Link 10.0.0.133:10.0.0.133 up.
heartbeat[15490]: 2008/02/27_08:46:59 WARN: Late heartbeat: Node 10.0.0.133:
interval 164130 ms
heartbeat[15490]: 2008/02/27_08:46:59 info: Status update for node 10.0.0.133:
status ping
ResourceManager[15614][15765]: 2008/02/27_08:47:01 info: Running
/etc/init.d/nfslock stop
ResourceManager[15614][15820]: 2008/02/27_08:47:04 info: Running
/etc/ha.d/resource.d/IPaddr 10.0.0.90 stop
IPaddr[15822][15839]: 2008/02/27_08:47:04 INFO: Success
heartbeat[15604]: 2008/02/27_08:47:04 info: all HA resource release completed
(standby).
heartbeat[15490]: 2008/02/27_08:47:04 ERROR: Ignored standby message 'done'
from node01 in state 0
heartbeat[15490]: 2008/02/27_08:47:04 WARN: G_SIG_dispatch: Dispatch function
for SIGCHLD took too long to execute: 40 ms (> 10 ms) (GSource: 0x64ab68)
harc[15841][15844]: 2008/02/27_08:47:04 info: Running /etc/ha.d/rc.d/status
status
mach_down[15847][15862]: 2008/02/27_08:47:05 info: Taking over resource group
10.0.0.90
ResourceManager[15863][15871]: 2008/02/27_08:47:05 info: Acquiring resource
group: node02 10.0.0.90 nfslock nfs_wrapper
IPaddr[15883][15900]: 2008/02/27_08:47:06 INFO: Resource is stopped
ResourceManager[15863][15916]: 2008/02/27_08:47:06 info: Running
/etc/ha.d/resource.d/IPaddr 10.0.0.90 start
IPaddr[15927][15936]: 2008/02/27_08:47:07 INFO: Using calculated nic for
10.0.0.90: eth0
IPaddr[15927][15941]: 2008/02/27_08:47:07 DEBUG: Using calculated netmask for
10.0.0.90: 255.255.255.192
IPaddr[15927][15946]: 2008/02/27_08:47:07 DEBUG: Using calculated broadcast for
10.0.0.90: 10.0.0.127
IPaddr[15927][15963]: 2008/02/27_08:47:07 INFO: eval /sbin/ifconfig eth0:0
10.0.0.90 netmask 255.255.255.192 broadcast 10.0.0.127
IPaddr[15927][15968]: 2008/02/27_08:47:08 DEBUG: Sending Gratuitous Arp for
10.0.0.90 on eth0:0 [eth0]
IPaddr[15918][15982]: 2008/02/27_08:47:08 INFO: Success
ResourceManager[15863][16014]: 2008/02/27_08:47:09 info: Running
/etc/init.d/nfslock start
ResourceManager[15863][16070]: 2008/02/27_08:47:10 info: Running
/etc/init.d/nfs_wrapper start
mach_down[15847][16119]: 2008/02/27_08:47:11 info:
/usr/lib64/heartbeat/mach_down: nice_failback: foreign resources acquired
mach_down[15847][16123]: 2008/02/27_08:47:11 info: mach_down takeover complete
for node node02.
heartbeat[15490]: 2008/02/27_08:47:11 info: mach_down takeover complete.
logd[23277]: 2008/02/27_08:42:43 info: logd started with /etc/logd.cf.
logd[23280]: 2008/02/27_08:42:44 info: G_main_add_SignalHandler: Added signal
handler for signal 15
logd[23277]: 2008/02/27_08:42:44 info: G_main_add_SignalHandler: Added signal
handler for signal 15
heartbeat[23340]: 2008/02/27_08:42:44 info: Enabling logging daemon
heartbeat[23340]: 2008/02/27_08:42:44 info: logfile and debug file are those
specified in logd config file (default /etc/logd.cf)
heartbeat[23340]: 2008/02/27_08:42:44 info: **************************
heartbeat[23340]: 2008/02/27_08:42:44 info: Configuration validated. Starting
heartbeat 2.0.8
heartbeat[23341]: 2008/02/27_08:42:44 info: heartbeat: version 2.0.8
heartbeat[23341]: 2008/02/27_08:42:44 info: Heartbeat generation: 35
heartbeat[23341]: 2008/02/27_08:42:44 info: G_main_add_TriggerHandler: Added
signal manual handler
heartbeat[23341]: 2008/02/27_08:42:44 info: G_main_add_TriggerHandler: Added
signal manual handler
heartbeat[23341]: 2008/02/27_08:42:44 info: Removing /var/run/heartbeat/rsctmp
failed, recreating.
heartbeat[23341]: 2008/02/27_08:42:44 info: glib: UDP Broadcast heartbeat
started on port 694 (694) interface eth0
heartbeat[23341]: 2008/02/27_08:42:44 info: glib: UDP Broadcast heartbeat
closed on port 694 interface eth0 - Status: 1
heartbeat[23341]: 2008/02/27_08:42:44 info: glib: ucast: write socket priority
set to IPTOS_LOWDELAY on eth1
heartbeat[23341]: 2008/02/27_08:42:44 info: glib: ucast: bound send socket to
device: eth1
heartbeat[23341]: 2008/02/27_08:42:44 info: glib: ucast: bound receive socket
to device: eth1
heartbeat[23341]: 2008/02/27_08:42:44 info: glib: ucast: started on port 694
interface eth1 to 192.168.100.1
heartbeat[23341]: 2008/02/27_08:42:44 info: glib: ping heartbeat started.
heartbeat[23341]: 2008/02/27_08:42:45 info: G_main_add_SignalHandler: Added
signal handler for signal 17
heartbeat[23341]: 2008/02/27_08:42:45 info: Local status now set to: 'up'
heartbeat[23341]: 2008/02/27_08:42:46 info: Link node01:eth0 up.
heartbeat[23341]: 2008/02/27_08:42:46 info: Status update for node node01:
status up
heartbeat[23341]: 2008/02/27_08:42:46 info: Link node02:eth0 up.
heartbeat[23341]: 2008/02/27_08:42:46 info: Link 10.0.0.133:10.0.0.133 up.
heartbeat[23341]: 2008/02/27_08:42:46 info: Status update for node 10.0.0.133:
status ping
heartbeat[23341]: 2008/02/27_08:42:46 info: Comm_now_up(): updating status to
active
heartbeat[23341]: 2008/02/27_08:42:46 info: Local status now set to: 'active'
heartbeat[23341]: 2008/02/27_08:42:46 info: Starting child client
"/usr/lib64/heartbeat/ipfail" (90,90)
heartbeat[23341]: 2008/02/27_08:42:47 WARN: G_CH_dispatch_int: Dispatch
function for read child took too long to execute: 90 ms (> 50 ms) (GSource:
0x653408)
heartbeat[23341]: 2008/02/27_08:42:47 info: Status update for node node01:
status active
heartbeat[23341]: 2008/02/27_08:42:47 info: Link node01:eth1 up.
heartbeat[23356]: 2008/02/27_08:42:47 info: Starting
"/usr/lib64/heartbeat/ipfail" as uid 90 gid 90 (pid 23356)
harc[23352][23357]: 2008/02/27_08:42:47 info: Running /etc/ha.d/rc.d/status
status
harc[23360][23363]: 2008/02/27_08:42:47 info: Running /etc/ha.d/rc.d/status
status
heartbeat[23341]: 2008/02/27_08:42:57 info: local resource transition completed.
heartbeat[23341]: 2008/02/27_08:42:57 info: Initial resource acquisition
complete (T_RESOURCES(us))
heartbeat[23341]: 2008/02/27_08:42:57 info: remote resource transition
completed.
IPaddr[23393][23410]: 2008/02/27_08:42:57 INFO: Resource is stopped
heartbeat[23366]: 2008/02/27_08:42:57 info: Local Resource acquisition
completed.
harc[23414][23417]: 2008/02/27_08:42:57 info: Running
/etc/ha.d/rc.d/ip-request-resp ip-request-resp
ip-request-resp[23414][23420]: 2008/02/27_08:42:57 received ip-request-resp
10.0.0.90 OK yes
ResourceManager[23421][23429]: 2008/02/27_08:42:57 info: Acquiring resource
group: node02 10.0.0.90 nfslock nfs_wrapper
IPaddr[23441][23458]: 2008/02/27_08:42:58 INFO: Resource is stopped
ResourceManager[23421][23474]: 2008/02/27_08:42:58 info: Running
/etc/ha.d/resource.d/IPaddr 10.0.0.90 start
IPaddr[23485][23494]: 2008/02/27_08:42:58 INFO: Using calculated nic for
10.0.0.90: eth0
IPaddr[23485][23499]: 2008/02/27_08:42:58 DEBUG: Using calculated netmask for
10.0.0.90: 255.255.255.192
IPaddr[23485][23504]: 2008/02/27_08:42:58 DEBUG: Using calculated broadcast for
10.0.0.90: 10.0.0.127
IPaddr[23485][23521]: 2008/02/27_08:42:58 INFO: eval /sbin/ifconfig eth0:0
10.0.0.90 netmask 255.255.255.192 broadcast 10.0.0.127
IPaddr[23485][23526]: 2008/02/27_08:42:58 DEBUG: Sending Gratuitous Arp for
10.0.0.90 on eth0:0 [eth0]
IPaddr[23476][23540]: 2008/02/27_08:42:59 INFO: Success
ResourceManager[23421][23572]: 2008/02/27_08:42:59 info: Running
/etc/init.d/nfslock start
ResourceManager[23421][23628]: 2008/02/27_08:43:00 info: Running
/etc/init.d/nfs_wrapper start
heartbeat[23341]: 2008/02/27_08:44:46 info: Link node01:eth0 dead.
heartbeat[23341]: 2008/02/27_08:44:58 info: node01 wants to go standby [all]
logd[4645]: 2008/02/27_08:48:48 info: logd started with /etc/logd.cf.
logd[4650]: 2008/02/27_08:48:48 info: G_main_add_SignalHandler: Added signal
handler for signal 15
logd[4645]: 2008/02/27_08:48:48 info: G_main_add_SignalHandler: Added signal
handler for signal 15
...
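The logs above are easier to compare once they are reduced to a timeline of the interesting events. What follows is a rough sketch, not a Heartbeat tool: the helper names `parse` and `timeline` are made up, the line format is inferred only from the entries posted here, and wrapped continuation lines are simply joined onto the preceding entry.

```python
import re
from datetime import datetime

# Prefix format inferred from the posted logs, e.g.
# "heartbeat[15490]: 2008/02/27_08:44:46 info: Link node02:eth0 dead."
LINE_RE = re.compile(
    r"^(?P<proc>[\w-]+)(?:\[\d+\])+: "
    r"(?P<ts>\d{4}/\d{2}/\d{2}_\d{2}:\d{2}:\d{2}) "
    r"(?:(?P<level>[A-Za-z]+): )?(?P<msg>.*)$"
)


def parse(lines):
    """Return [timestamp, process, level, message] entries.

    Lines without the "name[pid]: timestamp" prefix are treated as
    wrapped continuations and appended to the previous message.
    """
    events = []
    for raw in lines:
        line = raw.rstrip("\n")
        m = LINE_RE.match(line)
        if m:
            ts = datetime.strptime(m["ts"], "%Y/%m/%d_%H:%M:%S")
            events.append([ts, m["proc"], m["level"], m["msg"]])
        elif events and line.strip():
            events[-1][3] += " " + line.strip()
    return events


def timeline(events, pattern):
    """Return (seconds since first match, message) for matching messages."""
    hits = [e for e in events if re.search(pattern, e[3])]
    if not hits:
        return []
    t0 = hits[0][0]
    return [(int((e[0] - t0).total_seconds()), e[3]) for e in hits]


# A few lines copied from the NODE01 log above.
SAMPLE = """\
heartbeat[15490]: 2008/02/27_08:44:46 info: Link node02:eth0 dead.
heartbeat[15490]: 2008/02/27_08:44:56 info: node01 wants to go standby [all]
heartbeat[15490]: 2008/02/27_08:46:22 WARN: node node02: is dead
heartbeat[15490]: 2008/02/27_08:46:59 WARN: Late heartbeat: Node 10.0.0.133:
interval 164130 ms
""".splitlines()

for offset, msg in timeline(parse(SAMPLE), r"dead|standby"):
    print(f"+{offset:4d}s  {msg}")
```

On the sample lines this gives offsets of 0, 10 and 96 seconds for the link going dead, the standby request, and node02 being declared dead. Running it over the full ha-log from both nodes should make it easier to see what each node was doing just before NODE02 went down.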