Re: [Linux-HA] Secondary reboot

Dejan Muhamedagic Wed, 27 Feb 2008 03:37:04 -0800

Hi,

On Wed, Feb 27, 2008 at 09:45:06AM +1300, Luke Pascoe wrote:
> > How do you induce a failover?
>
> Right now just the simplest kind, a network failure. I'm simply 
> disconnecting the LAN interface on one of the hosts.
>
> It doesn't matter which host I disconnect, 02 reboots.
>
> > Anything in the logs? Perhaps post them.
>
> Nothing obvious. Attached are the ha-log and ha-debug files from NODE01 and 
> NODE02. NODE02 is primary and has the resources, I disconnect eth0 on 
> NODE01, NODE02 acquires it's resources (there are none) and then promptly 
> reboots.


Strange. So, it reboots without a warning. These are all logs you
found? All I found was:

heartbeat[15343]: 2008/02/27_08:40:35 ERROR: Current node [node01] not in 
configuration!

The node names should match what you get from uname -n.

The version you are running is 2.0.8 which is before fastfail was
introduced. And there's no stonith configured. I don't see how
heartbeat could reboot the host. Are you sure there's nothing
else rebooting it?

Thanks,

Dejan

> :(
>
> Regards
>
> Luke Pascoe
> Linux Systems Engineer
> Asterisk
>
> T 09 366 8835
> F 09 302 1772
> M 0274 266649
> E [EMAIL PROTECTED]
> W www.asterisk.co.nz
>
> Level 9, Gen-i Tower, 66 Wyndham Street
> PO Box 8804, Auckland, New Zealand
>
>
> Dejan Muhamedagic wrote:
>> Hi,
>> On Tue, Feb 26, 2008 at 04:35:06PM +1300, Luke Pascoe wrote:
>>> Hello
>>>
>>> I'm trying to do NFS failover in a test environment with an underlying 
>>> OCFS2 filesystem. This is something that's apparently been done before 
>>> and certainly HA NFS isn't new.
>>>
>>> I've followed several HowTos, all of which seem to suggest pretty much 
>>> the same setup, but I seem to get the same problem no matter how I 
>>> configure it.
>>>
>>> Here's the setup:
>>>
>>> 2 VMWare VMs (NODE01 and NODE02) running RHEL4 U5 x86_64 with a shared 
>>> fibre channel SAN volume. That volume is OCFS2 formatted and mounted as 
>>> /data on both hosts.
>>>
>>> Each host has 2 interfaces. An external facing 10.0.0.0/24 and an 
>>> internal 192.168.100.0/30
>>>
>>> Heartbeat is installed on both nodes and configured identically as 
>>> follows:
>>>
>>> =ha.cf=
>>> keepalive 2
>>> deadtime 30
>>> warntime 10
>>> initdead 120
>>> bcast   eth0
>>> ucast eth1 192.168.100.1 # Obviously on NODE02 this is 192.168.100.2
>>> auto_failback on
>>> node NODE02 NODE01
>>>  ping 10.0.0.133 # this is an unrelated LAN host
>>> respawn hacluster /usr/lib64/heartbeat/ipfail
>>>    use_logd yes
>>>  crm off
>>> =/ha.cf=
>>>
>>> =haresources=
>>> NODE01 10.199.133.90 nfslock nfs_wrapper
>>> =/haresources=
>>>
>>> BTW, crm is off because I tried it with it on and got EXACTLY the same 
>>> result.
>>>
>>> Here's the problem:
>>>
>>> Both hosts start up just fine, NODE01 picks up all 3 resources and 
>>> everything's roses. If I induce a failure on NODE01, NODE02 correctly 
>>> acquires the resources and everything is still roses. HOWEVER, ~30 
>>> seconds later NODE02 reboots. Now the odd thing is, it doesn't matter 
>>> which is the primary node, or which host has the failure or even which 
>>> has the resources, NODE02 always reboots when there's a failure.
>> How do you induce a failover?
>>> Even if the resources are started on NODE02 and NODE01 has a failure (ie, 
>>> everything should stay as it is, no failover required) 30 seconds after 
>>> the failure NODE02 reboots!!!
>>>
>>> I've got NTP syncing the time, so it's not a clock issue, and I've tried 
>>> twiddling just about every setting in the config, to no avail.
>>>
>>> Any help? Please?
>> Anything in the logs? Perhaps post them.
>> Thanks,
>> Dejan
>>> -- 
>>> Regards
>>>
>>> Luke Pascoe
>>> Linux Systems Engineer
>>> Asterisk
>>>
>>> T 09 366 8835
>>> F 09 302 1772
>>> M 0274 266649
>>> E [EMAIL PROTECTED]
>>> W www.asterisk.co.nz
>>>
>>> Level 9, Gen-i Tower, 66 Wyndham Street
>>> PO Box 8804, Auckland, New Zealand
>>> _______________________________________________
>>> Linux-HA mailing list
>>> Linux-HA@lists.linux-ha.org
>>> http://lists.linux-ha.org/mailman/listinfo/linux-ha
>>> See also: http://linux-ha.org/ReportingProblems
>> _______________________________________________
>> Linux-HA mailing list
>> Linux-HA@lists.linux-ha.org
>> http://lists.linux-ha.org/mailman/listinfo/linux-ha
>> See also: http://linux-ha.org/ReportingProblems

> logd[15280]: 2008/02/27_08:40:35 info: logd started with /etc/logd.cf.
> logd[15281]: 2008/02/27_08:40:35 info: G_main_add_SignalHandler: Added signal 
> handler for signal 15
> logd[15280]: 2008/02/27_08:40:35 info: G_main_add_SignalHandler: Added signal 
> handler for signal 15
> heartbeat[15343]: 2008/02/27_08:40:35 info: Enabling logging daemon 
> heartbeat[15343]: 2008/02/27_08:40:35 info: logfile and debug file are those 
> specified in logd config file (default /etc/logd.cf)
> heartbeat[15343]: 2008/02/27_08:40:35 ERROR: Current node [node01] not in 
> configuration!
> heartbeat[15343]: 2008/02/27_08:40:35 info: By default, cluster nodes are 
> named by `uname -n` and must be declared with a 'node' directive in the ha.cf 
> file.
> heartbeat[15343]: 2008/02/27_08:40:35 info: See also: 
> http://linux-ha.org/ha.cf/NodeDirective
> heartbeat[15343]: 2008/02/27_08:40:35 ERROR: Configuration error, heartbeat 
> not started.
> heartbeat[15489]: 2008/02/27_08:42:39 info: Enabling logging daemon 
> heartbeat[15489]: 2008/02/27_08:42:39 info: logfile and debug file are those 
> specified in logd config file (default /etc/logd.cf)
> heartbeat[15489]: 2008/02/27_08:42:39 info: **************************
> heartbeat[15489]: 2008/02/27_08:42:39 info: Configuration validated. Starting 
> heartbeat 2.0.8
> heartbeat[15490]: 2008/02/27_08:42:39 info: heartbeat: version 2.0.8
> heartbeat[15490]: 2008/02/27_08:42:39 info: Heartbeat generation: 37
> heartbeat[15490]: 2008/02/27_08:42:39 info: G_main_add_TriggerHandler: Added 
> signal manual handler
> heartbeat[15490]: 2008/02/27_08:42:39 info: G_main_add_TriggerHandler: Added 
> signal manual handler
> heartbeat[15490]: 2008/02/27_08:42:39 info: Removing 
> /var/run/heartbeat/rsctmp failed, recreating.
> heartbeat[15490]: 2008/02/27_08:42:39 info: glib: UDP Broadcast heartbeat 
> started on port 694 (694) interface eth0
> heartbeat[15490]: 2008/02/27_08:42:39 info: glib: UDP Broadcast heartbeat 
> closed on port 694 interface eth0 - Status: 1
> heartbeat[15490]: 2008/02/27_08:42:39 info: glib: ucast: write socket 
> priority set to IPTOS_LOWDELAY on eth1
> heartbeat[15490]: 2008/02/27_08:42:39 info: glib: ucast: bound send socket to 
> device: eth1
> heartbeat[15490]: 2008/02/27_08:42:39 info: glib: ucast: bound receive socket 
> to device: eth1
> heartbeat[15490]: 2008/02/27_08:42:39 info: glib: ucast: started on port 694 
> interface eth1 to 192.168.100.2
> heartbeat[15490]: 2008/02/27_08:42:39 info: glib: ping heartbeat started.
> heartbeat[15490]: 2008/02/27_08:42:39 info: G_main_add_SignalHandler: Added 
> signal handler for signal 17
> heartbeat[15490]: 2008/02/27_08:42:39 info: Local status now set to: 'up'
> heartbeat[15490]: 2008/02/27_08:42:40 info: Link 10.0.0.133:10.0.0.133 up.
> heartbeat[15490]: 2008/02/27_08:42:40 info: Status update for node 
> 10.0.0.133: status ping
> heartbeat[15490]: 2008/02/27_08:42:40 info: Link node01:eth0 up.
> heartbeat[15490]: 2008/02/27_08:42:45 info: Link node02:eth0 up.
> heartbeat[15490]: 2008/02/27_08:42:45 info: Status update for node node02: 
> status up
> heartbeat[15501]: 2008/02/27_08:42:45 debug: notify_world: setting SIGCHLD 
> Handler to SIG_DFL
> heartbeat[15490]: 2008/02/27_08:42:45 info: Link node02:eth1 up.
> harc[15501][15505]: 2008/02/27_08:42:45 info: Running /etc/ha.d/rc.d/status 
> status
> heartbeat[15490]: 2008/02/27_08:42:45 debug: get_delnodelist: delnodelist= 
> heartbeat[15490]: 2008/02/27_08:42:45 info: Comm_now_up(): updating status to 
> active
> heartbeat[15490]: 2008/02/27_08:42:45 info: Local status now set to: 'active'
> heartbeat[15490]: 2008/02/27_08:42:45 info: Starting child client 
> "/usr/lib64/heartbeat/ipfail" (90,90)
> heartbeat[15490]: 2008/02/27_08:42:45 WARN: G_CH_dispatch_int: Dispatch 
> function for read child took too long to execute: 90 ms (> 50 ms) (GSource: 
> 0x653408)
> heartbeat[15508]: 2008/02/27_08:42:45 info: Starting 
> "/usr/lib64/heartbeat/ipfail" as uid 90  gid 90 (pid 15508)
> heartbeat[15490]: 2008/02/27_08:42:45 info: Status update for node node02: 
> status active
> heartbeat[15509]: 2008/02/27_08:42:45 debug: notify_world: setting SIGCHLD 
> Handler to SIG_DFL
> harc[15509][15512]: 2008/02/27_08:42:45 info: Running /etc/ha.d/rc.d/status 
> status
> heartbeat[15490]: 2008/02/27_08:42:56 info: remote resource transition 
> completed.
> heartbeat[15490]: 2008/02/27_08:42:56 info: remote resource transition 
> completed.
> heartbeat[15490]: 2008/02/27_08:42:56 info: Initial resource acquisition 
> complete (T_RESOURCES(us))
> heartbeat[15515]: 2008/02/27_08:42:56 info: No local resources 
> [/usr/lib64/heartbeat/ResourceManager listkeys node01] to acquire.
> heartbeat[15490]: 2008/02/27_08:44:45 WARN: node 10.0.0.133: is dead
> heartbeat[15490]: 2008/02/27_08:44:45 info: Link 10.0.0.133:10.0.0.133 dead.
> heartbeat[15592]: 2008/02/27_08:44:45 debug: notify_world: setting SIGCHLD 
> Handler to SIG_DFL
> harc[15592][15595]: 2008/02/27_08:44:45 info: Running /etc/ha.d/rc.d/status 
> status
> heartbeat[15490]: 2008/02/27_08:44:46 info: Link node02:eth0 dead.
> heartbeat[15490]: 2008/02/27_08:44:56 info: node01 wants to go standby [all]
> heartbeat[15490]: 2008/02/27_08:44:57 info: standby: node02 can take our all 
> resources
> heartbeat[15604]: 2008/02/27_08:44:57 info: give up all HA resources 
> (standby).
> ResourceManager[15614][15622]: 2008/02/27_08:44:57 info: Releasing resource 
> group: node02 10.0.0.90 nfslock nfs_wrapper
> ResourceManager[15614][15632]: 2008/02/27_08:44:57 info: Running 
> /etc/init.d/nfs_wrapper  stop
> ResourceManager[15614][15633]: 2008/02/27_08:44:57 debug: Starting 
> /etc/init.d/nfs_wrapper  stop
> heartbeat[15490]: 2008/02/27_08:46:22 WARN: node node02: is dead
> heartbeat[15490]: 2008/02/27_08:46:22 info: Cancelling pending standby 
> operation
> heartbeat[15490]: 2008/02/27_08:46:22 WARN: No STONITH device configured.
> heartbeat[15490]: 2008/02/27_08:46:22 WARN: Shared disks are not protected.
> heartbeat[15490]: 2008/02/27_08:46:22 info: Resources being acquired from 
> node02.
> heartbeat[15490]: 2008/02/27_08:46:22 debug: StartNextRemoteRscReq(): child 
> count 1
> heartbeat[15490]: 2008/02/27_08:46:22 info: Link node02:eth1 dead.
> heartbeat[15710]: 2008/02/27_08:46:23 info: No local resources 
> [/usr/lib64/heartbeat/ResourceManager listkeys node01] to acquire.
> heartbeat[15490]: 2008/02/27_08:46:23 debug: StartNextRemoteRscReq(): child 
> count 1
> heartbeat[15490]: 2008/02/27_08:46:59 info: Link 10.0.0.133:10.0.0.133 up.
> heartbeat[15490]: 2008/02/27_08:46:59 WARN: Late heartbeat: Node 10.0.0.133: 
> interval 164130 ms
> heartbeat[15490]: 2008/02/27_08:46:59 info: Status update for node 
> 10.0.0.133: status ping
> ResourceManager[15614][15749]: 2008/02/27_08:47:01 debug: 
> /etc/init.d/nfs_wrapper  stop done. RC=0
> ResourceManager[15614][15765]: 2008/02/27_08:47:01 info: Running 
> /etc/init.d/nfslock  stop
> ResourceManager[15614][15766]: 2008/02/27_08:47:02 debug: Starting 
> /etc/init.d/nfslock  stop
> ResourceManager[15614][15804]: 2008/02/27_08:47:03 debug: /etc/init.d/nfslock 
>  stop done. RC=0
> ResourceManager[15614][15820]: 2008/02/27_08:47:04 info: Running 
> /etc/ha.d/resource.d/IPaddr 10.0.0.90 stop
> ResourceManager[15614][15821]: 2008/02/27_08:47:04 debug: Starting 
> /etc/ha.d/resource.d/IPaddr 10.0.0.90 stop
> IPaddr[15822][15839]: 2008/02/27_08:47:04 INFO:  Success
> ResourceManager[15614][15840]: 2008/02/27_08:47:04 debug: 
> /etc/ha.d/resource.d/IPaddr 10.0.0.90 stop done. RC=0
> heartbeat[15604]: 2008/02/27_08:47:04 info: all HA resource release completed 
> (standby).
> heartbeat[15490]: 2008/02/27_08:47:04 ERROR: Ignored standby message 'done' 
> from node01 in state 0
> heartbeat[15490]: 2008/02/27_08:47:04 WARN: G_SIG_dispatch: Dispatch function 
> for SIGCHLD took too long to execute: 40 ms (> 10 ms) (GSource: 0x64ab68)
> heartbeat[15841]: 2008/02/27_08:47:04 debug: notify_world: setting SIGCHLD 
> Handler to SIG_DFL
> harc[15841][15844]: 2008/02/27_08:47:04 info: Running /etc/ha.d/rc.d/status 
> status
> mach_down[15847][15862]: 2008/02/27_08:47:05 info: Taking over resource group 
> 10.0.0.90
> ResourceManager[15863][15871]: 2008/02/27_08:47:05 info: Acquiring resource 
> group: node02 10.0.0.90 nfslock nfs_wrapper
> IPaddr[15883][15900]: 2008/02/27_08:47:06 INFO:  Resource is stopped
> ResourceManager[15863][15916]: 2008/02/27_08:47:06 info: Running 
> /etc/ha.d/resource.d/IPaddr 10.0.0.90 start
> ResourceManager[15863][15917]: 2008/02/27_08:47:06 debug: Starting 
> /etc/ha.d/resource.d/IPaddr 10.0.0.90 start
> IPaddr[15927][15936]: 2008/02/27_08:47:07 INFO: Using calculated nic for 
> 10.0.0.90: eth0
> IPaddr[15927][15941]: 2008/02/27_08:47:07 DEBUG: Using calculated netmask for 
> 10.0.0.90: 255.255.255.192
> IPaddr[15927][15946]: 2008/02/27_08:47:07 DEBUG: Using calculated broadcast 
> for 10.0.0.90: 10.0.0.127
> IPaddr[15927][15963]: 2008/02/27_08:47:07 INFO: eval /sbin/ifconfig eth0:0 
> 10.0.0.90 netmask 255.255.255.192 broadcast 10.0.0.127
> IPaddr[15927][15968]: 2008/02/27_08:47:08 DEBUG: Sending Gratuitous Arp for 
> 10.0.0.90 on eth0:0 [eth0]
> IPaddr[15918][15982]: 2008/02/27_08:47:08 INFO:  Success
> ResourceManager[15863][15983]: 2008/02/27_08:47:08 debug: 
> /etc/ha.d/resource.d/IPaddr 10.0.0.90 start done. RC=0
> ResourceManager[15863][16014]: 2008/02/27_08:47:09 info: Running 
> /etc/init.d/nfslock  start
> ResourceManager[15863][16015]: 2008/02/27_08:47:09 debug: Starting 
> /etc/init.d/nfslock  start
> ResourceManager[15863][16035]: 2008/02/27_08:47:10 debug: /etc/init.d/nfslock 
>  start done. RC=0
> ResourceManager[15863][16070]: 2008/02/27_08:47:10 info: Running 
> /etc/init.d/nfs_wrapper  start
> ResourceManager[15863][16071]: 2008/02/27_08:47:10 debug: Starting 
> /etc/init.d/nfs_wrapper  start
> ResourceManager[15863][16118]: 2008/02/27_08:47:11 debug: 
> /etc/init.d/nfs_wrapper  start done. RC=0
> mach_down[15847][16119]: 2008/02/27_08:47:11 info: 
> /usr/lib64/heartbeat/mach_down: nice_failback: foreign resources acquired
> mach_down[15847][16123]: 2008/02/27_08:47:11 info: mach_down takeover 
> complete for node node02.
> heartbeat[15490]: 2008/02/27_08:47:11 info: mach_down takeover complete.

> logd[23277]: 2008/02/27_08:42:43 info: logd started with /etc/logd.cf.
> logd[23280]: 2008/02/27_08:42:44 info: G_main_add_SignalHandler: Added signal 
> handler for signal 15
> logd[23277]: 2008/02/27_08:42:44 info: G_main_add_SignalHandler: Added signal 
> handler for signal 15
> heartbeat[23340]: 2008/02/27_08:42:44 info: Enabling logging daemon 
> heartbeat[23340]: 2008/02/27_08:42:44 info: logfile and debug file are those 
> specified in logd config file (default /etc/logd.cf)
> heartbeat[23340]: 2008/02/27_08:42:44 info: **************************
> heartbeat[23340]: 2008/02/27_08:42:44 info: Configuration validated. Starting 
> heartbeat 2.0.8
> heartbeat[23341]: 2008/02/27_08:42:44 info: heartbeat: version 2.0.8
> heartbeat[23341]: 2008/02/27_08:42:44 info: Heartbeat generation: 35
> heartbeat[23341]: 2008/02/27_08:42:44 info: G_main_add_TriggerHandler: Added 
> signal manual handler
> heartbeat[23341]: 2008/02/27_08:42:44 info: G_main_add_TriggerHandler: Added 
> signal manual handler
> heartbeat[23341]: 2008/02/27_08:42:44 info: Removing 
> /var/run/heartbeat/rsctmp failed, recreating.
> heartbeat[23341]: 2008/02/27_08:42:44 info: glib: UDP Broadcast heartbeat 
> started on port 694 (694) interface eth0
> heartbeat[23341]: 2008/02/27_08:42:44 info: glib: UDP Broadcast heartbeat 
> closed on port 694 interface eth0 - Status: 1
> heartbeat[23341]: 2008/02/27_08:42:44 info: glib: ucast: write socket 
> priority set to IPTOS_LOWDELAY on eth1
> heartbeat[23341]: 2008/02/27_08:42:44 info: glib: ucast: bound send socket to 
> device: eth1
> heartbeat[23341]: 2008/02/27_08:42:44 info: glib: ucast: bound receive socket 
> to device: eth1
> heartbeat[23341]: 2008/02/27_08:42:44 info: glib: ucast: started on port 694 
> interface eth1 to 192.168.100.1
> heartbeat[23341]: 2008/02/27_08:42:44 info: glib: ping heartbeat started.
> heartbeat[23341]: 2008/02/27_08:42:45 info: G_main_add_SignalHandler: Added 
> signal handler for signal 17
> heartbeat[23341]: 2008/02/27_08:42:45 info: Local status now set to: 'up'
> heartbeat[23341]: 2008/02/27_08:42:46 info: Link node01:eth0 up.
> heartbeat[23341]: 2008/02/27_08:42:46 info: Status update for node node01: 
> status up
> heartbeat[23341]: 2008/02/27_08:42:46 info: Link node02:eth0 up.
> heartbeat[23341]: 2008/02/27_08:42:46 info: Link 10.0.0.133:10.0.0.133 up.
> heartbeat[23341]: 2008/02/27_08:42:46 info: Status update for node 
> 10.0.0.133: status ping
> heartbeat[23341]: 2008/02/27_08:42:46 debug: get_delnodelist: delnodelist= 
> heartbeat[23352]: 2008/02/27_08:42:46 debug: notify_world: setting SIGCHLD 
> Handler to SIG_DFL
> heartbeat[23341]: 2008/02/27_08:42:46 info: Comm_now_up(): updating status to 
> active
> heartbeat[23341]: 2008/02/27_08:42:46 info: Local status now set to: 'active'
> heartbeat[23341]: 2008/02/27_08:42:46 info: Starting child client 
> "/usr/lib64/heartbeat/ipfail" (90,90)
> heartbeat[23341]: 2008/02/27_08:42:47 WARN: G_CH_dispatch_int: Dispatch 
> function for read child took too long to execute: 90 ms (> 50 ms) (GSource: 
> 0x653408)
> heartbeat[23341]: 2008/02/27_08:42:47 info: Status update for node node01: 
> status active
> heartbeat[23341]: 2008/02/27_08:42:47 debug: StartNextRemoteRscReq(): child 
> count 1
> heartbeat[23341]: 2008/02/27_08:42:47 info: Link node01:eth1 up.
> heartbeat[23356]: 2008/02/27_08:42:47 info: Starting 
> "/usr/lib64/heartbeat/ipfail" as uid 90  gid 90 (pid 23356)
> harc[23352][23357]: 2008/02/27_08:42:47 info: Running /etc/ha.d/rc.d/status 
> status
> heartbeat[23360]: 2008/02/27_08:42:47 debug: notify_world: setting SIGCHLD 
> Handler to SIG_DFL
> harc[23360][23363]: 2008/02/27_08:42:47 info: Running /etc/ha.d/rc.d/status 
> status
> heartbeat[23341]: 2008/02/27_08:42:57 info: local resource transition 
> completed.
> heartbeat[23341]: 2008/02/27_08:42:57 info: Initial resource acquisition 
> complete (T_RESOURCES(us))
> heartbeat[23341]: 2008/02/27_08:42:57 info: remote resource transition 
> completed.
> IPaddr[23393][23410]: 2008/02/27_08:42:57 INFO:  Resource is stopped
> heartbeat[23366]: 2008/02/27_08:42:57 info: Local Resource acquisition 
> completed.
> heartbeat[23341]: 2008/02/27_08:42:57 debug: StartNextRemoteRscReq(): child 
> count 1
> heartbeat[23414]: 2008/02/27_08:42:57 debug: notify_world: setting SIGCHLD 
> Handler to SIG_DFL
> harc[23414][23417]: 2008/02/27_08:42:57 info: Running 
> /etc/ha.d/rc.d/ip-request-resp ip-request-resp
> ip-request-resp[23414][23420]: 2008/02/27_08:42:57 received ip-request-resp 
> 10.0.0.90 OK yes
> ResourceManager[23421][23429]: 2008/02/27_08:42:57 info: Acquiring resource 
> group: node02 10.0.0.90 nfslock nfs_wrapper
> IPaddr[23441][23458]: 2008/02/27_08:42:58 INFO:  Resource is stopped
> ResourceManager[23421][23474]: 2008/02/27_08:42:58 info: Running 
> /etc/ha.d/resource.d/IPaddr 10.0.0.90 start
> ResourceManager[23421][23475]: 2008/02/27_08:42:58 debug: Starting 
> /etc/ha.d/resource.d/IPaddr 10.0.0.90 start
> IPaddr[23485][23494]: 2008/02/27_08:42:58 INFO: Using calculated nic for 
> 10.0.0.90: eth0
> IPaddr[23485][23499]: 2008/02/27_08:42:58 DEBUG: Using calculated netmask for 
> 10.0.0.90: 255.255.255.192
> IPaddr[23485][23504]: 2008/02/27_08:42:58 DEBUG: Using calculated broadcast 
> for 10.0.0.90: 10.0.0.127
> IPaddr[23485][23521]: 2008/02/27_08:42:58 INFO: eval /sbin/ifconfig eth0:0 
> 10.0.0.90 netmask 255.255.255.192 broadcast 10.0.0.127
> IPaddr[23485][23526]: 2008/02/27_08:42:58 DEBUG: Sending Gratuitous Arp for 
> 10.0.0.90 on eth0:0 [eth0]
> IPaddr[23476][23540]: 2008/02/27_08:42:59 INFO:  Success
> ResourceManager[23421][23541]: 2008/02/27_08:42:59 debug: 
> /etc/ha.d/resource.d/IPaddr 10.0.0.90 start done. RC=0
> ResourceManager[23421][23572]: 2008/02/27_08:42:59 info: Running 
> /etc/init.d/nfslock  start
> ResourceManager[23421][23573]: 2008/02/27_08:42:59 debug: Starting 
> /etc/init.d/nfslock  start
> ResourceManager[23421][23593]: 2008/02/27_08:42:59 debug: /etc/init.d/nfslock 
>  start done. RC=0
> ResourceManager[23421][23628]: 2008/02/27_08:43:00 info: Running 
> /etc/init.d/nfs_wrapper  start
> ResourceManager[23421][23629]: 2008/02/27_08:43:00 debug: Starting 
> /etc/init.d/nfs_wrapper  start
> ResourceManager[23421][23676]: 2008/02/27_08:43:01 debug: 
> /etc/init.d/nfs_wrapper  start done. RC=0
> heartbeat[23341]: 2008/02/27_08:44:46 info: Link node01:eth0 dead.
> heartbeat[23341]: 2008/02/27_08:44:58 info: node01 wants to go standby [all]
> logd[4645]: 2008/02/27_08:48:48 info: logd started with /etc/logd.cf.
> logd[4650]: 2008/02/27_08:48:48 info: G_main_add_SignalHandler: Added signal 
> handler for signal 15
> logd[4645]: 2008/02/27_08:48:48 info: G_main_add_SignalHandler: Added signal 
> handler for signal 15
> ...

> logd[15280]: 2008/02/27_08:40:35 info: logd started with /etc/logd.cf.
> logd[15281]: 2008/02/27_08:40:35 info: G_main_add_SignalHandler: Added signal 
> handler for signal 15
> logd[15280]: 2008/02/27_08:40:35 info: G_main_add_SignalHandler: Added signal 
> handler for signal 15
> heartbeat[15343]: 2008/02/27_08:40:35 info: Enabling logging daemon 
> heartbeat[15343]: 2008/02/27_08:40:35 info: logfile and debug file are those 
> specified in logd config file (default /etc/logd.cf)
> heartbeat[15343]: 2008/02/27_08:40:35 ERROR: Current node [node01] not in 
> configuration!
> heartbeat[15343]: 2008/02/27_08:40:35 info: By default, cluster nodes are 
> named by `uname -n` and must be declared with a 'node' directive in the ha.cf 
> file.
> heartbeat[15343]: 2008/02/27_08:40:35 info: See also: 
> http://linux-ha.org/ha.cf/NodeDirective
> heartbeat[15343]: 2008/02/27_08:40:35 ERROR: Configuration error, heartbeat 
> not started.
> heartbeat[15489]: 2008/02/27_08:42:39 info: Enabling logging daemon 
> heartbeat[15489]: 2008/02/27_08:42:39 info: logfile and debug file are those 
> specified in logd config file (default /etc/logd.cf)
> heartbeat[15489]: 2008/02/27_08:42:39 info: **************************
> heartbeat[15489]: 2008/02/27_08:42:39 info: Configuration validated. Starting 
> heartbeat 2.0.8
> heartbeat[15490]: 2008/02/27_08:42:39 info: heartbeat: version 2.0.8
> heartbeat[15490]: 2008/02/27_08:42:39 info: Heartbeat generation: 37
> heartbeat[15490]: 2008/02/27_08:42:39 info: G_main_add_TriggerHandler: Added 
> signal manual handler
> heartbeat[15490]: 2008/02/27_08:42:39 info: G_main_add_TriggerHandler: Added 
> signal manual handler
> heartbeat[15490]: 2008/02/27_08:42:39 info: Removing 
> /var/run/heartbeat/rsctmp failed, recreating.
> heartbeat[15490]: 2008/02/27_08:42:39 info: glib: UDP Broadcast heartbeat 
> started on port 694 (694) interface eth0
> heartbeat[15490]: 2008/02/27_08:42:39 info: glib: UDP Broadcast heartbeat 
> closed on port 694 interface eth0 - Status: 1
> heartbeat[15490]: 2008/02/27_08:42:39 info: glib: ucast: write socket 
> priority set to IPTOS_LOWDELAY on eth1
> heartbeat[15490]: 2008/02/27_08:42:39 info: glib: ucast: bound send socket to 
> device: eth1
> heartbeat[15490]: 2008/02/27_08:42:39 info: glib: ucast: bound receive socket 
> to device: eth1
> heartbeat[15490]: 2008/02/27_08:42:39 info: glib: ucast: started on port 694 
> interface eth1 to 192.168.100.2
> heartbeat[15490]: 2008/02/27_08:42:39 info: glib: ping heartbeat started.
> heartbeat[15490]: 2008/02/27_08:42:39 info: G_main_add_SignalHandler: Added 
> signal handler for signal 17
> heartbeat[15490]: 2008/02/27_08:42:39 info: Local status now set to: 'up'
> heartbeat[15490]: 2008/02/27_08:42:40 info: Link 10.0.0.133:10.0.0.133 up.
> heartbeat[15490]: 2008/02/27_08:42:40 info: Status update for node 
> 10.0.0.133: status ping
> heartbeat[15490]: 2008/02/27_08:42:40 info: Link node01:eth0 up.
> heartbeat[15490]: 2008/02/27_08:42:45 info: Link node02:eth0 up.
> heartbeat[15490]: 2008/02/27_08:42:45 info: Status update for node node02: 
> status up
> heartbeat[15490]: 2008/02/27_08:42:45 info: Link node02:eth1 up.
> harc[15501][15505]: 2008/02/27_08:42:45 info: Running /etc/ha.d/rc.d/status 
> status
> heartbeat[15490]: 2008/02/27_08:42:45 info: Comm_now_up(): updating status to 
> active
> heartbeat[15490]: 2008/02/27_08:42:45 info: Local status now set to: 'active'
> heartbeat[15490]: 2008/02/27_08:42:45 info: Starting child client 
> "/usr/lib64/heartbeat/ipfail" (90,90)
> heartbeat[15490]: 2008/02/27_08:42:45 WARN: G_CH_dispatch_int: Dispatch 
> function for read child took too long to execute: 90 ms (> 50 ms) (GSource: 
> 0x653408)
> heartbeat[15508]: 2008/02/27_08:42:45 info: Starting 
> "/usr/lib64/heartbeat/ipfail" as uid 90  gid 90 (pid 15508)
> heartbeat[15490]: 2008/02/27_08:42:45 info: Status update for node node02: 
> status active
> harc[15509][15512]: 2008/02/27_08:42:45 info: Running /etc/ha.d/rc.d/status 
> status
> heartbeat[15490]: 2008/02/27_08:42:56 info: remote resource transition 
> completed.
> heartbeat[15490]: 2008/02/27_08:42:56 info: remote resource transition 
> completed.
> heartbeat[15490]: 2008/02/27_08:42:56 info: Initial resource acquisition 
> complete (T_RESOURCES(us))
> heartbeat[15515]: 2008/02/27_08:42:56 info: No local resources 
> [/usr/lib64/heartbeat/ResourceManager listkeys node01] to acquire.
> heartbeat[15490]: 2008/02/27_08:44:45 WARN: node 10.0.0.133: is dead
> heartbeat[15490]: 2008/02/27_08:44:45 info: Link 10.0.0.133:10.0.0.133 dead.
> harc[15592][15595]: 2008/02/27_08:44:45 info: Running /etc/ha.d/rc.d/status 
> status
> heartbeat[15490]: 2008/02/27_08:44:46 info: Link node02:eth0 dead.
> heartbeat[15490]: 2008/02/27_08:44:56 info: node01 wants to go standby [all]
> heartbeat[15490]: 2008/02/27_08:44:57 info: standby: node02 can take our all 
> resources
> heartbeat[15604]: 2008/02/27_08:44:57 info: give up all HA resources 
> (standby).
> ResourceManager[15614][15622]: 2008/02/27_08:44:57 info: Releasing resource 
> group: node02 10.0.0.90 nfslock nfs_wrapper
> ResourceManager[15614][15632]: 2008/02/27_08:44:57 info: Running 
> /etc/init.d/nfs_wrapper  stop
> heartbeat[15490]: 2008/02/27_08:46:22 WARN: node node02: is dead
> heartbeat[15490]: 2008/02/27_08:46:22 info: Cancelling pending standby 
> operation
> heartbeat[15490]: 2008/02/27_08:46:22 WARN: No STONITH device configured.
> heartbeat[15490]: 2008/02/27_08:46:22 WARN: Shared disks are not protected.
> heartbeat[15490]: 2008/02/27_08:46:22 info: Resources being acquired from 
> node02.
> heartbeat[15490]: 2008/02/27_08:46:22 info: Link node02:eth1 dead.
> heartbeat[15710]: 2008/02/27_08:46:23 info: No local resources 
> [/usr/lib64/heartbeat/ResourceManager listkeys node01] to acquire.
> heartbeat[15490]: 2008/02/27_08:46:59 info: Link 10.0.0.133:10.0.0.133 up.
> heartbeat[15490]: 2008/02/27_08:46:59 WARN: Late heartbeat: Node 10.0.0.133: 
> interval 164130 ms
> heartbeat[15490]: 2008/02/27_08:46:59 info: Status update for node 
> 10.0.0.133: status ping
> ResourceManager[15614][15765]: 2008/02/27_08:47:01 info: Running 
> /etc/init.d/nfslock  stop
> ResourceManager[15614][15820]: 2008/02/27_08:47:04 info: Running 
> /etc/ha.d/resource.d/IPaddr 10.0.0.90 stop
> IPaddr[15822][15839]: 2008/02/27_08:47:04 INFO:  Success
> heartbeat[15604]: 2008/02/27_08:47:04 info: all HA resource release completed 
> (standby).
> heartbeat[15490]: 2008/02/27_08:47:04 ERROR: Ignored standby message 'done' 
> from node01 in state 0
> heartbeat[15490]: 2008/02/27_08:47:04 WARN: G_SIG_dispatch: Dispatch function 
> for SIGCHLD took too long to execute: 40 ms (> 10 ms) (GSource: 0x64ab68)
> harc[15841][15844]: 2008/02/27_08:47:04 info: Running /etc/ha.d/rc.d/status 
> status
> mach_down[15847][15862]: 2008/02/27_08:47:05 info: Taking over resource group 
> 10.0.0.90
> ResourceManager[15863][15871]: 2008/02/27_08:47:05 info: Acquiring resource 
> group: node02 10.0.0.90 nfslock nfs_wrapper
> IPaddr[15883][15900]: 2008/02/27_08:47:06 INFO:  Resource is stopped
> ResourceManager[15863][15916]: 2008/02/27_08:47:06 info: Running 
> /etc/ha.d/resource.d/IPaddr 10.0.0.90 start
> IPaddr[15927][15936]: 2008/02/27_08:47:07 INFO: Using calculated nic for 
> 10.0.0.90: eth0
> IPaddr[15927][15941]: 2008/02/27_08:47:07 DEBUG: Using calculated netmask for 
> 10.0.0.90: 255.255.255.192
> IPaddr[15927][15946]: 2008/02/27_08:47:07 DEBUG: Using calculated broadcast 
> for 10.0.0.90: 10.0.0.127
> IPaddr[15927][15963]: 2008/02/27_08:47:07 INFO: eval /sbin/ifconfig eth0:0 
> 10.0.0.90 netmask 255.255.255.192 broadcast 10.0.0.127
> IPaddr[15927][15968]: 2008/02/27_08:47:08 DEBUG: Sending Gratuitous Arp for 
> 10.0.0.90 on eth0:0 [eth0]
> IPaddr[15918][15982]: 2008/02/27_08:47:08 INFO:  Success
> ResourceManager[15863][16014]: 2008/02/27_08:47:09 info: Running 
> /etc/init.d/nfslock  start
> ResourceManager[15863][16070]: 2008/02/27_08:47:10 info: Running 
> /etc/init.d/nfs_wrapper  start
> mach_down[15847][16119]: 2008/02/27_08:47:11 info: 
> /usr/lib64/heartbeat/mach_down: nice_failback: foreign resources acquired
> mach_down[15847][16123]: 2008/02/27_08:47:11 info: mach_down takeover 
> complete for node node02.
> heartbeat[15490]: 2008/02/27_08:47:11 info: mach_down takeover complete.

> logd[23277]: 2008/02/27_08:42:43 info: logd started with /etc/logd.cf.
> logd[23280]: 2008/02/27_08:42:44 info: G_main_add_SignalHandler: Added signal 
> handler for signal 15
> logd[23277]: 2008/02/27_08:42:44 info: G_main_add_SignalHandler: Added signal 
> handler for signal 15
> heartbeat[23340]: 2008/02/27_08:42:44 info: Enabling logging daemon 
> heartbeat[23340]: 2008/02/27_08:42:44 info: logfile and debug file are those 
> specified in logd config file (default /etc/logd.cf)
> heartbeat[23340]: 2008/02/27_08:42:44 info: **************************
> heartbeat[23340]: 2008/02/27_08:42:44 info: Configuration validated. Starting 
> heartbeat 2.0.8
> heartbeat[23341]: 2008/02/27_08:42:44 info: heartbeat: version 2.0.8
> heartbeat[23341]: 2008/02/27_08:42:44 info: Heartbeat generation: 35
> heartbeat[23341]: 2008/02/27_08:42:44 info: G_main_add_TriggerHandler: Added 
> signal manual handler
> heartbeat[23341]: 2008/02/27_08:42:44 info: G_main_add_TriggerHandler: Added 
> signal manual handler
> heartbeat[23341]: 2008/02/27_08:42:44 info: Removing 
> /var/run/heartbeat/rsctmp failed, recreating.
> heartbeat[23341]: 2008/02/27_08:42:44 info: glib: UDP Broadcast heartbeat 
> started on port 694 (694) interface eth0
> heartbeat[23341]: 2008/02/27_08:42:44 info: glib: UDP Broadcast heartbeat 
> closed on port 694 interface eth0 - Status: 1
> heartbeat[23341]: 2008/02/27_08:42:44 info: glib: ucast: write socket 
> priority set to IPTOS_LOWDELAY on eth1
> heartbeat[23341]: 2008/02/27_08:42:44 info: glib: ucast: bound send socket to 
> device: eth1
> heartbeat[23341]: 2008/02/27_08:42:44 info: glib: ucast: bound receive socket 
> to device: eth1
> heartbeat[23341]: 2008/02/27_08:42:44 info: glib: ucast: started on port 694 
> interface eth1 to 192.168.100.1
> heartbeat[23341]: 2008/02/27_08:42:44 info: glib: ping heartbeat started.
> heartbeat[23341]: 2008/02/27_08:42:45 info: G_main_add_SignalHandler: Added 
> signal handler for signal 17
> heartbeat[23341]: 2008/02/27_08:42:45 info: Local status now set to: 'up'
> heartbeat[23341]: 2008/02/27_08:42:46 info: Link node01:eth0 up.
> heartbeat[23341]: 2008/02/27_08:42:46 info: Status update for node node01: 
> status up
> heartbeat[23341]: 2008/02/27_08:42:46 info: Link node02:eth0 up.
> heartbeat[23341]: 2008/02/27_08:42:46 info: Link 10.0.0.133:10.0.0.133 up.
> heartbeat[23341]: 2008/02/27_08:42:46 info: Status update for node 
> 10.0.0.133: status ping
> heartbeat[23341]: 2008/02/27_08:42:46 info: Comm_now_up(): updating status to 
> active
> heartbeat[23341]: 2008/02/27_08:42:46 info: Local status now set to: 'active'
> heartbeat[23341]: 2008/02/27_08:42:46 info: Starting child client 
> "/usr/lib64/heartbeat/ipfail" (90,90)
> heartbeat[23341]: 2008/02/27_08:42:47 WARN: G_CH_dispatch_int: Dispatch 
> function for read child took too long to execute: 90 ms (> 50 ms) (GSource: 
> 0x653408)
> heartbeat[23341]: 2008/02/27_08:42:47 info: Status update for node node01: 
> status active
> heartbeat[23341]: 2008/02/27_08:42:47 info: Link node01:eth1 up.
> heartbeat[23356]: 2008/02/27_08:42:47 info: Starting 
> "/usr/lib64/heartbeat/ipfail" as uid 90  gid 90 (pid 23356)
> harc[23352][23357]: 2008/02/27_08:42:47 info: Running /etc/ha.d/rc.d/status 
> status
> harc[23360][23363]: 2008/02/27_08:42:47 info: Running /etc/ha.d/rc.d/status 
> status
> heartbeat[23341]: 2008/02/27_08:42:57 info: local resource transition 
> completed.
> heartbeat[23341]: 2008/02/27_08:42:57 info: Initial resource acquisition 
> complete (T_RESOURCES(us))
> heartbeat[23341]: 2008/02/27_08:42:57 info: remote resource transition 
> completed.
> IPaddr[23393][23410]: 2008/02/27_08:42:57 INFO:  Resource is stopped
> heartbeat[23366]: 2008/02/27_08:42:57 info: Local Resource acquisition 
> completed.
> harc[23414][23417]: 2008/02/27_08:42:57 info: Running 
> /etc/ha.d/rc.d/ip-request-resp ip-request-resp
> ip-request-resp[23414][23420]: 2008/02/27_08:42:57 received ip-request-resp 
> 10.0.0.90 OK yes
> ResourceManager[23421][23429]: 2008/02/27_08:42:57 info: Acquiring resource 
> group: node02 10.0.0.90 nfslock nfs_wrapper
> IPaddr[23441][23458]: 2008/02/27_08:42:58 INFO:  Resource is stopped
> ResourceManager[23421][23474]: 2008/02/27_08:42:58 info: Running 
> /etc/ha.d/resource.d/IPaddr 10.0.0.90 start
> IPaddr[23485][23494]: 2008/02/27_08:42:58 INFO: Using calculated nic for 
> 10.0.0.90: eth0
> IPaddr[23485][23499]: 2008/02/27_08:42:58 DEBUG: Using calculated netmask for 
> 10.0.0.90: 255.255.255.192
> IPaddr[23485][23504]: 2008/02/27_08:42:58 DEBUG: Using calculated broadcast 
> for 10.0.0.90: 10.0.0.127
> IPaddr[23485][23521]: 2008/02/27_08:42:58 INFO: eval /sbin/ifconfig eth0:0 
> 10.0.0.90 netmask 255.255.255.192 broadcast 10.0.0.127
> IPaddr[23485][23526]: 2008/02/27_08:42:58 DEBUG: Sending Gratuitous Arp for 
> 10.0.0.90 on eth0:0 [eth0]
> IPaddr[23476][23540]: 2008/02/27_08:42:59 INFO:  Success
> ResourceManager[23421][23572]: 2008/02/27_08:42:59 info: Running 
> /etc/init.d/nfslock  start
> ResourceManager[23421][23628]: 2008/02/27_08:43:00 info: Running 
> /etc/init.d/nfs_wrapper  start
> heartbeat[23341]: 2008/02/27_08:44:46 info: Link node01:eth0 dead.
> heartbeat[23341]: 2008/02/27_08:44:58 info: node01 wants to go standby [all]
> logd[4645]: 2008/02/27_08:48:48 info: logd started with /etc/logd.cf.
> logd[4650]: 2008/02/27_08:48:48 info: G_main_add_SignalHandler: Added signal 
> handler for signal 15
> logd[4645]: 2008/02/27_08:48:48 info: G_main_add_SignalHandler: Added signal 
> handler for signal 15
> ...

> _______________________________________________
> Linux-HA mailing list
> Linux-HA@lists.linux-ha.org
> http://lists.linux-ha.org/mailman/listinfo/linux-ha
> See also: http://linux-ha.org/ReportingProblems
_______________________________________________
Linux-HA mailing list
Linux-HA@lists.linux-ha.org
http://lists.linux-ha.org/mailman/listinfo/linux-ha
See also: http://linux-ha.org/ReportingProblems

Re: [Linux-HA] Secondary reboot

Reply via email to