Hello,

Sometimes when heartbeat starts I get this error: NV failure
(msgfromsteam). It looks like the mach_down script and heartbeat are
writing to the FIFO at the same time, so their messages get mixed up
and are lost. Because of this, failover doesn't happen.
Is there any solution to this?
I have attached the log from ha-debug below.
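
To illustrate the suspected interleaving, here is a minimal sketch. It is
a simplified model and not heartbeat's actual parser: it assumes messages
are framed by a ">>>" line and a "<<<" line with name=value lines in
between, and that a line without '=' makes the reader drop the message.
The function parse_stream and the field values are hypothetical, chosen
only for illustration.

```python
MSG_START = ">>>"
MSG_END = "<<<"


def parse_stream(lines):
    """Parse framed name=value messages; return (messages, errors).

    Simplified model: a line without '=' inside a message is recorded
    as an error and the message being read is discarded.
    """
    messages, errors = [], []
    current = None  # None means we are between messages
    for raw in lines:
        line = raw.rstrip("\n")
        if current is None:
            if line == MSG_START:
                current = {}
            continue  # ignore anything outside a message frame
        if line == MSG_END:
            messages.append(current)
            current = None
        elif "=" in line:
            name, value = line.split("=", 1)
            current[name] = value
        else:
            # Comparable to "line doesn't contain '='" in the log below
            errors.append(line)
            current = None
    return messages, errors


# One writer at a time: the message arrives intact.
clean = [">>>", "t=status", "st=active", "<<<"]

# Two writers interleaved: a second ">>>" lands inside the first message.
interleaved = [">>>", "t=status", ">>>", "t=ask_resources", "<<<",
               "st=active", "<<<"]

print(parse_stream(clean))
print(parse_stream(interleaved))
```

In this model the interleaved stream yields no complete message at all,
and the only error reported is the stray ">>>" line, which would be
consistent with the warning and the NV failure in the log.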

Thank you in advance.

Best Regards,
Arek

heartbeat[31298]: 2012/06/19_11:12:18 WARN: Logging daemon is disabled 
--enabling logging daemon is recommended
heartbeat[31298]: 2012/06/19_11:12:18 info: **************************
heartbeat[31298]: 2012/06/19_11:12:18 info: Configuration validated. 
Starting heartbeat 2.1.4
heartbeat[31299]: 2012/06/19_11:12:18 info: heartbeat: version 2.1.4
heartbeat[31299]: 2012/06/19_11:12:18 info: Heartbeat generation: 1340024896
heartbeat[31299]: 2012/06/19_11:12:18 info: seed is 1421633575
heartbeat[31299]: 2012/06/19_11:12:18 info: Creating FIFO 
/var/lib/heartbeat/fifo.
heartbeat[31299]: 2012/06/19_11:12:18 info: glib: ping group heartbeat 
started.
heartbeat[31299]: 2012/06/19_11:12:18 info: glib: ucast: write socket 
priority set to IPTOS_LOWDELAY on eth0
heartbeat[31299]: 2012/06/19_11:12:18 info: glib: ucast: bound send 
socket to device: eth0
heartbeat[31299]: 2012/06/19_11:12:18 info: glib: ucast: bound receive 
socket to device: eth0
heartbeat[31299]: 2012/06/19_11:12:18 info: glib: ucast: started on port 
694 interface eth0 to 192.168.246.106
heartbeat[31299]: 2012/06/19_11:12:18 info: glib: ucast: write socket 
priority set to IPTOS_LOWDELAY on eth1
heartbeat[31299]: 2012/06/19_11:12:18 info: glib: ucast: bound send 
socket to device: eth1
heartbeat[31299]: 2012/06/19_11:12:18 info: glib: ucast: bound receive 
socket to device: eth1
heartbeat[31299]: 2012/06/19_11:12:18 info: glib: ucast: started on port 
694 interface eth1 to 192.168.11.221
heartbeat[31299]: 2012/06/19_11:12:18 info: G_main_add_TriggerHandler: 
Added signal manual handler
heartbeat[31299]: 2012/06/19_11:12:18 info: G_main_add_TriggerHandler: 
Added signal manual handler
heartbeat[31299]: 2012/06/19_11:12:18 info: G_main_add_SignalHandler: 
Added signal handler for signal 17
heartbeat[31299]: 2012/06/19_11:12:18 info: Local status now set to: 'up'
heartbeat[31299]: 2012/06/19_11:12:18 info: Link 
default_group:default_group up.
heartbeat[31299]: 2012/06/19_11:12:18 info: Status update for node 
default_group: status ping
heartbeat[31299]: 2012/06/19_11:12:34 WARN: node dss41148823: is dead
heartbeat[31299]: 2012/06/19_11:12:34 info: Comm_now_up(): updating 
status to active
heartbeat[31299]: 2012/06/19_11:12:34 info: Local status now set to: 
'active'
heartbeat[31299]: 2012/06/19_11:12:34 info: Starting child client 
"/usr/lib/heartbeat/ipfail" (60,60)
heartbeat[31299]: 2012/06/19_11:12:34 info: Starting child client 
"/usr/lib/heartbeat/dopd" (60,60)
heartbeat[6152]: 2012/06/19_11:12:34 info: Starting 
"/usr/lib/heartbeat/ipfail" as uid 60  gid 60 (pid 6152)
heartbeat[31299]: 2012/06/19_11:12:34 WARN: No STONITH device configured.
heartbeat[31299]: 2012/06/19_11:12:34 WARN: Shared disks are not protected.
heartbeat[31299]: 2012/06/19_11:12:34 info: Resources being acquired 
from dss41148823.
heartbeat[6154]: 2012/06/19_11:12:34 info: Starting 
"/usr/lib/heartbeat/dopd" as uid 60  gid 60 (pid 6154)
heartbeat[6156]: 2012/06/19_11:12:34 debug: notify_world: setting 
SIGCHLD Handler to SIG_DFL
ipfail[6152]: 2012/06/19_11:12:34 debug: PID=6152
ipfail[6152]: 2012/06/19_11:12:34 debug: Signing in with heartbeat
harc[6156]:    2012/06/19_11:12:34 info: Running /etc/ha.d/rc.d/status 
status
/usr/lib/heartbeat/dopd[6154]: 2012/06/19_11:12:34 debug: PID=6154
/usr/lib/heartbeat/dopd[6154]: 2012/06/19_11:12:34 debug: Signing in 
with heartbeat
ipfail[6152]: 2012/06/19_11:12:34 debug: [We are dss92574518]
/usr/lib/heartbeat/dopd[6154]: 2012/06/19_11:12:34 debug: [We are 
dss92574518]
mach_down[6204]:    2012/06/19_11:12:34 info: 
/usr/share/heartbeat/mach_down: nice_failback: foreign resources acquired
heartbeat[31353]: 2012/06/19_11:12:34 WARN: ha_msg_add_nv_depth: line 
doesn't contain '='
heartbeat[31353]: 2012/06/19_11:12:34 info: >>>

heartbeat[31353]: 2012/06/19_11:12:34 ERROR: NV failure (msgfromsteam): [>>>
]
heartbeat[6157]: 2012/06/19_11:12:34 info: Local Resource acquisition 
completed.
heartbeat[31299]: 2012/06/19_11:12:34 info: Initial resource acquisition 
complete (T_RESOURCES(us))
heartbeat[31299]: 2012/06/19_11:12:34 debug: StartNextRemoteRscReq(): 
child count 1
/usr/lib/heartbeat/dopd[6154]: 2012/06/19_11:12:34 debug: Setting 
message filter mode
ipfail[6152]: 2012/06/19_11:12:34 debug: auto_failback -> 0 (off)
mach_down[6204]:    2012/06/19_11:12:34 info: mach_down takeover 
complete for node dss41148823.


