Hello,

Sometimes when heartbeat (2.1.4) starts I get the error "NV failure (msgfromsteam)". It looks like the mach_down script and heartbeat are writing to the FIFO (/var/lib/heartbeat/fifo) at the same time, so messages get corrupted or lost. Because of this, failover does not happen. Is there any solution to this? The relevant part of the ha-debug log is below.

Thank you in advance.

Best Regards,
Arek

heartbeat[31298]: 2012/06/19_11:12:18 WARN: Logging daemon is disabled --enabling logging daemon is recommended
heartbeat[31298]: 2012/06/19_11:12:18 info: **************************
heartbeat[31298]: 2012/06/19_11:12:18 info: Configuration validated. Starting heartbeat 2.1.4
heartbeat[31299]: 2012/06/19_11:12:18 info: heartbeat: version 2.1.4
heartbeat[31299]: 2012/06/19_11:12:18 info: Heartbeat generation: 1340024896
heartbeat[31299]: 2012/06/19_11:12:18 info: seed is 1421633575
heartbeat[31299]: 2012/06/19_11:12:18 info: Creating FIFO /var/lib/heartbeat/fifo.
heartbeat[31299]: 2012/06/19_11:12:18 info: glib: ping group heartbeat started.
heartbeat[31299]: 2012/06/19_11:12:18 info: glib: ucast: write socket priority set to IPTOS_LOWDELAY on eth0
heartbeat[31299]: 2012/06/19_11:12:18 info: glib: ucast: bound send socket to device: eth0
heartbeat[31299]: 2012/06/19_11:12:18 info: glib: ucast: bound receive socket to device: eth0
heartbeat[31299]: 2012/06/19_11:12:18 info: glib: ucast: started on port 694 interface eth0 to 192.168.246.106
heartbeat[31299]: 2012/06/19_11:12:18 info: glib: ucast: write socket priority set to IPTOS_LOWDELAY on eth1
heartbeat[31299]: 2012/06/19_11:12:18 info: glib: ucast: bound send socket to device: eth1
heartbeat[31299]: 2012/06/19_11:12:18 info: glib: ucast: bound receive socket to device: eth1
heartbeat[31299]: 2012/06/19_11:12:18 info: glib: ucast: started on port 694 interface eth1 to 192.168.11.221
heartbeat[31299]: 2012/06/19_11:12:18 info: G_main_add_TriggerHandler: Added signal manual handler
heartbeat[31299]: 2012/06/19_11:12:18 info: G_main_add_TriggerHandler: Added signal manual handler
heartbeat[31299]: 2012/06/19_11:12:18 info: G_main_add_SignalHandler: Added signal handler for signal 17
heartbeat[31299]: 2012/06/19_11:12:18 info: Local status now set to: 'up'
heartbeat[31299]: 2012/06/19_11:12:18 info: Link default_group:default_group up.
heartbeat[31299]: 2012/06/19_11:12:18 info: Status update for node default_group: status ping
heartbeat[31299]: 2012/06/19_11:12:34 WARN: node dss41148823: is dead
heartbeat[31299]: 2012/06/19_11:12:34 info: Comm_now_up(): updating status to active
heartbeat[31299]: 2012/06/19_11:12:34 info: Local status now set to: 'active'
heartbeat[31299]: 2012/06/19_11:12:34 info: Starting child client "/usr/lib/heartbeat/ipfail" (60,60)
heartbeat[31299]: 2012/06/19_11:12:34 info: Starting child client "/usr/lib/heartbeat/dopd" (60,60)
heartbeat[6152]: 2012/06/19_11:12:34 info: Starting "/usr/lib/heartbeat/ipfail" as uid 60 gid 60 (pid 6152)
heartbeat[31299]: 2012/06/19_11:12:34 WARN: No STONITH device configured.
heartbeat[31299]: 2012/06/19_11:12:34 WARN: Shared disks are not protected.
heartbeat[31299]: 2012/06/19_11:12:34 info: Resources being acquired from dss41148823.
heartbeat[6154]: 2012/06/19_11:12:34 info: Starting "/usr/lib/heartbeat/dopd" as uid 60 gid 60 (pid 6154)
heartbeat[6156]: 2012/06/19_11:12:34 debug: notify_world: setting SIGCHLD Handler to SIG_DFL
ipfail[6152]: 2012/06/19_11:12:34 debug: PID=6152
ipfail[6152]: 2012/06/19_11:12:34 debug: Signing in with heartbeat
harc[6156]: 2012/06/19_11:12:34 info: Running /etc/ha.d/rc.d/status status
/usr/lib/heartbeat/dopd[6154]: 2012/06/19_11:12:34 debug: PID=6154
/usr/lib/heartbeat/dopd[6154]: 2012/06/19_11:12:34 debug: Signing in with heartbeat
ipfail[6152]: 2012/06/19_11:12:34 debug: [We are dss92574518]
/usr/lib/heartbeat/dopd[6154]: 2012/06/19_11:12:34 debug: [We are dss92574518]
mach_down[6204]: 2012/06/19_11:12:34 info: /usr/share/heartbeat/mach_down: nice_failback: foreign resources acquired
heartbeat[31353]: 2012/06/19_11:12:34 WARN: ha_msg_add_nv_depth: line doesn't contain '='
heartbeat[31353]: 2012/06/19_11:12:34 info: >>>
heartbeat[31353]: 2012/06/19_11:12:34 ERROR: NV failure (msgfromsteam): [>>> ]
heartbeat[6157]: 2012/06/19_11:12:34 info: Local Resource acquisition completed.
heartbeat[31299]: 2012/06/19_11:12:34 info: Initial resource acquisition complete (T_RESOURCES(us))
heartbeat[31299]: 2012/06/19_11:12:34 debug: StartNextRemoteRscReq(): child count 1
/usr/lib/heartbeat/dopd[6154]: 2012/06/19_11:12:34 debug: Setting message filter mode
ipfail[6152]: 2012/06/19_11:12:34 debug: auto_failback -> 0 (off)
mach_down[6204]: 2012/06/19_11:12:34 info: mach_down takeover complete for node dss41148823.
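
As a rough illustration of the suspected race (this is not heartbeat code), the Python sketch below shows why two writers on one FIFO can produce a stray ">>>" inside a message, and one general way to avoid it. POSIX guarantees that a single write() of at most PIPE_BUF bytes to a pipe or FIFO is not interleaved with other writers, so a writer that sends each framed message in one write() call should not corrupt its neighbour's message. The function names and the "t"/"src" fields are hypothetical, and whether mach_down actually emits its message in several separate writes is only an assumption based on the log above.

```python
import os
import select

MSG_START = b">>>\n"
MSG_END = b"<<<\n"


def frame_message(fields):
    """Build one '>>> ... <<<' framed message from a dict of name/value strings."""
    body = b"".join(f"{name}={value}\n".encode() for name, value in fields.items())
    return MSG_START + body + MSG_END


def write_whole_message(fd, payload):
    """Send the payload with a single write() call.

    Writes of at most PIPE_BUF bytes to a pipe or FIFO are atomic under POSIX,
    so other writers cannot interleave their data with this message.
    """
    if len(payload) > select.PIPE_BUF:
        raise ValueError("message larger than PIPE_BUF; atomicity not guaranteed")
    written = os.write(fd, payload)
    if written != len(payload):
        raise OSError("short write to pipe")
    return written


def split_messages(data):
    """Parse a byte stream into a list of dicts, one per framed message.

    A line inside a message that has no '=' raises ValueError, which is the
    kind of failure an interleaved '>>>' marker would cause.
    """
    messages = []
    current = None
    for line in data.splitlines():
        if current is None:
            if line == b">>>":
                current = {}
            continue  # ignore anything outside a message
        if line == b"<<<":
            messages.append(current)
            current = None
        elif b"=" in line:
            name, _, value = line.partition(b"=")
            current[name.decode()] = value.decode()
        else:
            raise ValueError(f"line doesn't contain '=': {line!r}")
    return messages
```

With this sketch, two messages written back to back through write_whole_message() parse cleanly, while a stream in which a second ">>>" lands in the middle of an unfinished message fails in split_messages() with a "line doesn't contain '='" error, similar to the warning in the log.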