Hi All,

I have configured OpenSAF 4.6 on two virtual machines (VMs) running RHEL 6.6
as CLM nodes, with TCP as the communication mechanism.

Setup:
- cluster with just 2 controllers
- each controller runs on a VM (RHEL 6.6)
- each VM resides on its own physical server, which is also running RHEL 6.6
- AmfDemo app (2N) is also running on each controller VM
- SC-1 has:
  * "active" 2N OpenSAF SU1
  * "standby" AmfDemo SU1
- SC-2 has:
  * "standby" 2N OpenSAF SU2
  * "active" AmfDemo SU2

dtmd.conf is:
DTM_INI_DIS_TIMEOUT_SECS=5
DTM_TCP_KEEPIDLE_TIME=2
DTM_TCP_KEEPALIVE_INTVL=1
DTM_TCP_KEEPALIVE_PROBES=2
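
For reference, my understanding is that the DTM_TCP_KEEPIDLE_TIME and
DTM_TCP_KEEPALIVE_* settings map onto the standard Linux TCP keepalive socket
options (TCP_KEEPIDLE, TCP_KEEPINTVL, TCP_KEEPCNT), so with the values above I
would expect a silently dead peer to be detected after roughly 2 + 1*2 = 4
seconds. A minimal sketch of that mapping (the peer address and port are
placeholders, not my actual config):

    import socket

    s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    s.setsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE, 1)    # turn keepalive on
    s.setsockopt(socket.IPPROTO_TCP, socket.TCP_KEEPIDLE, 2)   # DTM_TCP_KEEPIDLE_TIME
    s.setsockopt(socket.IPPROTO_TCP, socket.TCP_KEEPINTVL, 1)  # DTM_TCP_KEEPALIVE_INTVL
    s.setsockopt(socket.IPPROTO_TCP, socket.TCP_KEEPCNT, 2)    # DTM_TCP_KEEPALIVE_PROBES
    s.connect(("192.168.1.2", 6700))  # placeholder: SC-2 address and dtm port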


When the SC-2 VM is rebooted, all is well: osafdtmd on SC-1 logs that it has
"Lost contact" with SC-2 and AmfDemo SU1 becomes "active". Eventually SC-2
recovers and AmfDemo SU2 goes standby.

Now, with the same setup, instead of restarting the SC-2 VM, I power cycle the
server where the SC-2 VM is running. In this case there is no indication from
OpenSAF on SC-1 that SC-2 is down. I would have expected the keepalive probes
to catch exactly this case, since a hard power cycle (unlike a reboot) never
sends a FIN or RST on the connection; the link just goes silent.
I have a separate monitoring process on each VM that simply pings the mate
node, and that process does detect the loss.
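
For reference, that monitor is roughly equivalent to the following sketch (the
mate address is a placeholder):

    import subprocess, time

    MATE = "192.168.1.2"  # placeholder: address of the mate VM

    while True:
        # one ping with a 1-second timeout; non-zero exit means no reply
        if subprocess.call(["ping", "-c", "1", "-W", "1", MATE],
                           stdout=subprocess.DEVNULL) != 0:
            print("lost contact with mate node", MATE)
        time.sleep(1)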
I've seen other threads where people reported problems with OpenSAF not
detecting eth0 going down or a node going down, but I never found whether
there was a resolution.

Any help/suggestions would be appreciated.

Thanks,
David
