On 01/08/2012 11:37 PM, SATHYA - IT wrote:
> Hi,
>
> We have configured a RHEL 6.2 two-node cluster with clvmd + gfs2 + cman +
> smb. The servers have 4 NICs: 2 are bonded for the heartbeat network
> (mode=1) and 2 are bonded for public access (mode=0). The heartbeat
> network is connected directly from server to server. Once every 3-4 days,
> the heartbeat goes down and comes back up automatically within 2 to 3
> seconds. We are not sure why this happens, and because of it one node in
> the cluster gets fenced by the other.
>
> Is there any way to increase how long the cluster waits for the
> heartbeat? I.e., if the cluster could wait 5-6 seconds, a node would not
> get fenced even if the heartbeat fails for 5-6 seconds. Kindly advise.
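For reference, the wait the poster is asking about is governed by the corosync/cman token timeout, which on a RHEL 6 cman cluster can be raised in /etc/cluster/cluster.conf. A minimal sketch, assuming the stock cman default of roughly 10000 ms; the cluster name, config_version and the 30000 ms value are placeholders, not taken from the poster's configuration:

    <cluster name="examplecluster" config_version="2">
      <!-- Time in milliseconds to wait before declaring a node dead;
           pick something comfortably longer than the observed outage -->
      <totem token="30000"/>
      <!-- clusternodes, fencedevices, rm, etc. unchanged -->
    </cluster>

After editing, config_version has to be incremented and the change pushed to both nodes (e.g. with 'cman_tool version -r'); validating first with 'ccs_config_validate' is a good habit. Note that a longer token timeout only hides the symptom if the link itself is flapping.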
"mode=1" is Active/Passive and I use it extensively with no trouble. I'm not sure where "heartbeat" comes from, but I might be missing the obvious. Can you share your bond and eth configuration files here please (as plain-text attachments)? Secondly, make sure that you are actually using that interface/bond. Run 'gethostip -d <nodename>', where "nodename" is what you set in cluster.conf. The returned IP will be the one used by the cluster. Back to the bond; A failed link would nearly instantly transfer to the backup link. So if you are going down for 2~3 seconds on both links, something else is happening. Look at syslog on both nodes around the time the last fence happened and see what logs are written just prior to the fence. That might give you a clue. -- Digimer E-Mail: digi...@alteeve.com Freenode handle: digimer Papers and Projects: http://alteeve.com Node Assassin: http://nodeassassin.org "omg my singularity battery is dead again. stupid hawking radiation." - epitron -- Linux-cluster mailing list Linux-cluster@redhat.com https://www.redhat.com/mailman/listinfo/linux-cluster