Hi,

I have changed the IP addresses of some members of one of our clusters and have
seen that in some cases this causes the heartbeat not to be recognized,
even though TN is an acceptable value and current data is still displayed.
What happens is that a "This host is down" message is displayed and the host
is not included in the list of up computers.

Does anyone know what I can do to repair this?

The problem is that the ganglia host report web page is reporting that no
heartbeat is being seen for four of the fifteen machines whose IP address has
been changed.  However, if you look at the XML being served by gmetad you see:

<HOST NAME="host name" IP="new ip address" REPORTED="1175278727" TN="4" 
    TMAX="20" DMAX="0" LOCATION="unspecified" GMOND_STARTED="1175278188">
<HOST NAME="host name" IP="old ip address" REPORTED="1173869674" TN="1409057"
    TMAX="20" DMAX="0" LOCATION="unspecified" GMOND_STARTED="1170351747">

Note the two entries: one for the new IP address with TN=4 and one for the old
address with TN=1409057.  It looks like the last entry in the XML listing is
being used for the heartbeat.

For the other machines, the order of the XML entries is switched, with the
new IP address being last, and there the heartbeat is recognized.
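
In case it helps anyone reproduce this, here is a rough Python sketch of how
the duplicate entries could be spotted in the XML. It is only an illustration:
the function names, host names, and IP addresses are made up, and the sample
XML is a trimmed, well-formed stand-in for what gmetad serves, not real output.
It groups HOST elements by NAME and, for names that appear more than once,
picks the entry with the smallest TN as the one that is still reporting.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict


def find_duplicate_hosts(xml_text):
    """Return {host name: [attribute dicts]} for names listed more than once."""
    root = ET.fromstring(xml_text)
    by_name = defaultdict(list)
    for host in root.iter("HOST"):
        by_name[host.get("NAME")].append(dict(host.attrib))
    return {name: entries for name, entries in by_name.items() if len(entries) > 1}


def freshest(entries):
    """Pick the entry with the smallest TN (seconds since the last report)."""
    return min(entries, key=lambda e: int(e["TN"]))


# Hypothetical sample: node1 appears twice, once per IP address.
SAMPLE = """<GANGLIA_XML><GRID NAME="g"><CLUSTER NAME="c">
<HOST NAME="node1" IP="10.0.0.2" REPORTED="1175278727" TN="4" TMAX="20" DMAX="0"/>
<HOST NAME="node1" IP="10.0.0.1" REPORTED="1173869674" TN="1409057" TMAX="20" DMAX="0"/>
<HOST NAME="node2" IP="10.0.0.3" REPORTED="1175278720" TN="11" TMAX="20" DMAX="0"/>
</CLUSTER></GRID></GANGLIA_XML>"""

if __name__ == "__main__":
    for name, entries in find_duplicate_hosts(SAMPLE).items():
        live = freshest(entries)
        print(name, "has", len(entries), "entries; freshest IP is", live["IP"])
```

Running something like this against the XML from gmetad should list the hosts
that still carry a stale entry for the old IP address.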

--Lew


