Sorry, I thought I was posting to the list when in fact I was
communicating just with Steven Wagner!
--
John Francis Lee
1/9-10 Thanon Trairat
Muang Chiang Rai 57000
THAILAND
[EMAIL PROTECTED]
--- Begin Message ---
Maybe if you responded to the list to tell of your success, you'd be able
to help others. ;)
John Francis Lee wrote:
Victory!
Apparently my problem was that I didn't understand what constituted a
data source.
Originally I thought each machine was a data source.
That didn't seem to work, so I imagined that I needed to specify just
one machine as a data source, since they all synchronized knowledge of
the cluster via multicast, as in vector based routing.
But neither of these guesses was correct.
When I followed your advice and created a single data source for the
cluster, and put all the machines on the right-hand side... voilà! Beautiful
pictures of the data that was there all the time.
Thanks for your patience.
I don't know if others might be similarly confused as to what a data
source is.
On Wed., 2003-01-29 at 01:02, Steven Wagner wrote:
John Francis Lee wrote:
Thanks again!
Setting the debug level to 10 showed me that gmetad was unable to
connect to itself! I changed the data source specification to 'localhost'
from the machine's FQDN and things worked!
What I get now is
'There are 10 nodes up and running. There are no nodes down.'
But when I click on the pictures of any of the 10 machines I get:
'This node is down'.
I'm still investigating why this is so.
gmetad's connecting to itself? Oh my. That should never happen.
Also, data_source is on a per-cluster basis.
In other words:
data_source "one_group_of_servers" server1.farm.net server2.farm.net [...]
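To make the "one data_source per cluster" point concrete, here is a minimal Python sketch that splits such a line into a cluster name and its list of hosts. It assumes the gmetad.conf form `data_source "name" [polling_interval] host[:port] ...`, and it assumes 8649 as the default gmond port when none is given. The names `DataSource` and `parse_data_source` are hypothetical and are not part of Ganglia.

```python
import shlex
from dataclasses import dataclass, field
from typing import List, Optional, Tuple

# Assumed default gmond TCP port when a host is listed without ":port".
DEFAULT_PORT = 8649


@dataclass
class DataSource:
    """One data_source line = one cluster, polled via any of its hosts."""
    name: str
    interval: Optional[int]
    hosts: List[Tuple[str, int]] = field(default_factory=list)


def parse_data_source(line: str) -> DataSource:
    """Hypothetical parser for a single gmetad.conf data_source line."""
    tokens = shlex.split(line)  # honours the quoted cluster name
    if not tokens or tokens[0] != "data_source":
        raise ValueError("not a data_source line")
    if len(tokens) < 3:
        raise ValueError("data_source needs a name and at least one host")
    name, rest = tokens[1], tokens[2:]

    # Optional polling interval (seconds) directly after the name.
    interval = None
    if rest[0].isdigit():
        interval = int(rest[0])
        rest = rest[1:]
    if not rest:
        raise ValueError("data_source needs at least one host")

    # Every remaining token is a host of the SAME cluster, not a new source.
    hosts = []
    for spec in rest:
        host, sep, port = spec.partition(":")
        hosts.append((host, int(port) if sep else DEFAULT_PORT))
    return DataSource(name, interval, hosts)


if __name__ == "__main__":
    ds = parse_data_source(
        'data_source "one_group_of_servers" server1.farm.net server2.farm.net'
    )
    print(ds.name, ds.hosts)
```

Listing several hosts on one line gives gmetad alternative addresses for the same cluster, which is consistent with the fix described above: one data source for the cluster, all machines on the right-hand side.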
There may be a display bug in the released version of the web front-end
that counts down hosts as up in the meta and cluster views (the host view
is correct). But that doesn't explain why the hosts appear as down in the
first place.
What's especially creepy is that the hosts are ALL marked as down. If
there were a network problem you would expect n-1 hosts to be down, and 1
host to be up - the one running the gmond that gmetad queried to get the
data in the first place.
In fact, there might be another bug in one of the webfrontend tarballs
floating around out there. I think I might have seen this when I upgraded
the web-frontend to 2.5.1 - the code starts using TN/TMAX to determine
whether a host is up, but there's a logic error in it. Pretty sure it's
fixed in CVS.
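The TN/TMAX idea can be sketched roughly as follows. This sketch assumes that TN is the number of seconds since a host was last heard from and that TMAX is the longest expected gap between its reports. It also assumes a grace multiplier of 4, which is not taken from any particular web-frontend release. `HostReport`, `host_is_up` and `count_up_down` are hypothetical names. The sketch shows only a correct comparison and does not reproduce the actual bug mentioned above. Note that a comparison written the wrong way round would mark nearly every healthy host as down, which matches the "all nodes down" symptom.

```python
from dataclasses import dataclass
from typing import Iterable, Tuple

# Assumed grace multiplier: a host may miss a few reports before it is "down".
GRACE_FACTOR = 4


@dataclass
class HostReport:
    name: str
    tn: int    # assumed: seconds since this host was last heard from
    tmax: int  # assumed: maximum expected seconds between its reports


def host_is_up(h: HostReport, factor: int = GRACE_FACTOR) -> bool:
    # Up if the host reported within factor * TMAX seconds.
    return h.tn <= factor * h.tmax


def count_up_down(hosts: Iterable[HostReport]) -> Tuple[int, int]:
    hosts = list(hosts)
    up = sum(1 for h in hosts if host_is_up(h))
    return up, len(hosts) - up


if __name__ == "__main__":
    reports = [HostReport("node1", tn=10, tmax=20),
               HostReport("node2", tn=200, tmax=20)]
    print(count_up_down(reports))
```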
Try the CVS version of ganglia-webfrontend and see if that fixes it.
[maybe it's time to release 2.5.2?]
(I draw the line at a CVS howto, folks...)
--- End Message ---