[Ganglia-general] News of my final success, which might help another

John Francis Lee Tue, 28 Jan 2003 18:05:52 -0800

Sorry, I thought I was posting to the list when in fact I was
communicating just with Steven Wagner!


-- 
John Francis Lee
1/9-10 Thanon Trairat
Muang Chiang Rai 57000
THAILAND
[EMAIL PROTECTED]

--- Begin Message ---
Maybe if you responded to the list to tell of your success, you'd be able to help others. ;)
John Francis Lee wrote:
Victory!

Apparently my problem was that I didn't understand what constituted a
data source.

Originally I thought each machine was a data source.
That didn't seem to work, so I imagined that I needed to specify just
one machine as a data source, since they all synchronized knowledge of
the cluster via multicast, as in vector based routing.

But neither of these guesses was correct.

When I followed your advice and created a single data source for the
cluster, and put all the machines on the rhs... voila! Beautiful
pictures of the data that was there all the time.

Thanks for your patience.

I don't know if others might be similarly confused as to what a data
source is.


On Wed, 2003-01-29 at 01:02, Steven Wagner wrote:
John Francis Lee wrote:
Thanks again!

Setting the debug level to 10 showed me that gmetad was unable to
connect to itself! I changed the datasource specification to 'localhost'
from the machine's FQDN and things worked!

What I get now is
'There are 10 nodes up and running. There are no nodes down.'

But when I click on the pictures of any of the 10 machines I get:
'This node is down'.

I'm still investigating why this is so.
gmetad's connecting to itself?  Oh my.  That should never happen.

Also, data_source is on a per-cluster basis.

In other words:

data_source "one_group_of_servers" server1.farm.net server2.farm.net [...]
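
To make the per-cluster point concrete, here is a rough sketch of a gmetad.conf fragment contrasting the two layouts discussed in this thread. The cluster name and host names are placeholders rather than anyone's real setup, and the comments reflect my understanding of how gmetad reads these lines, not a definitive reference.

```
# Hypothetical gmetad.conf fragment; all names below are placeholders.

# Per-machine entries (the first guess described in this thread).
# As I understand it, each line defines a separate source, so every
# machine would be treated as its own cluster:
# data_source "server1" server1.farm.net
# data_source "server2" server2.farm.net

# Per-cluster entry: one name for the whole cluster, followed by the
# hosts running gmond for that cluster.
data_source "one_group_of_servers" server1.farm.net server2.farm.net
```
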
There may be a display bug in the released version of the web front-end that counts down hosts as up in the meta and cluster views (the host view is correct). But that doesn't explain why the hosts appear as down in the first place.

What's especially creepy is that the hosts are ALL marked as down. If there were a network problem you would expect n-1 hosts to be down, and 1 host to be up - the one running the gmond that gmetad queried to get the data in the first place.

In fact, there might be another bug in one of the webfrontend tarballs floating around out there. I think I might have seen this when I upgraded the web-frontend to 2.5.1 - the code starts using TN/TMAX to determine whether a host is up, but there's a logic error in it. Pretty sure it's fixed in CVS.

Try the CVS version of ganglia-webfrontend and see if that fixes it.

[maybe it's time to release 2.5.2?]

(I draw the line at a CVS howto, folks...)
--- End Message ---
