Ben Hartshorne wrote:
> On Mon, Feb 26, 2007 at 01:06:23PM -0600, Seth Graham wrote:
>> Ben Hartshorne wrote:
>>> It seems to me that using the name to determine cluster membership would
>>> simplify things for the people configuring ganglia.
>> It would, but when you have 3000+ machines all chattering on the same
>> port that's a lot of data for a machine to deal with. Not only do the
>> aggregating machines have to hold it all in memory, but the gmetad host
>> has to dump all that info into the rrds.

> Isn't the machine going to have to handle exactly the same amount of
> data, regardless of whether it's on one port or two?

Not necessarily, because you can instruct a machine not to poll a port at all. This is an easy way to use the network itself to limit the traffic a host has to parse.

If you break up clusters by name instead, the machine has to read in the data for everything on the subnet and filter on the cluster name after it has already captured the traffic.
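To make the port separation concrete, here is a sketch of a 3.x-style gmond.conf; the cluster name, multicast group, and port number are illustrative, not from this thread:

```
/* gmond.conf for every host in one cluster.  Each cluster gets its
   own port, so a collector simply never polls ports it doesn't
   care about, and never has to parse that cluster's traffic. */
cluster {
  name = "web-frontend"          /* illustrative name */
}
udp_send_channel {
  mcast_join = 239.2.11.71
  port = 8650                    /* cluster-specific port */
}
udp_recv_channel {
  mcast_join = 239.2.11.71
  port = 8650
}
tcp_accept_channel {
  port = 8650                    /* gmetad polls this port */
}
```

A second cluster on the same subnet would use, say, port 8651, and its traffic never reaches the first cluster's gmond processes at all.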

> I would imagine that by the time your network got to 3000+ hosts,
> things would be segregated in their own right, independent of ganglia.
> Such segregation would make it easy (and more logical) to use head
> nodes as aggregators and then pass data up the tree to your main web
> interface.  Multicast networks can be broken up by subnet or VLAN, and
> the unicast nodes can use ganglia's ability to only pass on summary
> info, etc.

This is true. As long as the subnets are kept to a reasonable size, the volume of data wouldn't become a problem. But IP blocks always seem to get bigger when new ones are assigned, and people keep cramming more and more machines onto them. We haven't crossed the line of 1000 machines on a single VLAN yet, but the IP space is there, and I worry about what happens when we do.
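For the head-node arrangement described above, the top-level gmetad would poll one or two representatives per cluster rather than listening to everything itself. A sketch of that gmetad.conf, with made-up hostnames, ports, and polling intervals:

```
# gmetad.conf on the main web interface host.  Each data_source
# names a cluster and a couple of redundant head nodes to poll;
# only those head nodes' TCP ports are ever contacted, so the
# top-level host never sees the raw per-node multicast chatter.
data_source "web-frontend" 15 web-head1.example.com:8650 web-head2.example.com:8650
data_source "compute"      15 node-head1.example.com:8651
```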

The main web interface is my concern. Gmetad sucks up memory like it's free, and the disk I/O created when the rrds are updated quickly gets out of hand. Because of this we had to move the rrds to a ramdisk, which eats up even more memory.
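As a rough sketch of the ramdisk arrangement (the paths, size, and interval here are assumptions, not what we actually run): mount the rrd directory on tmpfs, and periodically sync it back to real disk so a reboot doesn't lose all the history:

```
# /etc/fstab -- keep the rrds on a RAM-backed tmpfs
tmpfs  /var/lib/ganglia/rrds  tmpfs  size=2g,mode=0755  0 0

# crontab -- copy the rrds back to real disk every 10 minutes, so a
# crash or reboot only loses a few minutes of data; restore the copy
# into the tmpfs at boot before starting gmetad
*/10 * * * * root rsync -a /var/lib/ganglia/rrds/ /var/lib/ganglia/rrds.persist/
```

The trade-off is exactly the one above: the tmpfs competes with gmetad itself for the same pool of memory.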

> Of course, I have not had the privilege of working with a cluster of
> that size.  I've only got just over 100 hosts, so please forgive
> anything that will become obvious as soon as I actually have to deal
> with the problem...  ;)

I think your idea could work, it just seems (to me) to rely on a lot more components being configured in an ideal way. In my experience, I never get that. ;)
