I like your comments. Here are some counter-questions :)


On Friday, August 30, 2002, at 02:34 PM, Steven Wagner wrote:

It seems to me this would also make the "DSO-ification" of the monitoring core a smoother process, not to mention a cleaner one from the standpoint of those developing the DSOs. :)

Good point.

I was thinking of "yet another hash" that has a hashed-up number based on the name or hierarchy position of the metric as a key. The idea being, this number is shorter than using the fully-qualified name of the metric all the time.

So instead of encoding "cpu.idle" we encode 0x03FA450A and that field's 50% shorter (even better if we get to "processes.top.1.cpu_percentage"), and only have to multicast the real string name once. The hierarchical information is stored (as a pointer, at the very least) in this hash.

What's really going to be key here is not so much the idea of making the
statically-#define'd metric hash dynamic, but keeping it up to date...

If we go far enough in this it'll look like SNMP, only more collaborative. :)


So I am thinking that sending the fully-qualified metric name (as shown above) is a better idea now - it handles failures more effectively. When a node comes up it would receive metrics that look like "host1/cpu/cache/size" (fully-qualified with all the metric's ancestors) instead of "cache/size" (relative as I had suggested previously). This fits in with Steven's idea of hosts being authoritative for branches they created - each metric specifies its branches explicitly. It also reduces reliance on an elder node for the branch hierarchy.

This way a node can easily create branches as needed for any metric it receives.

About the "hash for storing fully qualified metric names (FQMN :)". How would we populate such a hash? At some level, the metric must specify its fully-qualified name, so we know where to put it. A hash value is no good if we don't already have the name stored. How would you handle new metrics? I think we could run-length-encode the name strings to save space if we need to, but having each metric carry its full name seems clearer to me.

I imagine a hash_find(node, "cpu", "cache") function that takes a variable number of arguments to locate the hash table in which to insert a given metric (the metric here: host1/cpu/cache/size). The 'node' argument specifies the root of the metric tree - the node hash table for host1. Note each branch would get its own hash table so that hash_foreach() will work correctly and printing the XML will be easy. (One C detail: va_arg can't tell how many names were passed, so we'd need either a count argument or a NULL sentinel, e.g. hash_find(node, "cpu", "cache", NULL).)

To make this work, we simply add a 'hash_t *branch' member to the metric_data_t structure. If branch==NULL then we are a leaf (actual metric), else this is a branch that points to another hash table. I can visualize the XML output code now...


Dense, yes, but the area of metrics is just about the only one in the Ganglia design that *doesn't* scale well (kudos, Matt & co.). I'm sure that we can work this out if we just keep banging those rocks together. :)

Clever ;)

Do people prefer the Java-like dot notation for hierarchical names, like "host1.cpu.cache.size", or the Unix filesystem forward-slash notation, "host1/cpu/cache/size"? I like the slashes because it's easy to tell whether you're talking about a leaf or a branch: "host1/cpu/" is clearly a branch, while "host1.cpu." is a little harder to read. But either way would work.

I'm psyched about this change, and I am ready to dive in right after the 2.5.0 release.

-Federico

Rocks Cluster Group, Camp X-Ray, SDSC, San Diego
GPG Fingerprint: 3C5E 47E7 BDF8 C14E ED92  92BB BA86 B2E6 0390 8845

