I've written more than my share of machine.c's, folks, and in doing so I've noticed that I do the same thing over and over again.

Namely, I almost always chuck the "I'll go query the appropriate subsystem, discard all the data I don't need for this metric, and return the result" method in favor of a method I call, "I'll call a function that gathers all the appropriate data and loads them into a struct, then return whatever's in that struct as my metric."

Most of the reason I did this is that I realized that some metrics are related. It's no good to gather each of your related memory stats fifteen seconds apart - you've got to grab them all at once, otherwise the mem_* values don't add up to mem_total and the words "margin of error" start to become meaningful.
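To make that concrete, here's a minimal sketch of the "populate a struct, then serve metrics from it" pattern. Every name in it (mem_snapshot, refresh_mem, SNAPSHOT_MAX_AGE, the mem_*_func signatures) is hypothetical, and the numbers are obviously fake - the point is that one refresh fills every field at the same instant, so related metrics stay mutually consistent:

```c
#include <time.h>

#define SNAPSHOT_MAX_AGE 1  /* seconds a snapshot stays valid */

struct mem_snapshot {
    unsigned long total;    /* KB */
    unsigned long free;     /* KB */
    unsigned long shared;   /* KB */
    unsigned long buffers;  /* KB */
    time_t taken;           /* when this snapshot was read */
};

static struct mem_snapshot snap;

/* One query fills every field, so mem_free, mem_shared, etc. all
 * describe the same instant and add up to mem_total. */
static void refresh_mem(void)
{
    time_t now = time(NULL);
    if (snap.taken != 0 && now - snap.taken < SNAPSHOT_MAX_AGE)
        return;  /* recent enough: reuse the coherent snapshot */

    /* the platform-specific read goes here; fake numbers for the sketch */
    snap.total   = 1024;
    snap.free    = 256;
    snap.shared  = 128;
    snap.buffers = 640;
    snap.taken   = now;
}

unsigned long mem_total_func(void) { refresh_mem(); return snap.total; }
unsigned long mem_free_func(void)  { refresh_mem(); return snap.free; }
```

Each per-metric function just triggers a (possibly cached) refresh and reads its field, so fifteen seconds can't sneak in between mem_free and mem_total.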

This is especially noticeable with the CPU stats, which had been giving me quite a headache until about 10 minutes ago, when I laid my vengeance upon them*.

So to expand upon one of my caffeinated asides earlier this month, at some point (3.0, sooner, I don't know) we should take a very good look at the machines directory and then drain as much non-platform-specific code as possible out of it and stick it in a utility library.

I am painfully aware that there are about 75 different ways of kicking a running kernel and getting it to spit out metrics. However, many of the raw data structures that come out are similar across platforms, and you end up doing the same calculations on them everywhere. Chances are you will want to convert the amount of free memory from pages into kilobytes. Chances are you will want to do a whole lot of voodoo on CPU ticks in order to get percentages. [insert exceedingly obvious music reference here]
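Those two calculations are exactly the kind of thing a utility library could own. A sketch of what I mean - the helper names and signatures here are made up, not anything currently in the tree:

```c
/* pages -> kilobytes, given the platform's page size in bytes */
static unsigned long pages_to_kb(unsigned long pages, unsigned long page_size)
{
    return pages * (page_size / 1024);
}

/* The CPU-tick voodoo: given two samples of per-state tick counters,
 * return the percentage of elapsed time spent in one state. */
static double cpu_state_pct(const unsigned long prev[],
                            const unsigned long cur[],
                            int nstates, int state)
{
    unsigned long total = 0;
    int i;

    for (i = 0; i < nstates; i++)
        total += cur[i] - prev[i];

    if (total == 0)
        return 0.0;  /* no ticks elapsed between samples */

    return 100.0 * (double)(cur[state] - prev[state]) / (double)total;
}
```

Every platform's machine.c would just hand over raw tick arrays and page counts; the division, deltas, and zero-total edge case live in one place instead of five.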

I would especially like to move the responsibility of implementing metrics away from the machine.c files - in other words, if mtu_func isn't implemented, it should return a zero instead of bringing the build down in a doomed fiery ball of flaming, spherical doom. If machine.c becomes a data collector that populates a struct, and the data processing is done elsewhere, we gain a uniformity of metrics that we don't see so much right now. I am taking it on good faith that the kernel CPU percentages reported in Linux procfs are approximately similar to the figures I'm getting from Solaris, IRIX and Tru64, but I have no real way of guaranteeing this since the data collection and massaging operations are all slightly different.
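One way to get the "unimplemented means zero, not a build failure" behavior is a dispatch table with a default stub, something like the sketch below. All of these names (metric_entry, register_metric, get_metric, real_cpu_user) are hypothetical, just to show the shape of the idea:

```c
#include <string.h>

typedef double (*metric_func)(void);

/* Default handler: any metric machine.c doesn't provide returns 0 */
static double unimplemented_metric(void) { return 0.0; }

struct metric_entry {
    const char *name;
    metric_func func;
};

/* Every known metric starts out pointing at the stub */
static struct metric_entry metrics[] = {
    { "cpu_user", unimplemented_metric },
    { "mtu",      unimplemented_metric },
};

/* machine.c registers only what its platform can actually collect */
void register_metric(const char *name, metric_func f)
{
    size_t i;
    for (i = 0; i < sizeof metrics / sizeof metrics[0]; i++)
        if (strcmp(metrics[i].name, name) == 0)
            metrics[i].func = f;
}

double get_metric(const char *name)
{
    size_t i;
    for (i = 0; i < sizeof metrics / sizeof metrics[0]; i++)
        if (strcmp(metrics[i].name, name) == 0)
            return metrics[i].func();
    return 0.0;  /* unknown metric: zero, not a link error */
}

/* example of a platform actually implementing one */
static double real_cpu_user(void) { return 12.5; }
```

A platform that never calls register_metric("mtu", ...) still links and still answers queries - it just answers zero.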

Plus, new metrics can be implemented (by anyone) without aforesaid fiery doom. Although we still have this XDR metric number thing. That's kind of a bummer. If only there was a way to build the XDR metric hash on-the-fly on a per-cluster basis. Is that a totally absurd concept? There's no "master node" on a cluster so either the nodes would have to democratically assign a metric number to the new XDR or they'd have to each maintain a separate list and we're back to a string representation of the metric in the XDR.

OK, so how about we steal a bit of DHCP? When a "non-standard" (defined in the new metric code of course :) ) metric first appears on a vanilla network (through gmetric, a monitoring core upgrade on one box, etc.), the metric is sent out with a special, distinctive XDR metric value (0, -1, 313378649, 0xDEADBEEF, etc.). The oldest node with a current heartbeat value is expected to assign the new index value for the metric. If it isn't heard from in 15 seconds, the next oldest node responds, etc. Or the "master node" if such a system is implemented. Or each node decides on its own. Or each node multicasts an "election packet" containing its next available metric value. On receiving a heartbeat metric from a new node, the first node to hear it (or the master node) sends an ACK/"here comes some config data"/"I'll get it!" packet over the multicast channel and then starts sending packets with the metric hash info for that cluster.
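On the wire, the DHCP-ish scheme above might look something like this. To be clear, everything here is a guess at how the proposal could be encoded - the packet fields, the sentinel value, and the staggered-election helper are all invented for illustration:

```c
#include <stdint.h>

/* "please assign me an index" sentinel, per the distinctive values
 * suggested above (0, -1, 0xDEADBEEF, ...) */
#define METRIC_INDEX_UNASSIGNED 0xDEADBEEFu

/* first announcement of a non-standard metric on the multicast channel */
struct metric_announce {
    uint32_t index;     /* METRIC_INDEX_UNASSIGNED on first send */
    char     name[32];  /* string name, used until an index exists */
};

/* Staggered election: the oldest node (age_rank 0) answers at once;
 * the i-th oldest waits i * 15 seconds before stepping in, so in the
 * common case exactly one node assigns the index. */
unsigned election_delay_secs(unsigned age_rank)
{
    return age_rank * 15;
}
```

The nice property is that the string name only travels until an index is assigned; after that the XDR stays compact, and a new node joining the cluster just listens for the hash-info packets instead of shipping with a hardcoded metric table.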

This deliberately blurs or erases the line between internal and external metrics. Which might be useful. It also makes it possible to auto-configure new nodes (assuming you trust the multicast channel :) ).

Anyway, just stuff I was thinking about while I was debugging the Tru64 monitoring core. :O!





* - Know why my percentages were off? Because CPU_STATES enumerates from zero with a meaningful last value, and the percentage-calculating code (taken straight from top, but top uses the same CPU_STATES value!) says:
for (i = 0; i < CPU_STATES; i++) ... DOH!  Works nicely with a <= in there...
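For anyone who wants to see the trap spelled out, here's a hypothetical reconstruction (the enumerator names are made up, but the shape matches the bug described above): on a platform where the enum's last value is itself a real state, CPU_STATES names an index, not a count, and the textbook loop bound silently drops the last state.

```c
/* CPU_STATES is the index of the LAST state here, not the count */
enum {
    CPU_USER = 0,
    CPU_NICE,
    CPU_SYS,
    CPU_WAIT,
    CPU_IDLE,
    CPU_STATES = CPU_IDLE  /* meaningful last value: 4, but 5 states exist */
};

unsigned long sum_ticks(const unsigned long ticks[])
{
    unsigned long total = 0;
    int i;

    /* <= because CPU_STATES itself is a valid state index; with the
     * usual < the idle ticks vanish and every percentage drifts */
    for (i = 0; i <= CPU_STATES; i++)
        total += ticks[i];

    return total;
}
```

With `<` the loop sums only four of the five states, so the denominator in every percentage is short by the idle ticks - exactly the kind of quietly-wrong number that makes CPU stats look haunted.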

