Lester Vecsey wrote:
Looking through the key_metrics.h file, it seems that Linux machines get a
different set of keys from AIX, and so on. There's a basic core set of keys
that is on all platforms, but when it gets to things like pkts_in, it's
only available for Linux.

In particular, pkts_in is key 31, and here is the question: Imagine you have
a gmond running on AIX that is getting its XML TCP port queried by a
gmetad, and you add a couple of gmonds on Linux machines to the same cluster.
The pkts_in keys from the Linux machines aren't going to be accepted by the
AIX gmond because the key is out of range.

This raises the question of what a future Windows version should do as well:
should key 31 be considered pkts_in for all platforms, i.e., go by the Linux
definitions? I think Solaris overlaps, and this is a potential problem as
well. Is Ganglia heading towards metric key definitions that will be
interchangeable between platforms in a global sense, or perhaps towards a
'platform' key that lets gmond use a separate key listing for each
platform type that is sending to it?

I hope this makes sense. I would appreciate some feedback.

Mmmm, deja vu all over again. We slugged this out on the dev list around six months ago.

Basically, this is a problem because there's no standard for metrics, only consensus. It gets weirder: mix platforms in a single cluster that assign different metrics to the same metric key, and each of them will report the metric ... but under its own name and units.
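
To make the two failure modes concrete, here is a rough sketch in C. It is not Ganglia's code: the tables, key numbers, and the lookup_metric() helper are all made up for illustration, and the real lists in key_metrics.h are longer. It only assumes what is described above, namely that a receiver resolves an incoming key against its own platform's table.

```c
#include <stddef.h>

/* Hypothetical per-platform key tables. Names and positions are
   illustrative only and are NOT copied from key_metrics.h. */
static const char *const aix_keys[]   = { "cpu_num", "load_one", "mem_free" };
static const char *const linux_keys[] = { "cpu_num", "load_one", "mem_free",
                                          "pkts_in" };
/* A third, made-up platform that puts a different metric on key 3. */
static const char *const other_keys[] = { "cpu_num", "load_one", "mem_free",
                                          "some_other_metric" };

/* Resolve `key` against the receiver's own table. Returns NULL when the
   key is past the end of that table, i.e. the data would be dropped. */
static const char *lookup_metric(const char *const *table, size_t n_keys,
                                 size_t key)
{
    return key < n_keys ? table[key] : NULL;
}
```

With these tables, a Linux sender's key 3 resolves to NULL on the AIX-style receiver (dropped), and to "some_other_metric" on the third receiver (misrepresented), even though the sender meant pkts_in in both cases.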

It's all rather confusing, really.

Oh, the fix?  You want the fix?

[first few seconds of "One Thing Leads To Another..." ... no, not that Fixx!]

... wait for Ganglia 3, which will handle different metrics on different platforms in the same cluster ...

...OR...

 ... put different platforms in different clusters!

Heterogeneous clusters are not guaranteed to operate correctly at this time. If the platforms you want to mix have their metric keys lined up properly, then fabulous - you have no problems. If not, then you're going to get some data dropped or some data misrepresented, depending on which platform's monitoring core you query.

If this is a showstopping problem, modify the metric headers and platform file so that you have the right number and order of metrics for both platforms, and fill out the unsupported metrics with dummy functions that return a bogus value.
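
As a rough sketch of that padding idea (again, not the actual Ganglia source: the enum, the function names, the float return type, and the -1 placeholder are all assumptions for illustration), every platform would share one key order and plug a dummy into each slot it cannot fill:

```c
#include <stddef.h>

/* Hypothetical shared key order; the same enum would be used on every
   platform so that key numbers always mean the same metric. */
enum metric_key { KEY_CPU_NUM, KEY_LOAD_ONE, KEY_PKTS_IN, NUM_KEYS };

typedef float (*metric_func)(void);

/* Stand-ins for real collectors on this platform (fixed values here). */
static float cpu_num_func(void)  { return 4.0f; }
static float load_one_func(void) { return 0.25f; }

/* Dummy for metrics this platform does not support: returns a bogus
   value that is easy to recognize downstream. */
static float unsupported_metric_func(void) { return -1.0f; }

/* Table for a platform without pkts_in support: the slot still exists,
   so the number and order of keys match the other platforms. */
static const metric_func metric_table[NUM_KEYS] = {
    [KEY_CPU_NUM]  = cpu_num_func,
    [KEY_LOAD_ONE] = load_one_func,
    [KEY_PKTS_IN]  = unsupported_metric_func,
};

/* Collect one metric by key; out-of-range keys also yield the bogus value. */
static float collect_metric(size_t key)
{
    return key < NUM_KEYS ? metric_table[key]() : -1.0f;
}
```

The point is only that the table length and ordering stay identical across platforms; what the bogus value should be, and how it is displayed, is up to you.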

Or you could actually write support in for the metric... ;)

