Lester Vecsey wrote:
Looking through the key_metrics.h file, it seems that Linux machines get a
different set of keys from AIX, and so on. There's a basic core set of keys
that are on all platforms, but when it gets to things like pkts_in, it's
only available for Linux.
In particular, pkts_in is key 31, and here is the question: imagine you have
a gmond running on AIX that is getting its XML TCP port queried by a
gmetad, and you add a couple of gmonds on Linux machines to the same cluster.
The pkts_in keys from the Linux machines aren't going to be accepted by the
AIX gmond because the key is out of range.
This raises the question of what a future Windows version should do as well:
should key 31 be considered pkts_in for all platforms, i.e., go by the Linux
definitions? I think Solaris overlaps, and this is a potential problem as
well. Is Ganglia heading towards metric key definitions that are
interchangeable between platforms in a global sense, or perhaps towards a
'platform' key that lets gmond use the key listing that matches the
platform type of the sender?
I hope this makes sense. I would appreciate some feedback.
Mmmm, deja vu all over again. We slugged this out on the dev list around
six months ago.
Basically, this is a problem because there's no standard for metrics, only
consensus. It gets weirder - mix and match platforms in a single cluster
that have different metrics on the same metric key and both of them will
report the metrics ... but in their own name/unit format.
It's all rather confusing, really.
Oh, the fix? You want the fix?
[first few seconds of "One Thing Leads To Another..." ... no, not that Fixx!]
... wait for Ganglia 3, which will handle different metrics on different
platforms in the same cluster ...
...OR...
... put different platforms in different clusters!
Heterogeneous clusters are not guaranteed to operate correctly at this
time. If the platforms you want to mix have their metric keys lined up
properly, then fabulous - you have no problems. If not, then you're going
to get some data dropped or some data misrepresented, depending on which
platform's monitoring core you query.
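To make the two failure modes concrete, here is a minimal sketch in C of a receiver resolving an incoming key against its own table. The table contents and the names sender_keys, short_keys, shifted_keys, KEY_COUNT and resolve_key are made up for illustration; they are not the real key_metrics.h definitions or gmond's actual lookup code.

```c
#include <stddef.h>

/* Hypothetical key tables where the array index is the metric key.
   These are illustrative only, not the contents of key_metrics.h. */
const char *const sender_keys[]  = { "cpu_num", "mem_total", "pkts_in" };
const char *const short_keys[]   = { "cpu_num", "mem_total" };
const char *const shifted_keys[] = { "cpu_num", "mem_total", "bytes_in" };

#define KEY_COUNT(t) (sizeof(t) / sizeof((t)[0]))

/* Look up a received key in the receiver's own table.
   Returns NULL when the receiver has no such key, which corresponds
   to the "data dropped" case. */
const char *resolve_key(const char *const *table, size_t count, size_t key)
{
    return key < count ? table[key] : NULL;
}
```

With these tables, key 2 means pkts_in to the sender. A receiver using short_keys gets NULL back and drops the sample, while a receiver using shifted_keys resolves the same key to bytes_in and reports the value under the wrong name.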
If this is a showstopping problem, modify the metric headers and platform
file so that you have the right number and order of metrics for both
platforms, and fill out the unsupported metrics with dummy functions that
return a bogus value.
Or you could actually write support in for the metric... ;)
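The dummy-function workaround could look roughly like the sketch below. Everything in it is an assumption made for illustration: the metric_fn and metric_entry types, the cpu_num_func stand-in, and the table layout are hypothetical and do not match gmond's real metric structures. The only point is that every platform ends up with the same number and order of entries.

```c
#include <stddef.h>

/* Hypothetical collector signature; the real gmond types differ. */
typedef double (*metric_fn)(void);

typedef struct {
    const char *name;     /* metric name reported for this key */
    metric_fn   collect;  /* function that produces the value */
} metric_entry;

/* Placeholder for a metric this platform cannot collect.
   It returns a bogus value so the key slot stays occupied. */
double dummy_metric(void) { return 0.0; }

/* Stand-in for a real, supported collector. */
double cpu_num_func(void) { return 1.0; }

/* Same number and order of entries on every platform;
   unsupported slots are padded with dummy_metric. */
const metric_entry metrics[] = {
    { "cpu_num", cpu_num_func },
    { "pkts_in", dummy_metric },  /* not supported on this platform */
};
```

The cost of this approach is that the padded metric shows up with a meaningless value on the platform that cannot collect it, so anything reading the data has to know to ignore it there.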