Jonathan Pauli wrote:
Is there a way to remove some of the standard metrics
from gmond? We are running ganglia on a 300 node cluster
and are seeing some performance issues, especially with the
CPU usage on the head node (running gmetad). It would also
be desirable to cut down on the clutter on the web page, and
cut down multicast traffic if possible.
Thank you much.
Check out gmond/metric.h - it has all the metrics and metric
thresholds in there. In most cases the metrics sent most often are CPU
load and memory usage. Increasing the delays on those metrics will
probably help you out. Of course, it does involve rebuilding,
redistributing and restarting gmond. Hopefully you have the
infrastructure in place...
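
To get a feel for what longer delays buy you, here is a rough
back-of-the-envelope sketch. It assumes each node re-sends every metric
at least once per its maximum interval even when the value has not
changed, so it only gives a floor on the message rate. The metric names
are used for illustration and the interval values are made up; they are
not the ones shipped in metric.h.

```python
def floor_message_rate(nodes, max_intervals):
    """Lower bound on metric messages per second for the whole cluster.

    Assumes every node re-sends each metric at least once per its maximum
    interval (in seconds), even when the value has not changed.
    """
    per_node = sum(1.0 / seconds for seconds in max_intervals.values())
    return nodes * per_node


# Hypothetical intervals in seconds; NOT the values shipped in metric.h.
current = {"load_one": 60, "cpu_user": 60, "mem_free": 120}

# Doubling every interval halves the floor on the message rate.
relaxed = {name: 2 * seconds for name, seconds in current.items()}

print(floor_message_rate(300, current))
print(floor_message_rate(300, relaxed))
```

Real traffic will be higher than this floor whenever values change
enough to trip the value thresholds, so treat it as a comparison tool
rather than a prediction.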
As for reducing CPU usage on the gmetad node ... whew. You may want to
check the list archives. There are several potential bottlenecks and we've
gone over all of them at one point or another. Disk I/O seems to be the
first one people hit on Linux boxes. A temporary filesystem or even a
journalling filesystem may help with performance.
Another thing to consider: is gmetad using a lot of CPU time when *no
one* is hitting the web front-end? If so then XML parsing (doubtful) or
RRD updating (much more likely) is where all the CPU-cycle-sucking is
happening.
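
One way to answer that question is to sample gmetad's CPU time while the
web front-end is idle. The following is a minimal, Linux-only sketch
that reads /proc/<pid>/stat; the function names and the 30-second
default are my own choices, and you have to supply the gmetad pid
yourself.

```python
import os
import time


def cpu_seconds_from_stat(stat_line, ticks_per_second):
    """Return user + system CPU seconds from one /proc/<pid>/stat line."""
    # The command name (field 2) may contain spaces, so split after the
    # closing parenthesis. The remaining list starts at field 3 (state),
    # which puts utime (field 14) at index 11 and stime (field 15) at 12.
    fields = stat_line.rsplit(")", 1)[1].split()
    utime_ticks, stime_ticks = int(fields[11]), int(fields[12])
    return (utime_ticks + stime_ticks) / ticks_per_second


def cpu_percent(pid, interval=30.0):
    """Average CPU usage (percent of one core) of pid over interval seconds."""
    ticks = os.sysconf("SC_CLK_TCK")

    def read():
        with open("/proc/%d/stat" % pid) as handle:
            return cpu_seconds_from_stat(handle.read(), ticks)

    before = read()
    time.sleep(interval)
    return 100.0 * (read() - before) / interval


# Example (pid is hypothetical): print(cpu_percent(1234))
```

If the number stays high with nobody loading pages, the front-end is not
the culprit.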
If you're getting those fabulous 8-to-10-second page loads like I am,
you just have to wait for a gmetad and web front-end that support
interactive queries, unfortunately (reducing the number of metrics
transmitted would actually address this, too). Parsing 3MB (or more) of
XML *in PHP* on *every page load* isn't fun...
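
If you want to know how much XML your own front-end is chewing through,
something like this sketch will tell you. It assumes gmetad hands out
its full XML dump to any TCP client and then closes the connection, and
that hosts and metrics appear as HOST and METRIC elements; the port in
the usage comment (8651) is an assumption, so check your gmetad
configuration.

```python
import socket
import xml.etree.ElementTree as ET


def fetch_dump(host, port, timeout=30.0):
    """Read everything the server sends until it closes the connection."""
    chunks = []
    with socket.create_connection((host, port), timeout=timeout) as conn:
        while True:
            chunk = conn.recv(65536)
            if not chunk:
                break
            chunks.append(chunk)
    return b"".join(chunks)


def summarize_dump(xml_bytes):
    """Report dump size and element counts (HOST/METRIC names assumed)."""
    root = ET.fromstring(xml_bytes)
    return {
        "bytes": len(xml_bytes),
        "hosts": sum(1 for _ in root.iter("HOST")),
        "metrics": sum(1 for _ in root.iter("METRIC")),
    }


# Example (host and port are assumptions):
# print(summarize_dump(fetch_dump("localhost", 8651)))
```

Dividing the byte count by the number of metrics gives a rough idea of
how much each dropped metric would save per page load.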
You can also reduce the number of graphs shown per page. We just went
over that last month, I believe...
Hope this helps.