On 12/07/2013 03:00 PM, Vladimir Vuksan wrote:
>> Were these failures totally random or grouped in some way? (Same
>> cluster, type, etc).
>
> We run multiple dozens of clusters and some of the larger clusters ie.
> clusters that had 2-3x machines that other clusters would exhibit either
> gaps, slower updating ie. data points would update on 3 or 4x poll
> period if you looked on graph details. Also Grid summary views
> disappeared altogether.
>
This very broadly matches our experience (mo' metrics, mo' problems),
but we never found a way to quantify 'gap-y-ness' to prove it.

>>> In our Ganglia setup, we run a `gmond` to collect data for every
>>> machine and several `gmetad` processes:
>>>
>>> * An interactive `gmetad` process is responsible solely for
>>>   reporting summary statistics to the web interface.
>>> * Another `gmetad` process is responsible for writing graphs.
>>
>>
>> Are these two gmetad process co-located on the same server? I think
>> this is an interesting option that I at least was not aware of.
>
>
> This set up is very similar to what you have. Basically have one gmetad
> that polls all the same gmonds however has write_rrd off and is used for
> both alerting and feeding the web interface.
>

Ah, in our case we had our gmetad for rrd/webui on a different host from
the one used for alerting. This somewhat reduced interference with the
webui, but even without writing rrds the alerting instance seemed to
suffer delays of various sorts. Again, anecdotally, there tend to be more
gaps when people are looking for them, so separate processes for writing
rrds and for interactive web queries might help in our setup.

_______________________________________________
Ganglia-developers mailing list
Ganglia-developers@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/ganglia-developers