Jesse, I followed both your suggestion and Bernard's, and the problem has gone away. I adjusted the "data source" collection interval to 45 seconds.
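For reference, here is a small sketch of what such a setting looks like. It assumes the setting Ron changed is the optional polling interval on a `data_source` line in gmetad.conf, whose form is `data_source "name" [polling interval] host1:port host2:port ...`. The stock gmetad.conf comments say 15 seconds is assumed when the interval is omitted. The helper name, cluster name and host names below are hypothetical, and 8649 is assumed as the gmond port.

```python
from typing import List, Optional

DEFAULT_GMOND_PORT = 8649  # usual gmond TCP port; change it if your gmonds listen elsewhere


def data_source_line(name: str, hosts: List[str], interval_s: Optional[int] = None,
                     port: int = DEFAULT_GMOND_PORT) -> str:
    """Render one gmetad.conf line of the form:

        data_source "name" [polling interval] host1:port host2:port ...

    When interval_s is None the interval is left out, so gmetad's default applies.
    """
    if not hosts:
        raise ValueError("at least one gmond host is required")
    if interval_s is not None and interval_s <= 0:
        raise ValueError("polling interval must be a positive number of seconds")
    parts = ['data_source "%s"' % name]
    if interval_s is not None:
        parts.append(str(interval_s))
    # Naive check: a host with no ":" gets the default port appended
    # (this does not handle bare IPv6 addresses).
    parts.extend(h if ":" in h else "%s:%d" % (h, port) for h in hosts)
    return " ".join(parts)


if __name__ == "__main__":
    # Hypothetical cluster and host names, polled every 45 seconds.
    print(data_source_line("my cluster", ["node01", "node02:8650"], interval_s=45))
    # prints: data_source "my cluster" 45 node01:8649 node02:8650
```

Listing more than one gmond host on the line gives gmetad fallbacks to poll if the first one is unreachable.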
Also, I have a Python script I wrote with pexpect that will ssh into all of your servers and stop, start, or restart gmond as directed, if anyone wants it. It makes restarting your site easier.

Thanks for the help.

-Regards
Ron Cavallo
Sr. Director, Infrastructure
Saks Fifth Avenue / Saks Direct
12 East 49th Street
New York, NY 10017
212-451-3807 (O)
212-940-5079 (fax)
646-315-0119 (C)
www.saks.com

-----Original Message-----
From: Jesse Becker [mailto:haw...@gmail.com]
Sent: Wednesday, March 16, 2011 10:43 PM
To: Ron Cavallo
Cc: ganglia-general@lists.sourceforge.net
Subject: Re: [Ganglia-general] Web Frontend says that nodes are coming up and down , but they are not

Before I answer, I'll mention that Bernard's advice is good, and you should follow it. :-)

The reason that the stop-gmond/bounce-gmetad/start-gmond process works has to do with how gmond stores and shares data. In most cases, gmond will store data for every single *other* gmond that it hears about. In the case of multicast, this can be a lot of hosts. Gmetad keeps some state as well. Shutting down the various parts clears everything out so you can start fresh. This is really more useful when you have gmond clients that you have decommissioned but can't remove from Ganglia.

Strictly speaking, you should be able to do:

1) stop gmond everywhere
2) stop gmetad
3) start gmond everywhere
4) start gmetad

But I find it easier to issue 3 commands instead of 4 (stop gmond everywhere, bounce gmetad, start gmond everywhere). The web UI is also offline when gmetad is not running, and minimizing that "downtime" may be important in some circumstances.

On Wed, Mar 16, 2011 at 21:32, Ron Cavallo <ron_cava...@s5a.com> wrote:
> I would gladly do it that way; can you explain why this corrects
> the problem?
>
> Ron Cavallo
>
> ----- Original Message -----
> From: Jesse Becker <haw...@gmail.com>
> To: Ron Cavallo
> Cc: ganglia-general@lists.sourceforge.net
> Sent: Wed Mar 16 19:53:29 2011
> Subject: Re: [Ganglia-general] Web Frontend says that nodes are coming up
> and down , but they are not
>
> I've seen this occasionally. The usual (and perhaps only) solution is
> to shut down *all* of the gmond processes running on your nodes.
> Bounce gmetad, then start gmond everywhere.
>
> On Wed, Mar 16, 2011 at 17:06, Ron Cavallo <ron_cava...@s5a.com> wrote:
>> Every minute, nodes disappear from the web frontend and it reports
>> them as down. Then they get reported up a minute or so later, and the
>> cycle repeats. Any ideas what is going on?
>>
>> This is what I see every couple of minutes:
>>
>> Ron Cavallo
>>
>> ------------------------------------------------------------------------------
>> Colocation vs. Managed Hosting
>> A question and answer guide to determining the best fit
>> for your organization - today and in the future.
>> http://p.sf.net/sfu/internap-sfd2d
>> _______________________________________________
>> Ganglia-general mailing list
>> Ganglia-general@lists.sourceforge.net
>> https://lists.sourceforge.net/lists/listinfo/ganglia-general
>
> --
> Jesse Becker

--
Jesse Becker
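The restart ordering Jesse describes, which is also the kind of job Ron's script automates, can be sketched in Python. Ron's script is not shown in the thread, so this is not it. It is a minimal sketch that runs plain `ssh` through `subprocess` instead of pexpect, and it assumes key-based (passwordless) ssh logins. The `service gmond ...` / `service gmetad ...` commands and all host names are hypothetical; init script and service names differ between distributions and packages.

```python
import subprocess
from typing import List, Sequence, Tuple

# Hypothetical service commands; adjust to however Ganglia is installed on your hosts.
GMOND_STOP = "service gmond stop"
GMOND_START = "service gmond start"
GMETAD_STOP = "service gmetad stop"
GMETAD_START = "service gmetad start"
GMETAD_RESTART = "service gmetad restart"

Step = Tuple[str, str]  # (host, shell command to run on that host over ssh)


def restart_plan(gmond_hosts: Sequence[str], gmetad_host: str, bounce: bool = True) -> List[Step]:
    """Return the ordered (host, command) steps for a full Ganglia restart.

    bounce=True  -> 3 phases: stop gmond everywhere, restart gmetad, start gmond everywhere.
    bounce=False -> 4 phases: stop gmond everywhere, stop gmetad, start gmond everywhere,
                    start gmetad.
    """
    plan = [(h, GMOND_STOP) for h in gmond_hosts]
    if bounce:
        plan.append((gmetad_host, GMETAD_RESTART))
        plan += [(h, GMOND_START) for h in gmond_hosts]
    else:
        plan.append((gmetad_host, GMETAD_STOP))
        plan += [(h, GMOND_START) for h in gmond_hosts]
        plan.append((gmetad_host, GMETAD_START))
    return plan


def run_plan(plan: Sequence[Step], dry_run: bool = True) -> None:
    """Run each step over ssh in order; with dry_run=True, only print the commands."""
    for host, cmd in plan:
        argv = ["ssh", host, cmd]
        if dry_run:
            print(" ".join(argv))
            continue
        # check=True raises CalledProcessError on a non-zero exit (including ssh
        # connection failures), so later phases never start after a failed step.
        subprocess.run(argv, check=True)


if __name__ == "__main__":
    # Hypothetical host names; prints the ssh commands without running them.
    run_plan(restart_plan(["node01", "node02"], "gmetad01"))
```

Building the plan separately from running it keeps the ordering easy to check with a dry run before anything is stopped. Pass `dry_run=False` to actually execute the steps.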