Jesse,

I followed both your suggestion and Bernard's, and the problem has gone
away. I adjusted the "data source" collection interval to 45 seconds.
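
For anyone finding this thread later, the interval change probably looks something like the fragment below. This is only a sketch: it assumes the setting meant here is the optional polling interval on a data_source line in gmetad.conf, and the cluster name and host names are made up for illustration.

```
# gmetad.conf (sketch). The number after the cluster name is the polling
# interval in seconds. "my cluster", node01 and node02 are hypothetical.
data_source "my cluster" 45 node01:8649 node02:8649
```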

Also, I have a Python script I wrote with pexpect that will ssh into all of
your servers and stop, start, or restart gmond as directed, if anyone wants it.
It makes restarting a whole site easier.
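
A rough sketch of what such a helper could look like is below. It is not the pexpect script mentioned above: it assumes key-based ssh logins (so plain subprocess is enough and no password prompt has to be driven), and it assumes gmond is controlled with "sudo service gmond <action>" on each node. The function names and host names are hypothetical.

```python
import subprocess

VALID_ACTIONS = ("stop", "start", "restart")


def build_ssh_command(host, action, service="gmond"):
    """Return the argv list that runs one service action on one host."""
    if action not in VALID_ACTIONS:
        raise ValueError(f"unsupported action: {action!r}")
    # BatchMode=yes makes ssh fail instead of prompting for a password.
    # "sudo service ..." is an assumption about how the nodes are set up.
    return ["ssh", "-o", "BatchMode=yes", host,
            "sudo", "service", service, action]


def control_gmond(hosts, action, dry_run=True):
    """Apply the action to every host; return {host: exit code or None}."""
    results = {}
    for host in hosts:
        cmd = build_ssh_command(host, action)
        if dry_run:
            # Only show what would be run; nothing is contacted.
            print(" ".join(cmd))
            results[host] = None
        else:
            results[host] = subprocess.run(cmd).returncode
    return results


if __name__ == "__main__":
    # Hypothetical host names; dry run only.
    control_gmond(["node01", "node02"], "restart")
```

With dry_run left at True it only prints the ssh commands, which is a reasonable way to check the host list before touching anything.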

Thanks for the help.

-Regards

Ron Cavallo 
Sr. Director, Infrastructure
Saks Fifth Avenue / Saks Direct
12 East 49th Street
New York, NY 10017
212-451-3807 (O)
212-940-5079 (fax) 
646-315-0119(C) 
www.saks.com
 

-----Original Message-----
From: Jesse Becker [mailto:haw...@gmail.com] 
Sent: Wednesday, March 16, 2011 10:43 PM
To: Ron Cavallo
Cc: ganglia-general@lists.sourceforge.net
Subject: Re: [Ganglia-general] Web Frontend says that nodes are coming up and 
down , but they are not

Before I answer, I'll mention that Bernard's advice is good, and you
should follow it. :-)

The reason that the stop-gmond/bounce-gmetad/start-gmond process works
has to do with how gmond stores and shares data.  In most cases, gmond
will store data for every single *other* gmond that it hears about. In
the case of multicast, this can be a lot of hosts.  Gmetad keeps
some state as well.  Shutting down the various parts will clear
everything out so you can start fresh.  This is really more useful
when you have gmond clients that you have decommissioned, but can't
remove from Ganglia.

Strictly speaking, you should be able to do:
1) stop gmond everywhere
2) stop gmetad
3) start gmond everywhere
4) start gmetad

But I find it easier to issue 3 commands instead of 4.  The web UI
also is offline when gmetad is not running, and minimizing that
"downtime" may be important in some circumstances.
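
The two orderings described above can be written down as a small planning function. This is only a sketch: it produces the list of steps in order and leaves the actual execution to whatever tooling you use. The function name, host names, and the (host, service, action) tuple layout are made up for illustration.

```python
def restart_plan(gmond_hosts, gmetad_host, bounce_gmetad=True):
    """Return ordered (host, service, action) steps for a clean restart.

    bounce_gmetad=True  -> the 3-command variant (gmetad is restarted once)
    bounce_gmetad=False -> the strict 4-step variant from the list above
    """
    # Step 1 is the same either way: stop gmond everywhere.
    steps = [(host, "gmond", "stop") for host in gmond_hosts]
    if bounce_gmetad:
        # Restart gmetad while no gmond is running, then bring gmond back.
        steps.append((gmetad_host, "gmetad", "restart"))
        steps += [(host, "gmond", "start") for host in gmond_hosts]
    else:
        # Strict ordering: gmetad stays down until every gmond is back.
        steps.append((gmetad_host, "gmetad", "stop"))
        steps += [(host, "gmond", "start") for host in gmond_hosts]
        steps.append((gmetad_host, "gmetad", "start"))
    return steps


if __name__ == "__main__":
    # Hypothetical hosts; prints the plan without running anything.
    for host, service, action in restart_plan(["node01", "node02"], "web01"):
        print(f"{host}: {service} {action}")
```

The bounce variant keeps gmetad, and so the web UI, down for a shorter time, which is the trade-off mentioned above.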

On Wed, Mar 16, 2011 at 21:32, Ron Cavallo <ron_cava...@s5a.com> wrote:
> I would gladly do that in that fashion, can you explain why this corrects
> the problem?
>
> Ron Cavallo
> Sr. Director, Infrastructure
> Saks Fifth Avenue / Saks Direct
> 12 East 49th Street
> New York, NY 10017
> 212-451-3807 (O)
> 212-451-3510 (fax)
> 646-315-0119(C)
> www.saks.com <http://www.saks.com/>
>
> ----- Original Message -----
> From: Jesse Becker <haw...@gmail.com>
> To: Ron Cavallo
> Cc: ganglia-general@lists.sourceforge.net
> <ganglia-general@lists.sourceforge.net>
> Sent: Wed Mar 16 19:53:29 2011
> Subject: Re: [Ganglia-general] Web Frontend says that nodes are coming up
> and down , but they are not
>
> I've seen this occasionally.  The usual (and perhaps only) solution is
> to shut down *all* of the gmond processes running on your nodes.
> Bounce gmetad, then start gmond everywhere.
>
> On Wed, Mar 16, 2011 at 17:06, Ron Cavallo <ron_cava...@s5a.com> wrote:
>> Every minute, nodes disappear from the web frontend and the web frontend
>> reports them as down. Then they get reported up a minute or so later, then
>> repeat. Any ideas what is going on?
>>
>> This is what I see every couple of minutes:
>>
>> Ron Cavallo
>>
>> Sr. Director, Infrastructure
>> Saks Fifth Avenue / Saks Direct
>> 12 East 49th Street
>> New York, NY 10017
>> 212-451-3807 (O)
>> 212-940-5079 (fax)
>>
>> 646-315-0119(C)
>>
>> www.saks.com
>>
>>
>>
>>
>> ------------------------------------------------------------------------------
>> Colocation vs. Managed Hosting
>> A question and answer guide to determining the best fit
>> for your organization - today and in the future.
>> http://p.sf.net/sfu/internap-sfd2d
>> _______________________________________________
>> Ganglia-general mailing list
>> Ganglia-general@lists.sourceforge.net
>> https://lists.sourceforge.net/lists/listinfo/ganglia-general
>>
>>
>
>
>
> --
> Jesse Becker
>



-- 
Jesse Becker
