What makes 60 an unlucky number?

On 10/25/2012 05:20 PM, Vladimir Vuksan wrote:
> 60 seconds is likely the problem. I would leave it at the default, i.e. 15.
> I can explain later.
>
> "Potter,Mark L" <mlpot...@mdanderson.org> wrote:
>
>> Nicholas,
>>
>> I have it set to collect every 60 seconds at the moment, as per the
>> gmetad config I posted yesterday. Even so, running "netstat -ua" in a
>> 1-second watch loop shows that once Recv-Q pops, it is still responding
>> immediately, and Recv-Q never stays lit, so to speak, for more than
>> two seconds. In fact, even telnetting to the port only lights up Recv-Q
>> for 2 seconds flat.
>> ________________________________________
>> From: Nicholas Satterly [nfsatte...@gmail.com]
>> Sent: Thursday, October 25, 2012 15:19
>> To: Potter,Mark L
>> Cc: ganglia-general@lists.sourceforge.net
>> Subject: Re: [Ganglia-general] Question about scaling
>>
>> Hi Mark,
>>
>> I wouldn't be so quick to dismiss timeouts as the problem. The
>> "0.9751s" it took to download and parse ganglia's XML tree refers to
>> the time it took the PHP web frontend to query the gmetad XML, whereas
>> the timeouts I was referring to occur when gmetad polls the gmonds
>> during metric collection every 15 seconds.
>>
>> My suggestion would be to run "netstat -ua" in a loop on the head node
>> and look for a non-zero "Recv-Q" on UDP port 8649. As soon as you see
>> it go non-zero, telnet to port 8649 on the head node and note how long
>> it takes to respond. If it's any longer than 10 seconds, you will see
>> random hosts down and broken graphs in the ganglia web frontend.
>>
>> --Nick.
>>
>> On Thu, Oct 25, 2012 at 8:30 PM, Potter,Mark L
>> <mlpot...@mdanderson.org> wrote:
>> Well, things blew up at ~184 hosts. The web interface shows a random
>> number of hosts down on each refresh, although sometimes they are all up.
>> It reports just ~1 second to download and process the XML ("Downloading
>> and parsing ganglia's XML tree took 0.9751s"), so I don't think timeouts
>> are the problem. A telnet to 8649 produces the XML immediately. Could
>> this be the point where I need to start using a RAM-based partition, or
>> could it be something else? Is sFlow so much better that I should
>> consider using it? Would running multiple gmonds, say one per rack, and
>> listing them all in gmetad be a solution? At this point I am not sure of
>> the next step, and I really appreciate the help the list has given me
>> so far.
>>
>>> Hi Mark,
>>>
>>> I assume cnode340 is the head node that all ~340 other gmonds send
>>> their data to. If so, you could reduce the amount of redundant
>>> metadata flying around by increasing "send_metadata_interval" to 120
>>> seconds or higher.
>>
>> That is correct, cnode340 is the head node for ganglia. I have
>> increased "send_metadata_interval" to 120 seconds and have 100 nodes
>> reporting at this point, and it seems pretty smooth. I am going to add
>> the other nodes ~50 at a time.
>>
>>> Also, I suspect that if you telnet to port 8649 on your head node it
>>> will take a while to respond because it's busy processing incoming UDP
>>> metrics. If it takes more than 10 seconds to respond on a regular
>>> basis, then gmetad will time out [1].
>>
>> So far, with the 100 I have, the response is an instant dump of the XML.
>>
>>> Try deploying a recently patched version of gmond [2] to the head node,
>>> which is now multi-threaded, and see if that fixes the problem. It
>>> starts a separate thread for responding to XML metric requests and
>>> should respond immediately while the main thread is still processing
>>> metrics.
>>
>> I am running:
>>
>> gmond 3.4.0
>> gmetad 3.4.0
>> Ganglia Web Frontend version 3.5.2
>>
>> Would I need to patch gmond at this version?
>>
>> <SNIP>
>>
>> ------------------------------------------------------------------------------
>> Everyone hates slow websites. So do we.
>> Make your web apps faster with AppDynamics
>> Download AppDynamics Lite for free today:
>> http://p.sf.net/sfu/appdyn_sfd2d_oct
>> _______________________________________________
>> Ganglia-general mailing list
>> Ganglia-general@lists.sourceforge.net
>> https://lists.sourceforge.net/lists/listinfo/ganglia-general
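Nick's suggested "netstat -ua" loop can also be scripted so that only a non-zero backlog gets reported. The following is a minimal Python sketch, assuming a Linux head node where netstat's UDP Recv-Q column comes from the rx_queue field of /proc/net/udp (ports are stored there in hex, so 8649 appears as 21C9). It covers IPv4 sockets only (IPv6 ones are listed in /proc/net/udp6), and the names udp_recv_q and watch are hypothetical helpers, not part of Ganglia.

```python
import time
from typing import List


def udp_recv_q(proc_text: str, port: int = 8649) -> List[int]:
    """Return the receive-queue sizes (in bytes) of UDP sockets bound to `port`.

    Assumes the Linux /proc/net/udp layout: the local address column is
    'HEXADDR:HEXPORT' and the queue column is 'TXQUEUE:RXQUEUE', both in hex.
    """
    sizes = []
    for line in proc_text.splitlines()[1:]:  # first line is the column header
        fields = line.split()
        if len(fields) < 5:
            continue  # skip blank or malformed lines
        local_port = int(fields[1].rsplit(":", 1)[1], 16)
        if local_port == port:
            rx_queue = int(fields[4].split(":")[1], 16)
            sizes.append(rx_queue)
    return sizes


def watch(port: int = 8649, interval: float = 1.0, iterations: int = 10) -> None:
    """Poll /proc/net/udp and print a timestamp whenever Recv-Q is non-zero."""
    for _ in range(iterations):
        with open("/proc/net/udp") as f:
            queues = udp_recv_q(f.read(), port)
        backlog = [q for q in queues if q > 0]
        if backlog:
            print(time.strftime("%H:%M:%S"), "Recv-Q bytes:", backlog)
        time.sleep(interval)
```

For example, watch(port=8649, interval=1.0, iterations=60) checks once a second for a minute, roughly matching the 1-second watch loop described in the thread; the timestamps make it easy to start the telnet timing test as soon as a backlog appears.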
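The telnet test on port 8649 can likewise be timed rather than eyeballed. This is a minimal Python sketch, assuming (as the telnet test in the thread suggests) that gmond writes its full XML dump on TCP port 8649 and then closes the connection. time_xml_dump is a hypothetical helper, and its 10-second default only mirrors the threshold Nick mentions; it is not read from any gmetad setting.

```python
import socket
import time
from typing import Tuple


def time_xml_dump(host: str, port: int = 8649, timeout: float = 10.0) -> Tuple[float, float, int]:
    """Connect to gmond's TCP port and read the XML dump until the peer closes.

    Returns (seconds until the first byte arrived, total seconds, bytes read).
    The timeout applies to the connect and to each individual recv() call,
    not to the transfer as a whole; exceeding it raises socket.timeout,
    which is a subclass of OSError.
    """
    start = time.monotonic()
    first_byte = None
    total = 0
    with socket.create_connection((host, port), timeout=timeout) as sock:
        while True:
            chunk = sock.recv(65536)
            if not chunk:
                break  # empty read: the server closed the connection
            if first_byte is None:
                first_byte = time.monotonic() - start
            total += len(chunk)
    elapsed = time.monotonic() - start
    return (first_byte if first_byte is not None else elapsed, elapsed, total)
```

For example, time_xml_dump("cnode340") returns the time to first byte, the total transfer time, and the size of the dump. Running it repeatedly while Recv-Q is non-zero shows whether the response ever approaches the 10-second mark, which is the situation in which Nick expects hosts to appear down.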