Re: [Ganglia-general] help :: making grids of gmetads (and question about monitoring topology)
Yes, a gmetad or gmond can be polled by any number of different gmetads, in any combination or hierarchy that makes sense to you.

--Nick.

On Tue, Oct 23, 2012 at 9:57 AM, Adrian Sevcenco wrote:
> On 10/22/2012 02:18 PM, Adrian Sevcenco wrote:
> > Hi! I am a little bit lost on the subject of making grids of gmetads ..
> > how can I do something like:
> >
> >   gmetad1   gmetad2   (that take data from their gmonds)
> >        \     /
> >       gmetad3  <- gmond of this machine that takes different other data
> >
> > Also, is the hierarchy of grids and clusters made at the gmetad or the
> > gmond level? Did I understand correctly that gmetad just defines data
> > sources (gmonds and gmetads), but the exact hierarchy is done at the
> > gmond level? If yes, where does the "gridname" come into play?
> >
> > Also, what can I do to get all data from the other gmetads, not only
> > summary data?
>
> Hi! I have other questions about gmond and gmetad:
> Is it possible for a gmond to be a data source for multiple gmetads?
> I would want something like:
>
>   gmond_wn_1 ... gmond_wn_n
>        \          /
>         \        /
>          \      /
>      gmond_frontend_1
>      gmetad_frontend
>             \
>              \
>         gmond_central
>         gmetad_central
>
> Is this possible?
>
> Thank you!
> Adrian
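To make Nick's answer concrete, here is a minimal sketch (host and cluster names are made up for illustration): the same gmond is simply listed as a data source in the gmetad.conf of every gmetad that wants to poll it.

    # gmetad.conf on gmetad_frontend
    data_source "worker nodes" gmond_frontend_1:8649

    # gmetad.conf on gmetad_central - polling the very same gmond
    data_source "worker nodes" gmond_frontend_1:8649

Each gmetad opens its own TCP connection to the gmond's tcp_accept_channel (port 8649 by default), so nothing needs to change on the gmond side.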
Re: [Ganglia-general] help :: making grids of gmetads
Hi Adrian,

To create a grid-of-grids hierarchy, what matters is how the gmetads are configured, not the gmonds. To get a gmetad to pull metric data from another gmetad, append the port number of that gmetad (normally 8651) to the data source. Using your example, gmetad3.conf would look like:

    data_source "gmetad1" gmetad1:8651
    data_source "gmetad2" gmetad2:8651

> Also, what can I do to get all data from the other gmetads, not only
> summary data?

In gmetad.conf there is an option called "scalable". Set this to "off". Make sure you are running version 3.1.7 of gmetad or the latest from GitHub trunk, because this feature was broken at some point after 3.1.7 was released.

--Nick.

On Mon, Oct 22, 2012 at 12:18 PM, Adrian Sevcenco wrote:
> Hi! I am a little bit lost on the subject of making grids of gmetads ..
> how can I do something like:
>
>   gmetad1   gmetad2   (that take data from their gmonds)
>        \     /
>       gmetad3  <- gmond of this machine that takes different other data
>
> Also, is the hierarchy of grids and clusters made at the gmetad or the
> gmond level? Did I understand correctly that gmetad just defines data
> sources (gmonds and gmetads), but the exact hierarchy is done at the
> gmond level? If yes, where does the "gridname" come into play?
>
> Also, what can I do to get all data from the other gmetads, not only
> summary data?
>
> Thanks!
> Adrian
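Putting the pieces of Nick's answer together, a fuller gmetad3.conf might look like the sketch below (host names taken from Adrian's diagram, everything else illustrative). The gridname option is included because that is where the grid naming Adrian asked about comes in: the gridname set on each gmetad is what labels that level of the hierarchy in the web front end.

    # /etc/ganglia/gmetad.conf on gmetad3 - illustrative sketch only
    gridname "TopLevelGrid"

    # pull from the two lower-level gmetads on their XML port (8651)
    data_source "gmetad1" gmetad1:8651
    data_source "gmetad2" gmetad2:8651

    # poll the local gmond for this machine's own metrics
    data_source "local" localhost:8649

    # keep full metric data from the child gmetads, not just summaries
    scalable off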
Re: [Ganglia-general] Question about scaling
Hi Mark,

If you want to significantly reduce the amount of UDP traffic going to your head-end gmond (cnode340), then you might want to consider using Host sFlow agents to monitor the machines in the cluster - sFlow encodes all the core Ganglia metrics (along with additional disk I/O, swap, and interrupt activity metrics) in a single UDP packet, so you can cut the UDP packets per second (and the load on the head-end gmond) by a factor of 30 or more.

If you make extensive use of gmond plugins for custom metrics, then you would want to stick with gmond on all your nodes. However, if you have a limited number of custom metrics, you can supplement the core metrics exported by sFlow using gmetric.

http://blog.sflow.com/2011/07/ganglia-32-released.html

As Nick suggested, you should be using the latest version of gmond for the head node. Multi-threading significantly improves scalability, and the newer versions of gmond also include native sFlow support.

Regards,
Peter

On Tue, Oct 23, 2012 at 4:34 PM, Nicholas Satterly wrote:
> I assume cnode340 is the head node that all ~340 other gmonds send their
> data to. If so, you could reduce the amount of redundant metadata flying
> around by increasing "send_metadata_interval" to 120 seconds or higher.
>
> Also, I suspect that if you telnet to port 8649 on your head node it will
> take a while to respond because it's busy processing incoming UDP metrics.
> If it takes more than 10 seconds to respond on a regular basis then gmetad
> will time out [1].
>
> Try deploying a recently patched version of gmond [2] to the head node,
> which is now multi-threaded, and see if that fixes the problem. It starts a
> separate thread for responding to XML metric requests and should respond
> immediately while the main thread is still processing metrics.
>
> Let us know how you get on.
>
> Regards,
> Nick
>
> [1] https://github.com/ganglia/monitor-core/blob/master/gmetad/data_thread.c#L103
> [2] https://github.com/ganglia/monitor-core/pull/53
>
> On Tue, Oct 23, 2012 at 7:36 PM, Potter,Mark L wrote:
>>
>> data_source "MDACC" 60 cnode340:8649
>>
>> Everything else is default at this point. http://pastebin.com/UAQYxcX3 is
>> a full copy.
>>
>> From: Nicholas Satterly [nfsatte...@gmail.com]
>> Sent: Tuesday, October 23, 2012 13:33
>> To: Potter,Mark L
>> Cc: ganglia-general@lists.sourceforge.net
>> Subject: Re: [Ganglia-general] Question about scaling
>>
>> Please send through your gmetad.conf file so we can see how things are
>> configured on the server side. *
>>
>> --Nick.
>>
>> * Be sure to anonymise any sensitive info.
>>
>> On 23 Oct 2012, at 19:21, "Potter,Mark L" wrote:
>>
>> > I am using what I think to be a fairly standard gmond.conf:
>> >
>> > globals {
>> >   daemonize = yes
>> >   setuid = yes
>> >   user = nobody
>> >   debug_level = 0
>> >   max_udp_msg_len = 1472
>> >   mute = no
>> >   deaf = no
>> >   allow_extra_data = yes
>> >   host_dmax = 86400 /* secs. Expires (removes from web interface) hosts in 1 day */
>> >   host_tmax = 30 /* secs */
>> >   cleanup_threshold = 300 /* secs */
>> >   gexec = no
>> >   send_metadata_interval = 30 /* secs */
>> > }
>> >
>> > cluster {
>> >   name = "MDACC"
>> >   owner = "MD Anderson Cancer Center"
>> >   latlong = "unspecified"
>> >   url = "unspecified"
>> > }
>> >
>> > host {
>> >   location = "8,3,1"
>> > }
>> >
>> > udp_send_channel {
>> >   host = cnode340
>> >   port = 8649
>> > }
>> >
>> > udp_recv_channel {
>> >   port = 8649
>> >   retry_bind = true
>> > }
>> >
>> > tcp_accept_channel {
>> >   port = 8649
>> > }
>> >
>> > gmetad is set to check every 60 seconds:
>> >
>> > data_source "MDACC" 60 cnode340:8649
>> >
>> > Everything works well until around 200 hosts, where it appears gmetad
>> > starts having issues. I have ~340 hosts to go into this cluster. Should I
>> > be running multiple gmetads for this amount of hosts? With all of them
>> > active the web interface reports all of them down and collects no stats at
>> > all. I am looking for advice on getting this up and running properly. The
>> > ganglia host isn't underpowered at all IMO and has plenty of HDD space:
>> >
>> > Mem: 32955788 (from free)
>> > 16 cores (AMD Opteron(tm) Processor 6128)
>> >
>> > Thanks for any assistance.
>> >
>> > Respectfully,
>> >
>> > Mark L. Potter
>> > Research IS & Technology Services
>> > UNIX Systems Administrator
>> > O: 713-745-2032
>> > C: 713-965-4133
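A minimal sketch of the two checks suggested in this thread (cnode340 is the head node from Mark's config; the 120-second figure is just the value Nick mentions, and nc is used here only because it is easy to time - the telnet check Nick describes works the same way):

    # in the globals section of gmond.conf on every node:
    # resend metadata less often to cut redundant traffic
    send_metadata_interval = 120 /* secs */

    # from the gmetad host: time how long the head-end gmond takes to
    # hand over its full XML dump - gmetad gives up after about 10 seconds
    time nc cnode340 8649 > /dev/null

If that dump regularly takes more than a few seconds, the single-threaded gmond on the head node is the likely bottleneck, which is what the multi-threaded build referenced in [2] above is meant to address.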
Re: [Ganglia-general] [Ganglia-developers] Adding Holt-Winters databases to existing rrd causes __SummaryInfo__ metric to fail to render on graphs
I don't have a lot of time to look into it; however, one difference between SummaryInfo RRDs and other RRDs is that SummaryInfo contains ds[num], which is the number of nodes being summarized. I wonder if that is somehow throwing off your script.

Vladimir

On Tue, 23 Oct 2012, Aaron Nichols wrote:

On Tue, Oct 23, 2012 at 5:41 PM, Nicholas Satterly wrote:

Hi Aaron,

What is the output of "rrdtool info cron.webServiceRequestCounter.Counter.rrd"?

--Nick.

Thanks for your response. Here's the rrdtool info output before and after applying the HW RRAs - I also have some additional info below this regarding the data pre/post application of the HW RRAs:

Before:

filename = "cron.webServiceRequestCounter.Counter.rrd.bak"
rrd_version = "0003"
step = 15
last_update = 1351027681
header_size = 1760
ds[sum].index = 0
ds[sum].type = "COUNTER"
ds[sum].minimal_heartbeat = 120
ds[sum].min = NaN
ds[sum].max = NaN
ds[sum].last_ds = "25966364"
ds[sum].value = 1.126667e+02
ds[sum].unknown_sec = 0
ds[num].index = 1
ds[num].type = "COUNTER"
ds[num].minimal_heartbeat = 120
ds[num].min = NaN
ds[num].max = NaN
ds[num].last_ds = "5"
ds[num].value = 0.00e+00
ds[num].unknown_sec = 0
rra[0].cf = "AVERAGE"
rra[0].rows = 11520
rra[0].cur_row = 6534
rra[0].pdp_per_row = 1
rra[0].xff = 5.00e-01
rra[0].cdp_prep[0].value = NaN
rra[0].cdp_prep[0].unknown_datapoints = 0
rra[0].cdp_prep[1].value = NaN
rra[0].cdp_prep[1].unknown_datapoints = 0
rra[1].cf = "AVERAGE"
rra[1].rows = 20160
rra[1].cur_row = 6785
rra[1].pdp_per_row = 4
rra[1].xff = 5.00e-01
rra[1].cdp_prep[0].value = 0.00e+00
rra[1].cdp_prep[0].unknown_datapoints = 0
rra[1].cdp_prep[1].value = 0.00e+00
rra[1].cdp_prep[1].unknown_datapoints = 0
rra[2].cf = "AVERAGE"
rra[2].rows = 53568
rra[2].cur_row = 13483
rra[2].pdp_per_row = 20
rra[2].xff = 5.00e-01
rra[2].cdp_prep[0].value = 4.257556e+02
rra[2].cdp_prep[0].unknown_datapoints = 8
rra[2].cdp_prep[1].value = 0.00e+00
rra[2].cdp_prep[1].unknown_datapoints = 8
rra[3].cf = "AVERAGE"
rra[3].rows = 87600
rra[3].cur_row = 78819
rra[3].pdp_per_row = 120
rra[3].xff = 5.00e-01
rra[3].cdp_prep[0].value = 4.257556e+02
rra[3].cdp_prep[0].unknown_datapoints = 108
rra[3].cdp_prep[1].value = 0.00e+00
rra[3].cdp_prep[1].unknown_datapoints = 108

After:

filename = "cron.webServiceRequestCounter.Counter.rrd"
rrd_version = "0003"
step = 15
last_update = 1351051988
header_size = 3200
ds[num].index = 0
ds[num].type = "COUNTER"
ds[num].minimal_heartbeat = 120
ds[num].min = NaN
ds[num].max = NaN
ds[num].last_ds = "15150735"
ds[num].value = 4.736000e+02
ds[num].unknown_sec = 0
ds[sum].index = 1
ds[sum].type = "COUNTER"
ds[sum].minimal_heartbeat = 120
ds[sum].min = NaN
ds[sum].max = NaN
ds[sum].last_ds = "5"
ds[sum].value = 0.00e+00
ds[sum].unknown_sec = 0
rra[0].cf = "AVERAGE"
rra[0].rows = 11520
rra[0].cur_row = 5484
rra[0].pdp_per_row = 1
rra[0].xff = 5.00e-01
rra[0].cdp_prep[0].value = NaN
rra[0].cdp_prep[0].unknown_datapoints = 0
rra[0].cdp_prep[1].value = NaN
rra[0].cdp_prep[1].unknown_datapoints = 0
rra[1].cf = "AVERAGE"
rra[1].rows = 20160
rra[1].cur_row = 2762
rra[1].pdp_per_row = 4
rra[1].xff = 5.00e-01
rra[1].cdp_prep[0].value = 0.00e+00
rra[1].cdp_prep[0].unknown_datapoints = 0
rra[1].cdp_prep[1].value = 0.00e+00
rra[1].cdp_prep[1].unknown_datapoints = 0
rra[2].cf = "AVERAGE"
rra[2].rows = 53568
rra[2].cur_row = 51312
rra[2].pdp_per_row = 20
rra[2].xff = 5.00e-01
rra[2].cdp_prep[0].value = 9.0053916667e+02
rra[2].cdp_prep[0].unknown_datapoints = 0
rra[2].cdp_prep[1].value = 0.00e+00
rra[2].cdp_prep[1].unknown_datapoints = 0
rra[3].cf = "AVERAGE"
rra[3].rows = 87600
rra[3].cur_row = 70734
rra[3].pdp_per_row = 120
rra[3].xff = 5.00e-01
rra[3].cdp_prep[0].value = 4.008067e+03
rra[3].cdp_prep[0].unknown_datapoints = 0
rra[3].cdp_prep[1].value = 0.00e+00
rra[3].cdp_prep[1].unknown_datapoints = 0
rra[4].cf = "HWPREDICT"
rra[4].rows = 4032
rra[4].cur_row = 3617
rra[4].pdp_per_row = 1
rra[4].alpha = 1.00e-01
rra[4].beta = 3.50e-03
rra[4].cdp_prep[0].intercept = -4.7889582583e+05
rra[4].cdp_prep[0].slope = -3.9220340107e+03
rra[4].cdp_prep[0].NaN_count = 1
rra[4].cdp_prep[1].intercept = 1.3071810839e+06
rra[4].cdp_prep[1].slope = 4.0880046086e+03
rra[4].cdp_prep[1].NaN_count = 1
rra[5].cf = "SEASONAL"
rra[5].rows = 288
rra[5].cur_row = 32
rra[5].pdp_per_row = 1
rra[5].gamma = 1.00e-01
rra[5].cdp_prep[0].seasonal = 8.5694737658e+05
rra[5].cdp_prep[1].seasonal = -1.2692770344e+06
rra[6].cf = "DEVSEASONAL"
rra[6].rows = 288
rra[6].cur_row = 162
rra[6].pdp_per_row = 1
rra[6].gamma = 1.00e-01
rra[6].cdp_prep[0].deviation = 2.2947800076e+06
rra[6].cdp_prep[1].deviation = 2
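For reference, the Holt-Winters parameters visible in the "after" output (alpha 0.1, beta 0.0035, gamma 0.1, a 288-row seasonal cycle, 4032 HWPREDICT rows) correspond to RRA definitions of roughly the following shape. This is shown as plain rrdtool create syntax purely to illustrate the settings involved; the script actually used to graft the RRAs onto the existing files is not part of this thread.

    # illustrative only: specifying a single HWPREDICT RRA makes rrdtool
    # implicitly create the companion SEASONAL, DEVSEASONAL, DEVPREDICT
    # and FAILURES RRAs
    rrdtool create example.rrd --step 15 \
        DS:sum:COUNTER:120:U:U \
        DS:num:COUNTER:120:U:U \
        RRA:AVERAGE:0.5:1:11520 \
        RRA:HWPREDICT:4032:0.1:0.0035:288

One thing that stands out when comparing the two dumps: in the "before" file ds[sum] is index 0 and ds[num] is index 1, while in the "after" file the two data sources appear in the opposite order. Since ds[num] is exactly the field Vladimir points to as the distinguishing feature of __SummaryInfo__ RRDs, a reordering of the data sources during the rewrite would be worth ruling out.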
Re: [Ganglia-general] Ganglia-general Digest, Vol 77, Issue 34
> -----Original Message-----
> From: ganglia-general-requ...@lists.sourceforge.net
> [mailto:ganglia-general-requ...@lists.sourceforge.net]
> Sent: 24 October 2012 00:35
> To: ganglia-general@lists.sourceforge.net
> Subject: Ganglia-general Digest, Vol 77, Issue 34
>
> Message: 1
> Date: Tue, 23 Oct 2012 12:58:02 -0500
> From: "Potter,Mark L"
> Subject: [Ganglia-general] Question about scaling
> To: "ganglia-general@lists.sourceforge.net"
>
> I am using what I think to be a fairly standard gmond.conf:
>
> [gmond.conf and gmetad data_source line snipped - quoted in full in the
> "Question about scaling" thread above]
>
> Everything works well until around 200 hosts, where it appears gmetad
> starts having issues. I have ~340 hosts to go into this cluster. Should
> I be running multiple gmetads for this amount of hosts? With all of them
> active the web interface reports all of them down and collects no stats
> at all. I am looking for advice on getting this up and running properly.
> The ganglia host isn't underpowered at all IMO and has plenty of HDD
> space:
>
> Mem: 32955788 (from free)
> 16 cores (AMD Opteron(tm) Processor 6128)
>
> Thanks for any assistance.
>
> Respectfully,
>
> Mark L. Potter

Hi Mark

I had a similar problem and solved it by increasing the UDP kernel buffers.

In /etc/sysctl.conf:

    net.core.rmem_max=2048
    net.core.rmem_default=1024

and specify the buffer size in /etc/ganglia/gmond.conf:

    udp_recv_channel {
      port = 8649
      buffer = 2000
    }

If you want to see whether you are losing UDP packets, type:

    watch -d grep '^Udp' /proc/net/snmp

    Udp: InDatagrams NoPorts InErrors OutDatagrams
    Udp: 39230574570 5121159  88139646 608369019

and see if the InErrors field is increasing.
Regards
--
Paul Hewlett
"Write documentation as if whoever reads it is a violent psychopath who
knows where you live." - Steve English, as quoted by Peter Langston
http://www.quotegarden.com/programming.html
ARM Ltd, 110 Fulbourn Road, Cambridge, CB1 9NJ
Tel: +44 (0)1223 405923   skype: paul-at-arm
www.arm.com
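A minimal sketch of applying and verifying this kind of change on Linux (the commands are standard, but the buffer values themselves should be sized to your own metric rate rather than copied verbatim from above):

    # load the new settings from /etc/sysctl.conf and confirm them
    sysctl -p
    sysctl net.core.rmem_max net.core.rmem_default

    # after restarting gmond so the udp_recv_channel buffer takes effect,
    # check the kernel-wide UDP receive error counters
    netstat -su | grep -i error

If the buffers are large enough, the InErrors column in the watch command above should stop climbing even with the full cluster reporting in.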