Re: [Ganglia-general] help :: making grids of gmetads (and question about monitoring topology)

2012-10-24 Thread Nicholas Satterly
Yes, a gmetad or gmond can be polled by any number of different gmetads, in
any combination or hierarchy that makes sense to you.

--Nick.

On Tue, Oct 23, 2012 at 9:57 AM, Adrian Sevcenco wrote:

> On 10/22/2012 02:18 PM, Adrian Sevcenco wrote:
> > Hi! I am a little bit lost on the subject of making grids of gmetads.
> > How can I do something like:
> >
> >   gmetad1   gmetad2   (that take data from their gmonds)
> >       \     /
> >       gmetad3 <- gmond of this machine that takes different other data
> >
> > Also, is the hierarchy of grids and clusters made at the gmetad or gmond
> > level? Did I understand correctly that gmetad just defines data sources
> > (gmonds and gmetads) but the exact hierarchy is done at the gmond level?
> > If so, where does the "gridname" come into play?
> >
> > Also, what can I do to take all data from the other gmetads, not only
> > summary data?
> >
>
> Hi! I have some other questions about gmond and gmetad:
> Is it possible for one gmond to be a data source for multiple gmetads?
> I would like something like:
>
> gmond_wn_1 ... gmond_wn_n
>       \         /
>        \       /
>         \     /
>     gmond_frontend_1 ---> gmetad_frontend
>            \
>             \
>         gmond_central ---> gmetad_central
>
> Is this possible?
>
> Thank you!
> Adrian
>
>
>
> --
> Everyone hates slow websites. So do we.
> Make your web apps faster with AppDynamics
> Download AppDynamics Lite for free today:
> http://p.sf.net/sfu/appdyn_sfd2d_oct
> ___
> Ganglia-general mailing list
> Ganglia-general@lists.sourceforge.net
> https://lists.sourceforge.net/lists/listinfo/ganglia-general
>
>


Re: [Ganglia-general] help :: making grids of gmetads

2012-10-24 Thread Nicholas Satterly
Hi Adrian,

For a grid-of-grids hierarchy, what matters is how the gmetads are
configured, not the gmonds. To get a gmetad to pull metric data from
another gmetad, append the port number for that gmetad (normally 8651) to
the data source. Using your example, the gmetad3.conf would look like:

data_source 'gmetad1' gmetad1:8651
data_source 'gmetad2' gmetad2:8651

> Also, what can i do for taking all data for the other metads not only
> summary data?

In gmetad.conf there is an option called "scalable". Set this to "off".
Make sure you are running version 3.1.7 of gmetad or the latest from github
trunk because this feature was broken at some point after 3.1.7 was
released.
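
A minimal sketch of the configuration described above, written as a small
Python helper that renders the relevant gmetad.conf lines. The grid name
"TopGrid" and the function name are made up for illustration; the port 8651,
the data_source lines and the "scalable off" setting come from this reply.

```python
def render_gmetad_conf(gridname, children, port=8651, scalable=False):
    """Return gmetad.conf text that polls each child gmetad on its XML port.

    children is a list of (source_name, hostname) pairs.
    """
    lines = [f'gridname "{gridname}"']
    # "scalable off" asks for full data from child gmetads, not only summaries.
    lines.append(f"scalable {'on' if scalable else 'off'}")
    for name, host in children:
        lines.append(f'data_source "{name}" {host}:{port}')
    return "\n".join(lines) + "\n"


# "TopGrid" is a hypothetical grid name for the gmetad3 machine.
print(render_gmetad_conf("TopGrid", [("gmetad1", "gmetad1"),
                                     ("gmetad2", "gmetad2")]))
```

The output can be compared by hand against an existing gmetad.conf; it is not
meant to replace the full file.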

--Nick.


On Mon, Oct 22, 2012 at 12:18 PM, Adrian Sevcenco
wrote:

> Hi! I am a little bit lost on the subject of making grids of gmetads.
> How can I do something like:
>
>   gmetad1   gmetad2   (that take data from their gmonds)
>       \     /
>       gmetad3 <- gmond of this machine that takes different other data
>
> Also, is the hierarchy of grids and clusters made at the gmetad or gmond
> level? Did I understand correctly that gmetad just defines data sources
> (gmonds and gmetads) but the exact hierarchy is done at the gmond level?
> If so, where does the "gridname" come into play?
>
> Also, what can I do to take all data from the other gmetads, not only
> summary data?
>
> Thanks!
> Adrian
>
>
>


Re: [Ganglia-general] Question about scaling

2012-10-24 Thread Peter Phaal
Hi Mark,

If you want to significantly reduce the amount of UDP traffic going to
your head end gmond (cnode340), then you might want to consider using
Host sFlow agents to monitor machines in the cluster - sFlow encodes
all the core Ganglia metrics (along with additional disk IO, swap,
interrupt activity metrics) in a single UDP packet, so you can cut the
UDP packets per second (and the load on the head end gmond) by a
factor of 30 or more.

If you make extensive use of gmond plugins for custom metrics then you
would want to stick with gmond on all your nodes. However, if you have
a limited number of custom metrics, you can supplement the core
metrics exported by sFlow using gmetric.

http://blog.sflow.com/2011/07/ganglia-32-released.html

As Nick suggested, you should be using the latest version of gmond for
the head node. Multi-threading significantly improves scalability, and
the newer versions of gmond also include native sFlow support.
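
To check whether the head node answers XML requests quickly enough (Nick's
quoted message below mentions that gmetad gives up after about 10 seconds),
the telnet test can be scripted. This is only a sketch: it assumes gmond sends
its XML dump and then closes the connection on the tcp_accept_channel port,
and the function name is hypothetical.

```python
import socket
import time


def time_xml_dump(host, port=8649, timeout=10.0):
    """Read the whole XML dump from a gmond/gmetad TCP port.

    Returns (elapsed_seconds, bytes_received). The timeout applies to the
    connect and to each individual recv, not to the transfer as a whole.
    """
    start = time.monotonic()
    received = 0
    with socket.create_connection((host, port), timeout=timeout) as sock:
        while True:
            chunk = sock.recv(65536)
            if not chunk:  # peer closed the connection: dump is complete
                break
            received += len(chunk)
    return time.monotonic() - start, received


# Example call (hostname taken from the thread, not executed here):
# elapsed, size = time_xml_dump("cnode340")
# print(f"{size} bytes in {elapsed:.1f}s")
```

If the elapsed time regularly approaches 10 seconds, that would be consistent
with the gmetad timeouts described in the quoted message.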

Regards,
Peter

On Tue, Oct 23, 2012 at 4:34 PM, Nicholas Satterly  wrote:
> I assume cnode340 is the head node that all ~340 other gmond's send their
> data to. If so, you could reduce the amount of redundant metadata flying
> around by increasing "send_metadata_interval" to 120 seconds or higher.
>
> Also, I suspect that if you telnet to port 8649 on your head node it will
> take a while to respond because it's busy processing incoming UDP metrics.
> If it takes more than 10 seconds to respond on a regular basis then gmetad
> will time out [1].
>
> Try deploying a recently patched version of gmond [2] to the head node which
> is now multi-threaded and see if that fixes the problem. It starts a
> separate thread for responding to XML metric requests and should respond
> immediately while the main thread is still processing metrics.
>
> Let us know how you get on.
>
> Regards,
> Nick
>
> [1]
> https://github.com/ganglia/monitor-core/blob/master/gmetad/data_thread.c#L103
> [2] https://github.com/ganglia/monitor-core/pull/53
>
>
> On Tue, Oct 23, 2012 at 7:36 PM, Potter,Mark L 
> wrote:
>>
>>
>>
>> data_source "MDACC" 60 cnode340:8649
>>
>> Everything else is default at this point. http://pastebin.com/UAQYxcX3 is
>> a full copy.
>>
>> 
>> From: Nicholas Satterly [nfsatte...@gmail.com]
>> Sent: Tuesday, October 23, 2012 13:33
>> To: Potter,Mark L
>> Cc: ganglia-general@lists.sourceforge.net
>> Subject: Re: [Ganglia-general] Question about scaling
>>
>> Please send thru your gmetad.conf file so we can see how things are
>> configured on the server side. *
>>
>> --Nick.
>>
>> * Be sure to anonymise any sensitive info.
>>
>> On 23 Oct 2012, at 19:21, "Potter,Mark L"  wrote:
>>
>> > I am using what I think to be a fairly standard gmond.conf:
>> >
>> > globals {
>> >  daemonize = yes
>> >  setuid = yes
>> >  user = nobody
>> >  debug_level = 0
>> >  max_udp_msg_len = 1472
>> >  mute = no
>> >  deaf = no
>> >  allow_extra_data = yes
>> >  host_dmax = 86400 /*secs. Expires (removes from web interface) hosts in 1 day */
>> >  host_tmax = 30 /*secs */
>> >  cleanup_threshold = 300 /*secs */
>> >  gexec = no
>> >  send_metadata_interval = 30 /*secs */
>> > }
>> >
>> > cluster {
>> >  name = "MDACC"
>> >  owner = "MD Anderson Cancer Center"
>> >  latlong = "unspecified"
>> >  url = "unspecified"
>> > }
>> >
>> > host {
>> >  location = "8,3,1"
>> > }
>> >
>> > udp_send_channel {
>> >  host = cnode340
>> >  port = 8649
>> > }
>> >
>> > udp_recv_channel {
>> >  port = 8649
>> >  retry_bind = true
>> > }
>> >
>> > tcp_accept_channel {
>> >  port = 8649
>> > }
>> >
>> > gmetad is set to check every 60 seconds:
>> >
>> > data_source "MDACC" 60 cnode340:8649
>> >
>> >
>> > Everything works well until around 200 hosts where it appears gmetad
>> > starts having issues. I have ~340 hosts to go in to this cluster. Should I
>> > be running multiple gmetads for this amount of hosts? With all of them
>> > active the web interface reports all of them down and collects no stats at
>> > all. I am looking for advice on getting this up and running properly. The
>> > ganglia host isn't underpowered at all IMO and has plenty of HDD space:
>> >
>> > Mem:  32955788 (from free)
>> > 16 Cores (AMD Opteron(tm) Processor 6128)
>> >
>> > Thanks for any assistance.
>> >
>> >
>> > Respectfully,
>> >
>> > Mark L. Potter
>> > Research IS & Technology Services
>> > UNIX Systems Administrator
>> > O: 713-745-2032
>> > C:  713-965-4133
>> >

Re: [Ganglia-general] [Ganglia-developers] Adding Holt-Winters databases to existing rrd causes __SummaryInfo__ metric to fail to render on graphs

2012-10-24 Thread Vladimir Vuksan
I don't have a lot of time to look into it; however, the difference between
SummaryInfo RRDs and other RRDs is that SummaryInfo contains ds[num], which
is the number of nodes being summarized. I wonder if that is somehow
throwing off your script.


Vladimir

On Tue, 23 Oct 2012, Aaron Nichols wrote:


On Tue, Oct 23, 2012 at 5:41 PM, Nicholas Satterly  wrote:
  Hi Aaron,

What is the output of "rrdtool info cron.webServiceRequestCounter.Counter.rrd"?

--Nick.


Thanks for your response.

Here's the rrdtool info output before and after applying the HW RRAs. I also
have some additional info below this regarding the data before and after
applying the HW RRAs:

Before:

filename = "cron.webServiceRequestCounter.Counter.rrd.bak"
rrd_version = "0003"
step = 15
last_update = 1351027681
header_size = 1760
ds[sum].index = 0
ds[sum].type = "COUNTER"
ds[sum].minimal_heartbeat = 120
ds[sum].min = NaN
ds[sum].max = NaN
ds[sum].last_ds = "25966364"
ds[sum].value = 1.126667e+02
ds[sum].unknown_sec = 0
ds[num].index = 1
ds[num].type = "COUNTER"
ds[num].minimal_heartbeat = 120
ds[num].min = NaN
ds[num].max = NaN
ds[num].last_ds = "5"
ds[num].value = 0.00e+00
ds[num].unknown_sec = 0
rra[0].cf = "AVERAGE"
rra[0].rows = 11520
rra[0].cur_row = 6534
rra[0].pdp_per_row = 1
rra[0].xff = 5.00e-01
rra[0].cdp_prep[0].value = NaN
rra[0].cdp_prep[0].unknown_datapoints = 0
rra[0].cdp_prep[1].value = NaN
rra[0].cdp_prep[1].unknown_datapoints = 0
rra[1].cf = "AVERAGE"
rra[1].rows = 20160
rra[1].cur_row = 6785
rra[1].pdp_per_row = 4
rra[1].xff = 5.00e-01
rra[1].cdp_prep[0].value = 0.00e+00
rra[1].cdp_prep[0].unknown_datapoints = 0
rra[1].cdp_prep[1].value = 0.00e+00
rra[1].cdp_prep[1].unknown_datapoints = 0
rra[2].cf = "AVERAGE"
rra[2].rows = 53568
rra[2].cur_row = 13483
rra[2].pdp_per_row = 20
rra[2].xff = 5.00e-01
rra[2].cdp_prep[0].value = 4.257556e+02
rra[2].cdp_prep[0].unknown_datapoints = 8
rra[2].cdp_prep[1].value = 0.00e+00
rra[2].cdp_prep[1].unknown_datapoints = 8
rra[3].cf = "AVERAGE"
rra[3].rows = 87600
rra[3].cur_row = 78819
rra[3].pdp_per_row = 120
rra[3].xff = 5.00e-01
rra[3].cdp_prep[0].value = 4.257556e+02
rra[3].cdp_prep[0].unknown_datapoints = 108
rra[3].cdp_prep[1].value = 0.00e+00
rra[3].cdp_prep[1].unknown_datapoints = 108

After:

filename = "cron.webServiceRequestCounter.Counter.rrd"
rrd_version = "0003"
step = 15
last_update = 1351051988
header_size = 3200
ds[num].index = 0
ds[num].type = "COUNTER"
ds[num].minimal_heartbeat = 120
ds[num].min = NaN
ds[num].max = NaN
ds[num].last_ds = "15150735"
ds[num].value = 4.736000e+02
ds[num].unknown_sec = 0
ds[sum].index = 1
ds[sum].type = "COUNTER"
ds[sum].minimal_heartbeat = 120
ds[sum].min = NaN
ds[sum].max = NaN
ds[sum].last_ds = "5"
ds[sum].value = 0.00e+00
ds[sum].unknown_sec = 0
rra[0].cf = "AVERAGE"
rra[0].rows = 11520
rra[0].cur_row = 5484
rra[0].pdp_per_row = 1
rra[0].xff = 5.00e-01
rra[0].cdp_prep[0].value = NaN
rra[0].cdp_prep[0].unknown_datapoints = 0
rra[0].cdp_prep[1].value = NaN
rra[0].cdp_prep[1].unknown_datapoints = 0
rra[1].cf = "AVERAGE"
rra[1].rows = 20160
rra[1].cur_row = 2762
rra[1].pdp_per_row = 4
rra[1].xff = 5.00e-01
rra[1].cdp_prep[0].value = 0.00e+00
rra[1].cdp_prep[0].unknown_datapoints = 0
rra[1].cdp_prep[1].value = 0.00e+00
rra[1].cdp_prep[1].unknown_datapoints = 0
rra[2].cf = "AVERAGE"
rra[2].rows = 53568
rra[2].cur_row = 51312
rra[2].pdp_per_row = 20
rra[2].xff = 5.00e-01
rra[2].cdp_prep[0].value = 9.0053916667e+02
rra[2].cdp_prep[0].unknown_datapoints = 0
rra[2].cdp_prep[1].value = 0.00e+00
rra[2].cdp_prep[1].unknown_datapoints = 0
rra[3].cf = "AVERAGE"
rra[3].rows = 87600
rra[3].cur_row = 70734
rra[3].pdp_per_row = 120
rra[3].xff = 5.00e-01
rra[3].cdp_prep[0].value = 4.008067e+03
rra[3].cdp_prep[0].unknown_datapoints = 0
rra[3].cdp_prep[1].value = 0.00e+00
rra[3].cdp_prep[1].unknown_datapoints = 0
rra[4].cf = "HWPREDICT"
rra[4].rows = 4032
rra[4].cur_row = 3617
rra[4].pdp_per_row = 1
rra[4].alpha = 1.00e-01
rra[4].beta = 3.50e-03
rra[4].cdp_prep[0].intercept = -4.7889582583e+05
rra[4].cdp_prep[0].slope = -3.9220340107e+03
rra[4].cdp_prep[0].NaN_count = 1
rra[4].cdp_prep[1].intercept = 1.3071810839e+06
rra[4].cdp_prep[1].slope = 4.0880046086e+03
rra[4].cdp_prep[1].NaN_count = 1
rra[5].cf = "SEASONAL"
rra[5].rows = 288
rra[5].cur_row = 32
rra[5].pdp_per_row = 1
rra[5].gamma = 1.00e-01
rra[5].cdp_prep[0].seasonal = 8.5694737658e+05
rra[5].cdp_prep[1].seasonal = -1.2692770344e+06
rra[6].cf = "DEVSEASONAL"
rra[6].rows = 288
rra[6].cur_row = 162
rra[6].pdp_per_row = 1
rra[6].gamma = 1.00e-01
rra[6].cdp_prep[0].deviation = 2.2947800076e+06
rra[6].cdp_prep[1].deviation = 2
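
Comparing the two dumps above, the "sum" and "num" data sources appear with
their index values swapped after the Holt-Winters RRAs were added. Whether
that is what breaks the summary graphs is not established in this thread. The
sketch below only shows one way to extract the data-source order from
`rrdtool info` output so two files can be compared; the function name is
hypothetical and the sample strings are cut down from the dumps above.

```python
import re

# Matches lines such as: ds[sum].index = 0
DS_INDEX = re.compile(r"^ds\[(?P<name>[^\]]+)\]\.index = (?P<index>\d+)$")


def ds_order(info_text):
    """Return data-source names ordered by their ds[...].index value."""
    found = {}
    for line in info_text.splitlines():
        match = DS_INDEX.match(line.strip())
        if match:
            found[int(match.group("index"))] = match.group("name")
    return [found[i] for i in sorted(found)]


before = "ds[sum].index = 0\nds[num].index = 1\n"
after = "ds[num].index = 0\nds[sum].index = 1\n"

if ds_order(before) != ds_order(after):
    print("data source order changed:", ds_order(before), "->", ds_order(after))
```

In practice the two strings would be the full output of `rrdtool info` for
the backup file and the converted file.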

Re: [Ganglia-general] Ganglia-general Digest, Vol 77, Issue 34

2012-10-24 Thread Paul Hewlett
> -----Original Message-----
> From: ganglia-general-requ...@lists.sourceforge.net [mailto:ganglia-
> general-requ...@lists.sourceforge.net]
> Sent: 24 October 2012 00:35
> To: ganglia-general@lists.sourceforge.net
> Subject: Ganglia-general Digest, Vol 77, Issue 34
>
> Today's Topics:
>
>    1. Question about scaling (Potter,Mark L)
>    2. Re: Question about scaling (Nicholas Satterly)
>    3. Re: Question about scaling (Potter,Mark L)
>    4. Re: Adding Holt-Winters databases to existing rrd causes
>       __SummaryInfo__ metric to fail to render on graphs (Aaron Nichols)
>    5. Re: Question about scaling (Nicholas Satterly)
>
>
> --
>
> Message: 1
> Date: Tue, 23 Oct 2012 12:58:02 -0500
> From: "Potter,Mark L" 
> Subject: [Ganglia-general] Question about scaling
> To: "ganglia-general@lists.sourceforge.net"
>   
> Message-ID:
>   <622D99D1851E994CBEB3A7A0C54123722FBFCC8A1A@DCPWVMBXC1VS3.mdanders
> on.edu>
>
> Content-Type: text/plain; charset="us-ascii"
>
> I am using what I think to be a fairly standard gmond.conf:
>
> globals {
>   daemonize = yes
>   setuid = yes
>   user = nobody
>   debug_level = 0
>   max_udp_msg_len = 1472
>   mute = no
>   deaf = no
>   allow_extra_data = yes
>   host_dmax = 86400 /*secs. Expires (removes from web interface) hosts in 1 day */
>   host_tmax = 30 /*secs */
>   cleanup_threshold = 300 /*secs */
>   gexec = no
>   send_metadata_interval = 30 /*secs */
> }
>
> cluster {
>   name = "MDACC"
>   owner = "MD Anderson Cancer Center"
>   latlong = "unspecified"
>   url = "unspecified"
> }
>
> host {
>   location = "8,3,1"
> }
>
> udp_send_channel {
>   host = cnode340
>   port = 8649
> }
>
> udp_recv_channel {
>   port = 8649
>   retry_bind = true
> }
>
> tcp_accept_channel {
>   port = 8649
> }
>
> gmetad is set to check every 60 seconds:
>
> data_source "MDACC" 60 cnode340:8649
>
>
> Everything works well until around 200 hosts where it appears gmetad
> starts having issues. I have ~340 hosts to go in to this cluster. Should
> I be running multiple gmetads for this amount of hosts? With all of them
> active the web interface reports all of them down and collects no stats
> at all. I am looking for advice on getting this up and running properly.
> The ganglia host isn't underpowered at all IMO and has plenty of HDD
> space:
>
> Mem:  32955788 (from free)
> 16 Cores (AMD Opteron(tm) Processor 6128)
>
> Thanks for any assistance.
>
>
> Respectfully,
>
> Mark L. Potter
> Research IS & Technology Services
> UNIX Systems Administrator
> O: 713-745-2032
> C:  713-965-4133
>
>

Hi Mark

I had a similar problem and solved it by increasing the kernel's UDP receive buffers.

In /etc/sysctl.conf:

#
net.core.rmem_max=2048
net.core.rmem_default=1024

and specify buffer size in /etc/ganglia/gmond.conf:

udp_recv_channel {
  port = 8649
  buffer = 2000
}

To see whether you are losing UDP packets, run:

watch -d grep '^Udp' /proc/net/snmp

Udp: InDatagrams NoPorts InErrors OutDatagrams
Udp: 39230574570 5121159 88139646 608369019

and see if the InErrors field is increasing.
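
The same check can be scripted. The sketch below parses the two "Udp:" lines
of /proc/net/snmp into a dictionary so that InErrors can be sampled and
compared over time. The function name is hypothetical and the sample text is
the output shown above; on a live Linux system the text would come from
reading /proc/net/snmp.

```python
def parse_udp_counters(snmp_text):
    """Pair the 'Udp:' header line with the 'Udp:' value line."""
    rows = [line.split()[1:]
            for line in snmp_text.splitlines()
            if line.startswith("Udp:")]
    if len(rows) < 2:
        raise ValueError("expected a Udp: header line and a Udp: value line")
    names, values = rows[0], rows[1]
    return dict(zip(names, (int(v) for v in values)))


sample = """Udp: InDatagrams NoPorts InErrors OutDatagrams
Udp: 39230574570 5121159 88139646 608369019
"""

counters = parse_udp_counters(sample)
print("InErrors:", counters["InErrors"])
```

Taking two samples a few seconds apart and subtracting the InErrors values
gives the number of datagrams the kernel reported as errors in that interval.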

Regards

--
Paul Hewlett

"Write documentation as if whoever reads it is a violent psychopath who knows 
where you live."
Steve English, as quoted by Peter Langston

http://www.quotegarden.com/programming.html

ARM Ltd
110 Fulbourn Road, Cambridge, CB1 9NJ
Tel: +44 (0)1223 405923
skype: paul-at-arm
www.arm.com




