Re: [Ganglia-general] [Ganglia-developers] Adding Holt-Winters databases to existing rrd causes __SummaryInfo__ metric to fail to render on graphs

2012-10-23 Thread Aaron Nichols
On Tue, Oct 23, 2012 at 5:41 PM, Nicholas Satterly wrote:

> Hi Aaron,
>
> What is the output of "rrdtool
> info cron.webServiceRequestCounter.Counter.rrd"?
>
> --Nick.
>
>
Thanks for your response.

Here's the rrdtool info output before and after applying the HW RRAs. I
also have some additional info below this regarding the data before and after
applying the HW RRAs:

Before:

filename = "cron.webServiceRequestCounter.Counter.rrd.bak"
rrd_version = "0003"
step = 15
last_update = 1351027681
header_size = 1760
ds[sum].index = 0
ds[sum].type = "COUNTER"
ds[sum].minimal_heartbeat = 120
ds[sum].min = NaN
ds[sum].max = NaN
ds[sum].last_ds = "25966364"
ds[sum].value = 1.126667e+02
ds[sum].unknown_sec = 0
ds[num].index = 1
ds[num].type = "COUNTER"
ds[num].minimal_heartbeat = 120
ds[num].min = NaN
ds[num].max = NaN
ds[num].last_ds = "5"
ds[num].value = 0.00e+00
ds[num].unknown_sec = 0
rra[0].cf = "AVERAGE"
rra[0].rows = 11520
rra[0].cur_row = 6534
rra[0].pdp_per_row = 1
rra[0].xff = 5.00e-01
rra[0].cdp_prep[0].value = NaN
rra[0].cdp_prep[0].unknown_datapoints = 0
rra[0].cdp_prep[1].value = NaN
rra[0].cdp_prep[1].unknown_datapoints = 0
rra[1].cf = "AVERAGE"
rra[1].rows = 20160
rra[1].cur_row = 6785
rra[1].pdp_per_row = 4
rra[1].xff = 5.00e-01
rra[1].cdp_prep[0].value = 0.00e+00
rra[1].cdp_prep[0].unknown_datapoints = 0
rra[1].cdp_prep[1].value = 0.00e+00
rra[1].cdp_prep[1].unknown_datapoints = 0
rra[2].cf = "AVERAGE"
rra[2].rows = 53568
rra[2].cur_row = 13483
rra[2].pdp_per_row = 20
rra[2].xff = 5.00e-01
rra[2].cdp_prep[0].value = 4.257556e+02
rra[2].cdp_prep[0].unknown_datapoints = 8
rra[2].cdp_prep[1].value = 0.00e+00
rra[2].cdp_prep[1].unknown_datapoints = 8
rra[3].cf = "AVERAGE"
rra[3].rows = 87600
rra[3].cur_row = 78819
rra[3].pdp_per_row = 120
rra[3].xff = 5.00e-01
rra[3].cdp_prep[0].value = 4.257556e+02
rra[3].cdp_prep[0].unknown_datapoints = 108
rra[3].cdp_prep[1].value = 0.00e+00
rra[3].cdp_prep[1].unknown_datapoints = 108

After:

filename = "cron.webServiceRequestCounter.Counter.rrd"
rrd_version = "0003"
step = 15
last_update = 1351051988
header_size = 3200
ds[num].index = 0
ds[num].type = "COUNTER"
ds[num].minimal_heartbeat = 120
ds[num].min = NaN
ds[num].max = NaN
ds[num].last_ds = "15150735"
ds[num].value = 4.736000e+02
ds[num].unknown_sec = 0
ds[sum].index = 1
ds[sum].type = "COUNTER"
ds[sum].minimal_heartbeat = 120
ds[sum].min = NaN
ds[sum].max = NaN
ds[sum].last_ds = "5"
ds[sum].value = 0.00e+00
ds[sum].unknown_sec = 0
rra[0].cf = "AVERAGE"
rra[0].rows = 11520
rra[0].cur_row = 5484
rra[0].pdp_per_row = 1
rra[0].xff = 5.00e-01
rra[0].cdp_prep[0].value = NaN
rra[0].cdp_prep[0].unknown_datapoints = 0
rra[0].cdp_prep[1].value = NaN
rra[0].cdp_prep[1].unknown_datapoints = 0
rra[1].cf = "AVERAGE"
rra[1].rows = 20160
rra[1].cur_row = 2762
rra[1].pdp_per_row = 4
rra[1].xff = 5.00e-01
rra[1].cdp_prep[0].value = 0.00e+00
rra[1].cdp_prep[0].unknown_datapoints = 0
rra[1].cdp_prep[1].value = 0.00e+00
rra[1].cdp_prep[1].unknown_datapoints = 0
rra[2].cf = "AVERAGE"
rra[2].rows = 53568
rra[2].cur_row = 51312
rra[2].pdp_per_row = 20
rra[2].xff = 5.00e-01
rra[2].cdp_prep[0].value = 9.0053916667e+02
rra[2].cdp_prep[0].unknown_datapoints = 0
rra[2].cdp_prep[1].value = 0.00e+00
rra[2].cdp_prep[1].unknown_datapoints = 0
rra[3].cf = "AVERAGE"
rra[3].rows = 87600
rra[3].cur_row = 70734
rra[3].pdp_per_row = 120
rra[3].xff = 5.00e-01
rra[3].cdp_prep[0].value = 4.008067e+03
rra[3].cdp_prep[0].unknown_datapoints = 0
rra[3].cdp_prep[1].value = 0.00e+00
rra[3].cdp_prep[1].unknown_datapoints = 0
rra[4].cf = "HWPREDICT"
rra[4].rows = 4032
rra[4].cur_row = 3617
rra[4].pdp_per_row = 1
rra[4].alpha = 1.00e-01
rra[4].beta = 3.50e-03
rra[4].cdp_prep[0].intercept = -4.7889582583e+05
rra[4].cdp_prep[0].slope = -3.9220340107e+03
rra[4].cdp_prep[0].NaN_count = 1
rra[4].cdp_prep[1].intercept = 1.3071810839e+06
rra[4].cdp_prep[1].slope = 4.0880046086e+03
rra[4].cdp_prep[1].NaN_count = 1
rra[5].cf = "SEASONAL"
rra[5].rows = 288
rra[5].cur_row = 32
rra[5].pdp_per_row = 1
rra[5].gamma = 1.00e-01
rra[5].cdp_prep[0].seasonal = 8.5694737658e+05
rra[5].cdp_prep[1].seasonal = -1.2692770344e+06
rra[6].cf = "DEVSEASONAL"
rra[6].rows = 288
rra[6].cur_row = 162
rra[6].pdp_per_row = 1
rra[6].gamma = 1.00e-01
rra[6].cdp_prep[0].deviation = 2.2947800076e+06
rra[6].cdp_prep[1].deviation = 2.8459664904e+04
rra[7].cf = "DEVPREDICT"
rra[7].rows = 4032
rra[7].cur_row = 174
rra[7].pdp_per_row = 1
rra[8].cf = "FAILURES"
rra[8].rows = 4032
rra[8].cur_row = 3617
rra[8].pdp_per_row = 1
rra[8].delta_pos = 2.00e+00
rra[8].delta_neg = 2.00e+00
rra[8].failure_threshold = 6
rra[8].window_length = 9
rra[8].cdp_prep[0].history = "0"
rra[8].cdp_prep[1].history = "0"
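
One difference visible in the two listings above is the data source order: before, ds[sum] has index 0 and ds[num] has index 1; after, the two indices are swapped. The following is a minimal Python sketch for pulling that ordering out of saved "rrdtool info" output so that two files can be compared. The function name ds_order and the sample strings are illustrative assumptions; only the "ds[NAME].index = N" line format is taken from the output above. Whether the swapped order explains the rendering problem is not established in this thread.

```python
import re

# Matches lines such as: ds[sum].index = 0
DS_INDEX_RE = re.compile(r"^ds\[(?P<name>[^\]]+)\]\.index\s*=\s*(?P<index>\d+)$")


def ds_order(info_text):
    """Return data source names sorted by their index in `rrdtool info` output."""
    found = {}
    for line in info_text.splitlines():
        match = DS_INDEX_RE.match(line.strip())
        if match:
            found[int(match.group("index"))] = match.group("name")
    return [found[i] for i in sorted(found)]


# Excerpts copied from the "Before" and "After" listings above.
before = 'ds[sum].index = 0\nds[sum].type = "COUNTER"\nds[num].index = 1\n'
after = 'ds[num].index = 0\nds[num].type = "COUNTER"\nds[sum].index = 1\n'

print(ds_order(before))  # ['sum', 'num']
print(ds_order(after))   # ['num', 'sum']
print(ds_order(before) == ds_order(after))  # False
```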

While I noted in my original 

Re: [Ganglia-general] [Ganglia-developers] Adding Holt-Winters databases to existing rrd causes __SummaryInfo__ metric to fail to render on graphs

2012-10-23 Thread Nicholas Satterly
Hi Aaron,

What is the output of
"rrdtool info cron.webServiceRequestCounter.Counter.rrd"?

--Nick.

On Tue, Oct 23, 2012 at 9:44 PM, Aaron Nichols wrote:

> Bumping this thread - I updated to rrdtool 1.4.7 & rebuilt ganglia against
> the new version of rrdtool and it didn't appear to change the behavior.
>
> Any suggestions for what this problem might be would be much appreciated.
>
> Thanks,
> Aaron
>
>
> On Mon, Oct 22, 2012 at 8:47 PM, Aaron Nichols wrote:
>
>> Re-posting this on ganglia-general as I didn't get any hits on
>> -developers:
>>
>> All,
>>   I'm trying to apply Holt-Winters to a few of our rrd's and have run
>> into a problem which seems specific to the __SummaryInfo__ rrds. When I
>> apply the Holt-Winters algorithm to both the host rrds as well as the
>> __SummaryInfo__ rrds, the host rrds work fine however the __SummaryInfo__
>> do not. This note will focus on the __SummaryInfo__ rrds.
>>
>> These metrics are stored in an rrd of type COUNTER. I have also observed
>> this behavior on the same metric set to type DERIVE.
>>
>> I am using the rrd_hwreapply.pl script (
>> http://rrfw.sourceforge.net/rrdman/rrd_hwreapply.pod.html) to apply
>> holt-winters to the rrds and for simplicity right now I'm only using the
>> following command line:
>>
>> $ sudo -u nobody cp cron.webServiceRequestCounter.Counter.rrd
>> cron.webServiceRequestCounter.Counter.rrd.bak && sudo -u nobody
>> rrd_hwreapply cron.webServiceRequestCounter.Counter.rrd.bak
>> cron.webServiceRequestCounter.Counter.rrd --defaults --force
>>
>> As soon as I run this command, the primary metric
>> (cron.webServiceRequestCounter.Counter) will stop rendering new values on
>> the graph. It appears, however, that updates are still being stored in the
>> rrd except in a different column in each row - I couldn't find much info
>> about what the two different columns in the rrdtool dump output are for.
>>
>> I have placed images, rrdtool dump output & version info into a github
>> repository here - just view the readme
>>
>> https://github.com/adnichols/ganglia-hw-graph-problem
>>
>> I dug through the rrdtool changes in 1.4.6 & 1.4.7 and didn't see
>> anything that looked like this problem, however if someone can point me to
>> something that suggests rrdtool is the issue I can update. It's just a fair
>> amount of work that I'd like to avoid if it's not the issue.
>>
>> Thanks,
>> Aaron
>> --
>> Twitter: @anichols
>> Blog: http://www.opsbs.com
>> Pro: http://www.linkedin.com/in/anichols
>>
>>
>
>
> --
> Twitter: @anichols
> Blog: http://www.opsbs.com
> Pro: http://www.linkedin.com/in/anichols
>
>
>
> --
> Everyone hates slow websites. So do we.
> Make your web apps faster with AppDynamics
> Download AppDynamics Lite for free today:
> http://p.sf.net/sfu/appdyn_sfd2d_oct
> ___
> Ganglia-developers mailing list
> ganglia-develop...@lists.sourceforge.net
> https://lists.sourceforge.net/lists/listinfo/ganglia-developers
>
>
--
Everyone hates slow websites. So do we.
Make your web apps faster with AppDynamics
Download AppDynamics Lite for free today:
http://p.sf.net/sfu/appdyn_sfd2d_oct
___
Ganglia-general mailing list
Ganglia-general@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/ganglia-general


Re: [Ganglia-general] Question about scaling

2012-10-23 Thread Nicholas Satterly
Hi Mark,

I assume cnode340 is the head node that all ~340 other gmonds send their
data to. If so, you could reduce the amount of redundant metadata flying
around by increasing "send_metadata_interval" to 120 seconds or higher.

Also, I suspect that if you telnet to port 8649 on your head node it will
take a while to respond because it's busy processing incoming UDP metrics.
If it regularly takes more than 10 seconds to respond then gmetad will
time out [1].

Try deploying a recently patched version of gmond [2], which is now
multi-threaded, to the head node and see if that fixes the problem. It starts
a separate thread for responding to XML metric requests and should respond
immediately while the main thread is still processing metrics.
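
The telnet check described above can also be timed with a short script. This is only a rough sketch, assuming gmond sends its XML on the tcp_accept_channel port and then closes the connection; the function name time_xml_dump is made up for illustration, and the 10-second default simply mirrors the timeout mentioned above.

```python
import socket
import time


def time_xml_dump(host, port, timeout=10.0):
    """Connect to a gmond TCP port and time how long the full XML dump takes.

    Returns (elapsed_seconds, bytes_received). The timeout applies to the
    connect and to each individual read, not to the transfer as a whole;
    OSError (including socket.timeout) is raised if it is exceeded.
    """
    start = time.monotonic()
    received = 0
    with socket.create_connection((host, port), timeout=timeout) as conn:
        while True:
            chunk = conn.recv(65536)
            if not chunk:  # the peer closed the connection, so the dump is done
                break
            received += len(chunk)
    return time.monotonic() - start, received
```

For the setup in this thread the call would be time_xml_dump("cnode340", 8649). An elapsed time that is regularly close to or above 10 seconds would be consistent with the gmetad timeout described above.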

Let us know how you get on.

Regards,
Nick

[1] https://github.com/ganglia/monitor-core/blob/master/gmetad/data_thread.c#L103
[2] https://github.com/ganglia/monitor-core/pull/53


On Tue, Oct 23, 2012 at 7:36 PM, Potter,Mark L wrote:

>
>
> data_source "MDACC" 60 cnode340:8649
>
> Everything else is default at this point. http://pastebin.com/UAQYxcX3 is
> a full copy.
>
> 
> From: Nicholas Satterly [nfsatte...@gmail.com]
> Sent: Tuesday, October 23, 2012 13:33
> To: Potter,Mark L
> Cc: ganglia-general@lists.sourceforge.net
> Subject: Re: [Ganglia-general] Question about scaling
>
> Please send thru your gmetad.conf file so we can see how things are
> configured on the server side. *
>
> --Nick.
>
> * Be sure to anonymise any sensitive info.
>
> On 23 Oct 2012, at 19:21, "Potter,Mark L"  wrote:
>
> > I am using what I think to be a fairly standard gmond.conf:
> >
> > globals {
> >  daemonize = yes
> >  setuid = yes
> >  user = nobody
> >  debug_level = 0
> >  max_udp_msg_len = 1472
> >  mute = no
> >  deaf = no
> >  allow_extra_data = yes
> >  host_dmax = 86400 /*secs. Expires (removes from web interface) hosts in
> 1 day */
> >  host_tmax = 30 /*secs */
> >  cleanup_threshold = 300 /*secs */
> >  gexec = no
> >  send_metadata_interval = 30 /*secs */
> > }
> >
> > cluster {
> >  name = "MDACC"
> >  owner = "MD Anderson Caner Center"
> >  latlong = "unspecified"
> >  url = "unspecified"
> > }
> >
> > host {
> >  location = "8,3,1"
> > }
> >
> > udp_send_channel {
> >   host = cnode340
> >   port = 8649
> > }
> >
> > udp_recv_channel {
> >port = 8649
> >  retry_bind = true
> > }
> >
> > tcp_accept_channel {
> >  port = 8649
> > }
> >
> > gmetad is set to check every 60 seconds:
> >
> > data_source "MDACC" 60 cnode340:8649
> >
> >
> > Everything works well until around 200 hosts where it appears gmetad
> starts having issues. I have ~340 hosts to go in to this cluster. Should I
> be running multiple gmetads for this amount of hosts? With all of them
> active the web interface reports all of them down and collects no stats at
> all. I am looking for advice on getting this up and running properly. The
> ganglia host isn't underpowered at all IMO and has plenty of HDD space:
> >
> > Mem:  32955788 (from free)
> > 16 Cores (AMD Opteron(tm) Processor 6128)
> >
> > Thanks for any assistance.
> >
> >
> > Respectfully,
> >
> > Mark L. Potter
> > Research IS & Technology Services
> > UNIX Systems Administrator
> > O: 713-745-2032
> > C:  713-965-4133
> >


Re: [Ganglia-general] Adding Holt-Winters databases to existing rrd causes __SummaryInfo__ metric to fail to render on graphs

2012-10-23 Thread Aaron Nichols
Bumping this thread - I updated to rrdtool 1.4.7 & rebuilt ganglia against
the new version of rrdtool and it didn't appear to change the behavior.

Any suggestions for what this problem might be would be much appreciated.

Thanks,
Aaron

On Mon, Oct 22, 2012 at 8:47 PM, Aaron Nichols wrote:

> Re-posting this on ganglia-general as I didn't get any hits on -developers:
>
> All,
>   I'm trying to apply Holt-Winters to a few of our rrd's and have run into
> a problem which seems specific to the __SummaryInfo__ rrds. When I apply
> the Holt-Winters algorithm to both the host rrds as well as the
> __SummaryInfo__ rrds, the host rrds work fine however the __SummaryInfo__
> do not. This note will focus on the __SummaryInfo__ rrds.
>
> These metrics are stored in an rrd of type COUNTER. I have also observed
> this behavior on the same metric set to type DERIVE.
>
> I am using the rrd_hwreapply.pl script (
> http://rrfw.sourceforge.net/rrdman/rrd_hwreapply.pod.html) to apply
> holt-winters to the rrds and for simplicity right now I'm only using the
> following command line:
>
> $ sudo -u nobody cp cron.webServiceRequestCounter.Counter.rrd
> cron.webServiceRequestCounter.Counter.rrd.bak && sudo -u nobody
> rrd_hwreapply cron.webServiceRequestCounter.Counter.rrd.bak
> cron.webServiceRequestCounter.Counter.rrd --defaults --force
>
> As soon as I run this command, the primary metric
> (cron.webServiceRequestCounter.Counter) will stop rendering new values on
> the graph. It appears, however, that updates are still being stored in the
> rrd except in a different column in each row - I couldn't find much info
> about what the two different columns in the rrdtool dump output are for.
>
> I have placed images, rrdtool dump output & version info into a github
> repository here - just view the readme
>
> https://github.com/adnichols/ganglia-hw-graph-problem
>
> I dug through the rrdtool changes in 1.4.6 & 1.4.7 and didn't see anything
> that looked like this problem, however if someone can point me to something
> that suggests rrdtool is the issue I can update. It's just a fair amount of
> work that I'd like to avoid if it's not the issue.
>
> Thanks,
> Aaron
> --
> Twitter: @anichols
> Blog: http://www.opsbs.com
> Pro: http://www.linkedin.com/in/anichols
>
>
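
On the question of the two columns in the dump output quoted above: as far as I understand the rrdtool dump format, each <row> holds one <v> element per data source, in the order the data sources are defined in the file. The sketch below labels each column with its data source name. The XML fragment is a hypothetical, heavily trimmed sample rather than output from the actual rrd, and the function name labelled_rows is an assumption for illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical, trimmed fragment in the general shape of `rrdtool dump` output.
SAMPLE_DUMP = """<rrd>
  <ds><name> sum </name><type> COUNTER </type></ds>
  <ds><name> num </name><type> COUNTER </type></ds>
  <rra>
    <cf> AVERAGE </cf>
    <database>
      <row><v> 1.1266670000e+02 </v><v> NaN </v></row>
      <row><v> NaN </v><v> 0.0000000000e+00 </v></row>
    </database>
  </rra>
</rrd>"""


def labelled_rows(dump_xml):
    """Pair each <v> column of every row with the data source at that position."""
    root = ET.fromstring(dump_xml)
    names = [ds.findtext("name").strip() for ds in root.findall("ds")]
    rows = []
    for row in root.iter("row"):
        values = [v.text.strip() for v in row.findall("v")]
        rows.append(dict(zip(names, values)))
    return rows


for row in labelled_rows(SAMPLE_DUMP):
    print(row)
```

Run against dumps taken before and after rrd_hwreapply, this would show which named data source the new values are landing in.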


-- 
Twitter: @anichols
Blog: http://www.opsbs.com
Pro: http://www.linkedin.com/in/anichols


Re: [Ganglia-general] Question about scaling

2012-10-23 Thread Potter,Mark L


data_source "MDACC" 60 cnode340:8649

Everything else is default at this point. http://pastebin.com/UAQYxcX3 is a 
full copy.


From: Nicholas Satterly [nfsatte...@gmail.com]
Sent: Tuesday, October 23, 2012 13:33
To: Potter,Mark L
Cc: ganglia-general@lists.sourceforge.net
Subject: Re: [Ganglia-general] Question about scaling

Please send thru your gmetad.conf file so we can see how things are
configured on the server side. *

--Nick.

* Be sure to anonymise any sensitive info.

On 23 Oct 2012, at 19:21, "Potter,Mark L" wrote:

> I am using what I think to be a fairly standard gmond.conf:
>
> globals {
>  daemonize = yes
>  setuid = yes
>  user = nobody
>  debug_level = 0
>  max_udp_msg_len = 1472
>  mute = no
>  deaf = no
>  allow_extra_data = yes
>  host_dmax = 86400 /*secs. Expires (removes from web interface) hosts in 1 
> day */
>  host_tmax = 30 /*secs */
>  cleanup_threshold = 300 /*secs */
>  gexec = no
>  send_metadata_interval = 30 /*secs */
> }
>
> cluster {
>  name = "MDACC"
>  owner = "MD Anderson Caner Center"
>  latlong = "unspecified"
>  url = "unspecified"
> }
>
> host {
>  location = "8,3,1"
> }
>
> udp_send_channel {
>   host = cnode340
>   port = 8649
> }
>
> udp_recv_channel {
>port = 8649
>  retry_bind = true
> }
>
> tcp_accept_channel {
>  port = 8649
> }
>
> gmetad is set to check every 60 seconds:
>
> data_source "MDACC" 60 cnode340:8649
>
>
> Everything works well until around 200 hosts where it appears gmetad starts 
> having issues. I have ~340 hosts to go in to this cluster. Should I be 
> running multiple gmetads for this amount of hosts? With all of them active 
> the web interface reports all of them down and collects no stats at all. I am 
> looking for advice on getting this up and running properly. The ganglia host 
> isn't underpowered at all IMO and has plenty of HDD space:
>
> Mem:  32955788 (from free)
> 16 Cores (AMD Opteron(tm) Processor 6128)
>
> Thanks for any assistance.
>
>
> Respectfully,
>
> Mark L. Potter
> Research IS & Technology Services
> UNIX Systems Administrator
> O: 713-745-2032
> C:  713-965-4133


Re: [Ganglia-general] Question about scaling

2012-10-23 Thread Nicholas Satterly
Please send thru your gmetad.conf file so we can see how things are
configured on the server side. *

--Nick.

* Be sure to anonymise any sensitive info.

On 23 Oct 2012, at 19:21, "Potter,Mark L" wrote:

> I am using what I think to be a fairly standard gmond.conf:
>
> globals {
>  daemonize = yes
>  setuid = yes
>  user = nobody
>  debug_level = 0
>  max_udp_msg_len = 1472
>  mute = no
>  deaf = no
>  allow_extra_data = yes
>  host_dmax = 86400 /*secs. Expires (removes from web interface) hosts in 1 
> day */
>  host_tmax = 30 /*secs */
>  cleanup_threshold = 300 /*secs */
>  gexec = no
>  send_metadata_interval = 30 /*secs */
> }
>
> cluster {
>  name = "MDACC"
>  owner = "MD Anderson Caner Center"
>  latlong = "unspecified"
>  url = "unspecified"
> }
>
> host {
>  location = "8,3,1"
> }
>
> udp_send_channel {
>   host = cnode340
>   port = 8649
> }
>
> udp_recv_channel {
>port = 8649
>  retry_bind = true
> }
>
> tcp_accept_channel {
>  port = 8649
> }
>
> gmetad is set to check every 60 seconds:
>
> data_source "MDACC" 60 cnode340:8649
>
>
> Everything works well until around 200 hosts where it appears gmetad starts 
> having issues. I have ~340 hosts to go in to this cluster. Should I be 
> running multiple gmetads for this amount of hosts? With all of them active 
> the web interface reports all of them down and collects no stats at all. I am 
> looking for advice on getting this up and running properly. The ganglia host 
> isn't underpowered at all IMO and has plenty of HDD space:
>
> Mem:  32955788 (from free)
> 16 Cores (AMD Opteron(tm) Processor 6128)
>
> Thanks for any assistance.
>
>
> Respectfully,
>
> Mark L. Potter
> Research IS & Technology Services
> UNIX Systems Administrator
> O: 713-745-2032
> C:  713-965-4133


[Ganglia-general] Question about scaling

2012-10-23 Thread Potter,Mark L
I am using what I think to be a fairly standard gmond.conf:

globals {
  daemonize = yes
  setuid = yes
  user = nobody
  debug_level = 0
  max_udp_msg_len = 1472
  mute = no
  deaf = no
  allow_extra_data = yes
  host_dmax = 86400 /*secs. Expires (removes from web interface) hosts in 1 day */
  host_tmax = 30 /*secs */
  cleanup_threshold = 300 /*secs */
  gexec = no
  send_metadata_interval = 30 /*secs */
}

cluster {
  name = "MDACC"
  owner = "MD Anderson Cancer Center"
  latlong = "unspecified"
  url = "unspecified"
}

host {
  location = "8,3,1"
}

udp_send_channel {
   host = cnode340
   port = 8649
}

udp_recv_channel {
  port = 8649
  retry_bind = true
}

tcp_accept_channel {
  port = 8649
}

gmetad is set to check every 60 seconds:

data_source "MDACC" 60 cnode340:8649


Everything works well until around 200 hosts, at which point gmetad appears to
start having issues. I have ~340 hosts to go into this cluster. Should I be
running multiple gmetads for this number of hosts? With all of them active the
web interface reports all of them down and collects no stats at all. I am
looking for advice on getting this up and running properly. The ganglia host
isn't underpowered at all IMO and has plenty of HDD space:

Mem:  32955788 (from free)
16 Cores (AMD Opteron(tm) Processor 6128)

Thanks for any assistance.


Respectfully,

Mark L. Potter
Research IS & Technology Services
UNIX Systems Administrator
O: 713-745-2032
C:  713-965-4133


Re: [Ganglia-general] PANGO Error

2012-10-23 Thread Ghassan Elnajjar
The errors below indicate that pango-querymodules is not able to load
libgobject-2.0.a(libgobject-2.0.so.0) and libffi.a(libffi.so.5). The
following is what was installed in /opt/freeware/bin:

-rwxr-xr-x 1 root system  182130 Oct 19 10:08 libffi.a
-rwxr-xr-x 1 root system   85253 Oct 19 10:08 libffi.so.6

-rwxr-xr-x 1 root system 1198360 Apr 11 2012  libgobject-2.0.a
lrwxrwxrwx 1 root system      19 Oct 22 16:17 libgobject-2.0.so -> libgobject-2.0.so.0
-rwxr-xr-x 1 root system  535910 Apr 11 2012  libgobject-2.0.so.0


It seems to me that pango-1.24.5 is looking for libffi.so.5, not
libffi.so.6. Either I have installed the wrong libffi or the wrong pango, or
there is a bug in the pango rpm. Will someone please chime in and let me know
what you did to install rrdtool-1.3 or higher? Has anyone encountered this
issue? Is there a workaround? Your input is greatly appreciated. Thanks.




Ghassan





From:    Ghassan Elnajjar
To:      ganglia-general@lists.sourceforge.net
Date:    10/22/2012 06:03 PM
Subject: [Ganglia-general] PANGO Error



Does anyone know why this version of pango produces the errors below when
installed on AIX 7.1? It's needed for the installation of rrdtool-1.3 and
higher. Please help, as I need to fix this issue for my ganglia-web to work.
Thanks.


Ghassan


#: rpm -Uvh pango-1.24.5-1.aix5.1.ppc.rpm
pango ##
Could not load program /opt/freeware/bin/pango-querymodules:
Could not load module /opt/freeware/lib/libgobject-2.0.a(libgobject-2.0.so.0).
Dependent module /opt/freeware/lib/libffi.a(libffi.so.5) could not be loaded.
Member libffi.so.5 is not found in archive
Could not load module pango-querymodules.
Dependent module /opt/freeware/lib/libgobject-2.0.a(libgobject-2.0.so.0) could not be loaded.
Could not load module .
execution of pango-1.24.5-1 script failed, exit status 255
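
The loader messages above name both the archive and the member that could not be found. A minimal Python sketch for pulling those pairs out of saved rpm output is shown here; the function name summarize_loader_errors is hypothetical, and the patterns assume only the message wording that appears in this output.

```python
import re

# Matches "Dependent module <archive>(<member>) could not be loaded"
DEPENDENT_RE = re.compile(
    r"Dependent module ([^\s(]+)\(([^\s)]+)\) could not be loaded"
)
# Matches "Member <name> is not found in archive"
MISSING_RE = re.compile(r"Member (\S+) is not found in archive")


def summarize_loader_errors(output):
    """Return (dependent_modules, missing_members) found in loader error text."""
    # The messages may wrap across lines, so collapse all whitespace first.
    text = " ".join(output.split())
    return DEPENDENT_RE.findall(text), MISSING_RE.findall(text)


# Error text copied from the rpm output in this message, with line wrapping.
SAMPLE = """Could not load module
/opt/freeware/lib/libgobject-2.0.a(libgobject-2.0.so.0).
Dependent module /opt/freeware/lib/libffi.a(libffi.so.5) could not
be loaded.
Member libffi.so.5 is not found in archive
Could not load module pango-querymodules.
Dependent module
/opt/freeware/lib/libgobject-2.0.a(libgobject-2.0.so.0) could not be
loaded.
"""

dependents, missing = summarize_loader_errors(SAMPLE)
print(dependents)
print(missing)
```

For this output the missing member is libffi.so.5, which lines up with the directory listing showing only libffi.so.6.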



BTW, below is the order in which I installed all the prerequisites for
rrdtool, with pango being the last one.


rpm -Uvh expat-2.1.0-1.aix5.1.ppc.rpm 
rpm -Uvh zlib-1.2.7-2.aix5.1.ppc.rpm 
rpm -Uvh freetype2-2.4.10-1.aix5.1.ppc.rpm 
rpm -Uvh fontconfig-2.8.0-2.aix5.1.ppc.rpm 
rpm -Uvh libgcc-4.7.2-1.aix7.1.ppc.rpm 
rpm -Uvh libpng-1.5.13-1.aix5.1.ppc.rpm 
rpm --nodeps -Uvh gettext-0.17-1.aix5.1.ppc.rpm 
rpm -Uvh libffi-3.0.11-1.aix5.1.ppc.rpm 
rpm -Uvh glib2-2.30.3-1.aix5.1.ppc.rpm 
rpm -Uvh atk-1.32.0-1.aix5.1.ppc.rpm 
rpm -Uvh libjpeg-8d-1.aix5.1.ppc.rpm 
rpm -Uvh libdatrie-0.2.4-1.aix5.1.ppc.rpm 
rpm -Uvh libthai-0.1.15-1.aix5.1.ppc.rpm 
rpm -Uvh libXrender-0.9.7-2.aix6.1.ppc.rpm 
rpm -Uvh jbigkit-libs-2.0-2.aix5.1.ppc.rpm 
rpm -Uvh jbigkit-2.0-2.aix5.1.ppc.rpm 
rpm -Uvh xz-libs-5.0.4-1.aix5.1.ppc.rpm 
rpm -Uvh libtiff-4.0.3-1.aix5.1.ppc.rpm 
rpm -Uvh jasper-1.900.1-2.aix5.1.ppc.rpm 
rpm -Uvh libiconv-1.14-2.aix5.1.ppc.rpm 
rpm -Uvh libxml2-2.9.0-1.aix5.1.ppc.rpm 
rpm -Uvh pixman-0.26.2-1.aix5.1.ppc.rpm 
rpm -Uvh cairo-1.8.8-1.aix5.1.ppc.rpm 
rpm -Uvh bzip2-1.0.2-4.aix5.1.ppc.rpm 
rpm -Uvh bash-4.2-9.aix5.1.ppc.rpm 
rpm -Uvh info-4.13a-2.aix5.1.ppc.rpm 
rpm -Uvh readline-5.2-3.aix5.1.ppc.rpm 
rpm -Uvh pcre-8.31-1.aix5.1.ppc.rpm 
rpm -Uvh libart_lgpl-2.3.21-1.aix5.1.ppc.rpm 
rpm -Uvh libdbi-0.8.4-1.aix5.1.ppc.rpm 
rpm -Uvh openssl-1.0.1c-1.aix5.1.ppc.rpm 
rpm -Uvh dejavu-sans-mono-fonts-2.33-1.aix5.1.noarch.rpm 
rpm -Uvh dejavu-lgc-sans-mono-fonts-2.33-1.aix5.1.noarch.rpm





NOTE: The information contained in this message may be privileged and 
confidential and protected from disclosure. If the reader of this message 
is not the intended recipient, you are hereby notified that any 
dissemination, distribution or copying of this communication is strictly 
prohibited. If you have received this communication in error, please 
notify us immediately by replying to the message and deleting it from your 
computer.
--
VCU Health System
http://www.vcuhealth.org



[Ganglia-general] PANGO Error - Additional Info

2012-10-23 Thread Ghassan Elnajjar
The errors below indicate that pango-querymodules is not able to load
libgobject-2.0.a(libgobject-2.0.so.0) and libffi.a(libffi.so.5). The
following is what was installed in /opt/freeware/bin:

-rwxr-xr-x 1 root system  182130 Oct 19 10:08 libffi.a
-rwxr-xr-x 1 root system   85253 Oct 19 10:08 libffi.so.6

-rwxr-xr-x 1 root system 1198360 Apr 11 2012  libgobject-2.0.a
lrwxrwxrwx 1 root system      19 Oct 22 16:17 libgobject-2.0.so -> libgobject-2.0.so.0
-rwxr-xr-x 1 root system  535910 Apr 11 2012  libgobject-2.0.so.0


It seems to me that pango-1.24.5 is looking for libffi.so.5, not 
libffi.so.6.  Either I have installed the wrong libffi or the wrong 
pango, or there is a bug in the pango rpm.  Will someone please chime in 
and let me know what you did to install rrdtool-1.3 or higher?  Has 
anyone encountered this issue?  Is there a workaround?  Your input is 
greatly appreciated.  Thanks.
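
One way to pin down exactly which archive member the loader wants is to pull the archive(member) pairs out of the error text and compare them with what each archive actually contains. The Python sketch below does that for the loader messages quoted further down. The function names are made up for illustration, and the `installed` mapping is an assumption based on the directory listing above (on a real system the member names could be read from `ar -t` on each archive). Note that the loader text names libgobject-2.0 as the module whose dependent module libffi.a(libffi.so.5) is missing.

```python
import re

# Matches AIX loader references of the form /path/libfoo.a(libfoo.so.N)
MEMBER_REF = re.compile(r"(/\S+\.a)\((\S+?)\)")


def required_members(loader_output):
    """Return the set of (archive, member) pairs named in loader error text."""
    # Mail clients wrap long lines, so collapse all whitespace first.
    flat = " ".join(loader_output.split())
    return set(MEMBER_REF.findall(flat))


def missing_members(loader_output, installed):
    """List required members absent from `installed` (archive -> member names)."""
    return sorted(
        (archive, member)
        for archive, member in required_members(loader_output)
        if member not in installed.get(archive, ())
    )


if __name__ == "__main__":
    sample = """Could not load module
    /opt/freeware/lib/libgobject-2.0.a(libgobject-2.0.so.0).
    Dependent module /opt/freeware/lib/libffi.a(libffi.so.5) could not
    be loaded."""
    # Assumed archive contents, inferred from the listing in this message.
    installed = {
        "/opt/freeware/lib/libffi.a": ["libffi.so.6"],
        "/opt/freeware/lib/libgobject-2.0.a": ["libgobject-2.0.so.0"],
    }
    for archive, member in missing_members(sample, installed):
        print(f"{archive} lacks member {member}")
```

With the assumed contents above, the only pair reported is libffi.a and libffi.so.5, which matches the "Member libffi.so.5 is not found in archive" line in the rpm output.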



Ghassan





From:    Ghassan Elnajjar
To:      ganglia-general@lists.sourceforge.net
Date:    10/22/2012 06:03 PM
Subject: [Ganglia-general] PANGO Error



Does anyone know why this version of pango produces the errors below 
after being installed on AIX 7.1?  It is needed to install 
rrdtool-1.3 and higher.  Please help, as I need to fix this issue for my 
ganglia-web to work.  Thanks. 


Ghassan


#: rpm -Uvh pango-1.24.5-1.aix5.1.ppc.rpm 
pango ## 
Could not load program /opt/freeware/bin/pango-querymodules: 
Could not load module 
/opt/freeware/lib/libgobject-2.0.a(libgobject-2.0.so.0). 
Dependent module /opt/freeware/lib/libffi.a(libffi.so.5) could not 
be loaded. 
Member libffi.so.5 is not found in archive 
Could not load module pango-querymodules. 
Dependent module 
/opt/freeware/lib/libgobject-2.0.a(libgobject-2.0.so.0) could not be 
loaded. 
Could not load module . 
execution of pango-1.24.5-1 script failed, exit status 255 



BTW, below is the order in which I installed all the prerequisites for 
rrdtool, with pango being the last one. 


rpm -Uvh expat-2.1.0-1.aix5.1.ppc.rpm 
rpm -Uvh zlib-1.2.7-2.aix5.1.ppc.rpm 
rpm -Uvh freetype2-2.4.10-1.aix5.1.ppc.rpm 
rpm -Uvh fontconfig-2.8.0-2.aix5.1.ppc.rpm 
rpm -Uvh libgcc-4.7.2-1.aix7.1.ppc.rpm 
rpm -Uvh libpng-1.5.13-1.aix5.1.ppc.rpm 
rpm --nodeps -Uvh gettext-0.17-1.aix5.1.ppc.rpm 
rpm -Uvh libffi-3.0.11-1.aix5.1.ppc.rpm 
rpm -Uvh glib2-2.30.3-1.aix5.1.ppc.rpm 
rpm -Uvh atk-1.32.0-1.aix5.1.ppc.rpm 
rpm -Uvh libjpeg-8d-1.aix5.1.ppc.rpm 
rpm -Uvh libdatrie-0.2.4-1.aix5.1.ppc.rpm 
rpm -Uvh libthai-0.1.15-1.aix5.1.ppc.rpm 
rpm -Uvh libXrender-0.9.7-2.aix6.1.ppc.rpm 
rpm -Uvh jbigkit-libs-2.0-2.aix5.1.ppc.rpm 
rpm -Uvh jbigkit-2.0-2.aix5.1.ppc.rpm 
rpm -Uvh xz-libs-5.0.4-1.aix5.1.ppc.rpm 
rpm -Uvh libtiff-4.0.3-1.aix5.1.ppc.rpm 
rpm -Uvh jasper-1.900.1-2.aix5.1.ppc.rpm 
rpm -Uvh libiconv-1.14-2.aix5.1.ppc.rpm 
rpm -Uvh libxml2-2.9.0-1.aix5.1.ppc.rpm 
rpm -Uvh pixman-0.26.2-1.aix5.1.ppc.rpm 
rpm -Uvh cairo-1.8.8-1.aix5.1.ppc.rpm 
rpm -Uvh bzip2-1.0.2-4.aix5.1.ppc.rpm 
rpm -Uvh bash-4.2-9.aix5.1.ppc.rpm 
rpm -Uvh info-4.13a-2.aix5.1.ppc.rpm 
rpm -Uvh readline-5.2-3.aix5.1.ppc.rpm 
rpm -Uvh pcre-8.31-1.aix5.1.ppc.rpm 
rpm -Uvh libart_lgpl-2.3.21-1.aix5.1.ppc.rpm 
rpm -Uvh libdbi-0.8.4-1.aix5.1.ppc.rpm 
rpm -Uvh openssl-1.0.1c-1.aix5.1.ppc.rpm 
rpm -Uvh dejavu-sans-mono-fonts-2.33-1.aix5.1.noarch.rpm 
rpm -Uvh dejavu-lgc-sans-mono-fonts-2.33-1.aix5.1.noarch.rpm






Re: [Ganglia-general] help :: making grids of gmetads (and question about monitoring topology)

2012-10-23 Thread Adrian Sevcenco
On 10/22/2012 02:18 PM, Adrian Sevcenco wrote:
> Hi! I am a little bit lost on the subject of making grids of gmetads.
> How can I do something like:
> gmetad1  gmetad2  (that take data from their gmonds)
>       \  /
>     gmetad3 <- gmond of this machine, which collects other data
> 
> Also, is the hierarchy of grids and clusters made at the gmetad or gmond
> level? Did I understand correctly that gmetad just defines data sources
> (gmonds and gmetads) but the exact hierarchy is done at the gmond level?
> If so, where does the "gridname" come into play?
> 
> Also, what can I do to get all data from the other gmetads, not only
> summary data?
> 

Hi! I have other questions about gmond and gmetad:
Is it possible for a gmond to be a data source for multiple gmetads?
I would like something like:

gmond_wn_1 ... gmond_wn_n
     \           /
      \         /
       \       /
    gmond_frontend_1 --> gmetad_frontend
         \
          \
           gmond_central --> gmetad_central

Is this possible?

Thank you!
Adrian
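
For reference, a rough sketch of how the layouts asked about above might be written in gmetad.conf. It assumes the stock default ports (8649 for gmond's XML output, 8651 for gmetad's), and the hostnames and source names are placeholders, not taken from this thread. Because a gmond simply serves its XML to hosts that connect to it (subject to any access control set on the gmond side), more than one gmetad can list the same gmond as a data_source. A gmetad can also list another gmetad as a data_source, which is how grids are nested; gridname is the name a gmetad gives to its own grid.

```
# gmetad.conf on the frontend host (hostnames are placeholders)
gridname "Frontend"
data_source "worker nodes" localhost:8649

# gmetad.conf on the central host
gridname "Central"
# the same gmond, polled a second time by a different gmetad
data_source "worker nodes" frontend.example.org:8649
# alternatively, poll the frontend gmetad instead, to see it as a nested grid
# data_source "Frontend" frontend.example.org:8651
```

As far as I understand the comments in the stock gmetad.conf, the default "scalable on" mode summarizes downstream grids, while "scalable off" ignores the grid tags and treats the downstream clusters as its own, which may be what is wanted when full data rather than summaries is needed.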


