[Ganglia-general] Ganglia Windows agent binaries.

2007-04-03 Thread Richard.Grevis
FYI.

http://www.aouk83.dsl.pipex.com

has a link to a Cygwin-based Windows agent (not as an installer package,
though), and also a link to a native WMI-based Ganglia agent coded by
APR Consulting in Switzerland.

Enjoy.

Richard  Grevis
Production Architecture
Barclays Capital, Canary Wharf, London, E14 4BB



> -Original Message-
> From: [EMAIL PROTECTED] 
> [mailto:[EMAIL PROTECTED] On 
> Behalf Of Martin Knoblauch
> Sent: 29 March 2007 16:29
> To: Witham, Timothy D; ganglia-general@lists.sourceforge.net
> Subject: Re: [Ganglia-general] gmetad patch to contact random 
> data_source hosts
> 
> 
> Tim,
> 
>  your diff command looks a bit surprising to me. The revision 
> number looks like CVS to me, and we have been on SVN for quite some time.
> 
>  Which version of Ganglia have you checked out?
> 
> Cheers
> Martin
> --- "Witham, Timothy D" <[EMAIL PROTECTED]> wrote:
> 
> > Hi,
> > 
> > I just had a situation where the first host in a gmetad data_source 
> > accepts the connection but offers no data, like this:
> > 
> >   poll() timeout for [clustername] data source after 0 bytes read
> > 
> > Gmetad always tries the sources in order and so it just keeps getting
> > stuck on this first one, and losing the data for the entire cluster.
> > 
> > Here is a quick patch that tries random hosts from the list instead,
> > and solved my problem.  It is not careful to make sure it tried them
> > all, but if it fails it will just try again next time.  If someone
> > wants to fix it to try all the sources in a random order, that would
> > be fine.  Perhaps this could be included in the next release unless
> > someone knows a good reason to always try the sources in order.
> > 
> > Thanks!
> > 
> > -8<-
> > diff -c -r1.1.1.1 data_thread.c
> > *** data_thread.c   19 Mar 2007 18:52:32 -  1.1.1.1
> > --- data_thread.c   28 Mar 2007 18:12:08 -
> > ***
> > *** 18,24 
> >   void *
> >   data_thread ( void *arg )
> >   {
> > !int i, sleep_time, bytes_read, rval;
> >  data_source_list_t *d = (data_source_list_t *)arg;
> >  g_inet_addr *addr;
> >  g_tcp_socket *sock=0;
> > --- 18,24 
> >   void *
> >   data_thread ( void *arg )
> >   {
> > !int i, j, sleep_time, bytes_read, rval;
> >  data_source_list_t *d = (data_source_list_t *)arg;
> >  g_inet_addr *addr;
> >  g_tcp_socket *sock=0;
> > ***
> > *** 60,75 
> >  if(d->last_good_index >= 0)
> >sock = g_tcp_socket_new ( d->sources[d->last_good_index] );
> >   
> > !/* If there was no good connection last time or the above
> > connect failed then try each host in the list. */
> >  if(!sock)
> >  {
> > !  for(i=0; i < d->num_sources; i++)
> >  {
> > !  /* Find first viable source in list. */
> > !  sock = g_tcp_socket_new ( d->sources[i] );
> >if( sock )
> >  {
> > !  d->last_good_index = i;
> >break;
> >  }
> >  }
> > --- 60,80 
> >  if(d->last_good_index >= 0)
> >sock = g_tcp_socket_new ( d->sources[d->last_good_index] );
> >   
> > !/* If there was no good connection last time or the above
> > !   connect failed then try random hosts in the list.  We try
> > !   random ones in case someone is accepting the connection
> > !   but refusing to provide any data; we don't want to get
> > !   stuck with a non-working host. */
> >  if(!sock)
> >  {
> > !  for(i=0; i < d->num_sources * 2; i++)
> >  {
> > !  /* Find random viable source in list. */
> > !j = d->num_sources * (rand() / (RAND_MAX + 1.0));  /* + 1.0 keeps j below num_sources */
> > !  sock = g_tcp_socket_new ( d->sources[j] );
> >if( sock )
> >  {
> > !  d->last_good_index = j;
> >break;
> >  }
> >  }
> > -8<--
> > 
> > --
> > <[EMAIL PROTECTED]>; I don't speak for Intel or anyone.
> > 
> >
> --
> ---
> > Take Surveys. Earn Cash. Influence the Future of IT
> > Join SourceForge.net's Techsay panel and you'll get the chance to 
> > share your opinions on IT & business topics through brief 
> surveys-and 
> > earn cash
> >
> http://www.techsay.com/default.php?page=join.php&p=sourceforge&CID=DEVDEV
> ___
> Ganglia-general mailing list Ganglia-general@lists.sourceforge.net
> https://lists.sourceforge.net/lists/listinfo/ganglia-general
> 
> 
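
Tim's note above says the patch does not make sure every source gets tried.
One way to visit each source exactly once in random order is to shuffle the
indices first and then walk the shuffled list. The sketch below shows that
idea in Python rather than in gmetad's C. The names pick_source and connect
are hypothetical: connect stands in for g_tcp_socket_new() and is assumed to
return None on failure. It only covers connection failures, not the case of
a host that accepts the connection and then sends nothing.

```python
import random


def pick_source(sources, connect, rng=random):
    """Try every source once, in random order.

    `connect` is a hypothetical callable standing in for g_tcp_socket_new():
    it returns a connection object on success and None on failure.
    Returns (index, connection), or (-1, None) if no source accepted.
    """
    order = list(range(len(sources)))
    rng.shuffle(order)          # random permutation, so no index is repeated
    for i in order:
        conn = connect(sources[i])
        if conn is not None:
            return i, conn      # caller would store i as last_good_index
    return -1, None
```

Because the loop walks a permutation, it makes at most one attempt per
source, unlike drawing a fresh random index on every iteration.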


--
Martin Knoblauch
email: k n o b i AT knobisoft DOT de
www:   http://www.knobisoft.de

-

Re: [Ganglia-general] A survey of Ganglia users and usage.

2007-04-03 Thread Richard.Grevis
Chris,

I fully agree with your clean and simple comment. Part of
Ganglia's real strength is what it doesn't have, rather than
what it does. Examples:

- Metric data is not written locally on the monitored host.
- The metric set is fixed in compiled code.
- No ability to customise graphs.
- No server side database other than the RRD files.

They may seem like limitations, but in fact they make Ganglia
easier to deploy to production hosts, and ongoing administrative
effort is essentially zero.

We need to carefully consider "enhancements" so we don't end up
destroying its simplicity and ease of use.

Richard Grevis
Production Architecture
Barclays Capital, Canary Wharf, London, E14 4BB



> -Original Message-
> From: [EMAIL PROTECTED] 
> [mailto:[EMAIL PROTECTED] On 
> Behalf Of Stackpole, Chris
> Sent: 02 April 2007 21:29
> To: ganglia-general@lists.sourceforge.net
> Subject: Re: [Ganglia-general] A survey of Ganglia users and usage.
> 
> 
> "But how do we know that everyone else's suggestions will 
> fall into the same not important "philosophical" camp?"
> 
> I agree that ganglia is excellent, however, just by watching 
> the list and reading the occasional post one can see that 
> there are plenty of people asking for features. Some features 
> are asked for more often than others and people could debate 
> on how important they are and if they should or shouldn't be 
> a part of Ganglia, but the requests are still there. Looking 
> through the list I wonder how many times people have 
> reinvented the wheel simply because there was not a way of 
> doing it within Ganglia. Example: A quick search through the 
> list shows multiple ways of alerting when a system goes down. 
> It is also quite evident that many have settled in with 
> Nagios and Zabbix to get the features that they required.
> 
> I love Ganglia because it has a really simple and clean 
> interface that lets me check on the status of my systems. It 
> is incredibly easy to learn and work with while being nice 
> enough eye candy that the boss likes to show off the graphs. 
> However, when it comes to the details of why a system went 
> down I turn to Zabbix. There are just a few voids within 
> Ganglia that I need to have Zabbix for (alerting, time 
> shifting, logging of errors and who fixed them at what time, 
> plus several more).
> 
> I think it would be a great idea to put together a 
> questionnaire on the uses that Ganglia is put to, if 
> nothing more than to show who has been adding what to their 
> version of Ganglia and what could be shared with the 
> community. I would be very interested in seeing what kind of 
> add-ons people have come up with, because I am positive that 
> others have come up with better solutions than I have, and it 
> is possible that I have come up with a solution that can help 
> someone else out.
> 
> " I've been under the impression for a while ganglia wasn't 
> getting a whole lot of development and was mostly in 
> maintenance mode. It hasn't changed a whole lot in the few 
> years I've been using it"
> 
> Working together in a community where many people work on, 
> test, and improve the project is what makes a project like 
> this really strong. Maybe it is just me, but to think that a 
> project is not growing and has been in maintenance mode for a 
> number of years is disturbing. Maybe it is the thought of a 
> stagnant project or maybe it is the thought that I have made 
> changes that I have not sent back to the community. Either 
> way, I am all for change if it helps share ideas through the 
> community and helps someone else out.
> 
> Just my 2 cents :-D
> Post a link to the questionnaire and I will fill it out. I 
> have not done any major changes, but I will be willing to 
> share what I have done if anyone wants.
> 
> Chris Stackpole
> 
> -Original Message-
> From: [EMAIL PROTECTED]
> [mailto:[EMAIL PROTECTED] On 
> Behalf Of [EMAIL PROTECTED]
> Sent: Monday, April 02, 2007 12:07 PM
> To: [EMAIL PROTECTED]
> Cc: ganglia-general@lists.sourceforge.net; [EMAIL PROTECTED]
> Subject: Re: [Ganglia-general] A survey of Ganglia users and usage.
> 
> 
> > 
> > I've been under the impression for a while ganglia wasn't getting a
> > whole lot of development and was mostly in maintenance mode. It hasn't
> > changed a whole lot in the few years I've been using it (except perhaps
> > the config file format, a change that was much appreciated)
> 
> You are quite right, it has not changed much lately.
> 
> > 
> > The software is already excellent, and most of the changes I could
> > suggest would be philosophical "my way is better than your 
> > way" type things.
> 
> Yes, the software is already excellent. And it's great that it 
> does everything you want. But how do we know that everyone else's 
> suggestions will fall into the same not important "philosophical" camp?
> 
> That's my point really. We don't.
> 
> kind regards,
> Richard
> 

Re: [Ganglia-general] A survey of Ganglia users and usage.

2007-04-02 Thread Richard.Grevis

> 
> I've been under the impression for a while ganglia wasn't getting a 
> whole lot of development and was mostly in maintenance mode. It hasn't 
> changed a whole lot in the few years I've been using it (except perhaps 
> the config file format, a change that was much appreciated)

You are quite right, it has not changed much lately.

> 
> The software is already excellent, and most of the changes I could 
> suggest would be philosophical "my way is better than your 
> way" type things.

Yes, the software is already excellent. And it's great that it does
everything you want. But how do we know that everyone else's 
suggestions will fall into the same not important "philosophical" camp?

That's my point really. We don't.

kind regards,
Richard

For more information about Barclays Capital, please visit our web site at 
http://www.barcap.com.

Internet communications are not secure and therefore the Barclays Group does 
not accept legal responsibility for the contents of this message.  Although the 
Barclays Group operates anti-virus programmes, it does not accept 
responsibility for any damage whatsoever that is caused by viruses being 
passed.  Any views or opinions presented are solely those of the author and do 
not necessarily represent those of the Barclays Group.  Replies to this email 
may be monitored by the Barclays Group for operational or business reasons.




[Ganglia-general] A survey of Ganglia users and usage.

2007-04-02 Thread Richard.Grevis
All,

Like many Ganglia users, we have modified the PHP a lot, changed some
C code a bit, and added a whole lot of functionality by creating scripts
of various flavours.

I have also entirely failed to push these mods back to the community,
and one reason for this is that I have no idea how others use Ganglia
and what is important to them. For example, our clusters are often small,
membership is fairly volatile, and we have hundreds of clusters. So we
have code that deals with this in terms of navigation and filtering.
Would other users want this? I have no idea.

Perhaps we could create a simple anonymous survey for Ganglia users?
Code authors could then be guided quantitatively by what the community
is really doing - what kind of hosts they monitor - what they use
in Ganglia, and what they may need.

What do you (all) think?

- richard





Re: [Ganglia-general] Tsubame uses Ganglia

2007-04-02 Thread Richard.Grevis
10336 cores - golly!
A grid level screen shot for us:
http://www.aouk83.dsl.pipex.com/
21,676 cores. More golly!

But to be fair, ours is not one big cluster - we have hundreds.

Richard Grevis
Production Architecture
Barclays Capital, Canary Wharf, London, E14 4BB



> -Original Message-
> From: [EMAIL PROTECTED] 
> [mailto:[EMAIL PROTECTED] On 
> Behalf Of Bernard Li
> Sent: 02 April 2007 06:16
> To: ganglia-general@lists.sourceforge.net
> Subject: [Ganglia-general] Tsubame uses Ganglia
> 
> 
> Dear Ganglia users:
> 
> Was watching this video about Tsubame on Novell's website -- 
> Tsubame is Japan's newest supercomputer and was ranked no. 1 
> in Asia in November's top500 list.
> 
> Anyways, the video has a fly by segment of the ganglia web 
> frontend -- monitoring 10336 cores!
> 
> Just FYI :)
> 
> Cheers,
> 
> Bernard
> 




Re: [Ganglia-general] Not getting something

2007-03-30 Thread Richard.Grevis
Michael,

Use different multicast addresses for each cluster,
unless you are sure the multicast can't leak
from one cluster to another.

Remember that the hosts you list after a data_source
in gmetad.conf are there for resilience only. You do not have to
mention all nodes in the cluster there.

Given your symptoms it might be something else. I suggest you
consider using unicast initially rather than multicast until
you get everything going (udp_send_channel in gmond.conf pointing to
a nominated headnode on each cluster, then data_source from that).

And netcatting hosts can be very instructive (e.g. nc lsora1006 8649).
Are all expected hosts listed in the nc output?
Unexpected hostnames? (gmond does a reverse DNS lookup to make hostnames.)
Is the cluster name returned by nc different for every cluster?
(The cluster name in gmetad.conf is not used.)
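
The nc checks above can also be scripted. The following is a minimal sketch,
assuming gmond's XML dump uses CLUSTER and HOST elements with a NAME
attribute; fetch_xml and summarize are hypothetical helper names, not part
of Ganglia. It reads the dump the same way nc does and lists which hosts
each gmond reports under which cluster name.

```python
import socket
import xml.etree.ElementTree as ET


def fetch_xml(host, port=8649, timeout=5.0):
    """Read everything gmond sends on its TCP port, like `nc host 8649`."""
    chunks = []
    with socket.create_connection((host, port), timeout=timeout) as sock:
        while True:
            data = sock.recv(65536)
            if not data:        # the peer closes the connection when done
                break
            chunks.append(data)
    return b"".join(chunks)


def summarize(xml_bytes):
    """Map each CLUSTER NAME in the dump to the sorted HOST NAMEs under it."""
    root = ET.fromstring(xml_bytes)
    return {
        cluster.get("NAME"): sorted(h.get("NAME") for h in cluster.iter("HOST"))
        for cluster in root.iter("CLUSTER")
    }
```

Running summarize(fetch_xml("lsora1006")) against each node and comparing
the results should show quickly whether a node sees only itself, or whether
two clusters report the same cluster name.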

good luck

Richard Grevis
Production Architecture
Barclays Capital, Canary Wharf, London, E14 4BB



> -Original Message-
> From: [EMAIL PROTECTED] 
> [mailto:[EMAIL PROTECTED] On 
> Behalf Of Michael Steeves
> Sent: 30 March 2007 15:29
> To: ganglia-general@lists.sourceforge.net
> Subject: [Ganglia-general] Not getting something
> 
> 
> 
> I'm trying to set up what I hope/think is a pretty 
> straightforward configuration -- I'm looking to monitor Oracle RAC 
> via ganglia, and I've got three clusters (Prod, Dev and 
> Test).  I've got a machine I'm using right now for both 
> gmetad and the web front end piece, and I can only get one 
> host from each cluster to show up as 'up' in the web front end.
> 
> Right now, I've got the following setup:
> 
> lsora1003 and lsora1006 -- Dev RAC, running just gmond.
> 
> cluster { 
>   name = "Oracle RAC Dev nodes" 
>   owner = "myorg" 
>   latlong = "unspecified" 
>   url = "unspecified" 
> } 
> 
> /* Feel free to specify as many udp_send_channels as you
>  * like.  Gmond used to only support having a single channel */ 
> udp_send_channel { 
>   mcast_join = 239.2.100.71 
>   port = 8649 
> } 
> 
> /* You can specify as many udp_recv_channels as you
>  * like as well. */ 
> udp_recv_channel { 
>   mcast_join = 239.2.100.71 
>   port = 8649 
>   bind = 239.2.100.71 
> } 
> 
> lsora1001, lsora1002 and lsora1005 -- Test RAC, running just gmond.
> 
> cluster { 
>   name = "Oracle RAC Test nodes" 
>   owner = "myorg" 
>   latlong = "unspecified" 
>   url = "unspecified" 
> } 
> 
> /* Feel free to specify as many udp_send_channels as you
>  * like.  Gmond used to only support having a single channel */ 
> udp_send_channel { 
>   mcast_join = 239.2.101.71 
>   port = 8649 
> } 
> 
> /* You can specify as many udp_recv_channels as you
>  * like as well. */ 
> udp_recv_channel { 
>   mcast_join = 239.2.101.71 
>   port = 8649 
>   bind = 239.2.101.71 
> } 
> 
> lsora1004, lsora1007 and lsora1008 -- Prod RAC, running just gmond.
> 
> cluster { 
>   name = "Oracle RAC Prod nodes" 
>   owner = "myorg" 
>   latlong = "unspecified" 
>   url = "unspecified" 
> } 
> 
> /* Feel free to specify as many udp_send_channels as you
>  * like.  Gmond used to only support having a single channel */ 
> udp_send_channel { 
>   mcast_join = 239.2.102.71 
>   port = 8649 
> } 
> 
> /* You can specify as many udp_recv_channels as you
>  * like as well. */ 
> udp_recv_channel { 
>   mcast_join = 239.2.102.71 
>   port = 8649 
>   bind = 239.2.102.71 
> } 
> 
> My gmetad server has gmond running as cluster 'localhost', 
> and the following in the gmetad file:
> 
> data_source "localhost" localhost
> data_source "Oracle RAC Dev nodes" lsora1003 lsora1006
> data_source "Oracle RAC Test nodes" lsora1001 lsora1002 lsora1005
> data_source "Oracle RAC Prod nodes" lsora1004 lsora1007 lsora1008
> 
> Initially, none of the other hosts would show up in their 
> clusters, just the first node listed, but over time the other 
> nodes do appear, and are active briefly, and then show as 
> being down and unable to contact.
> 
> 
> -Mike
> -- 
> Michael Steeves ([EMAIL PROTECTED])
> 

Re: [Ganglia-general] Ganglia custom Round-Robin archives RRA

2007-03-29 Thread Richard.Grevis
You will have to remove the old RRDs to allow your new definition to be
applied. The RRA definition is only used at the initial creation of each
RRD file.

If you want to keep your old data, you will have to do magic
(dump/export/import/perl-script).
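
A note on the numbers in the RRA definition itself. In rrdtool's syntax,
RRA:CF:xff:steps:rows, the steps field counts base steps of the RRD per
stored row, not minutes. The sketch below works out the values under the
assumption that the RRD files use a 15-second base step (worth checking
with rrdtool info on one of your own files); rra_for and coverage_days are
hypothetical helper names.

```python
def rra_for(step_seconds, row_seconds, keep_days, cf="AVERAGE", xff=0.5):
    """Build an RRA definition string.

    step_seconds: base step of the RRD (assumed, not read from a file)
    row_seconds:  resolution wanted for each stored row
    keep_days:    how far back the archive should reach
    """
    if row_seconds % step_seconds:
        raise ValueError("row resolution must be a multiple of the RRD step")
    steps_per_row = row_seconds // step_seconds
    rows = keep_days * 86400 // row_seconds
    return f"RRA:{cf}:{xff}:{steps_per_row}:{rows}"


def coverage_days(step_seconds, steps_per_row, rows):
    """How many days of history an RRA with these parameters holds."""
    return step_seconds * steps_per_row * rows / 86400
```

Under that 15-second assumption, rra_for(15, 300, 366) gives
RRA:AVERAGE:0.5:20:105408, while a steps value of 1 with 105408 rows would
cover only coverage_days(15, 1, 105408) days, a little over 18.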
 
regards,

Richard Grevis 
Production Architecture 
Barclays Capital, Canary Wharf, London, E14 4BB 


 

-Original Message-
From: [EMAIL PROTECTED]
[mailto:[EMAIL PROTECTED] On Behalf Of
CASTRO Paulo Edgar
Sent: 29 March 2007 15:57
To: ganglia-general@lists.sourceforge.net
Subject: [Ganglia-general] Ganglia custom Round-Robin archives RRA



Hi all. 

We have been testing Ganglia here, deployed on about 250 machines.
By the way, good job on the tool, guys.

We've been peeking at the conf files, namely gmetad.conf, and we
found this commented-out option about custom Round-Robin Archives.

The thing is, we wanted to have an RRA of our own that could hold
all the 5-minute PDPs for a whole year (see what I mean ;)), so we
wouldn't lose granularity while reading directly from the RRD files.

We tried adding this to gmetad.conf:
RRAs "RRA:AVERAGE:0.5:1:105408"
105408 being the number of 5-minute intervals in a year.

But we still haven't noticed any change, nor have the RRD files
grown enough to accommodate the new RRA.

How can we manage to do this?
Do we need to start the whole collection process again, erasing
the previous data and files?
Will it work with this new option?
Is this syntax for the conf file correct?

Tkx in advance, 


PECastro 






Re: [Ganglia-general] Gmetad and web frontend on different machines.

2007-03-29 Thread Richard.Grevis
Saundry,
 
It sort of looks like you can, but actually you can't.
gmetad writes to the RRD databases as local files,
and the PHP web frontend reads the RRD databases as local files
(actually it invokes rrdtool itself).

I imagine you could separate the two using NFS filesystems,
but I have not tried this.

kind regards,

Richard Grevis 
Production Architecture 
Barclays Capital, Canary Wharf, London, E14 4BB 
-Original Message-
From: [EMAIL PROTECTED]
[mailto:[EMAIL PROTECTED] On Behalf Of
saundrya mishra
Sent: 29 March 2007 14:30
To: ganglia-general@lists.sourceforge.net
Subject: [Ganglia-general] Gmetad and web frontend on different
machines.



Hi There,

I am new to Ganglia. Can we have gmetad and the web frontend for a
cluster running on two different machines? If yes, then how is it
possible, since I read in the configuration file of the web frontend that
the RRDTool databases need to be local to be read?

Greetings,
Saundrya.







Re: [Ganglia-general] GRID / CLUSTER

2007-02-16 Thread Richard.Grevis

See comments below, although they may or may not be entirely right.


> -Original Message-
> From: [EMAIL PROTECTED] 
> [mailto:[EMAIL PROTECTED] On 
> Behalf Of Matthias Blankenhaus
> Sent: 16 February 2007 04:29
> To: ganglia-general@lists.sourceforge.net
> Subject: [Ganglia-general] GRID / CLUSTER
> 
> 
> Howdy !
> 
> I am a newbie and have some questions about concepts and 
> their mapping to config identifier:
> 
> 1. In gmetad.conf one can define a "gridname".  What is the concept 
>behind this and what does this actually do ?

In itself the grid name is just a string and does nothing if all your
clusters report to 1 server. So as you are starting out, set it and 
forget it.

Later, you can federate Ganglia servers together
in a hierarchical way, which is where the grid name comes in. You build a
grid of grids. There are examples of this on the web.

> 
> 2. In gmond.conf one can define a "name" for a cluster. What is the 
>concept behind this and what does this actually do ?  What is
>the difference between a Ganglia grid and a Ganglia cluster ?

Err... the cluster name is the name of the cluster. You may think that
ganglia would use the cluster label to work out which hosts belong
in which cluster. No.

Clusters contain hosts, Grids contain clusters. They are treated
differently in the php code, but they are structurally similar.

> 
> I wanted to create a cluster (grid ?) consisting of two sub-clusters 
> (cluster ?).  I have tried the following two configurations 
> without seeing a difference.  So what is the difference ?
> 
> Also, I have noticed that the identifier in gmetad.conf after 
> data_source is completely independent from the actual naming 
> of the cluster.  The cluster name then is the one that is 
> presented in the GUI and also reflected in the RRD DB.  What 
> is the id in the data_source clause for ?

Yes, the cluster name in gmetad.conf is irrelevant.
The cluster name is whatever is returned by the gmond that
gmetad polls: cluster_head01 in your configs below.

As for the configs below -
Ouch! I have a headache. What I would suggest is that you start simple.
So:
1) Use unicast, not multicast.
2) Start with a single headnode, and configure all hosts of
   the cluster to point to it. (udp_send_channel)
3) Start with a single ganglia server (1 gmetad instance).
   Do not run a gmetad on cluster headnodes.
   Configure gmetad (on what you call master_node)
   to have a single data_source entry for each cluster
   (you configure gmetad with the headnode names).
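
As an illustration of step 3, here is a small sketch that renders one
data_source line per cluster for the single gmetad. The helper name
data_source_line and the cluster labels are hypothetical; the headnode names
are borrowed from the configs quoted below, and port 8649 is the gmond port
mentioned later in this reply.

```python
def data_source_line(label, headnodes, port=8649, poll_seconds=None):
    """Render one gmetad.conf data_source line.

    label:        free-text identifier (the cluster name shown in the web
                  frontend comes from gmond, not from this label)
    headnodes:    hosts gmetad may poll; extra entries are for failover only
    poll_seconds: optional polling interval placed before the host list
    """
    if not headnodes:
        raise ValueError("a data_source needs at least one host")
    parts = ["data_source", f'"{label}"']
    if poll_seconds is not None:
        parts.append(str(poll_seconds))
    parts.extend(f"{host}:{port}" for host in headnodes)
    return " ".join(parts)


clusters = {                    # hypothetical label -> headnode list
    "Rack 1": ["cluster_head01"],
    "Rack 2": ["cluster_head02"],
}
conf_lines = [data_source_line(name, nodes) for name, nodes in clusters.items()]
```

Each entry in conf_lines is one line for gmetad.conf on the master node;
the cluster headnodes themselves then run only gmond.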

> I have noticed that the number of detected CPUs is
> inconsistent.  It seems to depend on the order in which I
> start the gmond's / gmetad's.  Is there an order and if
> so which one is the correct order ?
> 
> 
> -
> Configuration I
> -
> 
> MASTER NODE
> ---
> 
> gmetad.conf
> 
> gridname "CARLSBAD"
> data_source "OSCAR 1" cluster_head01:8651   # running gmetad
> data_source "OSCAR 2" cluster_head02:8651   # running gmetad

If you really want to do this, you should access port 8651
(see above). Port 8649 is for gmond, not gmetad.

> 
> cluster_head01
> --
> 
> gmetad.conf:
> 
> data_source "Rack 1" 11.0.0.5 11.0.0.4
> 
> 
> gmond.conf:  cluster_head01
> 
> cluster_head02
> --
> 
> gmetad.conf:
> 
> data_source "Rack 1" 11.0.0.5 11.0.0.4
> 
> gmond.conf:  cluster_head01
> 
> 
> 
> Configuration II
> 
> MASTER NODE
> ---
> 
> gmetad.conf
> 
> gridname "CARLSBAD"
> data_source "OSCAR 1" cluster_head01   # running gmetad
> data_source "OSCAR 2" cluster_head02   # running gmetad
> 
> 
> cluster_head01
> --
> 
> gmetad.conf:
> gridname "OSCAR 1"
> data_source "Rack 1" 11.0.0.5 11.0.0.4
> 
> gmond.conf:  cluster_head01
> 
> cluster_head02
> --
> 
> gmetad.conf:
> 
> gridname "OSCAR 2"
> data_source "Rack 1" 11.0.0.5 11.0.0.4
> 
> 
> gmond.conf:  cluster_head01
> 
> 
> 
> 
> 
> Your answers are greatly appreciated.
> 
> 
> Thanx,
> Matthias
> 
> 

[Ganglia-general] A native windows gmond

2007-02-07 Thread Richard.Grevis
All,

A Swiss consultancy has implemented a native Windows gmond, and the
binaries are in the public domain and free. Follow the trail here:
http://aprconsulting.ch/product.htm
I believe they are also offering Ganglia consulting, support, and
customisations for a fee.
This daemon is much better because of the extra metrics, and because it
is a native Windows service. With the existing gmond, missing or zero
metrics were actually Cygwin's fault, not gmond's.

These are the extra metrics they have compared to the gmond-cygwin one:
name = "proc_run"
name = "proc_total"
name = "mem_free"
name = "mem_shared"
name = "mem_buffers"
name = "mem_cached"
name = "swap_free"
name = "bytes_out"
name = "bytes_in"
name = "pkts_in"
name = "pkts_out"
name = "disk_total"
name = "disk_free"
name = "part_max_used"
   name = "sys_cpu_queue_len"
   name = "mem_pages_sec"
   name = "mem_committed_bytes"
   name = "phys_disk_bytes_sec"
   name = "phys_disk_time"

It all seems to work just fine, although the disk stats caused a
problem on one of my hosts. If gmond exits, try turning off some
disk metrics.

The final point is that, as the extra metrics are binary coded, it
should be deployed in an all-or-nothing way per cluster. (Well, maybe
there can be some mixing, so long as the headnode is the APR daemon.)

cheers,
Richard





[Ganglia-general] windows gmond in cpu loop

2007-02-07 Thread Richard.Grevis
Hi,

Has anyone observed gmond consuming 100% of CPU under Cygwin?

The gmond process suddenly starts using up all the CPU.
The Cygwin version is recent.

There were 6 threads in the daemon, each consuming about
20% (really this means that any of the threads could soak
up 1 CPU if given the chance). csrss.exe was also eating CPU.
But gmond still responded to TCP requests on port 8649.

Each thread seemed to be calling tdll.dll!RtlConvertUiListToApiList a lot,
which is part of cygwin. There is this note:
http://www.cygwin.com/ml/cygwin/2004-09/msg01265.html
which sounds like what I have. But that was years ago.

Richard Grevis
Production Architecture
Barclays Capital, Canary Wharf, London, E14 4BB
*DDI : +44 (0) 20 7773 4915
 * richard.grevis






Re: [Ganglia-general] What are the rrdtool creation parameters for Ganglia Databases?

2007-01-26 Thread Richard.Grevis
Ian,

It is unclear what you are really trying to do.

Do you want a completely normal running Ganglia, with some separate
Java rrd4j thing able to separately extract/graph RRD data populated
by Ganglia?

If so, the key data source to parse is the gmetad server's port 8651:
connecting to it dumps the current grid state as XML.

Or do you want to replace 1 or more ganglia components?

Or... ?
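
For the first option, parsing the port 8651 dump could look roughly like the
sketch below. It assumes the XML nests CLUSTER, HOST and METRIC elements
(possibly inside GRID) with NAME and VAL attributes; iter_metrics is a
hypothetical helper name. The bytes themselves can be read with any TCP
client that connects to port 8651 and reads until the server closes the
connection.

```python
import xml.etree.ElementTree as ET


def iter_metrics(xml_bytes):
    """Yield (cluster, host, metric, value) for every METRIC in the dump."""
    root = ET.fromstring(xml_bytes)
    for cluster in root.iter("CLUSTER"):
        for host in cluster.iter("HOST"):
            for metric in host.iter("METRIC"):
                # VAL arrives as text; convert it only once the type is known
                yield (cluster.get("NAME"), host.get("NAME"),
                       metric.get("NAME"), metric.get("VAL"))
```

Each poll then yields one complete snapshot of every metric, so an
application that wants a full set every 10 seconds or so can simply poll at
that interval instead of assembling values from individual multicast packets.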

Richard 


> -Original Message-
> From: [EMAIL PROTECTED] 
> [mailto:[EMAIL PROTECTED] On 
> Behalf Of Ian Wootten
> Sent: 26 January 2007 16:46
> To: ganglia-general@lists.sourceforge.net
> Subject: [Ganglia-general] What are the rrdtool creation 
> parameters forGanglia Databases?
> 
> 
> Hi all,
> 
> I want to replicate ganglia's storage in Java, using a multicast 
> listener, storing and manipulating using rrd4j. Firstly has 
> anyone done 
> anything similar? I'm struggling knowing what parameters to 
> set for the 
> database and getting an adequate resolution of the metrics 
> captured from 
> multicast (10-30s for the application I desire). Does anyone 
> know what 
> the datasource and archive creation commands would be/how 
> many there are?
> 
> Secondly, and I think this is the main thing, the capture of 
> information 
> seems to take ages to be received in this way. I'm aware of 
> the MonaLISA 
> project and their java interfaces into ganglia, but a similar 
> implementation by myself seems extremely slow. Currently 
> packets seem to 
> be retrieved at a rate of 1 a second, with each packet containing a 
> single metric value - I'd like to have a complete set after 10 or so 
> seconds. Would I be better off sticking to my current method of 
> interfacing with ganglia's rrd databases directly and 
> extracting content 
> via the fetch command?
> 
> Thanks,
> 
> Ian
> 
> 
> --
> ---
> Take Surveys. Earn Cash. Influence the Future of IT
> Join SourceForge.net's Techsay panel and you'll get the 
> chance to share your opinions on IT & business topics through 
> brief surveys - and earn cash 
> http://www.techsay.com/default.php?page=join.php&p=sourceforge
&CID=DEVDEV
___
Ganglia-general mailing list Ganglia-general@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/ganglia-general





Re: [Ganglia-general] gmond getting stuck

2007-01-18 Thread Richard.Grevis
hmm.

a blocking write? Is this write in apr_socket_send do you
know? The network I/O is meant to be asynchronous. One
of the guru boys changed this on the 3.0.3 release - fairly
recently. Does someone remember?

Richard Grevis
Production Architecture
Barclays Capital, Canary Wharf, London, E14 4BB
*DDI : +44 (0) 20 7773 4915
 * richard.grevis


> -Original Message-
> From: [EMAIL PROTECTED] 
> [mailto:[EMAIL PROTECTED] On
> Behalf Of Bernard Li
> Sent: 15 January 2007 21:04
> To: ganglia-general@lists.sourceforge.net
> Subject: [Ganglia-general] gmond getting stuck
> 
> 
> I've had this happen to me at least twice now, every now and 
> then gmond would stop getting data even though it is running. 
>  Telnetting to port 8649 gives nothing and strace on the 
> process gives:
> 
> # strace -p 6076
> Process 6076 attached - interrupt to quit
> write(6, " 
> So it looks like it's stuck.
> 
> Has anybody seen this happen before?  This is running the 
> latest 3.0.4 code on a x86 machine.
> 
> Thanks,
> 
> Bernard
> 




Re: [Ganglia-general] Windows port issues

2007-01-09 Thread Richard.Grevis
I can send you cygwin binaries for 3.0.4 if that helps, although
for the life of me I can't find where the cygwin version number
is kept, so I can't say whether the cygwin1.dll is new enough.

Richard Grevis
Production Architecture
Barclays Capital, Canary Wharf, London, E14 4BB
*DDI : +44 (0) 20 7773 4915
 * richard.grevis


> -Original Message-
> From: [EMAIL PROTECTED] 
> [mailto:[EMAIL PROTECTED] On 
> Behalf Of Vladimir
> Sent: 08 January 2007 02:22
> To: Carlo Marcelo Arenas Belon
> Cc: ganglia-general@lists.sourceforge.net
> Subject: Re: [Ganglia-general] Windows port issues
> 
> 
> Carlo Marcelo Arenas Belon wrote:
> > were you using the cygwin1.dll from the installer as well (version 
> > 1.5.11)?, just went and install that package and wasn't able to 
> > reproduce the problem you reported in a Windows XP Home SP2 box.
> >   
> I was using cygwin1.dll from the installer.
> 
> > but of course wasn't able to run cygwin anymore (that uses version 
> > 1.5.23-2) while gmond was running because there were conflicting 
> > library versions.
> >
> > what systems were you running gmond on?
> I have run it on XP Pro SP2 and Windows 2003 Server with same result. 
> 100% CPU WAIT goes away if I replace the binaries with the latest 
> cygwin1.dll and 3.0.3 binaries I compiled. I didn't test 
> 3.0.3 binaries 
> with 1.5.11 DLL.
> 
> I was gonna see if I could make a quick distribution using the NSIS 
> (Nullsoft Installer).
> 
> Vladimir
> 




Re: [Ganglia-general] Windows port issues

2007-01-08 Thread Richard.Grevis
Also,

forgot to mention: the cygwin ganglia build sometimes hangs
in a make. Don't know why. Kill the make process and then
try the make again; eventually it gets all the way through.

Richard Grevis
Production Architecture
Barclays Capital, Canary Wharf, London, E14 4BB
*DDI : +44 (0) 20 7773 4915
 * richard.grevis


> -Original Message-
> From: Grevis, Richard: IT (LDN) 
> Sent: 08 January 2007 11:52
> To: 'Vladimir Vuksan'; ganglia-general@lists.sourceforge.net
> Subject: RE: [Ganglia-general] Windows port issues
> Importance: Low
> 
> 
> Hi,
> 
> I never understood what waitIO under cygwin was telling 
> me. So I simply commented the metric out of the gmond.conf 
> file. That should be sufficient, but it is important that the 
> graph.php code in the html tree (cpu_report) has something like this:
>   if (file_exists("$rrd_dir/cpu_wio.rrd")) {
>  $series .= 
> "DEF:'cpu_wio'='${rrd_dir}/cpu_wio.rrd':'sum':AVERAGE "
>  ."CDEF:'ccpu_wio'=cpu_wio,num_nodes,/ "
>  ."STACK:'ccpu_wio'#$cpu_wio_color:'WAIT CPU' ";
>   }
> 
> So waitIO is only graphed if the metric is collected (file 
> exists really).
> 
> I have compiled 3.0.4 gmond and gmetric, and it compiled the 
> first time, which was much easier than getting 3.0.3 going. I 
> followed my own rules
> (below):
> 
> So:
> 
> 1) Install a full cygwin system (specified on their downloader app),
>don't use the default install or any other subset. Get the lot.
> 2) The ganglia source must come straight from the downloaded tar file,
>and especially don't use a source tree where a ./configure 
> has already
>been done on some other system.
> 3) find . '(' -name libtool -o -name config.cache -o -name 
> config.status ')' -a -print -a -exec rm '{}' ';'
>3.0.4 does not seem to contain spurious libtools, so this 
> may no longer be needed.
>can't hurt though.
> 3a) ./configure make sure you do this within a cygwin cmd 
> window, not a DOS window.
>be prepared to run ./configure more than once (if the make 
> does not generate .exe's.
> 4) make -i, and be prepared to do this more than once.
> 5) Only stop when you see gmond.exe exists. The make errors you see
>are sometimes for things that don't matter (hence the -i)
> 6) when you have the .exe files, remember that you need the 
> cygwin1.dll
>from the version of cygwin1.dll for the cygwin you used to compile
>the agents (/bin/cygwin1.dll)
> 
> note - it is only gmond and gmetric that will compile - gmetad 
> will not compile under cygwin.
> 
> another note - do your hosts contain other running cygwin 
> processes? If so, you will need to build ganglia against the 
> version of cygwin already used. The issue here is that only 1 
> cygwin dll version can run, so all your cygwin based 
> processes must use the same dll.
> 
> 
> Richard Grevis
> Production Architecture, at least this week.
> Barclays Capital, Canary Wharf, London, E14 4BB
> *DDI : +44 (0) 20 7773 4915
>  * richard.grevis
> 
> 
> > -Original Message-
> > From: [EMAIL PROTECTED]
> > [mailto:[EMAIL PROTECTED] On 
> > Behalf Of Vladimir Vuksan
> > Sent: 03 January 2007 20:38
> > To: ganglia-general@lists.sourceforge.net
> > Subject: [Ganglia-general] Windows port issues
> > 
> > 
> > Just curious about the state of Windows port 3.0.0. Apparently all
> > machines we installed the 3.0.0 version on show constant 100% 
> > WAIT CPU 
> > under CPU report.
> > 
> > On a different note for kicks I tried compiling 3.0.4 under
> > cygwin and  
> > I run into
> > 
> > protocol.h:9:21: rpc/rpc.h: No such file or directory
> > 
> > On my SuSe box /usr/include/rpc/rpc.h is part of glibc-devel
> > but  can't 
> > find on Cygwin which package contains those files. I tried 
> > the include 
> > files from SuSe but that didn't work either. Any clues ?
> > 
> > Vladimir
> > 
> 

Re: [Ganglia-general] [Ganglia-developers] Windows port issues

2007-01-08 Thread Richard.Grevis
As per a mail I just shot off, I did more recent versions,
but I did not package them.

If anyone can let me know the software used to create the installer,
or some method of exploding the 3.0.0 installer to change the binaries,
then no worries, I will do it on an ongoing basis.

Richard Grevis
Production Architecture
Barclays Capital, Canary Wharf, London, E14 4BB
*DDI : +44 (0) 20 7773 4915
 * richard.grevis


> -Original Message-
> From: [EMAIL PROTECTED] 
> [mailto:[EMAIL PROTECTED] On 
> Behalf Of Martin Knoblauch
> Sent: 04 January 2007 13:43
> To: Vladimir
> Cc: [EMAIL PROTECTED]; 
> ganglia-general@lists.sourceforge.net
> Subject: Re: [Ganglia-developers] [Ganglia-general] Windows 
> port issues
> 
> 
> 
> --- Vladimir <[EMAIL PROTECTED]> wrote:
> 
> > Martin Knoblauch wrote:
> > >  could you be more specific on the error message? Is it compile
> > time,
> > > or link time? There is no such thing as "xdr_create". Maybe 
> > > "xdrmem_create".
> > Sorry I should have been more precise. It is a linking 
> error. Here is
> > 
> > the log
> > 
> > gmond.o: In function `Ganglia_collection_group_send':
> > /ganglia-3.0.4/gmond/gmond.c:1633: undefined reference to
> > `_xdrmem_create'
> > gmond.o: In function `main':
> > /ganglia-3.0.4/gmond/gmond.c:897: undefined reference to
> > `_xdrmem_create'
> > /ganglia-3.0.4/gmond/gmond.c:828: undefined reference to 
> > `_xdr_free'
> > /ganglia-3.0.4/gmond/gmond.c:912: undefined reference to 
> > `_xdr_free'
> > ../lib/.libs/libganglia.a(libgmond.o): In function
> > `Ganglia_gmetric_send':
> > /ganglia-3.0.4/lib/libgmond.c:695: undefined reference to
> > `_xdrmem_create'
> > ../lib/.libs/libganglia.a(libgmond.o): In function
> > `Ganglia_gmetric_send_spoof':
> > /ganglia-3.0.4/lib/libgmond.c:748: undefined reference to
> > `_xdrmem_create'
> > ../lib/.libs/libganglia.a(protocol_xdr.o): In function
> > `xdr_Ganglia_value_types':
> > /ganglia-3.0.4/lib/protocol_xdr.c:13: undefined reference to 
> > `_xdr_enum'
> > ../lib/.libs/libganglia.a(protocol_xdr.o): In function
> > `xdr_Ganglia_gmetric_message':
> > /ganglia-3.0.4/lib/protocol_xdr.c:23: undefined reference to
> > `_xdr_string'
> > /ganglia-3.0.4/lib/protocol_xdr.c:25: undefined reference to
> > `_xdr_string'
> > /ganglia-3.0.4/lib/protocol_xdr.c:27: undefined reference to
> > `_xdr_string'
> > /ganglia-3.0.4/lib/protocol_xdr.c:29: undefined reference to
> > `_xdr_string'
> > /ganglia-3.0.4/lib/protocol_xdr.c:31: undefined reference to
> > `_xdr_u_int'
> > /ganglia-3.0.4/lib/protocol_xdr.c:33: undefined reference to
> > `_xdr_u_int'
> > /ganglia-3.0.4/lib/protocol_xdr.c:35: undefined reference to
> > `_xdr_u_int'
> > ../lib/.libs/libganglia.a(protocol_xdr.o): In function
> > `xdr_Ganglia_spoof_header':
> > /ganglia-3.0.4/lib/protocol_xdr.c:45: undefined reference to
> > `_xdr_string'
> > /ganglia-3.0.4/lib/protocol_xdr.c:47: undefined reference to
> > `_xdr_string'
> > ../lib/.libs/libganglia.a(protocol_xdr.o): In function
> > `xdr_Ganglia_message_formats':
> > /ganglia-3.0.4/lib/protocol_xdr.c:69: undefined reference to 
> > `_xdr_enum'
> > ../lib/.libs/libganglia.a(protocol_xdr.o): In function
> > `xdr_Ganglia_message':
> > /ganglia-3.0.4/lib/protocol_xdr.c:116: undefined reference to
> > `_xdr_u_int'
> > /ganglia-3.0.4/lib/protocol_xdr.c:124: undefined reference to
> > `_xdr_string'
> > /ganglia-3.0.4/lib/protocol_xdr.c:151: undefined reference to
> > `_xdr_float'
> > /ganglia-3.0.4/lib/protocol_xdr.c:156: undefined reference to
> > `_xdr_double'
> > /ganglia-3.0.4/lib/protocol_xdr.c:95: undefined reference to
> > `_xdr_u_short'
> > ../lib/.libs/libganglia.a(protocol_xdr.o): In function
> > `xdr_Ganglia_25metric':
> > /ganglia-3.0.4/lib/protocol_xdr.c:170: undefined reference to 
> > `_xdr_int'
> > /ganglia-3.0.4/lib/protocol_xdr.c:172: undefined reference to
> > `_xdr_string'
> > /ganglia-3.0.4/lib/protocol_xdr.c:174: undefined reference to 
> > `_xdr_int'
> > /ganglia-3.0.4/lib/protocol_xdr.c:178: undefined reference to
> > `_xdr_s

Re: [Ganglia-general] Windows port issues

2007-01-08 Thread Richard.Grevis
Hmm,

this may be my fault. I have done cygwin compiles for 3.0.2, 3.0.3, and
now 3.0.4. What I *should* have done is wrap the binaries in an
installer, but I didn't know how, so I did not bother. I sent out
binaries when requested, with instructions like "do the 3.0.0 install,
then clobber the binaries and conf files with the newer ones".

In fact you do need the more recent version because it fixes an XML
truncation problem.

If matt or whoever packaged the 3.0.0 windows agent can send me the
framework or instructions, I am more than happy to do the
builds/packaging and send the results back.

Richard Grevis
Production Architecture
Barclays Capital, Canary Wharf, London, E14 4BB
*DDI : +44 (0) 20 7773 4915
 * richard.grevis


> -Original Message-
> From: [EMAIL PROTECTED] 
> [mailto:[EMAIL PROTECTED] On 
> Behalf Of Martin Knoblauch
> Sent: 04 January 2007 10:20
> To: Vladimir Vuksan; ganglia-general@lists.sourceforge.net
> Subject: Re: [Ganglia-general] Windows port issues
> 
> 
> 
> --- Vladimir Vuksan <[EMAIL PROTECTED]> wrote:
> 
> > matt massie wrote:
> > > you need to install the cygwin sunrpc package which is not
> > installed by
> > > default during the cygwin install...
> > >   
> > That was it.
> > 
> > I still wasn't able to compile 3.0.4 (xdr_create? can't be found)
> > however 3.0.3 compiles with no problem.
> >
> 
>  could you be more specific on the error message? Is it 
> compile time, or link time? There is no such thing as 
> "xdr_create". Maybe "xdrmem_create".
>  
> > Who is the person that packaged it initially since 3.0.3 
> corrects the
> > 
> > Wait CPU issue ie. instead of showing 100% idle shows 100% Wait CPU.
> > 
> > Also it may be nice to include gmetric.
> > 
> 
>  Hmm. What package are you referring to? There is no "official" windows
> (cygwin) binary distribution.
> 
> Cheers
> Martin
> 
> --
> Martin Knoblauch
> email: k n o b i AT knobisoft DOT de
> www:   http://www.knobisoft.de
> 




Re: [Ganglia-general] RRD update errors and timestamps

2007-01-08 Thread Richard.Grevis
Hi Jason,
 
Note that the timestamps are the same.

When this error occurs on the summary graphs, but not on host graphs,
check whether 2 clusters have the same cluster name. Note that the
cluster name is not defined by the string on the data_source line; it
comes from the gmond.conf of the host mentioned for that data_source in
gmetad.conf.

If you get this error on host rrd files too, then you may be
including the same cluster twice in gmetad.conf.

If you get errors just on a single host, there may be an error in the
reverse DNS entries (gmond does a reverse lookup to see where a ganglia
packet comes from).

In all cases, do a "nc ganglia-server-host 8651" and check whether
cluster names or host names occur more than once. If they do, you have
something to fix.
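
The duplicate check can also be scripted rather than done by eye. This
is only a sketch: it assumes the XML saved from port 8651 uses CLUSTER
and HOST elements with a NAME attribute, and find_duplicates is a
made-up helper name, not something shipped with ganglia.

```python
from collections import Counter
import xml.etree.ElementTree as ET


def find_duplicates(xml_text):
    """Return (duplicate cluster names, duplicate host names), sorted.

    A name counts as a duplicate if it appears on more than one
    CLUSTER (or HOST) element anywhere in the dump.
    """
    root = ET.fromstring(xml_text)
    clusters = Counter(c.get("NAME", "") for c in root.iter("CLUSTER"))
    hosts = Counter(h.get("NAME", "") for h in root.iter("HOST"))
    dup_clusters = sorted(name for name, n in clusters.items() if n > 1)
    dup_hosts = sorted(name for name, n in hosts.items() if n > 1)
    return dup_clusters, dup_hosts
```

Save the nc output to a file, pass its contents to find_duplicates, and
anything in either list is a candidate for the RRD_update error.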
 

kind regards,

Richard Grevis 
Production Architecture 
Barclays Capital, Canary Wharf, London, E14 4BB 
*DDI : +44 (0) 20 7773 4915 
 * richard.grevis 

-Original Message-
From: [EMAIL PROTECTED]
[mailto:[EMAIL PROTECTED] On Behalf Of
Jason Faulkner
Sent: 04 January 2007 04:16
To: [EMAIL PROTECTED]; ganglia-general@lists.sourceforge.net
Subject: Re: [Ganglia-general] "Compatibility mode" for gmetad?


Martin Knoblauch wrote: 

--- Jason Faulkner <[EMAIL PROTECTED]>
<mailto:[EMAIL PROTECTED]>  wrote:

  

I'm curious about how "possible" or difficult it
would be to make
gmetad 
backwards compatible -- i.e. where I could leave
my 2.5.x gmond 
installations alone, and install 3.x gmetad on
my main server (and be

able to collect stats despite having a
heterogeneous 2.5.x and 3.x 
environment). This would allow me to (hopefully)
live-migrate my
ganglia 
install up to the new version.

-- 
Jason Faulkner
Systems Manager
Broadwick Corporation
(919) 459-2509



Hi Jason,

 although we bumped the major number in the 2.5.x -> 3.0
transition, we
took care to not introduce incompatible changes to the
core metrics
framework. In short, I see no reason why a 3.0.4 gmetad
should not be
able to query 2.5.x gmond data.

 It should even be possible to have a 3.0.4 gmond listen
to older
gmonds. Of course, you are limited to multicast until
you have replaced
all gmonds.
  

Jan  3 23:12:07 intranet1 ./gmetad[25006]: RRD_update
(/var/lib/ganglia/rrds/Dev Login
Servers/__SummaryInfo__/part_max_used.rrd): illegal attempt to update
using time 1167883927 when last update time is 1167883927 (minimum one
second step)

I've been receiving repeated errors like this attempting to use
a 3.0.x gmetad with a 2.5.7 gmond. The times are synced perfectly to a
local NTP server, so I'm sure that's not the issue.


-- 
Jason Faulkner
Systems Manager
Broadwick Corporation
(919) 459-2509
[EMAIL PROTECTED]






Re: [Ganglia-general] Windows port issues

2007-01-08 Thread Richard.Grevis
Hi,

I never understood what waitIO under cygwin was telling me.
So I simply commented the metric out of the gmond.conf file.
That should be sufficient, but it is important that the graph.php
code in the html tree (cpu_report) has something like this:
  if (file_exists("$rrd_dir/cpu_wio.rrd")) {
     $series .= "DEF:'cpu_wio'='${rrd_dir}/cpu_wio.rrd':'sum':AVERAGE "
     ."CDEF:'ccpu_wio'=cpu_wio,num_nodes,/ "
     ."STACK:'ccpu_wio'#$cpu_wio_color:'WAIT CPU' ";
  }

So waitIO is only graphed if the metric is collected (really, if the
rrd file exists).

I have compiled 3.0.4 gmond and gmetric, and it compiled the first time,
which was much easier than getting 3.0.3 going. I followed my own rules
(below):

1) Install a full cygwin system (specified on their downloader app);
   don't use the default install or any other subset. Get the lot.
2) The ganglia source must come straight from the downloaded tar file.
   In particular, don't use a source tree where a ./configure has
   already been done on some other system.
3) find . '(' -name libtool -o -name config.cache -o -name config.status ')' -a -print -a -exec rm '{}' ';'
   3.0.4 does not seem to contain spurious libtools, so this may no
   longer be needed. Can't hurt though.
3a) ./configure
   Make sure you do this within a cygwin cmd window, not a DOS window.
   Be prepared to run ./configure more than once (if the make does not
   generate .exe's).
4) make -i, and be prepared to do this more than once.
5) Only stop when you see that gmond.exe exists. The make errors you see
   are sometimes for things that don't matter (hence the -i).
6) When you have the .exe files, remember that you also need the
   cygwin1.dll from the version of cygwin you used to compile the
   agents (/bin/cygwin1.dll).

Note - it is only gmond and gmetric that will compile - gmetad will not
compile under cygwin.

Another note - do your hosts contain other running cygwin processes?
If so, you will need to build ganglia against the version of cygwin
already used. The issue here is that only 1 cygwin dll version can run,
so all your cygwin based processes must use the same dll.
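
For anyone who would rather not type the find command in step 3, here
is a rough Python equivalent. It is a sketch only: the function name
and the dry_run flag are made up for illustration, and it removes plain
files with those three names below the given directory, nothing else.

```python
import os

# File names left behind by an earlier ./configure run (from step 3).
STALE_NAMES = {"libtool", "config.cache", "config.status"}


def remove_stale_configure_files(top, dry_run=False):
    """Delete stale configure leftovers under `top`.

    Returns the sorted list of matching paths. With dry_run=True the
    paths are only reported, not removed.
    """
    matched = []
    for dirpath, _dirnames, filenames in os.walk(top):
        for name in filenames:
            if name in STALE_NAMES:
                path = os.path.join(dirpath, name)
                if not dry_run:
                    os.remove(path)
                matched.append(path)
    return sorted(matched)
```

Run it once with dry_run=True from the top of the unpacked source tree
to see what would go, then again without it before ./configure.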


Richard Grevis
Production Architecture, at least this week.
Barclays Capital, Canary Wharf, London, E14 4BB
*DDI : +44 (0) 20 7773 4915
 * richard.grevis


> -Original Message-
> From: [EMAIL PROTECTED] 
> [mailto:[EMAIL PROTECTED] On 
> Behalf Of Vladimir Vuksan
> Sent: 03 January 2007 20:38
> To: ganglia-general@lists.sourceforge.net
> Subject: [Ganglia-general] Windows port issues
> 
> 
> Just curious about the state of Windows port 3.0.0. Apparently all 
> machines we installed the 3.0.0 version on show constant 100% 
> WAIT CPU 
> under CPU report.
> 
> On a different note for kicks I tried compiling 3.0.4 under 
> cygwin and  
> I run into
> 
> protocol.h:9:21: rpc/rpc.h: No such file or directory
> 
> On my SuSe box /usr/include/rpc/rpc.h is part of glibc-devel 
> but  can't 
> find on Cygwin which package contains those files. I tried 
> the include 
> files from SuSe but that didn't work either. Any clues ?
> 
> Vladimir
> 




Re: [Ganglia-general] [Ganglia-developers] "Correct" counting of CPUs, Cores, Siblings (bz #84)

2007-01-05 Thread Richard.Grevis


Can I ask whether you will keep the semantics of the existing
metrics unchanged? I would not be comfortable with my cpu loads
(and cpu count) suddenly doubling or halving.

Also remember the cygwin agent build, which also reads its
values from cygwin's /proc.

kind regards,
Richard














Re: [Ganglia-general] Server showing up as IP instead of DNS name

2006-11-20 Thread Richard.Grevis
Sam,
I imagine this has already been well answered for you, but the host
names you see are the result of a reverse DNS lookup on the headnode,
or whatever node you get the XML from. You will get IP addresses if the
reverse lookup failed, although the failure is at the headnode level -
not from the server (gmetad), and not from the monitored host itself.

Try nslookup on the various servers in question. Also note that if any
of your nodes is in a cluster (UNIX/windows type, not an HPC cluster),
then care needs to be taken.
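
To show the behaviour I mean (this is an illustration, not gmond's
actual code), here is a small Python sketch: try the reverse lookup,
and fall back to the bare IP address when it fails. The display_name
function and its resolver argument are invented for the example.

```python
import socket


def display_name(ip, resolver=socket.gethostbyaddr):
    """Return the reverse-DNS name for `ip`, or `ip` itself on failure.

    `resolver` defaults to socket.gethostbyaddr and can be swapped out
    for testing; lookup failures surface as OSError subclasses.
    """
    try:
        hostname, _aliases, _addresses = resolver(ip)
    except OSError:
        return ip
    return hostname
```

Running display_name on the headnode for the address of the affected
server shows what name, if any, that node can resolve after a restart.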

regards,
Richard Grevis
Infrastructure Architecture
Barclays Capital, Canary Wharf, London, E14 4BB



> -Original Message-
> From: [EMAIL PROTECTED] 
> [mailto:[EMAIL PROTECTED] On 
> Behalf Of Sam Guido
> Sent: 13 November 2006 18:19
> To: ganglia-general@lists.sourceforge.net
> Subject: [Ganglia-general] Server showing up as IP instead of DNS name
> 
> 
> One of our servers is showing up on the Ganglia web page 
> under its IP 
> address instead of its DNS name.  It showed up under its DNS name 
> until just recently when the system was restarted.  We've 
> seen this on 
> some of our other systems after restarts.  These are RedHat 4 systems.
> 
> - Sam Guido
> 
> 
> Sam Guido
> Clemson University, DCIT
> [EMAIL PROTECTED]
> 
> --
> ---
> Using Tomcat but need to do more? Need to support web 
> services, security? Get stuff done quickly with 
> pre-integrated technology to make your job easier Download 
> IBM WebSphere Application Server v.1.0.1 based on Apache 
> Geronimo 
> http://sel.as-us.falkag.net/sel?cmd=lnk&kid=120709&bid=263057&;
dat=121642
___
Ganglia-general mailing list Ganglia-general@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/ganglia-general





Re: [Ganglia-general] Q: is it possible to see a specific day for example, from last week

2006-11-13 Thread Richard.Grevis
Vitaly,

my version does. The only problem was that I hacked the PHP left
right and centre before I understood everything.

So it will take some work to create a patch. Still, someone else
expressed a desire for that functionality, so I will work on it
this week.

Here are some screen shot samples from our Ganglia servers:
http://www.aouk83.dsl.pipex.com/

kind regards,
Richard Grevis

Infrastructure Architecture
Barclays Capital, Canary Wharf, London, E14 4BB



> -Original Message-
> From: [EMAIL PROTECTED] 
> [mailto:[EMAIL PROTECTED] On 
> Behalf Of Vitaly Karasik
> Sent: 12 November 2006 13:24
> To: ganglia-general@lists.sourceforge.net
> Subject: [Ganglia-general] Q: is it possible to see a 
> specific day forexample, from last week
> 
> 
> Is there some Ganglia version (beta/patched) which allow me 
> to see a graphs for specific day from a last week, for example?
> 
> Thanks,
> Vitaly
> 




Re: [Ganglia-general] Any plans on zooming or graphing granularity changes to ganglia ?

2006-11-09 Thread Richard.Grevis
John,

some of the changes I made to the php were a bit ugly because I did them
a while ago, and I didn't know Ganglia like I do now.

If you give me a while I could extract that change (from the others
I have made) and apply it to the standard ganglia PHP tree and
make a patch out of it.

kind regards,
Richard Grevis
Infrastructure Architecture
Barclays Capital, Canary Wharf, London, E14 4BB
*DDI : +44 (0) 20 7773 4915
 * richard.grevis


> -Original Message-
> From: john allspaw [mailto:[EMAIL PROTECTED] 
> Sent: 09 November 2006 03:03
> To: Grevis, Richard: IT (LDN); ganglia-general@lists.sourceforge.net
> Subject: Re: [Ganglia-general] Any plans on zooming or 
> graphing granularitychanges to ganglia ?
> 
> 
> Excellent!  So Mr. Massie and/or the ganglia dev team...any 
> thoughts about getting something like this (from/to) into the 
> main build of ganglia ?
> 
> thanks!
> john
> 
> - Original Message 
> From: "[EMAIL PROTECTED]" 
> <[EMAIL PROTECTED]>
> To: [EMAIL PROTECTED]; [EMAIL PROTECTED]; 
> ganglia-general@lists.sourceforge.net
> Sent: Tuesday, November 7, 2006 8:00:38 AM
> Subject: RE: [Ganglia-general] Any plans on zooming or 
> graphing granularitychanges to ganglia ?
> 
> John,
> 
> Yes, I hacked the php for "from/to" too. See picture, if it 
> gets through that is. We find it pretty useful - sometimes 
> for some after-the-fact analysis, but more usually for the 
> simplest of reasons - we want to be able to email a ganglia 
> URL that refers to a fixed point in time and is of a fixed 
> duration. Also some of our users have crafted up nightly 
> reports of their clusters by simply creating the right HTML, 
> which can be located anywhere. Like this:
> 
> LDN FIP Bermudan
> <IMG SRC="http://ganglia/graph.php?g=cpu_report&z=large&c=LDN FIP CRE Bermudan PDN&m=&r=6pm%20yesterday&ends=6am%20Today">
> <IMG SRC="http://ganglia/graph.php?g=network_report&z=large&c=LDN FIP CRE Bermudan PDN&m=&r=6pm%20yesterday&ends=6am%20Today">
> 
> QA Level
> <IMG SRC="http://ganglia/graph.php?g=cpu_report&z=large&c=LDN FIP QA Bermudan PDN&m=&r=6pm%20yesterday&ends=6am%20Today">
> <IMG SRC="http://ganglia/graph.php?g=network_report&z=large&c=LDN FIP QA Bermudan PDN&m=&r=6pm%20yesterday&ends=6am%20Today">
> 
> The advantages of GET over POST eh...
> 
> Adding more controlling parameters to graph.php was a bit 
> dicky because the PHP passes around context/state 
> explicitly from URL to URL. Anyway I did it. I also 
> eventually realised that to enable more flexible date parsing 
> all I needed to do was pass "from" and "to" fields directly 
> to rrdtool. Rrdtool can parse many date formats as documented 
> here: http://oss.oetiker.ch/rrdtool/doc/rrdfetch.en.html 
> (scroll down to the date stuff).
> 
> So I would prefer having "from"/"to" in the standard build. 
> In part this is simply what I use my ganglia for. If you 
> purely use ganglia for looking at the here and now, then what 
> it now does is fine. If you need overnight reports, or you do 
> after the fact analysis, then from/to is useful. If you are 
> thinking about capacity planning, then (after I stop 
> laughing), you may want another modification of mine, which 
> is to have a MAX consolidation function as well as AVERAGE, 
> and graph them both. It means you never lose sight of your spikes.
> 
> Richard Grevis
> Infrastructure Architecture
> Barclays Capital, Canary Wharf, London, E14 4BB
> *DDI : +44 (0) 20 7773 4915
>  * richard.grevis
> 
> 
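
The MAX versus AVERAGE point in the quoted message can be illustrated with a small sketch. It is not Ganglia or rrdtool code: `consolidate` is a hypothetical helper and the sample values are made up. It reduces one bucket of fine-grained samples to a single AVERAGE and a single MAX, which is roughly what a coarser archive does with each consolidation function.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: consolidate n samples into one AVERAGE value and one
 * MAX value, the way a coarser archive summarises a bucket of finer points. */
static void consolidate(const double *samples, size_t n,
                        double *avg, double *max)
{
    double sum = 0.0;
    size_t i;

    assert(samples != NULL && n > 0);
    *max = samples[0];
    for (i = 0; i < n; i++) {
        sum += samples[i];
        if (samples[i] > *max)
            *max = samples[i];   /* remember the highest sample seen */
    }
    *avg = sum / (double)n;
}
```

With six CPU samples of 5, 5, 95, 5, 5 and 5 percent, the average comes out at 20 while the maximum stays at 95, so a graph that plots only the average hides the spike.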



Re: [Ganglia-general] Display the same host in two different clusters

2006-10-27 Thread Richard.Grevis

Yes,

Bernard is right. If you have configuration problems
I usually recommend first trying a unicast configuration.
And the only way you get a node to appear in 2 clusters
is to configure the node agent itself to send data to
two different headnodes.

The above configuration is "clunky" to say the least.
We have had a need for "multiple views" of our estate
for a long time (e.g. one view for application users,
another for their managers).

I would be interested if other ganglia users have this
need for multiple views.

regards
Richard


> -Original Message-
> From: [EMAIL PROTECTED] 
> [mailto:[EMAIL PROTECTED] On 
> Behalf Of Bernard Li
> Sent: 26 October 2006 18:02
> To: [EMAIL PROTECTED]
> Cc: ganglia-general@lists.sourceforge.net
> Subject: Re: [Ganglia-general] Display the same host in in 
> two differentclusters
> 
> 
> How are the different clusters separated?  By using different 
> multicast/unicast ports?  If so, I suppose you can just have 
> multiple udp_send, etc. entries in the host's gmond.conf and 
> it should show up in different "clusters".
> 
> Cheers,
> 
> Bernard
> 
> On 10/26/06, [EMAIL PROTECTED] <[EMAIL PROTECTED]> wrote:
> > Hello,
> >
> > I have a question about ganglia configuration.
> > I'd like to have the same host displayed in two different 
> clusters. Is 
> > there any way to obtain this?
> >
> > Thanks in advance,
> > d
> >






Re: [Ganglia-general] Cluster under Grid issues

2006-10-24 Thread Richard.Grevis
Dave,

you may need to be more precise about what you want to happen.

If you are adding hosts to an existing cluster, simply
give them the same configuration as the others and all will be fine.
By fine I mean that the new hosts will appear in the cluster view, even
if their data history is not as long as the others.

But suppose you drop a host out of a cluster. The cluster view
is created ONLY from the output of gmetad, which contains a snapshot
at the current time only. The dropped host will disappear, and even
if you ask for a year's worth of data to be shown (times when the
host used to appear), it will not suddenly appear again.

Also the host's rrd files will remain under the clustername
in your rrd tree, and are never removed.

And moving a host from one cluster to another.
Do you move the host rrd's? Do you not? Ha ha.

And the effect on the rollup cluster views when the above
is done? Extra ha ha.

kind regards,
richard

> -Original Message-
> From: [EMAIL PROTECTED] 
> [mailto:[EMAIL PROTECTED] On 
> Behalf Of David Peterson
> Sent: 24 October 2006 08:41
> To: ganglia-general@lists.sourceforge.net
> Subject: Re: [Ganglia-general] Cluster under Grid issues
> 
> 
> 
> That's a fine document.  However, it only covers the very basics.
> 
> What if we have ganglia running for some time.  The cluster grows.   
> We want to reconfigure ganglia.  Do we have to dump all of our stats  
> and start again?
> 
> Is this even valid in gmetad.conf:
>   data_source "web hosts" localhost
>   data_source "db hosts" localhost
> 
> -Dave
> 
> 
> On Oct 24, 2006, at 12:01 AM, Vitaly Karasik wrote:
> 
> > I'll suggest you IBM's "Ganglia Howto" (http://www-941.ibm.com/
> > collaboration/wiki/display/WiKiPtype/ganglia)
> >
> > Rgds,
> > Vitaly
> > 
> > From: [EMAIL PROTECTED] [mailto:ganglia-
> > [EMAIL PROTECTED] On Behalf Of  
> > [EMAIL PROTECTED]
> > Sent: Tuesday, October 24, 2006 5:01 AM
> > To: [EMAIL PROTECTED]; [EMAIL PROTECTED]
> > Cc: ganglia-general@lists.sourceforge.net
> > Subject: Re: [Ganglia-general] Cluster under Grid issues
> >
> > Hi
> >
> > I am new to Ganglia.I would like to know the steps that you
> > followed for breaking up the hosts into different clusters.  It  
> > would be helpful if it is explained with your configuration.
> >
> > Thanks & Regards,
> >
> > Aravindh
> > Phone: 28520408-1053 | Mobile: 9986017606
> 
> 




Re: [Ganglia-general] how to get machines from different subnets into same cluster

2006-09-25 Thread Richard.Grevis
John,

I assume you have configured for multicast and the multicast address
you use does not travel outside the local subnets? That is your current
situation?

option 1 is to make a multicast address on the routers that scopes to
all your subnets.

option 2 is to unicast to 1 or 2 nominated headnodes. It turns out
that the UDP traffic is pretty network efficient - much more so than
the XML streams. So there is little to fear wrt network loads.

kind regards,
richard



> -Original Message-
> From: [EMAIL PROTECTED] 
> [mailto:[EMAIL PROTECTED] On 
> Behalf Of john allspaw
> Sent: 23 September 2006 01:06
> To: ganglia-general@lists.sourceforge.net
> Subject: Re: [Ganglia-general] how to get machines from 
> different subnetsinto same cluster
> 
> 
> Sorry, more info on what I *thought* was the solution:
> 
> I try running gmetad on www11 (on subnet2) , so that it can 
> be polled by the frontend box (on subnet1) via TCP on port 
> 8651. Lo and behold, telnetting to 8651 on www11 dumps the 
> xml stream (I have trusted_hosts set on www11 to trust the 
> frontend machine) for all of www11-15.  So that is good.
> 
> question is...how can I get www1-10 (subnet1) and www11-15 
> (subnet2) to be in the same cluster in the web frontend ?
> 
> I have in the main frontend gmetad.conf:
> 
> data_source "WWW"  www1 www3 www5
> data_source "WWW" www11:8651
> 
> but that splits the two groups into two grids.  If I turn 
> scalable off, then all I get is www11-15 in the WWW cluster, 
> I assume because it's the last directive ?
> 
> Not sure. Thoughts ?
> 
> -john
> 
> - Original Message 
> From: john allspaw <[EMAIL PROTECTED]>
> To: ganglia-general@lists.sourceforge.net
> Sent: Friday, September 22, 2006 4:16:36 PM
> Subject: [Ganglia-general] how to get machines from different 
> subnets into same cluster
> 
> Hi all - 
> 
> I apologize for what seems to be a commonly-asked question, 
> but to be honest, searching through the mail archives on 
> sourceforge is like getting my molars pulled. :)
> 
> I have one grid.  I have www1-10 servers on one subnet, and 
> they're graphing fine on my gmetad host.   I have some new 
> machines, www11-15, but on another subnet, unfortunately.  Is 
> there any way to get ALL of www1-15 to show up on the same 
> "www" cluster, all together, without having to have www1-10 
> on one grid, and www11-15 on another ?
> 
> I've tried putting gmetad on www11, and having:
> 
> data_source WWW www1 www3 www5
> data_source WWW www11:8651
> 
> and that just splits them up into grids, which isn't what I'd like.  
> Is what I'm doing possible with ganglia ?
> 
> thanks a lot, in advance,
> john
> 
> 
> 
> 
> 
> 




Re: [Ganglia-general] Gmetad RRD update question

2006-08-30 Thread Richard.Grevis
Dave,
 
I tried this, and for me last_update gets changed at the same rate as my
poll rate, so I don't see what you see.
 
What is your poll rate of the cluster in gmetad.conf? Perhaps you could
mail me the RRA lines in gmetad.conf and the full output of rrdtool info.
 
I'm sure you already know this, but when the rrd file is created, it
takes the step size from the poll rate, but if you subsequently change
your poll rate in gmetad.conf, ganglia does not try to change the step
size in the rrd file.
 
kind regards,
Richard

-Original Message-
From: [EMAIL PROTECTED]
[mailto:[EMAIL PROTECTED] On Behalf Of Dr.
Dave Blunt
Sent: 29 August 2006 19:15
To: ganglia-general
Subject: [Ganglia-general] Gmetad RRD update question


 
I'm trying to stress test gmetad by submitting large numbers of
metric updates to a single gmond that gmetad is using as a data_source.
 
When I submit an update to a single metric on a single host and
then look at 'rrdtool info' output for the
/var/lib/ganglia/rrds/unspecified/hostX/metricY.rrd file, I see a total
of 5 changes to that last_update timestamp over a space of 60 seconds or
so.  When I apply another update to the metric (via gmetric) to gmond
the same thing happens.
 
Any ideas as to why the RRD has to be touched five times for
every update?  Again, this isn't a summary RRD, it's just the RRD
recording a single metric on a single host within a cluster.
 
 
Dave.
 

Dr. Dave Blunt
Regional Operations Manager


GROUNDWORK Open Source, Inc.
139 Townsend Street, Suite 100
San Francisco, CA 94107-1946
415.992.4500 (office)
206.282.2867 (direct)
650.619.5256 (mobile)
206.374.2892 (fax)

[EMAIL PROTECTED]

www.groundworkopensource.com
  
 






Re: [Ganglia-general] Ganglia scaling testing?

2006-08-24 Thread Richard.Grevis
Ahh yes,

I forgot about Yemi's spoofing code. Hacking that sounds
the easiest way.

regards,
Richard

-Original Message-
From: Dr. Dave Blunt [mailto:[EMAIL PROTECTED] 
Sent: 24 August 2006 16:58
To: ganglia-general; harper.mann; Grevis, Richard: IT (LDN)
Subject: RE: [Ganglia-general] Ganglia scaling testing?


Hey Richard,

We were wanting to populate a gmond on a separate box that we would run
some alarm code against - no gmetad on the box.  I'm trying Yemi's spoof
code right now to get a large number of 'hosts' set up with lots of
metrics.  I did look at the packet format in the source but my C is a
bit rusty.  I got bogged down in tcpdump/tcpreplay without much
progress.

Thanks,


Dave. 

-Original Message-
From: [EMAIL PROTECTED]
[mailto:[EMAIL PROTECTED] 
Sent: Thursday, August 24, 2006 4:09 AM
To: [EMAIL PROTECTED]; ganglia-general@lists.sourceforge.net; Dr.
Dave Blunt
Subject: RE: [Ganglia-general] Ganglia scaling testing?

Harper,

I think that the RRD disk I/O from gmetad will be the first limit you
reach.

If you want to load up the gmond process, you could write a program to
send properly formatted gmond packets but with a spoofed and always
changing source address. The headnode gmond only determines the host
from the source address of the packet. I am not sure of the best way
to do this - either a standalone program that sends packets flat chat
where the contents were snarfed from real gmond packets, or maybe a hack
to gmond itself.

Simulating on the server/gmetric side is easier - just write a script
that constructs the right XML with thousands of hosts and presents it on
a port, and point gmetad to that port.

regards,
richard grevis

-Original Message-
From: [EMAIL PROTECTED]
[mailto:[EMAIL PROTECTED] On Behalf Of
Harper Mann
Sent: 23 August 2006 18:08
To: ganglia-general@lists.sourceforge.net; Dr. Dave Blunt
Subject: [Ganglia-general] Ganglia scaling testing?


Hi,
We're supporting a couple of sites with several hundred servers
monitored with Ganglia.  So far, ganglia is working well and easily
keeping up with the load. We want to run some scaling tests and thought
it would be good to simulate a large gmond with a couple thousand
servers reporting before we have to support that many. Is there a way to
simulate multiple servers reporting to a gmond? If not, what do you
think might be the best approach for this?  We could take a crack at
creating it. Thanks for any help. Regards,
- Harper
Harper Mann
Groundwork Open Source
510-599-2075 (cell)








Re: [Ganglia-general] Ganglia scaling testing?

2006-08-24 Thread Richard.Grevis
Harper,

I think that the RRD disk I/O from gmetad will be the first limit you
reach.

If you want to load up the gmond process, you could write a program to
send properly formatted gmond packets but with a spoofed and always
changing source address. The headnode gmond only determines the host
from the source address of the packet. I am not sure of the best way
to do this - either a standalone program that sends packets flat chat
where the contents were snarfed from real gmond packets, or maybe a hack
to gmond itself.

Simulating on the server/gmetric side is easier - just write a script
that constructs the right XML with thousands of hosts and presents it on
a port, and point gmetad to that port.
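
As a rough sketch of the XML-generating approach, the C below writes a cluster document with any number of made-up hosts. The function names, host names, IP scheme and metric values are all hypothetical, and the element and attribute names follow a gmond XML dump as best I recall it, so compare the output with a real dump from your own gmond before pointing gmetad at it. How the output is presented on a TCP port is left out.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical generator: format one <HOST> element with a single metric.
 * Attribute names are an assumption based on a typical gmond XML dump.
 * Returns the formatted length, or -1 if buf was too small. */
static int format_fake_host(char *buf, size_t cap, int idx, long now)
{
    assert(idx >= 0 && idx < 65536);   /* keeps both IP octets in range */
    int n = snprintf(buf, cap,
        "<HOST NAME=\"fakehost%05d\" IP=\"10.0.%d.%d\" REPORTED=\"%ld\" "
        "TN=\"0\" TMAX=\"20\" DMAX=\"0\" LOCATION=\"unspecified\" "
        "GMOND_STARTED=\"%ld\">\n"
        "<METRIC NAME=\"load_one\" VAL=\"%.2f\" TYPE=\"float\" UNITS=\"\" "
        "TN=\"0\" TMAX=\"70\" DMAX=\"0\" SLOPE=\"both\" SOURCE=\"gmond\"/>\n"
        "</HOST>\n",
        idx, idx / 256, idx % 256, now, now, (idx % 100) / 10.0);
    return (n < 0 || (size_t)n >= cap) ? -1 : n;
}

/* Write a whole fake cluster with nhosts hosts to `out`.
 * Returns the number of hosts written, or -1 on a formatting error. */
static int write_fake_cluster(FILE *out, const char *cluster,
                              int nhosts, long now)
{
    char host[512];
    int i;

    assert(out != NULL && cluster != NULL && nhosts >= 0);
    assert(strchr(cluster, '"') == NULL);  /* would break the attribute */

    fprintf(out, "<GANGLIA_XML VERSION=\"3.0.3\" SOURCE=\"gmond\">\n");
    fprintf(out, "<CLUSTER NAME=\"%s\" LOCALTIME=\"%ld\" "
                 "OWNER=\"unspecified\" LATLONG=\"unspecified\" "
                 "URL=\"unspecified\">\n", cluster, now);
    for (i = 0; i < nhosts; i++) {
        if (format_fake_host(host, sizeof host, i, now) < 0)
            return -1;
        fputs(host, out);
    }
    fprintf(out, "</CLUSTER>\n</GANGLIA_XML>\n");
    return nhosts;
}
```

Calling `write_fake_cluster(stdout, "WWW", 2000, (long)time(NULL))` would emit two thousand hosts; adding more `<METRIC>` lines per host scales the RRD write load on the gmetad side accordingly.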

regards,
richard grevis

-Original Message-
From: [EMAIL PROTECTED]
[mailto:[EMAIL PROTECTED] On Behalf Of
Harper Mann
Sent: 23 August 2006 18:08
To: ganglia-general@lists.sourceforge.net; Dr. Dave Blunt
Subject: [Ganglia-general] Ganglia scaling testing?


Hi,
We're supporting a couple of sites with several hundred servers
monitored with Ganglia.  So far, ganglia is working well and easily
keeping up with the load. We want to run some scaling tests and thought
it would be good to simulate a large gmond with a couple thousand
servers reporting before we have to support that many. Is there a way to
simulate multiple servers reporting to a gmond? If not, what do you
think might be the best approach for this?  We could take a crack at
creating it. Thanks for any help. Regards,
- Harper
Harper Mann
Groundwork Open Source
510-599-2075 (cell)







Re: [Ganglia-general] Obtaining Immediate Interval Data From Ganglia

2006-08-14 Thread Richard.Grevis
Absolutely agreed,

subsecond makes no sense and the ganglia design is not appropriate for
that anyway. I was originally asked to do 5 seconds, but I have increased
that to 10 seconds as there was no meaningful change in the shape of the
graphs anyway.

But 10 second polling is useful to me for a subset of our estate that
does Monte Carlo pricing of instruments on demand for the traders. These
calculations take 5-30 seconds and preserving the peak of this load spike
helps us with cluster sizing. Most other HPC environments out there do
job runs that are much longer - and for them a 5 second poll is silly.

Martin, bug entered #110. Maybe a limit of a few seconds is appropriate,
just to limit silly gmond.conf poll rates and ensure /proc/blah is read
once per entire group of metrics.

- kind regards,
richard

-Original Message-
From: Martin Knoblauch [mailto:[EMAIL PROTECTED] 
Sent: 11 August 2006 12:12
To: Grevis, Richard: IT (LDN); [EMAIL PROTECTED];
[EMAIL PROTECTED]
Cc: ganglia-general@lists.sourceforge.net
Subject: Re: [Ganglia-general] Obtaining Immediate Interval Data From
Ganglia


Correct. Below code limits the sampling rate for the cpu*, load*, mem*
and net* graphs. Setting them to 0 will give you 1 second "accuracy". Or
"nice furry graphs" as Richard said (actually the "furriness" is what
the original authors wanted to prevent :-). Personally I doubt that
sampling load* and mem* at that rate makes sense. cpu* and net* may.

 Richard, yes please file a report. Unfortunately I spoke too soon when
I mentioned that we should get rid of the intervals altogether. The reason
is that we need to compute differences for the cpu* and net* metrics (they
are rates after all). If we want to have sub-second sampling rates, we
need to use "gettimeofday" instead of "time".
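
A minimal sketch of the gettimeofday idea, not the actual gmond code: `now_seconds` and `is_stale` are hypothetical names, and the threshold is assumed to be kept as a double in seconds so that values below one second can be expressed.

```c
#include <assert.h>
#include <stddef.h>
#include <sys/time.h>

/* Current wall-clock time in seconds, with microsecond resolution. */
static double now_seconds(void)
{
    struct timeval tv;

    gettimeofday(&tv, NULL);
    return (double)tv.tv_sec + (double)tv.tv_usec / 1e6;
}

/* Hypothetical helper: returns 1 when the cached file contents are older
 * than `thresh` seconds and should be re-read, 0 otherwise. */
static int is_stale(double now, double last_read, double thresh)
{
    assert(thresh >= 0.0);
    return (now - last_read) > thresh;
}
```

The same fractional timestamps would also serve as the time delta when turning the cpu* and net* counters into rates, which a whole-second `time(0)` cannot provide below one second.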

--- [EMAIL PROTECTED] wrote:

> If you do want to do fast polling on the Linux or cygwin gmond, I 
> found some hardwired code in there which effectively limits the 
> polling rate for some metrics no matter what you put in the config
> files. (Sorry Martin, have not raised a bug report yet). Anyway:
> > the code below is in the cygwin and linux metric.c files.
> > 
> > 
> > typedef struct {
> >   uint32_t last_read;
> >   uint32_t thresh;
> >   char *name;
> >   char buffer[BUFFSIZE];
> > } timely_file;
> > 
> > timely_file proc_stat    = { 0, 15, "/proc/stat" };
> > timely_file proc_loadavg = { 0, 15, "/proc/loadavg" };
> > timely_file proc_meminfo = { 0, 30, "/proc/meminfo" };
> > timely_file proc_net_dev = { 0, 30, "/proc/net/dev" };
> > 
> > char *update_file(timely_file *tf)
> > {
> >   int now,rval;
> >   now = time(0);
> >   if(now - tf->last_read > tf->thresh) {
> >     rval = slurpfile(tf->name, tf->buffer, BUFFSIZE);
> >     if(rval == SYNAPSE_FAILURE) {
> >       err_msg("update_file() got an error from slurpfile() reading %s",
> >               tf->name);
> >       return (char *)SYNAPSE_FAILURE;
> >     }
> >     else tf->last_read = now;
> >   }
> >   return tf->buffer;
> > }
> > 
> 
> I have set those timeout values zero, which works well and gives me 
> nice spiky furry graphs.
> 
> - richard


--
Martin Knoblauch
email: k n o b i AT knobisoft DOT de
www:   http://www.knobisoft.de





Re: [Ganglia-general] Obtaining Immediate Interval Data From Ganglia

2006-08-10 Thread Richard.Grevis
If you do want to do fast polling on the Linux or cygwin gmond, I found
some hardwired code in there which effectively limits the polling rate
for some metrics no matter what you put in the config files. (Sorry
Martin, have not raised a bug report yet). Anyway:
> the code below is in the cygwin and linux metric.c files.
> 
> 
> typedef struct {
>   uint32_t last_read;
>   uint32_t thresh;
>   char *name;
>   char buffer[BUFFSIZE];
> } timely_file;
> 
> timely_file proc_stat    = { 0, 15, "/proc/stat" };
> timely_file proc_loadavg = { 0, 15, "/proc/loadavg" };
> timely_file proc_meminfo = { 0, 30, "/proc/meminfo" };
> timely_file proc_net_dev = { 0, 30, "/proc/net/dev" };
> 
> char *update_file(timely_file *tf)
> {
>   int now,rval;
>   now = time(0);
>   if(now - tf->last_read > tf->thresh) {
>     rval = slurpfile(tf->name, tf->buffer, BUFFSIZE);
>     if(rval == SYNAPSE_FAILURE) {
>       err_msg("update_file() got an error from slurpfile() reading %s",
>               tf->name);
>       return (char *)SYNAPSE_FAILURE;
>     }
>     else tf->last_read = now;
>   }
>   return tf->buffer;
> }
> 

I have set those timeout values to zero, which works well and gives
me nice spiky furry graphs.

- richard





Re: [Ganglia-general] Obtaining Immediate Interval Data From Ganglia

2006-08-10 Thread Richard.Grevis
Ian,

it is the gmetad process which writes the rrd files, not gmond. Are you
using "rrdtool fetch" to get the numbers? If you don't specify an end
time, rrdtool will choose "now", so it is almost certain you will have
some NaNs at the end.

What I do is to do a "rrdtool last" first, then use that value as the
end time for the rrdtool fetch.
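
A small sketch of that two-step approach, assuming the timestamp printed by "rrdtool last FILE" has already been read into a long. `format_fetch_cmd` is a hypothetical helper that only builds the command line; running it (for example with popen) is left out.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: build an "rrdtool fetch" command whose window ends at
 * `last` (the value printed by "rrdtool last FILE") instead of at "now", so
 * the tail of the result is not padded with not-yet-written NaN rows.
 * The path is single-quoted because Ganglia rrd paths can contain spaces;
 * a path containing a single quote itself is not handled here.
 * Returns the command length, or -1 if buf was too small. */
static int format_fetch_cmd(char *buf, size_t cap, const char *rrd_path,
                            long last, long window_secs)
{
    assert(rrd_path != NULL && strlen(rrd_path) > 0);
    assert(window_secs > 0);
    int n = snprintf(buf, cap,
                     "rrdtool fetch '%s' AVERAGE --start %ld --end %ld",
                     rrd_path, last - window_secs, last);
    return (n < 0 || (size_t)n >= cap) ? -1 : n;
}
```

For a last-update time of 1150890254 and a 60 second window this yields a fetch from 1150890194 to 1150890254.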

- regards,
richard

-Original Message-
From: [EMAIL PROTECTED]
[mailto:[EMAIL PROTECTED] On Behalf Of Ian
Wootten
Sent: 08 August 2006 16:23
To: ganglia-general@lists.sourceforge.net
Subject: [Ganglia-general] Obtaining Immediate Interval Data From
Ganglia


I am facing a problem in that I would like short-segment up to date
information from ganglia in order to monitor services after invocation.
Whilst connecting to a port will provide the most immediate metrics, I
would like information over a particular time period. Inspection of
ganglia's round robin databases provides this also, but if the most
immediate information is requested (i.e. invoke this service, collect
ganglia metrics for that period), many metric values are left empty
(NaN). I increase the speed at which localhost is polled, but still much
information is only written to rrd afterwards. A second query sometime
later and these are populated. Is there a way to force gmond writes to
rrd? Or another method to obtain such information from ganglia?

Thanks,

Ian







[Ganglia-general] reported time from gmond XML data leaping backwards.

2006-06-21 Thread Richard.Grevis
All,
 
I am observing 2 problems occasionally occurring which may or may not be
related.

The first is that very rarely the time reported back to gmetad from the
gmond XML will leap backwards by maybe a month and a half. Checking the
time on the server running the gmond reveals that the server time remains
correct. Restarting the gmond fixes the problem.

In my environment gmond is running on Windows under cygwin. Hosts
unicast to a single headnode which is polled by the gmetad. I have not
observed the problem on Linux, so it may be Windows specific or even
cygwin specific. It has me stumped. Has anyone ever seen this?

The second thing that sometimes occurs is:
RRD_update (/.../FX EXO QANet farm/__SummaryInfo__/cpu_user.rrd):
illegal attempt to update using time 1150890115 when last update time is
1150890254 (minimum one second step)

The way I see it, the only way this sort of thing can happen when
writing summary data is if the LOCALTIME reported for the cluster was
maybe jittering around. But I don't know.
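
To make the rule in that error message concrete, here is a small sketch that flags timestamps an RRD would reject. It is an assumption-laden illustration: the function name backwards_steps is made up, and it only mimics the "minimum one second step" rule quoted in the error; it is not gmetad or rrdtool code.

```python
def backwards_steps(timestamps):
    """Return (last_accepted, rejected) pairs that break the one-second rule.

    Mimics the check reported in the RRD_update error: an update is refused
    unless its time is at least one second after the last accepted update.
    """
    rejected = []
    last = None
    for ts in timestamps:
        if last is not None and ts - last < 1:
            # A refused update does not move the "last update time".
            rejected.append((last, ts))
        else:
            last = ts
    return rejected


if __name__ == "__main__":
    # The two timestamps from the error message, followed by a later one.
    for last, ts in backwards_steps([1150890254, 1150890115, 1150890260]):
        print("rejected %d (last update %d, %d s earlier)" % (ts, last, last - ts))
```

Run against a log of the LOCALTIME values seen for a cluster, something like this would show how often, and by how much, the reported time jumps backwards.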
 
Can anyone give me suggestions or suggest ways I may get to the bottom
of this please?
 
- Richard
Barclay Capital Ganglia Engineering and Development, Ganglia 1st line
support, 2nd line support, Ganglia Commissioning and Deployment, Ganglia
dog's body.





Re: [Ganglia-general] Windows servers

2006-06-15 Thread Richard.Grevis
Ron,
 
do the following:
 
- choose one of your w2k servers as a "headnode".
- configure gmetad.conf to have a SINGLE data_source entry pointing to
this headnode.
- configure gmond.conf on all hosts (including headnode) to have:
udp_send_channel { 
  host = headnode-hostname   
  port = 8649 
} 
- The name of the cluster is determined by the clustername in gmond.conf
on the headnode alone.
 
Poof. After that, and after restarting everything you should get what
you want.
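
As a rough sketch of the two config fragments involved, the snippet below renders them from a headnode name. The function names and the "W2k servers" cluster label are made up for illustration, and it assumes the headnode's gmond accepts TCP connections on the same port number (8649) that the hosts send UDP to.

```python
def send_channel(host, port=8649):
    """Render the udp_send_channel block for gmond.conf on every host."""
    return "udp_send_channel {\n  host = %s\n  port = %d\n}\n" % (host, port)


def data_source(cluster, headnode, port=8649):
    """Render the single data_source line for gmetad.conf."""
    return 'data_source "%s" %s:%d\n' % (cluster, headnode, port)


if __name__ == "__main__":
    print(send_channel("headnode-hostname"), end="")
    print(data_source("W2k servers", "headnode-hostname"), end="")
```

Every host gets the same udp_send_channel block, while gmetad.conf gets exactly one data_source line for the whole group.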
 
kind regards,
richard
 
-Original Message-
From: [EMAIL PROTECTED]
[mailto:[EMAIL PROTECTED] On Behalf Of
Owens, Ron
Sent: 15 June 2006 11:03
To: ganglia-general@lists.sourceforge.net
Subject: [Ganglia-general] Windows servers


I have successfully set up gmond on 4 W2k servers and get graphs
on my gmetad server (thanks to Ian Cunningham!).
 
The only thing I would like to see is overview graphs of these
servers, just like I get with my Linux clusters. Is it possible to get
ganglia to "pretend" they are a cluster, just for reporting purposes?
 
I have a separate "data source" line for each of the W2k servers
in my gmetad.conf as they are not a multicasting group.
 
__
Ron Owens - Principal Technical Specialist
Infrastructure Team - Computer Services 
email: [EMAIL PROTECTED] 
Phone: External +353 91 49 3252 Internal: 3252 
 





Re: [Ganglia-general] Problem with Windows gmond

2006-06-12 Thread Richard.Grevis
Ron, gmonds only send UDP data to other gmonds. From your post it is
unclear what is listening on 140.203.7.43 port 2344. It should be a gmond.

To test a single host, you should configure a udp_send_channel on the
w2k server to send data to its proper address or hostname (not 127.0.0.1,
which is only visible on the w2k server itself).

Then configure gmetad.conf to retrieve from whatever TCP port you have
configured in gmond.conf.
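
The point about 127.0.0.1 can be checked mechanically. The sketch below is a hypothetical helper, not part of Ganglia; it filters a list of send channels down to those that another machine could actually receive, using the two channels from the original post as input.

```python
import ipaddress


def is_loopback(host):
    """True if host is a literal loopback address or the name 'localhost'."""
    if host.lower() == "localhost":
        return True
    try:
        return ipaddress.ip_address(host).is_loopback
    except ValueError:
        # Not an IP literal; a real hostname would need DNS to classify.
        return False


def remote_channels(channels):
    """Keep only the (host, port) send channels visible off the local machine."""
    return [(host, port) for host, port in channels if not is_loopback(host)]


if __name__ == "__main__":
    channels = [("127.0.0.1", 2344), ("140.203.7.43", 2344)]
    print(remote_channels(channels))
```

A channel that survives this filter still needs a gmond listening at the other end; the check only rules out the loopback case.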
 
kind regards,
richard
-Original Message-
From: [EMAIL PROTECTED]
[mailto:[EMAIL PROTECTED] On Behalf Of
Owens, Ron
Sent: 09 June 2006 10:22
To: ganglia-general@lists.sourceforge.net
Subject: [Ganglia-general] Problem with Windows gmond


 
I have installed gmond on a W2K server. The gmond.conf relevant
bits are as follows:
 
 
used to only support having a single channel */ 
udp_send_channel { 
  host = 127.0.0.1
  port = 2344 
} 
udp_send_channel { 
  host = 140.203.7.43
  port = 2344
 
 
 
The 140.203.7.43 is the server running gmetad.
 
The problem is that the gmetad server is not seeing any data
from the windows server (or if it's seeing it, it doesn't display any
graphs!). 
 
If I run gmetad with debug_level = 1, it lists the windows server
as being monitored, but no errors appear.
 
Any ideas?
 
__
Ron Owens - Principal Technical Specialist
Infrastructure Team - Computer Services 
email: [EMAIL PROTECTED] 
Phone: External +353 91 49 3252 Internal: 3252 
 





Re: [Ganglia-general] Ganglia Alert and Tracking

2006-06-12 Thread Richard.Grevis
Ahh yes, aggregating data in different ways after the fact.
We had a need to do that, and also a need to provide more than one
cluster hierarchy (e.g. clusters grouped by region, but also clusters
grouped by technology owner (say)).

I have written some perl code to do this - sucking the data out of defined
clusters and manually calling rrdtool update for different aggregate views.
Doing it is a little tricky I must say, at least for me. The other thing that
is a bit disappointing is that if you extract the data from some time range,
and the finest grain data does not go back that far, it will use the coarser
grained data for the full extract - even in the timeframe where there is finer
data.
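
The aggregation step itself can be sketched in a few lines. This is not the attached perl code; it is a hypothetical illustration in which each input is assumed to be a timestamp-to-value mapping already extracted from one cluster's RRD, and NaN samples are skipped rather than poisoning the sum.

```python
import math


def rollup(series_list):
    """Sum several {timestamp: value} series, ignoring missing or NaN samples.

    A timestamp appears in the result only if at least one series has a
    real value for it.
    """
    total = {}
    for series in series_list:
        for ts, value in series.items():
            if value is None or math.isnan(value):
                continue
            total[ts] = total.get(ts, 0.0) + value
    return dict(sorted(total.items()))


if __name__ == "__main__":
    region_a = {0: 1.0, 15: 2.0, 30: float("nan")}
    region_b = {0: 3.0, 15: float("nan")}
    for ts, value in rollup([region_a, region_b]).items():
        # Each pair is in the timestamp:value form that rrdtool update expects.
        print("%d:%s" % (ts, value))
```

Mixing series of different resolutions, the problem described above, is not handled here; the sketch assumes all inputs share the same step.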

The other step is to get a ganglia instance to understand enough to display
this other rollup data. Your choices include faking up appropriate XML on
port 8652 to convince a 2nd gmetad instance to display the data, or hacking a
2nd copy of the php tree and replacing the code that asks gmetad for data with
file based data (say).

The perl code is attached, but this is only for interest. It is too horrible
to be usable by others.

best regards,
richard

-Original Message-
From: [EMAIL PROTECTED]
[mailto:[EMAIL PROTECTED] On Behalf Of Alex
Balk
Sent: 09 June 2006 20:32
To: Bernard Li
Cc: Stackpole, Chris; ganglia-general@lists.sourceforge.net
Subject: Re: [Ganglia-general] Ganglia Alert and Tracking




Bernard Li wrote:

>> I am trying to write a script that pulls the info from netcat
>> and averages out some numbers but I believe that there is a 
>> easier way. Does ganglia store data in such a way that I 
>> could pull this type of information? This appears so useful 
>> to me that I am sure that there are others that have tried 
>> this, are there any ideas and suggestions?
>> 
>
> Sorry for hijacking your thread Chris but your question leads me to 
> think that there are some interesting data stored in the RRD database,
> perhaps we could write a script to mine this data and provide some 
> interesting historical reports?
>   

Actually, my patch for "custom graphs" accomplishes exactly what you're
talking about. It allows you to create a template and then load it for
whatever view (meta, cluster, host) you desire. Couple this with
gmetrics and you can pretty much generate a graph for anything (read -
visually represent any aspect of your data). It also supports rrdtool's
CDEFs, so you can do data transformations as well. Oh, and the rendering
backend may be called from within an [HTML tag lost in archiving], which
allows creating "customized dashboards". I've started working on one where
customers can view different utilization graphs based on the cluster specialty
(batch, interactive, infrastructure), NFS statistics, parallel job
utilization (how much does process named X consume across multiple
hosts), etc.


What I'm really missing is a method to "generate" aggregate data on the
fly. Something like "take these 3 hosts, all from different clusters,
and show me their aggregate CPU consumption".

Cheers,
Alex





rollup
Description: rollup


Re: [Ganglia-general] Questions about hostname and grid names

2006-06-05 Thread Richard.Grevis
Mark,

the hostnames that you see in the web interface are the result of
reverse DNS lookup of the IP addresses of the hosts in your clusters.
You will find the differences there and this is what you have to change.

Bear in mind that the host doing the reverse DNS lookup is the headnode
for each cluster - in your case newton and winterstar.

Also, the configuration file on the headnode determines what the cluster
name for the whole cluster shall be - it is not the name you put in
gmetad.conf on data_source entries, for example.

If in your second question you are saying that you do not see all the hosts
in your cluster, I usually recommend getting things going with unicast first,
and then later multicast if you want.

To implement unicast, change the gmond.conf everywhere to have this:
udp_send_channel {
  #mcast_join = 239.255.160.2
  host = some-host
  port = 8649
}

and then use some-host as the data_source in gmetad.conf

- kind regards,
richard


-Original Message-
From: [EMAIL PROTECTED]
[mailto:[EMAIL PROTECTED] On Behalf Of Mark
Haney
Sent: 01 June 2006 23:47
To: ganglia-general@lists.sourceforge.net
Subject: [Ganglia-general] Questions about hostname and grid names


First, I'd like to say that I am very impressed with Ganglia now that 
it's up and running on my boxes.  I have built 2 separate RPMs for ia64 
(one for SUSE and one for RHEL 3) that I'd like to contribute if the 
need is great enough.  However, that's not why I'm posting.

I have gone through the archives and have some questions about things I 
am seeing on the graphs.  First, currently I have 3 systems (2 SGI Altix
boxes and a front end gmetad server that is only for the web interface) 
being monitored, but the hostname has me confused.  On 2 of the servers 
(one SGI box and the web front end) I get the FQDN of the server.  But 
on the other SGI box I get only the actual hostname, not the FQDN.  The 
hostnames are all set for the FQDN, and I noted in the archives some 
items related to reverse lookups, but none of that is a problem that I 
can see.  In gmetad.conf I have:

data_source "Newton" newton.hpcc.ercbroadband.org
data_source "Winterstar" winterstar.hpcc.ercbroadband.org

With Newton being the one that shows up as 'Newton', and Winterstar 
showing up as winterstar.hpcc.ercbroadband.org in the Web GUI.  I do not
understand why this is, any ideas?

Now, second question, I want to 'pretty up' the Interface to include 
some company info.  I've tried to get the cluster name setup by using 
the same cluster name in gmond.conf for all boxes, but only the web 
interface shows up in it.  I also wanted to change the 'grid' name, but 
I get wonky CPU totals when I change it on the web server.  I know this 
is something simple, but I'm not totally knowledgeable of the 
architecture of ganglia to know what it is.  The docs are kind of vague 
on this and the list archives didn't really give me much either.  Can 
someone help me?

-- 
Mark Haney
Sr. Systems Administrator   
ERC Broadband








Re: [Ganglia-general] New issue with hosts reporting

2006-06-05 Thread Richard.Grevis
Have you checked whether your reverse DNS entries are correct?

The ganglia agents use the source address of the UDP packets that are
transmitted to do a reverse DNS lookup, which yields the hostname seen in
the XML.
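
A quick way to check this from the receiving host is to do the same reverse lookups yourself. The sketch below is only an illustration; the function names are made up, and the resolver argument exists so the logic can be exercised without real DNS. By default it uses the standard socket.gethostbyaddr call.

```python
import socket


def reverse_name(ip, resolver=socket.gethostbyaddr):
    """Return the reverse DNS name for ip, or the ip itself if none exists."""
    try:
        return resolver(ip)[0]
    except (socket.herror, socket.gaierror):
        return ip


def duplicate_names(ips, resolver=socket.gethostbyaddr):
    """Return {name: [ips]} for every reverse name claimed by more than one IP."""
    seen = {}
    for ip in ips:
        seen.setdefault(reverse_name(ip, resolver), []).append(ip)
    return {name: addrs for name, addrs in seen.items() if len(addrs) > 1}


if __name__ == "__main__":
    # Replace with the addresses of your own monitored hosts.
    print(duplicate_names(["127.0.0.1"]))
```

Two addresses mapping to the same name would explain one host showing up under another host's name in the XML.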

- richard

-Original Message-
From: [EMAIL PROTECTED]
[mailto:[EMAIL PROTECTED] On Behalf Of Mark
Haney
Sent: 02 June 2006 14:06
To: ganglia-general@lists.sourceforge.net
Subject: [Ganglia-general] New issue with hosts reporting


-BEGIN PGP SIGNED MESSAGE-
Hash: SHA1

Okay, since my first post didn't get an answer, maybe this one will.  I
have a setup where I have a separate server acting as the web front end
to all my clusters (All SGI Altix boxes).  I just added a new host to be
monitored with a separate cluster name in gmond.conf and added the
data_source to the web front end.  Here's my problem:

The new node is reporting itself as part of an existing cluster, when
it's configured that way.  Not only that, but telnetting to 8649 on that
server reports the HOST NAME as another server on the network that's
already monitored.  In other words:

[HOST NAME element lost in archiving] is shown on Server B and
[HOST NAME element lost in archiving] is shown on Server A in the XML it reports.

What's the deal?  It's got to be something simple, but I've yet to get
it.

- --
Fere libenter homines id quod volunt credunt.

Mark Haney
Sr. Systems Administrator
ERC Broadband
(828) 350-2415
-BEGIN PGP SIGNATURE-
Version: GnuPG v1.4.2.2 (GNU/Linux)
Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org

iD8DBQFEgDerYQhnfRtc0AIRAh/QAJ46iNTkcmFYlEz+ukyR8B7K5+y7lACghb/w
LOBE+fjncTaPXV/0erbWtTQ=
=pneB
-END PGP SIGNATURE-






RE: [Ganglia-general] very annoying issue with jittery cluster graphs

2006-05-23 Thread Richard.Grevis
John,

this may not particularly help you, but on your ganglia server
I would try netcatting localhost and checking out TN numbers for a start,
e.g.
nc localhost 8651 | grep 'HOST NAME'
and check out the TN values, or maybe just wc the above to see if the data
is always coming in properly. Do the same sort of thing on your headnode
or one of the nodes if multicast (port 8649).
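
The same check can be scripted so that it can be run repeatedly. This is a minimal sketch under a few assumptions: the function names are made up, the 60 second threshold is arbitrary, and TN is taken to be the number of seconds since the host was last heard from.

```python
import socket
import xml.etree.ElementTree as ET


def read_xml(host="localhost", port=8651, timeout=10):
    """Read the full XML dump from a gmetad or gmond TCP port (like nc does)."""
    chunks = []
    with socket.create_connection((host, port), timeout=timeout) as sock:
        while True:
            data = sock.recv(65536)
            if not data:
                break
            chunks.append(data)
    return b"".join(chunks)


def stale_hosts(xml_data, max_tn=60):
    """Return the names of HOST elements whose TN exceeds max_tn seconds."""
    root = ET.fromstring(xml_data)
    return sorted(h.get("NAME") for h in root.iter("HOST")
                  if int(h.get("TN", "0")) > max_tn)


if __name__ == "__main__":
    print(stale_hosts(read_xml()))
```

Running it in a loop and counting the hosts returned each time would show whether nodes are really dropping out of the XML or only out of the summary graphs.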

Some other steps include editing gmetad.conf so that only a single cluster
and a single host is mentioned for the data source, and if desperate try
using unicast instead of multicast.

regards,
richard 

-Original Message-
From: [EMAIL PROTECTED]
[mailto:[EMAIL PROTECTED] On Behalf Of john
allspaw
Sent: 23 May 2006 16:10
To: ganglia-general@lists.sourceforge.net
Subject: [Ganglia-general] very annoying issue with jittery cluster
graphs


Hello there - 

I'm attaching two graphs from the grid view of my ganglia install, load
and memory.  The cluster has 24 nodes, and the individual 
nodes don't drop in or out as the graphs are showing, all of their
individual graphs look normal.

Anyone have any idea what's going on here ?
-john






RE: [Ganglia-general] Compiling Ganglia on Windows

2006-05-17 Thread Richard.Grevis
RRDtool - look to the site?
rrdtool windows binary distributions - see page
http://oss.oetiker.ch/rrdtool/download.en.html scroll down to "binary
distributions",
http://oss.oetiker.ch/rrdtool/pub/?M=D and
http://www.cacti.net/downloads/rrdtool/win32
Find documentation as required on his site.
 
I am unclear though. Given you can't run gmetad on windows, what is your
point in wanting rrdtool on windows?
 
All the gmond build instructions under cygwin? You asked for it -
- download and install EVERY cygwin package (see site for how to do this)
- download a clean ganglia src tarball to the cygwin environment
  (do not, for example, use a ganglia tree which has had a previous (linux
  based) "./configure" run on it).
- in the ganglia src tree under a cygwin shell, "./configure"
- make -i
- loop around doing the last 2 steps over and over until you finally get
  gmond.exe and gmetric.exe binaries appearing in their respective
  directories.
- Note: ./configure will be slow. If it seems to hang forever, wait a
  while, kill the shell window, make another window, and run configure
  again (and again) until it finally stumbles through. Regardless, try
  running make -i as well, in case it finally works. (It always fails on
  something; by "works", I mean that the binaries appear.)
 
- Richard

-Original Message-
From: joshua mora [mailto:[EMAIL PROTECTED] 
Sent: 17 May 2006 11:20
To: Grevis, Richard: IT (LDN);
ganglia-general@lists.sourceforge.net
Subject: RE: [Ganglia-general] Compiling Ganglia on Windows



Thank you Richard for the clarification about gmetad.

I was hoping to run everything under Windows (Apache, rrdtools,
gmetad and gmond).

Can you provide some documentation about the compilation of
rrdtools (this I believe can be compiled with VS) and gmond under cygwin?

Thanks.

 





From: [EMAIL PROTECTED]
[mailto:[EMAIL PROTECTED] 
Sent: Wednesday, May 17, 2006 2:45 AM
To: [EMAIL PROTECTED]; ganglia-general@lists.sourceforge.net
Subject: RE: [Ganglia-general] Compiling Ganglia on Windows

 

Joshua,

to the best of my knowledge, gmetad has never been compiled for
windows. So you will not be able to run the server code under windows.

Gmond and gmetric have been compiled by myself and others in a
cygwin environment. There is no windows build documentation.

The windows gmond does not implement (properly or at all) quite
a few metrics, most notably the load measures. This is not the fault
of gmond, but rather the incomplete linux /proc emulation by cygwin.

In my case, as the bulk of the monitored systems are windows
based HPC farms, I hacked the php code to be (generally) keyed from
the cpu metrics rather than the (non working) load metrics.

I will mail you a zip file with the binaries. Let me know if
they do not arrive.

regards,
richard

-Original Message-
From: [EMAIL PROTECTED]
[mailto:[EMAIL PROTECTED] On Behalf Of joshua
mora
Sent: 16 May 2006 18:23
To: ganglia-general@lists.sourceforge.net
Subject: [Ganglia-general] Compiling Ganglia on Windows

Hello

Is there a document with explanations (step by step
process) about how to build Ganglia (gmetad and gmond) and rrdtool for
Windows?

I don't know if some parts have to be compiled within
cygwin or if everything can be compiled through solution files with VS.

Although I am interested in compiling Ganglia, I am also
looking for the binaries to download since I haven't found them.

 

Thank you in advance.

 








RE: [Ganglia-general] Compiling Ganglia on Windows

2006-05-17 Thread Richard.Grevis
Joshua,
 
to the best of my knowledge, gmetad has never been compiled for windows.
So you will not be able to run the server code under windows.

Gmond and gmetric have been compiled by myself and others in a cygwin
environment. There is no windows build documentation.

The windows gmond does not implement (properly or at all) quite a few
metrics, most notably the load measures. This is not the fault of gmond,
but rather the incomplete linux /proc emulation by cygwin.

In my case, as the bulk of the monitored systems are windows based HPC
farms, I hacked the php code to be (generally) keyed from the cpu metrics
rather than the (non working) load metrics.

I will mail you a zip file with the binaries. Let me know if they do not
arrive.
 
regards,
richard

-Original Message-
From: [EMAIL PROTECTED]
[mailto:[EMAIL PROTECTED] On Behalf Of joshua
mora
Sent: 16 May 2006 18:23
To: ganglia-general@lists.sourceforge.net
Subject: [Ganglia-general] Compiling Ganglia on Windows



Hello

Is there a document with explanations (step by step process)
about how to build Ganglia (gmetad and gmond) and rrdtool for Windows?

I don't know if some parts have to be compiled within cygwin or
if everything can be compiled through solution files with VS.

Although I am interested in compiling Ganglia, I am also looking
for the binaries to download since I haven't found them.

 

Thank you in advance.

 






RE: [Ganglia-general] Newbie question - gmond not returning any metrics

2006-05-16 Thread Richard.Grevis
Steve,

it may seem strange, but that is the way gmond behaves.
If in all your gmond instances you specify a single unicast
headnode, the only place you will get the XML data payload
is the headnode. The other nodes dump the DTD and nothing else.

If you want to see the data on each of your workers locally, then
in gmond.conf specify a second send channel, ala:

udp_send_channel {
  host = headnode
  port = 8649
}
udp_send_channel {
  host = 127.0.0.1
  port = 8649
}

regards,
richard

-Original Message-
From: [EMAIL PROTECTED]
[mailto:[EMAIL PROTECTED] On Behalf Of Steve
Webb
Sent: 15 May 2006 19:56
To: ganglia-general@lists.sourceforge.net
Subject: [Ganglia-general] Newbie question - gmond not returning any
metrics


headnode (report01): runs gmetad, gmond and web-frontend and is working 
fine.

workers (worker01, worker02, ...): runs gmond and runs fine, but when 
telnetting to port 8649 (even from worker01 using localhost), I get no 
metrics in the XML:

-
[EMAIL PROTECTED] bin]# telnet localhost 8649
Trying 127.0.0.1...
Connected to localhost.localdomain (127.0.0.1).
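Escape character is '^]'.
[gmond XML output lost in archiving; only the closing "]>" of the DTD survived, and no metrics were shown]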
Connection closed by foreign host.

[EMAIL PROTECTED] bin]# gstat
CLUSTER INFORMATION
Name: crawl
   Hosts: 0
Gexec Hosts: 0
  Dead Hosts: 0
   Localtime: Mon May 15 12:47:46 2006

There are no hosts running gexec at this time
[EMAIL PROTECTED] bin]#
-

I'm guessing that there's no multicast issues since I'm just trying to get 
gmond to tell me the localhost's stats, right?  Nothing is even leaving the 
machine at this point, I'm just telnetting to the localhost's port and 
asking for stats.


I compiled the source on report01 and then just copied gmond, gstat & 
gmetric to the workers and started them up.  Am I missing something on the 
workers to collect stats?

- Steve

-- 
EMAIL: (h) [EMAIL PROTECTED]  WEB: http://badcheese.com/~steve







RE: [Ganglia-general] Metric pull-down menu not showing all metrics

2006-05-08 Thread Richard.Grevis
I was hoping that someone would do it properly!
If I get time today I will get the patch working against 3.0.3

- richard

-Original Message-
From: [EMAIL PROTECTED]
[mailto:[EMAIL PROTECTED] On Behalf Of Rick
Mohr
Sent: 04 May 2006 15:08
To: ganglia-general@lists.sourceforge.net
Subject: RE: [Ganglia-general] Metric pull-down menu not showing all
metrics



Ben,

Below is the patch I created to display all metrics regardless of which hosts 
might have different sets of metrics.  It is essentially the same 
modifications that Richard previously described (I had just chosen a different 
variable name).  However, it is a patch against 3.0.1 and not the latest
version, so I don't know if it will apply cleanly.  Luckily, the patch is very 
simple so there shouldn't be any problems getting it to work with the latest 
version.

Hope that helps.

--Rick

--
Rick Mohr
Systems Developer
Ohio Supercomputer Center



- BEGIN PATCH -
diff -Naur ganglia-3.0.1-orig/web/ganglia.php ganglia-3.0.1/web/ganglia.php
--- ganglia-3.0.1-orig/web/ganglia.php  2005-07-20 09:58:33.0 -0400
+++ ganglia-3.0.1/web/ganglia.php   2005-07-20 11:23:14.0 -0400
@@ -29,6 +29,10 @@
  # Context dependant structure.
  $metrics = array();

+# RFM - We want to keep track of all metrics even if different ones
+# appear on different nodes.
+$metric_names = array();
+
  # 1Key = "Component" (gmetad | gmond) = Version string
  $version = array();

@@ -108,7 +112,7 @@

  function start_cluster ($parser, $tagname, $attrs)
  {
-   global $metrics, $cluster, $self, $grid, $hosts_up, $hosts_down;
+   global $metrics, $metric_names, $cluster, $self, $grid, $hosts_up, $hosts_down;
     static $hostname;

     switch ($tagname)
@@ -145,6 +149,7 @@
        case "METRIC":
           $metricname = $attrs['NAME'];
           $metrics[$hostname][$metricname] = $attrs;
+          $metric_names[$metricname] = 1;
           break;

        default:
diff -Naur ganglia-3.0.1-orig/web/header.php ganglia-3.0.1/web/header.php
--- ganglia-3.0.1-orig/web/header.php   2005-07-20 09:58:33.0 -0400
+++ ganglia-3.0.1/web/header.php    2005-07-20 11:23:14.0 -0400
@@ -219,13 +219,12 @@

  if( $context == "cluster" )
     {
-   if (!count($metrics)) {
+   if (!count($metric_names)) {
        echo "Cannot find any metrics for selected cluster \"$clustername\", exiting.\n";
        echo "Check ganglia XML tree (telnet $ganglia_ip $ganglia_port)\n";
        exit;
     }
-   $firsthost = key($metrics);
-   foreach ($metrics[$firsthost] as $m => $foo)
+   foreach ($metric_names as $m => $foo)
       $context_metrics[] = $m;
     foreach ($reports as $r => $foo)
       $context_metrics[] = $r;
- END PATCH -






RE: [Ganglia-general] Metric pull-down menu not showing all metrics

2006-05-04 Thread Richard.Grevis
Ben,

As you probably already know, the code is in header.php -
if( $context == "cluster" )
   {
   if (!count($metrics)) {
      echo "Cannot find any metrics for selected cluster \"$clustername\", exiting.\n";
      echo "Check ganglia XML tree (telnet $ganglia_ip $ganglia_port)\n";
  exit;
   }
   $firsthost = key($metrics);
   foreach ($metrics[$firsthost] as $m => $foo)
 $context_metrics[] = $m;
   foreach ($reports as $r => $foo)
 $context_metrics[] = $r;
   }
-
So what to do?
Hack header.php at about line 417 like this:
#   $firsthost = key($metrics);
#   foreach ($metrics[$firsthost] as $m => $foo)
#      $context_metrics[] = $m;
   foreach ($rgmetrics as $m => $foo) {
      $context_metrics[] = $m;
   }
   foreach ($reports as $r => $foo)
      $context_metrics[] = $r;
   }
-
Fill the $rgmetrics array in ganglia.php:
$rgmetrics = array();   # Global decl.
In function start_cluster:
function start_cluster ($parser, $tagname, $attrs)
{
   global $metrics, $cluster, $self, $grid, $hosts_up, $hosts_down;
   global $rgmetrics;  # decl.
and a little further down in the start_cluster function:
      case "METRIC":
         $metricname = $attrs[NAME];
         $metrics[$hostname][$metricname] = $attrs;
         $rgmetrics[$metricname] = $metricname;
         break;

I would make a patch but my code is modded already.
This was also, errm, realtime programming - I did it just then.
Let me know if this is enough...
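
The effect of the change can be sketched outside PHP. This is only an illustration in Python, not the web frontend code: the host names and values are made up, and the two function names are hypothetical. It contrasts taking the menu entries from the first host only with collecting the metric names seen on any host.

```python
def first_host_metrics(metrics):
    """Metric names taken from the first host only (the stock behaviour quoted above)."""
    if not metrics:
        return []
    first_host = next(iter(metrics))
    return list(metrics[first_host])


def all_host_metrics(metrics):
    """Metric names seen on any host, in first-seen order (the idea behind $rgmetrics)."""
    seen = {}
    for host_metrics in metrics.values():
        for name in host_metrics:
            seen.setdefault(name, name)
    return list(seen)


# Hypothetical sample data: only db1 reports mysql_slave.
metrics = {
    "web1": {"load_one": 0.4, "cpu_user": 12.0},
    "db1": {"load_one": 1.1, "cpu_user": 30.0, "mysql_slave": 3},
}
```

With this sample, the first-host list misses mysql_slave because web1 happens to come first, while the union contains it.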

regards,
Richard


-Original Message-
From: [EMAIL PROTECTED]
[mailto:[EMAIL PROTECTED] On Behalf Of Ben
Hartshorne
Sent: 04 May 2006 08:00
To: ganglia-general@lists.sourceforge.net
Subject: [Ganglia-general] Metric pull-down menu not showing all metrics


All,

I am curious how the Metric menu in the cluster view gets populated.  I
have a number of my hosts reporting metrics that the others don't.   For
example, the hosts that are running mysql and replicating from a
different database report how many seconds they are behind their master,
but only 10 out of my 30 hosts run mysql.  

The mysql_slave ganglia metric does not usually show in the Metric
pull-down menu.  

Previously, I had only one cluster, so clicking on 'Grid' just went
straight to the cluster.  For some reason, after clicking on 'Grid', I
could see all the metrics that are reported.  As soon as I chose a
metric, only some of the metrics were present in the Metric pull-down
menu.  I think only the metrics present on the first host in the cluster
list are present in the pull-down menu.  

Now I have more than one cluster in my grid, so clicking on Grid no
longer gives me all the metrics in the Metric menu.  I am now unable to
see my mysql_slave metric without manually typing it into the URL
string.

Suggestions?

-ben

-- 
Ben Hartshorne
email: [EMAIL PROTECTED]
http://ben.hartshorne.net









RE: [Ganglia-general] Multiple gmetads

2006-05-02 Thread Richard.Grevis
Did you specify different ports for each copy of gmetad?
This means you will need 2 copies of gmetad.conf (port numbers
are specified there), and depending on how you want it to
work you will need to have 2 php web directories and set
$ganglia_port = ???;   inside conf.php
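
A second gmetad on the same box cannot start if it tries to listen on ports the first one already holds. The sketch below is a rough check, in Python, for two gmetad.conf texts. It assumes the listening ports are set with the xml_port and interactive_port directives and that the defaults are 8651 and 8652; the function names are hypothetical.

```python
import re

# Assumed gmetad defaults when a directive is absent from the config file.
DEFAULT_PORTS = {"xml_port": 8651, "interactive_port": 8652}


def listen_ports(conf_text):
    """Return the ports a gmetad would listen on for the given config text."""
    ports = dict(DEFAULT_PORTS)
    for line in conf_text.splitlines():
        line = line.split("#", 1)[0]  # drop comments
        match = re.match(r"\s*(xml_port|interactive_port)\s+(\d+)", line)
        if match:
            ports[match.group(1)] = int(match.group(2))
    return ports


def conflicting_ports(conf_a, conf_b):
    """Ports that both configurations would try to bind."""
    a = set(listen_ports(conf_a).values())
    b = set(listen_ports(conf_b).values())
    return sorted(a & b)
```

An empty result means the two copies at least do not collide on their listening ports; each copy presumably also wants its own RRD directory.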

Also remember that you can turn debugging on:
 "gmetad -d3 -c configfile"

regards,
richard
-Original Message-
From: [EMAIL PROTECTED]
[mailto:[EMAIL PROTECTED] On Behalf Of Phil
Dibowitz
Sent: 02 May 2006 19:08
To: ganglia-general@lists.sourceforge.net
Subject: [Ganglia-general] Multiple gmetads


Hey folks,

We have a situation where we have 4 "grids" made up of several decent
sized "clusters". All works well with "scalability" set to "on".

But we'd like to run a box that pulls directly from each cluster for the
purposes of having all the RRD data on one box. However, setting up a
box and turning "scalability" off, yields breaks in data...

So I thought of running multiple gmetads on a box, each pulling from all
the clusters in one grid directly.

However, even if I specify different PID files, I can't make gmetad run
more than once. I haven't started digging into the code to figure out
why yet, but is there an actual conflict here, or should I be able to
hack around this?

-- 
Phil Dibowitz
P: 310-360-2330 C: 213-923-5115
Unix Admin, Ticketmaster.com






RE: [Ganglia-general] 2 clusters same subnet

2006-04-28 Thread Richard.Grevis
Chris,

with unicast, the cluster derives its name from the head-node
configuration only. By head-node, I mean the nodes that appear
in the gmetad configuration as you have detailed below.

So in your case, if you have a separate head-node for each cluster,
there is no need to use different ports, whether or not they are
on the same subnet.

kind regards,
richard

-Original Message-
From: [EMAIL PROTECTED]
[mailto:[EMAIL PROTECTED] On Behalf Of Eli
Stair
Sent: 26 April 2006 22:34
To: Botka, Christopher; ganglia-general@lists.sourceforge.net
Subject: Re: [Ganglia-general] 2 clusters same subnet



Double-check and make sure you've got the cluster participants for the
discrete clusters unicasting to another gmond that is listening on the
correct port.  

Start running tethereal on the collecting gmond host and make sure
you're not getting ICMP port unreachable, or another issue.

/eli 

On 4/26/06 2:13 PM, "Botka, Christopher"
<[EMAIL PROTECTED]> wrote:

> 
> There was a thread a while back where the question - "I have 2 
> clusters on the same subnet, how do I get gmetad to display them as 2 
> clusters on the web frontend?" - was also answered:
> 
> You need to separate the ports where your clusters multicast. Default
> is 8649. Select another port (8648) for your second cluster.
>  
>   Then you need to define two data sources in gmetad.conf (you only 
> need one of those).
>  
>  data_source "cluster 1" node_in_cluster_1:8649
>  data_source "cluster 2" node_in_cluster_2:8648
>  
> My follow-up question is: does this hold true for a unicast 
> configuration? I cannot run multicast and have tried to configure as 
> suggested above.  However, I am not seeing any stats gathered on the 
> gmetad host for the :8648 data sources.
> 
> Any suggestions would be appreciated.
> 
> Thx,
> 
> CB
> 
> 
> 












[Ganglia-general] Date timestamp in XML for a cluster leapt backwards.

2006-04-20 Thread Richard.Grevis
All,

I have just seen that the date as reported in the XML stream
for one of my clusters went backwards by about 20 days recently.

This of course caused the RRD update to fail because it was attempting
to update with an earlier timestamp.

A netcat of the cluster port 8649 revealed that the cluster timestamp
as well as timestamps for all the hosts was now well in the past.

Looking at the code all timestamps would be affected if the headnode got
confused about what was now, and now is calculated by calling:
now = apr_time_now();
Note that this is a windows cluster not a linux one.

Anyone ever seen this sort of thing happen?
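
One way to catch this before the RRD updates start failing is to compare the cluster timestamp in the XML with the clock of the polling machine. The following Python sketch assumes the XML carries a LOCALTIME attribute on each CLUSTER element; the function name and the 300 second tolerance are illustrative choices.

```python
import xml.etree.ElementTree as ET


def clusters_with_clock_skew(xml_text, now, tolerance=300):
    """Return (cluster name, seconds behind) for clusters whose LOCALTIME is off.

    A positive offset means the cluster timestamp lies in the past
    relative to `now` (both in seconds since the epoch).
    """
    root = ET.fromstring(xml_text)
    skewed = []
    for cluster in root.iter("CLUSTER"):
        localtime = int(cluster.get("LOCALTIME"))
        if abs(now - localtime) > tolerance:
            skewed.append((cluster.get("NAME"), now - localtime))
    return skewed


# Hypothetical, heavily trimmed XML for illustration.
SAMPLE = (
    '<GANGLIA_XML VERSION="3.0.1" SOURCE="gmond">'
    '<CLUSTER NAME="win" LOCALTIME="1000"></CLUSTER>'
    '</GANGLIA_XML>'
)
```

Calling it with `now=time.time()` against the text read from port 8649 would flag a head node whose idea of "now" has jumped.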

kind regards,
richard









RE: [Ganglia-general] Ganglia reporting all nodes down...except 1

2006-04-20 Thread Richard.Grevis
I agree with Steve, Chris,

I would suggest that (at least until you are more confident)
you change the gmond.conf config to unicast to your headnode, i.e.:
udp_send_channel {
  #mcast_join = 239.255.160.2
  host = your-headnode-hostname
  port = 8649
}

And also confirm all seems OK by netcatting your headnode XML port:
nc your-headnode 8649
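
If netcat is not to hand, the same check can be sketched in Python. This is a minimal illustration, assuming gmond writes its whole XML document and then closes the connection; the function names are hypothetical.

```python
import socket
import xml.etree.ElementTree as ET


def fetch_xml(host, port=8649, timeout=10.0):
    """Read everything the head node sends on its XML port, like `nc host port`."""
    chunks = []
    with socket.create_connection((host, port), timeout=timeout) as sock:
        while True:
            data = sock.recv(65536)
            if not data:  # peer closed the connection: document complete
                break
            chunks.append(data)
    return b"".join(chunks)


def host_names(xml_data):
    """List the NAME attribute of every HOST element in the document."""
    root = ET.fromstring(xml_data)
    return [host.get("NAME") for host in root.iter("HOST")]
```

`host_names(fetch_xml("your-headnode"))` should then list every node that is unicasting to the head node.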

Hopefully it will all just work.

- richard

-Original Message-
From: [EMAIL PROTECTED]
[mailto:[EMAIL PROTECTED] On Behalf Of Steven
A. DuChene
Sent: 20 April 2006 01:15
To: Botka,Christopher
Cc: ganglia-general@lists.sourceforge.net
Subject: Re: [Ganglia-general] Ganglia reporting all nodes down...except 1


Chris:
Ganglia does multicast out of the box and my guess is that multicast
support is not turned on in the Cisco 6500 switch you are using to
interconnect your nodes.
--
Steve

-Original Message-
>From: "Botka, Christopher" <[EMAIL PROTECTED]>
>Sent: Apr 19, 2006 3:15 PM
>To: ganglia-general@lists.sourceforge.net
>Subject: [Ganglia-general] Ganglia reporting all nodes down...except 1
>
>
>Hi,
>
>I have a small cluster (15 nodes x86_64 RHE4/FC4 mixed about 50/50).  I
>recently built and installed v3.0.3 making only minor changes to the 
>gmond.conf file:  name, owner, latlong, url and location.  The
>gmond.conf file is identical across the nodes and they are running the 
>same NFS shared binary.  The only change I made to gmetad.conf is the 
>data_source line.
>
>If I am only running gmond on the head node (where gmetad is running) 
>ganglia reports correct and accurate info in the Physical View for the 
>head node, however the images are empty where graphs are supposed to 
>be. All I see is the std. red "x" and image name with a border.
>
>If I start gmond on any other node, the node will infrequently report a
>value for load and will appear up for about 1-3 mins, but 
>CPU/Hardware/Software are not reported - same problem with the images 
>of the graphs.
>
>If I start gmond on all the nodes and then stop it on the head node, 
>node01 will begin reporting accurately in the Physical View, all others
>reported as down, no images.  If I stop node01, node02 reports 
>correctly (still no images), etc.  I have done this up through node05 
>with same results.
>
>Nothing weird in dmesg or messages and everything looks OK when I run 
>gmond in debug.  When I look at the debug of gmetad, the updates of all
>nodes except the one reporting correctly are very infrequent.
>
>All these nodes are single 1Gb nic attached to the same blade on a 
>cisco 6500.  Windows DHCP and DNS ;-0. I have also tried 
>installing/running the rpm on some of the nodes with the same outcome.
>
>This is the first time I have used Ganglia, so I am not certain if I am
>making an obvious newbie mistake.  Any & all suggestions are welcome, 
>the amount of time I have already spent without a breakthrough is 
>embarrassingly large.
>
>Thx,
>
>Chris
>
>
>
>












RE: [Ganglia-general] enlarge ganglia graphs

2006-04-19 Thread Richard.Grevis
Bernard,
 
sure - no problem. Give me a day as I need to tease apart other customisations
I have done that people would not be interested in.
 
- richard

-Original Message-
From: Bernard Li [mailto:[EMAIL PROTECTED] 
Sent: 18 April 2006 21:57
To: Grevis, Richard: IT (LDN); [EMAIL PROTECTED]; 
ganglia-general@lists.sourceforge.net
Subject: RE: [Ganglia-general] enlarge ganglia graphs


Hi Richard:
 
It would be great if you can submit a patch for this.  I think it would 
be useful...
 
Thanks,
 
Bernard




From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] 
Sent: Tuesday, April 18, 2006 6:34
To: [EMAIL PROTECTED]; ganglia-general@lists.sourceforge.net; 
Bernard Li
Subject: RE: [Ganglia-general] enlarge ganglia graphs


Martin,
 
what you are looking at is a customisation that UC Berkeley have 
done, e.g.:

http://monitor.millennium.berkeley.edu/graph.php?g=load_report&z=huge&c=PSI%20Cluster&m=&r=week&s=descending&hc=4&st=1145366202
 
The graph size is set to huge, which is not standard in ganglia.
 
If you want to do this sort of thing, you will need to
1) in graph.php, about line 50, add (say)
else if ($size == "huge")
{
  $height = 750;
  $width = 1024;
}
 
2) Create the link from the medium image in the cluster view to 
the big one.
e.g. in 
/var/www/html/ganglia/templates/default/cluster_view.tpl, at line 20 and other 
places, put this:




 
It could be argued that this sort of thing could be placed into 
the standard ganglia. Certainly I have already done this sort of
thing in my ganglia implementation.
 

kind regards,
Richard



-Original Message-
From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On 
Behalf Of [EMAIL PROTECTED]
Sent: 18 April 2006 09:05
To: ganglia-general@lists.sourceforge.net; [EMAIL PROTECTED]
Subject: FW: [Ganglia-general] enlarge ganglia graphs


 
-Original Message-
From: Pelikan, Martin 
Sent: Tuesday, 18 April 2006 09:56
To: 'ganglia-general@lists.sourceforge.net'
Subject: FW: [Ganglia-general] enlarge ganglia graphs



Hi Bernard, 
 
Sorry, the attachments were too tall ;-)
 
My main page already has 3 sections (the summary and 
one for each data source); I can also click on the graphs of the sections and 
see the summary of every cluster (data source). I can also "click" through to each 
node. 
 
I can't see the large view 
(example_what_i_do_not_see_*.jpg)
 
Regards,
Martin
 
 

-Original Message-
From: Bernard Li [mailto:[EMAIL PROTECTED] 
Sent: Tuesday, 18 April 2006 08:58
To: Pelikan, Martin; 
ganglia-general@lists.sourceforge.net
Subject: RE: [Ganglia-general] enlarge ganglia graphs


Hi Martin:
 
If you have 2 data sources, then the main page 
should have three sections - one for each data source and a third one, the 
"summary" of both.  You should be able to click on either one of the graphs of 
the data sources (but not the summary).
 
Perhaps you can take a screenshot and tell us 
exactly what you wanted.
 
Cheers,
 
Bernard



From: [EMAIL PROTECTED] [

RE: [Ganglia-general] Nodes Reported as Dead

2006-04-19 Thread Richard.Grevis
Chris,

3.0 gmond?

This version of the agent will have the truncated XML problem,
although I have only seen "no element found" errors on parse, as opposed
to what Chris is seeing - which sounds like perhaps a partially
constructed XML tree in gmetad memory which then blows up the
subsequent processing?

Chris, to see if the truncated XML fix makes this go away, you
only have to upgrade the gmond on the particular headnode that
is configured into the gmetad config file. This is the only
process generating the XML for the cluster.

If the perl script returned nothing, then that is good, but
the script does not actually parse the XML of course, so it will
not alert you to truncated or malformed XML.
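
To actually test for truncated or malformed XML, the stream has to go through a real parser. A minimal Python sketch follows; the function name is hypothetical, and the input is whatever text was read from the headnode port. Expat-based parsers typically describe a document that stops early with a message along the lines of "no element found".

```python
import xml.etree.ElementTree as ET


def xml_problem(xml_text):
    """Return None if the text parses as XML, else the parser's error message."""
    try:
        ET.fromstring(xml_text)
    except ET.ParseError as err:
        return str(err)
    return None


# Hypothetical, heavily trimmed stream and a copy cut off before the closing tags.
COMPLETE = '<GANGLIA_XML><CLUSTER NAME="c"><HOST NAME="h"/></CLUSTER></GANGLIA_XML>'
TRUNCATED = '<GANGLIA_XML><CLUSTER NAME="c"><HOST NAME="h"/>'
```

Running this against each headnode alongside the duplicate-host script would cover the case the script cannot see.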

- richard

-Original Message-
From: [EMAIL PROTECTED]
[mailto:[EMAIL PROTECTED] On Behalf Of Martin
Knoblauch
Sent: 18 April 2006 18:32
To: Stackpole, Chris; ganglia-general@lists.sourceforge.net
Subject: RE: [Ganglia-general] Nodes Reported as Dead


Hi Chris,

 is there anything in the headnode's /var/log/messages or in the
webserver logfiles?

 As for the versions:

- webfrontend 2.5.7 is likely OK
- gmetad 3.0.3 should be OK. We actually fixed some dead-node related
stuff there
- gmond 3.0.0 is not a great idea. Please use at least 3.0.2.

 Not sure whether this was asked already: are your clocks in sync? It is
highly recommended that they are.

Martin

--- "Stackpole, Chris" <[EMAIL PROTECTED]> wrote:

> Thanks for the input.
> 
> The perl script returns nothing when it is run. I checked, netcat is 
> installed and working. Thinking that something was wrong, I removed 
> the comments out from the print statements and got a bunch of 
> information. The line `print "$headnode $info\n"` returns: localhost 
> GridMonitor IP=(all IPs are correct) TN=(different single digit 
> numbers)
> 
> I also added several print statements within the script and it appears

> as if everything is running great. Since all of the hosts are up and 
> just incorrectly reported, should it return anything? What should I be

> looking for?
> 
> I am still researching the other possibility.
> 
> Thank you for your help,
> Chris Stackpole
> 
> 
> 
> 
> -Original Message-
> From: [EMAIL PROTECTED]
> [mailto:[EMAIL PROTECTED]
> Sent: Tuesday, April 18, 2006 9:23 AM
> To: Stackpole, Chris; ganglia-general@lists.sourceforge.net
> Subject: RE: [Ganglia-general] Nodes Reported as Dead
> 
> 
> Chris,
> 
> possibility 1 - look for "Possible bug in hosts up calculation when 
> federating clusters" in the mail archive. But if you are using the 
> 3.0.3 release this should be fixed.
> The reason that the XML stream version affects hosts up is that the
> test of liveness changed between the pre-2.5 and post-2.5 versions:
> the old method would be sensitive to clock variations between the
> server and monitored hosts.
> 
> possibility 2 - Do your servers have dual NICs that are not teamed? Or
> dodgy reverse DNS entries?
> Try this little script to check a few things about your XML streams:
> #!/usr/bin/perl
> 
> #   Poll Ganglia headnodes and check for duped hosts.
> #   Richard Grevis,  Wed Mar 29 22:28:20 BST 2006
> 
> sub slurp {
>     my ($headnode, $port) = @_;
>     #print "<$headnode> <$port>\n";
>     open (FD, "nc $headnode $port |") or die "netcat";
>     while (<FD>) {
>         # Remember the current cluster name from its opening tag.
>         ($cluster) = / NAME="([^"]*)" / if (/^<CLUSTER/);
>         if (($host,$IP,$TN,$TMAX) =
>             m/\sNAME="([^"]*)"\s+IP="([^"]+)".*TN="([^"]*)".*TMAX="([^"]*)"/) {
>             $info = "$cluster IP=$IP TN=$TN";
>             $host =~ s/\..*// if length($host) > 16;
>             if (defined $info{$host}) {
>                 # Seen this host before - that's not right.
>                 print "$host dup in XML: <$info> <$info{$host}>\n";
>             } else {
>                 #print "$headnode $info\n";
>                 $info{$host} = $info;
>             }
>         }
>     }
> }
> 
> open (CONF, "/etc/gmetad.conf") or die "/etc/gmetad.conf";
> while (<CONF>) {
>     s/#.*//;
>     chop;
>     if (($headnode, $port) = ($_ =~
>         m/data_source\s+"[^"]+"\s+\d+\s+([^:\s]+)(:\d+){0,1}/)) {
>         if ($port) {
>             $port =~ s/://;
>         } else {
>             $port = 8649;
>         }
>         slurp($headnode, $port);
>     }
> }
> 
> 
> - richard
> -Original Message-
> From: [EMAIL PROTECTED]
> [mailto:[EMAIL PROTECTED] On Behalf Of 
> Stackpole, Chris
> Sent: 18 April 2006 14:41
> To: ganglia-general@lists.sourceforge.net
> Subject: [Ganglia-general] Nodes Reported as Dead
> 
> 
> "Good morning,
> Afraid this is going to require a little back story. We were 
> interested in using Ganglia to monitor a few priority systems. Because

> we were running Debian at the time we just used the 2.5.7-2 Ganglia 
> that is 

RE: [Ganglia-general] Nodes Reported as Dead

2006-04-18 Thread Richard.Grevis
Chris,

possibility 1 - look for "Possible bug in hosts up calculation when
federating clusters" in the mail archive. But if you are using the
3.0.3 release this should be fixed.
The reason that the XML stream version affects hosts up is that the
test of liveness changed between the pre-2.5 and post-2.5 versions:
the old method would be sensitive to clock variations between the
server and monitored hosts.

possibility 2 - Do your servers have dual NICs that are not teamed? Or
dodgy reverse DNS entries?
Try this little script to check a few things about your XML streams:
#!/usr/bin/perl

#   Poll Ganglia headnodes and check for duped hosts.
#   Richard Grevis,  Wed Mar 29 22:28:20 BST 2006

sub slurp {
    my ($headnode, $port) = @_;
    #print "<$headnode> <$port>\n";
    open (FD, "nc $headnode $port |") or die "netcat";
    while (<FD>) {
        # Remember the current cluster name from its opening tag.
        ($cluster) = / NAME="([^"]*)" / if (/^<CLUSTER/);
        if (($host,$IP,$TN,$TMAX) =
            m/\sNAME="([^"]*)"\s+IP="([^"]+)".*TN="([^"]*)".*TMAX="([^"]*)"/) {
            $info = "$cluster IP=$IP TN=$TN";
            $host =~ s/\..*// if length($host) > 16;
            if (defined $info{$host}) {
                # Seen this host before - that's not right.
                print "$host dup in XML: <$info> <$info{$host}>\n";
            } else {
                #print "$headnode $info\n";
                $info{$host} = $info;
            }
        }
    }
}

open (CONF, "/etc/gmetad.conf") or die "/etc/gmetad.conf";
while (<CONF>) {
    s/#.*//;
    chop;
    if (($headnode, $port) = ($_ =~
        m/data_source\s+"[^"]+"\s+\d+\s+([^:\s]+)(:\d+){0,1}/)) {
        if ($port) {
            $port =~ s/://;
        } else {
            $port = 8649;
        }
        slurp($headnode, $port);
    }
}
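
On possibility 1, the difference between the two styles of liveness test can be sketched in Python. This is not the Ganglia code: the function names, the 80 second limit and the factor of 4 are assumptions chosen for illustration. The first test compares a timestamp against the server's clock, so clock drift alone can mark a host dead; the second uses only TN (seconds since the last report) and TMAX (the longest expected gap).

```python
def alive_by_timestamps(server_now, host_reported, max_age=80):
    """Clock-based test: compares a host-side timestamp with the server's clock."""
    return abs(server_now - host_reported) <= max_age


def alive_by_tn(tn, tmax, factor=4):
    """Interval-based test: alive while TN stays within a multiple of TMAX."""
    return tn <= factor * tmax


# Hypothetical host that reported 5 seconds ago but whose clock runs 10 minutes slow.
server_now = 1145366202
host_reported = server_now - 5 - 600
```

Here the clock-based test calls the host dead even though it reported 5 seconds ago, while the interval-based test with TN=5 and TMAX=20 calls it alive.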


- richard
-Original Message-
From: [EMAIL PROTECTED]
[mailto:[EMAIL PROTECTED] On Behalf Of
Stackpole, Chris
Sent: 18 April 2006 14:41
To: ganglia-general@lists.sourceforge.net
Subject: [Ganglia-general] Nodes Reported as Dead


"Good morning,
Afraid this is going to require a little back story. We were interested
in using Ganglia to monitor a few priority systems. Because we were
running Debian at the time we just used the 2.5.7-2 Ganglia that is in
the Debian Sarge (stable) apt repository. The project grew to include many
other systems. When we had a hardware failure, we quickly moved the
Ganglia monitor to a Mandrake 10 box about a week ago. The project is
much larger now and I had all sorts of problems. I went through the
email list and fixed almost all of them, but one recurring theme was
that I really should update to a newer version as there are many more
updates, changes, and fixes. So I did. Probably not the best choice, but
what's done is done. While it is important to have this running soon, I do
have time to rebuild if absolutely necessary.

The Current setup.
The system that is running the Ganglia monitor is a Mandrake 10.1 box:
Gmetad Web Frontend v2.5.7, Gmetad Web Backend v3.0.3. The nodes are
all running Gmond 3.0.0 (a few of the computers are Windows
and there are many flavors of Linux, so I updated to this version across
the board in hopes of keeping things on their end all as close as
possible).

The Problems.
1) Now at any given time at least half of the computers are reported as
dead, even though they are not. Doing a `telnet computer 8649` gives the
appropriate data. "Get Fresh Data" will usually change out which nodes
are dead and given a 30min cycle most will have switched.

2) Even though this has been running for many hours, some of the "alive"
nodes report inaccuracies. Like one node for example "Last heartbeat
received -209998 seconds ago" "Uptime -975 days, 16:27:49"
"Swap: Using 0.0 of -100Mb"
"Booted: January 1, 1970"
The inaccuracies change every so often and it will report correctly for
a while. Most of those I don't care about but I think it may be a
related problem.

3) The "dead" nodes are almost all spot on with their stats, and if you
go to the node view and click "Get Fresh Data", the Load and CPU
Utilization do update in sync even though the node is reported as dead.


Maybe I missed the keywords, but I was not able to find anything quite
like this in the email archive. I would be very grateful if anyone has
any clues as to what may be going on.

Thank you for your time,
Chris Stackpole



RE: [Ganglia-general] enlarge ganglia graphs

2006-04-18 Thread Richard.Grevis
Martin,
 
what you are looking at is a customisation that UC Berkeley have done, e.g.:
http://monitor.millennium.berkeley.edu/graph.php?g=load_report&z=huge&c=PSI%20Cluster&m=&r=week&s=descending&hc=4&st=1145366202
 
The graph size is set to huge, which is not standard in ganglia.
 
If you want to do this sort of thing, you will need to
1) in graph.php, about line 50, add (say)
else if ($size == "huge")
{
  $height = 750;
  $width = 1024;
}
 
2) Create the link from the medium image in the cluster view to the big one.
e.g. in /var/www/html/ganglia/templates/default/cluster_view.tpl, at line 20 
and other places, put this:




 
It could be argued that this sort of thing could be placed into the standard 
ganglia. Certainly I have already done this sort of thing in my ganglia
implementation.
 
kind regards,
Richard



-Original Message-
From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of [EMAIL PROTECTED]
Sent: 18 April 2006 09:05
To: ganglia-general@lists.sourceforge.net; [EMAIL PROTECTED]
Subject: FW: [Ganglia-general] enlarge ganglia graphs


 
-Original Message-
From: Pelikan, Martin 
Sent: Tuesday, 18 April 2006 09:56
To: 'ganglia-general@lists.sourceforge.net'
Subject: FW: [Ganglia-general] enlarge ganglia graphs



Hi Bernard, 
 
Sorry, the attachments were too tall ;-)
 
My main page already has 3 sections (the summary and one for each data 
source); I can also click on the graphs of the sections and see the 
summary of every cluster (data source). I can also "click" through to each node. 
 
I can't see the large view (example_what_i_do_not_see_*.jpg)
 
Regards,
Martin
 
 

-Original Message-
From: Bernard Li [mailto:[EMAIL PROTECTED] 
Sent: Tuesday, 18 April 2006 08:58
To: Pelikan, Martin; ganglia-general@lists.sourceforge.net
Subject: RE: [Ganglia-general] enlarge ganglia graphs


Hi Martin:
 
If you have 2 data sources, then the main page should have 
three sections - one for each data source and a third one, the "summary" of 
both.  You should be able to click on either one of the graphs of the data 
sources (but not the summary).
 
Perhaps you can take a screenshot and tell us exactly what you 
wanted.
 
Cheers,
 
Bernard



From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED]
Sent: Mon 17/04/2006 23:23
To: Bernard Li; ganglia-general@lists.sourceforge.net
Subject: RE: [Ganglia-general] enlarge ganglia graphs



Hi Bernard,

I have more than one data_source; I'm monitoring 2 different 
clusters; on each one gmond is collecting the data.
One cluster uses port 8050, the other uses port 8054. 
The gmetad on a separate node collects all the data
and puts it into the rrd database. How can I get the 
large view in my case?

Regards,
Martin


-Original Message-
From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of 
Bernard Li
Sent: Thursday, 13 April 2006 21:28
To: Pelikan, Martin; ganglia-general@lists.sourceforge.net
Subject: RE: [Ganglia-general] enlarge ganglia graphs


Hi Martin:

If you only have one data_source, then you are already at the 
expanded view.

Cheers,

Bernard

> -Original Message-
> From: [EMAIL PROTECTED]
> [mailto:[EMAIL PROTECTED] On
> Behalf Of [EMAIL PROTECTED]
> Sent: Thursday, April 13, 2006 1:49
> To: ganglia-general@lists.sourceforge.net
> Subject: [Ganglia-general] enlarge ganglia graphs
>
>
>
> -Original Message-
> From: Pelikan, Martin
> Sent: Wednesday, 12 April 2006 10:51
> To: 'ganglia-general@lists.sourceforge.net'
> Subject: enlarge ganglia graphs
>
>
> Hello,
>
> I have installed ganglia 3.0.2 and it works fine.
  

RE: [Ganglia-general] A script that checks clusters for down and duplicated hosts in clusters

2006-03-31 Thread Richard.Grevis
Eli and others,
 
just realized that the pattern for the /etc/gmetad.conf data_source spec is not
quite good enough.
($headnode, $port) = ($_ =~
m/data_source\s+"[^"]+"\s+\d+\s+([^:]+)(:\d+){0,1}/)
only works when a single headnode is mentioned. This is better, but
still only matches the first headnode:
(($headnode, $port) = ($_ =~
m/data_source\s+"[^"]+"\s+\d+\s+([^:\s]+)(:\d+){0,1}/))
 
code becomes:
#!/usr/bin/perl
 
#   Poll Ganglia headnodes and check for down hosts and duped hosts.
#   Richard Grevis,  Wed Mar 29 22:28:20 BST 2006
 
sub slurp {
    my ($headnode, $port) = @_;
    print "<$headnode> <$port>\n";
    open (FD, "nc $headnode $port |") or die "netcat";
    while (<FD>) {
        # Remember the current cluster name from its opening tag.
        ($cluster) = / NAME="([^"]*)" / if (/^<CLUSTER/);
        if (($host,$IP,$TN,$TMAX) =
            m/\sNAME="([^"]*)"\s+IP="([^"]+)".*TN="([^"]*)".*TMAX="([^"]*)"/) {
            $info = "$cluster IP=$IP TN=$TN";
            if (defined $info{$host}) {
                print "$host dup in XML: <$info> <$info{$host}>\n";
            } else {
                $info{$host} = $info;
            }
            if ($TN > 10*$TMAX) {
                printf("$host has not reported for %.1f hours: $info\n",
                    $TN/(60*60));
            }
        }
    }
}
 
open (CONF, "/etc/gmetad.conf") or die "/etc/gmetad.conf";
while (<CONF>) {
    s/#.*//;
    chop;
    if (($headnode, $port) = ($_ =~
        m/data_source\s+"[^"]+"\s+\d+\s+([^:\s]+)(:\d+){0,1}/)) {
        if ($port) {
            $port =~ s/://;
        } else {
            $port = 8649;
        }
        slurp($headnode, $port);
    }
}
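
Since the pattern above still stops at the first headnode, here is a sketch of parsing every headnode on a data_source line. It is written in Python rather than Perl and is only an illustration: it assumes the line format is a quoted name, an optional polling interval, then one or more host[:port] entries, and the function name is hypothetical.

```python
import re

DATA_SOURCE = re.compile(r'^\s*data_source\s+"([^"]+)"\s+(.*)$')


def parse_data_source(line, default_port=8649):
    """Return (cluster name, [(host, port), ...]) or None if not a data_source line."""
    line = line.split("#", 1)[0].strip()  # drop comments, as the Perl does
    match = DATA_SOURCE.match(line)
    if not match:
        return None
    name, rest = match.groups()
    fields = rest.split()
    # An optional polling interval (a plain integer) may precede the host list.
    if fields and fields[0].isdigit():
        fields = fields[1:]
    nodes = []
    for field in fields:
        host, _, port = field.partition(":")
        nodes.append((host, int(port) if port else default_port))
    return name, nodes
```

Each (host, port) pair could then be polled in turn, so a duplicate that only shows up on the second headnode is not missed.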


 

-Original Message-
From: Eli Stair [mailto:[EMAIL PROTECTED] 
Sent: 30 March 2006 18:25
To: Grevis, Richard: IT (LDN)
Cc: Ganglia-general@lists.sourceforge.net
Subject: RE: [Ganglia-general] A script that checks clusters for
down and duplicated hosts in clusters




Hey, thanks for the idea... I'm working this into a nightly
report (maybe hourly...) to check for recurrence of this issue.

Cheers!

/eli

-Original Message-
From: [EMAIL PROTECTED] on behalf of
[EMAIL PROTECTED]
Sent: Thu 3/30/2006 3:04 AM
Cc: Ganglia-general@lists.sourceforge.net
Subject: [Ganglia-general] A script that checks clusters for
down and duplicated hosts in clusters

Fresh off the presses - others may find it useful too. This
iterates
through your clusters
and finds dead hosts or duplicated host entries. Note that you
can't
find duplicated
host entries by netcatting gmetad port 8651. You must do it as
below:
You will need to compile or otherwise have netcat (nc), but this
is
handy regardless.
Enjoy.

#!/usr/bin/perl
#   Poll Ganglia headnodes and check for down hosts and duped
hosts.
#   Richard Grevis,  Thu Mar 30 11:28:20 BST 2006

sub slurp {
my ($headnode, $port) = @_;
#print "<$headnode> <$port>\n";
open (FD, "nc $headnode $port |") or die "netcat";
while (<FD>) {
($cluster) = / NAME="([^"]*)" / if (/^
<$info{$host}>\n";
} else {
$info{$host} = $info;
}
if ($TN > 10*$TMAX) {
printf("$host has not reported for %.1f hours:
$info\n",
$TN/(60*60));
}
}
}
}

open (CONF, "/etc/gmetad.conf") or die "/etc/gmetad.conf";
while (<CONF>) {
s/#.*//;
chomp;
if (($headnode, $port) = ($_ =~
m/data_source\s+"[^"]+"\s+\d+\s+([^:]+)(:\d+){0,1}/)) {
if ($port) {
$port =~ s/://;
} else {
$port = 8649;
}
slurp($headnode, $port);
}
}





For more information about Barclays Capital, please
visit our web site at http://www.barcap.com.


Internet communications are not secure and therefore the
Barclays
Group does not accept legal responsibility for the contents of
this
message.  Although the Barclays Group operates anti-virus
programmes,
it does not accept responsibility for any damage whatsoever that
is
caused by viruses being passed.  Any views or opinions presented
are
solely those of the author and do not necessarily represent
those of the
Barclays Group.  Replies to this email may be monitored by the
Barclays
Group for operational or business reasons.









[Ganglia-general] A script that checks clusters for down and duplicated hosts in clusters

2006-03-30 Thread Richard.Grevis
Fresh off the presses - others may find it useful too. This iterates
through your clusters
and finds dead hosts or duplicated host entries. Note that you can't
find duplicated
host entries by netcatting gmetad port 8651. You must do it as below:
You will need to compile or otherwise have netcat (nc), but this is
handy regardless.
Enjoy.
 
#!/usr/bin/perl
#   Poll Ganglia headnodes and check for down hosts and duped hosts.
#   Richard Grevis,  Thu Mar 30 11:28:20 BST 2006
 
sub slurp {
my ($headnode, $port) = @_;
#print "<$headnode> <$port>\n";
open (FD, "nc $headnode $port |") or die "netcat";
while (<FD>) {
($cluster) = / NAME="([^"]*)" / if (/^
<$info{$host}>\n";
} else {
$info{$host} = $info;
}
if ($TN > 10*$TMAX) {
printf("$host has not reported for %.1f hours: $info\n",
$TN/(60*60));
}
}
}
}
 
open (CONF, "/etc/gmetad.conf") or die "/etc/gmetad.conf";
while (<CONF>) {
s/#.*//;
chomp;
if (($headnode, $port) = ($_ =~
m/data_source\s+"[^"]+"\s+\d+\s+([^:]+)(:\d+){0,1}/)) {
if ($port) {
$port =~ s/://;
} else {
$port = 8649;
}
slurp($headnode, $port);
}
}
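
The same two checks the script performs (duplicated host names, and hosts whose TN exceeds 10 times TMAX) can be sketched in Python on an XML dump that has already been fetched from a gmond. This is only an illustration: check_cluster, stale_factor and the SAMPLE data are hypothetical, and the sample is far smaller than real gmond output.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict


def check_cluster(xml_text, stale_factor=10):
    """Return (duplicates, stale) for one gmond XML dump.

    duplicates maps a hostname to every IP reported under that name;
    stale lists (hostname, hours) where TN > stale_factor * TMAX.
    """
    root = ET.fromstring(xml_text)
    ips_by_name = defaultdict(list)
    stale = []
    for host in root.iter("HOST"):
        name = host.get("NAME")
        ips_by_name[name].append(host.get("IP"))
        tn, tmax = int(host.get("TN")), int(host.get("TMAX"))
        if tn > stale_factor * tmax:
            stale.append((name, tn / 3600.0))
    duplicates = {n: ips for n, ips in ips_by_name.items() if len(ips) > 1}
    return duplicates, stale


# Hypothetical, heavily trimmed dump; real output carries more attributes.
SAMPLE = """<GANGLIA_XML VERSION="3.0.2" SOURCE="gmond">
<CLUSTER NAME="demo">
<HOST NAME="node1" IP="10.0.0.1" TN="5" TMAX="20"/>
<HOST NAME="node1" IP="10.0.0.2" TN="8" TMAX="20"/>
<HOST NAME="node2" IP="10.0.0.3" TN="7200" TMAX="20"/>
</CLUSTER>
</GANGLIA_XML>"""

duplicates, stale = check_cluster(SAMPLE)
print(duplicates)
print(stale)
```

Here node1 shows up under two IP addresses and node2 has not reported for two hours.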









RE: [Ganglia-general] Re: gmetad not updating RRD's/hosts that are proper in gmond XML

2006-03-30 Thread Richard.Grevis
Eli,

Martin is most surely right. If you are running an unpatched 3.0.2,
let me share with you the many ways it can all go wrong.

gmond generates the hostnames found in the XML stream by reverse DNS
lookup only. Its internal structures treat every different IP address
it sees as a different host, regardless of what the reverse DNS entry
is.

So, if you have
1) Incorrect reverse DNS entries such that 2 different hosts reverse map
   to the same hostname,
2) Or 2 NICs on a host that are not teamed (i.e. 2 different addresses)
   and the routing allows packets to exit either NIC, hence either
   source address may be used,
3) Or a DHCP lease renewal that results in a host changing IP addresses,

then what will happen is that the XML stream from the cluster will
contain 2 (or more) entries with different IP addrs, but the same name.
Even in the DHCP case, when only 1 source address is used at a time,
gmond will keep the old IP address entry until a timeout, even though it
is not being updated. So dups arise again.

Now unfortunately, gmetad only uses the HOSTNAME for the RRD files and
its own processing. So if there is a duplicated hostname in the XML
stream, it will update the RRDs after parsing the first entry, and then
again after parsing the second. As these 2 updates to the same RRD files
will occur in less than one second, this results in an RRD update error.

On unpatched 3.0.2, this then causes THE ENTIRE PROCESSING OF THE
CLUSTER TO BE ABANDONED. So some hosts get updated, some not, and the
cluster view does not get updated. If you patch this particular issue,
you will still get double processing for duped hosts, which can result
in them erroneously being reported as down (for example).
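
To make the failure mode concrete, here is a toy model in Python. It is not gmetad or rrdtool code: ToyRrd and process_poll are invented names, and the only behaviour modelled is that an update is rejected when its timestamp is not later than the previous one. Unlike unpatched 3.0.2, this sketch carries on after a failed update so the effect on the remaining hosts is visible.

```python
class ToyRrd:
    """Toy stand-in for one RRD file; it models only the timestamp rule."""

    def __init__(self):
        self.last_update = None
        self.value = None

    def update(self, timestamp, value):
        # Reject an update that is not strictly later than the previous one.
        if self.last_update is not None and timestamp <= self.last_update:
            raise ValueError("update at %d is not later than last update %d"
                             % (timestamp, self.last_update))
        self.last_update = timestamp
        self.value = value


def process_poll(hosts, now, rrds):
    """Apply one poll's entries, keyed by hostname only; return the failures."""
    failed = []
    for name, ip, load in hosts:
        rrd = rrds.setdefault(name, ToyRrd())
        try:
            rrd.update(now, load)
        except ValueError:
            failed.append((name, ip))
    return failed


# Two entries share the name node1 but carry different IP addresses.
poll = [("node1", "10.0.0.1", 0.5),
        ("node1", "10.0.0.2", 0.7),
        ("node2", "10.0.0.3", 0.1)]
print(process_poll(poll, 1143700000, {}))
```

The second node1 entry hits the same file within the same second and is rejected, which is the point at which unpatched 3.0.2 gives up on the whole cluster.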

phew.
long mail.

- richard

-Original Message-
From: [EMAIL PROTECTED]
[mailto:[EMAIL PROTECTED] On Behalf Of Martin
Knoblauch
Sent: 30 March 2006 08:05
To: Eli Stair
Cc: Ganglia-general@lists.sourceforge.net
Subject: Re: [Ganglia-general] Re: gmetad not updating RRD's/hosts that
are proper in gmond XML


Eli,

 yup. That could definitely cause problems. Do you see anything in the
/var/log/messages of the gmetad host?

 Hmm. You may have to restart *all* gmonds, as well as the gmetad. This
is something that I usually do when my ganglia setup was hosed somehow.
Definitely the case for multicast clusters. Not really sure about
unicast.

 And yes - this is not optimal.

--- Eli Stair <[EMAIL PROTECTED]> wrote:

> 
> The only issue I can find at all with this config is that the new 
> hosts have been deployed by someone with two PTR records, both the 
> proper one
> pointing to the A hostname, as well as all having an improper PTR -> 
> linux."FQDN".
> 
> Is there a potential that gmetad is doing a lookup of both the forward
> and reverse entries for a host before populating it?  Unfortunately 
> removing the invalid entry for a host and restarting gmetad as well
> as 
> the gmond aggregator and the host did not resolve it.
> 
> /eli
> 
> Eli Stair wrote:
> > 
> > My installation started having an issue yesterday afternoon that I
> have
> > yet to explain or remedy.  One cluster that I have unicasting, has
> > started "losing" hosts... the directory entries on disk never get 
> > created for newly deployed hosts, and gmond reports receiving
> messages
> > for the host (and outputs metrics) but gmetad does not report an
> > "updating host" message, and never creates the RRD's even though
> the
> > host is up.
> > 
> > The critical problem is that the report graphs for this cluster
> have
> > stopped being updated as well, which nix'es my ability to view
> cluster
> > load/job level... in addition to not being able to alert on the RRD
> 
> > values for the individual hosts that are malfunctioning.  Those
> hosts
> > that are "good" continue to update their metric RRD's properly,
> their
> > host reports are populated etc.  The bad ones I cannot explain...
> > 
> > The two questions, if anyone has insight:
> > 
> > 1) What is causing gmetad to stop acting on the gmond XML input
> that it
> > has available?  I don't see any error or threshhold it's hitting
> WRT the
> > hosts, they just don't create/update the RRD
> > 
> > 2) Why does the report stop being populated (the graph is still
> > generated with past data, but not updated with new... not even the
> data
> > from hosts that ARE functioning individually.
> > 
> > I'm continuing on with this, will update with anything else I find
> awry.
> >  Any suggestions on what to pursue beyond this are welcome... at
> this
> > point it looks to me a problem with the magic in gmetad's parsing
> of the
> > gmond output, since it is present and up-to-date but not acting on
> it.
> > 
> > Cheers,
> > 
> > /eli
> > 
> > 
> > Here are the details:
> > 
> > server:
> > ganglia 3.0.2 (x86_64)
> > 6 (six) multicast clusters polled by gmetad
> > 1 (one) unicast cluster, reporting to a 'mute' gmond aggregating on
> the
> > same host as gmetad.
> > 
> > clients:
> > suse9.3 x86_64
> > g

RE: [Ganglia-general] gmond unreliable on one cluster, must be constantly restarted

2006-03-30 Thread Richard.Grevis
There are a few simple and obvious steps.

BTW, it is good that TN is greater than TMAX in some sense, because this
means that gmetad and all the php stuff is not saying anything that is
wrong with respect to the XML stream.

So have you done a simple tcpdump of UDP port 8649 on the headnode? Do
the UDP packets arrive?
I am also trying to remember whether you unicast or multicast. At least
with unicast I pretty well know that anything of the right address
arriving on the NIC will be seen by gmond. Consider unicast for
debugging, although perhaps I am being unduly conservative. Also did you
mention whether your headnode has multiple NICs? Are they teamed?

- Richard

-Original Message-
From: [EMAIL PROTECTED]
[mailto:[EMAIL PROTECTED] On Behalf Of Steven
A. DuChene
Sent: 30 March 2006 03:18
To: ganglia-general@lists.sourceforge.net
Subject: [Ganglia-general] gmond unreliable on one cluster, must be
constantly restarted


I have been struggling with a gmond process on one cluster here that
after some indeterminate period of time marks everything else in the
cluster as down, so instead of having a clustersize (as indicated by
the ganglia python command line client) of 135, the clustersize is 1. I
have on the advice of Richard looked at the TN versus TMAX values in the
XML that I get from this host when I telnet to its port 8649 and the TN
values are much bigger than the TMAX values. Richard said this indicated
a problem but I am not sure where to go next with trying to diagnose
this issue. I have ganglia running on other clusters just fine but this
one cluster running 3.0.2 on RHEL4u2 seems to be having an issue.

Any suggestions as to what I can do to continue diagnosing the issue?

-Original Message-
>From: "Steven A. DuChene" <[EMAIL PROTECTED]>
>Sent: Mar 28, 2006 5:48 PM
>To: [EMAIL PROTECTED], 
>ganglia-general@lists.sourceforge.net
>Subject: RE: [Ganglia-general] gmond stops recognizing the rest of the
cluster
>
>Yes, I have confirmed that when this condition occurs the TN figures 
>are MUCH greater than the TMAX figures. I have double checked the 
>routes and stuff is still there (i.e. netstat returns:
>
>Destination     Gateway         Genmask         Flags   MSS Window  irtt Iface
>239.2.11.71     0.0.0.0         255.255.255.255 UH        0 0          0 eth0
>Where the host route for 239.2.11.71 is indeed still associated with 
>the interface on the internal cluster network.
>
>Any suggestions then?
>
>-Original Message-
>>From: [EMAIL PROTECTED]
>>Sent: Mar 22, 2006 4:00 AM
>>To: [EMAIL PROTECTED], 
>>ganglia-general@lists.sourceforge.net
>>Subject: RE: [Ganglia-general] gmond stops recognizing the rest of the
cluster
>>
>>Steven,
>>
>>if the problem is routing or actual packet loss, then that should be 
>>reflected by the XML output of the master gmond - the "down" host will

>>have a TN
>>(much) greater
>>than the TMAX. e.g.:
>>
>><HOST NAME="..." IP="..." REPORTED="1143022788" TN="145" TMAX="20" DMAX="0"
>>LOCATION="unspecified" GMOND_STARTED="1142870107">
>>
>>There is also a very small chance that what you are seeing is related 
>>to the "Possible bug in hosts up calculation" thread. This bug causes 
>>an erroneous tagging of a data source as "old", which then changes the

>>host up calculation to be
>>one based on the wall clock of the gmetad server. Unless all the
clocks
>>are right,
>>the host_up calculation is wrong.
>>
>>
>>You may try this patch: 
>>http://sourceforge.net/mailarchive/message.php?msg_id=15170774
>>
>>hope springs eternal, anyway. Myself, I only encountered the problem 
>>fixed here when I was federating clusters.
>>
>>There is also a host_up calculation in the PHP web stuff, ganglia.php,

>>function host_alive. You could put debugging in there as well.
>>
>>kind regards,
>>Richard
>>
>>-Original Message-
>>From: [EMAIL PROTECTED]
>>[mailto:[EMAIL PROTECTED] On Behalf Of 
>>Steven A. DuChene
>>Sent: 21 March 2006 23:32
>>To: ganglia-general@lists.sourceforge.net
>>Subject: [Ganglia-general] gmond stops recognizing the rest of the 
>>cluster
>>
>>
>>I have a couple of mixed clusters here with AMD64/Opteron compute 
>>nodes and Intel EM64T Xeon managment nodes and I am running 
>>ganglia-gmond-3.0.2
>>
>>Periodically (sometimes a couple or more times a day) I check the 
>>stats for the clusters and the cluster running RedHatEL-4.0 has a 
>>problem with the master gmond process (the one running on the 
>>management server with interfaces on the internal cluster network and 
>>the external lan here). It still responds to a query (using the python

>>ganglia client or through the standard front end web
>>page) but it stops seeing the client nodes and marks them off-line. It

>>will indicate that only one host (itself) is actually up. I have to 
>>constantly be watching the outputs to see if this has happened and 
>>when it does do a:
>>
>> /etc/init.d/gmond restart
>>
>>That clears it up until next time.
>>
>>Any idea what could be causing this? I have been using ganglia to 
>>moni

RE: [Ganglia-general] gmetad not updating RRD's/hosts that are proper in gmond XML

2006-03-30 Thread Richard.Grevis
This is the classic behaviour that comes from a "truncated" XML stream.
There is now a full patch for this in the CVS repository. But
if you suffer from this, you should get a /var/log/messages entry like:

Mar 30 09:56:37 ldndsr0163 /apps/ganglia/sbin/gmetad[15336]: Process XML
(LDN FIP QA Scenarios PDN): XML_ParseBuffer() error at line 374: no
element found

do you?

This error causes the entire processing for the current poll to be
abandoned - the remaining hosts are not updated, nor the cluster summary.
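
One way to confirm a truncated stream independently of gmetad is to save what the gmond returns and try to parse it. The following Python sketch does that with the standard library parser, which is also Expat based. The function name xml_is_complete and the sample strings are hypothetical.

```python
import xml.etree.ElementTree as ET


def xml_is_complete(xml_text):
    """Return (True, None) if the dump parses, else (False, parser message)."""
    try:
        ET.fromstring(xml_text)
    except ET.ParseError as err:
        return False, str(err)
    return True, None


# Hypothetical, heavily trimmed dump and a copy cut off before the end.
full = ('<GANGLIA_XML VERSION="3.0.2" SOURCE="gmond"><CLUSTER NAME="c">'
        '<HOST NAME="h" IP="1.2.3.4" TN="1" TMAX="20"/>'
        '</CLUSTER></GANGLIA_XML>')
truncated = full[:full.index("</CLUSTER>")]

print(xml_is_complete(full))
print(xml_is_complete(truncated))
```

A dump that fails this check on the headnode itself points at the gmond side rather than at gmetad.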

- Richard

-Original Message-
From: [EMAIL PROTECTED]
[mailto:[EMAIL PROTECTED] On Behalf Of Eli
Stair
Sent: 30 March 2006 02:49
To: Ganglia-general@lists.sourceforge.net; [EMAIL PROTECTED]
Subject: [Ganglia-general] gmetad not updating RRD's/hosts that are
proper in gmond XML



My installation started having an issue yesterday afternoon that I have 
yet to explain or remedy.  One cluster that I have unicasting, has 
started "losing" hosts... the directory entries on disk never get 
created for newly deployed hosts, and gmond reports receiving messages 
for the host (and outputs metrics) but gmetad does not report an 
"updating host" message, and never creates the RRD's even though the 
host is up.

The critical problem is that the report graphs for this cluster have 
stopped being updated as well, which nix'es my ability to view cluster 
load/job level... in addition to not being able to alert on the RRD 
values for the individual hosts that are malfunctioning.  Those hosts 
that are "good" continue to update their metric RRD's properly, their 
host reports are populated etc.  The bad ones I cannot explain...

The two questions, if anyone has insight:

1) What is causing gmetad to stop acting on the gmond XML input that it
has available?  I don't see any error or threshold it's hitting WRT the
hosts, they just don't create/update the RRD.

2) Why does the report stop being populated (the graph is still
generated with past data, but not updated with new... not even the data
from hosts that ARE functioning individually)?

I'm continuing on with this, will update with anything else I find awry.
Any suggestions on what to pursue beyond this are welcome... at this
point it looks to me like a problem with the magic in gmetad's parsing of
the gmond output, since it is present and up-to-date but not acting on it.

Cheers,

/eli


Here are the details:

server:
ganglia 3.0.2 (x86_64)
6 (six) multicast clusters polled by gmetad
1 (one) unicast cluster, reporting to a 'mute' gmond aggregating on the 
same host as gmetad.

clients:
suse9.3 x86_64
ganglia 3.0.2 (x86_64)


Debug logged info (-d2):

Bad host:

   Apache error_log for bad host:
 ERROR: opening
'/var/lib/ganglia/rrds/Opteron_Production-Desktop_Droid_Cluster/frankenstein.lucasfilm.com/swap_free.rrd':
No such file or directory

   gmond:
 Processing a Ganglia_message from badhost
   gmetad:
 server_thread() received request 
"/Opteron_Production-Desktop_Droid_Cluster/badhost" from 127.0.0.1

   XML:

Good host:

   gmond:
 Processing a Ganglia_message from goodhost
   gmetad:
 Updating host goodhost, metric numjobs
 server_thread() received request 
"/Opteron_Production-Desktop_Droid_Cluster/goodhost" from 127.0.0.1
   XML:

---
This SF.Net email is sponsored by xPML, a groundbreaking scripting
language that extends applications into web and mobile media. Attend the
live webcast and join the prime developer group breaking into this new
coding territory!
http://sel.as-us.falkag.net/sel?cmd=lnk&kid=110944&bid=241720&dat=121642
___
Ganglia-general mailing list Ganglia-general@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/ganglia-general









RE: [Ganglia-general] gmond stops recognizing the rest of the cluster

2006-03-22 Thread Richard.Grevis

Steven,

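if the problem is routing or actual packet loss, then that should be
reflected by the XML output of the master gmond - the "down" host will
have a TN (much) greater than the TMAX. e.g.:

<HOST NAME="..." IP="..." REPORTED="1143022788" TN="145" TMAX="20" DMAX="0"
LOCATION="unspecified" GMOND_STARTED="1142870107">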

There is also a very small chance that what you are seeing is related to
the "Possible bug in hosts up calculation" thread. This bug causes an
erroneous tagging of a data source as "old", which then changes the host
up calculation to be one based on the wall clock of the gmetad server.
Unless all the clocks are right, the host_up calculation is wrong.


You may try this patch:
http://sourceforge.net/mailarchive/message.php?msg_id=15170774

hope springs eternal, anyway. Myself, I only encountered the problem
fixed here when I was federating clusters.

There is also a host_up calculation in the PHP web stuff, ganglia.php,
function host_alive. You could put debugging in there as well. 
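
To illustrate why a wall-clock based calculation is fragile, here is a small Python sketch comparing the two styles of check. It is not the actual gmetad or ganglia.php logic: the function names, the factor of 4 and the 240 second limit are all assumptions chosen for the example.

```python
def host_up_by_tn(tn, tmax, factor=4):
    """Host counts as up if it was heard from within factor * TMAX seconds."""
    return tn <= factor * tmax


def host_up_by_clock(reported, server_now, max_age=240):
    """Host counts as up if REPORTED is within max_age of the local clock."""
    return abs(server_now - reported) <= max_age


reported = 1143022788   # REPORTED value stamped by another machine
tn, tmax = 10, 20       # heard from 10 seconds ago, expected every 20

in_sync_clock = reported + 10       # checking server agrees on the time
skewed_clock = reported + 10 + 600  # checking server runs 10 minutes fast

print(host_up_by_tn(tn, tmax))
print(host_up_by_clock(reported, in_sync_clock))
print(host_up_by_clock(reported, skewed_clock))
```

The TN check only uses intervals measured on one machine, so clock skew between servers does not affect it, whereas the wall-clock check flips to "down" as soon as the clocks disagree by more than the limit.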

kind regards,
Richard

-Original Message-
From: [EMAIL PROTECTED]
[mailto:[EMAIL PROTECTED] On Behalf Of Steven
A. DuChene
Sent: 21 March 2006 23:32
To: ganglia-general@lists.sourceforge.net
Subject: [Ganglia-general] gmond stops recognizing the rest of the
cluster


I have a couple of mixed clusters here with AMD64/Opteron compute nodes
and Intel EM64T Xeon management nodes and I am running
ganglia-gmond-3.0.2

Periodically (sometimes a couple or more times a day) I check the stats
for the clusters and the cluster running RedHatEL-4.0 has a problem with
the master gmond process (the one running on the management server with
interfaces on the internal cluster network and the external lan here).
It still responds to a query (using the python ganglia client or through
the standard front end web page) but it stops seeing the client nodes and
marks them off-line. It will indicate that only one host (itself) is
actually up. I have to constantly be watching the outputs to see if this
has happened and when it does do a:

 /etc/init.d/gmond restart

That clears it up until next time.

Any idea what could be causing this? I have been using ganglia to
monitor clusters for quite some time but this is the first time I have
seen the gmond process needing to be restarted to regain connection to
the data stream running around inside the cluster.

BTW, I have added a line to the /etc/init.d/gmond script to add a host
route on the system with the dual network interfaces to point
239.2.11.71 to the network interface that faces to the internal network
of the cluster.

I do not seem to have this issue with the cluster that has RedHatEL3
installed (same hardware though). It is a smaller cluster (64 nodes
versus the 128-node cluster) though.
--
Steven A. DuChene










[Ganglia-general] Early termination of XML stream from a windows based agent.

2006-02-10 Thread Richard.Grevis
All,

I just know that no-one else is doing this, but

I updated the windows gmond with a current cygwin install
and fixed the processor count metric. That is all I did.
Simple recompile, slightly newer cygwin1.dll

However when I used this agent, when gmetad did the tcp poll,
instead of the 100k of data for my farm coming back, only
8k of data was returned - about 100 lines, about 8200 bytes
The precise amount returned varied "a little bit".
And yes, 8200 is quite close to 8192.

When I snooped the traffic I found that the windows gmond
was shutting the connection by sending a FIN-ACK.
Not a FIN as one would expect on normal termination.

Has anyone had any experience with gmond not returning
all the XML stream, and just closing the link? Turning
debugging on does not reveal any problems or error messages.

I must also say that the original 3.0.0 gmond.exe binary
does not seem to have this property. It is almost as if
there is a new cygwin bug lurking.

Any ideas anyone - I have a feeling I am alone on this one...

kind regards,
richard grevis









RE: [Ganglia-general] config file confusion

2006-02-09 Thread Richard.Grevis
Exactly.

~Richard

-Original Message-
From: [EMAIL PROTECTED]
[mailto:[EMAIL PROTECTED] On Behalf Of Jason
A. Smith
Sent: 09 February 2006 18:04
To: [EMAIL PROTECTED]
Cc: Ganglia General
Subject: Re: [Ganglia-general] config file confusion


Just a guess... you probably have all 500 nodes using the same multicast
address & port so that each node will know about all the others.  The
cluster name only comes from the node that gmetad is currently polling.
Try using a different multicast address for each cluster.

~Jason
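
A quick way to check for this is to list, per cluster name, the multicast address and port its nodes use and look for overlaps. The Python sketch below assumes that information has already been gathered from the gmond.conf files. The function name shared_channels is made up, and the addresses are only examples.

```python
from collections import defaultdict


def shared_channels(node_configs):
    """Return {(mcast_addr, port): [cluster names]} for channels that are
    used by more than one cluster name."""
    clusters_by_channel = defaultdict(set)
    for cluster_name, mcast_addr, port in node_configs:
        clusters_by_channel[(mcast_addr, port)].add(cluster_name)
    return {channel: sorted(names)
            for channel, names in clusters_by_channel.items()
            if len(names) > 1}


# Example settings: two clusters share one channel, the third has its own.
configs = [("Servers", "239.2.11.71", 8649),
           ("Opterons", "239.2.11.71", 8649),
           ("Athlons", "239.2.11.72", 8649)]
print(shared_channels(configs))
```

Any channel that comes back from this check is one where every gmond hears every other cluster's nodes, so whichever node gmetad polls will report all of them.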


On Thu, 2006-02-09 at 09:49 -0800, [EMAIL PROTECTED] wrote:
> Ganglia developers,
> 
> I am trying to implement Ganglia on a 500+ node cluster.  My goal is 
> to use the grid functionality to have the data split up into several 
> groupings (by architecture -- it is a non-homogeneous cluster).  The 
> issue I am having is that every one of my nodes is showing up for all 
> "data_source"  definitions.  In other words, the metrics from all ~500

> nodes shows up under each data_source heading even though I have 
> defined distinct "cluster names" in the /etc/gmond.conf files.
> 
> My config files are provided below for reference.
> 
> Thanks for any advice you can provide,
> Mike
> 
> GANGLIA FRONT-END CONFIGS (web front-end / gmetad / gmond)
> /etc/gmetad.conf
> # data_source "my cluster" 10 localhost  my.machine.edu:8649  1.2.3.5:8655
> # data_source "my grid" 50 1.3.4.7:8655 grid.org:8651 grid-backup.org:8651
> # data_source "another source" 1.3.4.7:8655  1.3.4.8
> data_source "Servers" 60 localhost
> data_source "Opterons" 60 hsd380 hsd400
> data_source "Athlons" 60 hsd250 hsd258
> #
> 
> /etc/gmond.conf
> }
> 
> /* If a cluster attribute is specified, then all gmond hosts are wrapped inside
>  * of a <CLUSTER> tag.  If you do not specify a cluster tag, then all <HOSTS> will
>  * NOT be wrapped inside of a <CLUSTER> tag. */
> cluster {
>   name = "Servers"
> }
> 
> 
> GANGLIA NODE CONFIGS
> /etc/gmond.conf  (on Opteron nodes)
> }
> 
> /* If a cluster attribute is specified, then all gmond hosts are wrapped inside
>  * of a <CLUSTER> tag.  If you do not specify a cluster tag, then all <HOSTS> will
>  * NOT be wrapped inside of a <CLUSTER> tag. */
> cluster {
>   name = "Opterons"
> }
> 
> 
> /etc/gmond.conf  (on Athlon nodes)
> }
> 
> /* If a cluster attribute is specified, then all gmond hosts are wrapped inside
>  * of a <CLUSTER> tag.  If you do not specify a cluster tag, then all <HOSTS> will
>  * NOT be wrapped inside of a <CLUSTER> tag. */
> cluster {
>   name = "Athlons"
> }
> 
> 
> 
> ---
> This SF.net email is sponsored by: Splunk Inc. Do you grep through log

> files for problems?  Stop!  Download the new AJAX search engine that 
> makes searching your log files as easy as surfing the  web.  DOWNLOAD 
> SPLUNK! 
>
http://sel.as-us.falkag.net/sel?cmd=lnk&kid=103432&bid=230486&dat=121642
> ___
> Ganglia-general mailing list
> Ganglia-general@lists.sourceforge.net
> https://lists.sourceforge.net/lists/listinfo/ganglia-general
> 
-- 
/--\
|  Jason A. Smith  Email:  [EMAIL PROTECTED] |
|  Atlas Computing Facility, Bldg. 510MPhone:  (631)344-4226   |
|  Brookhaven National Lab, P.O. Box 5000  Fax:(631)344-7616   |
|  Upton, NY 11973-5000|
\--/










RE: [Ganglia-general] gmetrics in cluster/grid view

2006-01-31 Thread Richard.Grevis
The php, being what it is, kind of encourages everyone
to do their own thing. The problem is which changes
are appropriate for the whole community, and which are appropriate
to only a few.

The second problem is an engineering one. Hacks are easy - it is usually
what you do first. Generating a properly documented patch for the
community is quite another matter.

Take me: for our pilot, there are a few things I have done that I quite
like:

- Added a kind of automatic custom metric display capability at the
cluster view. Basically, in the cluster view you get the normal
load/network reports, but if there is a metric in there that is not a
standard one, it goes ahead and displays it along with the standard ones
(above the little graphs for the individual hosts). This is slightly
less ugly than the array thing, but it remains true you have no control.
Selection capability would clearly be better.

- Added a RRD MAX consolidation function as well as the AVERAGE. The PNG
may not survive the mailing list, but having max and average both
displayed is very effective for not getting the wrong idea about loads
over longer time frames.

- Removed the sizing parameters from the [...]

-Original Message-
From: [EMAIL PROTECTED]
[mailto:[EMAIL PROTECTED] On Behalf Of Martin
Knoblauch
Sent: 31 January 2006 08:38
To: Alex Balk; ganglia-general@lists.sourceforge.net
Subject: Re: [Ganglia-general] gmetrics in cluster/grid view


Hi Alex,

 what exactly do you have in mind? From your description I am not
really sure.

Martin

--- Alex Balk <[EMAIL PROTECTED]> wrote:

> Hi all,
> 
> 
> I'm interested in viewing data gathered from gmetrics in a 
> cluster/grid view, sort of like the aggregation graphs for 
> load/memory. This would require some changes to the web frontend code 
> and I'd like to know, has
> anyone here made such changes?
> 
> 
> I've written a quick & dirty hack that provides this functionality 
> (only cluster-view at the moment) but it requires entering arguments 
> to rrdtool into some array in conf.php. Needless to say that's way too
> ugly
> to go in a production environment.
> 
> 
> My plan is to write a separate interface in which one could choose the
> desired gmetrics, color, graph style and time interval, and the
> graphs
> would be generated accordingly. This interface will be linked from
> each
> cluster/grid view and would display the desired graphs for that view.
> 
> 
> Your thoughts (and patches..) will be appreciated!
> 
> 
> Cheers,
> 
> Alex
> 
> 
> 
> 
> 


--
Martin Knoblauch
email: k n o b i AT knobisoft DOT de
www:   http://www.knobisoft.de






[Ganglia-general] Disk I/O in the Linux gmond

2006-01-30 Thread Richard.Grevis
Has anyone extended the Linux gmond to include disk I/O
or disk latency stats?

kind regards,
richard grevis









RE: [Ganglia-general] Pointers on architecting a largescale ganglia setup??

2006-01-30 Thread Richard.Grevis
Multicast.

If you unicast to 2 separate nodes for resiliency, then
you are actually sending double the UDP traffic of a multicast solution.

So unicast does not save on UDP network traffic - the opposite, actually,
in the resiliency case. What unicast does do, of course, is reduce the load
and memory footprint of most of the individual gmond processes, as they are
not taking care of the state of the whole cluster.
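
The arithmetic behind that claim can be sketched in a few lines of Python. This is only a paper model, and its assumptions are mine: every metric update is one UDP datagram, a multicast send counts once however many gmonds are listening, and a unicast send is repeated once per configured receiver. The function name and the node and metric counts are made up for illustration.

```python
def udp_sends_per_interval(nodes, metrics_per_node, unicast_receivers=None):
    """Count UDP datagrams transmitted by all senders in one reporting interval.

    Model assumption: one datagram per metric update. With multicast
    (unicast_receivers=None) each update is sent once; with unicast it is
    sent once per receiver.
    """
    copies = 1 if unicast_receivers is None else unicast_receivers
    return nodes * metrics_per_node * copies


multicast = udp_sends_per_interval(100, 30)
unicast_x2 = udp_sends_per_interval(100, 30, unicast_receivers=2)
print(multicast, unicast_x2)
```

Note that this counts datagrams sent, not datagrams received and processed; the saving with unicast is on the receiving side, where only the designated nodes keep the whole cluster state.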

Dear Gurus, I hope this paper analysis is correct.

regards,
richard









RE: [Ganglia-general] Pointers on architecting a large scale ganglia setup??

2006-01-27 Thread Richard.Grevis
My experience so far:

RRD files on ramdisk is a good idea. RRD is very basic with its I/O: it
writes as soon as it gets a data point (and reads as well). In my case, a
simple blade engineering server with simple local disk was really being
hammered with 100 nodes, except at a 5 second poll, with 5 second RRD files,
and RRDs for max as well as average consolidation. In our pilot we are
moving to SAN.

BTW, do the sums on your ram disk space. 1000 nodes is a lot of rrd
files!
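
For what it is worth, "doing the sums" might look like the Python sketch below. The numbers are placeholders, not measurements: 30 metrics per node and 12 KiB per RRD file are assumptions picked only to show the calculation, so check the size of a real .rrd file on your own gmetad before trusting the result. Any per-cluster summary RRDs are ignored here.

```python
def rrd_footprint_bytes(nodes, metrics_per_node, bytes_per_rrd):
    """Total bytes needed if every metric on every node gets its own RRD file."""
    return nodes * metrics_per_node * bytes_per_rrd


# All three inputs are placeholders; measure a real .rrd file for bytes_per_rrd.
total = rrd_footprint_bytes(nodes=1000, metrics_per_node=30, bytes_per_rrd=12 * 1024)
print("%d files, %.1f MiB" % (1000 * 30, total / (1024.0 * 1024.0)))
```

The point is that the size scales with nodes times metrics, so adding a consolidation function or a finer step to the RRD definition multiplies the whole ramdisk requirement.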

I/O and network load will reduce in direct proportion to the metric
refresh/timeout/threshold rates and to the gmetad polling rate. So reducing
the poll to 60 seconds will have a direct benefit. But if you reduce to
60 seconds, remember to change the RRD file definitions appropriately as
well. This may be telling you to suck eggs, but a gmetad poll will only ever
give you the most recent cluster state. If the cluster updates faster than
you poll, the extra data is just lost.

Direct connection using gmetad and TCP connects to each and every node
(all 1000) is bound to be a bad, nay impossible, idea. Ask the other
experts, but I can't see it working at all. There "may" be some use in a
middle path, e.g. groups of 200 nodes unicasting to a designated head node
and then configuring gmetad to go to (in this case) 5 head nodes.
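
A small Python sketch of that middle path, assuming hypothetical host names (node0001 to node1000), made-up group names, and the first host of each group as head node. It only prints candidate data_source lines for gmetad.conf; whether each group should appear as its own cluster is a separate design decision.

```python
def head_node_plan(hosts, group_size=200, port=8649):
    """Split hosts into groups and emit one gmetad data_source line per group.

    The first host of each group is treated as the designated head node,
    which is an arbitrary choice made for this sketch.
    """
    lines = []
    for start in range(0, len(hosts), group_size):
        group = hosts[start:start + group_size]
        name = "group%d" % (start // group_size + 1)
        lines.append('data_source "%s" %s:%d' % (name, group[0], port))
    return lines


hosts = ["node%04d" % i for i in range(1, 1001)]  # hypothetical host names
plan = head_node_plan(hosts)
for line in plan:
    print(line)
```

The other nodes in each group would then point their udp_send_channel at that group's head node.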

Another idea I have not yet explored would be to unicast all your
metrics back to the gmond running on your gmetric ganglia server. The
advantage of this is that the TCP connect to get the cluster state would
occur on the loopback interface of the local machine. Faster than an actual
network transfer of the XML.

phew. My 2 cents worth.

kind regards,
richard

-Original Message-
From: [EMAIL PROTECTED]
[mailto:[EMAIL PROTECTED] On Behalf Of Joel
Krauska
Sent: 27 January 2006 09:41
To: ganglia-general@lists.sourceforge.net
Subject: [Ganglia-general] Pointers on architecting a large scale
ganglia setup??


I've seen similar scaling questions asked, but not a lot of answers.

I hope this query falls on some ears with experience in the brains 
behind them.

I'm looking to deploy ganglia on a largish cluster.
(1000+ nodes)

Here were some of my thoughts on how I could help scale the system.

Any opinions or suggestions are greatly appreciated.

- Put gmetad rrd files on a ramdisk.
This should decrease the frequency of disk writes
during normal runs.
If I rsync to a local disk every hour or so, I can get away with limited
disk writes, and still have a reasonable backup of data.


- Use TCP polling queries instead of UDP or Multicast push. (disable
UDP/multicast pushing) I'd prefer to let gmetad poll instead of having
1000 UDP messages flying around on odd intervals.  A good practice?


- Alter timers for lighter network load?
examples? ideas?
Was going to just go to 30 or 60s timers in gmetad.conf cluster 
definition to start.


- Consider "federating"?
Create groups of 100 gmond hosts managed by single gmetads, all linking
up to a core gmetad.


Thanks much,

Joel Krauska








RE: [Ganglia-general] intermittent blanks in graphs

2006-01-25 Thread Richard.Grevis
Call me old fashioned, but:

who | wc -l | awk '{print $1}'

strikes me as safer
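
The same idea as a Python sketch: count the lines of `who` output rather than picking a field out of the `w` header, and make sure the value is plain digits before it goes anywhere near gmetric. The function names and the sample text are made up for illustration.

```python
def count_logins(who_output):
    """Count login sessions: one non-blank line of `who` output per session."""
    return sum(1 for line in who_output.splitlines() if line.strip())


def gmetric_value(who_output):
    """Return the count as a string that is guaranteed to be plain digits."""
    value = str(count_logins(who_output))
    if not value.isdigit():
        raise ValueError("refusing to submit non-numeric value: %r" % value)
    return value


sample = (
    "alice    pts/0        2006-01-25 09:01 (host1)\n"
    "bob      pts/1        2006-01-25 09:12 (host2)\n"
)
print(gmetric_value(sample))
```

In a cron job the text would come from running `who` (for instance via subprocess.run with capture_output=True and text=True) instead of a literal string.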

regards,
richard


-Original Message-
From: [EMAIL PROTECTED]
[mailto:[EMAIL PROTECTED] On Behalf Of Martin
Knoblauch
Sent: 25 January 2006 09:25
To: Ben Hartshorne; ganglia-general@lists.sourceforge.net
Subject: Re: [Ganglia-general] intermittent blanks in graphs


> error message:
> 
> Jan 24 17:24:18 localhost /usr/sbin/gmetad[30443]: RRD_update
> (/var/lib/ganglia/rrds/production/raiden-8-db1/users.rrd): conversion 
> of 'min,' to float not complete: tail 'min,'
> 
> This seems to relate to a recent change I made that I had forgotten 
> about.  :)  I added the following line to my crontab:
> 
> */2 * * * * /usr/bin/gmetric --name="users" --value=`w | head -1 | awk '{print $6}'` --type=int16
> 
> The purpose of this line is to create a graph representing the number 
> of logged in users to the host.  It seems right to me - do any of you
> see a problem with this line?
> 

 actually, on my system (FC4) your command results in:

$ w | head -1 | awk '{print $6}'
users,
$

 which is not really what you want to put into that metric :-)
Apparently yours seems to report "min," which would be "$4" on my
system. The number of users would be "$5". Maybe different versions of
"procps"?

 Hmm. Weird. Just played around with the setting of "LANG" and now the
command reports "load" instead of "users,". Really weird.
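
One plausible explanation for the moving field (a guess, not something verified on either system) is that the uptime part of the header changes shape: "up 5 min," occupies two fields while "up 12 days, 5 min," occupies four. The Python sketch below shows the shift on two made-up header lines and pulls the count out with a pattern instead of a field number. The pattern assumes English (C locale) output, which matters given the LANG observation above.

```python
import re

# Matches "3 users" or "1 user"; assumes English (C locale) output.
_USERS = re.compile(r"(\d+)\s+users?\b")


def users_from_uptime_line(line):
    """Return the user count from a `w`/`uptime` header line, or None."""
    match = _USERS.search(line)
    return int(match.group(1)) if match else None


# Made-up header lines showing how the awk field position moves around.
short_up = " 17:24:18 up 5 min,  3 users,  load average: 0.10, 0.20, 0.30"
long_up = " 17:24:18 up 12 days, 5 min,  3 users,  load average: 0.10, 0.20, 0.30"

for line in (short_up, long_up):
    print(repr(line.split()[5]), users_from_uptime_line(line))
```

Returning None when nothing matches lets the caller skip the gmetric call rather than submit garbage.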

Martin

--
Martin Knoblauch
email: k n o b i AT knobisoft DOT de
www:   http://www.knobisoft.de








RE: [Ganglia-general] intermittent blanks in graphs

2006-01-25 Thread Richard.Grevis
There is another way this failure can occur, although it is unlikely
(it happened to me though).

gmond appears to do a reverse IP lookup of the UDP packets'
source address to generate the hostname in the XML. We had an error
in the reverse DNS, and 2 separate hosts in the cluster ended up having
the same hostname. As soon as the duplicate hostname was encountered
(even though the IP differed), gmetad tried to update the rrd with data
from the same second, causing the failure already described.

So also check your XML for duplicate hostnames.
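
A quick way to do that check, sketched in Python. It assumes you have saved the XML from the gmond TCP port to a string, and that hosts appear as HOST elements with NAME and IP attributes; the sample document is invented and much shorter than real output.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict


def duplicate_hostnames(xml_text):
    """Map each hostname that appears with more than one IP to its sorted IPs."""
    root = ET.fromstring(xml_text)
    ips_by_name = defaultdict(set)
    for host in root.iter("HOST"):
        ips_by_name[host.get("NAME")].add(host.get("IP"))
    return {name: sorted(ips) for name, ips in ips_by_name.items() if len(ips) > 1}


# Invented sample; real gmond output carries many more attributes and metrics.
sample_xml = """
<GANGLIA_XML VERSION="3.0.2" SOURCE="gmond">
  <CLUSTER NAME="production">
    <HOST NAME="db1" IP="10.0.0.1"/>
    <HOST NAME="db1" IP="10.0.0.2"/>
    <HOST NAME="web1" IP="10.0.0.3"/>
  </CLUSTER>
</GANGLIA_XML>
"""
print(duplicate_hostnames(sample_xml))
```

An empty result means every hostname maps to a single IP in that snapshot.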

I fixed my DNS of course, but frankly I also just patched the RRD_update
function in "gmetad/rrd_helpers.c" to never return an error. Crude, wrong,
but it was a quick way to stop gmetad bombing on the rest of the data.

regards,
richard


-Original Message-
From: [EMAIL PROTECTED]
[mailto:[EMAIL PROTECTED] On Behalf Of Ben
Hartshorne
Sent: 25 January 2006 03:08
To: ganglia-general@lists.sourceforge.net
Subject: Re: [Ganglia-general] intermittent blanks in graphs



Everyone,

thanks very much for your suggestions.  I've replied to each below.


On Tue, Jan 24, 2006 at 04:16:08AM -0800, Martin Knoblauch wrote:
>  just a thought - are your cluster nodes time-synched? Are they 
> [still] in-synch?

to within a second or so.  I also have several gmetrics that are running
at a 2-min interval, and they exhibit the same behavior.  I would be
surprised to see them reporting the same second, 2 minutes apart...

On Tue, Jan 24, 2006 at 07:45:31AM -0500, Woods, Jeff wrote:
> We had a similar problem a few weeks ago, except that our gmetad never
> seemed to recover.  It was crashing, and had to be restarted manually 
> almost daily.  I enabled the debug output to syslog, but received no 
> indication of what was failing -- it just quit!

restarting the server doesn't seem to have any effect.  :(

> At the time, we were in the process of consolidating our gmetad's to a
> single server (we have three clusters being monitored, and each had 
> its own gmetad and web interface).  Following the migration to the new
> server, the problem went away so we never followed up.

I intend to migrate to a new server soon as well... Of course, that's
one of those projects that's going to happen Real Soon Now(tm).   I'm
worried though, because I realized today that a second instance of
ganglia I've got running on a completely separate network is also
showing these symptoms.  Different hardware, different network,
different switches, different load, same OS (mostly.  Fedora core 3/4).

> The gmetad we had problems with worked reliably for nearly a year 
> before having the problems.  Once the problem started, it occurred 
> reliably (nearly every night).  I could reenable the interface if it 
> might help to resolve a bigger problem.

Thanks for the offer, but I'll do some more poking before putting you to
that trouble.  It's just such a weird problem...


On Tue, Jan 24, 2006 at 04:46:50PM -0500, Rick Mohr wrote:
> Also, you could use rrdtool to generate the exact same graph that is 
> shown on the web page for one of these metrics and dump it straight into
> a file.  Then you could compare that with the image seen on the web page
> (to check for the unlikely event that the generated image is fine, but
> the web server is messing something up).

hmm... that's a good suggestion.  

Here's an excerpt from 'rrdtool dump':

  9.315467e+00

  8.80e+00

  8.80e+00

  8.80e+00

  8.80e+00

  NaN 
  NaN 
  NaN 
  NaN 
  NaN 

Correspondingly, in the graph seen through ganglia, the data ends about
17:38.  I'm surprised it's registering these things every 15 seconds!  I
thought the period was slower than that (every min).

I checked a few other rrds at different resolutions, and the NaN
sections do correspond to the blank parts.

So what does it mean?  This tells us that the data is not getting put
into the rrds.  We know that the values are getting to the collector
host, because clicking on the 'gmetric' portion of the website shows
current data.  But that data is not making it into the RRD somehow...

I thought maybe the RRDs had become corrupted somehow, so tried out
moving the rrds out of place so ganglia would recreate them all.  The
symptom was still in evidence.

On Tue, Jan 24, 2006 at 01:56:08PM -0800, steven wagner wrote:
> Running gmetad in the foreground with a very high debug level may 
> offer additional clues.  Also, keep an eye on the modification times on
> the RRD files that are gapping.

I can't see anything too interesting running gmetad in the foreground
with debugging set to '9'.  :(

Modification times of the rrd files seem to be current.  This matches the
rrd dump showing 'NaN' in all those fields instead of something
unmodified.


On Tue, Jan 24, 2006 at 05:06:48PM -0500, Jason A. Smith wrote:
> I have seen gaps sometimes.  They almost always happen when gmetad 
> gets data from a cluster that has the same exact timestamp as its last

> 

RE: [Ganglia-general] Different names for different hosts - why it happens

2006-01-05 Thread Richard.Grevis
And of course, being a reverse DNS lookup, it will depend on
nsswitch.conf, which determines whether the data comes from
/etc/hosts, NIS, or a DNS server.

And in our local environment, we have AD as well as bind DNS,
and even the case (upper/lower) of the fully qualified domain
name can depend on the phase of the moon.

Because of this, and because FQDN strings can be long, I have
patched my local ganglia PHPs to remove the domain component of
the hostname to avoid long strings as headings and on labels of graphs.
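
The frontend is PHP and my patch is not reproduced here, but the logic amounts to something like this Python sketch. The function name is invented, and the IP check is an assumption on my part about what you would want: unresolved hosts show up as IP addresses, and those must not be cut at the first dot.

```python
import ipaddress


def short_hostname(name):
    """Strip the domain part of an FQDN, leaving IP addresses untouched."""
    try:
        ipaddress.ip_address(name)
        return name  # an unresolved host shown by IP: keep it whole
    except ValueError:
        return name.split(".", 1)[0]


for example in ("raiden-8-db1.example.com", "host", "192.168.0.1"):
    print(example, "->", short_hostname(example))
```

Shortening only makes sense for display; the full name still has to be used when locating the RRD directories.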

kind regards,
richard

-Original Message-
From: [EMAIL PROTECTED]
[mailto:[EMAIL PROTECTED] On Behalf Of Alex
Balk
Sent: 04 January 2006 22:33
To: ganglia-general@lists.sourceforge.net
Subject: [Ganglia-general] Different names for different hosts - why it
happens


Hey all,

For anyone wondering why for some nodes you get hostname and for others
hostname.domain or IP address, it's all about the node's /etc/hosts
file.

Below is a list of possible entries in your node's /etc/hosts file, and
how its name would appear in Ganglia:

192.168.0.1   host.domain host   ==> host.domain
192.168.0.1   host host.domain   ==> host
192.168.0.1   host   ==> host
192.168.0.1  ==> IP

Again, this is determined on per-node basis, assuming you have "files"
as the first option of the "hosts:" line in your /etc/nsswitch.conf. If
you have "nis" set as first, it's probably determined by the naming
order there (though the same rules stated above should apply). Same goes
for "dns" being first.
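
The table above boils down to "the first name after the IP wins, and no usable entry means you see the IP". A Python sketch of that rule follows; it mimics the observed behaviour for a "files" lookup and is not how gmond or the resolver library is actually implemented. The function name is made up.

```python
def name_for_ip(hosts_text, ip):
    """Return the first name listed for ip in hosts-file text, else the ip."""
    for raw in hosts_text.splitlines():
        fields = raw.split("#", 1)[0].split()  # drop comments, split on blanks
        if len(fields) >= 2 and fields[0] == ip:
            return fields[1]
    return ip


for entry in ("192.168.0.1   host.domain host",
              "192.168.0.1   host host.domain",
              "192.168.0.1   host",
              ""):
    print(repr(entry), "==>", name_for_ip(entry, "192.168.0.1"))
```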

Hope this helps...

Cheers,

Alex











[Ganglia-general] Improved gmond for windows, and gmetric for windows.

2006-01-04 Thread Richard.Grevis
Martin, Matt, and others

with help from a colleague, we have a 3.0.2 release binary for gmond,
and also a patch that corrects the "number of processors" misreporting.
It is also married to the most recent version of cygwin1.dll, and it
seems that windows gmond multicast works, although maybe I was imagining
this. It is not msi packaged though.

We also have a windows binary for gmetric, which is handy. It just
compiled, actually.

Should I feed this back in somehow? How would you like to receive it?

kind regards,
Richard Grevis










RE: [Ganglia-general] PHP front end: has anyone modified the load metric color / computation?

2006-01-04 Thread Richard.Grevis
Or you could hack the load value itself by dividing by 5 in
cluster_view.php.
 
regards,
richard
 
p.s.
this is a bit yuk, but is certainly easy.
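
To show that dividing the load by 5 and multiplying the colour thresholds by 5 come to the same thing, here is a Python sketch. The thresholds and colour names are hypothetical stand-ins, not the values shipped with the frontend, and the real code is PHP.

```python
# Hypothetical thresholds and colour names, highest first.
LOAD_COLORS = [(1.00, "red"), (0.75, "orange"), (0.50, "yellow"), (0.25, "green")]
DEFAULT_COLOR = "blue"


def load_color(load, scale=1.0):
    """Pick a colour for a load value after dividing it by scale."""
    scaled = load / scale
    for threshold, color in LOAD_COLORS:
        if scaled >= threshold:
            return color
    return DEFAULT_COLOR


for load in (1.0, 3.0, 5.0):
    print(load, load_color(load), load_color(load, scale=5))
```

With scale=5 a load of 3.0 is compared as 0.6, exactly as if every threshold had been multiplied by 5 and the load left alone.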

-Original Message-
From: [EMAIL PROTECTED]
[mailto:[EMAIL PROTECTED] On Behalf Of Alexei
Rodriguez
Sent: 04 January 2006 07:05
To: ganglia-general@lists.sourceforge.net
Subject: [Ganglia-general] PHP front end: has anyone modified
the load metric color / computation?


Greetings. First off, I want to say that ganglia rocks. It has
been a very valuable tool in the short time we have had it deployed, and
we are only using the very basic things.

The load on our systems tends to be "high" (5.0 and above), on
Solaris 10 systems (on AMD Opteron servers). The problem is that the
graphs being generated are all of the same color (bright, bloody red).
Given that all the systems have such high (relative) loads, I wanted to
see what the best way of changing the PHP front end to reflect my local
"colors and load" scheme.

If I change $load_colors in conf.php, such that the number
ranges are multiplied by 5x, would that work or is there a better way?

I just want to make sure that the solution I implement does not
make upgrades difficult :)


thanks!


Alexei











RE: [Ganglia-general] Name of cluster nodes

2005-12-29 Thread Richard.Grevis
This may or may not be the answer for you.
gmond itself does a reverse lookup of the IP of packets
sent to it from other gmonds. The resultant name is purely up to
a reverse DNS lookup. If this lookup works, you get a hostname;
if it fails, you get an IP. What matters is the reverse DNS lookup
as executed from the gmond node that works as the "head node"
for tcp communication to gmetad.

At least I think so.
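
The behaviour described above (name if the reverse lookup works, IP if it does not) can be mimicked in Python as below. This is a model of the observed behaviour, not gmond's actual C code, and the resolver argument is an addition of mine so the function can be tried without touching real DNS.

```python
import socket


def display_name(ip, resolver=socket.gethostbyaddr):
    """Return the reverse-lookup name for ip, or the ip itself on failure."""
    try:
        return resolver(ip)[0]
    except OSError:  # socket.herror and socket.gaierror are subclasses
        return ip


def fake_ok(ip):
    # Stand-in resolver that always succeeds (hypothetical name).
    return ("db1.example.com", [], [ip])


def fake_fail(ip):
    # Stand-in resolver that always fails, like a missing PTR record.
    raise socket.herror(1, "Unknown host")


print(display_name("10.0.0.1", fake_ok), display_name("10.0.0.2", fake_fail))
```

Running it from the head node with the default resolver shows the name that node would see, which is the lookup that counts here.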

regards,
richard

-Original Message-
From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of Branimir Ackovic
Sent: 21 December 2005 14:59
To: ganglia-general@lists.sourceforge.net
Subject: Re: [Ganglia-general] Name of cluster nodes


On Wednesday 21 December 2005 15:56, Andrés Cañada wrote:
> Hi!
> In the ganglia's web interface I can see all the cluster nodes. Down, 
> where the colored nodes are, I see a graphic per node, but some nodes 
> can be identified by their ip and some nodes by their hostname. I'd 
> like to know where ganglia gets this identifier from. (I'd 
> like to see the hostname instead of the ip.) Is there a configuration 
> parameter to change this? Thank you!
>

I think that it is up to your name server. Are all nodes configured in name 
server? How do you configure gmond and gmetad conf files?

-
Branimir Ackovic
E-mail: [EMAIL PROTECTED]
Web: http://scl.phy.bg.ac.yu/

Phone: +381 11 3160260, Ext. 152
Fax: +381 11 3162190

Scientific Computing Laboratory
Institute of Physics, Belgrade
Serbia and Montenegro
-








[Ganglia-general] Solaris 8 gmond and gmetric.

2005-12-05 Thread Richard.Grevis
Does anyone have compiled solaris 8 binaries they can send me?
My efforts to compile ganglia on solaris 8 with gcc did not
get very far. Not too sure why, but if someone can send me the binaries,
that would make me a happy puppy.

kind regards,
Richard

ps, if interested:

Making all in examples
/bin/bash ../libtool --mode=link gcc  -g -O2  -L../src/  -o simple  simple.o ../src/libconfuse.la
gcc -g -O2 -o simple simple.o  -L/var/tmp/rg/ganglia-3.0.2/srclib/confuse/src ../src/.libs/libconfuse.a
Undefined                       first referenced
 symbol                             in file
cfg_scan_string_end                 ../src/.libs/libconfuse.a(confuse.o)
cfg_scan_fp_begin                   ../src/.libs/libconfuse.a(confuse.o)
cfg_scan_string_begin               ../src/.libs/libconfuse.a(confuse.o)
cfg_scan_fp_end                     ../src/.libs/libconfuse.a(confuse.o)
cfg_yyin                            ../src/.libs/libconfuse.a(confuse.o)
cfg_yylex                           ../src/.libs/libconfuse.a(confuse.o)
cfg_lexer_include                   ../src/.libs/libconfuse.a(confuse.o)
ld: fatal: Symbol referencing errors. No output written to simple
collect2: ld returned 1 exit status
*** Error code 1
make: Fatal error: Command failed for target `simple'
-
-Original Message-
From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of Martin Knoblauch
Sent: 30 November 2005 09:46
To: Ramon Bastiaans
Cc: Markus Törnqvist; ganglia-general@lists.sourceforge.net
Subject: Re: [Ganglia-general] Unicast issue


Hi,

 some more info:

- udp_send_channel does not have a "bind" attribute, just forget my comment 
below. Looking at the code sometimes helps. 
- udp_recv_channel: if you specify "mcast_join" and "bind" with different IP 
addresses, no unicast processing will take place (from the gmond.conf man page)

 And forget the comment about "localhost". It is a bit more complicated than 
that.

Martin
--- Martin Knoblauch <[EMAIL PROTECTED]> wrote:

> Ramon, Markus,
> 
>  actually, below one works fine for me. The same config file is used 
> on all gmond-hosts in the cluster (actually pretty beautiful :-).
> 
> - host "172.17.17.103" receives the metrics from all participating 
> gmonds.
> - all other hosts will report empty metrics if queried. If you want 
> them to report their own metrics, add a udp_send_channel for 
> "localhost".
> - host "172.17.33.108" is the only one allowed to query the TCP port. 
> This is the host where gmetad would be running (no gmond necessary on 
> this host). If you leave out the "acl" all hosts may query the TCP 
> port.
> 
>  The "bind" in the udp_recv_channel may be needed if you have more than
> one network interface and the traffic does not come on the first one.
> For the udp_send_channel, no bind should ever be *necessary*. But I
> am really not sure about this.
> 
> 
> 
> udp_send_channel {
>   host = 172.17.17.103
>   port = 9649
> }
> 
> udp_recv_channel {
>   port = 9649
> }
> 
> tcp_accept_channel {
>   acl {
> default = "deny"
> access {
>   ip = 172.17.33.108
>   mask = 32
>   action = "allow"
> }
>   }
>   port = 9649
> }
> -
> 
> Cheers
> Martin
> 
> --- Ramon Bastiaans <[EMAIL PROTECTED]> wrote:
> 
> > Actually, bind is needed to specify what local ip to bind to and 
> > listen on in a unicast setup.
> > mcast_join is used when listening to multicasting.
> > 
> > However, why are you using 2 different ip addresses in the recv and 
> > send channel? This will never work.
> > You need to set your send channel to the same ip/port as your recv
> > channel.
> > Else you are sending the information to 1 place and listening for
> > that information on another place.
> > 
> > Kind regards,
> > - Ramon.
> > 
> > Martin Knoblauch wrote:
> > 
> > >Markus,
> > >
> > > if you want unicast, I would leave out the "bind" thing. That is
> > >for multicast, AFAIK.
> > >
> > >telnet w.x.y.z 8649
> > >
> > >Should give you a correct list of metrics.
> > >
> > >Cheers
> > >Martin
> > >
> > >--- Markus Törnqvist <[EMAIL PROTECTED]> wrote:
> > >
> > >  
> > >
> > >>Hi!
> > >>
> > >>I'm experiencing the weirdest issue here with unicasting; not even
> > >>the mail archives helped so I hope someone here can give me a hand.
> > >>
> > >>Shouldn't it suffice to have the config file look like this: 
> > >>udp_send_channel {
> > >>  host = w.x.y.z
> > >>  port = 8649
> > >>}
> > >>
> > >>udp_recv_channel {
> > >>  bind = w2.x2.y2.z2
> > >>  port = 8649
> > >>}
> > >>
> > >>for those parts?
> > >>
> > >>Nothing anywhere that points to multicasts?
> > >>
> > >>Right now, with that kind of configuration, I get an empty result 
> > >>set; 
> > >> > >>OWNER="unspecified" LATLONG="unspecified" URL="unspecified">
> > >>
> > >>
> > >>Connection closed by foreign host.
> > >>
> > >>It's somewhat annoying because 

RE: [Ganglia-general] windows gmond client

2005-11-10 Thread Richard.Grevis
Exactly.

I should have been clearer. The default windows/cygwin client is neither
correct enough (cygwin's fault) nor provides all the metrics we want (in
fact, because some of our farms are not just HPC farms, we want some other
metrics as well). I remain grateful to whoever developed it, none-the-less.

I am not a windows man, but we are looking at the possibility of
developing a fully native (no cygwin) client ourselves. The reason for the
TCP question is that my feeling was that it would be much easier to produce
a native "first pass" windows gmond client delivering TCP only, rather than
all that clever UDP stuff as well.

But of course with the TCP route, I have fears of scaling. But there is
a GEM in Martin's reply (and a Doh moment for me), in that I assumed that
every node would have to be polled by a gmetad to get the cluster info. But
you remind me this is not so: I can do the structural equivalent of the udp
unicast to a head node using TCP to a head node, that gmetad then
interrogates.

Have I got this right guys?

And the other thing for the community is asking whether anyone else out
there is considering developing a native windows gmond.

Kind regards,
Richard



-Original Message-
From: michael chang [mailto:[EMAIL PROTECTED] 
Sent: 08 November 2005 00:21
To: [EMAIL PROTECTED]
Cc: Grevis, Richard: IT (LDN); ganglia-general@lists.sourceforge.net
Subject: Re: [Ganglia-general] windows gmond client


On 11/7/05, Martin Knoblauch <[EMAIL PROTECTED]> wrote:
> > If we were to cheat, and create a windows agent that only produced 
> > the XML via the tcp interface, and not the udp niceness, can anyone 
> > give me an idea of how this will scale? This obviously moves more 
> > work to gmetad. Will gmetad poop with 5 data sources, 100?
> >
>
>  Not knowing the Cygwin implementation at all, but what is wrong with 
> using the unicast TCP setup. Just select one or two nodes per 
> *cluster* to run "gmond" in TCP receive mode and let all other nodes 
> send data to them. Use the selected node(s) as data source for 
> "gmetad". Much better network usage compared to the multicast mode, 
> which produces traffic going up with N*N. And you don't have to worry 
> about switches blocking IGMP traffic.

I think he means that Ganglia on Cygwin is inaccurate because Cygwin
supposedly misreports certain metrics or can't report others.  That
would make sense, since Cygwin is a POSIX emulation layer (or whatever
you call it).  That said, I'm not sure about the validity of that
statement.

I think he wants to know if there is a Windows-specific Ganglia client
that e.g. uses metrics provided by the Windows kernel subsystems (or
similar) that "works better" or is "more accurate".  [Which makes some
sense, since there are reporting-mechanisms of some sort for Windows,
I'm sure, since the System Resource Monitor (I know this was on 9x,
forget about XP and the like) and the Task Manager (XP, or at least I
believe 2000 and later versions or something) can show e.g. CPU usage.
Whether these are internal-use-only or not, I have no clue.]

--
~Mike
 - Just my two cents
 - No man is an island, and no man is unable.









[Ganglia-general] windows gmond client

2005-11-07 Thread Richard.Grevis
All,

against all probability, but for reasonable historical reasons, we run
windows based HPC applications. We also have large networks of similar
function windows farms (e.g. web farms). We want to improve the visibility
of the state of our estate, we like ganglia (rollups and all that), and
want to pilot in some of our environments.

But as we all know, the windows agent is not fully performant, mostly
for cygwin reasons.

Is anyone working on a fully functional windows agent? Or a native
agent?

If we were to cheat, and create a windows agent that only produced the
XML via the tcp interface, and not the udp niceness, can anyone give me an
idea of how this will scale? This obviously moves more work to gmetad. Will
gmetad poop with 5 data sources, 100?

Can someone suggest something clever to get windows nodes producing
ganglia data in a lightweight way? Hopefully lighter weight than remote
WMI access which is... well, maybe someone says this is OK, but our wintel
engineers hate the thought.

kind regards,

Richard



