[Ganglia-developers] Replacing core metrics with Python metric modules

2009-07-22 Thread Martin Hicks

I have a situation where there is already a mechanism that is collecting
metrics on a compute host in a cluster (Performance Co-Pilot) and
pushing them up to the head node.

I was wondering if it is possible to write a Python metric module that
could replace the core set of metrics that gmond usually collects on the
compute node, and instead grab the data from PCP that is running on the
head node.

Are there any real differences between the metrics that are normally
collected by gmond, and those user-defined metrics collected by a Python
module?

The goal is to not have to double collect these metrics on each compute
host.

Thanks
mh
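
For reference, a gmond Python metric module is a plain Python file that
exposes metric_init() and metric_cleanup() and describes each metric with a
descriptor dictionary.  The sketch below shows that general shape with the
value coming from somewhere other than gmond's own collectors.  The
fetch_from_pcp() helper, the sample value and the metric name pcp_load_one
are hypothetical placeholders, not part of Ganglia or PCP; a real module
would query PCP at that point.

```python
# Sketch of a gmond Python metric module that reports a value collected elsewhere.
# fetch_from_pcp() is a hypothetical placeholder, not a real PCP or Ganglia call.

# Stand-in for data that PCP has already collected (illustrative value only).
_PCP_SAMPLE = {"pcp_load_one": 0.42}


def fetch_from_pcp(name):
    """Hypothetical lookup; a real module would ask PCP for the metric here."""
    return _PCP_SAMPLE.get(name, 0.0)


def load_one_handler(name):
    """Callback gmond invokes with the metric name; returns the current value."""
    return float(fetch_from_pcp(name))


def metric_init(params):
    """Called once when the module is loaded; returns the metric descriptors."""
    return [{
        "name": "pcp_load_one",
        "call_back": load_one_handler,
        "time_max": 90,
        "value_type": "float",
        "units": "",
        "slope": "both",
        "format": "%.2f",
        "description": "One-minute load average as reported by PCP",
        "groups": "load",
    }]


def metric_cleanup():
    """Called once when gmond shuts the module down; nothing to release here."""
    pass


if __name__ == "__main__":
    # Exercise the module outside gmond.
    for d in metric_init({}):
        print("%s = %s" % (d["name"], d["format"] % d["call_back"](d["name"])))
```

Running the file directly calls each callback once, which is a quick way to
check a module before pointing gmond at it.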


--
___
Ganglia-developers mailing list
Ganglia-developers@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/ganglia-developers


[Ganglia-developers] Backport vote for spoofed DSO metrics

2008-09-22 Thread Martin Hicks

I'd like to vote for backporting Spoofed DSO metrics.

Index: monitor-core-3.1/STATUS
===
--- monitor-core-3.1/STATUS (revision 1817)
+++ monitor-core-3.1/STATUS (working copy)
@@ -49,7 +49,7 @@
 http://ganglia.svn.sourceforge.net/viewvc/ganglia?view=rev&revision=1386
 http://ganglia.svn.sourceforge.net/viewvc/ganglia?view=rev&revision=1389
 http://ganglia.svn.sourceforge.net/viewvc/ganglia?view=rev&revision=1622
-+1: bnicholes
++1: bnicholes, mort
 carenas: apparently includes few other unrelated changes
 
   * gmond: solaris: define fabsf for solaris < 10

mh




Re: [Ganglia-developers] [PATCH] remove version from libganglia package name

2008-08-12 Thread Martin Hicks

On Tue, Aug 12, 2008 at 02:41:29PM +0100, Kostas Georgiou wrote:
> On Tue, Aug 12, 2008 at 07:38:15AM -0500, Martin Hicks wrote:
> 
> > 
> > On Mon, Aug 11, 2008 at 11:30:24PM +0200, Marcus Rueckert wrote:
> > > 
> > > this is not the package version. it is the soname mangled a bit. the
> > > base idea behind it is, that you can install multiple versions of the
> > > same library in parallel.
> > 
> > Okay.  I guess I just don't see this very often.  Are we expecting to
> > break library compatibility often?
> 
> Even if you break library compatibility there is no need for the soname
> being encoded in the rpm name for the most current version. As it is now
> during an upgrade libganglia-$soname will stay installed even if nothing
> requires it anymore.
> 
> The common practice in the rpm world is not to use the soname for the
> latest version and have something like compat-ganglia-30 or libganglia30
> for example for the older versions (no need to encode the minor version
> since changes there don't break compatibility).

this makes sense to me.  I did take a look through the list of "lib"
packages on a SLES machine and didn't see many (any?) instances of
libfoo-- packages...

mh




Re: [Ganglia-developers] [PATCH] remove version from libganglia package name

2008-08-12 Thread Martin Hicks

On Mon, Aug 11, 2008 at 11:30:24PM +0200, Marcus Rueckert wrote:
> 
> this is not the package version. it is the soname mangled a bit. the
> base idea behind it is, that you can install multiple versions of the
> same library in parallel.

Okay.  I guess I just don't see this very often.  Are we expecting to
break library compatibility often?

mh



Re: [Ganglia-developers] [PATCH] remove version from libganglia package name

2008-08-11 Thread Martin Hicks

On Mon, Aug 11, 2008 at 10:09:42AM -0500, Martin Hicks wrote:
> 
> I don't think it's necessary (or good form) to include a full version
> number in the RPM package name.  RPM already does versioning based on
> %version.

That was a little hasty.  This version has been build tested.


Index: monitor-core/ganglia.spec.in
===
--- monitor-core/ganglia.spec.in (revision 1614)
+++ monitor-core/ganglia.spec.in (working copy)
@@ -141,18 +141,19 @@
 # revisit this list. it might be libtool bloat
 Requires: expat-devel, apr-devel > 1
 %if 0%{?suse_version}
-Requires: libconfuse-devel, libexpat-devel, libapr1-devel, libganglia-3_1_0
+Requires: libconfuse-devel, libexpat-devel, libapr1-devel, libganglia
 %endif
 
 %description devel
 The Ganglia Monitoring Core library provides a set of functions that programmers
 can use to build scalable cluster or grid applications
 
-%package -n libganglia-3_1_0
+%package -n libganglia
 Summary: Ganglia Shared Libraries http://ganglia.sourceforge.net/
 Group: System Environment/Base
+Obsoletes: libganglia-3_1_0
 
-%description -n libganglia-3_1_0
+%description -n libganglia
 The Ganglia Shared Libraries contains common libraries required by both gmond and
 gmetad packages
 
@@ -228,9 +229,9 @@
/sbin/chkconfig --del gmond
 fi
 
-%post   -n libganglia-3_1_0 -p /sbin/ldconfig
+%post   -n libganglia -p /sbin/ldconfig
 
-%postun -n libganglia-3_1_0 -p /sbin/ldconfig
+%postun -n libganglia -p /sbin/ldconfig
 
 %endif #ifnarch noarch
 
@@ -361,7 +362,7 @@
 %{_libdir}/libganglia*.*a
 %{_bindir}/ganglia-config
 
-%files -n libganglia-3_1_0
+%files -n libganglia
 %defattr(-,root,root,-)
 %{_libdir}/libganglia*.so.*
 



[Ganglia-developers] [PATCH] remove version from libganglia package name

2008-08-11 Thread Martin Hicks

I don't think it's necessary (or good form) to include a full version
number in the RPM package name.  RPM already does versioning based on
%version.


Index: monitor-core/ganglia.spec.in
===
--- monitor-core/ganglia.spec.in (revision 1614)
+++ monitor-core/ganglia.spec.in (working copy)
@@ -148,11 +148,11 @@
 The Ganglia Monitoring Core library provides a set of functions that programmers
 can use to build scalable cluster or grid applications
 
-%package -n libganglia-3_1_0
+%package -n libganglia
 Summary: Ganglia Shared Libraries http://ganglia.sourceforge.net/
 Group: System Environment/Base
 
-%description -n libganglia-3_1_0
+%description -n libganglia
 The Ganglia Shared Libraries contains common libraries required by both gmond and
 gmetad packages
 




Re: [Ganglia-developers] apply BZ#36 to 3.0.x?

2008-04-16 Thread Martin Hicks

On Wed, Apr 16, 2008 at 12:50:29AM -0500, Carlo Marcelo Arenas Belon wrote:
> On Tue, Apr 15, 2008 at 02:50:30PM -0600, Brad Nicholes wrote:
> > >>> On 4/15/2008 at 12:27 AM, in message <[EMAIL PROTECTED]>, Carlo
> > Marcelo Arenas Belon <[EMAIL PROTECTED]> wrote:
> > > 
> > > backported and tested when applied to ganglia 3.0.7 as well, but I am
> > > afraid not the complete fix that Martin was probably expecting for, as
> > > the heartbeat age is not calculated correctly either way.
> 
> I might have uncovered another bug when testing it as I was using an adhoc
> cluster with only 1 node to test it.

If this fix is buggy and/or incomplete, then I don't think it should be
applied to 3.0.x.  I just saw this contributed fix with no additional
comments on the bug, so I thought it might be a candidate for the stable
tree.

thanks
mh


> 
> in this scenario the cluster time also stops getting updated, resulting in a
> "frozen" age.
> 
> > So what is the logic that is being used to calculate this time for the down
> > node that shows up on the cluster page?
> 
> from show_node.php, lines 68 to 71
> 
> # Compute time of last heartbeat from node's dendrite.
> $clustertime=$cluster['LOCALTIME'];
> $heartbeat=$hostattrs['REPORTED'];
> $age = $clustertime - $heartbeat;
> 
> > The last heartbeat time seems to be correct here.
> 
> right, if there are more nodes in the cluster, it will be correct.
> 
> Carlo
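
The arithmetic quoted from show_node.php can be sketched in Python to show
why the age looks frozen when the cluster time stops advancing.  The
timestamps are invented for illustration; only the subtraction comes from
the quoted code.

```python
def heartbeat_age(cluster_localtime, host_reported):
    """Mirror of the quoted PHP: age = cluster LOCALTIME - host REPORTED."""
    return cluster_localtime - host_reported


# Illustrative timestamps in seconds (not taken from a real cluster).
last_heartbeat = 1000

# Cluster time keeps advancing, so the age of the down node keeps growing.
ages_cluster_time_advancing = [
    heartbeat_age(t, last_heartbeat) for t in (1020, 1080, 1140)
]

# Cluster time stops advancing, so every page load computes the same age.
ages_cluster_time_frozen = [
    heartbeat_age(1020, last_heartbeat) for _ in range(3)
]

print(ages_cluster_time_advancing)
print(ages_cluster_time_frozen)
```

The second list never changes, which matches the "frozen" age described for
the one-node test cluster.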



[Ganglia-developers] apply BZ#36 to 3.0.x?

2008-04-14 Thread Martin Hicks

Hi,

After the recent discussions about creating another 3.0.x release...

Would the community be willing to apply the patch to fix BZ#36?  The
patch has been on the bug for three years with no updates...

http://bugzilla.ganglia.info/cgi-bin/bugzilla/show_bug.cgi?id=36

mh




Re: [Ganglia-developers] patch for gmond to chop domain name

2008-03-03 Thread Martin Hicks

On Mon, Mar 03, 2008 at 11:21:59AM -0600, Michael Sternberg wrote:
> >resolving fqdn, and others (user defined values, injected via gmetric)
> >were using just hostname.
> >
> >I ended up patching ganglia's apr_getnameinfo() to use NI_NOFQDN
> 
> Elegant!
> 
> It'd be nice to just patch the call in gmond.c, but looks like it's a  
> pain to portably pull in netdb.h.

Yeah, I threw portability to the wind, and I knew it.

mh




Re: [Ganglia-developers] patch for gmond to chop domain name

2008-03-03 Thread Martin Hicks

On Sun, Mar 02, 2008 at 10:37:10PM -0600, Michael Sternberg wrote:
> On Mar 2, 2008, at 21:04 , Carlo Marcelo Arenas Belon wrote:
> > On Sun, Mar 02, 2008 at 01:34:35PM -0600, Michael Sternberg wrote:
> >>
> >> Here's a simple patch for gmond/gmond.c to chop domain names off the
> >> ganglia web interface
> >
> > why not doing the change in the web interface then?
> >
> > Carlo
> 
> Good point.
> 
> (a) I'd have lost history because the rrds were originally created with
> short names.  I realize now it's a matter of renaming the host-specific
> RRA files in /var/lib/ganglia/rrds/.
> 
> 
> (b) The name resolution business was too finicky, and I was not alone  
> in this, re:

I was getting duplicates in the web view because some things were
resolving fqdn, and others (user defined values, injected via gmetric)
were using just hostname.

I ended up patching ganglia's apr_getnameinfo() to use NI_NOFQDN

mh
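
The duplicate-host effect described above can be illustrated with plain
string handling.  This is only a sketch of the symptom and of normalising to
the short name; it is not what apr_getnameinfo() or NI_NOFQDN do internally,
and the host names are hypothetical.

```python
def short_hostname(name):
    """Return the part of a host name before the first dot.

    Dotted IPv4 addresses are returned unchanged so they are not truncated.
    """
    parts = name.split(".")
    if len(parts) == 4 and all(p.isdigit() for p in parts):
        return name
    return parts[0]


def group_by_short_name(reported_names):
    """Group reported host names that refer to the same short host name."""
    groups = {}
    for name in reported_names:
        groups.setdefault(short_hostname(name), []).append(name)
    return groups


# Hypothetical names: one source reports the FQDN, another the bare hostname.
reported = ["node1.cluster.example.com", "node1", "node2"]
print(group_by_short_name(reported))
```

Any group with more than one entry is a host that would show up twice in a
view keyed on the reported name.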




Re: [Ganglia-developers] gmond Spoof memory leak fix

2008-02-25 Thread Martin Hicks

On Sat, Feb 23, 2008 at 04:32:20PM -0600, Carlo Marcelo Arenas Belon wrote:
> 
>   gmond.c: In function 'Ganglia_message_save':
>   gmond.c:840: warning: passing argument 1 of 'xdr_free' from incompatible pointer type
>   gmond.c:840: warning: passing argument 2 of 'xdr_free' from incompatible pointer type
> 
> attached patch silences it.

Ah okay.  I don't see those warnings.  Thanks for the update.

mh

> 
> Carlo

> Index: gmond/gmond.c
> ===
> --- gmond/gmond.c (revision 993)
> +++ gmond/gmond.c (working copy)
> @@ -837,7 +837,7 @@
>  
>metric->message.id = metric_user_defined;
>metric->message.Ganglia_message_u.gmetric = message->Ganglia_message_u.spmetric.gmetric;
> -  xdr_free(xdr_Ganglia_spoof_header, &message->Ganglia_message_u.spmetric.spheader);
> +  xdr_free((xdrproc_t)xdr_Ganglia_spoof_header, (char *)&(message->Ganglia_message_u.spmetric.spheader));
>  
>}else{
>memcpy(&(metric->message), message, sizeof(Ganglia_message));



Re: [Ganglia-developers] gmond Spoof memory leak fix

2008-02-20 Thread Martin Hicks

On Wed, Feb 20, 2008 at 01:18:33PM -0700, Brad Nicholes wrote:
>   I don't believe that we have the same problem in trunk, however some
>   additional testing couldn't hurt.  The spoof packet handling as well
>   as the way that the XDR data is handled in general, has changed
>   significantly in trunk.  I have gone through the trunk code
>   specifically looking for cases where xdr_free() was not being
>   called.  I checked in a few memory leaks patches a couple of weeks
>   ago that were directly related to xdr_free() not being called.  So I
>   am hoping that these issues have already been nailed in trunk.

I'll try to test-drive ganglia-3.1.x on the Altix ICE stuff soon.

mh




Re: [Ganglia-developers] gmond Spoof memory leak fix

2008-02-20 Thread Martin Hicks

On Wed, Feb 20, 2008 at 10:27:33AM -0800, Martin Knoblauch wrote:
> Hi,
> 
>  if you resend it as an attachment, I would apply the fix.

You can apply it with my blabbering at the beginning. :)
patch ignores the stuff before the ---

The patch is attached for your convenience.

> 
> Cheers
> Martin
> PS: How is life at SGI nowadays?

Seems okay.  I just got here recently. :)

mh

--- ganglia-3.0.6.200802141157/gmond/gmond.c 2008-02-14 14:58:58.0 -0500
+++ ganglia-3.0.6.200802141157.mod/gmond/gmond.c 2008-02-20 11:46:23.0 -0500
@@ -831,11 +831,13 @@ Ganglia_message_save( Ganglia_host *host
   /* Copy in the data */
   // Yemi
   if(message->id == spoof_metric){
-// Store data as regular gmetric in hash table!!
+  /* Store data as regular gmetric in hash table!!
+   * Free the Spoof-related strings.
+   */
 
-  metric->message.id = metric_user_defined;   
+  metric->message.id = metric_user_defined;
   metric->message.Ganglia_message_u.gmetric = message->Ganglia_message_u.spmetric.gmetric;
-
+  xdr_free(xdr_Ganglia_spoof_header, &message->Ganglia_message_u.spmetric.spheader);
 
   }else{
   memcpy(&(metric->message), message, sizeof(Ganglia_message));


[Ganglia-developers] gmond Spoof memory leak fix

2008-02-20 Thread Martin Hicks

Hi,

Here's a patch against ganglia-3.0.6.200802141157 that fixes a memory
leak when using user defined metrics with spoofing.

The problem was that the spmetric was being copied out, ignoring the
spheader.  The strings that were allocated inside the spheader were
dropped.

mh

--- ganglia-3.0.6.200802141157/gmond/gmond.c 2008-02-14 14:58:58.0 -0500
+++ ganglia-3.0.6.200802141157.mod/gmond/gmond.c 2008-02-20 11:46:23.0 -0500
@@ -831,11 +831,13 @@ Ganglia_message_save( Ganglia_host *host
   /* Copy in the data */
   // Yemi
   if(message->id == spoof_metric){
-// Store data as regular gmetric in hash table!!
+  /* Store data as regular gmetric in hash table!!
+   * Free the Spoof-related strings.
+   */
 
-  metric->message.id = metric_user_defined;   
+  metric->message.id = metric_user_defined;
   metric->message.Ganglia_message_u.gmetric = message->Ganglia_message_u.spmetric.gmetric;
-
+  xdr_free(xdr_Ganglia_spoof_header, &message->Ganglia_message_u.spmetric.spheader);
 
   }else{
   memcpy(&(metric->message), message, sizeof(Ganglia_message));
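
The ownership problem the patch addresses can be modelled with a toy
allocator.  This is a simplified Python illustration, not gmond's code: the
TrackingHeap class and the function names are hypothetical, and it only
mirrors the idea that the decoded spoof header owns two strings that nobody
releases once the gmetric part has been copied out.

```python
class TrackingHeap:
    """Toy allocator that records which blocks are still outstanding."""

    def __init__(self):
        self.live = {}
        self._next_id = 0

    def alloc(self, data):
        self._next_id += 1
        self.live[self._next_id] = data
        return self._next_id

    def free(self, block_id):
        del self.live[block_id]


def decode_spoof_message(heap, spoof_name, spoof_ip, metric_name):
    """Decoding allocates the two header strings and the metric payload."""
    return {
        "spheader": [heap.alloc(spoof_name), heap.alloc(spoof_ip)],
        "gmetric": heap.alloc(metric_name),
    }


def save_message(heap, message, free_header):
    """Keep the gmetric part; optionally release the header strings."""
    stored = message["gmetric"]  # ownership moves to the stored metric
    if free_header:
        for block in message["spheader"]:
            heap.free(block)
    return stored


def leaked_blocks(free_header):
    """Return the blocks still allocated after the stored metric is released."""
    heap = TrackingHeap()
    msg = decode_spoof_message(heap, "myhost", "10.0.0.1", "bleh")
    stored = save_message(heap, msg, free_header)
    heap.free(stored)  # the stored metric itself is released later
    return sorted(heap.live.values())


print(leaked_blocks(free_header=False))
print(leaked_blocks(free_header=True))
```

Without the extra free, the two header strings stay outstanding for every
spoofed message; with it, nothing is left behind.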




Re: [Ganglia-developers] Memory leak in gmond

2008-02-19 Thread Martin Hicks

On Tue, Feb 19, 2008 at 08:17:27AM -0700, Brad Nicholes wrote:
> 
> All of the other memory leak fixes in 3.1.0 were specific to that code
> base.  Although there might be something similar going on in 3.0.x.
> The other memory leak fixes dealt with the XDR functions that create
> and free the XDR data.  There were instances in some of the new code
> that I wrote where XDR data structures were being created but not
> freed.  There could be similar instances in the 3.0.x code base.  We
> would just have to take a closer look at the code path that begins
> from process_udp_recv_channel() when a metric packet is being read and
> stored by other gmond nodes.

I still haven't figured out where we should be freeing this memory, or
how we're dropping the pointers on the floor (mostly due to still
figuring out how gmond works, and how XDR works).

It is trivially reproducible.  Just inject any metric you want with the
spoof "-S" argument.  Both of the strings will be leaked.

E.g., gmetric -n "bleh" -v 5 -t uint8 -u "goobers" -S 10.0.0.1:myhost

valgrind will report two blocks for a total of 16 bytes as being lost.

"10.0.0.1\0" and "myhost\0" would be the blocks, I believe.

mh
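
The 16-byte figure is consistent with those two strings, assuming each lost
block is exactly the string plus its terminating NUL.  A quick check:

```python
def c_string_size(text):
    """Bytes needed for an ASCII C string, including the trailing NUL."""
    return len(text.encode("ascii")) + 1


# The two spoof strings from the gmetric example above.
blocks = ["10.0.0.1", "myhost"]
sizes = [c_string_size(b) for b in blocks]
print(sizes, sum(sizes))
```

That gives 9 bytes and 7 bytes, 16 in total.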



Re: [Ganglia-developers] Memory leak in gmond

2008-02-19 Thread Martin Hicks

On Mon, Feb 18, 2008 at 10:41:08PM -0600, Carlo Marcelo Arenas Belon wrote:
> On Tue, Feb 19, 2008 at 09:43:21AM +0530, Kumar Vaibhav wrote:
> > 
> > Did you try the latest patched version that Bernard sent last Friday?
> > A lot of memory leak fixes have been done.
> 
> Vaibhav, the only memory leak fixed in the last beta was the one you
> reported.

I was testing with the 3.0.6. version released last
week.

> the development version (which will be 3.1.0 when released) has some more
> "memory leak" like fixes and the report from Martin might imply another one
> needs also backporting or fixing.

Must be.  I'm still confused by the xdr stuff, so I have no idea why it
might be happening.  It's probably related to the strings allocated for
spoofing.

mh




Re: [Ganglia-developers] Memory leak in gmond

2008-02-18 Thread Martin Hicks

On Tue, Jan 22, 2008 at 04:17:07PM +0530, Kumar Vaibhav wrote:
> I am using ganglia-3.0.5 on a woodcrest processor cluster. and I see 
> that after running for weeks the memory consumption of the gmond process 
> is something about 400 MB. I tried to debug the problem by isolating a 
> single node. But the problem continues with slower rate (rss memory 
> growth). I tried to run the

I have another memory leak, I think.  I'm using spoofed metrics, and I
see a lot of memory being leaked in gmond:

==30082== 66,275 bytes in 5,924 blocks are definitely lost in loss record 18 of 19
==30082==    at 0x4A1FDEB: malloc (vg_replace_malloc.c:207)
==30082==    by 0x53E52D3: xdr_string (in /lib64/libc-2.4.so)
==30082==    by 0x40D9CD: xdr_Ganglia_spoof_header (protocol_xdr.c:45)
==30082==    by 0x40DAC8: xdr_Ganglia_spoof_message (protocol_xdr.c:57)
==30082==    by 0x40DC62: xdr_Ganglia_message (protocol_xdr.c:87)
==30082==    by 0x404BE9: process_udp_recv_channel (gmond.c:903)
==30082==    by 0x405D5D: main (gmond.c:1277)

I'm still investigating.  This leak is quick in my application...around
1MB every twenty minutes.

mh


