[Ganglia-developers] Replacing core metrics with Python metric modules
I have a situation where a mechanism (Performance Co-Pilot) is already collecting metrics on each compute host in a cluster and pushing them up to the head node. I was wondering if it is possible to write a Python metric module that replaces the core set of metrics that gmond usually collects on the compute node, and instead grabs the data from the PCP instance running on the head node.

Are there any real differences between the metrics that are normally collected by gmond and the user-defined metrics collected by a Python module? The goal is to avoid collecting these metrics twice on each compute host.

Thanks
mh

--
___
Ganglia-developers mailing list
Ganglia-developers@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/ganglia-developers
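A minimal sketch of what such a module could look like, assuming the gmond 3.1 Python module interface (a metric_init(params) function that returns descriptor dictionaries, a per-metric callback, and metric_cleanup()). The fetch_from_collector() helper, the pcp_load_one metric name, and the refresh interval are hypothetical, and the actual PCP query is left as a placeholder. The sketch also does not address how values would be attributed to a compute node rather than to the host running gmond.

```python
# Sketch of a gmond Python metric module that reports values collected
# elsewhere instead of sampling the local host.
import time

# Hypothetical cache of the last values pulled from the external collector.
_cache = {"values": {}, "stamp": 0.0}
_REFRESH_SECONDS = 15  # assumed refresh interval, not a gmond default


def fetch_from_collector():
    """Placeholder: return {metric_name: value} from the external source.

    A real module would query PCP here; that call is deliberately left out
    because the PCP client API is not covered in this thread.
    """
    return {"pcp_load_one": 0.0}


def _refresh():
    """Re-read the external source once the cached values are stale."""
    now = time.time()
    if now - _cache["stamp"] >= _REFRESH_SECONDS:
        _cache["values"] = fetch_from_collector()
        _cache["stamp"] = now


def get_value(name):
    """Callback invoked with the metric name; returns the cached value."""
    _refresh()
    return float(_cache["values"].get(name, 0.0))


def metric_init(params):
    """Return the list of metric descriptors this module provides."""
    return [{
        "name": "pcp_load_one",
        "call_back": get_value,
        "time_max": 90,
        "value_type": "float",
        "units": "",
        "slope": "both",
        "format": "%f",
        "description": "One-minute load reported by the external collector",
        "groups": "pcp",
    }]


def metric_cleanup():
    """Called once when the module is shut down; nothing to release here."""
    pass


if __name__ == "__main__":
    # Stand-alone check, outside gmond: list each metric and its value.
    for descriptor in metric_init({}):
        print(descriptor["name"], descriptor["call_back"](descriptor["name"]))
```

Caching the fetched values keeps the number of round trips to the external collector independent of how many metrics the module exports.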
[Ganglia-developers] Backport vote for spoofed DSO metrics
I'd like to vote for backporting Spoofed DSO metrics.

Index: monitor-core-3.1/STATUS
===
--- monitor-core-3.1/STATUS (revision 1817)
+++ monitor-core-3.1/STATUS (working copy)
@@ -49,7 +49,7 @@
   http://ganglia.svn.sourceforge.net/viewvc/ganglia?view=rev&revision=1386
   http://ganglia.svn.sourceforge.net/viewvc/ganglia?view=rev&revision=1389
   http://ganglia.svn.sourceforge.net/viewvc/ganglia?view=rev&revision=1622
-  +1: bnicholes
+  +1: bnicholes, mort
   carenas: apparently includes few other unrelated changes
 
 * gmond: solaris: define fabsf for solaris < 10

mh
Re: [Ganglia-developers] [PATCH] remove version from libganglia package name
On Tue, Aug 12, 2008 at 02:41:29PM +0100, Kostas Georgiou wrote:
> On Tue, Aug 12, 2008 at 07:38:15AM -0500, Martin Hicks wrote:
> >
> > On Mon, Aug 11, 2008 at 11:30:24PM +0200, Marcus Rueckert wrote:
> > >
> > > this is not the package version. it is the soname mangled a bit. the
> > > base idea behind it is, that you can install multiple version of the
> > > same library in parallel.
> >
> > Okay. I guess I just don't see this very often. Are we expecting to
> > break library compatibility often?
>
> Even if you break library compatibility there is no need for the soname
> being encoded in the rpm name for the most current version. As it is now
> during an upgrade libganglia-$soname will stay installed even if nothing
> requires it anymore.
>
> The common practice in the rpm world is to not to use the soname for the
> latest version and have something like compat-ganglia-30 or libganglia30
> for example for the older versions (no need to encode the minor version
> since changes there don't break compatibility).

This makes sense to me. I did take a look through the list of "lib" packages on a SLES machine and didn't see many (any?) instances of libfoo-- packages...

mh
Re: [Ganglia-developers] [PATCH] remove version from libganglia package name
On Mon, Aug 11, 2008 at 11:30:24PM +0200, Marcus Rueckert wrote:
>
> this is not the package version. it is the soname mangled a bit. the
> base idea behind it is, that you can install multiple version of the
> same library in parallel.

Okay. I guess I just don't see this very often. Are we expecting to break library compatibility often?

mh
Re: [Ganglia-developers] [PATCH] remove version from libganglia package name
On Mon, Aug 11, 2008 at 10:09:42AM -0500, Martin Hicks wrote:
>
> I don't think its necessary (or good form) to include a full version
> number in the RPM package name. RPM already does versioning based on
> %version.

I was a little hasty with the first patch. This version has been build tested.

Index: monitor-core/ganglia.spec.in
===
--- monitor-core/ganglia.spec.in (revision 1614)
+++ monitor-core/ganglia.spec.in (working copy)
@@ -141,18 +141,19 @@
 # revisit this list. it might be libtool bloat
 Requires: expat-devel, apr-devel > 1
 %if 0%{?suse_version}
-Requires: libconfuse-devel, libexpat-devel, libapr1-devel, libganglia-3_1_0
+Requires: libconfuse-devel, libexpat-devel, libapr1-devel, libganglia
 %endif
 
 %description devel
 The Ganglia Monitoring Core library provides a set of functions that programmers
 can use to build scalable cluster or grid applications
 
-%package -n libganglia-3_1_0
+%package -n libganglia
 Summary: Ganglia Shared Libraries http://ganglia.sourceforge.net/
 Group: System Environment/Base
+Obsoletes: libganglia-3_1_0
 
-%description -n libganglia-3_1_0
+%description -n libganglia
 The Ganglia Shared Libraries contains common libraries required by both gmond and
 gmetad packages
 
@@ -228,9 +229,9 @@
 /sbin/chkconfig --del gmond
 fi
 
-%post -n libganglia-3_1_0 -p /sbin/ldconfig
+%post -n libganglia -p /sbin/ldconfig
 
-%postun -n libganglia-3_1_0 -p /sbin/ldconfig
+%postun -n libganglia -p /sbin/ldconfig
 
 %endif #ifnarch noarch
 
@@ -361,7 +362,7 @@
 %{_libdir}/libganglia*.*a
 %{_bindir}/ganglia-config
 
-%files -n libganglia-3_1_0
+%files -n libganglia
 %defattr(-,root,root,-)
 %{_libdir}/libganglia*.so.*
 
[Ganglia-developers] [PATCH] remove version from libganglia package name
I don't think it's necessary (or good form) to include a full version number in the RPM package name. RPM already does versioning based on %version.

Index: monitor-core/ganglia.spec.in
===
--- monitor-core/ganglia.spec.in (revision 1614)
+++ monitor-core/ganglia.spec.in (working copy)
@@ -148,11 +148,11 @@
 The Ganglia Monitoring Core library provides a set of functions that programmers
 can use to build scalable cluster or grid applications
 
-%package -n libganglia-3_1_0
+%package -n libganglia
 Summary: Ganglia Shared Libraries http://ganglia.sourceforge.net/
 Group: System Environment/Base
 
-%description -n libganglia-3_1_0
+%description -n libganglia
 The Ganglia Shared Libraries contains common libraries required by both gmond and
 gmetad packages
 
Re: [Ganglia-developers] apply BZ#36 to 3.0.x?
On Wed, Apr 16, 2008 at 12:50:29AM -0500, Carlo Marcelo Arenas Belon wrote:
> On Tue, Apr 15, 2008 at 02:50:30PM -0600, Brad Nicholes wrote:
> > >>> On 4/15/2008 at 12:27 AM, in message <[EMAIL PROTECTED]>, Carlo
> > Marcelo Arenas Belon <[EMAIL PROTECTED]> wrote:
> > >
> > > backported and tested when applied to ganglia 3.0.7 as well, but I am afraid
> > > not the complete fix that Martin was probably expecting for, as the heartbeat
> > > age is not calculated correctly either way.
>
> I might had uncover another bug when testing it as I was using an adhoc
> cluster with only 1 node to test it.

If this fix is buggy and/or incomplete, then I don't think it should be applied to 3.0.x. I just saw this contributed fix with no additional comments on the bug, so I thought it might be a candidate for the stable tree.

thanks
mh

>
> in this scenario the cluster time also stop getting updating resulting in a
> "frozen" age.
>
> > So what is the logic that is being used to calculate this time for the down
> > node that show up on the cluster page?
>
> from show_node.php, lines 68 to 71
>
>   # Compute time of last heartbeat from node's dendrite.
>   $clustertime=$cluster['LOCALTIME'];
>   $heartbeat=$hostattrs['REPORTED'];
>   $age = $clustertime - $heartbeat;
>
> > The last heartbeat time seems to be correct here.
>
> right, if there are more nodes in the cluster, it will be correct.
>
> Carlo
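The "frozen" age Carlo describes follows directly from the quoted arithmetic. The sketch below mirrors the show_node.php calculation in Python. The timestamps are made-up epoch seconds, and the premise that LOCALTIME stops advancing in a one-node cluster is taken from Carlo's description rather than verified here.

```python
def heartbeat_age(cluster_localtime, host_reported):
    """Mirror of the show_node.php arithmetic: age = LOCALTIME - REPORTED."""
    return cluster_localtime - host_reported


# Last heartbeat of the node that went down (made-up epoch seconds).
reported = 1208300000

# Multi-node cluster: the remaining nodes keep LOCALTIME moving, so the
# age of the dead node grows on every page load.
multi_node = [heartbeat_age(reported + dt, reported) for dt in (20, 80, 140)]

# Single-node cluster: when the only node stops, LOCALTIME stops as well,
# so every later page load computes the same age.
frozen_localtime = reported + 20
single_node = [heartbeat_age(frozen_localtime, reported) for _ in range(3)]

print(multi_node)   # [20, 80, 140]
print(single_node)  # [20, 20, 20]
```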
[Ganglia-developers] apply BZ#36 to 3.0.x?
Hi,

After the recent discussions about creating another 3.0.x release...

Would the community be willing to apply the patch to fix BZ#36? The patch has been on the bug for three years with no updates...

http://bugzilla.ganglia.info/cgi-bin/bugzilla/show_bug.cgi?id=36

mh
Re: [Ganglia-developers] patch for gmond to chop domain name
On Mon, Mar 03, 2008 at 11:21:59AM -0600, Michael Sternberg wrote:
> >resolving fqdn, and others (user defined values, injected via gmetric)
> >were using just hostname.
> >
> >I ended up patching ganglia's apr_getnameinfo() to use NI_NOFQDN
>
> Elegant!
>
> It'd be nice to just patch the call in gmond.c, but looks like it's a
> pain to portably pull in netdb.h.

Yeah, I threw portability to the wind, and I knew it.

mh
Re: [Ganglia-developers] patch for gmond to chop domain name
On Sun, Mar 02, 2008 at 10:37:10PM -0600, Michael Sternberg wrote:
> On Mar 2, 2008, at 21:04 , Carlo Marcelo Arenas Belon wrote:
> > On Sun, Mar 02, 2008 at 01:34:35PM -0600, Michael Sternberg wrote:
> >>
> >> Here's a simple patch for gmond/gmond.c to chop domain names off the
> >> ganglia web interface
> >
> > why not doing the change in the web interface then?
> >
> > Carlo
>
> Good point.
>
> (a) I'd have lost history because the rrds were originally created
> with short names. I realize now it's a matter of renaming the
> host-specific RRA files in /var/lib/ganglia/rrds/.
>
>
> (b) The name resolution business was too finicky, and I was not alone
> in this, re:

I was getting duplicates in the web view because some things were resolving to the FQDN, while others (user-defined values, injected via gmetric) were using just the hostname.

I ended up patching ganglia's apr_getnameinfo() to use NI_NOFQDN.

mh
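To illustrate the duplicate-host problem described above: samples keyed by "n1.example.com" and by "n1" end up as two hosts unless the names are normalized. The sketch below is not the C patch from this thread (that one passes NI_NOFQDN to the name lookup); it only shows the normalization idea, and short_hostname(), merge_by_short_name(), and the sample host names are hypothetical.

```python
import ipaddress


def short_hostname(name):
    """Drop the domain part of a host name; leave IP literals untouched."""
    try:
        ipaddress.ip_address(name)
        return name
    except ValueError:
        return name.split(".", 1)[0]


def merge_by_short_name(samples):
    """Group (host, metric, value) samples under the short host name."""
    merged = {}
    for host, metric, value in samples:
        merged.setdefault(short_hostname(host), {})[metric] = value
    return merged


if __name__ == "__main__":
    samples = [
        ("n1.example.com", "load_one", 0.5),  # resolved to the FQDN
        ("n1", "bleh", 5),                    # injected with the bare hostname
    ]
    print(merge_by_short_name(samples))
```

Chopping at the first dot is only safe when short names are unique across the cluster, which is the same assumption a short-name setup makes anyway.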
Re: [Ganglia-developers] gmond Spoof memory leak fix
On Sat, Feb 23, 2008 at 04:32:20PM -0600, Carlo Marcelo Arenas Belon wrote:
>
> gmond.c: In function 'Ganglia_message_save':
> gmond.c:840: warning: passing argument 1 of 'xdr_free' from incompatible pointer type
> gmond.c:840: warning: passing argument 2 of 'xdr_free' from incompatible pointer type
>
> attached patch silences it.

Ah okay. I don't see those warnings. Thanks for the update.

mh

>
> Carlo

> Index: gmond/gmond.c
> ===
> --- gmond/gmond.c (revision 993)
> +++ gmond/gmond.c (working copy)
> @@ -837,7 +837,7 @@
>
>        metric->message.id = metric_user_defined;
>        metric->message.Ganglia_message_u.gmetric = message->Ganglia_message_u.spmetric.gmetric;
> -      xdr_free(xdr_Ganglia_spoof_header, &message->Ganglia_message_u.spmetric.spheader);
> +      xdr_free((xdrproc_t)xdr_Ganglia_spoof_header, (char *)&(message->Ganglia_message_u.spmetric.spheader));
>
>      }else{
>        memcpy(&(metric->message), message, sizeof(Ganglia_message));
Re: [Ganglia-developers] gmond Spoof memory leak fix
On Wed, Feb 20, 2008 at 01:18:33PM -0700, Brad Nicholes wrote:
> I don't believe that we have the same problem in trunk, however some
> additional testing couldn't hurt. The spoof packet handling as well
> as the way that the XDR data is handled in general, has changed
> significantly in trunk. I have gone through the trunk code
> specifically looking for cases where xdr_free() was not being
> called. I checked in a few memory leaks patches a couple of weeks
> ago that were directly related to xdr_free() not being called. So I
> am hoping that these issues have already been nailed in trunk.

I'll try to test-drive ganglia-3.1.x on the Altix ICE stuff soon.

mh
Re: [Ganglia-developers] gmond Spoof memory leak fix
On Wed, Feb 20, 2008 at 10:27:33AM -0800, Martin Knoblauch wrote:
> Hi,
>
> if you resend it as an attachment, I would apply the fix.

You can apply it with my blabbering at the beginning. :) patch ignores everything before the "---" line.

The patch is attached for your convenience.

>
> Cheers
> Martin
> PS: How is life at SGI nowadays?

Seems okay. I just got here recently. :)

mh

--- ganglia-3.0.6.200802141157/gmond/gmond.c 2008-02-14 14:58:58.0 -0500
+++ ganglia-3.0.6.200802141157.mod/gmond/gmond.c 2008-02-20 11:46:23.0 -0500
@@ -831,11 +831,13 @@ Ganglia_message_save( Ganglia_host *host
   /* Copy in the data */
   // Yemi
   if(message->id == spoof_metric){
-//    Store data as regular gmetric in hash table!!
+      /* Store data as regular gmetric in hash table!!
+       * Free the Spoof-related strings.
+       */
 
-       metric->message.id = metric_user_defined;
+      metric->message.id = metric_user_defined;
       metric->message.Ganglia_message_u.gmetric = message->Ganglia_message_u.spmetric.gmetric;
-
+      xdr_free(xdr_Ganglia_spoof_header, &message->Ganglia_message_u.spmetric.spheader);
 
   }else{
       memcpy(&(metric->message), message, sizeof(Ganglia_message));
[Ganglia-developers] gmond Spoof memory leak fix
Hi,

Here's a patch against ganglia-3.0.6.200802141157 that fixes a memory leak when using user-defined metrics with spoofing.

The problem was that only the gmetric part of the spmetric was being copied out, ignoring the spheader. The strings that were allocated inside the spheader were dropped without being freed.

mh

--- ganglia-3.0.6.200802141157/gmond/gmond.c 2008-02-14 14:58:58.0 -0500
+++ ganglia-3.0.6.200802141157.mod/gmond/gmond.c 2008-02-20 11:46:23.0 -0500
@@ -831,11 +831,13 @@ Ganglia_message_save( Ganglia_host *host
   /* Copy in the data */
   // Yemi
   if(message->id == spoof_metric){
-//    Store data as regular gmetric in hash table!!
+      /* Store data as regular gmetric in hash table!!
+       * Free the Spoof-related strings.
+       */
 
-       metric->message.id = metric_user_defined;
+      metric->message.id = metric_user_defined;
       metric->message.Ganglia_message_u.gmetric = message->Ganglia_message_u.spmetric.gmetric;
-
+      xdr_free(xdr_Ganglia_spoof_header, &message->Ganglia_message_u.spmetric.spheader);
 
   }else{
       memcpy(&(metric->message), message, sizeof(Ganglia_message));
Re: [Ganglia-developers] Memory leak in gmond
On Tue, Feb 19, 2008 at 08:17:27AM -0700, Brad Nicholes wrote:
>
> All of the other memory leak fixes in 3.1.0 were specific to that code
> base. Although there might be something similar going on in 3.0.x.
> The other memory leak fixes dealt with the XDR functions that create
> and free the XDR data. There were instances in some of the new code
> that I wrote where XDR data structures were being created but not
> freed. There could be similar instances in the 3.0.x code base. We
> would just have to take a closer look at the code path that begins
> from process_udp_recv_channel() when a metric packet is being read and
> stored by other gmond nodes.

I still haven't figured out where we should be freeing this memory, or how we're dropping the pointers on the floor (mostly due to still figuring out how gmond works, and how XDR works).

It is trivially reproducible. Just inject any metric you want with the spoof "-S" argument. Both of the strings will be leaked. E.g.,

gmetric -n "bleh" -v 5 -t uint8 -u "goobers" -S 10.0.0.1:myhost

valgrind will report two blocks for a total of 16 bytes as being lost. "10.0.0.1\0" and "myhost\0" would be the blocks, I believe.

mh
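The reproduction above can be scripted. This is a rough sketch: the gmetric flags are the ones from the message, while the helper names and the repeat count of 100 are hypothetical. expected_leak_bytes() checks the 16-byte figure under the assumption stated in the message, namely that each spoof string is lost together with its terminating NUL byte.

```python
import shutil
import subprocess


def spoof_command(name, value, mtype, units, spoof):
    """Build the gmetric argument list used in the message above."""
    return ["gmetric", "-n", name, "-v", str(value),
            "-t", mtype, "-u", units, "-S", spoof]


def expected_leak_bytes(spoof):
    """Bytes lost per message if both spoof strings leak (with NUL bytes)."""
    ip, host = spoof.split(":", 1)
    return (len(ip) + 1) + (len(host) + 1)


if __name__ == "__main__":
    spoof = "10.0.0.1:myhost"
    cmd = spoof_command("bleh", 5, "uint8", "goobers", spoof)
    print(expected_leak_bytes(spoof))  # 16
    if shutil.which("gmetric"):        # only inject if gmetric is installed
        for _ in range(100):           # hypothetical repeat count
            subprocess.run(cmd, check=False)
```

Running the receiving gmond under valgrind while this loop injects metrics should make the lost blocks grow with the number of messages, if the leak is present in that build.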
Re: [Ganglia-developers] Memory leak in gmond
On Mon, Feb 18, 2008 at 10:41:08PM -0600, Carlo Marcelo Arenas Belon wrote:
> On Tue, Feb 19, 2008 at 09:43:21AM +0530, Kumar Vaibhav wrote:
> >
> > Did You tried the latest patched Version that Bernard send on last
> > friday. A lot of memory leak fixes have been done.
>
> Vaibhav, the only memory leak fixed in the last beta was the one your
> reported.

I was testing with the 3.0.6 version released last week.

> the development version (which will be 3.1.0 when released) has some more
> "memory leak" like fixes and the report from Martin might imply another one
> needs also backporting or fixing.

Must be. I'm still confused by the XDR stuff, so I have no idea why it might be happening. It's probably related to the strings allocated for spoofing.

mh
Re: [Ganglia-developers] Memory leak in gmond
On Tue, Jan 22, 2008 at 04:17:07PM +0530, Kumar Vaibhav wrote:
> I am using ganglia-3.0.5 on a woodcrest processor cluster. and I see
> that after running for weeks the memory consumption of the gmond process
> is something about 400 MB. I tried to debug the problem by isolating a
> single node. But the problem continues with slower rate (rss memory
> growth). I tried to run the

I have another memory leak, I think. I'm using spoofed metrics, and I see a lot of memory being leaked in gmond:

==30082== 66,275 bytes in 5,924 blocks are definitely lost in loss record 18 of 19
==30082==    at 0x4A1FDEB: malloc (vg_replace_malloc.c:207)
==30082==    by 0x53E52D3: xdr_string (in /lib64/libc-2.4.so)
==30082==    by 0x40D9CD: xdr_Ganglia_spoof_header (protocol_xdr.c:45)
==30082==    by 0x40DAC8: xdr_Ganglia_spoof_message (protocol_xdr.c:57)
==30082==    by 0x40DC62: xdr_Ganglia_message (protocol_xdr.c:87)
==30082==    by 0x404BE9: process_udp_recv_channel (gmond.c:903)
==30082==    by 0x405D5D: main (gmond.c:1277)

I'm still investigating. This leak is quick in my application: around 1MB every twenty minutes.

mh