Re: [Ganglia-general] Ganglia gmond memory leak?
I ran a test on two new systems, one with the modules commented out and one with the modules running. The one without the modules grew from 3MB to 49MB memory usage in 3 days, and the one with the modules grew from 3MB to 50MB in 3 days. These were two freshly configured LPARs with nothing else running on them, on AIX 6.1 TL5 SP6.

Is there a precompiled binary for valgrind on AIX? We try not to install compilers on our systems for security reasons. However, if not available, I can install it on a lab system to run with gmond.

Thanks

--
John Wiebalk
Operating System Engineer
UNIX | Enterprise Technology Infrastructure
Phone: 412-647-3881
Email: wie...@upmc.edu

From: Wiebalk, John
Sent: Monday, March 12, 2012 1:47 PM
To: 'Ganglia-general@lists.sourceforge.net'
Subject: Re: [Ganglia-general] Ganglia gmond memory leak?

We are also experiencing this issue at our site. We are running Ganglia 3.2 on AIX. We recently upgraded from 3.0.7 and started experiencing this issue. We used the rpm / ibm metrics from http://www.perzl.org/ganglia/

Has anyone tested to see if this issue still exists in a newer version of ganglia?

--
John Wiebalk
Operating System Engineer
UNIX | Enterprise Technology Infrastructure
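To put the two test results side by side, the growth can be expressed as a rate. The following is only a back-of-the-envelope sketch: it assumes the growth is linear, which the message does not establish, and the 1024 MB limit used in the projection is a hypothetical figure chosen for illustration.

```python
def growth_rate_mb_per_day(start_mb: float, end_mb: float, days: float) -> float:
    """Average memory growth in MB per day over the observation window."""
    if days <= 0:
        raise ValueError("days must be positive")
    return (end_mb - start_mb) / days


def days_until(limit_mb: float, current_mb: float, rate_mb_per_day: float) -> float:
    """Days until current_mb reaches limit_mb, ASSUMING the growth stays linear."""
    if rate_mb_per_day <= 0:
        return float("inf")
    return max(0.0, (limit_mb - current_mb) / rate_mb_per_day)


# Figures from the message: 3 MB -> 49 MB without modules,
# 3 MB -> 50 MB with modules, 3 days each.
rate_without_modules = growth_rate_mb_per_day(3, 49, 3)
rate_with_modules = growth_rate_mb_per_day(3, 50, 3)

# Hypothetical limit of 1024 MB, for illustration only.
days_to_limit = days_until(1024, 49, rate_without_modules)

if __name__ == "__main__":
    print("without modules: %.1f MB/day" % rate_without_modules)
    print("with modules:    %.1f MB/day" % rate_with_modules)
    print("days until 1024 MB (linear assumption): %.1f" % days_to_limit)
```

The two rates differ by about a third of a MB per day, which is consistent with the conclusion that the extra modules are not the main source of the growth.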
Re: [Ganglia-general] Ganglia gmond memory leak?
On 03/23/2012 12:34 PM, Wiebalk, John wrote:
> I ran a test on two new systems, one with the modules commented out and one with the modules running. The one without the modules grew from 3MB to 49MB memory usage in 3 days, and the one with the modules grew from 3MB to 50MB in 3 days. These were two freshly configured LPARs with nothing else running on them, on AIX 6.1 TL5 SP6.

Hmm, so this cannot really be attributed to the additional gmond modules; it is rather an issue in gmond itself.

> Is there a precompiled binary for valgrind on AIX? We try not to install compilers on our systems for security reasons. However, if not available, I can install it on a lab system to run with gmond.

Unfortunately, there seems to be no precompiled binary for valgrind available on AIX. I am currently trying to get it compiled on AIX and will keep you updated on my progress over the next couple of days.

Regards,
Michael
Re: [Ganglia-general] Ganglia gmond memory leak?
My gmonds always bloat quite large. To combat this, I've dedicated a single host with a massive amount of swap to run gmonds in a 'collector' role. Every time I add a new cluster, I add a new gmond instance on the collector box with a new port.

The recent discussion about setting dmax caused me to dig and find that virtually all of my custom python library metrics are setting dmax to zero, which I think is the cause of this bloat. I've failed so far at determining how to set dmax, and I haven't had much luck asking this list so far.

dmax? How do you set it?

On Tue, Mar 20, 2012 at 9:37 AM, Michael Perzl mich...@perzl.org wrote:
> Can you please try the following if possible:
>
> 1) Run gmond without any additional modules and check if it is still leaking memory. If gmond is then still leaking memory, this test would exclude any additional gmond module as the culprit for the memory leak.
>
> 2) Do you have the chance to run gmond in the foreground for some time under a tool like valgrind or Purify?
>
> Regards,
> Michael
>
> On 03/20/2012 09:27 AM, Florian Munz wrote:
>> Yes, it's also happening for me on 3.3.1.
>>
>> Anyone know how to move forward with this? I'd consider this a quite serious issue.
>>
>> Cheers,
>> Florian
>>
>> On 12.03.12 18:46, Wiebalk, John wrote:
>>> We are also experiencing this issue at our site. We are running Ganglia 3.2 on AIX. We recently upgraded from 3.0.7 and started experiencing this issue. We used the rpm / ibm metrics from http://www.perzl.org/ganglia/
>>>
>>> Has anyone tested to see if this issue still exists in a newer version of ganglia?
>>>
>>> --
>>> John Wiebalk
>>> Operating System Engineer
>>> UNIX | Enterprise Technology Infrastructure
Re: [Ganglia-general] Ganglia gmond memory leak?
On Mar 20, 2012, at 12:02 PM, David Birdsong wrote:
> My gmonds always bloat quite large. To combat this, I've dedicated a single host with a massive amount of swap to run gmonds in a 'collector' role. Every time I add a new cluster, I add a new gmond instance on the collector box with a new port.
>
> The recent discussion about setting dmax caused me to dig and find that virtually all of my custom python library metrics are setting dmax to zero, which I think is the cause of this bloat. I've failed so far at determining how to set dmax, and I haven't had much luck asking this list so far.
>
> dmax? How do you set it?

Ganglia keeps three values for each metric submitted by a client: TMAX, TN, and DMAX.

TMAX, as far as users are concerned, is informational. It indicates the interval at which ganglia expects new values to be submitted by a host.

TN indicates the number of seconds since a metric was last updated. When TN is bigger than TMAX, ganglia is still waiting for new data to store.

DMAX indicates how long old metrics should linger. If TN exceeds this number, ganglia will stop showing graphs for that metric. So, set DMAX to the interval after which you no longer care about a metric being reported on the web page. Setting it to zero tells ganglia to never consider the metric expired. Zero is appropriate for most stuff, and in a default gmond install that's what you'll get.

I would assume the only time one wants to set DMAX is in situations where the NAME attribute of a metric changes frequently (which I've seen; the torque PBS module does this). If you don't, your XML tree will quickly fill up with ancient data.
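The relationship between TN, TMAX and DMAX described above can be written down as two small predicates. This is a sketch of the rules as stated in this message, not gmond's actual source code; the `MetricAge` class and both function names are made up for illustration.

```python
from dataclasses import dataclass


@dataclass
class MetricAge:
    """Timing values kept per metric (hypothetical container, not a gmond type)."""
    tn: int    # seconds since the metric was last updated
    tmax: int  # interval at which a new value is expected
    dmax: int  # seconds after which the metric may be dropped; 0 = never


def is_overdue(m: MetricAge) -> bool:
    """True when a new value was expected but has not arrived yet (TN > TMAX)."""
    return m.tn > m.tmax


def is_expired(m: MetricAge) -> bool:
    """True when the metric is old enough to be dropped (TN > DMAX, DMAX != 0)."""
    return m.dmax > 0 and m.tn > m.dmax


if __name__ == "__main__":
    fresh = MetricAge(tn=10, tmax=60, dmax=300)
    late = MetricAge(tn=90, tmax=60, dmax=300)
    dead = MetricAge(tn=400, tmax=60, dmax=300)
    kept_forever = MetricAge(tn=400000, tmax=60, dmax=0)
    for m in (fresh, late, dead, kept_forever):
        print(m, "overdue:", is_overdue(m), "expired:", is_expired(m))
```

The last case is the one relevant to this thread: with DMAX set to zero, a metric that is no longer being sent never expires, no matter how large TN grows.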
Re: [Ganglia-general] Ganglia gmond memory leak?
On Thu, Feb 23, 2012 at 11:06 AM, Matt Massie m...@massie.us wrote:
> Each unique metric (keyed on metric name) requires memory space in gmond. A good test is to peek at the number of metrics in gmond over time, e.g.
>
> $ telnet localhost 8649 | grep METRIC | wc -l
>
> If the number of metrics increases over time, so will the memory use. Ganglia will release the metric and its memory when the age of the metric is greater than DMAX. A DMAX value of zero will cause ganglia to hold the metric indefinitely. To make sure that ganglia is releasing old metrics, set the DMAX value to something like 5 minutes (300 secs).

aside from metrics originating from gmetric, where does one set dmax? i can't find any reference on how to set it, and my understanding of gmond.conf is that host_dmax != dmax.

> For example, let's assume you are doing per-process monitoring and the metric name looks like
>
> cpu_user.%d % (pid,)
>
> Over time, you'll have lots of metrics (cpu_user.343493, cpu_user.343022, cpu_user.232323) that start accumulating and taking up memory space.
>
> -Matt
>
> On Thu, Feb 23, 2012 at 10:01 AM, svd.gang...@mylife.com wrote:
>> i observed this in the past as well. running valgrind for days did not yield any clue. i had a hunch that remote spoofed metrics were involved, as the leak seemed to get better when i had coincidentally disabled the sending of some of those spoof metrics. but we never found anything conclusive. there was also some odd race such that sometimes after restart the leak was much faster, but after restarting a few times the leak slowed (but was always still fast enough to be a burden).
>>
>> -scott
>>
>> From: Aidan Wong aidanw...@attinteractive.com
>> To: Ave-Lallemant, Nathan P nathan.p.ave-lallem...@efleets.com; ganglia-general ganglia-general@lists.sourceforge.net
>> Sent: Thursday, February 23, 2012 8:34 AM
>> Subject: Re: [Ganglia-general] Ganglia gmond memory leak?
>>
>> I've restarted the gmond process and memory usage drops, until gmond hogs memory again over time.
>>
>> Any Ganglia contributors who may want to chime in on this memory leak issue?
>>
>> I'm on Ganglia 3.2.0. Are there any improvements in version 3.3.1 addressing this issue?
>>
>> Thanks
Re: [Ganglia-general] Ganglia gmond memory leak?
I've also observed this and have been unable to find a solution. In my case at least, there was no obvious correlation with the number of metrics or with whether the gmond was an aggregator or not (so several orders of magnitude in the number of metrics did not matter; it might happen on 2 out of 80 nodes). gmond would take up physical RAM, then swap, and general sadness would ensue.

I'm unfortunately not able to provide further information, since we went to nightly gmond restarts as a workaround.

On 02/22/2012 05:10 PM, Aidan Wong wrote:
> Hi, it looks like my install of gmond version 3.2.0 is leaking memory. The amount of resident memory that the process uses gets pretty high and keeps increasing.
>
> USER  PID   %CPU %MEM VSZ     RSS     TTY STAT START TIME  COMMAND
> root  18647 0.0  9.9  2965464 1836268 ?   Ss   Jan14 11:24 /home/t/hadoop-ganglia-client/sbin/gmond -c /home/t/hadoop-ganglia-client/gmond.conf -p /home/t/hadoop-ganglia-client/logs/gmond.pid
>
> Is this a bug? Can anyone suggest a solution?
>
> Thank you
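To make the quoted ps output easier to read, the RSS column can be converted to MiB. This is a minimal sketch that assumes the `ps aux` column order shown in the quoted message (USER PID %CPU %MEM VSZ RSS ...) and assumes, as on Linux procps, that VSZ and RSS are reported in KiB; the function name is made up for illustration.

```python
def parse_ps_aux_line(line: str) -> dict:
    """Extract PID, VSZ and RSS from one `ps aux` output line.

    Assumes the column order USER PID %CPU %MEM VSZ RSS TTY STAT START TIME COMMAND
    and that VSZ/RSS are in KiB (an assumption that holds for Linux procps).
    """
    fields = line.split(None, 10)  # keep the full command line as the last field
    if len(fields) < 11:
        raise ValueError("unexpected ps aux line: %r" % line)
    vsz_kib = int(fields[4])
    rss_kib = int(fields[5])
    return {
        "pid": int(fields[1]),
        "vsz_mib": vsz_kib / 1024.0,
        "rss_mib": rss_kib / 1024.0,
        "command": fields[10],
    }


# The line quoted in the message above.
PS_LINE = (
    "root 18647 0.0 9.9 2965464 1836268 ? Ss Jan14 11:24 "
    "/home/t/hadoop-ganglia-client/sbin/gmond "
    "-c /home/t/hadoop-ganglia-client/gmond.conf "
    "-p /home/t/hadoop-ganglia-client/logs/gmond.pid"
)

info = parse_ps_aux_line(PS_LINE)

if __name__ == "__main__":
    print("pid %d: RSS %.0f MiB, VSZ %.0f MiB" % (info["pid"], info["rss_mib"], info["vsz_mib"]))
```

Under those assumptions the quoted process holds roughly 1.75 GiB resident, which is far more than a metric table of a few hundred entries would explain.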
Re: [Ganglia-general] Ganglia gmond memory leak?
Hi Aidan,

for what it is worth, I cannot reproduce the growing memory consumption on a small 3.2.0 grid using only standard metrics in unicast mode. It has been running for a few hours now. I will check again tomorrow.

Cheers
Martin

--
Martin Knoblauch
email: k n o b i AT knobisoft DOT de
www: http://www.knobisoft.de

From: Aidan Wong aidanw...@attinteractive.com
To: Ave-Lallemant, Nathan P nathan.p.ave-lallem...@efleets.com; ganglia-general ganglia-general@lists.sourceforge.net
Sent: Thursday, February 23, 2012 8:34 AM
Subject: Re: [Ganglia-general] Ganglia gmond memory leak?

I've restarted the gmond process and memory usage drops, until gmond hogs memory again over time.

Any Ganglia contributors who may want to chime in on this memory leak issue?

I'm on Ganglia 3.2.0. Are there any improvements in version 3.3.1 addressing this issue?

Thanks

From: Ave-Lallemant, Nathan P nathan.p.ave-lallem...@efleets.com
Date: Wed, 22 Feb 2012 16:31:58 -0600
To: Aidan Wong aidanw...@attinteractive.com, ganglia-general ganglia-general@lists.sourceforge.net
Subject: RE: Ganglia gmond memory leak?

I have seen the same behavior in my environment but do not have a solution.

Nathan

From: Aidan Wong [mailto:aidanw...@attinteractive.com]
Sent: Wednesday, February 22, 2012 4:10 PM
To: ganglia-general
Subject: [Ganglia-general] Ganglia gmond memory leak?

Hi, it looks like my install of gmond version 3.2.0 is leaking memory. The amount of resident memory that the process uses gets pretty high and keeps increasing.

USER  PID   %CPU %MEM VSZ     RSS     TTY STAT START TIME  COMMAND
root  18647 0.0  9.9  2965464 1836268 ?   Ss   Jan14 11:24 /home/t/hadoop-ganglia-client/sbin/gmond -c /home/t/hadoop-ganglia-client/gmond.conf -p /home/t/hadoop-ganglia-client/logs/gmond.pid

Is this a bug? Can anyone suggest a solution?

Thank you
Re: [Ganglia-general] Ganglia gmond memory leak?
I'm not using any IIRC plugins as far as I know. I'm using basically Ganglia 3.2.0 right out of the box. The extra metrics that I'm sending are from my Hadoop cluster nodes, where I defined the host and gmond port of the destination gmond that collects the metrics.

On 2/23/12 6:27 PM, Robin Humble robin.humble+gang...@anu.edu.au wrote:
> On Thu, Feb 23, 2012 at 07:22:36PM +, Aidan Wong wrote:
>> That one node that recently had the runaway memory leak was sending 253 metrics. I'm using unicast, sending all metrics to a specific host, where I have configured the udp_send_channel with the host and port attributes defined.
>
> IIRC plugins are loaded once and then run within gmond's address space, so I guess plugins could be causing memory leaks. which plugins are you using? do they alloc/free as they should?
>
> we haven't noticed any leaks (certainly no serious leaks) across our ~1800 gmonds using 3.2.0, but we aren't sending that many metrics either - just using most of the standard stuff plus modified diskstat and cputemp python plugins, and with a bunch of other metrics spoof'd from chassis and switches (more cpu cycles for HPC jobs this way). we are using multicast. all except a few gmonds (not included below) are senders only.
>
>         %CPU %MEM   VSZ  RSS COMMAND
> min      0.0  0.0 70972 1864 /usr/sbin/gmond
> median   0.0  0.0 70972 3392 /usr/sbin/gmond
> ave      0.0  0.0 70977 3240 /usr/sbin/gmond
> max      0.0  0.0 71104 4720 /usr/sbin/gmond
>
> those with the larger RSS have been rebooted recently and haven't yet had unused pages pushed out by vm pressure.
>
> cheers,
> robin
> --
> Dr Robin Humble, HPC Systems Analyst, NCI National Facility
Re: [Ganglia-general] Ganglia gmond memory leak?
I have the following config with regard to metric cleanup:

host_dmax = 259200 /*secs - 3 days*/
cleanup_threshold = 300 /*secs */

From: Matt Massie m...@massie.us
Date: Thu, 23 Feb 2012 11:06:03 -0800
To: svd.gang...@mylife.com
Cc: ganglia-general@lists.sourceforge.net
Subject: Re: [Ganglia-general] Ganglia gmond memory leak?

Each unique metric (keyed on metric name) requires memory space in gmond. A good test is to peek at the number of metrics in gmond over time, e.g.

$ telnet localhost 8649 | grep METRIC | wc -l

If the number of metrics increases over time, so will the memory use. Ganglia will release the metric and its memory when the age of the metric is greater than DMAX. A DMAX value of zero will cause ganglia to hold the metric indefinitely. To make sure that ganglia is releasing old metrics, set the DMAX value to something like 5 minutes (300 secs).

For example, let's assume you are doing per-process monitoring and the metric name looks like

cpu_user.%d % (pid,)

Over time, you'll have lots of metrics (cpu_user.343493, cpu_user.343022, cpu_user.232323) that start accumulating and taking up memory space.

-Matt

On Thu, Feb 23, 2012 at 10:01 AM, svd.gang...@mylife.com wrote:
> i observed this in the past as well. running valgrind for days did not yield any clue. i had a hunch that remote spoofed metrics were involved, as the leak seemed to get better when i had coincidentally disabled the sending of some of those spoof metrics. but we never found anything conclusive. there was also some odd race such that sometimes after restart the leak was much faster, but after restarting a few times the leak slowed (but was always still fast enough to be a burden).
>
> -scott
>
> From: Aidan Wong aidanw...@attinteractive.com
> To: Ave-Lallemant, Nathan P nathan.p.ave-lallem...@efleets.com; ganglia-general ganglia-general@lists.sourceforge.net
> Sent: Thursday, February 23, 2012 8:34 AM
> Subject: Re: [Ganglia-general] Ganglia gmond memory leak?
>
> I've restarted the gmond process and memory usage drops, until gmond hogs memory again over time.
>
> Any Ganglia contributors who may want to chime in on this memory leak issue?
>
> I'm on Ganglia 3.2.0. Are there any improvements in version 3.3.1 addressing this issue?
>
> Thanks
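Matt's per-process example can be turned into a small simulation that shows why a DMAX of zero lets the metric table grow without bound while a DMAX of 300 seconds keeps it small. This is a toy model of the behaviour described in the quoted message, not gmond's implementation; the `MetricTable` class, the one-new-process-per-minute workload and the PID numbering are all assumptions made for illustration.

```python
class MetricTable:
    """Toy model of a metric store keyed on metric name (not gmond's real data structure)."""

    def __init__(self, dmax: int):
        self.dmax = dmax      # seconds; 0 means "never expire"
        self.last_seen = {}   # metric name -> time of last update

    def submit(self, name: str, now: int) -> None:
        self.last_seen[name] = now

    def cleanup(self, now: int) -> None:
        """Drop metrics whose age exceeds dmax; with dmax == 0 nothing is dropped."""
        if self.dmax == 0:
            return
        expired = [n for n, t in self.last_seen.items() if now - t > self.dmax]
        for name in expired:
            del self.last_seen[name]


def simulate(dmax: int, steps: int = 100, interval: int = 60) -> int:
    """Each step a new short-lived process reports once; return the final table size."""
    table = MetricTable(dmax)
    for step in range(steps):
        now = step * interval
        pid = 1000 + step  # hypothetical PIDs, one new process per step
        table.submit("cpu_user.%d" % (pid,), now)
        table.cleanup(now)
    return len(table.last_seen)


if __name__ == "__main__":
    print("dmax=0   ->", simulate(0), "metrics held")
    print("dmax=300 ->", simulate(300), "metrics held")
```

In this model the table holds one entry per process ever seen when DMAX is zero, and only the entries from the last few minutes when DMAX is 300.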
Re: [Ganglia-general] Ganglia gmond memory leak?
Hi Aidan,

if possible for you, I would suggest running gmond in the foreground under the control of valgrind or a similar tool. Send us the report generated by the tool.

Cheers
Martin

--
Martin Knoblauch
email: k n o b i AT knobisoft DOT de
www: http://www.knobisoft.de

From: Aidan Wong aidanw...@attinteractive.com
To: Ave-Lallemant, Nathan P nathan.p.ave-lallem...@efleets.com; ganglia-general ganglia-general@lists.sourceforge.net
Sent: Thursday, February 23, 2012 8:34 AM
Subject: Re: [Ganglia-general] Ganglia gmond memory leak?

I've restarted the gmond process and memory usage drops, until gmond hogs memory again over time.

Any Ganglia contributors who may want to chime in on this memory leak issue?

I'm on Ganglia 3.2.0. Are there any improvements in version 3.3.1 addressing this issue?

Thanks

From: Ave-Lallemant, Nathan P nathan.p.ave-lallem...@efleets.com
Date: Wed, 22 Feb 2012 16:31:58 -0600
To: Aidan Wong aidanw...@attinteractive.com, ganglia-general ganglia-general@lists.sourceforge.net
Subject: RE: Ganglia gmond memory leak?

I have seen the same behavior in my environment but do not have a solution.

Nathan

From: Aidan Wong [mailto:aidanw...@attinteractive.com]
Sent: Wednesday, February 22, 2012 4:10 PM
To: ganglia-general
Subject: [Ganglia-general] Ganglia gmond memory leak?

Hi, it looks like my install of gmond version 3.2.0 is leaking memory. The amount of resident memory that the process uses gets pretty high and keeps increasing.

USER  PID   %CPU %MEM VSZ     RSS     TTY STAT START TIME  COMMAND
root  18647 0.0  9.9  2965464 1836268 ?   Ss   Jan14 11:24 /home/t/hadoop-ganglia-client/sbin/gmond -c /home/t/hadoop-ganglia-client/gmond.conf -p /home/t/hadoop-ganglia-client/logs/gmond.pid

Is this a bug? Can anyone suggest a solution?

Thank you
Re: [Ganglia-general] Ganglia gmond memory leak?
How many metrics are you monitoring? gmond must allocate memory for each metric, from each host. If you are using multicast, each gmond instance will get metrics from all other instances.

If you run gmond in isolation (no traffic to or from other gmond instances), does memory usage still go up?

On Wed, Feb 22, 2012 at 17:10, Aidan Wong aidanw...@attinteractive.com wrote:
> Hi, it looks like my install of gmond version 3.2.0 is leaking memory. The amount of resident memory that the process uses gets pretty high and keeps increasing.
>
> USER  PID   %CPU %MEM VSZ     RSS     TTY STAT START TIME  COMMAND
> root  18647 0.0  9.9  2965464 1836268 ?   Ss   Jan14 11:24 /home/t/hadoop-ganglia-client/sbin/gmond -c /home/t/hadoop-ganglia-client/gmond.conf -p /home/t/hadoop-ganglia-client/logs/gmond.pid
>
> Is this a bug? Can anyone suggest a solution?
>
> Thank you

--
Jesse Becker
Re: [Ganglia-general] Ganglia gmond memory leak?
Hi Jesse,

but in that case the memory footprint of gmond would approach a maximum after some time, correct? Aidan did not say whether it grows forever or levels off asymptotically. Aidan?

Cheers
Martin

--
Martin Knoblauch
email: k n o b i AT knobisoft DOT de
www: http://www.knobisoft.de

From: Jesse Becker haw...@gmail.com
To: Aidan Wong aidanw...@attinteractive.com
Cc: ganglia-general ganglia-general@lists.sourceforge.net
Sent: Thursday, February 23, 2012 2:36 PM
Subject: Re: [Ganglia-general] Ganglia gmond memory leak?

How many metrics are you monitoring? gmond must allocate memory for each metric, from each host. If you are using multicast, each gmond instance will get metrics from all other instances.

If you run gmond in isolation (no traffic to or from other gmond instances), does memory usage still go up?

On Wed, Feb 22, 2012 at 17:10, Aidan Wong aidanw...@attinteractive.com wrote:
> Hi, it looks like my install of gmond version 3.2.0 is leaking memory. The amount of resident memory that the process uses gets pretty high and keeps increasing.
>
> USER  PID   %CPU %MEM VSZ     RSS     TTY STAT START TIME  COMMAND
> root  18647 0.0  9.9  2965464 1836268 ?   Ss   Jan14 11:24 /home/t/hadoop-ganglia-client/sbin/gmond -c /home/t/hadoop-ganglia-client/gmond.conf -p /home/t/hadoop-ganglia-client/logs/gmond.pid
>
> Is this a bug? Can anyone suggest a solution?
>
> Thank you

--
Jesse Becker
Re: [Ganglia-general] Ganglia gmond memory leak?
i observed this in the past as well. running valgrind for days did not yield any clue. i had a hunch that remote spoofed metrics were involved, as the leak seemed to get better when i had coincidentally disabled the sending of some of those spoof metrics. but we never found anything conclusive. there was also some odd race such that sometimes after restart the leak was much faster, but after restarting a few times the leak slowed (but was always still fast enough to be a burden).

-scott

From: Aidan Wong aidanw...@attinteractive.com
To: Ave-Lallemant, Nathan P nathan.p.ave-lallem...@efleets.com; ganglia-general ganglia-general@lists.sourceforge.net
Sent: Thursday, February 23, 2012 8:34 AM
Subject: Re: [Ganglia-general] Ganglia gmond memory leak?

I've restarted the gmond process and memory usage drops, until gmond hogs memory again over time.

Any Ganglia contributors who may want to chime in on this memory leak issue?

I'm on Ganglia 3.2.0. Are there any improvements in version 3.3.1 addressing this issue?

Thanks
Re: [Ganglia-general] Ganglia gmond memory leak?
makes sense, but i know in my case the number of metrics was constant once the server gmond had been running for about 10 minutes and all gmetric crons had had a chance to submit an initial value.

-scott

On Thu, 23 Feb 2012, Matt Massie wrote:
> Each unique metric (keyed on metric name) requires memory space in gmond. A good test is to peek at the number of metrics in gmond over time, e.g.
>
> $ telnet localhost 8649 | grep METRIC | wc -l
>
> If the number of metrics increases over time, so will the memory use. Ganglia will release the metric and its memory when the age of the metric is greater than DMAX. A DMAX value of zero will cause ganglia to hold the metric indefinitely. To make sure that ganglia is releasing old metrics, set the DMAX value to something like 5 minutes (300 secs).
>
> For example, let's assume you are doing per-process monitoring and the metric name looks like
>
> cpu_user.%d % (pid,)
>
> Over time, you'll have lots of metrics (cpu_user.343493, cpu_user.343022, cpu_user.232323) that start accumulating and taking up memory space.
>
> -Matt
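The metric-count test quoted above can also be scripted, which makes it easier to log the count at intervals and compare it with gmond's memory use. The sketch below is one possible way to do it, assuming gmond answers on its TCP port (8649 in the quoted command) with an XML dump and then closes the connection; the function names and the sample XML are made up for illustration. Note that it counts METRIC elements with an XML parser, so the number need not equal the line count that grep and wc produce.

```python
import socket
import xml.etree.ElementTree as ET


def fetch_gmond_xml(host: str = "localhost", port: int = 8649, timeout: float = 10.0) -> bytes:
    """Read the full XML dump from a gmond TCP port (reads until the peer closes)."""
    chunks = []
    with socket.create_connection((host, port), timeout=timeout) as sock:
        while True:
            data = sock.recv(65536)
            if not data:
                break
            chunks.append(data)
    return b"".join(chunks)


def count_metrics(xml_bytes: bytes) -> int:
    """Count METRIC elements anywhere in the XML document."""
    root = ET.fromstring(xml_bytes)
    return sum(1 for _ in root.iter("METRIC"))


# Made-up, heavily shortened sample in the general shape of a gmond dump.
SAMPLE_XML = b"""<GANGLIA_XML VERSION="3.2.0" SOURCE="gmond">
<CLUSTER NAME="demo">
<HOST NAME="node1">
<METRIC NAME="cpu_user" VAL="1.0"/>
<METRIC NAME="load_one" VAL="0.1"/>
</HOST>
</CLUSTER>
</GANGLIA_XML>"""

if __name__ == "__main__":
    print("metrics in sample:", count_metrics(SAMPLE_XML))
    # Against a live gmond (requires a reachable daemon):
    # print(count_metrics(fetch_gmond_xml("localhost", 8649)))
```

If the count stays flat while the resident size keeps climbing, as scott reports, accumulation of metric names is unlikely to be the whole explanation.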
Re: [Ganglia-general] Ganglia gmond memory leak?
That one node that recently had the runaway memory leak was sending 253 metrics. I'm using unicast, sending all metrics to a specific host, where I have configured the udp_send_channel with the host and port attributes defined.

On 2/23/12 5:36 AM, Jesse Becker haw...@gmail.com wrote:
> How many metrics are you monitoring? gmond must allocate memory for each metric, from each host. If you are using multicast, each gmond instance will get metrics from all other instances.
>
> If you run gmond in isolation (no traffic to or from other gmond instances), does memory usage still go up?
>
> On Wed, Feb 22, 2012 at 17:10, Aidan Wong aidanw...@attinteractive.com wrote:
>> Hi, it looks like my install of gmond version 3.2.0 is leaking memory. The amount of resident memory that the process uses gets pretty high and keeps increasing.
>>
>> USER  PID   %CPU %MEM VSZ     RSS     TTY STAT START TIME  COMMAND
>> root  18647 0.0  9.9  2965464 1836268 ?   Ss   Jan14 11:24 /home/t/hadoop-ganglia-client/sbin/gmond -c /home/t/hadoop-ganglia-client/gmond.conf -p /home/t/hadoop-ganglia-client/logs/gmond.pid
>>
>> Is this a bug? Can anyone suggest a solution?
>>
>> Thank you
>
> --
> Jesse Becker
Re: [Ganglia-general] Ganglia gmond memory leak?
To me it looks like gmond's memory usage keeps growing as long as there is memory left, and I've seen some nodes where gmond was causing swapping.

Before restart of gmond:

$ free -m
                    total   used   free  shared  buffers  cached
Mem:                18038  13217   4820       0      163    8719
-/+ buffers/cache:          4335  13703
Swap:                5945   4795   1150

$ ps aux | grep gmond
15950 16419  0.0  0.0   61180    760 pts/0 S+  19:25  0:00 grep gmond
root  16804  0.0  5.3 9195200 979944 ?     Ss  2011  36:06 /home/t/hadoop-ganglia-client/sbin/gmond -c /home/t/hadoop-ganglia-client/gmond.conf -p /home/t/hadoop-ganglia-client/logs/gmond.pid

After restart of gmond:

$ free -m
                    total   used   free  shared  buffers  cached
Mem:                18038  13715   4322       0      165    8842
-/+ buffers/cache:          4708  13330
Swap:                5945    151   5794

$ ps aux | grep gmond
root  18492  0.0  0.0   43228   1348 ?     Ss  19:26  0:00 /home/t/hadoop-ganglia-client/sbin/gmond -c /home/t/hadoop-ganglia-client/gmond.conf -p /home/t/hadoop-ganglia-client/logs/gmond.pid
15950 18717  0.0  0.0   61184    772 pts/0 S+  19:27  0:00 grep gmond

From: Martin Knoblauch kn...@knobisoft.de
Reply-To: Martin Knoblauch kn...@knobisoft.de
Date: Thu, 23 Feb 2012 05:56:26 -0800
To: Jesse Becker haw...@gmail.com, Aidan Wong aidanw...@attinteractive.com
Cc: ganglia-general ganglia-general@lists.sourceforge.net
Subject: Re: [Ganglia-general] Ganglia gmond memory leak?

Hi Jesse,

but in that case the memory footprint of gmond would approach a maximum after some time - correct? Aidan did not say whether it grows forever or goes asymptotic. Aidan?

Cheers
Martin
--
Martin Knoblauch
email: k n o b i AT knobisoft DOT de
www: http://www.knobisoft.de

From: Jesse Becker haw...@gmail.com
To: Aidan Wong aidanw...@attinteractive.com
Cc: ganglia-general ganglia-general@lists.sourceforge.net
Sent: Thursday, February 23, 2012 2:36 PM
Subject: Re: [Ganglia-general] Ganglia gmond memory leak?

How many metrics are you monitoring? gmond must allocate memory for each metric, from each host. If you are using multicast, each gmond instance will get metrics from all other instances.

If you run gmond in isolation--no traffic to/from other gmond instances--does memory usage still go up?

On Wed, Feb 22, 2012 at 17:10, Aidan Wong aidanw...@attinteractive.com wrote:

> Hi, it looks like my install of gmond version 3.2.0 is leaking memory. The amount of resident memory the process uses gets pretty high and keeps increasing.
>
> USER   PID %CPU %MEM     VSZ     RSS TTY STAT START  TIME COMMAND
> root 18647  0.0  9.9 2965464 1836268 ?   Ss   Jan14 11:24 /home/t/hadoop-ganglia-client/sbin/gmond -c /home/t/hadoop-ganglia-client/gmond.conf -p /home/t/hadoop-ganglia-client/logs/gmond.pid
>
> Is this a bug? Can anyone suggest a solution?
>
> Thank you

--
Jesse Becker
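Martin's question (does the footprint level off or grow forever?) is easier to answer with periodic samples than with a single before/after pair. Below is a minimal Python sketch, assuming a Linux host where /proc/<pid>/status exposes a VmRSS line in kB. The function names are illustrative and not part of Ganglia.

```python
import re


def parse_vmrss_kb(status_text):
    """Extract the VmRSS value (kB) from the text of /proc/<pid>/status."""
    match = re.search(r"^VmRSS:\s+(\d+)\s+kB", status_text, re.MULTILINE)
    if match is None:
        raise ValueError("no VmRSS line found")
    return int(match.group(1))


def read_rss_kb(pid):
    """Read the current resident set size of a process (Linux only)."""
    with open(f"/proc/{pid}/status") as handle:
        return parse_vmrss_kb(handle.read())


def growth_rates(samples):
    """Given time-ordered (seconds, rss_kb) samples, return the growth
    in kB per hour between each pair of consecutive samples."""
    rates = []
    for (t0, r0), (t1, r1) in zip(samples, samples[1:]):
        rates.append((r1 - r0) * 3600.0 / (t1 - t0))
    return rates
```

If the rates shrink toward zero over a day or two, the footprint is levelling off as Martin suggests. If they stay roughly constant, that looks more like a genuine leak.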
Re: [Ganglia-general] Ganglia gmond memory leak?
On Thu, Feb 23, 2012 at 07:22:36PM, Aidan Wong wrote:

> That one node that recently had the running away memory leak was sending 253 metrics. I'm using unicast sending all metrics to a specific host where I have configured the udp_send_channel with the host and port attributes defined.

IIRC plugins are loaded once and then run within gmond's address space, so I guess plugins could be causing memory leaks. Which plugins are you using? Do they alloc/free as they should?

We haven't noticed any leaks (certainly no serious leaks) across our ~1800 gmonds using 3.2.0, but we aren't sending that many metrics either - just using most of the standard stuff plus modified diskstat, cputemp python plugins, and with a bunch of other metrics spoof'd from chassis and switches (more cpu cycles for HPC jobs this way).

We are using multicast. All except a few gmonds (not included below) are senders only.

         %CPU  %MEM    VSZ   RSS  COMMAND
min       0.0   0.0  70972  1864  /usr/sbin/gmond
median    0.0   0.0  70972  3392  /usr/sbin/gmond
ave       0.0   0.0  70977  3240  /usr/sbin/gmond
max       0.0   0.0  71104  4720  /usr/sbin/gmond

Those with the larger RSS have been rebooted recently and haven't yet had unused pages pushed out by vm pressure.

cheers,
robin
--
Dr Robin Humble, HPC Systems Analyst, NCI National Facility
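A fleet-wide summary like Robin's table can be produced from per-host RSS readings with a few lines of Python. This is only a sketch: how the readings are collected (for example, running ps on each node) is left out, and the function name is made up for illustration.

```python
import statistics


def summarize_rss(rss_kb_values):
    """Return min/median/ave/max for a collection of RSS values in kB."""
    values = list(rss_kb_values)
    return {
        "min": min(values),
        "median": statistics.median(values),
        "ave": statistics.mean(values),
        "max": max(values),
    }


# Illustrative input only: three of the RSS figures from the table above,
# not the full ~1800-host data set.
print(summarize_rss([1864, 3392, 4720]))
```

A max far above the median would single out hosts worth a closer look, which is the sort of outlier the original report describes.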
[Ganglia-general] Ganglia gmond memory leak?
Hi, it looks like my install of gmond version 3.2.0 is leaking memory. The amount of resident memory the process uses gets pretty high and keeps increasing.

USER   PID %CPU %MEM     VSZ     RSS TTY STAT START  TIME COMMAND
root 18647  0.0  9.9 2965464 1836268 ?   Ss   Jan14 11:24 /home/t/hadoop-ganglia-client/sbin/gmond -c /home/t/hadoop-ganglia-client/gmond.conf -p /home/t/hadoop-ganglia-client/logs/gmond.pid

Is this a bug? Can anyone suggest a solution?

Thank you
Re: [Ganglia-general] Ganglia gmond memory leak?
I've restarted the gmond process and memory usage drops, but then gmond hogs memory again over time. Are there any Ganglia contributors who may want to chime in on this memory leak issue? I'm on Ganglia 3.2.0. Are there any improvements in version 3.3.1 that address this issue?

Thanks

From: Ave-Lallemant, Nathan P nathan.p.ave-lallem...@efleets.com
Date: Wed, 22 Feb 2012 16:31:58 -0600
To: Aidan Wong aidanw...@attinteractive.com, ganglia-general ganglia-general@lists.sourceforge.net
Subject: RE: Ganglia gmond memory leak?

I have seen the same behavior in my environment but do not have a solution.

Nathan

From: Aidan Wong [mailto:aidanw...@attinteractive.com]
Sent: Wednesday, February 22, 2012 4:10 PM
To: ganglia-general
Subject: [Ganglia-general] Ganglia gmond memory leak?

Hi, it looks like my install of gmond version 3.2.0 is leaking memory. The amount of resident memory the process uses gets pretty high and keeps increasing.

USER   PID %CPU %MEM     VSZ     RSS TTY STAT START  TIME COMMAND
root 18647  0.0  9.9 2965464 1836268 ?   Ss   Jan14 11:24 /home/t/hadoop-ganglia-client/sbin/gmond -c /home/t/hadoop-ganglia-client/gmond.conf -p /home/t/hadoop-ganglia-client/logs/gmond.pid

Is this a bug? Can anyone suggest a solution?

Thank you