On 6 May 2014, at 6:05 pm, Greg Murphy <greg.mur...@gamesparks.com> wrote:
> Attached are the valgrind outputs from two separate runs of lrmd with the
> suggested variables set. Do they help narrow the issue down?

They do somewhat. I'll investigate. But much of the memory is still reachable:

==26203== indirectly lost: 17,945,950 bytes in 642,546 blocks
==26203== possibly lost: 2,805 bytes in 60 blocks
==26203== still reachable: 26,104,781 bytes in 544,782 blocks
==26203== suppressed: 8,652 bytes in 176 blocks
==26203== Reachable blocks (those to which a pointer was found) are not shown.
==26203== To see them, rerun with: --leak-check=full --show-reachable=yes

Could you add --show-reachable=yes to the VALGRIND_OPTS variable?
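In other words, the same options as in the earlier mail with only the extra flag appended. A sketch (keep whatever paths and other variables you already have in place):

# Unchanged apart from --show-reachable=yes, so the still-reachable
# blocks are reported in the log as well.
VALGRIND_OPTS="--leak-check=full --show-reachable=yes --trace-children=no \
  --num-callers=25 --log-file=/var/lib/pacemaker/valgrind-%p \
  --suppressions=/usr/share/pacemaker/tests/valgrind-pcmk.suppressions \
  --gen-suppressions=all"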
> Thanks
>
> Greg
>
> On 02/05/2014 03:01, "Andrew Beekhof" <and...@beekhof.net> wrote:
>
>> On 30 Apr 2014, at 9:01 pm, Greg Murphy <greg.mur...@gamesparks.com> wrote:
>>
>>> Hi
>>>
>>> I'm running a two-node Pacemaker cluster on Ubuntu Saucy (13.10),
>>> kernel 3.11.0-17-generic and the Ubuntu Pacemaker package, version
>>> 1.1.10+git20130802-1ubuntu1.
>>
>> The problem is that I have no way of knowing what code is/isn't included
>> in '1.1.10+git20130802-1ubuntu1'.
>> You could try setting the following in your environment before starting
>> pacemaker though:
>>
>> # Variables for running child daemons under valgrind and/or checking
>> # for memory problems
>> G_SLICE=always-malloc
>> MALLOC_PERTURB_=221   # or 0
>> MALLOC_CHECK_=3       # or 0,1,2
>> PCMK_valgrind_enabled=lrmd
>> VALGRIND_OPTS="--leak-check=full --trace-children=no --num-callers=25 \
>>   --log-file=/var/lib/pacemaker/valgrind-%p \
>>   --suppressions=/usr/share/pacemaker/tests/valgrind-pcmk.suppressions \
>>   --gen-suppressions=all"
>>
>>> The cluster is configured with a DRBD master/slave set and then a
>>> failover resource group containing MySQL (along with its DRBD
>>> filesystem) and a Zabbix Proxy and Agent.
>>>
>>> Since I built the cluster around two months ago I've noticed that on
>>> the active node the memory footprint of lrmd gradually grows to quite
>>> a significant size. The cluster was last restarted three weeks ago,
>>> and now lrmd has over 1GB of mapped memory on the active node and only
>>> 151MB on the passive node. Current excerpts from /proc/PID/status are:
>>>
>>> Active node
>>> VmPeak:  1146740 kB
>>> VmSize:  1146740 kB
>>> VmLck:         0 kB
>>> VmPin:         0 kB
>>> VmHWM:    267680 kB
>>> VmRSS:    188764 kB
>>> VmData:  1065860 kB
>>> VmStk:       136 kB
>>> VmExe:        32 kB
>>> VmLib:     10416 kB
>>> VmPTE:      2164 kB
>>> VmSwap:   822752 kB
>>>
>>> Passive node
>>> VmPeak:   220832 kB
>>> VmSize:   155428 kB
>>> VmLck:         0 kB
>>> VmPin:         0 kB
>>> VmHWM:      4568 kB
>>> VmRSS:      3880 kB
>>> VmData:    74548 kB
>>> VmStk:       136 kB
>>> VmExe:        32 kB
>>> VmLib:     10416 kB
>>> VmPTE:       172 kB
>>> VmSwap:        0 kB
>>>
>>> During the last week or so I've taken a couple of snapshots of
>>> /proc/PID/smaps on the active node, and the heap particularly stands
>>> out as growing (I have the full outputs captured if they'll help):
>>>
>>> 20140422
>>> 7f92e1578000-7f92f218b000 rw-p 00000000 00:00 0   [heap]
>>> Size:           274508 kB
>>> Rss:            180152 kB
>>> Pss:            180152 kB
>>> Shared_Clean:        0 kB
>>> Shared_Dirty:        0 kB
>>> Private_Clean:       0 kB
>>> Private_Dirty:  180152 kB
>>> Referenced:     120472 kB
>>> Anonymous:      180152 kB
>>> AnonHugePages:       0 kB
>>> Swap:            91568 kB
>>> KernelPageSize:      4 kB
>>> MMUPageSize:         4 kB
>>> Locked:              0 kB
>>> VmFlags: rd wr mr mw me ac
>>>
>>> 20140423
>>> 7f92e1578000-7f92f305e000 rw-p 00000000 00:00 0   [heap]
>>> Size:           289688 kB
>>> Rss:            184136 kB
>>> Pss:            184136 kB
>>> Shared_Clean:        0 kB
>>> Shared_Dirty:        0 kB
>>> Private_Clean:       0 kB
>>> Private_Dirty:  184136 kB
>>> Referenced:      69748 kB
>>> Anonymous:      184136 kB
>>> AnonHugePages:       0 kB
>>> Swap:           103112 kB
>>> KernelPageSize:      4 kB
>>> MMUPageSize:         4 kB
>>> Locked:              0 kB
>>> VmFlags: rd wr mr mw me ac
>>>
>>> 20140430
>>> 7f92e1578000-7f92fc01d000 rw-p 00000000 00:00 0   [heap]
>>> Size:           436884 kB
>>> Rss:            140812 kB
>>> Pss:            140812 kB
>>> Shared_Clean:        0 kB
>>> Shared_Dirty:        0 kB
>>> Private_Clean:     744 kB
>>> Private_Dirty:  140068 kB
>>> Referenced:      43600 kB
>>> Anonymous:      140812 kB
>>> AnonHugePages:       0 kB
>>> Swap:           287392 kB
>>> KernelPageSize:      4 kB
>>> MMUPageSize:         4 kB
>>> Locked:              0 kB
>>> VmFlags: rd wr mr mw me ac
>>>
>>> I noticed in the release notes for 1.1.10-rc1
>>> (https://github.com/ClusterLabs/pacemaker/releases/tag/Pacemaker-1.1.10-rc1)
>>> that there was work done to fix "crmd: lrmd: stonithd: fixed memory
>>> leaks", but I'm not sure which particular bug this was related to.
>>> (And those fixes should be in the version I'm running anyway.)
>>>
>>> I've also spotted a few memory leak fixes in
>>> https://github.com/beekhof/pacemaker, but I'm not sure whether they
>>> relate to my issue (assuming I have a memory leak and this isn't
>>> expected behaviour).
>>>
>>> Is there additional debugging that I can perform to check whether I
>>> have a leak, or is there enough evidence to justify upgrading to
>>> 1.1.11?
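One cheap check you can run alongside valgrind is to sample /proc on a schedule, so the heap growth is recorded automatically rather than by hand. A rough sketch only; the interval, output location and field selection here are illustrative, not something from this thread:

# Illustrative: log lrmd's key VmXxx counters and its [heap] mapping once
# an hour so the growth over time is easy to compare later.
pid=$(pidof lrmd)
while sleep 3600; do
    ts=$(date +%Y%m%d-%H%M%S)
    {
        grep -E '^Vm(Peak|Size|HWM|RSS|Data|Swap)' "/proc/$pid/status"
        awk '/\[heap\]/,/^VmFlags/' "/proc/$pid/smaps"
    } > "/var/tmp/lrmd-mem-$ts"
done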
>>>
>>> Thanks in advance
>>>
>>> Greg Murphy
>
> <lrmd.tgz>
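On the 1.1.11 question: one way to judge whether the fixes you spotted are already in '1.1.10+git20130802-1ubuntu1' is to list the relevant upstream commits between the two releases and compare them with what the package changelog says it ships. A sketch only; the tag and path names are assumptions about the upstream repository layout:

# Illustrative: lrmd-related upstream changes between the releases,
# versus what the Ubuntu package claims to contain.
git clone https://github.com/ClusterLabs/pacemaker.git
cd pacemaker
git log --oneline Pacemaker-1.1.10..Pacemaker-1.1.11 -- lrmd lib/services
apt-get changelog pacemaker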
_______________________________________________
Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
http://oss.clusterlabs.org/mailman/listinfo/pacemaker

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org