Hi Dan,

I had a suggestion and a question in my previous response. Please let us know whether the suggestion helps, and tell us about your data set (how many directories and files it has, and how they are organised) so that we can understand the problem better.
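In case it saves some typing, here is a rough sketch for collecting those numbers. It is only an illustration: the function name summarize_tree is made up for this mail, it assumes GNU find (for -printf) as shipped on CentOS 7, and it will miscount file names that contain newlines.

```shell
# summarize_tree DIR  (hypothetical helper, not part of gluster)
# Prints the number of regular files, the number of directories, the deepest
# directory level, and the five directories with the most files directly
# inside them. Assumes GNU find; names containing newlines are miscounted.
summarize_tree() {
    root=${1:?usage: summarize_tree DIR}

    files=$(find "$root" -type f | wc -l | tr -d ' ')
    dirs=$(find "$root" -type d | wc -l | tr -d ' ')
    # %d is the depth below the starting point (0 for DIR itself).
    depth=$(find "$root" -type d -printf '%d\n' | sort -n | tail -n 1)

    echo "files: $files"
    echo "directories: $dirs"
    echo "max depth: $depth"
    echo "largest directories (files, path):"
    # %h is the directory part of each file's path.
    find "$root" -type f -printf '%h\n' | sort | uniq -c | sort -rn | head -n 5
}
```

For example, summarize_tree /var/www would run it against the mount point from your earlier mail. It walks the whole tree through the mount, so expect it to touch every inode, much as ls -R does.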
<snip>
> In the meantime can you remount glusterfs with options
> --entry-timeout=0 and --attribute-timeout=0? This will make sure
> that kernel won't cache inodes/attributes of the file and should
> bring down the memory usage.
>
> I am curious to know what is your data-set like? Is it the case
> of too many directories and files present in deep directories? I
> am wondering whether a significant number of inodes cached by
> kernel are there to hold dentry structure in kernel.
</snip>

regards,
Raghavendra

----- Original Message -----
> From: "Dan Ragle" <dan...@biblestuph.com>
> To: "Nithya Balachandran" <nbala...@redhat.com>
> Cc: "gluster-users" <gluster-users@gluster.org>, "Csaba Henk" <ch...@redhat.com>
> Sent: Saturday, February 3, 2018 7:28:15 PM
> Subject: Re: [Gluster-users] Run away memory with gluster mount
>
> On 2/2/2018 2:13 AM, Nithya Balachandran wrote:
> > Hi Dan,
> >
> > It sounds like you might be running into [1]. The patch has been posted
> > upstream and the fix should be in the next release.
> > In the meantime, I'm afraid there is no way to get around this without
> > restarting the process.
> >
> > Regards,
> > Nithya
> >
> > [1]https://bugzilla.redhat.com/show_bug.cgi?id=1541264
> >
>
> Much appreciated. Will watch for the next release and retest then.
>
> Cheers!
>
> Dan
>
> > On 2 February 2018 at 02:57, Dan Ragle <dan...@biblestuph.com> wrote:
> >
> > On 1/30/2018 6:31 AM, Raghavendra Gowdappa wrote:
> >
> > ----- Original Message -----
> >
> > From: "Dan Ragle" <dan...@biblestuph.com>
> > To: "Raghavendra Gowdappa" <rgowd...@redhat.com>, "Ravishankar N" <ravishan...@redhat.com>
> > Cc: gluster-users@gluster.org, "Csaba Henk" <ch...@redhat.com>, "Niels de Vos" <nde...@redhat.com>, "Nithya Balachandran" <nbala...@redhat.com>
> > Sent: Monday, January 29, 2018 9:02:21 PM
> > Subject: Re: [Gluster-users] Run away memory with gluster mount
> >
> > On 1/29/2018 2:36 AM, Raghavendra Gowdappa wrote:
> >
> > ----- Original Message -----
> >
> > From: "Ravishankar N" <ravishan...@redhat.com>
> > To: "Dan Ragle" <dan...@biblestuph.com>, gluster-users@gluster.org
> > Cc: "Csaba Henk" <ch...@redhat.com>, "Niels de Vos" <nde...@redhat.com>, "Nithya Balachandran" <nbala...@redhat.com>, "Raghavendra Gowdappa" <rgowd...@redhat.com>
> > Sent: Saturday, January 27, 2018 10:23:38 AM
> > Subject: Re: [Gluster-users] Run away memory with gluster mount
> >
> > On 01/27/2018 02:29 AM, Dan Ragle wrote:
> >
> > On 1/25/2018 8:21 PM, Ravishankar N wrote:
> >
> > On 01/25/2018 11:04 PM, Dan Ragle wrote:
> >
> > *sigh* trying again to correct formatting ... apologize for the
> > earlier mess.
> >
> > Having a memory issue with Gluster 3.12.4 and not sure how to
> > troubleshoot. I don't *think* this is expected behavior.
> >
> > This is on an updated CentOS 7 box. The setup is a simple two node
> > replicated layout where the two nodes act as both server and client.
> >
> > The volume in question:
> >
> > Volume Name: GlusterWWW
> > Type: Replicate
> > Volume ID: 8e9b0e79-f309-4d9b-a5bb-45d065faaaa3
> > Status: Started
> > Snapshot Count: 0
> > Number of Bricks: 1 x 2 = 2
> > Transport-type: tcp
> > Bricks:
> > Brick1: vs1dlan.mydomain.com:/glusterfs_bricks/brick1/www
> > Brick2: vs2dlan.mydomain.com:/glusterfs_bricks/brick1/www
> > Options Reconfigured:
> > nfs.disable: on
> > cluster.favorite-child-policy: mtime
> > transport.address-family: inet
> >
> > I had some other performance options in there, (increased
> > cache-size, md invalidation, etc) but stripped them out in an
> > attempt to isolate the issue. Still got the problem without them.
> >
> > The volume currently contains over 1M files.
> >
> > When mounting the volume, I get (among other things) a process as such:
> >
> > /usr/sbin/glusterfs --volfile-server=localhost --volfile-id=/GlusterWWW /var/www
> >
> > This process begins with little memory, but then as files are
> > accessed in the volume the memory increases. I setup a script that
> > simply reads the files in the volume one at a time (no writes). It's
> > been running on and off about 12 hours now and the resident
> > memory of the above process is already at 7.5G and continues to grow
> > slowly. If I stop the test script the memory stops growing,
> > but does not reduce. Restart the test script and the memory begins
> > slowly growing again.
> >
> > This is obviously a contrived app environment. With my intended
> > application load it takes about a week or so for the memory to get
> > high enough to invoke the oom killer.
> >
> > Can you try debugging with the statedump
> > (https://gluster.readthedocs.io/en/latest/Troubleshooting/statedump/#read-a-statedump)
> > of the fuse mount process and see what member is leaking? Take the
> > statedumps in succession, maybe once initially during the I/O and
> > once the memory gets high enough to hit the OOM mark.
> > Share the dumps here.
> >
> > Regards,
> > Ravi
> >
> > Thanks for the reply. I noticed yesterday that an update (3.12.5) had
> > been posted so I went ahead and updated and repeated the test
> > overnight. The memory usage does not appear to be growing as quickly
> > as it was with 3.12.4, but does still appear to be growing.
> >
> > I should also mention that there is another process beyond my test app
> > that is reading the files from the volume. Specifically, there is an
> > rsync that runs from the second node 2-4 times an hour that reads from
> > the GlusterWWW volume mounted on node 1. Since none of the files in
> > that mount are changing it doesn't actually rsync anything, but
> > nonetheless it is running and reading the files in addition to my test
> > script. (It's a part of my intended production setup that I forgot was
> > still running.)
> >
> > The mount process appears to be gaining memory at a rate of about 1GB
> > every 4 hours or so. At that rate it'll take several days before it
> > runs the box out of memory. But I took your suggestion and made some
> > statedumps today anyway, about 2 hours apart, 4 total so far. It looks
> > like there may already be some actionable information.
> > These are the only registers where the num_allocs have grown
> > with each of the four samples:
> >
> > [mount/fuse.fuse - usage-type gf_fuse_mt_gids_t memusage]
> > ---> num_allocs at Fri Jan 26 08:57:31 2018: 784
> > ---> num_allocs at Fri Jan 26 10:55:50 2018: 831
> > ---> num_allocs at Fri Jan 26 12:55:15 2018: 877
> > ---> num_allocs at Fri Jan 26 14:58:27 2018: 908
> >
> > [mount/fuse.fuse - usage-type gf_common_mt_fd_lk_ctx_t memusage]
> > ---> num_allocs at Fri Jan 26 08:57:31 2018: 5
> > ---> num_allocs at Fri Jan 26 10:55:50 2018: 10
> > ---> num_allocs at Fri Jan 26 12:55:15 2018: 15
> > ---> num_allocs at Fri Jan 26 14:58:27 2018: 17
> >
> > [cluster/distribute.GlusterWWW-dht - usage-type gf_dht_mt_dht_layout_t memusage]
> > ---> num_allocs at Fri Jan 26 08:57:31 2018: 24243596
> > ---> num_allocs at Fri Jan 26 10:55:50 2018: 27902622
> > ---> num_allocs at Fri Jan 26 12:55:15 2018: 30678066
> > ---> num_allocs at Fri Jan 26 14:58:27 2018: 33801036
> >
> > Not sure the best way to get you the full dumps. They're pretty big,
> > over 1G for all four. Also, I noticed some filepath information in
> > there that I'd rather not share. What's the recommended next step?
> >
> > Please run the following query on statedump files and report us the
> > results:
> > # grep itable <client-statedump> | grep active | wc -l
> > # grep itable <client-statedump> | grep active_size
> > # grep itable <client-statedump> | grep lru | wc -l
> > # grep itable <client-statedump> | grep lru_size
> > # grep itable <client-statedump> | grep purge | wc -l
> > # grep itable <client-statedump> | grep purge_size
> >
> > Had to restart the test and have been running for 36 hours now. RSS is
> > currently up to 23g.
> >
> > Working on getting a bug report with link to the dumps.
> > In the mean time, I'm including the results of your above queries for
> > the first dump, the 18 hour dump, and the 36 hour dump:
> >
> > # grep itable glusterdump.153904.dump.1517104561 | grep active | wc -l
> > 53865
> > # grep itable glusterdump.153904.dump.1517169361 | grep active | wc -l
> > 53864
> > # grep itable glusterdump.153904.dump.1517234161 | grep active | wc -l
> > 53864
> >
> > # grep itable glusterdump.153904.dump.1517104561 | grep active_size
> > xlator.mount.fuse.itable.active_size=53864
> > # grep itable glusterdump.153904.dump.1517169361 | grep active_size
> > xlator.mount.fuse.itable.active_size=53863
> > # grep itable glusterdump.153904.dump.1517234161 | grep active_size
> > xlator.mount.fuse.itable.active_size=53863
> >
> > # grep itable glusterdump.153904.dump.1517104561 | grep lru | wc -l
> > 998510
> > # grep itable glusterdump.153904.dump.1517169361 | grep lru | wc -l
> > 998510
> > # grep itable glusterdump.153904.dump.1517234161 | grep lru | wc -l
> > 995992
> >
> > # grep itable glusterdump.153904.dump.1517104561 | grep lru_size
> > xlator.mount.fuse.itable.lru_size=998508
> > # grep itable glusterdump.153904.dump.1517169361 | grep lru_size
> > xlator.mount.fuse.itable.lru_size=998508
> > # grep itable glusterdump.153904.dump.1517234161 | grep lru_size
> > xlator.mount.fuse.itable.lru_size=995990
> >
> > Around 1 million of inodes in lru table!! These are the inodes
> > kernel has just cached and no operation is currently in progress on
> > these inodes. This could be the reason for high memory usage.
> > We've a patch being worked on (merged on experimental branch
> > currently) [1], that will help in these scenarios. In the
> > meantime can you remount glusterfs with options
> > --entry-timeout=0 and --attribute-timeout=0? This will make sure
> > that kernel won't cache inodes/attributes of the file and should
> > bring down the memory usage.
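
As a convenience for repeating the itable queries quoted above, the size lines can be pulled out of several dumps in one go. This is a sketch only: itable_summary is a name invented here, and it assumes each dump contains lines of the form xlator.mount.fuse.itable.lru_size=N as in the output above. A dump with no such line is reported as "missing".

```shell
# itable_summary DUMP...  (hypothetical helper, not part of gluster)
# For each statedump file, print one line with the active, lru and purge
# list sizes recorded for the inode table.
itable_summary() {
    for dump in "$@"; do
        printf '%s' "$dump"
        for list in active lru purge; do
            # Take the first matching line and keep the value after '='.
            size=$(grep "itable\.${list}_size=" "$dump" | head -n 1 | cut -d= -f2)
            printf ' %s=%s' "$list" "${size:-missing}"
        done
        printf '\n'
    done
}
```

For example, itable_summary glusterdump.153904.dump.* prints one line per dump, which makes it easy to see whether lru stays near one million from one dump to the next.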
> >
> > I am curious to know what is your data-set like? Is it the case
> > of too many directories and files present in deep directories? I
> > am wondering whether a significant number of inodes cached by
> > kernel are there to hold dentry structure in kernel.
> >
> > [1] https://review.gluster.org/#/c/18665/
> >
> > OK, remounted with your recommended attributes and repeated the
> > test. Now the mount process looks like this:
> >
> > /usr/sbin/glusterfs --attribute-timeout=0 --entry-timeout=0 --volfile-server=localhost --volfile-id=/GlusterWWW /var/www
> >
> > However after running for 36 hours it's again at about 23g (about
> > the same place it was on the first test).
> >
> > A few metrics from the 36 hour mark:
> >
> > num_allocs for [cluster/distribute.GlusterWWW-dht - usage-type
> > gf_dht_mt_dht_layout_t memusage] is 109140094. Seems at least
> > somewhat similar to the original test, which had 117901593 at the 36
> > hour mark.
> >
> > The dump file at the 36 hour mark had nothing for lru or lru_size.
> > However, at the dump two hours prior it had:
> >
> > # grep itable glusterdump.67299.dump.1517493361 | grep lru | wc -l
> > 998510
> > # grep itable glusterdump.67299.dump.1517493361 | grep lru_size
> > xlator.mount.fuse.itable.lru_size=998508
> >
> > and the same thing for the dump four hours later. Are these values
> > only relevant when the ls -R is actually running? I'm thinking the
> > 36 hour dump may have caught the ls -R between runs there (?)
> >
> > The data set is multiple Web sites. I know there's some litter there
> > we can clean up, but I'd guess not more than 200-300k files or so.
> > The biggest culprit is a single directory that we use as a
> > multi-purpose file store, with filenames stored as GUIDs and linked
> > to a DB. That directory currently has 500k+ files. Another directory
> > serves a similar purpose and has about 66k files in it.
> > The rest is generally distributed more "normally", I.E., a mixed
> > nesting of directories and files.
> >
> > Cheers!
> >
> > Dan
> >
> > # grep itable glusterdump.153904.dump.1517104561 | grep purge | wc -l
> > 1
> > # grep itable glusterdump.153904.dump.1517169361 | grep purge | wc -l
> > 1
> > # grep itable glusterdump.153904.dump.1517234161 | grep purge | wc -l
> > 1
> >
> > # grep itable glusterdump.153904.dump.1517104561 | grep purge_size
> > xlator.mount.fuse.itable.purge_size=0
> > # grep itable glusterdump.153904.dump.1517169361 | grep purge_size
> > xlator.mount.fuse.itable.purge_size=0
> > # grep itable glusterdump.153904.dump.1517234161 | grep purge_size
> > xlator.mount.fuse.itable.purge_size=0
> >
> > Cheers,
> >
> > Dan
> >
> > I've CC'd the fuse/ dht devs to see if these data types have potential
> > leaks. Could you raise a bug with the volume info and a (dropbox?) link
> > from which we can download the dumps? You can remove/replace the
> > filepaths from them.
> >
> > Regards.
> > Ravi
> >
> > Cheers!
> >
> > Dan
> >
> > Is there potentially something misconfigured here?
> >
> > I did see a reference to a memory leak in another thread in this
> > list, but that had to do with the setting of quotas, I don't have
> > any quotas set on my system.
> >
> > Thanks,
> >
> > Dan Ragle
> > dan...@biblestuph.com
> >
> > _______________________________________________
> > Gluster-users mailing list
> > Gluster-users@gluster.org
> > http://lists.gluster.org/mailman/listinfo/gluster-users
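
For the num_allocs comparison earlier in the thread, a similar sketch can pull one counter out of a series of statedumps. Assumptions: num_allocs_for is a hypothetical helper name, and each memusage section in the dump is assumed to be a bracketed header followed by key=value lines that include num_allocs=N. Please check this against your own dumps before relying on it.

```shell
# num_allocs_for PATTERN DUMP...  (hypothetical helper, not part of gluster)
# PATTERN is any fixed substring of a section header, for example
# "gf_dht_mt_dht_layout_t". For each dump, print the num_allocs value of the
# first section whose header contains PATTERN.
num_allocs_for() {
    pattern=$1
    shift
    for dump in "$@"; do
        awk -v want="$pattern" '
            # A line starting with "[" opens a new section.
            /^\[/ { in_section = (index($0, want) > 0); next }
            in_section && /^num_allocs=/ {
                sub(/^num_allocs=/, "")
                print FILENAME ": " $0
                exit
            }
        ' "$dump"
    done
}
```

Running num_allocs_for gf_dht_mt_dht_layout_t against the dumps in time order gives one value per file, so steady growth like the figures quoted above is easy to spot. Since only the first matching section is printed, use a longer PATTERN if the same usage-type appears under more than one translator.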