Pls see below

From: Vijaikumar Mallikarjuna [mailto:vmall...@redhat.com]
Sent: Wednesday, 21 October 2015 6:37 PM
To: Sincock, John [FLCPTY]
Cc: gluster-devel@gluster.org
Subject: Re: [Gluster-devel] Need advice re some major issues with glusterfind
On Wed, Oct 21, 2015 at 5:53 AM, Sincock, John [FLCPTY] <j.sinc...@fugro.com> wrote:

Hi Everybody,

We have recently upgraded our 220 TB gluster to 3.7.4, and we've been trying to use the new glusterfind feature, but we have been having some serious problems with it. Overall, glusterfind looks very promising, so I don't want to offend anyone by raising these issues. If they can be resolved or worked around, glusterfind will be a great feature. So I would really appreciate any information or advice:

1) What can be done about the vast number of tiny changelogs? We often see 5+ small, 89-byte changelog files per minute on EACH brick, and larger files when the brick is busier. We've been generating these changelogs for a few weeks and have in excess of 10,000-12,000 on most bricks. This makes glusterfind runs very, very slow, especially on a node with a lot of bricks, and looks unsustainable in the long run. Why are these files so small, why are there so many of them, and how are they supposed to be managed in the long run? The sheer number of these files looks sure to impact performance.

2) The pgfid extended attribute is wreaking havoc with our backup scheme - when gluster adds this xattr to a file, it changes the file's ctime, which we were using to determine which files need to be archived. A warning should be added to the release and upgrade notes so people can plan for this if required.

Also, we ran a rebalance immediately after the 3.7.4 upgrade, and it took about 5 days to complete, which looks like a major speed improvement over the old, more serial rebalance algorithm, so that's good. But I was hoping the rebalance would also have had the side-effect of labelling every file with the pgfid attribute by the time it completed - or, failing that, that building an mlocate database across our entire gluster would have done so, since that accesses every file (unless it gets the information it needs from directory inodes alone). Now it looks like ctimes are still being modified, and I think this can only be caused by files still being labelled with pgfids. How can we force gluster to get this pgfid labelling over and done with, for all files already on the volume? We can't have gluster continuing to add pgfids in bursts here and there, e.g. when files are read for the first time since the upgrade - we need to get it over and done with. We have had to turn off pgfid creation on the volume until we can force gluster to do it in one go.

Hi John,

Was quota turned on or off before or after performing the rebalance? If the pgfid xattr is missing, it can be healed by performing 'find <mount_point> | xargs stat': all the files will get looked up once, and the pgfid healing will happen.

Also, could you please provide all the volume files under '/var/lib/glusterd/vols/<volname>/*.vol'?

Thanks,
Vijay

Hi Vijay,

Quota has never been turned on in our gluster, so it can't be any quota-related xattrs that are resetting our ctimes - I'm pretty sure it must be pgfids still being added.

Thanks for the tip re using stat; if that triggers the pgfid build on each file, I will run it when I have a chance. We'll have to get our archiving of data back up to date, re-enable the pgfid build option, and then run the stat over a weekend or something, as it will take a while.
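For the archives, this is roughly what I intend to run - a sketch only: /mnt/vol00 stands in for our actual client mount point, the null-delimited find/xargs variant is just my addition to cope with awkward filenames, and the getfattr check assumes the healed xattrs show up on the brick as trusted.pgfid.<parent-gfid>:

    # force a lookup on every file via the client mount, which should trigger pgfid healing
    find /mnt/vol00 -print0 | xargs -0 stat > /dev/null

    # afterwards, spot-check a file directly on a brick for the pgfid xattr
    getfattr -d -m . -e hex /mnt/glusterfs/bricks/1/some/path/to/a/file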
I'm still quite concerned about the number of changelogs being generated. Do you know if there are any plans to change the way changelogs are generated so there aren't so many of them, and to process them more efficiently? I think this will be vital to improving glusterfind's performance in future, as an enormous number of these small changelogs is currently being generated on each of our bricks.

Below is the volfile for one brick; the others are all equivalent. We haven't tweaked the volume options much, besides increasing the io-thread count to 32 and the client/event threads to 6, since we have a lot of small files on our gluster (30 million files, many of which are small, and some of which are large to very large):

[root@g-unit-1 sbin]# cat /var/lib/glusterd/vols/vol00/vol00.g-unit-1.mnt-glusterfs-bricks-1.vol

volume vol00-posix
    type storage/posix
    option update-link-count-parent off
    option volume-id 292b8701-d394-48ee-a224-b5a20ca7ce0f
    option directory /mnt/glusterfs/bricks/1
end-volume

volume vol00-trash
    type features/trash
    option trash-internal-op off
    option brick-path /mnt/glusterfs/bricks/1
    option trash-dir .trashcan
    subvolumes vol00-posix
end-volume

volume vol00-changetimerecorder
    type features/changetimerecorder
    option record-counters off
    option ctr-enabled off
    option record-entry on
    option ctr_inode_heal_expire_period 300
    option ctr_hardlink_heal_expire_period 300
    option ctr_link_consistency off
    option record-exit off
    option db-path /mnt/glusterfs/bricks/1/.glusterfs/
    option db-name 1.db
    option hot-brick off
    option db-type sqlite3
    subvolumes vol00-trash
end-volume

volume vol00-changelog
    type features/changelog
    option capture-del-path on
    option changelog-barrier-timeout 120
    option changelog on
    option changelog-dir /mnt/glusterfs/bricks/1/.glusterfs/changelogs
    option changelog-brick /mnt/glusterfs/bricks/1
    subvolumes vol00-changetimerecorder
end-volume

volume vol00-bitrot-stub
    type features/bitrot-stub
    option export /mnt/glusterfs/bricks/1
    subvolumes vol00-changelog
end-volume

volume vol00-access-control
    type features/access-control
    subvolumes vol00-bitrot-stub
end-volume

volume vol00-locks
    type features/locks
    subvolumes vol00-access-control
end-volume

volume vol00-upcall
    type features/upcall
    option cache-invalidation off
    subvolumes vol00-locks
end-volume

volume vol00-io-threads
    type performance/io-threads
    option thread-count 32
    subvolumes vol00-upcall
end-volume

volume vol00-marker
    type features/marker
    option inode-quota off
    option quota off
    option gsync-force-xtime off
    option xtime off
    option timestamp-file /var/lib/glusterd/vols/vol00/marker.tstamp
    option volume-uuid 292b8701-d394-48ee-a224-b5a20ca7ce0f
    subvolumes vol00-io-threads
end-volume

volume vol00-barrier
    type features/barrier
    option barrier-timeout 120
    option barrier disable
    subvolumes vol00-marker
end-volume

volume vol00-index
    type features/index
    option index-base /mnt/glusterfs/bricks/1/.glusterfs/indices
    subvolumes vol00-barrier
end-volume

volume vol00-quota
    type features/quota
    option deem-statfs off
    option timeout 0
    option server-quota off
    option volume-uuid vol00
    subvolumes vol00-index
end-volume

volume vol00-worm
    type features/worm
    option worm off
    subvolumes vol00-quota
end-volume

volume vol00-read-only
    type features/read-only
    option read-only off
    subvolumes vol00-worm
end-volume

volume /mnt/glusterfs/bricks/1
    type debug/io-stats
    option count-fop-hits off
    option latency-measurement off
    subvolumes vol00-read-only
end-volume

volume vol00-server
    type protocol/server
    option event-threads 6
    option rpc-auth-allow-insecure on
    option auth.addr./mnt/glusterfs/bricks/1.allow *
    option auth.login.dc3d05ba-40ce-47ee-8f4c-a729917784dc.password 58c2072b-8d1c-4921-9270-bf4b477c4126
    option auth.login./mnt/glusterfs/bricks/1.allow dc3d05ba-40ce-47ee-8f4c-a729917784dc
    option transport-type tcp
    subvolumes /mnt/glusterfs/bricks/1
end-volume
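PS - one thing we may try in the meantime, though this is speculation on my part: if the changelog.rollover-time volume option does what I assume it does (controls how often the changelog translator rolls over to a new changelog file, with a default measured in seconds), then raising it should at least mean fewer, larger changelog files per brick, e.g.:

    gluster volume set vol00 changelog.rollover-time 300

If anyone knows whether glusterfind copes properly with a longer rollover interval, I'd appreciate confirmation before we try it.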
_______________________________________________
Gluster-devel mailing list
Gluster-devel@gluster.org
http://www.gluster.org/mailman/listinfo/gluster-devel