Please see below

From: Vijaikumar Mallikarjuna [mailto:vmall...@redhat.com]
Sent: Wednesday, 21 October 2015 6:37 PM
To: Sincock, John [FLCPTY]
Cc: gluster-devel@gluster.org
Subject: Re: [Gluster-devel] Need advice re some major issues with glusterfind



On Wed, Oct 21, 2015 at 5:53 AM, Sincock, John [FLCPTY] 
<j.sinc...@fugro.com> wrote:
Hi Everybody,

We have recently upgraded our 220 TB gluster to 3.7.4, and we've been trying to 
use the new glusterfind feature, but we have been having some serious problems 
with it. Overall, glusterfind looks very promising, so I don't want to offend 
anyone by raising these issues.

If these issues can be resolved or worked around, glusterfind will be a great 
feature.  So I would really appreciate any information or advice:

1) What can be done about the vast number of tiny changelogs? We are often 
seeing 5+ small, 89-byte changelog files per minute on EACH brick, and larger 
files when the brick is busier. We've been generating these changelogs for a 
few weeks and already have in excess of 10,000-12,000 on most bricks. This 
makes glusterfinds very, very slow, especially on a node with a lot of bricks, 
and looks unsustainable in the long run. Why are these files so small, why are 
there so many of them, and how are they supposed to be managed over time? The 
sheer number of these files looks certain to impact performance.

2) The pgfid extended attribute is wreaking havoc with our backup scheme: when 
gluster adds this xattr to a file it changes the file's ctime, which we were 
using to determine which files need to be archived. A warning should be added 
to the release notes and upgrade notes so people can make a plan to manage this 
if required.
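
If anyone else needs to spot-check which files have already been labelled, a 
getfattr on the brick path (not the FUSE mount) should show the pgfid entries. 
Just a sketch, assuming the attribute is named trusted.pgfid.<parent-gfid> and 
using a placeholder file path:

# run as root on the brick backend, not on the client mount
getfattr -d -m 'trusted.pgfid' -e hex /mnt/glusterfs/bricks/1/path/to/some/file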

Also, we ran a rebalance immediately after the 3.7.4 upgrade, and it took 5 
days or so to complete, which looks like a major speed improvement over the 
more serial rebalance algorithm, so that's good. But I was hoping the rebalance 
would also have had the side effect of labelling every file with the pgfid 
attribute by the time it completed, or failing that, that building an mlocate 
database across our entire gluster would have done so (since that should have 
accessed every file, unless it gets the info it needs from directory inodes 
alone). Instead, ctimes are still being modified, and I think that can only be 
caused by files still being labelled with pgfids.

How can we force gluster to finish this pgfid labelling for all files that are 
already on the volume? We can't have gluster continuing to add pgfids in bursts 
here and there, e.g. when files are read for the first time since the upgrade; 
we need to get it over and done with in one go. For now we have had to turn off 
pgfid creation on the volume until we can force gluster to do exactly that.


Hi John,

Was quota turned on/off before/after performing the rebalance? If the pgfid is 
missing, it can be healed by running 'find <mount_point> | xargs stat'; every 
file will get looked up once and the pgfid healing will happen.
Also, could you please provide all the volume files under 
'/var/lib/glusterd/vols/<volname>/*.vol'?

Thanks,
Vijay


Hi Vijay

Quota has never been turned on in our gluster, so it can't be any quota-related 
xattrs resetting our ctimes; I'm pretty sure it must be the pgfids still being 
added.

Thanks for the tip re using stat; if that triggers the pgfid build on each 
file, I'll run it when I get a chance. We'll have to get our archiving of data 
back up to date, re-enable the pgfid build option, and then run the stat over a 
weekend or something, as it will take a while.
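
For the record, I'll probably run it along these lines (just a sketch: 
/mnt/vol00 is a stand-in for our actual FUSE mount point, and the -print0/-0 
flags are only there to cope with awkward filenames):

# crawl the whole volume once so every file gets looked up and healed
find /mnt/vol00 -print0 | xargs -0 stat > /dev/null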

I'm still quite concerned about the number of changelogs being generated. Do 
you know if there are any plans to change the way changelogs are generated so 
there aren't so many of them, and to process them more efficiently? I think 
this will be vital to improving glusterfind's performance in future, as an 
enormous number of these small changelogs is currently being generated on each 
of our gluster bricks.
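
To give a concrete idea of the scale, this is roughly how I've been counting 
them per brick (assuming the usual CHANGELOG.<timestamp> naming under the 
changelog directory shown in the volfile below):

# count rolled-over changelog files and their total size on one brick
find /mnt/glusterfs/bricks/1/.glusterfs/changelogs -name 'CHANGELOG.*' | wc -l
du -sh /mnt/glusterfs/bricks/1/.glusterfs/changelogs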

Below is the volfile for one brick; the others are all equivalent. We haven't 
tweaked the volume options much, besides increasing the io thread count to 32 
and the client/event threads to 6 (the corresponding volume-set commands are 
sketched after the volfile), since we have a lot of small files on our gluster 
(30 million files, many of which are small and some of which are large to very 
large):

[root@g-unit-1 sbin]# cat /var/lib/glusterd/vols/vol00/vol00.g-unit-1.mnt-glusterfs-bricks-1.vol
volume vol00-posix
    type storage/posix
    option update-link-count-parent off
    option volume-id 292b8701-d394-48ee-a224-b5a20ca7ce0f
    option directory /mnt/glusterfs/bricks/1
end-volume

volume vol00-trash
    type features/trash
    option trash-internal-op off
    option brick-path /mnt/glusterfs/bricks/1
    option trash-dir .trashcan
    subvolumes vol00-posix
end-volume

volume vol00-changetimerecorder
    type features/changetimerecorder
    option record-counters off
    option ctr-enabled off
    option record-entry on
    option ctr_inode_heal_expire_period 300
    option ctr_hardlink_heal_expire_period 300
    option ctr_link_consistency off
    option record-exit off
    option db-path /mnt/glusterfs/bricks/1/.glusterfs/
    option db-name 1.db
    option hot-brick off
    option db-type sqlite3
    subvolumes vol00-trash
end-volume

volume vol00-changelog
    type features/changelog
    option capture-del-path on
    option changelog-barrier-timeout 120
    option changelog on
    option changelog-dir /mnt/glusterfs/bricks/1/.glusterfs/changelogs
    option changelog-brick /mnt/glusterfs/bricks/1
    subvolumes vol00-changetimerecorder
end-volume

volume vol00-bitrot-stub
    type features/bitrot-stub
    option export /mnt/glusterfs/bricks/1
    subvolumes vol00-changelog
end-volume

volume vol00-access-control
    type features/access-control
    subvolumes vol00-bitrot-stub
end-volume

volume vol00-locks
    type features/locks
    subvolumes vol00-access-control
end-volume

volume vol00-upcall
    type features/upcall
    option cache-invalidation off
    subvolumes vol00-locks
end-volume

volume vol00-io-threads
    type performance/io-threads
    option thread-count 32
    subvolumes vol00-upcall
end-volume

volume vol00-marker
    type features/marker
    option inode-quota off
    option quota off
    option gsync-force-xtime off
    option xtime off
    option timestamp-file /var/lib/glusterd/vols/vol00/marker.tstamp
    option volume-uuid 292b8701-d394-48ee-a224-b5a20ca7ce0f
    subvolumes vol00-io-threads
end-volume

volume vol00-barrier
    type features/barrier
    option barrier-timeout 120
    option barrier disable
    subvolumes vol00-marker
end-volume

volume vol00-index
    type features/index
    option index-base /mnt/glusterfs/bricks/1/.glusterfs/indices
    subvolumes vol00-barrier
end-volume

volume vol00-quota
    type features/quota
    option deem-statfs off
    option timeout 0
    option server-quota off
    option volume-uuid vol00
    subvolumes vol00-index
end-volume

volume vol00-worm
    type features/worm
    option worm off
    subvolumes vol00-quota
end-volume

volume vol00-read-only
    type features/read-only
    option read-only off
    subvolumes vol00-worm
end-volume

volume /mnt/glusterfs/bricks/1
    type debug/io-stats
    option count-fop-hits off
    option latency-measurement off
    subvolumes vol00-read-only
end-volume

volume vol00-server
    type protocol/server
    option event-threads 6
    option rpc-auth-allow-insecure on
    option auth.addr./mnt/glusterfs/bricks/1.allow *
    option auth.login.dc3d05ba-40ce-47ee-8f4c-a729917784dc.password 58c2072b-8d1c-4921-9270-bf4b477c4126
    option auth.login./mnt/glusterfs/bricks/1.allow dc3d05ba-40ce-47ee-8f4c-a729917784dc
    option transport-type tcp
    subvolumes /mnt/glusterfs/bricks/1
end-volume
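
For completeness, the thread settings mentioned above correspond to volume-set 
commands along these lines (option names quoted from memory, so please 
double-check them against 'gluster volume set help'):

gluster volume set vol00 performance.io-thread-count 32
gluster volume set vol00 client.event-threads 6
gluster volume set vol00 server.event-threads 6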





_______________________________________________
Gluster-devel mailing list
Gluster-devel@gluster.org
http://www.gluster.org/mailman/listinfo/gluster-devel
