Hi Kotresh/Venky,

Could you please provide your inputs on the changelog issues mentioned below?
Thanks,
Vijay

On Fri, Oct 23, 2015 at 9:54 AM, Sincock, John [FLCPTY] <j.sinc...@fugro.com> wrote:

> Hi Vijay, please see below again (I'm wondering if top-posting would be easier - that's usually what I do, though I know some people don't like it).

> On Wed, Oct 21, 2015 at 5:53 AM, Sincock, John [FLCPTY] <j.sinc...@fugro.com> wrote:

> Hi Everybody,

> We have recently upgraded our 220 TB gluster to 3.7.4, and we've been trying to use the new glusterfind feature, but we have been having some serious problems with it. Overall glusterfind looks very promising, so I don't want to offend anyone by raising these issues. If they can be resolved or worked around, glusterfind will be a great feature, so I would really appreciate any information or advice:

> 1) What can be done about the vast number of tiny changelogs? We often see 5+ small 89-byte changelog files per minute on EACH brick (larger files if the brick is busier). We've been generating these changelogs for a few weeks and have in excess of 10,000 or 12,000 on most bricks. This makes glusterfinds very, very slow, especially on a node which has a lot of bricks, and looks unsustainable in the long run. Why are these files so small, why are there so many of them, and how are they supposed to be managed over time? The sheer number of these files looks sure to impact performance in the long run.

> 2) The pgfid extended attribute is wreaking havoc with our backup scheme - when gluster adds this xattr to a file it changes the ctime, which we were using to determine which files need to be archived. There should be a warning added to the release notes and upgrade notes, so people can make a plan to manage this if required.

> Also, we ran a rebalance immediately after the 3.7.4 upgrade, and the rebalance took 5 days or so to complete, which looks like a major speed improvement over the older, more serial rebalance algorithm, so that's good. But I was hoping the rebalance would also have had the side effect of labelling every file with the pgfid attribute by the time it completed, or failing that, after creation of an mlocate database across our entire gluster (which would have accessed every file, unless it gets the info it needs only from directory inodes). Instead it looks like ctimes are still being modified, and I think this can only be caused by files still being labelled with pgfids.

> How can we force gluster to get this pgfid labelling over and done with, for all files that are already on the volume? We can't have gluster continuing to add pgfids in bursts here and there, e.g. when files are read for the first time since the upgrade - we need to get it over and done with. We have just had to turn off pgfid creation on the volume until we can force gluster to get it done in one go.

> Hi John,

> Was quota turned on/off before/after performing the rebalance? If the pgfid is missing, this can be healed by performing 'find <mount_point> | xargs stat' - all the files will get looked up once and the pgfid healing will happen. Also, could you please provide all the volume files under '/var/lib/glusterd/vols/<volname>/*.vol'?

> Thanks,
> Vijay

> Hi Vijay,

> Quota has never been turned on in our gluster, so it can't be any quota-related xattrs which are resetting our ctimes, so I'm pretty sure it must be due to pgfids still being added.
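> (For what it's worth, the labelling can be confirmed by dumping a file's xattrs directly on a brick as root - the brick path below is just an example; files that have already been labelled show one or more trusted.pgfid.<parent-gfid> keys:)

> getfattr -d -m . -e hex /mnt/glusterfs/bricks/1/path/to/some/file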
> Thanks for the tip re using stat - if that should trigger the pgfid build on each file, then I will run it when I have a chance. We'll have to get our archiving of data back up to date, re-enable the pgfid build option, and then run the stat over a weekend or something, as it will take a while.

> I'm still quite concerned about the number of changelogs being generated. Do you know if there are any plans to change the way changelogs are generated so there aren't so many of them, and to process them more efficiently? I think this will be vital to improving the performance of glusterfind in future, as there are currently an enormous number of these small changelogs being generated on each of our gluster bricks.

> Below is the volfile for one brick; the others are all equivalent. We haven't tweaked the volume options much, besides increasing the io thread count to 32 and the client/event threads to 6, since we have a lot of small files on our gluster (30 million files, a lot of which are small, and some of which are large to very large):

> Hi John,

> PGFID xattrs are updated only when update-link-count-parent is enabled in the brick volume file. This option is enabled when quota is enabled on a volume. The volume file you provided below has update-link-count-parent disabled, so I am wondering why PGFID xattrs are being updated.

> Thanks,
> Vijay

> Hi Vijay,

> Somewhere in the 3.7.5 upgrade instructions or the glusterfind documentation, there was a mention that we should enable a server option called storage.build-pgfid, which we did as it speeds up glusterfinds. You cannot see this in the volfile, but you can see it when you do 'gluster volume info <volname>'. So for our volume we currently have:

> Options Reconfigured:
> server.allow-insecure: on
> nfs.disable: false
> performance.io-thread-count: 32
> features.quota: off
> client.bind-insecure: on
> storage.build-pgfid: off
> changelog.changelog: on
> changelog.capture-del-path: on
> server.event-threads: 6
> client.event-threads: 6

> We've turned storage.build-pgfid OFF now, but we turned it on when we did the upgrade to 3.7.4, and we had it on until a few days ago. So, for us, with update-link-count-parent off, storage.build-pgfid would've been the thing responsible for adding the pgfids to files on our volume.

> I should've realised the best thing to do would've been to stat every file in order to trigger the pgfid build, but at first I thought the pgfids would be added to every file during the rebalance, which was a priority at the time (we had just added 40 TB of new bricks to a very full volume), and then we hit the pgfid/backup issues etc.

> I think we can get the pgfid issue resolved now you've confirmed that a stat will do it (thanks :-) We'll just have to stop our clients writing to the volume for a day or so while we stat every file on the volume. Then, since our clients won't have been writing during that time, we can re-jig our backups to safely ignore any ctimes that changed during the day or so we were statting the volume.

> I'll let you know how things go with the pgfids if we can get them turned back on and added to every file as soon as possible.
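> (The heal pass itself will presumably just be Vijay's one-liner run from one of our client mounts, something along these lines - the mount point is only an example, and -print0/-0 is just to be safe with unusual filenames:)

> find /mnt/<volname> -print0 | xargs -0 stat > /dev/null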
> I'm definitely more concerned now about the changelog issue. As mentioned, we have an enormous number of these, e.g. as of now (about 25 days since upgrading to 3.7.4) we have 13,000 or so changelogs on each of our bricks:

> ls -la /mnt/glusterfs/bricks/1/.glusterfs/changelogs/ | wc -l
> 13096

> And they are very small - about 5 KB on average, ranging from (many at just) 89 bytes up to 20 KB or so for the larger ones:

> du -hs /mnt/glusterfs/bricks/1/.glusterfs/changelogs/
> 68M     /mnt/glusterfs/bricks/1/.glusterfs/changelogs/

> The size of the changelogs is not an issue (68M for almost a month's worth of changes is nothing), but the sheer number of files is, as is the fact that it seems to be very CPU-intensive to process these files (e.g. an strace showed glusterfind taking 2.7 million system calls to process just one of these small changelogs).
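> (If it helps, a per-syscall summary can be collected with something along these lines - the session and volume names are placeholders:)

> strace -c -f -o /tmp/glusterfind-syscalls.txt glusterfind pre <sessionname> <volname> /tmp/glusterfind-out.txt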
> Do you know if anyone is working on reducing the number of these changelogs and/or processing them more efficiently?

> Thanks again for any info!

_______________________________________________
Gluster-devel mailing list
Gluster-devel@gluster.org
http://www.gluster.org/mailman/listinfo/gluster-devel