OK, I just hit the other issue too, where .shard doesn't get healed. :) Investigating why that is the case. Give me some time.

-Krutika
On Wed, Aug 31, 2016 at 12:39 PM, Krutika Dhananjay <kdhan...@redhat.com> wrote:

> Just figured the steps Anuradha has provided won't work if granular entry heal is on.
> When you bring down a brick and create fake2 under / of the volume, the granular
> entry heal feature causes sh to remember only the fact that 'fake2' needs to be
> recreated on the offline brick (because changelogs are granular).
>
> In this case, we would need to indicate to the self-heal daemon that the entire
> directory tree from '/' needs to be repaired on the brick that contains no data.
>
> To fix this, I did the following (for users who use granular entry self-healing):
>
> 1. Kill the last brick process in the replica (/bricks/3).
>
> 2. [root@server-3 ~]# rm -rf /bricks/3
>
> 3. [root@server-3 ~]# mkdir /bricks/3
>
> 4. Create a new dir on the mount point:
> [root@client-1 ~]# mkdir /mnt/fake
>
> 5. Set some fake xattr on the root of the volume, not on the 'fake' directory itself:
> [root@client-1 ~]# setfattr -n "user.some-name" -v "some-value" /mnt
>
> 6. Make sure there's no IO happening on your volume.
>
> 7. Check the pending xattrs on the brick directories of the two good copies (on
> bricks 1 and 2); you should see the same trusted.afr.rep-client-2 value on both bricks.
> (Note that the client-<num> xattr key will have the same last digit as the index of
> the brick that is down, counting from 0. So if the first brick is the one that is
> down, it would read trusted.afr.*-client-0; if the second brick is the one that is
> empty and down, it would read trusted.afr.*-client-1, and so on.)
>
> [root@server-1 ~]# getfattr -d -m . -e hex /bricks/1
> # file: 1
> security.selinux=0x756e636f6e66696e65645f753a6f626a6563745f723a6574635f72756e74696d655f743a733000
> trusted.afr.dirty=0x000000000000000000000000
> trusted.afr.rep-client-2=0x000000000000000100000001
> trusted.gfid=0x00000000000000000000000000000001
> trusted.glusterfs.dht=0x000000010000000000000000ffffffff
> trusted.glusterfs.volume-id=0xa349517bb9d44bdf96da8ea324f89e7b
>
> [root@server-2 ~]# getfattr -d -m . -e hex /bricks/2
> # file: 2
> security.selinux=0x756e636f6e66696e65645f753a6f626a6563745f723a6574635f72756e74696d655f743a733000
> trusted.afr.dirty=0x000000000000000000000000
> trusted.afr.rep-client-2=0x000000000000000100000001
> trusted.gfid=0x00000000000000000000000000000001
> trusted.glusterfs.dht=0x000000010000000000000000ffffffff
> trusted.glusterfs.volume-id=0xa349517bb9d44bdf96da8ea324f89e7b
>
> 8. Flip the 8th hex digit of trusted.afr.<VOLNAME>-client-2 to a 1:
>
> [root@server-1 ~]# setfattr -n trusted.afr.rep-client-2 -v 0x000000010000000100000001 /bricks/1
> [root@server-2 ~]# setfattr -n trusted.afr.rep-client-2 -v 0x000000010000000100000001 /bricks/2
>
> 9. Get the xattrs again and check that they are set properly now:
>
> [root@server-1 ~]# getfattr -d -m . -e hex /bricks/1
> # file: 1
> security.selinux=0x756e636f6e66696e65645f753a6f626a6563745f723a6574635f72756e74696d655f743a733000
> trusted.afr.dirty=0x000000000000000000000000
> trusted.afr.rep-client-2=0x000000010000000100000001
> trusted.gfid=0x00000000000000000000000000000001
> trusted.glusterfs.dht=0x000000010000000000000000ffffffff
> trusted.glusterfs.volume-id=0xa349517bb9d44bdf96da8ea324f89e7b
>
> [root@server-2 ~]# getfattr -d -m . -e hex /bricks/2
> # file: 2
> security.selinux=0x756e636f6e66696e65645f753a6f626a6563745f723a6574635f72756e74696d655f743a733000
> trusted.afr.dirty=0x000000000000000000000000
> trusted.afr.rep-client-2=0x000000010000000100000001
> trusted.gfid=0x00000000000000000000000000000001
> trusted.glusterfs.dht=0x000000010000000000000000ffffffff
> trusted.glusterfs.volume-id=0xa349517bb9d44bdf96da8ea324f89e7b
>
> 10. Force-start the volume:
>
> [root@server-1 ~]# gluster volume start rep force
> volume start: rep: success
>
> 11. Monitor the heal-info command to ensure the number of entries keeps growing.
>
> 12. Keep monitoring as in step 11; eventually the number of entries needing heal
> must come down to 0. Also, the checksums of the files on the previously empty
> brick should now match the copies on the other two bricks.
>
> Could you check if the above steps work for you, in your test environment?
>
> You caught a nice bug in the manual steps to follow when granular entry-heal is
> enabled and an empty brick needs heal. Thanks for reporting it. :) We will fix
> the documentation appropriately.
>
> -Krutika
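(Side note for anyone decoding these values: as I understand AFR, the trusted.afr.*-client-<N> value is three concatenated big-endian 32-bit counters: data, metadata, and entry pending-operation counts, in that order. A minimal shell sketch of splitting the post-flip value from step 8; the variable name is just illustrative:

v=000000010000000100000001
echo "data=0x${v:0:8} metadata=0x${v:8:8} entry=0x${v:16:8}"
# prints: data=0x00000001 metadata=0x00000001 entry=0x00000001

That is also why flipping the 8th hex digit marks a pending data heal: the first eight hex digits are the data counter.)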
> On Wed, Aug 31, 2016 at 11:29 AM, Krutika Dhananjay <kdhan...@redhat.com> wrote:
>
>> Tried this.
>>
>> With me, only 'fake2' gets healed after I bring the 'empty' brick back up, and
>> it stops there unless I do a 'heal-full'.
>>
>> Is that what you're seeing as well?
>>
>> -Krutika
>>
>> On Wed, Aug 31, 2016 at 4:43 AM, David Gossage <dgoss...@carouselchecks.com> wrote:
>>
>>> Same issue after bringing up glusterd on the problem node; heal count still
>>> stuck at 6330.
>>>
>>> Ran gluster v heal GLUSTER1 full
>>>
>>> glustershd on the problem node shows a sweep starting and finishing in
>>> seconds. The other 2 nodes show no activity in their logs. They should start
>>> a sweep too, shouldn't they?
>>>
>>> Tried starting from scratch:
>>>
>>> kill -15 brickpid
>>> rm -Rf /brick
>>> mkdir -p /brick
>>> mkdir /gsmount/fake2
>>> setfattr -n "user.some-name" -v "some-value" /gsmount/fake2
>>>
>>> Heals visible dirs instantly, then stops.
>>>
>>> gluster v heal GLUSTER1 full
>>>
>>> I see the sweep start on the problem node and end almost instantly. No files
>>> added to the heal list, no files healed, no more logging.
>>>
>>> [2016-08-30 23:11:31.544331] I [MSGID: 108026] [afr-self-heald.c:646:afr_shd_full_healer] 0-GLUSTER1-replicate-0: starting full sweep on subvol GLUSTER1-client-1
>>> [2016-08-30 23:11:33.776235] I [MSGID: 108026] [afr-self-heald.c:656:afr_shd_full_healer] 0-GLUSTER1-replicate-0: finished full sweep on subvol GLUSTER1-client-1
>>>
>>> Same results no matter which node you run the command on. Still stuck with
>>> 6330 files showing as needing heal out of 19k, and the logs still show no
>>> heals occurring.
>>>
>>> Is there a way to forcibly reset any prior heal data? Could it be stuck on
>>> some past failed heal start?
>>>
>>> *David Gossage*
>>> *Carousel Checks Inc. | System Administrator*
>>> *Office* 708.613.2284
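(Side note: one way to sanity-check whether the self-heal daemon is actually queueing work, rather than trusting heal-info alone, is to watch the index directory on a good brick. A sketch, assuming the standard brick layout; /bricks/1 is a placeholder, and the grep drops the base xattrop-<gfid> entry, which is not a pending item:

watch -n 10 'ls /bricks/1/.glusterfs/indices/xattrop | grep -v "^xattrop-" | wc -l'

The count should roughly track what gluster volume heal <VOLNAME> info reports, which is consistent with what David observes further down in this thread.)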
>>> On Tue, Aug 30, 2016 at 10:03 AM, David Gossage <dgoss...@carouselchecks.com> wrote:
>>>
>>>> On Tue, Aug 30, 2016 at 10:02 AM, David Gossage <dgoss...@carouselchecks.com> wrote:
>>>>
>>>>> Updated test server to 3.8.3.
>>>>>
>>>>> Brick1: 192.168.71.10:/gluster2/brick1/1
>>>>> Brick2: 192.168.71.11:/gluster2/brick2/1
>>>>> Brick3: 192.168.71.12:/gluster2/brick3/1
>>>>> Options Reconfigured:
>>>>> cluster.granular-entry-heal: on
>>>>> performance.readdir-ahead: on
>>>>> performance.read-ahead: off
>>>>> nfs.disable: on
>>>>> nfs.addr-namelookup: off
>>>>> nfs.enable-ino32: off
>>>>> cluster.background-self-heal-count: 16
>>>>> cluster.self-heal-window-size: 1024
>>>>> performance.quick-read: off
>>>>> performance.io-cache: off
>>>>> performance.stat-prefetch: off
>>>>> cluster.eager-lock: enable
>>>>> network.remote-dio: on
>>>>> cluster.quorum-type: auto
>>>>> cluster.server-quorum-type: server
>>>>> storage.owner-gid: 36
>>>>> storage.owner-uid: 36
>>>>> server.allow-insecure: on
>>>>> features.shard: on
>>>>> features.shard-block-size: 64MB
>>>>> performance.strict-o-direct: off
>>>>> cluster.locking-scheme: granular
>>>>>
>>>>> kill -15 brickpid
>>>>> rm -Rf /gluster2/brick3
>>>>> mkdir -p /gluster2/brick3/1
>>>>> mkdir /rhev/data-center/mnt/glusterSD/192.168.71.10\:_glustershard/fake2
>>>>> setfattr -n "user.some-name" -v "some-value" /rhev/data-center/mnt/glusterSD/192.168.71.10\:_glustershard/fake2
>>>>> gluster v start glustershard force
>>>>>
>>>>> At this point the brick process starts and all visible files, including the
>>>>> new dir, are made on the brick. A handful of shards are still in the heal
>>>>> statistics, but no .shard directory is created and there is no increase in
>>>>> shard count.
>>>>>
>>>>> gluster v heal glustershard
>>>>>
>>>>> At this point still no increase in count, no dir made, and no additional
>>>>> healing activity generated in the logs. Waited a few minutes tailing logs to
>>>>> check if anything kicked in.
>>>>>
>>>>> gluster v heal glustershard full
>>>>>
>>>>> Gluster shards are added to the list and heal commences. Logs show a full
>>>>> sweep starting on all 3 nodes, though this time it only shows as finishing
>>>>> on one, which looks to be the one that had its brick deleted.
>>>>>
>>>>> [2016-08-30 14:45:33.098589] I [MSGID: 108026] [afr-self-heald.c:646:afr_shd_full_healer] 0-glustershard-replicate-0: starting full sweep on subvol glustershard-client-0
>>>>> [2016-08-30 14:45:33.099492] I [MSGID: 108026] [afr-self-heald.c:646:afr_shd_full_healer] 0-glustershard-replicate-0: starting full sweep on subvol glustershard-client-1
>>>>> [2016-08-30 14:45:33.100093] I [MSGID: 108026] [afr-self-heald.c:646:afr_shd_full_healer] 0-glustershard-replicate-0: starting full sweep on subvol glustershard-client-2
>>>>> [2016-08-30 14:52:29.760213] I [MSGID: 108026] [afr-self-heald.c:656:afr_shd_full_healer] 0-glustershard-replicate-0: finished full sweep on subvol glustershard-client-2
>>>>
>>>> Just realized it's still healing, so that may be why the sweep on the 2 other
>>>> bricks hasn't been reported as finished.
>>>>
>>>>> My hope is that later tonight a full heal will work on production. Is it
>>>>> possible the self-heal daemon can get stale or stop listening but still show
>>>>> as active? Would stopping and starting the self-heal daemon from the gluster
>>>>> CLI before doing these heals be helpful?
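(Side note: to put numbers on "no increase in count", polling the summary counters is easier than eyeballing heal-info output. A sketch, reusing the volume name above; I believe the statistics heal-count sub-command is available in 3.8, but treat that as an assumption:

gluster volume heal glustershard statistics heal-count

This prints a per-brick total of entries pending heal, so logging it every few minutes gives a simple progress curve.)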
>>>>> On Tue, Aug 30, 2016 at 9:29 AM, David Gossage <dgoss...@carouselchecks.com> wrote:
>>>>>
>>>>>> On Tue, Aug 30, 2016 at 8:52 AM, David Gossage <dgoss...@carouselchecks.com> wrote:
>>>>>>
>>>>>>> On Tue, Aug 30, 2016 at 8:01 AM, Krutika Dhananjay <kdhan...@redhat.com> wrote:
>>>>>>>
>>>>>>>> On Tue, Aug 30, 2016 at 6:20 PM, Krutika Dhananjay <kdhan...@redhat.com> wrote:
>>>>>>>>
>>>>>>>>> On Tue, Aug 30, 2016 at 6:07 PM, David Gossage <dgoss...@carouselchecks.com> wrote:
>>>>>>>>>
>>>>>>>>>> On Tue, Aug 30, 2016 at 7:18 AM, Krutika Dhananjay <kdhan...@redhat.com> wrote:
>>>>>>>>>>
>>>>>>>>>>> Could you also share the glustershd logs?
>>>>>>>>>>
>>>>>>>>>> I'll get them when I get to work, sure.
>>>>>>>>>>
>>>>>>>>>>> I tried the same steps that you mentioned multiple times, but heal is
>>>>>>>>>>> running to completion without any issues.
>>>>>>>>>>>
>>>>>>>>>>> It must be said that 'heal full' traverses the files and directories in
>>>>>>>>>>> a depth-first order and does heals in the same order. But if it gets
>>>>>>>>>>> interrupted in the middle (say because the self-heal daemon was either
>>>>>>>>>>> intentionally or unintentionally brought offline and then brought back
>>>>>>>>>>> up), self-heal will only pick up the entries so far marked as new entries
>>>>>>>>>>> needing heal, which it will find in the indices/xattrop directory. What
>>>>>>>>>>> this means is that those files and directories that were not visited
>>>>>>>>>>> during the crawl will remain untouched and unhealed in this second
>>>>>>>>>>> iteration of heal, unless you execute a 'heal full' again.
>>>>>>>>>>
>>>>>>>>>> So should it start healing shards as it crawls, or not until after it
>>>>>>>>>> crawls the entire .shard directory? At the pace it was going, that could
>>>>>>>>>> be a week, with one node appearing in the cluster but having no shard
>>>>>>>>>> files if anything tries to access a file on that node. From my experience
>>>>>>>>>> the other day, telling it to heal full again did nothing, regardless of
>>>>>>>>>> the node used.
>>>>>>>>
>>>>>>>> The crawl is started from '/' of the volume. Whenever self-heal detects
>>>>>>>> during the crawl that a file or directory is present on some brick(s) and
>>>>>>>> absent on others, it creates the file on the bricks where it is absent and
>>>>>>>> marks the fact that the file or directory might need data/entry and
>>>>>>>> metadata heal too (this also means that an index is created under
>>>>>>>> .glusterfs/indices/xattrop of the src bricks). The data/entry and metadata
>>>>>>>> heals are then picked up and done in the background with the help of these
>>>>>>>> indices.
>>>>>>>
>>>>>>> Looking at my 3rd node as an example, I find nearly the exact same number
>>>>>>> of files in the xattrop dir as reported by heal count at the time I brought
>>>>>>> down node 2 to try to alleviate the read IO errors that seemed to occur
>>>>>>> from what I was guessing were attempts to use the node with no shards for
>>>>>>> reads.
>>>>>>>
>>>>>>> Also attached are the glustershd logs from the 3 nodes, along with the test
>>>>>>> node I tried yesterday with the same results.
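(Side note: if the full-sweep crawl is the bottleneck, a brute-force way to make AFR notice missing shards is to force lookups on them by reading an affected image straight through the mount, since the shard translator has to look up each shard as the read progresses. A sketch only; the path is a placeholder, this assumes the volume can tolerate the extra read load, and it is a nudge, not a fix:

dd if=/rhev/data-center/mnt/glusterSD/<host>:_glustershard/<path-to-image> of=/dev/null bs=64M

Each looked-up shard found missing on the empty brick should then be queued for background heal, independent of the crawl.)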
>>>>>> Looking at my own logs, I notice that a full sweep was only ever recorded in
>>>>>> glustershd.log on the 2nd node with the missing directory. I believe I should
>>>>>> have found a sweep begun on every node, correct?
>>>>>>
>>>>>> On my test dev, when it did work, I do see that:
>>>>>>
>>>>>> [2016-08-30 13:56:25.223333] I [MSGID: 108026] [afr-self-heald.c:646:afr_shd_full_healer] 0-glustershard-replicate-0: starting full sweep on subvol glustershard-client-0
>>>>>> [2016-08-30 13:56:25.223522] I [MSGID: 108026] [afr-self-heald.c:646:afr_shd_full_healer] 0-glustershard-replicate-0: starting full sweep on subvol glustershard-client-1
>>>>>> [2016-08-30 13:56:25.224616] I [MSGID: 108026] [afr-self-heald.c:646:afr_shd_full_healer] 0-glustershard-replicate-0: starting full sweep on subvol glustershard-client-2
>>>>>> [2016-08-30 14:18:48.333740] I [MSGID: 108026] [afr-self-heald.c:656:afr_shd_full_healer] 0-glustershard-replicate-0: finished full sweep on subvol glustershard-client-2
>>>>>> [2016-08-30 14:18:48.356008] I [MSGID: 108026] [afr-self-heald.c:656:afr_shd_full_healer] 0-glustershard-replicate-0: finished full sweep on subvol glustershard-client-1
>>>>>> [2016-08-30 14:18:49.637811] I [MSGID: 108026] [afr-self-heald.c:656:afr_shd_full_healer] 0-glustershard-replicate-0: finished full sweep on subvol glustershard-client-0
>>>>>>
>>>>>> While looking at the past few days on the 3 prod nodes, I only found these on
>>>>>> my 2nd node:
>>>>>>
>>>>>> [2016-08-27 01:26:42.638772] I [MSGID: 108026] [afr-self-heald.c:646:afr_shd_full_healer] 0-GLUSTER1-replicate-0: starting full sweep on subvol GLUSTER1-client-1
>>>>>> [2016-08-27 11:37:01.732366] I [MSGID: 108026] [afr-self-heald.c:656:afr_shd_full_healer] 0-GLUSTER1-replicate-0: finished full sweep on subvol GLUSTER1-client-1
>>>>>> [2016-08-27 12:58:34.597228] I [MSGID: 108026] [afr-self-heald.c:646:afr_shd_full_healer] 0-GLUSTER1-replicate-0: starting full sweep on subvol GLUSTER1-client-1
>>>>>> [2016-08-27 12:59:28.041173] I [MSGID: 108026] [afr-self-heald.c:656:afr_shd_full_healer] 0-GLUSTER1-replicate-0: finished full sweep on subvol GLUSTER1-client-1
>>>>>> [2016-08-27 20:03:42.560188] I [MSGID: 108026] [afr-self-heald.c:646:afr_shd_full_healer] 0-GLUSTER1-replicate-0: starting full sweep on subvol GLUSTER1-client-1
>>>>>> [2016-08-27 20:03:44.278274] I [MSGID: 108026] [afr-self-heald.c:656:afr_shd_full_healer] 0-GLUSTER1-replicate-0: finished full sweep on subvol GLUSTER1-client-1
>>>>>> [2016-08-27 21:00:42.603315] I [MSGID: 108026] [afr-self-heald.c:646:afr_shd_full_healer] 0-GLUSTER1-replicate-0: starting full sweep on subvol GLUSTER1-client-1
>>>>>> [2016-08-27 21:00:46.148674] I [MSGID: 108026] [afr-self-heald.c:656:afr_shd_full_healer] 0-GLUSTER1-replicate-0: finished full sweep on subvol GLUSTER1-client-1
>>>>>>
>>>>>>>>>>> My suspicion is that this is what happened on your setup. Could you
>>>>>>>>>>> confirm if that was the case?
>>>>>>>>>>
>>>>>>>>>> Brick was brought online with force start, then a full heal launched.
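(Side note: the quickest check for "did a sweep run here" is grepping each node's glustershd log for the full-healer messages, e.g.:

grep afr_shd_full_healer /var/log/glusterfs/glustershd.log | tail -n 20

On the test dev above, the launching node reported a starting/finished pair for every subvolume; on the production cluster only node 2 ever logged sweeps, which is exactly the asymmetry being described here.)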
>>>>>>>>>> Hours later, after it became evident that it was not adding new files to
>>>>>>>>>> heal, I did try restarting the self-heal daemon and relaunching the full
>>>>>>>>>> heal again. But this was after the heal had basically already failed to
>>>>>>>>>> work as intended.
>>>>>>>>>
>>>>>>>>> OK. How did you figure it was not adding any new files? I need to know what
>>>>>>>>> places you were monitoring to come to this conclusion.
>>>>>>>>>
>>>>>>>>> -Krutika
>>>>>>>>>
>>>>>>>>>>> As for those logs, I did manage to do something that caused the warning
>>>>>>>>>>> messages you shared earlier to appear in my client and server logs.
>>>>>>>>>>> Although these logs are annoying and a bit scary too, they didn't do any
>>>>>>>>>>> harm to the data in my volume. Why they appear just after a brick is
>>>>>>>>>>> replaced, and under no other circumstances, is something I'm still
>>>>>>>>>>> investigating.
>>>>>>>>>>>
>>>>>>>>>>> But for the future, it would be good to follow the steps Anuradha gave,
>>>>>>>>>>> as that would allow self-heal to at least detect that it has some
>>>>>>>>>>> repairing to do whenever it is restarted, whether intentionally or
>>>>>>>>>>> otherwise.
>>>>>>>>>>
>>>>>>>>>> I followed those steps as described on my test box and ended up with the
>>>>>>>>>> exact same outcome: shards added at an agonizingly slow pace and no
>>>>>>>>>> creation of the .shard directory or heals on the shard directory.
>>>>>>>>>> Directories visible from the mount healed quickly. This was with one VM,
>>>>>>>>>> so it has only 800 shards as well. After hours at work it had added a
>>>>>>>>>> total of 33 shards to be healed. I sent those logs yesterday as well,
>>>>>>>>>> though not the glustershd ones.
>>>>>>>>>>
>>>>>>>>>> Does the replace-brick command copy files in the same manner? For these
>>>>>>>>>> purposes I am contemplating just skipping the heal route.
>>>>>>>>>>
>>>>>>>>>>> -Krutika
>>>>>>>>>>>
>>>>>>>>>>> On Tue, Aug 30, 2016 at 2:22 AM, David Gossage <dgoss...@carouselchecks.com> wrote:
>>>>>>>>>>>
>>>>>>>>>>>> Attached brick and client logs from the test machine where the same
>>>>>>>>>>>> behavior occurred; not sure if anything new is there. It's still on
>>>>>>>>>>>> 3.8.2.
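(Side note on the replace-brick question: as far as I know, the only supported form in 3.8 is a single commit force, and it repopulates the new brick through the same self-heal mechanism rather than a separate copy, so it would likely hit the same slowness. Syntax sketch, with illustrative brick paths borrowed from later in the thread:

gluster volume replace-brick GLUSTER1 ccgl2.gl.local:/gluster1/BRICK1/1 ccgl2.gl.local:/gluster1/BRICK1/new commit force)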
>>>>>>>>>>>>
>>>>>>>>>>>> Number of Bricks: 1 x 3 = 3
>>>>>>>>>>>> Transport-type: tcp
>>>>>>>>>>>> Bricks:
>>>>>>>>>>>> Brick1: 192.168.71.10:/gluster2/brick1/1
>>>>>>>>>>>> Brick2: 192.168.71.11:/gluster2/brick2/1
>>>>>>>>>>>> Brick3: 192.168.71.12:/gluster2/brick3/1
>>>>>>>>>>>> Options Reconfigured:
>>>>>>>>>>>> cluster.locking-scheme: granular
>>>>>>>>>>>> performance.strict-o-direct: off
>>>>>>>>>>>> features.shard-block-size: 64MB
>>>>>>>>>>>> features.shard: on
>>>>>>>>>>>> server.allow-insecure: on
>>>>>>>>>>>> storage.owner-uid: 36
>>>>>>>>>>>> storage.owner-gid: 36
>>>>>>>>>>>> cluster.server-quorum-type: server
>>>>>>>>>>>> cluster.quorum-type: auto
>>>>>>>>>>>> network.remote-dio: on
>>>>>>>>>>>> cluster.eager-lock: enable
>>>>>>>>>>>> performance.stat-prefetch: off
>>>>>>>>>>>> performance.io-cache: off
>>>>>>>>>>>> performance.quick-read: off
>>>>>>>>>>>> cluster.self-heal-window-size: 1024
>>>>>>>>>>>> cluster.background-self-heal-count: 16
>>>>>>>>>>>> nfs.enable-ino32: off
>>>>>>>>>>>> nfs.addr-namelookup: off
>>>>>>>>>>>> nfs.disable: on
>>>>>>>>>>>> performance.read-ahead: off
>>>>>>>>>>>> performance.readdir-ahead: on
>>>>>>>>>>>> cluster.granular-entry-heal: on
>>>>>>>>>>>>
>>>>>>>>>>>> On Mon, Aug 29, 2016 at 2:20 PM, David Gossage <dgoss...@carouselchecks.com> wrote:
>>>>>>>>>>>>
>>>>>>>>>>>>> On Mon, Aug 29, 2016 at 7:01 AM, Anuradha Talur <ata...@redhat.com> wrote:
>>>>>>>>>>>>>
>>>>>>>>>>>>>> ----- Original Message -----
>>>>>>>>>>>>>> > From: "David Gossage" <dgoss...@carouselchecks.com>
>>>>>>>>>>>>>> > To: "Anuradha Talur" <ata...@redhat.com>
>>>>>>>>>>>>>> > Cc: "gluster-users@gluster.org List" <Gluster-users@gluster.org>, "Krutika Dhananjay" <kdhan...@redhat.com>
>>>>>>>>>>>>>> > Sent: Monday, August 29, 2016 5:12:42 PM
>>>>>>>>>>>>>> > Subject: Re: [Gluster-users] 3.8.3 Shards Healing Glacier Slow
>>>>>>>>>>>>>> >
>>>>>>>>>>>>>> > On Mon, Aug 29, 2016 at 5:39 AM, Anuradha Talur <ata...@redhat.com> wrote:
>>>>>>>>>>>>>> >
>>>>>>>>>>>>>> > > Response inline.
>>>>>>>>>>>>>> > >
>>>>>>>>>>>>>> > > ----- Original Message -----
>>>>>>>>>>>>>> > > > From: "Krutika Dhananjay" <kdhan...@redhat.com>
>>>>>>>>>>>>>> > > > To: "David Gossage" <dgoss...@carouselchecks.com>
>>>>>>>>>>>>>> > > > Cc: "gluster-users@gluster.org List" <Gluster-users@gluster.org>
>>>>>>>>>>>>>> > > > Sent: Monday, August 29, 2016 3:55:04 PM
>>>>>>>>>>>>>> > > > Subject: Re: [Gluster-users] 3.8.3 Shards Healing Glacier Slow
>>>>>>>>>>>>>> > > >
>>>>>>>>>>>>>> > > > Could you attach both client and brick logs? Meanwhile I will try
>>>>>>>>>>>>>> > > > these steps out on my machines and see if it is easily recreatable.
>>>>>>>>>>>>>> > > >
>>>>>>>>>>>>>> > > > -Krutika
>>>>>>>>>>>>>> > > >
>>>>>>>>>>>>>> > > > On Mon, Aug 29, 2016 at 2:31 PM, David Gossage <dgoss...@carouselchecks.com> wrote:
>>>>>>>>>>>>>> > > >
>>>>>>>>>>>>>> > > > Centos 7 Gluster 3.8.3
>>>>>>>>>>>>>> > > >
>>>>>>>>>>>>>> > > > Brick1: ccgl1.gl.local:/gluster1/BRICK1/1
>>>>>>>>>>>>>> > > > Brick2: ccgl2.gl.local:/gluster1/BRICK1/1
>>>>>>>>>>>>>> > > > Brick3: ccgl4.gl.local:/gluster1/BRICK1/1
>>>>>>>>>>>>>> > > > Options Reconfigured:
>>>>>>>>>>>>>> > > > cluster.data-self-heal-algorithm: full
>>>>>>>>>>>>>> > > > cluster.self-heal-daemon: on
>>>>>>>>>>>>>> > > > cluster.locking-scheme: granular
>>>>>>>>>>>>>> > > > features.shard-block-size: 64MB
>>>>>>>>>>>>>> > > > features.shard: on
>>>>>>>>>>>>>> > > > performance.readdir-ahead: on
>>>>>>>>>>>>>> > > > storage.owner-uid: 36
>>>>>>>>>>>>>> > > > storage.owner-gid: 36
>>>>>>>>>>>>>> > > > performance.quick-read: off
>>>>>>>>>>>>>> > > > performance.read-ahead: off
>>>>>>>>>>>>>> > > > performance.io-cache: off
>>>>>>>>>>>>>> > > > performance.stat-prefetch: on
>>>>>>>>>>>>>> > > > cluster.eager-lock: enable
>>>>>>>>>>>>>> > > > network.remote-dio: enable
>>>>>>>>>>>>>> > > > cluster.quorum-type: auto
>>>>>>>>>>>>>> > > > cluster.server-quorum-type: server
>>>>>>>>>>>>>> > > > server.allow-insecure: on
>>>>>>>>>>>>>> > > > cluster.self-heal-window-size: 1024
>>>>>>>>>>>>>> > > > cluster.background-self-heal-count: 16
>>>>>>>>>>>>>> > > > performance.strict-write-ordering: off
>>>>>>>>>>>>>> > > > nfs.disable: on
>>>>>>>>>>>>>> > > > nfs.addr-namelookup: off
>>>>>>>>>>>>>> > > > nfs.enable-ino32: off
>>>>>>>>>>>>>> > > > cluster.granular-entry-heal: on
>>>>>>>>>>>>>> > > >
>>>>>>>>>>>>>> > > > Friday did a rolling upgrade from 3.8.3->3.8.3 with no issues.
>>>>>>>>>>>>>> > > > Following steps detailed in previous recommendations, began the
>>>>>>>>>>>>>> > > > process of replacing and healing bricks one node at a time:
>>>>>>>>>>>>>> > > >
>>>>>>>>>>>>>> > > > 1) kill pid of brick
>>>>>>>>>>>>>> > > > 2) reconfigure brick from raid6 to raid10
>>>>>>>>>>>>>> > > > 3) recreate directory of brick
>>>>>>>>>>>>>> > > > 4) gluster volume start <> force
>>>>>>>>>>>>>> > > > 5) gluster volume heal <> full
>>>>>>>>>>>>>> > > Hi,
>>>>>>>>>>>>>> > >
>>>>>>>>>>>>>> > > I'd suggest that full heal not be used. There are a few bugs in full heal.
>>>>>>>>>>>>>> > > Better safe than sorry ;)
>>>>>>>>>>>>>> > > Instead I'd suggest the following steps:
>>>>>>>>>>>>>> > >
>>>>>>>>>>>>>> > Currently I brought the node down by systemctl stop glusterd, as I was
>>>>>>>>>>>>>> > getting sporadic IO issues and a few VMs paused, so hoping that will help.
>>>>>>>>>>>>>> > I may wait to do this till around 4 PM when most work is done, in case it
>>>>>>>>>>>>>> > shoots the load up.
>>>>>>>>>>>>>> >
>>>>>>>>>>>>>> > > 1) kill pid of brick
>>>>>>>>>>>>>> > > 2) do whatever reconfiguring of the brick you need
>>>>>>>>>>>>>> > > 3) recreate brick dir
>>>>>>>>>>>>>> > > 4) while the brick is still down, from the mount point:
>>>>>>>>>>>>>> > >    a) create a dummy non-existent dir under / of the mount
>>>>>>>>>>>>>> > >
>>>>>>>>>>>>>> > So if node 2 is the down brick, do I pick, for example, node 3 and make a
>>>>>>>>>>>>>> > test dir under its brick directory that doesn't exist on 2, or should I be
>>>>>>>>>>>>>> > doing this over a gluster mount?
>>>>>>>>>>>>>> You should be doing this over the gluster mount.
>>>>>>>>>>>>>> >
>>>>>>>>>>>>>> > >    b) set a non-existent extended attribute on / of the mount
>>>>>>>>>>>>>> > >
>>>>>>>>>>>>>> >
>>>>>>>>>>>>>> > Could you give me an example of an attribute to set? I've read a tad on
>>>>>>>>>>>>>> > this, and looked up attributes, but haven't set any yet myself.
>>>>>>>>>>>>>> >
>>>>>>>>>>>>>> Sure. setfattr -n "user.some-name" -v "some-value" <path-to-mount>
>>>>>>>>>>>>>> > > Doing these steps will ensure that heal happens only from the updated
>>>>>>>>>>>>>> > > brick to the down brick.
>>>>>>>>>>>>>> > > 5) gluster v start <> force
>>>>>>>>>>>>>> > > 6) gluster v heal <>
>>>>>>>>>>>>>> > >
>>>>>>>>>>>>>> >
>>>>>>>>>>>>>> > Will it matter if somewhere in gluster the full heal command was run the
>>>>>>>>>>>>>> > other day? Not sure if it eventually stops or times out.
>>>>>>>>>>>>>> >
>>>>>>>>>>>>>> full heal will stop once the crawl is done. So if you want to trigger heal
>>>>>>>>>>>>>> again, run gluster v heal <>. Actually, even brick up or volume start force
>>>>>>>>>>>>>> should trigger the heal.
>>>>>>>>>>>>>
>>>>>>>>>>>>> Did this on the test bed today. It's one server with 3 bricks on the same
>>>>>>>>>>>>> machine, so take that for what it's worth. Also, it still runs 3.8.2.
>>>>>>>>>>>>> Maybe I'll update and re-run the test.
>>>>>>>>>>>>>
>>>>>>>>>>>>> killed brick
>>>>>>>>>>>>> deleted brick dir
>>>>>>>>>>>>> recreated brick dir
>>>>>>>>>>>>> created fake dir on gluster mount
>>>>>>>>>>>>> set suggested fake attribute on it
>>>>>>>>>>>>> ran volume start <> force
>>>>>>>>>>>>>
>>>>>>>>>>>>> Looked at the files it said needed healing, and it was just 8 shards that
>>>>>>>>>>>>> were modified in the few minutes I ran through the steps.
>>>>>>>>>>>>>
>>>>>>>>>>>>> Gave it a few minutes and it stayed the same.
>>>>>>>>>>>>> Ran gluster volume <> heal
>>>>>>>>>>>>>
>>>>>>>>>>>>> It healed all the directories and files you can see over the mount,
>>>>>>>>>>>>> including fakedir.
>>>>>>>>>>>>>
>>>>>>>>>>>>> Same issue for shards though: it adds more shards to heal at a glacial
>>>>>>>>>>>>> pace. Slight jump in speed if I stat every file and dir in the running VM,
>>>>>>>>>>>>> but not all shards.
>>>>>>>>>>>>>
>>>>>>>>>>>>> It started with 8 shards to heal and is now only at 33 out of 800, and
>>>>>>>>>>>>> probably won't finish adding for a few days at the rate it's going.
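(Side note: once a heal driven by these steps completes, the dummy dir and xattr can presumably be cleaned up from the mount; setfattr's -x flag removes a named attribute. Paths reuse the generic ones from the steps earlier in this thread:

rmdir /mnt/fake
setfattr -x "user.some-name" /mnt

Both artifacts exist only to dirty the volume root so the down brick gets marked for entry heal, so removing them afterwards should be harmless.)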
>>>>>>>>>>>>>> > >
>>>>>>>>>>>>>> > > > 1st node worked as expected: it took 12 hours to heal 1TB of data.
>>>>>>>>>>>>>> > > > Load was a little heavy but nothing shocking.
>>>>>>>>>>>>>> > > >
>>>>>>>>>>>>>> > > > About an hour after node 1 finished, I began the same process on
>>>>>>>>>>>>>> > > > node 2. The heal process kicked in as before, and the files in
>>>>>>>>>>>>>> > > > directories visible from the mount and .glusterfs healed in a short
>>>>>>>>>>>>>> > > > time. Then it began the crawl of .shard, adding those files to the
>>>>>>>>>>>>>> > > > heal count, at which point the entire process basically ground to a
>>>>>>>>>>>>>> > > > halt. After 48 hours, out of 19k shards it has added 5900 to the heal
>>>>>>>>>>>>>> > > > list. Load on all 3 machines is negligible. It was suggested to change
>>>>>>>>>>>>>> > > > cluster.data-self-heal-algorithm to full and restart the volume, which
>>>>>>>>>>>>>> > > > I did. No effect. Tried relaunching heal: no effect, no matter which
>>>>>>>>>>>>>> > > > node was picked. I started each VM and performed a stat of all files
>>>>>>>>>>>>>> > > > from within it, or a full virus scan, and that seemed to cause short,
>>>>>>>>>>>>>> > > > small spikes in shards added, but not by much. Logs are showing no real
>>>>>>>>>>>>>> > > > messages indicating anything is going on. I get hits in the brick log
>>>>>>>>>>>>>> > > > on occasion of null lookups, making me think it's not really crawling
>>>>>>>>>>>>>> > > > the shards directory but waiting for a shard lookup to add it. I'll get
>>>>>>>>>>>>>> > > > the following in the brick log, but not constantly, and sometimes
>>>>>>>>>>>>>> > > > multiple times for the same shard:
>>>>>>>>>>>>>> > > >
>>>>>>>>>>>>>> > > > [2016-08-29 08:31:57.478125] W [MSGID: 115009] [server-resolve.c:569:server_resolve] 0-GLUSTER1-server: no resolution type for (null) (LOOKUP)
>>>>>>>>>>>>>> > > > [2016-08-29 08:31:57.478170] E [MSGID: 115050] [server-rpc-fops.c:156:server_lookup_cbk] 0-GLUSTER1-server: 12591783: LOOKUP (null) (00000000-0000-0000-0000-000000000000/241a55ed-f0d5-4dbc-a6ce-ab784a0ba6ff.221) ==> (Invalid argument) [Invalid argument]
>>>>>>>>>>>>>> > > >
>>>>>>>>>>>>>> > > > This one repeated about 30 times in a row, then nothing for 10
>>>>>>>>>>>>>> > > > minutes, then one hit for one different shard by itself.
>>>>>>>>>>>>>> > > >
>>>>>>>>>>>>>> > > > How can I determine if heal is actually running? How can I kill it or
>>>>>>>>>>>>>> > > > force a restart? Does the node I start it from determine which
>>>>>>>>>>>>>> > > > directory gets crawled to determine heals?
>>>>>>>>>>>>>> > > >
>>>>>>>>>>>>>> > > > David Gossage
>>>>>>>>>>>>>> > > > Carousel Checks Inc. | System Administrator
>>>>>>>>>>>>>> > > > Office 708.613.2284
>>>>>>>>>>>>>> > >
>>>>>>>>>>>>>> > > --
>>>>>>>>>>>>>> > > Thanks,
>>>>>>>>>>>>>> > > Anuradha.
>>>>>>>>>>>>>> >
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> --
>>>>>>>>>>>>>> Thanks,
>>>>>>>>>>>>>> Anuradha.
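(Side note on "how can I determine if heal is actually running": gluster volume heal <VOLNAME> statistics prints, per brick, the most recent crawl type with its start and end times, which is probably the most direct answer short of grepping glustershd.log:

gluster volume heal GLUSTER1 statistics

A crawl that shows a start time but no end time is still in progress.)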
_______________________________________________
Gluster-users mailing list
Gluster-users@gluster.org
http://www.gluster.org/mailman/listinfo/gluster-users