Re: [Gluster-users] Self healing does not see files to heal

Ravishankar N Wed, 17 Aug 2016 03:39:13 -0700

On 08/17/2016 03:48 PM, Дмитрий Глушенок wrote:

Unfortunately not:


Remount FS, then access test file from second client:

[root@srv02 ~]# umount /mnt
[root@srv02 ~]# mount -t glusterfs srv01:/test01 /mnt
[root@srv02 ~]# ls -l /mnt/passwd
-rw-r--r--. 1 root root 1505 Aug 16 19:59 /mnt/passwd
[root@srv02 ~]# ls -l /R1/test01/
total 4
-rw-r--r--. 2 root root 1505 Aug 16 19:59 passwd
[root@srv02 ~]#

Then remount the FS and check whether accessing the file from the second node triggered self-heal on the first node:


[root@srv01 ~]# umount /mnt
[root@srv01 ~]# mount -t glusterfs srv01:/test01 /mnt
[root@srv01 ~]# ls -l /mnt

Can you try `stat /mnt/passwd` from this node after remounting? You need to explicitly look up the file. `ls -l /mnt` only triggers readdir on the parent directory. If that doesn't work, is this mount connected to both clients? i.e. if you create a new file from here, is it getting replicated to both bricks?


-Ravi

total 0
[root@srv01 ~]# ls -l /R1/test01/
total 0
[root@srv01 ~]#

Nothing appeared.

[root@srv01 ~]# gluster volume info test01
Volume Name: test01
Type: Replicate
Volume ID: 2c227085-0b06-4804-805c-ea9c1bb11d8b
Status: Started
Number of Bricks: 1 x 2 = 2
Transport-type: tcp
Bricks:
Brick1: srv01:/R1/test01
Brick2: srv02:/R1/test01
Options Reconfigured:
features.scrub-freq: hourly
features.scrub: Active
features.bitrot: on
transport.address-family: inet
performance.readdir-ahead: on
nfs.disable: on
[root@srv01 ~]#

[root@srv01 ~]# gluster volume get test01 all | grep heal
cluster.background-self-heal-count      8
cluster.metadata-self-heal              on
cluster.data-self-heal                  on
cluster.entry-self-heal                 on
cluster.self-heal-daemon                on
cluster.heal-timeout                    600
cluster.self-heal-window-size           1
cluster.data-self-heal-algorithm        (null)
cluster.self-heal-readdir-size          1KB
cluster.heal-wait-queue-length          128
features.lock-heal                      off
features.lock-heal                      off
storage.health-check-interval           30
features.ctr_lookupheal_link_timeout    300
features.ctr_lookupheal_inode_timeout   300
cluster.disperse-self-heal-daemon       enable
disperse.background-heals               8
disperse.heal-wait-qlength              128
cluster.heal-timeout                    600
cluster.granular-entry-heal             no
[root@srv01 ~]#

--
Dmitry Glushenok
Jet Infosystems
On 17 Aug 2016, at 11:30, Ravishankar N <ravishan...@redhat.com> wrote:
On 08/17/2016 01:48 PM, Дмитрий Глушенок wrote:
Hello Ravi,
Thank you for the reply. Found the bug number (for those who will google the email): https://bugzilla.redhat.com/show_bug.cgi?id=1112158
Accessing the removed file from the mount point does not always work, because we have to find a special client for which DHT will point to the brick with the removed file. Otherwise the file will be accessed from the good brick and self-healing will not happen (just verified). Or by accessing did you mean something like touch?
Sorry, I should have been more explicit. I meant triggering a lookup on that file with `stat filename`. I don't think you need a special client. DHT sends the lookup to AFR, which in turn sends it to all its children. When one of them returns ENOENT (because you removed the file from the brick), AFR will automatically trigger a heal. I'm guessing it is not always working in your case due to caching at various levels, so the lookup never reaches AFR. If you do it from a fresh mount, it should always work.
-Ravi
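For readers following along, the fresh-mount lookup described above could be scripted roughly as below. This is only a sketch: `run` and `lookup_on_fresh_mount` are hypothetical helper names, the default mount point `/mnt` is taken from the commands in this thread, and the `DRY_RUN` switch is an assumption added so the sequence can be inspected without touching a real volume.

```shell
# Sketch only: run and lookup_on_fresh_mount are hypothetical helpers,
# not part of gluster. With DRY_RUN=1 commands are printed, not executed.
run() {
    if [ "${DRY_RUN:-0}" = 1 ]; then
        echo "$*"
    else
        "$@"
    fi
}

lookup_on_fresh_mount() {
    server=$1
    volume=$2
    relpath=$3
    mnt=${4:-/mnt}
    # A fresh mount avoids lookups being answered from client-side caches.
    run mount -t glusterfs "$server:/$volume" "$mnt" || return 1
    # stat forces a named lookup; listing the parent directory does not.
    run stat "$mnt/$relpath"
    rc=$?
    run umount "$mnt"
    return "$rc"
}
```

For example, `DRY_RUN=1; lookup_on_fresh_mount srv01 test01 passwd` prints the three commands that would be run against the test volume from this thread.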
Dmitry Glushenok
Jet Infosystems
On 17 Aug 2016, at 4:24, Ravishankar N <ravishan...@redhat.com> wrote:
On 08/16/2016 10:44 PM, Дмитрий Глушенок wrote:
Hello,
While testing healing after a bitrot error, we found that self-healing cannot heal files which were manually deleted from a brick. Gluster 3.8.1:
- Create volume, mount it locally and copy test file to it
[root@srv01 ~]# gluster volume create test01 replica 2 srv01:/R1/test01 srv02:/R1/test01
volume create: test01: success: please start the volume to access data
[root@srv01 ~]# gluster volume start test01
volume start: test01: success
[root@srv01 ~]# mount -t glusterfs srv01:/test01 /mnt
[root@srv01 ~]# cp /etc/passwd /mnt
[root@srv01 ~]# ls -l /mnt
total 2
-rw-r--r--. 1 root root 1505 Aug 16 19:59 passwd
- Then remove the test file from the first brick, as we would have to do in case of a bitrot error in the file
You also need to remove all hard links to the corrupted file from the brick, including the one in the .glusterfs folder. There is a bug in heal-full that prevents it from crawling all bricks of the replica. The right way to heal the corrupted files as of now is to access them from the mount point, like you did, after removing the hard links. The list of files that are corrupted can be obtained with the scrub status command.
Hope this helps,
Ravi
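As an illustration of where the .glusterfs hard link lives: each file on a brick carries a `trusted.gfid` extended attribute, and the hard link sits under `.glusterfs/<first two hex digits>/<next two>/<gfid as UUID>`. The sketch below only does that path conversion. `gfid_backend_path` is a hypothetical helper name, and the example gfid in the usage note is made up.

```shell
# Sketch only: gfid_backend_path is a hypothetical helper, not a gluster command.
# It maps a hex-encoded trusted.gfid value (as printed by `getfattr -e hex`)
# to the brick-relative path of the matching .glusterfs hard link.
gfid_backend_path() {
    # Strip an optional "0x" prefix and lowercase the digits.
    hex=$(printf '%s' "${1#0x}" | tr 'A-F' 'a-f')
    case $hex in
        ''|*[!0-9a-f]*) echo "not a hex gfid: $1" >&2; return 1 ;;
    esac
    if [ "${#hex}" -ne 32 ]; then
        echo "expected 32 hex digits, got ${#hex}" >&2
        return 1
    fi
    # Canonical UUID form groups the digits 8-4-4-4-12.
    uuid=$(printf '%s\n' "$hex" |
        sed 's/^\(.\{8\}\)\(.\{4\}\)\(.\{4\}\)\(.\{4\}\)\(.\{12\}\)$/\1-\2-\3-\4-\5/')
    d1=$(printf '%s\n' "$hex" | cut -c1-2)
    d2=$(printf '%s\n' "$hex" | cut -c3-4)
    printf '.glusterfs/%s/%s/%s\n' "$d1" "$d2" "$uuid"
}

# Possible use on the brick host, as root (not executed here):
#   getfattr -n trusted.gfid -e hex /R1/test01/passwd
#   gfid_backend_path 0x<value printed above>
```

With the made-up value `0x0123456789abcdef0123456789abcdef`, the helper prints `.glusterfs/01/23/01234567-89ab-cdef-0123-456789abcdef`. The list of corrupted files mentioned above should be available from `gluster volume bitrot test01 scrub status`.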
[root@srv01 ~]# rm /R1/test01/passwd
[root@srv01 ~]# ls -l /mnt
total 0
[root@srv01 ~]#

- Issue full self heal
[root@srv01 ~]# gluster volume heal test01 full
Launching heal operation to perform full self heal on volume test01 has been successful
Use heal info commands to check status
[root@srv01 ~]# tail -2 /var/log/glusterfs/glustershd.log
[2016-08-16 16:59:56.483767] I [MSGID: 108026] [afr-self-heald.c:611:afr_shd_full_healer] 0-test01-replicate-0: starting full sweep on subvol test01-client-0
[2016-08-16 16:59:56.486560] I [MSGID: 108026] [afr-self-heald.c:621:afr_shd_full_healer] 0-test01-replicate-0: finished full sweep on subvol test01-client-0
- Now we still see no files in the mount point (it becomes empty right after removing the file from the brick)
[root@srv01 ~]# ls -l /mnt
total 0
[root@srv01 ~]#
- Then try to access the file by its full name (lookup-optimize and readdir-optimize are turned off by default). Now glusterfs shows the file!
[root@srv01 ~]# ls -l /mnt/passwd
-rw-r--r--. 1 root root 1505 Aug 16 19:59 /mnt/passwd

- And it reappeared in the brick
[root@srv01 ~]# ls -l /R1/test01/
total 4
-rw-r--r--. 2 root root 1505 Aug 16 19:59 passwd
[root@srv01 ~]#
Is this a bug, or can we tell self-heal to scan all files on all bricks in the volume?
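Until heal-full crawls every brick, one possible workaround, based only on the behaviour shown in this thread (a lookup by name through the mount restores the file), is to list the files on the intact brick and look each one up through the mount point. The sketch below assumes the brick being walked holds the good copies. `lookup_all_from_brick` is a hypothetical helper name, not a gluster command.

```shell
# Sketch only: lookup_all_from_brick is a hypothetical helper.
# Arguments: path of the brick with the good copies, path of the client mount.
lookup_all_from_brick() {
    brick=$1
    mnt=$2
    # Walk the brick, skipping gluster's internal .glusterfs tree.
    (cd "$brick" && find . -path ./.glusterfs -prune -o -type f -print) |
    while IFS= read -r rel; do
        rel=${rel#./}
        # A named lookup through the mount is what reaches AFR; readdir alone does not.
        if stat "$mnt/$rel" >/dev/null 2>&1; then
            echo "looked up: $rel"
        else
            echo "lookup failed: $rel" >&2
        fi
    done
}
```

With the layout from this thread it might be invoked on srv02 as `lookup_all_from_brick /R1/test01 /mnt`, preferably right after a fresh mount so the lookups are not served from caches.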
--
Dmitry Glushenok
Jet Infosystems

_______________________________________________
Gluster-users mailing list
Gluster-users@gluster.org
http://www.gluster.org/mailman/listinfo/gluster-users
