You are right, stat triggers self-heal. Thank you!

--
Dmitry Glushenok
Jet Infosystems
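For the archives, the sequence that did the trick was roughly the following (a minimal sketch using the same test volume, brick and mount point as in the thread below, run on the node whose brick lost the file):

[root@srv01 ~]# umount /mnt
[root@srv01 ~]# mount -t glusterfs srv01:/test01 /mnt
[root@srv01 ~]# stat /mnt/passwd      # explicit lookup on the file; ls -l /mnt alone only readdirs the parent
[root@srv01 ~]# ls -l /R1/test01/     # the file is recreated on the brick by the triggered heal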
> On 17 Aug 2016, at 13:38, Ravishankar N <ravishan...@redhat.com> wrote:
>
> On 08/17/2016 03:48 PM, Дмитрий Глушенок wrote:
>> Unfortunately not:
>>
>> Remount the FS, then access the test file from the second client:
>>
>> [root@srv02 ~]# umount /mnt
>> [root@srv02 ~]# mount -t glusterfs srv01:/test01 /mnt
>> [root@srv02 ~]# ls -l /mnt/passwd
>> -rw-r--r--. 1 root root 1505 Aug 16 19:59 /mnt/passwd
>> [root@srv02 ~]# ls -l /R1/test01/
>> total 4
>> -rw-r--r--. 2 root root 1505 Aug 16 19:59 passwd
>> [root@srv02 ~]#
>>
>> Then remount the FS and check whether accessing the file from the second node triggered self-heal on the first node:
>>
>> [root@srv01 ~]# umount /mnt
>> [root@srv01 ~]# mount -t glusterfs srv01:/test01 /mnt
>> [root@srv01 ~]# ls -l /mnt
>
> Can you try `stat /mnt/passwd` from this node after remounting? You need to explicitly look up the file; `ls -l /mnt` only triggers a readdir on the parent directory.
> If that doesn't work, is this mount connected to both bricks? I.e. if you create a new file from here, is it getting replicated to both bricks?
>
> -Ravi
>
>> total 0
>> [root@srv01 ~]# ls -l /R1/test01/
>> total 0
>> [root@srv01 ~]#
>>
>> Nothing appeared.
>>
>> [root@srv01 ~]# gluster volume info test01
>>
>> Volume Name: test01
>> Type: Replicate
>> Volume ID: 2c227085-0b06-4804-805c-ea9c1bb11d8b
>> Status: Started
>> Number of Bricks: 1 x 2 = 2
>> Transport-type: tcp
>> Bricks:
>> Brick1: srv01:/R1/test01
>> Brick2: srv02:/R1/test01
>> Options Reconfigured:
>> features.scrub-freq: hourly
>> features.scrub: Active
>> features.bitrot: on
>> transport.address-family: inet
>> performance.readdir-ahead: on
>> nfs.disable: on
>> [root@srv01 ~]#
>>
>> [root@srv01 ~]# gluster volume get test01 all | grep heal
>> cluster.background-self-heal-count       8
>> cluster.metadata-self-heal               on
>> cluster.data-self-heal                   on
>> cluster.entry-self-heal                  on
>> cluster.self-heal-daemon                 on
>> cluster.heal-timeout                     600
>> cluster.self-heal-window-size            1
>> cluster.data-self-heal-algorithm         (null)
>> cluster.self-heal-readdir-size           1KB
>> cluster.heal-wait-queue-length           128
>> features.lock-heal                       off
>> features.lock-heal                       off
>> storage.health-check-interval            30
>> features.ctr_lookupheal_link_timeout     300
>> features.ctr_lookupheal_inode_timeout    300
>> cluster.disperse-self-heal-daemon        enable
>> disperse.background-heals                8
>> disperse.heal-wait-qlength               128
>> cluster.heal-timeout                     600
>> cluster.granular-entry-heal              no
>> [root@srv01 ~]#
>>
>> --
>> Dmitry Glushenok
>> Jet Infosystems
>>
>>> On 17 Aug 2016, at 11:30, Ravishankar N <ravishan...@redhat.com> wrote:
>>>
>>> On 08/17/2016 01:48 PM, Дмитрий Глушенок wrote:
>>>> Hello Ravi,
>>>>
>>>> Thank you for the reply. Found the bug number (for those who will google this email): https://bugzilla.redhat.com/show_bug.cgi?id=1112158
>>>>
>>>> Accessing the removed file from the mount-point does not always work, because we have to find a special client for which DHT will point to the brick with the removed file. Otherwise the file will be accessed from the good brick and self-healing will not happen (just verified). Or by accessing did you mean something like touch?
>>>
>>> Sorry, I should have been more explicit. I meant triggering a lookup on that file with `stat filename`. I don't think you need a special client. DHT sends the lookup to AFR, which in turn sends it to all its children.
>>> When one of them returns ENOENT (because you removed it from the brick), AFR will automatically trigger heal. I'm guessing it is not always working in your case due to caching at various levels and the lookup not reaching AFR. If you do it from a fresh mount, it should always work.
>>> -Ravi
>>>
>>>> Dmitry Glushenok
>>>> Jet Infosystems
>>>>
>>>>> On 17 Aug 2016, at 4:24, Ravishankar N <ravishan...@redhat.com> wrote:
>>>>>
>>>>> On 08/16/2016 10:44 PM, Дмитрий Глушенок wrote:
>>>>>> Hello,
>>>>>>
>>>>>> While testing healing after a bitrot error, it was found that self-healing cannot heal files which were manually deleted from a brick. Gluster 3.8.1:
>>>>>>
>>>>>> - Create the volume, mount it locally and copy a test file to it
>>>>>> [root@srv01 ~]# gluster volume create test01 replica 2 srv01:/R1/test01 srv02:/R1/test01
>>>>>> volume create: test01: success: please start the volume to access data
>>>>>> [root@srv01 ~]# gluster volume start test01
>>>>>> volume start: test01: success
>>>>>> [root@srv01 ~]# mount -t glusterfs srv01:/test01 /mnt
>>>>>> [root@srv01 ~]# cp /etc/passwd /mnt
>>>>>> [root@srv01 ~]# ls -l /mnt
>>>>>> total 2
>>>>>> -rw-r--r--. 1 root root 1505 Aug 16 19:59 passwd
>>>>>>
>>>>>> - Then remove the test file from the first brick, like we have to do in case of a bitrot error in the file
>>>>>
>>>>> You also need to remove all hard-links to the corrupted file from the brick, including the one in the .glusterfs folder.
>>>>> There is a bug in heal-full that prevents it from crawling all bricks of the replica. The right way to heal the corrupted files as of now is to access them from the mount-point, like you did, after removing the hard-links. The list of files that are corrupted can be obtained with the scrub status command.
>>>>>
>>>>> Hope this helps,
>>>>> Ravi
>>>>>
>>>>>> [root@srv01 ~]# rm /R1/test01/passwd
>>>>>> [root@srv01 ~]# ls -l /mnt
>>>>>> total 0
>>>>>> [root@srv01 ~]#
>>>>>>
>>>>>> - Issue a full self-heal
>>>>>> [root@srv01 ~]# gluster volume heal test01 full
>>>>>> Launching heal operation to perform full self heal on volume test01 has been successful
>>>>>> Use heal info commands to check status
>>>>>> [root@srv01 ~]# tail -2 /var/log/glusterfs/glustershd.log
>>>>>> [2016-08-16 16:59:56.483767] I [MSGID: 108026] [afr-self-heald.c:611:afr_shd_full_healer] 0-test01-replicate-0: starting full sweep on subvol test01-client-0
>>>>>> [2016-08-16 16:59:56.486560] I [MSGID: 108026] [afr-self-heald.c:621:afr_shd_full_healer] 0-test01-replicate-0: finished full sweep on subvol test01-client-0
>>>>>>
>>>>>> - We still see no files in the mount point (it became empty right after removing the file from the brick)
>>>>>> [root@srv01 ~]# ls -l /mnt
>>>>>> total 0
>>>>>> [root@srv01 ~]#
>>>>>>
>>>>>> - Then try to access the file by its full name (lookup-optimize and readdir-optimize are turned off by default). Now glusterfs shows the file!
>>>>>> [root@srv01 ~]# ls -l /mnt/passwd
>>>>>> -rw-r--r--. 1 root root 1505 Aug 16 19:59 /mnt/passwd
>>>>>>
>>>>>> - And it reappeared in the brick
>>>>>> [root@srv01 ~]# ls -l /R1/test01/
>>>>>> total 4
>>>>>> -rw-r--r--. 2 root root 1505 Aug 16 19:59 passwd
>>>>>> [root@srv01 ~]#
>>>>>>
>>>>>> Is it a bug, or can we tell self-heal to scan all files on all bricks in the volume?
>>>>>> >>>>>> -- >>>>>> Dmitry Glushenok >>>>>> Jet Infosystems >>>>>> >>>>>> _______________________________________________ >>>>>> Gluster-users mailing list >>>>>> Gluster-users@gluster.org <mailto:Gluster-users@gluster.org> >>>>>> http://www.gluster.org/mailman/listinfo/gluster-users >>>>>> <http://www.gluster.org/mailman/listinfo/gluster-users> >>> >> >
_______________________________________________
Gluster-users mailing list
Gluster-users@gluster.org
http://www.gluster.org/mailman/listinfo/gluster-users