You are right, stat triggers self-heal. Thank you!
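
For the record, a rough outline of the sequence that did the trick (same volume and paths as in the thread below, run from a fresh mount on srv01):

[root@srv01 ~]# umount /mnt
[root@srv01 ~]# mount -t glusterfs srv01:/test01 /mnt
[root@srv01 ~]# stat /mnt/passwd       # explicit lookup, not just a readdir of /mnt
[root@srv01 ~]# ls -l /R1/test01/      # passwd should now be back on the local brick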

--
Dmitry Glushenok
Jet Infosystems

> On 17 Aug 2016, at 13:38, Ravishankar N <ravishan...@redhat.com> wrote:
> 
>> On 08/17/2016 03:48 PM, Dmitry Glushenok wrote:
>> Unfortunately not:
>> 
>> Remount the FS, then access the test file from the second client:
>> 
>> [root@srv02 ~]# umount /mnt
>> [root@srv02 ~]# mount -t glusterfs srv01:/test01 /mnt
>> [root@srv02 ~]# ls -l /mnt/passwd 
>> -rw-r--r--. 1 root root 1505 Aug 16 19:59 /mnt/passwd
>> [root@srv02 ~]# ls -l /R1/test01/
>> total 4
>> -rw-r--r--. 2 root root 1505 Aug 16 19:59 passwd
>> [root@srv02 ~]# 
>> 
>> Then remount the FS and check whether accessing the file from the second node 
>> triggered self-heal on the first node:
>> 
>> [root@srv01 ~]# umount /mnt
>> [root@srv01 ~]# mount -t glusterfs srv01:/test01 /mnt
>> [root@srv01 ~]# ls -l /mnt
> 
> Can you try `stat /mnt/passwd` from this node after remounting? You need to 
> explicitly look up the file. `ls -l /mnt` only triggers a readdir on the 
> parent directory.
> If that doesn't work, is this mount connected to both bricks? I.e. if you 
> create a new file from here, does it get replicated to both bricks?
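> 
> For example, something along these lines (same names as in your transcript; 
> the test file name is just an illustration):
> 
> [root@srv01 ~]# stat /mnt/passwd              # explicit lookup, should schedule the heal
> [root@srv01 ~]# touch /mnt/repl-check         # hypothetical file to verify connectivity
> [root@srv01 ~]# ls -l /R1/test01/repl-check   # present on the srv01 brick?
> [root@srv02 ~]# ls -l /R1/test01/repl-check   # and on the srv02 brick?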
> 
> -Ravi
> 
>> total 0
>> [root@srv01 ~]# ls -l /R1/test01/
>> total 0
>> [root@srv01 ~]#
>> 
>> Nothing appeared.
>> 
>> [root@srv01 ~]# gluster volume info test01
>>  
>> Volume Name: test01
>> Type: Replicate
>> Volume ID: 2c227085-0b06-4804-805c-ea9c1bb11d8b
>> Status: Started
>> Number of Bricks: 1 x 2 = 2
>> Transport-type: tcp
>> Bricks:
>> Brick1: srv01:/R1/test01
>> Brick2: srv02:/R1/test01
>> Options Reconfigured:
>> features.scrub-freq: hourly
>> features.scrub: Active
>> features.bitrot: on
>> transport.address-family: inet
>> performance.readdir-ahead: on
>> nfs.disable: on
>> [root@srv01 ~]# 
>> 
>> [root@srv01 ~]# gluster volume get test01 all | grep heal
>> cluster.background-self-heal-count      8
>> cluster.metadata-self-heal              on
>> cluster.data-self-heal                  on
>> cluster.entry-self-heal                 on
>> cluster.self-heal-daemon                on
>> cluster.heal-timeout                    600
>> cluster.self-heal-window-size           1
>> cluster.data-self-heal-algorithm        (null)
>> cluster.self-heal-readdir-size          1KB
>> cluster.heal-wait-queue-length          128
>> features.lock-heal                      off
>> features.lock-heal                      off
>> storage.health-check-interval           30
>> features.ctr_lookupheal_link_timeout    300
>> features.ctr_lookupheal_inode_timeout   300
>> cluster.disperse-self-heal-daemon       enable
>> disperse.background-heals               8
>> disperse.heal-wait-qlength              128
>> cluster.heal-timeout                    600
>> cluster.granular-entry-heal             no
>> [root@srv01 ~]#
>> 
>> --
>> Dmitry Glushenok
>> Jet Infosystems
>> 
>>> On 17 Aug 2016, at 11:30, Ravishankar N <ravishan...@redhat.com> wrote:
>>> 
>>> On 08/17/2016 01:48 PM, Dmitry Glushenok wrote:
>>>> Hello Ravi,
>>>> 
>>>> Thank you for the reply. Found the bug number (for those who will google this 
>>>> email): https://bugzilla.redhat.com/show_bug.cgi?id=1112158
>>>> 
>>>> Accessing the removed file from the mount-point does not always work, because 
>>>> we have to find a particular client whose DHT will send the lookup to the 
>>>> brick with the removed file. Otherwise the file is served from the good brick 
>>>> and self-healing does not happen (just verified). Or by accessing did you mean 
>>>> something like touch?
>>> 
>>> Sorry, I should have been more explicit. I meant triggering a lookup on that 
>>> file with `stat filename`. I don't think you need a special client. DHT 
>>> sends the lookup to AFR, which in turn sends it to all of its children. When 
>>> one of them returns ENOENT (because you removed the file from the brick), AFR 
>>> will automatically trigger a heal. I'm guessing it does not always work in 
>>> your case because of caching at various levels, so the lookup never reaches 
>>> AFR. If you do it from a fresh mount, it should always work.
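>>> 
>>> For example, roughly (reusing the volume and paths from this thread):
>>> 
>>> [root@srv01 ~]# umount /mnt
>>> [root@srv01 ~]# mount -t glusterfs srv01:/test01 /mnt
>>> [root@srv01 ~]# stat /mnt/passwd   # fresh lookup reaches AFR; the missing copy should get healed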
>>> -Ravi
>>> 
>>>> Dmitry Glushenok
>>>> Jet Infosystems
>>>> 
>>>>> On 17 Aug 2016, at 4:24, Ravishankar N <ravishan...@redhat.com> wrote:
>>>>> 
>>>>> On 08/16/2016 10:44 PM, Dmitry Glushenok wrote:
>>>>>> Hello,
>>>>>> 
>>>>>> While testing healing after a bitrot error, I found that self-heal cannot 
>>>>>> heal files which were manually deleted from a brick. Gluster 3.8.1:
>>>>>> 
>>>>>> - Create a volume, mount it locally and copy a test file to it
>>>>>> [root@srv01 ~]# gluster volume create test01 replica 2  srv01:/R1/test01 
>>>>>> srv02:/R1/test01
>>>>>> volume create: test01: success: please start the volume to access data
>>>>>> [root@srv01 ~]# gluster volume start test01
>>>>>> volume start: test01: success
>>>>>> [root@srv01 ~]# mount -t glusterfs srv01:/test01 /mnt
>>>>>> [root@srv01 ~]# cp /etc/passwd /mnt
>>>>>> [root@srv01 ~]# ls -l /mnt
>>>>>> total 2
>>>>>> -rw-r--r--. 1 root root 1505 Aug 16 19:59 passwd
>>>>>> 
>>>>>> - Then remove the test file from the first brick, as we have to do in case 
>>>>>> of a bitrot error in the file
>>>>> 
>>>>> You also need to remove all hard-links to the corrupted file from the 
>>>>> brick, including the one in the .glusterfs folder.
>>>>> There is a bug in heal-full that prevents it from crawling all bricks of 
>>>>> the replica. The right way to heal the corrupted files as of now is to 
>>>>> access them from the mount-point like you did after removing the 
>>>>> hard-links. The list of files that are corrupted can be obtained with the 
>>>>> scrub status command.
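>>>>> 
>>>>> For example, a rough sketch of the whole procedure (the gfid value below is 
>>>>> purely illustrative; the .glusterfs hard-link sits under a path built from 
>>>>> the first two and next two hex digits of that gfid):
>>>>> 
>>>>> [root@srv01 ~]# gluster volume bitrot test01 scrub status    # should list the corrupted files
>>>>> [root@srv01 ~]# getfattr -n trusted.gfid -e hex /R1/test01/passwd
>>>>> trusted.gfid=0x8f54bc1d2a4e4c6fa1b2c3d4e5f60718              # illustrative value
>>>>> [root@srv01 ~]# rm /R1/test01/passwd
>>>>> [root@srv01 ~]# rm /R1/test01/.glusterfs/8f/54/8f54bc1d-2a4e-4c6f-a1b2-c3d4e5f60718
>>>>> [root@srv01 ~]# stat /mnt/passwd                             # lookup from the mount triggers the heal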
>>>>> 
>>>>> Hope this helps,
>>>>> Ravi
>>>>> 
>>>>>> [root@srv01 ~]# rm /R1/test01/passwd
>>>>>> [root@srv01 ~]# ls -l /mnt
>>>>>> total 0
>>>>>> [root@srv01 ~]#
>>>>>> 
>>>>>> - Issue a full self-heal
>>>>>> [root@srv01 ~]# gluster volume heal test01 full
>>>>>> Launching heal operation to perform full self heal on volume test01 has 
>>>>>> been successful
>>>>>> Use heal info commands to check status
>>>>>> [root@srv01 ~]# tail -2 /var/log/glusterfs/glustershd.log
>>>>>> [2016-08-16 16:59:56.483767] I [MSGID: 108026] 
>>>>>> [afr-self-heald.c:611:afr_shd_full_healer] 0-test01-replicate-0: 
>>>>>> starting full sweep on subvol test01-client-0
>>>>>> [2016-08-16 16:59:56.486560] I [MSGID: 108026] 
>>>>>> [afr-self-heald.c:621:afr_shd_full_healer] 0-test01-replicate-0: 
>>>>>> finished full sweep on subvol test01-client-0
>>>>>> 
>>>>>> - We still see no files at the mount point (it became empty right after 
>>>>>> removing the file from the brick)
>>>>>> [root@srv01 ~]# ls -l /mnt
>>>>>> total 0
>>>>>> [root@srv01 ~]#
>>>>>> 
>>>>>> - Then try to access the file by its full name (lookup-optimize and 
>>>>>> readdir-optimize are turned off by default). Now glusterfs shows the 
>>>>>> file!
>>>>>> [root@srv01 ~]# ls -l /mnt/passwd
>>>>>> -rw-r--r--. 1 root root 1505 Aug 16 19:59 /mnt/passwd
>>>>>> 
>>>>>> - And it reappeared on the brick
>>>>>> [root@srv01 ~]# ls -l /R1/test01/
>>>>>> total 4
>>>>>> -rw-r--r--. 2 root root 1505 Aug 16 19:59 passwd
>>>>>> [root@srv01 ~]#
>>>>>> 
>>>>>> Is it a bug, or can we tell self-heal to scan all files on all bricks in 
>>>>>> the volume?
>>>>>> 
>>>>>> --
>>>>>> Dmitry Glushenok
>>>>>> Jet Infosystems
>>>>>> 
>>> 
>> 
> 

_______________________________________________
Gluster-users mailing list
Gluster-users@gluster.org
http://www.gluster.org/mailman/listinfo/gluster-users
