Dear Aravinda,

Thank you for the analysis and for submitting a patch for this issue. I hope it 
makes it into the next GlusterFS release, 3.7.7. 


As suggested, I ran the find_gfid_issues.py script on the brick of both master 
nodes and on the slave nodes, but the only output it gives me is the following:


NO GFID(DIR)     : /data/myvolume-geo/brick/test
NO GFID(DIR)     : /data/myvolume-geo/brick/data
NO GFID(DIR)     : /data/myvolume-geo/brick/data/files_encryption
NO GFID(DIR)     : /data/myvolume-geo/brick/data/username


As you can see, it lists no files at all, so I am still left with 394 files of 
0 kBytes on my geo-rep slave node. Do you have any suggestion on how to clean up 
this mess?
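
In case it helps, here is roughly how such leftovers can be listed on the slave 
brick (just a sketch; it assumes the stale files all still carry the .part 
suffix, which may not be true for every one of the 394):

# run on the slave node; brick path as used throughout this thread
find /data/myvolume-geo/brick -path '*/.glusterfs' -prune -o \
     -type f -name '*.part' -size 0 -print | wc -l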


Best regards
ML



On Tuesday, February 2, 2016 7:59 AM, Aravinda <avish...@redhat.com> wrote:
Hi ML,

We analyzed the issue. It looks like the Changelog was replayed, possibly 
because of a Geo-rep worker crash, an Active/Passive switch, or both Geo-rep 
workers becoming active.

From the changelogs,

CREATE  logo-login-04.svg.part
RENAME logo-login-04.svg.part logo-login-04.svg

When it is replayed,
CREATE  logo-login-04.svg.part
RENAME logo-login-04.svg.part logo-login-04.svg
CREATE  logo-login-04.svg.part
RENAME logo-login-04.svg.part logo-login-04.svg

During the replay, the backend GFID link is broken and Geo-rep fails to clean it 
up. Milind is working on patches to fix this; they are in review and expected to 
be available in the 3.7.8 release.

http://review.gluster.org/#/c/13316/
http://review.gluster.org/#/c/13189/
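
Until then, a suspicious file can be checked manually on the slave Brick. On a 
posix Brick every regular file carries a trusted.gfid xattr and should have a 
hard link under .glusterfs/<first 2 hex>/<next 2 hex>/<full gfid>; if that entry 
is missing, the backend GFID link is broken. For example (the file path below is 
only an illustration, use one of your stale files):

# run directly on the Brick, not on a mount
getfattr -n trusted.gfid -e hex /data/myvolume-geo/brick/path/to/stale-file.part
# then, using the gfid printed above (say aabbccdd-...), check the hard link:
ls -li /data/myvolume-geo/brick/.glusterfs/aa/bb/aabbccdd-...
# a healthy regular file shows a link count of at least 2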

The following script can be used to find problematic files in each Brick backend.
https://gist.github.com/aravindavk/29f673f13c2f8963447e

regards
Aravinda

On 02/01/2016 08:45 PM, ML mail wrote:
> Sure, I will just send it to you through an encrypted cloud storage app and 
> send you the password via private mail.
>
> Regards
> ML
>
>
>
> On Monday, February 1, 2016 3:14 PM, Saravanakumar Arumugam 
> <sarum...@redhat.com> wrote:
>
>
> On 02/01/2016 07:22 PM, ML mail wrote:
>> I just found out I needed to run getfattr on a mount and not on the 
>> glusterfs server directly. So here is the additional output you asked for:
>>
>>
>> # getfattr -n glusterfs.gfid.string  -m .  logo-login-09.svg
>> # file: logo-login-09.svg
>> glusterfs.gfid.string="1c648409-e98b-4544-a7fa-c2aef87f92ad"
>>
>> # grep 1c648409-e98b-4544-a7fa-c2aef87f92ad 
>> /data/myvolume/brick/.glusterfs/changelogs -rn
>> Binary file /data/myvolume/brick/.glusterfs/changelogs/CHANGELOG.1454278219 
>> matches
> Great! Can you share the CHANGELOG? (It contains various fops
> carried out on this gfid.)
>
>> Regards
>> ML
>>
>>
>>
>> On Monday, February 1, 2016 1:30 PM, Saravanakumar Arumugam 
>> <sarum...@redhat.com> wrote:
>> Hi,
>>
>> On 02/01/2016 02:14 PM, ML mail wrote:
>>> Hello,
>>>
>>> I just set up distributed geo-replication to a slave for my 2-node 
>>> replicated volume and noticed quite a few error messages (around 70 of 
>>> them) in the slave's brick log file:
>>>
>>> The exact log file is: /var/log/glusterfs/bricks/data-myvolume-geo-brick.log
>>>
>>> [2016-01-31 22:19:29.524370] E [MSGID: 113020] [posix.c:1221:posix_mknod] 
>>> 0-myvolume-geo-posix: setting gfid on 
>>> /data/myvolume-geo/brick/data/username/files/shared/logo-login-09.svg.ocTransferId1789604916.part
>>>  failed
>>> [2016-01-31 22:19:29.535478] W [MSGID: 113026] [posix.c:1338:posix_mkdir] 
>>> 0-myvolume-geo-posix: mkdir 
>>> (/data/username/files_encryption/keys/files/shared/logo-login-09.svg.ocTransferId1789604916.part):
>>>  gfid (15bbcec6-a332-4c21-81e4-c52472b1e13d) is already associated with 
>>> directory 
>>> (/data/myvolume-geo/brick/.glusterfs/49/5d/495d6868-4844-4632-8ff9-ad9646a878fe/logo-login-09.svg).
>>>  Hence, both directories will share same gfid and this can lead to 
>>> inconsistencies.
>> Can you grep for this gfid (of the corresponding files) in the changelogs
>> and share those files?
>>
>> {
>> For example:
>>
>> 1. Get gfid of the files like this:
>>
>> # getfattr -n glusterfs.gfid.string  -m .  /mnt/slave/file456
>> getfattr: Removing leading '/' from absolute path names
>> # file: mnt/slave/file456
>> glusterfs.gfid.string="05b22446-de9e-42df-a63e-399c24d690c4"
>>
>> 2. grep for the corresponding gfid in the brick backend as shown below:
>>
>> [root@gfvm3 changelogs]# grep 05b22446-de9e-42df-a63e-399c24d690c4
>> /opt/volume_test/tv_2/b1/.glusterfs/changelogs/ -rn
>> Binary file
>> /opt/volume_test/tv_2/b1/.glusterfs/changelogs/CHANGELOG.1454135265 matches
>> Binary file
>> /opt/volume_test/tv_2/b1/.glusterfs/changelogs/CHANGELOG.1454135476 matches
>>
>> }
>> This will help in understanding what operations were carried out on the
>> master volume that led to this inconsistency.
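>>
>> If there are many files to check, the two steps can be combined in a small
>> script; this is only an untested sketch (the script name and argument
>> handling are made up), but it shows the idea:
>>
>> #!/bin/bash
>> # grep_changelogs.sh (hypothetical helper, untested)
>> # usage: ./grep_changelogs.sh <file-on-gluster-mount> <brick-path>
>> MOUNT_FILE="$1"
>> BRICK="$2"
>> # step 1: read the gfid of the file through the mount point
>> # (--only-values strips the header; drop it if your getfattr lacks the option)
>> GFID=$(getfattr -n glusterfs.gfid.string -m . --only-values "$MOUNT_FILE")
>> echo "gfid of $MOUNT_FILE is $GFID"
>> # step 2: list the changelogs on the brick backend that record this gfid
>> grep -rl "$GFID" "$BRICK/.glusterfs/changelogs/"
>>
>> For example: ./grep_changelogs.sh /mnt/slave/file456 /opt/volume_test/tv_2/b1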
>>
>> Also, get the following:
>> gluster version
>> gluster volume info
>> gluster volume geo-replication status
>>
>>> This doesn't look good at all: the file mentioned in the error message 
>>> (logo-login-09.svg.ocTransferId1789604916.part) is left there with 0 kBytes 
>>> and never gets deleted or cleaned up by glusterfs, leaving my geo-rep slave 
>>> node in an inconsistent state that does not reflect reality on the master 
>>> nodes. The master nodes no longer have that file (which is correct). Below is 
>>> an "ls" of the file in question, with the correct file on top.
>>>
>>>
>>> -rw-r--r-- 2 www-data www-data   24312 Jan  6  2014 logo-login-09.svg
>>> -rw-r--r-- 1 root     root           0 Jan 31 23:19 
>>> logo-login-09.svg.ocTransferId1789604916.part
>> Rename issues in geo-replication have been fixed recently. This looks
>> similar to one of those.
>>
>> Thanks,
>> Saravana

_______________________________________________
Gluster-users mailing list
Gluster-users@gluster.org
http://www.gluster.org/mailman/listinfo/gluster-users
