Re: [Gluster-users] Setting gfid failed on slave geo-rep node
That's correct, I had in total 394 files and directories which did not exist on either of my two master nodes' bricks. So, as you suggested, I have now stopped the geo-rep, deleted the concerned files and directories on the slave node, and restarted the geo-rep. It's all clean again, but I will not use it in production anymore until the patch fixing this is out. Thanks again for your help; I am looking forward to the next release including that patch.

Regards
ML


On Thursday, February 4, 2016 11:14 AM, Saravanakumar Arumugam wrote:

Hi,

On 02/03/2016 08:09 PM, ML mail wrote:
> Dear Aravinda,
>
> Thank you for the analysis and for submitting a patch for this issue. I hope
> it can make it into the next GlusterFS release, 3.7.7.
>
> As suggested, I ran the find_gfid_issues.py script on my brick on the two
> master nodes and the slave node, but the only output it shows me is the
> following:

You need to run the script only on the Slave.

> NO GFID(DIR) : /data/myvolume-geo/brick/test
> NO GFID(DIR) : /data/myvolume-geo/brick/data
> NO GFID(DIR) : /data/myvolume-geo/brick/data/files_encryption
> NO GFID(DIR) : /data/myvolume-geo/brick/data/username
>
> As you can see, there are no files at all. So I am still left with 394 files
> of 0 kBytes on my geo-rep slave node. Do you have any suggestion how to
> clean up this mess?

Do you mean to say the script shows only 4 directories, but there are 394 files on the slave node?

OK, as of now there is no automatic way of cleaning up these files, so you need to remove them manually. You can follow these steps:

1. Stop the geo-replication session.
2. Get the list of all 0 kByte files and delete them. It is important to ensure that no source file exists on the master for those files. (For example, logo-login-09.svg.ocTransferId1789604916.part is a 0 kByte file; ensure no such source file exists on the master. Otherwise, you may end up deleting files whose sync is still in progress.)
3. Start the geo-replication session.
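Step 2 can be done with find(1); a minimal sketch, assuming the slave brick path seen elsewhere in this thread and skipping GlusterFS's internal .glusterfs directory:

```shell
# Dry run: list leftover 0-byte regular files on the slave brick.
# BRICK is an assumption taken from the paths in this thread; adjust it.
BRICK=${BRICK:-/data/myvolume-geo/brick}
find "$BRICK" -path "$BRICK/.glusterfs" -prune -o -type f -size 0 -print
# Review the list and confirm none of these files still exist on the
# master before deleting them (e.g. append -delete after -print).
```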
With the patch coming in, these errors should not be encountered in the future.

Thanks,
Saravana
Re: [Gluster-users] Setting gfid failed on slave geo-rep node
Hi ML,

We analyzed the issue. It looks like the Changelog is replayed, possibly because of a Geo-rep worker crash, an Active/Passive switch, or both Geo-rep workers becoming active.

From the changelogs:

CREATE logo-login-04.svg.part
RENAME logo-login-04.svg.part logo-login-04.svg

When it is replayed:

CREATE logo-login-04.svg.part
RENAME logo-login-04.svg.part logo-login-04.svg
CREATE logo-login-04.svg.part
RENAME logo-login-04.svg.part logo-login-04.svg

During the replay the backend GFID link is broken and Geo-rep fails to clean up. Milind is working on a patch to fix this. The patches are in review and expected to be available in the 3.7.8 release.

http://review.gluster.org/#/c/13316/
http://review.gluster.org/#/c/13189/

The following script can be used to find problematic files in each brick backend:
https://gist.github.com/aravindavk/29f673f13c2f8963447e

regards
Aravinda
___
Gluster-users mailing list
Gluster-users@gluster.org
http://www.gluster.org/mailman/listinfo/gluster-users
Re: [Gluster-users] Setting gfid failed on slave geo-rep node
Sure, I will just send it to you through an encrypted cloud storage app and send you the password via private mail.

Regards
ML
Re: [Gluster-users] Setting gfid failed on slave geo-rep node
On 02/01/2016 07:22 PM, ML mail wrote:
> # getfattr -n glusterfs.gfid.string -m . logo-login-09.svg
> # file: logo-login-09.svg
> glusterfs.gfid.string="1c648409-e98b-4544-a7fa-c2aef87f92ad"
>
> # grep 1c648409-e98b-4544-a7fa-c2aef87f92ad /data/myvolume/brick/.glusterfs/changelogs -rn
> Binary file /data/myvolume/brick/.glusterfs/changelogs/CHANGELOG.1454278219 matches

Great! Can you share the CHANGELOG? (It contains the various fops carried out on this gfid.)
Re: [Gluster-users] Setting gfid failed on slave geo-rep node
I just found out I needed to run the getfattr on a mount and not on the glusterfs server directly. So here is the additional output you asked for:

# getfattr -n glusterfs.gfid.string -m . logo-login-09.svg
# file: logo-login-09.svg
glusterfs.gfid.string="1c648409-e98b-4544-a7fa-c2aef87f92ad"

# grep 1c648409-e98b-4544-a7fa-c2aef87f92ad /data/myvolume/brick/.glusterfs/changelogs -rn
Binary file /data/myvolume/brick/.glusterfs/changelogs/CHANGELOG.1454278219 matches

Regards
ML
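The getfattr-then-grep procedure in this exchange can be wrapped in a small helper for checking many files; `find_gfid_changelogs` is a name made up for this sketch, and the gfid and paths below are the ones from this thread:

```shell
# Given a gfid, list the changelog files on a brick that mention it.
# Hypothetical helper, not a Gluster tool; changelogs are binary, so
# grep -l prints only the names of matching files.
find_gfid_changelogs() {
    gfid=$1
    changelog_dir=$2
    grep -rl "$gfid" "$changelog_dir"
}

# Usage with the gfid and brick path from this thread (obtain the gfid
# first by running getfattr on a gluster mount, as described above):
# find_gfid_changelogs 1c648409-e98b-4544-a7fa-c2aef87f92ad \
#     /data/myvolume/brick/.glusterfs/changelogs
```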
Re: [Gluster-users] Setting gfid failed on slave geo-rep node
Hi,

Thank you for your answer; below is the output of the requested commands. There is just one issue with the GFID lookup, as it does not seem to work. I am running the getfattr command on the master, but if I run it on the slave node it also says "Operation not supported".

# getfattr -n glusterfs.gfid.string -m . logo-login-09.svg
logo-login-04.svg: glusterfs.gfid.string: Operation not supported

# file logo-login-09.svg
logo-login-04.svg: ASCII text, with very long lines, with no line terminators

gluster version: 3.7.6

# gluster volume info
Volume Name: myvolume
Type: Replicate
Volume ID: *REMOVED*
Status: Started
Number of Bricks: 1 x 2 = 2
Transport-type: tcp
Bricks:
Brick1: gfs1a.domain.tld:/data/myvolume/brick
Brick2: gfs1b.domain.tld:/data/myvolume/brick
Options Reconfigured:
changelog.changelog: on
geo-replication.ignore-pid-check: on
geo-replication.indexing: on
performance.readdir-ahead: on
nfs.disable: on

# gluster volume geo-replication status
MASTER NODE  MASTER VOL  MASTER BRICK          SLAVE USER  SLAVE                                   SLAVE NODE          STATUS   CRAWL STATUS     LAST_SYNCED
gfs1a        myvolume    /data/myvolume/brick  root        ssh://gfs1geo.domain.tld::myvolume-geo  gfs1geo.domain.tld  Active   Changelog Crawl  2016-02-01 09:29:26
gfs1b        myvolume    /data/myvolume/brick  root        ssh://gfs1geo.domain.tld::myvolume-geo  gfs1geo.domain.tld  Passive  N/A              N/A

Regards
ML

On Monday, February 1, 2016 1:30 PM, Saravanakumar Arumugam wrote:

Hi,

On 02/01/2016 02:14 PM, ML mail wrote:
> Hello,
>
> I just set up distributed geo-replication to a slave on my 2 nodes'
> replicated volume and noticed quite a few error messages (around 70 of them)
> in the slave's brick log file:
>
> The exact log file is: /var/log/glusterfs/bricks/data-myvolume-geo-brick.log
>
> [2016-01-31 22:19:29.524370] E [MSGID: 113020] [posix.c:1221:posix_mknod]
> 0-myvolume-geo-posix: setting gfid on
> /data/myvolume-geo/brick/data/username/files/shared/logo-login-09.svg.ocTransferId1789604916.part
> failed
> [2016-01-31 22:19:29.535478] W [MSGID: 113026] [posix.c:1338:posix_mkdir]
> 0-myvolume-geo-posix: mkdir
> (/data/username/files_encryption/keys/files/shared/logo-login-09.svg.ocTransferId1789604916.part):
> gfid (15bbcec6-a332-4c21-81e4-c52472b1e13d) is already associated with
> directory
> (/data/myvolume-geo/brick/.glusterfs/49/5d/495d6868-4844-4632-8ff9-ad9646a878fe/logo-login-09.svg).
> Hence, both directories will share the same gfid and this can lead to
> inconsistencies.

Can you grep for this gfid (of the corresponding files) in the changelogs and share those files?

{ For example:

1. Get the gfid of the files like this:

# getfattr -n glusterfs.gfid.string -m . /mnt/slave/file456
getfattr: Removing leading '/' from absolute path names
# file: mnt/slave/file456
glusterfs.gfid.string="05b22446-de9e-42df-a63e-399c24d690c4"

2. grep for the corresponding gfid in the brick back end like below:

[root@gfvm3 changelogs]# grep 05b22446-de9e-42df-a63e-399c24d690c4 /opt/volume_test/tv_2/b1/.glusterfs/changelogs/ -rn
Binary file /opt/volume_test/tv_2/b1/.glusterfs/changelogs/CHANGELOG.1454135265 matches
Binary file /opt/volume_test/tv_2/b1/.glusterfs/changelogs/CHANGELOG.1454135476 matches
}

This will help in understanding what operations were carried out on the master volume that led to this inconsistency.

Also, get the following:
gluster version
gluster volume info
gluster volume geo-replication status

> This doesn't look good at all, because the file mentioned in the error message
> (logo-login-09.svg.ocTransferId1789604916.part) is left there with 0 kBytes
> and does not get deleted or cleaned up by glusterfs, leaving my geo-rep slave
> node in an inconsistent state which does not reflect the reality on the
> master nodes. The master nodes don't have that file anymore (which is
> correct). Here below is an "ls" of the concerned file with the correct file
> on top.
>
> -rw-r--r-- 2 www-data www-data 24312 Jan  6  2014 logo-login-09.svg
> -rw-r--r-- 1 root     root         0 Jan 31 23:19 logo-login-09.svg.ocTransferId1789604916.part

Rename issues in geo-replication were fixed recently. This looks similar to one of them.

Thanks,
Saravana

___
Gluster-users mailing list
Gluster-users@gluster.org
http://www.gluster.org/mailman/listinfo/gluster-users
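The two-step procedure Saravana describes (read the file's gfid, then grep the master brick's changelogs for it) can be sketched as one small script. This is a self-contained demo that fakes a changelog directory so it runs anywhere; the gfid is the example value from the reply, and on a real master you would point grep at the brick's .glusterfs/changelogs directory instead.

```shell
#!/bin/sh
# Demo of the changelog search step. A scratch directory stands in for
# the brick's .glusterfs/changelogs; real changelogs are binary, but
# grep still matches the embedded gfid string.
gfid="05b22446-de9e-42df-a63e-399c24d690c4"

dir=$(mktemp -d)
printf 'E %s CREATE 33188 0 0\n' "$gfid" > "$dir/CHANGELOG.1454135265"
printf 'E some-other-gfid UNLINK\n'      > "$dir/CHANGELOG.1454135476"

# -r: recurse, -l: print only the names of matching changelog files
grep -rl "$gfid" "$dir"

rm -rf "$dir"
```

On a live setup the getfattr command from step 1 supplies the gfid; it is hard-coded here only because the virtual glusterfs.gfid.string xattr exists only on a mounted gluster volume.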
Re: [Gluster-users] Setting gfid failed on slave geo-rep node
Hi,

On 02/01/2016 02:14 PM, ML mail wrote:
> Hello,
>
> I just set up distributed geo-replication to a slave on my 2 nodes'
> replicated volume and noticed quite a few error messages (around 70 of them)
> in the slave's brick log file:
>
> The exact log file is: /var/log/glusterfs/bricks/data-myvolume-geo-brick.log
>
> [2016-01-31 22:19:29.524370] E [MSGID: 113020] [posix.c:1221:posix_mknod]
> 0-myvolume-geo-posix: setting gfid on
> /data/myvolume-geo/brick/data/username/files/shared/logo-login-09.svg.ocTransferId1789604916.part
> failed
> [2016-01-31 22:19:29.535478] W [MSGID: 113026] [posix.c:1338:posix_mkdir]
> 0-myvolume-geo-posix: mkdir
> (/data/username/files_encryption/keys/files/shared/logo-login-09.svg.ocTransferId1789604916.part):
> gfid (15bbcec6-a332-4c21-81e4-c52472b1e13d) is already associated with
> directory
> (/data/myvolume-geo/brick/.glusterfs/49/5d/495d6868-4844-4632-8ff9-ad9646a878fe/logo-login-09.svg).
> Hence, both directories will share the same gfid and this can lead to
> inconsistencies.

Can you grep for this gfid (of the corresponding files) in the changelogs and share those files?

{ For example:

1. Get the gfid of the files like this:

# getfattr -n glusterfs.gfid.string -m . /mnt/slave/file456
getfattr: Removing leading '/' from absolute path names
# file: mnt/slave/file456
glusterfs.gfid.string="05b22446-de9e-42df-a63e-399c24d690c4"

2. grep for the corresponding gfid in the brick back end like below:

[root@gfvm3 changelogs]# grep 05b22446-de9e-42df-a63e-399c24d690c4 /opt/volume_test/tv_2/b1/.glusterfs/changelogs/ -rn
Binary file /opt/volume_test/tv_2/b1/.glusterfs/changelogs/CHANGELOG.1454135265 matches
Binary file /opt/volume_test/tv_2/b1/.glusterfs/changelogs/CHANGELOG.1454135476 matches
}

This will help in understanding what operations were carried out on the master volume that led to this inconsistency.

Also, get the following:
gluster version
gluster volume info
gluster volume geo-replication status

> This doesn't look good at all, because the file mentioned in the error message
> (logo-login-09.svg.ocTransferId1789604916.part) is left there with 0 kBytes
> and does not get deleted or cleaned up by glusterfs, leaving my geo-rep slave
> node in an inconsistent state which does not reflect the reality on the
> master nodes. The master nodes don't have that file anymore (which is
> correct). Here below is an "ls" of the concerned file with the correct file
> on top.
>
> -rw-r--r-- 2 www-data www-data 24312 Jan  6  2014 logo-login-09.svg
> -rw-r--r-- 1 root     root         0 Jan 31 23:19 logo-login-09.svg.ocTransferId1789604916.part

Rename issues in geo-replication were fixed recently. This looks similar to one of them.

Thanks,
Saravana
[Gluster-users] Setting gfid failed on slave geo-rep node
Hello,

I just set up distributed geo-replication to a slave on my 2 nodes' replicated volume and noticed quite a few error messages (around 70 of them) in the slave's brick log file:

The exact log file is: /var/log/glusterfs/bricks/data-myvolume-geo-brick.log

[2016-01-31 22:19:29.524370] E [MSGID: 113020] [posix.c:1221:posix_mknod] 0-myvolume-geo-posix: setting gfid on /data/myvolume-geo/brick/data/username/files/shared/logo-login-09.svg.ocTransferId1789604916.part failed
[2016-01-31 22:19:29.535478] W [MSGID: 113026] [posix.c:1338:posix_mkdir] 0-myvolume-geo-posix: mkdir (/data/username/files_encryption/keys/files/shared/logo-login-09.svg.ocTransferId1789604916.part): gfid (15bbcec6-a332-4c21-81e4-c52472b1e13d) is already associated with directory (/data/myvolume-geo/brick/.glusterfs/49/5d/495d6868-4844-4632-8ff9-ad9646a878fe/logo-login-09.svg). Hence, both directories will share the same gfid and this can lead to inconsistencies.

This doesn't look good at all, because the file mentioned in the error message (logo-login-09.svg.ocTransferId1789604916.part) is left there with 0 kBytes and does not get deleted or cleaned up by glusterfs, leaving my geo-rep slave node in an inconsistent state which does not reflect the reality on the master nodes. The master nodes don't have that file anymore (which is correct). Here below is an "ls" of the concerned file with the correct file on top.

-rw-r--r-- 2 www-data www-data 24312 Jan  6  2014 logo-login-09.svg
-rw-r--r-- 1 root     root         0 Jan 31 23:19 logo-login-09.svg.ocTransferId1789604916.part

So at least I have the correct file (the first file in the list), but gluster leaves this second "temporary" or "transient" file behind although it should delete it. Any ideas?

Regards
ML
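Leftover zero-byte transfer files like this can at least be inventoried with find before any manual cleanup. Below is a minimal, self-contained sketch: it builds a scratch directory mimicking the situation above, and on the real slave you would run the find against the brick path (e.g. /data/myvolume-geo/brick) instead.

```shell
#!/bin/sh
# Demo: list zero-byte *.part leftovers under a brick-like directory.
brick=$(mktemp -d)
mkdir -p "$brick/data/username/files/shared"

# Recreate the situation from the log: a correctly synced file next to
# a stale zero-byte ocTransferId temp file.
printf 'svg data' > "$brick/data/username/files/shared/logo-login-09.svg"
: > "$brick/data/username/files/shared/logo-login-09.svg.ocTransferId1789604916.part"

# -size 0 matches only empty regular files; the -name pattern matches
# the transfer temp files seen in the brick log.
find "$brick" -type f -size 0 -name '*.part' -print

rm -rf "$brick"
```

Note the caveat from later in the thread: stop the geo-replication session first and confirm the master has no matching source file before deleting anything, since a file that is still mid-sync also appears as a zero-byte .part file.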