Re: [Gluster-users] Setting gfid failed on slave geo-rep node

2016-02-04 Thread ML mail
That's correct, I had in total 394 files and directories which were not 
present on either of my two master nodes' bricks. So, as you suggested, I have 
now stopped the geo-rep, deleted the files and directories concerned on the 
slave node, and restarted the geo-rep. It's all clean again, but I will not use 
it in production anymore until the patch that fixes this is out.


Thanks again for your help; I am looking forward to the next release including 
that patch.



Regards
ML


On Thursday, February 4, 2016 11:14 AM, Saravanakumar Arumugam wrote:
Hi,

On 02/03/2016 08:09 PM, ML mail wrote:
> Dear Aravinda,
>
> Thank you for the analysis and for submitting a patch for this issue. I hope 
> it can make it into the next GlusterFS release, 3.7.7.
>
>
> As suggested, I ran the find_gfid_issues.py script on my bricks on the two 
> master nodes and the slave node, but the only output it shows me is the 
> following:
You need to run the script only on the slave.
>
> NO GFID(DIR) : /data/myvolume-geo/brick/test
> NO GFID(DIR) : /data/myvolume-geo/brick/data
> NO GFID(DIR) : /data/myvolume-geo/brick/data/files_encryption
> NO GFID(DIR) : /data/myvolume-geo/brick/data/username
>
>
> As you can see, there are no files at all. So I am still left with 394 files 
> of 0 kBytes on my geo-rep slave node. Do you have any suggestion on how to 
> clean up this mess?
Do you mean the script shows only 4 directories, but there are 394 
files on the slave node?

OK, as of now there is no automatic way of cleaning up these files, and you 
need to remove them manually.

You can follow these steps:

1. Stop the geo-replication session.

2. Get a list of all 0-kByte files and delete them.
   It is important to ensure that no source file exists on the master for
   those files. (For example, logo-login-09.svg.ocTransferId1789604916.part
   is a 0-kByte file; ensure no such source file exists on the master.
   Otherwise, you may end up deleting files whose sync is still in progress.)

3. Start the geo-replication session.
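The listing in step 2 can be sketched as below. This is only a sketch: the brick path is the one mentioned in this thread and may differ on your setup, and the internal .glusterfs directory must be skipped.

```shell
# Sketch for step 2 (assumes the slave brick path from this thread).
# List 0-byte regular files on the slave brick, pruning the internal
# .glusterfs metadata directory. Review the list before deleting anything.
BRICK=${BRICK:-/data/myvolume-geo/brick}
if [ -d "$BRICK" ]; then
    find "$BRICK" -path "$BRICK/.glusterfs" -prune -o -type f -size 0 -print
fi
```

Each path printed should be checked against a mount of the master volume before removal, since a same-named 0-byte file on the master may simply still be mid-sync.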

With the patch coming in, these errors should not be encountered in the future.

Thanks,
Saravana


>
> Best regards
> ML
>
>
>
> On Tuesday, February 2, 2016 7:59 AM, Aravinda  wrote:
> Hi ML,
>
> We analyzed the issue. It looks like the changelog was replayed, maybe because
> of a Geo-rep worker crash, an Active/Passive switch, or both Geo-rep workers
> becoming active.
>
>  From changelogs,
>
> CREATE  logo-login-04.svg.part
> RENAME logo-login-04.svg.part logo-login-04.svg
>
> When it is replayed,
> CREATE  logo-login-04.svg.part
> RENAME logo-login-04.svg.part logo-login-04.svg
> CREATE  logo-login-04.svg.part
> RENAME logo-login-04.svg.part logo-login-04.svg
>
> During replay, the backend GFID link is broken and Geo-rep fails to clean up.
> Milind is working on a patch to fix this. The patches are in review
> and expected to be available in the 3.7.8 release.
>
> http://review.gluster.org/#/c/13316/
> http://review.gluster.org/#/c/13189/
>
> The following script can be used to find problematic files in each brick backend.
> https://gist.github.com/aravindavk/29f673f13c2f8963447e
>
> regards
> Aravinda
>
> On 02/01/2016 08:45 PM, ML mail wrote:
>> Sure, I will just send it to you through an encrypted cloud storage app and 
>> send you the password via private mail.
>>
>> Regards
>> ML
>>
>>
>>
>> On Monday, February 1, 2016 3:14 PM, Saravanakumar Arumugam wrote:
>>
>>
>> On 02/01/2016 07:22 PM, ML mail wrote:
>>> I just found out I needed to run getfattr on a mount and not on the 
>>> glusterfs server directly. So here is the additional output you asked for:
>>>
>>>
>>> # getfattr -n glusterfs.gfid.string  -m .  logo-login-09.svg
>>> # file: logo-login-09.svg
>>> glusterfs.gfid.string="1c648409-e98b-4544-a7fa-c2aef87f92ad"
>>>
>>> # grep 1c648409-e98b-4544-a7fa-c2aef87f92ad 
>>> /data/myvolume/brick/.glusterfs/changelogs -rn
>>> Binary file /data/myvolume/brick/.glusterfs/changelogs/CHANGELOG.1454278219 
>>> matches
>> Great! Can you share the CHANGELOG? (It contains the various fops
>> carried out on this gfid.)
>>
>>> Regards
>>> ML
>>>
>>>
>>>
>>> On Monday, February 1, 2016 1:30 PM, Saravanakumar Arumugam wrote:
>>> Hi,
>>>
>>> On 02/01/2016 02:14 PM, ML mail wrote:
 Hello,

 I just set up distributed geo-replication to a slave on my 2-node 
 replicated volume and noticed quite a few error messages (around 70 of 
 them) in the slave's brick log file:

 The exact log file is: 
 /var/log/glusterfs/bricks/data-myvolume-geo-brick.log

 [2016-01-31 22:19:29.524370] E [MSGID: 113020] [posix.c:1221:posix_mknod] 
 0-myvolume-geo-posix: setting gfid on 
 /data/myvolume-geo/brick/data/username/files/shared/logo-login-09.svg.ocTransferId1789604916.part
  failed
 [2016-01-31 22:19:29.535478] W [MSGID: 113026] [posix.c:1338:posix_mkdir] 
 0-myvolume-geo-posix: mkdir 
 (/data/username/files_encryption/keys/files/shared/logo-login-09.svg.ocTransferId1789604916.part):
  gfid (15bbcec6-a332-

Re: [Gluster-users] Setting gfid failed on slave geo-rep node

2016-02-01 Thread Aravinda

Hi ML,

We analyzed the issue. It looks like the changelog was replayed, maybe because 
of a Geo-rep worker crash, an Active/Passive switch, or both Geo-rep workers 
becoming active.


From changelogs,

CREATE  logo-login-04.svg.part
RENAME logo-login-04.svg.part logo-login-04.svg

When it is replayed,
CREATE  logo-login-04.svg.part
RENAME logo-login-04.svg.part logo-login-04.svg
CREATE  logo-login-04.svg.part
RENAME logo-login-04.svg.part logo-login-04.svg

During replay, the backend GFID link is broken and Geo-rep fails to clean up. 
Milind is working on a patch to fix this. The patches are in review 
and expected to be available in the 3.7.8 release.


http://review.gluster.org/#/c/13316/
http://review.gluster.org/#/c/13189/

The following script can be used to find problematic files in each brick backend.
https://gist.github.com/aravindavk/29f673f13c2f8963447e

regards
Aravinda

On 02/01/2016 08:45 PM, ML mail wrote:

Sure, I will just send it to you through an encrypted cloud storage app and 
send you the password via private mail.

Regards
ML



On Monday, February 1, 2016 3:14 PM, Saravanakumar Arumugam wrote:


On 02/01/2016 07:22 PM, ML mail wrote:

I just found out I needed to run getfattr on a mount and not on the 
glusterfs server directly. So here is the additional output you asked for:


# getfattr -n glusterfs.gfid.string  -m .  logo-login-09.svg
# file: logo-login-09.svg
glusterfs.gfid.string="1c648409-e98b-4544-a7fa-c2aef87f92ad"

# grep 1c648409-e98b-4544-a7fa-c2aef87f92ad 
/data/myvolume/brick/.glusterfs/changelogs -rn
Binary file /data/myvolume/brick/.glusterfs/changelogs/CHANGELOG.1454278219 
matches

Great! Can you share the CHANGELOG? (It contains the various fops
carried out on this gfid.)


Regards
ML



On Monday, February 1, 2016 1:30 PM, Saravanakumar Arumugam wrote:
Hi,

On 02/01/2016 02:14 PM, ML mail wrote:

Hello,

I just set up distributed geo-replication to a slave on my 2-node replicated 
volume and noticed quite a few error messages (around 70 of them) in the 
slave's brick log file:

The exact log file is: /var/log/glusterfs/bricks/data-myvolume-geo-brick.log

[2016-01-31 22:19:29.524370] E [MSGID: 113020] [posix.c:1221:posix_mknod] 
0-myvolume-geo-posix: setting gfid on 
/data/myvolume-geo/brick/data/username/files/shared/logo-login-09.svg.ocTransferId1789604916.part
 failed
[2016-01-31 22:19:29.535478] W [MSGID: 113026] [posix.c:1338:posix_mkdir] 
0-myvolume-geo-posix: mkdir 
(/data/username/files_encryption/keys/files/shared/logo-login-09.svg.ocTransferId1789604916.part):
 gfid (15bbcec6-a332-4c21-81e4-c52472b1e13d) is already associated with 
directory 
(/data/myvolume-geo/brick/.glusterfs/49/5d/495d6868-4844-4632-8ff9-ad9646a878fe/logo-login-09.svg).
 Hence, both directories will share the same gfid and this can lead to 
inconsistencies.

Can you grep for this gfid (of the file concerned) in the changelogs and
share those changelog files?

{
For example:

1. Get the gfid of the file like this:

# getfattr -n glusterfs.gfid.string  -m .  /mnt/slave/file456
getfattr: Removing leading '/' from absolute path names
# file: mnt/slave/file456
glusterfs.gfid.string="05b22446-de9e-42df-a63e-399c24d690c4"

2. Grep for that gfid in the brick backend, like below:

[root@gfvm3 changelogs]# grep 05b22446-de9e-42df-a63e-399c24d690c4
/opt/volume_test/tv_2/b1/.glusterfs/changelogs/ -rn
Binary file
/opt/volume_test/tv_2/b1/.glusterfs/changelogs/CHANGELOG.1454135265 matches
Binary file
/opt/volume_test/tv_2/b1/.glusterfs/changelogs/CHANGELOG.1454135476 matches

}
This will help in understanding what operations were carried out on the
master volume that led to this inconsistency.

Also, get the following:
gluster version
gluster volume info
gluster volume geo-replication status
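The two numbered steps above can be wrapped in a small helper. The function name is made up here, and the mount and brick paths in the usage comment are the illustrative ones from the example:

```shell
# Hypothetical helper combining the two steps above: resolve a file's gfid
# via the glusterfs.gfid.string virtual xattr (must be run against a
# glusterfs mount), then grep the brick's changelogs for that gfid.
lookup_gfid_changelogs() {
    mntfile=$1   # file path on a glusterfs mount
    brick=$2     # brick root on the server
    gfid=$(getfattr -n glusterfs.gfid.string --only-values "$mntfile")
    echo "gfid: $gfid"
    grep -rn "$gfid" "$brick/.glusterfs/changelogs/"
}
# Usage (paths from the example above):
#   lookup_gfid_changelogs /mnt/slave/file456 /opt/volume_test/tv_2/b1
```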


This doesn't look good at all, because the file mentioned in the error message 
(logo-login-09.svg.ocTransferId1789604916.part) is left there with 0 kBytes and does not 
get deleted or cleaned up by GlusterFS, leaving my geo-rep slave node in an inconsistent 
state which does not reflect reality on the master nodes. The master nodes no longer 
have that file (which is correct). Below is an "ls" of the file concerned, with the 
correct file on top.


-rw-r--r-- 2 www-data www-data   24312 Jan  6  2014 logo-login-09.svg
-rw-r--r-- 1 root root   0 Jan 31 23:19 
logo-login-09.svg.ocTransferId1789604916.part

Rename issues in geo-replication have been fixed recently. This looks similar
to one of those.

Thanks,
Saravana
___
Gluster-users mailing list
Gluster-users@gluster.org
http://www.gluster.org/mailman/listinfo/gluster-users






Re: [Gluster-users] Setting gfid failed on slave geo-rep node

2016-02-01 Thread ML mail
Hi,

Thank you for your answer; below is the output of the requested commands. There 
is just one issue with the GFID part, as it does not seem to work. I am running 
the getfattr command on the master, but if I run it on the slave node it also 
says "Operation not supported".


# getfattr -n glusterfs.gfid.string  -m .  logo-login-09.svg
logo-login-04.svg: glusterfs.gfid.string: Operation not supported

# file logo-login-09.svg
logo-login-04.svg: ASCII text, with very long lines, with no line terminators

# gluster version
3.7.6

# gluster volume info
Volume Name: myvolume
Type: Replicate
Volume ID: *REMOVED*
Status: Started
Number of Bricks: 1 x 2 = 2
Transport-type: tcp
Bricks:
Brick1: gfs1a.domain.tld:/data/myvolume/brick
Brick2: gfs1b.domain.tld:/data/myvolume/brick
Options Reconfigured:
changelog.changelog: on
geo-replication.ignore-pid-check: on
geo-replication.indexing: on
performance.readdir-ahead: on
nfs.disable: on

# gluster volume geo-replication status
MASTER NODE    MASTER VOL    MASTER BRICK            SLAVE USER    SLAVE                                     SLAVE NODE            STATUS     CRAWL STATUS       LAST_SYNCED
---------------------------------------------------------------------------------------------------------------------------------------------------------------------------
gfs1a          myvolume      /data/myvolume/brick    root          ssh://gfs1geo.domain.tld::myvolume-geo    gfs1geo.domain.tld    Active     Changelog Crawl    2016-02-01 09:29:26
gfs1b          myvolume      /data/myvolume/brick    root          ssh://gfs1geo.domain.tld::myvolume-geo    gfs1geo.domain.tld    Passive    N/A                N/A

Regards
ML



On Monday, February 1, 2016 1:30 PM, Saravanakumar Arumugam wrote:
Hi,

On 02/01/2016 02:14 PM, ML mail wrote:
> Hello,
>
> I just set up distributed geo-replication to a slave on my 2 nodes' 
> replicated volume and noticed quite a few error messages (around 70 of them) 
> in the slave's brick log file:
>
> The exact log file is: /var/log/glusterfs/bricks/data-myvolume-geo-brick.log
>
> [2016-01-31 22:19:29.524370] E [MSGID: 113020] [posix.c:1221:posix_mknod]
> 0-myvolume-geo-posix: setting gfid on
> /data/myvolume-geo/brick/data/username/files/shared/logo-login-09.svg.ocTransferId1789604916.part failed
> [2016-01-31 22:19:29.535478] W [MSGID: 113026] [posix.c:1338:posix_mkdir]
> 0-myvolume-geo-posix: mkdir
> (/data/username/files_encryption/keys/files/shared/logo-login-09.svg.ocTransferId1789604916.part):
> gfid (15bbcec6-a332-4c21-81e4-c52472b1e13d) is already associated with
> directory
> (/data/myvolume-geo/brick/.glusterfs/49/5d/495d6868-4844-4632-8ff9-ad9646a878fe/logo-login-09.svg).
> Hence, both directories will share the same gfid and this can lead to
> inconsistencies.
Can you grep for this gfid (of the corresponding files) in the changelogs and
share those files?

{
For example:

1. Get gfid of the files like this:

# getfattr -n glusterfs.gfid.string  -m .  /mnt/slave/file456
getfattr: Removing leading '/' from absolute path names
# file: mnt/slave/file456
glusterfs.gfid.string="05b22446-de9e-42df-a63e-399c24d690c4"

2. grep for the corresponding gfid in brick back end like below:

[root@gfvm3 changelogs]# grep 05b22446-de9e-42df-a63e-399c24d690c4 /opt/volume_test/tv_2/b1/.glusterfs/changelogs/ -rn
Binary file /opt/volume_test/tv_2/b1/.glusterfs/changelogs/CHANGELOG.1454135265 matches
Binary file /opt/volume_test/tv_2/b1/.glusterfs/changelogs/CHANGELOG.1454135476 matches

}
This will help in understanding what operations were carried out on the
master volume that led to this inconsistency.
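For reference, the directory path in the mkdir warning above follows the brick's internal layout: each gfid is hard-linked under `.glusterfs/<first two hex chars>/<next two hex chars>/<gfid>`. A small sketch that computes where to look on a brick for a given gfid (the paths below are the ones from the log in this thread; the layout is as observed there, so verify it on your GlusterFS version):

```python
import os

def gfid_backend_path(brick_root, gfid):
    """Map a gfid string to its expected location under the brick's
    internal .glusterfs directory: .glusterfs/<aa>/<bb>/<gfid>,
    where aa and bb are the first two and next two hex characters."""
    return os.path.join(brick_root, ".glusterfs", gfid[:2], gfid[2:4], gfid)

print(gfid_backend_path("/data/myvolume-geo/brick",
                        "495d6868-4844-4632-8ff9-ad9646a878fe"))
# -> /data/myvolume-geo/brick/.glusterfs/49/5d/495d6868-4844-4632-8ff9-ad9646a878fe
```

This matches the `.glusterfs/49/5d/495d6868-...` path in the warning above, and is handy when you want to inspect the backend entry a conflicting gfid points at.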

Also, get the following:
gluster version
gluster volume info
gluster volume geo-replication status

>
> This doesn't look good at all because the file mentioned in the error message
> (logo-login-09.svg.ocTransferId1789604916.part) is left there with 0 kBytes
> and does not get deleted or cleaned up by GlusterFS, leaving my geo-rep slave
> node in an inconsistent state which does not reflect the reality of the
> master nodes. The master nodes don't have that file anymore (which is
> correct). Below is an "ls" of the concerned file, with the correct file
> on top.
>
>
> -rw-r--r-- 2 www-data www-data   24312 Jan  6  2014 logo-login-09.svg
> -rw-r--r-- 1 root root   0 Jan 31 23:19 logo-login-09.svg.ocTransferId1789604916.part
Rename issues in geo-replication have been fixed recently. This looks
similar to one of those.

Thanks,
Saravana


Re: [Gluster-users] Setting gfid failed on slave geo-rep node

2016-02-01 Thread Saravanakumar Arumugam

Hi,

On 02/01/2016 02:14 PM, ML mail wrote:

Hello,

I just set up distributed geo-replication to a slave on my 2 nodes' replicated 
volume and noticed quite a few error messages (around 70 of them) in the 
slave's brick log file:

The exact log file is: /var/log/glusterfs/bricks/data-myvolume-geo-brick.log

[2016-01-31 22:19:29.524370] E [MSGID: 113020] [posix.c:1221:posix_mknod]
0-myvolume-geo-posix: setting gfid on
/data/myvolume-geo/brick/data/username/files/shared/logo-login-09.svg.ocTransferId1789604916.part failed
[2016-01-31 22:19:29.535478] W [MSGID: 113026] [posix.c:1338:posix_mkdir]
0-myvolume-geo-posix: mkdir
(/data/username/files_encryption/keys/files/shared/logo-login-09.svg.ocTransferId1789604916.part):
gfid (15bbcec6-a332-4c21-81e4-c52472b1e13d) is already associated with
directory
(/data/myvolume-geo/brick/.glusterfs/49/5d/495d6868-4844-4632-8ff9-ad9646a878fe/logo-login-09.svg).
Hence, both directories will share the same gfid and this can lead to
inconsistencies.
Can you grep for this gfid (of the corresponding files) in the changelogs and
share those files?


{
For example:

1. Get gfid of the files like this:

# getfattr -n glusterfs.gfid.string  -m .  /mnt/slave/file456
getfattr: Removing leading '/' from absolute path names
# file: mnt/slave/file456
glusterfs.gfid.string="05b22446-de9e-42df-a63e-399c24d690c4"

2. grep for the corresponding gfid in brick back end like below:

[root@gfvm3 changelogs]# grep 05b22446-de9e-42df-a63e-399c24d690c4 /opt/volume_test/tv_2/b1/.glusterfs/changelogs/ -rn
Binary file /opt/volume_test/tv_2/b1/.glusterfs/changelogs/CHANGELOG.1454135265 matches
Binary file /opt/volume_test/tv_2/b1/.glusterfs/changelogs/CHANGELOG.1454135476 matches


}
This will help in understanding what operations were carried out on the
master volume that led to this inconsistency.


Also, get the following:
gluster version
gluster volume info
gluster volume geo-replication status



This doesn't look good at all because the file mentioned in the error message
(logo-login-09.svg.ocTransferId1789604916.part) is left there with 0 kBytes
and does not get deleted or cleaned up by GlusterFS, leaving my geo-rep slave
node in an inconsistent state which does not reflect the reality of the master
nodes. The master nodes don't have that file anymore (which is correct). Below
is an "ls" of the concerned file, with the correct file on top.


-rw-r--r-- 2 www-data www-data   24312 Jan  6  2014 logo-login-09.svg
-rw-r--r-- 1 root root   0 Jan 31 23:19 logo-login-09.svg.ocTransferId1789604916.part
Rename issues in geo-replication have been fixed recently. This looks
similar to one of those.


Thanks,
Saravana



[Gluster-users] Setting gfid failed on slave geo-rep node

2016-02-01 Thread ML mail
Hello,

I just set up distributed geo-replication to a slave on my 2 nodes' replicated 
volume and noticed quite a few error messages (around 70 of them) in the 
slave's brick log file:

The exact log file is: /var/log/glusterfs/bricks/data-myvolume-geo-brick.log

[2016-01-31 22:19:29.524370] E [MSGID: 113020] [posix.c:1221:posix_mknod]
0-myvolume-geo-posix: setting gfid on
/data/myvolume-geo/brick/data/username/files/shared/logo-login-09.svg.ocTransferId1789604916.part failed
[2016-01-31 22:19:29.535478] W [MSGID: 113026] [posix.c:1338:posix_mkdir]
0-myvolume-geo-posix: mkdir
(/data/username/files_encryption/keys/files/shared/logo-login-09.svg.ocTransferId1789604916.part):
gfid (15bbcec6-a332-4c21-81e4-c52472b1e13d) is already associated with
directory
(/data/myvolume-geo/brick/.glusterfs/49/5d/495d6868-4844-4632-8ff9-ad9646a878fe/logo-login-09.svg).
Hence, both directories will share the same gfid and this can lead to
inconsistencies.

This doesn't look good at all because the file mentioned in the error message
(logo-login-09.svg.ocTransferId1789604916.part) is left there with 0 kBytes
and does not get deleted or cleaned up by GlusterFS, leaving my geo-rep slave
node in an inconsistent state which does not reflect the reality of the master
nodes. The master nodes don't have that file anymore (which is correct). Below
is an "ls" of the concerned file, with the correct file on top.


-rw-r--r-- 2 www-data www-data   24312 Jan  6  2014 logo-login-09.svg
-rw-r--r-- 1 root root   0 Jan 31 23:19 logo-login-09.svg.ocTransferId1789604916.part

So at least I have the correct file (the first file in the list), but GlusterFS
leaves this second "temporary" or "transient" file behind, although it should
delete it.
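Before deleting such leftovers by hand, it helps to first list every zero-byte *.part file on the slave brick and cross-check each one against the master volume. A minimal sketch of that inventory step (the brick path used in the example is the one from this thread; adjust to your setup, and note this only reports, it never deletes):

```python
import os

def find_stale_part_files(brick_root):
    """Walk a (slave) brick and report zero-byte *.part files that may
    be leftover transfer fragments. Skips the internal .glusterfs
    directory. Deletion should be done manually, and only after
    confirming the source file no longer exists on the master."""
    stale = []
    for dirpath, dirnames, filenames in os.walk(brick_root):
        # never descend into gluster's internal metadata directory
        dirnames[:] = [d for d in dirnames if d != ".glusterfs"]
        for name in filenames:
            path = os.path.join(dirpath, name)
            if name.endswith(".part") and os.path.getsize(path) == 0:
                stale.append(path)
    return stale

for path in find_stale_part_files("/data/myvolume-geo/brick"):
    print(path)
```

Feeding each reported path's basename (without the .ocTransferId*.part suffix) into a lookup on the master mount is a reasonable way to confirm the file is truly orphaned before removing it.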

Any ideas?

Regards
ML