Re: [Gluster-users] New 3.12.7 possible split-brain on replica 3
Dear Ravi, Thank you for your mail and info. This is great news if these patches can make it into 3.12.12. I will then upgrade asap. Could anyone confirm in case these patches do not make it into 3.12.12? Because I would then rather wait for the next release. I was already told on this list that 3.12.9 should have fixed this issue but unfortunately it didn't. Best regards, Mabi ‐‐‐ Original Message ‐‐‐ On July 4, 2018 5:41 PM, Ravishankar N wrote: > > > Hi mabi, there are a couple of AFR patches from master that I'm > > currently back porting to the 3.12 branch: > > afr: heal gfids when file is not present on all bricks > > afr: don't update readables if inode refresh failed on all children > > afr: fix bug-1363721.t failure > > afr: add quorum checks in pre-op > > afr: don't treat all cases all bricks being blamed as split-brain > > afr: capture the correct errno in post-op quorum check > > afr: add quorum checks in post-op > > Many of these help make the transaction code more robust by fixing > > various corner cases. It would be great if you can wait for the next > > 3.12 minor release (3.12.12 ?) and upgrade to that build and see if the > > issues go away. > > Note: CC'ing Karthik and Jiffin for their help in reviewing and merging > > the backports for the above patches. > > Thanks, > > Ravi > > On 07/04/2018 06:51 PM, mabi wrote: > > > Hello, > > > > I just wanted to let you know that last week I have upgraded my two replica > > nodes from Debian 8 to Debian 9 so now all my 3 nodes (including aribter) > > are running Debian 9 with a Linux 4 kernel. > > > > Unfortunately I still have the exact same issue. Another detail I might > > have not mentioned yet is that I have quotas enabled on this volume, I > > don't really know if that is relevant but who knows... > > > > As a reminder here is what happens on the client side which has the volume > > mounted via FUSE (take earlier today from the > > /var/log/glusterfs/mnt-myvol-private.log logfile). Note here that in this > > specific case it's only one single file who had this issue. > > > > [2018-07-04 08:23:49.314252] E [MSGID: 109089] > > [dht-helper.c:1481:dht_migration_complete_check_task] 0-myvol-private-dht: > > failed to open the fd (0x7fccb00a5120, flags=010) on file > > /dir1/data/dir2/files_encryption/keys/files/dir3/dir4/dir5/dir6/dir7/OC_DEFAULT_MODULE/file.shareKey > > @ myvol-replicate-0 [Input/output error] > > > > [2018-07-04 08:23:49.328712] W [MSGID: 108027] > > [afr-common.c:2821:afr_discover_done] 0-myvol-private-replicate-0: no read > > subvols for > > /dir1/data/dir2/files_encryption/keys/files/dir3/dir4/dir5/dir6/dir7/OC_DEFAULT_MODULE/file.shareKey > > > > [2018-07-04 08:23:49.330749] W [fuse-bridge.c:779:fuse_truncate_cbk] > > 0-glusterfs-fuse: 55916791: TRUNCATE() > > /dir1/data/dir2/files_encryption/keys/files/dir3/dir4/dir5/dir6/dir7/OC_DEFAULT_MODULE/file.shareKey > > => -1 (Input/output error) > > > > Best regards, > > > > M. > > > > ‐‐‐ Original Message ‐‐‐ > > > > On June 22, 2018 4:44 PM, mabi m...@protonmail.ch wrote: > > > > > Hi, > > > > > > Now that this issue has happened a few times I noticed a few things which > > > might be helpful for debugging: > > > > > > - This problem happens when files are uploaded via a cloud app called > > > Nextcloud where the files are encrypted by the app itself on the server > > > side (PHP code) but only rarely and randomly. > > > > > > - It does not seem to happen with Nextcloud installation which does not > > > have server side encryption enabled. > > > > > > - When this happens both first and second node of the replica have 120k > > > of context switches and 25k interrupts, the arbiter node 30k context > > > switches/20k interrupts. No nodes are overloaded, there is no io/wait and > > > no network issues or disconnections. > > > > > > - All of the problematic files to heal have spaces in one of their > > > sub-directories (might be totally irrelevant). > > > > > > If that's of any use my two replica nodes are Debian 8 physical > > > servers with ZFS as file system for the bricks and the arbiter is a > > > Debian 9 virtual machine with XFS as file system for the brick. To mount > > > the volume I use a glusterfs fuse mount on the web server which has > > > Nextcloud running. > > > > > > Regards, > > > > > > M. > > > > > > ‐‐‐ Original Message ‐‐‐ > > > > > > On May 25, 2018 5:55 PM, mabi m...@protonmail.ch wrote: > > > > > > > > > > Thanks Ravi. Let me know when you have time to have a look. It sort of > > > > happens around once or twice per week but today it was 24 files in one > > > > go which are unsynched and where I need to manually reset the xattrs on > > > > the arbiter node. > > > > > > > > By the way on this volume I use quotas which I set on specifc > > > > directories, I don't know if this
Re: [Gluster-users] New 3.12.7 possible split-brain on replica 3
Hi mabi, there are a couple of AFR patches from master that I'm currently back porting to the 3.12 branch: afr: heal gfids when file is not present on all bricks afr: don't update readables if inode refresh failed on all children afr: fix bug-1363721.t failure afr: add quorum checks in pre-op afr: don't treat all cases all bricks being blamed as split-brain afr: capture the correct errno in post-op quorum check afr: add quorum checks in post-op Many of these help make the transaction code more robust by fixing various corner cases. It would be great if you can wait for the next 3.12 minor release (3.12.12 ?) and upgrade to that build and see if the issues go away. Note: CC'ing Karthik and Jiffin for their help in reviewing and merging the backports for the above patches. Thanks, Ravi On 07/04/2018 06:51 PM, mabi wrote: Hello, I just wanted to let you know that last week I have upgraded my two replica nodes from Debian 8 to Debian 9 so now all my 3 nodes (including aribter) are running Debian 9 with a Linux 4 kernel. Unfortunately I still have the exact same issue. Another detail I might have not mentioned yet is that I have quotas enabled on this volume, I don't really know if that is relevant but who knows... As a reminder here is what happens on the client side which has the volume mounted via FUSE (take earlier today from the /var/log/glusterfs/mnt-myvol-private.log logfile). Note here that in this specific case it's only one single file who had this issue. [2018-07-04 08:23:49.314252] E [MSGID: 109089] [dht-helper.c:1481:dht_migration_complete_check_task] 0-myvol-private-dht: failed to open the fd (0x7fccb00a5120, flags=010) on file /dir1/data/dir2/files_encryption/keys/files/dir3/dir4/dir5/dir6/dir7/OC_DEFAULT_MODULE/file.shareKey @ myvol-replicate-0 [Input/output error] [2018-07-04 08:23:49.328712] W [MSGID: 108027] [afr-common.c:2821:afr_discover_done] 0-myvol-private-replicate-0: no read subvols for /dir1/data/dir2/files_encryption/keys/files/dir3/dir4/dir5/dir6/dir7/OC_DEFAULT_MODULE/file.shareKey [2018-07-04 08:23:49.330749] W [fuse-bridge.c:779:fuse_truncate_cbk] 0-glusterfs-fuse: 55916791: TRUNCATE() /dir1/data/dir2/files_encryption/keys/files/dir3/dir4/dir5/dir6/dir7/OC_DEFAULT_MODULE/file.shareKey => -1 (Input/output error) Best regards, M. ‐‐‐ Original Message ‐‐‐ On June 22, 2018 4:44 PM, mabi wrote: Hi, Now that this issue has happened a few times I noticed a few things which might be helpful for debugging: - This problem happens when files are uploaded via a cloud app called Nextcloud where the files are encrypted by the app itself on the server side (PHP code) but only rarely and randomly. - It does not seem to happen with Nextcloud installation which does not have server side encryption enabled. - When this happens both first and second node of the replica have 120k of context switches and 25k interrupts, the arbiter node 30k context switches/20k interrupts. No nodes are overloaded, there is no io/wait and no network issues or disconnections. - All of the problematic files to heal have spaces in one of their sub-directories (might be totally irrelevant). If that's of any use my two replica nodes are Debian 8 physical servers with ZFS as file system for the bricks and the arbiter is a Debian 9 virtual machine with XFS as file system for the brick. To mount the volume I use a glusterfs fuse mount on the web server which has Nextcloud running. Regards, M. ‐‐‐ Original Message ‐‐‐ On May 25, 2018 5:55 PM, mabi m...@protonmail.ch wrote: Thanks Ravi. Let me know when you have time to have a look. It sort of happens around once or twice per week but today it was 24 files in one go which are unsynched and where I need to manually reset the xattrs on the arbiter node. By the way on this volume I use quotas which I set on specifc directories, I don't know if this is relevant or not but thought I would just mention. ‐‐‐ Original Message ‐‐‐ On May 23, 2018 9:25 AM, Ravishankar N ravishan...@redhat.com wrote: On 05/23/2018 12:47 PM, mabi wrote: Hello, I just wanted to ask if you had time to look into this bug I am encountering and if there is anything else I can do? For now in order to get rid of these 3 unsynched files shall I do the same method that was suggested to me in this thread? Sorry Mabi, I haven't had a chance to dig deeper into this. The workaround of resetting xattrs should be fine though. Thanks, Ravi Thanks, Mabi ‐‐‐ Original Message ‐‐‐ On May 17, 2018 11:07 PM, mabi m...@protonmail.ch wrote: Hi Ravi, Please fine below the answers to your questions 1. I have never touched the cluster.quorum-type option. Currently it is set as following for this volume: Option Value cluster.quorum-type none 2. The .shareKey files are not supposed to be empty. They should be 512 bytes big a
Re: [Gluster-users] New 3.12.7 possible split-brain on replica 3
Hello, I just wanted to let you know that last week I have upgraded my two replica nodes from Debian 8 to Debian 9 so now all my 3 nodes (including aribter) are running Debian 9 with a Linux 4 kernel. Unfortunately I still have the exact same issue. Another detail I might have not mentioned yet is that I have quotas enabled on this volume, I don't really know if that is relevant but who knows... As a reminder here is what happens on the client side which has the volume mounted via FUSE (take earlier today from the /var/log/glusterfs/mnt-myvol-private.log logfile). Note here that in this specific case it's only one single file who had this issue. [2018-07-04 08:23:49.314252] E [MSGID: 109089] [dht-helper.c:1481:dht_migration_complete_check_task] 0-myvol-private-dht: failed to open the fd (0x7fccb00a5120, flags=010) on file /dir1/data/dir2/files_encryption/keys/files/dir3/dir4/dir5/dir6/dir7/OC_DEFAULT_MODULE/file.shareKey @ myvol-replicate-0 [Input/output error] [2018-07-04 08:23:49.328712] W [MSGID: 108027] [afr-common.c:2821:afr_discover_done] 0-myvol-private-replicate-0: no read subvols for /dir1/data/dir2/files_encryption/keys/files/dir3/dir4/dir5/dir6/dir7/OC_DEFAULT_MODULE/file.shareKey [2018-07-04 08:23:49.330749] W [fuse-bridge.c:779:fuse_truncate_cbk] 0-glusterfs-fuse: 55916791: TRUNCATE() /dir1/data/dir2/files_encryption/keys/files/dir3/dir4/dir5/dir6/dir7/OC_DEFAULT_MODULE/file.shareKey => -1 (Input/output error) Best regards, M. ‐‐‐ Original Message ‐‐‐ On June 22, 2018 4:44 PM, mabi wrote: > > > Hi, > > Now that this issue has happened a few times I noticed a few things which > might be helpful for debugging: > > - This problem happens when files are uploaded via a cloud app called > Nextcloud where the files are encrypted by the app itself on the server side > (PHP code) but only rarely and randomly. > - It does not seem to happen with Nextcloud installation which does not > have server side encryption enabled. > - When this happens both first and second node of the replica have 120k of > context switches and 25k interrupts, the arbiter node 30k context > switches/20k interrupts. No nodes are overloaded, there is no io/wait and no > network issues or disconnections. > - All of the problematic files to heal have spaces in one of their > sub-directories (might be totally irrelevant). > > If that's of any use my two replica nodes are Debian 8 physical servers > with ZFS as file system for the bricks and the arbiter is a Debian 9 virtual > machine with XFS as file system for the brick. To mount the volume I use a > glusterfs fuse mount on the web server which has Nextcloud running. > > Regards, > > M. > > ‐‐‐ Original Message ‐‐‐ > > On May 25, 2018 5:55 PM, mabi m...@protonmail.ch wrote: > > > > Thanks Ravi. Let me know when you have time to have a look. It sort of > > happens around once or twice per week but today it was 24 files in one go > > which are unsynched and where I need to manually reset the xattrs on the > > arbiter node. > > > > By the way on this volume I use quotas which I set on specifc directories, > > I don't know if this is relevant or not but thought I would just mention. > > > > ‐‐‐ Original Message ‐‐‐ > > > > On May 23, 2018 9:25 AM, Ravishankar N ravishan...@redhat.com wrote: > > > > > On 05/23/2018 12:47 PM, mabi wrote: > > > > > > > Hello, > > > > > > > > I just wanted to ask if you had time to look into this bug I am > > > > encountering and if there is anything else I can do? > > > > > > > > For now in order to get rid of these 3 unsynched files shall I do the > > > > same method that was suggested to me in this thread? > > > > > > Sorry Mabi, I haven't had a chance to dig deeper into this. The > > > > > > workaround of resetting xattrs should be fine though. > > > > > > Thanks, > > > > > > Ravi > > > > > > > Thanks, > > > > > > > > Mabi > > > > > > > > ‐‐‐ Original Message ‐‐‐ > > > > > > > > On May 17, 2018 11:07 PM, mabi m...@protonmail.ch wrote: > > > > > > > > > Hi Ravi, > > > > > > > > > > Please fine below the answers to your questions > > > > > > > > > > 1. I have never touched the cluster.quorum-type option. Currently it > > > > > is set as following for this volume: > > > > > > > > > > Option Value > > > > > > > > > > > > > > > cluster.quorum-type none > > > > > > > > > > 2. The .shareKey files are not supposed to be empty. They should be > > > > > 512 bytes big and contain binary data (PGP Secret Sub-key). I am not > > > > > in a position to say why it is in this specific case only 0 bytes and > > > > > if it is the fault of the software (Nextcloud) or GlusterFS. I can > > > > > just say here that I have another file server which is a simple NFS > > > > > server with another Nextcloud installation and there I never saw any > > > > > 0 bytes .shareKey files being created.
Re: [Gluster-users] New 3.12.7 possible split-brain on replica 3
Hi, Now that this issue has happened a few times I noticed a few things which might be helpful for debugging: - This problem happens when files are uploaded via a cloud app called Nextcloud where the files are encrypted by the app itself on the server side (PHP code) but only rarely and randomly. - It does not seem to happen with Nextcloud installation which does not have server side encryption enabled. - When this happens both first and second node of the replica have 120k of context switches and 25k interrupts, the arbiter node 30k context switches/20k interrupts. No nodes are overloaded, there is no io/wait and no network issues or disconnections. - All of the problematic files to heal have spaces in one of their sub-directories (might be totally irrelevant). If that's of any use my two replica nodes are Debian 8 physical servers with ZFS as file system for the bricks and the arbiter is a Debian 9 virtual machine with XFS as file system for the brick. To mount the volume I use a glusterfs fuse mount on the web server which has Nextcloud running. Regards, M. ‐‐‐ Original Message ‐‐‐ On May 25, 2018 5:55 PM, mabi wrote: > > > Thanks Ravi. Let me know when you have time to have a look. It sort of > happens around once or twice per week but today it was 24 files in one go > which are unsynched and where I need to manually reset the xattrs on the > arbiter node. > > By the way on this volume I use quotas which I set on specifc directories, I > don't know if this is relevant or not but thought I would just mention. > > ‐‐‐ Original Message ‐‐‐ > > On May 23, 2018 9:25 AM, Ravishankar N ravishan...@redhat.com wrote: > > > On 05/23/2018 12:47 PM, mabi wrote: > > > > > Hello, > > > > > > I just wanted to ask if you had time to look into this bug I am > > > encountering and if there is anything else I can do? > > > > > > For now in order to get rid of these 3 unsynched files shall I do the > > > same method that was suggested to me in this thread? > > > > Sorry Mabi, I haven't had a chance to dig deeper into this. The > > > > workaround of resetting xattrs should be fine though. > > > > Thanks, > > > > Ravi > > > > > Thanks, > > > > > > Mabi > > > > > > ‐‐‐ Original Message ‐‐‐ > > > > > > On May 17, 2018 11:07 PM, mabi m...@protonmail.ch wrote: > > > > > > > Hi Ravi, > > > > > > > > Please fine below the answers to your questions > > > > > > > > 1. I have never touched the cluster.quorum-type option. Currently it > > > > is set as following for this volume: > > > > > > > > Option Value > > > > > > > > > > > > cluster.quorum-type none > > > > > > > > 2. The .shareKey files are not supposed to be empty. They should be > > > > 512 bytes big and contain binary data (PGP Secret Sub-key). I am not in > > > > a position to say why it is in this specific case only 0 bytes and if > > > > it is the fault of the software (Nextcloud) or GlusterFS. I can just > > > > say here that I have another file server which is a simple NFS server > > > > with another Nextcloud installation and there I never saw any 0 bytes > > > > .shareKey files being created. > > > > > > > > 3. It seems to be quite random and I am not the person who uses the > > > > Nextcloud software so I can't say what it was doing at that specific > > > > time but I guess uploading files or moving files around. Basically I > > > > use GlusterFS to store the files/data of the Nextcloud web application > > > > where I have it mounted using a fuse mount (mount -t glusterfs). > > > > > > > > > > > > Regarding the logs I have attached the mount log file from the client > > > > and below are the relevant log entries from the brick log file of all 3 > > > > nodes. Let me know if you need any other log files. Also if you know > > > > any "log file sanitizer tool" which can replace sensitive file names > > > > with random file names in log files that would like to use it as right > > > > now I have to do that manually. > > > > > > > > NODE 1 brick log: > > > > > > > > [2018-05-15 06:54:20.176679] E [MSGID: 113015] > > > > [posix.c:1211:posix_opendir] 0-myvol-private-posix: opendir failed on > > > > /data/myvol-private/brick/cloud/data/admin/files_encryption/keys/files/dir/dir/anotherdir/dir/OC_DEFAULT_MODULE > > > > [No such file or directory] > > > > > > > > NODE 2 brick log: > > > > > > > > [2018-05-15 06:54:20.176415] E [MSGID: 113015] > > > > [posix.c:1211:posix_opendir] 0-myvol-private-posix: opendir failed on > > > > /data/myvol-private/brick/cloud/data/admin/files_encryption/keys/files/dir/dir/anotherdir/dir/OC_DEFAULT_MODULE > > > > [No such file or directory] > > > > > > > > NODE 3 (arbiter) brick log: > > > > > > > > [2018-05-15 06:54:19.898981] W [MSGID: 113103] > > > > [posix.c:285:posix_lookup] 0-myvol-private-posix: Found stale gfid > > > > handle > > > > /srv/glusterfs/myvol-private/brick/.glusterfs/f0/65/f065a5e7-ac06-445f
Re: [Gluster-users] New 3.12.7 possible split-brain on replica 3
Thanks Ravi. Let me know when you have time to have a look. It sort of happens around once or twice per week but today it was 24 files in one go which are unsynched and where I need to manually reset the xattrs on the arbiter node. By the way on this volume I use quotas which I set on specifc directories, I don't know if this is relevant or not but thought I would just mention. ‐‐‐ Original Message ‐‐‐ On May 23, 2018 9:25 AM, Ravishankar N wrote: > > > On 05/23/2018 12:47 PM, mabi wrote: > > > Hello, > > > > I just wanted to ask if you had time to look into this bug I am > > encountering and if there is anything else I can do? > > > > For now in order to get rid of these 3 unsynched files shall I do the same > > method that was suggested to me in this thread? > > Sorry Mabi, I haven't had a chance to dig deeper into this. The > > workaround of resetting xattrs should be fine though. > > Thanks, > > Ravi > > > Thanks, > > > > Mabi > > > > ‐‐‐ Original Message ‐‐‐ > > > > On May 17, 2018 11:07 PM, mabi m...@protonmail.ch wrote: > > > > > Hi Ravi, > > > > > > Please fine below the answers to your questions > > > > > > 1. I have never touched the cluster.quorum-type option. Currently it is > > > set as following for this volume: > > > > > > Option Value > > > > > > > > > cluster.quorum-type none > > > > > > 2. The .shareKey files are not supposed to be empty. They should be 512 > > > bytes big and contain binary data (PGP Secret Sub-key). I am not in a > > > position to say why it is in this specific case only 0 bytes and if it is > > > the fault of the software (Nextcloud) or GlusterFS. I can just say here > > > that I have another file server which is a simple NFS server with another > > > Nextcloud installation and there I never saw any 0 bytes .shareKey files > > > being created. > > > > > > 3. It seems to be quite random and I am not the person who uses the > > > Nextcloud software so I can't say what it was doing at that specific time > > > but I guess uploading files or moving files around. Basically I use > > > GlusterFS to store the files/data of the Nextcloud web application where > > > I have it mounted using a fuse mount (mount -t glusterfs). > > > > > > > > > Regarding the logs I have attached the mount log file from the client and > > > below are the relevant log entries from the brick log file of all 3 > > > nodes. Let me know if you need any other log files. Also if you know any > > > "log file sanitizer tool" which can replace sensitive file names with > > > random file names in log files that would like to use it as right now I > > > have to do that manually. > > > > > > NODE 1 brick log: > > > > > > [2018-05-15 06:54:20.176679] E [MSGID: 113015] > > > [posix.c:1211:posix_opendir] 0-myvol-private-posix: opendir failed on > > > /data/myvol-private/brick/cloud/data/admin/files_encryption/keys/files/dir/dir/anotherdir/dir/OC_DEFAULT_MODULE > > > [No such file or directory] > > > > > > NODE 2 brick log: > > > > > > [2018-05-15 06:54:20.176415] E [MSGID: 113015] > > > [posix.c:1211:posix_opendir] 0-myvol-private-posix: opendir failed on > > > /data/myvol-private/brick/cloud/data/admin/files_encryption/keys/files/dir/dir/anotherdir/dir/OC_DEFAULT_MODULE > > > [No such file or directory] > > > > > > NODE 3 (arbiter) brick log: > > > > > > [2018-05-15 06:54:19.898981] W [MSGID: 113103] [posix.c:285:posix_lookup] > > > 0-myvol-private-posix: Found stale gfid handle > > > /srv/glusterfs/myvol-private/brick/.glusterfs/f0/65/f065a5e7-ac06-445f-add0-83acf8ce4155, > > > removing it. [Stale file handle] > > > > > > [2018-05-15 06:54:20.056196] W [MSGID: 113103] [posix.c:285:posix_lookup] > > > 0-myvol-private-posix: Found stale gfid handle > > > /srv/glusterfs/myvol-private/brick/.glusterfs/8f/a1/8fa15dbd-cd5c-4900-b889-0fe7fce46a13, > > > removing it. [Stale file handle] > > > > > > [2018-05-15 06:54:20.172823] I [MSGID: 115056] > > > [server-rpc-fops.c:485:server_rmdir_cbk] 0-myvol-private-server: > > > 14740125: RMDIR > > > /cloud/data/admin/files_encryption/keys/files/dir/dir/anotherdir/dir/OC_DEFAULT_MODULE > > > (f065a5e7-ac06-445f-add0-83acf8ce4155/OC_DEFAULT_MODULE), client: > > > nextcloud.domain.com-7972-2018/05/10-20:31:46:163206-myvol-private-client-2-0-0, > > > error-xlator: myvol-private-posix [Directory not empty] > > > > > > [2018-05-15 06:54:20.190911] I [MSGID: 115056] > > > [server-rpc-fops.c:485:server_rmdir_cbk] 0-myvol-private-server: > > > 14740141: RMDIR > > > /cloud/data/admin/files_encryption/keys/files/dir/dir/anotherdir/dir > > > (72a1613e-2ac0-48bd-8ace-f2f723f3796c/2016.03.15 > > > AVB_Photovoltaik-Versicherung 2013.pdf), client: > > > nextcloud.domain.com-7972-2018/05/10-20:31:46:163206-myvol-private-client-2-0-0, > > > error-xlator: myvol-private-posix [Directory not empty] > > > > > > Best regards, > > > > > > Mabi > > > > > > ‐‐‐ Original M
Re: [Gluster-users] New 3.12.7 possible split-brain on replica 3
On 05/23/2018 12:47 PM, mabi wrote: Hello, I just wanted to ask if you had time to look into this bug I am encountering and if there is anything else I can do? For now in order to get rid of these 3 unsynched files shall I do the same method that was suggested to me in this thread? Sorry Mabi, I haven't had a chance to dig deeper into this. The workaround of resetting xattrs should be fine though. Thanks, Ravi Thanks, Mabi ‐‐‐ Original Message ‐‐‐ On May 17, 2018 11:07 PM, mabi wrote: Hi Ravi, Please fine below the answers to your questions 1. I have never touched the cluster.quorum-type option. Currently it is set as following for this volume: Option Value cluster.quorum-type none 2) The .shareKey files are not supposed to be empty. They should be 512 bytes big and contain binary data (PGP Secret Sub-key). I am not in a position to say why it is in this specific case only 0 bytes and if it is the fault of the software (Nextcloud) or GlusterFS. I can just say here that I have another file server which is a simple NFS server with another Nextcloud installation and there I never saw any 0 bytes .shareKey files being created. 3) It seems to be quite random and I am not the person who uses the Nextcloud software so I can't say what it was doing at that specific time but I guess uploading files or moving files around. Basically I use GlusterFS to store the files/data of the Nextcloud web application where I have it mounted using a fuse mount (mount -t glusterfs). Regarding the logs I have attached the mount log file from the client and below are the relevant log entries from the brick log file of all 3 nodes. Let me know if you need any other log files. Also if you know any "log file sanitizer tool" which can replace sensitive file names with random file names in log files that would like to use it as right now I have to do that manually. NODE 1 brick log: [2018-05-15 06:54:20.176679] E [MSGID: 113015] [posix.c:1211:posix_opendir] 0-myvol-private-posix: opendir failed on /data/myvol-private/brick/cloud/data/admin/files_encryption/keys/files/dir/dir/anotherdir/dir/OC_DEFAULT_MODULE [No such file or directory] NODE 2 brick log: [2018-05-15 06:54:20.176415] E [MSGID: 113015] [posix.c:1211:posix_opendir] 0-myvol-private-posix: opendir failed on /data/myvol-private/brick/cloud/data/admin/files_encryption/keys/files/dir/dir/anotherdir/dir/OC_DEFAULT_MODULE [No such file or directory] NODE 3 (arbiter) brick log: [2018-05-15 06:54:19.898981] W [MSGID: 113103] [posix.c:285:posix_lookup] 0-myvol-private-posix: Found stale gfid handle /srv/glusterfs/myvol-private/brick/.glusterfs/f0/65/f065a5e7-ac06-445f-add0-83acf8ce4155, removing it. [Stale file handle] [2018-05-15 06:54:20.056196] W [MSGID: 113103] [posix.c:285:posix_lookup] 0-myvol-private-posix: Found stale gfid handle /srv/glusterfs/myvol-private/brick/.glusterfs/8f/a1/8fa15dbd-cd5c-4900-b889-0fe7fce46a13, removing it. [Stale file handle] [2018-05-15 06:54:20.172823] I [MSGID: 115056] [server-rpc-fops.c:485:server_rmdir_cbk] 0-myvol-private-server: 14740125: RMDIR /cloud/data/admin/files_encryption/keys/files/dir/dir/anotherdir/dir/OC_DEFAULT_MODULE (f065a5e7-ac06-445f-add0-83acf8ce4155/OC_DEFAULT_MODULE), client: nextcloud.domain.com-7972-2018/05/10-20:31:46:163206-myvol-private-client-2-0-0, error-xlator: myvol-private-posix [Directory not empty] [2018-05-15 06:54:20.190911] I [MSGID: 115056] [server-rpc-fops.c:485:server_rmdir_cbk] 0-myvol-private-server: 14740141: RMDIR /cloud/data/admin/files_encryption/keys/files/dir/dir/anotherdir/dir (72a1613e-2ac0-48bd-8ace-f2f723f3796c/2016.03.15 AVB_Photovoltaik-Versicherung 2013.pdf), client: nextcloud.domain.com-7972-2018/05/10-20:31:46:163206-myvol-private-client-2-0-0, error-xlator: myvol-private-posix [Directory not empty] Best regards, Mabi ‐‐‐ Original Message ‐‐‐ On May 17, 2018 7:00 AM, Ravishankar N ravishan...@redhat.com wrote: Hi mabi, Some questions: -Did you by any chance change the cluster.quorum-type option from the default values? -Is filename.shareKey supposed to be any empty file? Looks like the file was fallocated with the keep-size option but never written to. (On the 2 data bricks, stat output shows Size =0, but non zero Blocks and yet a 'regular empty file'). -Do you have some sort of a reproducer/ steps that you perform when the issue occurs? Please also share the logs from all 3 nodes and the client(s). Thanks, Ravi On 05/15/2018 05:26 PM, mabi wrote: Thank you Ravi for your fast answer. As requested you will find below the "stat" and "getfattr" of one of the files and its parent directory from all three nodes of my cluster. NODE 1: File: ‘/data/myvolume-private/brick/dir1/dir2/dir3/dir4/dir5/dir6/dir7/dir8/dir9/dir10/OC_DEFAULT_MODULE/filename.shareKey’ Size: 0 Blocks: 38 IO Block: 131072 regular empty file Device: 23h/35d Inode: 744413 Links: 2 Access: (0644
Re: [Gluster-users] New 3.12.7 possible split-brain on replica 3
Hello, I just wanted to ask if you had time to look into this bug I am encountering and if there is anything else I can do? For now in order to get rid of these 3 unsynched files shall I do the same method that was suggested to me in this thread? Thanks, Mabi ‐‐‐ Original Message ‐‐‐ On May 17, 2018 11:07 PM, mabi wrote: > > > Hi Ravi, > > Please fine below the answers to your questions > > 1. I have never touched the cluster.quorum-type option. Currently it is set > as following for this volume: > > Option Value > > > cluster.quorum-type none > > 2) The .shareKey files are not supposed to be empty. They should be 512 bytes > big and contain binary data (PGP Secret Sub-key). I am not in a position to > say why it is in this specific case only 0 bytes and if it is the fault of > the software (Nextcloud) or GlusterFS. I can just say here that I have > another file server which is a simple NFS server with another Nextcloud > installation and there I never saw any 0 bytes .shareKey files being created. > > 3) It seems to be quite random and I am not the person who uses the Nextcloud > software so I can't say what it was doing at that specific time but I guess > uploading files or moving files around. Basically I use GlusterFS to store > the files/data of the Nextcloud web application where I have it mounted using > a fuse mount (mount -t glusterfs). > > Regarding the logs I have attached the mount log file from the client and > below are the relevant log entries from the brick log file of all 3 nodes. > Let me know if you need any other log files. Also if you know any "log file > sanitizer tool" which can replace sensitive file names with random file names > in log files that would like to use it as right now I have to do that > manually. > > NODE 1 brick log: > > [2018-05-15 06:54:20.176679] E [MSGID: 113015] [posix.c:1211:posix_opendir] > 0-myvol-private-posix: opendir failed on > /data/myvol-private/brick/cloud/data/admin/files_encryption/keys/files/dir/dir/anotherdir/dir/OC_DEFAULT_MODULE > [No such file or directory] > > NODE 2 brick log: > > [2018-05-15 06:54:20.176415] E [MSGID: 113015] [posix.c:1211:posix_opendir] > 0-myvol-private-posix: opendir failed on > /data/myvol-private/brick/cloud/data/admin/files_encryption/keys/files/dir/dir/anotherdir/dir/OC_DEFAULT_MODULE > [No such file or directory] > > NODE 3 (arbiter) brick log: > > [2018-05-15 06:54:19.898981] W [MSGID: 113103] [posix.c:285:posix_lookup] > 0-myvol-private-posix: Found stale gfid handle > /srv/glusterfs/myvol-private/brick/.glusterfs/f0/65/f065a5e7-ac06-445f-add0-83acf8ce4155, > removing it. [Stale file handle] > > [2018-05-15 06:54:20.056196] W [MSGID: 113103] [posix.c:285:posix_lookup] > 0-myvol-private-posix: Found stale gfid handle > /srv/glusterfs/myvol-private/brick/.glusterfs/8f/a1/8fa15dbd-cd5c-4900-b889-0fe7fce46a13, > removing it. [Stale file handle] > > [2018-05-15 06:54:20.172823] I [MSGID: 115056] > [server-rpc-fops.c:485:server_rmdir_cbk] 0-myvol-private-server: 14740125: > RMDIR > /cloud/data/admin/files_encryption/keys/files/dir/dir/anotherdir/dir/OC_DEFAULT_MODULE > (f065a5e7-ac06-445f-add0-83acf8ce4155/OC_DEFAULT_MODULE), client: > nextcloud.domain.com-7972-2018/05/10-20:31:46:163206-myvol-private-client-2-0-0, > error-xlator: myvol-private-posix [Directory not empty] > > [2018-05-15 06:54:20.190911] I [MSGID: 115056] > [server-rpc-fops.c:485:server_rmdir_cbk] 0-myvol-private-server: 14740141: > RMDIR /cloud/data/admin/files_encryption/keys/files/dir/dir/anotherdir/dir > (72a1613e-2ac0-48bd-8ace-f2f723f3796c/2016.03.15 > AVB_Photovoltaik-Versicherung 2013.pdf), client: > nextcloud.domain.com-7972-2018/05/10-20:31:46:163206-myvol-private-client-2-0-0, > error-xlator: myvol-private-posix [Directory not empty] > > Best regards, > > Mabi > > ‐‐‐ Original Message ‐‐‐ > > On May 17, 2018 7:00 AM, Ravishankar N ravishan...@redhat.com wrote: > > > Hi mabi, > > > > Some questions: > > > > -Did you by any chance change the cluster.quorum-type option from the > > > > default values? > > > > -Is filename.shareKey supposed to be any empty file? Looks like the file > > > > was fallocated with the keep-size option but never written to. (On the 2 > > > > data bricks, stat output shows Size =0, but non zero Blocks and yet a > > > > 'regular empty file'). > > > > -Do you have some sort of a reproducer/ steps that you perform when the > > > > issue occurs? Please also share the logs from all 3 nodes and the client(s). > > > > Thanks, > > > > Ravi > > > > On 05/15/2018 05:26 PM, mabi wrote: > > > > > Thank you Ravi for your fast answer. As requested you will find below the > > > "stat" and "getfattr" of one of the files and its parent directory from > > > all three nodes of my cluster. > > > > > > NODE 1: > > > > > > File: > > > ‘/data/myvolume-private/brick/dir1/dir2/dir3/dir4/dir5/dir6/dir7/dir8/dir9/dir10/OC_DEFAU
Re: [Gluster-users] New 3.12.7 possible split-brain on replica 3
Hi Ravi, Please fine below the answers to your questions 1) I have never touched the cluster.quorum-type option. Currently it is set as following for this volume: Option Value -- - cluster.quorum-type none 2) The .shareKey files are not supposed to be empty. They should be 512 bytes big and contain binary data (PGP Secret Sub-key). I am not in a position to say why it is in this specific case only 0 bytes and if it is the fault of the software (Nextcloud) or GlusterFS. I can just say here that I have another file server which is a simple NFS server with another Nextcloud installation and there I never saw any 0 bytes .shareKey files being created. 3) It seems to be quite random and I am not the person who uses the Nextcloud software so I can't say what it was doing at that specific time but I guess uploading files or moving files around. Basically I use GlusterFS to store the files/data of the Nextcloud web application where I have it mounted using a fuse mount (mount -t glusterfs). Regarding the logs I have attached the mount log file from the client and below are the relevant log entries from the brick log file of all 3 nodes. Let me know if you need any other log files. Also if you know any "log file sanitizer tool" which can replace sensitive file names with random file names in log files that would like to use it as right now I have to do that manually. NODE 1 brick log: [2018-05-15 06:54:20.176679] E [MSGID: 113015] [posix.c:1211:posix_opendir] 0-myvol-private-posix: opendir failed on /data/myvol-private/brick/cloud/data/admin/files_encryption/keys/files/dir/dir/anotherdir/dir/OC_DEFAULT_MODULE [No such file or directory] NODE 2 brick log: [2018-05-15 06:54:20.176415] E [MSGID: 113015] [posix.c:1211:posix_opendir] 0-myvol-private-posix: opendir failed on /data/myvol-private/brick/cloud/data/admin/files_encryption/keys/files/dir/dir/anotherdir/dir/OC_DEFAULT_MODULE [No such file or directory] NODE 3 (arbiter) brick log: [2018-05-15 06:54:19.898981] W [MSGID: 113103] [posix.c:285:posix_lookup] 0-myvol-private-posix: Found stale gfid handle /srv/glusterfs/myvol-private/brick/.glusterfs/f0/65/f065a5e7-ac06-445f-add0-83acf8ce4155, removing it. [Stale file handle] [2018-05-15 06:54:20.056196] W [MSGID: 113103] [posix.c:285:posix_lookup] 0-myvol-private-posix: Found stale gfid handle /srv/glusterfs/myvol-private/brick/.glusterfs/8f/a1/8fa15dbd-cd5c-4900-b889-0fe7fce46a13, removing it. [Stale file handle] [2018-05-15 06:54:20.172823] I [MSGID: 115056] [server-rpc-fops.c:485:server_rmdir_cbk] 0-myvol-private-server: 14740125: RMDIR /cloud/data/admin/files_encryption/keys/files/dir/dir/anotherdir/dir/OC_DEFAULT_MODULE (f065a5e7-ac06-445f-add0-83acf8ce4155/OC_DEFAULT_MODULE), client: nextcloud.domain.com-7972-2018/05/10-20:31:46:163206-myvol-private-client-2-0-0, error-xlator: myvol-private-posix [Directory not empty] [2018-05-15 06:54:20.190911] I [MSGID: 115056] [server-rpc-fops.c:485:server_rmdir_cbk] 0-myvol-private-server: 14740141: RMDIR /cloud/data/admin/files_encryption/keys/files/dir/dir/anotherdir/dir (72a1613e-2ac0-48bd-8ace-f2f723f3796c/2016.03.15 AVB_Photovoltaik-Versicherung 2013.pdf), client: nextcloud.domain.com-7972-2018/05/10-20:31:46:163206-myvol-private-client-2-0-0, error-xlator: myvol-private-posix [Directory not empty] Best regards, Mabi ‐‐‐ Original Message ‐‐‐ On May 17, 2018 7:00 AM, Ravishankar N wrote: > > > Hi mabi, > > Some questions: > > -Did you by any chance change the cluster.quorum-type option from the > > default values? > > -Is filename.shareKey supposed to be any empty file? Looks like the file > > was fallocated with the keep-size option but never written to. (On the 2 > > data bricks, stat output shows Size =0, but non zero Blocks and yet a > > 'regular empty file'). > > -Do you have some sort of a reproducer/ steps that you perform when the > > issue occurs? Please also share the logs from all 3 nodes and the client(s). > > Thanks, > > Ravi > > On 05/15/2018 05:26 PM, mabi wrote: > > > Thank you Ravi for your fast answer. As requested you will find below the > > "stat" and "getfattr" of one of the files and its parent directory from all > > three nodes of my cluster. > > > > NODE 1: > > > > File: > > ‘/data/myvolume-private/brick/dir1/dir2/dir3/dir4/dir5/dir6/dir7/dir8/dir9/dir10/OC_DEFAULT_MODULE/filename.shareKey’ > > > > Size: 0 Blocks: 38 IO Block: 131072 regular empty file > > > > Device: 23h/35d Inode: 744413 Links: 2 > > > > Access: (0644/-rw-r--r--) Uid: (20936/ UNKNOWN) Gid: (20936/ UNKNOWN) > > > > Access: 2018-05-15 08:54:20.296048887 +0200 > > > > Modify: 2018-05-15 08:54:20.296048887 +0200 > > > > Change: 2018-05-15 08:54:20.340048505 +0200 > > > > Birth: - > > > > File: > > ‘/data/myvolume-pri
Re: [Gluster-users] New 3.12.7 possible split-brain on replica 3
Hi mabi, Some questions: -Did you by any chance change the cluster.quorum-type option from the default values? -Is filename.shareKey supposed to be any empty file? Looks like the file was fallocated with the keep-size option but never written to. (On the 2 data bricks, stat output shows Size =0, but non zero Blocks and yet a 'regular empty file'). -Do you have some sort of a reproducer/ steps that you perform when the issue occurs? Please also share the logs from all 3 nodes and the client(s). Thanks, Ravi On 05/15/2018 05:26 PM, mabi wrote: Thank you Ravi for your fast answer. As requested you will find below the "stat" and "getfattr" of one of the files and its parent directory from all three nodes of my cluster. NODE 1: File: ‘/data/myvolume-private/brick/dir1/dir2/dir3/dir4/dir5/dir6/dir7/dir8/dir9/dir10/OC_DEFAULT_MODULE/filename.shareKey’ Size: 0 Blocks: 38 IO Block: 131072 regular empty file Device: 23h/35d Inode: 744413 Links: 2 Access: (0644/-rw-r--r--) Uid: (20936/ UNKNOWN) Gid: (20936/ UNKNOWN) Access: 2018-05-15 08:54:20.296048887 +0200 Modify: 2018-05-15 08:54:20.296048887 +0200 Change: 2018-05-15 08:54:20.340048505 +0200 Birth: - File: ‘/data/myvolume-private/brick/dir1/dir2/dir3/dir4/dir5/dir6/dir7/dir8/dir9/dir10/OC_DEFAULT_MODULE/’ Size: 8 Blocks: 74 IO Block: 131072 directory Device: 23h/35d Inode: 744410 Links: 2 Access: (0755/drwxr-xr-x) Uid: (20936/ UNKNOWN) Gid: (20936/ UNKNOWN) Access: 2018-04-25 09:41:24.276780766 +0200 Modify: 2018-05-15 08:54:20.392048056 +0200 Change: 2018-05-15 08:54:20.392048056 +0200 Birth: - # file: data/myvolume-private/brick/dir1/dir2/dir3/dir4/dir5/dir6/dir7/dir8/dir9/dir10/OC_DEFAULT_MODULE/filename.shareKey trusted.gfid=0x3b6c722cd6c64a4180fa028809671d63 trusted.gfid2path.9cb852a48fe5e361=0x38666131356462642d636435632d343930302d623838392d3066653766636534366131332f6e6361646d696e6973747261746f722e73686172654b6579 trusted.glusterfs.quota.8fa15dbd-cd5c-4900-b889-0fe7fce46a13.contri.1=0x0001 trusted.pgfid.8fa15dbd-cd5c-4900-b889-0fe7fce46a13=0x0001 # file: data/myvolume-private/brick/dir1/dir2/dir3/dir4/dir5/dir6/dir7/dir8/dir9/dir10/OC_DEFAULT_MODULE/ trusted.gfid=0x8fa15dbdcd5c4900b8890fe7fce46a13 trusted.glusterfs.dht=0x0001 trusted.glusterfs.quota.dirty=0x3000 trusted.glusterfs.quota.f065a5e7-ac06-445f-add0-83acf8ce4155.contri.1=0x06060001 trusted.glusterfs.quota.size.1=0x06060001 NODE 2: File: ‘/data/myvolume-private/brick/dir1/dir2/dir3/dir4/dir5/dir6/dir7/dir8/dir9/dir10/OC_DEFAULT_MODULE/filename.shareKey’ Size: 0 Blocks: 38 IO Block: 131072 regular empty file Device: 24h/36d Inode: 5428150 Links: 2 Access: (0644/-rw-r--r--) Uid: (20936/ UNKNOWN) Gid: (20936/ UNKNOWN) Access: 2018-05-15 08:54:20.294280254 +0200 Modify: 2018-05-15 08:54:20.294280254 +0200 Change: 2018-05-15 08:54:20.338279576 +0200 Birth: - File: ‘/data/myvolume-private/brick/dir1/dir2/dir3/dir4/dir5/dir6/dir7/dir8/dir9/dir10/OC_DEFAULT_MODULE/’ Size: 8 Blocks: 74 IO Block: 131072 directory Device: 24h/36d Inode: 5428147 Links: 2 Access: (0755/drwxr-xr-x) Uid: (20936/ UNKNOWN) Gid: (20936/ UNKNOWN) Access: 2018-04-25 09:41:24.276780766 +0200 Modify: 2018-05-15 08:54:20.394278717 +0200 Change: 2018-05-15 08:54:20.394278717 +0200 Birth: - # file: data/myvolume-private/brick/dir1/dir2/dir3/dir4/dir5/dir6/dir7/dir8/dir9/dir10/OC_DEFAULT_MODULE/filename.shareKey trusted.gfid=0x3b6c722cd6c64a4180fa028809671d63 trusted.gfid2path.9cb852a48fe5e361=0x38666131356462642d636435632d343930302d623838392d3066653766636534366131332f6e6361646d696e6973747261746f722e73686172654b6579 trusted.glusterfs.quota.8fa15dbd-cd5c-4900-b889-0fe7fce46a13.contri.1=0x0001 trusted.pgfid.8fa15dbd-cd5c-4900-b889-0fe7fce46a13=0x0001 # file: data/myvolume-private/brick/dir1/dir2/dir3/dir4/dir5/dir6/dir7/dir8/dir9/dir10/OC_DEFAULT_MODULE/ trusted.gfid=0x8fa15dbdcd5c4900b8890fe7fce46a13 trusted.glusterfs.dht=0x0001 trusted.glusterfs.quota.dirty=0x3000 trusted.glusterfs.quota.f065a5e7-ac06-445f-add0-83acf8ce4155.contri.1=0x06060001 trusted.glusterfs.quota.size.1=0x06060001 NODE 3 (arbiter): File: /srv/glusterfs/myvolume-private/brick/dir1/dir2/dir3/dir4/dir5/dir6/dir7/dir8/dir9/dir10/OC_DEFAULT_MODULE/filename.shareKey Size: 0 Blocks: 8 IO Block: 4096 regular empty file Device: ca11h/51729dInode: 271434295 Links: 2 Access: (0644/-rw-r--r--) Uid: (20936/ UNKNOWN) Gid: (20936/ UNKNOWN) Access: 2018-04-25 09:41:24.322527555 +0200 Modify: 2018-04-25 09:41:24.322527555 +0200 Change: 2018-05-15 08:54:20.343667380 +0200 Birth: - File: /srv
Re: [Gluster-users] New 3.12.7 possible split-brain on replica 3
Thank you Ravi for your fast answer. As requested you will find below the "stat" and "getfattr" of one of the files and its parent directory from all three nodes of my cluster. NODE 1: File: ‘/data/myvolume-private/brick/dir1/dir2/dir3/dir4/dir5/dir6/dir7/dir8/dir9/dir10/OC_DEFAULT_MODULE/filename.shareKey’ Size: 0 Blocks: 38 IO Block: 131072 regular empty file Device: 23h/35d Inode: 744413 Links: 2 Access: (0644/-rw-r--r--) Uid: (20936/ UNKNOWN) Gid: (20936/ UNKNOWN) Access: 2018-05-15 08:54:20.296048887 +0200 Modify: 2018-05-15 08:54:20.296048887 +0200 Change: 2018-05-15 08:54:20.340048505 +0200 Birth: - File: ‘/data/myvolume-private/brick/dir1/dir2/dir3/dir4/dir5/dir6/dir7/dir8/dir9/dir10/OC_DEFAULT_MODULE/’ Size: 8 Blocks: 74 IO Block: 131072 directory Device: 23h/35d Inode: 744410 Links: 2 Access: (0755/drwxr-xr-x) Uid: (20936/ UNKNOWN) Gid: (20936/ UNKNOWN) Access: 2018-04-25 09:41:24.276780766 +0200 Modify: 2018-05-15 08:54:20.392048056 +0200 Change: 2018-05-15 08:54:20.392048056 +0200 Birth: - # file: data/myvolume-private/brick/dir1/dir2/dir3/dir4/dir5/dir6/dir7/dir8/dir9/dir10/OC_DEFAULT_MODULE/filename.shareKey trusted.gfid=0x3b6c722cd6c64a4180fa028809671d63 trusted.gfid2path.9cb852a48fe5e361=0x38666131356462642d636435632d343930302d623838392d3066653766636534366131332f6e6361646d696e6973747261746f722e73686172654b6579 trusted.glusterfs.quota.8fa15dbd-cd5c-4900-b889-0fe7fce46a13.contri.1=0x0001 trusted.pgfid.8fa15dbd-cd5c-4900-b889-0fe7fce46a13=0x0001 # file: data/myvolume-private/brick/dir1/dir2/dir3/dir4/dir5/dir6/dir7/dir8/dir9/dir10/OC_DEFAULT_MODULE/ trusted.gfid=0x8fa15dbdcd5c4900b8890fe7fce46a13 trusted.glusterfs.dht=0x0001 trusted.glusterfs.quota.dirty=0x3000 trusted.glusterfs.quota.f065a5e7-ac06-445f-add0-83acf8ce4155.contri.1=0x06060001 trusted.glusterfs.quota.size.1=0x06060001 NODE 2: File: ‘/data/myvolume-private/brick/dir1/dir2/dir3/dir4/dir5/dir6/dir7/dir8/dir9/dir10/OC_DEFAULT_MODULE/filename.shareKey’ Size: 0 Blocks: 38 IO Block: 131072 regular empty file Device: 24h/36d Inode: 5428150 Links: 2 Access: (0644/-rw-r--r--) Uid: (20936/ UNKNOWN) Gid: (20936/ UNKNOWN) Access: 2018-05-15 08:54:20.294280254 +0200 Modify: 2018-05-15 08:54:20.294280254 +0200 Change: 2018-05-15 08:54:20.338279576 +0200 Birth: - File: ‘/data/myvolume-private/brick/dir1/dir2/dir3/dir4/dir5/dir6/dir7/dir8/dir9/dir10/OC_DEFAULT_MODULE/’ Size: 8 Blocks: 74 IO Block: 131072 directory Device: 24h/36d Inode: 5428147 Links: 2 Access: (0755/drwxr-xr-x) Uid: (20936/ UNKNOWN) Gid: (20936/ UNKNOWN) Access: 2018-04-25 09:41:24.276780766 +0200 Modify: 2018-05-15 08:54:20.394278717 +0200 Change: 2018-05-15 08:54:20.394278717 +0200 Birth: - # file: data/myvolume-private/brick/dir1/dir2/dir3/dir4/dir5/dir6/dir7/dir8/dir9/dir10/OC_DEFAULT_MODULE/filename.shareKey trusted.gfid=0x3b6c722cd6c64a4180fa028809671d63 trusted.gfid2path.9cb852a48fe5e361=0x38666131356462642d636435632d343930302d623838392d3066653766636534366131332f6e6361646d696e6973747261746f722e73686172654b6579 trusted.glusterfs.quota.8fa15dbd-cd5c-4900-b889-0fe7fce46a13.contri.1=0x0001 trusted.pgfid.8fa15dbd-cd5c-4900-b889-0fe7fce46a13=0x0001 # file: data/myvolume-private/brick/dir1/dir2/dir3/dir4/dir5/dir6/dir7/dir8/dir9/dir10/OC_DEFAULT_MODULE/ trusted.gfid=0x8fa15dbdcd5c4900b8890fe7fce46a13 trusted.glusterfs.dht=0x0001 trusted.glusterfs.quota.dirty=0x3000 trusted.glusterfs.quota.f065a5e7-ac06-445f-add0-83acf8ce4155.contri.1=0x06060001 trusted.glusterfs.quota.size.1=0x06060001 NODE 3 (arbiter): File: /srv/glusterfs/myvolume-private/brick/dir1/dir2/dir3/dir4/dir5/dir6/dir7/dir8/dir9/dir10/OC_DEFAULT_MODULE/filename.shareKey Size: 0 Blocks: 8 IO Block: 4096 regular empty file Device: ca11h/51729dInode: 271434295 Links: 2 Access: (0644/-rw-r--r--) Uid: (20936/ UNKNOWN) Gid: (20936/ UNKNOWN) Access: 2018-04-25 09:41:24.322527555 +0200 Modify: 2018-04-25 09:41:24.322527555 +0200 Change: 2018-05-15 08:54:20.343667380 +0200 Birth: - File: /srv/glusterfs/myvolume-private/brick/dir1/dir2/dir3/dir4/dir5/dir6/dir7/dir8/dir9/dir10/OC_DEFAULT_MODULE/ Size: 8192Blocks: 24 IO Block: 4096 directory Device: ca11h/51729dInode: 271434288 Links: 2 Access: (0755/drwxr-xr-x) Uid: (20936/ UNKNOWN) Gid: (20936/ UNKNOWN) Access: 2018-04-25 09:41:24.276780766 +0200 Modify: 2018-05-15 08:54:20.391667997 +0200 Change: 2018-05-15 08:54:20.395668048 +0200 Birth: - # file: srv/glusterfs/myvolume-private/brick/dir1/dir2/dir3/dir4/dir5/dir6/dir7/dir8/dir9/dir10/OC_DEFAULT_MODULE/filename.sh
Re: [Gluster-users] New 3.12.7 possible split-brain on replica 3
On 05/15/2018 12:38 PM, mabi wrote: Dear all, I have upgraded my replica 3 GlusterFS cluster (and clients) last Friday from 3.12.7 to 3.12.9 in order to fix this bug but unfortunately I notice that I still have exactly the same problem as initially posted in this thread. It looks like this bug is not resolved as I just got right now 3 unsynched files on my arbiter node like I used to do before upgrading. This problem started since I upgraded to 3.12.7... Could you provide the stat and 'getfattr -d -m . - hex brick/path/to/file' outputs of one of these files and also the corresponding parent directory from all 3 bricks? Thanks, Ravi Thank you very much in advance for your advise. Best regards, Mabi ‐‐‐ Original Message ‐‐‐ On April 9, 2018 2:31 PM, Ravishankar N wrote: On 04/09/2018 05:54 PM, Dmitry Melekhov wrote: 09.04.2018 16:18, Ravishankar N пишет: On 04/09/2018 05:40 PM, mabi wrote: Again thanks that worked and I have now no more unsynched files. You mentioned that this bug has been fixed in 3.13, would it be possible to backport it to 3.12? I am asking because 3.13 is not a long-term release and as such I would not like to have to upgrade to 3.13. I don't think there will be another 3.12 release. Why not? It is LTS, right? My bad. Just checked the schedule [1], and you are right. It is LTM. [1] https://www.gluster.org/release-schedule/ Gluster-users mailing list Gluster-users@gluster.org http://lists.gluster.org/mailman/listinfo/gluster-users Gluster-users mailing list Gluster-users@gluster.org http://lists.gluster.org/mailman/listinfo/gluster-users ___ Gluster-users mailing list Gluster-users@gluster.org http://lists.gluster.org/mailman/listinfo/gluster-users
Re: [Gluster-users] New 3.12.7 possible split-brain on replica 3
Dear all, I have upgraded my replica 3 GlusterFS cluster (and clients) last Friday from 3.12.7 to 3.12.9 in order to fix this bug but unfortunately I notice that I still have exactly the same problem as initially posted in this thread. It looks like this bug is not resolved as I just got right now 3 unsynched files on my arbiter node like I used to do before upgrading. This problem started since I upgraded to 3.12.7... Thank you very much in advance for your advise. Best regards, Mabi ‐‐‐ Original Message ‐‐‐ On April 9, 2018 2:31 PM, Ravishankar N wrote: > > > On 04/09/2018 05:54 PM, Dmitry Melekhov wrote: > > > 09.04.2018 16:18, Ravishankar N пишет: > > > > > On 04/09/2018 05:40 PM, mabi wrote: > > > > > > > Again thanks that worked and I have now no more unsynched files. > > > > > > > > You mentioned that this bug has been fixed in 3.13, would it be > > > > > > > > possible to backport it to 3.12? I am asking because 3.13 is not a > > > > > > > > long-term release and as such I would not like to have to upgrade to > > > > > > > > 3.13. > > > > > > I don't think there will be another 3.12 release. > > > > Why not? It is LTS, right? > > My bad. Just checked the schedule [1], and you are right. It is LTM. > > [1] https://www.gluster.org/release-schedule/ > > > Gluster-users mailing list > > > > Gluster-users@gluster.org > > > > http://lists.gluster.org/mailman/listinfo/gluster-users > > Gluster-users mailing list > > Gluster-users@gluster.org > > http://lists.gluster.org/mailman/listinfo/gluster-users ___ Gluster-users mailing list Gluster-users@gluster.org http://lists.gluster.org/mailman/listinfo/gluster-users
Re: [Gluster-users] New 3.12.7 possible split-brain on replica 3
Thanks for clarifying that point. Does this mean that the fix for this bug will get backported to the next 3.12 release? ‐‐‐ Original Message ‐‐‐ On April 9, 2018 2:31 PM, Ravishankar N wrote: > > > On 04/09/2018 05:54 PM, Dmitry Melekhov wrote: > > > 09.04.2018 16:18, Ravishankar N пишет: > > > > > On 04/09/2018 05:40 PM, mabi wrote: > > > > > > > Again thanks that worked and I have now no more unsynched files. > > > > > > > > You mentioned that this bug has been fixed in 3.13, would it be > > > > > > > > possible to backport it to 3.12? I am asking because 3.13 is not a > > > > > > > > long-term release and as such I would not like to have to upgrade to > > > > > > > > 3.13. > > > > > > I don't think there will be another 3.12 release. > > > > Why not? It is LTS, right? > > My bad. Just checked the schedule [1], and you are right. It is LTM. > > [1] https://www.gluster.org/release-schedule/ > > > Gluster-users mailing list > > > > Gluster-users@gluster.org > > > > http://lists.gluster.org/mailman/listinfo/gluster-users > > Gluster-users mailing list > > Gluster-users@gluster.org > > http://lists.gluster.org/mailman/listinfo/gluster-users ___ Gluster-users mailing list Gluster-users@gluster.org http://lists.gluster.org/mailman/listinfo/gluster-users
Re: [Gluster-users] New 3.12.7 possible split-brain on replica 3
On 04/09/2018 05:54 PM, Dmitry Melekhov wrote: 09.04.2018 16:18, Ravishankar N пишет: On 04/09/2018 05:40 PM, mabi wrote: Again thanks that worked and I have now no more unsynched files. You mentioned that this bug has been fixed in 3.13, would it be possible to backport it to 3.12? I am asking because 3.13 is not a long-term release and as such I would not like to have to upgrade to 3.13. I don't think there will be another 3.12 release. Why not? It is LTS, right? My bad. Just checked the schedule [1], and you are right. It is LTM. [1] https://www.gluster.org/release-schedule/ ___ Gluster-users mailing list Gluster-users@gluster.org http://lists.gluster.org/mailman/listinfo/gluster-users ___ Gluster-users mailing list Gluster-users@gluster.org http://lists.gluster.org/mailman/listinfo/gluster-users
Re: [Gluster-users] New 3.12.7 possible split-brain on replica 3
09.04.2018 16:18, Ravishankar N пишет: On 04/09/2018 05:40 PM, mabi wrote: Again thanks that worked and I have now no more unsynched files. You mentioned that this bug has been fixed in 3.13, would it be possible to backport it to 3.12? I am asking because 3.13 is not a long-term release and as such I would not like to have to upgrade to 3.13. I don't think there will be another 3.12 release. Why not? It is LTS, right? ___ Gluster-users mailing list Gluster-users@gluster.org http://lists.gluster.org/mailman/listinfo/gluster-users
Re: [Gluster-users] New 3.12.7 possible split-brain on replica 3
On 04/09/2018 05:40 PM, mabi wrote: Again thanks that worked and I have now no more unsynched files. You mentioned that this bug has been fixed in 3.13, would it be possible to backport it to 3.12? I am asking because 3.13 is not a long-term release and as such I would not like to have to upgrade to 3.13. I don't think there will be another 3.12 release. Adding Karthik to see if this is something he can back-port to 3.10.12, which is likely to be out by the end of the month. -Ravi ‐‐‐ Original Message ‐‐‐ On April 9, 2018 1:46 PM, Ravishankar N wrote: On 04/09/2018 05:09 PM, mabi wrote: Thanks Ravi for your answer. Stupid question but how do I delete the trusted.afr xattrs on this brick? And when you say "this brick", do you mean the brick on the arbitrer node (node 3 in my case)? Sorry I should have been clearer. Yes the brick on the 3rd node. `setfattr -x trusted.afr.myvol-private-client-0 /data/myvol-private/brick/dir1/dir2/dir3/dir4/dir5/dir6/dir7/dir8/dir9/dir10/dir11/dir12_Archiv/azipfiledir.zip/OC_DEFAULT_MODULE/problematicfile` `setfattr -x trusted.afr.myvol-private-client-1 /data/myvol-private/brick/dir1/dir2/dir3/dir4/dir5/dir6/dir7/dir8/dir9/dir10/dir11/dir12_Archiv/azipfiledir.zip/OC_DEFAULT_MODULE/problematicfile` After doing this for all files, run 'gluster volume heal `. HTH, Ravi ‐‐‐ Original Message ‐‐‐ On April 9, 2018 1:24 PM, Ravishankar N ravishan...@redhat.com wrote: On 04/09/2018 04:36 PM, mabi wrote: As I was suggested in the past by this mailing list a now ran a stat and getfattr on one of the problematic files on all nodes and at the end a stat on the fuse mount directly. The output is below: NODE1: STAT: File: ‘/data/myvol-private/brick/dir1/dir2/dir3/dir4/dir5/dir6/dir7/dir8/dir9/dir10/dir11/dir12_Archiv/azipfiledir.zip/OC_DEFAULT_MODULE/problematicfile’ Size: 0 Blocks: 38 IO Block: 131072 regular empty file Device: 23h/35d Inode: 6822549 Links: 2 Access: (0644/-rw-r--r--) Uid: (20909/ UNKNOWN) Gid: (20909/ UNKNOWN) Access: 2018-04-09 08:58:54.311556621 +0200 Modify: 2018-04-09 08:58:54.311556621 +0200 Change: 2018-04-09 08:58:54.423555611 +0200 Birth: - GETFATTR: trusted.gfid=0smMGdfAozTLS8v1d4jMb42w== trusted.gfid2path.d40e834f9a258d9f="13880e8c-13da-442f-8180-fa40b6f5327c/problematicfile" trusted.glusterfs.quota.13880e8c-13da-442f-8180-fa40b6f5327c.contri.1=0sAQ== trusted.pgfid.13880e8c-13da-442f-8180-fa40b6f5327c=0sAQ== NODE2: STAT: File: ‘/data/myvol-private/brick/dir1/dir2/dir3/dir4/dir5/dir6/dir7/dir8/dir9/dir10/dir11/dir12_Archiv/azipfiledir.zip/OC_DEFAULT_MODULE/problematicfile’ Size: 0 Blocks: 38 IO Block: 131072 regular empty file Device: 24h/36d Inode: 6825876 Links: 2 Access: (0644/-rw-r--r--) Uid: (20909/ UNKNOWN) Gid: (20909/ UNKNOWN) Access: 2018-04-09 08:58:54.311775605 +0200 Modify: 2018-04-09 08:58:54.311775605 +0200 Change: 2018-04-09 08:58:54.423774007 +0200 Birth: - GETFATTR: trusted.gfid=0smMGdfAozTLS8v1d4jMb42w== trusted.gfid2path.d40e834f9a258d9f="13880e8c-13da-442f-8180-fa40b6f5327c/problematicfile" trusted.glusterfs.quota.13880e8c-13da-442f-8180-fa40b6f5327c.contri.1=0sAQ== trusted.pgfid.13880e8c-13da-442f-8180-fa40b6f5327c=0sAQ== NODE3: STAT: File: /srv/glusterfs/myvol-private/brick/dir1/dir2/dir3/dir4/dir5/dir6/dir7/dir8/dir9/dir10/dir11/dir12_Archiv/azipfiledir.zip/OC_DEFAULT_MODULE/problematicfile Size: 0 Blocks: 8 IO Block: 4096 regular empty file Device: ca11h/51729d Inode: 404058268 Links: 2 Access: (0644/-rw-r--r--) Uid: (20909/ UNKNOWN) Gid: (20909/ UNKNOWN) Access: 2018-04-05 16:16:55.292341447 +0200 Modify: 2018-04-05 16:16:55.292341447 +0200 Change: 2018-04-09 08:58:54.428074177 +0200 Birth: - GETFATTR: trusted.afr.dirty=0s trusted.afr.myvol-private-client-0=0sAQAA trusted.afr.myvol-private-client-1=0sAQAA Looks like you hit the bug of arbiter becoming source (BZ 1482064) fixed by Karthik in 3.13. Just delete the trusted.afr xattrs on this brick and launch heal, that should fix it. But the file seems to have no content on both data bricks as well, so you might want to check if that was expected. -Ravi trusted.bit-rot.version=0sAgBavUW2AAGBaA== trusted.gfid=0smMGdfAozTLS8v1d4jMb42w== trusted.glusterfs.quota.13880e8c-13da-442f-8180-fa40b6f5327c.contri.1=0sAQ== trusted.pgfid.13880e8c-13da-442f-8180-fa40b6f5327c=0sAQ== CLIENT GLUSTER MOUNT: STAT: File: '/mnt/myvol-private/dir1/dir2/dir3/dir4/dir5/dir6/dir7/dir8/dir9/dir10/dir11/dir12_Archiv/azipfiledir.zip/OC_DEFAULT_MODULE/problematicfile' Size: 0 Blocks: 0 IO Block: 131072 regular empty file Device: 1eh/30d Inode: 13600685574951729371 Links: 1 Access: (0644/-rw-r--r--) Uid: (20909/nch20909) Gid: (20909/nch20909) Access: 2018-04-09 08:58:54.311556621 +0200 Modify: 2018-04-09 08:58:54.311556621 +0200 Change: 2018-04-09 08:58:54.423555611 +0200 Birt
Re: [Gluster-users] New 3.12.7 possible split-brain on replica 3
Again thanks that worked and I have now no more unsynched files. You mentioned that this bug has been fixed in 3.13, would it be possible to backport it to 3.12? I am asking because 3.13 is not a long-term release and as such I would not like to have to upgrade to 3.13. ‐‐‐ Original Message ‐‐‐ On April 9, 2018 1:46 PM, Ravishankar N wrote: > > > On 04/09/2018 05:09 PM, mabi wrote: > > > Thanks Ravi for your answer. > > > > Stupid question but how do I delete the trusted.afr xattrs on this brick? > > > > And when you say "this brick", do you mean the brick on the arbitrer node > > (node 3 in my case)? > > Sorry I should have been clearer. Yes the brick on the 3rd node. > > `setfattr -x trusted.afr.myvol-private-client-0 > /data/myvol-private/brick/dir1/dir2/dir3/dir4/dir5/dir6/dir7/dir8/dir9/dir10/dir11/dir12_Archiv/azipfiledir.zip/OC_DEFAULT_MODULE/problematicfile` > > `setfattr -x trusted.afr.myvol-private-client-1 > /data/myvol-private/brick/dir1/dir2/dir3/dir4/dir5/dir6/dir7/dir8/dir9/dir10/dir11/dir12_Archiv/azipfiledir.zip/OC_DEFAULT_MODULE/problematicfile` > > After doing this for all files, run 'gluster volume heal `. > > HTH, > > Ravi > > > ‐‐‐ Original Message ‐‐‐ > > > > On April 9, 2018 1:24 PM, Ravishankar N ravishan...@redhat.com wrote: > > > > > On 04/09/2018 04:36 PM, mabi wrote: > > > > > > > As I was suggested in the past by this mailing list a now ran a stat > > > > and getfattr on one of the problematic files on all nodes and at the > > > > end a stat on the fuse mount directly. The output is below: > > > > > > > > NODE1: > > > > > > > > STAT: > > > > > > > > File: > > > > ‘/data/myvol-private/brick/dir1/dir2/dir3/dir4/dir5/dir6/dir7/dir8/dir9/dir10/dir11/dir12_Archiv/azipfiledir.zip/OC_DEFAULT_MODULE/problematicfile’ > > > > > > > > Size: 0 Blocks: 38 IO Block: 131072 regular empty file > > > > > > > > Device: 23h/35d Inode: 6822549 Links: 2 > > > > > > > > Access: (0644/-rw-r--r--) Uid: (20909/ UNKNOWN) Gid: (20909/ UNKNOWN) > > > > > > > > Access: 2018-04-09 08:58:54.311556621 +0200 > > > > > > > > Modify: 2018-04-09 08:58:54.311556621 +0200 > > > > > > > > Change: 2018-04-09 08:58:54.423555611 +0200 > > > > > > > > Birth: - > > > > > > > > GETFATTR: > > > > > > > > trusted.gfid=0smMGdfAozTLS8v1d4jMb42w== > > > > > > > > trusted.gfid2path.d40e834f9a258d9f="13880e8c-13da-442f-8180-fa40b6f5327c/problematicfile" > > > > > > > > trusted.glusterfs.quota.13880e8c-13da-442f-8180-fa40b6f5327c.contri.1=0sAQ== > > > > > > > > trusted.pgfid.13880e8c-13da-442f-8180-fa40b6f5327c=0sAQ== > > > > > > > > NODE2: > > > > > > > > STAT: > > > > > > > > File: > > > > ‘/data/myvol-private/brick/dir1/dir2/dir3/dir4/dir5/dir6/dir7/dir8/dir9/dir10/dir11/dir12_Archiv/azipfiledir.zip/OC_DEFAULT_MODULE/problematicfile’ > > > > > > > > Size: 0 Blocks: 38 IO Block: 131072 regular empty file > > > > > > > > Device: 24h/36d Inode: 6825876 Links: 2 > > > > > > > > Access: (0644/-rw-r--r--) Uid: (20909/ UNKNOWN) Gid: (20909/ UNKNOWN) > > > > > > > > Access: 2018-04-09 08:58:54.311775605 +0200 > > > > > > > > Modify: 2018-04-09 08:58:54.311775605 +0200 > > > > > > > > Change: 2018-04-09 08:58:54.423774007 +0200 > > > > > > > > Birth: - > > > > > > > > GETFATTR: > > > > > > > > trusted.gfid=0smMGdfAozTLS8v1d4jMb42w== > > > > > > > > trusted.gfid2path.d40e834f9a258d9f="13880e8c-13da-442f-8180-fa40b6f5327c/problematicfile" > > > > > > > > trusted.glusterfs.quota.13880e8c-13da-442f-8180-fa40b6f5327c.contri.1=0sAQ== > > > > > > > > trusted.pgfid.13880e8c-13da-442f-8180-fa40b6f5327c=0sAQ== > > > > > > > > NODE3: > > > > > > > > STAT: > > > > > > > > File: > > > > /srv/glusterfs/myvol-private/brick/dir1/dir2/dir3/dir4/dir5/dir6/dir7/dir8/dir9/dir10/dir11/dir12_Archiv/azipfiledir.zip/OC_DEFAULT_MODULE/problematicfile > > > > > > > > Size: 0 Blocks: 8 IO Block: 4096 regular empty file > > > > > > > > Device: ca11h/51729d Inode: 404058268 Links: 2 > > > > > > > > Access: (0644/-rw-r--r--) Uid: (20909/ UNKNOWN) Gid: (20909/ UNKNOWN) > > > > > > > > Access: 2018-04-05 16:16:55.292341447 +0200 > > > > > > > > Modify: 2018-04-05 16:16:55.292341447 +0200 > > > > > > > > Change: 2018-04-09 08:58:54.428074177 +0200 > > > > > > > > Birth: - > > > > > > > > GETFATTR: > > > > > > > > trusted.afr.dirty=0s > > > > > > > > trusted.afr.myvol-private-client-0=0sAQAA > > > > > > > > trusted.afr.myvol-private-client-1=0sAQAA > > > > > > > > Looks like you hit the bug of arbiter becoming source (BZ 1482064) fixed > > > > > > by Karthik in 3.13. Just delete the trusted.afr xattrs on this brick and > > > > > > launch heal, that should fix it. But the file seems to have no content > > > > > > on both data bricks as well, so you might want to check if that was > > > > > > expected. > > > > > > -Ravi > > > > > > > trusted.bit-rot.version=0sAgBavUW2AAGBaA== >
Re: [Gluster-users] New 3.12.7 possible split-brain on replica 3
On 04/09/2018 05:09 PM, mabi wrote: Thanks Ravi for your answer. Stupid question but how do I delete the trusted.afr xattrs on this brick? And when you say "this brick", do you mean the brick on the arbitrer node (node 3 in my case)? Sorry I should have been clearer. Yes the brick on the 3rd node. `setfattr -x trusted.afr.myvol-private-client-0 /data/myvol-private/brick/dir1/dir2/dir3/dir4/dir5/dir6/dir7/dir8/dir9/dir10/dir11/dir12_Archiv/azipfiledir.zip/OC_DEFAULT_MODULE/problematicfile` `setfattr -x trusted.afr.myvol-private-client-1 /data/myvol-private/brick/dir1/dir2/dir3/dir4/dir5/dir6/dir7/dir8/dir9/dir10/dir11/dir12_Archiv/azipfiledir.zip/OC_DEFAULT_MODULE/problematicfile` After doing this for all files, run 'gluster volume heal `. HTH, Ravi ‐‐‐ Original Message ‐‐‐ On April 9, 2018 1:24 PM, Ravishankar N wrote: On 04/09/2018 04:36 PM, mabi wrote: As I was suggested in the past by this mailing list a now ran a stat and getfattr on one of the problematic files on all nodes and at the end a stat on the fuse mount directly. The output is below: NODE1: STAT: File: ‘/data/myvol-private/brick/dir1/dir2/dir3/dir4/dir5/dir6/dir7/dir8/dir9/dir10/dir11/dir12_Archiv/azipfiledir.zip/OC_DEFAULT_MODULE/problematicfile’ Size: 0 Blocks: 38 IO Block: 131072 regular empty file Device: 23h/35d Inode: 6822549 Links: 2 Access: (0644/-rw-r--r--) Uid: (20909/ UNKNOWN) Gid: (20909/ UNKNOWN) Access: 2018-04-09 08:58:54.311556621 +0200 Modify: 2018-04-09 08:58:54.311556621 +0200 Change: 2018-04-09 08:58:54.423555611 +0200 Birth: - GETFATTR: trusted.gfid=0smMGdfAozTLS8v1d4jMb42w== trusted.gfid2path.d40e834f9a258d9f="13880e8c-13da-442f-8180-fa40b6f5327c/problematicfile" trusted.glusterfs.quota.13880e8c-13da-442f-8180-fa40b6f5327c.contri.1=0sAQ== trusted.pgfid.13880e8c-13da-442f-8180-fa40b6f5327c=0sAQ== NODE2: STAT: File: ‘/data/myvol-private/brick/dir1/dir2/dir3/dir4/dir5/dir6/dir7/dir8/dir9/dir10/dir11/dir12_Archiv/azipfiledir.zip/OC_DEFAULT_MODULE/problematicfile’ Size: 0 Blocks: 38 IO Block: 131072 regular empty file Device: 24h/36d Inode: 6825876 Links: 2 Access: (0644/-rw-r--r--) Uid: (20909/ UNKNOWN) Gid: (20909/ UNKNOWN) Access: 2018-04-09 08:58:54.311775605 +0200 Modify: 2018-04-09 08:58:54.311775605 +0200 Change: 2018-04-09 08:58:54.423774007 +0200 Birth: - GETFATTR: trusted.gfid=0smMGdfAozTLS8v1d4jMb42w== trusted.gfid2path.d40e834f9a258d9f="13880e8c-13da-442f-8180-fa40b6f5327c/problematicfile" trusted.glusterfs.quota.13880e8c-13da-442f-8180-fa40b6f5327c.contri.1=0sAQ== trusted.pgfid.13880e8c-13da-442f-8180-fa40b6f5327c=0sAQ== NODE3: STAT: File: /srv/glusterfs/myvol-private/brick/dir1/dir2/dir3/dir4/dir5/dir6/dir7/dir8/dir9/dir10/dir11/dir12_Archiv/azipfiledir.zip/OC_DEFAULT_MODULE/problematicfile Size: 0 Blocks: 8 IO Block: 4096 regular empty file Device: ca11h/51729d Inode: 404058268 Links: 2 Access: (0644/-rw-r--r--) Uid: (20909/ UNKNOWN) Gid: (20909/ UNKNOWN) Access: 2018-04-05 16:16:55.292341447 +0200 Modify: 2018-04-05 16:16:55.292341447 +0200 Change: 2018-04-09 08:58:54.428074177 +0200 Birth: - GETFATTR: trusted.afr.dirty=0s trusted.afr.myvol-private-client-0=0sAQAA trusted.afr.myvol-private-client-1=0sAQAA Looks like you hit the bug of arbiter becoming source (BZ 1482064) fixed by Karthik in 3.13. Just delete the trusted.afr xattrs on this brick and launch heal, that should fix it. But the file seems to have no content on both data bricks as well, so you might want to check if that was expected. -Ravi trusted.bit-rot.version=0sAgBavUW2AAGBaA== trusted.gfid=0smMGdfAozTLS8v1d4jMb42w== trusted.glusterfs.quota.13880e8c-13da-442f-8180-fa40b6f5327c.contri.1=0sAQ== trusted.pgfid.13880e8c-13da-442f-8180-fa40b6f5327c=0sAQ== CLIENT GLUSTER MOUNT: STAT: File: '/mnt/myvol-private/dir1/dir2/dir3/dir4/dir5/dir6/dir7/dir8/dir9/dir10/dir11/dir12_Archiv/azipfiledir.zip/OC_DEFAULT_MODULE/problematicfile' Size: 0 Blocks: 0 IO Block: 131072 regular empty file Device: 1eh/30d Inode: 13600685574951729371 Links: 1 Access: (0644/-rw-r--r--) Uid: (20909/nch20909) Gid: (20909/nch20909) Access: 2018-04-09 08:58:54.311556621 +0200 Modify: 2018-04-09 08:58:54.311556621 +0200 Change: 2018-04-09 08:58:54.423555611 +0200 Birth: - ‐‐‐ Original Message ‐‐‐ On April 9, 2018 9:49 AM, mabi m...@protonmail.ch wrote: Here would be also the corresponding log entries on a gluster node brick log file: [2018-04-09 06:58:47.363536] W [MSGID: 113093] [posix-gfid-path.c:84:posix_remove_gfid2path_xattr] 0-myvol-private-posix: removing gfid2path xattr failed on /data/myvol-private/brick/.glusterfs/12/67/126759f6-8364-453c-9a9c-d9ed39198b7a: key = trusted.gfid2path.2529bb66b56be110 [No data available] [2018-04-09 06:58:54.178133] E [MSGID: 113015] [posix.c:1208:posix_opendir] 0-myvol-private-posix: opendir failed
Re: [Gluster-users] New 3.12.7 possible split-brain on replica 3
Thanks Ravi for your answer. Stupid question but how do I delete the trusted.afr xattrs on this brick? And when you say "this brick", do you mean the brick on the arbitrer node (node 3 in my case)? ‐‐‐ Original Message ‐‐‐ On April 9, 2018 1:24 PM, Ravishankar N wrote: > > > On 04/09/2018 04:36 PM, mabi wrote: > > > As I was suggested in the past by this mailing list a now ran a stat and > > getfattr on one of the problematic files on all nodes and at the end a stat > > on the fuse mount directly. The output is below: > > > > NODE1: > > > > STAT: > > > > File: > > ‘/data/myvol-private/brick/dir1/dir2/dir3/dir4/dir5/dir6/dir7/dir8/dir9/dir10/dir11/dir12_Archiv/azipfiledir.zip/OC_DEFAULT_MODULE/problematicfile’ > > > > Size: 0 Blocks: 38 IO Block: 131072 regular empty file > > > > Device: 23h/35d Inode: 6822549 Links: 2 > > > > Access: (0644/-rw-r--r--) Uid: (20909/ UNKNOWN) Gid: (20909/ UNKNOWN) > > > > Access: 2018-04-09 08:58:54.311556621 +0200 > > > > Modify: 2018-04-09 08:58:54.311556621 +0200 > > > > Change: 2018-04-09 08:58:54.423555611 +0200 > > > > Birth: - > > > > GETFATTR: > > > > trusted.gfid=0smMGdfAozTLS8v1d4jMb42w== > > > > trusted.gfid2path.d40e834f9a258d9f="13880e8c-13da-442f-8180-fa40b6f5327c/problematicfile" > > > > trusted.glusterfs.quota.13880e8c-13da-442f-8180-fa40b6f5327c.contri.1=0sAQ== > > > > trusted.pgfid.13880e8c-13da-442f-8180-fa40b6f5327c=0sAQ== > > > > NODE2: > > > > STAT: > > > > File: > > ‘/data/myvol-private/brick/dir1/dir2/dir3/dir4/dir5/dir6/dir7/dir8/dir9/dir10/dir11/dir12_Archiv/azipfiledir.zip/OC_DEFAULT_MODULE/problematicfile’ > > > > Size: 0 Blocks: 38 IO Block: 131072 regular empty file > > > > Device: 24h/36d Inode: 6825876 Links: 2 > > > > Access: (0644/-rw-r--r--) Uid: (20909/ UNKNOWN) Gid: (20909/ UNKNOWN) > > > > Access: 2018-04-09 08:58:54.311775605 +0200 > > > > Modify: 2018-04-09 08:58:54.311775605 +0200 > > > > Change: 2018-04-09 08:58:54.423774007 +0200 > > > > Birth: - > > > > GETFATTR: > > > > trusted.gfid=0smMGdfAozTLS8v1d4jMb42w== > > > > trusted.gfid2path.d40e834f9a258d9f="13880e8c-13da-442f-8180-fa40b6f5327c/problematicfile" > > > > trusted.glusterfs.quota.13880e8c-13da-442f-8180-fa40b6f5327c.contri.1=0sAQ== > > > > trusted.pgfid.13880e8c-13da-442f-8180-fa40b6f5327c=0sAQ== > > > > NODE3: > > > > STAT: > > > > File: > > /srv/glusterfs/myvol-private/brick/dir1/dir2/dir3/dir4/dir5/dir6/dir7/dir8/dir9/dir10/dir11/dir12_Archiv/azipfiledir.zip/OC_DEFAULT_MODULE/problematicfile > > > > Size: 0 Blocks: 8 IO Block: 4096 regular empty file > > > > Device: ca11h/51729d Inode: 404058268 Links: 2 > > > > Access: (0644/-rw-r--r--) Uid: (20909/ UNKNOWN) Gid: (20909/ UNKNOWN) > > > > Access: 2018-04-05 16:16:55.292341447 +0200 > > > > Modify: 2018-04-05 16:16:55.292341447 +0200 > > > > Change: 2018-04-09 08:58:54.428074177 +0200 > > > > Birth: - > > > > GETFATTR: > > > > trusted.afr.dirty=0s > > > > trusted.afr.myvol-private-client-0=0sAQAA > > > > trusted.afr.myvol-private-client-1=0sAQAA > > Looks like you hit the bug of arbiter becoming source (BZ 1482064) fixed > > by Karthik in 3.13. Just delete the trusted.afr xattrs on this brick and > > launch heal, that should fix it. But the file seems to have no content > > on both data bricks as well, so you might want to check if that was > > expected. > > -Ravi > > > trusted.bit-rot.version=0sAgBavUW2AAGBaA== > > > > trusted.gfid=0smMGdfAozTLS8v1d4jMb42w== > > > > trusted.glusterfs.quota.13880e8c-13da-442f-8180-fa40b6f5327c.contri.1=0sAQ== > > > > trusted.pgfid.13880e8c-13da-442f-8180-fa40b6f5327c=0sAQ== > > > > CLIENT GLUSTER MOUNT: > > > > STAT: > > > > File: > > '/mnt/myvol-private/dir1/dir2/dir3/dir4/dir5/dir6/dir7/dir8/dir9/dir10/dir11/dir12_Archiv/azipfiledir.zip/OC_DEFAULT_MODULE/problematicfile' > > > > Size: 0 Blocks: 0 IO Block: 131072 regular empty file > > > > Device: 1eh/30d Inode: 13600685574951729371 Links: 1 > > > > Access: (0644/-rw-r--r--) Uid: (20909/nch20909) Gid: (20909/nch20909) > > > > Access: 2018-04-09 08:58:54.311556621 +0200 > > > > Modify: 2018-04-09 08:58:54.311556621 +0200 > > > > Change: 2018-04-09 08:58:54.423555611 +0200 > > > > Birth: - > > > > ‐‐‐ Original Message ‐‐‐ > > > > On April 9, 2018 9:49 AM, mabi m...@protonmail.ch wrote: > > > > > Here would be also the corresponding log entries on a gluster node brick > > > log file: > > > > > > [2018-04-09 06:58:47.363536] W [MSGID: 113093] > > > [posix-gfid-path.c:84:posix_remove_gfid2path_xattr] > > > 0-myvol-private-posix: removing gfid2path xattr failed on > > > /data/myvol-private/brick/.glusterfs/12/67/126759f6-8364-453c-9a9c-d9ed39198b7a: > > > key = trusted.gfid2path.2529bb66b56be110 [No data available] > > > > > > [2018-04-09 06:58:54.178133] E [MSGID: 113015] > > > [posix.c:1208:posix_opendir] 0-myvol-
Re: [Gluster-users] New 3.12.7 possible split-brain on replica 3
On 04/09/2018 04:36 PM, mabi wrote: As I was suggested in the past by this mailing list a now ran a stat and getfattr on one of the problematic files on all nodes and at the end a stat on the fuse mount directly. The output is below: NODE1: STAT: File: ‘/data/myvol-private/brick/dir1/dir2/dir3/dir4/dir5/dir6/dir7/dir8/dir9/dir10/dir11/dir12_Archiv/azipfiledir.zip/OC_DEFAULT_MODULE/problematicfile’ Size: 0 Blocks: 38 IO Block: 131072 regular empty file Device: 23h/35d Inode: 6822549 Links: 2 Access: (0644/-rw-r--r--) Uid: (20909/ UNKNOWN) Gid: (20909/ UNKNOWN) Access: 2018-04-09 08:58:54.311556621 +0200 Modify: 2018-04-09 08:58:54.311556621 +0200 Change: 2018-04-09 08:58:54.423555611 +0200 Birth: - GETFATTR: trusted.gfid=0smMGdfAozTLS8v1d4jMb42w== trusted.gfid2path.d40e834f9a258d9f="13880e8c-13da-442f-8180-fa40b6f5327c/problematicfile" trusted.glusterfs.quota.13880e8c-13da-442f-8180-fa40b6f5327c.contri.1=0sAQ== trusted.pgfid.13880e8c-13da-442f-8180-fa40b6f5327c=0sAQ== NODE2: STAT: File: ‘/data/myvol-private/brick/dir1/dir2/dir3/dir4/dir5/dir6/dir7/dir8/dir9/dir10/dir11/dir12_Archiv/azipfiledir.zip/OC_DEFAULT_MODULE/problematicfile’ Size: 0 Blocks: 38 IO Block: 131072 regular empty file Device: 24h/36d Inode: 6825876 Links: 2 Access: (0644/-rw-r--r--) Uid: (20909/ UNKNOWN) Gid: (20909/ UNKNOWN) Access: 2018-04-09 08:58:54.311775605 +0200 Modify: 2018-04-09 08:58:54.311775605 +0200 Change: 2018-04-09 08:58:54.423774007 +0200 Birth: - GETFATTR: trusted.gfid=0smMGdfAozTLS8v1d4jMb42w== trusted.gfid2path.d40e834f9a258d9f="13880e8c-13da-442f-8180-fa40b6f5327c/problematicfile" trusted.glusterfs.quota.13880e8c-13da-442f-8180-fa40b6f5327c.contri.1=0sAQ== trusted.pgfid.13880e8c-13da-442f-8180-fa40b6f5327c=0sAQ== NODE3: STAT: File: /srv/glusterfs/myvol-private/brick/dir1/dir2/dir3/dir4/dir5/dir6/dir7/dir8/dir9/dir10/dir11/dir12_Archiv/azipfiledir.zip/OC_DEFAULT_MODULE/problematicfile Size: 0 Blocks: 8 IO Block: 4096 regular empty file Device: ca11h/51729dInode: 404058268 Links: 2 Access: (0644/-rw-r--r--) Uid: (20909/ UNKNOWN) Gid: (20909/ UNKNOWN) Access: 2018-04-05 16:16:55.292341447 +0200 Modify: 2018-04-05 16:16:55.292341447 +0200 Change: 2018-04-09 08:58:54.428074177 +0200 Birth: - GETFATTR: trusted.afr.dirty=0s trusted.afr.myvol-private-client-0=0sAQAA trusted.afr.myvol-private-client-1=0sAQAA Looks like you hit the bug of arbiter becoming source (BZ 1482064) fixed by Karthik in 3.13. Just delete the trusted.afr xattrs on this brick and launch heal, that should fix it. But the file seems to have no content on both data bricks as well, so you might want to check if that was expected. -Ravi trusted.bit-rot.version=0sAgBavUW2AAGBaA== trusted.gfid=0smMGdfAozTLS8v1d4jMb42w== trusted.glusterfs.quota.13880e8c-13da-442f-8180-fa40b6f5327c.contri.1=0sAQ== trusted.pgfid.13880e8c-13da-442f-8180-fa40b6f5327c=0sAQ== CLIENT GLUSTER MOUNT: STAT: File: '/mnt/myvol-private/dir1/dir2/dir3/dir4/dir5/dir6/dir7/dir8/dir9/dir10/dir11/dir12_Archiv/azipfiledir.zip/OC_DEFAULT_MODULE/problematicfile' Size: 0 Blocks: 0 IO Block: 131072 regular empty file Device: 1eh/30d Inode: 13600685574951729371 Links: 1 Access: (0644/-rw-r--r--) Uid: (20909/nch20909) Gid: (20909/nch20909) Access: 2018-04-09 08:58:54.311556621 +0200 Modify: 2018-04-09 08:58:54.311556621 +0200 Change: 2018-04-09 08:58:54.423555611 +0200 Birth: - ‐‐‐ Original Message ‐‐‐ On April 9, 2018 9:49 AM, mabi wrote: Here would be also the corresponding log entries on a gluster node brick log file: [2018-04-09 06:58:47.363536] W [MSGID: 113093] [posix-gfid-path.c:84:posix_remove_gfid2path_xattr] 0-myvol-private-posix: removing gfid2path xattr failed on /data/myvol-private/brick/.glusterfs/12/67/126759f6-8364-453c-9a9c-d9ed39198b7a: key = trusted.gfid2path.2529bb66b56be110 [No data available] [2018-04-09 06:58:54.178133] E [MSGID: 113015] [posix.c:1208:posix_opendir] 0-myvol-private-posix: opendir failed on /data/myvol-private/brick/dir1/dir2/dir3/dir4/dir5/dir6/dir7/dir8/dir9/dir10/dir11/dir12_Archiv/azipfile.zip/OC_DEFAULT_MODULE [No such file or directory] Hope that helps to find out the issue. ‐‐‐ Original Message ‐‐‐ On April 9, 2018 9:37 AM, mabi m...@protonmail.ch wrote: Hello, Last Friday I upgraded my GlusterFS 3.10.7 3-way replica (with arbitrer) cluster to 3.12.7 and this morning I got a warning that 9 files on one of my volumes are not synced. Ineeded checking that volume with a "volume heal info" shows that the third node (the arbitrer node) has 9 files to be healed but are not being healed automatically. All nodes were always online and there was no network interruption so I am wondering if this might not really be a split-brain is
Re: [Gluster-users] New 3.12.7 possible split-brain on replica 3
As I was suggested in the past by this mailing list a now ran a stat and getfattr on one of the problematic files on all nodes and at the end a stat on the fuse mount directly. The output is below: NODE1: STAT: File: ‘/data/myvol-private/brick/dir1/dir2/dir3/dir4/dir5/dir6/dir7/dir8/dir9/dir10/dir11/dir12_Archiv/azipfiledir.zip/OC_DEFAULT_MODULE/problematicfile’ Size: 0 Blocks: 38 IO Block: 131072 regular empty file Device: 23h/35d Inode: 6822549 Links: 2 Access: (0644/-rw-r--r--) Uid: (20909/ UNKNOWN) Gid: (20909/ UNKNOWN) Access: 2018-04-09 08:58:54.311556621 +0200 Modify: 2018-04-09 08:58:54.311556621 +0200 Change: 2018-04-09 08:58:54.423555611 +0200 Birth: - GETFATTR: trusted.gfid=0smMGdfAozTLS8v1d4jMb42w== trusted.gfid2path.d40e834f9a258d9f="13880e8c-13da-442f-8180-fa40b6f5327c/problematicfile" trusted.glusterfs.quota.13880e8c-13da-442f-8180-fa40b6f5327c.contri.1=0sAQ== trusted.pgfid.13880e8c-13da-442f-8180-fa40b6f5327c=0sAQ== NODE2: STAT: File: ‘/data/myvol-private/brick/dir1/dir2/dir3/dir4/dir5/dir6/dir7/dir8/dir9/dir10/dir11/dir12_Archiv/azipfiledir.zip/OC_DEFAULT_MODULE/problematicfile’ Size: 0 Blocks: 38 IO Block: 131072 regular empty file Device: 24h/36d Inode: 6825876 Links: 2 Access: (0644/-rw-r--r--) Uid: (20909/ UNKNOWN) Gid: (20909/ UNKNOWN) Access: 2018-04-09 08:58:54.311775605 +0200 Modify: 2018-04-09 08:58:54.311775605 +0200 Change: 2018-04-09 08:58:54.423774007 +0200 Birth: - GETFATTR: trusted.gfid=0smMGdfAozTLS8v1d4jMb42w== trusted.gfid2path.d40e834f9a258d9f="13880e8c-13da-442f-8180-fa40b6f5327c/problematicfile" trusted.glusterfs.quota.13880e8c-13da-442f-8180-fa40b6f5327c.contri.1=0sAQ== trusted.pgfid.13880e8c-13da-442f-8180-fa40b6f5327c=0sAQ== NODE3: STAT: File: /srv/glusterfs/myvol-private/brick/dir1/dir2/dir3/dir4/dir5/dir6/dir7/dir8/dir9/dir10/dir11/dir12_Archiv/azipfiledir.zip/OC_DEFAULT_MODULE/problematicfile Size: 0 Blocks: 8 IO Block: 4096 regular empty file Device: ca11h/51729dInode: 404058268 Links: 2 Access: (0644/-rw-r--r--) Uid: (20909/ UNKNOWN) Gid: (20909/ UNKNOWN) Access: 2018-04-05 16:16:55.292341447 +0200 Modify: 2018-04-05 16:16:55.292341447 +0200 Change: 2018-04-09 08:58:54.428074177 +0200 Birth: - GETFATTR: trusted.afr.dirty=0s trusted.afr.myvol-private-client-0=0sAQAA trusted.afr.myvol-private-client-1=0sAQAA trusted.bit-rot.version=0sAgBavUW2AAGBaA== trusted.gfid=0smMGdfAozTLS8v1d4jMb42w== trusted.glusterfs.quota.13880e8c-13da-442f-8180-fa40b6f5327c.contri.1=0sAQ== trusted.pgfid.13880e8c-13da-442f-8180-fa40b6f5327c=0sAQ== CLIENT GLUSTER MOUNT: STAT: File: '/mnt/myvol-private/dir1/dir2/dir3/dir4/dir5/dir6/dir7/dir8/dir9/dir10/dir11/dir12_Archiv/azipfiledir.zip/OC_DEFAULT_MODULE/problematicfile' Size: 0 Blocks: 0 IO Block: 131072 regular empty file Device: 1eh/30d Inode: 13600685574951729371 Links: 1 Access: (0644/-rw-r--r--) Uid: (20909/nch20909) Gid: (20909/nch20909) Access: 2018-04-09 08:58:54.311556621 +0200 Modify: 2018-04-09 08:58:54.311556621 +0200 Change: 2018-04-09 08:58:54.423555611 +0200 Birth: - ‐‐‐ Original Message ‐‐‐ On April 9, 2018 9:49 AM, mabi wrote: > > > Here would be also the corresponding log entries on a gluster node brick log > file: > > [2018-04-09 06:58:47.363536] W [MSGID: 113093] > [posix-gfid-path.c:84:posix_remove_gfid2path_xattr] 0-myvol-private-posix: > removing gfid2path xattr failed on > /data/myvol-private/brick/.glusterfs/12/67/126759f6-8364-453c-9a9c-d9ed39198b7a: > key = trusted.gfid2path.2529bb66b56be110 [No data available] > > [2018-04-09 06:58:54.178133] E [MSGID: 113015] [posix.c:1208:posix_opendir] > 0-myvol-private-posix: opendir failed on > /data/myvol-private/brick/dir1/dir2/dir3/dir4/dir5/dir6/dir7/dir8/dir9/dir10/dir11/dir12_Archiv/azipfile.zip/OC_DEFAULT_MODULE > [No such file or directory] > > Hope that helps to find out the issue. > > ‐‐‐ Original Message ‐‐‐ > > On April 9, 2018 9:37 AM, mabi m...@protonmail.ch wrote: > > > Hello, > > > > Last Friday I upgraded my GlusterFS 3.10.7 3-way replica (with arbitrer) > > cluster to 3.12.7 and this morning I got a warning that 9 files on one of > > my volumes are not synced. Ineeded checking that volume with a "volume heal > > info" shows that the third node (the arbitrer node) has 9 files to be > > healed but are not being healed automatically. > > > > All nodes were always online and there was no network interruption so I am > > wondering if this might not really be a split-brain issue but something > > else. > > > > I found some interesting log entries on the client log file > > (/var/log/glusterfs/myvol-private.log) which I have included below in this > > mail. It looks like some renaming has gone wrong because a directory is not > > empty. > >
Re: [Gluster-users] New 3.12.7 possible split-brain on replica 3
Here would be also the corresponding log entries on a gluster node brick log file: [2018-04-09 06:58:47.363536] W [MSGID: 113093] [posix-gfid-path.c:84:posix_remove_gfid2path_xattr] 0-myvol-private-posix: removing gfid2path xattr failed on /data/myvol-private/brick/.glusterfs/12/67/126759f6-8364-453c-9a9c-d9ed39198b7a: key = trusted.gfid2path.2529bb66b56be110 [No data available] [2018-04-09 06:58:54.178133] E [MSGID: 113015] [posix.c:1208:posix_opendir] 0-myvol-private-posix: opendir failed on /data/myvol-private/brick/dir1/dir2/dir3/dir4/dir5/dir6/dir7/dir8/dir9/dir10/dir11/dir12_Archiv/azipfile.zip/OC_DEFAULT_MODULE [No such file or directory] Hope that helps to find out the issue. ‐‐‐ Original Message ‐‐‐ On April 9, 2018 9:37 AM, mabi wrote: > > > Hello, > > Last Friday I upgraded my GlusterFS 3.10.7 3-way replica (with arbitrer) > cluster to 3.12.7 and this morning I got a warning that 9 files on one of my > volumes are not synced. Ineeded checking that volume with a "volume heal > info" shows that the third node (the arbitrer node) has 9 files to be healed > but are not being healed automatically. > > All nodes were always online and there was no network interruption so I am > wondering if this might not really be a split-brain issue but something else. > > I found some interesting log entries on the client log file > (/var/log/glusterfs/myvol-private.log) which I have included below in this > mail. It looks like some renaming has gone wrong because a directory is not > empty. > > For your information I have upgraded my GlusterFS in offline mode and the > upgrade went smoothly. > > What can I do to fix that issue? > > Best regards, > > Mabi > > [2018-04-09 06:58:46.906089] I [MSGID: 109066] [dht-rename.c:1741:dht_rename] > 0-myvol-private-dht: renaming > /dir1/dir2/dir3/dir4/dir5/dir6/dir7/dir8/dir9/dir10/dir11/azipfile.zip > (hash=myvol-private-replicate-0/cache=myvol-private-replicate-0) => > /dir1/di2/dir3/dir4/dir5/dir6/dir7/dir8/dir9/dir10/dir11/dir12_Archiv/azipfile.zip > (hash=myvol-private-replicate-0/cache=) > > [2018-04-09 06:58:53.692440] W [MSGID: 114031] > [client-rpc-fops.c:670:client3_3_rmdir_cbk] 0-myvol-private-client-2: remote > operation failed [Directory not empty] > > [2018-04-09 06:58:53.714129] W [MSGID: 114031] > [client-rpc-fops.c:2860:client3_3_lookup_cbk] 0-myvol-private-client-1: > remote operation failed. Path: gfid:13880e8c-13da-442f-8180-fa40b6f5327c > (13880e8c-13da-442f-8180-fa40b6f5327c) [No such file or directory] > > [2018-04-09 06:58:53.714161] W [MSGID: 114031] > [client-rpc-fops.c:2860:client3_3_lookup_cbk] 0-myvol-private-client-0: > remote operation failed. Path: gfid:13880e8c-13da-442f-8180-fa40b6f5327c > (13880e8c-13da-442f-8180-fa40b6f5327c) [No such file or directory] > > [2018-04-09 06:58:53.715638] W [MSGID: 114031] > [client-rpc-fops.c:670:client3_3_rmdir_cbk] 0-myvol-private-client-2: remote > operation failed [Directory not empty] > > [2018-04-09 06:58:53.750372] I [MSGID: 108026] > [afr-self-heal-metadata.c:52:__afr_selfheal_metadata_do] > 0-myvol-private-replicate-0: performing metadata selfheal on > 1cc6facf-eca5-481c-a905-7a39faa25156 > > [2018-04-09 06:58:53.757677] I [MSGID: 108026] > [afr-self-heal-common.c:1656:afr_log_selfheal] 0-myvol-private-replicate-0: > Completed metadata selfheal on 1cc6facf-eca5-481c-a905-7a39faa25156. > sources=[2] sinks=0 1 > > [2018-04-09 06:58:53.775939] I [MSGID: 108026] > [afr-self-heal-entry.c:887:afr_selfheal_entry_do] > 0-myvol-private-replicate-0: performing entry selfheal on > 1cc6facf-eca5-481c-a905-7a39faa25156 > > [2018-04-09 06:58:53.776237] I [MSGID: 108026] > [afr-self-heal-metadata.c:52:__afr_selfheal_metadata_do] > 0-myvol-private-replicate-0: performing metadata selfheal on > 13880e8c-13da-442f-8180-fa40b6f5327c > > [2018-04-09 06:58:53.781762] I [MSGID: 108026] > [afr-self-heal-common.c:1656:afr_log_selfheal] 0-myvol-private-replicate-0: > Completed metadata selfheal on 13880e8c-13da-442f-8180-fa40b6f5327c. > sources=[2] sinks=0 1 > > [2018-04-09 06:58:53.796950] I [MSGID: 108026] > [afr-self-heal-common.c:1656:afr_log_selfheal] 0-myvol-private-replicate-0: > Completed entry selfheal on 1cc6facf-eca5-481c-a905-7a39faa25156. sources=[2] > sinks=0 1 > > [2018-04-09 06:58:53.812682] I [MSGID: 108026] > [afr-self-heal-entry.c:887:afr_selfheal_entry_do] > 0-myvol-private-replicate-0: performing entry selfheal on > 13880e8c-13da-442f-8180-fa40b6f5327c > > [2018-04-09 06:58:53.879382] E [MSGID: 108008] > [afr-read-txn.c:90:afr_read_txn_refresh_done] 0-myvol-private-replicate-0: > Failing READ on gfid a4c46519-7dda-489d-9f5d-811ededd53f1: split-brain > observed. [Input/output error] > > [2018-04-09 06:58:53.881514] E [MSGID: 108008] > [afr-read-txn.c:90:afr_read_txn_refresh_done] 0-myvol-private-replicate-0: > Failing FGETXATTR on gfid a4c46519-7dda-489d-9f5d-811ededd53f1: split-brain > observed. [Input/outpu