Dear All,

It appears I have a stale file handle in one of the volumes, on 2 files. These files are qemu images (1 raw and 1 qcow2). I'll focus on just 1 file, since the situation on the other seems the same.
The VM gets paused more or less directly after being booted, with this error:

[2018-12-18 14:05:05.275713] E [MSGID: 133010] [shard.c:1724:shard_common_lookup_shards_cbk] 0-ovirt-backbone-2-shard: Lookup on shard 51500 failed. Base file gfid = f28cabcb-d169-41fc-a633-9bef4c4a8e40 [Stale file handle]

Investigating the shard:

# on the arbiter node:

[root@lease-05 ovirt-backbone-2]# getfattr -n glusterfs.gfid.string /mnt/ovirt-backbone-2/b1c2c949-aef4-4aec-999b-b179efeef732/images/f6ac9660-a84e-469e-a17c-c6dbc538af4b/d6b09501-5b79-4c92-bf10-2ed3bda1b425
getfattr: Removing leading '/' from absolute path names
# file: mnt/ovirt-backbone-2/b1c2c949-aef4-4aec-999b-b179efeef732/images/f6ac9660-a84e-469e-a17c-c6dbc538af4b/d6b09501-5b79-4c92-bf10-2ed3bda1b425
glusterfs.gfid.string="f28cabcb-d169-41fc-a633-9bef4c4a8e40"

[root@lease-05 ovirt-backbone-2]# getfattr -d -m . -e hex .shard/f28cabcb-d169-41fc-a633-9bef4c4a8e40.51500
# file: .shard/f28cabcb-d169-41fc-a633-9bef4c4a8e40.51500
security.selinux=0x73797374656d5f753a6f626a6563745f723a6574635f72756e74696d655f743a733000
trusted.afr.dirty=0x000000000000000000000000
trusted.gfid=0x1f86b4328ec6424699aa48cc6d7b5da0
trusted.gfid2path.b48064c78d7a85c9=0x62653331383633382d653861302d346336642d393737642d3761393337616138343830362f66323863616263622d643136392d343166632d613633332d3962656634633461386534302e3531353030

[root@lease-05 ovirt-backbone-2]# getfattr -d -m . -e hex .glusterfs/1f/86/1f86b432-8ec6-4246-99aa-48cc6d7b5da0
# file: .glusterfs/1f/86/1f86b432-8ec6-4246-99aa-48cc6d7b5da0
security.selinux=0x73797374656d5f753a6f626a6563745f723a6574635f72756e74696d655f743a733000
trusted.afr.dirty=0x000000000000000000000000
trusted.gfid=0x1f86b4328ec6424699aa48cc6d7b5da0
trusted.gfid2path.b48064c78d7a85c9=0x62653331383633382d653861302d346336642d393737642d3761393337616138343830362f66323863616263622d643136392d343166632d613633332d3962656634633461386534302e3531353030

[root@lease-05 ovirt-backbone-2]# stat .glusterfs/1f/86/1f86b432-8ec6-4246-99aa-48cc6d7b5da0
  File: ‘.glusterfs/1f/86/1f86b432-8ec6-4246-99aa-48cc6d7b5da0’
  Size: 0               Blocks: 0          IO Block: 4096   regular empty file
Device: fd01h/64769d    Inode: 537277306   Links: 2
Access: (0660/-rw-rw----)  Uid: (    0/    root)   Gid: (    0/    root)
Context: system_u:object_r:etc_runtime_t:s0
Access: 2018-12-17 21:43:36.361984810 +0000
Modify: 2018-12-17 21:43:36.361984810 +0000
Change: 2018-12-18 20:55:29.908647417 +0000
 Birth: -

[root@lease-05 ovirt-backbone-2]# find . -inum 537277306
./.glusterfs/1f/86/1f86b432-8ec6-4246-99aa-48cc6d7b5da0
./.shard/f28cabcb-d169-41fc-a633-9bef4c4a8e40.51500
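As an aside, the trusted.gfid2path value decodes to exactly what I'd expect: the parent directory gfid plus the shard's basename (be318638-e8a0-4c6d-977d-7a937aa84806 should be the fixed gfid of the /.shard directory, if I read the shard xlator right). A quick decode sketch:

echo 62653331383633382d653861302d346336642d393737642d3761393337616138343830362f66323863616263622d643136392d343166632d613633332d3962656634633461386534302e3531353030 | xxd -r -p; echo
# prints: be318638-e8a0-4c6d-977d-7a937aa84806/f28cabcb-d169-41fc-a633-9bef4c4a8e40.51500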
# on the data nodes:

[root@lease-08 ~]# getfattr -n glusterfs.gfid.string /mnt/ovirt-backbone-2/b1c2c949-aef4-4aec-999b-b179efeef732/images/f6ac9660-a84e-469e-a17c-c6dbc538af4b/d6b09501-5b79-4c92-bf10-2ed3bda1b425
getfattr: Removing leading '/' from absolute path names
# file: mnt/ovirt-backbone-2/b1c2c949-aef4-4aec-999b-b179efeef732/images/f6ac9660-a84e-469e-a17c-c6dbc538af4b/d6b09501-5b79-4c92-bf10-2ed3bda1b425
glusterfs.gfid.string="f28cabcb-d169-41fc-a633-9bef4c4a8e40"

[root@lease-08 ovirt-backbone-2]# getfattr -d -m . -e hex .shard/f28cabcb-d169-41fc-a633-9bef4c4a8e40.51500
# file: .shard/f28cabcb-d169-41fc-a633-9bef4c4a8e40.51500
security.selinux=0x73797374656d5f753a6f626a6563745f723a6574635f72756e74696d655f743a733000
trusted.afr.dirty=0x000000000000000000000000
trusted.gfid=0x1f86b4328ec6424699aa48cc6d7b5da0
trusted.gfid2path.b48064c78d7a85c9=0x62653331383633382d653861302d346336642d393737642d3761393337616138343830362f66323863616263622d643136392d343166632d613633332d3962656634633461386534302e3531353030

[root@lease-08 ovirt-backbone-2]# getfattr -d -m . -e hex .glusterfs/1f/86/1f86b432-8ec6-4246-99aa-48cc6d7b5da0
# file: .glusterfs/1f/86/1f86b432-8ec6-4246-99aa-48cc6d7b5da0
security.selinux=0x73797374656d5f753a6f626a6563745f723a6574635f72756e74696d655f743a733000
trusted.afr.dirty=0x000000000000000000000000
trusted.gfid=0x1f86b4328ec6424699aa48cc6d7b5da0
trusted.gfid2path.b48064c78d7a85c9=0x62653331383633382d653861302d346336642d393737642d3761393337616138343830362f66323863616263622d643136392d343166632d613633332d3962656634633461386534302e3531353030

[root@lease-08 ovirt-backbone-2]# stat .glusterfs/1f/86/1f86b432-8ec6-4246-99aa-48cc6d7b5da0
  File: ‘.glusterfs/1f/86/1f86b432-8ec6-4246-99aa-48cc6d7b5da0’
  Size: 2166784         Blocks: 4128       IO Block: 4096   regular file
Device: fd03h/64771d    Inode: 12893624759  Links: 3
Access: (0660/-rw-rw----)  Uid: (    0/    root)   Gid: (    0/    root)
Context: system_u:object_r:etc_runtime_t:s0
Access: 2018-12-18 18:52:38.070776585 +0000
Modify: 2018-12-17 21:43:36.388054443 +0000
Change: 2018-12-18 21:01:47.810506528 +0000
 Birth: -

[root@lease-08 ovirt-backbone-2]# find . -inum 12893624759
./.glusterfs/1f/86/1f86b432-8ec6-4246-99aa-48cc6d7b5da0
./.shard/f28cabcb-d169-41fc-a633-9bef4c4a8e40.51500

========================

[root@lease-11 ovirt-backbone-2]# getfattr -n glusterfs.gfid.string /mnt/ovirt-backbone-2/b1c2c949-aef4-4aec-999b-b179efeef732/images/f6ac9660-a84e-469e-a17c-c6dbc538af4b/d6b09501-5b79-4c92-bf10-2ed3bda1b425
getfattr: Removing leading '/' from absolute path names
# file: mnt/ovirt-backbone-2/b1c2c949-aef4-4aec-999b-b179efeef732/images/f6ac9660-a84e-469e-a17c-c6dbc538af4b/d6b09501-5b79-4c92-bf10-2ed3bda1b425
glusterfs.gfid.string="f28cabcb-d169-41fc-a633-9bef4c4a8e40"

[root@lease-11 ovirt-backbone-2]# getfattr -d -m . -e hex .shard/f28cabcb-d169-41fc-a633-9bef4c4a8e40.51500
# file: .shard/f28cabcb-d169-41fc-a633-9bef4c4a8e40.51500
security.selinux=0x73797374656d5f753a6f626a6563745f723a6574635f72756e74696d655f743a733000
trusted.afr.dirty=0x000000000000000000000000
trusted.gfid=0x1f86b4328ec6424699aa48cc6d7b5da0
trusted.gfid2path.b48064c78d7a85c9=0x62653331383633382d653861302d346336642d393737642d3761393337616138343830362f66323863616263622d643136392d343166632d613633332d3962656634633461386534302e3531353030

[root@lease-11 ovirt-backbone-2]# getfattr -d -m . -e hex .glusterfs/1f/86/1f86b432-8ec6-4246-99aa-48cc6d7b5da0
# file: .glusterfs/1f/86/1f86b432-8ec6-4246-99aa-48cc6d7b5da0
security.selinux=0x73797374656d5f753a6f626a6563745f723a6574635f72756e74696d655f743a733000
trusted.afr.dirty=0x000000000000000000000000
trusted.gfid=0x1f86b4328ec6424699aa48cc6d7b5da0
trusted.gfid2path.b48064c78d7a85c9=0x62653331383633382d653861302d346336642d393737642d3761393337616138343830362f66323863616263622d643136392d343166632d613633332d3962656634633461386534302e3531353030

[root@lease-11 ovirt-backbone-2]# stat .glusterfs/1f/86/1f86b432-8ec6-4246-99aa-48cc6d7b5da0
  File: ‘.glusterfs/1f/86/1f86b432-8ec6-4246-99aa-48cc6d7b5da0’
  Size: 2166784         Blocks: 4128       IO Block: 4096   regular file
Device: fd03h/64771d    Inode: 12956094809  Links: 3
Access: (0660/-rw-rw----)  Uid: (    0/    root)   Gid: (    0/    root)
Context: system_u:object_r:etc_runtime_t:s0
Access: 2018-12-18 20:11:53.595208449 +0000
Modify: 2018-12-17 21:43:36.391580259 +0000
Change: 2018-12-18 19:19:25.888055392 +0000
 Birth: -

[root@lease-11 ovirt-backbone-2]# find . -inum 12956094809
./.glusterfs/1f/86/1f86b432-8ec6-4246-99aa-48cc6d7b5da0
./.shard/f28cabcb-d169-41fc-a633-9bef4c4a8e40.51500

================
I don't really see any inconsistencies, except for the timestamps in the stat output. However, that is only because I tried moving the file out of the volume to force a heal, which does happen on the data nodes, but not on the arbiter node. Before that, the timestamps matched as well. I've also compared the file ./.shard/f28cabcb-d169-41fc-a633-9bef4c4a8e40.51500 on the 2 data nodes and they are exactly the same.
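For anyone wanting to reproduce that comparison, checksumming the shard from each data brick's root is enough; a sketch (md5sum is an arbitrary choice of hash, run on each data node):

md5sum .shard/f28cabcb-d169-41fc-a633-9bef4c4a8e40.51500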
Things I've further tried:

- gluster v heal ovirt-backbone-2 full => gluster v heal ovirt-backbone-2 info reports 0 entries on all nodes
- stop each glusterd and glusterfsd, pause around 40 sec and start them again on each node, 1 at a time, waiting for the heal to recover before moving to the next node
- force a heal by stopping glusterd on a node and performing these steps:
    mkdir /mnt/ovirt-backbone-2/trigger
    rmdir /mnt/ovirt-backbone-2/trigger
    setfattr -n trusted.non-existent-key -v abc /mnt/ovirt-backbone-2/
    setfattr -x trusted.non-existent-key /mnt/ovirt-backbone-2/
    start glusterd
- gluster volume rebalance ovirt-backbone-2 start => success
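One more check that may be worth adding to that list: dumping the AFR xattrs of the shard on every brick of the affected replica set, to confirm no brick holds non-zero pending counters that heal info isn't surfacing. A sketch, run from each brick root (the per-brick trusted.afr.ovirt-backbone-2-client-N keys are my assumption of the naming; they only appear once a brick has recorded accusations):

getfattr -d -m trusted.afr -e hex .shard/f28cabcb-d169-41fc-a633-9bef4c4a8e40.51500
# in the dumps above, only trusted.afr.dirty shows up, and it is all zeroes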
What's further interesting is that, according to the mount log, the volume is in split-brain:

[2018-12-18 10:06:04.606870] E [MSGID: 108008] [afr-read-txn.c:90:afr_read_txn_refresh_done] 0-ovirt-backbone-2-replicate-2: Failing FSTAT on gfid 2a57d87d-fe49-4034-919b-fdb79531bf68: split-brain observed. [Input/output error]
[2018-12-18 10:06:04.606908] E [MSGID: 133014] [shard.c:1248:shard_common_stat_cbk] 0-ovirt-backbone-2-shard: stat failed: 2a57d87d-fe49-4034-919b-fdb79531bf68 [Input/output error]
[2018-12-18 10:06:04.606927] W [fuse-bridge.c:871:fuse_attr_cbk] 0-glusterfs-fuse: 428090: FSTAT() /b1c2c949-aef4-4aec-999b-b179efeef732/dom_md/ids => -1 (Input/output error)
[2018-12-18 10:06:05.107729] E [MSGID: 108008] [afr-read-txn.c:90:afr_read_txn_refresh_done] 0-ovirt-backbone-2-replicate-2: Failing FSTAT on gfid 2a57d87d-fe49-4034-919b-fdb79531bf68: split-brain observed. [Input/output error]
[2018-12-18 10:06:05.107770] E [MSGID: 133014] [shard.c:1248:shard_common_stat_cbk] 0-ovirt-backbone-2-shard: stat failed: 2a57d87d-fe49-4034-919b-fdb79531bf68 [Input/output error]
[2018-12-18 10:06:05.107791] W [fuse-bridge.c:871:fuse_attr_cbk] 0-glusterfs-fuse: 428091: FSTAT() /b1c2c949-aef4-4aec-999b-b179efeef732/dom_md/ids => -1 (Input/output error)
[2018-12-18 10:06:05.537244] I [MSGID: 108006] [afr-common.c:5494:afr_local_init] 0-ovirt-backbone-2-replicate-1: no subvolumes up
[2018-12-18 10:06:05.538523] E [MSGID: 108008] [afr-read-txn.c:90:afr_read_txn_refresh_done] 0-ovirt-backbone-2-replicate-2: Failing STAT on gfid 00000000-0000-0000-0000-000000000001: split-brain observed. [Input/output error]
[2018-12-18 10:06:05.538685] I [MSGID: 108006] [afr-common.c:5494:afr_local_init] 0-ovirt-backbone-2-replicate-1: no subvolumes up
[2018-12-18 10:06:05.538794] I [MSGID: 108006] [afr-common.c:5494:afr_local_init] 0-ovirt-backbone-2-replicate-1: no subvolumes up
[2018-12-18 10:06:05.539342] I [MSGID: 109063] [dht-layout.c:716:dht_layout_normalize] 0-ovirt-backbone-2-dht: Found anomalies in /b1c2c949-aef4-4aec-999b-b179efeef732 (gfid = 8c8598ce-1a52-418e-a7b4-435fee34bae8). Holes=2 overlaps=0
[2018-12-18 10:06:05.539372] W [MSGID: 109005] [dht-selfheal.c:2158:dht_selfheal_directory] 0-ovirt-backbone-2-dht: Directory selfheal failed: 2 subvolumes down. Not fixing. path = /b1c2c949-aef4-4aec-999b-b179efeef732, gfid = 8c8598ce-1a52-418e-a7b4-435fee34bae8
[2018-12-18 10:06:05.539694] I [MSGID: 108006] [afr-common.c:5494:afr_local_init] 0-ovirt-backbone-2-replicate-1: no subvolumes up
[2018-12-18 10:06:05.540652] I [MSGID: 108006] [afr-common.c:5494:afr_local_init] 0-ovirt-backbone-2-replicate-1: no subvolumes up
[2018-12-18 10:06:05.608612] E [MSGID: 108008] [afr-read-txn.c:90:afr_read_txn_refresh_done] 0-ovirt-backbone-2-replicate-2: Failing FSTAT on gfid 2a57d87d-fe49-4034-919b-fdb79531bf68: split-brain observed. [Input/output error]
[2018-12-18 10:06:05.608657] E [MSGID: 133014] [shard.c:1248:shard_common_stat_cbk] 0-ovirt-backbone-2-shard: stat failed: 2a57d87d-fe49-4034-919b-fdb79531bf68 [Input/output error]
[2018-12-18 10:06:05.608672] W [fuse-bridge.c:871:fuse_attr_cbk] 0-glusterfs-fuse: 428096: FSTAT() /b1c2c949-aef4-4aec-999b-b179efeef732/dom_md/ids => -1 (Input/output error)
[2018-12-18 10:06:06.109339] E [MSGID: 108008] [afr-read-txn.c:90:afr_read_txn_refresh_done] 0-ovirt-backbone-2-replicate-2: Failing FSTAT on gfid 2a57d87d-fe49-4034-919b-fdb79531bf68: split-brain observed. [Input/output error]
[2018-12-18 10:06:06.109378] E [MSGID: 133014] [shard.c:1248:shard_common_stat_cbk] 0-ovirt-backbone-2-shard: stat failed: 2a57d87d-fe49-4034-919b-fdb79531bf68 [Input/output error]
[2018-12-18 10:06:06.109399] W [fuse-bridge.c:871:fuse_attr_cbk] 0-glusterfs-fuse: 428097: FSTAT() /b1c2c949-aef4-4aec-999b-b179efeef732/dom_md/ids => -1 (Input/output error)

Note that I am able to see /b1c2c949-aef4-4aec-999b-b179efeef732/dom_md/ids:

[root@lease-11 ovirt-backbone-2]# stat /mnt/ovirt-backbone-2/b1c2c949-aef4-4aec-999b-b179efeef732/dom_md/ids
  File: ‘/mnt/ovirt-backbone-2/b1c2c949-aef4-4aec-999b-b179efeef732/dom_md/ids’
  Size: 1048576         Blocks: 2048       IO Block: 131072 regular file
Device: 41h/65d         Inode: 10492258721813610344  Links: 1
Access: (0660/-rw-rw----)  Uid: (   36/    vdsm)   Gid: (   36/     kvm)
Context: system_u:object_r:fusefs_t:s0
Access: 2018-12-19 20:07:39.917573869 +0000
Modify: 2018-12-19 20:07:39.928573917 +0000
Change: 2018-12-19 20:07:39.929573921 +0000
 Birth: -

However, gluster v heal ovirt-backbone-2 info split-brain reports no entries.
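For the gfid from those FSTAT failures (which, going by the fuse-bridge lines, should be the dom_md/ids file), the same xattr check can be done directly on the bricks of replicate-2 via its .glusterfs hardlink. A sketch, run from each brick root of that replica set:

getfattr -d -m trusted.afr -e hex .glusterfs/2a/57/2a57d87d-fe49-4034-919b-fdb79531bf68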
I've also tried mounting the qemu image, and this works fine; I'm able to see all its contents:

losetup /dev/loop0 /mnt/ovirt-backbone-2/b1c2c949-aef4-4aec-999b-b179efeef732/images/f6ac9660-a84e-469e-a17c-c6dbc538af4b/d6b09501-5b79-4c92-bf10-2ed3bda1b425
kpartx -a /dev/loop0
vgscan
vgchange -ay slave-data
mkdir /mnt/slv01
mount /dev/mapper/slave--data-lvol0 /mnt/slv01/
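For completeness, the reverse sequence I would use to detach everything again (a sketch, simply mirroring the steps above):

umount /mnt/slv01
vgchange -an slave-data
kpartx -d /dev/loop0
losetup -d /dev/loop0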
Possible causes for this issue:

1. The machine "lease-11" suffered from a faulty RAM module (ECC), which halted the machine and caused an invalid state. (This machine also hosts other volumes, with similar configurations, which report no issue.)
2. After the RAM module was replaced, the VM using the backing qemu image was restored from a backup (the backup was file-based, taken within the VM on a different directory), because some files had been corrupted. The backup/recovery obviously causes extra IO, possibly introducing race conditions? The machine then ran for about 12h without issues, and in total for about 36h.
3. Since only the client (maybe only gfapi?) reports errors, something is broken there?

The volume info:

root@lease-06 ~# gluster v info ovirt-backbone-2
Volume Name: ovirt-backbone-2
Type: Distributed-Replicate
Volume ID: 85702d35-62c8-4c8c-930d-46f455a8af28
Status: Started
Snapshot Count: 0
Number of Bricks: 3 x (2 + 1) = 9
Transport-type: tcp
Bricks:
Brick1: 10.32.9.7:/data/gfs/bricks/brick1/ovirt-backbone-2
Brick2: 10.32.9.3:/data/gfs/bricks/brick1/ovirt-backbone-2
Brick3: 10.32.9.4:/data/gfs/bricks/bricka/ovirt-backbone-2 (arbiter)
Brick4: 10.32.9.8:/data0/gfs/bricks/brick1/ovirt-backbone-2
Brick5: 10.32.9.21:/data0/gfs/bricks/brick1/ovirt-backbone-2
Brick6: 10.32.9.5:/data/gfs/bricks/bricka/ovirt-backbone-2 (arbiter)
Brick7: 10.32.9.9:/data0/gfs/bricks/brick1/ovirt-backbone-2
Brick8: 10.32.9.20:/data0/gfs/bricks/brick1/ovirt-backbone-2
Brick9: 10.32.9.6:/data/gfs/bricks/bricka/ovirt-backbone-2 (arbiter)
Options Reconfigured:
nfs.disable: on
transport.address-family: inet
performance.quick-read: off
performance.read-ahead: off
performance.io-cache: off
performance.low-prio-threads: 32
network.remote-dio: enable
cluster.eager-lock: enable
cluster.quorum-type: auto
cluster.server-quorum-type: server
cluster.data-self-heal-algorithm: full
cluster.locking-scheme: granular
cluster.shd-max-threads: 8
cluster.shd-wait-qlength: 10000
features.shard: on
user.cifs: off
storage.owner-uid: 36
storage.owner-gid: 36
features.shard-block-size: 64MB
performance.write-behind-window-size: 512MB
performance.cache-size: 384MB
cluster.brick-multiplex: on

The volume status:

root@lease-06 ~# gluster v status ovirt-backbone-2
Status of volume: ovirt-backbone-2
Gluster process                                             TCP Port  RDMA Port  Online  Pid
--------------------------------------------------------------------------------------------
Brick 10.32.9.7:/data/gfs/bricks/brick1/ovirt-backbone-2    49152     0          Y       7727
Brick 10.32.9.3:/data/gfs/bricks/brick1/ovirt-backbone-2    49152     0          Y       12620
Brick 10.32.9.4:/data/gfs/bricks/bricka/ovirt-backbone-2    49152     0          Y       8794
Brick 10.32.9.8:/data0/gfs/bricks/brick1/ovirt-backbone-2   49161     0          Y       22333
Brick 10.32.9.21:/data0/gfs/bricks/brick1/ovirt-backbone-2  49152     0          Y       15030
Brick 10.32.9.5:/data/gfs/bricks/bricka/ovirt-backbone-2    49166     0          Y       24592
Brick 10.32.9.9:/data0/gfs/bricks/brick1/ovirt-backbone-2   49153     0          Y       20148
Brick 10.32.9.20:/data0/gfs/bricks/brick1/ovirt-backbone-2  49154     0          Y       15413
Brick 10.32.9.6:/data/gfs/bricks/bricka/ovirt-backbone-2    49152     0          Y       43120
Self-heal Daemon on localhost                               N/A       N/A        Y       44587
Self-heal Daemon on 10.201.0.2                              N/A       N/A        Y       8401
Self-heal Daemon on 10.201.0.5                              N/A       N/A        Y       11038
Self-heal Daemon on 10.201.0.8                              N/A       N/A        Y       9513
Self-heal Daemon on 10.32.9.4                               N/A       N/A        Y       23736
Self-heal Daemon on 10.32.9.20                              N/A       N/A        Y       2738
Self-heal Daemon on 10.32.9.3                               N/A       N/A        Y       25598
Self-heal Daemon on 10.32.9.5                               N/A       N/A        Y       511
Self-heal Daemon on 10.32.9.9                               N/A       N/A        Y       23357
Self-heal Daemon on 10.32.9.8                               N/A       N/A        Y       15225
Self-heal Daemon on 10.32.9.7                               N/A       N/A        Y       25781
Self-heal Daemon on 10.32.9.21                              N/A       N/A        Y       5034

Task Status of Volume ovirt-backbone-2
--------------------------------------------------------------------------------------------
Task                 : Rebalance
ID                   : 6dfbac43-0125-4568-9ac3-a2c453faaa3d
Status               : completed

The gluster version is 3.12.15 and cluster.op-version=31202.

========================

It would be nice to know whether it's possible to mark the files as not stale, or whether I should investigate other things. Or should we consider this volume lost? Also, looking at the code at https://github.com/gluster/glusterfs/blob/master/xlators/features/shard/src/shard.c, it seems the functions have shifted quite a bit (line 1724 vs. 2243), so maybe it's already fixed in a newer version?
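For anyone curious how much the shard xlator has changed since the release I'm running, a rough survey against a glusterfs checkout would be something like this (assuming the v3.12.15 tag is present in the clone):

git clone https://github.com/gluster/glusterfs.git
cd glusterfs
# count the commits touching shard.c between the running release and master
git log --oneline v3.12.15..master -- xlators/features/shard/src/shard.c | wc -l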
Any thoughts are welcome.

Thanks,
Olaf