The gfid mismatch here is between the shard and its "link-to" file, the creation of which happens at a layer below that of the shard translator on the stack.
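For illustration, the link-to file is the zero-byte, sticky-bit (---------T) entry whose trusted.glusterfs.dht.linkto xattr names the subvolume DHT expects to hold the real data; the hex value in the getfattr output quoted below is just that name and, assuming xxd is available on the node, can be decoded like this:

$ echo 6f766972742d3335302d7a6f6e65312d7265706c69636174652d3300 | xxd -r -p
ovirt-350-zone1-replicate-3

(the value ends in a trailing NUL byte, so the target subvolume is ovirt-350-zone1-replicate-3)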
Adding DHT devs to take a look.
-Krutika

On Mon, Mar 26, 2018 at 1:09 AM, Ian Halliday <ihalli...@ndevix.com> wrote:
> Hello all,
>
> We are having a rather interesting problem with one of our VM storage
> systems. The GlusterFS client is throwing errors relating to GFID
> mismatches. We traced this down to multiple shards being present on the
> gluster nodes, with different gfids.
>
> Hypervisor gluster mount log:
>
> [2018-03-25 18:54:19.261733] E [MSGID: 133010] [shard.c:1724:shard_common_lookup_shards_cbk] 0-ovirt-zone1-shard: Lookup on shard 7 failed. Base file gfid = 87137cac-49eb-492a-8f33-8e33470d8cb7 [Stale file handle]
> The message "W [MSGID: 109009] [dht-common.c:2162:dht_lookup_linkfile_cbk] 0-ovirt-zone1-dht: /.shard/87137cac-49eb-492a-8f33-8e33470d8cb7.7: gfid different on data file on ovirt-zone1-replicate-3, gfid local = 00000000-0000-0000-0000-000000000000, gfid node = 57c6fcdf-52bb-4f7a-aea4-02f0dc81ff56" repeated 2 times between [2018-03-25 18:54:19.253748] and [2018-03-25 18:54:19.263576]
> [2018-03-25 18:54:19.264349] W [MSGID: 109009] [dht-common.c:1901:dht_lookup_everywhere_cbk] 0-ovirt-zone1-dht: /.shard/87137cac-49eb-492a-8f33-8e33470d8cb7.7: gfid differs on subvolume ovirt-zone1-replicate-3, gfid local = fdf0813b-718a-4616-a51b-6999ebba9ec3, gfid node = 57c6fcdf-52bb-4f7a-aea4-02f0dc81ff56
>
> On the storage nodes, we found this:
>
> [root@n1 gluster]# find -name 87137cac-49eb-492a-8f33-8e33470d8cb7.7
> ./brick2/brick/.shard/87137cac-49eb-492a-8f33-8e33470d8cb7.7
> ./brick4/brick/.shard/87137cac-49eb-492a-8f33-8e33470d8cb7.7
>
> [root@n1 gluster]# ls -lh ./brick2/brick/.shard/87137cac-49eb-492a-8f33-8e33470d8cb7.7
> ---------T. 2 root root 0 Mar 25 13:55 ./brick2/brick/.shard/87137cac-49eb-492a-8f33-8e33470d8cb7.7
> [root@n1 gluster]# ls -lh ./brick4/brick/.shard/87137cac-49eb-492a-8f33-8e33470d8cb7.7
> -rw-rw----. 2 root root 3.8G Mar 25 13:55 ./brick4/brick/.shard/87137cac-49eb-492a-8f33-8e33470d8cb7.7
>
> [root@n1 gluster]# getfattr -d -m . -e hex ./brick2/brick/.shard/87137cac-49eb-492a-8f33-8e33470d8cb7.7
> # file: brick2/brick/.shard/87137cac-49eb-492a-8f33-8e33470d8cb7.7
> security.selinux=0x73797374656d5f753a6f626a6563745f723a756e6c6162656c65645f743a733000
> trusted.gfid=0xfdf0813b718a4616a51b6999ebba9ec3
> trusted.glusterfs.dht.linkto=0x6f766972742d3335302d7a6f6e65312d7265706c69636174652d3300
>
> [root@n1 gluster]# getfattr -d -m . -e hex ./brick4/brick/.shard/87137cac-49eb-492a-8f33-8e33470d8cb7.7
> # file: brick4/brick/.shard/87137cac-49eb-492a-8f33-8e33470d8cb7.7
> security.selinux=0x73797374656d5f753a6f626a6563745f723a756e6c6162656c65645f743a733000
> trusted.afr.dirty=0x000000000000000000000000
> trusted.bit-rot.version=0x020000000000000059914190000ce672
> trusted.gfid=0x57c6fcdf52bb4f7aaea402f0dc81ff56
>
> I'm wondering how they got created in the first place, and if anyone has
> any insight on how to fix it?
>
> Storage nodes:
> [root@n1 gluster]# gluster --version
> glusterfs 4.0.0
>
> [root@n1 gluster]# gluster volume info
>
> Volume Name: ovirt-350-zone1
> Type: Distributed-Replicate
> Volume ID: 106738ed-9951-4270-822e-63c9bcd0a20e
> Status: Started
> Snapshot Count: 0
> Number of Bricks: 7 x (2 + 1) = 21
> Transport-type: tcp
> Bricks:
> Brick1: 10.0.6.100:/gluster/brick1/brick
> Brick2: 10.0.6.101:/gluster/brick1/brick
> Brick3: 10.0.6.102:/gluster/arbrick1/brick (arbiter)
> Brick4: 10.0.6.100:/gluster/brick2/brick
> Brick5: 10.0.6.101:/gluster/brick2/brick
> Brick6: 10.0.6.102:/gluster/arbrick2/brick (arbiter)
> Brick7: 10.0.6.100:/gluster/brick3/brick
> Brick8: 10.0.6.101:/gluster/brick3/brick
> Brick9: 10.0.6.102:/gluster/arbrick3/brick (arbiter)
> Brick10: 10.0.6.100:/gluster/brick4/brick
> Brick11: 10.0.6.101:/gluster/brick4/brick
> Brick12: 10.0.6.102:/gluster/arbrick4/brick (arbiter)
> Brick13: 10.0.6.100:/gluster/brick5/brick
> Brick14: 10.0.6.101:/gluster/brick5/brick
> Brick15: 10.0.6.102:/gluster/arbrick5/brick (arbiter)
> Brick16: 10.0.6.100:/gluster/brick6/brick
> Brick17: 10.0.6.101:/gluster/brick6/brick
> Brick18: 10.0.6.102:/gluster/arbrick6/brick (arbiter)
> Brick19: 10.0.6.100:/gluster/brick7/brick
> Brick20: 10.0.6.101:/gluster/brick7/brick
> Brick21: 10.0.6.102:/gluster/arbrick7/brick (arbiter)
> Options Reconfigured:
> cluster.min-free-disk: 50GB
> performance.strict-write-ordering: off
> performance.strict-o-direct: off
> nfs.disable: off
> performance.readdir-ahead: on
> transport.address-family: inet
> performance.cache-size: 1GB
> features.shard: on
> features.shard-block-size: 5GB
> server.event-threads: 8
> server.outstanding-rpc-limit: 128
> storage.owner-uid: 36
> storage.owner-gid: 36
> performance.quick-read: off
> performance.read-ahead: off
> performance.io-cache: off
> performance.stat-prefetch: on
> cluster.eager-lock: enable
> network.remote-dio: enable
> cluster.quorum-type: auto
> cluster.server-quorum-type: server
> cluster.data-self-heal-algorithm: full
> performance.flush-behind: off
> performance.write-behind-window-size: 8MB
> client.event-threads: 8
> server.allow-insecure: on
>
> Client version:
> [root@kvm573 ~]# gluster --version
> glusterfs 3.12.5
>
> Thanks!
>
> - Ian
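As a rough cross-check of which gfid each on-disk copy really carries (the gfid values are taken from the getfattr output above, and the brick paths are the ones on n1, so they would need adjusting on other nodes): every file on a brick is also hard-linked under .glusterfs/<first two hex chars of the gfid>/<next two>/<full gfid>, so the inode numbers should line up with the shard paths found above:

# zero-byte linkto copy on brick2, gfid fdf0813b-...
$ ls -li ./brick2/brick/.shard/87137cac-49eb-492a-8f33-8e33470d8cb7.7 \
         ./brick2/brick/.glusterfs/fd/f0/fdf0813b-718a-4616-a51b-6999ebba9ec3
# 3.8G data copy on brick4, gfid 57c6fcdf-...
$ ls -li ./brick4/brick/.shard/87137cac-49eb-492a-8f33-8e33470d8cb7.7 \
         ./brick4/brick/.glusterfs/57/c6/57c6fcdf-52bb-4f7a-aea4-02f0dc81ff56

Each pair should report the same inode number; the link count of 2 in the ls -lh output above is consistent with exactly these two hard links per copy.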