Ian,

Do you have a reproducer for this bug? If not a specific one, a general outline of the operations that were performed on the file would help.
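For reference, the healing procedure suggested below could be scripted roughly as follows. This is a minimal sketch run against a mock brick tree so it can be tried safely; `BRICK` and `BASE_GFID` are placeholders (on a real storage node they would be a brick root and the base file's gfid), and it is not something to run as-is on a production node.

```shell
#!/bin/bash
# Sketch of the linkto-heal steps, exercised against a mock brick layout.
BRICK=$(mktemp -d)
BASE_GFID=87137cac-49eb-492a-8f33-8e33470d8cb7
mkdir -p "$BRICK/.shard"

# Mock copy of shard 7 as a DHT linkto file: zero bytes, mode 1000 (---------T).
touch "$BRICK/.shard/$BASE_GFID.7"
chmod 1000 "$BRICK/.shard/$BASE_GFID.7"

# Steps 1-2: enumerate every copy of the file's shards; a linkto candidate
# is a zero-byte file with only the sticky bit set.
found=""
for f in "$BRICK/.shard/$BASE_GFID".*; do
    if [ "$(stat -c %a "$f")" = "1000" ] && [ ! -s "$f" ]; then
        found=$f
        echo "linkto candidate: $f"
        # On a real brick, confirm before deleting:
        #   getfattr -n trusted.glusterfs.dht.linkto -e hex "$f"
        # Step 3 (real cluster only): abort if lookup-optimize is enabled:
        #   gluster volume get ovirt-350-zone1 cluster.lookup-optimize
        rm "$f"    # step 4: delete the linkto file
    fi
done
rm -rf "$BRICK"
```

After the deletion, the last part of step 4 (a lookup from the mount point with readdirplus turned off, per [1]) has to happen on a client, not on the brick.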
regards,
Raghavendra

On Mon, Mar 26, 2018 at 12:55 PM, Raghavendra Gowdappa <rgowd...@redhat.com> wrote:
>
> On Mon, Mar 26, 2018 at 12:40 PM, Krutika Dhananjay <kdhan...@redhat.com> wrote:
>
>> The gfid mismatch here is between the shard and its "link-to" file, the
>> creation of which happens at a layer below that of the shard translator
>> on the stack.
>>
>> Adding DHT devs to take a look.
>>
> Thanks, Krutika. I assume shard doesn't internally do any dentry operations
> like rename, link, or unlink on the path of the file (as opposed to the
> gfid-handle-based path) while managing shards. Can you confirm? If it does
> such operations, which fops does it use?
>
> @Ian,
>
> I can suggest the following procedure to fix the problem:
>
> 1. Since one of the files listed is a DHT linkto file, I am assuming there
>    is only one shard of the file. If not, please list the gfids of the
>    other shards and do not proceed with the healing procedure.
> 2. If the gfids of all shards turn out to be the same and only the linkto
>    file has a different gfid, proceed to step 3. Otherwise, abort the
>    healing procedure.
> 3. If cluster.lookup-optimize is set to true, abort the healing procedure.
> 4. Delete the linkto file - the file with permissions ---------T and the
>    xattr trusted.glusterfs.dht.linkto - and then do a lookup on the file
>    from the mount point after turning off readdirplus [1].
>
> As for how we ended up in this situation: can you describe the I/O pattern
> on this file - in particular, are there lots of entry operations like
> rename, link, or unlink on it? There have been known races in
> rename/lookup-heal-creating-linkto where the linkto and data files end up
> with different gfids.
> [2] fixes some of these cases.
>
> [1] http://lists.gluster.org/pipermail/gluster-users/2017-March/030148.html
> [2] https://review.gluster.org/#/c/19547/
>
> regards,
> Raghavendra
>
>> -Krutika
>>
>> On Mon, Mar 26, 2018 at 1:09 AM, Ian Halliday <ihalli...@ndevix.com> wrote:
>>
>>> Hello all,
>>>
>>> We are having a rather interesting problem with one of our VM storage
>>> systems. The GlusterFS client is throwing errors relating to GFID
>>> mismatches. We traced this down to multiple copies of the same shard
>>> being present on the gluster nodes, each with a different gfid.
>>>
>>> Hypervisor gluster mount log:
>>>
>>> [2018-03-25 18:54:19.261733] E [MSGID: 133010] [shard.c:1724:shard_common_lookup_shards_cbk] 0-ovirt-zone1-shard: Lookup on shard 7 failed. Base file gfid = 87137cac-49eb-492a-8f33-8e33470d8cb7 [Stale file handle]
>>> The message "W [MSGID: 109009] [dht-common.c:2162:dht_lookup_linkfile_cbk] 0-ovirt-zone1-dht: /.shard/87137cac-49eb-492a-8f33-8e33470d8cb7.7: gfid different on data file on ovirt-zone1-replicate-3, gfid local = 00000000-0000-0000-0000-000000000000, gfid node = 57c6fcdf-52bb-4f7a-aea4-02f0dc81ff56" repeated 2 times between [2018-03-25 18:54:19.253748] and [2018-03-25 18:54:19.263576]
>>> [2018-03-25 18:54:19.264349] W [MSGID: 109009] [dht-common.c:1901:dht_lookup_everywhere_cbk] 0-ovirt-zone1-dht: /.shard/87137cac-49eb-492a-8f33-8e33470d8cb7.7: gfid differs on subvolume ovirt-zone1-replicate-3, gfid local = fdf0813b-718a-4616-a51b-6999ebba9ec3, gfid node = 57c6fcdf-52bb-4f7a-aea4-02f0dc81ff56
>>>
>>> On the storage nodes, we found this:
>>>
>>> [root@n1 gluster]# find -name 87137cac-49eb-492a-8f33-8e33470d8cb7.7
>>> ./brick2/brick/.shard/87137cac-49eb-492a-8f33-8e33470d8cb7.7
>>> ./brick4/brick/.shard/87137cac-49eb-492a-8f33-8e33470d8cb7.7
>>>
>>> [root@n1 gluster]# ls -lh ./brick2/brick/.shard/87137cac-49eb-492a-8f33-8e33470d8cb7.7
>>> ---------T. 2 root root 0 Mar 25 13:55 ./brick2/brick/.shard/87137cac-49eb-492a-8f33-8e33470d8cb7.7
>>>
>>> [root@n1 gluster]# ls -lh ./brick4/brick/.shard/87137cac-49eb-492a-8f33-8e33470d8cb7.7
>>> -rw-rw----. 2 root root 3.8G Mar 25 13:55 ./brick4/brick/.shard/87137cac-49eb-492a-8f33-8e33470d8cb7.7
>>>
>>> [root@n1 gluster]# getfattr -d -m . -e hex ./brick2/brick/.shard/87137cac-49eb-492a-8f33-8e33470d8cb7.7
>>> # file: brick2/brick/.shard/87137cac-49eb-492a-8f33-8e33470d8cb7.7
>>> security.selinux=0x73797374656d5f753a6f626a6563745f723a756e6c6162656c65645f743a733000
>>> trusted.gfid=0xfdf0813b718a4616a51b6999ebba9ec3
>>> trusted.glusterfs.dht.linkto=0x6f766972742d3335302d7a6f6e65312d7265706c69636174652d3300
>>>
>>> [root@n1 gluster]# getfattr -d -m . -e hex ./brick4/brick/.shard/87137cac-49eb-492a-8f33-8e33470d8cb7.7
>>> # file: brick4/brick/.shard/87137cac-49eb-492a-8f33-8e33470d8cb7.7
>>> security.selinux=0x73797374656d5f753a6f626a6563745f723a756e6c6162656c65645f743a733000
>>> trusted.afr.dirty=0x000000000000000000000000
>>> trusted.bit-rot.version=0x020000000000000059914190000ce672
>>> trusted.gfid=0x57c6fcdf52bb4f7aaea402f0dc81ff56
>>>
>>> I'm wondering how they got created in the first place, and whether anyone
>>> has any insight on how to fix it?
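As an aside, the hex xattr values in the getfattr output above can be decoded in a shell to verify which subvolume the linkto file points at and which gfid each copy carries. A minimal sketch (bash; the trailing `00` byte on the linkto value is a NUL terminator and is dropped before decoding):

```shell
#!/bin/bash
# Decode trusted.glusterfs.dht.linkto: a hex-encoded, NUL-terminated string.
# The trailing "00" pair has already been stripped from the value below.
linkto_hex=6f766972742d3335302d7a6f6e65312d7265706c69636174652d33
linkto=$(printf "$(echo "$linkto_hex" | sed 's/../\\x&/g')")
echo "$linkto"    # the hashed subvolume the linkto file points at

# Convert a trusted.gfid value to the canonical UUID form seen in the logs.
gfid_hex=57c6fcdf52bb4f7aaea402f0dc81ff56
gfid=${gfid_hex:0:8}-${gfid_hex:8:4}-${gfid_hex:12:4}-${gfid_hex:16:4}-${gfid_hex:20:12}
echo "$gfid"
```

The decoded gfid matches the "gfid node" value in the dht_lookup_everywhere_cbk warning, which is how the two brick copies were tied back to the client-side errors.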
>>>
>>> Storage nodes:
>>> [root@n1 gluster]# gluster --version
>>> glusterfs 4.0.0
>>>
>>> [root@n1 gluster]# gluster volume info
>>>
>>> Volume Name: ovirt-350-zone1
>>> Type: Distributed-Replicate
>>> Volume ID: 106738ed-9951-4270-822e-63c9bcd0a20e
>>> Status: Started
>>> Snapshot Count: 0
>>> Number of Bricks: 7 x (2 + 1) = 21
>>> Transport-type: tcp
>>> Bricks:
>>> Brick1: 10.0.6.100:/gluster/brick1/brick
>>> Brick2: 10.0.6.101:/gluster/brick1/brick
>>> Brick3: 10.0.6.102:/gluster/arbrick1/brick (arbiter)
>>> Brick4: 10.0.6.100:/gluster/brick2/brick
>>> Brick5: 10.0.6.101:/gluster/brick2/brick
>>> Brick6: 10.0.6.102:/gluster/arbrick2/brick (arbiter)
>>> Brick7: 10.0.6.100:/gluster/brick3/brick
>>> Brick8: 10.0.6.101:/gluster/brick3/brick
>>> Brick9: 10.0.6.102:/gluster/arbrick3/brick (arbiter)
>>> Brick10: 10.0.6.100:/gluster/brick4/brick
>>> Brick11: 10.0.6.101:/gluster/brick4/brick
>>> Brick12: 10.0.6.102:/gluster/arbrick4/brick (arbiter)
>>> Brick13: 10.0.6.100:/gluster/brick5/brick
>>> Brick14: 10.0.6.101:/gluster/brick5/brick
>>> Brick15: 10.0.6.102:/gluster/arbrick5/brick (arbiter)
>>> Brick16: 10.0.6.100:/gluster/brick6/brick
>>> Brick17: 10.0.6.101:/gluster/brick6/brick
>>> Brick18: 10.0.6.102:/gluster/arbrick6/brick (arbiter)
>>> Brick19: 10.0.6.100:/gluster/brick7/brick
>>> Brick20: 10.0.6.101:/gluster/brick7/brick
>>> Brick21: 10.0.6.102:/gluster/arbrick7/brick (arbiter)
>>> Options Reconfigured:
>>> cluster.min-free-disk: 50GB
>>> performance.strict-write-ordering: off
>>> performance.strict-o-direct: off
>>> nfs.disable: off
>>> performance.readdir-ahead: on
>>> transport.address-family: inet
>>> performance.cache-size: 1GB
>>> features.shard: on
>>> features.shard-block-size: 5GB
>>> server.event-threads: 8
>>> server.outstanding-rpc-limit: 128
>>> storage.owner-uid: 36
>>> storage.owner-gid: 36
>>> performance.quick-read: off
>>> performance.read-ahead: off
>>> performance.io-cache: off
>>> performance.stat-prefetch: on
>>> cluster.eager-lock: enable
>>> network.remote-dio: enable
>>> cluster.quorum-type: auto
>>> cluster.server-quorum-type: server
>>> cluster.data-self-heal-algorithm: full
>>> performance.flush-behind: off
>>> performance.write-behind-window-size: 8MB
>>> client.event-threads: 8
>>> server.allow-insecure: on
>>>
>>> Client version:
>>> [root@kvm573 ~]# gluster --version
>>> glusterfs 3.12.5
>>>
>>> Thanks!
>>>
>>> - Ian
>>>
>>> _______________________________________________
>>> Gluster-users mailing list
>>> Gluster-users@gluster.org
>>> http://lists.gluster.org/mailman/listinfo/gluster-users
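Given features.shard-block-size is 5GB on this volume, the shard index in the failing lookup maps directly to a byte range of the VM image: shard N covers offsets [N*block, (N+1)*block). A quick sanity check of that mapping with plain shell arithmetic (assuming the 5GB here is treated as 5 GiB; the 37 GiB offset is just an illustrative value that lands in shard 7, the shard from the log):

```shell
#!/bin/bash
# Which shard holds a given file offset, for a 5 GiB shard-block-size.
block=$((5 * 1024 * 1024 * 1024))      # features.shard-block-size
offset=$((37 * 1024 * 1024 * 1024))    # e.g. a write 37 GiB into the image
shard=$((offset / block))
echo "offset $offset falls in shard $shard"
```

This can help correlate a failing shard number with the guest I/O that touched it.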