I've seen issues with symlinks failing to heal as well, and I never found a good solution on the glusterfs side of things. The most reliable fix I found is to simply rm and recreate the symlink on the fuse mount itself. Also, I'd strongly suggest heavy load testing before upgrading to 10.3 in production: after upgrading from 9.5 to 10.3 I saw frequent brick process (glusterfsd) crashes, whereas 9.5 was quite stable.
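For reference, here's roughly what that workaround looks like. This is a minimal sketch, assuming the volume is fuse-mounted at /mnt/cwsvol01 (the mount point is hypothetical) and that the pending entries show up as plain paths in 'gluster vol heal <vol> info' output; run the heal query on a server node and the rm/ln from any fuse client:

# On a gluster server: list entries still pending heal (paths only, skip gfid lines)
gluster vol heal cwsvol01 info | grep '^/' | sort -u > /tmp/pending

# On a fuse client (/mnt/cwsvol01 is an assumed mount point):
while read -r p; do
    if [ -L "/mnt/cwsvol01$p" ]; then
        target=$(readlink "/mnt/cwsvol01$p")   # target is still readable via the good brick
        rm "/mnt/cwsvol01$p"                   # remove the un-healable link through fuse
        ln -s "$target" "/mnt/cwsvol01$p"      # recreate it; the new entry replicates to all bricks
    fi
done < /tmp/pending

Re-running 'gluster vol heal cwsvol01 info summary' afterwards should show those entries drop off. The important part is doing the rm/ln through the fuse mount, never directly on a brick, or you'll just create a new mismatch for the self-heal daemon to trip over.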
On Mon, Jan 23, 2023 at 3:58 PM Matt Rubright <mrubr...@uncc.edu> wrote:
>
> Hi friends,
>
> I have recently built a new replica 3 arbiter 1 volume on 10.3 servers and
> have been putting it through its paces before getting it ready for
> production use. The volume will ultimately contain about 200G of web
> content files shared among multiple frontends. Each will use the gluster
> fuse client to connect.
>
> What I am experiencing sounds very much like this post from 9 years ago:
> https://lists.gnu.org/archive/html/gluster-devel/2013-12/msg00103.html
>
> In short, if I perform these steps I can reliably end up with symlinks on
> the volume which will not heal either by initiating a 'full heal' from the
> cluster or using a fuse client to read each file:
>
> 1) Verify that all nodes are healthy, the volume is healthy, and there are
>    no items needing to be healed
> 2) Cleanly shut down one server hosting a brick
> 3) Copy data, including some symlinks, from a fuse client to the volume
> 4) Bring the brick back online and observe the number and type of items
>    needing to be healed
> 5) Initiate a full heal from one of the nodes
> 6) Confirm that while files and directories are healed, symlinks are not
>
> Please help me determine if I have improper expectations here. I have some
> basic knowledge of managing gluster volumes, but I may be misunderstanding
> intended behavior.
>
> Here is the volume info and heal data at each step of the way:
>
> *** Verify that all nodes are healthy, the volume is healthy, and there
> are no items needing to be healed ***
>
> # gluster vol info cwsvol01
>
> Volume Name: cwsvol01
> Type: Replicate
> Volume ID: 7b28e6e6-4a73-41b7-83fe-863a45fd27fc
> Status: Started
> Snapshot Count: 0
> Number of Bricks: 1 x (2 + 1) = 3
> Transport-type: tcp
> Bricks:
> Brick1: glfs02-172-20-1:/data/brick01/cwsvol01
> Brick2: glfs01-172-20-1:/data/brick01/cwsvol01
> Brick3: glfsarb01-172-20-1:/data/arb01/cwsvol01 (arbiter)
> Options Reconfigured:
> performance.client-io-threads: off
> nfs.disable: on
> transport.address-family: inet
> storage.fips-mode-rchecksum: on
> cluster.granular-entry-heal: on
>
> # gluster vol status
> Status of volume: cwsvol01
> Gluster process                                TCP Port  RDMA Port  Online  Pid
> ------------------------------------------------------------------------------
> Brick glfs02-172-20-1:/data/brick01/cwsvol01   50253     0          Y       1397
> Brick glfs01-172-20-1:/data/brick01/cwsvol01   56111     0          Y       1089
> Brick glfsarb01-172-20-1:/data/arb01/cwsvol01  54517     0          Y       118704
> Self-heal Daemon on localhost                  N/A       N/A        Y       1413
> Self-heal Daemon on glfs01-172-20-1            N/A       N/A        Y       3490
> Self-heal Daemon on glfsarb01-172-20-1         N/A       N/A        Y       118720
>
> Task Status of Volume cwsvol01
> ------------------------------------------------------------------------------
> There are no active volume tasks
>
> # gluster vol heal cwsvol01 info summary
> Brick glfs02-172-20-1:/data/brick01/cwsvol01
> Status: Connected
> Total Number of entries: 0
> Number of entries in heal pending: 0
> Number of entries in split-brain: 0
> Number of entries possibly healing: 0
>
> Brick glfs01-172-20-1:/data/brick01/cwsvol01
> Status: Connected
> Total Number of entries: 0
> Number of entries in heal pending: 0
> Number of entries in split-brain: 0
> Number of entries possibly healing: 0
>
> Brick glfsarb01-172-20-1:/data/arb01/cwsvol01
> Status: Connected
> Total Number of entries: 0
> Number of entries in heal pending: 0
> Number of entries in split-brain: 0
> Number of entries possibly healing: 0
>
> *** Cleanly shut down one server hosting a brick ***
>
> *** Copy data, including some symlinks, from a fuse client to the volume ***
>
> # gluster vol heal cwsvol01 info summary
> Brick glfs02-172-20-1:/data/brick01/cwsvol01
> Status: Transport endpoint is not connected
> Total Number of entries: -
> Number of entries in heal pending: -
> Number of entries in split-brain: -
> Number of entries possibly healing: -
>
> Brick glfs01-172-20-1:/data/brick01/cwsvol01
> Status: Connected
> Total Number of entries: 810
> Number of entries in heal pending: 810
> Number of entries in split-brain: 0
> Number of entries possibly healing: 0
>
> Brick glfsarb01-172-20-1:/data/arb01/cwsvol01
> Status: Connected
> Total Number of entries: 810
> Number of entries in heal pending: 810
> Number of entries in split-brain: 0
> Number of entries possibly healing: 0
>
> *** Bring the brick back online and observe the number and type of entries
> needing to be healed ***
>
> # gluster vol heal cwsvol01 info summary
> Brick glfs02-172-20-1:/data/brick01/cwsvol01
> Status: Connected
> Total Number of entries: 0
> Number of entries in heal pending: 0
> Number of entries in split-brain: 0
> Number of entries possibly healing: 0
>
> Brick glfs01-172-20-1:/data/brick01/cwsvol01
> Status: Connected
> Total Number of entries: 769
> Number of entries in heal pending: 769
> Number of entries in split-brain: 0
> Number of entries possibly healing: 0
>
> Brick glfsarb01-172-20-1:/data/arb01/cwsvol01
> Status: Connected
> Total Number of entries: 769
> Number of entries in heal pending: 769
> Number of entries in split-brain: 0
> Number of entries possibly healing: 0
>
> *** Initiate a full heal from one of the nodes ***
>
> # gluster vol heal cwsvol01 info summary
> Brick glfs02-172-20-1:/data/brick01/cwsvol01
> Status: Connected
> Total Number of entries: 0
> Number of entries in heal pending: 0
> Number of entries in split-brain: 0
> Number of entries possibly healing: 0
>
> Brick glfs01-172-20-1:/data/brick01/cwsvol01
> Status: Connected
> Total Number of entries: 148
> Number of entries in heal pending: 148
> Number of entries in split-brain: 0
> Number of entries possibly healing: 0
>
> Brick glfsarb01-172-20-1:/data/arb01/cwsvol01
> Status: Connected
> Total Number of entries: 148
> Number of entries in heal pending: 148
> Number of entries in split-brain: 0
> Number of entries possibly healing: 0
>
> # gluster vol heal cwsvol01 info
> Brick glfs02-172-20-1:/data/brick01/cwsvol01
> Status: Connected
> Number of entries: 0
>
> Brick glfs01-172-20-1:/data/brick01/cwsvol01
> /web01-etc
> /web01-etc/nsswitch.conf
> /web01-etc/swid/swidtags.d
> /web01-etc/swid/swidtags.d/redhat.com
> /web01-etc/os-release
> /web01-etc/system-release
> < truncated >
>
> *** Verify that one brick contains the symlink while the previously-offline
> one does not ***
>
> [root@cws-glfs01 ~]# ls -ld /data/brick01/cwsvol01/web01-etc/nsswitch.conf
> lrwxrwxrwx 2 root root 29 Jan 4 16:00
> /data/brick01/cwsvol01/web01-etc/nsswitch.conf ->
> /etc/authselect/nsswitch.conf
>
> [root@cws-glfs02 ~]# ls -ld /data/brick01/cwsvol01/web01-etc/nsswitch.conf
> ls: cannot access '/data/brick01/cwsvol01/web01-etc/nsswitch.conf': No such
> file or directory
>
> *** Note entries in /var/log/gluster/glustershd.log ***
>
> [2023-01-23 20:34:40.939904 +0000] W [MSGID: 114031]
> [client-rpc-fops_v2.c:2457:client4_0_link_cbk] 0-cwsvol01-client-1: remote
> operation failed. [{source=<gfid:3cade471-8aba-492a-b981-d63330d2e02e>},
> {target=(null)}, {errno=116}, {error=Stale file handle}]
> [2023-01-23 20:34:40.945774 +0000] W [MSGID: 114031]
> [client-rpc-fops_v2.c:2457:client4_0_link_cbk] 0-cwsvol01-client-1: remote
> operation failed. [{source=<gfid:35102340-9409-4d88-a391-da43c00644e7>},
> {target=(null)}, {errno=116}, {error=Stale file handle}]
> [2023-01-23 20:34:40.749715 +0000] W [MSGID: 114031]
> [client-rpc-fops_v2.c:2457:client4_0_link_cbk] 0-cwsvol01-client-1: remote
> operation failed. [{source=<gfid:874406a9-9478-4b83-9e6a-09e262e4b85d>},
> {target=(null)}, {errno=116}, {error=Stale file handle}]

________

Community Meeting Calendar:

Schedule -
Every 2nd and 4th Tuesday at 14:30 IST / 09:00 UTC
Bridge: https://meet.google.com/cpu-eiue-hvk
Gluster-users mailing list
Gluster-users@gluster.org
https://lists.gluster.org/mailman/listinfo/gluster-users