Hi Eli,

Thanks for the response. I had hoped for a simple fix here, but I think perhaps there isn't one. I have built this as part of a new environment that will eventually replace a much older system built on Gluster 3.10 (yes - that old). I appreciate the warning about 10.3 and will run comparative load testing against both it and 9.5.
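For the load testing, my rough plan is to run the same fio job against a fuse mount of a 9.5 volume and of a 10.3 volume while watching glusterfsd on the bricks for the crashes you describe. Something along these lines, where the mount point and sizing are just placeholder values from my lab:

# mkdir -p /mnt/cwsvol01/fiotest
# fio --name=webload --directory=/mnt/cwsvol01/fiotest --rw=randrw --bs=4k --size=512m --numjobs=4 --runtime=120 --time_based --group_reporting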
- Matt

On Fri, Feb 24, 2023 at 8:46 AM Eli V <eliven...@gmail.com> wrote:

> I've seen issues with symlinks failing to heal as well. I never found a
> good solution on the glusterfs side of things. The most reliable fix I
> found is just to rm and recreate the symlink in the fuse volume itself.
> Also, I'd strongly suggest heavy load testing before upgrading to 10.3
> in production; after upgrading from 9.5 -> 10.3 I've seen frequent
> brick process crashes (glusterfsd), whereas 9.5 was quite stable.
>
> On Mon, Jan 23, 2023 at 3:58 PM Matt Rubright <mrubr...@uncc.edu> wrote:
> >
> > Hi friends,
> >
> > I have recently built a new replica 3 arbiter 1 volume on 10.3 servers
> > and have been putting it through its paces before getting it ready for
> > production use. The volume will ultimately contain about 200G of web
> > content files shared among multiple frontends. Each will use the
> > gluster fuse client to connect.
> >
> > What I am experiencing sounds very much like this post from 9 years ago:
> > https://lists.gnu.org/archive/html/gluster-devel/2013-12/msg00103.html
> >
> > In short, if I perform these steps I can reliably end up with symlinks
> > on the volume which will not heal, either by initiating a 'full heal'
> > from the cluster or by using a fuse client to read each file:
> >
> > 1) Verify that all nodes are healthy, the volume is healthy, and there
> >    are no items needing to be healed
> > 2) Cleanly shut down one server hosting a brick
> > 3) Copy data, including some symlinks, from a fuse client to the volume
> > 4) Bring the brick back online and observe the number and type of items
> >    needing to be healed
> > 5) Initiate a full heal from one of the nodes
> > 6) Confirm that while files and directories are healed, symlinks are not
> >
> > Please help me determine if I have improper expectations here. I have
> > some basic knowledge of managing gluster volumes, but I may be
> > misunderstanding intended behavior.
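For the archives, the reproduction in my message above boils down to roughly the commands below, in order. The hostname and source path are from my lab, and "cleanly shut down" is simply a poweroff of one brick server:

Step 1: # gluster vol heal cwsvol01 info summary   (confirm zero pending entries)
Step 2: # ssh glfs02-172-20-1 poweroff
Step 3 (on a fuse client): # cp -a /srv/web01-etc /mnt/cwsvol01/
Step 4: power the server back on, then # gluster vol heal cwsvol01 info summary
Step 5: # gluster vol heal cwsvol01 full
Step 6: # gluster vol heal cwsvol01 info   (files and directories clear; symlinks remain)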
> > Here is the volume info and heal data at each step of the way:
> >
> > *** Verify that all nodes are healthy, the volume is healthy, and there
> > are no items needing to be healed ***
> >
> > # gluster vol info cwsvol01
> >
> > Volume Name: cwsvol01
> > Type: Replicate
> > Volume ID: 7b28e6e6-4a73-41b7-83fe-863a45fd27fc
> > Status: Started
> > Snapshot Count: 0
> > Number of Bricks: 1 x (2 + 1) = 3
> > Transport-type: tcp
> > Bricks:
> > Brick1: glfs02-172-20-1:/data/brick01/cwsvol01
> > Brick2: glfs01-172-20-1:/data/brick01/cwsvol01
> > Brick3: glfsarb01-172-20-1:/data/arb01/cwsvol01 (arbiter)
> > Options Reconfigured:
> > performance.client-io-threads: off
> > nfs.disable: on
> > transport.address-family: inet
> > storage.fips-mode-rchecksum: on
> > cluster.granular-entry-heal: on
> >
> > # gluster vol status
> > Status of volume: cwsvol01
> > Gluster process                                TCP Port  RDMA Port  Online  Pid
> > ------------------------------------------------------------------------------
> > Brick glfs02-172-20-1:/data/brick01/cwsvol01   50253     0          Y       1397
> > Brick glfs01-172-20-1:/data/brick01/cwsvol01   56111     0          Y       1089
> > Brick glfsarb01-172-20-1:/data/arb01/cwsvol01  54517     0          Y       118704
> > Self-heal Daemon on localhost                  N/A       N/A        Y       1413
> > Self-heal Daemon on glfs01-172-20-1            N/A       N/A        Y       3490
> > Self-heal Daemon on glfsarb01-172-20-1         N/A       N/A        Y       118720
> >
> > Task Status of Volume cwsvol01
> > ------------------------------------------------------------------------------
> > There are no active volume tasks
> >
> > # gluster vol heal cwsvol01 info summary
> > Brick glfs02-172-20-1:/data/brick01/cwsvol01
> > Status: Connected
> > Total Number of entries: 0
> > Number of entries in heal pending: 0
> > Number of entries in split-brain: 0
> > Number of entries possibly healing: 0
> >
> > Brick glfs01-172-20-1:/data/brick01/cwsvol01
> > Status: Connected
> > Total Number of entries: 0
> > Number of entries in heal pending: 0
> > Number of entries in split-brain: 0
> > Number of entries possibly healing: 0
> >
> > Brick glfsarb01-172-20-1:/data/arb01/cwsvol01
> > Status: Connected
> > Total Number of entries: 0
> > Number of entries in heal pending: 0
> > Number of entries in split-brain: 0
> > Number of entries possibly healing: 0
> >
> > *** Cleanly shut down one server hosting a brick ***
> >
> > *** Copy data, including some symlinks, from a fuse client to the volume ***
> >
> > # gluster vol heal cwsvol01 info summary
> > Brick glfs02-172-20-1:/data/brick01/cwsvol01
> > Status: Transport endpoint is not connected
> > Total Number of entries: -
> > Number of entries in heal pending: -
> > Number of entries in split-brain: -
> > Number of entries possibly healing: -
> >
> > Brick glfs01-172-20-1:/data/brick01/cwsvol01
> > Status: Connected
> > Total Number of entries: 810
> > Number of entries in heal pending: 810
> > Number of entries in split-brain: 0
> > Number of entries possibly healing: 0
> >
> > Brick glfsarb01-172-20-1:/data/arb01/cwsvol01
> > Status: Connected
> > Total Number of entries: 810
> > Number of entries in heal pending: 810
> > Number of entries in split-brain: 0
> > Number of entries possibly healing: 0
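While the brick was down I also spot-checked the AFR changelog xattrs on the surviving data brick, where (as I understand it) the pending-heal accounting is kept in the trusted.afr.cwsvol01-client-* keys; the directory path here is just one example from my copy:

# getfattr -d -m . -e hex /data/brick01/cwsvol01/web01-etc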
> > *** Bring the brick back online and observe the number and type of
> > entities needing to be healed ***
> >
> > # gluster vol heal cwsvol01 info summary
> > Brick glfs02-172-20-1:/data/brick01/cwsvol01
> > Status: Connected
> > Total Number of entries: 0
> > Number of entries in heal pending: 0
> > Number of entries in split-brain: 0
> > Number of entries possibly healing: 0
> >
> > Brick glfs01-172-20-1:/data/brick01/cwsvol01
> > Status: Connected
> > Total Number of entries: 769
> > Number of entries in heal pending: 769
> > Number of entries in split-brain: 0
> > Number of entries possibly healing: 0
> >
> > Brick glfsarb01-172-20-1:/data/arb01/cwsvol01
> > Status: Connected
> > Total Number of entries: 769
> > Number of entries in heal pending: 769
> > Number of entries in split-brain: 0
> > Number of entries possibly healing: 0
> >
> > *** Initiate a full heal from one of the nodes ***
> >
> > # gluster vol heal cwsvol01 info summary
> > Brick glfs02-172-20-1:/data/brick01/cwsvol01
> > Status: Connected
> > Total Number of entries: 0
> > Number of entries in heal pending: 0
> > Number of entries in split-brain: 0
> > Number of entries possibly healing: 0
> >
> > Brick glfs01-172-20-1:/data/brick01/cwsvol01
> > Status: Connected
> > Total Number of entries: 148
> > Number of entries in heal pending: 148
> > Number of entries in split-brain: 0
> > Number of entries possibly healing: 0
> >
> > Brick glfsarb01-172-20-1:/data/arb01/cwsvol01
> > Status: Connected
> > Total Number of entries: 148
> > Number of entries in heal pending: 148
> > Number of entries in split-brain: 0
> > Number of entries possibly healing: 0
> >
> > # gluster vol heal cwsvol01 info
> > Brick glfs02-172-20-1:/data/brick01/cwsvol01
> > Status: Connected
> > Number of entries: 0
> >
> > Brick glfs01-172-20-1:/data/brick01/cwsvol01
> > /web01-etc
> > /web01-etc/nsswitch.conf
> > /web01-etc/swid/swidtags.d
> > /web01-etc/swid/swidtags.d/redhat.com
> > /web01-etc/os-release
> > /web01-etc/system-release
> > < truncated >
> >
> > *** Verify that one brick contains the symlink while the
> > previously-offline one does not ***
> >
> > [root@cws-glfs01 ~]# ls -ld /data/brick01/cwsvol01/web01-etc/nsswitch.conf
> > lrwxrwxrwx 2 root root 29 Jan 4 16:00 /data/brick01/cwsvol01/web01-etc/nsswitch.conf -> /etc/authselect/nsswitch.conf
> >
> > [root@cws-glfs02 ~]# ls -ld /data/brick01/cwsvol01/web01-etc/nsswitch.conf
> > ls: cannot access '/data/brick01/cwsvol01/web01-etc/nsswitch.conf': No such file or directory
> >
> > *** Note entries in /var/log/glusterfs/glustershd.log ***
> >
> > [2023-01-23 20:34:40.939904 +0000] W [MSGID: 114031]
> > [client-rpc-fops_v2.c:2457:client4_0_link_cbk] 0-cwsvol01-client-1:
> > remote operation failed. [{source=<gfid:3cade471-8aba-492a-b981-d63330d2e02e>},
> > {target=(null)}, {errno=116}, {error=Stale file handle}]
> > [2023-01-23 20:34:40.945774 +0000] W [MSGID: 114031]
> > [client-rpc-fops_v2.c:2457:client4_0_link_cbk] 0-cwsvol01-client-1:
> > remote operation failed. [{source=<gfid:35102340-9409-4d88-a391-da43c00644e7>},
> > {target=(null)}, {errno=116}, {error=Stale file handle}]
> > [2023-01-23 20:34:40.749715 +0000] W [MSGID: 114031]
> > [client-rpc-fops_v2.c:2457:client4_0_link_cbk] 0-cwsvol01-client-1:
> > remote operation failed. [{source=<gfid:874406a9-9478-4b83-9e6a-09e262e4b85d>},
> > {target=(null)}, {errno=116}, {error=Stale file handle}]
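Until the root cause is understood, I'll likely script Eli's rm-and-recreate workaround along these lines: walk the symlinks on the healthy brick (skipping the .glusterfs tree), and for any that are missing from the previously-offline brick, recreate them through the fuse mount. Hostnames and the mount point are from my lab, and filenames containing whitespace would need more careful quoting - treat it as a sketch:

# on cws-glfs01, with the volume fuse-mounted at /mnt/cwsvol01
cd /data/brick01/cwsvol01
find . -path ./.glusterfs -prune -o -type l -print | while read -r link; do
    # -n keeps ssh from consuming the find output on stdin
    if ! ssh -n cws-glfs02 test -L "/data/brick01/cwsvol01/$link"; then
        target=$(readlink "$link")
        # remove and recreate through the fuse mount, per Eli's suggestion
        rm -f "/mnt/cwsvol01/$link"
        ln -s "$target" "/mnt/cwsvol01/$link"
    fi
done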
________

Community Meeting Calendar:

Schedule -
Every 2nd and 4th Tuesday at 14:30 IST / 09:00 UTC
Bridge: https://meet.google.com/cpu-eiue-hvk

Gluster-users mailing list
Gluster-users@gluster.org
https://lists.gluster.org/mailman/listinfo/gluster-users