Hello,
I've run into an issue with Gluster 11.1 and need some assistance. I have a 4+1 
dispersed Gluster setup consisting of 20 nodes and 200 bricks. Until last week 
this was 15 nodes and 150 bricks and it was working flawlessly. We needed more 
space, so we expanded the volume by adding 5 more nodes and 50 bricks.
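
For reference, the expansion was done roughly like this; the hostnames and 
brick paths below are placeholders, and the new bricks were added in whole 
4+1 disperse sets of 5:

  gluster volume add-brick media \
    node16:/data/brick1/media node17:/data/brick1/media \
    node18:/data/brick1/media node19:/data/brick1/media \
    node20:/data/brick1/media
  # ...repeated for the remaining new bricks, one set of 5 at a time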

We added the nodes and triggered a fix-layout. Unknown to us at the time, one 
of the five new nodes had a hardware issue: its CPU cooling fan was bad. This 
caused the node to throttle down to 500 MHz on all cores and eventually shut 
itself down mid fix-layout. Due to how our ISP works, we could only replace the 
entire node, so we did, and then executed a replace-brick command.
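
The fix-layout and the brick replacement were the standard commands, roughly 
as below; the volume name comes from the logs further down, and the brick 
paths are placeholders:

  gluster volume rebalance media fix-layout start
  # after the failed node was swapped out, for each of its bricks:
  gluster volume replace-brick media \
    oldnode:/data/brick1/media newnode:/data/brick1/media commit force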

This is the state we are in at present, and I'm not sure how best to proceed to 
fix the errors and behavior I'm seeing. I don't know whether running another 
fix-layout should be the next step, given that hundreds of objects are stuck in 
a persistent heal state, and that just about any command other than status, 
info, or heal info causes all client mounts to hang for ~5 minutes or bricks to 
start dropping. The client logs show numerous anomalies as well, such as:

[2023-11-10 17:41:52.153423 +0000] W [MSGID: 122040] 
[ec-common.c:1262:ec_prepare_update_cbk] 0-media-disperse-30: Failed to get 
size and version : FOP : 'XATTROP' failed on '/path/to/folder' with gfid 
0d295c94-5577-4445-9e57-6258f24d22c5. Parent FOP: OPENDIR [Input/output error]

[2023-11-10 17:48:46.965415 +0000] E [MSGID: 122038] 
[ec-dir-read.c:398:ec_manager_readdir] 0-media-disperse-36: EC is not winding 
readdir: FOP : 'READDIRP' failed on gfid f8ad28d0-05b4-4df3-91ea-73fabf27712c. 
Parent FOP: No Parent [File descriptor in bad state]

[2023-11-10 17:39:46.076149 +0000] I [MSGID: 109018] 
[dht-common.c:1840:dht_revalidate_cbk] 0-media-dht: Mismatching layouts for 
/path/to/folder2, gfid = f04124e5-63e6-4ddf-9b6b-aa47770f90f2

[2023-11-10 17:39:18.463421 +0000] E [MSGID: 122034] 
[ec-common.c:662:ec_log_insufficient_vol] 0-media-disperse-4: Insufficient 
available children for this request: Have : 0, Need : 4 : Child UP : 11111 
Mask: 00000, Healing : 00000 : FOP : 'XATTROP' failed on 
'/path/to/another/folder with gfid f04124e5-63e6-4ddf-9b6b-aa47770f90f2. Parent 
FOP: SETXATTR

[2023-11-10 17:36:21.565681 +0000] W [MSGID: 122006] 
[ec-combine.c:188:ec_iatt_combine] 0-media-disperse-39: Failed to combine iatt 
(inode: 13324146332441721129-13324146332441721129, links: 2-2, uid: 1000-1000, 
gid: 1000-1001, rdev: 0-0, size: 10-10, mode: 40775-40775), FOP : 'LOOKUP' 
failed on '/path/to/yet/another/folder'. Parent FOP: No Parent

[2023-11-10 17:39:46.147299 +0000] W [MSGID: 114031] 
[client-rpc-fops_v2.c:2563:client4_0_lookup_cbk] 0-media-client-1: remote 
operation failed. [{path=/path/to/folder3}, 
{gfid=00000000-0000-0000-0000-000000000000}, {errno=13}, {error=Permission 
denied}]

[2023-11-10 17:39:46.093069 +0000] W [MSGID: 114061] 
[client-common.c:1232:client_pre_readdirp_v2] 0-media-client-14: remote_fd is 
-1. EBADFD [{gfid=f04124e5-63e6-4ddf-9b6b-aa47770f90f2}, {errno=77}, 
{error=File descriptor in bad state}]

[2023-11-10 17:55:11.407630 +0000] E [MSGID: 122038] 
[ec-dir-read.c:398:ec_manager_readdir] 0-media-disperse-30: EC is not winding 
readdir: FOP : 'READDIRP' failed on gfid 2bba7b7e-7a4b-416a-80f0-dd50caffd2c2. 
Parent FOP: No Parent [File descriptor in bad state]

[2023-11-10 17:39:46.076179 +0000] W [MSGID: 109221] 
[dht-selfheal.c:2023:dht_selfheal_directory] 0-media-dht: Directory selfheal 
failed [{path=/path/to/folder7}, {misc=2}, {unrecoverable-errors}, 
{gfid=f04124e5-63e6-4ddf-9b6b-aa47770f90f2}]

Something about this failed expansion has caused these errors, and I'm not sure 
how to proceed. Right now doing just about anything causes the client mounts to 
hang for up to 5 minutes, including restarting a node, trying to use a volume 
set command, etc. I tried increasing a cache timeout value and ~153 of the 200 
bricks dropped offline. Restarting a node seems to cause the mounts to hang as 
well.
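
For what it's worth, the only operations that seem safe to run at the moment 
are the read-only queries, roughly:

  gluster volume status media
  gluster volume info media
  gluster volume heal media info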

I've tried the following (the commands used are sketched after this list):
- Running a full heal (gluster volume heal <volname> full) - it causes mounts 
  to hang for 3-5 minutes but seems to proceed
- Running ls -alhR against the volume to trigger heals
- Removing the new bricks, which triggers a rebalance that fails almost 
  immediately, with most of the self-heal daemons going offline as well
- Turning off bit-rot to reduce load on the system
- Replacing a brick with a new brick (same drive, new dir.); attempted force 
  as well
- Changing the heal mode from diff to full
- Lowering the parallel heal count to 4
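
Those attempts map roughly to the following commands; the volume name is from 
the logs, the brick hosts/paths are placeholders, the replace-brick was the 
same form as shown earlier, and the exact option names for the heal tuning are 
from memory:

  gluster volume heal media full
  gluster volume remove-brick media \
    node16:/data/brick1/media <...remaining new bricks...> start
  gluster volume bitrot media disable
  gluster volume set media cluster.data-self-heal-algorithm full
  gluster volume set media disperse.shd-max-threads 4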

When I replaced the one brick, the heal count on that brick dropped from ~100 
to ~6; however, those 6 are folders in the root of the volume rather than 
subfolders many layers deep. I suspect this is causing a lot of the issues I'm 
seeing, and I don't know how to resolve it without damaging any of the existing 
data.

I'm hoping it's just due to the fix-layout failing and that it simply needs to 
run again, but I wanted to seek guidance from the group so as not to make 
things worse. I'm not opposed to losing the data already copied to the new 
bricks; I just need to know how to do so without damaging the data on the 
original 150 bricks.

I did notice something else odd as well, which I'm not sure is pertinent, but 
on one of the original 15 nodes, if I go to the /data/brick1/volume dir and do 
an ls -l, the ownership shows 1000:1000, which is how it is on the actual FUSE 
mount as well. If I do the same on one of the new bricks, it shows root:root. I 
didn't alter any of this, again so as not to cause more problems.
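
In case it helps, this is roughly how I'm comparing a directory on an original 
brick against the same directory on a new brick; the paths are placeholders 
and I have not changed anything based on this:

  # on one of the original nodes
  stat -c '%U:%G %a %n' /data/brick1/media/some-directory
  getfattr -d -m . -e hex /data/brick1/media/some-directory

  # on one of the new nodes
  stat -c '%U:%G %a %n' /data/brick1/media/some-directory
  getfattr -d -m . -e hex /data/brick1/media/some-directory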

Thanks in advance for any guidance/help.
-Ed