Re: [Gluster-users] Upgrade 10.4 -> 11.1 making problems

2024-02-15 Thread Hu Bert
Hello, just to bring this to an end... the servers and the volume are "out of service", so I tried to repair:
- umounted all related mounts
- rebooted the misbehaving server
- mounted the volume on all clients
Well, no healing happens. 'gluster volume status workdata clients' looks good btw. gluster
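
For reference, a minimal sketch of forcing and checking a heal at this point - volume name taken from the thread:

gluster volume heal workdata full            # force a full crawl when the index heal starts nothing
gluster volume heal workdata info summary    # pending / split-brain counts per brick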

Re: [Gluster-users] Upgrade 10.4 -> 11.1 making problems

2024-01-31 Thread Strahil Nikolov
Hi, this is a simplified description, see the links below for a more detailed one. When a client makes a change to a file, it commits that change to all bricks simultaneously, and if the change passes on a quorate number of bricks (in your case 2 out of 3 is enough) it is treated as
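
A sketch of inspecting the quorum options this refers to (stock gluster option names; volume name assumed from the thread):

gluster volume get workdata cluster.quorum-type    # 'auto' requires a majority of the replica set
gluster volume get workdata cluster.quorum-count   # explicit count, only used with quorum-type 'fixed'

With replica 3 and quorum-type 'auto', a write that reaches 2 of 3 bricks succeeds and the lagging brick is marked for healing.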

Re: [Gluster-users] Upgrade 10.4 -> 11.1 making problems

2024-01-30 Thread Hu Bert
Hi Strahil, hm, not sure what the clients have to do with the situation. "gluster volume status workdata clients" - lists all clients with their IP addresses. "gluster peer status" and "gluster volume status" are ok, the latter one says that all bricks are online, have a port etc. The network is

Re: [Gluster-users] Upgrade 10.4 -> 11.1 making problems

2024-01-29 Thread Strahil Nikolov
This is your problem: the bad server has only 3 clients. I remember there is another gluster volume command to list the IPs of the clients. Find it and run it to find which clients are actually OK (those 3) and which of the remaining 17 are not. Then try to remount those 17 clients and if the situation
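
A sketch of remounting one affected client - the mount point and server name are assumptions based on other messages in the thread:

umount /mnt/workdata
mount -t glusterfs glusterpub1:/workdata /mnt/workdata
grep workdata /proc/mounts    # verify the fuse mount is back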

Re: [Gluster-users] Upgrade 10.4 -> 11.1 making problems

2024-01-29 Thread Hu Bert
Hi, not sure what you mean by "clients" - do you mean the clients that mount the volume?

gluster volume status workdata clients
--
Brick : glusterpub2:/gluster/md3/workdata
Clients connected : 20
Hostname

Re: [Gluster-users] Upgrade 10.4 -> 11.1 making problems

2024-01-28 Thread Strahil Nikolov
2800 is too much. Most probably you are affected by a bug. How old are the clients? Is only 1 server affected? Have you checked whether a client is unable to update all 3 copies? If it's only 1 system, you can remove the brick, reinitialize it and then bring it back for a full sync. Best
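
A hedged sketch of that remove-and-reinitialize cycle using reset-brick - the host name and brick path here are assumptions, and the wipe step is destructive:

gluster volume reset-brick workdata glusterpub3:/gluster/md3/workdata start
# ...reformat or empty the brick filesystem on the affected server (assumed step)...
gluster volume reset-brick workdata glusterpub3:/gluster/md3/workdata glusterpub3:/gluster/md3/workdata commit force
gluster volume heal workdata full    # then let a full heal repopulate the brick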

Re: [Gluster-users] Upgrade 10.4 -> 11.1 making problems

2024-01-28 Thread Hu Bert
Morning, a few bad apples - but which ones? I checked glustershd.log on the "bad" server and counted today's "gfid mismatch" entries (2800 in total): 44 /212>, 44 /174>, 44 /94037803>, 44 /94066216>, 44 /249771609>, 44 /64235523>, 44 /185>, etc. But as I said, these are
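
A sketch of the kind of one-liner behind such a count (log path is the usual Debian default; the exact message text may differ between versions):

grep -c 'gfid mismatch' /var/log/glusterfs/glustershd.log
# grouping by entry requires stripping the timestamp first, e.g.:
grep 'gfid mismatch' /var/log/glusterfs/glustershd.log | cut -d' ' -f5- | sort | uniq -c | sort -rn | head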

Re: [Gluster-users] Upgrade 10.4 -> 11.1 making problems

2024-01-28 Thread Hu Bert
Hi Strahil, there's no arbiter: 3 servers with 5 bricks each.

Volume Name: workdata
Type: Distributed-Replicate
Volume ID: 7d1e23e5-0308-4443-a832-d36f85ff7959
Status: Started
Snapshot Count: 0
Number of Bricks: 5 x 3 = 15

The "problem" is: the number of files/entries to-be-healed has
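
To watch whether that to-be-healed number keeps growing, the standard per-brick counter can be polled, e.g.:

gluster volume heal workdata statistics heal-count
watch -n 60 'gluster volume heal workdata statistics heal-count'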

Re: [Gluster-users] Upgrade 10.4 -> 11.1 making problems

2024-01-27 Thread Strahil Nikolov
What about the arbiter node? Actually, check on all nodes and script it - you might need it in the future. The simplest way to resolve it is to make the file disappear (rename it to something else and then rename it back). Another easy trick is to read the whole file: dd if=file of=/dev/null
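
Both tricks from the client-side mount, as a sketch (the file name is purely illustrative):

# trick 1: rename away and back, so the entry is re-looked-up
mv /mnt/workdata/path/file /mnt/workdata/path/file.tmp && mv /mnt/workdata/path/file.tmp /mnt/workdata/path/file
# trick 2: read the whole file, so the access triggers a heal
dd if=/mnt/workdata/path/file of=/dev/null bs=1M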

Re: [Gluster-users] Upgrade 10.4 -> 11.1 making problems

2024-01-26 Thread Hu Bert
Morning,

gfid 1:
getfattr -d -e hex -m. /gluster/md{3,4,5,6,7}/workdata/.glusterfs/fa/f5/faf59566-10f5-4ddd-8b0c-a87bc6a334fb

glusterpub1 (good one):
getfattr: Removing leading '/' from absolute path names
# file: gluster/md6/workdata/.glusterfs/fa/f5/faf59566-10f5-4ddd-8b0c-a87bc6a334fb

Re: [Gluster-users] Upgrade 10.4 -> 11.1 making problems

2024-01-26 Thread Strahil Nikolov
You don't need to mount it. Like this:

# getfattr -d -e hex -m. /path/to/brick/.glusterfs/00/46/00462be8-3e61-4931-8bda-dae1645c639e
# file: 00/46/00462be8-3e61-4931-8bda-dae1645c639e
trusted.gfid=0x00462be83e6149318bdadae1645c639e

Re: [Gluster-users] Upgrade 10.4 -> 11.1 making problems

2024-01-24 Thread Hu Bert
Good morning, hope I got it right... using:
https://access.redhat.com/documentation/de-de/red_hat_gluster_storage/3.1/html/administration_guide/ch27s02

mount -t glusterfs -o aux-gfid-mount glusterpub1:/workdata /mnt/workdata

gfid 1:
getfattr -n trusted.glusterfs.pathinfo -e text
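
Spelled out, the gfid-to-path lookup from that guide looks roughly like this (gfid reused from earlier in the thread; the virtual .gfid/ tree exists only on an aux-gfid-mount):

getfattr -n trusted.glusterfs.pathinfo -e text /mnt/workdata/.gfid/faf59566-10f5-4ddd-8b0c-a87bc6a334fb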

Re: [Gluster-users] Upgrade 10.4 -> 11.1 making problems

2024-01-24 Thread Strahil Nikolov
Hi, can you find and check the files with these gfids:
60465723-5dc0-4ebe-aced-9f2c12e52642
faf59566-10f5-4ddd-8b0c-a87bc6a334fb

Use the 'getfattr -d -e hex -m. ' command from https://docs.gluster.org/en/main/Troubleshooting/resolving-splitbrain/#analysis-of-the-output .

Best Regards,
Strahil Nikolov
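
On a brick, each gfid is reachable under .glusterfs via its first two byte-pairs, so the check would look something like this (brick path assumed from other messages in the thread):

getfattr -d -e hex -m. /gluster/md3/workdata/.glusterfs/60/46/60465723-5dc0-4ebe-aced-9f2c12e52642
getfattr -d -e hex -m. /gluster/md3/workdata/.glusterfs/fa/f5/faf59566-10f5-4ddd-8b0c-a87bc6a334fb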

Re: [Gluster-users] Upgrade 10.4 -> 11.1 making problems

2024-01-19 Thread Hu Bert
Good morning, thx Gilberto, did the first three (set to WARNING), but the last one doesn't work. Anyway, after setting these three, some new messages appear:

[2024-01-20 07:23:58.561106 +] W [MSGID: 114061] [client-common.c:796:client_pre_lk_v2] 0-workdata-client-11: remote_fd is -1. EBADFD

Re: [Gluster-users] Upgrade 10.4 -> 11.1 making problems

2024-01-19 Thread Gilberto Ferreira
gluster volume set testvol diagnostics.brick-log-level WARNING
gluster volume set testvol diagnostics.brick-sys-log-level WARNING
gluster volume set testvol diagnostics.client-log-level ERROR
gluster --log-level=ERROR volume status

---
Gilberto Nunes Ferreira

On Fri, Jan 19, 2024 at
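
Once debugging is done, these can presumably be reverted to their defaults with the standard reset:

gluster volume reset testvol diagnostics.brick-log-level
gluster volume reset testvol diagnostics.brick-sys-log-level
gluster volume reset testvol diagnostics.client-log-level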

Re: [Gluster-users] Upgrade 10.4 -> 11.1 making problems

2024-01-19 Thread Hu Bert
Hi Strahil, hm, don't get me wrong, it may sound a bit stupid, but... where do I set the log level? Using Debian...
https://access.redhat.com/documentation/de-de/red_hat_gluster_storage/3/html/administration_guide/configuring_the_log_level

ls /etc/glusterfs/
eventsconfig.json

Re: [Gluster-users] Upgrade 10.4 -> 11.1 making problems

2024-01-18 Thread Diego Zuccato
I don't want to hijack the thread. And in my case setting logs to debug would fill my /var partitions in no time. Maybe the OP can. Diego
On 18/01/2024 22:58, Strahil Nikolov wrote: Are you able to set the logs to debug level? It might provide a clue as to what is going on. Best Regards,

Re: [Gluster-users] Upgrade 10.4 -> 11.1 making problems

2024-01-18 Thread Strahil Nikolov
Are you able to set the logs to debug level? It might provide a clue as to what is going on. Best Regards, Strahil Nikolov
On Thu, Jan 18, 2024 at 13:08, Diego Zuccato wrote: That's the same kind of errors I keep seeing on my 2 clusters, regenerated some months ago. Seems a

Re: [Gluster-users] Upgrade 10.4 -> 11.1 making problems

2024-01-18 Thread Hu Bert
Thx for your answer. We don't have that much data (but 33 TB anyway), but millions of files in total, on normal SATA disks. So copying stuff away and back, with downtime maybe, is not manageable. The good thing is: the data can be re-calculated, as it is derived from source data. But one needs

Re: [Gluster-users] Upgrade 10.4 -> 11.1 making problems

2024-01-18 Thread Diego Zuccato
Since glusterd does not consider it a split brain, you can't solve it with the standard split-brain tools. I've found no way to resolve it except by manually handling one file at a time: completely unmanageable with thousands of files, and having to juggle between the actual path on the brick and the metadata
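
That path-to-metadata juggling can be partly scripted; one common approach (a sketch, run on a brick, path assumed from the thread) resolves a gfid entry back to its real file name via the shared inode:

# the .glusterfs entry and the real file are hardlinks to the same inode (regular files only)
find /gluster/md3/workdata -samefile \
  /gluster/md3/workdata/.glusterfs/fa/f5/faf59566-10f5-4ddd-8b0c-a87bc6a334fb \
  -not -path '*/.glusterfs/*'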

Re: [Gluster-users] Upgrade 10.4 -> 11.1 making problems

2024-01-18 Thread Hu Bert
Were you able to solve the problem? Can it be treated like a "normal" split brain? 'gluster peer status' and 'gluster volume status' are ok, so it kinda looks like "pseudo"... hubert
On Thu, Jan 18, 2024 at 08:28, Diego Zuccato wrote: > > That's the same kind of errors I keep seeing on my 2

Re: [Gluster-users] Upgrade 10.4 -> 11.1 making problems

2024-01-17 Thread Diego Zuccato
That's the same kind of errors I keep seeing on my 2 clusters, regenerated some months ago. It seems a pseudo-split-brain that should be impossible on a replica 3 cluster but keeps happening. Sadly going to ditch Gluster ASAP. Diego
On 18/01/2024 07:11, Hu Bert wrote: Good morning, heal

Re: [Gluster-users] Upgrade 10.4 -> 11.1 making problems

2024-01-17 Thread Hu Bert
Good morning, heal is still not running. Pending heals now sum up to 60K per brick. Heal used to start instantly, e.g. after a server reboot, with version 10.4, but doesn't with version 11. What could be wrong? I only see these errors on one of the "good" servers in glustershd.log: [2024-01-18
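
When heals refuse to start, two things worth checking, as a sketch (standard CLI; volume name from the thread, and 'start force' is only assumed here as a way to respawn missing daemons):

gluster volume status workdata shd                     # is the self-heal daemon up on every node?
gluster volume get workdata cluster.self-heal-daemon   # should be enabled
gluster volume start workdata force                    # respawns missing shd/brick processes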

Re: [Gluster-users] Upgrade 10.4 -> 11.1 making problems

2024-01-17 Thread Hu Bert
hm, I only see such messages in glustershd.log on the 2 good servers:

[2024-01-17 12:18:48.912952 +] W [MSGID: 114031] [client-rpc-fops_v2.c:2561:client4_0_lookup_cbk] 0-workdata-client-6: remote operation failed. [{path=}, {gfid=ee28b56c-e352-48f8-bbb5-dbf31babe073}, {errno=2}, {error=No

Re: [Gluster-users] Upgrade 10.4 -> 11.1 making problems

2024-01-17 Thread Hu Bert
ok, finally managed to get all servers, volumes etc. running, but it took a couple of restarts, cksum checks etc. One problem: a volume doesn't heal automatically or doesn't heal at all.

gluster volume status
Status of volume: workdata
Gluster process TCP Port RDMA Port
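
A sketch of nudging a volume that won't heal on its own ('enable' is a no-op if self-heal was never turned off):

gluster volume heal workdata enable   # make sure self-heal is not disabled
gluster volume heal workdata          # kick off an index heal
gluster volume heal workdata info     # list entries still pending per brick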

Re: [Gluster-users] Upgrade 10.4 -> 11.1 making problems

2024-01-16 Thread Gilberto Ferreira
Ah! Indeed! You need to perform an upgrade on the clients as well.
On Tue, Jan 16, 2024 at 03:12, Hu Bert wrote: > morning to those still reading :-) > > i found this: >

Re: [Gluster-users] Upgrade 10.4 -> 11.1 making problems

2024-01-15 Thread Hu Bert
morning to those still reading :-) I found this:
https://docs.gluster.org/en/main/Troubleshooting/troubleshooting-glusterd/#common-issues-and-how-to-resolve-them

There's a paragraph about "peer rejected" with the same error message, telling me: "Update the cluster.op-version" - I had only
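
The op-version check-and-bump behind that advice, as a sketch (the target value must match the installed release; 110000 for gluster 11 is an assumption - use the reported max):

gluster volume get all cluster.op-version          # current cluster-wide value
gluster volume get all cluster.max-op-version      # highest value the installed binaries support
gluster volume set all cluster.op-version 110000   # assumed value for gluster 11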

Re: [Gluster-users] Upgrade 10.4 -> 11.1 making problems

2024-01-15 Thread Hu Bert
just downgraded one node to 10.4, did a reboot - same result: cksum error. I'm able to bring it back in again, but that error persists when downgrading all servers...
On Mon, Jan 15, 2024 at 09:16, Hu Bert wrote: > > Hi, > just upgraded some gluster servers from version 10.4 to version