Hello,
just to bring this to an end... the servers and the volume are "out of
service", so I tried to repair:
- unmounted all related mounts
- rebooted misbehaving server
- mounted volume on all clients
Well, no healing happens. 'gluster volume status workdata clients'
looks good btw.
Hi,
This is a simplified description; see the links below for a more detailed
one. When a client makes a change to a file, it commits that change to all
bricks simultaneously, and if the change succeeds on a quorate number of
bricks (in your case 2 out of 3 is enough) it is treated as
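(For reference, the quorum settings in play can be read back with the stock
volume options; "workdata" is the volume name from this thread:)

gluster volume get workdata cluster.quorum-type
gluster volume get workdata cluster.quorum-count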
Hi Strahil,
hm, not sure what the clients have to do with the situation. "gluster
volume status workdata clients" - lists all clients with their IP
addresses.
"gluster peer status" and "gluster volume status" are ok, the latter
one says that all bricks are online, have a port etc. The network is
This is your problem: the bad server has only 3 clients.
I remember there is another gluster volume command to list the IPs of the
clients. Find it and run it to see which clients are actually OK (those 3) -
the remaining 17 are not.
Then try to remount those 17 clients and if the situation
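(One way to compare per brick is to filter the status output quoted later
in this thread; the grep pattern matches the lines it prints:)

gluster volume status workdata clients | grep -E 'Brick|Clients connected'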
Hi,
not sure what you mean by "clients" - do you mean the clients that
mount the volume?
gluster volume status workdata clients
--
Brick : glusterpub2:/gluster/md3/workdata
Clients connected : 20
Hostname
2800 is too much. Most probably you are affected by a bug. How old are the
clients? Is only 1 server affected? Have you checked if a client is not
allowed to update all 3 copies?
If it's only 1 system, you can remove the brick, reinitialize it and then bring
it back for a full sync.
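(If you go that route, the sequence is roughly the following sketch;
"badserver" and the brick path are placeholders, so check the reset-brick
documentation before running anything:)

gluster volume reset-brick workdata badserver:/gluster/md3/workdata start
# wipe or replace the brick filesystem here, keeping the same path
gluster volume reset-brick workdata badserver:/gluster/md3/workdata \
  badserver:/gluster/md3/workdata commit force
gluster volume heal workdata full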
Best
Morning,
a few bad apples - but which ones? Checked glustershd.log on the "bad"
server and counted today's "gfid mismatch" entries (2800 in total):
44 /212>,
44 /174>,
44 /94037803>,
44 /94066216>,
44 /249771609>,
44 /64235523>,
44 /185>,
etc. But as I said, these are
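(A count like that can be produced along these lines, assuming the default
Debian log location and the "gfid mismatch" wording of the self-heal daemon:)

grep -i 'gfid mismatch' /var/log/glusterfs/glustershd.log \
  | grep "$(date +%F)" | grep -oE '<[^>]+>' | sort | uniq -c | sort -rn | head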
Hi Strahil,
there's no arbiter: 3 servers with 5 bricks each.
Volume Name: workdata
Type: Distributed-Replicate
Volume ID: 7d1e23e5-0308-4443-a832-d36f85ff7959
Status: Started
Snapshot Count: 0
Number of Bricks: 5 x 3 = 15
The "problem" is: the number of files/entries to-be-healed has
What about the arbiter node? Actually, check on all nodes and script it - you
might need it in the future.
The simplest way to resolve it is to make the file disappear (rename it to
something else and then rename it back). Another easy trick is to read the
whole file: dd if=file of=/dev/null
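(Both tricks spelled out against a fuse mount; the file path is a placeholder:)

mv /mnt/workdata/some/file /mnt/workdata/some/file.tmp && \
  mv /mnt/workdata/some/file.tmp /mnt/workdata/some/file
dd if=/mnt/workdata/some/file of=/dev/null bs=1M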
Morning,
gfid1:
getfattr -d -e hex -m.
/gluster/md{3,4,5,6,7}/workdata/.glusterfs/fa/f5/faf59566-10f5-4ddd-8b0c-a87bc6a334fb
glusterpub1 (good one):
getfattr: Removing leading '/' from absolute path names
# file:
gluster/md6/workdata/.glusterfs/fa/f5/faf59566-10f5-4ddd-8b0c-a87bc6a334fb
You don't need to mount it.
Like this:
# getfattr -d -e hex -m.
/path/to/brick/.glusterfs/00/46/00462be8-3e61-4931-8bda-dae1645c639e
# file: 00/46/00462be8-3e61-4931-8bda-dae1645c639e
trusted.gfid=0x00462be83e6149318bdadae1645c639e
Good morning,
hope I got it right... using:
https://access.redhat.com/documentation/de-de/red_hat_gluster_storage/3.1/html/administration_guide/ch27s02
mount -t glusterfs -o aux-gfid-mount glusterpub1:/workdata /mnt/workdata
gfid 1:
getfattr -n trusted.glusterfs.pathinfo -e text
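(If the mount works, the lookup itself should be along these lines, via the
virtual .gfid directory described in that document; the gfid is the one
Strahil asked about:)

getfattr -n trusted.glusterfs.pathinfo -e text \
  /mnt/workdata/.gfid/faf59566-10f5-4ddd-8b0c-a87bc6a334fb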
Hi,
Can you find and check the files with gfids:
60465723-5dc0-4ebe-aced-9f2c12e52642
faf59566-10f5-4ddd-8b0c-a87bc6a334fb
Use the 'getfattr -d -e hex -m.' command from
https://docs.gluster.org/en/main/Troubleshooting/resolving-splitbrain/#analysis-of-the-output
.
Best Regards,
Strahil Nikolov
Good morning,
thx Gilberto, did the first three (set to WARNING), but the last one
doesn't work. Anyway, with setting these three some new messages
appear:
[2024-01-20 07:23:58.561106 +0000] W [MSGID: 114061]
[client-common.c:796:client_pre_lk_v2] 0-workdata-client-11: remote_fd
is -1. EBADFD
gluster volume set testvol diagnostics.brick-log-level WARNING
gluster volume set testvol diagnostics.brick-sys-log-level WARNING
gluster volume set testvol diagnostics.client-log-level ERROR
gluster --log-level=ERROR volume status
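(If needed, the current values can be read back with volume get:)

gluster volume get testvol diagnostics.brick-log-level
gluster volume get testvol diagnostics.client-log-level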
---
Gilberto Nunes Ferreira
On Fri, Jan 19, 2024 at
Hi Strahil,
hm, don't get me wrong, it may sound a bit stupid, but... where do I
set the log level? Using Debian...
https://access.redhat.com/documentation/de-de/red_hat_gluster_storage/3/html/administration_guide/configuring_the_log_level
ls /etc/glusterfs/
eventsconfig.json
I don't want to hijack the thread. And in my case setting logs to debug
would fill my /var partitions in no time. Maybe the OP can.
Diego
On 18/01/2024 22:58, Strahil Nikolov wrote:
Are you able to set the logs to debug level?
It might provide a clue as to what is going on.
Best Regards,
Are you able to set the logs to debug level? It might provide a clue as to
what is going on.
Best Regards,
Strahil Nikolov
On Thu, Jan 18, 2024 at 13:08, Diego Zuccato wrote:
That's the same kind of errors I keep seeing on my 2 clusters,
regenerated some months ago. Seems a
Thx for your answer. We don't have that much data (but 33 TB anyway),
but millions of files in total, on normal SATA disks. So copying the data
away and back, possibly with downtime, is not manageable.
The good thing is: the data can be recalculated, as it is derived from
source data. But one needs
Since glusterd does not consider it a split brain, you can't solve it
with standard split brain tools.
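(For comparison, those standard tools would be something like the following;
they only act on files that Gluster itself flags as split-brain, and the
file path is a placeholder:)

gluster volume heal workdata info split-brain
gluster volume heal workdata split-brain latest-mtime /path/to/file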
I've found no way to resolve it except by manually handling one file at
a time: completely unmanageable with thousands of files and having to
juggle between actual path on brick and metadata
were you able to solve the problem? Can it be treated like a "normal"
split brain? 'gluster peer status' and 'gluster volume status' are ok,
so kinda looks like "pseudo"...
hubert
On Thu, Jan 18, 2024 at 08:28, Diego Zuccato wrote:
>
> That's the same kind of errors I keep seeing on my 2
That's the same kind of errors I keep seeing on my 2 clusters,
regenerated some months ago. Seems a pseudo-split-brain that should be
impossible on a replica 3 cluster but keeps happening.
Sadly going to ditch Gluster ASAP.
Diego
On 18/01/2024 07:11, Hu Bert wrote:
Good morning,
heal
Good morning,
heal still not running. Pending heals now sum up to 60K per brick.
Healing started instantly, e.g. after a server reboot, with version
10.4, but doesn't with version 11. What could be wrong?
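(For the record, the heal queue can be inspected and a manual run
triggered like this:)

gluster volume heal workdata info summary
gluster volume heal workdata statistics heal-count
gluster volume heal workdata full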
I only see these errors on one of the "good" servers in glustershd.log:
[2024-01-18
hm, I only see such messages in glustershd.log on the 2 good servers:
[2024-01-17 12:18:48.912952 +0000] W [MSGID: 114031]
[client-rpc-fops_v2.c:2561:client4_0_lookup_cbk] 0-workdata-client-6:
remote operation failed.
[{path=},
{gfid=ee28b56c-e352-48f8-bbb5-dbf31babe073}, {errno=2}, {error=No
ok, finally managed to get all servers, volumes etc running, but took
a couple of restarts, cksum checks etc.
One problem: a volume doesn't heal automatically or doesn't heal at all.
gluster volume status
Status of volume: workdata
Gluster process TCP Port RDMA Port
Ah! Indeed! You need to perform the upgrade on the clients as well.
On Tue, Jan 16, 2024 at 03:12, Hu Bert wrote:
> morning to those still reading :-)
>
> I found this:
>
morning to those still reading :-)
I found this:
https://docs.gluster.org/en/main/Troubleshooting/troubleshooting-glusterd/#common-issues-and-how-to-resolve-them
there's a paragraph about "peer rejected" with the same error message,
telling me: "Update the cluster.op-version" - I had only
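(Checking and raising the op-version looks like this; the value below is
only an example, take the real one from cluster.max-op-version:)

gluster volume get all cluster.op-version
gluster volume get all cluster.max-op-version
gluster volume set all cluster.op-version 100000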
just downgraded one node to 10.4, did a reboot - same result: cksum
error. I'm able to bring it back in again, but that error probably
persists when downgrading all servers...
On Mon, Jan 15, 2024 at 09:16, Hu Bert wrote:
>
> Hi,
> just upgraded some gluster servers from version 10.4 to version