Hi Xavi,

The OS is Debian 11 with the Proxmox kernel. The Gluster packages are the
official ones from gluster.org
(https://download.gluster.org/pub/gluster/glusterfs/10/10.3/Debian/bullseye/).
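(For anyone following along: a quick way to confirm which gluster packages a
Debian node is actually running, and from which repository they were pulled,
is apt itself. A minimal sketch; the package names below are the usual Debian
ones and may differ on your system:

    apt policy glusterfs-server glusterfs-client
    dpkg -l 'gluster*'

The "500 https://download.gluster.org/..." line in the apt policy output, if
present, confirms the packages came from the official repository.)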
The system logs showed no other issues at the time of the crash, no OOM kill
or anything similar, and no other process was interacting with the gluster
mount point besides Proxmox. I wasn't running gdb when it crashed, so I don't
know whether a more detailed trace can be extracted from the logs, whether
there is a simple way to leave something running in the background in case it
happens again, or whether there is a flag to start the systemd daemon in
debug mode.

Best,

*Angel Docampo*
<[email protected]> <+34-93-1592929>

On Mon, 21 Nov 2022 at 15:16, Xavi Hernandez (<[email protected]>) wrote:

> Hi Angel,
>
> On Mon, Nov 21, 2022 at 2:33 PM Angel Docampo <[email protected]>
> wrote:
>
>> Sorry for necrobumping this, but this morning I've suffered this on my
>> Proxmox + GlusterFS cluster. In the log I can see this:
>>
>> [2022-11-21 07:38:00.213620 +0000] I [MSGID: 133017]
>> [shard.c:7275:shard_seek] 11-vmdata-shard: seek called on
>> fbc063cb-874e-475d-b585-f89f7518acdd. [Operation not supported]
>> pending frames:
>> frame : type(1) op(WRITE)
>> frame : type(0) op(0)
>> frame : type(0) op(0)
>> frame : type(0) op(0)
>> frame : type(0) op(0)
>> frame : type(0) op(0)
>> frame : type(0) op(0)
>> frame : type(0) op(0)
>> frame : type(0) op(0)
>> frame : type(0) op(0)
>> frame : type(0) op(0)
>> frame : type(0) op(0)
>> frame : type(0) op(0)
>> frame : type(0) op(0)
>> frame : type(0) op(0)
>> frame : type(0) op(0)
>> frame : type(0) op(0)
>> ...
>> frame : type(1) op(FSYNC)
>> frame : type(1) op(FSYNC)
>> frame : type(1) op(FSYNC)
>> frame : type(1) op(FSYNC)
>> frame : type(1) op(FSYNC)
>> frame : type(1) op(FSYNC)
>> frame : type(1) op(FSYNC)
>> frame : type(1) op(FSYNC)
>> frame : type(1) op(FSYNC)
>> frame : type(1) op(FSYNC)
>> frame : type(1) op(FSYNC)
>> frame : type(1) op(FSYNC)
>> frame : type(1) op(FSYNC)
>> frame : type(1) op(FSYNC)
>> frame : type(1) op(FSYNC)
>> frame : type(1) op(FSYNC)
>> frame : type(1) op(FSYNC)
>> frame : type(1) op(FSYNC)
>> frame : type(1) op(FSYNC)
>> frame : type(1) op(FSYNC)
>> patchset: git://git.gluster.org/glusterfs.git
>> signal received: 11
>> time of crash:
>> 2022-11-21 07:38:00 +0000
>> configuration details:
>> argp 1
>> backtrace 1
>> dlfcn 1
>> libpthread 1
>> llistxattr 1
>> setfsid 1
>> epoll.h 1
>> xattr.h 1
>> st_atim.tv_nsec 1
>> package-string: glusterfs 10.3
>> /lib/x86_64-linux-gnu/libglusterfs.so.0(+0x28a54)[0x7f74f286ba54]
>> /lib/x86_64-linux-gnu/libglusterfs.so.0(gf_print_trace+0x700)[0x7f74f2873fc0]
>> /lib/x86_64-linux-gnu/libc.so.6(+0x38d60)[0x7f74f262ed60]
>> /usr/lib/x86_64-linux-gnu/glusterfs/10.3/xlator/cluster/disperse.so(+0x37a14)[0x7f74ecfcea14]
>> /usr/lib/x86_64-linux-gnu/glusterfs/10.3/xlator/cluster/disperse.so(+0x19414)[0x7f74ecfb0414]
>> /usr/lib/x86_64-linux-gnu/glusterfs/10.3/xlator/cluster/disperse.so(+0x16373)[0x7f74ecfad373]
>> /usr/lib/x86_64-linux-gnu/glusterfs/10.3/xlator/cluster/disperse.so(+0x21d59)[0x7f74ecfb8d59]
>> /usr/lib/x86_64-linux-gnu/glusterfs/10.3/xlator/cluster/disperse.so(+0x22815)[0x7f74ecfb9815]
>> /usr/lib/x86_64-linux-gnu/glusterfs/10.3/xlator/cluster/disperse.so(+0x377d9)[0x7f74ecfce7d9]
>> /usr/lib/x86_64-linux-gnu/glusterfs/10.3/xlator/cluster/disperse.so(+0x19414)[0x7f74ecfb0414]
>> /usr/lib/x86_64-linux-gnu/glusterfs/10.3/xlator/cluster/disperse.so(+0x16373)[0x7f74ecfad373]
>> /usr/lib/x86_64-linux-gnu/glusterfs/10.3/xlator/cluster/disperse.so(+0x170f9)[0x7f74ecfae0f9]
>> /usr/lib/x86_64-linux-gnu/glusterfs/10.3/xlator/cluster/disperse.so(+0x313bb)[0x7f74ecfc83bb]
>> /usr/lib/x86_64-linux-gnu/glusterfs/10.3/xlator/protocol/client.so(+0x48e3a)[0x7f74ed06ce3a]
>> /lib/x86_64-linux-gnu/libgfrpc.so.0(+0xfccb)[0x7f74f2816ccb]
>> /lib/x86_64-linux-gnu/libgfrpc.so.0(rpc_transport_notify+0x26)[0x7f74f2812646]
>> /usr/lib/x86_64-linux-gnu/glusterfs/10.3/rpc-transport/socket.so(+0x64c8)[0x7f74ee15f4c8]
>> /usr/lib/x86_64-linux-gnu/glusterfs/10.3/rpc-transport/socket.so(+0xd38c)[0x7f74ee16638c]
>> /lib/x86_64-linux-gnu/libglusterfs.so.0(+0x7971d)[0x7f74f28bc71d]
>> /lib/x86_64-linux-gnu/libpthread.so.0(+0x7ea7)[0x7f74f27d2ea7]
>> /lib/x86_64-linux-gnu/libc.so.6(clone+0x3f)[0x7f74f26f2aef]
>> ---------
>>
>> The mount point wasn't accessible ("Transport endpoint is not
>> connected") and it was shown like this:
>>
>> d????????? ? ? ? ? ? vmdata
>>
>> I had to stop all the VMs on that Proxmox node, then stop the gluster
>> daemon to unmount the directory; after starting the daemon and
>> re-mounting, everything was working again.
>>
>> My gluster volume info returns this:
>>
>> Volume Name: vmdata
>> Type: Distributed-Disperse
>> Volume ID: cace5aa4-b13a-4750-8736-aa179c2485e1
>> Status: Started
>> Snapshot Count: 0
>> Number of Bricks: 2 x (2 + 1) = 6
>> Transport-type: tcp
>> Bricks:
>> Brick1: g01:/data/brick1/brick
>> Brick2: g02:/data/brick2/brick
>> Brick3: g03:/data/brick1/brick
>> Brick4: g01:/data/brick2/brick
>> Brick5: g02:/data/brick1/brick
>> Brick6: g03:/data/brick2/brick
>> Options Reconfigured:
>> nfs.disable: on
>> transport.address-family: inet
>> storage.fips-mode-rchecksum: on
>> features.shard: enable
>> features.shard-block-size: 256MB
>> performance.read-ahead: off
>> performance.quick-read: off
>> performance.io-cache: off
>> server.event-threads: 2
>> client.event-threads: 3
>> performance.client-io-threads: on
>> performance.stat-prefetch: off
>> dht.force-readdirp: off
>> performance.force-readdirp: off
>> network.remote-dio: on
>> features.cache-invalidation: on
>> performance.parallel-readdir: on
>> performance.readdir-ahead: on
>>
>> Xavi, do you think the open-behind off setting can help somehow? I did
>> try to understand what it does (with no luck), and whether it could
>> impact the performance of my VMs (I have the setup you know so well ;))
>> I would like to avoid more crashes like this one; gluster 10.3 had been
>> running quite well for the last two weeks, until this morning.
>
> I don't think disabling open-behind will have any visible effect on
> performance. Open-behind is only useful for small files when the workload
> is mostly open + read + close, and quick-read is also enabled (which is
> not your case). The only effect it has is that the latency "saved" during
> open is "paid" on the next operation sent to the file, so the total
> overall latency should be the same. Additionally, a VM workload doesn't
> open files frequently, so it shouldn't matter much in any case.
>
> That said, I'm not sure the problem is the same in your case. Based on
> the stack of the crash, it seems to be an issue inside the disperse
> module.
>
> What OS are you using? Are you using official packages? If so, which
> ones?
>
> Is it possible to provide a backtrace from gdb?
>
> Regards,
>
> Xavi
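(A minimal sketch of one way to capture what Xavi is asking for without
keeping gdb attached, assuming systemd-coredump and matching debug symbols
are installed. "vmdata" is the volume from this thread; the unit name is the
one that appears in the systemd logs further down, so a Proxmox mount may use
a different unit:

    # Raise the fuse client's log level; DEBUG is verbose, so revert it later:
    gluster volume set vmdata diagnostics.client-log-level DEBUG

    # Allow the client process to dump core on the next SIGSEGV
    # (systemctl edit reloads the unit after saving):
    systemctl edit glusterfssharedstorage.service   # add: [Service] LimitCORE=infinity

    # After a crash, open the core dump and record a full backtrace:
    coredumpctl list glusterfs
    coredumpctl gdb glusterfs
    (gdb) thread apply all bt full

Nothing runs in the foreground this way; the core is only written if the
crash actually happens again.)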
>> *Angel Docampo*
>> <[email protected]> <+34-93-1592929>
>>
>> On Fri, 19 Mar 2021 at 2:10, David Cunningham
>> (<[email protected]>) wrote:
>>
>>> Hi Xavi,
>>>
>>> Thank you for that information. We'll look at upgrading it.
>>>
>>> On Fri, 12 Mar 2021 at 05:20, Xavi Hernandez <[email protected]>
>>> wrote:
>>>
>>>> Hi David,
>>>>
>>>> With so little information it's hard to tell, but given that there are
>>>> several OPEN and UNLINK operations, it could be related to an already
>>>> fixed bug (fixed in recent versions) in open-behind.
>>>>
>>>> You can try disabling open-behind with this command:
>>>>
>>>> # gluster volume set <volname> open-behind off
>>>>
>>>> But given that the version you are using is very old and unmaintained,
>>>> I would recommend upgrading to at least 8.x.
>>>>
>>>> Regards,
>>>>
>>>> Xavi
>>>>
>>>> On Wed, Mar 10, 2021 at 5:10 AM David Cunningham
>>>> <[email protected]> wrote:
>>>>
>>>>> Hello,
>>>>>
>>>>> We have a GlusterFS 5.13 server which also mounts itself with the
>>>>> native FUSE client. Recently the FUSE mount crashed and we found the
>>>>> following in the syslog. There isn't anything logged in
>>>>> mnt-glusterfs.log for that time. After killing all processes with a
>>>>> file handle open on the filesystem, we were able to unmount and then
>>>>> remount the filesystem successfully.
>>>>>
>>>>> Would anyone have advice on how to debug this crash? Thank you in
>>>>> advance!
>>>>>
>>>>> Mar 9 05:12:31 voip1 mnt-glusterfs[2932]: pending frames:
>>>>> Mar 9 05:12:31 voip1 mnt-glusterfs[2932]: frame : type(0) op(0)
>>>>> Mar 9 05:12:31 voip1 mnt-glusterfs[2932]: frame : type(0) op(0)
>>>>> Mar 9 05:12:31 voip1 mnt-glusterfs[2932]: frame : type(1) op(UNLINK)
>>>>> Mar 9 05:12:31 voip1 mnt-glusterfs[2932]: frame : type(1) op(UNLINK)
>>>>> Mar 9 05:12:31 voip1 mnt-glusterfs[2932]: frame : type(1) op(OPEN)
>>>>> Mar 9 05:12:31 voip1 mnt-glusterfs[2932]: message repeated 3355 times: [ frame : type(1) op(OPEN)]
>>>>> Mar 9 05:12:31 voip1 mnt-glusterfs[2932]: frame : type(1) op(OPEN)
>>>>> Mar 9 05:12:31 voip1 mnt-glusterfs[2932]: message repeated 6965 times: [ frame : type(1) op(OPEN)]
>>>>> Mar 9 05:12:31 voip1 mnt-glusterfs[2932]: frame : type(1) op(OPEN)
>>>>> Mar 9 05:12:31 voip1 mnt-glusterfs[2932]: message repeated 4095 times: [ frame : type(1) op(OPEN)]
>>>>> Mar 9 05:12:31 voip1 mnt-glusterfs[2932]: frame : type(0) op(0)
>>>>> Mar 9 05:12:31 voip1 mnt-glusterfs[2932]: patchset: git://git.gluster.org/glusterfs.git
>>>>> Mar 9 05:12:31 voip1 mnt-glusterfs[2932]: signal received: 11
>>>>> Mar 9 05:12:31 voip1 mnt-glusterfs[2932]: time of crash:
>>>>> Mar 9 05:12:31 voip1 mnt-glusterfs[2932]: 2021-03-09 03:12:31
>>>>> Mar 9 05:12:31 voip1 mnt-glusterfs[2932]: configuration details:
>>>>> Mar 9 05:12:31 voip1 mnt-glusterfs[2932]: argp 1
>>>>> Mar 9 05:12:31 voip1 mnt-glusterfs[2932]: backtrace 1
>>>>> Mar 9 05:12:31 voip1 mnt-glusterfs[2932]: dlfcn 1
>>>>> Mar 9 05:12:31 voip1 mnt-glusterfs[2932]: libpthread 1
>>>>> Mar 9 05:12:31 voip1 mnt-glusterfs[2932]: llistxattr 1
>>>>> Mar 9 05:12:31 voip1 mnt-glusterfs[2932]: setfsid 1
>>>>> Mar 9 05:12:31 voip1 mnt-glusterfs[2932]: spinlock 1
>>>>> Mar 9 05:12:31 voip1 mnt-glusterfs[2932]: epoll.h 1
>>>>> Mar 9 05:12:31 voip1 mnt-glusterfs[2932]: xattr.h 1
>>>>> Mar 9 05:12:31 voip1 mnt-glusterfs[2932]: st_atim.tv_nsec 1
>>>>> Mar 9 05:12:31 voip1 mnt-glusterfs[2932]: package-string: glusterfs 5.13
>>>>> Mar 9 05:12:31 voip1 mnt-glusterfs[2932]: ---------
>>>>> ...
>>>>> Mar 9 05:13:50 voip1 systemd[1]: glusterfssharedstorage.service: Main process exited, code=killed, status=11/SEGV
>>>>> Mar 9 05:13:50 voip1 systemd[1]: glusterfssharedstorage.service: Failed with result 'signal'.
>>>>> ...
>>>>> Mar 9 05:13:54 voip1 systemd[1]: glusterfssharedstorage.service: Service hold-off time over, scheduling restart.
>>>>> Mar 9 05:13:54 voip1 systemd[1]: glusterfssharedstorage.service: Scheduled restart job, restart counter is at 2.
>>>>> Mar 9 05:13:54 voip1 systemd[1]: Stopped Mount glusterfs sharedstorage.
>>>>> Mar 9 05:13:54 voip1 systemd[1]: Starting Mount glusterfs sharedstorage...
>>>>> Mar 9 05:13:54 voip1 mount-shared-storage.sh[20520]: ERROR: Mount point does not exist
>>>>> Mar 9 05:13:54 voip1 mount-shared-storage.sh[20520]: Please specify a mount point
>>>>> Mar 9 05:13:54 voip1 mount-shared-storage.sh[20520]: Usage:
>>>>> Mar 9 05:13:54 voip1 mount-shared-storage.sh[20520]: man 8 /sbin/mount.glusterfs
>>>>>
>>>>> --
>>>>> David Cunningham, Voisonics Limited
>>>>> http://voisonics.com/
>>>>> USA: +1 213 221 1092
>>>>> New Zealand: +64 (0)28 2558 3782
>>>
>>> --
>>> David Cunningham, Voisonics Limited
>>> http://voisonics.com/
>>> USA: +1 213 221 1092
>>> New Zealand: +64 (0)28 2558 3782
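(One more hedged aside on the backtraces above: the raw offsets, e.g.
disperse.so(+0x37a14), can be resolved to function names with addr2line,
provided debug symbols for the exact same build are installed; without them
it prints only "??". A minimal sketch using the offsets from the 10.3 crash:

    addr2line -f -C -e /usr/lib/x86_64-linux-gnu/glusterfs/10.3/xlator/cluster/disperse.so \
        0x37a14 0x19414 0x16373 0x21d59

That can narrow a crash down to specific functions inside the disperse
translator even before anyone manages to reproduce it under gdb.)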
________

Community Meeting Calendar:

Schedule -
Every 2nd and 4th Tuesday at 14:30 IST / 09:00 UTC
Bridge: https://meet.google.com/cpu-eiue-hvk
Gluster-users mailing list
[email protected]
https://lists.gluster.org/mailman/listinfo/gluster-users
