Re: [ClusterLabs] Antw: [EXT] Re: Antw: Hanging OCFS2 Filesystem any one else?
On 7/9/21 3:56 PM, Ulrich Windl wrote:
> [...]
> h19 kernel: Out of memory: Killed process 6838 (corosync) total-vm:261212kB, anon-rss:31444kB, file-rss:7700kB, shmem-rss:121872kB
>
> I doubt that was the best possible choice ;-)
>
> The dead corosync caused the DC (h18) to fence h19 (which was successful), but the DC was fenced while it tried to recover resources, so the complete cluster rebooted.

Hi Ulrich,

Any clue why the DC (h18) got fenced ("suicide")? Did h18 become inquorate without h19, so that the default `no-quorum-policy=stop` kicked in?

BTW, `no-quorum-policy=freeze` is the general recommendation for OCFS2 and GFS2.

BR,
Roger
___
Manage your subscription: https://lists.clusterlabs.org/mailman/listinfo/users
ClusterLabs home: https://www.clusterlabs.org/
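Roger's quorum question can be checked with simple arithmetic. The sketch below assumes a plain majority rule (quorum = floor(expected_votes / 2) + 1), one vote per node, and the three-node cluster described in the initial report. The function names are hypothetical and not part of any cluster tool.

```python
def quorum_threshold(expected_votes: int) -> int:
    """Votes needed for quorum under a simple majority rule."""
    return expected_votes // 2 + 1


def is_quorate(expected_votes: int, votes_present: int) -> bool:
    """True if the surviving partition still holds a majority."""
    return votes_present >= quorum_threshold(expected_votes)


if __name__ == "__main__":
    # Assumption: three nodes with one vote each; h19 has just been fenced.
    expected, remaining = 3, 2
    print("votes needed:", quorum_threshold(expected))
    print("still quorate:", is_quorate(expected, remaining))
```

Under these assumptions the two remaining nodes keep quorum after h19 is fenced, so loss of quorum alone would not account for h18 being fenced. Settings such as two_node or a quorum device would change the calculation.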
Re: [ClusterLabs] Antw: [EXT] Re: Antw: Hanging OCFS2 Filesystem any one else?
Hi Ulrich,

Thanks for your update. Based on some feedback from upstream, there is a patch (ocfs2: initialize ip_next_orphan) which should fix this problem. I can confirm that the problem this patch addresses looks very similar to yours. I will verify it next week and then let you know the result.

Thanks
Gang

From: Users on behalf of Ulrich Windl
Sent: Friday, July 9, 2021 15:56
To: users@clusterlabs.org
Subject: [ClusterLabs] Antw: [EXT] Re: Antw: Hanging OCFS2 Filesystem any one else?

Hi!

An update on the issue:
SUSE support found out that the reason for the hanging processes is a deadlock caused by a race condition (Kernel 5.3.18-24.64-default). Support is working on a fix.

Today the cluster "fixed" the problem in an unusual way:

h19 kernel: Out of memory: Killed process 6838 (corosync) total-vm:261212kB, anon-rss:31444kB, file-rss:7700kB, shmem-rss:121872kB

I doubt that was the best possible choice ;-)

The dead corosync caused the DC (h18) to fence h19 (which was successful), but the DC was fenced while it tried to recover resources, so the complete cluster rebooted.

Regards,
Ulrich
[ClusterLabs] Antw: [EXT] Re: Antw: Hanging OCFS2 Filesystem any one else?
Hi!

An update on the issue:
SUSE support found out that the reason for the hanging processes is a deadlock caused by a race condition (Kernel 5.3.18-24.64-default). Support is working on a fix.

Today the cluster "fixed" the problem in an unusual way:

h19 kernel: Out of memory: Killed process 6838 (corosync) total-vm:261212kB, anon-rss:31444kB, file-rss:7700kB, shmem-rss:121872kB

I doubt that was the best possible choice ;-)

The dead corosync caused the DC (h18) to fence h19 (which was successful), but the DC was fenced while it tried to recover resources, so the complete cluster rebooted.

Regards,
Ulrich
Re: [ClusterLabs] Antw: [EXT] Re: Antw: Hanging OCFS2 Filesystem any one else?
Hi Ulrich,

On 2021/6/15 17:01, Ulrich Windl wrote:
> Hi Guys!
>
> Just to keep you informed on the issue:
> I was informed that I'm not the only one seeing this problem, and there seems to be some "negative interference" between BtrFS reorganizing its extents periodically and OCFS2 making reflink snapshots (a local cron job here) in current SUSE SLES kernels. It seems that happens almost exactly at 0:00 o'clock.

We encountered the same hang in our local environment. The problem appears to be triggered by a run of the btrfs-balance job, but I need to crash the kernel for further analysis.
Do you know how to reproduce this hang reliably, e.g. by running the reflink snapshot script and triggering the btrfs-balance job?

Thanks
Gang

> The only thing that BtrFS and OCFS2 have in common here is that BtrFS provides the mount point for OCFS2.
>
> Regards,
> Ulrich
>
> [...]
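When trying to reproduce the hang, it helps to check whether any process has entered uninterruptible sleep (state "D"), which is how the problem was first observed in this thread. The following is a minimal sketch, assuming a Linux /proc filesystem. The function names are hypothetical, and only process-level entries (not individual threads) are scanned.

```python
import os
import re

# /proc/<pid>/stat looks like: "<pid> (<comm>) <state> ..."; comm may itself
# contain spaces or parentheses, so match greedily up to the last ")".
_STAT_RE = re.compile(r"^(\d+) \((.*)\) (\S)", re.DOTALL)


def parse_stat(text):
    """Return (pid, comm, state) from the contents of /proc/<pid>/stat."""
    match = _STAT_RE.match(text)
    if match is None:
        raise ValueError("unrecognized stat line")
    pid, comm, state = match.groups()
    return int(pid), comm, state


def d_state_tasks(proc_root="/proc"):
    """List (pid, comm) for every process currently in state 'D'."""
    try:
        entries = os.listdir(proc_root)
    except OSError:
        return []  # no procfs available (probably not Linux)
    blocked = []
    for entry in entries:
        if not entry.isdigit():
            continue
        try:
            with open(os.path.join(proc_root, entry, "stat")) as handle:
                pid, comm, state = parse_stat(handle.read())
        except (OSError, ValueError):
            continue  # process exited or was unreadable while scanning
        if state == "D":
            blocked.append((pid, comm))
    return blocked


if __name__ == "__main__":
    for pid, comm in d_state_tasks():
        print(pid, comm)
```

Running this periodically while the snapshot job and the balance job overlap would show when the first process gets stuck. A short-lived "D" state is normal during ordinary I/O, so only entries that persist across several runs are of interest.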
Re: [ClusterLabs] Antw: [EXT] Re: Antw: Hanging OCFS2 Filesystem any one else?
Thanks for the update.

Could it be something local to your environment? Have you checked mounting the OCFS2 filesystem on a vanilla system?

Best Regards,
Strahil Nikolov

On Tue, Jun 15, 2021 at 12:01, Ulrich Windl wrote:
> Hi Guys!
>
> Just to keep you informed on the issue:
> I was informed that I'm not the only one seeing this problem, and there seems to be some "negative interference" between BtrFS reorganizing its extents periodically and OCFS2 making reflink snapshots (a local cron job here) in current SUSE SLES kernels. It seems that happens almost exactly at 0:00 o'clock.
>
> The only thing that BtrFS and OCFS2 have in common here is that BtrFS provides the mount point for OCFS2.
>
> Regards,
> Ulrich
>
> [...]
[ClusterLabs] Antw: [EXT] Re: Antw: Hanging OCFS2 Filesystem any one else?
Hi Guys!

Just to keep you informed on the issue:
I was informed that I'm not the only one seeing this problem, and there seems to be some "negative interference" between BtrFS reorganizing its extents periodically and OCFS2 making reflink snapshots (a local cron job here) in current SUSE SLES kernels. It seems that happens almost exactly at 0:00 o'clock.

The only thing that BtrFS and OCFS2 have in common here is that BtrFS provides the mount point for OCFS2.

Regards,
Ulrich

> [...]
[ClusterLabs] Antw: [EXT] Re: Antw: Hanging OCFS2 Filesystem any one else?
>>> Gang He wrote on 02.06.2021 at 08:34:
> Hi Ulrich,
>
> The hang looks like the problem fixed by commit 90bd070aae6c4fb5d302f9c4b9c88be60c8197ec (ocfs2: fix deadlock between setattr and dio_end_io_write), but I am not 100% sure.
> If possible, could you report a bug to SUSE? Then we can work on it further.

Hi!

Actually a service request for the issue is open at SUSE. However I don't know which L3 engineer is working on it.
I have some "funny" effects, like these:
On one node "ls" hangs, but can be interrupted with ^C; on another node "ls" also hangs, but cannot be stopped with ^C or ^Z.
(Most processes cannot even be killed with "kill -9".)
"ls" on the directory also hangs, just as an "rm" for a non-existent file does.

What I really wonder is what triggered the effect, and more importantly how to recover from it.
Initially I had suspected the rather full (95%) filesystem, but even that leaves 24GB available.
The other suspect was concurrent creation of reflink snapshots while the file being snapshotted was changing (e.g. allocating a hole in a sparse file).

Regards,
Ulrich

> Thanks
> Gang
>
> From: Users on behalf of Ulrich Windl
> Sent: Tuesday, June 1, 2021 15:14
> To: users@clusterlabs.org
> Subject: [ClusterLabs] Antw: Hanging OCFS2 Filesystem any one else?
>
> Ulrich Windl wrote on 31.05.2021 at 12:11 in message <60B4B65A.A8F : 161 : 60728>:
>> Hi!
>>
>> We have an OCFS2 filesystem shared between three cluster nodes (SLES 15 SP2, Kernel 5.3.18-24.64-default). The filesystem is filled up to about 95%, and we have an odd effect:
>> A stat() system call on some of the files hangs indefinitely (state "D").
>> ("ls -l" and "rm" also hang, but I suspect those are calling stat() internally, too.)
>> My only suspect is that the effect might be related to the 95% being used.
>> The other suspect is that concurrent reflink calls may trigger the effect.
>>
>> Did anyone else experience something similar?
>
> Hi!
>
> I have some details:
> It seems there is a reader/writer deadlock trying to allocate additional blocks for a file.
> The stack trace looks like this:
> Jun 01 07:56:31 h16 kernel: rwsem_down_write_slowpath+0x251/0x620
> Jun 01 07:56:31 h16 kernel: ? __ocfs2_change_file_space+0xb3/0x620 [ocfs2]
> Jun 01 07:56:31 h16 kernel: __ocfs2_change_file_space+0xb3/0x620 [ocfs2]
> Jun 01 07:56:31 h16 kernel: ocfs2_fallocate+0x82/0xa0 [ocfs2]
> Jun 01 07:56:31 h16 kernel: vfs_fallocate+0x13f/0x2a0
> Jun 01 07:56:31 h16 kernel: ksys_fallocate+0x3c/0x70
> Jun 01 07:56:31 h16 kernel: __x64_sys_fallocate+0x1a/0x20
> Jun 01 07:56:31 h16 kernel: do_syscall_64+0x5b/0x1e0
>
> That is the only writer (on that host), but there are multiple readers like this:
> Jun 01 07:56:31 h16 kernel: rwsem_down_read_slowpath+0x172/0x300
> Jun 01 07:56:31 h16 kernel: ? dput+0x2c/0x2f0
> Jun 01 07:56:31 h16 kernel: ? lookup_slow+0x27/0x50
> Jun 01 07:56:31 h16 kernel: lookup_slow+0x27/0x50
> Jun 01 07:56:31 h16 kernel: walk_component+0x1c4/0x300
> Jun 01 07:56:31 h16 kernel: ? path_init+0x192/0x320
> Jun 01 07:56:31 h16 kernel: path_lookupat+0x6e/0x210
> Jun 01 07:56:31 h16 kernel: ? __put_lkb+0x45/0xd0 [dlm]
> Jun 01 07:56:31 h16 kernel: filename_lookup+0xb6/0x190
> Jun 01 07:56:31 h16 kernel: ? kmem_cache_alloc+0x3d/0x250
> Jun 01 07:56:31 h16 kernel: ? getname_flags+0x66/0x1d0
> Jun 01 07:56:31 h16 kernel: ? vfs_statx+0x73/0xe0
> Jun 01 07:56:31 h16 kernel: vfs_statx+0x73/0xe0
> Jun 01 07:56:31 h16 kernel: ? fsnotify_grab_connector+0x46/0x80
> Jun 01 07:56:31 h16 kernel: __do_sys_newstat+0x39/0x70
> Jun 01 07:56:31 h16 kernel: ? do_unlinkat+0x92/0x320
> Jun 01 07:56:31 h16 kernel: do_syscall_64+0x5b/0x1e0
>
> So that will match the hanging stat() quite nicely!
>
> However the PID displayed as holding the writer does not exist in the system (on that node).
>
> Regards,
> Ulrich
>
>> Regards,
>> Ulrich
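The two stack traces in this message differ in their top frame: the writer is blocked in rwsem_down_write_slowpath, the readers in rwsem_down_read_slowpath. Here is a minimal sketch of sorting such journal lines by that frame, assuming the "date host kernel: symbol+0xOFF/0xLEN [module]" layout seen in this thread. The function names are hypothetical, and the sample lines are copied from the report.

```python
import re

# One journal line: "Jun 01 07:56:31 h16 kernel: [? ]symbol+0xOFF/0xLEN [module]"
FRAME_RE = re.compile(
    r"kernel: (?P<guess>\? )?(?P<symbol>[\w.]+)\+0x[0-9a-f]+/0x[0-9a-f]+"
    r"(?: \[(?P<module>\w+)\])?"
)


def parse_frame(line):
    """Return (symbol, module, reliable) for one stack-trace line, or None."""
    match = FRAME_RE.search(line)
    if match is None:
        return None
    # Frames prefixed with "?" are unverified guesses from the unwinder.
    return match["symbol"], match["module"], match["guess"] is None


def classify(trace_lines):
    """Label a trace 'writer', 'reader' or 'other' from its reliable frames."""
    for line in trace_lines:
        frame = parse_frame(line)
        if frame is None or not frame[2]:
            continue
        if frame[0] == "rwsem_down_write_slowpath":
            return "writer"
        if frame[0] == "rwsem_down_read_slowpath":
            return "reader"
    return "other"


if __name__ == "__main__":
    writer = [
        "Jun 01 07:56:31 h16 kernel: rwsem_down_write_slowpath+0x251/0x620",
        "Jun 01 07:56:31 h16 kernel: ? __ocfs2_change_file_space+0xb3/0x620 [ocfs2]",
        "Jun 01 07:56:31 h16 kernel: ocfs2_fallocate+0x82/0xa0 [ocfs2]",
    ]
    reader = [
        "Jun 01 07:56:31 h16 kernel: rwsem_down_read_slowpath+0x172/0x300",
        "Jun 01 07:56:31 h16 kernel: vfs_statx+0x73/0xe0",
    ]
    print(classify(writer), classify(reader))
```

Counting the labels over a full dump of blocked tasks gives a quick picture of how many readers are queued behind a single writer on each node.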