Re: [ClusterLabs] Antw: [EXT] Re: Antw: Hanging OCFS2 Filesystem any one else?

2021-07-12 Thread Roger Zhou



On 7/9/21 3:56 PM, Ulrich Windl wrote:

[...]


h19 kernel: Out of memory: Killed process 6838 (corosync) total-vm:261212kB, 
anon-rss:31444kB, file-rss:7700kB, shmem-rss:121872kB

I doubt that was the best possible choice ;-)

The dead corosync caused the DC (h18) to fence h19 (which was successful), but 
the DC was fenced while it tried to recover resources, so the complete cluster 
rebooted.



Hi Ulrich,

Any clue why the DC (h18) got fenced ("suicide")? Did h18 become inquorate without
h19, so that the default `no-quorum-policy=stop` kicked in?

BTW, `no-quorum-policy=freeze` is the general suggestion for ocfs2 and gfs2.
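
For reference, setting that property on a Pacemaker cluster is usually a
one-liner with crmsh or pcs; a minimal sketch (use whichever tool your
distribution ships, and verify against your own configuration):

  # with crmsh (e.g. on SLES)
  crm configure property no-quorum-policy=freeze

  # or with pcs (e.g. on RHEL and derivatives)
  pcs property set no-quorum-policy=freeze

  # check the result
  crm configure show | grep no-quorum-policy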

BR,
Roger

___
Manage your subscription:
https://lists.clusterlabs.org/mailman/listinfo/users

ClusterLabs home: https://www.clusterlabs.org/


Re: [ClusterLabs] Antw: [EXT] Re: Antw: Hanging OCFS2 Filesystem any one else?

2021-07-11 Thread Gang He
Hi Ulrich,

Thanks for your update.
Based on feedback from upstream, there is a patch ("ocfs2: initialize
ip_next_orphan") which should fix this problem.
I can confirm the patch addresses an issue that looks very similar to your problem.
I will verify it next week, then let you know the result.
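
(For reference, once an updated kernel package is available, one rough way to
check whether it already carries that fix on SLES is to search the RPM
changelog; the package name below is an assumption and may differ on your
system.)

  rpm -q --changelog kernel-default | grep -i ip_next_orphan
  uname -r    # confirm the running kernel is the patched one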

Thanks
Gang


From: Users  on behalf of Ulrich Windl 

Sent: Friday, July 9, 2021 15:56
To: users@clusterlabs.org
Subject: [ClusterLabs] Antw: [EXT] Re: Antw: Hanging OCFS2 Filesystem any one 
else?

Hi!

An update on the issue:
SUSE support found out that the reason for the hanging processes is a deadlock 
caused by a race condition (Kernel 5.3.18-24.64-default). Support is working on 
a fix.
Today the cluster "fixed" the problem in an unusual way:

h19 kernel: Out of memory: Killed process 6838 (corosync) total-vm:261212kB, 
anon-rss:31444kB, file-rss:7700kB, shmem-rss:121872kB

I doubt that was the best possible choice ;-)

The dead corosync caused the DC (h18) to fence h19 (which was successful), but 
the DC was fenced while it tried to recover resources, so the complete cluster 
rebooted.

Regards,
Ulrich






___
Manage your subscription:
https://lists.clusterlabs.org/mailman/listinfo/users

ClusterLabs home: https://www.clusterlabs.org/


[ClusterLabs] Antw: [EXT] Re: Antw: Hanging OCFS2 Filesystem any one else?

2021-07-09 Thread Ulrich Windl
Hi!

An update on the issue:
SUSE support found out that the reason for the hanging processes is a deadlock 
caused by a race condition (Kernel 5.3.18-24.64-default). Support is working on 
a fix.
Today the cluster "fixed" the problem in an unusual way:

h19 kernel: Out of memory: Killed process 6838 (corosync) total-vm:261212kB, 
anon-rss:31444kB, file-rss:7700kB, shmem-rss:121872kB

I doubt that was the best possible choice ;-)

The dead corosync caused the DC (h18) to fence h19 (which was successful), but 
the DC was fenced while it tried to recover resources, so the complete cluster 
rebooted.
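
(As a side note, a common mitigation for exactly this failure mode is to make
corosync less attractive to the OOM killer. The sketch below assumes corosync
runs as a systemd service; newer corosync packages may already ship an
equivalent setting, so check before applying anything.)

  # one-off, per node; takes effect immediately but is lost on restart
  echo -1000 > /proc/$(pidof corosync)/oom_score_adj

  # persistent variant via a systemd drop-in
  mkdir -p /etc/systemd/system/corosync.service.d
  printf '[Service]\nOOMScoreAdjust=-1000\n' > /etc/systemd/system/corosync.service.d/oom.conf
  systemctl daemon-reload
  systemctl restart corosync    # only during a maintenance window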

Regards,
Ulrich




___
Manage your subscription:
https://lists.clusterlabs.org/mailman/listinfo/users

ClusterLabs home: https://www.clusterlabs.org/


Re: [ClusterLabs] Antw: [EXT] Re: Antw: Hanging OCFS2 Filesystem any one else?

2021-06-16 Thread Gang He

Hi Ulrich,

On 2021/6/15 17:01, Ulrich Windl wrote:

Hi Guys!

Just to keep you informed on the issue:
I was informed that I'm not the only one seeing this problem, and there seems
to be some "negative interference" between BtrFS reorganizing its extents
periodically and OCFS2 making reflink snapshots (a local cron job here) in
current SUSE SLES kernels. It seems to happen almost exactly at 0:00.
We encountered the same hang in a local environment; the problem looks
like it is caused by a btrfs-balance job run, but I need to crash the
kernel for further analysis.
Hi Ulrich, do you know how to reproduce this hang reliably? E.g. run the
reflink snapshot script and trigger the btrfs-balance job?
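
(A rough reproducer sketch along those lines, with purely made-up paths, sizes
and intervals: /data is assumed to be the BtrFS filesystem that provides the
mount point, and /data/ocfs2 the OCFS2 mount on top of it.)

  # terminal 1: keep reflinking a file that is concurrently being modified
  dd if=/dev/zero of=/data/ocfs2/testfile bs=1M count=1024
  while true; do
      reflink /data/ocfs2/testfile /data/ocfs2/snap.$(date +%s)   # or: cp --reflink=always
      dd if=/dev/urandom of=/data/ocfs2/testfile bs=1M count=1 \
         seek=$((RANDOM % 1024)) conv=notrunc
      sleep 1
  done

  # terminal 2: trigger BtrFS extent reorganization on the host filesystem
  btrfs balance start -dusage=90 /data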



Thanks
Gang



The only thing that BtrFS and OCFS2 have in common here is that BtrFS provides
the mount point for OCFS2.

Regards,
Ulrich


Ulrich Windl wrote on 02.06.2021 at 11:00 in message <60B748A4.E0C : 161 : 60728>:

Gang He wrote on 02.06.2021 at 08:34 in message:


Hi Ulrich,

The hang problem looks like it is addressed by a fix
(90bd070aae6c4fb5d302f9c4b9c88be60c8197ec "ocfs2: fix deadlock between
setattr and dio_end_io_write"), but it is not 100% sure.
If possible, could you help to report a bug to SUSE, then we can work on
that further.


Hi!

Actually a service request for the issue is open at SUSE. However I don't
know which L3 engineer is working on it.
I have some "funny" effects, like these:
On one node "ls" hangs, but can be interrupted with ^C; on another node "ls"
also hangs, but cannot be stopped with ^C or ^Z.
(Most processes cannot even be killed with "kill -9".)
"ls" on the directory also hangs, just as an "rm" for a non-existent file.

What I really wonder is what triggered the effect, and more importantly how
to recover from it.
Initially I had suspected a rather full (95%) filesystem, but that means
there are still 24GB available.
The other suspect was concurrent creation of reflink snapshots while the
file being snapshotted did change (e.g. allocating a hole in a sparse file).

Regards,
Ulrich



Thanks
Gang


From: Users  on behalf of Ulrich Windl

Sent: Tuesday, June 1, 2021 15:14
To: users@clusterlabs.org
Subject: [ClusterLabs] Antw: Hanging OCFS2 Filesystem any one else?


Ulrich Windl wrote on 31.05.2021 at 12:11 in message <60B4B65A.A8F : 161 : 60728>:

Hi!

We have an OCFS2 filesystem shared between three cluster nodes (SLES 15 SP2,
Kernel 5.3.18-24.64-default). The filesystem is filled up to about 95%, and
we have an odd effect:
A stat() system call to some of the files hangs indefinitely (state "D").
("ls -l" and "rm" also hang, but I suspect those are calling stat()
internally, too.)
My only suspect is that the effect might be related to the 95% being used.
The other suspect is that concurrent reflink calls may trigger the effect.

Did anyone else experience something similar?


Hi!

I have some details:
It seems there is a reader/writer deadlock trying to allocate additional
blocks for a file.
The stacktrace looks like this:
Jun 01 07:56:31 h16 kernel:  rwsem_down_write_slowpath+0x251/0x620
Jun 01 07:56:31 h16 kernel:  ? __ocfs2_change_file_space+0xb3/0x620 [ocfs2]
Jun 01 07:56:31 h16 kernel:  __ocfs2_change_file_space+0xb3/0x620 [ocfs2]
Jun 01 07:56:31 h16 kernel:  ocfs2_fallocate+0x82/0xa0 [ocfs2]
Jun 01 07:56:31 h16 kernel:  vfs_fallocate+0x13f/0x2a0
Jun 01 07:56:31 h16 kernel:  ksys_fallocate+0x3c/0x70
Jun 01 07:56:31 h16 kernel:  __x64_sys_fallocate+0x1a/0x20
Jun 01 07:56:31 h16 kernel:  do_syscall_64+0x5b/0x1e0

That is the only writer (on that host), but there are multiple readers like
this:
Jun 01 07:56:31 h16 kernel:  rwsem_down_read_slowpath+0x172/0x300
Jun 01 07:56:31 h16 kernel:  ? dput+0x2c/0x2f0
Jun 01 07:56:31 h16 kernel:  ? lookup_slow+0x27/0x50
Jun 01 07:56:31 h16 kernel:  lookup_slow+0x27/0x50
Jun 01 07:56:31 h16 kernel:  walk_component+0x1c4/0x300
Jun 01 07:56:31 h16 kernel:  ? path_init+0x192/0x320
Jun 01 07:56:31 h16 kernel:  path_lookupat+0x6e/0x210
Jun 01 07:56:31 h16 kernel:  ? __put_lkb+0x45/0xd0 [dlm]
Jun 01 07:56:31 h16 kernel:  filename_lookup+0xb6/0x190
Jun 01 07:56:31 h16 kernel:  ? kmem_cache_alloc+0x3d/0x250
Jun 01 07:56:31 h16 kernel:  ? getname_flags+0x66/0x1d0
Jun 01 07:56:31 h16 kernel:  ? vfs_statx+0x73/0xe0
Jun 01 07:56:31 h16 kernel:  vfs_statx+0x73/0xe0
Jun 01 07:56:31 h16 kernel:  ? fsnotify_grab_connector+0x46/0x80
Jun 01 07:56:31 h16 kernel:  __do_sys_newstat+0x39/0x70
Jun 01 07:56:31 h16 kernel:  ? do_unlinkat+0x92/0x320
Jun 01 07:56:31 h16 kernel:  do_syscall_64+0x5b/0x1e0

So that will match the hanging stat() quite nicely!

However the PID displayed as holding the writer does not exist in the system
(on that node).

Regards,
Ulrich




Regards,
Ulrich









___
Manage your subscription:
https://lists.clusterlabs.org/mailman/listinfo/users

ClusterLabs home: https://www.clusterlabs.org/



Re: [ClusterLabs] Antw: [EXT] Re: Antw: Hanging OCFS2 Filesystem any one else?

2021-06-15 Thread Strahil Nikolov
Thanks for the update. Could it be something local to your environment?
Have you checked mounting the OCFS2 on a vanilla system?
Best Regards,
Strahil Nikolov


On Tue, Jun 15, 2021 at 12:01, Ulrich Windl wrote:

Hi Guys!

Just to keep you informed on the issue:
I was informed that I'm not the only one seeing this problem, and there seems
to be some "negative interference" between BtrFS reorganizing its extents
periodically and OCFS2 making reflink snapshots (a local cron job here) in
current SUSE SLES kernels. It seems to happen almost exactly at 0:00.

The only thing that BtrFS and OCFS2 have in common here is that BtrFS provides
the mount point for OCFS2.

Regards,
Ulrich

>>> Ulrich Windl wrote on 02.06.2021 at 11:00 in message <60B748A4.E0C : 161 : 60728>:
Gang He wrote on 02.06.2021 at 08:34 in message:
> 
> > Hi Ulrich,
> > 
> > The hang problem looks like it is addressed by a fix
> > (90bd070aae6c4fb5d302f9c4b9c88be60c8197ec "ocfs2: fix deadlock between
> > setattr and dio_end_io_write"), but it is not 100% sure.
> > If possible, could you help to report a bug to SUSE, then we can work on 
> > that further.
> 
> Hi!
> 
> Actually a service request for the issue is open at SUSE. However I don't 
> know which L3 engineer is working on it.
> I have some "funny" effects, like these:
> On one node "ls" hangs, but can be interrupted with ^C; on another node "ls"
> also hangs, but cannot be stopped with ^C or ^Z.
> (Most processes cannot even be killed with "kill -9".)
> "ls" on the directory also hangs, just as an "rm" for a non-existent file.
> 
> What I really wonder is what triggered the effect, and more importantly how
> to recover from it.
> Initially I had suspected a rather full (95%) filesystem, but that means
> there are still 24GB available.
> The other suspect was concurrent creation of reflink snapshots while the
> file being snapshotted did change (e.g. allocating a hole in a sparse file).
> 
> Regards,
> Ulrich
> 
> > 
> > Thanks
> > Gang
> > 
> > 
> > From: Users  on behalf of Ulrich Windl 
> > 
> > Sent: Tuesday, June 1, 2021 15:14
> > To: users@clusterlabs.org 
> > Subject: [ClusterLabs] Antw: Hanging OCFS2 Filesystem any one else?
> > 
> Ulrich Windl wrote on 31.05.2021 at 12:11 in message <60B4B65A.A8F : 161 : 60728>:
> >> Hi!
> >>
> >> We have an OCFS2 filesystem shared between three cluster nodes (SLES 15 SP2,
> >> Kernel 5.3.18-24.64-default). The filesystem is filled up to about 95%, and
> >> we have an odd effect:
> >> A stat() system call to some of the files hangs indefinitely (state "D").
> >> ("ls -l" and "rm" also hang, but I suspect those are calling stat()
> >> internally, too.)
> >> My only suspect is that the effect might be related to the 95% being used.
> >> The other suspect is that concurrent reflink calls may trigger the effect.
> >>
> >> Did anyone else experience something similar?
> > 
> > Hi!
> > 
> > I have some details:
> > It seems there is a reader/writer deadlock trying to allocate additional 
> > blocks for a file.
> > The stacktrace looks like this:
> > Jun 01 07:56:31 h16 kernel:  rwsem_down_write_slowpath+0x251/0x620
> > Jun 01 07:56:31 h16 kernel:  ? __ocfs2_change_file_space+0xb3/0x620 [ocfs2]
> > Jun 01 07:56:31 h16 kernel:  __ocfs2_change_file_space+0xb3/0x620 [ocfs2]
> > Jun 01 07:56:31 h16 kernel:  ocfs2_fallocate+0x82/0xa0 [ocfs2]
> > Jun 01 07:56:31 h16 kernel:  vfs_fallocate+0x13f/0x2a0
> > Jun 01 07:56:31 h16 kernel:  ksys_fallocate+0x3c/0x70
> > Jun 01 07:56:31 h16 kernel:  __x64_sys_fallocate+0x1a/0x20
> > Jun 01 07:56:31 h16 kernel:  do_syscall_64+0x5b/0x1e0
> > 
> > That is the only writer (on that host), but there are multiple readers like
> > this:
> > Jun 01 07:56:31 h16 kernel:  rwsem_down_read_slowpath+0x172/0x300
> > Jun 01 07:56:31 h16 kernel:  ? dput+0x2c/0x2f0
> > Jun 01 07:56:31 h16 kernel:  ? lookup_slow+0x27/0x50
> > Jun 01 07:56:31 h16 kernel:  lookup_slow+0x27/0x50
> > Jun 01 07:56:31 h16 kernel:  walk_component+0x1c4/0x300
> > Jun 01 07:56:31 h16 kernel:  ? path_init+0x192/0x320
> > Jun 01 07:56:31 h16 kernel:  path_lookupat+0x6e/0x210
> > Jun 01 07:56:31 h16 kernel:  ? __put_lkb+0x45/0xd0 [dlm]
> > Jun 01 07:56:31 h16 kernel:  filename_lookup+0xb6/0x190
> > Jun 01 07:56:31 h16 kernel:  ? kmem_cache_alloc+0x3d/0x250
> > Jun 01 07:56:31 h16 kernel:  ? getname_flags+0x66/0x1d0
> > Jun 01 07:56:31 h16 kernel:  ? vfs_statx+0x73/0xe0
> > Jun 01 07:56:31 h16 kernel:  vfs_statx+0x73/0xe0
> > Jun 01 07:56:31 h16 kernel:  ? fsnotify_grab_connector+0x46/0x80
> > Jun 01 07:56:31 h16 kernel:  __do_sys_newstat+0x39/0x70
> > Jun 01 07:56:31 h16 kernel:  ? do_unlinkat+0x92/0x320
> > Jun 01 07:56:31 h16 kernel:  do_syscall_64+0x5b/0x1e0
> > 
> > So that will match the hanging stat() quite nicely!
> > 
> > However the PID displayed as holding the writer does not exist in the
> > system (on that node).
> > 
> > Regards,
> > Ulrich
> > 
> > 
> >>
> >> Regards,

[ClusterLabs] Antw: [EXT] Re: Antw: Hanging OCFS2 Filesystem any one else?

2021-06-15 Thread Ulrich Windl
Hi Guys!

Just to keep you informed on the issue:
I was informed that I'm not the only one seeing this problem, and there seems
to be some "negative interference" between BtrFS reorganizing its extents
periodically and OCFS2 making reflink snapshots (a local cron job here) in
current SUSE SLES kernels. It seems to happen almost exactly at 0:00.

The only thing that BtrFS and OCFS2 have in common here is that BtrFS provides
the mount point for OCFS2.

Regards,
Ulrich

>>> Ulrich Windl wrote on 02.06.2021 at 11:00 in message <60B748A4.E0C : 161 : 60728>:
Gang He wrote on 02.06.2021 at 08:34 in message:
> 
> > Hi Ulrich,
> > 
> > The hang problem looks like it is addressed by a fix
> > (90bd070aae6c4fb5d302f9c4b9c88be60c8197ec "ocfs2: fix deadlock between
> > setattr and dio_end_io_write"), but it is not 100% sure.
> > If possible, could you help to report a bug to SUSE, then we can work on 
> > that further.
> 
> Hi!
> 
> Actually a service request for the issue is open at SUSE. However I don't 
> know which L3 engineer is working on it.
> I have some "funny" effects, like these:
> On one node "ls" hangs, but can be interrupted with ^C; on another node "ls"
> also hangs, but cannot be stopped with ^C or ^Z.
> (Most processes cannot even be killed with "kill -9".)
> "ls" on the directory also hangs, just as an "rm" for a non-existent file.
> 
> What I really wonder is what triggered the effect, and more importantly how
> to recover from it.
> Initially I had suspected a rather full (95%) filesystem, but that means
> there are still 24GB available.
> The other suspect was concurrent creation of reflink snapshots while the
> file being snapshotted did change (e.g. allocating a hole in a sparse file).
> 
> Regards,
> Ulrich
> 
> > 
> > Thanks
> > Gang
> > 
> > 
> > From: Users  on behalf of Ulrich Windl 
> > 
> > Sent: Tuesday, June 1, 2021 15:14
> > To: users@clusterlabs.org 
> > Subject: [ClusterLabs] Antw: Hanging OCFS2 Filesystem any one else?
> > 
> Ulrich Windl wrote on 31.05.2021 at 12:11 in message <60B4B65A.A8F : 161 : 60728>:
> >> Hi!
> >>
> >> We have an OCFS2 filesystem shared between three cluster nodes (SLES 15 SP2,
> >> Kernel 5.3.18-24.64-default). The filesystem is filled up to about 95%, and
> >> we have an odd effect:
> >> A stat() system call to some of the files hangs indefinitely (state "D").
> >> ("ls -l" and "rm" also hang, but I suspect those are calling stat()
> >> internally, too.)
> >> My only suspect is that the effect might be related to the 95% being used.
> >> The other suspect is that concurrent reflink calls may trigger the effect.
> >>
> >> Did anyone else experience something similar?
> > 
> > Hi!
> > 
> > I have some details:
> > It seems there is a reader/writer deadlock trying to allocate additional 
> > blocks for a file.
> > The stacktrace looks like this:
> > Jun 01 07:56:31 h16 kernel:  rwsem_down_write_slowpath+0x251/0x620
> > Jun 01 07:56:31 h16 kernel:  ? __ocfs2_change_file_space+0xb3/0x620 [ocfs2]
> > Jun 01 07:56:31 h16 kernel:  __ocfs2_change_file_space+0xb3/0x620 [ocfs2]
> > Jun 01 07:56:31 h16 kernel:  ocfs2_fallocate+0x82/0xa0 [ocfs2]
> > Jun 01 07:56:31 h16 kernel:  vfs_fallocate+0x13f/0x2a0
> > Jun 01 07:56:31 h16 kernel:  ksys_fallocate+0x3c/0x70
> > Jun 01 07:56:31 h16 kernel:  __x64_sys_fallocate+0x1a/0x20
> > Jun 01 07:56:31 h16 kernel:  do_syscall_64+0x5b/0x1e0
> > 
> > That is the only writer (on that host), but there are multiple readers like
> > this:
> > Jun 01 07:56:31 h16 kernel:  rwsem_down_read_slowpath+0x172/0x300
> > Jun 01 07:56:31 h16 kernel:  ? dput+0x2c/0x2f0
> > Jun 01 07:56:31 h16 kernel:  ? lookup_slow+0x27/0x50
> > Jun 01 07:56:31 h16 kernel:  lookup_slow+0x27/0x50
> > Jun 01 07:56:31 h16 kernel:  walk_component+0x1c4/0x300
> > Jun 01 07:56:31 h16 kernel:  ? path_init+0x192/0x320
> > Jun 01 07:56:31 h16 kernel:  path_lookupat+0x6e/0x210
> > Jun 01 07:56:31 h16 kernel:  ? __put_lkb+0x45/0xd0 [dlm]
> > Jun 01 07:56:31 h16 kernel:  filename_lookup+0xb6/0x190
> > Jun 01 07:56:31 h16 kernel:  ? kmem_cache_alloc+0x3d/0x250
> > Jun 01 07:56:31 h16 kernel:  ? getname_flags+0x66/0x1d0
> > Jun 01 07:56:31 h16 kernel:  ? vfs_statx+0x73/0xe0
> > Jun 01 07:56:31 h16 kernel:  vfs_statx+0x73/0xe0
> > Jun 01 07:56:31 h16 kernel:  ? fsnotify_grab_connector+0x46/0x80
> > Jun 01 07:56:31 h16 kernel:  __do_sys_newstat+0x39/0x70
> > Jun 01 07:56:31 h16 kernel:  ? do_unlinkat+0x92/0x320
> > Jun 01 07:56:31 h16 kernel:  do_syscall_64+0x5b/0x1e0
> > 
> > So that will match the hanging stat() quite nicely!
> > 
> > However the PID displayed as holding the writer does not exist in the
> > system (on that node).
> > 
> > Regards,
> > Ulrich
> > 
> > 
> >>
> >> Regards,
> >> Ulrich
> >>
> >>
> >>
> >>
> > 
> > 
> > 
> > 
> > ___
> > Manage your subscription:
> > https://lists.clusterlabs.org/mailman/listinfo/users 
> > 
> > ClusterLabs home: 

[ClusterLabs] Antw: [EXT] Re: Antw: Hanging OCFS2 Filesystem any one else?

2021-06-02 Thread Ulrich Windl
>>> Gang He wrote on 02.06.2021 at 08:34 in message


> Hi Ulrich,
> 
> The hang problem looks like it is addressed by a fix
> (90bd070aae6c4fb5d302f9c4b9c88be60c8197ec "ocfs2: fix deadlock between
> setattr and dio_end_io_write"), but it is not 100% sure.
> If possible, could you help to report a bug to SUSE, then we can work on 
> that further.

Hi!

Actually a service request for the issue is open at SUSE. However I don't know
which L3 engineer is working on it.
I have some "funny" effects, like these:
On one node "ls" hangs, but can be interrupted with ^C; on another node "ls"
also hangs, but cannot be stopped with ^C or ^Z.
(Most processes cannot even be killed with "kill -9".)
"ls" on the directory also hangs, just as an "rm" for a non-existent file.

What I really wonder is what triggered the effect, and more importantly how
to recover from it.
Initially I had suspected a rather full (95%) filesystem, but that means there
are still 24GB available.
The other suspect was concurrent creation of reflink snapshots while the file
being snapshotted did change (e.g. allocating a hole in a sparse file).
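
(For the "how to recover" part, a few diagnostics that are often useful before
resorting to a reboot, sketched here with placeholder names; whether they help
in this particular case is of course not guaranteed.)

  # list tasks stuck in uninterruptible sleep and where they are blocked
  ps -eo pid,stat,wchan:40,cmd | awk '$2 ~ /D/'
  cat /proc/<pid>/stack              # kernel stack of one hung task

  # dump all blocked tasks to the kernel log (requires sysrq to be enabled)
  echo w > /proc/sysrq-trigger

  # inspect OCFS2/DLM cluster lock state on the affected volume
  debugfs.ocfs2 -R "fs_locks" /dev/<shared-device>
  dlm_tool lockdebug <lockspace-name>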

Regards,
Ulrich

> 
> Thanks
> Gang
> 
> 
> From: Users  on behalf of Ulrich Windl 
> 
> Sent: Tuesday, June 1, 2021 15:14
> To: users@clusterlabs.org 
> Subject: [ClusterLabs] Antw: Hanging OCFS2 Filesystem any one else?
> 
Ulrich Windl wrote on 31.05.2021 at 12:11 in message <60B4B65A.A8F : 161 : 60728>:
>> Hi!
>>
>> We have an OCFS2 filesystem shared between three cluster nodes (SLES 15 SP2,
>> Kernel 5.3.18-24.64-default). The filesystem is filled up to about 95%, and
>> we have an odd effect:
>> A stat() system call to some of the files hangs indefinitely (state "D").
>> ("ls -l" and "rm" also hang, but I suspect those are calling stat()
>> internally, too.)
>> My only suspect is that the effect might be related to the 95% being used.
>> The other suspect is that concurrent reflink calls may trigger the effect.
>>
>> Did anyone else experience something similar?
> 
> Hi!
> 
> I have some details:
> It seems there is a reader/writer deadlock trying to allocate additional 
> blocks for a file.
> The stacktrace looks like this:
> Jun 01 07:56:31 h16 kernel:  rwsem_down_write_slowpath+0x251/0x620
> Jun 01 07:56:31 h16 kernel:  ? __ocfs2_change_file_space+0xb3/0x620 [ocfs2]
> Jun 01 07:56:31 h16 kernel:  __ocfs2_change_file_space+0xb3/0x620 [ocfs2]
> Jun 01 07:56:31 h16 kernel:  ocfs2_fallocate+0x82/0xa0 [ocfs2]
> Jun 01 07:56:31 h16 kernel:  vfs_fallocate+0x13f/0x2a0
> Jun 01 07:56:31 h16 kernel:  ksys_fallocate+0x3c/0x70
> Jun 01 07:56:31 h16 kernel:  __x64_sys_fallocate+0x1a/0x20
> Jun 01 07:56:31 h16 kernel:  do_syscall_64+0x5b/0x1e0
> 
> That is the only writer (on that host), but there are multiple readers like
> this:
> Jun 01 07:56:31 h16 kernel:  rwsem_down_read_slowpath+0x172/0x300
> Jun 01 07:56:31 h16 kernel:  ? dput+0x2c/0x2f0
> Jun 01 07:56:31 h16 kernel:  ? lookup_slow+0x27/0x50
> Jun 01 07:56:31 h16 kernel:  lookup_slow+0x27/0x50
> Jun 01 07:56:31 h16 kernel:  walk_component+0x1c4/0x300
> Jun 01 07:56:31 h16 kernel:  ? path_init+0x192/0x320
> Jun 01 07:56:31 h16 kernel:  path_lookupat+0x6e/0x210
> Jun 01 07:56:31 h16 kernel:  ? __put_lkb+0x45/0xd0 [dlm]
> Jun 01 07:56:31 h16 kernel:  filename_lookup+0xb6/0x190
> Jun 01 07:56:31 h16 kernel:  ? kmem_cache_alloc+0x3d/0x250
> Jun 01 07:56:31 h16 kernel:  ? getname_flags+0x66/0x1d0
> Jun 01 07:56:31 h16 kernel:  ? vfs_statx+0x73/0xe0
> Jun 01 07:56:31 h16 kernel:  vfs_statx+0x73/0xe0
> Jun 01 07:56:31 h16 kernel:  ? fsnotify_grab_connector+0x46/0x80
> Jun 01 07:56:31 h16 kernel:  __do_sys_newstat+0x39/0x70
> Jun 01 07:56:31 h16 kernel:  ? do_unlinkat+0x92/0x320
> Jun 01 07:56:31 h16 kernel:  do_syscall_64+0x5b/0x1e0
> 
> So that will match the hanging stat() quite nicely!
> 
> However the PID displayed as holding the writer does not exist in the system
> (on that node).
> 
> Regards,
> Ulrich
> 
> 
>>
>> Regards,
>> Ulrich
>>
>>
>>
>>
> 
> 
> 
> 
> ___
> Manage your subscription:
> https://lists.clusterlabs.org/mailman/listinfo/users 
> 
> ClusterLabs home: https://www.clusterlabs.org/ 
> 
> 



___
Manage your subscription:
https://lists.clusterlabs.org/mailman/listinfo/users

ClusterLabs home: https://www.clusterlabs.org/