Re: [Cluster-devel] [PATCH RFC gfs2/for-next] fs: gfs2: release dlm gfs2 ls on withdraw

2021-08-23 Thread Alexander Aring
Hi,

On Mon, Aug 23, 2021 at 11:56 AM Alexander Aring  wrote:
>
> This patch introduces a new gfs2 lock ops callback for releasing a
> possible lock infrastructure, e.g. dlm. There is currently an issue with
> gfs2 and dlm when hitting ctrl-c during the mount operation (sometimes it
> works, most times not): if the gfs2 filesystem has been withdrawn, we
> don't release the dlm lockspace again, and on the next mount
> dlm_new_lockspace() will return -EEXIST. This patch ensures that we call
> dlm_release_lockspace() in the error path of gfs2_fill_super() in all
> cases, whether or not the filesystem has been withdrawn.
>
> Signed-off-by: Alexander Aring 
> ---
> Hi,
>
> I have no idea whether gfs2_withdrawn(sdp) should always evaluate to
> "false" here, but there are cases where it returns "true", and then the
> gfs2 dlm lockspace will not be released... on the next mount there will be
> a -EEXIST, as described in the commit message.
>
> If gfs2_withdrawn(sdp) should always return "false", maybe there is some
> missing wait and we should print a warning when it returns true... in
> that case, combined with an error path, we have a problem which can be
> observed at the next mount.
>
> - Alex
>
>  fs/gfs2/glock.h  |  1 +
>  fs/gfs2/lock_dlm.c   | 24 ++--
>  fs/gfs2/ops_fstype.c |  6 +-
>  3 files changed, 24 insertions(+), 7 deletions(-)
>
> diff --git a/fs/gfs2/glock.h b/fs/gfs2/glock.h
> index 31a8f2f649b5..c8fd1352b0e1 100644
> --- a/fs/gfs2/glock.h
> +++ b/fs/gfs2/glock.h
> @@ -130,6 +130,7 @@ struct lm_lockops {
> void (*lm_recovery_result) (struct gfs2_sbd *sdp, unsigned int jid,
> unsigned int result);
> void (*lm_unmount) (struct gfs2_sbd *sdp);
> +   void (*lm_release) (struct gfs2_sbd *sdp);
> void (*lm_withdraw) (struct gfs2_sbd *sdp);
> void (*lm_put_lock) (struct gfs2_glock *gl);
> int (*lm_lock) (struct gfs2_glock *gl, unsigned int req_state,
> diff --git a/fs/gfs2/lock_dlm.c b/fs/gfs2/lock_dlm.c
> index 50578f881e6d..776667ca4da1 100644
> --- a/fs/gfs2/lock_dlm.c
> +++ b/fs/gfs2/lock_dlm.c
> @@ -1248,6 +1248,14 @@ static const struct dlm_lockspace_ops gdlm_lockspace_ops = {
> .recover_done = gdlm_recover_done,
>  };
>
> +static void gdlm_ls_release(struct lm_lockstruct *ls)
> +{
> +   if (ls->ls_dlm) {
> +   dlm_release_lockspace(ls->ls_dlm, 2);
> +   ls->ls_dlm = NULL;
> +   }

I believe there must also be a:
free_recover_size(ls);
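
i.e. something along these lines (untested sketch, with free_recover_size()
done unconditionally, as gdlm_unmount() does it today):

static void gdlm_ls_release(struct lm_lockstruct *ls)
{
	if (ls->ls_dlm) {
		dlm_release_lockspace(ls->ls_dlm, 2);
		ls->ls_dlm = NULL;
	}
	/* also free the recovery arrays so a later mount starts clean */
	free_recover_size(ls);
}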

> +}
> +
>  static int gdlm_mount(struct gfs2_sbd *sdp, const char *table)
>  {
> 	struct lm_lockstruct *ls = &sdp->sd_lockstruct;
> @@ -1338,7 +1346,7 @@ static int gdlm_mount(struct gfs2_sbd *sdp, const char *table)
> return 0;
>
>  fail_release:
> -   dlm_release_lockspace(ls->ls_dlm, 2);
> +   gdlm_ls_release(ls);

then don't change this one, otherwise free_recover_size() gets called a
second time when falling through to the fail_free label below.

>  fail_free:
> free_recover_size(ls);
>  fail:
> @@ -1374,14 +1382,17 @@ static void gdlm_unmount(struct gfs2_sbd *sdp)
>
> /* mounted_lock and control_lock will be purged in dlm recovery */
>  release:
> -   if (ls->ls_dlm) {
> -   dlm_release_lockspace(ls->ls_dlm, 2);
> -   ls->ls_dlm = NULL;
> -   }
> -
> +   gdlm_ls_release(ls);
> free_recover_size(ls);

remove that; gdlm_ls_release() would then already take care of it.

>  }
>
> +static void gdlm_release(struct gfs2_sbd *sdp)
> +{
> +   struct lm_lockstruct *ls = &sdp->sd_lockstruct;
> +
> +   gdlm_ls_release(ls);
> +}
> +
>  static const match_table_t dlm_tokens = {
> { Opt_jid, "jid=%d"},
> { Opt_id, "id=%d"},
> @@ -1396,6 +1407,7 @@ const struct lm_lockops gfs2_dlm_ops = {
> .lm_first_done = gdlm_first_done,
> .lm_recovery_result = gdlm_recovery_result,
> .lm_unmount = gdlm_unmount,
> +   .lm_release = gdlm_release,
> .lm_put_lock = gdlm_put_lock,
> .lm_lock = gdlm_lock,
> .lm_cancel = gdlm_cancel,
> diff --git a/fs/gfs2/ops_fstype.c b/fs/gfs2/ops_fstype.c
> index 7f8410d8fdc1..ef25ed884db2 100644
> --- a/fs/gfs2/ops_fstype.c
> +++ b/fs/gfs2/ops_fstype.c
> @@ -1075,8 +1075,12 @@ static int gfs2_lm_mount(struct gfs2_sbd *sdp, int silent)
>  void gfs2_lm_unmount(struct gfs2_sbd *sdp)
>  {
> const struct lm_lockops *lm = sdp->sd_lockstruct.ls_ops;
> -   if (likely(!gfs2_withdrawn(sdp)) && lm->lm_unmount)
> +   if (likely(!gfs2_withdrawn(sdp)) && lm->lm_unmount) {
> lm->lm_unmount(sdp);
> +   } else {
> +   if (lm->lm_release)
> +   lm->lm_release(sdp);
> +   }

That means we also leak some of the memory that free_recover_size(ls)
would otherwise free.

If this approach is okay, I will send a v2.

- Alex



Re: [Cluster-devel] [PATCH v6 10/19] gfs2: Introduce flag for glock holder auto-demotion

2021-08-23 Thread Andreas Gruenbacher
On Mon, Aug 23, 2021 at 6:06 PM Matthew Wilcox  wrote:
> On Mon, Aug 23, 2021 at 05:18:12PM +0200, Andreas Gruenbacher wrote:
> > On Mon, Aug 23, 2021 at 10:14 AM Steven Whitehouse  
> > wrote:
> > > If the goal here is just to allow the glock to be held for a longer
> > > period of time, but with occasional interruptions to prevent
> > > starvation, then we have a potential model for this. There is
> > > cond_resched_lock() which does this for spin locks.
> >
> > This isn't an appropriate model for what I'm trying to achieve here.
> > In the cond_resched case, we know at the time of the cond_resched call
> > whether or not we want to schedule. If we do, we want to drop the spin
> > lock, schedule, and then re-acquire the spin lock. In the case we're
> > looking at here, we want to fault in user pages. There is no way of
> > knowing beforehand if the glock we're currently holding will have to
> > be dropped to achieve that. In fact, it will almost never have to be
> > dropped. But if it does, we need to drop it straight away to allow the
> > conflicting locking request to succeed.
>
> It occurs to me that this is similar to the wound/wait mutexes
> (include/linux/ww_mutex.h & Documentation/locking/ww-mutex-design.rst).
> You want to mark the glock as woundable before faulting, and then discover
> if it was wounded after faulting.  Maybe sharing this terminology will
> aid in understanding?

I've looked at the ww_mutex documentation. A "transaction" wounds
another "transaction" and that other transaction then "dies", or it
"heals" and restarts. In the glock case, a process sets and clears the
HIF_MAY_DEMOTE flag on one of its own glock holder contexts. After
clearing the flag, it either still holds the glock or it doesn't;
nothing needs to be done to "die" or to "heal". So I'm not sure we
want to conflate the two concepts.

One of the earlier terms we've used was "stealing", with a
HIF_MAY_STEAL flag. That works, but it's slightly less obvious what
happens to a glock holder when the glock is stolen from it. (The
holder gets dequeued, __gfs2_glock_dq.) The glock code already uses
the terms promote/demote, acquire/release, enqueue/dequeue, and
_nq/_dq for various forms of acquiring and releasing a glock, so we're
not short of names right now, apparently.

Thanks,
Andreas



Re: [Cluster-devel] [PATCH v6 10/19] gfs2: Introduce flag for glock holder auto-demotion

2021-08-23 Thread Bob Peterson

On 8/23/21 11:05 AM, Matthew Wilcox wrote:

On Mon, Aug 23, 2021 at 05:18:12PM +0200, Andreas Gruenbacher wrote:

On Mon, Aug 23, 2021 at 10:14 AM Steven Whitehouse  wrote:

If the goal here is just to allow the glock to be held for a longer
period of time, but with occasional interruptions to prevent
starvation, then we have a potential model for this. There is
cond_resched_lock() which does this for spin locks.


This isn't an appropriate model for what I'm trying to achieve here.
In the cond_resched case, we know at the time of the cond_resched call
whether or not we want to schedule. If we do, we want to drop the spin
lock, schedule, and then re-acquire the spin lock. In the case we're
looking at here, we want to fault in user pages. There is no way of
knowing beforehand if the glock we're currently holding will have to
be dropped to achieve that. In fact, it will almost never have to be
dropped. But if it does, we need to drop it straight away to allow the
conflicting locking request to succeed.


It occurs to me that this is similar to the wound/wait mutexes
(include/linux/ww_mutex.h & Documentation/locking/ww-mutex-design.rst).
You want to mark the glock as woundable before faulting, and then discover
if it was wounded after faulting.  Maybe sharing this terminology will
aid in understanding?


Hmm. Woundable. I like it.
Andreas and I argued about the terminology but we never found a
middle ground. Perhaps this is it. Thanks, Matthew.


Regards,

Bob Peterson



Re: [Cluster-devel] [PATCH v6 10/19] gfs2: Introduce flag for glock holder auto-demotion

2021-08-23 Thread Matthew Wilcox
On Mon, Aug 23, 2021 at 05:18:12PM +0200, Andreas Gruenbacher wrote:
> On Mon, Aug 23, 2021 at 10:14 AM Steven Whitehouse  
> wrote:
> > If the goal here is just to allow the glock to be held for a longer
> > period of time, but with occasional interruptions to prevent
> > starvation, then we have a potential model for this. There is
> > cond_resched_lock() which does this for spin locks.
> 
> This isn't an appropriate model for what I'm trying to achieve here.
> In the cond_resched case, we know at the time of the cond_resched call
> whether or not we want to schedule. If we do, we want to drop the spin
> lock, schedule, and then re-acquire the spin lock. In the case we're
> looking at here, we want to fault in user pages. There is no way of
> knowing beforehand if the glock we're currently holding will have to
> be dropped to achieve that. In fact, it will almost never have to be
> dropped. But if it does, we need to drop it straight away to allow the
> conflicting locking request to succeed.

It occurs to me that this is similar to the wound/wait mutexes
(include/linux/ww_mutex.h & Documentation/locking/ww-mutex-design.rst).
You want to mark the glock as woundable before faulting, and then discover
if it was wounded after faulting.  Maybe sharing this terminology will
aid in understanding?



[Cluster-devel] [PATCH RFC gfs2/for-next] fs: gfs2: release dlm gfs2 ls on withdraw

2021-08-23 Thread Alexander Aring
This patch introduces a new gfs2 lock ops callback for releasing a
possible lock infrastructure, e.g. dlm. There is currently an issue with
gfs2 and dlm when hitting ctrl-c during the mount operation (sometimes it
works, most times not): if the gfs2 filesystem has been withdrawn, we
don't release the dlm lockspace again, and on the next mount
dlm_new_lockspace() will return -EEXIST. This patch ensures that we call
dlm_release_lockspace() in the error path of gfs2_fill_super() in all
cases, whether or not the filesystem has been withdrawn.

Signed-off-by: Alexander Aring 
---
Hi,

I have no idea whether gfs2_withdrawn(sdp) should always evaluate to
"false" here, but there are cases where it returns "true", and then the
gfs2 dlm lockspace will not be released... on the next mount there will be
a -EEXIST, as described in the commit message.

If gfs2_withdrawn(sdp) should always return "false", maybe there is some
missing wait and we should print a warning when it returns true... in
that case, combined with an error path, we have a problem which can be
observed at the next mount.

- Alex

 fs/gfs2/glock.h  |  1 +
 fs/gfs2/lock_dlm.c   | 24 ++--
 fs/gfs2/ops_fstype.c |  6 +-
 3 files changed, 24 insertions(+), 7 deletions(-)

diff --git a/fs/gfs2/glock.h b/fs/gfs2/glock.h
index 31a8f2f649b5..c8fd1352b0e1 100644
--- a/fs/gfs2/glock.h
+++ b/fs/gfs2/glock.h
@@ -130,6 +130,7 @@ struct lm_lockops {
void (*lm_recovery_result) (struct gfs2_sbd *sdp, unsigned int jid,
unsigned int result);
void (*lm_unmount) (struct gfs2_sbd *sdp);
+   void (*lm_release) (struct gfs2_sbd *sdp);
void (*lm_withdraw) (struct gfs2_sbd *sdp);
void (*lm_put_lock) (struct gfs2_glock *gl);
int (*lm_lock) (struct gfs2_glock *gl, unsigned int req_state,
diff --git a/fs/gfs2/lock_dlm.c b/fs/gfs2/lock_dlm.c
index 50578f881e6d..776667ca4da1 100644
--- a/fs/gfs2/lock_dlm.c
+++ b/fs/gfs2/lock_dlm.c
@@ -1248,6 +1248,14 @@ static const struct dlm_lockspace_ops gdlm_lockspace_ops = {
.recover_done = gdlm_recover_done,
 };
 
+static void gdlm_ls_release(struct lm_lockstruct *ls)
+{
+   if (ls->ls_dlm) {
+   dlm_release_lockspace(ls->ls_dlm, 2);
+   ls->ls_dlm = NULL;
+   }
+}
+
 static int gdlm_mount(struct gfs2_sbd *sdp, const char *table)
 {
	struct lm_lockstruct *ls = &sdp->sd_lockstruct;
@@ -1338,7 +1346,7 @@ static int gdlm_mount(struct gfs2_sbd *sdp, const char *table)
return 0;
 
 fail_release:
-   dlm_release_lockspace(ls->ls_dlm, 2);
+   gdlm_ls_release(ls);
 fail_free:
free_recover_size(ls);
 fail:
@@ -1374,14 +1382,17 @@ static void gdlm_unmount(struct gfs2_sbd *sdp)
 
/* mounted_lock and control_lock will be purged in dlm recovery */
 release:
-   if (ls->ls_dlm) {
-   dlm_release_lockspace(ls->ls_dlm, 2);
-   ls->ls_dlm = NULL;
-   }
-
+   gdlm_ls_release(ls);
free_recover_size(ls);
 }
 
+static void gdlm_release(struct gfs2_sbd *sdp)
+{
+   struct lm_lockstruct *ls = &sdp->sd_lockstruct;
+
+   gdlm_ls_release(ls);
+}
+
 static const match_table_t dlm_tokens = {
{ Opt_jid, "jid=%d"},
{ Opt_id, "id=%d"},
@@ -1396,6 +1407,7 @@ const struct lm_lockops gfs2_dlm_ops = {
.lm_first_done = gdlm_first_done,
.lm_recovery_result = gdlm_recovery_result,
.lm_unmount = gdlm_unmount,
+   .lm_release = gdlm_release,
.lm_put_lock = gdlm_put_lock,
.lm_lock = gdlm_lock,
.lm_cancel = gdlm_cancel,
diff --git a/fs/gfs2/ops_fstype.c b/fs/gfs2/ops_fstype.c
index 7f8410d8fdc1..ef25ed884db2 100644
--- a/fs/gfs2/ops_fstype.c
+++ b/fs/gfs2/ops_fstype.c
@@ -1075,8 +1075,12 @@ static int gfs2_lm_mount(struct gfs2_sbd *sdp, int silent)
 void gfs2_lm_unmount(struct gfs2_sbd *sdp)
 {
const struct lm_lockops *lm = sdp->sd_lockstruct.ls_ops;
-   if (likely(!gfs2_withdrawn(sdp)) && lm->lm_unmount)
+   if (likely(!gfs2_withdrawn(sdp)) && lm->lm_unmount) {
lm->lm_unmount(sdp);
+   } else {
+   if (lm->lm_release)
+   lm->lm_release(sdp);
+   }
 }
 
 static int wait_on_journal(struct gfs2_sbd *sdp)
-- 
2.27.0



Re: [Cluster-devel] [PATCH v3 2/2] fs: remove mandatory file locking support

2021-08-23 Thread Guenter Roeck
On Fri, Aug 20, 2021 at 12:39:19PM -0400, Jeff Layton wrote:
> We added CONFIG_MANDATORY_FILE_LOCKING in 2015, and soon after turned it
> off in Fedora and RHEL8. Several other distros have followed suit.
> 
> I've heard of one problem in all that time: Someone migrated from an
> older distro that supported "-o mand" to one that didn't, and the host
> had a fstab entry with "mand" in it which broke on reboot. They didn't
> actually _use_ mandatory locking so they just removed the mount option
> and moved on.
> 
> This patch rips out mandatory locking support wholesale from the kernel,
> along with the Kconfig option and the Documentation file. It also
> changes the mount code to ignore the "mand" mount option instead of
> erroring out, and to throw a big, ugly warning.
> 
> Signed-off-by: Jeff Layton 

This patch leaves a number of unused variables and labels behind,
as can be seen in allmodconfig builds for various architectures.
Do you expect those to be fixed individually, or do you plan to
fix those up yourself?

Guenter

> ---
>  .../filesystems/mandatory-locking.rst | 188 --
>  fs/9p/vfs_file.c  |  12 --
>  fs/Kconfig|  10 -
>  fs/afs/flock.c|   4 -
>  fs/ceph/locks.c   |   3 -
>  fs/gfs2/file.c|   3 -
>  fs/locks.c| 116 +--
>  fs/namei.c|   4 +-
>  fs/namespace.c|  29 +--
>  fs/nfs/file.c |   4 -
>  fs/nfsd/nfs4state.c   |  13 --
>  fs/nfsd/vfs.c |  15 --
>  fs/ocfs2/locks.c  |   4 -
>  fs/open.c |   8 +-
>  fs/read_write.c   |   7 -
>  fs/remap_range.c  |  10 -
>  include/linux/fs.h|  84 
>  mm/mmap.c |   6 -
>  mm/nommu.c|   3 -
>  19 files changed, 14 insertions(+), 509 deletions(-)
>  delete mode 100644 Documentation/filesystems/mandatory-locking.rst
> 
> diff --git a/Documentation/filesystems/mandatory-locking.rst b/Documentation/filesystems/mandatory-locking.rst
> deleted file mode 100644
> index 9ce73544a8f0..
> --- a/Documentation/filesystems/mandatory-locking.rst
> +++ /dev/null
> @@ -1,188 +0,0 @@
> -.. SPDX-License-Identifier: GPL-2.0
> -
> -=
> -Mandatory File Locking For The Linux Operating System
> -=
> -
> - Andy Walker 
> -
> -15 April 1996
> -
> -  (Updated September 2007)
> -
> -0. Why you should avoid mandatory locking
> --
> -
> -The Linux implementation is prey to a number of difficult-to-fix race
> -conditions which in practice make it not dependable:
> -
> - - The write system call checks for a mandatory lock only once
> -   at its start.  It is therefore possible for a lock request to
> -   be granted after this check but before the data is modified.
> -   A process may then see file data change even while a mandatory
> -   lock was held.
> - - Similarly, an exclusive lock may be granted on a file after
> -   the kernel has decided to proceed with a read, but before the
> -   read has actually completed, and the reading process may see
> -   the file data in a state which should not have been visible
> -   to it.
> - - Similar races make the claimed mutual exclusion between lock
> -   and mmap similarly unreliable.
> -
> -1. What is  mandatory locking?
> ---
> -
> -Mandatory locking is kernel enforced file locking, as opposed to the more 
> usual
> -cooperative file locking used to guarantee sequential access to files among
> -processes. File locks are applied using the flock() and fcntl() system calls
> -(and the lockf() library routine which is a wrapper around fcntl().) It is
> -normally a process' responsibility to check for locks on a file it wishes to
> -update, before applying its own lock, updating the file and unlocking it 
> again.
> -The most commonly used example of this (and in the case of sendmail, the most
> -troublesome) is access to a user's mailbox. The mail user agent and the mail
> -transfer agent must guard against updating the mailbox at the same time, and
> -prevent reading the mailbox while it is being updated.
> -
> -In a perfect world all processes would use and honour a cooperative, or
> -"advisory" locking scheme. However, the world isn't perfect, and there's
> -a lot of poorly written code out there.
> -
> -In trying to address this problem, the designers of System V 

Re: [Cluster-devel] [PATCH v6 10/19] gfs2: Introduce flag for glock holder auto-demotion

2021-08-23 Thread Andreas Gruenbacher
On Mon, Aug 23, 2021 at 10:14 AM Steven Whitehouse  wrote:
> On Fri, 2021-08-20 at 17:22 +0200, Andreas Gruenbacher wrote:
> > On Fri, Aug 20, 2021 at 3:11 PM Bob Peterson 
> > wrote:
> > >
> [snip]
> > >
> > > You can almost think of this as a performance enhancement. This
> > > concept
> > > allows a process to hold a glock for much longer periods of time,
> > > at a
> > > lower priority, for example, when gfs2_file_read_iter needs to hold
> > > the
> > > glock for very long-running iterative reads.
> >
> > Consider a process that allocates a somewhat large buffer and reads
> > into it in chunks that are not page aligned. The buffer initially
> > won't be faulted in, so we fault in the first chunk and write into
> > it.
> > Then, when reading the second chunk, we find that the first page of
> > the second chunk is already present. We fill it, set the
> > HIF_MAY_DEMOTE flag, fault in more pages, and clear the
> > HIF_MAY_DEMOTE
> > flag. If we then still have the glock (which is very likely), we
> > resume the read. Otherwise, we return a short result.
> >
> > Thanks,
> > Andreas
> >
>
> If the goal here is just to allow the glock to be held for a longer
> period of time, but with occasional interruptions to prevent
> starvation, then we have a potential model for this. There is
> cond_resched_lock() which does this for spin locks.

This isn't an appropriate model for what I'm trying to achieve here.
In the cond_resched case, we know at the time of the cond_resched call
whether or not we want to schedule. If we do, we want to drop the spin
lock, schedule, and then re-acquire the spin lock. In the case we're
looking at here, we want to fault in user pages. There is no way of
knowing beforehand if the glock we're currently holding will have to
be dropped to achieve that. In fact, it will almost never have to be
dropped. But if it does, we need to drop it straight away to allow the
conflicting locking request to succeed.

Have a look at how the patch queue uses gfs2_holder_allow_demote() and
gfs2_holder_disallow_demote():

https://listman.redhat.com/archives/cluster-devel/2021-August/msg00128.html
https://listman.redhat.com/archives/cluster-devel/2021-August/msg00134.html
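
To make the intended pattern more concrete, here is a rough, untested
sketch of what the read path does around the fault-in in those patches
(the helper names, e.g. fault_in_iov_iter_writeable() and
gfs2_holder_queued(), are the ones from that patch queue, so treat this
as illustrative rather than literal):

	gfs2_holder_allow_demote(gh);		/* the holder may now be demoted */
	leftover = fault_in_iov_iter_writeable(to, window_size);
	gfs2_holder_disallow_demote(gh);	/* back to a normal holder */
	if (!gfs2_holder_queued(gh)) {
		/* the glock was taken away while faulting in pages:
		 * re-acquire it and retry, or return a short result */
	}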

Thanks,
Andreas



Re: [Cluster-devel] FS/DLM module triggered kernel BUG

2021-08-23 Thread Alexander Aring
Hi Gang He,

On Mon, Aug 23, 2021 at 1:43 AM Gang He  wrote:
>
> Hello Guys,
>
> I am using kernel 5.13.8, and I sometimes encounter a kernel BUG triggered
> by the dlm module.

What exactly do you do? I would like to test it on a recent upstream
version, or can you do it?

> Since the dlm kernel module is not the latest source code, I am not sure
> whether this problem has already been fixed or not.
>

could be, see below.

> The backtrace is as below,
>
> [Fri Aug 20 16:24:14 2021] dlm: CEC5E19D749E473B99A0B792AD570441: remove 
> member 172204615
> [Fri Aug 20 16:24:14 2021] dlm: CEC5E19D749E473B99A0B792AD570441: 
> dlm_recover_members 2 nodes
> [Fri Aug 20 16:24:14 2021] dlm: connection 5ef82293 got EOF from 
> 172204615

here we disconnect from nodeid 172204615.

> [Fri Aug 20 16:24:14 2021] dlm: CEC5E19D749E473B99A0B792AD570441: generation 
> 4 slots 2 1:172204786 2:172204748
> [Fri Aug 20 16:24:14 2021] dlm: CEC5E19D749E473B99A0B792AD570441: 
> dlm_recover_directory
> [Fri Aug 20 16:24:14 2021] dlm: CEC5E19D749E473B99A0B792AD570441: 
> dlm_recover_directory 8 in 1 new
> [Fri Aug 20 16:24:14 2021] dlm: CEC5E19D749E473B99A0B792AD570441: 
> dlm_recover_directory 1 out 1 messages
> [Fri Aug 20 16:24:14 2021] dlm: CEC5E19D749E473B99A0B792AD570441: 
> dlm_recover_masters
> [Fri Aug 20 16:24:14 2021] dlm: CEC5E19D749E473B99A0B792AD570441: 
> dlm_recover_masters 33587 of 33599
> [Fri Aug 20 16:24:14 2021] dlm: CEC5E19D749E473B99A0B792AD570441: 
> dlm_recover_locks 0 out
> [Fri Aug 20 16:24:14 2021] BUG: unable to handle page fault for address: 
> dd99ffd16650
> [Fri Aug 20 16:24:14 2021] #PF: supervisor write access in kernel mode
> [Fri Aug 20 16:24:14 2021] #PF: error_code(0x0002) - not-present page
> [Fri Aug 20 16:24:14 2021] PGD 1040067 P4D 1040067 PUD 19c3067 PMD 19c4067 
> PTE 0
> [Fri Aug 20 16:24:14 2021] Oops: 0002 [#1] SMP PTI
> [Fri Aug 20 16:24:14 2021] CPU: 1 PID: 25221 Comm: kworker/u4:1 Tainted: G
> W 5.13.8-1-default #1 openSUSE Tumbleweed
> [Fri Aug 20 16:24:14 2021] Hardware name: QEMU Standard PC (i440FX + PIIX, 
> 1996), BIOS rel-1.9.1-0-gb3ef39f-prebuilt.qemu-project.org 04/01/2014
> [Fri Aug 20 16:24:14 2021] Workqueue: dlm_recv process_recv_sockets [dlm]
> [Fri Aug 20 16:24:14 2021] RIP: 0010:__srcu_read_unlock+0x15/0x20
> [Fri Aug 20 16:24:14 2021] Code: 01 65 48 ff 04 c2 f0 83 44 24 fc 00 44 89 c0 
> c3 0f 1f 44 00 00 0f 1f 44 00 00 f0 83 44 24 fc 00 48 8b 87 e8 0c 00 00 48 63 
> f6 <65> 48 ff 44 f0 10 c3 0f 1f 40 00 0f 1f 44 00 00 41 54 49 89 fc 55
> [Fri Aug 20 16:24:14 2021] RSP: 0018:bd9a041ebd80 EFLAGS: 00010282
> [Fri Aug 20 16:24:14 2021] RAX: 3cc9c100ec00 RBX: 00dc RCX: 
> 0830
> [Fri Aug 20 16:24:14 2021] RDX:  RSI: 0f48 RDI: 
> c06b4420
> [Fri Aug 20 16:24:14 2021] RBP: a0d028423974 R08: 0001 R09: 
> 0004
> [Fri Aug 20 16:24:14 2021] R10:  R11:  R12: 
> a0d028425000
> [Fri Aug 20 16:24:14 2021] R13: 0a43a2f2 R14: a0d028425770 R15: 
> 0a43a2f2
> [Fri Aug 20 16:24:14 2021] FS:  () 
> GS:a0d03ed0() knlGS:
> [Fri Aug 20 16:24:14 2021] CS:  0010 DS:  ES:  CR0: 80050033
> [Fri Aug 20 16:24:14 2021] CR2: dd99ffd16650 CR3: 02696000 CR4: 
> 000406e0
> [Fri Aug 20 16:24:14 2021] Call Trace:
> [Fri Aug 20 16:24:14 2021]  dlm_receive_buffer+0x66/0x150 [dlm]

It would be interesting to know whether we received a message here from
nodeid 172204615, and I think this is what happens. There may be some
use-after-free going on; we should not receive any more messages from
nodeid 172204615.
I recently added some dlm tracing infrastructure. It should be simple
to add a trace event here, print out the nodeid and compare
timestamps.

I recently fixed a synchronization issue which is not part of kernel
5.13.8 and has something to do with what you are seeing here.
There is a workaround, and a simple test to see whether this really
affects you: create a dummy lockspace on all nodes so we never do any
disconnects, then check whether you run into this issue again.
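
For example, assuming the dlm_tool utility from the dlm userspace package
is available (the lockspace name is arbitrary):

	# on every node, before running the workload:
	dlm_tool join dummy_ls
	# ... reproduce ...
	dlm_tool leave dummy_ls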

> [Fri Aug 20 16:24:14 2021]  dlm_process_incoming_buffer+0x38/0x90 [dlm]
> [Fri Aug 20 16:24:14 2021]  receive_from_sock+0xd4/0x1f0 [dlm]
> [Fri Aug 20 16:24:14 2021]  process_recv_sockets+0x1a/0x20 [dlm]
> [Fri Aug 20 16:24:14 2021]  process_one_work+0x1df/0x370
> [Fri Aug 20 16:24:14 2021]  worker_thread+0x50/0x400
> [Fri Aug 20 16:24:14 2021]  ? process_one_work+0x370/0x370
> [Fri Aug 20 16:24:14 2021]  kthread+0x127/0x150
> [Fri Aug 20 16:24:14 2021]  ? set_kthread_struct+0x40/0x40
> [Fri Aug 20 16:24:14 2021]  ret_from_fork+0x22/0x30

- Alex



[Cluster-devel] FS/DLM module triggered kernel BUG

2021-08-23 Thread Gang He
Hello Guys,

I am using kernel 5.13.8, and I sometimes encounter a kernel BUG triggered
by the dlm module.
Since the dlm kernel module is not the latest source code, I am not sure
whether this problem has already been fixed or not.

The backtrace is as below,

[Fri Aug 20 16:24:14 2021] dlm: CEC5E19D749E473B99A0B792AD570441: remove member 
172204615
[Fri Aug 20 16:24:14 2021] dlm: CEC5E19D749E473B99A0B792AD570441: 
dlm_recover_members 2 nodes
[Fri Aug 20 16:24:14 2021] dlm: connection 5ef82293 got EOF from 
172204615
[Fri Aug 20 16:24:14 2021] dlm: CEC5E19D749E473B99A0B792AD570441: generation 4 
slots 2 1:172204786 2:172204748
[Fri Aug 20 16:24:14 2021] dlm: CEC5E19D749E473B99A0B792AD570441: 
dlm_recover_directory
[Fri Aug 20 16:24:14 2021] dlm: CEC5E19D749E473B99A0B792AD570441: 
dlm_recover_directory 8 in 1 new
[Fri Aug 20 16:24:14 2021] dlm: CEC5E19D749E473B99A0B792AD570441: 
dlm_recover_directory 1 out 1 messages
[Fri Aug 20 16:24:14 2021] dlm: CEC5E19D749E473B99A0B792AD570441: 
dlm_recover_masters
[Fri Aug 20 16:24:14 2021] dlm: CEC5E19D749E473B99A0B792AD570441: 
dlm_recover_masters 33587 of 33599
[Fri Aug 20 16:24:14 2021] dlm: CEC5E19D749E473B99A0B792AD570441: 
dlm_recover_locks 0 out
[Fri Aug 20 16:24:14 2021] BUG: unable to handle page fault for address: 
dd99ffd16650
[Fri Aug 20 16:24:14 2021] #PF: supervisor write access in kernel mode
[Fri Aug 20 16:24:14 2021] #PF: error_code(0x0002) - not-present page
[Fri Aug 20 16:24:14 2021] PGD 1040067 P4D 1040067 PUD 19c3067 PMD 19c4067 PTE 0
[Fri Aug 20 16:24:14 2021] Oops: 0002 [#1] SMP PTI
[Fri Aug 20 16:24:14 2021] CPU: 1 PID: 25221 Comm: kworker/u4:1 Tainted: G  
  W 5.13.8-1-default #1 openSUSE Tumbleweed
[Fri Aug 20 16:24:14 2021] Hardware name: QEMU Standard PC (i440FX + PIIX, 
1996), BIOS rel-1.9.1-0-gb3ef39f-prebuilt.qemu-project.org 04/01/2014
[Fri Aug 20 16:24:14 2021] Workqueue: dlm_recv process_recv_sockets [dlm]
[Fri Aug 20 16:24:14 2021] RIP: 0010:__srcu_read_unlock+0x15/0x20
[Fri Aug 20 16:24:14 2021] Code: 01 65 48 ff 04 c2 f0 83 44 24 fc 00 44 89 c0 
c3 0f 1f 44 00 00 0f 1f 44 00 00 f0 83 44 24 fc 00 48 8b 87 e8 0c 00 00 48 63 
f6 <65> 48 ff 44 f0 10 c3 0f 1f 40 00 0f 1f 44 00 00 41 54 49 89 fc 55
[Fri Aug 20 16:24:14 2021] RSP: 0018:bd9a041ebd80 EFLAGS: 00010282
[Fri Aug 20 16:24:14 2021] RAX: 3cc9c100ec00 RBX: 00dc RCX: 
0830
[Fri Aug 20 16:24:14 2021] RDX:  RSI: 0f48 RDI: 
c06b4420
[Fri Aug 20 16:24:14 2021] RBP: a0d028423974 R08: 0001 R09: 
0004
[Fri Aug 20 16:24:14 2021] R10:  R11:  R12: 
a0d028425000
[Fri Aug 20 16:24:14 2021] R13: 0a43a2f2 R14: a0d028425770 R15: 
0a43a2f2
[Fri Aug 20 16:24:14 2021] FS:  () 
GS:a0d03ed0() knlGS:
[Fri Aug 20 16:24:14 2021] CS:  0010 DS:  ES:  CR0: 80050033
[Fri Aug 20 16:24:14 2021] CR2: dd99ffd16650 CR3: 02696000 CR4: 
000406e0
[Fri Aug 20 16:24:14 2021] Call Trace:
[Fri Aug 20 16:24:14 2021]  dlm_receive_buffer+0x66/0x150 [dlm]
[Fri Aug 20 16:24:14 2021]  dlm_process_incoming_buffer+0x38/0x90 [dlm]
[Fri Aug 20 16:24:14 2021]  receive_from_sock+0xd4/0x1f0 [dlm]
[Fri Aug 20 16:24:14 2021]  process_recv_sockets+0x1a/0x20 [dlm]
[Fri Aug 20 16:24:14 2021]  process_one_work+0x1df/0x370
[Fri Aug 20 16:24:14 2021]  worker_thread+0x50/0x400
[Fri Aug 20 16:24:14 2021]  ? process_one_work+0x370/0x370
[Fri Aug 20 16:24:14 2021]  kthread+0x127/0x150
[Fri Aug 20 16:24:14 2021]  ? set_kthread_struct+0x40/0x40
[Fri Aug 20 16:24:14 2021]  ret_from_fork+0x22/0x30
[Fri Aug 20 16:24:14 2021] Modules linked in: rdma_ucm ib_uverbs rdma_cm iw_cm 
ib_cm ib_core ocfs2_stack_user ocfs2 ocfs2_nodemanager ocfs2_stackglue 
quota_tree dlm af_packet iscsi_ibft iscsi_boot_sysfs rfkill intel_rapl_msr 
hid_generic intel_rapl_common usbhid virtio_net pcspkr joydev net_failover 
virtio_balloon i2c_piix4 failover tiny_power_button button fuse configfs 
crct10dif_pclmul crc32_pclmul ghash_clmulni_intel ata_generic uhci_hcd ehci_pci 
ehci_hcd cirrus drm_kms_helper aesni_intel usbcore crypto_simd syscopyarea 
sysfillrect sysimgblt fb_sys_fops cec cryptd rc_core drm serio_raw i6300esb 
virtio_blk ata_piix floppy qemu_fw_cfg btrfs blake2b_generic libcrc32c 
crc32c_intel xor raid6_pq sg dm_multipath dm_mod scsi_dh_rdac scsi_dh_emc 
scsi_dh_alua virtio_rng
[Fri Aug 20 16:24:14 2021] CR2: dd99ffd16650
[Fri Aug 20 16:24:14 2021] ---[ end trace 2ddfa38b9d824d93 ]---
[Fri Aug 20 16:24:14 2021] RIP: 0010:__srcu_read_unlock+0x15/0x20
[Fri Aug 20 16:24:14 2021] Code: 01 65 48 ff 04 c2 f0 83 44 24 fc 00 44 89 c0 
c3 0f 1f 44 00 00 0f 1f 44 00 00 f0 83 44 24 fc 00 48 8b 87 e8 0c 00 00 48 63 
f6 <65> 48 ff 44 f0 10 c3 0f 1f 40 00 0f 1f 44 00 00 41 54 49 89 fc 55
[Fri Aug 20 16:24:14 2021] RSP: 0018:bd9a041ebd80 EFLAGS: 00010282
[Fri Aug 20 16:24:14 2021] RAX: 3cc9c100ec00 RBX: 00dc RCX: 

Re: [Cluster-devel] [PATCH v6 10/19] gfs2: Introduce flag for glock holder auto-demotion

2021-08-23 Thread Steven Whitehouse
On Fri, 2021-08-20 at 17:22 +0200, Andreas Gruenbacher wrote:
> On Fri, Aug 20, 2021 at 3:11 PM Bob Peterson 
> wrote:
> > 
[snip]
> > 
> > You can almost think of this as a performance enhancement. This
> > concept
> > allows a process to hold a glock for much longer periods of time,
> > at a
> > lower priority, for example, when gfs2_file_read_iter needs to hold
> > the
> > glock for very long-running iterative reads.
> 
> Consider a process that allocates a somewhat large buffer and reads
> into it in chunks that are not page aligned. The buffer initially
> won't be faulted in, so we fault in the first chunk and write into
> it.
> Then, when reading the second chunk, we find that the first page of
> the second chunk is already present. We fill it, set the
> HIF_MAY_DEMOTE flag, fault in more pages, and clear the
> HIF_MAY_DEMOTE
> flag. If we then still have the glock (which is very likely), we
> resume the read. Otherwise, we return a short result.
> 
> Thanks,
> Andreas
> 

If the goal here is just to allow the glock to be held for a longer
period of time, but with occasional interruptions to prevent
starvation, then we have a potential model for this. There is
cond_resched_lock() which does this for spin locks. So perhaps we might
do something similar:

/**
 * gfs2_glock_cond_regain - Conditionally drop and regain glock
 * @gl: The glock
 * @gh: A granted holder for the glock
 *
 * If there is a pending demote request for this glock, drop and 
 * requeue a lock request for this glock. If there is no pending
 * demote request, this is a no-op. In either case the glock is
 * held on both entry and exit.
 *
 * Returns: 0 if no pending demote, 1 if lock dropped and regained
 */
int gfs2_glock_cond_regain(struct gfs2_glock *gl, struct gfs2_holder *gh);

That seems more easily understood, and clearly documents places where
the lock may be dropped and regained. I think that the implementation
should be simpler and cleaner, compared with the current proposed
patch. There are only two bit flags related to pending demotes, for
example, so the check should be trivial.
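
A rough, untested sketch of what I have in mind, reusing the existing
GLF_DEMOTE/GLF_PENDING_DEMOTE flags and holder helpers (the locking
details would need checking):

int gfs2_glock_cond_regain(struct gfs2_glock *gl, struct gfs2_holder *gh)
{
	if (!test_bit(GLF_DEMOTE, &gl->gl_flags) &&
	    !test_bit(GLF_PENDING_DEMOTE, &gl->gl_flags))
		return 0;			/* nothing is waiting on this glock */

	gfs2_glock_dq(gh);			/* let the pending demote go ahead */
	gfs2_holder_reinit(gh->gh_state, gh->gh_flags, gh);
	return gfs2_glock_nq(gh) ?: 1;		/* regained (or error); report the drop */
}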

It may need a few changes depending on the exact circumstances, but
hopefully that illustrates the concept,

Steve.