Re: [Cluster-devel] [PATCH 6/7] dlm: use FL_SLEEP to determine blocking vs non-blocking

2023-08-30 Thread Jeff Layton
On Wed, 2023-08-30 at 08:38 -0400, Alexander Aring wrote:
> Hi,
> 
> On Fri, Aug 25, 2023 at 2:18 PM Jeff Layton  wrote:
> > 
> > On Wed, 2023-08-23 at 17:33 -0400, Alexander Aring wrote:
> > > This patch uses the FL_SLEEP flag in struct file_lock to determine if
> > > the lock request is a blocking or non-blocking request. Before dlm was
> > > using IS_SETLKW() was being used which is not usable for lock requests
> > > coming from lockd when EXPORT_OP_SAFE_ASYNC_LOCK inside the export flags
> > > is set.
> > > 
> > > Signed-off-by: Alexander Aring 
> > > ---
> > >  fs/dlm/plock.c | 2 +-
> > >  1 file changed, 1 insertion(+), 1 deletion(-)
> > > 
> > > diff --git a/fs/dlm/plock.c b/fs/dlm/plock.c
> > > index 0094fa4004cc..0c6ed5eeb840 100644
> > > --- a/fs/dlm/plock.c
> > > +++ b/fs/dlm/plock.c
> > > @@ -140,7 +140,7 @@ int dlm_posix_lock(dlm_lockspace_t *lockspace, u64 
> > > number, struct file *file,
> > >   op->info.optype = DLM_PLOCK_OP_LOCK;
> > >   op->info.pid= fl->fl_pid;
> > >   op->info.ex = (fl->fl_type == F_WRLCK);
> > > - op->info.wait   = IS_SETLKW(cmd);
> > > + op->info.wait   = !!(fl->fl_flags & FL_SLEEP);
> > >   op->info.fsid   = ls->ls_global_id;
> > >   op->info.number = number;
> > >   op->info.start  = fl->fl_start;
> > 
> > Not sure you really need the !!, but ok...
> > 
> 
> The wait is a byte value and FL_SLEEP doesn't fit into it, I already
> run into problems with it. I don't think somebody does a if (foo->wait
> == 1) but it should be set to 1 or 0.
> 

AIUI, any halfway decent compiler should take the result of the &, and
implicitly cast that properly to bool. Basically, any value other than 0
should be true.

If the compiler just blindly casts the lowest byte though, then you do
need the double-negative.

> An alternative would be: ((fl->fl_flags & FL_SLEEP) == FL_SLEEP). I am
> not sure what the coding style says here. I think it's more important
> what the C standard says about !!(condition), but there are other
> users of this in the Linux kernel. :-/

I don't care too much either way, but my understanding was that you
don't need to do the !! trick in most cases with modern compilers.
-- 
Jeff Layton 



Re: [Cluster-devel] [PATCH 1/7] lockd: introduce safe async lock op

2023-08-30 Thread Chuck Lever
On Wed, Aug 30, 2023 at 08:32:43AM -0400, Alexander Aring wrote:
> Hi,
> 
> On Fri, Aug 25, 2023 at 1:21 PM Chuck Lever  wrote:
> >
> > On Wed, Aug 23, 2023 at 05:33:46PM -0400, Alexander Aring wrote:
> > > This patch reverts mostly commit 40595cdc93ed ("nfs: block notification
> > > on fs with its own ->lock") and introduces an EXPORT_OP_SAFE_ASYNC_LOCK
> > > export flag to signal that the "own ->lock" implementation supports
> > > async lock requests. The only main user is DLM that is used by GFS2 and
> > > OCFS2 filesystem. Those implement their own lock() implementation and
> > > return FILE_LOCK_DEFERRED as return value. Since commit 40595cdc93ed
> > > ("nfs: block notification on fs with its own ->lock") the DLM
> > > implementation were never updated. This patch should prepare for DLM
> > > to set the EXPORT_OP_SAFE_ASYNC_LOCK export flag and update the DLM
> > > plock implementation regarding to it.
> > >
> > > Acked-by: Jeff Layton 
> > > Signed-off-by: Alexander Aring 
> > > ---
> > >  fs/lockd/svclock.c   |  5 ++---
> > >  fs/nfsd/nfs4state.c  | 13 ++---
> > >  include/linux/exportfs.h |  8 
> > >  3 files changed, 20 insertions(+), 6 deletions(-)
> >
> > I'm starting to look at these. Just so you know, it's too late for
> > inclusion in v6.6, but I think we can get these into shape for v6.7.
> >
> 
> ok. I base my work on [0], is this correct?

Correct.

Fyi, that is currently what is pending for v6.6. When the merge
window closes, it will jump to what will go into v6.7.


> > - The f_op->lock check is common to all the call sites, but it is
> >   not at all related to the export AFAICT. Can it be removed from
> >   this inline function?
> >
> 
> This flag implies it makes only sense if the filesystem has its own
> lock() implementation, if it doesn't have that I guess the core fs
> functions for local file locking are being used.
> I guess it can be removed, but it should not be used when there is no
> own ->lock() implementation, at least not now until somebody might
> update the fs core functionality for local file locking to handle
> blocking lock requests asynchronously.

Can that be handled with a remark in the documenting comment?


> [0] 
> https://git.kernel.org/pub/scm/linux/kernel/git/cel/linux.git/log/?h=nfsd-next

-- 
Chuck Lever



Re: [Cluster-devel] [PATCH 6/7] dlm: use FL_SLEEP to determine blocking vs non-blocking

2023-08-30 Thread Alexander Aring
Hi,

On Fri, Aug 25, 2023 at 2:18 PM Jeff Layton  wrote:
>
> On Wed, 2023-08-23 at 17:33 -0400, Alexander Aring wrote:
> > This patch uses the FL_SLEEP flag in struct file_lock to determine if
> > the lock request is a blocking or non-blocking request. Before dlm was
> > using IS_SETLKW() was being used which is not usable for lock requests
> > coming from lockd when EXPORT_OP_SAFE_ASYNC_LOCK inside the export flags
> > is set.
> >
> > Signed-off-by: Alexander Aring 
> > ---
> >  fs/dlm/plock.c | 2 +-
> >  1 file changed, 1 insertion(+), 1 deletion(-)
> >
> > diff --git a/fs/dlm/plock.c b/fs/dlm/plock.c
> > index 0094fa4004cc..0c6ed5eeb840 100644
> > --- a/fs/dlm/plock.c
> > +++ b/fs/dlm/plock.c
> > @@ -140,7 +140,7 @@ int dlm_posix_lock(dlm_lockspace_t *lockspace, u64 
> > number, struct file *file,
> >   op->info.optype = DLM_PLOCK_OP_LOCK;
> >   op->info.pid= fl->fl_pid;
> >   op->info.ex = (fl->fl_type == F_WRLCK);
> > - op->info.wait   = IS_SETLKW(cmd);
> > + op->info.wait   = !!(fl->fl_flags & FL_SLEEP);
> >   op->info.fsid   = ls->ls_global_id;
> >   op->info.number = number;
> >   op->info.start  = fl->fl_start;
>
> Not sure you really need the !!, but ok...
>

The wait is a byte value and FL_SLEEP doesn't fit into it, I already
run into problems with it. I don't think somebody does a if (foo->wait
== 1) but it should be set to 1 or 0.

An alternative would be: ((fl->fl_flags & FL_SLEEP) == FL_SLEEP). I am
not sure what the coding style says here. I think it's more important
what the C standard says about !!(condition), but there are other
users of this in the Linux kernel. :-/

- Alex



Re: [Cluster-devel] [PATCH 1/7] lockd: introduce safe async lock op

2023-08-30 Thread Alexander Aring
Hi,

On Fri, Aug 25, 2023 at 1:21 PM Chuck Lever  wrote:
>
> On Wed, Aug 23, 2023 at 05:33:46PM -0400, Alexander Aring wrote:
> > This patch reverts mostly commit 40595cdc93ed ("nfs: block notification
> > on fs with its own ->lock") and introduces an EXPORT_OP_SAFE_ASYNC_LOCK
> > export flag to signal that the "own ->lock" implementation supports
> > async lock requests. The only main user is DLM that is used by GFS2 and
> > OCFS2 filesystem. Those implement their own lock() implementation and
> > return FILE_LOCK_DEFERRED as return value. Since commit 40595cdc93ed
> > ("nfs: block notification on fs with its own ->lock") the DLM
> > implementation were never updated. This patch should prepare for DLM
> > to set the EXPORT_OP_SAFE_ASYNC_LOCK export flag and update the DLM
> > plock implementation regarding to it.
> >
> > Acked-by: Jeff Layton 
> > Signed-off-by: Alexander Aring 
> > ---
> >  fs/lockd/svclock.c   |  5 ++---
> >  fs/nfsd/nfs4state.c  | 13 ++---
> >  include/linux/exportfs.h |  8 
> >  3 files changed, 20 insertions(+), 6 deletions(-)
>
> I'm starting to look at these. Just so you know, it's too late for
> inclusion in v6.6, but I think we can get these into shape for v6.7.
>

ok. I base my work on [0], is this correct?

> More below.
>
>
> > diff --git a/fs/lockd/svclock.c b/fs/lockd/svclock.c
> > index c43ccdf28ed9..6e3b230e8317 100644
> > --- a/fs/lockd/svclock.c
> > +++ b/fs/lockd/svclock.c
> > @@ -470,9 +470,7 @@ nlmsvc_lock(struct svc_rqst *rqstp, struct nlm_file 
> > *file,
> >   struct nlm_host *host, struct nlm_lock *lock, int wait,
> >   struct nlm_cookie *cookie, int reclaim)
> >  {
> > -#if IS_ENABLED(CONFIG_SUNRPC_DEBUG)
> >   struct inode*inode = nlmsvc_file_inode(file);
> > -#endif
> >   struct nlm_block*block = NULL;
> >   int error;
> >   int mode;
> > @@ -486,7 +484,8 @@ nlmsvc_lock(struct svc_rqst *rqstp, struct nlm_file 
> > *file,
> >   (long long)lock->fl.fl_end,
> >   wait);
> >
> > - if (nlmsvc_file_file(file)->f_op->lock) {
> > + if (!export_op_support_safe_async_lock(inode->i_sb->s_export_op,
> > +nlmsvc_file_file(file)->f_op)) 
> > {
> >   async_block = wait;
> >   wait = 0;
> >   }
> > diff --git a/fs/nfsd/nfs4state.c b/fs/nfsd/nfs4state.c
> > index 3aefbad4cc09..14ca06424ff1 100644
> > --- a/fs/nfsd/nfs4state.c
> > +++ b/fs/nfsd/nfs4state.c
> > @@ -7430,6 +7430,7 @@ nfsd4_lock(struct svc_rqst *rqstp, struct 
> > nfsd4_compound_state *cstate,
> >   struct nfsd4_blocked_lock *nbl = NULL;
> >   struct file_lock *file_lock = NULL;
> >   struct file_lock *conflock = NULL;
> > + struct super_block *sb;
> >   __be32 status = 0;
> >   int lkflg;
> >   int err;
> > @@ -7451,6 +7452,7 @@ nfsd4_lock(struct svc_rqst *rqstp, struct 
> > nfsd4_compound_state *cstate,
> >   dprintk("NFSD: nfsd4_lock: permission denied!\n");
> >   return status;
> >   }
> > + sb = cstate->current_fh.fh_dentry->d_sb;
> >
> >   if (lock->lk_is_new) {
> >   if (nfsd4_has_session(cstate))
> > @@ -7502,7 +7504,9 @@ nfsd4_lock(struct svc_rqst *rqstp, struct 
> > nfsd4_compound_state *cstate,
> >   fp = lock_stp->st_stid.sc_file;
> >   switch (lock->lk_type) {
> >   case NFS4_READW_LT:
> > - if (nfsd4_has_session(cstate))
> > + if (nfsd4_has_session(cstate) ||
> > + export_op_support_safe_async_lock(sb->s_export_op,
> > +   
> > nf->nf_file->f_op))
> >   fl_flags |= FL_SLEEP;
> >   fallthrough;
> >   case NFS4_READ_LT:
> > @@ -7514,7 +7518,9 @@ nfsd4_lock(struct svc_rqst *rqstp, struct 
> > nfsd4_compound_state *cstate,
> >   fl_type = F_RDLCK;
> >   break;
> >   case NFS4_WRITEW_LT:
> > - if (nfsd4_has_session(cstate))
> > + if (nfsd4_has_session(cstate) ||
> > + export_op_support_safe_async_lock(sb->s_export_op,
> > +   
> > nf->nf_file->f_op))
> >   fl_flags |= FL_SLEEP;
> >   fallthrough;
> >   case NFS4_WRITE_LT:
> > @@ -7542,7 +7548,8 @@ nfsd4_lock(struct svc_rqst *rqstp, struct 
> > nfsd4_compound_state *cstate,
> >* for file locks), so don't attempt blocking lock notifications
> >* on those filesystems:
> >*/
> > - if (nf->nf_file->f_op->lock)
> > + if (!export_op_support_safe_async_lock(sb->s_export_op,
> > +nf->nf_file->f_op))
> >   fl_flags &= ~FL_SLEEP;
> >

Re: [Cluster-devel] [PATCH 2/7] lockd: don't call vfs_lock_file() for pending requests

2023-08-30 Thread Alexander Aring
Hi,

On Fri, Aug 25, 2023 at 2:10 PM Jeff Layton  wrote:
>
> On Wed, 2023-08-23 at 17:33 -0400, Alexander Aring wrote:
> > This patch returns nlm_lck_blocked in nlmsvc_lock() when an asynchronous
> > lock request is pending. During testing I ran into the case with the
> > side-effects that lockd is waiting for only one lm_grant() callback
> > because it's already part of the nlm_blocked list. If another
> > asynchronous for the same nlm_block is triggered two lm_grant()
> > callbacks will occur but lockd was only waiting for one.
> >
> > To avoid any change of existing users this handling will only being made
> > when export_op_support_safe_async_lock() returns true.
> >
> > Signed-off-by: Alexander Aring 
> > ---
> >  fs/lockd/svclock.c | 24 +---
> >  1 file changed, 17 insertions(+), 7 deletions(-)
> >
> > diff --git a/fs/lockd/svclock.c b/fs/lockd/svclock.c
> > index 6e3b230e8317..aa4174fbaf5b 100644
> > --- a/fs/lockd/svclock.c
> > +++ b/fs/lockd/svclock.c
> > @@ -531,6 +531,23 @@ nlmsvc_lock(struct svc_rqst *rqstp, struct nlm_file 
> > *file,
> >   goto out;
> >   }
> >
> > + spin_lock(_blocked_lock);
> > + /*
> > +  * If this is a lock request for an already pending
> > +  * lock request we return nlm_lck_blocked without calling
> > +  * vfs_lock_file() again. Otherwise we have two pending
> > +  * requests on the underlaying ->lock() implementation but
> > +  * only one nlm_block to being granted by lm_grant().
> > +  */
> > + if (export_op_support_safe_async_lock(inode->i_sb->s_export_op,
> > +   nlmsvc_file_file(file)->f_op) &&
> > + !list_empty(>b_list)) {
> > + spin_unlock(_blocked_lock);
> > + ret = nlm_lck_blocked;
> > + goto out;
> > + }
>
> Looks reasonable. The block->b_list check is subtle, but the comment
> helps.

thanks. To be honest, I am "a little bit" worried (I am thinking of
this scenario) that we might have a problem here with multiple
identically lock requests being granted at the same time. In such
cases the most fields of struct file_lock are mostly the same and
nlm_compare_locks() checks exactly on those fields. I am concerned
this corner case could cause problems, but it is a very rare case and
it makes totally no sense that an application is doing such a request.

I am currently trying to get an xfstest for this upstream.

- Alex



Re: [Cluster-devel] [PATCH 07/11] vfs: add nowait parameter for file_accessed()

2023-08-30 Thread Hao Xu

On 8/29/23 19:53, Matthew Wilcox wrote:

On Tue, Aug 29, 2023 at 03:46:13PM +0800, Hao Xu wrote:

On 8/28/23 05:32, Matthew Wilcox wrote:

On Sun, Aug 27, 2023 at 09:28:31PM +0800, Hao Xu wrote:

From: Hao Xu 

Add a boolean parameter for file_accessed() to support nowait semantics.
Currently it is true only with io_uring as its initial caller.


So why do we need to do this as part of this series?  Apparently it
hasn't caused any problems for filemap_read().



We need this parameter to indicate if nowait semantics should be enforced in
touch_atime(), There are locks and maybe IOs in it.


That's not my point.  We currently call file_accessed() and
touch_atime() for nowait reads and nowait writes.  You haven't done
anything to fix those.

I suspect you can trim this patchset down significantly by avoiding
fixing the file_accessed() problem.  And then come back with a later
patchset that fixes it for all nowait i/o.  Or do a separate prep series


I'm ok to do that.


first that fixes it for the existing nowait users, and then a second
series to do all the directory stuff.

I'd do the first thing.  Just ignore the problem.  Directory atime
updates cause I/O so rarely that you can afford to ignore it.  Almost
everyone uses relatime or nodiratime.


Hi Matthew,
The previous discussion shows this does cause issues in real
producations: 
https://lore.kernel.org/io-uring/2785f009-2ebb-028d-8250-d5f3a3051...@gmail.com/#:~:text=fwiw%2C%20we%27ve%20just%20recently%20had%20similar%20problems%20with%20io_uring%20read/write