Re: [Cluster-devel] [PATCH 02/11] xfs: add NOWAIT semantics for readdir

2023-09-03 Thread Dave Chinner
On Sun, Aug 27, 2023 at 09:28:26PM +0800, Hao Xu wrote:
> From: Hao Xu 
> 
> Implement NOWAIT semantics for readdir. Return EAGAIN error to the
> caller if it would block, like failing to get locks, or going to
> do IO.
> 
> Co-developed-by: Dave Chinner 

Not really.

"Co-developed" implies equal development input between all the
parties, which is not the case here - this patch is based on
prototype I wrote, whilst you're doing the refining, testing and
correctness work.

In these cases with XFS code, we add a line in the commit message to
say:

"This is based on a patch originally written by Dave Chinner."


> Signed-off-by: Dave Chinner 
> Signed-off-by: Hao Xu 
> [fixes deadlock issue, tweak code style]

With a signoff chain like you already have.

In the end you'll also get a RVB from me, which seems rather wrong
to me if I've apparently been "co-developing" the code



> @@ -156,7 +157,9 @@ xfs_dir2_block_getdents(
>   if (xfs_dir2_dataptr_to_db(geo, ctx->pos) > geo->datablk)
>   return 0;
>  
> - error = xfs_dir3_block_read(args->trans, dp, );
> + if (ctx->flags & DIR_CONTEXT_F_NOWAIT)
> + flags |= XFS_DABUF_NOWAIT;
> + error = xfs_dir3_block_read(args->trans, dp, flags, );
>   if (error)
>   return error;
>  

Given we do this same check in both block and leaf formats to set
XFS_DABUF_NOWAIT, and we do the DIR_CONTEXT_F_NOWAIT check in
xfs_readdir() as well, we should probably do this check once at the
higher level and pass flags down from there with XFS_DABUF_NOWAIT
already set.

> @@ -240,6 +243,7 @@ xfs_dir2_block_getdents(
>  STATIC int
>  xfs_dir2_leaf_readbuf(
>   struct xfs_da_args  *args,
> + struct dir_context  *ctx,
>   size_t  bufsize,
>   xfs_dir2_off_t  *cur_off,
>   xfs_dablk_t *ra_blk,
> @@ -258,10 +262,15 @@ xfs_dir2_leaf_readbuf(
>   struct xfs_iext_cursor  icur;
>   int ra_want;
>   int error = 0;
> -
> - error = xfs_iread_extents(args->trans, dp, XFS_DATA_FORK);
> - if (error)
> - goto out;
> + unsigned intflags = 0;
> +
> + if (ctx->flags & DIR_CONTEXT_F_NOWAIT) {
> + flags |= XFS_DABUF_NOWAIT;
> + } else {
> + error = xfs_iread_extents(args->trans, dp, XFS_DATA_FORK);
> + if (error)
> + goto out;
> + }

Especially as, in hindsight, this doesn't make a whole lot of sense.
If XFS_DABUF_NOWAIT is set, we keep going until
xfs_ilock_data_map_shared_nowait() where we call
xfs_need_iread_extents() to see if we need to read the extents in
and abort at that point.

So, really, we shouldn't get this far with nowait semantics if
we haven't read the extents in yet - we're supposed to already have
the inode locked here and so we should have already checked this
condition before we bother locking the inode...

i.e. all we should be doing here is this:

if (!(flags & XFS_DABUF_NOWAIT)) {
error = xfs_iread_extents(args->trans, dp, XFS_DATA_FORK);
if (error)
goto out;
}

And then we don't need to pass the VFS dir_context down into low
level XFS functions unnecessarily.


>  
>   /*
>* Look for mapped directory blocks at or above the current offset.
> @@ -280,7 +289,7 @@ xfs_dir2_leaf_readbuf(
>   new_off = xfs_dir2_da_to_byte(geo, map.br_startoff);
>   if (new_off > *cur_off)
>   *cur_off = new_off;
> - error = xfs_dir3_data_read(args->trans, dp, map.br_startoff, 0, );
> + error = xfs_dir3_data_read(args->trans, dp, map.br_startoff, flags, 
> );
>   if (error)
>   goto out;
>  
> @@ -360,6 +369,7 @@ xfs_dir2_leaf_getdents(
>   int byteoff;/* offset in current block */
>   unsigned intoffset = 0;
>   int error = 0;  /* error return value */
> + int written = 0;
>  
>   /*
>* If the offset is at or past the largest allowed value,
> @@ -391,10 +401,17 @@ xfs_dir2_leaf_getdents(
>   bp = NULL;
>   }
>  
> - if (*lock_mode == 0)
> - *lock_mode = xfs_ilock_data_map_shared(dp);
> - error = xfs_dir2_leaf_readbuf(args, bufsize, ,
> - , );
> + if (*lock_mode == 0) {
> + *lock_mode =
> + xfs_ilock_data_map_shared_generic(dp,
> + ctx->flags & DIR_CONTEXT_F_NOWAIT);
> + if (!*lock_mode) {
> + error = -EAGAIN;
> + break;
> + }
> + }
> + error = 

Re: [Cluster-devel] [PATCH 07/11] vfs: add nowait parameter for file_accessed()

2023-09-03 Thread Dave Chinner
On Wed, Aug 30, 2023 at 02:11:31PM +0800, Hao Xu wrote:
> On 8/29/23 19:53, Matthew Wilcox wrote:
> > On Tue, Aug 29, 2023 at 03:46:13PM +0800, Hao Xu wrote:
> > > On 8/28/23 05:32, Matthew Wilcox wrote:
> > > > On Sun, Aug 27, 2023 at 09:28:31PM +0800, Hao Xu wrote:
> > > > > From: Hao Xu 
> > > > > 
> > > > > Add a boolean parameter for file_accessed() to support nowait 
> > > > > semantics.
> > > > > Currently it is true only with io_uring as its initial caller.
> > > > 
> > > > So why do we need to do this as part of this series?  Apparently it
> > > > hasn't caused any problems for filemap_read().
> > > > 
> > > 
> > > We need this parameter to indicate if nowait semantics should be enforced 
> > > in
> > > touch_atime(), There are locks and maybe IOs in it.
> > 
> > That's not my point.  We currently call file_accessed() and
> > touch_atime() for nowait reads and nowait writes.  You haven't done
> > anything to fix those.
> > 
> > I suspect you can trim this patchset down significantly by avoiding
> > fixing the file_accessed() problem.  And then come back with a later
> > patchset that fixes it for all nowait i/o.  Or do a separate prep series
> 
> I'm ok to do that.
> 
> > first that fixes it for the existing nowait users, and then a second
> > series to do all the directory stuff.
> > 
> > I'd do the first thing.  Just ignore the problem.  Directory atime
> > updates cause I/O so rarely that you can afford to ignore it.  Almost
> > everyone uses relatime or nodiratime.
> 
> Hi Matthew,
> The previous discussion shows this does cause issues in real
> producations: 
> https://lore.kernel.org/io-uring/2785f009-2ebb-028d-8250-d5f3a3051...@gmail.com/#:~:text=fwiw%2C%20we%27ve%20just%20recently%20had%20similar%20problems%20with%20io_uring%20read/write
> 

Then separate it out into it's own patch set so we can have a
discussion on the merits of requiring using noatime, relatime or
lazytime for really latency sensitive IO applications. Changing code
is not always the right solution...

-Dave.
-- 
Dave Chinner
da...@fromorbit.com