Jeff, does something like this look reasonable?

--b.

On Sat, Nov 14, 2020 at 12:57:24PM +0000, Daire Byrne wrote:
> ----- On 13 Nov, 2020, at 22:26, bfields [email protected] wrote:
> > On Fri, Nov 13, 2020 at 09:50:50AM -0500, bfields wrote:
> >> Ah-hah!  So, it's inode_query_iversion() that's modifying a nfs inode's
> >> i_version.  That's a special thing that only nfsd would do.
> >> 
> >> I think that's totally fixable, we'll just have to think a little about
> >> how....
> > 
> > I wonder if something like this helps?
> > 
> > --b.
> > 
> > commit 0add88a9ccc5
> > Author: J. Bruce Fields <[email protected]>
> > Date:   Fri Nov 13 17:03:04 2020 -0500
> > 
> >    nfs: don't mangle i_version on NFS
> >    
> >    The i_version on NFS is pretty much opaque to the client, so we don't
> >    want to give the low bit any special interpretation.
> >    
> >    Define a new FS_PRIVATE_I_VERSION flag for filesystems that manage the
> >    i_version on their own.
> >    
> >    Signed-off-by: J. Bruce Fields <[email protected]>
> > 
> > diff --git a/fs/nfs/fs_context.c b/fs/nfs/fs_context.c
> > index 29ec8b09a52d..9b8dd5b713a7 100644
> > --- a/fs/nfs/fs_context.c
> > +++ b/fs/nfs/fs_context.c
> > @@ -1488,7 +1488,8 @@ struct file_system_type nfs_fs_type = {
> >     .init_fs_context        = nfs_init_fs_context,
> >     .parameters             = nfs_fs_parameters,
> >     .kill_sb                = nfs_kill_super,
> > -   .fs_flags               = FS_RENAME_DOES_D_MOVE|FS_BINARY_MOUNTDATA,
> > +   .fs_flags               = FS_RENAME_DOES_D_MOVE|FS_BINARY_MOUNTDATA|
> > +                             FS_PRIVATE_I_VERSION,
> > };
> > MODULE_ALIAS_FS("nfs");
> > EXPORT_SYMBOL_GPL(nfs_fs_type);
> > @@ -1500,7 +1501,8 @@ struct file_system_type nfs4_fs_type = {
> >     .init_fs_context        = nfs_init_fs_context,
> >     .parameters             = nfs_fs_parameters,
> >     .kill_sb                = nfs_kill_super,
> > -   .fs_flags               = FS_RENAME_DOES_D_MOVE|FS_BINARY_MOUNTDATA,
> > +   .fs_flags               = FS_RENAME_DOES_D_MOVE|FS_BINARY_MOUNTDATA|
> > +                             FS_PRIVATE_I_VERSION,
> > };
> > MODULE_ALIAS_FS("nfs4");
> > MODULE_ALIAS("nfs4");
> > diff --git a/include/linux/fs.h b/include/linux/fs.h
> > index 21cc971fd960..c5bb4268228b 100644
> > --- a/include/linux/fs.h
> > +++ b/include/linux/fs.h
> > @@ -2217,6 +2217,7 @@ struct file_system_type {
> > #define FS_HAS_SUBTYPE              4
> > #define FS_USERNS_MOUNT             8       /* Can be mounted by userns root */
> > #define FS_DISALLOW_NOTIFY_PERM     16      /* Disable fanotify permission events */
> > +#define FS_PRIVATE_I_VERSION       32      /* i_version managed by filesystem */
> > #define FS_THP_SUPPORT              8192    /* Remove once all fs converted */
> > #define FS_RENAME_DOES_D_MOVE       32768   /* FS will handle d_move() during rename() internally. */
> >     int (*init_fs_context)(struct fs_context *);
> > diff --git a/include/linux/iversion.h b/include/linux/iversion.h
> > index 2917ef990d43..52c790a847de 100644
> > --- a/include/linux/iversion.h
> > +++ b/include/linux/iversion.h
> > @@ -307,6 +307,8 @@ inode_query_iversion(struct inode *inode)
> >     u64 cur, old, new;
> > 
> >     cur = inode_peek_iversion_raw(inode);
> > +   if (inode->i_sb->s_type->fs_flags & FS_PRIVATE_I_VERSION)
> > +           return cur;
> >     for (;;) {
> >             /* If flag is already set, then no need to swap */
> >             if (cur & I_VERSION_QUERIED) {
> 
> Yes, I can confirm that this absolutely helps! I replaced our (brute force) 
> iversion patch with this (much nicer) patch and we got the same improvement; 
> nfsd and its clients no longer cause the re-export server's client cache to 
> constantly be re-validated. The re-export server can now serve the same 
> results to many clients from cache. Thanks so much for spending the time to 
> track this down. If merged, future (crazy) NFS re-exporters will benefit from 
> the metadata performance improvement/acceleration!
> 
> Now if anyone has any ideas why all the read calls to the originating server 
> are limited to a maximum of 128k (with rsize=1M) when coming via the 
> re-export server's nfsd threads, I see that as the next biggest performance 
> issue. Reading directly on the re-export server with a userspace process 
> issues 1MB reads as expected. It doesn't happen for writes (wsize=1MB all the 
> way through) but I'm not sure if that has more to do with async and write 
> back caching helping to build up the size before commit?
> 
> I figure the other remaining items on my (wish) list are probably more in the 
> "won't fix" or "can't fix" category (except maybe the NFSv4.0 input/output 
> errors?).
> 
> Daire
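
For anyone following along, here is a rough userspace sketch of the
behaviour the patch above changes. It is not the kernel code: struct
toy_inode and toy_query_iversion() are made-up names, the update is a
plain store rather than the atomic cmpxchg loop the real
inode_query_iversion() uses, and the I_VERSION_QUERIED constants are
written from memory of include/linux/iversion.h, so treat them as
assumptions. Only FS_PRIVATE_I_VERSION and its value 32 come from the
patch itself.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed to mirror include/linux/iversion.h: the low bit of the raw
 * counter records "someone has queried this value". */
#define I_VERSION_QUERIED_SHIFT 1
#define I_VERSION_QUERIED       (1ULL << (I_VERSION_QUERIED_SHIFT - 1))

/* From the patch: filesystem manages i_version on its own. */
#define FS_PRIVATE_I_VERSION    32

/* Hypothetical stand-in for struct inode plus its fs_flags. */
struct toy_inode {
	uint64_t i_version;	/* raw stored value */
	int fs_flags;		/* flags of the owning filesystem type */
};

/* Simplified, non-atomic model of inode_query_iversion() with the patch. */
static uint64_t toy_query_iversion(struct toy_inode *inode)
{
	uint64_t cur;

	assert(inode != NULL);
	cur = inode->i_version;

	/* Patched path: the value is opaque, so hand it back untouched. */
	if (inode->fs_flags & FS_PRIVATE_I_VERSION)
		return cur;

	/* Unpatched path: mark the value as queried (this modifies the
	 * stored counter) and return it with the flag bit shifted out. */
	if (!(cur & I_VERSION_QUERIED)) {
		cur |= I_VERSION_QUERIED;
		inode->i_version = cur;
	}
	return cur >> I_VERSION_QUERIED_SHIFT;
}
```

Without the flag, a query on a raw value of 10 stores 11 back into the
inode and reports 5. With FS_PRIVATE_I_VERSION set, the same query
reports 10 and leaves the stored value alone, which is the property the
re-export server needs so that nfsd's queries stop disturbing the NFS
client's view of the change attribute.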

--
Linux-cachefs mailing list
[email protected]
https://www.redhat.com/mailman/listinfo/linux-cachefs
