A big global mutex in kernfs_iop_permission significantly drags down system performance when processes concurrently open files on kernfs on big machines (with >= 16 CPU cores).
This patch replaces the big mutex with a global rwsem so that kernfs_iop_permission can run concurrently.

On a 96-core AMD EPYC ROME server, I observe a 50% boost on an open+read+close cycle after applying the patch, with one thread per core calling open+read+close 1000 times concurrently.

Signed-off-by: Fox Chen <foxhlc...@gmail.com>
---
 fs/kernfs/inode.c | 19 +++++++++++-------
 1 file changed, 11 insertions(+), 8 deletions(-)

diff --git a/fs/kernfs/inode.c b/fs/kernfs/inode.c
index fc2469a20fed..ea65da176cfa 100644
--- a/fs/kernfs/inode.c
+++ b/fs/kernfs/inode.c
@@ -14,9 +14,12 @@
 #include <linux/slab.h>
 #include <linux/xattr.h>
 #include <linux/security.h>
+#include <linux/rwsem.h>
 
 #include "kernfs-internal.h"
 
+static DECLARE_RWSEM(kernfs_iattr_rwsem);
+
 static const struct address_space_operations kernfs_aops = {
 	.readpage = simple_readpage,
 	.write_begin = simple_write_begin,
@@ -106,9 +109,9 @@ int kernfs_setattr(struct kernfs_node *kn, const struct iattr *iattr)
 {
 	int ret;
 
-	mutex_lock(&kernfs_mutex);
+	down_write(&kernfs_iattr_rwsem);
 	ret = __kernfs_setattr(kn, iattr);
-	mutex_unlock(&kernfs_mutex);
+	up_write(&kernfs_iattr_rwsem);
 	return ret;
 }
 
@@ -121,7 +124,7 @@ int kernfs_iop_setattr(struct dentry *dentry, struct iattr *iattr)
 	if (!kn)
 		return -EINVAL;
 
-	mutex_lock(&kernfs_mutex);
+	down_write(&kernfs_iattr_rwsem);
 	error = setattr_prepare(dentry, iattr);
 	if (error)
 		goto out;
@@ -134,7 +137,7 @@ int kernfs_iop_setattr(struct dentry *dentry, struct iattr *iattr)
 	setattr_copy(inode, iattr);
 
 out:
-	mutex_unlock(&kernfs_mutex);
+	up_write(&kernfs_iattr_rwsem);
 	return error;
 }
 
@@ -189,9 +192,9 @@ int kernfs_iop_getattr(const struct path *path, struct kstat *stat,
 	struct inode *inode = d_inode(path->dentry);
 	struct kernfs_node *kn = inode->i_private;
 
-	mutex_lock(&kernfs_mutex);
+	down_read(&kernfs_iattr_rwsem);
 	kernfs_refresh_inode(kn, inode);
-	mutex_unlock(&kernfs_mutex);
+	up_read(&kernfs_iattr_rwsem);
 
 	generic_fillattr(inode, stat);
 	return 0;
@@ -281,9 +284,9 @@ int kernfs_iop_permission(struct inode *inode, int mask)
 
 	kn = inode->i_private;
 
-	mutex_lock(&kernfs_mutex);
+	down_read(&kernfs_iattr_rwsem);
 	kernfs_refresh_inode(kn, inode);
-	mutex_unlock(&kernfs_mutex);
+	up_read(&kernfs_iattr_rwsem);
 
 	return generic_permission(inode, mask);
 }
-- 
2.29.2

Differences from V1:
* Use rwsem instead of rwlock so we can sleep when kernfs_iattrs performs a GFP_KERNEL memory allocation.
* Use a global lock instead of a per-node lock to reduce memory consumption.

It's still slow: an open+read+close cycle takes ~260us, compared to ~3us in the single-threaded case. After applying this patch, the mutex in kernfs_dop_revalidate becomes the top time-consuming operation on concurrent open+read+close. However, that is harder to solve than this one, and since it's near the merge window and the holiday season, I don't want to add to your workload during that time, so I decided to submit this patch separately. Hopefully, I can bring in a kernfs_dop_revalidate patch after the holidays. I hope this patch helps.

thanks,
fox
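
For anyone who wants to reproduce the kind of measurement described above, here is a rough userspace sketch of a concurrent open+read+close loop. It is not the tool used for the numbers in this mail; run_bench, worker and struct worker_arg are hypothetical names made up for illustration. It also does not pin threads to cores, so "one thread per core" is only approximated by passing the online CPU count as nthreads.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <fcntl.h>
#include <pthread.h>
#include <stdlib.h>
#include <time.h>
#include <unistd.h>

/* Hypothetical per-thread state for this sketch. */
struct worker_arg {
	const char *path;	/* file to exercise, e.g. some sysfs attribute */
	int iterations;		/* open+read+close cycles run by this thread */
	double total_us;	/* out: wall time spent in the loop */
	int failed;		/* out: nonzero if open() or read() failed */
};

static double elapsed_us(const struct timespec *a, const struct timespec *b)
{
	return (b->tv_sec - a->tv_sec) * 1e6 + (b->tv_nsec - a->tv_nsec) / 1e3;
}

/* Each thread repeats open+read+close and records the total time taken. */
static void *worker(void *p)
{
	struct worker_arg *arg = p;
	struct timespec start, end;
	char buf[256];

	clock_gettime(CLOCK_MONOTONIC, &start);
	for (int i = 0; i < arg->iterations; i++) {
		int fd = open(arg->path, O_RDONLY);

		if (fd < 0) {
			arg->failed = 1;
			break;
		}
		if (read(fd, buf, sizeof(buf)) < 0)
			arg->failed = 1;
		close(fd);
	}
	clock_gettime(CLOCK_MONOTONIC, &end);
	arg->total_us = elapsed_us(&start, &end);
	return NULL;
}

/*
 * Run nthreads workers concurrently. On success, return 0 and store the
 * average time per open+read+close cycle (in microseconds) in *avg_us.
 */
int run_bench(const char *path, int nthreads, int iterations, double *avg_us)
{
	assert(avg_us != NULL);
	if (nthreads <= 0 || iterations <= 0)
		return -1;

	pthread_t *tids = calloc(nthreads, sizeof(*tids));
	struct worker_arg *args = calloc(nthreads, sizeof(*args));
	int started = 0, failed = 0;
	double sum = 0.0;

	if (!tids || !args) {
		free(tids);
		free(args);
		return -1;
	}

	for (int i = 0; i < nthreads; i++) {
		args[i].path = path;
		args[i].iterations = iterations;
		if (pthread_create(&tids[i], NULL, worker, &args[i]) != 0) {
			failed = 1;
			break;
		}
		started++;
	}
	for (int i = 0; i < started; i++) {
		pthread_join(tids[i], NULL);
		failed |= args[i].failed;
		sum += args[i].total_us;
	}
	free(tids);
	free(args);

	if (failed)
		return -1;
	*avg_us = sum / ((double)nthreads * iterations);
	return 0;
}
```

Build with -pthread and call it from a small main(), e.g. run_bench(path, (int)sysconf(_SC_NPROCESSORS_ONLN), 1000, &avg) with path pointing at a kernfs-backed file, then compare the average before and after applying the patch. Comparing against nthreads = 1 gives the single-threaded baseline.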