Re: [Cluster-devel] [PATCH v3 05/49] mm: shrinker: add infrastructure for dynamically allocating shrinker
On Thu, Jul 27, 2023 at 04:04:18PM +0800, Qi Zheng wrote: > Currently, the shrinker instances can be divided into the following three > types: > > a) global shrinker instance statically defined in the kernel, such as >workingset_shadow_shrinker. > > b) global shrinker instance statically defined in the kernel modules, such >as mmu_shrinker in x86. > > c) shrinker instance embedded in other structures. > > For case a, the memory of shrinker instance is never freed. For case b, > the memory of shrinker instance will be freed after synchronize_rcu() when > the module is unloaded. For case c, the memory of shrinker instance will > be freed along with the structure it is embedded in. > > In preparation for implementing lockless slab shrink, we need to > dynamically allocate those shrinker instances in case c, then the memory > can be dynamically freed alone by calling kfree_rcu(). > > So this commit adds the following new APIs for dynamically allocating > shrinker, and add a private_data field to struct shrinker to record and > get the original embedded structure. > > 1. shrinker_alloc() > > Used to allocate shrinker instance itself and related memory, it will > return a pointer to the shrinker instance on success and NULL on failure. > > 2. shrinker_register() > > Used to register the shrinker instance, which is same as the current > register_shrinker_prepared(). > > 3. shrinker_free() > > Used to unregister (if needed) and free the shrinker instance. > > In order to simplify shrinker-related APIs and make shrinker more > independent of other kernel mechanisms, subsequent submissions will use > the above API to convert all shrinkers (including case a and b) to > dynamically allocated, and then remove all existing APIs. > > This will also have another advantage mentioned by Dave Chinner: > > ``` > The other advantage of this is that it will break all the existing > out of tree code and third party modules using the old API and will > no longer work with a kernel using lockless slab shrinkers. They > need to break (both at the source and binary levels) to stop bad > things from happening due to using uncoverted shrinkers in the new nit: uncoverted -> unconverted > setup. > ``` > > Signed-off-by: Qi Zheng ... > diff --git a/mm/shrinker_debug.c b/mm/shrinker_debug.c > index f1becfd45853..506257585408 100644 > --- a/mm/shrinker_debug.c > +++ b/mm/shrinker_debug.c > @@ -191,6 +191,20 @@ int shrinker_debugfs_add(struct shrinker *shrinker) > return 0; > } > > +int shrinker_debugfs_name_alloc(struct shrinker *shrinker, const char *fmt, > + va_list ap) > +{ > + shrinker->name = kvasprintf_const(GFP_KERNEL, fmt, ap); > + > + return shrinker->name ? 0 : -ENOMEM; > +} > + > +void shrinker_debugfs_name_free(struct shrinker *shrinker) > +{ > + kfree_const(shrinker->name); > + shrinker->name = NULL; > +} > + These functions have no prototype in this file, perhaps internal.h should be included? > int shrinker_debugfs_rename(struct shrinker *shrinker, const char *fmt, ...) > { > struct dentry *entry; ...
Re: [Cluster-devel] [syzbot] [gfs2?] kernel panic: hung_task: blocked tasks (2)
On 7/28/23 3:20 AM, David Howells wrote: syzbot wrote: Fixes: 9c8ad7a2ff0b ("uapi, x86: Fix the syscall numbering of the mount API syscalls [ver #2]") This would seem unlikely to be the culprit. It just changes the numbering on the fsconfig-related syscalls. Running the test program on v6.5-rc3, however, I end up with the test process stuck in the D state: INFO: task repro-17687f1aa:5551 blocked for more than 120 seconds. Not tainted 6.5.0-rc3-build3+ #1448 "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. task:repro-17687f1aa state:D stack:0 pid:5551 ppid:5516 flags:0x4002 Call Trace: __schedule+0x4a7/0x4f1 schedule+0x66/0xa1 schedule_timeout+0x9d/0xd7 ? __next_timer_interrupt+0xf6/0xf6 gfs2_gl_hash_clear+0xa0/0xdc ? sugov_irq_work+0x15/0x15 gfs2_put_super+0x19f/0x1d3 generic_shutdown_super+0x78/0x187 kill_block_super+0x1c/0x32 deactivate_locked_super+0x2f/0x61 cleanup_mnt+0xab/0xcc task_work_run+0x6b/0x80 exit_to_user_mode_prepare+0x76/0xfd syscall_exit_to_user_mode+0x14/0x31 entry_SYSCALL_64_after_hwframe+0x63/0xcd RIP: 0033:0x7f89aac31dab RSP: 002b:7fff43d9b878 EFLAGS: 0206 ORIG_RAX: 00a6 RAX: RBX: 7fff43d9cad8 RCX: 7f89aac31dab RDX: RSI: 000a RDI: 7fff43d9b920 RBP: 7fff43d9c960 R08: R09: 0073 R10: R11: 0206 R12: R13: 7fff43d9cae8 R14: 00417e18 R15: 7f89aad51000 David Hi David, This indicates gfs2 is having trouble resolving and freeing all its glocks, which usually means a reference counting problem or ail (active items list) problem during unmount. If gfs2_gl_hash_clear gets stuck for a long period of time it is supposed to dump the remaining list of glocks that still have not been resolved. I think it takes 10 minutes or so. Can you post the console messages that follow? That will help us figure out what's happening. Thanks. Regards, Bob Peterson
Re: [Cluster-devel] (subset) [PATCH v6 0/7] fs: implement multigrain timestamps
On Tue, 25 Jul 2023 10:58:13 -0400, Jeff Layton wrote: > The VFS always uses coarse-grained timestamps when updating the > ctime and mtime after a change. This has the benefit of allowing > filesystems to optimize away a lot metadata updates, down to around 1 > per jiffy, even when a file is under heavy writes. > > Unfortunately, this coarseness has always been an issue when we're > exporting via NFSv3, which relies on timestamps to validate caches. A > lot of changes can happen in a jiffy, so timestamps aren't sufficient to > help the client decide to invalidate the cache. > > [...] Survives xfstests (tmpfs, ext4, overlayfs) and the fs portions of LTP. Let's keep our eyes open for any potential issues. Past suspects has been IMA interacting with overlayfs. We'll see. Picked everything minus the tmpfs-writepage patch that was contentious. --- Applied to the vfs.ctime branch of the vfs/vfs.git tree. Patches in the vfs.ctime branch should appear in linux-next soon. Please report any outstanding bugs that were missed during review in a new review to the original patch series allowing us to drop it. It's encouraged to provide Acked-bys and Reviewed-bys even though the patch has now been applied. If possible patch trailers will be updated. Note that commit hashes shown below are subject to change due to rebase, trailer updates or similar. If in doubt, please check the listed branch. tree: https://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs.git branch: vfs.ctime [1/7] fs: pass the request_mask to generic_fillattr https://git.kernel.org/vfs/vfs/c/0a6ab6dc6958 [2/7] fs: add infrastructure for multigrain timestamps https://git.kernel.org/vfs/vfs/c/d242b98ac3e9 [4/7] tmpfs: add support for multigrain timestamps https://git.kernel.org/vfs/vfs/c/1f31c58cf032 [5/7] xfs: switch to multigrain timestamps https://git.kernel.org/vfs/vfs/c/859dd91017dd [6/7] ext4: switch to multigrain timestamps https://git.kernel.org/vfs/vfs/c/093af249eab4 [7/7] btrfs: convert to multigrain timestamps https://git.kernel.org/vfs/vfs/c/b90a04d1c30c
Re: [Cluster-devel] [PATCH v3 04/49] mm: shrinker: remove redundant shrinker_rwsem in debugfs operations
On Thu, Jul 27, 2023 at 04:04:17PM +0800, Qi Zheng wrote: > The debugfs_remove_recursive() will wait for debugfs_file_put() to return, > so the shrinker will not be freed when doing debugfs operations (such as > shrinker_debugfs_count_show() and shrinker_debugfs_scan_write()), so there > is no need to hold shrinker_rwsem during debugfs operations. > > Signed-off-by: Qi Zheng > Reviewed-by: Muchun Song > --- > mm/shrinker_debug.c | 14 -- > 1 file changed, 14 deletions(-) > > diff --git a/mm/shrinker_debug.c b/mm/shrinker_debug.c > index 3ab53fad8876..f1becfd45853 100644 > --- a/mm/shrinker_debug.c > +++ b/mm/shrinker_debug.c > @@ -55,11 +55,6 @@ static int shrinker_debugfs_count_show(struct seq_file *m, > void *v) > if (!count_per_node) > return -ENOMEM; > > - ret = down_read_killable(&shrinker_rwsem); > - if (ret) { > - kfree(count_per_node); > - return ret; > - } > rcu_read_lock(); Hi Qi Zheng, As can be seen in the next hunk, this function returns 'ret'. However, with this change 'ret' is uninitialised unless signal_pending() returns non-zero in the while loop below. This is flagged in a clan-16 W=1 build. mm/shrinker_debug.c:87:11: warning: variable 'ret' is used uninitialized whenever 'do' loop exits because its condition is false [-Wsometimes-uninitialized] } while ((memcg = mem_cgroup_iter(NULL, memcg, NULL)) != NULL); ^~~~ mm/shrinker_debug.c:92:9: note: uninitialized use occurs here return ret; ^~~ mm/shrinker_debug.c:87:11: note: remove the condition if it is always true } while ((memcg = mem_cgroup_iter(NULL, memcg, NULL)) != NULL); ^~~~ 1 mm/shrinker_debug.c:77:7: warning: variable 'ret' is used uninitialized whenever 'if' condition is true [-Wsometimes-uninitialized] if (!memcg_aware) { ^~~~ mm/shrinker_debug.c:92:9: note: uninitialized use occurs here return ret; ^~~ mm/shrinker_debug.c:77:3: note: remove the 'if' if its condition is always false if (!memcg_aware) { ^~~ mm/shrinker_debug.c:52:9: note: initialize the variable 'ret' to silence this warning int ret, nid; ^ = 0 > > memcg_aware = shrinker->flags & SHRINKER_MEMCG_AWARE; > @@ -92,7 +87,6 @@ static int shrinker_debugfs_count_show(struct seq_file *m, > void *v) > } while ((memcg = mem_cgroup_iter(NULL, memcg, NULL)) != NULL); > > rcu_read_unlock(); > - up_read(&shrinker_rwsem); > > kfree(count_per_node); > return ret; ...
Re: [Cluster-devel] [syzbot] [gfs2?] kernel panic: hung_task: blocked tasks (2)
syzbot has bisected this issue to: commit 9c8ad7a2ff0bfe58f019ec0abc1fb965114dde7d Author: David Howells Date: Thu May 16 11:52:27 2019 + uapi, x86: Fix the syscall numbering of the mount API syscalls [ver #2] bisection log: https://syzkaller.appspot.com/x/bisect.txt?x=169b475ea8 start commit: fdf0eaf11452 Linux 6.5-rc2 git tree: upstream final oops: https://syzkaller.appspot.com/x/report.txt?x=159b475ea8 console output: https://syzkaller.appspot.com/x/log.txt?x=119b475ea8 kernel config: https://syzkaller.appspot.com/x/.config?x=27e33fd2346a54b dashboard link: https://syzkaller.appspot.com/bug?extid=607aa822c60b2e75b269 syz repro: https://syzkaller.appspot.com/x/repro.syz?x=11322fb6a8 C reproducer: https://syzkaller.appspot.com/x/repro.c?x=17687f1aa8 Reported-by: syzbot+607aa822c60b2e75b...@syzkaller.appspotmail.com Fixes: 9c8ad7a2ff0b ("uapi, x86: Fix the syscall numbering of the mount API syscalls [ver #2]") For information about bisection process see: https://goo.gl/tpsmEJ#bisection
Re: [Cluster-devel] [PATCH v3 04/49] mm: shrinker: remove redundant shrinker_rwsem in debugfs operations
Hi Simon, On 2023/7/28 16:13, Simon Horman wrote: On Thu, Jul 27, 2023 at 04:04:17PM +0800, Qi Zheng wrote: The debugfs_remove_recursive() will wait for debugfs_file_put() to return, so the shrinker will not be freed when doing debugfs operations (such as shrinker_debugfs_count_show() and shrinker_debugfs_scan_write()), so there is no need to hold shrinker_rwsem during debugfs operations. Signed-off-by: Qi Zheng Reviewed-by: Muchun Song --- mm/shrinker_debug.c | 14 -- 1 file changed, 14 deletions(-) diff --git a/mm/shrinker_debug.c b/mm/shrinker_debug.c index 3ab53fad8876..f1becfd45853 100644 --- a/mm/shrinker_debug.c +++ b/mm/shrinker_debug.c @@ -55,11 +55,6 @@ static int shrinker_debugfs_count_show(struct seq_file *m, void *v) if (!count_per_node) return -ENOMEM; - ret = down_read_killable(&shrinker_rwsem); - if (ret) { - kfree(count_per_node); - return ret; - } rcu_read_lock(); Hi Qi Zheng, As can be seen in the next hunk, this function returns 'ret'. However, with this change 'ret' is uninitialised unless signal_pending() returns non-zero in the while loop below. Thanks for your feedback, the 'ret' should be initialized to 0, will fix it. Thanks, Qi This is flagged in a clan-16 W=1 build. mm/shrinker_debug.c:87:11: warning: variable 'ret' is used uninitialized whenever 'do' loop exits because its condition is false [-Wsometimes-uninitialized] } while ((memcg = mem_cgroup_iter(NULL, memcg, NULL)) != NULL); ^~~~ mm/shrinker_debug.c:92:9: note: uninitialized use occurs here return ret; ^~~ mm/shrinker_debug.c:87:11: note: remove the condition if it is always true } while ((memcg = mem_cgroup_iter(NULL, memcg, NULL)) != NULL); ^~~~ 1 mm/shrinker_debug.c:77:7: warning: variable 'ret' is used uninitialized whenever 'if' condition is true [-Wsometimes-uninitialized] if (!memcg_aware) { ^~~~ mm/shrinker_debug.c:92:9: note: uninitialized use occurs here return ret; ^~~ mm/shrinker_debug.c:77:3: note: remove the 'if' if its condition is always false if (!memcg_aware) { ^~~ mm/shrinker_debug.c:52:9: note: initialize the variable 'ret' to silence this warning int ret, nid; ^ = 0 memcg_aware = shrinker->flags & SHRINKER_MEMCG_AWARE; @@ -92,7 +87,6 @@ static int shrinker_debugfs_count_show(struct seq_file *m, void *v) } while ((memcg = mem_cgroup_iter(NULL, memcg, NULL)) != NULL); rcu_read_unlock(); - up_read(&shrinker_rwsem); kfree(count_per_node); return ret; ...