Re: [Cluster-devel] [syzbot] KASAN: use-after-free Read in gfs2_glock_shrink_scan

2021-05-18 Thread Hillf Danton
On Mon, 17 May 2021 23:13:16 -0700
>Hello,
>
>syzbot found the following issue on:
>
>HEAD commit:    315d9931 Merge tag 'pm-5.13-rc2' of git://git.kernel.org/p..
>git tree:   upstream
>console output: https://syzkaller.appspot.com/x/log.txt?x=126d17b3d0
>kernel config:  https://syzkaller.appspot.com/x/.config?x=4e950b1ffed48778
>dashboard link: https://syzkaller.appspot.com/bug?extid=34ba7ddbf3021981a228
>userspace arch: i386
>
>Unfortunately, I don't have any reproducer for this issue yet.
>
>IMPORTANT: if you fix the issue, please add the following tag to the commit:
>Reported-by: syzbot+34ba7ddbf3021981a...@syzkaller.appspotmail.com
>
>==
>BUG: KASAN: use-after-free in __list_del_entry_valid+0xcc/0xf0 lib/list_debug.c:42
>Read of size 8 at addr 888074ee8f20 by task khugepaged/1669
>
>CPU: 0 PID: 1669 Comm: khugepaged Not tainted 5.13.0-rc1-syzkaller #0
>Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.14.0-2 04/01/2014
>Call Trace:
> __dump_stack lib/dump_stack.c:79 [inline]
> dump_stack+0x141/0x1d7 lib/dump_stack.c:120
> print_address_description.constprop.0.cold+0x5b/0x2f8 mm/kasan/report.c:233
> __kasan_report mm/kasan/report.c:419 [inline]
> kasan_report.cold+0x7c/0xd8 mm/kasan/report.c:436
> __list_del_entry_valid+0xcc/0xf0 lib/list_debug.c:42
> __list_del_entry include/linux/list.h:132 [inline]
> list_del_init include/linux/list.h:204 [inline]
> gfs2_dispose_glock_lru fs/gfs2/glock.c:1777 [inline]
> gfs2_scan_glock_lru fs/gfs2/glock.c:1832 [inline]
> gfs2_glock_shrink_scan fs/gfs2/glock.c:1843 [inline]
> gfs2_glock_shrink_scan+0x69f/0xa80 fs/gfs2/glock.c:1838
> do_shrink_slab+0x42d/0xbd0 mm/vmscan.c:709
> shrink_slab+0x17f/0x6f0 mm/vmscan.c:869
> shrink_node_memcgs mm/vmscan.c:2852 [inline]
> shrink_node+0x8d1/0x1de0 mm/vmscan.c:2967
> shrink_zones mm/vmscan.c:3170 [inline]
> do_try_to_free_pages+0x388/0x14b0 mm/vmscan.c:3225
> try_to_free_pages+0x29f/0x750 mm/vmscan.c:3464
> __perform_reclaim mm/page_alloc.c:4430 [inline]
> __alloc_pages_direct_reclaim mm/page_alloc.c:4451 [inline]
> __alloc_pages_slowpath.constprop.0+0x84e/0x2140 mm/page_alloc.c:4855
> __alloc_pages+0x422/0x500 mm/page_alloc.c:5213
> __alloc_pages_node include/linux/gfp.h:549 [inline]
> khugepaged_alloc_page+0xa0/0x170 mm/khugepaged.c:882
> collapse_huge_page mm/khugepaged.c:1085 [inline]
> khugepaged_scan_pmd mm/khugepaged.c:1368 [inline]
> khugepaged_scan_mm_slot mm/khugepaged.c:2137 [inline]
> khugepaged_do_scan mm/khugepaged.c:2218 [inline]
> khugepaged+0x312b/0x5530 mm/khugepaged.c:2263
> kthread+0x3b1/0x4a0 kernel/kthread.c:313
> ret_from_fork+0x1f/0x30 arch/x86/entry/entry_64.S:294
>
>Allocated by task 10231:
> kasan_save_stack+0x1b/0x40 mm/kasan/common.c:38
> kasan_set_track mm/kasan/common.c:46 [inline]
> set_alloc_info mm/kasan/common.c:428 [inline]
> __kasan_slab_alloc+0x84/0xa0 mm/kasan/common.c:461
> kasan_slab_alloc include/linux/kasan.h:236 [inline]
> slab_post_alloc_hook mm/slab.h:524 [inline]
> slab_alloc_node mm/slub.c:2912 [inline]
> slab_alloc mm/slub.c:2920 [inline]
> kmem_cache_alloc+0x152/0x3a0 mm/slub.c:2925
> gfs2_glock_get+0x20e/0x1100 fs/gfs2/glock.c:1027
> gfs2_inode_lookup+0x2c9/0xb10 fs/gfs2/inode.c:149
> gfs2_dir_search+0x20f/0x2c0 fs/gfs2/dir.c:1665
> gfs2_lookupi+0x475/0x640 fs/gfs2/inode.c:332
> gfs2_lookup_simple+0x99/0xe0 fs/gfs2/inode.c:273
> init_inodes+0x1c79/0x2610 fs/gfs2/ops_fstype.c:880
> gfs2_fill_super+0x1b4a/0x2680 fs/gfs2/ops_fstype.c:1204
> get_tree_bdev+0x440/0x760 fs/super.c:1293
> gfs2_get_tree+0x4a/0x270 fs/gfs2/ops_fstype.c:1273
> vfs_get_tree+0x89/0x2f0 fs/super.c:1498
> do_new_mount fs/namespace.c:2905 [inline]
> path_mount+0x132a/0x1fa0 fs/namespace.c:3235
> do_mount fs/namespace.c:3248 [inline]
> __do_sys_mount fs/namespace.c:3456 [inline]
> __se_sys_mount fs/namespace.c:3433 [inline]
> __ia32_sys_mount+0x27e/0x300 fs/namespace.c:3433
> do_syscall_32_irqs_on arch/x86/entry/common.c:78 [inline]
> __do_fast_syscall_32+0x67/0xe0 arch/x86/entry/common.c:143
> do_fast_syscall_32+0x2f/0x70 arch/x86/entry/common.c:168
> entry_SYSENTER_compat_after_hwframe+0x4d/0x5c
>
>Freed by task 8886:
> kasan_save_stack+0x1b/0x40 mm/kasan/common.c:38
> kasan_set_track+0x1c/0x30 mm/kasan/common.c:46
> kasan_set_free_info+0x20/0x30 mm/kasan/generic.c:357
> kasan_slab_free mm/kasan/common.c:360 [inline]
> kasan_slab_free mm/kasan/common.c:325 [inline]
> __kasan_slab_free+0xfb/0x130 mm/kasan/common.c:368
> kasan_slab_free include/linux/kasan.h:212 [inline]
> slab_free_hook mm/slub.c:1581 [inline]
> slab_free_freelist_hook+0xdf/0x240 mm/slub.c:1606
> slab_free mm/slub.c:3166 [inline]
> kmem_cache_free+0x8a/0x740 mm/slub.c:3182
> gfs2_glock_dealloc+0xcc/0x150 fs/gfs2/glock.c:130
> rcu_do_batch kernel/rcu/tree.c:2558 [inline]
> rcu_core+0x7ab/0x13b0 kernel/rcu/tree.c:2793
> __do_softirq+0x29b/0x9f6 kernel/softirq.c:559
>
>Last potentially related work creation:
> kasan_save_stack+0x1b/0x40 mm/kasan/common

[Cluster-devel] [syzbot] KASAN: use-after-free Read in gfs2_glock_shrink_scan

2021-05-18 Thread syzbot
Hello,

syzbot found the following issue on:

HEAD commit:    315d9931 Merge tag 'pm-5.13-rc2' of git://git.kernel.org/p..
git tree:   upstream
console output: https://syzkaller.appspot.com/x/log.txt?x=126d17b3d0
kernel config:  https://syzkaller.appspot.com/x/.config?x=4e950b1ffed48778
dashboard link: https://syzkaller.appspot.com/bug?extid=34ba7ddbf3021981a228
userspace arch: i386

Unfortunately, I don't have any reproducer for this issue yet.

IMPORTANT: if you fix the issue, please add the following tag to the commit:
Reported-by: syzbot+34ba7ddbf3021981a...@syzkaller.appspotmail.com

==
BUG: KASAN: use-after-free in __list_del_entry_valid+0xcc/0xf0 lib/list_debug.c:42
Read of size 8 at addr 888074ee8f20 by task khugepaged/1669

CPU: 0 PID: 1669 Comm: khugepaged Not tainted 5.13.0-rc1-syzkaller #0
Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.14.0-2 04/01/2014
Call Trace:
 __dump_stack lib/dump_stack.c:79 [inline]
 dump_stack+0x141/0x1d7 lib/dump_stack.c:120
 print_address_description.constprop.0.cold+0x5b/0x2f8 mm/kasan/report.c:233
 __kasan_report mm/kasan/report.c:419 [inline]
 kasan_report.cold+0x7c/0xd8 mm/kasan/report.c:436
 __list_del_entry_valid+0xcc/0xf0 lib/list_debug.c:42
 __list_del_entry include/linux/list.h:132 [inline]
 list_del_init include/linux/list.h:204 [inline]
 gfs2_dispose_glock_lru fs/gfs2/glock.c:1777 [inline]
 gfs2_scan_glock_lru fs/gfs2/glock.c:1832 [inline]
 gfs2_glock_shrink_scan fs/gfs2/glock.c:1843 [inline]
 gfs2_glock_shrink_scan+0x69f/0xa80 fs/gfs2/glock.c:1838
 do_shrink_slab+0x42d/0xbd0 mm/vmscan.c:709
 shrink_slab+0x17f/0x6f0 mm/vmscan.c:869
 shrink_node_memcgs mm/vmscan.c:2852 [inline]
 shrink_node+0x8d1/0x1de0 mm/vmscan.c:2967
 shrink_zones mm/vmscan.c:3170 [inline]
 do_try_to_free_pages+0x388/0x14b0 mm/vmscan.c:3225
 try_to_free_pages+0x29f/0x750 mm/vmscan.c:3464
 __perform_reclaim mm/page_alloc.c:4430 [inline]
 __alloc_pages_direct_reclaim mm/page_alloc.c:4451 [inline]
 __alloc_pages_slowpath.constprop.0+0x84e/0x2140 mm/page_alloc.c:4855
 __alloc_pages+0x422/0x500 mm/page_alloc.c:5213
 __alloc_pages_node include/linux/gfp.h:549 [inline]
 khugepaged_alloc_page+0xa0/0x170 mm/khugepaged.c:882
 collapse_huge_page mm/khugepaged.c:1085 [inline]
 khugepaged_scan_pmd mm/khugepaged.c:1368 [inline]
 khugepaged_scan_mm_slot mm/khugepaged.c:2137 [inline]
 khugepaged_do_scan mm/khugepaged.c:2218 [inline]
 khugepaged+0x312b/0x5530 mm/khugepaged.c:2263
 kthread+0x3b1/0x4a0 kernel/kthread.c:313
 ret_from_fork+0x1f/0x30 arch/x86/entry/entry_64.S:294

Allocated by task 10231:
 kasan_save_stack+0x1b/0x40 mm/kasan/common.c:38
 kasan_set_track mm/kasan/common.c:46 [inline]
 set_alloc_info mm/kasan/common.c:428 [inline]
 __kasan_slab_alloc+0x84/0xa0 mm/kasan/common.c:461
 kasan_slab_alloc include/linux/kasan.h:236 [inline]
 slab_post_alloc_hook mm/slab.h:524 [inline]
 slab_alloc_node mm/slub.c:2912 [inline]
 slab_alloc mm/slub.c:2920 [inline]
 kmem_cache_alloc+0x152/0x3a0 mm/slub.c:2925
 gfs2_glock_get+0x20e/0x1100 fs/gfs2/glock.c:1027
 gfs2_inode_lookup+0x2c9/0xb10 fs/gfs2/inode.c:149
 gfs2_dir_search+0x20f/0x2c0 fs/gfs2/dir.c:1665
 gfs2_lookupi+0x475/0x640 fs/gfs2/inode.c:332
 gfs2_lookup_simple+0x99/0xe0 fs/gfs2/inode.c:273
 init_inodes+0x1c79/0x2610 fs/gfs2/ops_fstype.c:880
 gfs2_fill_super+0x1b4a/0x2680 fs/gfs2/ops_fstype.c:1204
 get_tree_bdev+0x440/0x760 fs/super.c:1293
 gfs2_get_tree+0x4a/0x270 fs/gfs2/ops_fstype.c:1273
 vfs_get_tree+0x89/0x2f0 fs/super.c:1498
 do_new_mount fs/namespace.c:2905 [inline]
 path_mount+0x132a/0x1fa0 fs/namespace.c:3235
 do_mount fs/namespace.c:3248 [inline]
 __do_sys_mount fs/namespace.c:3456 [inline]
 __se_sys_mount fs/namespace.c:3433 [inline]
 __ia32_sys_mount+0x27e/0x300 fs/namespace.c:3433
 do_syscall_32_irqs_on arch/x86/entry/common.c:78 [inline]
 __do_fast_syscall_32+0x67/0xe0 arch/x86/entry/common.c:143
 do_fast_syscall_32+0x2f/0x70 arch/x86/entry/common.c:168
 entry_SYSENTER_compat_after_hwframe+0x4d/0x5c

Freed by task 8886:
 kasan_save_stack+0x1b/0x40 mm/kasan/common.c:38
 kasan_set_track+0x1c/0x30 mm/kasan/common.c:46
 kasan_set_free_info+0x20/0x30 mm/kasan/generic.c:357
 kasan_slab_free mm/kasan/common.c:360 [inline]
 kasan_slab_free mm/kasan/common.c:325 [inline]
 __kasan_slab_free+0xfb/0x130 mm/kasan/common.c:368
 kasan_slab_free include/linux/kasan.h:212 [inline]
 slab_free_hook mm/slub.c:1581 [inline]
 slab_free_freelist_hook+0xdf/0x240 mm/slub.c:1606
 slab_free mm/slub.c:3166 [inline]
 kmem_cache_free+0x8a/0x740 mm/slub.c:3182
 gfs2_glock_dealloc+0xcc/0x150 fs/gfs2/glock.c:130
 rcu_do_batch kernel/rcu/tree.c:2558 [inline]
 rcu_core+0x7ab/0x13b0 kernel/rcu/tree.c:2793
 __do_softirq+0x29b/0x9f6 kernel/softirq.c:559

Last potentially related work creation:
 kasan_save_stack+0x1b/0x40 mm/kasan/common.c:38
 kasan_record_aux_stack+0xe5/0x110 mm/kasan/generic.c:345
 __call_rcu kernel/rcu/tree.c:3038 [inline]
 call_rcu+0xb1/0x750 kernel/rcu/tr

[Cluster-devel] [gfs2 patch] gfs2: fix scheduling while atomic bug in glocks

2021-05-18 Thread Bob Peterson
Before this patch, in the unlikely event that gfs2_glock_dq encountered a
withdraw, it would do a wait_on_bit to wait for its journal to be
recovered, but it never released the glock's spin_lock, which caused a
scheduling-while-atomic error.

This patch unlocks the lockref spin_lock before waiting for recovery.

Fixes: 601ef0d52e961 ("gfs2: Force withdraw to replay journals and wait for it to finish")
Reported-by: Alexander Aring 
Signed-off-by: Bob Peterson 
---
 fs/gfs2/glock.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/fs/gfs2/glock.c b/fs/gfs2/glock.c
index 79f47f227e81..d7bee2ab5d2b 100644
--- a/fs/gfs2/glock.c
+++ b/fs/gfs2/glock.c
@@ -1471,9 +1471,11 @@ void gfs2_glock_dq(struct gfs2_holder *gh)
 	    glock_blocked_by_withdraw(gl) &&
 	    gh->gh_gl != sdp->sd_jinode_gl) {
 		sdp->sd_glock_dqs_held++;
+		spin_unlock(&gl->gl_lockref.lock);
 		might_sleep();
 		wait_on_bit(&sdp->sd_flags, SDF_WITHDRAW_RECOVERY,
 			    TASK_UNINTERRUPTIBLE);
+		spin_lock(&gl->gl_lockref.lock);
 	}
 	if (gh->gh_flags & GL_NOCACHE)
 		handle_callback(gl, LM_ST_UNLOCKED, 0, false);
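
The locking rule behind this fix can be illustrated outside the kernel. The
following is a minimal user-space sketch, not gfs2 code: toy_lock,
atomic_depth, sleep_violations, toy_blocking_wait(), dq_buggy() and
dq_fixed() are all hypothetical names invented here. The counter plays the
role of the kernel's atomic-context check, and toy_blocking_wait() stands in
for a sleeping call such as wait_on_bit().

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for a spinlock such as gl->gl_lockref.lock. */
struct toy_lock {
	bool held;
};

static int atomic_depth;     /* number of toy spinlocks currently held */
static int sleep_violations; /* blocking calls made while a lock was held */

static void toy_spin_lock(struct toy_lock *l)
{
	assert(!l->held);
	l->held = true;
	atomic_depth++;
}

static void toy_spin_unlock(struct toy_lock *l)
{
	assert(l->held);
	l->held = false;
	atomic_depth--;
}

/* Stand-in for a sleeping wait: records a violation in atomic context. */
static void toy_blocking_wait(void)
{
	if (atomic_depth > 0)
		sleep_violations++;
}

/* Shape of the old code: waits with the lock still held. */
static void dq_buggy(struct toy_lock *l)
{
	toy_spin_lock(l);
	toy_blocking_wait();
	toy_spin_unlock(l);
}

/* Shape of the patched code: drop the lock, wait, then retake it. */
static void dq_fixed(struct toy_lock *l)
{
	toy_spin_lock(l);
	toy_spin_unlock(l);
	toy_blocking_wait();
	toy_spin_lock(l);
	toy_spin_unlock(l);
}
```

Calling dq_buggy() bumps sleep_violations while dq_fixed() does not. Note
that any state read under the lock before the wait may have changed by the
time the lock is retaken; the sketch does not model that.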



[Cluster-devel] [gfs2 PATCH] gfs2: fix a deadlock on withdraw-during mount

2021-05-18 Thread Bob Peterson
Before this patch, gfs2 would deadlock because of the following
sequence during mount:

mount
   gfs2_fill_super
      gfs2_make_fs_rw <--- Detects IO error with glock
         kthread_stop(sdp->sd_quotad_process);
            <--- Blocked waiting for quotad to finish

logd
   Detects IO error and the need to withdraw
   calls gfs2_withdraw
      gfs2_make_fs_ro
         kthread_stop(sdp->sd_quotad_process);
            <--- Blocked waiting for quotad to finish

gfs2_quotad
   gfs2_statfs_sync
      gfs2_glock_wait <--- Blocked waiting for statfs glock to be granted

glock_work_func
   do_xmote <--- Detects IO error, can't release glock: blocked on withdraw
      glops->go_inval
      glock_blocked_by_withdraw
         requeue glock work & exit <--- work requeued, blocked by withdraw

This patch makes a special exception for the statfs system inode glock,
which allows the statfs glock UNLOCK to proceed normally. That allows the
quotad daemon to exit during the withdraw, which allows the logd daemon
to exit during the withdraw, which allows the mount to exit.

Signed-off-by: Bob Peterson 
---
 fs/gfs2/glock.c | 24 +++++++++++++++++++++---
 1 file changed, 21 insertions(+), 3 deletions(-)

diff --git a/fs/gfs2/glock.c b/fs/gfs2/glock.c
index d7bee2ab5d2b..5af436c94d2a 100644
--- a/fs/gfs2/glock.c
+++ b/fs/gfs2/glock.c
@@ -582,6 +582,16 @@ static void finish_xmote(struct gfs2_glock *gl, unsigned int ret)
 	spin_unlock(&gl->gl_lockref.lock);
 }
 
+static bool is_system_glock(struct gfs2_glock *gl)
+{
+	struct gfs2_sbd *sdp = gl->gl_name.ln_sbd;
+	struct gfs2_inode *m_ip = GFS2_I(sdp->sd_statfs_inode);
+
+	if (gl == m_ip->i_gl)
+		return true;
+	return false;
+}
+
 /**
  * do_xmote - Calls the DLM to change the state of a lock
  * @gl: The lock state
@@ -671,17 +681,25 @@ __acquires(&gl->gl_lockref.lock)
 	 * to see sd_log_error and withdraw, and in the meantime, requeue the
 	 * work for later.
 	 *
+	 * We make a special exception for some system glocks, such as the
+	 * system statfs inode glock, which needs to be granted before the
+	 * gfs2_quotad daemon can exit, and that exit needs to finish before
+	 * we can unmount the withdrawn file system.
+	 *
 	 * However, if we're just unlocking the lock (say, for unmount, when
 	 * gfs2_gl_hash_clear calls clear_glock) and recovery is complete
 	 * then it's okay to tell dlm to unlock it.
 	 */
 	if (unlikely(sdp->sd_log_error && !gfs2_withdrawn(sdp)))
 		gfs2_withdraw_delayed(sdp);
-	if (glock_blocked_by_withdraw(gl)) {
-		if (target != LM_ST_UNLOCKED ||
-		    test_bit(SDF_WITHDRAW_RECOVERY, &sdp->sd_flags)) {
+	if (glock_blocked_by_withdraw(gl) &&
+	    (target != LM_ST_UNLOCKED ||
+	     test_bit(SDF_WITHDRAW_RECOVERY, &sdp->sd_flags))) {
+		if (!is_system_glock(gl)) {
 			gfs2_glock_queue_work(gl, GL_GLOCK_DFT_HOLD);
 			goto out;
+		} else {
+			clear_bit(GLF_INVALIDATE_IN_PROGRESS, &gl->gl_flags);
 		}
 	}
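
The effect of the new condition in do_xmote can be checked with a small
truth-table model. This is only a sketch under stated assumptions: the
toy_xmote struct, the toy_action enum and the toy_* functions are
hypothetical names, each boolean stands for the kernel expression named in
its comment, and the clear_bit(GLF_INVALIDATE_IN_PROGRESS, ...) side effect
on the system-glock branch is not modeled.

```c
#include <assert.h>
#include <stdbool.h>

enum toy_action { TOY_PROCEED, TOY_REQUEUE };

/* Hypothetical snapshot of the inputs to the do_xmote decision. */
struct toy_xmote {
	bool blocked_by_withdraw; /* glock_blocked_by_withdraw(gl) */
	bool target_unlocked;     /* target == LM_ST_UNLOCKED */
	bool recovery_pending;    /* SDF_WITHDRAW_RECOVERY is set */
	bool is_system_glock;     /* is_system_glock(gl) */
};

/* Decision before the patch: every blocked glock is requeued. */
static enum toy_action toy_decide_old(const struct toy_xmote *x)
{
	if (x->blocked_by_withdraw &&
	    (!x->target_unlocked || x->recovery_pending))
		return TOY_REQUEUE;
	return TOY_PROCEED;
}

/* Decision after the patch: system glocks are allowed to proceed. */
static enum toy_action toy_decide_new(const struct toy_xmote *x)
{
	if (x->blocked_by_withdraw &&
	    (!x->target_unlocked || x->recovery_pending) &&
	    !x->is_system_glock)
		return TOY_REQUEUE;
	return TOY_PROCEED;
}

/*
 * Walk all 16 input combinations and count those where the two versions
 * disagree, asserting that every such case involves a system glock.
 */
static int toy_count_changed_cases(void)
{
	int changed = 0;

	for (int bits = 0; bits < 16; bits++) {
		struct toy_xmote x = {
			.blocked_by_withdraw = bits & 1,
			.target_unlocked = bits & 2,
			.recovery_pending = bits & 4,
			.is_system_glock = bits & 8,
		};

		if (toy_decide_old(&x) != toy_decide_new(&x)) {
			assert(x.is_system_glock);
			changed++;
		}
	}
	return changed;
}
```

In this model the two versions differ only when is_system_glock is true,
which matches the intent stated in the description: the statfs glock may
proceed while all other blocked glocks are still requeued.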
 



Re: [Cluster-devel] [syzbot] KASAN: use-after-free Read in gfs2_glock_shrink_scan

2021-05-18 Thread Andreas Gruenbacher
Hi,

On Tue, May 18, 2021 at 10:49 AM Hillf Danton wrote:
> When a glock is put, __gfs2_glock_put() removes it from the lru by calling
> gfs2_glock_remove_from_lru(), which checks GLF_LRU under lru_lock.
>
> On the shrink scan path, GLF_LRU is cleared under lru_lock, but because of
> cond_resched_lock(&lru_lock) in gfs2_dispose_glock_lru(), the put side can
> make progress without deleting the glock from the lru.
>
> Keep GLF_LRU across the race window opened by cond_resched_lock(&lru_lock) to
> ensure correct behavior on both sides: clear GLF_LRU after list_del under
> lru_lock.

Can you please resend with a Signed-off-by tag and a valid patch (the
--- line is missing)?

Thanks,
Andreas

> +++ x/fs/gfs2/glock.c
> @@ -1772,6 +1772,7 @@ __acquires(&lru_lock)
>  	while(!list_empty(list)) {
>  		gl = list_first_entry(list, struct gfs2_glock, gl_lru);
>  		list_del_init(&gl->gl_lru);
> +		clear_bit(GLF_LRU, &gl->gl_flags);
>  		if (!spin_trylock(&gl->gl_lockref.lock)) {
>  add_back_to_lru:
>  			list_add(&gl->gl_lru, &lru_list);
> @@ -1817,7 +1818,6 @@ static long gfs2_scan_glock_lru(int nr)
>  		if (!test_bit(GLF_LOCK, &gl->gl_flags)) {
>  			list_move(&gl->gl_lru, &dispose);
>  			atomic_dec(&lru_count);
> -			clear_bit(GLF_LRU, &gl->gl_flags);
>  			freed++;
>  			continue;
>  		}
>
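
The race described in the quoted text can be replayed as a deterministic,
single-threaded toy model. This is a sketch of the explanation only, not
kernel code: toy_glock and the toy_* functions are hypothetical, the three
steps are run in the problematic order by hand, and the clear_flag_early
parameter selects between the old ordering (GLF_LRU cleared when the glock
is moved to the dispose list) and the proposed one (cleared at list_del
time).

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical reduced glock: just the state relevant to the race. */
struct toy_glock {
	bool on_list;  /* gl_lru is linked into lru_list or the dispose list */
	bool lru_flag; /* GLF_LRU */
	bool freed;    /* object has been handed back to the allocator */
};

/* Scan step: list_move() to the dispose list keeps the entry linked. */
static void toy_scan_move(struct toy_glock *gl, bool clear_flag_early)
{
	if (clear_flag_early)
		gl->lru_flag = false;
}

/* Put side: unlink only if the flag says the glock is on the lru. */
static void toy_put(struct toy_glock *gl)
{
	assert(!gl->freed);
	if (gl->lru_flag) {
		gl->on_list = false;
		gl->lru_flag = false;
	}
	gl->freed = true;
}

/* Dispose step: returns true if it would touch a freed glock. */
static bool toy_dispose(struct toy_glock *gl, bool clear_flag_early)
{
	if (!gl->on_list)
		return false; /* the put side already unlinked it */
	if (gl->freed)
		return true;  /* list_del_init() on freed memory */
	gl->on_list = false;
	if (!clear_flag_early)
		gl->lru_flag = false;
	return false;
}

/* Replay: scan moves the glock, the put side runs in the window, dispose. */
static bool toy_race(bool clear_flag_early)
{
	struct toy_glock gl = { .on_list = true, .lru_flag = true,
				.freed = false };

	toy_scan_move(&gl, clear_flag_early);
	/* cond_resched_lock(&lru_lock) window: the put side gets to run */
	toy_put(&gl);
	return toy_dispose(&gl, clear_flag_early);
}
```

With clear_flag_early set, the put side sees the flag already cleared,
skips the unlink and frees the glock, so the dispose step reaches a freed
entry. With the flag kept until list_del time, the put side unlinks the
glock first and the dispose step never sees it.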