[Cluster-devel] [PATCH dlm/next] fs: dlm: fix race in nodeid2con
This patch fixes a race in nodeid2con() when two lookups run in parallel
and both create a connection structure for the same nodeid. Creating a new
connection structure is rare, so to keep readers lockless we simply repeat
the lookup inside the protected area and drop the previous work if the
race happened.

Fixes: a47666eb763cc ("fs: dlm: make connection hash lockless")
Signed-off-by: Alexander Aring
---
 fs/dlm/lowcomms.c | 16 +++-
 1 file changed, 15 insertions(+), 1 deletion(-)

diff --git a/fs/dlm/lowcomms.c b/fs/dlm/lowcomms.c
index b7b7360be609e..79f56f16bc2ce 100644
--- a/fs/dlm/lowcomms.c
+++ b/fs/dlm/lowcomms.c
@@ -175,7 +175,7 @@ static struct connection *__find_con(int nodeid)
  */
 static struct connection *nodeid2con(int nodeid, gfp_t alloc)
 {
-	struct connection *con = NULL;
+	struct connection *con, *tmp;
 	int r;
 
 	con = __find_con(nodeid);
@@ -213,6 +213,20 @@ static struct connection *nodeid2con(int nodeid, gfp_t alloc)
 	r = nodeid_hash(nodeid);
 
 	spin_lock(&connections_lock);
+	/* Because multiple workqueues/threads calls this function it can
+	 * race on multiple cpu's. Instead of locking hot path __find_con()
+	 * we just check in rare cases of recently added nodes again
+	 * under protection of connections_lock. If this is the case we
+	 * abort our connection creation and return the existing connection.
+	 */
+	tmp = __find_con(nodeid);
+	if (tmp) {
+		spin_unlock(&connections_lock);
+		kfree(con->rx_buf);
+		kfree(con);
+		return tmp;
+	}
+
 	hlist_add_head_rcu(&con->list, &connection_hash[r]);
 	spin_unlock(&connections_lock);
 
-- 
2.26.2
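For readers who want to see the pattern outside the kernel, here is a minimal userspace sketch of the same idea: a lockless lookup first, then a second lookup under the lock before publishing a newly allocated entry. It is not the dlm code. The names `struct conn`, `buckets`, `buckets_lock`, `find_conn()` and `get_conn()` are hypothetical, a pthread mutex stands in for the spinlock, and C11 acquire/release atomics stand in for the RCU list helpers. Entries are never removed in this sketch, so no grace-period handling is needed.

```c
#include <pthread.h>
#include <stdatomic.h>
#include <stdlib.h>

#define HASH_SIZE 8

/* Hypothetical stand-in for struct connection. */
struct conn {
	int nodeid;
	struct conn *next;	/* set once before publication, read-only afterwards */
};

static _Atomic(struct conn *) buckets[HASH_SIZE];
static pthread_mutex_t buckets_lock = PTHREAD_MUTEX_INITIALIZER;

static unsigned int bucket_of(int nodeid)
{
	return (unsigned int)nodeid % HASH_SIZE;
}

/* Lockless reader: safe because entries are published with a release
 * store and never modified or freed once they are in the table.
 */
static struct conn *find_conn(int nodeid)
{
	struct conn *c = atomic_load_explicit(&buckets[bucket_of(nodeid)],
					      memory_order_acquire);

	for (; c; c = c->next)
		if (c->nodeid == nodeid)
			return c;
	return NULL;
}

/* Look up an entry, creating it if needed. Two threads may both miss in
 * the first lookup and both allocate; the second lookup under the lock
 * makes the loser drop its allocation and return the winner's entry.
 */
static struct conn *get_conn(int nodeid)
{
	struct conn *con, *tmp;
	unsigned int r = bucket_of(nodeid);

	con = find_conn(nodeid);
	if (con)
		return con;

	con = calloc(1, sizeof(*con));
	if (!con)
		return NULL;
	con->nodeid = nodeid;

	pthread_mutex_lock(&buckets_lock);
	tmp = find_conn(nodeid);
	if (tmp) {
		/* Lost the race: discard our work, reuse the existing entry. */
		pthread_mutex_unlock(&buckets_lock);
		free(con);
		return tmp;
	}
	con->next = atomic_load_explicit(&buckets[r], memory_order_relaxed);
	atomic_store_explicit(&buckets[r], con, memory_order_release);
	pthread_mutex_unlock(&buckets_lock);

	return con;
}
```

The cost of the extra lookup is only paid on the creation path, which the patch description calls rare. The hot read path stays free of the lock.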
Re: [Cluster-devel] KASAN: slab-out-of-bounds Write in gfs2_fill_super
On 30/09/2020 13:39, syzbot wrote:
> Hello,
>
> syzbot found the following issue on:
>
> HEAD commit:    fb0155a0 Merge tag 'nfs-for-5.9-3' of git://git.linux-nfs...
> git tree:       upstream
> console output: https://syzkaller.appspot.com/x/log.txt?x=13458c0f90
> kernel config:  https://syzkaller.appspot.com/x/.config?x=adebb40048274f92
> dashboard link: https://syzkaller.appspot.com/bug?extid=af90d47a37376844e731
> compiler:       clang version 10.0.0 (https://github.com/llvm/llvm-project/ c2443155a0fb245c8f17f2c1c72b6ea391e86e81)
> syz repro:      https://syzkaller.appspot.com/x/repro.syz?x=15c307d390
> C reproducer:   https://syzkaller.appspot.com/x/repro.c?x=1353d58d90
>
> Bisection is inconclusive: the issue happens on the oldest tested release.
>
> bisection log:  https://syzkaller.appspot.com/x/bisect.txt?x=106acbbb90
> final oops:     https://syzkaller.appspot.com/x/report.txt?x=126acbbb90
> console output: https://syzkaller.appspot.com/x/log.txt?x=146acbbb90
>
> IMPORTANT: if you fix the issue, please add the following tag to the commit:
> Reported-by: syzbot+af90d47a37376844e...@syzkaller.appspotmail.com
>
> gfs2: fsid=loop0: Trying to join cluster "lock_nolock", "loop0"
> gfs2: fsid=loop0: Now mounting FS...
> ==
> BUG: KASAN: slab-out-of-bounds in gfs2_read_sb fs/gfs2/ops_fstype.c:342 [inline]
> BUG: KASAN: slab-out-of-bounds in init_sb fs/gfs2/ops_fstype.c:479 [inline]
> BUG: KASAN: slab-out-of-bounds in gfs2_fill_super+0x1db5/0x3fe0 fs/gfs2/ops_fstype.c:1096
> Write of size 8 at addr 88809073d548 by task syz-executor940/6853

Bug filed for this:

https://bugzilla.redhat.com/show_bug.cgi?id=1883929

Andy

> CPU: 1 PID: 6853 Comm: syz-executor940 Not tainted 5.9.0-rc7-syzkaller #0
> Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
> Call Trace:
>  __dump_stack lib/dump_stack.c:77 [inline]
>  dump_stack+0x1d6/0x29e lib/dump_stack.c:118
>  print_address_description+0x66/0x620 mm/kasan/report.c:383
>  __kasan_report mm/kasan/report.c:513 [inline]
>  kasan_report+0x132/0x1d0 mm/kasan/report.c:530
>  gfs2_read_sb fs/gfs2/ops_fstype.c:342 [inline]
>  init_sb fs/gfs2/ops_fstype.c:479 [inline]
>  gfs2_fill_super+0x1db5/0x3fe0 fs/gfs2/ops_fstype.c:1096
>  get_tree_bdev+0x3e9/0x5f0 fs/super.c:1342
>  gfs2_get_tree+0x4c/0x1f0 fs/gfs2/ops_fstype.c:1201
>  vfs_get_tree+0x88/0x270 fs/super.c:1547
>  do_new_mount fs/namespace.c:2875 [inline]
>  path_mount+0x179d/0x29e0 fs/namespace.c:3192
>  do_mount fs/namespace.c:3205 [inline]
>  __do_sys_mount fs/namespace.c:3413 [inline]
>  __se_sys_mount+0x126/0x180 fs/namespace.c:3390
>  do_syscall_64+0x31/0x70 arch/x86/entry/common.c:46
>  entry_SYSCALL_64_after_hwframe+0x44/0xa9
> RIP: 0033:0x446dba
> Code: b8 08 00 00 00 0f 05 48 3d 01 f0 ff ff 0f 83 fd ad fb ff c3 66 2e 0f 1f 84 00 00 00 00 00 66 90 49 89 ca b8 a5 00 00 00 0f 05 <48> 3d 01 f0 ff ff 0f 83 da ad fb ff c3 66 0f 1f 84 00 00 00 00 00
> RSP: 002b:7fff4c56e748 EFLAGS: 0293 ORIG_RAX: 00a5
> RAX: ffda RBX: 7fff4c56e7a0 RCX: 00446dba
> RDX: 2000 RSI: 2100 RDI: 7fff4c56e760
> RBP: 7fff4c56e760 R08: 7fff4c56e7a0 R09: 7fff0015
> R10: 0220 R11: 0293 R12: 0001
> R13: 0004 R14: 0003 R15: 0003
>
> Allocated by task 6853:
>  kasan_save_stack mm/kasan/common.c:48 [inline]
>  kasan_set_track mm/kasan/common.c:56 [inline]
>  __kasan_kmalloc+0x100/0x130 mm/kasan/common.c:461
>  kmem_cache_alloc_trace+0x1e4/0x2e0 mm/slab.c:3554
>  kmalloc include/linux/slab.h:554 [inline]
>  kzalloc include/linux/slab.h:666 [inline]
>  init_sbd fs/gfs2/ops_fstype.c:77 [inline]
>  gfs2_fill_super+0xb6/0x3fe0 fs/gfs2/ops_fstype.c:1018
>  get_tree_bdev+0x3e9/0x5f0 fs/super.c:1342
>  gfs2_get_tree+0x4c/0x1f0 fs/gfs2/ops_fstype.c:1201
>  vfs_get_tree+0x88/0x270 fs/super.c:1547
>  do_new_mount fs/namespace.c:2875 [inline]
>  path_mount+0x179d/0x29e0 fs/namespace.c:3192
>  do_mount fs/namespace.c:3205 [inline]
>  __do_sys_mount fs/namespace.c:3413 [inline]
>  __se_sys_mount+0x126/0x180 fs/namespace.c:3390
>  do_syscall_64+0x31/0x70 arch/x86/entry/common.c:46
>  entry_SYSCALL_64_after_hwframe+0x44/0xa9
>
> The buggy address belongs to the object at 88809073c000
>  which belongs to the cache kmalloc-8k of size 8192
> The buggy address is located 5448 bytes inside of
>  8192-byte region [88809073c000, 88809073e000)
> The buggy address belongs to the page:
> page:bd4b0b2d refcount:1 mapcount:0 mapping: index:0x0 pfn:0x9073c
> head:bd4b0b2d order:2 compound_mapcount:0 compound_pincount:0
> flags: 0xfffe010200(slab|head)
> raw: 00fffe010200 ea00028e5608 8880aa441b50 8880aa440a00
> raw: 88809073c000 00010001
> page dumped because: kasan: bad access detected
>
> Memory state around the buggy address:
>  88809073d400: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
Re: [Cluster-devel] general protection fault in gfs2_withdraw
On 29/09/2020 06:34, syzbot wrote:
> syzbot has bisected this issue to:
>
> commit 601ef0d52e9617588fcff3df26953592f2eb44ac
> Author: Bob Peterson
> Date:   Tue Jan 28 19:23:45 2020 +
>
>     gfs2: Force withdraw to replay journals and wait for it to finish
>
> bisection log:  https://syzkaller.appspot.com/x/bisect.txt?x=151d25e390
> start commit:   7c7ec322 Merge tag 'for-linus' of git://git.kernel.org/pub..
> git tree:       upstream
> final oops:     https://syzkaller.appspot.com/x/report.txt?x=171d25e390
> console output: https://syzkaller.appspot.com/x/log.txt?x=131d25e390
> kernel config:  https://syzkaller.appspot.com/x/.config?x=6184b75aa6d48d66
> dashboard link: https://syzkaller.appspot.com/bug?extid=50a8a9cf8127f2c6f5df
> syz repro:      https://syzkaller.appspot.com/x/repro.syz?x=13c6a10990
> C reproducer:   https://syzkaller.appspot.com/x/repro.c?x=15d45ed390
>
> Reported-by: syzbot+50a8a9cf8127f2c6f...@syzkaller.appspotmail.com
> Fixes: 601ef0d52e96 ("gfs2: Force withdraw to replay journals and wait for it to finish")
>
> For information about bisection process see: https://goo.gl/tpsmEJ#bisection

Bug filed for this one:

https://bugzilla.redhat.com/show_bug.cgi?id=1883932

Andy
[Cluster-devel] KASAN: slab-out-of-bounds Write in gfs2_fill_super
Hello,

syzbot found the following issue on:

HEAD commit:    fb0155a0 Merge tag 'nfs-for-5.9-3' of git://git.linux-nfs...
git tree:       upstream
console output: https://syzkaller.appspot.com/x/log.txt?x=13458c0f90
kernel config:  https://syzkaller.appspot.com/x/.config?x=adebb40048274f92
dashboard link: https://syzkaller.appspot.com/bug?extid=af90d47a37376844e731
compiler:       clang version 10.0.0 (https://github.com/llvm/llvm-project/ c2443155a0fb245c8f17f2c1c72b6ea391e86e81)
syz repro:      https://syzkaller.appspot.com/x/repro.syz?x=15c307d390
C reproducer:   https://syzkaller.appspot.com/x/repro.c?x=1353d58d90

Bisection is inconclusive: the issue happens on the oldest tested release.

bisection log:  https://syzkaller.appspot.com/x/bisect.txt?x=106acbbb90
final oops:     https://syzkaller.appspot.com/x/report.txt?x=126acbbb90
console output: https://syzkaller.appspot.com/x/log.txt?x=146acbbb90

IMPORTANT: if you fix the issue, please add the following tag to the commit:
Reported-by: syzbot+af90d47a37376844e...@syzkaller.appspotmail.com

gfs2: fsid=loop0: Trying to join cluster "lock_nolock", "loop0"
gfs2: fsid=loop0: Now mounting FS...
==
BUG: KASAN: slab-out-of-bounds in gfs2_read_sb fs/gfs2/ops_fstype.c:342 [inline]
BUG: KASAN: slab-out-of-bounds in init_sb fs/gfs2/ops_fstype.c:479 [inline]
BUG: KASAN: slab-out-of-bounds in gfs2_fill_super+0x1db5/0x3fe0 fs/gfs2/ops_fstype.c:1096
Write of size 8 at addr 88809073d548 by task syz-executor940/6853

CPU: 1 PID: 6853 Comm: syz-executor940 Not tainted 5.9.0-rc7-syzkaller #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
Call Trace:
 __dump_stack lib/dump_stack.c:77 [inline]
 dump_stack+0x1d6/0x29e lib/dump_stack.c:118
 print_address_description+0x66/0x620 mm/kasan/report.c:383
 __kasan_report mm/kasan/report.c:513 [inline]
 kasan_report+0x132/0x1d0 mm/kasan/report.c:530
 gfs2_read_sb fs/gfs2/ops_fstype.c:342 [inline]
 init_sb fs/gfs2/ops_fstype.c:479 [inline]
 gfs2_fill_super+0x1db5/0x3fe0 fs/gfs2/ops_fstype.c:1096
 get_tree_bdev+0x3e9/0x5f0 fs/super.c:1342
 gfs2_get_tree+0x4c/0x1f0 fs/gfs2/ops_fstype.c:1201
 vfs_get_tree+0x88/0x270 fs/super.c:1547
 do_new_mount fs/namespace.c:2875 [inline]
 path_mount+0x179d/0x29e0 fs/namespace.c:3192
 do_mount fs/namespace.c:3205 [inline]
 __do_sys_mount fs/namespace.c:3413 [inline]
 __se_sys_mount+0x126/0x180 fs/namespace.c:3390
 do_syscall_64+0x31/0x70 arch/x86/entry/common.c:46
 entry_SYSCALL_64_after_hwframe+0x44/0xa9
RIP: 0033:0x446dba
Code: b8 08 00 00 00 0f 05 48 3d 01 f0 ff ff 0f 83 fd ad fb ff c3 66 2e 0f 1f 84 00 00 00 00 00 66 90 49 89 ca b8 a5 00 00 00 0f 05 <48> 3d 01 f0 ff ff 0f 83 da ad fb ff c3 66 0f 1f 84 00 00 00 00 00
RSP: 002b:7fff4c56e748 EFLAGS: 0293 ORIG_RAX: 00a5
RAX: ffda RBX: 7fff4c56e7a0 RCX: 00446dba
RDX: 2000 RSI: 2100 RDI: 7fff4c56e760
RBP: 7fff4c56e760 R08: 7fff4c56e7a0 R09: 7fff0015
R10: 0220 R11: 0293 R12: 0001
R13: 0004 R14: 0003 R15: 0003

Allocated by task 6853:
 kasan_save_stack mm/kasan/common.c:48 [inline]
 kasan_set_track mm/kasan/common.c:56 [inline]
 __kasan_kmalloc+0x100/0x130 mm/kasan/common.c:461
 kmem_cache_alloc_trace+0x1e4/0x2e0 mm/slab.c:3554
 kmalloc include/linux/slab.h:554 [inline]
 kzalloc include/linux/slab.h:666 [inline]
 init_sbd fs/gfs2/ops_fstype.c:77 [inline]
 gfs2_fill_super+0xb6/0x3fe0 fs/gfs2/ops_fstype.c:1018
 get_tree_bdev+0x3e9/0x5f0 fs/super.c:1342
 gfs2_get_tree+0x4c/0x1f0 fs/gfs2/ops_fstype.c:1201
 vfs_get_tree+0x88/0x270 fs/super.c:1547
 do_new_mount fs/namespace.c:2875 [inline]
 path_mount+0x179d/0x29e0 fs/namespace.c:3192
 do_mount fs/namespace.c:3205 [inline]
 __do_sys_mount fs/namespace.c:3413 [inline]
 __se_sys_mount+0x126/0x180 fs/namespace.c:3390
 do_syscall_64+0x31/0x70 arch/x86/entry/common.c:46
 entry_SYSCALL_64_after_hwframe+0x44/0xa9

The buggy address belongs to the object at 88809073c000
 which belongs to the cache kmalloc-8k of size 8192
The buggy address is located 5448 bytes inside of
 8192-byte region [88809073c000, 88809073e000)
The buggy address belongs to the page:
page:bd4b0b2d refcount:1 mapcount:0 mapping: index:0x0 pfn:0x9073c
head:bd4b0b2d order:2 compound_mapcount:0 compound_pincount:0
flags: 0xfffe010200(slab|head)
raw: 00fffe010200 ea00028e5608 8880aa441b50 8880aa440a00
raw: 88809073c000 00010001
page dumped because: kasan: bad access detected

Memory state around the buggy address:
 88809073d400: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
 88809073d480: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
>88809073d500: 00 00 00 00 00 00 00 00 00 fc fc fc fc fc fc fc
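
The numbers in the report above can be cross-checked by hand. The sketch below redoes the arithmetic, assuming generic KASAN's granularity of one shadow byte per 8 bytes of memory and using the addresses exactly as they appear in this archive copy (their leading digits look stripped, which does not affect the subtraction). The helper names `object_offset()` and `shadow_entry_addr()` are hypothetical and not part of any kernel API.

```c
#include <stdint.h>

/* Assumption: generic KASAN, where each shadow byte covers 8 bytes. */
#define KASAN_GRANULE 8

/* Offset of an accessed address inside the slab object it belongs to. */
static uint64_t object_offset(uint64_t addr, uint64_t object_start)
{
	return addr - object_start;
}

/* First address covered by the shadow byte at position `index` in a
 * "Memory state" row that starts at `row_start`.
 */
static uint64_t shadow_entry_addr(uint64_t row_start, unsigned int index)
{
	return row_start + (uint64_t)index * KASAN_GRANULE;
}
```

With the values from the report, `object_offset(0x88809073d548, 0x88809073c000)` is 0x1548, which is the 5448 bytes quoted in the report. In the row marked `>`, the first nine shadow bytes are 00 and the tenth (index 9) is the first fc, so `shadow_entry_addr(0x88809073d500, 9)` gives 0x88809073d548, the faulting address. Under those assumptions the 8-byte write starts at the first non-accessible byte after the accessible 5448 bytes of the kmalloc-8k object.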