Re: [Cluster-devel] general protection fault in gfs2_withdraw

2020-09-28 Thread syzbot
syzbot has found a reproducer for the following issue on:

HEAD commit:    7c7ec322 Merge tag 'for-linus' of git://git.kernel.org/pub..
git tree:   upstream
console output: https://syzkaller.appspot.com/x/log.txt?x=11f2ff2790
kernel config:  https://syzkaller.appspot.com/x/.config?x=6184b75aa6d48d66
dashboard link: https://syzkaller.appspot.com/bug?extid=50a8a9cf8127f2c6f5df
compiler:   clang version 10.0.0 (https://github.com/llvm/llvm-project/ c2443155a0fb245c8f17f2c1c72b6ea391e86e81)
syz repro:  https://syzkaller.appspot.com/x/repro.syz?x=160fb77390
C reproducer:   https://syzkaller.appspot.com/x/repro.c?x=1104f10990

IMPORTANT: if you fix the issue, please add the following tag to the commit:
Reported-by: syzbot+50a8a9cf8127f2c6f...@syzkaller.appspotmail.com

gfs2: fsid=syz:syz.0: fatal: invalid metadata block
  bh = 2072 (magic number)
  function = gfs2_meta_indirect_buffer, file = fs/gfs2/meta_io.c, line = 417
gfs2: fsid=syz:syz.0: about to withdraw this file system
general protection fault, probably for non-canonical address 0xdc0e:  [#1] PREEMPT SMP KASAN
KASAN: null-ptr-deref in range [0x0070-0x0077]
CPU: 0 PID: 6842 Comm: syz-executor264 Not tainted 5.9.0-rc6-syzkaller #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
RIP: 0010:signal_our_withdraw fs/gfs2/util.c:97 [inline]
RIP: 0010:gfs2_withdraw+0x2b0/0xe20 fs/gfs2/util.c:294
Code: e8 03 48 89 44 24 38 42 80 3c 38 00 74 08 48 89 ef e8 34 f7 69 fe 48 89 6c 24 20 48 8b 6d 00 48 83 c5 70 48 89 e8 48 c1 e8 03 <42> 80 3c 38 00 74 08 48 89 ef e8 11 f7 69 fe 48 8b 45 00 48 89 44
RSP: 0018:c900057474f0 EFLAGS: 00010202
RAX: 000e RBX: 8880a71e RCX: 98268db4dfe86a00
RDX: 888092bb6100 RSI:  RDI: 8880a71e0430
RBP: 0070 R08: 834ad50c R09: ed1015d041c3
R10: ed1015d041c3 R11:  R12: 111014e3c04d
R13: 8880a71e0050 R14: 8880a71e026c R15: dc00
FS:  0233b880() GS:8880ae80() knlGS:
CS:  0010 DS:  ES:  CR0: 80050033
CR2: 7f74f826d6c0 CR3: a04cc000 CR4: 001506f0
DR0:  DR1:  DR2: 
DR3:  DR6: fffe0ff0 DR7: 0400
Call Trace:
 gfs2_meta_check_ii+0x70/0x80 fs/gfs2/util.c:450
 gfs2_metatype_check_i fs/gfs2/util.h:126 [inline]
 gfs2_meta_indirect_buffer+0x29f/0x380 fs/gfs2/meta_io.c:417
 gfs2_meta_inode_buffer fs/gfs2/meta_io.h:70 [inline]
 gfs2_inode_refresh+0x65/0xc00 fs/gfs2/glops.c:438
 inode_go_lock+0x12c/0x480 fs/gfs2/glops.c:468
 do_promote+0x4db/0xcd0 fs/gfs2/glock.c:390
 finish_xmote+0x907/0x1350 fs/gfs2/glock.c:560
 do_xmote+0xadb/0x14c0 fs/gfs2/glock.c:686
 gfs2_glock_nq+0xac3/0x14d0 fs/gfs2/glock.c:1410
 gfs2_glock_nq_init fs/gfs2/glock.h:238 [inline]
 gfs2_lookupi+0x36f/0x4f0 fs/gfs2/inode.c:317
 gfs2_lookup_simple+0xa4/0x100 fs/gfs2/inode.c:268
 init_journal+0x132/0x1970 fs/gfs2/ops_fstype.c:620
 init_inodes fs/gfs2/ops_fstype.c:756 [inline]
 gfs2_fill_super+0x2717/0x3fe0 fs/gfs2/ops_fstype.c:1125
 get_tree_bdev+0x3e9/0x5f0 fs/super.c:1342
 gfs2_get_tree+0x4c/0x1f0 fs/gfs2/ops_fstype.c:1201
 vfs_get_tree+0x88/0x270 fs/super.c:1547
 do_new_mount fs/namespace.c:2875 [inline]
 path_mount+0x179d/0x29e0 fs/namespace.c:3192
 do_mount fs/namespace.c:3205 [inline]
 __do_sys_mount fs/namespace.c:3413 [inline]
 __se_sys_mount+0x126/0x180 fs/namespace.c:3390
 do_syscall_64+0x31/0x70 arch/x86/entry/common.c:46
 entry_SYSCALL_64_after_hwframe+0x44/0xa9
RIP: 0033:0x458e1a
Code: b8 08 00 00 00 0f 05 48 3d 01 f0 ff ff 0f 83 fd ad fb ff c3 66 2e 0f 1f 84 00 00 00 00 00 66 90 49 89 ca b8 a5 00 00 00 0f 05 <48> 3d 01 f0 ff ff 0f 83 da ad fb ff c3 66 0f 1f 84 00 00 00 00 00
RSP: 002b:7ffc76f65c88 EFLAGS: 0293 ORIG_RAX: 00a5
RAX: ffda RBX: 7ffc76f65ce0 RCX: 00458e1a
RDX: 2000 RSI: 2100 RDI: 7ffc76f65ca0
RBP: 7ffc76f65ca0 R08: 7ffc76f65ce0 R09: 7ffc0015
R10:  R11: 0293 R12: 0809
R13: 0004 R14: 0003 R15: 0003
Modules linked in:
---[ end trace 1e62174917573e95 ]---
RIP: 0010:signal_our_withdraw fs/gfs2/util.c:97 [inline]
RIP: 0010:gfs2_withdraw+0x2b0/0xe20 fs/gfs2/util.c:294
Code: e8 03 48 89 44 24 38 42 80 3c 38 00 74 08 48 89 ef e8 34 f7 69 fe 48 89 6c 24 20 48 8b 6d 00 48 83 c5 70 48 89 e8 48 c1 e8 03 <42> 80 3c 38 00 74 08 48 89 ef e8 11 f7 69 fe 48 8b 45 00 48 89 44
RSP: 0018:c900057474f0 EFLAGS: 00010202
RAX: 000e RBX: 8880a71e RCX: 98268db4dfe86a00
RDX: 888092bb6100 RSI:  RDI: 8880a71e0430
RBP: 0070 R08: 834ad50c R09: ed1015d041c3
R10: ed1015d041c3 R11:  R12: 111014e3c04d
R13: 8880a71e0050 R14: 8880a71e026c R15: dc00

Re: [Cluster-devel] general protection fault in gfs2_withdraw

2020-09-28 Thread Andrew Price

On 26/09/2020 18:21, syzbot wrote:

> syzbot has found a reproducer for the following issue on:
>
> HEAD commit:    7c7ec322 Merge tag 'for-linus' of git://git.kernel.org/pub..
> git tree:   upstream
> console output: https://syzkaller.appspot.com/x/log.txt?x=11f2ff2790
> kernel config:  https://syzkaller.appspot.com/x/.config?x=6184b75aa6d48d66
> dashboard link: https://syzkaller.appspot.com/bug?extid=50a8a9cf8127f2c6f5df
> compiler:   clang version 10.0.0 (https://github.com/llvm/llvm-project/ c2443155a0fb245c8f17f2c1c72b6ea391e86e81)
> syz repro:  https://syzkaller.appspot.com/x/repro.syz?x=160fb77390
> C reproducer:   https://syzkaller.appspot.com/x/repro.c?x=1104f10990
>
> IMPORTANT: if you fix the issue, please add the following tag to the commit:
> Reported-by: syzbot+50a8a9cf8127f2c6f...@syzkaller.appspotmail.com
>
> gfs2: fsid=syz:syz.0: fatal: invalid metadata block
>    bh = 2072 (magic number)
>    function = gfs2_meta_indirect_buffer, file = fs/gfs2/meta_io.c, line = 417
> gfs2: fsid=syz:syz.0: about to withdraw this file system
> general protection fault, probably for non-canonical address 0xdc0e:  [#1] PREEMPT SMP KASAN
> KASAN: null-ptr-deref in range [0x0070-0x0077]
> CPU: 0 PID: 6842 Comm: syz-executor264 Not tainted 5.9.0-rc6-syzkaller #0
> Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
> RIP: 0010:signal_our_withdraw fs/gfs2/util.c:97 [inline]


Seems that it's withdrawing in the init_inodes() path early enough 
(while looking up the jindex) that sdp->sd_jdesc is still NULL here:


  static void signal_our_withdraw(struct gfs2_sbd *sdp)
  {
          struct gfs2_glock *gl = sdp->sd_live_gh.gh_gl;
          struct inode *inode = sdp->sd_jdesc->jd_inode;

I'm undecided as to whether the bug is that we're withdrawing that early 
at all, or that we're not checking for NULL there?
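
As a rough illustration of why a NULL sdp->sd_jdesc would show up as
"null-ptr-deref in range [0x0070-0x0077]": reading a member through a NULL
struct pointer loads from the member's offset, so the report suggests that
jd_inode sat at offset 0x70 in that build. The sketch below uses a
hypothetical stand-in structure (mock_jdesc, with padding picked only to
reproduce that offset); it is not the real struct gfs2_jdesc layout, and the
helper names are made up for this example.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical stand-in for struct gfs2_jdesc. The padding is an
 * assumption, chosen only so that jd_inode lands at offset 0x70 to
 * mirror the KASAN range in the report.
 */
struct mock_jdesc {
	unsigned char pad[0x70];
	void *jd_inode;
};

/* Address that would be loaded from when reading base->jd_inode. */
static uintptr_t jd_inode_load_addr(uintptr_t base)
{
	return base + offsetof(struct mock_jdesc, jd_inode);
}

/* Last byte touched by a pointer-sized load starting at addr. */
static uintptr_t load_last_byte(uintptr_t addr)
{
	return addr + sizeof(void *) - 1;
}
```

With a base of 0 (a NULL sd_jdesc), jd_inode_load_addr(0) is 0x70, and on a
build with 8-byte pointers the load covers 0x70 through 0x77.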


Probably introduced by:

601ef0d52e96 gfs2: Force withdraw to replay journals and wait for it to finish


Andy



Re: [Cluster-devel] general protection fault in gfs2_withdraw

2020-09-28 Thread Bob Peterson
- Original Message -
> On 26/09/2020 18:21, syzbot wrote:
> > syzbot has found a reproducer for the following issue on:
> > 
> > HEAD commit:7c7ec322 Merge tag 'for-linus' of
> > git://git.kernel.org/pub..
> > git tree:   upstream
> > console output: https://syzkaller.appspot.com/x/log.txt?x=11f2ff2790
> > kernel config:  https://syzkaller.appspot.com/x/.config?x=6184b75aa6d48d66
> > dashboard link:
> > https://syzkaller.appspot.com/bug?extid=50a8a9cf8127f2c6f5df
> > compiler:   clang version 10.0.0 (https://github.com/llvm/llvm-project/
> > c2443155a0fb245c8f17f2c1c72b6ea391e86e81)
> > syz repro:  https://syzkaller.appspot.com/x/repro.syz?x=160fb77390
> > C reproducer:   https://syzkaller.appspot.com/x/repro.c?x=1104f10990
> > 
> > IMPORTANT: if you fix the issue, please add the following tag to the
> > commit:
> > Reported-by: syzbot+50a8a9cf8127f2c6f...@syzkaller.appspotmail.com
> > 
> > gfs2: fsid=syz:syz.0: fatal: invalid metadata block
> >   bh = 2072 (magic number)
> >   function = gfs2_meta_indirect_buffer, file = fs/gfs2/meta_io.c, line = 417
> > gfs2: fsid=syz:syz.0: about to withdraw this file system
> > general protection fault, probably for non-canonical address
> > 0xdc0e:  [#1] PREEMPT SMP KASAN
> > KASAN: null-ptr-deref in range [0x0070-0x0077]
> > CPU: 0 PID: 6842 Comm: syz-executor264 Not tainted 5.9.0-rc6-syzkaller #0
> > Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS
> > Google 01/01/2011
> > RIP: 0010:signal_our_withdraw fs/gfs2/util.c:97 [inline]
> 
> Seems that it's withdrawing in the init_inodes() path early enough
> (while looking up the jindex) that sdp->sd_jdesc is still NULL here:
> 
>   static void signal_our_withdraw(struct gfs2_sbd *sdp)
>   {
>           struct gfs2_glock *gl = sdp->sd_live_gh.gh_gl;
>           struct inode *inode = sdp->sd_jdesc->jd_inode;
> 
> I'm undecided as to whether the bug is that we're withdrawing that early
> at all, or that we're not checking for NULL there?
> 
> Probably introduced by:
> 
> 601ef0d52e96 gfs2: Force withdraw to replay journals and wait for it to
> finish
> 
> Andy

Hi Andy. Thanks for your analysis.

I suspect you're right.
It's probably another exception to the rule. We knew there would be a few of
those with 601ef0d52e96, such as the one we made for "withdrawing during 
withdraw".
We should probably just add a check for NULL and make it do the right thing.

Regards,

Bob Peterson
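
A minimal sketch of the kind of NULL check discussed above, written as
stand-alone C rather than kernel code. The mock_* types and
mock_signal_our_withdraw() are hypothetical, heavily reduced stand-ins, and
returning early when no journal descriptor exists is only an assumption about
what "the right thing" might be; the thread does not say what the eventual
fix looks like.

```c
#include <stddef.h>

/* Hypothetical, reduced stand-ins for the kernel structures. */
struct mock_inode {
	int dummy;
};

struct mock_jdesc {
	struct mock_inode *jd_inode;
};

struct mock_sbd {
	struct mock_jdesc *sd_jdesc;	/* NULL until the journal is set up */
};

/*
 * Returns 0 when the journal inode was reachable, or -1 when the
 * withdraw happened before sd_jdesc was set and the journal-related
 * work is skipped (assumed behaviour, for illustration only).
 */
static int mock_signal_our_withdraw(struct mock_sbd *sdp)
{
	struct mock_inode *inode;

	if (sdp->sd_jdesc == NULL)
		return -1;	/* nothing to dereference yet */

	inode = sdp->sd_jdesc->jd_inode;
	(void)inode;		/* the real function would go on to use it */
	return 0;
}
```

The point of the guard is only that sd_jdesc is tested before jd_inode is
read, so a withdraw during the early init_inodes() path no longer
dereferences NULL.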