Re: INFO: task hung in do_read_cache_page (3)

2020-05-12 Thread Dmitry Vyukov
On Mon, May 11, 2020 at 7:31 PM Mike Christie  wrote:
>
> On 5/11/20 8:19 AM, syzbot wrote:
> > syzbot has bisected this bug to:
> >
> > commit 2da22da573481cc4837e246d0eee4d518b3f715e
> > Author: Mike Christie 
> > Date:   Tue Aug 13 16:39:52 2019 +
> >
> > nbd: fix zero cmd timeout handling v2
> >
> > bisection log:  https://syzkaller.appspot.com/x/bisect.txt?x=11d6ab1410
> > start commit:   e99332e7 gcc-10: mark more functions __init to avoid secti..
> > git tree:   upstream
> > final crash:https://syzkaller.appspot.com/x/report.txt?x=13d6ab1410
> > console output: https://syzkaller.appspot.com/x/log.txt?x=15d6ab1410
> > kernel config:  https://syzkaller.appspot.com/x/.config?x=8a96cf498e199d8b
> > dashboard link: https://syzkaller.appspot.com/bug?extid=518c54e255b5031adde4
> > syz repro:  https://syzkaller.appspot.com/x/repro.syz?x=146e45ec10
> > C reproducer:   https://syzkaller.appspot.com/x/repro.c?x=16a410
> >
> > Reported-by: syzbot+518c54e255b5031ad...@syzkaller.appspotmail.com
> > Fixes: 2da22da57348 ("nbd: fix zero cmd timeout handling v2")
> >
> > For information about bisection process see: https://goo.gl/tpsmEJ#bisection
> >
>
> How do you adjust/modify what is expected from the test or what is
> reported as an error?
>
> The patch added back behavior that got removed. With the patch we expect
> the hung task warnings, because we specifically want to hold onto
> running commands for longer than 120/hung_task_timeout seconds

Hi Mike,

All task hung messages are considered a kernel bug. So far this was
always true as far as I can tell. Tasks that are blocked for any
prolonged period should be killable.


Re: INFO: task hung in do_read_cache_page (3)

2020-05-11 Thread Mike Christie
On 5/11/20 8:19 AM, syzbot wrote:
> syzbot has bisected this bug to:
> 
> commit 2da22da573481cc4837e246d0eee4d518b3f715e
> Author: Mike Christie 
> Date:   Tue Aug 13 16:39:52 2019 +
> 
> nbd: fix zero cmd timeout handling v2
> 
> bisection log:  https://syzkaller.appspot.com/x/bisect.txt?x=11d6ab1410
> start commit:   e99332e7 gcc-10: mark more functions __init to avoid secti..
> git tree:   upstream
> final crash:https://syzkaller.appspot.com/x/report.txt?x=13d6ab1410
> console output: https://syzkaller.appspot.com/x/log.txt?x=15d6ab1410
> kernel config:  https://syzkaller.appspot.com/x/.config?x=8a96cf498e199d8b
> dashboard link: https://syzkaller.appspot.com/bug?extid=518c54e255b5031adde4
> syz repro:  https://syzkaller.appspot.com/x/repro.syz?x=146e45ec10
> C reproducer:   https://syzkaller.appspot.com/x/repro.c?x=16a410
> 
> Reported-by: syzbot+518c54e255b5031ad...@syzkaller.appspotmail.com
> Fixes: 2da22da57348 ("nbd: fix zero cmd timeout handling v2")
> 
> For information about bisection process see: https://goo.gl/tpsmEJ#bisection
> 

How do you adjust/modify what is expected from the test or what is
reported as an error?

The patch added back behavior that got removed. With the patch we expect
the hung task warnings, because we specifically want to hold onto
running commands for longer than 120/hung_task_timeout seconds



Re: INFO: task hung in do_read_cache_page (3)

2020-05-11 Thread syzbot
syzbot has bisected this bug to:

commit 2da22da573481cc4837e246d0eee4d518b3f715e
Author: Mike Christie 
Date:   Tue Aug 13 16:39:52 2019 +

nbd: fix zero cmd timeout handling v2

bisection log:  https://syzkaller.appspot.com/x/bisect.txt?x=11d6ab1410
start commit:   e99332e7 gcc-10: mark more functions __init to avoid secti..
git tree:   upstream
final crash:https://syzkaller.appspot.com/x/report.txt?x=13d6ab1410
console output: https://syzkaller.appspot.com/x/log.txt?x=15d6ab1410
kernel config:  https://syzkaller.appspot.com/x/.config?x=8a96cf498e199d8b
dashboard link: https://syzkaller.appspot.com/bug?extid=518c54e255b5031adde4
syz repro:  https://syzkaller.appspot.com/x/repro.syz?x=146e45ec10
C reproducer:   https://syzkaller.appspot.com/x/repro.c?x=16a410

Reported-by: syzbot+518c54e255b5031ad...@syzkaller.appspotmail.com
Fixes: 2da22da57348 ("nbd: fix zero cmd timeout handling v2")

For information about bisection process see: https://goo.gl/tpsmEJ#bisection


Re: INFO: task hung in do_read_cache_page (3)

2020-05-10 Thread syzbot
syzbot has found a reproducer for the following crash on:

HEAD commit:e99332e7 gcc-10: mark more functions __init to avoid secti..
git tree:   upstream
console output: https://syzkaller.appspot.com/x/log.txt?x=1344fb1410
kernel config:  https://syzkaller.appspot.com/x/.config?x=8a96cf498e199d8b
dashboard link: https://syzkaller.appspot.com/bug?extid=518c54e255b5031adde4
compiler:   gcc (GCC) 9.0.0 20181231 (experimental)
syz repro:  https://syzkaller.appspot.com/x/repro.syz?x=146e45ec10
C reproducer:   https://syzkaller.appspot.com/x/repro.c?x=16a410

IMPORTANT: if you fix the bug, please add the following tag to the commit:
Reported-by: syzbot+518c54e255b5031ad...@syzkaller.appspotmail.com

INFO: task syz-executor928:7064 blocked for more than 143 seconds.
  Not tainted 5.7.0-rc4-syzkaller #0
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
syz-executor928 D26720  7064   7053 0x4004
Call Trace:
 schedule+0xd0/0x2a0 kernel/sched/core.c:4158
 io_schedule+0x17/0x60 kernel/sched/core.c:5801
 wait_on_page_bit_common mm/filemap.c:1153 [inline]
 wait_on_page_bit mm/filemap.c:1202 [inline]
 wait_on_page_locked include/linux/pagemap.h:528 [inline]
 do_read_cache_page+0x648/0x1810 mm/filemap.c:2814
 read_mapping_page include/linux/pagemap.h:397 [inline]
 read_part_sector+0xf6/0x600 block/partitions/core.c:643
 adfspart_check_ICS+0x9d/0xc80 block/partitions/acorn.c:360
 check_partition block/partitions/core.c:140 [inline]
 blk_add_partitions+0x474/0xe50 block/partitions/core.c:571
 bdev_disk_changed+0x1fb/0x380 fs/block_dev.c:1543
 __blkdev_get+0x130c/0x1530 fs/block_dev.c:1681
 blkdev_get+0x41/0x2b0 fs/block_dev.c:1748
 blkdev_open+0x21d/0x2b0 fs/block_dev.c:1887
 do_dentry_open+0x4ba/0x1290 fs/open.c:797
 do_open fs/namei.c:3229 [inline]
 path_openat+0x1e59/0x27d0 fs/namei.c:3346
 do_filp_open+0x192/0x260 fs/namei.c:3373
 do_sys_openat2+0x585/0x7d0 fs/open.c:1148
 do_sys_open+0xc3/0x140 fs/open.c:1164
 do_syscall_64+0xf6/0x7d0 arch/x86/entry/common.c:295
 entry_SYSCALL_64_after_hwframe+0x49/0xb3
RIP: 0033:0x405a71
Code: Bad RIP value.
RSP: 002b:7f26eda2b830 EFLAGS: 0293 ORIG_RAX: 0002
RAX: ffda RBX: 6667 RCX: 00405a71
RDX:  RSI:  RDI: 7f26eda2b8d0
RBP: 006dbc40 R08: 000f R09: 7f26eda2c700
R10: 7f26eda2c9d0 R11: 0293 R12: 006dbc4c
R13: 7ffcaf21940f R14: 7f26eda2c9c0 R15: 20c49ba5e353f7cf

Showing all locks held in the system:
1 lock held by khungtaskd/1139:
 #0: 899bebc0 (rcu_read_lock){}-{1:2}, at: 
debug_show_all_locks+0x53/0x260 kernel/locking/lockdep.c:5754
2 locks held by in:imklog/6726:
 #0: 88809eddc3f0 (>f_pos_lock){+.+.}-{3:3}, at: __fdget_pos+0xe9/0x100 
fs/file.c:826
 #1: 8880aa001818 (>list_lock){-.-.}-{2:2}, at: syslog_print 
kernel/printk/printk.c:1392 [inline]
 #1: 8880aa001818 (>list_lock){-.-.}-{2:2}, at: do_syslog 
kernel/printk/printk.c:1557 [inline]
 #1: 8880aa001818 (>list_lock){-.-.}-{2:2}, at: 
do_syslog+0x632/0x16d0 kernel/printk/printk.c:1531
1 lock held by syz-executor928/7064:
 #0: 888088cbb380 (>bd_mutex){+.+.}-{3:3}, at: 
__blkdev_get+0x179/0x1530 fs/block_dev.c:1600

=

NMI backtrace for cpu 1
CPU: 1 PID: 1139 Comm: khungtaskd Not tainted 5.7.0-rc4-syzkaller #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 
01/01/2011
Call Trace:
 __dump_stack lib/dump_stack.c:77 [inline]
 dump_stack+0x188/0x20d lib/dump_stack.c:118
 nmi_cpu_backtrace.cold+0x70/0xb1 lib/nmi_backtrace.c:101
 nmi_trigger_cpumask_backtrace+0x231/0x27e lib/nmi_backtrace.c:62
 trigger_all_cpu_backtrace include/linux/nmi.h:146 [inline]
 check_hung_uninterruptible_tasks kernel/hung_task.c:205 [inline]
 watchdog+0xa8c/0x1010 kernel/hung_task.c:289
 kthread+0x388/0x470 kernel/kthread.c:268
 ret_from_fork+0x24/0x30 arch/x86/entry/entry_64.S:352
Sending NMI from CPU 1 to CPUs 0:
NMI backtrace for cpu 0 skipped: idling at native_safe_halt+0xe/0x10 
arch/x86/include/asm/irqflags.h:60