Re: [LKP] [tty] 0b4f83d510: INFO:task_blocked_for_more_than#seconds
Hi Sergey, Jiri, On Mon, 2018-09-10 at 14:14 +0900, Sergey Senozhatsky wrote: > On (09/07/18 08:39), Jiri Slaby wrote: > > > [ 244.944070] > > > [ 244.944070] Showing all locks held in the system: > > > [ 244.945558] 1 lock held by khungtaskd/18: > > > [ 244.946495] #0: (ptrval) (rcu_read_lock){}, at: > > > debug_show_all_locks+0x16/0x190 > > > [ 244.948503] 2 locks held by askfirst/235: > > > [ 244.949439] #0: (ptrval) (&tty->ldisc_sem){}, at: > > > tty_ldisc_ref_wait+0x25/0x50 > > > [ 244.951762] #1: (ptrval) (&ldata- > > > >atomic_read_lock){+.+.}, at: n_tty_read+0x13d/0xa00 > > > > Here, it just seems to wait for input from the user. > > > > > [ 244.953799] 1 lock held by validate_data/655: > > > [ 244.954814] #0: (ptrval) (&tty->ldisc_sem){}, at: > > > tty_ldisc_ref_wait+0x25/0x50 > > > [ 244.956764] 1 lock held by dnsmasq/668: > > > [ 244.957649] #0: (ptrval) (&tty->ldisc_sem){}, at: > > > tty_ldisc_ref_wait+0x25/0x50 > > > [ 244.959598] 1 lock held by dropbear/734: > > > [ 244.967564] #0: (ptrval) (&tty->ldisc_sem){}, at: > > > tty_ldisc_ref_wait+0x25/0x50 > > > > Hmm, I assume there is another task waiting for write_ldsem and > > that one > > prevents these readers to proceed. Unfortunately, due to the > > defunct > > __ptrval__ pointer hashing crap, we do not see who is waiting for > > what. > > But I am guessing all are the same locks. > > Hmm, interesting. Am I getting it right that the test did pass > before. > And now we see that sort of/smells like live-lock right after the > introduction of tty_ldisc_lock() to tty_reopen(). > > > So I think, we are forced to limit the waiting to 5 seconds in > > reopen in > > the end too (the same as we do for new open in tty_init_dev). > > If I got it right, LKP did test the 5*HZ patch > > retval = tty_ldisc_lock(tty, 5 * HZ); > > At least it's > In-Reply-To: <20180829022353.23568-3-d...@arista.com> > > and > Message-Id: <20180829022353.23568-3-d...@arista.com> > > is the patch which does the 5*HZ lock timeout thing. Yeah, I also noticed on the weekend that the commit in the mentioned branch is from v1.. Currently, I tried to reproduce it like ~15-20 times, but unlucky :( It looks like, the lockup wasn't introduced by this commit, but unfortunately the commit made it more likely. At least, that's what I suppose after I've found this report: https://lkml.org/lkml/2017/11/21/216 It seems to me that the lockup is triggered by: [ 244.948503] 2 locks held by askfirst/235: [ 244.949439] #0: (ptrval) (&tty->ldisc_sem){}, at: tty_ldisc_ref_wait+0x25/0x50 [ 244.951762] #1: (ptrval) (&ldata->atomic_read_lock){+.+.}, at: n_tty_read+0x13d/0xa00 Looking into this.. -- Thanks, Dmitry
Re: [LKP] [tty] 0b4f83d510: INFO:task_blocked_for_more_than#seconds
On (09/07/18 08:39), Jiri Slaby wrote: > > [ 244.944070] > > [ 244.944070] Showing all locks held in the system: > > [ 244.945558] 1 lock held by khungtaskd/18: > > [ 244.946495] #0: (ptrval) (rcu_read_lock){}, at: > > debug_show_all_locks+0x16/0x190 > > [ 244.948503] 2 locks held by askfirst/235: > > [ 244.949439] #0: (ptrval) (&tty->ldisc_sem){}, at: > > tty_ldisc_ref_wait+0x25/0x50 > > [ 244.951762] #1: (ptrval) (&ldata->atomic_read_lock){+.+.}, at: > > n_tty_read+0x13d/0xa00 > > Here, it just seems to wait for input from the user. > > > [ 244.953799] 1 lock held by validate_data/655: > > [ 244.954814] #0: (ptrval) (&tty->ldisc_sem){}, at: > > tty_ldisc_ref_wait+0x25/0x50 > > [ 244.956764] 1 lock held by dnsmasq/668: > > [ 244.957649] #0: (ptrval) (&tty->ldisc_sem){}, at: > > tty_ldisc_ref_wait+0x25/0x50 > > [ 244.959598] 1 lock held by dropbear/734: > > [ 244.967564] #0: (ptrval) (&tty->ldisc_sem){}, at: > > tty_ldisc_ref_wait+0x25/0x50 > > Hmm, I assume there is another task waiting for write_ldsem and that one > prevents these readers to proceed. Unfortunately, due to the defunct > __ptrval__ pointer hashing crap, we do not see who is waiting for what. > But I am guessing all are the same locks. Hmm, interesting. Am I getting it right that the test did pass before. And now we see that sort of/smells like live-lock right after the introduction of tty_ldisc_lock() to tty_reopen(). > So I think, we are forced to limit the waiting to 5 seconds in reopen in > the end too (the same as we do for new open in tty_init_dev). If I got it right, LKP did test the 5*HZ patch retval = tty_ldisc_lock(tty, 5 * HZ); At least it's In-Reply-To: <20180829022353.23568-3-d...@arista.com> and Message-Id: <20180829022353.23568-3-d...@arista.com> is the patch which does the 5*HZ lock timeout thing. -ss
Re: [LKP] [tty] 0b4f83d510: INFO:task_blocked_for_more_than#seconds
On Fri, 2018-09-07 at 08:39 +0200, Jiri Slaby wrote: > On 09/07/2018, 06:50 AM, kernel test robot wrote: > > FYI, we noticed the following commit (built with gcc-7): > > > > commit: 0b4f83d510f6fef6bb9da25f122c8d733d50516f ("[PATCH 2/4] tty: > > Hold tty_ldisc_lock() during tty_reopen()") > > url: https://github.com/0day-ci/linux/commits/Dmitry-Safonov/tty-Ho > > ld-write-ldisc-sem-in-tty_reopen/20180829-165618 > > base: https://git.kernel.org/cgit/linux/kernel/git/gregkh/tty.git > > tty-testing > > > > in testcase: trinity > > with following parameters: > > > > runtime: 300s > > > > test-description: Trinity is a linux system call fuzz tester. > > test-url: http://codemonkey.org.uk/projects/trinity/ > > > > > > on test machine: qemu-system-x86_64 -enable-kvm -m 256M > > > > caused below changes (please refer to attached dmesg/kmsg for > > entire log/backtrace): > > > > > > +--++ > > + > > > | 58dd163974 | > > > 0b4f83d510 | > > > > +--++ > > + > > > boot_successes | 14 | > > > 4 | > > > boot_failures| 0 | > > > 6 | > > > INFO:task_blocked_for_more_than#seconds | 0 | > > > 6 | > > > Kernel_panic-not_syncing:hung_task:blocked_tasks | 0 | > > > 6 | > > > > +--++ > > + > > > > > > > > [ 244.816801] INFO: task validate_data:655 blocked for more than > > 120 seconds. > > [ 244.818833] Not tainted 4.18.0-11684-g0b4f83d #1 > > [ 244.820028] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" > > disables this message. > > [ 244.826965] validate_data D0 655623 0x2002 > > [ 244.828279] Call Trace: > > [ 244.828958] ? __schedule+0x843/0x950 > > [ 244.830173] ? __ldsem_down_read_nested+0x1c4/0x3b0 > > [ 244.834903] schedule+0x31/0x70 > > [ 244.835665] schedule_timeout+0x34/0x760 > > [ 244.836613] ? ftrace_likely_update+0x35/0x60 > > [ 244.837683] ? __ldsem_down_read_nested+0x1c4/0x3b0 > > [ 244.838818] ? ftrace_likely_update+0x35/0x60 > > [ 244.840127] ? ftrace_likely_update+0x35/0x60 > > [ 244.845947] ? __ldsem_down_read_nested+0x1c4/0x3b0 > > [ 244.847882] __ldsem_down_read_nested+0x23a/0x3b0 > > [ 244.849886] ? tty_ldisc_ref_wait+0x25/0x50 > > [ 244.853807] tty_ldisc_ref_wait+0x25/0x50 > > [ 244.854946] tty_compat_ioctl+0x8a/0x120 > > [ 244.855928] ? this_tty+0x80/0x80 > > [ 244.856742] __ia32_compat_sys_ioctl+0xc28/0x1ce0 > > [ 244.857981] do_int80_syscall_32+0x1d2/0x5f0 > > [ 244.859003] entry_INT80_compat+0x88/0xa0 > > [ 244.859972] INFO: task dnsmasq:668 blocked for more than 120 > > seconds. > > [ 244.868315] Not tainted 4.18.0-11684-g0b4f83d #1 > > [ 244.869583] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" > > disables this message. > > [ 244.871744] dnsmasq D0 668 1 0x2002 > > [ 244.873063] Call Trace: > > [ 244.873697] ? __schedule+0x843/0x950 > > [ 244.874572] ? __ldsem_down_read_nested+0x1c4/0x3b0 > > [ 244.875725] schedule+0x31/0x70 > > [ 244.876576] schedule_timeout+0x34/0x760 > > [ 244.877573] ? ftrace_likely_update+0x35/0x60 > > [ 244.878660] ? __ldsem_down_read_nested+0x1c4/0x3b0 > > [ 244.879872] ? ftrace_likely_update+0x35/0x60 > > [ 244.890522] ? ftrace_likely_update+0x35/0x60 > > [ 244.891572] ? __ldsem_down_read_nested+0x1c4/0x3b0 > > [ 244.892746] __ldsem_down_read_nested+0x23a/0x3b0 > > [ 244.893861] ? tty_ldisc_ref_wait+0x25/0x50 > > [ 244.894841] tty_ldisc_ref_wait+0x25/0x50 > > [ 244.895911] tty_compat_ioctl+0x8a/0x120 > > [ 244.896916] ? this_tty+0x80/0x80 > > [ 244.897717] __ia32_compat_sys_ioctl+0xc28/0x1ce0 > > [ 244.898821] do_int80_syscall_32+0x1d2/0x5f0 > > [ 244.899830] entry_INT80_compat+0x88/0xa0 > > [ 244.909466] INFO: task dropbear:734 blocked for more than 120 > > seconds. > > [ 244.911173] Not tainted 4.18.0-11684-g0b4f83d #1 > > [ 244.912394] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" > > disables this message. > > [ 244.914176] dropbearD0 734 1 0x2002 > > [ 244.915446] Call Trace: > > [ 244.916068] ? __schedule+0x843/0x950 > > [ 244.916945] ? __ldsem_down_read_nested+0x1c4/0x3b0 > > [ 244.918076] schedule+0x31/0x70 > > [ 244.918832] schedule_timeout+0x34/0x760 > > [ 244.919781] ? ftrace_likely_update+0x35/0x60 > > [ 244.921104] ? __ldsem_down_read_nested+0x1c4/0x3b0 > > [ 244.922304] ? ftrace_likely_update+0x35/0x60 > > [ 244.923347] ? ftrace_likely_update+0x35/0x60 > > [ 244.924369] ? __ldsem_down_read_nested+0x1c4/0x3b0 > > [ 244.925496] __ldsem_down_read_nested+0x23a/0x3b0 > > [ 244.926598] ? tty_ldisc_ref_wait+0x25/0x50 > > [ 244.927578] tty_ldisc_r
Re: [LKP] [tty] 0b4f83d510: INFO:task_blocked_for_more_than#seconds
On 09/07/2018, 06:50 AM, kernel test robot wrote: > FYI, we noticed the following commit (built with gcc-7): > > commit: 0b4f83d510f6fef6bb9da25f122c8d733d50516f ("[PATCH 2/4] tty: Hold > tty_ldisc_lock() during tty_reopen()") > url: > https://github.com/0day-ci/linux/commits/Dmitry-Safonov/tty-Hold-write-ldisc-sem-in-tty_reopen/20180829-165618 > base: https://git.kernel.org/cgit/linux/kernel/git/gregkh/tty.git tty-testing > > in testcase: trinity > with following parameters: > > runtime: 300s > > test-description: Trinity is a linux system call fuzz tester. > test-url: http://codemonkey.org.uk/projects/trinity/ > > > on test machine: qemu-system-x86_64 -enable-kvm -m 256M > > caused below changes (please refer to attached dmesg/kmsg for entire > log/backtrace): > > > +--+++ > | | 58dd163974 | 0b4f83d510 | > +--+++ > | boot_successes | 14 | 4 | > | boot_failures| 0 | 6 | > | INFO:task_blocked_for_more_than#seconds | 0 | 6 | > | Kernel_panic-not_syncing:hung_task:blocked_tasks | 0 | 6 | > +--+++ > > > > [ 244.816801] INFO: task validate_data:655 blocked for more than 120 seconds. > [ 244.818833] Not tainted 4.18.0-11684-g0b4f83d #1 > [ 244.820028] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables > this message. > [ 244.826965] validate_data D0 655623 0x2002 > [ 244.828279] Call Trace: > [ 244.828958] ? __schedule+0x843/0x950 > [ 244.830173] ? __ldsem_down_read_nested+0x1c4/0x3b0 > [ 244.834903] schedule+0x31/0x70 > [ 244.835665] schedule_timeout+0x34/0x760 > [ 244.836613] ? ftrace_likely_update+0x35/0x60 > [ 244.837683] ? __ldsem_down_read_nested+0x1c4/0x3b0 > [ 244.838818] ? ftrace_likely_update+0x35/0x60 > [ 244.840127] ? ftrace_likely_update+0x35/0x60 > [ 244.845947] ? __ldsem_down_read_nested+0x1c4/0x3b0 > [ 244.847882] __ldsem_down_read_nested+0x23a/0x3b0 > [ 244.849886] ? tty_ldisc_ref_wait+0x25/0x50 > [ 244.853807] tty_ldisc_ref_wait+0x25/0x50 > [ 244.854946] tty_compat_ioctl+0x8a/0x120 > [ 244.855928] ? this_tty+0x80/0x80 > [ 244.856742] __ia32_compat_sys_ioctl+0xc28/0x1ce0 > [ 244.857981] do_int80_syscall_32+0x1d2/0x5f0 > [ 244.859003] entry_INT80_compat+0x88/0xa0 > [ 244.859972] INFO: task dnsmasq:668 blocked for more than 120 seconds. > [ 244.868315] Not tainted 4.18.0-11684-g0b4f83d #1 > [ 244.869583] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables > this message. > [ 244.871744] dnsmasq D0 668 1 0x2002 > [ 244.873063] Call Trace: > [ 244.873697] ? __schedule+0x843/0x950 > [ 244.874572] ? __ldsem_down_read_nested+0x1c4/0x3b0 > [ 244.875725] schedule+0x31/0x70 > [ 244.876576] schedule_timeout+0x34/0x760 > [ 244.877573] ? ftrace_likely_update+0x35/0x60 > [ 244.878660] ? __ldsem_down_read_nested+0x1c4/0x3b0 > [ 244.879872] ? ftrace_likely_update+0x35/0x60 > [ 244.890522] ? ftrace_likely_update+0x35/0x60 > [ 244.891572] ? __ldsem_down_read_nested+0x1c4/0x3b0 > [ 244.892746] __ldsem_down_read_nested+0x23a/0x3b0 > [ 244.893861] ? tty_ldisc_ref_wait+0x25/0x50 > [ 244.894841] tty_ldisc_ref_wait+0x25/0x50 > [ 244.895911] tty_compat_ioctl+0x8a/0x120 > [ 244.896916] ? this_tty+0x80/0x80 > [ 244.897717] __ia32_compat_sys_ioctl+0xc28/0x1ce0 > [ 244.898821] do_int80_syscall_32+0x1d2/0x5f0 > [ 244.899830] entry_INT80_compat+0x88/0xa0 > [ 244.909466] INFO: task dropbear:734 blocked for more than 120 seconds. > [ 244.911173] Not tainted 4.18.0-11684-g0b4f83d #1 > [ 244.912394] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables > this message. > [ 244.914176] dropbearD0 734 1 0x2002 > [ 244.915446] Call Trace: > [ 244.916068] ? __schedule+0x843/0x950 > [ 244.916945] ? __ldsem_down_read_nested+0x1c4/0x3b0 > [ 244.918076] schedule+0x31/0x70 > [ 244.918832] schedule_timeout+0x34/0x760 > [ 244.919781] ? ftrace_likely_update+0x35/0x60 > [ 244.921104] ? __ldsem_down_read_nested+0x1c4/0x3b0 > [ 244.922304] ? ftrace_likely_update+0x35/0x60 > [ 244.923347] ? ftrace_likely_update+0x35/0x60 > [ 244.924369] ? __ldsem_down_read_nested+0x1c4/0x3b0 > [ 244.925496] __ldsem_down_read_nested+0x23a/0x3b0 > [ 244.926598] ? tty_ldisc_ref_wait+0x25/0x50 > [ 244.927578] tty_ldisc_ref_wait+0x25/0x50 > [ 244.928526] tty_compat_ioctl+0x8a/0x120 > [ 244.929449] ? this_tty+0x80/0x80 > [ 244.930240] __ia32_compat_sys_ioctl+0xc28/0x1ce0 > [ 244.940083] do_int80_syscall_32+0x1d2/0x5f0 > [ 244.941310] entry_INT80_compat+0x88/0xa0 > [ 244.944070] > [ 244.944070] Showing all locks held in
[LKP] [tty] 0b4f83d510: INFO:task_blocked_for_more_than#seconds
FYI, we noticed the following commit (built with gcc-7): commit: 0b4f83d510f6fef6bb9da25f122c8d733d50516f ("[PATCH 2/4] tty: Hold tty_ldisc_lock() during tty_reopen()") url: https://github.com/0day-ci/linux/commits/Dmitry-Safonov/tty-Hold-write-ldisc-sem-in-tty_reopen/20180829-165618 base: https://git.kernel.org/cgit/linux/kernel/git/gregkh/tty.git tty-testing in testcase: trinity with following parameters: runtime: 300s test-description: Trinity is a linux system call fuzz tester. test-url: http://codemonkey.org.uk/projects/trinity/ on test machine: qemu-system-x86_64 -enable-kvm -m 256M caused below changes (please refer to attached dmesg/kmsg for entire log/backtrace): +--+++ | | 58dd163974 | 0b4f83d510 | +--+++ | boot_successes | 14 | 4 | | boot_failures| 0 | 6 | | INFO:task_blocked_for_more_than#seconds | 0 | 6 | | Kernel_panic-not_syncing:hung_task:blocked_tasks | 0 | 6 | +--+++ [ 244.816801] INFO: task validate_data:655 blocked for more than 120 seconds. [ 244.818833] Not tainted 4.18.0-11684-g0b4f83d #1 [ 244.820028] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. [ 244.826965] validate_data D0 655623 0x2002 [ 244.828279] Call Trace: [ 244.828958] ? __schedule+0x843/0x950 [ 244.830173] ? __ldsem_down_read_nested+0x1c4/0x3b0 [ 244.834903] schedule+0x31/0x70 [ 244.835665] schedule_timeout+0x34/0x760 [ 244.836613] ? ftrace_likely_update+0x35/0x60 [ 244.837683] ? __ldsem_down_read_nested+0x1c4/0x3b0 [ 244.838818] ? ftrace_likely_update+0x35/0x60 [ 244.840127] ? ftrace_likely_update+0x35/0x60 [ 244.845947] ? __ldsem_down_read_nested+0x1c4/0x3b0 [ 244.847882] __ldsem_down_read_nested+0x23a/0x3b0 [ 244.849886] ? tty_ldisc_ref_wait+0x25/0x50 [ 244.853807] tty_ldisc_ref_wait+0x25/0x50 [ 244.854946] tty_compat_ioctl+0x8a/0x120 [ 244.855928] ? this_tty+0x80/0x80 [ 244.856742] __ia32_compat_sys_ioctl+0xc28/0x1ce0 [ 244.857981] do_int80_syscall_32+0x1d2/0x5f0 [ 244.859003] entry_INT80_compat+0x88/0xa0 [ 244.859972] INFO: task dnsmasq:668 blocked for more than 120 seconds. [ 244.868315] Not tainted 4.18.0-11684-g0b4f83d #1 [ 244.869583] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. [ 244.871744] dnsmasq D0 668 1 0x2002 [ 244.873063] Call Trace: [ 244.873697] ? __schedule+0x843/0x950 [ 244.874572] ? __ldsem_down_read_nested+0x1c4/0x3b0 [ 244.875725] schedule+0x31/0x70 [ 244.876576] schedule_timeout+0x34/0x760 [ 244.877573] ? ftrace_likely_update+0x35/0x60 [ 244.878660] ? __ldsem_down_read_nested+0x1c4/0x3b0 [ 244.879872] ? ftrace_likely_update+0x35/0x60 [ 244.890522] ? ftrace_likely_update+0x35/0x60 [ 244.891572] ? __ldsem_down_read_nested+0x1c4/0x3b0 [ 244.892746] __ldsem_down_read_nested+0x23a/0x3b0 [ 244.893861] ? tty_ldisc_ref_wait+0x25/0x50 [ 244.894841] tty_ldisc_ref_wait+0x25/0x50 [ 244.895911] tty_compat_ioctl+0x8a/0x120 [ 244.896916] ? this_tty+0x80/0x80 [ 244.897717] __ia32_compat_sys_ioctl+0xc28/0x1ce0 [ 244.898821] do_int80_syscall_32+0x1d2/0x5f0 [ 244.899830] entry_INT80_compat+0x88/0xa0 [ 244.909466] INFO: task dropbear:734 blocked for more than 120 seconds. [ 244.911173] Not tainted 4.18.0-11684-g0b4f83d #1 [ 244.912394] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. [ 244.914176] dropbearD0 734 1 0x2002 [ 244.915446] Call Trace: [ 244.916068] ? __schedule+0x843/0x950 [ 244.916945] ? __ldsem_down_read_nested+0x1c4/0x3b0 [ 244.918076] schedule+0x31/0x70 [ 244.918832] schedule_timeout+0x34/0x760 [ 244.919781] ? ftrace_likely_update+0x35/0x60 [ 244.921104] ? __ldsem_down_read_nested+0x1c4/0x3b0 [ 244.922304] ? ftrace_likely_update+0x35/0x60 [ 244.923347] ? ftrace_likely_update+0x35/0x60 [ 244.924369] ? __ldsem_down_read_nested+0x1c4/0x3b0 [ 244.925496] __ldsem_down_read_nested+0x23a/0x3b0 [ 244.926598] ? tty_ldisc_ref_wait+0x25/0x50 [ 244.927578] tty_ldisc_ref_wait+0x25/0x50 [ 244.928526] tty_compat_ioctl+0x8a/0x120 [ 244.929449] ? this_tty+0x80/0x80 [ 244.930240] __ia32_compat_sys_ioctl+0xc28/0x1ce0 [ 244.940083] do_int80_syscall_32+0x1d2/0x5f0 [ 244.941310] entry_INT80_compat+0x88/0xa0 [ 244.944070] [ 244.944070] Showing all locks held in the system: [ 244.945558] 1 lock held by khungtaskd/18: [ 244.946495] #0: (ptrval) (rcu_read_lock){}, at: debug_show_all_locks+0x16/0x190 [ 244.948503] 2 locks held by askfirst/235: [ 244.949439] #0: (ptrval) (&tty->ldisc_sem){+++