Hey Paul, Lai, Josh, and the RCU list and Jan/FS list -
Reaching out about a tricky hung task issue that I'm running into. I've
got a virtualized Linux guest on top of a KVM based platform, running
a 5.10.y based kernel. The issue we're running into is a hung task that
*only* happens on shutdown/reboot of this particular VM once every 
20-50 times.

The signature of the hung task is always similar to the output below,
where we appear to hang on the call to 
    synchronize_srcu(&fsnotify_mark_srcu)
in fsnotify_connector_destroy_workfn / fsnotify_mark_destroy_workfn,
where two kernel threads are both calling synchronize_srcu, then
scheduling out in wait_for_completion, and completely going out to
lunch for over 4 minutes. This then triggers the hung task timeout and
things blow up.

We are running audit=1 for this system and are using an el8 based
userspace.

I've flipped through the fs/notify code base for both 5.10 as well as
upstream mainline to see if something jumped off the page, and I
haven't yet spotted any particular suspect code from the caller side.

This hang appears to come up at the very end of the shutdown/reboot
process, seemingly after the system starts to unwind through initrd.

What I'm working on now is adding some instrumentation to the dracut
shutdown initrd scripts to see if I can how far we get down that path
before the system fails to make forward progress, which may give some
hints. TBD on that. I've also enabled lockdep with CONFIG_PROVE_RCU and
a plethora of DEBUG options [2], and didn't get anything interesting.
To be clear, we haven't seen lockdep spit out any complaints as of yet.

Reaching out to see if this sounds familar to anyone on the list, or if
there are any particular areas of the RCU code base that might be
suspect for this kind of issue. I'm happy to provide more information,
as frankly, I'm quite stumped at the moment.

Thanks all,
Jon

[1] panic trace
    Normal shutdown process, then hangs on the following:
    ...
    dracut Warning: Killing all remaining processes
    ...
    INFO: task kworker/u20:7:1200701 blocked for more than 241 seconds.
          Tainted: G           O      5.10.205-2.el8.x86_64 #1
    "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
    task:kworker/u20:7   state:D stack:    0 pid:1200701 ppid:     2 
flags:0x00004080
    Workqueue: events_unbound fsnotify_connector_destroy_workfn
    Call Trace:
     __schedule+0x267/0x790
     schedule+0x3c/0xb0
     schedule_timeout+0x219/0x2b0
     wait_for_completion+0x9e/0x100
     __synchronize_srcu.part.24+0x83/0xb0
     ? __bpf_trace_rcu_utilization+0x10/0x10
     ? synchronize_srcu+0x5d/0xf0
     fsnotify_connector_destroy_workfn+0x46/0x80
     process_one_work+0x1fc/0x390
     worker_thread+0x2d/0x3e0
     ? process_one_work+0x390/0x390
     kthread+0x114/0x130
     ? kthread_park+0x80/0x80
     ret_from_fork+0x1f/0x30
    INFO: task kworker/u20:8:1287360 blocked for more than 241 seconds.
          Tainted: G           O      5.10.205-2.el8.x86_64 #1
    "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
    task:kworker/u20:8   state:D stack:    0 pid:1287360 ppid:     2 
flags:0x00004080
    Workqueue: events_unbound fsnotify_mark_destroy_workfn
    Call Trace:
     __schedule+0x267/0x790
     schedule+0x3c/0xb0
     schedule_timeout+0x219/0x2b0
     ? add_timer+0x14a/0x200
     wait_for_completion+0x9e/0x100
     __synchronize_srcu.part.24+0x83/0xb0
     ? __bpf_trace_rcu_utilization+0x10/0x10
     fsnotify_mark_destroy_workfn+0x77/0xe0
     process_one_work+0x1fc/0x390
     ? process_one_work+0x390/0x390
     worker_thread+0x2d/0x3e0
     ? process_one_work+0x390/0x390
     kthread+0x114/0x130
     ? kthread_park+0x80/0x80
     ret_from_fork+0x1f/0x30
    Kernel panic - not syncing: hung_task: blocked tasks
    CPU: 1 PID: 64 Comm: khungtaskd Kdump: loaded Tainted: G           O      
5.10.205-2.el8.x86_64 #1
    Hardware name: Red Hat KVM, BIOS 20230302.1.2662.el8 04/01/2014
    Call Trace:
     dump_stack+0x6d/0x8c
     panic+0x114/0x2ea
     watchdog.cold.8+0xb5/0xb5
     ? hungtask_pm_notify+0x50/0x50
     kthread+0x114/0x130
     ? kthread_park+0x80/0x80
     ret_from_fork+0x1f/0x30

[2] additional debugging config knobs turned up.
    CONFIG_PROVE_LOCKING=y
    CONFIG_LOCK_STAT=y
    CONFIG_DEBUG_RT_MUTEXES=y
    CONFIG_DEBUG_SPINLOCK=y
    CONFIG_DEBUG_MUTEXES=y
    CONFIG_DEBUG_WW_MUTEX_SLOWPATH=y
    CONFIG_DEBUG_RWSEMS=y
    CONFIG_DEBUG_LOCK_ALLOC=y
    CONFIG_LOCKDEP=y
    CONFIG_LOCKDEP_BITS=15
    CONFIG_LOCKDEP_CHAINS_BITS=16
    CONFIG_LOCKDEP_STACK_TRACE_BITS=19
    CONFIG_LOCKDEP_STACK_TRACE_HASH_BITS=14
    CONFIG_LOCKDEP_CIRCULAR_QUEUE_BITS=12
    CONFIG_DEBUG_SHIRQ=y
    CONFIG_WQ_WATCHDOG=y
    CONFIG_DEBUG_ATOMIC_SLEEP=y
    CONFIG_DEBUG_LIST=y
    CONFIG_DEBUG_PLIST=y
    CONFIG_DEBUG_SG=y
    CONFIG_DEBUG_NOTIFIERS=y
    CONFIG_BUG_ON_DATA_CORRUPTION=y

Reply via email to