> Could you provide a time-aligned dump that includes:
> - pwq state (active/pending/in-flight)
> - pending and in-flight work items with their queue/start times
> - worker task states
Below are time-aligned extracts from both instances. Full logs are
included further down in this email.
=== Instance 1: kernel 6.18.8, pool 14 (cpus=3) ===
--- t=62s: First workqueue lockup dump (pool stuck 49s, since ~t=13s) ---
kvm-irqfd-cleanup: pwq 14: active=4 refcnt=5
in-flight: 157:irqfd_shutdown, 4044:irqfd_shutdown,
102:irqfd_shutdown, 39:irqfd_shutdown
rcu_gp: pwq 14: active=2 refcnt=3
pending: 2*process_srcu
events: pwq 14: active=43 refcnt=44
pending: binder_deferred_func, kernfs_notify_workfn,
delayed_vfree_work, 5*destroy_super_work,
3*bpf_prog_free_deferred, 10*destroy_super_work, ...
mm_percpu_wq: pwq 14: active=2 refcnt=4
pending: vmstat_update, lru_add_drain_per_cpu
pm: pwq 14: active=1 refcnt=2
pending: pm_runtime_work
pool 14: cpus=3 flags=0x0 hung=49s workers=11
idle: 4046 4038 4045 4039 4043 156 77 (7 idle)
Active busy worker backtrace (pid 102):
__schedule → schedule → schedule_preempt_disabled →
__mutex_lock → irqfd_resampler_shutdown+0x23 →
irqfd_shutdown → process_scheduled_works → worker_thread
--- t=312s: Last workqueue lockup dump (pool stuck 298s) ---
kvm-irqfd-cleanup: pwq 14: active=4 (same 4 in-flight)
rcu_gp: pwq 14: pending: 2*process_srcu (still pending, 250s later)
events: pwq 14: active=43 (same, no progress)
pool 14: hung=298s workers=11 idle: 4046 4038 4045 4039 4043 156 77
--- t=314s: Hung task dump ---
Worker 4044 (MUTEX HOLDER):
task:kworker/3:8 state:D pid:4044
Workqueue: kvm-irqfd-cleanup irqfd_shutdown
__synchronize_srcu+0x100/0x130
irqfd_resampler_shutdown+0xf0/0x150 ← synchronize_srcu call
Worker 157 (MUTEX WAITER):
task:kworker/3:4 state:D pid:157
__mutex_lock+0x409/0xd90
irqfd_resampler_shutdown+0x23/0x150 ← mutex_lock call
(Workers 39 and 102 show identical mutex_lock stacks)
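For anyone who wants to cross-check the worker accounting in the Instance 1 dump, here is a rough sketch in Python. It is not part of the kernel output; the helper name busy_workers and the constant names are mine, and the input strings are copied from the t=62s extract above. It only checks that workers minus idle equals the number of in-flight irqfd_shutdown items, i.e. that every non-idle worker in pool 14 is one of the four blocked ones.

```python
import re

# Lines copied from the Instance 1 dump at t=62s (see the extract above).
POOL_LINE = "pool 14: cpus=3 flags=0x0 hung=49s workers=11"
IDLE_LINE = "idle: 4046 4038 4045 4039 4043 156 77"
# PIDs listed as in-flight under kvm-irqfd-cleanup in the same dump.
IN_FLIGHT_PIDS = [157, 4044, 102, 39]


def busy_workers(pool_line, idle_line):
    """Return (busy_count, idle_pids) derived from the two dump lines."""
    total = int(re.search(r"workers=(\d+)", pool_line).group(1))
    idle_pids = [int(p) for p in idle_line.split(":", 1)[1].split()]
    return total - len(idle_pids), idle_pids


busy, idle = busy_workers(POOL_LINE, IDLE_LINE)
# Busy workers should match the in-flight count, and no in-flight
# PID should also appear in the idle list.
print(busy, len(IN_FLIGHT_PIDS), set(IN_FLIGHT_PIDS).isdisjoint(idle))
```

The same check can be applied to Instance 2 by substituting its pool and idle lines.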
=== Instance 2: kernel 6.18.2, pool 22 (cpus=5) ===
--- t=93s: First workqueue lockup dump (pool stuck 79s, since ~t=14s) ---
kvm-irqfd-cleanup: pwq 22: active=4 refcnt=5
in-flight: 151:irqfd_shutdown, 4246:irqfd_shutdown,
4241:irqfd_shutdown, 4243:irqfd_shutdown
rcu_gp: pwq 22: active=1 refcnt=2
pending: process_srcu
events: pwq 22: active=56 refcnt=57
pending: kernfs_notify_workfn, delayed_vfree_work,
binder_deferred_func, 47*destroy_super_work, ...
pool 22: cpus=5 flags=0x0 hung=79s workers=12
idle: 4242 51 4248 4247 4245 435 4244 4239 (8 idle)
--- t=341s: Last workqueue lockup dump (pool stuck 327s) ---
kvm-irqfd-cleanup: pwq 22: active=4 (same)
rcu_gp: pwq 22: pending: process_srcu (still pending, 248s later)
events: pwq 22: active=56 (56 pending items, zero progress)
pool 22: hung=327s workers=12 idle: same 8 workers
--- t=343s: Hung task dump ---
Worker 4241 (MUTEX HOLDER):
task:kworker/5:4 state:D pid:4241
Workqueue: kvm-irqfd-cleanup irqfd_shutdown
__synchronize_srcu+0x100/0x130
irqfd_resampler_shutdown+0xf0/0x150
Worker 4243 (MUTEX WAITER):
task:kworker/5:6 state:D pid:4243
__mutex_lock+0x37d/0xbb0
irqfd_resampler_shutdown+0x23/0x150
(Workers 151 and 4246 show identical mutex_lock stacks)
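The timeline figures quoted in the dump headers can be sanity-checked with a few lines of Python. This is only a sketch: the DUMPS table and the stall_onsets helper are names I made up, and the numbers are the (dump time, hung seconds) pairs from the headers above. Stall onset is taken as dump time minus the reported hung time.

```python
# (dump time in seconds, reported "pool stuck" seconds), taken from the
# first and last workqueue lockup dump headers of each instance.
DUMPS = {
    "instance 1": [(62, 49), (312, 298)],
    "instance 2": [(93, 79), (341, 327)],
}


def stall_onsets(dumps):
    """Estimate when the pool stalled: dump time minus hung duration."""
    return [t - hung for t, hung in dumps]


for name, dumps in DUMPS.items():
    first, last = dumps
    elapsed = last[0] - first[0]  # time between first and last dump
    print(name, stall_onsets(dumps), elapsed)
```

Both dumps of each instance point at a stall onset of roughly t=13s to t=14s, and the gaps between first and last dump come out as 250s and 248s, matching the "still pending" notes above.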
> Please post sanitized ramoops/dmesg logs on-list so others can
> validate.
Full logs: https://gist.github.com/sonam-sanju/773855aa2cbe156ca19f3a87bbebc15e
Thanks,
Sonam