Hello Tejun,

Thank you for the detailed analysis.

On Wed, Apr 23, 2026, Tejun Heo wrote:
> The problem with this theory is that this kworker, while preempted, is still
> runnable and should be dispatched to its CPU once it becomes available
> again. Workqueue doesn't care whether the task gets preempted or when it
> gets the CPU back. It only cares about whether the task enters blocking
> state (!runnable). A task which is preempted, even on the way to blocking,
> still is runnable and should get put back on the CPU by the scheduler.
>
> If you can take a crashdump of the deadlocked state, can you see whether the
> task is still on the scheduler's runqueue?

I instrumented show_one_worker_pool() to dump scheduler state for each busy
worker whenever the pool has been hung for more than 30 seconds.
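For reference, the instrumentation is roughly the following. This is a
sketch, not the exact patch: the helper name dump_worker_sched_state() is
mine, and fields such as p->sched_delayed and p->blocked_on vary across
kernel versions, so treat them as placeholders for whatever your tree
exposes.

```c
/*
 * Sketch: called from show_one_worker_pool() in kernel/workqueue.c for
 * each busy worker once the pool's hung time exceeds the threshold.
 * Field availability (sched_delayed, blocked_on) depends on the tree.
 */
static void dump_worker_sched_state(struct task_struct *p)
{
	pr_info("  %d | 0x%x | %d | %d | %d | %px\n",
		task_pid_nr(p),
		READ_ONCE(p->__state),	/* 0x2 == TASK_UNINTERRUPTIBLE */
		p->on_rq,		/* is the task on a runqueue?   */
		p->se.on_rq,		/* is the sched entity enqueued? */
		p->sched_delayed,	/* delayed-dequeue pending?      */
		p->blocked_on);		/* lock the task is waiting on   */
}
```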

All busy workers show on_rq=0, i.e. none of them is on a runqueue.

== Pool state ==

  pool 2: cpus=0 node=0 flags=0x0 nice=0 hung=47s
  workers=13 nr_running=1 nr_idle=7

== Per-worker scheduler state (first dump at t=62.5s) ==

  PID  | state | on_rq | se.on_rq | sched_delayed | sleeping | blocked_on
  -----|-------|-------|----------|---------------|----------|-------------------------
  4819 | 0x2   | 0     | 0        | 0             | 1        | ffff953608205210 type=1
  4823 | 0x2   | 0     | 0        | 0             | 1        | ffff953608205210 type=1
  4818 | 0x2   | 0     | 0        | 0             | 0        | ffff953608205210 type=1
  11   | 0x2   | 0     | 0        | 0             | 1        | ffff953608205210 type=1
  9    | 0x2   | 0     | 0        | 0             | 1        | ffff953608205210 type=1
  4814 | 0x2   | 0     | 0        | 0             | 1        | (mutex holder)

All six workers are running kvm-irqfd-cleanup work items, in
irqfd_shutdown -> irqfd_resampler_shutdown. Five of them are blocked on
the same resampler->lock mutex (ffff953608205210); PID 4814 is the
holder.

Full logs: https://gist.github.com/sonam-sanju/08042878542b7a58d2818e6076554211

Thanks,
Sonam
