Hello Tejun,

Thank you for the detailed analysis.

On Wed, Apr 23, 2026, Tejun Heo wrote:
> The problem with this theory is that this kworker, while preempted, is still
> runnable and should be dispatched to its CPU once it becomes available
> again. Workqueue doesn't care whether the task gets preempted or when it
> gets the CPU back. It only cares about whether the task enters blocking
> state (!runnable). A task which is preempted, even on the way to blocking,
> still is runnable and should get put back on the CPU by the scheduler.
>
> If you can take a crashdump of the deadlocked state, can you see whether the
> task is still on the scheduler's runqueue?

I instrumented show_one_worker_pool() to dump scheduler state for each busy
worker when the pool has been hung for >30 seconds. All workers show on_rq=0.

== Pool state ==

  pool 2: cpus=0 node=0 flags=0x0 nice=0 hung=47s workers=13 nr_running=1 nr_idle=7

== Per-worker scheduler state (first dump at t=62.5s) ==

  PID  | state | on_rq | se.on_rq | sched_delayed | sleeping | blocked_on
  -----|-------|-------|----------|---------------|----------|------------------------
  4819 | 0x2   | 0     | 0        | 0             | 1        | ffff953608205210 type=1
  4823 | 0x2   | 0     | 0        | 0             | 1        | ffff953608205210 type=1
  4818 | 0x2   | 0     | 0        | 0             | 0        | ffff953608205210 type=1
    11 | 0x2   | 0     | 0        | 0             | 1        | ffff953608205210 type=1
     9 | 0x2   | 0     | 0        | 0             | 1        | ffff953608205210 type=1
  4814 | 0x2   | 0     | 0        | 0             | 1        | (mutex holder)

All 6 workers are in kvm-irqfd-cleanup, calling irqfd_shutdown →
irqfd_resampler_shutdown. They contend on the same resampler->lock mutex
(ffff953608205210).

Full logs:
https://gist.github.com/sonam-sanju/08042878542b7a58d2818e6076554211

Thanks,
Sonam

