On Thu, Apr 23, 2026 at 5:05 AM Sonam Sanju <[email protected]> wrote:
>
> Hello Tejun,
>
> Thank you for the detailed analysis.
>
> On Wed, Apr 23, 2026, Tejun Heo wrote:
> > The problem with this theory is that this kworker, while preempted, is still
> > runnable and should be dispatched to its CPU once it becomes available
> > again. Workqueue doesn't care whether the task gets preempted or when it
> > gets the CPU back. It only cares about whether the task enters blocking
> > state (!runnable). A task which is preempted, even on the way to blocking,
> > still is runnable and should get put back on the CPU by the scheduler.
> >
> > If you can take a crashdump of the deadlocked state, can you see whether the
> > task is still on the scheduler's runqueue?
>
> I instrumented show_one_worker_pool() to dump scheduler state for each busy
> worker when the pool has been hung for >30 seconds.
>
> All workers show on_rq=0.
>
> == Pool state ==
>
> pool 2: cpus=0 node=0 flags=0x0 nice=0 hung=47s
> workers=13 nr_running=1 nr_idle=7
>
> == Per-worker scheduler state (first dump at t=62.5s) ==
>
> PID  | state | on_rq | se.on_rq | sched_delayed | sleeping | blocked_on
> -----|-------|-------|----------|---------------|----------|------------------------
> 4819 | 0x2   | 0     | 0        | 0             | 1        | ffff953608205210 type=1
> 4823 | 0x2   | 0     | 0        | 0             | 1        | ffff953608205210 type=1
> 4818 | 0x2   | 0     | 0        | 0             | 0        | ffff953608205210 type=1
> 11   | 0x2   | 0     | 0        | 0             | 1        | ffff953608205210 type=1
> 9    | 0x2   | 0     | 0        | 0             | 1        | ffff953608205210 type=1
> 4814 | 0x2   | 0     | 0        | 0             | 1        | (mutex holder)
>
> All 6 workers are in kvm-irqfd-cleanup, calling irqfd_shutdown →
> irqfd_resampler_shutdown. They contend on the same resampler->lock
> mutex (ffff953608205210).
>

Sorry for the late disclosure: I was running the 6.18 Android kernel. I
missed this detail because the discussion initially started with KVM, and
I had verified that the irqfd-related code was the same as in the vanilla
kernel.

After going through Tejun's response and reviewing the SM_PREEMPT handling
in __schedule(), I realized that the Android kernel has extra logic
related to proxy execution that might be triggering this issue. I tested
on the vanilla 6.18.23 kernel and was not able to reproduce it there.

Sonam, are you able to reproduce this issue with the vanilla 6.18 kernel?

Thanks,
Vineeth

