On Thu, Apr 23, 2026 at 5:05 AM Sonam Sanju <[email protected]> wrote:
>
> Hello Tejun,
>
> Thank you for the detailed analysis.
>
> On Wed, Apr 23, 2026, Tejun Heo wrote:
> > The problem with this theory is that this kworker, while preempted, is still
> > runnable and should be dispatched to its CPU once it becomes available
> > again. Workqueue doesn't care whether the task gets preempted or when it
> > gets the CPU back. It only cares about whether the task enters blocking
> > state (!runnable). A task which is preempted, even on the way to blocking,
> > still is runnable and should get put back on the CPU by the scheduler.
> >
> > If you can take a crashdump of the deadlocked state, can you see whether the
> > task is still on the scheduler's runqueue?
>
> I instrumented show_one_worker_pool() to dump scheduler state for each busy
> worker when the pool has been hung for >30 seconds.
>
> All workers show on_rq=0.
>
> == Pool state ==
>
>   pool 2: cpus=0 node=0 flags=0x0 nice=0 hung=47s
>   workers=13 nr_running=1 nr_idle=7
>
> == Per-worker scheduler state (first dump at t=62.5s) ==
>
>   PID  | state | on_rq | se.on_rq | sched_delayed | sleeping | blocked_on
>   -----|-------|-------|----------|---------------|----------|------------------------
>   4819 | 0x2   | 0     | 0        | 0             | 1        | ffff953608205210 type=1
>   4823 | 0x2   | 0     | 0        | 0             | 1        | ffff953608205210 type=1
>   4818 | 0x2   | 0     | 0        | 0             | 0        | ffff953608205210 type=1
>   11   | 0x2   | 0     | 0        | 0             | 1        | ffff953608205210 type=1
>   9    | 0x2   | 0     | 0        | 0             | 1        | ffff953608205210 type=1
>   4814 | 0x2   | 0     | 0        | 0             | 1        | (mutex holder)
>
>
> All 6 workers are in kvm-irqfd-cleanup, calling irqfd_shutdown →
> irqfd_resampler_shutdown. They contend on the same resampler->lock
> mutex (ffff953608205210).
>

Sorry for the late disclosure: I was running the 6.18 Android kernel
and missed this relevant detail because the bug discussion initially
started with KVM, and I had verified that the irqfd-related code was
the same as in the vanilla kernel. Now, after going through Tejun's
response and reviewing the __schedule() code regarding SM_PREEMPT, I
realized the Android kernel has extra logic related to proxy execution
that might be triggering this issue. I tested on the vanilla 6.18.23
kernel and was not able to reproduce this.

Sonam, just checking if you are able to reproduce this issue with the
vanilla 6.18 kernel?

Thanks,
Vineeth
