> Could you provide a time-aligned dump that includes:
>   - pwq state (active/pending/in-flight)
>   - pending and in-flight work items with their queue/start times
>   - worker task states

Below are time-aligned extracts from both instances.  Full logs are
included further down in this email.

=== Instance 1: kernel 6.18.8, pool 14 (cpus=3) ===

--- t=62s: First workqueue lockup dump (pool stuck 49s, since ~t=13s) ---

  kvm-irqfd-cleanup: pwq 14: active=4 refcnt=5
    in-flight: 157:irqfd_shutdown ,4044:irqfd_shutdown ,
               102:irqfd_shutdown ,39:irqfd_shutdown

  rcu_gp: pwq 14: active=2 refcnt=3
    pending: 2*process_srcu

  events: pwq 14: active=43 refcnt=44
    pending: binder_deferred_func, kernfs_notify_workfn,
             delayed_vfree_work, 5*destroy_super_work,
             3*bpf_prog_free_deferred, 10*destroy_super_work, ...

  mm_percpu_wq: pwq 14: active=2 refcnt=4
    pending: vmstat_update, lru_add_drain_per_cpu

  pm: pwq 14: active=1 refcnt=2
    pending: pm_runtime_work

  pool 14: cpus=3 flags=0x0 hung=49s workers=11
    idle: 4046 4038 4045 4039 4043 156 77  (7 idle)

  Active busy worker backtrace (pid 102):
    __schedule → schedule → schedule_preempt_disabled →
    __mutex_lock → irqfd_resampler_shutdown+0x23 →
    irqfd_shutdown → process_scheduled_works → worker_thread

--- t=312s: Last workqueue lockup dump (pool stuck 298s) ---

  kvm-irqfd-cleanup: pwq 14: active=4 (same 4 in-flight)
  rcu_gp: pwq 14: pending: 2*process_srcu  (still pending, 250s later)
  events: pwq 14: active=43  (same, no progress)
  pool 14: hung=298s workers=11 idle: 4046 4038 4045 4039 4043 156 77

--- t=314s: Hung task dump ---

  Worker 4044 (MUTEX HOLDER):
    task:kworker/3:8   state:D  pid:4044
    Workqueue: kvm-irqfd-cleanup irqfd_shutdown
      __synchronize_srcu+0x100/0x130
      irqfd_resampler_shutdown+0xf0/0x150  ← synchronize_srcu call

  Worker 157 (MUTEX WAITER):
    task:kworker/3:4   state:D  pid:157
      __mutex_lock+0x409/0xd90
      irqfd_resampler_shutdown+0x23/0x150  ← mutex_lock call

  (Workers 39 and 102 show identical mutex_lock stacks)

=== Instance 2: kernel 6.18.2, pool 22 (cpus=5) ===

--- t=93s: First workqueue lockup dump (pool stuck 79s, since ~t=14s) ---

  kvm-irqfd-cleanup: pwq 22: active=4 refcnt=5
    in-flight: 151:irqfd_shutdown ,4246:irqfd_shutdown ,
               4241:irqfd_shutdown ,4243:irqfd_shutdown

  rcu_gp: pwq 22: active=1 refcnt=2
    pending: process_srcu

  events: pwq 22: active=56 refcnt=57
    pending: kernfs_notify_workfn, delayed_vfree_work,
             binder_deferred_func, 47*destroy_super_work, ...

  pool 22: cpus=5 flags=0x0 hung=79s workers=12
    idle: 4242 51 4248 4247 4245 435 4244 4239  (8 idle)

--- t=341s: Last workqueue lockup dump (pool stuck 327s) ---

  kvm-irqfd-cleanup: pwq 22: active=4 (same)
  rcu_gp: pwq 22: pending: process_srcu  (still pending, 248s later)
  events: pwq 22: active=56  (56 pending items, zero progress)
  pool 22: hung=327s workers=12 idle: same 8 workers

--- t=343s: Hung task dump ---

  Worker 4241 (MUTEX HOLDER):
    task:kworker/5:4   state:D  pid:4241
    Workqueue: kvm-irqfd-cleanup irqfd_shutdown
      __synchronize_srcu+0x100/0x130
      irqfd_resampler_shutdown+0xf0/0x150

  Worker 4243 (MUTEX WAITER):
    task:kworker/5:6   state:D  pid:4243
      __mutex_lock+0x37d/0xbb0
      irqfd_resampler_shutdown+0x23/0x150

  (Workers 151 and 4246 show identical mutex_lock stacks)

> Please post sanitized ramoops/dmesg logs on-list so others can
> validate.

Full logs: https://gist.github.com/sonam-sanju/773855aa2cbe156ca19f3a87bbebc15e

Thanks,
Sonam

Reply via email to