On Wed 2021-03-24 19:34:02, Wang Qing wrote:
> There are two workqueue-specific watchdog timestamps:
>
>     + @wq_watchdog_touched_cpu (per-CPU) updated by
>       touch_softlockup_watchdog()
>
>     + @wq_watchdog_touched (global) updated by
>       touch_all_softlockup_watchdogs()
>
> watchdog_timer_fn() checks only the global @wq_watchdog_touched for
> unbound workqueues. As a result, unbound workqueues are not aware
> of touch_softlockup_watchdog(). The watchdog might report a stall
> even when the unbound workqueues are blocked by known slow code.
>
> Solution:
> touch_softlockup_watchdog() must also touch the global
> @wq_watchdog_touched timestamp.
>
> The global timestamp can no longer be used for bound workqueues
> because it is updated on all CPUs. Instead, bound workqueues
> have to check only @wq_watchdog_touched_cpu and this timestamp
> has to be updated for all CPUs in touch_all_softlockup_watchdogs().
>
> Beware:
> The change might cause the opposite problem. An unbound workqueue
> might get blocked on CPU A because of a real softlockup. The workqueue
> watchdog would miss it when the timestamp got touched on CPU B.
>
> It is acceptable because softlockups are detected by softlockup
> watchdog. The workqueue watchdog is there to detect stalls where
> a work never finishes, for example, because of dependencies of works
> queued into the same workqueue.
>
> V3:
> - Modify the commit message clearly according to Petr's suggestion.
>
> Signed-off-by: Wang Qing <wangq...@vivo.com>

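For anyone following the thread, the semantics described in the commit message can be modelled in userspace roughly as below. This is only a sketch under assumptions, not the kernel code: model_touch_cpu(), model_touch_all(), model_pool_stalled(), NR_MODEL_CPUS and UNBOUND_CPU are made-up names, time is a plain counter instead of jiffies, and the comparisons ignore wraparound (the kernel would use time_after()).

```c
#include <stdbool.h>

#define NR_MODEL_CPUS 4      /* hypothetical CPU count for the model */
#define UNBOUND_CPU   (-1)   /* hypothetical marker for an unbound pool */

/* Global timestamp, consulted only for unbound pools. */
static unsigned long wq_watchdog_touched;
/* Per-CPU timestamps, consulted only for bound pools. */
static unsigned long wq_watchdog_touched_cpu[NR_MODEL_CPUS];

/* Models the effect of touch_softlockup_watchdog() on one CPU:
 * the per-CPU timestamp AND the global one are updated. */
static void model_touch_cpu(int cpu, unsigned long now)
{
	wq_watchdog_touched_cpu[cpu] = now;
	wq_watchdog_touched = now;
}

/* Models the effect of touch_all_softlockup_watchdogs():
 * every per-CPU timestamp is updated, plus the global one. */
static void model_touch_all(unsigned long now)
{
	for (int cpu = 0; cpu < NR_MODEL_CPUS; cpu++)
		wq_watchdog_touched_cpu[cpu] = now;
	wq_watchdog_touched = now;
}

/* Models the stall check for a single pool: take the later of the
 * pool's own progress timestamp and the relevant "touched" timestamp,
 * then compare against the threshold. */
static bool model_pool_stalled(int cpu, unsigned long pool_ts,
			       unsigned long now, unsigned long thresh)
{
	unsigned long touched = (cpu == UNBOUND_CPU)
		? wq_watchdog_touched
		: wq_watchdog_touched_cpu[cpu];
	unsigned long ts = (pool_ts > touched) ? pool_ts : touched;

	return now > ts + thresh;
}
```

In this model, a touch on CPU 1 silences the report for unbound pools and for pools bound to CPU 1, while a pool bound to CPU 0 is still reported. That is the behavior the commit message asks for, including the "Beware" case where a touch on one CPU hides a stall of an unbound pool running elsewhere.
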
The patch fixes a real problem:

Reviewed-by: Petr Mladek <pmla...@suse.com>

Best Regards,
Petr