On Wed 2021-03-24 10:16:46, 王擎 wrote: > > >On Tue 2021-03-23 20:37:35, 王擎 wrote: > >> > >> >On Fri 2021-03-19 16:00:36, Wang Qing wrote: > >> >> When touch_softlockup_watchdog() is called, only > >> >> wq_watchdog_touched_cpu > >> >> updated, while the unbound worker_pool running on its core uses > >> >> wq_watchdog_touched to determine whether locked up. This may be > >> >> mischecked. > >> > > >> >By other words, unbound workqueues are not aware of the more common > >> >touch_softlockup_watchdog() because it updates only > >> >wq_watchdog_touched_cpu for the affected CPU. As a result, > >> >the workqueue watchdog might report lockup in unbound workqueue > >> >even though it is blocked by a known slow code. > >> > >> Yes, this is the problem I'm talking about. > > > >I thought more about it. This patch prevents a false positive. > >Could it bring an opposite problem and hide real problems? > > > >I mean that an unbound workqueue might get blocked on CPU A > >because of a real softlockup. But we might not notice it because > >CPU B is touched. Well, there are other ways how to detect > >this situation, e.g. the softlockup watchdog. > > > > > >> >> My suggestion is to update both when touch_softlockup_watchdog() is > >> >> called, > >> >> use wq_watchdog_touched_cpu to check bound, and use wq_watchdog_touched > >> >> to check unbound worker_pool. > >> >> > >> >> Signed-off-by: Wang Qing <wangq...@vivo.com> > >> >> --- > >> >> kernel/watchdog.c | 5 +++-- > >> >> kernel/workqueue.c | 17 ++++++----------- > >> >> 2 files changed, 9 insertions(+), 13 deletions(-) > >> >> > >> >> diff --git a/kernel/watchdog.c b/kernel/watchdog.c > >> >> index 7110906..107bc38 > >> >> --- a/kernel/watchdog.c > >> >> +++ b/kernel/watchdog.c > >> >> @@ -278,9 +278,10 @@ void touch_all_softlockup_watchdogs(void) > >> >> * update as well, the only side effect might be a cycle delay > >> >> for > >> >> * the softlockup check. > >> >> */ > >> >> - for_each_cpu(cpu, &watchdog_allowed_mask) > >> >> + for_each_cpu(cpu, &watchdog_allowed_mask) { > >> >> per_cpu(watchdog_touch_ts, cpu) = SOFTLOCKUP_RESET; > >> >> - wq_watchdog_touch(-1); > >> >> + wq_watchdog_touch(cpu); > >> > > >> >Note that wq_watchdog_touch(cpu) newly always updates > >> >wq_watchdog_touched. This cycle will set the same jiffies > >> >value cpu-times to the same variable. > >> > > >> Although there is a bit of redundancy here, but the most concise way of > >> implementation, and it is certain that it will not affect performance. > >> > Another way to implement is wq_watchdog_touch() remain unchanged, but need > to modify touch_softlockup_watchdog() and touch_all_softlockup_watchdogs(): > notrace void touch_softlockup_watchdog(void) > { > touch_softlockup_watchdog_sched(); > wq_watchdog_touch(raw_smp_processor_id()); > + wq_watchdog_touch(-1); > } > void touch_all_softlockup_watchdogs(void) > * update as well, the only side effect might be a cycle delay for > * the softlockup check. > */ > - for_each_cpu(cpu, &watchdog_allowed_mask) > + for_each_cpu(cpu, &watchdog_allowed_mask) { > per_cpu(watchdog_touch_ts, cpu) = SOFTLOCKUP_RESET; > + wq_watchdog_touch(cpu); > + } > wq_watchdog_touch(-1); > } > So wq_watchdog_touched will not get updated many times, > which do you think is better, Petr?
I actually prefer the original patch. It makes wq_watchdog_touch() easy to use. The complexity is hidden in wq-specific code. The alternative way updates each timestamp only once but the use is more complicated. IMHO, it is more error prone. Best Regards, Petr