Re: Re: Re: [PATCH V2] workqueue: watchdog: update wq_watchdog_touched for unbound lockup checking

Petr Mladek Wed, 24 Mar 2021 01:51:57 -0700

On Wed 2021-03-24 10:16:46, 王擎 wrote:
> 
> >On Tue 2021-03-23 20:37:35, 王擎 wrote:
> >> 
> >> >On Fri 2021-03-19 16:00:36, Wang Qing wrote:
> >> >> When touch_softlockup_watchdog() is called, only 
> >> >> wq_watchdog_touched_cpu 
> >> >> updated, while the unbound worker_pool running on its core uses 
> >> >> wq_watchdog_touched to determine whether locked up. This may be 
> >> >> mischecked.
> >> >
> >> >By other words, unbound workqueues are not aware of the more common
> >> >touch_softlockup_watchdog() because it updates only
> >> >wq_watchdog_touched_cpu for the affected CPU. As a result,
> >> >the workqueue watchdog might report lockup in unbound workqueue
> >> >even though it is blocked by a known slow code.
> >> 
> >> Yes, this is the problem I'm talking about.
> >
> >I thought more about it. This patch prevents a false positive.
> >Could it bring an opposite problem and hide real problems?
> >
> >I mean that an unbound workqueue might get blocked on CPU A
> >because of a real softlockup. But we might not notice it because
> >CPU B is touched. Well, there are other ways how to detect
> >this situation, e.g. the softlockup watchdog.
> >
> >
> >> >> My suggestion is to update both when touch_softlockup_watchdog() is 
> >> >> called, 
> >> >> use wq_watchdog_touched_cpu to check bound, and use wq_watchdog_touched 
> >> >> to check unbound worker_pool.
> >> >> 
> >> >> Signed-off-by: Wang Qing <wangq...@vivo.com>
> >> >> ---
> >> >>  kernel/watchdog.c  |  5 +++--
> >> >>  kernel/workqueue.c | 17 ++++++-----------
> >> >>  2 files changed, 9 insertions(+), 13 deletions(-)
> >> >> 
> >> >> diff --git a/kernel/watchdog.c b/kernel/watchdog.c
> >> >> index 7110906..107bc38
> >> >> --- a/kernel/watchdog.c
> >> >> +++ b/kernel/watchdog.c
> >> >> @@ -278,9 +278,10 @@ void touch_all_softlockup_watchdogs(void)
> >> >>          * update as well, the only side effect might be a cycle delay 
> >> >> for
> >> >>          * the softlockup check.
> >> >>          */
> >> >> -       for_each_cpu(cpu, &watchdog_allowed_mask)
> >> >> +       for_each_cpu(cpu, &watchdog_allowed_mask) {
> >> >>                 per_cpu(watchdog_touch_ts, cpu) = SOFTLOCKUP_RESET;
> >> >> -       wq_watchdog_touch(-1);
> >> >> +               wq_watchdog_touch(cpu);
> >> >
> >> >Note that wq_watchdog_touch(cpu) newly always updates
> >> >wq_watchdog_touched. This cycle will set the same jiffies
> >> >value cpu-times to the same variable.
> >> >
> >> Although there is a bit of redundancy here, but the most concise way of 
> >> implementation, and it is certain that it will not affect performance.
> >> 
> Another way to implement is wq_watchdog_touch() remain unchanged, but need 
> to modify touch_softlockup_watchdog() and touch_all_softlockup_watchdogs():
> notrace void touch_softlockup_watchdog(void)
> {
>       touch_softlockup_watchdog_sched();
>       wq_watchdog_touch(raw_smp_processor_id());
> +     wq_watchdog_touch(-1);
> }
> void touch_all_softlockup_watchdogs(void)
>  * update as well, the only side effect might be a cycle delay for
>  * the softlockup check.
> */
>  -    for_each_cpu(cpu, &watchdog_allowed_mask)
> +     for_each_cpu(cpu, &watchdog_allowed_mask) {
>               per_cpu(watchdog_touch_ts, cpu) = SOFTLOCKUP_RESET;
>  +            wq_watchdog_touch(cpu);
>  +    }
>       wq_watchdog_touch(-1);
>   }
> So wq_watchdog_touched will not get updated many times,  
> which do you think is better, Petr?


I actually prefer the original patch. It makes wq_watchdog_touch()
easy to use. The complexity is hidden in wq-specific code.

The alternative way updates each timestamp only once but the use
is more complicated. IMHO, it is more error prone.

Best Regards,
Petr

Re: Re: Re: [PATCH V2] workqueue: watchdog: update wq_watchdog_touched for unbound lockup checking

Reply via email to