>On Wed 2021-03-24 10:16:46, 王擎 wrote:
>> 
>> >On Tue 2021-03-23 20:37:35, 王擎 wrote:
>> >> 
>> >> >On Fri 2021-03-19 16:00:36, Wang Qing wrote:
>> >> >> When touch_softlockup_watchdog() is called, only 
>> >> >> wq_watchdog_touched_cpu 
>> >> >> updated, while the unbound worker_pool running on its core uses 
>> >> >> wq_watchdog_touched to determine whether locked up. This may be 
>> >> >> mischecked.
>> >> >
>> >> >By other words, unbound workqueues are not aware of the more common
>> >> >touch_softlockup_watchdog() because it updates only
>> >> >wq_watchdog_touched_cpu for the affected CPU. As a result,
>> >> >the workqueue watchdog might report lockup in unbound workqueue
>> >> >even though it is blocked by a known slow code.
>> >> 
>> >> Yes, this is the problem I'm talking about.
>> >
>> >I thought more about it. This patch prevents a false positive.
>> >Could it bring an opposite problem and hide real problems?
>> >
>> >I mean that an unbound workqueue might get blocked on CPU A
>> >because of a real softlockup. But we might not notice it because
>> >CPU B is touched. Well, there are other ways how to detect
>> >this situation, e.g. the softlockup watchdog.
>> >
>> >
>> >> >> My suggestion is to update both when touch_softlockup_watchdog() is 
>> >> >> called, 
>> >> >> use wq_watchdog_touched_cpu to check bound, and use 
>> >> >> wq_watchdog_touched 
>> >> >> to check unbound worker_pool.
>> >> >> 
>> >> >> Signed-off-by: Wang Qing <wangq...@vivo.com>
>> >> >> ---
>> >> >>  kernel/watchdog.c  |  5 +++--
>> >> >>  kernel/workqueue.c | 17 ++++++-----------
>> >> >>  2 files changed, 9 insertions(+), 13 deletions(-)
>> >> >> 
>> >> >> diff --git a/kernel/watchdog.c b/kernel/watchdog.c
>> >> >> index 7110906..107bc38
>> >> >> --- a/kernel/watchdog.c
>> >> >> +++ b/kernel/watchdog.c
>> >> >> @@ -278,9 +278,10 @@ void touch_all_softlockup_watchdogs(void)
>> >> >>         * update as well, the only side effect might be a cycle delay 
>> >> >> for
>> >> >>         * the softlockup check.
>> >> >>         */
>> >> >> -      for_each_cpu(cpu, &watchdog_allowed_mask)
>> >> >> +      for_each_cpu(cpu, &watchdog_allowed_mask) {
>> >> >>                per_cpu(watchdog_touch_ts, cpu) = SOFTLOCKUP_RESET;
>> >> >> -      wq_watchdog_touch(-1);
>> >> >> +              wq_watchdog_touch(cpu);
>> >> >
>> >> >Note that wq_watchdog_touch(cpu) newly always updates
>> >> >wq_watchdog_touched. This cycle will set the same jiffies
>> >> >value cpu-times to the same variable.
>> >> >
>> >> Although there is a bit of redundancy here, but the most concise way of 
>> >> implementation, and it is certain that it will not affect performance.
>> >> 
>> Another way to implement is wq_watchdog_touch() remain unchanged, but need 
>> to modify touch_softlockup_watchdog() and touch_all_softlockup_watchdogs():
>> notrace void touch_softlockup_watchdog(void)
>> {
>>      touch_softlockup_watchdog_sched();
>>      wq_watchdog_touch(raw_smp_processor_id());
>> +     wq_watchdog_touch(-1);
>> }
>> void touch_all_softlockup_watchdogs(void)
>>  * update as well, the only side effect might be a cycle delay for
>>  * the softlockup check.
>> */
>>  -   for_each_cpu(cpu, &watchdog_allowed_mask)
>> +    for_each_cpu(cpu, &watchdog_allowed_mask) {
>>              per_cpu(watchdog_touch_ts, cpu) = SOFTLOCKUP_RESET;
>>  +           wq_watchdog_touch(cpu);
>>  +   }
>>      wq_watchdog_touch(-1);
>>   }
>> So wq_watchdog_touched will not get updated many times,  
>> which do you think is better, Petr?
>
>I actually prefer the original patch. It makes wq_watchdog_touch()
>easy to use. The complexity is hidden in wq-specific code.
>
>The alternative way updates each timestamp only once but the use
>is more complicated. IMHO, it is more error prone.

I agree, so I will just modify the commit log based on V2 and resend.

Thanks,
Qing
>
>Best Regards,
>Petr


Reply via email to