On (03/20/18 09:34), [email protected] wrote:
[..]
> Thanks very much.
> commit e480af09c49736848f749a43dff2c902104f6691 avoided the NMI watchdog
> trigger.

Hm, okay... But "touch_nmi_watchdog() everywhere in printk/console-related code"
is not exactly where I wanted us to end up.

By the way, e480af09c49736848f749a43dff2c902104f6691 is from 2006.
Are you sure you meant that exact commit? Which kernel are you using?


Are you saying that none of Steven's patches helped on your setups?


> And this patch may avoid blocking for a long time:
> https://lkml.org/lkml/2018/3/8/584
>
> We've tested it for several days.

Hm, printk_deferred is a bit dangerous; it moves console_unlock() to
IRQ. So you can still have the problem of stuck CPUs; it's just that now
you've silenced the watchdog. Did you test Steven's patches?


A tricky part about printk_deferred() is that it does not use the hand-off
mechanism. And even more... Here is what we have in the "printk vs printk"
scenario:

        CPU0                    CPU1            ...             CPUN

        printk                  printk
         console_unlock          hand off                       printk
                                  console_unlock                 hand off
                                                                  console_unlock

turns into the good old "one CPU prints it all" in the "printk vs
printk_deferred" case, because printk_deferred() just log_store()s messages
and then, _maybe_, grabs the console_sem from IRQ and invokes
console_unlock(). Nobody spins waiting for the console, so there is nobody
to hand it off to.

So it looks something like this:

        CPU0                    CPU1            ...             CPUN

        printk                  printk_deferred
         console_unlock                                         printk_deferred
         console_unlock
         console_unlock
        ...                     ...                             ...
                                printk_deferred                 printk_deferred
         console_unlock
         console_unlock
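To make the difference between the two diagrams concrete, here is a toy,
single-threaded C model of console ownership. It is a sketch under loud
assumptions, not kernel code: simulate(), struct call and LINES_PER_CALL
are made up for illustration; every call stores two lines while the owner
prints only one per step (a slow console); a printk() caller always ends
up owning the console (the hand-off); and a printk_deferred() caller only
takes it when nobody owns it (roughly what the irq_work path does).

```c
#include <assert.h>
#include <string.h>

#define MAX_CPUS       8
#define LINES_PER_CALL 2 /* assumption: messages arrive faster than the console prints */

/* Which API a CPU calls in this toy model. */
enum call_kind { CALL_PRINTK, CALL_PRINTK_DEFERRED };

struct call {
	int cpu;             /* calling CPU */
	enum call_kind kind; /* printk() or printk_deferred() */
};

/*
 * Toy model of console ownership (NOT kernel code).
 * Every call log_store()s LINES_PER_CALL messages, then the current
 * console owner prints exactly one line.
 *  - printk(): the caller ends up owning the console, either because
 *    it was free or because the owner hands it off.
 *  - printk_deferred(): the caller only stores messages; it takes the
 *    console (think: irq_work) only if nobody owns it.
 * After the last call, the final owner drains whatever is left.
 * printed[cpu] receives how many lines each CPU printed.
 */
static void simulate(const struct call *calls, int ncalls, int ncpus,
		     int printed[MAX_CPUS])
{
	int pending = 0; /* stored but not yet printed */
	int owner = -1;  /* CPU inside console_unlock(), -1 if none */

	assert(ncpus > 0 && ncpus <= MAX_CPUS);
	memset(printed, 0, sizeof(int) * MAX_CPUS);

	for (int i = 0; i < ncalls; i++) {
		assert(calls[i].cpu >= 0 && calls[i].cpu < ncpus);
		pending += LINES_PER_CALL; /* log_store() */

		/* printk() always gets the console (hand-off);
		 * printk_deferred() only grabs it when it is free. */
		if (calls[i].kind == CALL_PRINTK || owner < 0)
			owner = calls[i].cpu;

		/* The owner prints one line per step. */
		printed[owner]++;
		pending--;
		if (pending == 0)
			owner = -1; /* console_unlock() returns */
	}

	/* Whoever owns the console last flushes the backlog. */
	if (owner >= 0)
		printed[owner] += pending;
}
```

With three CPUs calling in the order CPU0, CPU1, CPU2, CPU0, CPU1, CPU2,
the all-printk run gives printed = {2, 2, 8}; when CPU1 and CPU2 use
printk_deferred() instead, CPU0 prints all 12 lines. In both runs the last
owner drains the leftover backlog; the difference is that in the second
run nobody ever takes the console away from CPU0.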


// offtopic  "I can has printk_kthread?"



You now call touch_nmi_watchdog() from the console driver [well... at least
that is what e480af09c4973 does, though I don't quite see how you could not
have had it applied already], so that's why you don't see hard lockups on
CPU0. But your printing CPU can still get stuck, which will delay RCU
processing on that CPU, etc. etc. etc. So I'd say that those two approaches

                printk_deferred + touch_nmi_watchdog

combined can do quite a bit of harm. One thing is for sure: they don't
really fix any problems.
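The "shut up the watchdog" point can be sketched the same way. Again this
is a toy model, not the real hard-lockup detector: toy_touch_nmi_watchdog(),
toy_watchdog_check(), print_backlog() and WATCHDOG_THRESH are hypothetical
names. The only thing touching does here is make the detector skip its
next check; the CPU spends exactly as long printing either way.

```c
#include <assert.h>
#include <stdbool.h>

#define WATCHDOG_THRESH 10 /* hypothetical: stuck ticks before a report */

struct toy_cpu {
	int busy_ticks;       /* consecutive ticks spent in console_unlock() */
	bool touched;         /* set by the toy touch_nmi_watchdog() */
	bool lockup_reported; /* did the toy detector fire? */
};

/* Hypothetical stand-in for touch_nmi_watchdog(): it only tells the
 * detector to skip its next check; the CPU is still busy. */
static void toy_touch_nmi_watchdog(struct toy_cpu *c)
{
	c->touched = true;
}

/* Toy detector check, run once per tick. */
static void toy_watchdog_check(struct toy_cpu *c)
{
	if (c->touched) {
		c->touched = false; /* consume the touch, skip this check */
		return;
	}
	if (c->busy_ticks >= WATCHDOG_THRESH)
		c->lockup_reported = true;
}

/* The CPU prints `lines` lines, one tick each, without ever getting
 * out of console_unlock(); optionally touches the watchdog after every
 * line, like a console driver might. */
static struct toy_cpu print_backlog(int lines, bool touch_each_line)
{
	struct toy_cpu c = { 0, false, false };

	assert(lines >= 0);
	for (int i = 0; i < lines; i++) {
		c.busy_ticks++; /* one more tick stuck printing */
		if (touch_each_line)
			toy_touch_nmi_watchdog(&c);
		toy_watchdog_check(&c);
	}
	return c;
}
```

print_backlog(50, false) reports a lockup while print_backlog(50, true)
does not, yet both leave busy_ticks at 50: the touch hides the symptom,
not the time the CPU is stuck.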

        -ss
