On Wed, 17 Jan 2018 07:15:09 -0800
Tejun Heo <t...@kernel.org> wrote:

> It's great that Steven's patches solve a good number of problems.  It
> is also true that there's a class of problems that it doesn't solve,
> which other approaches do.  The productive thing to do here is trying
> to solve the unsolved one too, especially given that it doesn't seem
> too difficuilt to do so on top of what's proposed.

OK, let's talk about the other problems, as this is no longer related
to my patch.

>From your previous email:

> 1. Console is IPMI emulated serial console.  Super slow.  Also
>    netconsole is in use.
> 2. System runs out of memory, OOM triggers.
> 3. OOM handler is printing out OOM debug info.
> 4. While trying to emit the messages for netconsole, the network stack
>    / driver tries to allocate memory and then fail, which in turn
>    triggers allocation failure or other warning messages.  printk was
>    already flushing, so the messages are queued on the ring.
> 5. OOM handler keeps flushing but 4 repeats and the queue is never
>    shrinking.  Because OOM handler is trapped in printk flushing, it
>    never manages to free memory and no one else can enter OOM path
>    either, so the system is trapped in this state.

>From what I gathered, you said an OOM would trigger, and then the
network console would not be able to allocate memory and it would
trigger a printk too, and cause an infinite amount of printks.

This could very well be a great place to force offloading. If a printk
is called from within a printk, at the same context (normal, softirq,
irq or NMI), then we should trigger the offloading.

My ftrace ring buffer has a context level recursion check, we could use
that, and even tie it into my previous patch:

With something like this (not compiled tested or anything, and
kick_offload_thread() would need to be implemented).

diff --git a/kernel/printk/printk.c b/kernel/printk/printk.c
index 9cb943c90d98..b80b23a0ca13 100644
--- a/kernel/printk/printk.c
+++ b/kernel/printk/printk.c
@@ -2261,6 +2261,63 @@ static int have_callable_console(void)
 
        return 0;
 }
+/*
+ * Used for which context the printk is in.
+ *  NMI     = 0
+ *  IRQ     = 1
+ *  SOFTIRQ = 2
+ *  NORMAL  = 3
+ *
+ * Stack ordered, where the lower number can preempt
+ * the higher number: mask &= mask - 1, will only clear
+ * the lowerest set bit.
+ */
+enum {
+       CTX_NMI,
+       CTX_IRQ,
+       CTX_SOFTIRQ,
+       CTX_NORMAL,
+};
+
+static DEFINE_PER_CPU(int, recursion_bits);
+
+static bool recursion_check_start(void)
+{
+       unsigned long pc = preempt_count();
+       int val = this_cpu_read(recursion_bits);
+
+       if (!(pc & (NMI_MASK | HARDIRQ_MASK | SOFTIRQ_OFFSET)))
+               bit = CTX_NORMAL;
+       else
+               bit = pc & NMI_MASK ? CTX_NMI :
+                       pc & HARDIRQ_MASK ? CTX_IRQ : CTX_SOFTIRQ;
+
+       if (unlikely(val & (1 << bit)))
+               return true;
+
+       val |= (1 << bit);
+       this_cpu_write(recursion_bits, val);
+       return false;
+}
+
+static void recursion_check_finish(bool offload)
+{
+       int val = this_cpu_read(recursion_bits);
+
+       if (offload)
+               return;
+
+       val &= val - 1;
+       this_cpu_write(recursion_bits, val);
+}
+
+static void kick_offload_thread(void)
+{
+       /*
+        * Consoles are triggering printks, offload the printks
+        * to another CPU to hopefully avoid a lockup.
+        */
+}
 
 /*
  * Can we actually use the console at this time on this cpu?
@@ -2333,6 +2390,7 @@ void console_unlock(void)
 
        for (;;) {
                struct printk_log *msg;
+               bool offload;
                size_t ext_len = 0;
                size_t len;
 
@@ -2393,15 +2451,20 @@ void console_unlock(void)
                 * waiter waiting to take over.
                 */
                console_lock_spinning_enable();
+               offload = recursion_check_start();
 
                stop_critical_timings();        /* don't trace print latency */
                call_console_drivers(ext_text, ext_len, text, len);
                start_critical_timings();
 
+               recursion_check_finish(offload);
+
                if (console_lock_spinning_disable_and_check()) {
                        printk_safe_exit_irqrestore(flags);
                        return;
                }
+               if (offload)
+                       kick_offload_thread();
 
                printk_safe_exit_irqrestore(flags);
 

-- Steve

Reply via email to