The Intel CMCI interrupt handler calls mce_timer_kick() to force more
frequent polling for MCE events when a CMCI storm occurs and CMCI
interrupts are subsequently disabled.

If a CMCI interrupt storm happens to be detected while the timer
interrupt is executing timer functions, mce_timer_kick() can race with
mce_timer_fn(), which results in a double-add and the following BUG:

        #0 [ffff88047fda3ad0] machine_kexec at ffffffff8102bdf5
        #1 [ffff88047fda3b20] crash_kexec at ffffffff8109e788
        #2 [ffff88047fda3bf0] oops_end at ffffffff815f20e8
        #3 [ffff88047fda3c20] die at ffffffff81005c08
        #4 [ffff88047fda3c50] do_trap at ffffffff815f192b
        #5 [ffff88047fda3cb0] do_invalid_op at ffffffff81002f42
        #6 [ffff88047fda3d60] invalid_op at ffffffff815fa668
        [exception RIP: add_timer_on+234]
        RIP: ffffffff8104d05a RSP: ffff88047fda3e18 RFLAGS: 00010286
        RAX: 0000000000000000 RBX: ffff88047fdacbc0 RCX: 000000001fbee3ff
        RDX: ffff88047fda0000 RSI: 000000000000001d RDI: ffff88047fdacbc0
        RBP: ffff88047fda3e58 R8: 0000000000000000 R9: ffffffff81aa0940
        R10: 0720072007200720 R11: 0720072007200765 R12: ffff880474a6c000
        R13: 0000000000000101 R14: 000000000000001d R15: ffff88047fdacbc0
        ORIG_RAX: ffffffffffffffff CS: 0010 SS: 0018
        #7 [ffff88047fda3e60] mce_timer_fn at ffffffff8101f524
        #8 [ffff88047fda3e80] call_timer_fn at ffffffff8104b4fa
        #9 [ffff88047fda3ec0] run_timer_softirq at ffffffff8104ce70

The add_timer_on() in mce_timer_kick() is actually unnecessary: since the
timer is re-added by its handler function, the only case in which the
timer is not pending is when the CMCI interrupt calls mce_timer_kick() in
the interval between the timer firing and mce_timer_fn() actually being
executed. In that case, the timer work will be performed by mce_timer_fn()
just after the interrupt exits.

This patch removes the add_timer_on() from mce_timer_kick(), and disables
local interrupts during mce_timer_fn() so that mce_timer_fn() will
always pick up the timer interval value that mce_timer_kick() stores
in the per-CPU variable mce_next_interval.

This means that the CMCI interrupt that hits the storm threshold will
call mce_timer_kick() either:

        1) In the interval between the mce_timer firing and mce_timer_fn()
           disabling local IRQs. In this case, mce_timer_fn() will
           execute immediately after the CMCI handler exits, and will
           use the interval that mce_timer_kick() stored in the per-CPU
           variable to calculate its next timer interval.

        2) After mce_timer_fn() has done its work, in which case the
           existing timer will be updated with the new expiry time if
           that is before the existing one.

Signed-off-by: Calvin Owens <calvinow...@fb.com>
---
 arch/x86/kernel/cpu/mcheck/mce.c | 10 ++++++----
 1 file changed, 6 insertions(+), 4 deletions(-)
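
For illustration only, here is a minimal userspace sketch of case 1 above.
It is not kernel code: struct model_timer, model_add_timer(), model_kick(),
model_timer_fn() and model_race() are hypothetical names, time is a plain
counter compared with "<" rather than time_before(), and the interval
adjustment that the real mce_timer_fn() performs is left out. The model
only counts how often a timer is added while it is already queued, which
is the condition that trips the BUG in add_timer_on().

```c
#include <assert.h>
#include <stdbool.h>

/* Toy stand-in for struct timer_list: only what the race depends on. */
struct model_timer {
	bool pending;           /* true while the timer is queued */
	unsigned long expires;  /* expiry time in model "jiffies" */
	int double_adds;        /* adds attempted on a queued timer */
};

/* Stand-in for add_timer_on(): the real one BUGs on a pending timer,
 * the model just counts the violation. */
static void model_add_timer(struct model_timer *t, unsigned long expires)
{
	if (t->pending)
		t->double_adds++;
	t->pending = true;
	t->expires = expires;
}

/* Model of mce_timer_kick(); 'patched' selects the behaviour after
 * this patch (no add when the timer is not pending). */
static void model_kick(struct model_timer *t, unsigned long *next_interval,
		       unsigned long now, unsigned long interval, bool patched)
{
	unsigned long when = now + interval;

	if (t->pending) {
		if (when < t->expires)
			t->expires = when;      /* like mod_timer_pinned() */
	} else if (!patched) {
		model_add_timer(t, when);       /* the add being removed */
	}
	if (interval < *next_interval)
		*next_interval = interval;
}

/* Model of the tail of mce_timer_fn(): re-arm from the stored interval. */
static void model_timer_fn(struct model_timer *t,
			   const unsigned long *next_interval,
			   unsigned long now)
{
	model_add_timer(t, now + *next_interval);
}

/* Replay case 1: the timer has fired (so it is no longer pending), the
 * CMCI interrupt kicks, then the handler runs. Returns the number of
 * double adds and reports the final expiry through armed_expires. */
static int model_race(bool patched, unsigned long *armed_expires)
{
	struct model_timer t = { .pending = false };
	unsigned long next_interval = 100;      /* arbitrary model values */
	unsigned long now = 1000;

	assert(armed_expires);
	model_kick(&t, &next_interval, now, 10, patched);
	model_timer_fn(&t, &next_interval, now);
	*armed_expires = t.expires;
	return t.double_adds;
}
```

Calling model_race(false, &e) replays the old behaviour and reports one
double add; model_race(true, &e) reports none, and the timer is still
armed with the shorter interval that the kick stored.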

diff --git a/arch/x86/kernel/cpu/mcheck/mce.c b/arch/x86/kernel/cpu/mcheck/mce.c
index 61a9668ce..7074a90 100644
--- a/arch/x86/kernel/cpu/mcheck/mce.c
+++ b/arch/x86/kernel/cpu/mcheck/mce.c
@@ -1286,7 +1286,7 @@ static int cmc_error_seen(void)
 static void mce_timer_fn(unsigned long data)
 {
        struct timer_list *t = this_cpu_ptr(&mce_timer);
-       unsigned long iv;
+       unsigned long iv, flags;
        int notify;
 
        WARN_ON(smp_processor_id() != data);
@@ -1301,6 +1301,9 @@ static void mce_timer_fn(unsigned long data)
         * Alert userspace if needed.  If we logged an MCE, reduce the
         * polling interval, otherwise increase the polling interval.
         */
+
+       local_irq_save(flags);
+
        iv = __this_cpu_read(mce_next_interval);
        notify = mce_notify_irq();
        notify |= cmc_error_seen();
@@ -1316,6 +1319,8 @@ static void mce_timer_fn(unsigned long data)
                t->expires = jiffies + iv;
                add_timer_on(t, smp_processor_id());
        }
+
+       local_irq_restore(flags);
 }
 
 /*
@@ -1330,9 +1335,6 @@ void mce_timer_kick(unsigned long interval)
        if (timer_pending(t)) {
                if (time_before(when, t->expires))
                        mod_timer_pinned(t, when);
-       } else {
-               t->expires = round_jiffies(when);
-               add_timer_on(t, smp_processor_id());
        }
        if (interval < iv)
                __this_cpu_write(mce_next_interval, interval);
-- 
2.1.1
