On Fri, May 23, 2014 at 09:32:19AM +0800, Chen Yucong wrote:
> ...if we reach a timeout, there is very little
> chance for recovering. Though the probability for this situation to
> happen is very slight, it's not impossible. Indeed, it's hard to know
> the precise causes for timeout.

Ok, enough talking, let's close that hole and get on with our lives:

---
From: Borislav Petkov <b...@suse.de>
Date: Fri, 23 May 2014 11:06:35 +0200
Subject: [PATCH] mce: Panic when a core has reached a timeout

There is very little and maybe practically nothing we can do to recover
from a system where at least one core has reached a timeout during the
whole monarch cores gathering. So panic when that happens.

Signed-off-by: Borislav Petkov <b...@suse.de>
---
 arch/x86/kernel/cpu/mcheck/mce.c | 3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)

diff --git a/arch/x86/kernel/cpu/mcheck/mce.c b/arch/x86/kernel/cpu/mcheck/mce.c
index bfde4871848f..529ccc488f5a 100644
--- a/arch/x86/kernel/cpu/mcheck/mce.c
+++ b/arch/x86/kernel/cpu/mcheck/mce.c
@@ -704,8 +704,7 @@ static int mce_timed_out(u64 *t)
 	if (!mca_cfg.monarch_timeout)
 		goto out;
 	if ((s64)*t < SPINUNIT) {
-		/* CHECKME: Make panic default for 1 too? */
-		if (mca_cfg.tolerant < 1)
+		if (mca_cfg.tolerant <= 1)
 			mce_panic("Timeout synchronizing machine check over CPUs",
 				  NULL, NULL);
 		cpu_missing = 1;
-- 
1.9.0

-- 
Regards/Gruss,
    Boris.

Sent from a fat crate under my desk. Formatting is fine.
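
For illustration only, here is a rough userspace sketch of what the one-line
change means for each tolerant level. This is not kernel code: the two helper
functions and the self-check are hypothetical names made up for this example,
and they model nothing but the comparison touched by the patch (tolerant < 1
before, tolerant <= 1 after). The level descriptions in the comment are my
reading of the x86 machine check documentation.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Hypothetical helpers, not part of mce.c. They model only the
 * tolerant check inside the timeout branch of mce_timed_out().
 *
 * tolerant levels, as I understand the documentation:
 *   0: always panic on uncorrected errors
 *   1: panic or SIGBUS on uncorrected errors (the default)
 *   2: SIGBUS or log uncorrected errors
 *   3: never panic or SIGBUS (testing only)
 */

/* Before the patch: only tolerant == 0 panics on a sync timeout. */
static bool panics_on_timeout_before(int tolerant)
{
	return tolerant < 1;
}

/* After the patch: tolerant 0 and 1 both panic on a sync timeout. */
static bool panics_on_timeout_after(int tolerant)
{
	return tolerant <= 1;
}

/* Only tolerant == 1 changes behaviour; 0, 2 and 3 stay as they were. */
static void check_tolerant_levels(void)
{
	assert(panics_on_timeout_before(0) && panics_on_timeout_after(0));
	assert(!panics_on_timeout_before(1) && panics_on_timeout_after(1));
	for (int t = 2; t <= 3; t++)
		assert(!panics_on_timeout_before(t) &&
		       !panics_on_timeout_after(t));
}
```

So, under these assumptions, the only level whose behaviour changes is
tolerant == 1, which is also the default setting.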