RE: [PATCH] x86, MCE: Flesh out when to panic comment
> And this tolerant check looks fishy to me:
>
>	if (s->sev >= MCE_UC_SEVERITY && ctx == IN_KERNEL) {
>		if (panic_on_oops || tolerant < 1)
>			return MCE_PANIC_SEVERITY;
>	}
>
> since we set it to 1 by default. But I'll look again on a clear head
> tomorrow - it is too late here.

tolerant level 0 exists - but is somewhat crazy in the opposite direction
from the large values. Look at the comment in mce.c ... level 0 means
always panic if you see a UC error

-Tony
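The check quoted above can be modelled in isolation to see why the default
setting never reaches the inner branch. This is a minimal sketch, not the
kernel's mce_severity(): the enum names, their numeric values, and the
escalate() helper are all assumptions made for illustration, and only the
two nested conditions are taken from the thread.

```c
#include <assert.h>
#include <stdbool.h>

/* Illustrative stand-ins; names and numeric values are assumptions. */
enum severity { SEV_KEEP = 0, SEV_UC = 1, SEV_PANIC = 2 };
enum context  { CTX_USER, CTX_KERNEL };

/*
 * Models only the quoted check: an uncorrected (UC) error seen in kernel
 * context is escalated to panic when panic_on_oops is set or tolerant is
 * below 1. In this model every other case returns the severity unchanged.
 */
enum severity escalate(enum severity sev, enum context ctx,
		       bool panic_on_oops, int tolerant)
{
	if (sev >= SEV_UC && ctx == CTX_KERNEL) {
		if (panic_on_oops || tolerant < 1)
			return SEV_PANIC;
	}
	return sev;
}

/* Walks the cases discussed in the thread; aborts if the model disagrees. */
void escalate_demo(void)
{
	/* Default tolerant = 1, panic_on_oops clear: the check does not fire. */
	assert(escalate(SEV_UC, CTX_KERNEL, false, 1) == SEV_UC);
	/* tolerant = 0: a UC error in kernel context becomes a panic. */
	assert(escalate(SEV_UC, CTX_KERNEL, false, 0) == SEV_PANIC);
}
```

With tolerant at its default of 1 the inner condition only triggers through
panic_on_oops, which matches the observation that the check looks odd until
level 0 is taken into account.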
Re: [PATCH] x86, MCE: Flesh out when to panic comment
On Tue, May 27, 2014 at 09:53:56PM +, Luck, Tony wrote:
> It is far from obvious that mce_severity() will always say that an
> error detected inside the kernel will be fatal.

Oh yeah, it needs a good cleansing rewrite, that's for sure.

And this tolerant check looks fishy to me:

	if (s->sev >= MCE_UC_SEVERITY && ctx == IN_KERNEL) {
		if (panic_on_oops || tolerant < 1)
			return MCE_PANIC_SEVERITY;
	}

since we set it to 1 by default. But I'll look again on a clear head
tomorrow - it is too late here.

-- 
Regards/Gruss,
    Boris.

Sent from a fat crate under my desk. Formatting is fine.
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
RE: [PATCH] x86, MCE: Flesh out when to panic comment
>> I think the comment is still not explaining the big part of what the
>> discussion was about -- i.e. if it was in kernel context, we always
>> panic.
>
> I thought the pointer to mce_severity was enough? People should open an
> editor and look at the function and at its gory insanity. :-P

It is far from obvious that mce_severity() will always say that an
error detected inside the kernel will be fatal.

-Tony
Re: [PATCH] x86, MCE: Flesh out when to panic comment
On Mon, May 26, 2014 at 10:51 AM, Borislav Petkov wrote:
> On Mon, May 26, 2014 at 09:47:38AM -0700, Andy Lutomirski wrote:
>> It may be worth at least pointing out that mce_severity looks at
>> whether we faulted from kernel context. I missed that the first time
>> around because mce_severity doesn't take a pt_regs pointer.
>
> Right, but next time we talk about a different aspect which isn't
> commented on in the handler, we'd have to add to it again, until we've
> rewritten the whole function in pseudo code.
>
> I think simply pointing to the function which decides the fate of the
> machine based on the MCE severity is enough - people can then go and
> stare at it, albeit with some struggle.

Fair enough.

>
> --
> Regards/Gruss,
>     Boris.
>
> Sent from a fat crate under my desk. Formatting is fine.
> --

-- 
Andy Lutomirski
AMA Capital Management, LLC
Re: [PATCH] x86, MCE: Flesh out when to panic comment
On Mon, May 26, 2014 at 09:47:38AM -0700, Andy Lutomirski wrote:
> It may be worth at least pointing out that mce_severity looks at
> whether we faulted from kernel context. I missed that the first time
> around because mce_severity doesn't take a pt_regs pointer.

Right, but next time we talk about a different aspect which isn't
commented on in the handler, we'd have to add to it again, until we've
rewritten the whole function in pseudo code.

I think simply pointing to the function which decides the fate of the
machine based on the MCE severity is enough - people can then go and
stare at it, albeit with some struggle.

-- 
Regards/Gruss,
    Boris.

Sent from a fat crate under my desk. Formatting is fine.
--
Re: [PATCH] x86, MCE: Flesh out when to panic comment
On Mon, May 26, 2014 at 4:06 AM, Borislav Petkov wrote:
> On Mon, May 26, 2014 at 12:51:10PM +0200, Jiri Kosina wrote:
>> I think the comment is still not explaining the big part of what the
>> discussion was about -- i.e. if it was in kernel context, we always
>> panic.
>
> I thought the pointer to mce_severity was enough? People should open an
> editor and look at the function and at its gory insanity. :-P

It may be worth at least pointing out that mce_severity looks at
whether we faulted from kernel context. I missed that the first time
around because mce_severity doesn't take a pt_regs pointer.

--Andy
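A severity routine can tell kernel from user context without a pt_regs
pointer if the machine-check record itself carries the code-segment
selector that was live when the exception hit. The sketch below assumes
exactly that; struct mce_record and error_context() are hypothetical
names, and nothing here is quoted from mce_severity(). It relies only on
the x86 convention that the low two bits of CS hold the privilege level,
with 3 meaning user mode.

```c
#include <assert.h>
#include <stdint.h>

enum context { CTX_USER, CTX_KERNEL };

/* Hypothetical, trimmed-down record: only the saved code-segment selector. */
struct mce_record {
	uint8_t cs;
};

/*
 * Classify the faulting context from the saved selector alone.
 * The low two bits of CS are the privilege level; ring 3 is user mode.
 */
enum context error_context(const struct mce_record *m)
{
	return ((m->cs & 3) == 3) ? CTX_USER : CTX_KERNEL;
}

/* 0x33 and 0x10 are the usual x86-64 Linux user and kernel CS selectors. */
void error_context_demo(void)
{
	struct mce_record user = { .cs = 0x33 };
	struct mce_record kernel = { .cs = 0x10 };

	assert(error_context(&user) == CTX_USER);
	assert(error_context(&kernel) == CTX_KERNEL);
}
```

Because the context comes from a field in the record rather than from an
argument, a reader scanning the function signature for pt_regs would not
see that the context is being examined at all.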
Re: [PATCH] x86, MCE: Flesh out when to panic comment
On Mon, May 26, 2014 at 12:51:10PM +0200, Jiri Kosina wrote:
> I think the comment is still not explaining the big part of what the
> discussion was about -- i.e. if it was in kernel context, we always
> panic.

I thought the pointer to mce_severity was enough? People should open an
editor and look at the function and at its gory insanity. :-P

-- 
Regards/Gruss,
    Boris.

Sent from a fat crate under my desk. Formatting is fine.
--
Re: [PATCH] x86, MCE: Flesh out when to panic comment
On Mon, 26 May 2014, Borislav Petkov wrote:

> On Wed, May 21, 2014 at 03:13:54PM -0700, H. Peter Anvin wrote:
> > Seems like a comment would be in order, though.
>
> ---
> From: Borislav Petkov
> Subject: [PATCH] x86, MCE: Flesh out when to panic comment
>
> Recent discussion (link below) showed that it is not really clear what
> appropriate recovery actions we're taking when in a machine check
> exception. Flesh out the comment which was explaining that with more
> detail.
>
> Suggested-by: "H. Peter Anvin"
> Cc: Tony Luck
> Link: http://lkml.kernel.org/r/calcetrxudj8bknf_m-r4o40xln%2bpnz5tozw0p7n4kqo3qng...@mail.gmail.com
> Signed-off-by: Borislav Petkov
> ---
>  arch/x86/kernel/cpu/mcheck/mce.c | 12
>  1 file changed, 8 insertions(+), 4 deletions(-)
>
> diff --git a/arch/x86/kernel/cpu/mcheck/mce.c b/arch/x86/kernel/cpu/mcheck/mce.c
> index 68317c80de7f..9f070339b09f 100644
> --- a/arch/x86/kernel/cpu/mcheck/mce.c
> +++ b/arch/x86/kernel/cpu/mcheck/mce.c
> @@ -1151,10 +1151,14 @@ void do_machine_check(struct pt_regs *regs, long error_code)
>  	no_way_out = worst >= MCE_PANIC_SEVERITY;
>
>  	/*
> -	 * At insane "tolerant" levels we take no action. Otherwise
> -	 * we only die if we have no other choice. For less serious
> -	 * issues we try to recover, or limit damage to the current
> -	 * process.
> +	 * At insane "tolerant" levels we take no action. Otherwise we only die
> +	 * if we have no other choice. Which means, we're definitely going to
> +	 * panic on unrecoverable, uncontainable errors which would otherwise
> +	 * influence machine state and/or cause any type of corruption. The
> +	 * decision what do to is done by mce_severity().
> +	 *
> +	 * For less serious issues we try to recover, or limit damage to the
> +	 * current process.
>  	 */

I think the comment is still not explaining the big part of what the
discussion was about -- i.e. if it was in kernel context, we always panic.

-- 
Jiri Kosina
SUSE Labs
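The policy described by the patched comment can be written down as a small
decision function. This is a sketch under stated assumptions rather than
the code in do_machine_check(): the enum names, the decide() helper, and
the threshold of 3 for an "insane" tolerant level are all illustrative.
Only the no_way_out comparison and the three outcomes (no action, panic,
recover or contain) come from the quoted hunk.

```c
#include <assert.h>

/* Illustrative severity ordering; names and values are assumptions. */
enum severity { SEV_NONE = 0, SEV_KEEP, SEV_UC, SEV_PANIC };
enum action   { ACT_NONE, ACT_RECOVER_OR_CONTAIN, ACT_PANIC };

/* Assumed threshold at which "tolerant" counts as insane. */
#define TOLERANT_INSANE 3

/*
 * Mirrors the comment's three cases:
 *  - insane tolerant level: take no action at all;
 *  - otherwise die only when there is no other choice (no_way_out);
 *  - otherwise try to recover or limit damage to the current process.
 */
enum action decide(enum severity worst, int tolerant)
{
	int no_way_out = worst >= SEV_PANIC;

	if (tolerant >= TOLERANT_INSANE)
		return ACT_NONE;
	if (no_way_out)
		return ACT_PANIC;
	return ACT_RECOVER_OR_CONTAIN;
}

/* Exercises each branch once; aborts if the model disagrees. */
void decide_demo(void)
{
	assert(decide(SEV_PANIC, 1) == ACT_PANIC);
	assert(decide(SEV_PANIC, TOLERANT_INSANE) == ACT_NONE);
	assert(decide(SEV_UC, 1) == ACT_RECOVER_OR_CONTAIN);
}
```

Note that nothing in decide() mentions kernel context: in this model that
question is settled earlier, when the worst severity is computed, which is
the part the comment leaves to mce_severity().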