RE: [PATCH] x86, MCE: Flesh out when to panic comment

2014-05-27 Thread Luck, Tony
> And this tolerant check looks fishy to me:
>
>	if (s->sev >= MCE_UC_SEVERITY && ctx == IN_KERNEL) {
>		if (panic_on_oops || tolerant < 1)
>			return MCE_PANIC_SEVERITY;
>	}
>
> since we set it to 1 by default. But I'll look again on a clear head
> tomorrow - it is too late here.

tolerant level 0 exists - but is somewhat crazy in the opposite direction
from the large values. Look at the comment in mce.c ... level 0
means always panic if you see a UC error.

-Tony
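
A minimal sketch of the check being discussed, to make the role of the
default visible. The enum values and the escalate() helper are hypothetical
stand-ins, not the kernel's definitions; only the if-condition is taken from
the quoted code. With tolerant at its default of 1, the "tolerant < 1" arm
never fires, so escalation depends on panic_on_oops alone; tolerant 0 always
escalates a UC error seen in kernel context.

```c
#include <stdbool.h>

/* Hypothetical stand-ins; the real kernel enums have more members. */
enum severity { MCE_SOME_SEVERITY, MCE_UC_SEVERITY, MCE_PANIC_SEVERITY };
enum context { IN_KERNEL, IN_USER };

/*
 * Mirrors the quoted check: an uncorrected error seen in kernel context
 * is escalated to a panic if panic_on_oops is set or tolerant is 0.
 * Otherwise the severity is returned unchanged.
 */
enum severity escalate(enum severity sev, enum context ctx,
                       bool panic_on_oops, int tolerant)
{
	if (sev >= MCE_UC_SEVERITY && ctx == IN_KERNEL) {
		if (panic_on_oops || tolerant < 1)
			return MCE_PANIC_SEVERITY;
	}
	return sev;
}
```

Calling escalate(MCE_UC_SEVERITY, IN_KERNEL, false, 1) leaves the severity at
MCE_UC_SEVERITY in this sketch, while passing tolerant as 0 yields
MCE_PANIC_SEVERITY.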


Re: [PATCH] x86, MCE: Flesh out when to panic comment

2014-05-27 Thread Borislav Petkov
On Tue, May 27, 2014 at 09:53:56PM +, Luck, Tony wrote:
> It is far from obvious that mce_severity() will always say that an
> error detected inside the kernel will be fatal.

Oh yeah, it needs a good cleansing rewrite, that's for sure.

And this tolerant check looks fishy to me:

	if (s->sev >= MCE_UC_SEVERITY && ctx == IN_KERNEL) {
		if (panic_on_oops || tolerant < 1)
			return MCE_PANIC_SEVERITY;
	}

since we set it to 1 by default. But I'll look again on a clear head
tomorrow - it is too late here.

-- 
Regards/Gruss,
Boris.

Sent from a fat crate under my desk. Formatting is fine.
--
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


RE: [PATCH] x86, MCE: Flesh out when to panic comment

2014-05-27 Thread Luck, Tony
>> I think the comment is still not explaining the big part of what the
>> discussion was about -- i.e. if it was in kernel context, we always
>> panic.
>
> I thought the pointer to mce_severity was enough? People should open an
> editor and look at the function and at its gory insanity. :-P

It is far from obvious that mce_severity() will always say that an error
detected inside the kernel will be fatal.

-Tony


Re: [PATCH] x86, MCE: Flesh out when to panic comment

2014-05-26 Thread Andy Lutomirski
On Mon, May 26, 2014 at 10:51 AM, Borislav Petkov wrote:
> On Mon, May 26, 2014 at 09:47:38AM -0700, Andy Lutomirski wrote:
>> It may be worth at least pointing out that mce_severity looks at
>> whether we faulted from kernel context. I missed that the first time
>> around because mce_severity doesn't take a pt_regs pointer.
>
> Right, but next time we talk about a different aspect which isn't
> commented on in the handler, we'd have to add to it again, until we've
> rewritten the whole function in pseudo code.
>
> I think simply pointing to the function which decides the fate of the
> machine based on the MCE severity is enough - people can then go and
> stare at it, albeit with some struggle.

Fair enough.

>
> --
> Regards/Gruss,
> Boris.
>
> Sent from a fat crate under my desk. Formatting is fine.
> --



-- 
Andy Lutomirski
AMA Capital Management, LLC


Re: [PATCH] x86, MCE: Flesh out when to panic comment

2014-05-26 Thread Borislav Petkov
On Mon, May 26, 2014 at 09:47:38AM -0700, Andy Lutomirski wrote:
> It may be worth at least pointing out that mce_severity looks at
> whether we faulted from kernel context. I missed that the first time
> around because mce_severity doesn't take a pt_regs pointer.

Right, but next time we talk about a different aspect which isn't
commented on in the handler, we'd have to add to it again, until we've
rewritten the whole function in pseudo code.

I think simply pointing to the function which decides the fate of the
machine based on the MCE severity is enough - people can then go and
stare at it, albeit with some struggle.

-- 
Regards/Gruss,
Boris.

Sent from a fat crate under my desk. Formatting is fine.
--


Re: [PATCH] x86, MCE: Flesh out when to panic comment

2014-05-26 Thread Andy Lutomirski
On Mon, May 26, 2014 at 4:06 AM, Borislav Petkov wrote:
> On Mon, May 26, 2014 at 12:51:10PM +0200, Jiri Kosina wrote:
>> I think the comment is still not explaining the big part of what the
>> discussion was about -- i.e. if it was in kernel context, we always
>> panic.
>
> I thought the pointer to mce_severity was enough? People should open an
> editor and look at the function and at its gory insanity. :-P

It may be worth at least pointing out that mce_severity looks at
whether we faulted from kernel context.  I missed that the first time
around because mce_severity doesn't take a pt_regs pointer.

--Andy
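
A small sketch of how a severity routine could learn the context without a
pt_regs pointer. It assumes the context is derived from the privilege bits of
the code segment selector saved in the MCE record; the struct below is a
hypothetical, trimmed-down stand-in for illustration, not the kernel's
struct mce.

```c
#include <stdint.h>

enum context { IN_KERNEL, IN_USER };

/* Hypothetical record: only the field this sketch needs. */
struct mce_record {
	uint8_t cs;	/* code segment selector saved when the MCE hit */
};

/*
 * The low two bits of an x86 segment selector hold the requested
 * privilege level; a value of 3 means the CPU was running user code.
 */
enum context error_context(const struct mce_record *m)
{
	return ((m->cs & 3) == 3) ? IN_USER : IN_KERNEL;
}
```

Because the selector travels inside the record, the function signature gives
no hint that the faulting context is being inspected, which is easy to miss
on a first read.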


Re: [PATCH] x86, MCE: Flesh out when to panic comment

2014-05-26 Thread Borislav Petkov
On Mon, May 26, 2014 at 12:51:10PM +0200, Jiri Kosina wrote:
> I think the comment is still not explaining the big part of what the
> discussion was about -- i.e. if it was in kernel context, we always
> panic.

I thought the pointer to mce_severity was enough? People should open an
editor and look at the function and at its gory insanity. :-P

-- 
Regards/Gruss,
Boris.

Sent from a fat crate under my desk. Formatting is fine.
--


Re: [PATCH] x86, MCE: Flesh out when to panic comment

2014-05-26 Thread Jiri Kosina
On Mon, 26 May 2014, Borislav Petkov wrote:

> On Wed, May 21, 2014 at 03:13:54PM -0700, H. Peter Anvin wrote:
> > Seems like a comment would be in order, though.
> 
> ---
> From: Borislav Petkov 
> Subject: [PATCH] x86, MCE: Flesh out when to panic comment
> 
> Recent discussion (link below) showed that it is not really clear what
> appropriate recovery actions we're taking when in a machine check
> exception. Flesh out the comment which was explaining that with more
> detail.
> 
> Suggested-by: "H. Peter Anvin" 
> Cc: Tony Luck 
> Link: http://lkml.kernel.org/r/calcetrxudj8bknf_m-r4o40xln%2bpnz5tozw0p7n4kqo3qng...@mail.gmail.com
> Signed-off-by: Borislav Petkov 
> ---
>  arch/x86/kernel/cpu/mcheck/mce.c | 12 
>  1 file changed, 8 insertions(+), 4 deletions(-)
> 
> diff --git a/arch/x86/kernel/cpu/mcheck/mce.c b/arch/x86/kernel/cpu/mcheck/mce.c
> index 68317c80de7f..9f070339b09f 100644
> --- a/arch/x86/kernel/cpu/mcheck/mce.c
> +++ b/arch/x86/kernel/cpu/mcheck/mce.c
> @@ -1151,10 +1151,14 @@ void do_machine_check(struct pt_regs *regs, long error_code)
>   no_way_out = worst >= MCE_PANIC_SEVERITY;
>  
>   /*
> -  * At insane "tolerant" levels we take no action. Otherwise
> -  * we only die if we have no other choice. For less serious
> -  * issues we try to recover, or limit damage to the current
> -  * process.
> +  * At insane "tolerant" levels we take no action. Otherwise we only die
> +  * if we have no other choice. Which means, we're definitely going to
> +  * panic on unrecoverable, uncontainable errors which would otherwise
> +  * influence machine state and/or cause any type of corruption. The
> +  * decision what to do is done by mce_severity().
> +  *
> +  * For less serious issues we try to recover, or limit damage to the
> +  * current process.
>    */

I think the comment is still not explaining the big part of what the 
discussion was about -- i.e. if it was in kernel context, we always panic.

-- 
Jiri Kosina
SUSE Labs
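
The flow that the patched comment describes can be sketched roughly as
follows. This is an illustration under stated assumptions, not the handler's
code: the enums, the decide_action() helper, and the threshold of 3 for an
"insane" tolerant level are all hypothetical names and values chosen here.
Only the no_way_out expression comes from the quoted diff.

```c
/* Hypothetical stand-ins; the real kernel enums have more members. */
enum severity { MCE_SOME_SEVERITY, MCE_UC_SEVERITY, MCE_PANIC_SEVERITY };
enum action { ACTION_NONE, ACTION_RECOVER, ACTION_PANIC };

/* Assumed cut-off at which the handler stops taking any action. */
#define INSANE_TOLERANT 3

enum action decide_action(enum severity worst, int tolerant)
{
	/* Same expression as in the quoted hunk. */
	int no_way_out = worst >= MCE_PANIC_SEVERITY;

	/* At insane "tolerant" levels we take no action. */
	if (tolerant >= INSANE_TOLERANT)
		return ACTION_NONE;

	/* Otherwise we only die if we have no other choice. */
	if (no_way_out)
		return ACTION_PANIC;

	/* Less serious issues: try to recover or limit the damage. */
	return ACTION_RECOVER;
}
```

Note that the sketch takes the worst severity as an input. Whether an error
in kernel context ends up at MCE_PANIC_SEVERITY is decided earlier, by
mce_severity(), which is the part the comment leaves implicit.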

