> It's unclear (to lil ole me) what the end-user-visible effects of this
> are.
>
> Could we please have a description of that?  So a) people can
> understand your decision to cc:stable and b) people whose kernels are
> misbehaving can use your description to decide whether your patch might
> fix the issue their users are reporting.

Ingo already applied this to the tip tree, so it's too late to fix the
commit message :-(

A very, very unlucky end user with a system that supports machine check
recovery (Xeon E7, or Xeon-SP-platinum) that has recovered from one or
more uncorrected memory errors (lucky so far) might find a subsequent
uncorrected memory error flagged as fatal, because the machine check
bank that should log the error is already occupied by a log caused by a
speculative access to one of the earlier uncorrected errors (the unlucky
part).

We haven't seen this happen at the Linux OS level, but it is a
theoretical possibility.
[Some BIOSes that map physical memory 1:1 have seen this when doing eMCA
processing for the first error ... as soon as they load the address of
the error from the MCi_ADDR register, they are vulnerable to some
speculative access dereferencing the register that holds the address and
setting the overflow bit in the machine check bank that still holds the
original log].
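
To make the failure mode concrete, here is a rough sketch of the bank
state involved. It is a toy model, not kernel code: struct mc_bank,
bank_log_uc() and bank_recoverable() are made-up names for illustration,
and the overwrite rule is simplified to "an occupied bank keeps its old
log and sets OVER". Only the MCi_STATUS bit positions are taken from the
Intel SDM.

```c
#include <stdbool.h>
#include <stdint.h>

/* MCi_STATUS bit positions as documented in the Intel SDM. */
#define MCI_STATUS_VAL   (1ULL << 63) /* bank holds a valid log */
#define MCI_STATUS_OVER  (1ULL << 62) /* an error arrived while VAL was set */
#define MCI_STATUS_UC    (1ULL << 61) /* logged error was uncorrected */
#define MCI_STATUS_ADDRV (1ULL << 58) /* MCi_ADDR contents are valid */

/* Hypothetical model of one machine check bank (illustration only). */
struct mc_bank {
	uint64_t status; /* stands in for MCi_STATUS */
	uint64_t addr;   /* stands in for MCi_ADDR */
};

/*
 * Simplified model of the hardware logging an uncorrected error.
 * Assumption: if the bank is already occupied, the old log is kept
 * and only the overflow bit is set, so the new address is lost.
 */
static void bank_log_uc(struct mc_bank *b, uint64_t addr)
{
	if (b->status & MCI_STATUS_VAL) {
		b->status |= MCI_STATUS_OVER;
		return;
	}
	b->status = MCI_STATUS_VAL | MCI_STATUS_UC | MCI_STATUS_ADDRV;
	b->addr = addr;
}

/*
 * Simplified recovery check: a valid log without overflow can be
 * acted on; with OVER set the details of the latest error are gone,
 * so this model treats it as fatal.
 */
static bool bank_recoverable(const struct mc_bank *b)
{
	return (b->status & MCI_STATUS_VAL) &&
	       !(b->status & MCI_STATUS_OVER);
}
```

In this model, a speculative access to an already-poisoned address
plays the role of the first bank_log_uc() call. The real error that
follows is the second call: it finds VAL set, flips OVER, and
bank_recoverable() goes from true to false.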

-Tony
