> -----Original Message-----
> From: linux-edac-ow...@vger.kernel.org <linux-edac-
> ow...@vger.kernel.org> On Behalf Of Borislav Petkov
> Sent: Monday, March 26, 2018 3:35 PM
> To: Ghannam, Yazen <yazen.ghan...@amd.com>
> Cc: linux-e...@vger.kernel.org; linux-kernel@vger.kernel.org;
> tony.l...@intel.com; x...@kernel.org
> Subject: Re: [PATCH 2/2] x86/MCE: Always save MCA_{ADDR,MISC,SYND}
> register contents
> 
> On Mon, Mar 26, 2018 at 02:15:26PM -0500, Yazen Ghannam wrote:
> > From: Yazen Ghannam <yazen.ghan...@amd.com>
> >
> > The Intel SDM and AMD APM both state that the contents of the
> MCA_ADDR
> > register should be saved if MCA_STATUS[ADDRV] is set. The same applies
> > to MCA_MISC and MCA_SYND (on SMCA systems) and their respective valid
> > bits.
> >
> > However, the Fam17h Processor Programming Reference states
> > "Error handlers should save the values in MCA_ADDR, MCA_MISC0, and
> > MCA_SYND even if MCA_STATUS[AddrV], MCA_STATUS[MiscV], and
> > MCA_STATUS[SyndV] are zero."
> 
> Well, then you can't remove valid bit checks for older families. This
> sounds like F17h only.
> 
> If so, it better be abstracted away cleanly and not changing the generic
> code.
> 

Sure, I can do that. But I didn't think it was necessary because it doesn't hurt
to read the registers whether or not the valid bits are set.

> >
> > This is to ensure that all MCA state information is collected even if
> > software cannot act upon it (because the valid bits are cleared).
> >
> > So always save the auxiliary MCA register contents even if the valid
> > bits are cleared. This should not affect error processing because
> > software should still check the valid bits before using the register
> > contents for error processing.
> >
> > Also, print MCA_{ADDR,MISC,SYND} even if their valid bits are not set.
> > Printing from EDAC/mce_amd is included here since we want to do this on
> > AMD systems.
> >
> > Signed-off-by: Yazen Ghannam <yazen.ghan...@amd.com>
> > ---
> >  arch/x86/kernel/cpu/mcheck/mce.c     | 23 +++++++----------------
> >  arch/x86/kernel/cpu/mcheck/mce_amd.c | 10 +++-------
> >  drivers/edac/mce_amd.c               | 10 +++-------
> >  3 files changed, 13 insertions(+), 30 deletions(-)
> >
> > diff --git a/arch/x86/kernel/cpu/mcheck/mce.c
> b/arch/x86/kernel/cpu/mcheck/mce.c
> > index 42cf2880d0ed..a556e1cadfbc 100644
> > --- a/arch/x86/kernel/cpu/mcheck/mce.c
> > +++ b/arch/x86/kernel/cpu/mcheck/mce.c
> > @@ -248,19 +248,14 @@ static void __print_mce(struct mce *m)
> >     }
> >
> >     pr_emerg(HW_ERR "TSC %llx ", m->tsc);
> > -   if (m->addr)
> > -           pr_cont("ADDR %llx ", m->addr);
> > -   if (m->misc)
> > -           pr_cont("MISC %llx ", m->misc);
> > +   pr_cont("ADDR %016llx ", m->addr);
> > +   pr_cont("MISC %016llx\n", m->misc);
> 
> You simply can't do this - this is generic code, not AMD only.
> 

I can change this if you'd like. I just thought it would be simpler to
make the change here since we're just printing the values.

Thanks,
Yazen

Reply via email to