RE: [PATCH 2/2] x86/MCE: Always save MCA_{ADDR,MISC,SYND} register contents

2018-03-27 Thread Ghannam, Yazen
> -Original Message-
> From: Luck, Tony <tony.l...@intel.com>
> Sent: Monday, March 26, 2018 4:28 PM
> To: Borislav Petkov <b...@alien8.de>
> Cc: Ghannam, Yazen <yazen.ghan...@amd.com>; linux-
> e...@vger.kernel.org; linux-kernel@vger.kernel.org; x...@kernel.org
> Subject: Re: [PATCH 2/2] x86/MCE: Always save MCA_{ADDR,MISC,SYND}
> register contents
> 
> On Mon, Mar 26, 2018 at 10:09:55PM +0200, Borislav Petkov wrote:
> > On Mon, Mar 26, 2018 at 08:05:37PM +, Ghannam, Yazen wrote:
> > > Sure, I can do that. But I didn't think it was necessary because it 
> > > doesn't
> hurt
> > > to read the registers whether or not the valid bits are set.
> >
> > No, this needs to be AMD-specific because it will confuse people using
> > Intel machines.
> 
> Worse than confusion it may even cause a crash on Intel. Quoting the
> Intel SDM:
> 
>   15.3.2.3 IA32_MCi_ADDR MSRs
> 
>   The IA32_MCi_ADDR MSR contains the address of the code or data memory
>   location that produced the machine- check error if the ADDRV flag in
>   the IA32_MCi_STATUS register is set (see Section 15-7, “IA32_MCi_ADDR
>   MSR”).  The IA32_MCi_ADDR register is either not implemented or
>   contains no address if the ADDRV flag in the IA32_MCi_STATUS register
>   is clear. When not implemented in the processor, all reads and writes
>   to this MSR will cause a general protection exception.
> 
> Ditto for the MISC register.  Please don't read them unless
> the ADDRV/MISCV bits are set in the corresponding STATUS
> register.
> 

Okay, then we won't do this.

On AMD, the registers are implemented if MCA is implemented. They'll just
be read-as-zero if not available for some reason.

Thanks,
Yazen


RE: [PATCH 2/2] x86/MCE: Always save MCA_{ADDR,MISC,SYND} register contents

2018-03-27 Thread Ghannam, Yazen
> -Original Message-
> From: Luck, Tony 
> Sent: Monday, March 26, 2018 4:28 PM
> To: Borislav Petkov 
> Cc: Ghannam, Yazen ; linux-
> e...@vger.kernel.org; linux-kernel@vger.kernel.org; x...@kernel.org
> Subject: Re: [PATCH 2/2] x86/MCE: Always save MCA_{ADDR,MISC,SYND}
> register contents
> 
> On Mon, Mar 26, 2018 at 10:09:55PM +0200, Borislav Petkov wrote:
> > On Mon, Mar 26, 2018 at 08:05:37PM +, Ghannam, Yazen wrote:
> > > Sure, I can do that. But I didn't think it was necessary because it 
> > > doesn't
> hurt
> > > to read the registers whether or not the valid bits are set.
> >
> > No, this needs to be AMD-specific because it will confuse people using
> > Intel machines.
> 
> Worse than confusion it may even cause a crash on Intel. Quoting the
> Intel SDM:
> 
>   15.3.2.3 IA32_MCi_ADDR MSRs
> 
>   The IA32_MCi_ADDR MSR contains the address of the code or data memory
>   location that produced the machine- check error if the ADDRV flag in
>   the IA32_MCi_STATUS register is set (see Section 15-7, “IA32_MCi_ADDR
>   MSR”).  The IA32_MCi_ADDR register is either not implemented or
>   contains no address if the ADDRV flag in the IA32_MCi_STATUS register
>   is clear. When not implemented in the processor, all reads and writes
>   to this MSR will cause a general protection exception.
> 
> Ditto for the MISC register.  Please don't read them unless
> the ADDRV/MISCV bits are set in the corresponding STATUS
> register.
> 

Okay, then we won't do this.

On AMD, the registers are implemented if MCA is implemented. They'll just
be read-as-zero if not available for some reason.

Thanks,
Yazen


Re: [PATCH 2/2] x86/MCE: Always save MCA_{ADDR,MISC,SYND} register contents

2018-03-26 Thread Luck, Tony
On Mon, Mar 26, 2018 at 10:09:55PM +0200, Borislav Petkov wrote:
> On Mon, Mar 26, 2018 at 08:05:37PM +, Ghannam, Yazen wrote:
> > Sure, I can do that. But I didn't think it was necessary because it doesn't 
> > hurt
> > to read the registers whether or not the valid bits are set.
> 
> No, this needs to be AMD-specific because it will confuse people using
> Intel machines.

Worse than confusion it may even cause a crash on Intel. Quoting the
Intel SDM:

  15.3.2.3 IA32_MCi_ADDR MSRs

  The IA32_MCi_ADDR MSR contains the address of the code or data memory
  location that produced the machine- check error if the ADDRV flag in
  the IA32_MCi_STATUS register is set (see Section 15-7, “IA32_MCi_ADDR
  MSR”).  The IA32_MCi_ADDR register is either not implemented or
  contains no address if the ADDRV flag in the IA32_MCi_STATUS register
  is clear. When not implemented in the processor, all reads and writes
  to this MSR will cause a general protection exception.

Ditto for the MISC register.  Please don't read them unless
the ADDRV/MISCV bits are set in the corresponding STATUS
register.

Thanks

-Tony


Re: [PATCH 2/2] x86/MCE: Always save MCA_{ADDR,MISC,SYND} register contents

2018-03-26 Thread Luck, Tony
On Mon, Mar 26, 2018 at 10:09:55PM +0200, Borislav Petkov wrote:
> On Mon, Mar 26, 2018 at 08:05:37PM +, Ghannam, Yazen wrote:
> > Sure, I can do that. But I didn't think it was necessary because it doesn't 
> > hurt
> > to read the registers whether or not the valid bits are set.
> 
> No, this needs to be AMD-specific because it will confuse people using
> Intel machines.

Worse than confusion it may even cause a crash on Intel. Quoting the
Intel SDM:

  15.3.2.3 IA32_MCi_ADDR MSRs

  The IA32_MCi_ADDR MSR contains the address of the code or data memory
  location that produced the machine- check error if the ADDRV flag in
  the IA32_MCi_STATUS register is set (see Section 15-7, “IA32_MCi_ADDR
  MSR”).  The IA32_MCi_ADDR register is either not implemented or
  contains no address if the ADDRV flag in the IA32_MCi_STATUS register
  is clear. When not implemented in the processor, all reads and writes
  to this MSR will cause a general protection exception.

Ditto for the MISC register.  Please don't read them unless
the ADDRV/MISCV bits are set in the corresponding STATUS
register.

Thanks

-Tony


Re: [PATCH 2/2] x86/MCE: Always save MCA_{ADDR,MISC,SYND} register contents

2018-03-26 Thread Borislav Petkov
On Mon, Mar 26, 2018 at 08:05:37PM +, Ghannam, Yazen wrote:
> Sure, I can do that. But I didn't think it was necessary because it doesn't 
> hurt
> to read the registers whether or not the valid bits are set.

No, this needs to be AMD-specific because it will confuse people using
Intel machines.

-- 
Regards/Gruss,
Boris.

Good mailing practices for 400: avoid top-posting and trim the reply.


Re: [PATCH 2/2] x86/MCE: Always save MCA_{ADDR,MISC,SYND} register contents

2018-03-26 Thread Borislav Petkov
On Mon, Mar 26, 2018 at 08:05:37PM +, Ghannam, Yazen wrote:
> Sure, I can do that. But I didn't think it was necessary because it doesn't 
> hurt
> to read the registers whether or not the valid bits are set.

No, this needs to be AMD-specific because it will confuse people using
Intel machines.

-- 
Regards/Gruss,
Boris.

Good mailing practices for 400: avoid top-posting and trim the reply.


RE: [PATCH 2/2] x86/MCE: Always save MCA_{ADDR,MISC,SYND} register contents

2018-03-26 Thread Ghannam, Yazen
> -Original Message-
> From: linux-edac-ow...@vger.kernel.org  ow...@vger.kernel.org> On Behalf Of Borislav Petkov
> Sent: Monday, March 26, 2018 3:35 PM
> To: Ghannam, Yazen <yazen.ghan...@amd.com>
> Cc: linux-e...@vger.kernel.org; linux-kernel@vger.kernel.org;
> tony.l...@intel.com; x...@kernel.org
> Subject: Re: [PATCH 2/2] x86/MCE: Always save MCA_{ADDR,MISC,SYND}
> register contents
> 
> On Mon, Mar 26, 2018 at 02:15:26PM -0500, Yazen Ghannam wrote:
> > From: Yazen Ghannam <yazen.ghan...@amd.com>
> >
> > The Intel SDM and AMD APM both state that the contents of the
> MCA_ADDR
> > register should be saved if MCA_STATUS[ADDRV] is set. The same applies
> > to MCA_MISC and MCA_SYND (on SMCA systems) and their respective valid
> > bits.
> >
> > However, the Fam17h Processor Programming Reference states
> > "Error handlers should save the values in MCA_ADDR, MCA_MISC0, and
> > MCA_SYND even if MCA_STATUS[AddrV], MCA_STATUS[MiscV], and
> > MCA_STATUS[SyndV] are zero."
> 
> Well, then you can't remove valid bit checks for older families. This
> sounds like F17h only.
> 
> If so, it better be abstracted away cleanly and not changing the generic
> code.
> 

Sure, I can do that. But I didn't think it was necessary because it doesn't hurt
to read the registers whether or not the valid bits are set.

> >
> > This is to ensure that all MCA state information is collected even if
> > software cannot act upon it (because the valid bits are cleared).
> >
> > So always save the auxiliary MCA register contents even if the valid
> > bits are cleared. This should not affect error processing because
> > software should still check the valid bits before using the register
> > contents for error processing.
> >
> > Also, print MCA_{ADDR,MISC,SYND} even if their valid bits are not set.
> > Printing from EDAC/mce_amd is included here since we want to do this on
> > AMD systems.
> >
> > Signed-off-by: Yazen Ghannam <yazen.ghan...@amd.com>
> > ---
> >  arch/x86/kernel/cpu/mcheck/mce.c | 23 +++
> >  arch/x86/kernel/cpu/mcheck/mce_amd.c | 10 +++---
> >  drivers/edac/mce_amd.c   | 10 +++---
> >  3 files changed, 13 insertions(+), 30 deletions(-)
> >
> > diff --git a/arch/x86/kernel/cpu/mcheck/mce.c
> b/arch/x86/kernel/cpu/mcheck/mce.c
> > index 42cf2880d0ed..a556e1cadfbc 100644
> > --- a/arch/x86/kernel/cpu/mcheck/mce.c
> > +++ b/arch/x86/kernel/cpu/mcheck/mce.c
> > @@ -248,19 +248,14 @@ static void __print_mce(struct mce *m)
> > }
> >
> > pr_emerg(HW_ERR "TSC %llx ", m->tsc);
> > -   if (m->addr)
> > -   pr_cont("ADDR %llx ", m->addr);
> > -   if (m->misc)
> > -   pr_cont("MISC %llx ", m->misc);
> > +   pr_cont("ADDR %016llx ", m->addr);
> > +   pr_cont("MISC %016llx\n", m->misc);
> 
> You simply can't do this - this is generic code, not AMD only.
> 

I can change this if you'd like. I just thought it would be simpler to
make the change here since we're just printing the values.

Thanks,
Yazen


RE: [PATCH 2/2] x86/MCE: Always save MCA_{ADDR,MISC,SYND} register contents

2018-03-26 Thread Ghannam, Yazen
> -Original Message-
> From: linux-edac-ow...@vger.kernel.org  ow...@vger.kernel.org> On Behalf Of Borislav Petkov
> Sent: Monday, March 26, 2018 3:35 PM
> To: Ghannam, Yazen 
> Cc: linux-e...@vger.kernel.org; linux-kernel@vger.kernel.org;
> tony.l...@intel.com; x...@kernel.org
> Subject: Re: [PATCH 2/2] x86/MCE: Always save MCA_{ADDR,MISC,SYND}
> register contents
> 
> On Mon, Mar 26, 2018 at 02:15:26PM -0500, Yazen Ghannam wrote:
> > From: Yazen Ghannam 
> >
> > The Intel SDM and AMD APM both state that the contents of the
> MCA_ADDR
> > register should be saved if MCA_STATUS[ADDRV] is set. The same applies
> > to MCA_MISC and MCA_SYND (on SMCA systems) and their respective valid
> > bits.
> >
> > However, the Fam17h Processor Programming Reference states
> > "Error handlers should save the values in MCA_ADDR, MCA_MISC0, and
> > MCA_SYND even if MCA_STATUS[AddrV], MCA_STATUS[MiscV], and
> > MCA_STATUS[SyndV] are zero."
> 
> Well, then you can't remove valid bit checks for older families. This
> sounds like F17h only.
> 
> If so, it better be abstracted away cleanly and not changing the generic
> code.
> 

Sure, I can do that. But I didn't think it was necessary because it doesn't hurt
to read the registers whether or not the valid bits are set.

> >
> > This is to ensure that all MCA state information is collected even if
> > software cannot act upon it (because the valid bits are cleared).
> >
> > So always save the auxiliary MCA register contents even if the valid
> > bits are cleared. This should not affect error processing because
> > software should still check the valid bits before using the register
> > contents for error processing.
> >
> > Also, print MCA_{ADDR,MISC,SYND} even if their valid bits are not set.
> > Printing from EDAC/mce_amd is included here since we want to do this on
> > AMD systems.
> >
> > Signed-off-by: Yazen Ghannam 
> > ---
> >  arch/x86/kernel/cpu/mcheck/mce.c | 23 +++
> >  arch/x86/kernel/cpu/mcheck/mce_amd.c | 10 +++---
> >  drivers/edac/mce_amd.c   | 10 +++---
> >  3 files changed, 13 insertions(+), 30 deletions(-)
> >
> > diff --git a/arch/x86/kernel/cpu/mcheck/mce.c
> b/arch/x86/kernel/cpu/mcheck/mce.c
> > index 42cf2880d0ed..a556e1cadfbc 100644
> > --- a/arch/x86/kernel/cpu/mcheck/mce.c
> > +++ b/arch/x86/kernel/cpu/mcheck/mce.c
> > @@ -248,19 +248,14 @@ static void __print_mce(struct mce *m)
> > }
> >
> > pr_emerg(HW_ERR "TSC %llx ", m->tsc);
> > -   if (m->addr)
> > -   pr_cont("ADDR %llx ", m->addr);
> > -   if (m->misc)
> > -   pr_cont("MISC %llx ", m->misc);
> > +   pr_cont("ADDR %016llx ", m->addr);
> > +   pr_cont("MISC %016llx\n", m->misc);
> 
> You simply can't do this - this is generic code, not AMD only.
> 

I can change this if you'd like. I just thought it would be simpler to
make the change here since we're just printing the values.

Thanks,
Yazen


Re: [PATCH 2/2] x86/MCE: Always save MCA_{ADDR,MISC,SYND} register contents

2018-03-26 Thread Borislav Petkov
On Mon, Mar 26, 2018 at 02:15:26PM -0500, Yazen Ghannam wrote:
> From: Yazen Ghannam 
> 
> The Intel SDM and AMD APM both state that the contents of the MCA_ADDR
> register should be saved if MCA_STATUS[ADDRV] is set. The same applies
> to MCA_MISC and MCA_SYND (on SMCA systems) and their respective valid
> bits.
> 
> However, the Fam17h Processor Programming Reference states
> "Error handlers should save the values in MCA_ADDR, MCA_MISC0, and
> MCA_SYND even if MCA_STATUS[AddrV], MCA_STATUS[MiscV], and
> MCA_STATUS[SyndV] are zero."

Well, then you can't remove valid bit checks for older families. This
sounds like F17h only.

If so, it better be abstracted away cleanly and not changing the generic
code.

> 
> This is to ensure that all MCA state information is collected even if
> software cannot act upon it (because the valid bits are cleared).
> 
> So always save the auxiliary MCA register contents even if the valid
> bits are cleared. This should not affect error processing because
> software should still check the valid bits before using the register
> contents for error processing.
> 
> Also, print MCA_{ADDR,MISC,SYND} even if their valid bits are not set.
> Printing from EDAC/mce_amd is included here since we want to do this on
> AMD systems.
> 
> Signed-off-by: Yazen Ghannam 
> ---
>  arch/x86/kernel/cpu/mcheck/mce.c | 23 +++
>  arch/x86/kernel/cpu/mcheck/mce_amd.c | 10 +++---
>  drivers/edac/mce_amd.c   | 10 +++---
>  3 files changed, 13 insertions(+), 30 deletions(-)
> 
> diff --git a/arch/x86/kernel/cpu/mcheck/mce.c 
> b/arch/x86/kernel/cpu/mcheck/mce.c
> index 42cf2880d0ed..a556e1cadfbc 100644
> --- a/arch/x86/kernel/cpu/mcheck/mce.c
> +++ b/arch/x86/kernel/cpu/mcheck/mce.c
> @@ -248,19 +248,14 @@ static void __print_mce(struct mce *m)
>   }
>  
>   pr_emerg(HW_ERR "TSC %llx ", m->tsc);
> - if (m->addr)
> - pr_cont("ADDR %llx ", m->addr);
> - if (m->misc)
> - pr_cont("MISC %llx ", m->misc);
> + pr_cont("ADDR %016llx ", m->addr);
> + pr_cont("MISC %016llx\n", m->misc);

You simply can't do this - this is generic code, not AMD only.

-- 
Regards/Gruss,
Boris.

Good mailing practices for 400: avoid top-posting and trim the reply.


Re: [PATCH 2/2] x86/MCE: Always save MCA_{ADDR,MISC,SYND} register contents

2018-03-26 Thread Borislav Petkov
On Mon, Mar 26, 2018 at 02:15:26PM -0500, Yazen Ghannam wrote:
> From: Yazen Ghannam 
> 
> The Intel SDM and AMD APM both state that the contents of the MCA_ADDR
> register should be saved if MCA_STATUS[ADDRV] is set. The same applies
> to MCA_MISC and MCA_SYND (on SMCA systems) and their respective valid
> bits.
> 
> However, the Fam17h Processor Programming Reference states
> "Error handlers should save the values in MCA_ADDR, MCA_MISC0, and
> MCA_SYND even if MCA_STATUS[AddrV], MCA_STATUS[MiscV], and
> MCA_STATUS[SyndV] are zero."

Well, then you can't remove valid bit checks for older families. This
sounds like F17h only.

If so, it better be abstracted away cleanly and not changing the generic
code.

> 
> This is to ensure that all MCA state information is collected even if
> software cannot act upon it (because the valid bits are cleared).
> 
> So always save the auxiliary MCA register contents even if the valid
> bits are cleared. This should not affect error processing because
> software should still check the valid bits before using the register
> contents for error processing.
> 
> Also, print MCA_{ADDR,MISC,SYND} even if their valid bits are not set.
> Printing from EDAC/mce_amd is included here since we want to do this on
> AMD systems.
> 
> Signed-off-by: Yazen Ghannam 
> ---
>  arch/x86/kernel/cpu/mcheck/mce.c | 23 +++
>  arch/x86/kernel/cpu/mcheck/mce_amd.c | 10 +++---
>  drivers/edac/mce_amd.c   | 10 +++---
>  3 files changed, 13 insertions(+), 30 deletions(-)
> 
> diff --git a/arch/x86/kernel/cpu/mcheck/mce.c 
> b/arch/x86/kernel/cpu/mcheck/mce.c
> index 42cf2880d0ed..a556e1cadfbc 100644
> --- a/arch/x86/kernel/cpu/mcheck/mce.c
> +++ b/arch/x86/kernel/cpu/mcheck/mce.c
> @@ -248,19 +248,14 @@ static void __print_mce(struct mce *m)
>   }
>  
>   pr_emerg(HW_ERR "TSC %llx ", m->tsc);
> - if (m->addr)
> - pr_cont("ADDR %llx ", m->addr);
> - if (m->misc)
> - pr_cont("MISC %llx ", m->misc);
> + pr_cont("ADDR %016llx ", m->addr);
> + pr_cont("MISC %016llx\n", m->misc);

You simply can't do this - this is generic code, not AMD only.

-- 
Regards/Gruss,
Boris.

Good mailing practices for 400: avoid top-posting and trim the reply.


[PATCH 2/2] x86/MCE: Always save MCA_{ADDR,MISC,SYND} register contents

2018-03-26 Thread Yazen Ghannam
From: Yazen Ghannam 

The Intel SDM and AMD APM both state that the contents of the MCA_ADDR
register should be saved if MCA_STATUS[ADDRV] is set. The same applies
to MCA_MISC and MCA_SYND (on SMCA systems) and their respective valid
bits.

However, the Fam17h Processor Programming Reference states
"Error handlers should save the values in MCA_ADDR, MCA_MISC0, and
MCA_SYND even if MCA_STATUS[AddrV], MCA_STATUS[MiscV], and
MCA_STATUS[SyndV] are zero."

This is to ensure that all MCA state information is collected even if
software cannot act upon it (because the valid bits are cleared).

So always save the auxiliary MCA register contents even if the valid
bits are cleared. This should not affect error processing because
software should still check the valid bits before using the register
contents for error processing.

Also, print MCA_{ADDR,MISC,SYND} even if their valid bits are not set.
Printing from EDAC/mce_amd is included here since we want to do this on
AMD systems.

Signed-off-by: Yazen Ghannam 
---
 arch/x86/kernel/cpu/mcheck/mce.c | 23 +++
 arch/x86/kernel/cpu/mcheck/mce_amd.c | 10 +++---
 drivers/edac/mce_amd.c   | 10 +++---
 3 files changed, 13 insertions(+), 30 deletions(-)

diff --git a/arch/x86/kernel/cpu/mcheck/mce.c b/arch/x86/kernel/cpu/mcheck/mce.c
index 42cf2880d0ed..a556e1cadfbc 100644
--- a/arch/x86/kernel/cpu/mcheck/mce.c
+++ b/arch/x86/kernel/cpu/mcheck/mce.c
@@ -248,19 +248,14 @@ static void __print_mce(struct mce *m)
}
 
pr_emerg(HW_ERR "TSC %llx ", m->tsc);
-   if (m->addr)
-   pr_cont("ADDR %llx ", m->addr);
-   if (m->misc)
-   pr_cont("MISC %llx ", m->misc);
+   pr_cont("ADDR %016llx ", m->addr);
+   pr_cont("MISC %016llx\n", m->misc);
 
if (mce_flags.smca) {
-   if (m->synd)
-   pr_cont("SYND %llx ", m->synd);
-   if (m->ipid)
-   pr_cont("IPID %llx ", m->ipid);
+   pr_emerg(HW_ERR "IPID %016llx ", m->ipid);
+   pr_cont("SYND %016llx\n", m->synd);
}
 
-   pr_cont("\n");
/*
 * Note this output is parsed by external tools and old fields
 * should not be changed.
@@ -639,12 +634,10 @@ static struct notifier_block mce_default_nb = {
  */
 static void mce_read_aux(struct mce *m, int i)
 {
-   if (m->status & MCI_STATUS_MISCV)
-   m->misc = mce_rdmsrl(msr_ops.misc(i));
+   m->misc = mce_rdmsrl(msr_ops.misc(i));
+   m->addr = mce_rdmsrl(msr_ops.addr(i));
 
if (m->status & MCI_STATUS_ADDRV) {
-   m->addr = mce_rdmsrl(msr_ops.addr(i));
-
/*
 * Mask the reported address by the reported granularity.
 */
@@ -667,9 +660,7 @@ static void mce_read_aux(struct mce *m, int i)
 
if (mce_flags.smca) {
m->ipid = mce_rdmsrl(MSR_AMD64_SMCA_MCx_IPID(i));
-
-   if (m->status & MCI_STATUS_SYNDV)
-   m->synd = mce_rdmsrl(MSR_AMD64_SMCA_MCx_SYND(i));
+   m->synd = mce_rdmsrl(MSR_AMD64_SMCA_MCx_SYND(i));
}
 }
 
diff --git a/arch/x86/kernel/cpu/mcheck/mce_amd.c 
b/arch/x86/kernel/cpu/mcheck/mce_amd.c
index f7666eef4a87..5a37ae704578 100644
--- a/arch/x86/kernel/cpu/mcheck/mce_amd.c
+++ b/arch/x86/kernel/cpu/mcheck/mce_amd.c
@@ -799,13 +799,12 @@ static void __log_error(unsigned int bank, u64 status, 
u64 addr, u64 misc)
mce_setup();
 
m.status = status;
+   m.addr   = addr;
m.misc   = misc;
m.bank   = bank;
m.tsc= rdtsc();
 
if (m.status & MCI_STATUS_ADDRV) {
-   m.addr = addr;
-
/*
 * Extract [55:] where lsb is the least significant
 * *valid* bit of the address bits.
@@ -819,9 +818,7 @@ static void __log_error(unsigned int bank, u64 status, u64 
addr, u64 misc)
 
if (mce_flags.smca) {
rdmsrl(MSR_AMD64_SMCA_MCx_IPID(bank), m.ipid);
-
-   if (m.status & MCI_STATUS_SYNDV)
-   rdmsrl(MSR_AMD64_SMCA_MCx_SYND(bank), m.synd);
+   rdmsrl(MSR_AMD64_SMCA_MCx_SYND(bank), m.synd);
}
 
mce_log();
@@ -849,8 +846,7 @@ _log_error_bank(unsigned int bank, u32 msr_stat, u32 
msr_addr, u64 misc)
if (!(status & MCI_STATUS_VAL))
return false;
 
-   if (status & MCI_STATUS_ADDRV)
-   rdmsrl(msr_addr, addr);
+   rdmsrl(msr_addr, addr);
 
__log_error(bank, status, addr, misc);
 
diff --git a/drivers/edac/mce_amd.c b/drivers/edac/mce_amd.c
index 2ab4d61ee47e..004425cc8ddf 100644
--- a/drivers/edac/mce_amd.c
+++ b/drivers/edac/mce_amd.c
@@ -990,16 +990,12 @@ amd_decode_mce(struct notifier_block *nb, unsigned long 
val, void *data)
 
pr_cont("]: 0x%016llx\n", m->status);
 
-   if (m->status & MCI_STATUS_ADDRV)
-   

[PATCH 2/2] x86/MCE: Always save MCA_{ADDR,MISC,SYND} register contents

2018-03-26 Thread Yazen Ghannam
From: Yazen Ghannam 

The Intel SDM and AMD APM both state that the contents of the MCA_ADDR
register should be saved if MCA_STATUS[ADDRV] is set. The same applies
to MCA_MISC and MCA_SYND (on SMCA systems) and their respective valid
bits.

However, the Fam17h Processor Programming Reference states
"Error handlers should save the values in MCA_ADDR, MCA_MISC0, and
MCA_SYND even if MCA_STATUS[AddrV], MCA_STATUS[MiscV], and
MCA_STATUS[SyndV] are zero."

This is to ensure that all MCA state information is collected even if
software cannot act upon it (because the valid bits are cleared).

So always save the auxiliary MCA register contents even if the valid
bits are cleared. This should not affect error processing because
software should still check the valid bits before using the register
contents for error processing.

Also, print MCA_{ADDR,MISC,SYND} even if their valid bits are not set.
Printing from EDAC/mce_amd is included here since we want to do this on
AMD systems.

Signed-off-by: Yazen Ghannam 
---
 arch/x86/kernel/cpu/mcheck/mce.c | 23 +++
 arch/x86/kernel/cpu/mcheck/mce_amd.c | 10 +++---
 drivers/edac/mce_amd.c   | 10 +++---
 3 files changed, 13 insertions(+), 30 deletions(-)

diff --git a/arch/x86/kernel/cpu/mcheck/mce.c b/arch/x86/kernel/cpu/mcheck/mce.c
index 42cf2880d0ed..a556e1cadfbc 100644
--- a/arch/x86/kernel/cpu/mcheck/mce.c
+++ b/arch/x86/kernel/cpu/mcheck/mce.c
@@ -248,19 +248,14 @@ static void __print_mce(struct mce *m)
}
 
pr_emerg(HW_ERR "TSC %llx ", m->tsc);
-   if (m->addr)
-   pr_cont("ADDR %llx ", m->addr);
-   if (m->misc)
-   pr_cont("MISC %llx ", m->misc);
+   pr_cont("ADDR %016llx ", m->addr);
+   pr_cont("MISC %016llx\n", m->misc);
 
if (mce_flags.smca) {
-   if (m->synd)
-   pr_cont("SYND %llx ", m->synd);
-   if (m->ipid)
-   pr_cont("IPID %llx ", m->ipid);
+   pr_emerg(HW_ERR "IPID %016llx ", m->ipid);
+   pr_cont("SYND %016llx\n", m->synd);
}
 
-   pr_cont("\n");
/*
 * Note this output is parsed by external tools and old fields
 * should not be changed.
@@ -639,12 +634,10 @@ static struct notifier_block mce_default_nb = {
  */
 static void mce_read_aux(struct mce *m, int i)
 {
-   if (m->status & MCI_STATUS_MISCV)
-   m->misc = mce_rdmsrl(msr_ops.misc(i));
+   m->misc = mce_rdmsrl(msr_ops.misc(i));
+   m->addr = mce_rdmsrl(msr_ops.addr(i));
 
if (m->status & MCI_STATUS_ADDRV) {
-   m->addr = mce_rdmsrl(msr_ops.addr(i));
-
/*
 * Mask the reported address by the reported granularity.
 */
@@ -667,9 +660,7 @@ static void mce_read_aux(struct mce *m, int i)
 
if (mce_flags.smca) {
m->ipid = mce_rdmsrl(MSR_AMD64_SMCA_MCx_IPID(i));
-
-   if (m->status & MCI_STATUS_SYNDV)
-   m->synd = mce_rdmsrl(MSR_AMD64_SMCA_MCx_SYND(i));
+   m->synd = mce_rdmsrl(MSR_AMD64_SMCA_MCx_SYND(i));
}
 }
 
diff --git a/arch/x86/kernel/cpu/mcheck/mce_amd.c 
b/arch/x86/kernel/cpu/mcheck/mce_amd.c
index f7666eef4a87..5a37ae704578 100644
--- a/arch/x86/kernel/cpu/mcheck/mce_amd.c
+++ b/arch/x86/kernel/cpu/mcheck/mce_amd.c
@@ -799,13 +799,12 @@ static void __log_error(unsigned int bank, u64 status, 
u64 addr, u64 misc)
mce_setup();
 
m.status = status;
+   m.addr   = addr;
m.misc   = misc;
m.bank   = bank;
m.tsc= rdtsc();
 
if (m.status & MCI_STATUS_ADDRV) {
-   m.addr = addr;
-
/*
 * Extract [55:] where lsb is the least significant
 * *valid* bit of the address bits.
@@ -819,9 +818,7 @@ static void __log_error(unsigned int bank, u64 status, u64 
addr, u64 misc)
 
if (mce_flags.smca) {
rdmsrl(MSR_AMD64_SMCA_MCx_IPID(bank), m.ipid);
-
-   if (m.status & MCI_STATUS_SYNDV)
-   rdmsrl(MSR_AMD64_SMCA_MCx_SYND(bank), m.synd);
+   rdmsrl(MSR_AMD64_SMCA_MCx_SYND(bank), m.synd);
}
 
mce_log();
@@ -849,8 +846,7 @@ _log_error_bank(unsigned int bank, u32 msr_stat, u32 
msr_addr, u64 misc)
if (!(status & MCI_STATUS_VAL))
return false;
 
-   if (status & MCI_STATUS_ADDRV)
-   rdmsrl(msr_addr, addr);
+   rdmsrl(msr_addr, addr);
 
__log_error(bank, status, addr, misc);
 
diff --git a/drivers/edac/mce_amd.c b/drivers/edac/mce_amd.c
index 2ab4d61ee47e..004425cc8ddf 100644
--- a/drivers/edac/mce_amd.c
+++ b/drivers/edac/mce_amd.c
@@ -990,16 +990,12 @@ amd_decode_mce(struct notifier_block *nb, unsigned long 
val, void *data)
 
pr_cont("]: 0x%016llx\n", m->status);
 
-   if (m->status & MCI_STATUS_ADDRV)
-   pr_emerg(HW_ERR "Error Addr: 0x%016llx\n", m->addr);
+