On Tue, Apr 04, 2017 at 12:24:31PM -0500, Yazen Ghannam wrote:
> From: Yazen Ghannam <yazen.ghan...@amd.com>
> 
> We have support for the new SMCA MCA_DE{STAT,ADDR} registers in Linux. So
> we've used these registers in place of MCA_{STATUS,ADDR} on SMCA systems.
> However, the guidance for current implementations of SMCA is to continue
> using MCA_{STATUS,ADDR} and to use MCA_DE{STAT,ADDR} only if a Deferred
> error was not found in the former registers. This also means we shouldn't
> clear MCA_CONFIG[LogDeferredInMcaStat].

Does that mean we should trust what BIOS has set it to?

> 
> Redo the AMD Deferred error interrupt handler to follow the guidance for
> current SMCA systems. Also, don't break after finding the first error.
> 
> Rework __log_error() for clarity.
> 
> Don't clear MCA_CONFIG[LogDeferredInMcaStat] during AMD mcheck init.
> 
> Signed-off-by: Yazen Ghannam <yazen.ghan...@amd.com>
> ---

...

> @@ -832,25 +821,29 @@ asmlinkage __visible void __irq_entry 
> smp_trace_deferred_error_interrupt(void)
>       exiting_ack_irq();
>  }
>  
> +static inline bool is_deferred_error(u64 status)
> +{
> +     return ((status & MCI_STATUS_VAL) && (status & MCI_STATUS_DEFERRED));
> +}
> +
>  /* APIC interrupt handler for deferred errors */
>  static void amd_deferred_error_interrupt(void)
>  {
>       unsigned int bank;
> -     u32 msr_status;
>       u64 status;
>  
>       for (bank = 0; bank < mca_cfg.banks; ++bank) {
> -             msr_status = (mce_flags.smca) ? MSR_AMD64_SMCA_MCx_DESTAT(bank)
> -                                           : msr_ops.status(bank);
> +             rdmsrl(msr_ops.status(bank), status);
>  
> -             rdmsrl(msr_status, status);
> +             if (is_deferred_error(status)) {
> +                     __log_error_deferred(bank, false);
>  
> -             if (!(status & MCI_STATUS_VAL) ||
> -                 !(status & MCI_STATUS_DEFERRED))
> -                     continue;
> +             } else if (mce_flags.smca) {
> +                     rdmsrl(MSR_AMD64_SMCA_MCx_DESTAT(bank), status);
>  
> -             __log_error(bank, true, false, 0);
> -             break;
> +                     if (is_deferred_error(status))
> +                             __log_error_deferred(bank, true);
> +             }

The diff is very hard to parse, lemme paste the whole resulting function
after the patch:

> /* APIC interrupt handler for deferred errors */
> static void amd_deferred_error_interrupt(void)
> {
>         unsigned int bank;
>         u64 status;
> 
>         for (bank = 0; bank < mca_cfg.banks; ++bank) {
>                 rdmsrl(msr_ops.status(bank), status);
> 
>                 if (is_deferred_error(status)) {
>                         __log_error_deferred(bank, false);
> 
>                 } else if (mce_flags.smca) {
>                         rdmsrl(MSR_AMD64_SMCA_MCx_DESTAT(bank), status);
> 
>                         if (is_deferred_error(status))
>                                 __log_error_deferred(bank, true);

So this doesn't sound like the guidance: if we should fallback to the
DE* versions only if a deferred error wasn't found in the usual regs,
the logic should be:

static void __log_error_deferred(unsigned int bank)
{
        u32 msr_stat = msr_ops.status(bank);
        u32 msr_addr = msr_ops.addr(bank);

        /*
         * __log_error() needs to return true to say it has logged successfully 
or false
         * if it hasn't.
         */
        if (__log_error(bank, msr_stat, msr_addr, 0))
                goto out;

        if (!mce_flags.smca)
                return;

        /*
         * Couldn't log a deferred error from the usual regs, fallback to DE*
         * variants.
         */
        msr_stat = MSR_AMD64_SMCA_MCx_DESTAT(bank);
        msr_addr = MSR_AMD64_SMCA_MCx_DEADDR(bank);

        __log_error(bank, msr_stat, msr_addr, 0);

out:
        /* We should still clear MCA_DESTAT even if we used MCA_STATUS. */
        if (mce_flags.smca)
                wrmsrl(MSR_AMD64_SMCA_MCx_DESTAT(bank), 0);
}

and this way you don't need to wiggle around that use_smca_destat thing.

-- 
Regards/Gruss,
    Boris.

Good mailing practices for 400: avoid top-posting and trim the reply.

Reply via email to