Hi Borislav,

On 1/10/2019 3:38 AM, Borislav Petkov wrote:
> On Mon, Jan 07, 2019 at 10:37:24AM +0000, S, Shirish wrote:
>> This patch adds threshold quirk applicable for family 15
> Same issue with "This patch" here.
Agree, have avoided the phrase in both patches that originate from this 
discussion.
>
>> in resume path as well, since mce_amd_feature_init()
>> does not have quirks applied when originating from mce_syscore_resume(),
>> resulting in the below message at every successful resume:
>>
>> "[Firmware Bug]: cpu 0, invalid threshold interrupt offset ..."
>>
>> Signed-off-by: Shirish S <shiris...@amd.com>
>> ---
>>   arch/x86/kernel/cpu/mce/amd.c | 34 ++++++++++++++++++++++++++++++++++
>>   1 file changed, 34 insertions(+)
>>
>> diff --git a/arch/x86/kernel/cpu/mce/amd.c b/arch/x86/kernel/cpu/mce/amd.c
>> index 89298c8..27cbf66 100644
>> --- a/arch/x86/kernel/cpu/mce/amd.c
>> +++ b/arch/x86/kernel/cpu/mce/amd.c
>> @@ -545,6 +545,34 @@ prepare_threshold_block(unsigned int bank, unsigned int 
>> block, u32 addr,
>>      return offset;
>>   }
>>   
>> +void disable_err_thresholding(struct cpuinfo_x86 *c)
>> +{
>> +    int i;
>> +    u64 hwcr;
>> +    bool need_toggle;
>> +    u32 msrs[] = {
>> +            0x00000413, /* MC4_MISC0 */
>> +            0xc0000408, /* MC4_MISC1 */
>> +    };
>> +
>> +    if (c->x86_model >= 0x10 && c->x86_model <= 0x7f) {
> You can save yourself an indentation level by reversing the logic here:
>
>       if (c->x86 != 0x15)
>               return;
>
> Also, I'm wondering if you simply can't do
>
>       if (c->x86_model < 0x10)
>               return;
>
> The assumption being that all the models - even after 0x7f - are highly
> unlikely to get MC4_MISC thresholding supported, all of a sudden. Might
> wanna run it by HW guys first though.

Perhaps no need to check the model altogether, this is applicable to 
entire family 15, since there is no 8th Gen.

>
>> +            rdmsrl(MSR_K7_HWCR, hwcr);
>> +
>> +            /* McStatusWrEn has to be set */
>> +            need_toggle = !(hwcr & BIT(18));
>> +
>> +            if (need_toggle)
>> +                    wrmsrl(MSR_K7_HWCR, hwcr | BIT(18));
>> +
>> +            /* Clear CntP bit safely */
>> +            for (i = 0; i < ARRAY_SIZE(msrs); i++)
>> +                    msr_clear_bit(msrs[i], 62);
>> +
>> +            /* restore old settings */
>> +            if (need_toggle)
>> +                    wrmsrl(MSR_K7_HWCR, hwcr);
>> +    }
> So you copied the same code from __mcheck_cpu_apply_quirks().
>
> No.
>
> In a first patch, you carve that CntP clearing code in a separate
> function disable_err_thresholding() like you've done before.
Agree, first patch here:  https://lkml.org/lkml/2019/1/10/70
> Then, in a second patch, you call it from mce/amd.c and you move all the
> family/model checks inside the function so that you have a sole
>
>       disable_err_thresholding();
>
> calls where you need them.
>
> You don't have to pass in struct cpuinfo_x86 *c - you can use
> boot_cpu_data in the function.
Agree, second patch here: https://lkml.org/lkml/2019/1/10/72
>
>> +}
>>   /* cpu init entry point, called from mce.c with preempt off */
>>   void mce_amd_feature_init(struct cpuinfo_x86 *c)
>>   {
>> @@ -552,6 +580,12 @@ void mce_amd_feature_init(struct cpuinfo_x86 *c)
>>      unsigned int bank, block, cpu = smp_processor_id();
>>      int offset = -1;
>>   
>> +    /* Disable error thresholding bank in S3 resume path as well,
> What S3 resume path? That's the CPU init path.

I have also put in the commit message, but for discussion sake, there 
are 2 code paths leading to mce_amd_feature_init() as below.
1) S5 -> S0: (boot)
secondary_startup_64 -> start_kernel -> identify_boot_cpu ->
identify_cpu ->
mcheck_cpu_init (calls  __mcheck_cpu_apply_quirks before) ->
mce_amd_feature_init

2) S3 -> S0: (resume)
syscore_resume -> mce_syscore_resume -> mce_amd_feature_init

I have summarized this sequence after capturing the call trace in both 
scenarios.

Its clear that the quirks are not applied during S3-> S0 transition, 
hence the patch.

I cannot move the quirk to mce_amd_feature_init() since amd.c is guarded 
by CONFIG_X86_MCE_AMD and the quirk should be applicable in case the 
config is off via

__mcheck_cpu_apply_quirks() at boot.

> Also, kernel comments style is:
>
>       /*
>        * A sentence ending with a full-stop.
>        * Another sentence. ...
>        * More sentences. ...
>        */
>
Agree, followed it in the above mentioned patches.

Thanks.

Regards,

Shirish S

> Good luck!
>
> :-)
>
-- 
Regards,
Shirish S

Reply via email to