On 7/27/2022 12:24 PM, Jane Chu wrote: > On 7/27/2022 11:56 AM, Dan Williams wrote: >> Jane Chu wrote: >>> With Commit 7917f9cdb503 ("acpi/nfit: rely on mce->misc to determine >>> poison granularity") that changed nfit_handle_mce() callback to report >>> badrange according to 1ULL << MCI_MISC_ADDR_LSB(mce->misc), it's been >>> discovered that the mce->misc LSB field is 0x1000 bytes, hence injecting >>> 2 back-to-back poisons and the driver ends up logging 8 badblocks, >>> because 0x1000 bytes is 8 512-byte. >>> >>> Dan Williams noticed that apei_mce_report_mem_error() hardcode >>> the LSB field to PAGE_SHIFT instead of consulting the input >>> struct cper_sec_mem_err record. So change to rely on hardware whenever >>> support is available. >>> >>> Link: >>> https://lore.kernel.org/r/7ed50fd8-521e-cade-77b1-738b8bfb8...@oracle.com >>> >>> >>> Reviewed-by: Dan Williams <dan.j.willi...@intel.com> >>> Signed-off-by: Jane Chu <jane....@oracle.com> >>> --- >>> arch/x86/kernel/cpu/mce/apei.c | 14 +++++++++++++- >>> 1 file changed, 13 insertions(+), 1 deletion(-) >>> >>> diff --git a/arch/x86/kernel/cpu/mce/apei.c >>> b/arch/x86/kernel/cpu/mce/apei.c >>> index 717192915f28..26d63818b2de 100644 >>> --- a/arch/x86/kernel/cpu/mce/apei.c >>> +++ b/arch/x86/kernel/cpu/mce/apei.c >>> @@ -29,15 +29,27 @@ >>> void apei_mce_report_mem_error(int severity, struct >>> cper_sec_mem_err *mem_err) >>> { >>> struct mce m; >>> + int grain = PAGE_SHIFT; >>> if (!(mem_err->validation_bits & CPER_MEM_VALID_PA)) >>> return; >>> + /* >>> + * Even if the ->validation_bits are set for address mask, >>> + * to be extra safe, check and reject an error radius '0', >>> + * and fallback to the default page size. >>> + */ >>> + if (mem_err->validation_bits & CPER_MEM_VALID_PA_MASK) { >>> + grain = ~mem_err->physical_addr_mask + 1; >>> + if (grain == 1) >>> + grain = PAGE_SHIFT; >> >> Wait, if @grain is the number of bits to mask off the address, shouldn't >> this be something like: >> >> grain = min_not_zero(PAGE_SHIFT, >> hweight64(~mem_err->physical_addr_mask)); > > I see. I guess what you meant is > grain = min(PAGE_SHIFT, (1 + hweight64(~mem_err->physical_addr_mask)));
Sorry, take that back, it won't work either. -jane > so that in the pmem poison case, 'grain' would be 8, not 7. > > thanks, > -jane > >> >> ...? >