On 7/15/2022 9:50 PM, Dan Williams wrote: > Jane Chu wrote: >> With Commit 7917f9cdb503 ("acpi/nfit: rely on mce->misc to determine >> poison granularity") that changed nfit_handle_mce() callback to report >> badrange according to 1ULL << MCI_MISC_ADDR_LSB(mce->misc), it's been >> discovered that the mce->misc LSB field is 0x1000 bytes, hence injecting >> 2 back-to-back poisons and the driver ends up logging 8 badblocks, >> because 0x1000 bytes is 8 512-byte. >> >> Dan Williams noticed that apei_mce_report_mem_error() hardcode >> the LSB field to PAGE_SHIFT instead of consulting the input >> struct cper_sec_mem_err record. So change to rely on hardware whenever >> support is available. >> >> v1: https://lkml.org/lkml/2022/7/15/1040 > > This should be "Link:" and it should reference lore.kernel.org by > msg-id. Lore is maintained by LF infrastructure and the message-id can > be used to search lore.kernel.org mirrors even if the LF infra is down. > > Link: > https://lore.kernel.org/r/7ed50fd8-521e-cade-77b1-738b8bfb8...@oracle.com >
Got it, thanks! >> Signed-off-by: Jane Chu <jane....@oracle.com> >> --- >> arch/x86/kernel/cpu/mce/apei.c | 6 +++++- >> 1 file changed, 5 insertions(+), 1 deletion(-) >> >> diff --git a/arch/x86/kernel/cpu/mce/apei.c b/arch/x86/kernel/cpu/mce/apei.c >> index 717192915f28..a4d5893632e0 100644 >> --- a/arch/x86/kernel/cpu/mce/apei.c >> +++ b/arch/x86/kernel/cpu/mce/apei.c >> @@ -29,6 +29,7 @@ >> void apei_mce_report_mem_error(int severity, struct cper_sec_mem_err >> *mem_err) >> { >> struct mce m; >> + int lsb; >> >> if (!(mem_err->validation_bits & CPER_MEM_VALID_PA)) >> return; >> @@ -37,7 +38,10 @@ void apei_mce_report_mem_error(int severity, struct >> cper_sec_mem_err *mem_err) >> m.bank = -1; >> /* Fake a memory read error with unknown channel */ >> m.status = MCI_STATUS_VAL | MCI_STATUS_EN | MCI_STATUS_ADDRV | >> MCI_STATUS_MISCV | 0x9f; >> - m.misc = (MCI_MISC_ADDR_PHYS << 6) | PAGE_SHIFT; >> + lsb = __builtin_ffs(mem_err->physical_addr_mask) - 1; > > Just use the kernel's __ffs64() which does not require the substraction Will do. > fixup, and you can assume that physical_address_mask is non-zero just > like trace_extlog_mem_event() does. I guess to me, a more convincing evidence is /* Error grain */ if (mem_err->validation_bits & CPER_MEM_VALID_PA_MASK) e->grain = ~mem_err->physical_addr_mask + 1; in ghes_edac_report_mem_error(). > >> + if (lsb <= 0) >> + lsb = PAGE_SHIFT; > > This fixup can then be removed as well. > > After the above comments are addressed you can add: > > Reviewed-by: Dan Williams <dan.j.willi...@intel.com> Thanks! -jane