Re: [PATCH v2] arch/cacheflush: Introduce flush_all_caches()
Davidlohr Bueso wrote:
> On Mon, 22 Aug 2022, Dan Williams wrote:
>
> >Davidlohr Bueso wrote:
> >> On Sun, 21 Aug 2022, Christoph Hellwig wrote:
> >>
> >> >On Fri, Aug 19, 2022 at 10:10:24AM -0700, Davidlohr Bueso wrote:
> >> >> index b192d917a6d0..ac4d4fd4e508 100644
> >> >> --- a/arch/x86/include/asm/cacheflush.h
> >> >> +++ b/arch/x86/include/asm/cacheflush.h
> >> >> @@ -10,4 +10,8 @@
> >> >>
> >> >>  void clflush_cache_range(void *addr, unsigned int size);
> >> >>
> >> >> +/* see comments in the stub version */
> >> >> +#define flush_all_caches() \
> >> >> +	do { wbinvd_on_all_cpus(); } while(0)
> >> >
> >> >Yikes. This is just a horrible, horrible name and placement for a bad
> >> >hack that should have no generic relevance.
> >>
> >> Why does this have no generic relevance? There's already been discussions
> >> on how much wbinv is hated[0].
> >>
> >> >Please fix up the naming to make it clear that this function is for a
> >> >very specific nvdimm use case, and move it to a nvdimm-specific header
> >> >file.
> >>
> >> Do you have any suggestions for a name? And, as the changelog describes,
> >> this is not nvdimm specific anymore, and the whole point of all this is
> >> volatile memory components for cxl, hence nvdimm namespace is bogus.
> >>
> >> [0] https://lore.kernel.org/all/yvtc2u1j%2fqip8...@worktop.programming.kicks-ass.net/
> >
> >While it is not nvdimm specific anymore, it's still specific to "memory
> >devices that can bulk invalidate a physical address space". I.e. it's
> >not as generic as its location in arch/x86/include/asm/cacheflush.h
> >would imply. So, similar to arch_invalidate_pmem(), lets keep it in a
> >device-driver-specific header file, because hch and peterz are right, we
> >need to make this much more clear that it is not for general
> >consumption.
>
> Fine, I won't argue - although I don't particularly agree, at least wrt
> the naming. Imo my naming does _exactly_ what it should do and is much
> easier to read than arch_has_flush_memregion(), which is counterintuitive
> when we are in fact flushing everything. Nor does this make anything
> clearer about virt vs physical mappings (except that it is no longer
> associated with cacheflush). But, arm's cacheflush.h and its rare
> broken-cache archs excepted, users get way too much credit in their
> namespace usage.
>
> But yes, there is no doubt that my version is more inviting than it
> should be, which made me think of naming it flush_all_caches_careful()
> so the user is forced to at least check the function (or one would hope).

So I'm not married to arch_has_flush_memregion(), or even to including
the physical address range to flush; the only aspect of the prototype I
want to see incorporated is something about the target / motivation for
the flush. "flush_all_caches_careful()" says nothing about what the API
is being "careful" about. It reminds me of Linus' comments on
memcpy_mcsafe():

https://lore.kernel.org/all/CAHk-=wh1SPyuGkTkQESsacwKTpjWd=_-KwoCK5o=suc3ymd...@mail.gmail.com/

    "Naming - like comments - shouldn't be about what some implementation
     is, but about the concept."

So "memregion" was meant to represent a memory-device-backed physical
address range, but that association may only be in my own head. How
about something even more explicit like "flush_after_memdev_invalidate()",
where someone would feel icky using it for anything other than what we
have been talking about in this thread.

> Anyway, I'll send a new version based on the below - I particularly agree
> with the hypervisor bits.

Ok, just one more lap around the bikeshed track, but I think we're
converging.
RE: [PATCH v7] x86/mce: retrieve poison range from hardware
> What I'm missing from this text here is, what *is* the mce->misc LSB
> field in human speak? What does that field denote?

The SDM says:

    Recoverable Address LSB (bits 5:0): The lowest valid recoverable
    address bit. Indicates the position of the least significant bit
    (LSB) of the recoverable error address. For example, if the
    processor logs bits [43:9] of the address, the LSB sub-field in
    IA32_MCi_MISC is 01001b (9 decimal). For this example, bits [8:0]
    of the recoverable error address in IA32_MCi_ADDR should be ignored.

So in human speak: "how much data did you lose". "6" is a common value,
saying a cache line (1 << 6 == 64 bytes) was lost. Sometimes you see
"12" (1 << 12 == 4096) for a whole page lost.

-Tony
Re: [PATCH v7] x86/mce: retrieve poison range from hardware
>>> I suppose this wants to go upstream via the tree the bug came from (NVDIMM
>>> tree? ACPI tree?), or should we pick it up into the x86 tree?
>>
>> No idea. Maintainers?
>
> There's no real NVDIMM dependency here, just a general cleanup of how
> APEI error granularities are managed. So I think it is appropriate for
> this to go through the x86 tree via the typical path for mce related
> topics.

+ Huang, Ying.

x86 maintainers,
Please let me know if you need another revision.

thanks,
-jane

On 8/8/2022 4:30 PM, Dan Williams wrote:
> Jane Chu wrote:
>> On 8/3/2022 1:53 AM, Ingo Molnar wrote:
>>>
>>> * Jane Chu wrote:
>>>
>>>> With Commit 7917f9cdb503 ("acpi/nfit: rely on mce->misc to determine
>>>
>>> s/Commit/commit
>>
>> Maintainers,
>> Would you prefer a v8, or take care of the comment upon accepting the
>> patch?
>>
>>>> poison granularity") that changed the nfit_handle_mce() callback to
>>>> report badrange according to 1ULL << MCI_MISC_ADDR_LSB(mce->misc),
>>>> it's been discovered that the mce->misc LSB field is 0x1000 bytes,
>>>> hence injecting 2 back-to-back poisons ends up with the driver
>>>> logging 8 badblocks, because 0x1000 bytes is 8 512-byte blocks.
>>>>
>>>> Dan Williams noticed that apei_mce_report_mem_error() hardcodes the
>>>> LSB field to PAGE_SHIFT instead of consulting the input
>>>> struct cper_sec_mem_err record. So change to rely on hardware
>>>> whenever support is available.
>>>> Link: https://lore.kernel.org/r/7ed50fd8-521e-cade-77b1-738b8bfb8...@oracle.com
>>>> Reviewed-by: Dan Williams
>>>> Reviewed-by: Ingo Molnar
>>>> Signed-off-by: Jane Chu
>>>> ---
>>>>  arch/x86/kernel/cpu/mce/apei.c | 13 ++++++++++++-
>>>>  1 file changed, 12 insertions(+), 1 deletion(-)
>>>>
>>>> diff --git a/arch/x86/kernel/cpu/mce/apei.c b/arch/x86/kernel/cpu/mce/apei.c
>>>> index 717192915f28..8ed341714686 100644
>>>> --- a/arch/x86/kernel/cpu/mce/apei.c
>>>> +++ b/arch/x86/kernel/cpu/mce/apei.c
>>>> @@ -29,15 +29,26 @@
>>>>  void apei_mce_report_mem_error(int severity, struct cper_sec_mem_err *mem_err)
>>>>  {
>>>>  	struct mce m;
>>>> +	int lsb;
>>>>
>>>>  	if (!(mem_err->validation_bits & CPER_MEM_VALID_PA))
>>>>  		return;
>>>>
>>>> +	/*
>>>> +	 * Even if the ->validation_bits are set for address mask,
>>>> +	 * to be extra safe, check and reject an error radius '0',
>>>> +	 * and fall back to the default page size.
>>>> +	 */
>>>> +	if (mem_err->validation_bits & CPER_MEM_VALID_PA_MASK)
>>>> +		lsb = find_first_bit((void *)&mem_err->physical_addr_mask, PAGE_SHIFT);
>>>> +	else
>>>> +		lsb = PAGE_SHIFT;
>>>> +
>>>>  	mce_setup(&m);
>>>>  	m.bank = -1;
>>>>  	/* Fake a memory read error with unknown channel */
>>>>  	m.status = MCI_STATUS_VAL | MCI_STATUS_EN | MCI_STATUS_ADDRV |
>>>>  		   MCI_STATUS_MISCV | 0x9f;
>>>> -	m.misc = (MCI_MISC_ADDR_PHYS << 6) | PAGE_SHIFT;
>>>> +	m.misc = (MCI_MISC_ADDR_PHYS << 6) | lsb;
>>>
>>> LGTM.
>>>
>>> I suppose this wants to go upstream via the tree the bug came from (NVDIMM
>>> tree? ACPI tree?), or should we pick it up into the x86 tree?
>>
>> No idea. Maintainers?
>
> There's no real NVDIMM dependency here, just a general cleanup of how
> APEI error granularities are managed. So I think it is appropriate for
> this to go through the x86 tree via the typical path for mce related
> topics.
Re: [PATCH v2] arch/cacheflush: Introduce flush_all_caches()
On Mon, 22 Aug 2022, Dan Williams wrote:

>Davidlohr Bueso wrote:
>> On Sun, 21 Aug 2022, Christoph Hellwig wrote:
>>
>> >On Fri, Aug 19, 2022 at 10:10:24AM -0700, Davidlohr Bueso wrote:
>> >> index b192d917a6d0..ac4d4fd4e508 100644
>> >> --- a/arch/x86/include/asm/cacheflush.h
>> >> +++ b/arch/x86/include/asm/cacheflush.h
>> >> @@ -10,4 +10,8 @@
>> >>
>> >>  void clflush_cache_range(void *addr, unsigned int size);
>> >>
>> >> +/* see comments in the stub version */
>> >> +#define flush_all_caches() \
>> >> +	do { wbinvd_on_all_cpus(); } while(0)
>> >
>> >Yikes. This is just a horrible, horrible name and placement for a bad
>> >hack that should have no generic relevance.
>>
>> Why does this have no generic relevance? There's already been discussions
>> on how much wbinv is hated[0].
>>
>> >Please fix up the naming to make it clear that this function is for a
>> >very specific nvdimm use case, and move it to a nvdimm-specific header
>> >file.
>>
>> Do you have any suggestions for a name? And, as the changelog describes,
>> this is not nvdimm specific anymore, and the whole point of all this is
>> volatile memory components for cxl, hence nvdimm namespace is bogus.
>>
>> [0] https://lore.kernel.org/all/yvtc2u1j%2fqip8...@worktop.programming.kicks-ass.net/
>
>While it is not nvdimm specific anymore, it's still specific to "memory
>devices that can bulk invalidate a physical address space". I.e. it's
>not as generic as its location in arch/x86/include/asm/cacheflush.h
>would imply. So, similar to arch_invalidate_pmem(), lets keep it in a
>device-driver-specific header file, because hch and peterz are right, we
>need to make this much more clear that it is not for general
>consumption.

Fine, I won't argue - although I don't particularly agree, at least wrt
the naming. Imo my naming does _exactly_ what it should do and is much
easier to read than arch_has_flush_memregion(), which is counterintuitive
when we are in fact flushing everything. Nor does this make anything
clearer about virt vs physical mappings (except that it is no longer
associated with cacheflush). But, arm's cacheflush.h and its rare
broken-cache archs excepted, users get way too much credit in their
namespace usage.

But yes, there is no doubt that my version is more inviting than it
should be, which made me think of naming it flush_all_caches_careful()
so the user is forced to at least check the function (or one would hope).

Anyway, I'll send a new version based on the below - I particularly agree
with the hypervisor bits.

Thanks,
Davidlohr