Re: [PATCH v2] arch/cacheflush: Introduce flush_all_caches()

2022-08-23 Thread Dan Williams
Davidlohr Bueso wrote:
> On Mon, 22 Aug 2022, Dan Williams wrote:
> 
> >Davidlohr Bueso wrote:
> >> On Sun, 21 Aug 2022, Christoph Hellwig wrote:
> >>
> >> >On Fri, Aug 19, 2022 at 10:10:24AM -0700, Davidlohr Bueso wrote:
> >> >> index b192d917a6d0..ac4d4fd4e508 100644
> >> >> --- a/arch/x86/include/asm/cacheflush.h
> >> >> +++ b/arch/x86/include/asm/cacheflush.h
> >> >> @@ -10,4 +10,8 @@
> >> >>
> >> >>  void clflush_cache_range(void *addr, unsigned int size);
> >> >>
> >> >> +/* see comments in the stub version */
> >> >> +#define flush_all_caches() \
> >> >> +   do { wbinvd_on_all_cpus(); } while(0)
> >> >
> >> >Yikes.  This is just a horrible, horrible name and placement for a bad
> >> >hack that should have no generic relevance.
> >>
> >> Why does this have no generic relevance? There's already been discussion
> >> on how much wbinvd is hated[0].
> >>
> >> >Please fix up the naming to make it clear that this function is for a
> >> >very specific nvdimm use case, and move it to a nvdimm-specific header
> >> >file.
> >>
> >> Do you have any suggestions for a name? And, as the changelog describes,
> >> this is not nvdimm specific anymore, and the whole point of all this is
> >> volatile memory components for cxl, hence nvdimm namespace is bogus.
> >>
> >> [0] https://lore.kernel.org/all/yvtc2u1j%2fqip8...@worktop.programming.kicks-ass.net/
> >
> >While it is not nvdimm specific anymore, it's still specific to "memory
> >devices that can bulk invalidate a physical address space". I.e. it's
> >not as generic as its location in arch/x86/include/asm/cacheflush.h
> >would imply. So, similar to arch_invalidate_pmem(), let's keep it in a
> >device-driver-specific header file, because hch and peterz are right, we
> >need to make this much more clear that it is not for general
> >consumption.
> 
> Fine, I won't argue - although I don't particularly agree, at least wrt
> the naming. Imo my naming does _exactly_ what it should do and is much
> easier to read than arch_has_flush_memregion(), which is counterintuitive
> when we are in fact flushing everything. Nor does this make it any
> clearer about virt vs physical mappings (except that it's no longer
> associated with cacheflush). But, arm's cacheflush.h excepted,
> rare-arch-with-braino-cache users get way too much credit in their
> namespace usage.
> 
> But yes, there is no doubt that my version is more inviting than it should
> be, which made me think of naming it flush_all_caches_careful() so the user
> is forced to at least check the function (or so one would hope).

So I'm not married to arch_has_flush_memregion() or even to including the
physical address range to flush; the only aspect of the prototype I want
to see incorporated is something about the target / motivation for the
flush.

"flush_all_caches_careful()" says nothing about what the API is being
"careful" about. It reminds of Linus' comments on memcpy_mcsafe()

https://lore.kernel.org/all/CAHk-=wh1SPyuGkTkQESsacwKTpjWd=_-KwoCK5o=suc3ymd...@mail.gmail.com/

"Naming - like comments - shouldn't be about what some implementation
is, but about the concept."

So "memregion" was meant to represent a memory device backed physical
address range, but that association may only be in my own head.  How
about something even more explicit like:
"flush_after_memdev_invalidate()" where someone would feel icky using it
for anything other than what we have been talking about in this thread.
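
For illustration only, a helper scoped that way might be shaped like the
sketch below. Every name in it (header location, Kconfig symbol, function
names) is a placeholder rather than an actual proposal; the only thing
carried over from the patch is the assumption that the x86 backend would
be wbinvd_on_all_cpus().

  /* include/linux/memregion.h -- hypothetical location, sketch only */
  #include <linux/types.h>

  #ifdef CONFIG_ARCH_HAS_MEMDEV_CACHE_INVALIDATE
  /* an x86 backend would wrap wbinvd_on_all_cpus(), as in the patch */
  void memdev_invalidate_caches(phys_addr_t start, size_t len);
  static inline bool memdev_can_invalidate_caches(void) { return true; }
  #else
  static inline void memdev_invalidate_caches(phys_addr_t start, size_t len) { }
  static inline bool memdev_can_invalidate_caches(void) { return false; }
  #endif

A driver that changes what backs a physical address range would then gate
the flush on memdev_can_invalidate_caches() rather than open-coding wbinvd,
which keeps the "not for general consumption" warning in the API itself.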

> Anyway, I'll send a new version based on the below - I particularly agree
> with the hypervisor bits.

Ok, just one more lap around the bikeshed track, but I think we're
converging.



RE: [PATCH v7] x86/mce: retrieve poison range from hardware

2022-08-23 Thread Luck, Tony
> What I'm missing from this text here is, what *is* the mce->misc LSB
> field in human speak? What does that field denote?

The SDM says:

  Recoverable Address LSB (bits 5:0): The lowest valid recoverable address
  bit. Indicates the position of the least significant bit (LSB) of the
  recoverable error address. For example, if the processor logs bits [43:9]
  of the address, the LSB sub-field in IA32_MCi_MISC is 01001b (9 decimal).
  For this example, bits [8:0] of the recoverable error address in
  IA32_MCi_ADDR should be ignored.

So in human speak it answers "how much data did you lose?". "6" is a common
value, meaning a cache line (1 << 6 == 64 bytes) was lost. Sometimes you see
"12" (1 << 12 == 4096) for a whole page lost.
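
In code terms (a sketch, not a quote from any driver: MCI_MISC_ADDR_LSB()
is the existing helper from <asm/mce.h>, the function around it is made up
for illustration):

  #include <asm/mce.h>

  /* Turn the logged LSB into the size and start of the poisoned region. */
  static u64 mce_poison_span(const struct mce *m, u64 *start)
  {
          u64 span = 1ULL << MCI_MISC_ADDR_LSB(m->misc); /* 6 -> 64 bytes, 12 -> 4096 */

          *start = m->addr & ~(span - 1); /* the low LSB address bits are not valid */
          return span;
  }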

-Tony


Re: [PATCH v7] x86/mce: retrieve poison range from hardware

2022-08-23 Thread Jane Chu
 >>> I suppose this wants to go upstream via the tree the bug came from (NVDIMM
 >>> tree? ACPI tree?), or should we pick it up into the x86 tree?
 >>
 >> No idea.  Maintainers?
 >
 > There's no real NVDIMM dependency here, just a general cleanup of how
 > APEI error granularities are managed. So I think it is appropriate for
 > this to go through the x86 tree via the typical path for mce related
 > topics.

+ Huang, Ying.

x86 maintainers,

Please let me know if you need another revision.

thanks,
-jane


On 8/8/2022 4:30 PM, Dan Williams wrote:
> Jane Chu wrote:
>> On 8/3/2022 1:53 AM, Ingo Molnar wrote:
>>>
>>> * Jane Chu  wrote:
>>>
>>>> With Commit 7917f9cdb503 ("acpi/nfit: rely on mce->misc to determine
>>>
>>> s/Commit/commit
>>
>> Maintainers,
>> Would you prefer a v8, or take care of the comment upon accepting the patch?
>>
>>>
 poison granularity") that changed nfit_handle_mce() callback to report
 badrange according to 1ULL << MCI_MISC_ADDR_LSB(mce->misc), it's been
 discovered that the mce->misc LSB field is 0x1000 bytes, hence injecting
 2 back-to-back poisons and the driver ends up logging 8 badblocks,
 because 0x1000 bytes is 8 512-byte.

 Dan Williams noticed that apei_mce_report_mem_error() hardcode
 the LSB field to PAGE_SHIFT instead of consulting the input
 struct cper_sec_mem_err record.  So change to rely on hardware whenever
 support is available.

 Link: 
 https://lore.kernel.org/r/7ed50fd8-521e-cade-77b1-738b8bfb8...@oracle.com

 Reviewed-by: Dan Williams 
 Reviewed-by: Ingo Molnar 
 Signed-off-by: Jane Chu 
 ---
arch/x86/kernel/cpu/mce/apei.c | 13 -
1 file changed, 12 insertions(+), 1 deletion(-)

 diff --git a/arch/x86/kernel/cpu/mce/apei.c 
 b/arch/x86/kernel/cpu/mce/apei.c
 index 717192915f28..8ed341714686 100644
 --- a/arch/x86/kernel/cpu/mce/apei.c
 +++ b/arch/x86/kernel/cpu/mce/apei.c
 @@ -29,15 +29,26 @@
void apei_mce_report_mem_error(int severity, struct cper_sec_mem_err 
 *mem_err)
{
struct mce m;
 +  int lsb;

if (!(mem_err->validation_bits & CPER_MEM_VALID_PA))
return;

 +  /*
 +   * Even if the ->validation_bits are set for address mask,
 +   * to be extra safe, check and reject an error radius '0',
 +   * and fall back to the default page size.
 +   */
 +  if (mem_err->validation_bits & CPER_MEM_VALID_PA_MASK)
 +  lsb = find_first_bit((void *)_err->physical_addr_mask, 
 PAGE_SHIFT);
 +  else
 +  lsb = PAGE_SHIFT;
 +
mce_setup();
m.bank = -1;
/* Fake a memory read error with unknown channel */
m.status = MCI_STATUS_VAL | MCI_STATUS_EN | MCI_STATUS_ADDRV | 
 MCI_STATUS_MISCV | 0x9f;
 -  m.misc = (MCI_MISC_ADDR_PHYS << 6) | PAGE_SHIFT;
 +  m.misc = (MCI_MISC_ADDR_PHYS << 6) | lsb;
>>>
>>> LGTM.
>>>
>>> I suppose this wants to go upstream via the tree the bug came from (NVDIMM
>>> tree? ACPI tree?), or should we pick it up into the x86 tree?
>>
>> No idea.  Maintainers?
> 
> There's no real NVDIMM dependency here, just a general cleanup of how
> APEI error granularities are managed. So I think it is appropriate for
> this to go through the x86 tree via the typical path for mce related
> topics.
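
For reference, a toy illustration (the mask value below is made up, not
from a real CPER record) of why find_first_bit() on ->physical_addr_mask
yields the LSB value the hardware intends:

  #include <linux/bitops.h>
  #include <linux/mm.h>    /* PAGE_SHIFT */
  #include <linux/types.h>

  static int example_poison_lsb(void)
  {
          /* low 6 address bits invalid -> 64-byte error granularity */
          u64 mask = ~0x3fULL;

          /* first set bit is 6, the value the patch ORs into m.misc */
          return find_first_bit((unsigned long *)&mask, PAGE_SHIFT);
  }

If no bit is set within the first PAGE_SHIFT bits (or the mask is absent),
the result falls back to PAGE_SHIFT, matching the 'else' branch above.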



Re: [PATCH v2] arch/cacheflush: Introduce flush_all_caches()

2022-08-23 Thread Davidlohr Bueso

On Mon, 22 Aug 2022, Dan Williams wrote:


>Davidlohr Bueso wrote:
>> On Sun, 21 Aug 2022, Christoph Hellwig wrote:
>>
>> >On Fri, Aug 19, 2022 at 10:10:24AM -0700, Davidlohr Bueso wrote:
>> >> index b192d917a6d0..ac4d4fd4e508 100644
>> >> --- a/arch/x86/include/asm/cacheflush.h
>> >> +++ b/arch/x86/include/asm/cacheflush.h
>> >> @@ -10,4 +10,8 @@
>> >>
>> >>  void clflush_cache_range(void *addr, unsigned int size);
>> >>
>> >> +/* see comments in the stub version */
>> >> +#define flush_all_caches() \
>> >> +   do { wbinvd_on_all_cpus(); } while(0)
>> >
>> >Yikes.  This is just a horrible, horrible name and placement for a bad
>> >hack that should have no generic relevance.
>>
>> Why does this have no generic relevance? There's already been discussion
>> on how much wbinvd is hated[0].
>>
>> >Please fix up the naming to make it clear that this function is for a
>> >very specific nvdimm use case, and move it to a nvdimm-specific header
>> >file.
>>
>> Do you have any suggestions for a name? And, as the changelog describes,
>> this is not nvdimm specific anymore, and the whole point of all this is
>> volatile memory components for cxl, hence nvdimm namespace is bogus.
>>
>> [0] https://lore.kernel.org/all/yvtc2u1j%2fqip8...@worktop.programming.kicks-ass.net/
>
>While it is not nvdimm specific anymore, it's still specific to "memory
>devices that can bulk invalidate a physical address space". I.e. it's
>not as generic as its location in arch/x86/include/asm/cacheflush.h
>would imply. So, similar to arch_invalidate_pmem(), let's keep it in a
>device-driver-specific header file, because hch and peterz are right, we
>need to make this much more clear that it is not for general
>consumption.


Fine, I won't argue - although I don't particularly agree, at least wrt
the naming. Imo my naming does _exactly_ what it should do and is much
easier to read than arch_has_flush_memregion(), which is counterintuitive
when we are in fact flushing everything. Nor does this make it any
clearer about virt vs physical mappings (except that it's no longer
associated with cacheflush). But, arm's cacheflush.h excepted,
rare-arch-with-braino-cache users get way too much credit in their
namespace usage.

But yes, there is no doubt that my version is more inviting than it should
be, which made me think of naming it flush_all_caches_careful() so the user
is forced to at least check the function (or so one would hope).

Anyway, I'll send a new version based on the below - I particularly agree
with the hypervisor bits.

Thanks,
Davidlohr