RE: [PATCH v7 1/1] memory_hotplug: Add a bounds check to __add_pages

2019-10-02 Thread Alastair D'Silva
On Wed, 2019-10-02 at 15:14 -0700, Andrew Morton wrote:
> On Tue, 1 Oct 2019 11:49:47 +0200 David Hildenbrand wrote:
> 
> > > @@ -278,6 +278,22 @@ static int check_pfn_span(unsigned long pfn,
> > > unsigned long nr_pages,
> > >   return 0;
> > >  }
> > >  
> > > +static int check_hotplug_memory_addressable(unsigned long pfn,
> > > + unsigned long nr_pages)
> > > +{
> > > + const u64 max_addr = PFN_PHYS(pfn + nr_pages) - 1;
> > > +
> > > + if (max_addr >> MAX_PHYSMEM_BITS) {
> > > + const u64 max_allowed = (1ull << (MAX_PHYSMEM_BITS +
> > > 1)) - 1;
> > > + WARN(1,
> > > +  "Hotplugged memory exceeds maximum addressable
> > > address, range=%#llx-%#llx, maximum=%#llx\n",
> > > +  (u64)PFN_PHYS(pfn), max_addr, max_allowed);
> > > + return -E2BIG;
> > > + }
> > > +
> > > + return 0;
> > > +}
> > > +
> > >  /*
> > >   * Reasonably generic function for adding memory.  It is
> > >   * expected that archs that support memory hotplug will
> > > @@ -291,6 +307,10 @@ int __ref __add_pages(int nid, unsigned long
> > > pfn, unsigned long nr_pages,
> > >   unsigned long nr, start_sec, end_sec;
> > >   struct vmem_altmap *altmap = restrictions->altmap;
> > >  
> > > + err = check_hotplug_memory_addressable(pfn, nr_pages);
> > > + if (err)
> > > + return err;
> > > +
> > >   if (altmap) {
> > >   /*
> > >* Validate altmap is within bounds of the total
> > > request
> > > 
> > 
> > I actually wanted to give my RB to v7, not v6 :)
> > 
> 
> Given that check_hotplug_memory_addressable() is now static, I'll
> assume that the old [2/2]
> mm-add-a-bounds-check-in-devm_memremap_pages.patch is now obsolete.
> 

Yes, please ignore that whole series.

> From: Alastair D'Silva 
> Subject: mm/memremap.c: add a bounds check in devm_memremap_pages()
> 
> The call to check_hotplug_memory_addressable() validates that the
> memory
> is fully addressable.
> 
> Without this call, it is possible that we may remap pages that are not
> physically addressable, resulting in bogus section numbers being
> returned
> from __section_nr().
> 
> Link: http://lkml.kernel.org/r/20190917010752.28395-3-alast...@au1.ibm.com
> Signed-off-by: Alastair D'Silva 
> Acked-by: David Hildenbrand 
> Cc: Dan Williams 
> Cc: Ira Weiny 
> Cc: Jason Gunthorpe 
> Cc: Logan Gunthorpe 
> Cc: Michal Hocko 
> Cc: Oscar Salvador 
> Cc: Pavel Tatashin 
> Cc: Qian Cai 
> Cc: Wei Yang 
> Signed-off-by: Andrew Morton 
> ---
> 
>  mm/memremap.c |5 +
>  1 file changed, 5 insertions(+)
> 
> --- a/mm/memremap.c~mm-add-a-bounds-check-in-devm_memremap_pages
> +++ a/mm/memremap.c
> @@ -185,6 +185,11 @@ void *memremap_pages(struct dev_pagemap
>   int error, is_ram;
>   bool need_devmap_managed = true;
>  
> + error = check_hotplug_memory_addressable(res->start,
> +  resource_size(res));
> + if (error)
> + return ERR_PTR(error);
> +
>   switch (pgmap->type) {
>   case MEMORY_DEVICE_PRIVATE:
>   if (!IS_ENABLED(CONFIG_DEVICE_PRIVATE)) {
> _
> 
-- 
Alastair D'Silva
Open Source Developer
Linux Technology Centre, IBM Australia
mob: 0423 762 819



[PATCH v7 0/1] Add bounds check for Hotplugged memory

2019-09-30 Thread Alastair D'Silva
From: Alastair D'Silva 

This series adds bounds checks for hotplugged memory, ensuring that
it is within the physically addressable range (for platforms that
define MAX_(POSSIBLE_)PHYSMEM_BITS).

This allows for early failure, rather than attempting to access
bogus section numbers.

Changelog:
 V7:
   - Cast the result of PFN_PHYS to u64 since its type is platform dependent
 V6:
   - Fix printf formats
 V5:
   - Factor out calculation into max_allowed var
   - Declare unchanging vars as const
   - Use PFN_PHYS macro instead of shifting by PAGE_SHIFT
 V4:
   - Relocate call to __add_pages
   - Add a warning when the addressable check fails
 V3:
   - Perform the addressable check before we take the hotplug lock
 V2:
   - Don't use MAX_POSSIBLE_PHYSMEM_BITS as it's wider than what
 may be available

Alastair D'Silva (1):
  memory_hotplug: Add a bounds check to __add_pages

 mm/memory_hotplug.c | 20 
 1 file changed, 20 insertions(+)

-- 
2.21.0



[PATCH v7 1/1] memory_hotplug: Add a bounds check to __add_pages

2019-09-30 Thread Alastair D'Silva
From: Alastair D'Silva 

On PowerPC, the address ranges allocated to OpenCAPI LPC memory
are allocated from firmware. These address ranges may be higher
than what older kernels permit, as we increased the maximum
permissible address in commit 4ffe713b7587
("powerpc/mm: Increase the max addressable memory to 2PB"). It is
possible that the addressable range may change again in the
future.

In this scenario, we end up with a bogus section returned from
__section_nr (see the discussion on the thread "mm: Trigger bug on
if a section is not found in __section_nr").

Adding a check here means that we fail early and have an
opportunity to handle the error gracefully, rather than rumbling
on and potentially accessing an incorrect section.

Further discussion is also on the thread ("powerpc: Perform a bounds
check in arch_add_memory")
http://lkml.kernel.org/r/20190827052047.31547-1-alast...@au1.ibm.com
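For a concrete feel for the failure mode, the arithmetic can be reproduced in a
small standalone program (illustration only, not kernel code). The constants are
assumptions roughly matching powerpc64 with 64K pages (MAX_PHYSMEM_BITS = 51,
SECTION_SIZE_BITS = 24, PAGE_SHIFT = 16) and the macros mirror the sparsemem
definitions: an end address that needs more than MAX_PHYSMEM_BITS bits also
produces a section number past the end of the section array, which is exactly
what the new check rejects up front.

#include <stdio.h>

#define PAGE_SHIFT              16
#define SECTION_SIZE_BITS       24
#define MAX_PHYSMEM_BITS        51

#define PFN_SECTION_SHIFT       (SECTION_SIZE_BITS - PAGE_SHIFT)
#define NR_MEM_SECTIONS         (1UL << (MAX_PHYSMEM_BITS - SECTION_SIZE_BITS))
#define PFN_PHYS(pfn)           ((unsigned long long)(pfn) << PAGE_SHIFT)

int main(void)
{
        /* hotplug request starting at 4PB, i.e. beyond the 2PB limit */
        unsigned long pfn = 1UL << (52 - PAGE_SHIFT);
        unsigned long nr_pages = 1UL << (30 - PAGE_SHIFT);      /* 1GB of pages */
        unsigned long long max_addr = PFN_PHYS(pfn + nr_pages) - 1;
        unsigned long section_nr = (pfn + nr_pages - 1) >> PFN_SECTION_SHIFT;

        /* the new check fires: the end address needs more than MAX_PHYSMEM_BITS bits */
        printf("check fails: %d\n", (max_addr >> MAX_PHYSMEM_BITS) != 0);
        /* ...which is exactly when the section number runs off the end of the array */
        printf("section %lu vs NR_MEM_SECTIONS %lu\n", section_nr, NR_MEM_SECTIONS);
        return 0;
}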

Signed-off-by: Alastair D'Silva 
---
 mm/memory_hotplug.c | 20 
 1 file changed, 20 insertions(+)

diff --git a/mm/memory_hotplug.c b/mm/memory_hotplug.c
index c73f09913165..5af9f4466ad1 100644
--- a/mm/memory_hotplug.c
+++ b/mm/memory_hotplug.c
@@ -278,6 +278,22 @@ static int check_pfn_span(unsigned long pfn, unsigned long nr_pages,
return 0;
 }
 
+static int check_hotplug_memory_addressable(unsigned long pfn,
+   unsigned long nr_pages)
+{
+   const u64 max_addr = PFN_PHYS(pfn + nr_pages) - 1;
+
+   if (max_addr >> MAX_PHYSMEM_BITS) {
+   const u64 max_allowed = (1ull << (MAX_PHYSMEM_BITS + 1)) - 1;
+   WARN(1,
+"Hotplugged memory exceeds maximum addressable address, 
range=%#llx-%#llx, maximum=%#llx\n",
+(u64)PFN_PHYS(pfn), max_addr, max_allowed);
+   return -E2BIG;
+   }
+
+   return 0;
+}
+
 /*
  * Reasonably generic function for adding memory.  It is
  * expected that archs that support memory hotplug will
@@ -291,6 +307,10 @@ int __ref __add_pages(int nid, unsigned long pfn, unsigned long nr_pages,
unsigned long nr, start_sec, end_sec;
struct vmem_altmap *altmap = restrictions->altmap;
 
+   err = check_hotplug_memory_addressable(pfn, nr_pages);
+   if (err)
+   return err;
+
if (altmap) {
/*
 * Validate altmap is within bounds of the total request
-- 
2.21.0



[PATCH v6 1/1] memory_hotplug: Add a bounds check to __add_pages

2019-09-30 Thread Alastair D'Silva
From: Alastair D'Silva 

On PowerPC, the address ranges allocated to OpenCAPI LPC memory
are allocated from firmware. These address ranges may be higher
than what older kernels permit, as we increased the maximum
permissible address in commit 4ffe713b7587
("powerpc/mm: Increase the max addressable memory to 2PB"). It is
possible that the addressable range may change again in the
future.

In this scenario, we end up with a bogus section returned from
__section_nr (see the discussion on the thread "mm: Trigger bug on
if a section is not found in __section_nr").

Adding a check here means that we fail early and have an
opportunity to handle the error gracefully, rather than rumbling
on and potentially accessing an incorrect section.

Further discussion is also on the thread ("powerpc: Perform a bounds
check in arch_add_memory")
http://lkml.kernel.org/r/20190827052047.31547-1-alast...@au1.ibm.com

Signed-off-by: Alastair D'Silva 
---
 mm/memory_hotplug.c | 20 
 1 file changed, 20 insertions(+)

diff --git a/mm/memory_hotplug.c b/mm/memory_hotplug.c
index c73f09913165..a5eddf3c3c1f 100644
--- a/mm/memory_hotplug.c
+++ b/mm/memory_hotplug.c
@@ -278,6 +278,22 @@ static int check_pfn_span(unsigned long pfn, unsigned long nr_pages,
return 0;
 }
 
+static int check_hotplug_memory_addressable(unsigned long pfn,
+   unsigned long nr_pages)
+{
+   const u64 max_addr = PFN_PHYS(pfn + nr_pages) - 1;
+
+   if (max_addr >> MAX_PHYSMEM_BITS) {
+   const u64 max_allowed = (1ull << (MAX_PHYSMEM_BITS + 1)) - 1;
+   WARN(1,
+"Hotplugged memory exceeds maximum addressable address, 
range=%#llx-%#llx, maximum=%#llx\n",
+PFN_PHYS(pfn), max_addr, max_allowed);
+   return -E2BIG;
+   }
+
+   return 0;
+}
+
 /*
  * Reasonably generic function for adding memory.  It is
  * expected that archs that support memory hotplug will
@@ -291,6 +307,10 @@ int __ref __add_pages(int nid, unsigned long pfn, unsigned long nr_pages,
unsigned long nr, start_sec, end_sec;
struct vmem_altmap *altmap = restrictions->altmap;
 
+   err = check_hotplug_memory_addressable(pfn, nr_pages);
+   if (err)
+   return err;
+
if (altmap) {
/*
 * Validate altmap is within bounds of the total request
-- 
2.21.0



[PATCH v6 0/1] Add bounds check for Hotplugged memory

2019-09-30 Thread Alastair D'Silva
From: Alastair D'Silva 

This series adds bounds checks for hotplugged memory, ensuring that
it is within the physically addressable range (for platforms that
define MAX_(POSSIBLE_)PHYSMEM_BITS).

This allows for early failure, rather than attempting to access
bogus section numbers.

Changelog:
 V6:
   - Fix printf formats
 V5:
   - Factor out calculation into max_allowed var
   - Declare unchanging vars as const
   - Use PFN_PHYS macro instead of shifting by PAGE_SHIFT
 V4:
   - Relocate call to __add_pages
   - Add a warning when the addressable check fails
 V3:
   - Perform the addressable check before we take the hotplug lock
 V2:
   - Don't use MAX_POSSIBLE_PHYSMEM_BITS as it's wider than what
 may be available

Alastair D'Silva (1):
  memory_hotplug: Add a bounds check to __add_pages

 mm/memory_hotplug.c | 20 
 1 file changed, 20 insertions(+)

-- 
2.21.0



Re: [PATCH v5 1/1] memory_hotplug: Add a bounds check to __add_pages

2019-09-29 Thread Alastair D'Silva
On Mon, 2019-09-30 at 12:21 +1000, Alastair D'Silva wrote:
> From: Alastair D'Silva 
> 
> On PowerPC, the address ranges allocated to OpenCAPI LPC memory
> are allocated from firmware. These address ranges may be higher
> than what older kernels permit, as we increased the maximum
> permissible address in commit 4ffe713b7587
> ("powerpc/mm: Increase the max addressable memory to 2PB"). It is
> possible that the addressable range may change again in the
> future.
> 
> In this scenario, we end up with a bogus section returned from
> __section_nr (see the discussion on the thread "mm: Trigger bug on
> if a section is not found in __section_nr").
> 
> Adding a check here means that we fail early and have an
> opportunity to handle the error gracefully, rather than rumbling
> on and potentially accessing an incorrect section.
> 
> Further discussion is also on the thread ("powerpc: Perform a bounds
> check in arch_add_memory")
> http://lkml.kernel.org/r/20190827052047.31547-1-alast...@au1.ibm.com
> 
> Signed-off-by: Alastair D'Silva 
> ---
>  mm/memory_hotplug.c | 20 
>  1 file changed, 20 insertions(+)
> 
> diff --git a/mm/memory_hotplug.c b/mm/memory_hotplug.c
> index c73f09913165..1909607da640 100644
> --- a/mm/memory_hotplug.c
> +++ b/mm/memory_hotplug.c
> @@ -278,6 +278,22 @@ static int check_pfn_span(unsigned long pfn,
> unsigned long nr_pages,
>   return 0;
>  }
>  
> +static int check_hotplug_memory_addressable(unsigned long pfn,
> + unsigned long nr_pages)
> +{
> + const u64 max_addr = PFN_PHYS(pfn + nr_pages) - 1;
> +
> + if (max_addr >> MAX_PHYSMEM_BITS) {
> + const u64 max_allowed = (1ull << (MAX_PHYSMEM_BITS +
> 1)) - 1;
> + WARN(1,
> +  "Hotplugged memory exceeds maximum addressable
> address, range=%#lx-%#lx, maximum=%#lx\n",
Gah, these should all be %#llx.

> +  PFN_PHYS(pfn), max_addr, max_allowed);
> + return -E2BIG;
> + }
> +
> + return 0;
> +}
> +
>  /*
>   * Reasonably generic function for adding memory.  It is
>   * expected that archs that support memory hotplug will
> @@ -291,6 +307,10 @@ int __ref __add_pages(int nid, unsigned long
> pfn, unsigned long nr_pages,
>   unsigned long nr, start_sec, end_sec;
>   struct vmem_altmap *altmap = restrictions->altmap;
>  
> + err = check_hotplug_memory_addressable(pfn, nr_pages);
> + if (err)
> + return err;
> +
>   if (altmap) {
>   /*
>* Validate altmap is within bounds of the total
> request



[PATCH v5 0/1] Add bounds check for Hotplugged memory

2019-09-29 Thread Alastair D'Silva
From: Alastair D'Silva 

This series adds bounds checks for hotplugged memory, ensuring that
it is within the physically addressable range (for platforms that
define MAX_(POSSIBLE_)PHYSMEM_BITS).

This allows for early failure, rather than attempting to access
bogus section numbers.

Changelog:
 V5:
   - Factor out calculation into max_allowed var
   - Declare unchanging vars as const
   - Use PFN_PHYS macro instead of shifting by PAGE_SHIFT
 V4:
   - Relocate call to __add_pages
   - Add a warning when the addressable check fails
 V3:
   - Perform the addressable check before we take the hotplug lock
 V2:
   - Don't use MAX_POSSIBLE_PHYSMEM_BITS as it's wider than what
 may be available

Alastair D'Silva (1):
  memory_hotplug: Add a bounds check to __add_pages

 mm/memory_hotplug.c | 20 
 1 file changed, 20 insertions(+)

-- 
2.21.0



[PATCH v5 1/1] memory_hotplug: Add a bounds check to __add_pages

2019-09-29 Thread Alastair D'Silva
From: Alastair D'Silva 

On PowerPC, the address ranges allocated to OpenCAPI LPC memory
are allocated from firmware. These address ranges may be higher
than what older kernels permit, as we increased the maximum
permissible address in commit 4ffe713b7587
("powerpc/mm: Increase the max addressable memory to 2PB"). It is
possible that the addressable range may change again in the
future.

In this scenario, we end up with a bogus section returned from
__section_nr (see the discussion on the thread "mm: Trigger bug on
if a section is not found in __section_nr").

Adding a check here means that we fail early and have an
opportunity to handle the error gracefully, rather than rumbling
on and potentially accessing an incorrect section.

Further discussion is also on the thread ("powerpc: Perform a bounds
check in arch_add_memory")
http://lkml.kernel.org/r/20190827052047.31547-1-alast...@au1.ibm.com

Signed-off-by: Alastair D'Silva 
---
 mm/memory_hotplug.c | 20 
 1 file changed, 20 insertions(+)

diff --git a/mm/memory_hotplug.c b/mm/memory_hotplug.c
index c73f09913165..1909607da640 100644
--- a/mm/memory_hotplug.c
+++ b/mm/memory_hotplug.c
@@ -278,6 +278,22 @@ static int check_pfn_span(unsigned long pfn, unsigned long nr_pages,
return 0;
 }
 
+static int check_hotplug_memory_addressable(unsigned long pfn,
+   unsigned long nr_pages)
+{
+   const u64 max_addr = PFN_PHYS(pfn + nr_pages) - 1;
+
+   if (max_addr >> MAX_PHYSMEM_BITS) {
+   const u64 max_allowed = (1ull << (MAX_PHYSMEM_BITS + 1)) - 1;
+   WARN(1,
+"Hotplugged memory exceeds maximum addressable address, 
range=%#lx-%#lx, maximum=%#lx\n",
+PFN_PHYS(pfn), max_addr, max_allowed);
+   return -E2BIG;
+   }
+
+   return 0;
+}
+
 /*
  * Reasonably generic function for adding memory.  It is
  * expected that archs that support memory hotplug will
@@ -291,6 +307,10 @@ int __ref __add_pages(int nid, unsigned long pfn, unsigned long nr_pages,
unsigned long nr, start_sec, end_sec;
struct vmem_altmap *altmap = restrictions->altmap;
 
+   err = check_hotplug_memory_addressable(pfn, nr_pages);
+   if (err)
+   return err;
+
if (altmap) {
/*
 * Validate altmap is within bounds of the total request
-- 
2.21.0



RE: [PATCH v4 5/6] powerpc: Chunk calls to flush_dcache_range in arch_*_memory

2019-09-27 Thread Alastair D'Silva
On Fri, 2019-09-27 at 08:37 +0200, Mark Marshall wrote:
> Comment below...
> 
> On Thu, 26 Sep 2019 at 12:18, Alastair D'Silva wrote:
> > From: Alastair D'Silva 
> > 
> > When presented with large amounts of memory being hotplugged
> > (in my test case, ~890GB), the call to flush_dcache_range takes
> > a while (~50 seconds), triggering RCU stalls.
> > 
> > This patch breaks up the call into 1GB chunks, calling
> > cond_resched() in between to allow the scheduler to run.
> > 
> > Signed-off-by: Alastair D'Silva 
> > ---
> >  arch/powerpc/mm/mem.c | 27 +--
> >  1 file changed, 25 insertions(+), 2 deletions(-)
> > 
> > diff --git a/arch/powerpc/mm/mem.c b/arch/powerpc/mm/mem.c
> > index cff947cb2a84..a2758e473d58 100644
> > --- a/arch/powerpc/mm/mem.c
> > +++ b/arch/powerpc/mm/mem.c
> > @@ -104,6 +104,27 @@ int __weak remove_section_mapping(unsigned
> > long start, unsigned long end)
> > return -ENODEV;
> >  }
> > 
> > +#define FLUSH_CHUNK_SIZE SZ_1G
> > +/**
> > + * flush_dcache_range_chunked(): Write any modified data cache
> > blocks out to
> > + * memory and invalidate them, in chunks of up to FLUSH_CHUNK_SIZE
> > + * Does not invalidate the corresponding instruction cache blocks.
> > + *
> > + * @start: the start address
> > + * @stop: the stop address (exclusive)
> > + * @chunk: the max size of the chunks
> > + */
> > +static void flush_dcache_range_chunked(unsigned long start,
> > unsigned long stop,
> > +  unsigned long chunk)
> > +{
> > +   unsigned long i;
> > +
> > +   for (i = start; i < stop; i += FLUSH_CHUNK_SIZE) {
> Here you ignore the function parameter "chunk" and use the define
> FLUSH_CHUNK_SIZE.
> You should do one or the other; use the parameter or remove it.

Good catch, thank you :)

> > +   flush_dcache_range(i, min(stop, start +
> > FLUSH_CHUNK_SIZE));
> > +   cond_resched();
> > +   }
> > +}
> > +
> >  int __ref arch_add_memory(int nid, u64 start, u64 size,
> > struct mhp_restrictions *restrictions)
> >  {
> > @@ -120,7 +141,8 @@ int __ref arch_add_memory(int nid, u64 start,
> > u64 size,
> > start, start + size, rc);
> > return -EFAULT;
> > }
> > -   flush_dcache_range(start, start + size);
> > +
> > +   flush_dcache_range_chunked(start, start + size,
> > FLUSH_CHUNK_SIZE);
> > 
> > return __add_pages(nid, start_pfn, nr_pages, restrictions);
> >  }
> > @@ -137,7 +159,8 @@ void __ref arch_remove_memory(int nid, u64
> > start, u64 size,
> > 
> > /* Remove htab bolted mappings for this section of memory
> > */
> > start = (unsigned long)__va(start);
> > -   flush_dcache_range(start, start + size);
> > +   flush_dcache_range_chunked(start, start + size,
> > FLUSH_CHUNK_SIZE);
> > +
> > ret = remove_section_mapping(start, start + size);
> > WARN_ON_ONCE(ret);
> > 
> > --
> > 2.21.0
> > 
-- 
Alastair D'Silva
Open Source Developer
Linux Technology Centre, IBM Australia
mob: 0423 762 819
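Folding in the review comment above, a corrected version of the helper would
presumably look like the sketch below (not the final merged code): the loop now
honours the chunk parameter, and each iteration advances the flush window from
the current position i rather than computing it from start, which in the posted
version looks like it would keep re-flushing the first chunk.

static void flush_dcache_range_chunked(unsigned long start, unsigned long stop,
                                       unsigned long chunk)
{
        unsigned long i;

        for (i = start; i < stop; i += chunk) {
                /* flush one chunk, capped at the end of the requested range */
                flush_dcache_range(i, min(stop, i + chunk));
                /* give the scheduler a chance so huge flushes don't trigger RCU stalls */
                cond_resched();
        }
}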



RE: [PATCH v4] memory_hotplug: Add a bounds check to __add_pages

2019-09-27 Thread Alastair D'Silva
On Thu, 2019-09-26 at 09:46 +0200, David Hildenbrand wrote:
> On 26.09.19 09:43, Michal Hocko wrote:
> > On Thu 26-09-19 09:12:50, David Hildenbrand wrote:
> > > On 26.09.19 03:34, Alastair D'Silva wrote:
> > > > From: Alastair D'Silva 
> > > > 
> > > > On PowerPC, the address ranges allocated to OpenCAPI LPC memory
> > > > are allocated from firmware. These address ranges may be higher
> > > > than what older kernels permit, as we increased the maximum
> > > > permissible address in commit 4ffe713b7587
> > > > ("powerpc/mm: Increase the max addressable memory to 2PB"). It
> > > > is
> > > > possible that the addressable range may change again in the
> > > > future.
> > > > 
> > > > In this scenario, we end up with a bogus section returned from
> > > > __section_nr (see the discussion on the thread "mm: Trigger bug
> > > > on
> > > > if a section is not found in __section_nr").
> > > > 
> > > > Adding a check here means that we fail early and have an
> > > > opportunity to handle the error gracefully, rather than
> > > > rumbling
> > > > on and potentially accessing an incorrect section.
> > > > 
> > > > Further discussion is also on the thread ("powerpc: Perform a
> > > > bounds
> > > > check in arch_add_memory")
> > > > http://lkml.kernel.org/r/20190827052047.31547-1-alast...@au1.ibm.com
> > > > 
> > > > Signed-off-by: Alastair D'Silva 
> > > > ---
> > > >  mm/memory_hotplug.c | 20 
> > > >  1 file changed, 20 insertions(+)
> > > > 
> > > > diff --git a/mm/memory_hotplug.c b/mm/memory_hotplug.c
> > > > index c73f09913165..212804c0f7f5 100644
> > > > --- a/mm/memory_hotplug.c
> > > > +++ b/mm/memory_hotplug.c
> > > > @@ -278,6 +278,22 @@ static int check_pfn_span(unsigned long
> > > > pfn, unsigned long nr_pages,
> > > > return 0;
> > > >  }
> > > >  
> > > > +static int check_hotplug_memory_addressable(unsigned long pfn,
> > > > +   unsigned long
> > > > nr_pages)
> > > > +{
> > > > +   unsigned long max_addr = ((pfn + nr_pages) <<
> > > > PAGE_SHIFT) - 1;
> > > > +
> > > > +   if (max_addr >> MAX_PHYSMEM_BITS) {
> > > > +   WARN(1,
> > > > +"Hotplugged memory exceeds maximum
> > > > addressable address, range=%#lx-%#lx, maximum=%#lx\n",
> > > > +pfn << PAGE_SHIFT, max_addr,
> > > > +(1ul << (MAX_PHYSMEM_BITS + 1)) - 1);
> > > > +   return -E2BIG;
> > > > +   }
> > > > +
> > > > +   return 0;
> > > > +}
> > > > +
> > > >  /*
> > > >   * Reasonably generic function for adding memory.  It is
> > > >   * expected that archs that support memory hotplug will
> > > > @@ -291,6 +307,10 @@ int __ref __add_pages(int nid, unsigned
> > > > long pfn, unsigned long nr_pages,
> > > > unsigned long nr, start_sec, end_sec;
> > > > struct vmem_altmap *altmap = restrictions->altmap;
> > > >  
> > > > +   err = check_hotplug_memory_addressable(pfn, nr_pages);
> > > > +   if (err)
> > > > +   return err;
> > > > +
> > > > if (altmap) {
> > > > /*
> > > >  * Validate altmap is within bounds of the
> > > > total request
> > > > 
> > > 
> > > I know Michal suggested this, but I still prefer checking early
> > > instead
> > > of when we're knees-deep into adding of memory.
> > 
> > What is your concern here? Unwinding the state should be pretty
> > straightfoward from this failure path.
> 
> Just the general "check what you can check early without locks"
> approach. But yeah, this series is probably not worth a v5, so I can
> live with this change just fine :)
> 
> 

I'm going to spin a V5 anyway - where were you suggesting?

> -- 
> 
> Thanks,
> 
> David / dhildenb

-- 
Alastair D'Silva
Open Source Developer
Linux Technology Centre, IBM Australia
mob: 0423 762 819



Re: [PATCH v4] memory_hotplug: Add a bounds check to __add_pages

2019-09-26 Thread Alastair D'Silva
On Thu, 2019-09-26 at 09:53 +0200, Oscar Salvador wrote:
> On Thu, Sep 26, 2019 at 11:34:05AM +1000, Alastair D'Silva wrote:
> > From: Alastair D'Silva 
> > 
> > On PowerPC, the address ranges allocated to OpenCAPI LPC memory
> > are allocated from firmware. These address ranges may be higher
> > than what older kernels permit, as we increased the maximum
> > permissible address in commit 4ffe713b7587
> > ("powerpc/mm: Increase the max addressable memory to 2PB"). It is
> > possible that the addressable range may change again in the
> > future.
> > 
> > In this scenario, we end up with a bogus section returned from
> > __section_nr (see the discussion on the thread "mm: Trigger bug on
> > if a section is not found in __section_nr").
> > 
> > Adding a check here means that we fail early and have an
> > opportunity to handle the error gracefully, rather than rumbling
> > on and potentially accessing an incorrect section.
> > 
> > Further discussion is also on the thread ("powerpc: Perform a
> > bounds
> > check in arch_add_memory")
> > http://lkml.kernel.org/r/20190827052047.31547-1-alast...@au1.ibm.com
> > 
> > Signed-off-by: Alastair D'Silva 
> 
> Reviewed-by: Oscar Salvador 
> 
> Just a nit-picking below:
> 
> > ---
> >  mm/memory_hotplug.c | 20 
> >  1 file changed, 20 insertions(+)
> > 
> > diff --git a/mm/memory_hotplug.c b/mm/memory_hotplug.c
> > index c73f09913165..212804c0f7f5 100644
> > --- a/mm/memory_hotplug.c
> > +++ b/mm/memory_hotplug.c
> > @@ -278,6 +278,22 @@ static int check_pfn_span(unsigned long pfn,
> > unsigned long nr_pages,
> > return 0;
> >  }
> >  
> > +static int check_hotplug_memory_addressable(unsigned long pfn,
> > +   unsigned long nr_pages)
> > +{
> > +   unsigned long max_addr = ((pfn + nr_pages) << PAGE_SHIFT) - 1;
> 
> I would use PFN_PHYS instead:
> 
>   unsigned long max_addr = PFN_PHYS(pfn + nr_pages) - 1;
> 
> > +
> > +   if (max_addr >> MAX_PHYSMEM_BITS) {
> > +   WARN(1,
> > +"Hotplugged memory exceeds maximum addressable
> > address, range=%#lx-%#lx, maximum=%#lx\n",
> > +    pfn << PAGE_SHIFT, max_addr,
> 
> Same here.
> 
> > +(1ul << (MAX_PHYSMEM_BITS + 1)) - 1);
> 
> I would use a local variable to hold this computation.
> 
> > +   return -E2BIG;
> > +   }
> > +
> > +   return 0;


Looks like I'll have to do another spin to change that to a ull anyway,
so I'll implement those suggestions.

-- 
Alastair D'Silva   mob: 0423 762 819
skype: alastair_dsilva
Twitter: @EvilDeece
blog: http://alastair.d-silva.org




[PATCH v4 6/6] powerpc: Don't flush caches when adding memory

2019-09-25 Thread Alastair D'Silva
From: Alastair D'Silva 

This operation takes a significant amount of time when hotplugging
large amounts of memory (~50 seconds with 890GB of persistent memory).

This was originally in commit fb5924fddf9e
("powerpc/mm: Flush cache on memory hot(un)plug") to support memtrace,
but the flush on add is not needed as it is flushed on remove.

Signed-off-by: Alastair D'Silva 
---
 arch/powerpc/mm/mem.c | 2 --
 1 file changed, 2 deletions(-)

diff --git a/arch/powerpc/mm/mem.c b/arch/powerpc/mm/mem.c
index a2758e473d58..9afb77a8c8d5 100644
--- a/arch/powerpc/mm/mem.c
+++ b/arch/powerpc/mm/mem.c
@@ -142,8 +142,6 @@ int __ref arch_add_memory(int nid, u64 start, u64 size,
return -EFAULT;
}
 
-   flush_dcache_range_chunked(start, start + size, FLUSH_CHUNK_SIZE);
-
return __add_pages(nid, start_pfn, nr_pages, restrictions);
 }
 
-- 
2.21.0



[PATCH v4 5/6] powerpc: Chunk calls to flush_dcache_range in arch_*_memory

2019-09-25 Thread Alastair D'Silva
From: Alastair D'Silva 

When presented with large amounts of memory being hotplugged
(in my test case, ~890GB), the call to flush_dcache_range takes
a while (~50 seconds), triggering RCU stalls.

This patch breaks up the call into 1GB chunks, calling
cond_resched() in between to allow the scheduler to run.

Signed-off-by: Alastair D'Silva 
---
 arch/powerpc/mm/mem.c | 27 +--
 1 file changed, 25 insertions(+), 2 deletions(-)

diff --git a/arch/powerpc/mm/mem.c b/arch/powerpc/mm/mem.c
index cff947cb2a84..a2758e473d58 100644
--- a/arch/powerpc/mm/mem.c
+++ b/arch/powerpc/mm/mem.c
@@ -104,6 +104,27 @@ int __weak remove_section_mapping(unsigned long start, unsigned long end)
return -ENODEV;
 }
 
+#define FLUSH_CHUNK_SIZE SZ_1G
+/**
+ * flush_dcache_range_chunked(): Write any modified data cache blocks out to
+ * memory and invalidate them, in chunks of up to FLUSH_CHUNK_SIZE
+ * Does not invalidate the corresponding instruction cache blocks.
+ *
+ * @start: the start address
+ * @stop: the stop address (exclusive)
+ * @chunk: the max size of the chunks
+ */
+static void flush_dcache_range_chunked(unsigned long start, unsigned long stop,
+  unsigned long chunk)
+{
+   unsigned long i;
+
+   for (i = start; i < stop; i += FLUSH_CHUNK_SIZE) {
+   flush_dcache_range(i, min(stop, start + FLUSH_CHUNK_SIZE));
+   cond_resched();
+   }
+}
+
 int __ref arch_add_memory(int nid, u64 start, u64 size,
struct mhp_restrictions *restrictions)
 {
@@ -120,7 +141,8 @@ int __ref arch_add_memory(int nid, u64 start, u64 size,
start, start + size, rc);
return -EFAULT;
}
-   flush_dcache_range(start, start + size);
+
+   flush_dcache_range_chunked(start, start + size, FLUSH_CHUNK_SIZE);
 
return __add_pages(nid, start_pfn, nr_pages, restrictions);
 }
@@ -137,7 +159,8 @@ void __ref arch_remove_memory(int nid, u64 start, u64 size,
 
/* Remove htab bolted mappings for this section of memory */
start = (unsigned long)__va(start);
-   flush_dcache_range(start, start + size);
+   flush_dcache_range_chunked(start, start + size, FLUSH_CHUNK_SIZE);
+
ret = remove_section_mapping(start, start + size);
WARN_ON_ONCE(ret);
 
-- 
2.21.0



[PATCH v4 3/6] powerpc: define helpers to get L1 icache sizes

2019-09-25 Thread Alastair D'Silva
From: Alastair D'Silva 

This patch adds helpers to retrieve icache sizes, and renames the existing
helpers to make it clear that they are for dcache.

Signed-off-by: Alastair D'Silva 
---
 arch/powerpc/include/asm/cache.h  | 29 +++
 arch/powerpc/include/asm/cacheflush.h | 12 +--
 2 files changed, 31 insertions(+), 10 deletions(-)

diff --git a/arch/powerpc/include/asm/cache.h b/arch/powerpc/include/asm/cache.h
index b3388d95f451..f852d5cd746c 100644
--- a/arch/powerpc/include/asm/cache.h
+++ b/arch/powerpc/include/asm/cache.h
@@ -55,25 +55,46 @@ struct ppc64_caches {
 
 extern struct ppc64_caches ppc64_caches;
 
-static inline u32 l1_cache_shift(void)
+static inline u32 l1_dcache_shift(void)
 {
return ppc64_caches.l1d.log_block_size;
 }
 
-static inline u32 l1_cache_bytes(void)
+static inline u32 l1_dcache_bytes(void)
 {
return ppc64_caches.l1d.block_size;
 }
+
+static inline u32 l1_icache_shift(void)
+{
+   return ppc64_caches.l1i.log_block_size;
+}
+
+static inline u32 l1_icache_bytes(void)
+{
+   return ppc64_caches.l1i.block_size;
+}
 #else
-static inline u32 l1_cache_shift(void)
+static inline u32 l1_dcache_shift(void)
 {
return L1_CACHE_SHIFT;
 }
 
-static inline u32 l1_cache_bytes(void)
+static inline u32 l1_dcache_bytes(void)
 {
return L1_CACHE_BYTES;
 }
+
+static inline u32 l1_icache_shift(void)
+{
+   return L1_CACHE_SHIFT;
+}
+
+static inline u32 l1_icache_bytes(void)
+{
+   return L1_CACHE_BYTES;
+}
+
 #endif
 #endif /* ! __ASSEMBLY__ */
 
diff --git a/arch/powerpc/include/asm/cacheflush.h b/arch/powerpc/include/asm/cacheflush.h
index eef388f2659f..ed57843ef452 100644
--- a/arch/powerpc/include/asm/cacheflush.h
+++ b/arch/powerpc/include/asm/cacheflush.h
@@ -63,8 +63,8 @@ static inline void __flush_dcache_icache_phys(unsigned long physaddr)
  */
 static inline void flush_dcache_range(unsigned long start, unsigned long stop)
 {
-   unsigned long shift = l1_cache_shift();
-   unsigned long bytes = l1_cache_bytes();
+   unsigned long shift = l1_dcache_shift();
+   unsigned long bytes = l1_dcache_bytes();
void *addr = (void *)(start & ~(bytes - 1));
unsigned long size = stop - (unsigned long)addr + (bytes - 1);
unsigned long i;
@@ -89,8 +89,8 @@ static inline void flush_dcache_range(unsigned long start, unsigned long stop)
  */
 static inline void clean_dcache_range(unsigned long start, unsigned long stop)
 {
-   unsigned long shift = l1_cache_shift();
-   unsigned long bytes = l1_cache_bytes();
+   unsigned long shift = l1_dcache_shift();
+   unsigned long bytes = l1_dcache_bytes();
void *addr = (void *)(start & ~(bytes - 1));
unsigned long size = stop - (unsigned long)addr + (bytes - 1);
unsigned long i;
@@ -108,8 +108,8 @@ static inline void clean_dcache_range(unsigned long start, unsigned long stop)
 static inline void invalidate_dcache_range(unsigned long start,
   unsigned long stop)
 {
-   unsigned long shift = l1_cache_shift();
-   unsigned long bytes = l1_cache_bytes();
+   unsigned long shift = l1_dcache_shift();
+   unsigned long bytes = l1_dcache_bytes();
void *addr = (void *)(start & ~(bytes - 1));
unsigned long size = stop - (unsigned long)addr + (bytes - 1);
unsigned long i;
-- 
2.21.0



[PATCH v4 2/6] powerpc: Allow 64bit VDSO __kernel_sync_dicache to work across ranges >4GB

2019-09-25 Thread Alastair D'Silva
From: Alastair D'Silva 

When calling __kernel_sync_dicache with a size >4GB, we were masking
off the upper 32 bits, so we would incorrectly flush a range smaller
than intended.

This patch replaces the 32 bit shifts with 64 bit ones, so that
the full size is accounted for.

Signed-off-by: Alastair D'Silva 
Cc: sta...@vger.kernel.org
---
 arch/powerpc/kernel/vdso64/cacheflush.S | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/arch/powerpc/kernel/vdso64/cacheflush.S b/arch/powerpc/kernel/vdso64/cacheflush.S
index 3f92561a64c4..526f5ba2593e 100644
--- a/arch/powerpc/kernel/vdso64/cacheflush.S
+++ b/arch/powerpc/kernel/vdso64/cacheflush.S
@@ -35,7 +35,7 @@ V_FUNCTION_BEGIN(__kernel_sync_dicache)
 	subf	r8,r6,r4	/* compute length */
 	add	r8,r8,r5	/* ensure we get enough */
 	lwz	r9,CFG_DCACHE_LOGBLOCKSZ(r10)
-	srw.	r8,r8,r9	/* compute line count */
+	srd.	r8,r8,r9	/* compute line count */
 	crclr	cr0*4+so
 	beqlr			/* nothing to do? */
 	mtctr	r8
@@ -52,7 +52,7 @@ V_FUNCTION_BEGIN(__kernel_sync_dicache)
 	subf	r8,r6,r4	/* compute length */
 	add	r8,r8,r5
 	lwz	r9,CFG_ICACHE_LOGBLOCKSZ(r10)
-	srw.	r8,r8,r9	/* compute line count */
+	srd.	r8,r8,r9	/* compute line count */
 	crclr	cr0*4+so
 	beqlr			/* nothing to do? */
 	mtctr	r8
-- 
2.21.0



[PATCH v4 4/6] powerpc: Convert flush_icache_range & friends to C

2019-09-25 Thread Alastair D'Silva
From: Alastair D'Silva 

Similar to commit 22e9c88d486a
("powerpc/64: reuse PPC32 static inline flush_dcache_range()")
this patch converts the following ASM symbols to C:
flush_icache_range()
__flush_dcache_icache()
__flush_dcache_icache_phys()

This was done as we discovered a long-standing bug where the length of the
range was truncated due to using a 32 bit shift instead of a 64 bit one.

By converting these functions to C, it becomes easier to maintain.

flush_dcache_icache_phys() retains a critical assembler section as we must
ensure there are no memory accesses while the data MMU is disabled
(authored by Christophe Leroy). Since this has no external callers, it has
also been made static, allowing the compiler to inline it within
flush_dcache_icache_page().
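For reference, the C replacement for flush_icache_range() ends up looking
roughly like the sketch below, built from the dcbst()/icbi() helpers and the
l1_*cache_bytes() accessors added earlier in this series. This is a simplified
view: the actual patch also short-circuits on CPUs with a coherent (snooping)
icache and has a 44x iccci special case, both omitted here.

void flush_icache_range(unsigned long start, unsigned long stop)
{
	unsigned long bytes = l1_dcache_bytes();
	unsigned long i;

	/* write modified data cache lines covering the range back to memory */
	for (i = start & ~(bytes - 1); i < stop; i += bytes)
		dcbst((void *)i);
	mb();	/* sync: order the dcbst stores before the icbi invalidates */

	bytes = l1_icache_bytes();
	/* toss any stale instruction cache lines for the range */
	for (i = start & ~(bytes - 1); i < stop; i += bytes)
		icbi((void *)i);
	mb();
	isync();
}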

Signed-off-by: Alastair D'Silva 
Signed-off-by: Christophe Leroy 
---
 arch/powerpc/include/asm/cache.h  |  26 ++---
 arch/powerpc/include/asm/cacheflush.h |  24 ++--
 arch/powerpc/kernel/misc_32.S | 117 
 arch/powerpc/kernel/misc_64.S | 102 -
 arch/powerpc/mm/mem.c | 151 +-
 5 files changed, 172 insertions(+), 248 deletions(-)

diff --git a/arch/powerpc/include/asm/cache.h b/arch/powerpc/include/asm/cache.h
index f852d5cd746c..e33451a9c4e6 100644
--- a/arch/powerpc/include/asm/cache.h
+++ b/arch/powerpc/include/asm/cache.h
@@ -98,20 +98,7 @@ static inline u32 l1_icache_bytes(void)
 #endif
 #endif /* ! __ASSEMBLY__ */
 
-#if defined(__ASSEMBLY__)
-/*
- * For a snooping icache, we still need a dummy icbi to purge all the
- * prefetched instructions from the ifetch buffers. We also need a sync
- * before the icbi to order the the actual stores to memory that might
- * have modified instructions with the icbi.
- */
-#define PURGE_PREFETCHED_INS   \
-   sync;   \
-   icbi    0,r3;   \
-   sync;   \
-   isync
-
-#else
+#if !defined(__ASSEMBLY__)
 #define __read_mostly __attribute__((__section__(".data..read_mostly")))
 
 #ifdef CONFIG_PPC_BOOK3S_32
@@ -145,6 +132,17 @@ static inline void dcbst(void *addr)
 {
__asm__ __volatile__ ("dcbst %y0" : : "Z"(*(u8 *)addr) : "memory");
 }
+
+static inline void icbi(void *addr)
+{
+   asm volatile ("icbi 0, %0" : : "r"(addr) : "memory");
+}
+
+static inline void iccci(void *addr)
+{
+   asm volatile ("iccci 0, %0" : : "r"(addr) : "memory");
+}
+
 #endif /* !__ASSEMBLY__ */
 #endif /* __KERNEL__ */
 #endif /* _ASM_POWERPC_CACHE_H */
diff --git a/arch/powerpc/include/asm/cacheflush.h b/arch/powerpc/include/asm/cacheflush.h
index ed57843ef452..4a1c9f0200e1 100644
--- a/arch/powerpc/include/asm/cacheflush.h
+++ b/arch/powerpc/include/asm/cacheflush.h
@@ -42,24 +42,20 @@ extern void flush_dcache_page(struct page *page);
 #define flush_dcache_mmap_lock(mapping)    do { } while (0)
 #define flush_dcache_mmap_unlock(mapping)  do { } while (0)
 
-extern void flush_icache_range(unsigned long, unsigned long);
+void flush_icache_range(unsigned long start, unsigned long stop);
 extern void flush_icache_user_range(struct vm_area_struct *vma,
struct page *page, unsigned long addr,
int len);
-extern void __flush_dcache_icache(void *page_va);
 extern void flush_dcache_icache_page(struct page *page);
-#if defined(CONFIG_PPC32) && !defined(CONFIG_BOOKE)
-extern void __flush_dcache_icache_phys(unsigned long physaddr);
-#else
-static inline void __flush_dcache_icache_phys(unsigned long physaddr)
-{
-   BUG();
-}
-#endif
-
-/*
- * Write any modified data cache blocks out to memory and invalidate them.
- * Does not invalidate the corresponding instruction cache blocks.
+void __flush_dcache_icache(void *page);
+
+/**
+ * flush_dcache_range(): Write any modified data cache blocks out to memory and
+ * invalidate them. Does not invalidate the corresponding instruction cache
+ * blocks.
+ *
+ * @start: the start address
+ * @stop: the stop address (exclusive)
  */
 static inline void flush_dcache_range(unsigned long start, unsigned long stop)
 {
diff --git a/arch/powerpc/kernel/misc_32.S b/arch/powerpc/kernel/misc_32.S
index fe4bd321730e..12b95e6799d4 100644
--- a/arch/powerpc/kernel/misc_32.S
+++ b/arch/powerpc/kernel/misc_32.S
@@ -318,123 +318,6 @@ END_FTR_SECTION_IFSET(CPU_FTR_UNIFIED_ID_CACHE)
 EXPORT_SYMBOL(flush_instruction_cache)
 #endif /* CONFIG_PPC_8xx */
 
-/*
- * Write any modified data cache blocks out to memory
- * and invalidate the corresponding instruction cache blocks.
- * This is a no-op on the 601.
- *
- * flush_icache_range(unsigned long start, unsigned long stop)
- */
-_GLOBAL(flush_icache_range)
-BEGIN_FTR_SECTION
-   PURGE_PREFETCHED_INS
-   blr /* for 601, do nothing */
-END_FTR_SECTION_IFSET

[PATCH v4 1/6] powerpc: Allow flush_icache_range to work across ranges >4GB

2019-09-25 Thread Alastair D'Silva
From: Alastair D'Silva 

When calling flush_icache_range with a size >4GB, we were masking
off the upper 32 bits, so we would incorrectly flush a range smaller
than intended.

This patch replaces the 32 bit shifts with 64 bit ones, so that
the full size is accounted for.
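The effect of the truncation is easy to reproduce with a standalone snippet
(illustration only, not kernel code): srw. only considers the low 32 bits of
the length, roughly the (unsigned int) cast below, whereas srd. uses all 64
bits. A 128-byte cache line (L1_CACHE_SHIFT = 7) is assumed for the example.

#include <stdio.h>

#define L1_CACHE_SHIFT 7	/* assumed 128-byte cache lines */

int main(void)
{
	unsigned long long len = 6ULL << 30;	/* ask to flush 6GB */

	/* old behaviour: upper 32 bits of the length are lost before the shift */
	unsigned long long srw_lines = (unsigned int)len >> L1_CACHE_SHIFT;
	/* new behaviour: the full 64-bit length is used */
	unsigned long long srd_lines = len >> L1_CACHE_SHIFT;

	printf("srw-style: %llu lines = %llu bytes flushed\n",
	       srw_lines, srw_lines << L1_CACHE_SHIFT);
	printf("srd-style: %llu lines = %llu bytes flushed\n",
	       srd_lines, srd_lines << L1_CACHE_SHIFT);
	return 0;
}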

Signed-off-by: Alastair D'Silva 
Cc: sta...@vger.kernel.org
---
 arch/powerpc/kernel/misc_64.S | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/arch/powerpc/kernel/misc_64.S b/arch/powerpc/kernel/misc_64.S
index b55a7b4cb543..9bc0aa9aeb65 100644
--- a/arch/powerpc/kernel/misc_64.S
+++ b/arch/powerpc/kernel/misc_64.S
@@ -82,7 +82,7 @@ END_FTR_SECTION_IFSET(CPU_FTR_COHERENT_ICACHE)
 	subf	r8,r6,r4	/* compute length */
 	add	r8,r8,r5	/* ensure we get enough */
 	lwz	r9,DCACHEL1LOGBLOCKSIZE(r10)	/* Get log-2 of cache block size */
-	srw.	r8,r8,r9	/* compute line count */
+	srd.	r8,r8,r9	/* compute line count */
 	beqlr			/* nothing to do? */
 	mtctr	r8
 1:	dcbst	0,r6
@@ -98,7 +98,7 @@ END_FTR_SECTION_IFSET(CPU_FTR_COHERENT_ICACHE)
 	subf	r8,r6,r4	/* compute length */
 	add	r8,r8,r5
 	lwz	r9,ICACHEL1LOGBLOCKSIZE(r10)	/* Get log-2 of Icache block size */
-	srw.	r8,r8,r9	/* compute line count */
+	srd.	r8,r8,r9	/* compute line count */
 	beqlr			/* nothing to do? */
 	mtctr	r8
 2:	icbi	0,r6
-- 
2.21.0



[PATCH v4 0/6] powerpc: convert cache asm to C

2019-09-25 Thread Alastair D'Silva
From: Alastair D'Silva 

This series addresses a few issues discovered in how we flush caches:
1. Flushes were truncated at 4GB, so larger flushes were incorrect.
2. Flushing the dcache in arch_add_memory was unnecessary

This series also converts much of the cache assembler to C, with the
aim of making it easier to maintain.

Changelog:
 V4:
 - Split out VDSO patch
 - Pass/cast the correct types in 'powerpc: Convert
   flush_icache_range & friends to C'
 V3:
 - factor out chunking loop
 - Replace __asm__ __volatile__ with asm volatile
 - Replace flush_coherent_icache_or_return macro with
   flush_coherent_icache function
 - factor our invalidate_icache_range
 - Replace code duplicating clean_dcache_range() in
   __flush_dcache_icache() with a call to clean_dcache_range()
 - Remove redundant #ifdef CONFIG_44x
 - Fix preprocessor logic:
 #if !defined(CONFIG_PPC_8xx) & !defined(CONFIG_PPC64)
 - Added loop(1|2) to earlyclobbers in flush_dcache_icache_phys
 - Drop "Remove extern" patch
 - Replace 32 bit shifts in 64 bit VDSO with 64 bit ones
 V2:
 - Replace C implementation of flush_dcache_icache_phys() with
   inline assembler authored by Christophe Leroy
 - Add memory clobbers for iccci implementation
 - Give __flush_dcache_icache a real implementation, it can't
   just be a wrapper around flush_icache_range()
 - Remove PPC64_CACHES from misc_64.S
 - Replace code duplicating clean_dcache_range() in
   flush_icache_range() with a call to clean_dcache_range()
 - Replace #ifdef CONFIG_44x with IS_ENABLED(...) in
   flush_icache_cange()
 - Use 1GB chunks instead of 16GB in arch_*_memory


Alastair D'Silva (6):
  powerpc: Allow flush_icache_range to work across ranges >4GB
  powerpc: Allow 64bit VDSO __kernel_sync_dicache to work across ranges
>4GB
  powerpc: define helpers to get L1 icache sizes
  powerpc: Convert flush_icache_range & friends to C
  powerpc: Chunk calls to flush_dcache_range in arch_*_memory
  powerpc: Don't flush caches when adding memory

 arch/powerpc/include/asm/cache.h|  55 +---
 arch/powerpc/include/asm/cacheflush.h   |  36 +++--
 arch/powerpc/kernel/misc_32.S   | 117 
 arch/powerpc/kernel/misc_64.S   | 102 --
 arch/powerpc/kernel/vdso64/cacheflush.S |   4 +-
 arch/powerpc/mm/mem.c   | 176 +++-
 6 files changed, 228 insertions(+), 262 deletions(-)

-- 
2.21.0



Re: [PATCH 4/5] ocxl: Add functions to map/unmap LPC memory

2019-09-25 Thread Alastair D'Silva
On Mon, 2019-09-23 at 13:39 +0200, Frederic Barrat wrote:
> 
> 
> > diff --git a/drivers/misc/ocxl/link.c b/drivers/misc/ocxl/link.c
> > index 2874811a4398..9e303a5f4d85 100644
> > --- a/drivers/misc/ocxl/link.c
> > +++ b/drivers/misc/ocxl/link.c
> > @@ -738,7 +738,7 @@ int ocxl_link_add_lpc_mem(void *link_handle,
> > u64 size)
> >   }
> >   EXPORT_SYMBOL_GPL(ocxl_link_add_lpc_mem);
> >   
> > -u64 ocxl_link_lpc_map(void *link_handle, struct pci_dev *pdev)
> > +u64 ocxl_link_lpc_online(void *link_handle, struct pci_dev *pdev)
> >   {
> > struct ocxl_link *link = (struct ocxl_link *) link_handle;
> >   
> 
> A bit of a nitpick, but is there any specific reason to rename with 
> "online" suffix? I'm discovering it myself, but with memory hotplug, 
> "onlining" seems to refer to the second, a.k.a logical memory
> hotplug 
> phase (as described in Documentation/admin-guide/mm/memory-
> hotplug.rst). 
> We'll need to worry about it, but the function here is really doing
> the 
> first phase, a.k.a physical memory hotplug.
> 
>Fred

It's been a while since I wrote that, so I can't remember why, but your
reasoning is sound, I'll drop the rename.

> 
> 
> > @@ -759,7 +759,7 @@ u64 ocxl_link_lpc_map(void *link_handle, struct
> > pci_dev *pdev)
> > return link->lpc_mem;
> >   }
> >   
> > -void ocxl_link_lpc_release(void *link_handle, struct pci_dev
> > *pdev)
> > +void ocxl_link_lpc_offline(void *link_handle, struct pci_dev
> > *pdev)
> >   {
> > struct ocxl_link *link = (struct ocxl_link *) link_handle;
> >   
> > diff --git a/drivers/misc/ocxl/ocxl_internal.h
> > b/drivers/misc/ocxl/ocxl_internal.h
> > index db2647a90fc8..5656a4aab5b7 100644
> > --- a/drivers/misc/ocxl/ocxl_internal.h
> > +++ b/drivers/misc/ocxl/ocxl_internal.h
> > @@ -52,6 +52,12 @@ struct ocxl_afu {
> > void __iomem *global_mmio_ptr;
> > u64 pp_mmio_start;
> > void *private;
> > +   u64 lpc_base_addr; /* Covers both LPC & special purpose memory
> > */
> > +   struct bin_attribute attr_global_mmio;
> > +   struct bin_attribute attr_lpc_mem;
> > +   struct resource lpc_res;
> > +   struct bin_attribute attr_special_purpose_mem;
> > +   struct resource special_purpose_res;
> >   };
> >   
> >   enum ocxl_context_status {
> > @@ -170,7 +176,7 @@ extern u64 ocxl_link_get_lpc_mem_sz(void
> > *link_handle);
> >* @link_handle: The OpenCAPI link handle
> >* @pdev: A device that is on the link
> >*/
> > -u64 ocxl_link_lpc_map(void *link_handle, struct pci_dev *pdev);
> > +u64 ocxl_link_lpc_online(void *link_handle, struct pci_dev *pdev);
> >   
> >   /**
> >* Release the LPC memory device for an OpenCAPI device
> > @@ -181,6 +187,6 @@ u64 ocxl_link_lpc_map(void *link_handle, struct
> > pci_dev *pdev);
> >* @link_handle: The OpenCAPI link handle
> >* @pdev: A device that is on the link
> >*/
> > -void ocxl_link_lpc_release(void *link_handle, struct pci_dev
> > *pdev);
> > +void ocxl_link_lpc_offline(void *link_handle, struct pci_dev
> > *pdev);
> >   
> >   #endif /* _OCXL_INTERNAL_H_ */
> > diff --git a/include/misc/ocxl.h b/include/misc/ocxl.h
> > index 06dd5839e438..a1897737908d 100644
> > --- a/include/misc/ocxl.h
> > +++ b/include/misc/ocxl.h
> > @@ -212,6 +212,24 @@ int ocxl_irq_set_handler(struct ocxl_context
> > *ctx, int irq_id,
> >   
> >   // AFU Metadata
> >   
> > +/**
> > + * Map the LPC system & special purpose memory for an AFU
> > + *
> > > + * Do not call this during device discovery, as there may be
> > multiple
> > + * devices on a link, and the memory is mapped for the whole link,
> > not
> > + * just one device. It should only be called after all devices
> > have
> > + * registered their memory on the link.
> > + *
> > + * afu: The AFU that has the LPC memory to map
> > + */
> > +extern int ocxl_map_lpc_mem(struct ocxl_afu *afu);
> > +
> > +/**
> > + * Get the physical address range of LPC memory for an AFU
> > + * afu: The AFU associated with the LPC memory
> > + */
> > +extern struct resource *ocxl_afu_lpc_mem(struct ocxl_afu *afu);
> > +
> >   /**
> >* Get a pointer to the config for an AFU
> >*
> > 
-- 
Alastair D'Silva
Open Source Developer
Linux Technology Centre, IBM Australia
mob: 0423 762 819



[PATCH v4 0/1] Add bounds check for Hotplugged memory

2019-09-25 Thread Alastair D'Silva
From: Alastair D'Silva 

This series adds bounds checks for hotplugged memory, ensuring that
it is within the physically addressable range (for platforms that
define MAX_(POSSIBLE_)PHYSMEM_BITS).

This allows for early failure, rather than attempting to access
bogus section numbers.

Changelog:
 V4:
   - Relocate call to __add_pages
   - Add a warning when the addressable check fails
 V3:
   - Perform the addressable check before we take the hotplug lock
 V2:
   - Don't use MAX_POSSIBLE_PHYSMEM_BITS as it's wider than what
 may be available

Alastair D'Silva (1):
  memory_hotplug: Add a bounds check to __add_pages

 mm/memory_hotplug.c | 21 +
 1 file changed, 21 insertions(+)

-- 
2.21.0



[PATCH v4] memory_hotplug: Add a bounds check to __add_pages

2019-09-25 Thread Alastair D'Silva
From: Alastair D'Silva 

On PowerPC, the address ranges allocated to OpenCAPI LPC memory
are allocated from firmware. These address ranges may be higher
than what older kernels permit, as we increased the maximum
permissible address in commit 4ffe713b7587
("powerpc/mm: Increase the max addressable memory to 2PB"). It is
possible that the addressable range may change again in the
future.

In this scenario, we end up with a bogus section returned from
__section_nr (see the discussion on the thread "mm: Trigger bug on
if a section is not found in __section_nr").

Adding a check here means that we fail early and have an
opportunity to handle the error gracefully, rather than rumbling
on and potentially accessing an incorrect section.

Further discussion is also on the thread ("powerpc: Perform a bounds
check in arch_add_memory")
http://lkml.kernel.org/r/20190827052047.31547-1-alast...@au1.ibm.com

Signed-off-by: Alastair D'Silva 
---
 mm/memory_hotplug.c | 20 
 1 file changed, 20 insertions(+)

diff --git a/mm/memory_hotplug.c b/mm/memory_hotplug.c
index c73f09913165..212804c0f7f5 100644
--- a/mm/memory_hotplug.c
+++ b/mm/memory_hotplug.c
@@ -278,6 +278,22 @@ static int check_pfn_span(unsigned long pfn, unsigned long nr_pages,
return 0;
 }
 
+static int check_hotplug_memory_addressable(unsigned long pfn,
+   unsigned long nr_pages)
+{
+   unsigned long max_addr = ((pfn + nr_pages) << PAGE_SHIFT) - 1;
+
+   if (max_addr >> MAX_PHYSMEM_BITS) {
+   WARN(1,
+"Hotplugged memory exceeds maximum addressable address, 
range=%#lx-%#lx, maximum=%#lx\n",
+pfn << PAGE_SHIFT, max_addr,
+(1ul << (MAX_PHYSMEM_BITS + 1)) - 1);
+   return -E2BIG;
+   }
+
+   return 0;
+}
+
 /*
  * Reasonably generic function for adding memory.  It is
  * expected that archs that support memory hotplug will
@@ -291,6 +307,10 @@ int __ref __add_pages(int nid, unsigned long pfn, unsigned long nr_pages,
unsigned long nr, start_sec, end_sec;
struct vmem_altmap *altmap = restrictions->altmap;
 
+   err = check_hotplug_memory_addressable(pfn, nr_pages);
+   if (err)
+   return err;
+
if (altmap) {
/*
 * Validate altmap is within bounds of the total request
-- 
2.21.0



Re: [PATCH v3 1/2] memory_hotplug: Add a bounds check to check_hotplug_memory_range()

2019-09-23 Thread Alastair D'Silva
On Mon, 2019-09-23 at 14:25 +0200, Michal Hocko wrote:
> On Tue 17-09-19 11:07:47, Alastair D'Silva wrote:
> > From: Alastair D'Silva 
> > 
> > On PowerPC, the address ranges allocated to OpenCAPI LPC memory
> > are allocated from firmware. These address ranges may be higher
> > than what older kernels permit, as we increased the maximum
> > permissible address in commit 4ffe713b7587
> > ("powerpc/mm: Increase the max addressable memory to 2PB"). It is
> > possible that the addressable range may change again in the
> > future.
> > 
> > In this scenario, we end up with a bogus section returned from
> > __section_nr (see the discussion on the thread "mm: Trigger bug on
> > if a section is not found in __section_nr").
> > 
> > Adding a check here means that we fail early and have an
> > opportunity to handle the error gracefully, rather than rumbling
> > on and potentially accessing an incorrect section.
> > 
> > Further discussion is also on the thread ("powerpc: Perform a
> > bounds
> > check in arch_add_memory").
> 
> It would be nicer to refer to this by a message-id based url.
> E.g. 
> http://lkml.kernel.org/r/20190827052047.31547-1-alast...@au1.ibm.com
> 

Ok.

> > Signed-off-by: Alastair D'Silva 
> > ---
> >  include/linux/memory_hotplug.h |  1 +
> >  mm/memory_hotplug.c| 13 -
> >  2 files changed, 13 insertions(+), 1 deletion(-)
> > 
> > diff --git a/include/linux/memory_hotplug.h
> > b/include/linux/memory_hotplug.h
> > index f46ea71b4ffd..bc477e98a310 100644
> > --- a/include/linux/memory_hotplug.h
> > +++ b/include/linux/memory_hotplug.h
> > @@ -110,6 +110,7 @@ extern void
> > __online_page_increment_counters(struct page *page);
> >  extern void __online_page_free(struct page *page);
> >  
> >  extern int try_online_node(int nid);
> > +int check_hotplug_memory_addressable(u64 start, u64 size);
> >  
> >  extern int arch_add_memory(int nid, u64 start, u64 size,
> > struct mhp_restrictions *restrictions);
> > diff --git a/mm/memory_hotplug.c b/mm/memory_hotplug.c
> > index c73f09913165..02cb9a74f561 100644
> > --- a/mm/memory_hotplug.c
> > +++ b/mm/memory_hotplug.c
> > @@ -1030,6 +1030,17 @@ int try_online_node(int nid)
> > return ret;
> >  }
> >  
> > +int check_hotplug_memory_addressable(u64 start, u64 size)
> > +{
> > +#ifdef MAX_PHYSMEM_BITS
> > +   if ((start + size - 1) >> MAX_PHYSMEM_BITS)
> > +   return -E2BIG;
> > +#endif
> 
> Is there any arch which doesn't define this? We seemed to be using
> this
> in sparsemem code without any ifdefs.

A few, but none of them would be enabling hotplug (which depends on
sparsemem), so you're right, the ifdef could be removed.

> > +
> > +   return 0;
> > +}
> > +EXPORT_SYMBOL_GPL(check_hotplug_memory_addressable);
> 
> If you squashed the patch 2 then it would become clear why this needs
> to
> be exported because you would have a driver user. I find it a bit
> unfortunate to expect that any driver which uses the hotplug code is
> expected to know that this check should be called. This sounds too
> error
> prone. Why hasn't been this done at __add_pages layer?
> 

It seemed that it should be a peer of check_hotplug_memory_range(), as
it gives similar feedback (whether the provided range is suitable).

If we did the check in __add_pages, wouldn't we potentially lose bits
from the LSBs of start & size, or is there some other requirement that
already ensures start & size are always page aligned?
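As a toy illustration of that concern (standalone, not kernel code): converting
an unaligned byte range to pfns before checking rounds the end of the range
down, while checking the byte range directly does not. PAGE_SHIFT = 16 and
MAX_PHYSMEM_BITS = 51 are assumed values. In practice hotplug callers pass
page-aligned ranges, and the later revisions derive the end address from
pfn + nr_pages anyway, so this would only matter for an unaligned caller.

#include <stdio.h>

#define PAGE_SHIFT              16
#define MAX_PHYSMEM_BITS        51
#define PHYS_PFN(x)             ((x) >> PAGE_SHIFT)
#define PFN_PHYS(x)             ((unsigned long long)(x) << PAGE_SHIFT)

int main(void)
{
        unsigned long long start = (1ULL << MAX_PHYSMEM_BITS) - 0x100; /* 256B below the limit... */
        unsigned long long size = 0x200;                               /* ...but 512B long */

        /* byte-based check sees the last byte spill past the limit */
        unsigned long long end_bytes = start + size - 1;
        /* pfn-based check drops the sub-page remainder and misses it */
        unsigned long long end_pfn = PFN_PHYS(PHYS_PFN(start) + PHYS_PFN(size)) - 1;

        printf("byte check fails: %d\n", (int)(end_bytes >> MAX_PHYSMEM_BITS));
        printf("pfn check fails:  %d\n", (int)(end_pfn >> MAX_PHYSMEM_BITS));
        return 0;
}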

It appears this patch has been accepted - if we were to make this
change, does it go as another spin on this series or a new series?

> > +
> >  static int check_hotplug_memory_range(u64 start, u64 size)
> >  {
> > /* memory range must be block size aligned */
> > @@ -1040,7 +1051,7 @@ static int check_hotplug_memory_range(u64
> > start, u64 size)
> > return -EINVAL;
> > }
> >  
> > -   return 0;
> > +   return check_hotplug_memory_addressable(start, size);
> 
> This will result in a silent failure (unlike misaligned case). Is
> this
> what we want?

Good point - I guess it comes down to, is there anything we expect an
end user to do about it? I'm not sure there is, in which case the bad
RC, which is reported up every call chain that I can see, should be
sufficient.

> >  }
> >  
> >  static int online_memory_block(struct memory_block *mem, void
> > *arg)
> > -- 
> > 2.21.0
> > 

-- 
Alastair D'Silva   mob: 0423 762 819
skype: alastair_dsilva
Twitter: @EvilDeece
blog: http://alastair.d-silva.org




Re: [PATCH 3/5] ocxl: Tally up the LPC memory on a link & allow it to be mapped

2019-09-18 Thread Alastair D'Silva
On Wed, 2019-09-18 at 16:02 +0200, Frederic Barrat wrote:
> 
> On 17/09/2019 at 03:42, Alastair D'Silva wrote:
> > From: Alastair D'Silva 
> > 
> > Tally up the LPC memory on an OpenCAPI link & allow it to be mapped
> > 
> > Signed-off-by: Alastair D'Silva 
> > ---
> >   drivers/misc/ocxl/core.c  |  9 +
> >   drivers/misc/ocxl/link.c  | 61
> > +++
> >   drivers/misc/ocxl/ocxl_internal.h | 42 +
> >   3 files changed, 112 insertions(+)
> > 
> > diff --git a/drivers/misc/ocxl/core.c b/drivers/misc/ocxl/core.c
> > index b7a09b21ab36..fdfe4e0a34e1 100644
> > --- a/drivers/misc/ocxl/core.c
> > +++ b/drivers/misc/ocxl/core.c
> > @@ -230,8 +230,17 @@ static int configure_afu(struct ocxl_afu *afu,
> > u8 afu_idx, struct pci_dev *dev)
> > if (rc)
> > goto err_free_pasid;
> >   
> > +   if (afu->config.lpc_mem_size || afu-
> > >config.special_purpose_mem_size) {
> > +   rc = ocxl_link_add_lpc_mem(afu->fn->link,
> > +   afu->config.lpc_mem_size + afu-
> > >config.special_purpose_mem_size);
> 
> I don't think we should count the special purpose memory, as it's
> not 
> meant to be accessed through the GPU mem BAR, but I'll check.

At least for OpenCAPI 3.0, there is no other in-spec way to access the
memory if it is not mapped by the NPU.

> 
> What happens when unconfiguring the AFU? We should reduce the range
> (see 
> also below). Partial reconfig doesn't seem so far off, so we should
> take 
> it into account.
> 

The mapping is left until the last AFU on the link offlines its
memory, at which point we clear the mapping from the NPU.
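For readers following the thread, the lifetime being described is a refcount
under lpc_mem_lock: the first consumer sets up the NPU mapping, later consumers
reuse it, and the last one to go offline tears it down. A rough sketch using
the fields from the patch above; pnv_ocxl_map_lpc_mem()/pnv_ocxl_release_lpc_mem()
are placeholder names for the platform hooks, not the real API.

u64 ocxl_link_lpc_map(void *link_handle, struct pci_dev *pdev)
{
	struct ocxl_link *link = link_handle;

	mutex_lock(&link->lpc_mem_lock);
	if (!link->lpc_mem)	/* first consumer establishes the mapping */
		link->lpc_mem = pnv_ocxl_map_lpc_mem(pdev, link->lpc_mem_sz);
	if (link->lpc_mem)
		link->lpc_consumers++;
	mutex_unlock(&link->lpc_mem_lock);

	return link->lpc_mem;
}

void ocxl_link_lpc_release(void *link_handle, struct pci_dev *pdev)
{
	struct ocxl_link *link = link_handle;

	mutex_lock(&link->lpc_mem_lock);
	if (!WARN_ON(link->lpc_consumers == 0) && --link->lpc_consumers == 0) {
		/* last consumer: clear the mapping from the NPU */
		pnv_ocxl_release_lpc_mem(pdev);
		link->lpc_mem = 0;
	}
	mutex_unlock(&link->lpc_mem_lock);
}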

> 
> > +   if (rc)
> > +   goto err_free_mmio;
> > +   }
> > +
> > return 0;
> >   
> > +err_free_mmio:
> > +   unmap_mmio_areas(afu);
> >   err_free_pasid:
> > reclaim_afu_pasid(afu);
> >   err_free_actag:
> > diff --git a/drivers/misc/ocxl/link.c b/drivers/misc/ocxl/link.c
> > index 58d111afd9f6..2874811a4398 100644
> > --- a/drivers/misc/ocxl/link.c
> > +++ b/drivers/misc/ocxl/link.c
> > @@ -84,6 +84,11 @@ struct ocxl_link {
> > int dev;
> > atomic_t irq_available;
> > struct spa *spa;
> > +   struct mutex lpc_mem_lock;
> > +   u64 lpc_mem_sz; /* Total amount of LPC memory presented on the
> > link */
> > +   u64 lpc_mem;
> > +   int lpc_consumers;
> > +
> > void *platform_data;
> >   };
> >   static struct list_head links_list = LIST_HEAD_INIT(links_list);
> > @@ -396,6 +401,8 @@ static int alloc_link(struct pci_dev *dev, int
> > PE_mask, struct ocxl_link **out_l
> > if (rc)
> > goto err_spa;
> >   
> > > +   mutex_init(&link->lpc_mem_lock);
> > +
> > /* platform specific hook */
> > rc = pnv_ocxl_spa_setup(dev, link->spa->spa_mem, PE_mask,
> > > &link->platform_data);
> > @@ -711,3 +718,57 @@ void ocxl_link_free_irq(void *link_handle, int
> > hw_irq)
> > > +   atomic_inc(&link->irq_available);
> >   }
> >   EXPORT_SYMBOL_GPL(ocxl_link_free_irq);
> > +
> > +int ocxl_link_add_lpc_mem(void *link_handle, u64 size)
> > +{
> > +   struct ocxl_link *link = (struct ocxl_link *) link_handle;
> > +
> > +   u64 orig_size;
> > +   bool good = false;
> > +
> > > +   mutex_lock(&link->lpc_mem_lock);
> > +   orig_size = link->lpc_mem_sz;
> > +   link->lpc_mem_sz += size;
> 
> 
> We have a choice to make here:
> 1. either we only support one LPC memory-carrying AFU (and the above
> is 
> overkill)
> 2. or we support multiple AFUs with LPC memory (on the same
> function), 
> but then I think the above is too simple.
> 
>  From the opencapi spec, each AFU can define a chunk of memory with
> a 
> starting address and a size. There's no rule which says they have to
> be 
> contiguous. There's no rule which says it must start at 0. So to
> support 
> multiple AFUs with LPC memory, we should record the current maximum 
> range instead of just the global size. Ultimately, we need to tell
> the 
> NPU the range of permissible addresses. It starts at 0, so we need
> to 
> take into account any intial offset and holes.
> 
> I would go for option 2, to at least be consistent within ocxl and 
> support multiple AFUs. Even though I don't think we'll see FPGA
> images 
> with multiple AFUs with LPC memory any time soon.
> 

I'll rework this to take an offset & size, the NPU wil
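
Roughly along these lines (a sketch only, not the final patch; it reuses
the existing lpc_mem_lock and lpc_mem_sz members):

int ocxl_link_add_lpc_mem(void *link_handle, u64 offset, u64 size)
{
        struct ocxl_link *link = (struct ocxl_link *) link_handle;
        u64 extent = offset + size;

        /* Reject wrap-around */
        if (extent < offset)
                return -EINVAL;

        mutex_lock(&link->lpc_mem_lock);
        /* The NPU range starts at 0, so only the largest extent matters */
        if (extent > link->lpc_mem_sz)
                link->lpc_mem_sz = extent;
        mutex_unlock(&link->lpc_mem_lock);

        return 0;
}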

Re: [PATCH v3 1/5] powerpc: Allow flush_icache_range to work across ranges >4GB

2019-09-18 Thread Alastair D'Silva
On Thu, 2019-09-19 at 13:43 +1000, Michael Ellerman wrote:
> "Alastair D'Silva"  writes:
> > From: Alastair D'Silva 
> > 
> > When calling flush_icache_range with a size >4GB, we were masking
> > off the upper 32 bits, so we would incorrectly flush a range
> > smaller
> > than intended.
> > 
> > __kernel_sync_dicache in the 64 bit VDSO has the same bug.
> 
> Please fix that in a separate patch.
> 
> Your subject doesn't mention __kernel_sync_dicache(), and also the
> two
> changes backport differently, so it's better if they're done as
> separate
> patches.
> 

Ok.

> cheers
> 
> > This patch replaces the 32 bit shifts with 64 bit ones, so that
> > the full size is accounted for.
> > 
> > Signed-off-by: Alastair D'Silva 
> > Cc: sta...@vger.kernel.org
> > ---
> >  arch/powerpc/kernel/misc_64.S   | 4 ++--
> >  arch/powerpc/kernel/vdso64/cacheflush.S | 4 ++--
> >  2 files changed, 4 insertions(+), 4 deletions(-)
> > 
> > diff --git a/arch/powerpc/kernel/misc_64.S
> > b/arch/powerpc/kernel/misc_64.S
> > index b55a7b4cb543..9bc0aa9aeb65 100644
> > --- a/arch/powerpc/kernel/misc_64.S
> > +++ b/arch/powerpc/kernel/misc_64.S
> > @@ -82,7 +82,7 @@ END_FTR_SECTION_IFSET(CPU_FTR_COHERENT_ICACHE)
> > subfr8,r6,r4/* compute length */
> > add r8,r8,r5/* ensure we get enough */
> > lwz r9,DCACHEL1LOGBLOCKSIZE(r10)/* Get log-2 of cache block
> > size */
> > -   srw.r8,r8,r9/* compute line count */
> > +   srd.r8,r8,r9/* compute line count */
> > beqlr   /* nothing to do? */
> > mtctr   r8
> >  1: dcbst   0,r6
> > @@ -98,7 +98,7 @@ END_FTR_SECTION_IFSET(CPU_FTR_COHERENT_ICACHE)
> > subfr8,r6,r4/* compute length */
> > add r8,r8,r5
> > lwz r9,ICACHEL1LOGBLOCKSIZE(r10)/* Get log-2 of Icache
> > block size */
> > -   srw.r8,r8,r9/* compute line count */
> > +   srd.r8,r8,r9/* compute line count */
> > beqlr   /* nothing to do? */
> > mtctr   r8
> >  2: icbi0,r6
> > diff --git a/arch/powerpc/kernel/vdso64/cacheflush.S
> > b/arch/powerpc/kernel/vdso64/cacheflush.S
> > index 3f92561a64c4..526f5ba2593e 100644
> > --- a/arch/powerpc/kernel/vdso64/cacheflush.S
> > +++ b/arch/powerpc/kernel/vdso64/cacheflush.S
> > @@ -35,7 +35,7 @@ V_FUNCTION_BEGIN(__kernel_sync_dicache)
> > subfr8,r6,r4/* compute length */
> > add r8,r8,r5/* ensure we get enough */
> > lwz r9,CFG_DCACHE_LOGBLOCKSZ(r10)
> > -   srw.r8,r8,r9/* compute line count */
> > +   srd.r8,r8,r9/* compute line count */
> > crclr   cr0*4+so
> > beqlr   /* nothing to do? */
> > mtctr   r8
> > @@ -52,7 +52,7 @@ V_FUNCTION_BEGIN(__kernel_sync_dicache)
> > subfr8,r6,r4/* compute length */
> > add r8,r8,r5
> > lwz r9,CFG_ICACHE_LOGBLOCKSZ(r10)
> > -   srw.r8,r8,r9/* compute line count */
> > +   srd.r8,r8,r9/* compute line count */
> > crclr   cr0*4+so
> > beqlr   /* nothing to do? */
> > mtctr   r8
> > -- 
> > 2.21.0
-- 
Alastair D'Silva
Open Source Developer
Linux Technology Centre, IBM Australia
mob: 0423 762 819



Re: [PATCH 4/5] ocxl: Add functions to map/unmap LPC memory

2019-09-18 Thread Alastair D'Silva
On Tue, 2019-09-17 at 11:43 +1000, Alastair D'Silva wrote:
> From: Alastair D'Silva 
> 
> Add functions to map/unmap LPC memory
> 
> Signed-off-by: Alastair D'Silva 
> ---
>  drivers/misc/ocxl/config.c|  4 +++
>  drivers/misc/ocxl/core.c  | 50
> +++
>  drivers/misc/ocxl/link.c  |  4 +--
>  drivers/misc/ocxl/ocxl_internal.h | 10 +--
>  include/misc/ocxl.h   | 18 +++
>  5 files changed, 82 insertions(+), 4 deletions(-)
> 
> diff --git a/drivers/misc/ocxl/config.c b/drivers/misc/ocxl/config.c
> index c8e19bfb5ef9..fb0c3b6f8312 100644
> --- a/drivers/misc/ocxl/config.c
> +++ b/drivers/misc/ocxl/config.c
> @@ -568,6 +568,10 @@ static int read_afu_lpc_memory_info(struct
> pci_dev *dev,
>   afu->special_purpose_mem_size =
>   total_mem_size - lpc_mem_size;
>   }
> +
> + dev_info(>dev, "Probed LPC memory of %#llx bytes and
> special purpose memory of %#llx bytes\n",
> + afu->lpc_mem_size, afu->special_purpose_mem_size);
> +
>   return 0;
>  }
>  
> diff --git a/drivers/misc/ocxl/core.c b/drivers/misc/ocxl/core.c
> index fdfe4e0a34e1..eb24bb9d655f 100644
> --- a/drivers/misc/ocxl/core.c
> +++ b/drivers/misc/ocxl/core.c
> @@ -210,6 +210,55 @@ static void unmap_mmio_areas(struct ocxl_afu
> *afu)
>   release_fn_bar(afu->fn, afu->config.global_mmio_bar);
>  }
>  
> +int ocxl_map_lpc_mem(struct ocxl_afu *afu)
> +{
> + struct pci_dev *dev = to_pci_dev(afu->fn->dev.parent);
> +
> + if ((afu->config.lpc_mem_size + afu-
> >config.special_purpose_mem_size) == 0)
> + return 0;
> +
> + afu->lpc_base_addr = ocxl_link_lpc_online(afu->fn->link, dev);
> + if (afu->lpc_base_addr == 0)
> + return -EINVAL;
> +
> + if (afu->config.lpc_mem_size) {
> + afu->lpc_res.start = afu->lpc_base_addr + afu-
> >config.lpc_mem_offset;
> + afu->lpc_res.end = afu->lpc_res.start + afu-
> >config.lpc_mem_size - 1;
> + }
> +
> + if (afu->config.special_purpose_mem_size) {
> + afu->special_purpose_res.start = afu->lpc_base_addr +
> +  afu-
> >config.special_purpose_mem_offset;
> + afu->special_purpose_res.end = afu-
> >special_purpose_res.start +
> +afu-
> >config.special_purpose_mem_size - 1;
> + }
> +
> + return 0;
> +}
> +EXPORT_SYMBOL(ocxl_map_lpc_mem);
> +
> +struct resource *ocxl_afu_lpc_mem(struct ocxl_afu *afu)
> +{
> + return >lpc_res;
> +}
> +EXPORT_SYMBOL(ocxl_afu_lpc_mem);
> +
> +static void unmap_lpc_mem(struct ocxl_afu *afu)
> +{
> + struct pci_dev *dev = to_pci_dev(afu->fn->dev.parent);
> +
> + if (afu->lpc_res.start || afu->special_purpose_res.start) {
> + void *link = afu->fn->link;
> +
> + ocxl_link_lpc_offline(link, dev);
> +
> + afu->lpc_res.start = 0;
> + afu->lpc_res.end = 0;
> + afu->special_purpose_res.start = 0;
> + afu->special_purpose_res.end = 0;
> + }
> +}
> +
>  static int configure_afu(struct ocxl_afu *afu, u8 afu_idx, struct
> pci_dev *dev)
>  {
>   int rc;
> @@ -250,6 +299,7 @@ static int configure_afu(struct ocxl_afu *afu, u8
> afu_idx, struct pci_dev *dev)
>  
>  static void deconfigure_afu(struct ocxl_afu *afu)
>  {
> + unmap_lpc_mem(afu);
>   unmap_mmio_areas(afu);
>   reclaim_afu_pasid(afu);
>   reclaim_afu_actag(afu);
> diff --git a/drivers/misc/ocxl/link.c b/drivers/misc/ocxl/link.c
> index 2874811a4398..9e303a5f4d85 100644
> --- a/drivers/misc/ocxl/link.c
> +++ b/drivers/misc/ocxl/link.c
> @@ -738,7 +738,7 @@ int ocxl_link_add_lpc_mem(void *link_handle, u64
> size)
>  }
>  EXPORT_SYMBOL_GPL(ocxl_link_add_lpc_mem);
>  
> -u64 ocxl_link_lpc_map(void *link_handle, struct pci_dev *pdev)
> +u64 ocxl_link_lpc_online(void *link_handle, struct pci_dev *pdev)
>  {
>   struct ocxl_link *link = (struct ocxl_link *) link_handle;
>  
> @@ -759,7 +759,7 @@ u64 ocxl_link_lpc_map(void *link_handle, struct
> pci_dev *pdev)
>   return link->lpc_mem;
>  }
>  
> -void ocxl_link_lpc_release(void *link_handle, struct pci_dev *pdev)
> +void ocxl_link_lpc_offline(void *link_handle, struct pci_dev *pdev)
>  {
>   struct ocxl_link *link = (struct ocxl_link *) link_handle;
>  
> diff --git a/drivers/misc/ocxl/ocxl_internal.h
> b/drivers/misc/ocxl/ocxl_in

Re: [PATCH 4/5] ocxl: Add functions to map/unmap LPC memory

2019-09-18 Thread Alastair D'Silva
On Wed, 2019-09-18 at 16:03 +0200, Frederic Barrat wrote:
> 
> Le 17/09/2019 à 03:43, Alastair D'Silva a écrit :
> > From: Alastair D'Silva 
> > 
> > Add functions to map/unmap LPC memory
> > 
> > Signed-off-by: Alastair D'Silva 
> > ---
> >   drivers/misc/ocxl/config.c|  4 +++
> >   drivers/misc/ocxl/core.c  | 50
> > +++
> >   drivers/misc/ocxl/link.c  |  4 +--
> >   drivers/misc/ocxl/ocxl_internal.h | 10 +--
> >   include/misc/ocxl.h   | 18 +++
> >   5 files changed, 82 insertions(+), 4 deletions(-)
> > 
> > diff --git a/drivers/misc/ocxl/config.c
> > b/drivers/misc/ocxl/config.c
> > index c8e19bfb5ef9..fb0c3b6f8312 100644
> > --- a/drivers/misc/ocxl/config.c
> > +++ b/drivers/misc/ocxl/config.c
> > @@ -568,6 +568,10 @@ static int read_afu_lpc_memory_info(struct
> > pci_dev *dev,
> > afu->special_purpose_mem_size =
> > total_mem_size - lpc_mem_size;
> > }
> > +
> > +   dev_info(>dev, "Probed LPC memory of %#llx bytes and
> > special purpose memory of %#llx bytes\n",
> > +   afu->lpc_mem_size, afu->special_purpose_mem_size);
> > +
> > return 0;
> >   }
> >   
> > diff --git a/drivers/misc/ocxl/core.c b/drivers/misc/ocxl/core.c
> > index fdfe4e0a34e1..eb24bb9d655f 100644
> > --- a/drivers/misc/ocxl/core.c
> > +++ b/drivers/misc/ocxl/core.c
> > @@ -210,6 +210,55 @@ static void unmap_mmio_areas(struct ocxl_afu
> > *afu)
> > release_fn_bar(afu->fn, afu->config.global_mmio_bar);
> >   }
> >   
> > +int ocxl_map_lpc_mem(struct ocxl_afu *afu)
> > +{
> > +   struct pci_dev *dev = to_pci_dev(afu->fn->dev.parent);
> > +
> > +   if ((afu->config.lpc_mem_size + afu-
> > >config.special_purpose_mem_size) == 0)
> > +   return 0;
> > +
> > +   afu->lpc_base_addr = ocxl_link_lpc_online(afu->fn->link, dev);
> > +   if (afu->lpc_base_addr == 0)
> > +   return -EINVAL;
> > +
> > +   if (afu->config.lpc_mem_size) {
> > +   afu->lpc_res.start = afu->lpc_base_addr + afu-
> > >config.lpc_mem_offset;
> > +   afu->lpc_res.end = afu->lpc_res.start + afu-
> > >config.lpc_mem_size - 1;
> > +   }
> > +
> > +   if (afu->config.special_purpose_mem_size) {
> > +   afu->special_purpose_res.start = afu->lpc_base_addr +
> > +afu-
> > >config.special_purpose_mem_offset;
> > +   afu->special_purpose_res.end = afu-
> > >special_purpose_res.start +
> > +  afu-
> > >config.special_purpose_mem_size - 1;
> > +   }
> > +
> > +   return 0;
> > +}
> > +EXPORT_SYMBOL(ocxl_map_lpc_mem);
> > +
> > +struct resource *ocxl_afu_lpc_mem(struct ocxl_afu *afu)
> > +{
> > +   return >lpc_res;
> > +}
> > +EXPORT_SYMBOL(ocxl_afu_lpc_mem);
> > +
> > +static void unmap_lpc_mem(struct ocxl_afu *afu)
> > +{
> > +   struct pci_dev *dev = to_pci_dev(afu->fn->dev.parent);
> > +
> > +   if (afu->lpc_res.start || afu->special_purpose_res.start) {
> > +   void *link = afu->fn->link;
> > +
> > +   ocxl_link_lpc_offline(link, dev);
> > +
> > +   afu->lpc_res.start = 0;
> > +   afu->lpc_res.end = 0;
> > +   afu->special_purpose_res.start = 0;
> > +   afu->special_purpose_res.end = 0;
> > +   }
> > +}
> > +
> >   static int configure_afu(struct ocxl_afu *afu, u8 afu_idx, struct
> > pci_dev *dev)
> >   {
> > int rc;
> > @@ -250,6 +299,7 @@ static int configure_afu(struct ocxl_afu *afu,
> > u8 afu_idx, struct pci_dev *dev)
> >   
> >   static void deconfigure_afu(struct ocxl_afu *afu)
> >   {
> > +   unmap_lpc_mem(afu);
> > unmap_mmio_areas(afu);
> > reclaim_afu_pasid(afu);
> > reclaim_afu_actag(afu);
> > diff --git a/drivers/misc/ocxl/link.c b/drivers/misc/ocxl/link.c
> > index 2874811a4398..9e303a5f4d85 100644
> > --- a/drivers/misc/ocxl/link.c
> > +++ b/drivers/misc/ocxl/link.c
> > @@ -738,7 +738,7 @@ int ocxl_link_add_lpc_mem(void *link_handle,
> > u64 size)
> >   }
> >   EXPORT_SYMBOL_GPL(ocxl_link_add_lpc_mem);
> >   
> > -u64 ocxl_link_lpc_map(void *link_handle, struct pci_dev *pdev)
> > +u64 ocxl_link_lpc_online(vo

Re: [PATCH 2/5] powerpc: Map & release OpenCAPI LPC memory

2019-09-18 Thread Alastair D'Silva
On Wed, 2019-09-18 at 16:03 +0200, Frederic Barrat wrote:
> 
> Le 17/09/2019 à 03:42, Alastair D'Silva a écrit :
> > From: Alastair D'Silva 
> > 
> > Map & release OpenCAPI LPC memory.
> > 
> > Signed-off-by: Alastair D'Silva 
> > ---
> >   arch/powerpc/include/asm/pnv-ocxl.h   |  2 ++
> >   arch/powerpc/platforms/powernv/ocxl.c | 42
> > +++
> >   2 files changed, 44 insertions(+)
> > 
> > diff --git a/arch/powerpc/include/asm/pnv-ocxl.h
> > b/arch/powerpc/include/asm/pnv-ocxl.h
> > index 7de82647e761..f8f8ffb48aa8 100644
> > --- a/arch/powerpc/include/asm/pnv-ocxl.h
> > +++ b/arch/powerpc/include/asm/pnv-ocxl.h
> > @@ -32,5 +32,7 @@ extern int pnv_ocxl_spa_remove_pe_from_cache(void
> > *platform_data, int pe_handle)
> >   
> >   extern int pnv_ocxl_alloc_xive_irq(u32 *irq, u64 *trigger_addr);
> >   extern void pnv_ocxl_free_xive_irq(u32 irq);
> > +extern u64 pnv_ocxl_platform_lpc_setup(struct pci_dev *pdev, u64
> > size);
> > +extern void pnv_ocxl_platform_lpc_release(struct pci_dev *pdev);
> >   
> >   #endif /* _ASM_PNV_OCXL_H */
> > diff --git a/arch/powerpc/platforms/powernv/ocxl.c
> > b/arch/powerpc/platforms/powernv/ocxl.c
> > index 8c65aacda9c8..81393728d6a3 100644
> > --- a/arch/powerpc/platforms/powernv/ocxl.c
> > +++ b/arch/powerpc/platforms/powernv/ocxl.c
> > @@ -475,6 +475,48 @@ void pnv_ocxl_spa_release(void *platform_data)
> >   }
> >   EXPORT_SYMBOL_GPL(pnv_ocxl_spa_release);
> >   
> > +u64 pnv_ocxl_platform_lpc_setup(struct pci_dev *pdev, u64 size)
> > +{
> > +   struct pci_controller *hose = pci_bus_to_host(pdev->bus);
> > +   struct pnv_phb *phb = hose->private_data;
> > +   struct pci_dn *pdn = pci_get_pdn(pdev);
> > +   u32 bdfn = (pdn->busno << 8) | pdn->devfn;
> 
> We can spare a call to pci_get_pdn() with
> bdfn = (pdev->bus->number << 8) | pdev->devfn;
> 

Ok.

> 
> > +   u64 base_addr = 0;
> > +
> > +   int rc = opal_npu_mem_alloc(phb->opal_id, bdfn, size,
> > _addr);
> > +
> > +   WARN_ON(rc);
> 
> Instead of a WARN, we should catch the error and return a null
> address 
> to the caller.
> 

base_addr will be 0 in the error case; are you suggesting we just
remove the WARN_ON()?
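
If that's the preference, the top of pnv_ocxl_platform_lpc_setup() would
look something like this (a sketch only, folding in your earlier bdfn
suggestion; the caller already treats a 0 base address as "not mapped"):

u64 pnv_ocxl_platform_lpc_setup(struct pci_dev *pdev, u64 size)
{
        struct pci_controller *hose = pci_bus_to_host(pdev->bus);
        struct pnv_phb *phb = hose->private_data;
        u32 bdfn = (pdev->bus->number << 8) | pdev->devfn;
        u64 base_addr = 0;
        int rc;

        rc = opal_npu_mem_alloc(phb->opal_id, bdfn, size, &base_addr);
        if (rc) {
                /* No WARN: report it and hand back a null address */
                dev_warn(&pdev->dev, "OPAL LPC memory allocation failed, rc=%d\n", rc);
                return 0;
        }

        base_addr = be64_to_cpu(base_addr);

        /* ... existing addressable-range check, then ... */
        return base_addr;
}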

> > +
> > +   base_addr = be64_to_cpu(base_addr);
> > +
> > +   rc = check_hotplug_memory_addressable(base_addr, base_addr +
> > size);
> 
> That code is missing?
> 

That's added in the following patch on the mm list:
 [PATCH v3 1/2] memory_hotplug: Add a bounds check to
check_hotplug_memory_range()

> 
> > +   if (rc) {
> > +   dev_warn(>dev,
> > +"LPC memory range 0x%llx-0x%llx is not fully
> > addressable",
> > +base_addr, base_addr + size - 1);
> > +   return 0;
> > +   }
> > +
> > +
> > +   return base_addr;
> > +}
> > +EXPORT_SYMBOL_GPL(pnv_ocxl_platform_lpc_setup);
> > +
> > +void pnv_ocxl_platform_lpc_release(struct pci_dev *pdev)
> > +{
> > +   struct pci_controller *hose = pci_bus_to_host(pdev->bus);
> > +   struct pnv_phb *phb = hose->private_data;
> > +   struct pci_dn *pdn = pci_get_pdn(pdev);
> > +   u32 bdfn;
> > +   int rc;
> > +
> > +   bdfn = (pdn->busno << 8) | pdn->devfn;
> > +   rc = opal_npu_mem_release(phb->opal_id, bdfn);
> > +   WARN_ON(rc);
> 
> Same comments as above.
> 
>Fred
> 
> 
> 
> > +}
> > +EXPORT_SYMBOL_GPL(pnv_ocxl_platform_lpc_release);
> > +
> > +
> >   int pnv_ocxl_spa_remove_pe_from_cache(void *platform_data, int
> > pe_handle)
> >   {
> > struct spa_data *data = (struct spa_data *) platform_data;
> > 
-- 
Alastair D'Silva
Open Source Developer
Linux Technology Centre, IBM Australia
mob: 0423 762 819



[PATCH v3 5/5] powerpc: Don't flush caches when adding memory

2019-09-17 Thread Alastair D'Silva
From: Alastair D'Silva 

This operation takes a significant amount of time when hotplugging
large amounts of memory (~50 seconds with 890GB of persistent memory).

This was originally in commit fb5924fddf9e
("powerpc/mm: Flush cache on memory hot(un)plug") to support memtrace,
but the flush on add is not needed as it is flushed on remove.

Signed-off-by: Alastair D'Silva 
---
 arch/powerpc/mm/mem.c | 2 --
 1 file changed, 2 deletions(-)

diff --git a/arch/powerpc/mm/mem.c b/arch/powerpc/mm/mem.c
index 4f892da59a6a..dd1aa80e854a 100644
--- a/arch/powerpc/mm/mem.c
+++ b/arch/powerpc/mm/mem.c
@@ -142,8 +142,6 @@ int __ref arch_add_memory(int nid, u64 start, u64 size,
return -EFAULT;
}
 
-   flush_dcache_range_chunked(start, start + size, FLUSH_CHUNK_SIZE);
-
return __add_pages(nid, start_pfn, nr_pages, restrictions);
 }
 
-- 
2.21.0



[PATCH v3 3/5] powerpc: Convert flush_icache_range & friends to C

2019-09-17 Thread Alastair D'Silva
From: Alastair D'Silva 

Similar to commit 22e9c88d486a
("powerpc/64: reuse PPC32 static inline flush_dcache_range()")
this patch converts the following ASM symbols to C:
flush_icache_range()
__flush_dcache_icache()
__flush_dcache_icache_phys()

This was done as we discovered a long-standing bug where the length of the
range was truncated due to using a 32 bit shift instead of a 64 bit one.

By converting these functions to C, they become easier to maintain.

flush_dcache_icache_phys() retains a critical assembler section as we must
ensure there are no memory accesses while the data MMU is disabled
(authored by Christophe Leroy). Since this has no external callers, it has
also been made static, allowing the compiler to inline it within
flush_dcache_icache_page().

Signed-off-by: Alastair D'Silva 
Signed-off-by: Christophe Leroy 
---
 arch/powerpc/include/asm/cache.h  |  26 ++---
 arch/powerpc/include/asm/cacheflush.h |  24 ++--
 arch/powerpc/kernel/misc_32.S | 117 
 arch/powerpc/kernel/misc_64.S | 102 -
 arch/powerpc/mm/mem.c | 151 +-
 5 files changed, 172 insertions(+), 248 deletions(-)

diff --git a/arch/powerpc/include/asm/cache.h b/arch/powerpc/include/asm/cache.h
index f852d5cd746c..e33451a9c4e6 100644
--- a/arch/powerpc/include/asm/cache.h
+++ b/arch/powerpc/include/asm/cache.h
@@ -98,20 +98,7 @@ static inline u32 l1_icache_bytes(void)
 #endif
 #endif /* ! __ASSEMBLY__ */
 
-#if defined(__ASSEMBLY__)
-/*
- * For a snooping icache, we still need a dummy icbi to purge all the
- * prefetched instructions from the ifetch buffers. We also need a sync
- * before the icbi to order the the actual stores to memory that might
- * have modified instructions with the icbi.
- */
-#define PURGE_PREFETCHED_INS   \
-   sync;   \
-   icbi0,r3;   \
-   sync;   \
-   isync
-
-#else
+#if !defined(__ASSEMBLY__)
 #define __read_mostly __attribute__((__section__(".data..read_mostly")))
 
 #ifdef CONFIG_PPC_BOOK3S_32
@@ -145,6 +132,17 @@ static inline void dcbst(void *addr)
 {
__asm__ __volatile__ ("dcbst %y0" : : "Z"(*(u8 *)addr) : "memory");
 }
+
+static inline void icbi(void *addr)
+{
+   asm volatile ("icbi 0, %0" : : "r"(addr) : "memory");
+}
+
+static inline void iccci(void *addr)
+{
+   asm volatile ("iccci 0, %0" : : "r"(addr) : "memory");
+}
+
 #endif /* !__ASSEMBLY__ */
 #endif /* __KERNEL__ */
 #endif /* _ASM_POWERPC_CACHE_H */
diff --git a/arch/powerpc/include/asm/cacheflush.h 
b/arch/powerpc/include/asm/cacheflush.h
index ed57843ef452..4a1c9f0200e1 100644
--- a/arch/powerpc/include/asm/cacheflush.h
+++ b/arch/powerpc/include/asm/cacheflush.h
@@ -42,24 +42,20 @@ extern void flush_dcache_page(struct page *page);
 #define flush_dcache_mmap_lock(mapping)do { } while (0)
 #define flush_dcache_mmap_unlock(mapping)  do { } while (0)
 
-extern void flush_icache_range(unsigned long, unsigned long);
+void flush_icache_range(unsigned long start, unsigned long stop);
 extern void flush_icache_user_range(struct vm_area_struct *vma,
struct page *page, unsigned long addr,
int len);
-extern void __flush_dcache_icache(void *page_va);
 extern void flush_dcache_icache_page(struct page *page);
-#if defined(CONFIG_PPC32) && !defined(CONFIG_BOOKE)
-extern void __flush_dcache_icache_phys(unsigned long physaddr);
-#else
-static inline void __flush_dcache_icache_phys(unsigned long physaddr)
-{
-   BUG();
-}
-#endif
-
-/*
- * Write any modified data cache blocks out to memory and invalidate them.
- * Does not invalidate the corresponding instruction cache blocks.
+void __flush_dcache_icache(void *page);
+
+/**
+ * flush_dcache_range(): Write any modified data cache blocks out to memory and
+ * invalidate them. Does not invalidate the corresponding instruction cache
+ * blocks.
+ *
+ * @start: the start address
+ * @stop: the stop address (exclusive)
  */
 static inline void flush_dcache_range(unsigned long start, unsigned long stop)
 {
diff --git a/arch/powerpc/kernel/misc_32.S b/arch/powerpc/kernel/misc_32.S
index fe4bd321730e..12b95e6799d4 100644
--- a/arch/powerpc/kernel/misc_32.S
+++ b/arch/powerpc/kernel/misc_32.S
@@ -318,123 +318,6 @@ END_FTR_SECTION_IFSET(CPU_FTR_UNIFIED_ID_CACHE)
 EXPORT_SYMBOL(flush_instruction_cache)
 #endif /* CONFIG_PPC_8xx */
 
-/*
- * Write any modified data cache blocks out to memory
- * and invalidate the corresponding instruction cache blocks.
- * This is a no-op on the 601.
- *
- * flush_icache_range(unsigned long start, unsigned long stop)
- */
-_GLOBAL(flush_icache_range)
-BEGIN_FTR_SECTION
-   PURGE_PREFETCHED_INS
-   blr /* for 601, do nothing */
-END_FTR_SECTION_IFSET
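
For reference, the C replacement ends up shaped roughly like the sketch
below, built on the dcbst()/icbi() helpers and the l1_*cache_*() accessors
added earlier in the series (the CPU_FTR_COHERENT_ICACHE fast path and the
44x iccci() variant are omitted; this is an approximation, not the exact
patch):

void flush_icache_range(unsigned long start, unsigned long stop)
{
        unsigned long shift = l1_dcache_shift();
        unsigned long bytes = l1_dcache_bytes();
        void *addr = (void *)(start & ~(bytes - 1));
        unsigned long size = stop - (unsigned long)addr + (bytes - 1);
        unsigned long i;

        /* Write modified data cache blocks back to memory */
        for (i = 0; i < size >> shift; i++, addr += bytes)
                dcbst(addr);
        mb();   /* sync: order the stores before invalidating */

        shift = l1_icache_shift();
        bytes = l1_icache_bytes();
        addr = (void *)(start & ~(bytes - 1));
        size = stop - (unsigned long)addr + (bytes - 1);

        /* Invalidate the matching instruction cache blocks */
        for (i = 0; i < size >> shift; i++, addr += bytes)
                icbi(addr);
        mb();   /* sync */
        isync();
}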

[PATCH v3 4/5] powerpc: Chunk calls to flush_dcache_range in arch_*_memory

2019-09-17 Thread Alastair D'Silva
From: Alastair D'Silva 

When presented with large amounts of memory being hotplugged
(in my test case, ~890GB), the call to flush_dcache_range takes
a while (~50 seconds), triggering RCU stalls.

This patch breaks up the call into 1GB chunks, calling
cond_resched() in between to allow the scheduler to run.
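
At that rate (roughly 50 seconds for ~890GB), each 1GB chunk takes on the
order of 50-60ms to flush, so the cond_resched() between chunks keeps any
single stretch without a scheduling point comfortably short.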

Signed-off-by: Alastair D'Silva 
---
 arch/powerpc/mm/mem.c | 27 +--
 1 file changed, 25 insertions(+), 2 deletions(-)

diff --git a/arch/powerpc/mm/mem.c b/arch/powerpc/mm/mem.c
index 85b40bad90c0..4f892da59a6a 100644
--- a/arch/powerpc/mm/mem.c
+++ b/arch/powerpc/mm/mem.c
@@ -104,6 +104,27 @@ int __weak remove_section_mapping(unsigned long start, 
unsigned long end)
return -ENODEV;
 }
 
+#define FLUSH_CHUNK_SIZE SZ_1G
+/**
+ * flush_dcache_range_chunked(): Write any modified data cache blocks out to
+ * memory and invalidate them, in chunks of up to FLUSH_CHUNK_SIZE
+ * Does not invalidate the corresponding instruction cache blocks.
+ *
+ * @start: the start address
+ * @stop: the stop address (exclusive)
+ * @chunk: the max size of the chunks
+ */
+static void flush_dcache_range_chunked(unsigned long start, unsigned long stop,
+  unsigned long chunk)
+{
+   unsigned long i;
+
+   for (i = start; i < stop; i += chunk) {
+   flush_dcache_range(i, min(stop, i + chunk));
+   cond_resched();
+   }
+}
+
 int __ref arch_add_memory(int nid, u64 start, u64 size,
struct mhp_restrictions *restrictions)
 {
@@ -120,7 +141,8 @@ int __ref arch_add_memory(int nid, u64 start, u64 size,
start, start + size, rc);
return -EFAULT;
}
-   flush_dcache_range(start, start + size);
+
+   flush_dcache_range_chunked(start, start + size, FLUSH_CHUNK_SIZE);
 
return __add_pages(nid, start_pfn, nr_pages, restrictions);
 }
@@ -137,7 +159,8 @@ void __ref arch_remove_memory(int nid, u64 start, u64 size,
 
/* Remove htab bolted mappings for this section of memory */
start = (unsigned long)__va(start);
-   flush_dcache_range(start, start + size);
+   flush_dcache_range_chunked(start, start + size, FLUSH_CHUNK_SIZE);
+
ret = remove_section_mapping(start, start + size);
WARN_ON_ONCE(ret);
 
-- 
2.21.0



[PATCH v3 2/5] powerpc: define helpers to get L1 icache sizes

2019-09-17 Thread Alastair D'Silva
From: Alastair D'Silva 

This patch adds helpers to retrieve icache sizes, and renames the existing
helpers to make it clear that they are for dcache.

Signed-off-by: Alastair D'Silva 
---
 arch/powerpc/include/asm/cache.h  | 29 +++
 arch/powerpc/include/asm/cacheflush.h | 12 +--
 2 files changed, 31 insertions(+), 10 deletions(-)

diff --git a/arch/powerpc/include/asm/cache.h b/arch/powerpc/include/asm/cache.h
index b3388d95f451..f852d5cd746c 100644
--- a/arch/powerpc/include/asm/cache.h
+++ b/arch/powerpc/include/asm/cache.h
@@ -55,25 +55,46 @@ struct ppc64_caches {
 
 extern struct ppc64_caches ppc64_caches;
 
-static inline u32 l1_cache_shift(void)
+static inline u32 l1_dcache_shift(void)
 {
return ppc64_caches.l1d.log_block_size;
 }
 
-static inline u32 l1_cache_bytes(void)
+static inline u32 l1_dcache_bytes(void)
 {
return ppc64_caches.l1d.block_size;
 }
+
+static inline u32 l1_icache_shift(void)
+{
+   return ppc64_caches.l1i.log_block_size;
+}
+
+static inline u32 l1_icache_bytes(void)
+{
+   return ppc64_caches.l1i.block_size;
+}
 #else
-static inline u32 l1_cache_shift(void)
+static inline u32 l1_dcache_shift(void)
 {
return L1_CACHE_SHIFT;
 }
 
-static inline u32 l1_cache_bytes(void)
+static inline u32 l1_dcache_bytes(void)
 {
return L1_CACHE_BYTES;
 }
+
+static inline u32 l1_icache_shift(void)
+{
+   return L1_CACHE_SHIFT;
+}
+
+static inline u32 l1_icache_bytes(void)
+{
+   return L1_CACHE_BYTES;
+}
+
 #endif
 #endif /* ! __ASSEMBLY__ */
 
diff --git a/arch/powerpc/include/asm/cacheflush.h 
b/arch/powerpc/include/asm/cacheflush.h
index eef388f2659f..ed57843ef452 100644
--- a/arch/powerpc/include/asm/cacheflush.h
+++ b/arch/powerpc/include/asm/cacheflush.h
@@ -63,8 +63,8 @@ static inline void __flush_dcache_icache_phys(unsigned long 
physaddr)
  */
 static inline void flush_dcache_range(unsigned long start, unsigned long stop)
 {
-   unsigned long shift = l1_cache_shift();
-   unsigned long bytes = l1_cache_bytes();
+   unsigned long shift = l1_dcache_shift();
+   unsigned long bytes = l1_dcache_bytes();
void *addr = (void *)(start & ~(bytes - 1));
unsigned long size = stop - (unsigned long)addr + (bytes - 1);
unsigned long i;
@@ -89,8 +89,8 @@ static inline void flush_dcache_range(unsigned long start, 
unsigned long stop)
  */
 static inline void clean_dcache_range(unsigned long start, unsigned long stop)
 {
-   unsigned long shift = l1_cache_shift();
-   unsigned long bytes = l1_cache_bytes();
+   unsigned long shift = l1_dcache_shift();
+   unsigned long bytes = l1_dcache_bytes();
void *addr = (void *)(start & ~(bytes - 1));
unsigned long size = stop - (unsigned long)addr + (bytes - 1);
unsigned long i;
@@ -108,8 +108,8 @@ static inline void clean_dcache_range(unsigned long start, 
unsigned long stop)
 static inline void invalidate_dcache_range(unsigned long start,
   unsigned long stop)
 {
-   unsigned long shift = l1_cache_shift();
-   unsigned long bytes = l1_cache_bytes();
+   unsigned long shift = l1_dcache_shift();
+   unsigned long bytes = l1_dcache_bytes();
void *addr = (void *)(start & ~(bytes - 1));
unsigned long size = stop - (unsigned long)addr + (bytes - 1);
unsigned long i;
-- 
2.21.0



[PATCH v3 0/5] powerpc: convert cache asm to C

2019-09-17 Thread Alastair D'Silva
From: Alastair D'Silva 

This series addresses a few issues discovered in how we flush caches:
1. Flushes were truncated at 4GB, so larger flushes were incorrect.
2. Flushing the dcache in arch_add_memory was unnecessary

This series also converts much of the cache assembler to C, with the
aim of making it easier to maintain.

Alastair D'Silva (6):
  powerpc: Allow flush_icache_range to work across ranges >4GB
  powerpc: define helpers to get L1 icache sizes
  powerpc: Convert flush_icache_range & friends to C
  powerpc: Chunk calls to flush_dcache_range in arch_*_memory
  powerpc: Remove 'extern' from func prototypes in cache headers
  powerpc: Don't flush caches when adding memory

Changelog:
 V3:
 - factor out chunking loop
 - Replace __asm__ __volatile__ with asm volatile
 - Replace flush_coherent_icache_or_return macro with
   flush_coherent_icache function
 - factor out invalidate_icache_range
 - Replace code duplicating clean_dcache_range() in
   __flush_dcache_icache() with a call to clean_dcache_range()
 - Remove redundant #ifdef CONFIG_44x
 - Fix preprocessor logic:
 #if !defined(CONFIG_PPC_8xx) & !defined(CONFIG_PPC64)
 - Added loop(1|2) to earlyclobbers in flush_dcache_icache_phys
 - Drop "Remove extern" patch
 - Replace 32 bit shifts in 64 bit VDSO with 64 bit ones
 V2:
 - Replace C implementation of flush_dcache_icache_phys() with
   inline assembler authored by Christophe Leroy
 - Add memory clobbers for iccci implementation
 - Give __flush_dcache_icache a real implementation, it can't
   just be a wrapper around flush_icache_range()
 - Remove PPC64_CACHES from misc_64.S
 - Replace code duplicating clean_dcache_range() in
   flush_icache_range() with a call to clean_dcache_range()
 - Replace #ifdef CONFIG_44x with IS_ENABLED(...) in
   flush_icache_cange()
 - Use 1GB chunks instead of 16GB in arch_*_memory

Alastair D'Silva (5):
  powerpc: Allow flush_icache_range to work across ranges >4GB
  powerpc: define helpers to get L1 icache sizes
  powerpc: Convert flush_icache_range & friends to C
  powerpc: Chunk calls to flush_dcache_range in arch_*_memory
  powerpc: Don't flush caches when adding memory

 arch/powerpc/include/asm/cache.h|  55 +---
 arch/powerpc/include/asm/cacheflush.h   |  36 +++--
 arch/powerpc/kernel/misc_32.S   | 117 
 arch/powerpc/kernel/misc_64.S   | 102 --
 arch/powerpc/kernel/vdso64/cacheflush.S |   4 +-
 arch/powerpc/mm/mem.c   | 176 +++-
 6 files changed, 228 insertions(+), 262 deletions(-)

-- 
2.21.0



[PATCH v3 1/5] powerpc: Allow flush_icache_range to work across ranges >4GB

2019-09-17 Thread Alastair D'Silva
From: Alastair D'Silva 

When calling flush_icache_range with a size >4GB, we were masking
off the upper 32 bits, so we would incorrectly flush a range smaller
than intended.

__kernel_sync_dicache in the 64 bit VDSO has the same bug.

This patch replaces the 32 bit shifts with 64 bit ones, so that
the full size is accounted for.
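
As an example, srw. only considers the low 32 bits of the computed length,
so a length that is an exact multiple of 4GB produces a line count of 0 and
the following beqlr returns without flushing anything, while other lengths
above 4GB only flush approximately (length modulo 4GB).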

Signed-off-by: Alastair D'Silva 
Cc: sta...@vger.kernel.org
---
 arch/powerpc/kernel/misc_64.S   | 4 ++--
 arch/powerpc/kernel/vdso64/cacheflush.S | 4 ++--
 2 files changed, 4 insertions(+), 4 deletions(-)

diff --git a/arch/powerpc/kernel/misc_64.S b/arch/powerpc/kernel/misc_64.S
index b55a7b4cb543..9bc0aa9aeb65 100644
--- a/arch/powerpc/kernel/misc_64.S
+++ b/arch/powerpc/kernel/misc_64.S
@@ -82,7 +82,7 @@ END_FTR_SECTION_IFSET(CPU_FTR_COHERENT_ICACHE)
subfr8,r6,r4/* compute length */
add r8,r8,r5/* ensure we get enough */
lwz r9,DCACHEL1LOGBLOCKSIZE(r10)/* Get log-2 of cache block 
size */
-   srw.r8,r8,r9/* compute line count */
+   srd.r8,r8,r9/* compute line count */
beqlr   /* nothing to do? */
mtctr   r8
 1: dcbst   0,r6
@@ -98,7 +98,7 @@ END_FTR_SECTION_IFSET(CPU_FTR_COHERENT_ICACHE)
subfr8,r6,r4/* compute length */
add r8,r8,r5
lwz r9,ICACHEL1LOGBLOCKSIZE(r10)/* Get log-2 of Icache block 
size */
-   srw.r8,r8,r9/* compute line count */
+   srd.r8,r8,r9/* compute line count */
beqlr   /* nothing to do? */
mtctr   r8
 2: icbi0,r6
diff --git a/arch/powerpc/kernel/vdso64/cacheflush.S 
b/arch/powerpc/kernel/vdso64/cacheflush.S
index 3f92561a64c4..526f5ba2593e 100644
--- a/arch/powerpc/kernel/vdso64/cacheflush.S
+++ b/arch/powerpc/kernel/vdso64/cacheflush.S
@@ -35,7 +35,7 @@ V_FUNCTION_BEGIN(__kernel_sync_dicache)
subfr8,r6,r4/* compute length */
add r8,r8,r5/* ensure we get enough */
lwz r9,CFG_DCACHE_LOGBLOCKSZ(r10)
-   srw.r8,r8,r9/* compute line count */
+   srd.r8,r8,r9/* compute line count */
crclr   cr0*4+so
beqlr   /* nothing to do? */
mtctr   r8
@@ -52,7 +52,7 @@ V_FUNCTION_BEGIN(__kernel_sync_dicache)
subfr8,r6,r4/* compute length */
add r8,r8,r5
lwz r9,CFG_ICACHE_LOGBLOCKSZ(r10)
-   srw.r8,r8,r9/* compute line count */
+   srd.r8,r8,r9/* compute line count */
crclr   cr0*4+so
beqlr   /* nothing to do? */
mtctr   r8
-- 
2.21.0



Re: [PATCH 5/5] ocxl: Provide additional metadata to userspace

2019-09-17 Thread Alastair D'Silva
On Tue, 2019-09-17 at 11:43 +1000, Alastair D'Silva wrote:
> From: Alastair D'Silva 
> 
> This patch exposes the OpenCAPI device serial number to
> userspace.
> 
> It also includes placeholders for the LPC & special purpose
> memory information (which will be populated in a subsequent patch)
> to avoid creating excessive versions of the IOCTL.
> 

I think it makes more sense to fold the population of LPC & special
purpose memory into this patch; I'll address this in the next spin.

> Signed-off-by: Alastair D'Silva 
> ---
>  drivers/misc/ocxl/config.c | 46
> ++
>  drivers/misc/ocxl/file.c   |  3 ++-
>  include/misc/ocxl.h|  1 +
>  include/uapi/misc/ocxl.h   |  9 +++-
>  4 files changed, 57 insertions(+), 2 deletions(-)
> 
> diff --git a/drivers/misc/ocxl/config.c b/drivers/misc/ocxl/config.c
> index fb0c3b6f8312..a9203c309365 100644
> --- a/drivers/misc/ocxl/config.c
> +++ b/drivers/misc/ocxl/config.c
> @@ -71,6 +71,51 @@ static int find_dvsec_afu_ctrl(struct pci_dev
> *dev, u8 afu_idx)
>   return 0;
>  }
>  
> +/**
> + * Find a related PCI device (function 0)
> + * @device: PCI device to match
> + *
> + * Returns a pointer to the related device, or null if not found
> + */
> +static struct pci_dev *get_function_0(struct pci_dev *dev)
> +{
> + unsigned int devfn = PCI_DEVFN(PCI_SLOT(dev->devfn), 0); //
> Look for function 0
> +
> + return pci_get_domain_bus_and_slot(pci_domain_nr(dev->bus),
> + dev->bus->number, devfn);
> +}
> +
> +static void read_serial(struct pci_dev *dev, struct ocxl_fn_config
> *fn)
> +{
> + u32 low, high;
> + int pos;
> +
> + pos = pci_find_ext_capability(dev, PCI_EXT_CAP_ID_DSN);
> + if (pos) {
> + pci_read_config_dword(dev, pos + 0x04, );
> + pci_read_config_dword(dev, pos + 0x08, );
> +
> + fn->serial = low | ((u64)high) << 32;
> +
> + return;
> + }
> +
> + if (PCI_FUNC(dev->devfn) != 0) {
> + struct pci_dev *related = get_function_0(dev);
> +
> + if (!related) {
> + fn->serial = 0;
> + return;
> + }
> +
> + read_serial(related, fn);
> + pci_dev_put(related);
> + return;
> + }
> +
> + fn->serial = 0;
> +}
> +
>  static void read_pasid(struct pci_dev *dev, struct ocxl_fn_config
> *fn)
>  {
>   u16 val;
> @@ -208,6 +253,7 @@ int ocxl_config_read_function(struct pci_dev
> *dev, struct ocxl_fn_config *fn)
>   int rc;
>  
>   read_pasid(dev, fn);
> + read_serial(dev, fn);
>  
>   rc = read_dvsec_tl(dev, fn);
>   if (rc) {
> diff --git a/drivers/misc/ocxl/file.c b/drivers/misc/ocxl/file.c
> index 2870c25da166..08f6f594a11d 100644
> --- a/drivers/misc/ocxl/file.c
> +++ b/drivers/misc/ocxl/file.c
> @@ -98,13 +98,14 @@ static long afu_ioctl_get_metadata(struct
> ocxl_context *ctx,
>  
>   memset(, 0, sizeof(arg));
>  
> - arg.version = 0;
> + arg.version = 1;
>  
>   arg.afu_version_major = ctx->afu->config.version_major;
>   arg.afu_version_minor = ctx->afu->config.version_minor;
>   arg.pasid = ctx->pasid;
>   arg.pp_mmio_size = ctx->afu->config.pp_mmio_stride;
>   arg.global_mmio_size = ctx->afu->config.global_mmio_size;
> + arg.serial = ctx->afu->fn->config.serial;
>  
>   if (copy_to_user(uarg, , sizeof(arg)))
>   return -EFAULT;
> diff --git a/include/misc/ocxl.h b/include/misc/ocxl.h
> index a1897737908d..da75db149e6c 100644
> --- a/include/misc/ocxl.h
> +++ b/include/misc/ocxl.h
> @@ -46,6 +46,7 @@ struct ocxl_fn_config {
>   int dvsec_afu_info_pos; /* offset of the AFU information DVSEC
> */
>   s8 max_pasid_log;
>   s8 max_afu_index;
> + u64 serial;
>  };
>  
>  enum ocxl_endian {
> diff --git a/include/uapi/misc/ocxl.h b/include/uapi/misc/ocxl.h
> index 6d29a60a896a..d4c6bf10580c 100644
> --- a/include/uapi/misc/ocxl.h
> +++ b/include/uapi/misc/ocxl.h
> @@ -45,7 +45,14 @@ struct ocxl_ioctl_metadata {
>  
>   /* End version 0 fields */
>  
> - __u64 reserved[13]; /* Total of 16*u64 */
> + // Version 1 fields
> + __u64 lpc_mem_size;
> + __u64 special_purpose_mem_size;
> + __u64 serial;   // Device serial number
> +
> + // End version 1 fields
> +
> + __u64 reserved[10]; // Total of 16*u64
>  };
>  
>  struct ocxl_ioctl_p9_wait {
> -- 
> 2.21.0

-- 
Alastair D'Silva
Open Source Developer
Linux Technology Centre, IBM Australia
mob: 0423 762 819



[PATCH 5/5] ocxl: Provide additional metadata to userspace

2019-09-16 Thread Alastair D'Silva
From: Alastair D'Silva 

This patch exposes the OpenCAPI device serial number to
userspace.

It also includes placeholders for the LPC & special purpose
memory information (which will be populated in a subsequent patch)
to avoid creating excessive versions of the IOCTL.

Signed-off-by: Alastair D'Silva 
---
 drivers/misc/ocxl/config.c | 46 ++
 drivers/misc/ocxl/file.c   |  3 ++-
 include/misc/ocxl.h|  1 +
 include/uapi/misc/ocxl.h   |  9 +++-
 4 files changed, 57 insertions(+), 2 deletions(-)

diff --git a/drivers/misc/ocxl/config.c b/drivers/misc/ocxl/config.c
index fb0c3b6f8312..a9203c309365 100644
--- a/drivers/misc/ocxl/config.c
+++ b/drivers/misc/ocxl/config.c
@@ -71,6 +71,51 @@ static int find_dvsec_afu_ctrl(struct pci_dev *dev, u8 
afu_idx)
return 0;
 }
 
+/**
+ * Find a related PCI device (function 0)
+ * @dev: PCI device to match
+ *
+ * Returns a pointer to the related device, or null if not found
+ */
+static struct pci_dev *get_function_0(struct pci_dev *dev)
+{
+   unsigned int devfn = PCI_DEVFN(PCI_SLOT(dev->devfn), 0); // Look for 
function 0
+
+   return pci_get_domain_bus_and_slot(pci_domain_nr(dev->bus),
+   dev->bus->number, devfn);
+}
+
+static void read_serial(struct pci_dev *dev, struct ocxl_fn_config *fn)
+{
+   u32 low, high;
+   int pos;
+
+   pos = pci_find_ext_capability(dev, PCI_EXT_CAP_ID_DSN);
+   if (pos) {
+   pci_read_config_dword(dev, pos + 0x04, &low);
+   pci_read_config_dword(dev, pos + 0x08, &high);
+
+   fn->serial = low | ((u64)high) << 32;
+
+   return;
+   }
+
+   if (PCI_FUNC(dev->devfn) != 0) {
+   struct pci_dev *related = get_function_0(dev);
+
+   if (!related) {
+   fn->serial = 0;
+   return;
+   }
+
+   read_serial(related, fn);
+   pci_dev_put(related);
+   return;
+   }
+
+   fn->serial = 0;
+}
+
 static void read_pasid(struct pci_dev *dev, struct ocxl_fn_config *fn)
 {
u16 val;
@@ -208,6 +253,7 @@ int ocxl_config_read_function(struct pci_dev *dev, struct 
ocxl_fn_config *fn)
int rc;
 
read_pasid(dev, fn);
+   read_serial(dev, fn);
 
rc = read_dvsec_tl(dev, fn);
if (rc) {
diff --git a/drivers/misc/ocxl/file.c b/drivers/misc/ocxl/file.c
index 2870c25da166..08f6f594a11d 100644
--- a/drivers/misc/ocxl/file.c
+++ b/drivers/misc/ocxl/file.c
@@ -98,13 +98,14 @@ static long afu_ioctl_get_metadata(struct ocxl_context *ctx,
 
memset(&arg, 0, sizeof(arg));
 
-   arg.version = 0;
+   arg.version = 1;
 
arg.afu_version_major = ctx->afu->config.version_major;
arg.afu_version_minor = ctx->afu->config.version_minor;
arg.pasid = ctx->pasid;
arg.pp_mmio_size = ctx->afu->config.pp_mmio_stride;
arg.global_mmio_size = ctx->afu->config.global_mmio_size;
+   arg.serial = ctx->afu->fn->config.serial;
 
if (copy_to_user(uarg, &arg, sizeof(arg)))
return -EFAULT;
diff --git a/include/misc/ocxl.h b/include/misc/ocxl.h
index a1897737908d..da75db149e6c 100644
--- a/include/misc/ocxl.h
+++ b/include/misc/ocxl.h
@@ -46,6 +46,7 @@ struct ocxl_fn_config {
int dvsec_afu_info_pos; /* offset of the AFU information DVSEC */
s8 max_pasid_log;
s8 max_afu_index;
+   u64 serial;
 };
 
 enum ocxl_endian {
diff --git a/include/uapi/misc/ocxl.h b/include/uapi/misc/ocxl.h
index 6d29a60a896a..d4c6bf10580c 100644
--- a/include/uapi/misc/ocxl.h
+++ b/include/uapi/misc/ocxl.h
@@ -45,7 +45,14 @@ struct ocxl_ioctl_metadata {
 
/* End version 0 fields */
 
-   __u64 reserved[13]; /* Total of 16*u64 */
+   // Version 1 fields
+   __u64 lpc_mem_size;
+   __u64 special_purpose_mem_size;
+   __u64 serial;   // Device serial number
+
+   // End version 1 fields
+
+   __u64 reserved[10]; // Total of 16*u64
 };
 
 struct ocxl_ioctl_p9_wait {
-- 
2.21.0
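
For context, a userspace consumer of the versioned struct looks roughly
like the sketch below (illustrative only: the device path is hypothetical,
and OCXL_IOCTL_GET_METADATA is the existing ioctl defined in
include/uapi/misc/ocxl.h):

#include <fcntl.h>
#include <stdio.h>
#include <sys/ioctl.h>
#include <misc/ocxl.h>

int main(void)
{
        struct ocxl_ioctl_metadata md;
        int fd = open("/dev/ocxl/my-afu-device", O_RDWR); /* hypothetical AFU node */

        if (fd < 0 || ioctl(fd, OCXL_IOCTL_GET_METADATA, &md) < 0)
                return 1;

        /* Fields past "End version 0" are only valid if the kernel reports >= 1 */
        if (md.version >= 1)
                printf("serial=%#llx lpc_mem=%llu special=%llu\n",
                       (unsigned long long)md.serial,
                       (unsigned long long)md.lpc_mem_size,
                       (unsigned long long)md.special_purpose_mem_size);
        return 0;
}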



[PATCH 4/5] ocxl: Add functions to map/unmap LPC memory

2019-09-16 Thread Alastair D'Silva
From: Alastair D'Silva 

Add functions to map/unmap LPC memory

Signed-off-by: Alastair D'Silva 
---
 drivers/misc/ocxl/config.c|  4 +++
 drivers/misc/ocxl/core.c  | 50 +++
 drivers/misc/ocxl/link.c  |  4 +--
 drivers/misc/ocxl/ocxl_internal.h | 10 +--
 include/misc/ocxl.h   | 18 +++
 5 files changed, 82 insertions(+), 4 deletions(-)

diff --git a/drivers/misc/ocxl/config.c b/drivers/misc/ocxl/config.c
index c8e19bfb5ef9..fb0c3b6f8312 100644
--- a/drivers/misc/ocxl/config.c
+++ b/drivers/misc/ocxl/config.c
@@ -568,6 +568,10 @@ static int read_afu_lpc_memory_info(struct pci_dev *dev,
afu->special_purpose_mem_size =
total_mem_size - lpc_mem_size;
}
+
+   dev_info(>dev, "Probed LPC memory of %#llx bytes and special 
purpose memory of %#llx bytes\n",
+   afu->lpc_mem_size, afu->special_purpose_mem_size);
+
return 0;
 }
 
diff --git a/drivers/misc/ocxl/core.c b/drivers/misc/ocxl/core.c
index fdfe4e0a34e1..eb24bb9d655f 100644
--- a/drivers/misc/ocxl/core.c
+++ b/drivers/misc/ocxl/core.c
@@ -210,6 +210,55 @@ static void unmap_mmio_areas(struct ocxl_afu *afu)
release_fn_bar(afu->fn, afu->config.global_mmio_bar);
 }
 
+int ocxl_map_lpc_mem(struct ocxl_afu *afu)
+{
+   struct pci_dev *dev = to_pci_dev(afu->fn->dev.parent);
+
+   if ((afu->config.lpc_mem_size + afu->config.special_purpose_mem_size) 
== 0)
+   return 0;
+
+   afu->lpc_base_addr = ocxl_link_lpc_online(afu->fn->link, dev);
+   if (afu->lpc_base_addr == 0)
+   return -EINVAL;
+
+   if (afu->config.lpc_mem_size) {
+   afu->lpc_res.start = afu->lpc_base_addr + 
afu->config.lpc_mem_offset;
+   afu->lpc_res.end = afu->lpc_res.start + 
afu->config.lpc_mem_size - 1;
+   }
+
+   if (afu->config.special_purpose_mem_size) {
+   afu->special_purpose_res.start = afu->lpc_base_addr +
+
afu->config.special_purpose_mem_offset;
+   afu->special_purpose_res.end = afu->special_purpose_res.start +
+  
afu->config.special_purpose_mem_size - 1;
+   }
+
+   return 0;
+}
+EXPORT_SYMBOL(ocxl_map_lpc_mem);
+
+struct resource *ocxl_afu_lpc_mem(struct ocxl_afu *afu)
+{
+   return &afu->lpc_res;
+}
+EXPORT_SYMBOL(ocxl_afu_lpc_mem);
+
+static void unmap_lpc_mem(struct ocxl_afu *afu)
+{
+   struct pci_dev *dev = to_pci_dev(afu->fn->dev.parent);
+
+   if (afu->lpc_res.start || afu->special_purpose_res.start) {
+   void *link = afu->fn->link;
+
+   ocxl_link_lpc_offline(link, dev);
+
+   afu->lpc_res.start = 0;
+   afu->lpc_res.end = 0;
+   afu->special_purpose_res.start = 0;
+   afu->special_purpose_res.end = 0;
+   }
+}
+
 static int configure_afu(struct ocxl_afu *afu, u8 afu_idx, struct pci_dev *dev)
 {
int rc;
@@ -250,6 +299,7 @@ static int configure_afu(struct ocxl_afu *afu, u8 afu_idx, 
struct pci_dev *dev)
 
 static void deconfigure_afu(struct ocxl_afu *afu)
 {
+   unmap_lpc_mem(afu);
unmap_mmio_areas(afu);
reclaim_afu_pasid(afu);
reclaim_afu_actag(afu);
diff --git a/drivers/misc/ocxl/link.c b/drivers/misc/ocxl/link.c
index 2874811a4398..9e303a5f4d85 100644
--- a/drivers/misc/ocxl/link.c
+++ b/drivers/misc/ocxl/link.c
@@ -738,7 +738,7 @@ int ocxl_link_add_lpc_mem(void *link_handle, u64 size)
 }
 EXPORT_SYMBOL_GPL(ocxl_link_add_lpc_mem);
 
-u64 ocxl_link_lpc_map(void *link_handle, struct pci_dev *pdev)
+u64 ocxl_link_lpc_online(void *link_handle, struct pci_dev *pdev)
 {
struct ocxl_link *link = (struct ocxl_link *) link_handle;
 
@@ -759,7 +759,7 @@ u64 ocxl_link_lpc_map(void *link_handle, struct pci_dev 
*pdev)
return link->lpc_mem;
 }
 
-void ocxl_link_lpc_release(void *link_handle, struct pci_dev *pdev)
+void ocxl_link_lpc_offline(void *link_handle, struct pci_dev *pdev)
 {
struct ocxl_link *link = (struct ocxl_link *) link_handle;
 
diff --git a/drivers/misc/ocxl/ocxl_internal.h 
b/drivers/misc/ocxl/ocxl_internal.h
index db2647a90fc8..5656a4aab5b7 100644
--- a/drivers/misc/ocxl/ocxl_internal.h
+++ b/drivers/misc/ocxl/ocxl_internal.h
@@ -52,6 +52,12 @@ struct ocxl_afu {
void __iomem *global_mmio_ptr;
u64 pp_mmio_start;
void *private;
+   u64 lpc_base_addr; /* Covers both LPC & special purpose memory */
+   struct bin_attribute attr_global_mmio;
+   struct bin_attribute attr_lpc_mem;
+   struct resource lpc_res;
+   struct bin_attribute attr_special_purpose_mem;
+   struct resource special_purpose_res;
 };
 
 enum ocxl_context_status {
@@ -170,7 +176,7 @@ extern u64 ocxl_link_get_lpc_mem_sz(void *link_han

[PATCH 2/5] powerpc: Map & release OpenCAPI LPC memory

2019-09-16 Thread Alastair D'Silva
From: Alastair D'Silva 

Map & release OpenCAPI LPC memory.

Signed-off-by: Alastair D'Silva 
---
 arch/powerpc/include/asm/pnv-ocxl.h   |  2 ++
 arch/powerpc/platforms/powernv/ocxl.c | 42 +++
 2 files changed, 44 insertions(+)

diff --git a/arch/powerpc/include/asm/pnv-ocxl.h 
b/arch/powerpc/include/asm/pnv-ocxl.h
index 7de82647e761..f8f8ffb48aa8 100644
--- a/arch/powerpc/include/asm/pnv-ocxl.h
+++ b/arch/powerpc/include/asm/pnv-ocxl.h
@@ -32,5 +32,7 @@ extern int pnv_ocxl_spa_remove_pe_from_cache(void 
*platform_data, int pe_handle)
 
 extern int pnv_ocxl_alloc_xive_irq(u32 *irq, u64 *trigger_addr);
 extern void pnv_ocxl_free_xive_irq(u32 irq);
+extern u64 pnv_ocxl_platform_lpc_setup(struct pci_dev *pdev, u64 size);
+extern void pnv_ocxl_platform_lpc_release(struct pci_dev *pdev);
 
 #endif /* _ASM_PNV_OCXL_H */
diff --git a/arch/powerpc/platforms/powernv/ocxl.c 
b/arch/powerpc/platforms/powernv/ocxl.c
index 8c65aacda9c8..81393728d6a3 100644
--- a/arch/powerpc/platforms/powernv/ocxl.c
+++ b/arch/powerpc/platforms/powernv/ocxl.c
@@ -475,6 +475,48 @@ void pnv_ocxl_spa_release(void *platform_data)
 }
 EXPORT_SYMBOL_GPL(pnv_ocxl_spa_release);
 
+u64 pnv_ocxl_platform_lpc_setup(struct pci_dev *pdev, u64 size)
+{
+   struct pci_controller *hose = pci_bus_to_host(pdev->bus);
+   struct pnv_phb *phb = hose->private_data;
+   struct pci_dn *pdn = pci_get_pdn(pdev);
+   u32 bdfn = (pdn->busno << 8) | pdn->devfn;
+   u64 base_addr = 0;
+
+   int rc = opal_npu_mem_alloc(phb->opal_id, bdfn, size, &base_addr);
+
+   WARN_ON(rc);
+
+   base_addr = be64_to_cpu(base_addr);
+
+   rc = check_hotplug_memory_addressable(base_addr, base_addr + size);
+   if (rc) {
+   dev_warn(&pdev->dev,
+"LPC memory range 0x%llx-0x%llx is not fully 
addressable",
+base_addr, base_addr + size - 1);
+   return 0;
+   }
+
+
+   return base_addr;
+}
+EXPORT_SYMBOL_GPL(pnv_ocxl_platform_lpc_setup);
+
+void pnv_ocxl_platform_lpc_release(struct pci_dev *pdev)
+{
+   struct pci_controller *hose = pci_bus_to_host(pdev->bus);
+   struct pnv_phb *phb = hose->private_data;
+   struct pci_dn *pdn = pci_get_pdn(pdev);
+   u32 bdfn;
+   int rc;
+
+   bdfn = (pdn->busno << 8) | pdn->devfn;
+   rc = opal_npu_mem_release(phb->opal_id, bdfn);
+   WARN_ON(rc);
+}
+EXPORT_SYMBOL_GPL(pnv_ocxl_platform_lpc_release);
+
+
 int pnv_ocxl_spa_remove_pe_from_cache(void *platform_data, int pe_handle)
 {
struct spa_data *data = (struct spa_data *) platform_data;
-- 
2.21.0



[PATCH 1/5] powerpc: Add OPAL calls for LPC memory alloc/release

2019-09-16 Thread Alastair D'Silva
From: Alastair D'Silva 

Add OPAL calls for LPC memory alloc/release

Signed-off-by: Alastair D'Silva 
---
 arch/powerpc/include/asm/opal-api.h| 4 +++-
 arch/powerpc/include/asm/opal.h| 3 +++
 arch/powerpc/platforms/powernv/opal-call.c | 2 ++
 3 files changed, 8 insertions(+), 1 deletion(-)

diff --git a/arch/powerpc/include/asm/opal-api.h 
b/arch/powerpc/include/asm/opal-api.h
index 383242eb0dea..c58161cd7cfb 100644
--- a/arch/powerpc/include/asm/opal-api.h
+++ b/arch/powerpc/include/asm/opal-api.h
@@ -208,7 +208,9 @@
 #define OPAL_HANDLE_HMI2   166
 #defineOPAL_NX_COPROC_INIT 167
 #define OPAL_XIVE_GET_VP_STATE 170
-#define OPAL_LAST  170
+#define OPAL_NPU_MEM_ALLOC 171
+#define OPAL_NPU_MEM_RELEASE   172
+#define OPAL_LAST  172
 
 #define QUIESCE_HOLD   1 /* Spin all calls at entry */
 #define QUIESCE_REJECT 2 /* Fail all calls with OPAL_BUSY */
diff --git a/arch/powerpc/include/asm/opal.h b/arch/powerpc/include/asm/opal.h
index 57bd029c715e..8c957a003be4 100644
--- a/arch/powerpc/include/asm/opal.h
+++ b/arch/powerpc/include/asm/opal.h
@@ -39,6 +39,9 @@ int64_t opal_npu_spa_clear_cache(uint64_t phb_id, uint32_t 
bdfn,
uint64_t PE_handle);
 int64_t opal_npu_tl_set(uint64_t phb_id, uint32_t bdfn, long cap,
uint64_t rate_phys, uint32_t size);
+int64_t opal_npu_mem_alloc(uint64_t phb_id, uint32_t bdfn,
+   uint64_t size, uint64_t *bar);
+int64_t opal_npu_mem_release(uint64_t phb_id, uint32_t bdfn);
 int64_t opal_console_write(int64_t term_number, __be64 *length,
   const uint8_t *buffer);
 int64_t opal_console_read(int64_t term_number, __be64 *length,
diff --git a/arch/powerpc/platforms/powernv/opal-call.c 
b/arch/powerpc/platforms/powernv/opal-call.c
index 29ca523c1c79..09a280446507 100644
--- a/arch/powerpc/platforms/powernv/opal-call.c
+++ b/arch/powerpc/platforms/powernv/opal-call.c
@@ -287,3 +287,5 @@ OPAL_CALL(opal_pci_set_pbcq_tunnel_bar, 
OPAL_PCI_SET_PBCQ_TUNNEL_BAR);
 OPAL_CALL(opal_sensor_read_u64,OPAL_SENSOR_READ_U64);
 OPAL_CALL(opal_sensor_group_enable,OPAL_SENSOR_GROUP_ENABLE);
 OPAL_CALL(opal_nx_coproc_init, OPAL_NX_COPROC_INIT);
+OPAL_CALL(opal_npu_mem_alloc,  OPAL_NPU_MEM_ALLOC);
+OPAL_CALL(opal_npu_mem_release,OPAL_NPU_MEM_RELEASE);
-- 
2.21.0



[PATCH 3/5] ocxl: Tally up the LPC memory on a link & allow it to be mapped

2019-09-16 Thread Alastair D'Silva
From: Alastair D'Silva 

Tally up the LPC memory on an OpenCAPI link & allow it to be mapped

Signed-off-by: Alastair D'Silva 
---
 drivers/misc/ocxl/core.c  |  9 +
 drivers/misc/ocxl/link.c  | 61 +++
 drivers/misc/ocxl/ocxl_internal.h | 42 +
 3 files changed, 112 insertions(+)

diff --git a/drivers/misc/ocxl/core.c b/drivers/misc/ocxl/core.c
index b7a09b21ab36..fdfe4e0a34e1 100644
--- a/drivers/misc/ocxl/core.c
+++ b/drivers/misc/ocxl/core.c
@@ -230,8 +230,17 @@ static int configure_afu(struct ocxl_afu *afu, u8 afu_idx, 
struct pci_dev *dev)
if (rc)
goto err_free_pasid;
 
+   if (afu->config.lpc_mem_size || afu->config.special_purpose_mem_size) {
+   rc = ocxl_link_add_lpc_mem(afu->fn->link,
+   afu->config.lpc_mem_size + 
afu->config.special_purpose_mem_size);
+   if (rc)
+   goto err_free_mmio;
+   }
+
return 0;
 
+err_free_mmio:
+   unmap_mmio_areas(afu);
 err_free_pasid:
reclaim_afu_pasid(afu);
 err_free_actag:
diff --git a/drivers/misc/ocxl/link.c b/drivers/misc/ocxl/link.c
index 58d111afd9f6..2874811a4398 100644
--- a/drivers/misc/ocxl/link.c
+++ b/drivers/misc/ocxl/link.c
@@ -84,6 +84,11 @@ struct ocxl_link {
int dev;
atomic_t irq_available;
struct spa *spa;
+   struct mutex lpc_mem_lock;
+   u64 lpc_mem_sz; /* Total amount of LPC memory presented on the link */
+   u64 lpc_mem;
+   int lpc_consumers;
+
void *platform_data;
 };
 static struct list_head links_list = LIST_HEAD_INIT(links_list);
@@ -396,6 +401,8 @@ static int alloc_link(struct pci_dev *dev, int PE_mask, 
struct ocxl_link **out_l
if (rc)
goto err_spa;
 
+   mutex_init(&link->lpc_mem_lock);
+
/* platform specific hook */
rc = pnv_ocxl_spa_setup(dev, link->spa->spa_mem, PE_mask,
&link->platform_data);
@@ -711,3 +718,57 @@ void ocxl_link_free_irq(void *link_handle, int hw_irq)
atomic_inc(&link->irq_available);
 }
 EXPORT_SYMBOL_GPL(ocxl_link_free_irq);
+
+int ocxl_link_add_lpc_mem(void *link_handle, u64 size)
+{
+   struct ocxl_link *link = (struct ocxl_link *) link_handle;
+
+   u64 orig_size;
+   bool good = false;
+
+   mutex_lock(&link->lpc_mem_lock);
+   orig_size = link->lpc_mem_sz;
+   link->lpc_mem_sz += size;
+
+   good = orig_size < link->lpc_mem_sz;
+   mutex_unlock(&link->lpc_mem_lock);
+
+   // Check for overflow
+   return (good) ? 0 : -EINVAL;
+}
+EXPORT_SYMBOL_GPL(ocxl_link_add_lpc_mem);
+
+u64 ocxl_link_lpc_map(void *link_handle, struct pci_dev *pdev)
+{
+   struct ocxl_link *link = (struct ocxl_link *) link_handle;
+
+   mutex_lock(&link->lpc_mem_lock);
+   if (link->lpc_mem) {
+   u64 lpc_mem = link->lpc_mem;
+
+   link->lpc_consumers++;
+   mutex_unlock(&link->lpc_mem_lock);
+   return lpc_mem;
+   }
+
+   link->lpc_mem = pnv_ocxl_platform_lpc_setup(pdev, link->lpc_mem_sz);
+   if (link->lpc_mem)
+   link->lpc_consumers++;
+   mutex_unlock(&link->lpc_mem_lock);
+
+   return link->lpc_mem;
+}
+
+void ocxl_link_lpc_release(void *link_handle, struct pci_dev *pdev)
+{
+   struct ocxl_link *link = (struct ocxl_link *) link_handle;
+
+   mutex_lock(&link->lpc_mem_lock);
+   link->lpc_consumers--;
+   if (link->lpc_consumers == 0) {
+   pnv_ocxl_platform_lpc_release(pdev);
+   link->lpc_mem = 0;
+   }
+
+   mutex_unlock(&link->lpc_mem_lock);
+}
diff --git a/drivers/misc/ocxl/ocxl_internal.h 
b/drivers/misc/ocxl/ocxl_internal.h
index 97415afd79f3..db2647a90fc8 100644
--- a/drivers/misc/ocxl/ocxl_internal.h
+++ b/drivers/misc/ocxl/ocxl_internal.h
@@ -141,4 +141,46 @@ int ocxl_irq_offset_to_id(struct ocxl_context *ctx, u64 
offset);
 u64 ocxl_irq_id_to_offset(struct ocxl_context *ctx, int irq_id);
 void ocxl_afu_irq_free_all(struct ocxl_context *ctx);
 
+/**
+ * Increment the amount of memory required by an OpenCAPI link
+ *
+ * link_handle: The OpenCAPI link handle
+ * size: The amount of memory to increment by
+ *
+ * Return 0 on success, negative on overflow
+ */
+extern int ocxl_link_add_lpc_mem(void *link_handle, u64 size);
+
+/**
+ * Get the amount of memory required by an OpenCAPI link
+ *
+ * link_handle: The OpenCAPI link handle
+ *
+ * Return the amount of memory required by the link, this value is undefined if
+ * ocxl_link_add_lpc_mem failed.
+ */
+extern u64 ocxl_link_get_lpc_mem_sz(void *link_handle);
+
+/**
+ * Map the LPC memory for an OpenCAPI device
+ *
+ * Since LPC memory belongs to a link, the whole LPC memory available
+ * on the link must be mapped in order to make it accessible to a device.
+ *
+ * @link_handle: The OpenCAPI link handle
+ * @pdev: A device that is on th

[PATCH 0/5] ocxl: Allow external drivers to access LPC memory

2019-09-16 Thread Alastair D'Silva
From: Alastair D'Silva 

This series provides the prerequisite infrastructure to allow
external drivers to map & access OpenCAPI LPC memory.

Alastair D'Silva (5):
  powerpc: Add OPAL calls for LPC memory alloc/release
  powerpc: Map & release OpenCAPI LPC memory
  ocxl: Tally up the LPC memory on a link & allow it to be mapped
  ocxl: Add functions to map/unmap LPC memory
  ocxl: Provide additional metadata to userspace

 arch/powerpc/include/asm/opal-api.h|  4 +-
 arch/powerpc/include/asm/opal.h|  3 ++
 arch/powerpc/include/asm/pnv-ocxl.h|  2 +
 arch/powerpc/platforms/powernv/ocxl.c  | 42 +++
 arch/powerpc/platforms/powernv/opal-call.c |  2 +
 drivers/misc/ocxl/config.c | 50 ++
 drivers/misc/ocxl/core.c   | 59 +
 drivers/misc/ocxl/file.c   |  3 +-
 drivers/misc/ocxl/link.c   | 61 ++
 drivers/misc/ocxl/ocxl_internal.h  | 48 +
 include/misc/ocxl.h| 19 +++
 include/uapi/misc/ocxl.h   |  9 +++-
 12 files changed, 299 insertions(+), 3 deletions(-)

-- 
2.21.0
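
To make the intended use concrete, an external driver consuming these
exports would do roughly the following (an illustrative sketch: only
ocxl_map_lpc_mem() and ocxl_afu_lpc_mem() come from this series, the
surrounding driver code is made up):

#include <linux/kernel.h>
#include <misc/ocxl.h>

static int example_bind_lpc_mem(struct ocxl_afu *afu)  /* hypothetical caller */
{
        struct resource *lpc;
        int rc;

        rc = ocxl_map_lpc_mem(afu);     /* brings the link's LPC range online */
        if (rc)
                return rc;

        lpc = ocxl_afu_lpc_mem(afu);    /* this AFU's slice of that range */
        if (!lpc->start)
                return -ENODEV;

        pr_info("LPC memory at %pR, ready to hotplug or memremap\n", lpc);
        return 0;
}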



[PATCH v3 2/2] mm: Add a bounds check in devm_memremap_pages()

2019-09-16 Thread Alastair D'Silva
From: Alastair D'Silva 

The call to check_hotplug_memory_addressable() validates that the memory
is fully addressable.

Without this call, it is possible that we may remap pages that are
not physically addressable, resulting in bogus section numbers
being returned from __section_nr().

Signed-off-by: Alastair D'Silva 
---
 mm/memremap.c | 5 +
 1 file changed, 5 insertions(+)

diff --git a/mm/memremap.c b/mm/memremap.c
index 86432650f829..de2b67586401 100644
--- a/mm/memremap.c
+++ b/mm/memremap.c
@@ -175,6 +175,11 @@ void *devm_memremap_pages(struct device *dev, struct 
dev_pagemap *pgmap)
int error, nid, is_ram;
bool need_devmap_managed = true;
 
+   error = check_hotplug_memory_addressable(res->start,
+resource_size(res));
+   if (error)
+   return ERR_PTR(error);
+
switch (pgmap->type) {
case MEMORY_DEVICE_PRIVATE:
if (!IS_ENABLED(CONFIG_DEVICE_PRIVATE)) {
-- 
2.21.0



[PATCH v3 1/2] memory_hotplug: Add a bounds check to check_hotplug_memory_range()

2019-09-16 Thread Alastair D'Silva
From: Alastair D'Silva 

On PowerPC, the address ranges allocated to OpenCAPI LPC memory
are allocated from firmware. These address ranges may be higher
than what older kernels permit, as we increased the maximum
permissible address in commit 4ffe713b7587
("powerpc/mm: Increase the max addressable memory to 2PB"). It is
possible that the addressable range may change again in the
future.

In this scenario, we end up with a bogus section returned from
__section_nr (see the discussion on the thread "mm: Trigger bug on
if a section is not found in __section_nr").

Adding a check here means that we fail early and have an
opportunity to handle the error gracefully, rather than rumbling
on and potentially accessing an incorrect section.

Further discussion is also on the thread ("powerpc: Perform a bounds
check in arch_add_memory").

Signed-off-by: Alastair D'Silva 
---
 include/linux/memory_hotplug.h |  1 +
 mm/memory_hotplug.c| 13 -
 2 files changed, 13 insertions(+), 1 deletion(-)

diff --git a/include/linux/memory_hotplug.h b/include/linux/memory_hotplug.h
index f46ea71b4ffd..bc477e98a310 100644
--- a/include/linux/memory_hotplug.h
+++ b/include/linux/memory_hotplug.h
@@ -110,6 +110,7 @@ extern void __online_page_increment_counters(struct page 
*page);
 extern void __online_page_free(struct page *page);
 
 extern int try_online_node(int nid);
+int check_hotplug_memory_addressable(u64 start, u64 size);
 
 extern int arch_add_memory(int nid, u64 start, u64 size,
struct mhp_restrictions *restrictions);
diff --git a/mm/memory_hotplug.c b/mm/memory_hotplug.c
index c73f09913165..02cb9a74f561 100644
--- a/mm/memory_hotplug.c
+++ b/mm/memory_hotplug.c
@@ -1030,6 +1030,17 @@ int try_online_node(int nid)
return ret;
 }
 
+int check_hotplug_memory_addressable(u64 start, u64 size)
+{
+#ifdef MAX_PHYSMEM_BITS
+   if ((start + size - 1) >> MAX_PHYSMEM_BITS)
+   return -E2BIG;
+#endif
+
+   return 0;
+}
+EXPORT_SYMBOL_GPL(check_hotplug_memory_addressable);
+
 static int check_hotplug_memory_range(u64 start, u64 size)
 {
/* memory range must be block size aligned */
@@ -1040,7 +1051,7 @@ static int check_hotplug_memory_range(u64 start, u64 size)
return -EINVAL;
}
 
-   return 0;
+   return check_hotplug_memory_addressable(start, size);
 }
 
 static int online_memory_block(struct memory_block *mem, void *arg)
-- 
2.21.0



[PATCH v3 0/2] Add bounds check for Hotplugged memory

2019-09-16 Thread Alastair D'Silva
From: Alastair D'Silva 

This series adds bounds checks for hotplugged memory, ensuring that
it is within the physically addressable range (for platforms that
define MAX_(POSSIBLE_)PHYSMEM_BITS).

This allows for early failure, rather than attempting to access
bogus section numbers.

Changelog:
 V3:
   - Perform the addressable check before we take the hotplug lock
 V2:
   - Don't use MAX_POSSIBLE_PHYSMEM_BITS as it's wider than what
 may be available

Alastair D'Silva (2):
  memory_hotplug: Add a bounds check to check_hotplug_memory_range()
  mm: Add a bounds check in devm_memremap_pages()

 include/linux/memory_hotplug.h |  1 +
 mm/memory_hotplug.c| 13 -
 mm/memremap.c  |  5 +
 3 files changed, 18 insertions(+), 1 deletion(-)

-- 
2.21.0



[PATCH v2 2/2] mm: Add a bounds check in devm_memremap_pages()

2019-09-16 Thread Alastair D'Silva
From: Alastair D'Silva 

The call to check_hotplug_memory_addressable() validates that the memory
is fully addressable.

Without this call, it is possible that we may remap pages that are
not physically addressable, resulting in bogus section numbers
being returned from __section_nr().

Signed-off-by: Alastair D'Silva 
---
 mm/memremap.c | 8 
 1 file changed, 8 insertions(+)

diff --git a/mm/memremap.c b/mm/memremap.c
index 86432650f829..fd00993caa3e 100644
--- a/mm/memremap.c
+++ b/mm/memremap.c
@@ -269,6 +269,13 @@ void *devm_memremap_pages(struct device *dev, struct 
dev_pagemap *pgmap)
 
mem_hotplug_begin();
 
+   error = check_hotplug_memory_addressable(res->start,
+resource_size(res));
+   if (error) {
+   mem_hotplug_done();
+   goto err_checkrange;
+   }
+
/*
 * For device private memory we call add_pages() as we only need to
 * allocate and initialize struct page for the device memory. More-
@@ -324,6 +331,7 @@ void *devm_memremap_pages(struct device *dev, struct 
dev_pagemap *pgmap)
 
  err_add_memory:
kasan_remove_zero_shadow(__va(res->start), resource_size(res));
+ err_checkrange:
  err_kasan:
untrack_pfn(NULL, PHYS_PFN(res->start), resource_size(res));
  err_pfn_remap:
-- 
2.21.0



[PATCH v2 1/2] memory_hotplug: Add a bounds check to check_hotplug_memory_range()

2019-09-16 Thread Alastair D'Silva
From: Alastair D'Silva 

On PowerPC, the address ranges allocated to OpenCAPI LPC memory
are allocated from firmware. These address ranges may be higher
than what older kernels permit, as we increased the maximum
permissible address in commit 4ffe713b7587
("powerpc/mm: Increase the max addressable memory to 2PB"). It is
possible that the addressable range may change again in the
future.

In this scenario, we end up with a bogus section returned from
__section_nr (see the discussion on the thread "mm: Trigger bug on
if a section is not found in __section_nr").

Adding a check here means that we fail early and have an
opportunity to handle the error gracefully, rather than rumbling
on and potentially accessing an incorrect section.

Further discussion is also on the thread ("powerpc: Perform a bounds
check in arch_add_memory").

Signed-off-by: Alastair D'Silva 
---
 include/linux/memory_hotplug.h |  1 +
 mm/memory_hotplug.c| 13 -
 2 files changed, 13 insertions(+), 1 deletion(-)

diff --git a/include/linux/memory_hotplug.h b/include/linux/memory_hotplug.h
index f46ea71b4ffd..bc477e98a310 100644
--- a/include/linux/memory_hotplug.h
+++ b/include/linux/memory_hotplug.h
@@ -110,6 +110,7 @@ extern void __online_page_increment_counters(struct page 
*page);
 extern void __online_page_free(struct page *page);
 
 extern int try_online_node(int nid);
+int check_hotplug_memory_addressable(u64 start, u64 size);
 
 extern int arch_add_memory(int nid, u64 start, u64 size,
struct mhp_restrictions *restrictions);
diff --git a/mm/memory_hotplug.c b/mm/memory_hotplug.c
index c73f09913165..02cb9a74f561 100644
--- a/mm/memory_hotplug.c
+++ b/mm/memory_hotplug.c
@@ -1030,6 +1030,17 @@ int try_online_node(int nid)
return ret;
 }
 
+int check_hotplug_memory_addressable(u64 start, u64 size)
+{
+#ifdef MAX_PHYSMEM_BITS
+   if ((start + size - 1) >> MAX_PHYSMEM_BITS)
+   return -E2BIG;
+#endif
+
+   return 0;
+}
+EXPORT_SYMBOL_GPL(check_hotplug_memory_addressable);
+
 static int check_hotplug_memory_range(u64 start, u64 size)
 {
/* memory range must be block size aligned */
@@ -1040,7 +1051,7 @@ static int check_hotplug_memory_range(u64 start, u64 size)
return -EINVAL;
}
 
-   return 0;
+   return check_hotplug_memory_addressable(start, size);
 }
 
 static int online_memory_block(struct memory_block *mem, void *arg)
-- 
2.21.0



[PATCH v2 0/2] Add bounds check for Hotplugged memory

2019-09-16 Thread Alastair D'Silva
From: Alastair D'Silva 

This series adds bounds checks for hotplugged memory, ensuring that
it is within the physically addressable range (for platforms that
define MAX_(POSSIBLE_)PHYSMEM_BITS).

This allows for early failure, rather than attempting to access
bogus section numbers.

Changelog:
 V2:
   - Don't use MAX_POSSIBLE_PHYSMEM_BITS as it's wider than what
 may be available

Alastair D'Silva (2):
  memory_hotplug: Add a bounds check to check_hotplug_memory_range()
  mm: Add a bounds check in devm_memremap_pages()

 include/linux/memory_hotplug.h |  1 +
 mm/memory_hotplug.c| 13 -
 mm/memremap.c  |  8 
 3 files changed, 21 insertions(+), 1 deletion(-)

-- 
2.21.0



RE: [PATCH v2 1/6] powerpc: Allow flush_icache_range to work across ranges >4GB

2019-09-15 Thread Alastair D'Silva
On Sat, 2019-09-14 at 09:46 +0200, Christophe Leroy wrote:
> 
> Le 03/09/2019 à 07:23, Alastair D'Silva a écrit :
> > From: Alastair D'Silva 
> > 
> > When calling flush_icache_range with a size >4GB, we were masking
> > off the upper 32 bits, so we would incorrectly flush a range
> > smaller
> > than intended.
> > 
> > This patch replaces the 32 bit shifts with 64 bit ones, so that
> > the full size is accounted for.
> 
> Isn't there the same issue in arch/powerpc/kernel/vdso64/cacheflush.S
> ?
> 
> Christophe

Yes, there is. I'll fix it, but I wonder whether anything calls it? I
asked Google, and every mention of it was in the kernel source or
mailing list.

Maybe BenH can chime in?

-- 
Alastair D'Silva
Open Source Developer
Linux Technology Centre, IBM Australia
mob: 0423 762 819



RE: [PATCH 1/2] memory_hotplug: Add a bounds check to check_hotplug_memory_range()

2019-09-10 Thread Alastair D'Silva
> -Original Message-
> From: Kirill A. Shutemov 
> Sent: Tuesday, 10 September 2019 8:15 PM
> To: Alastair D'Silva 
> Cc: alast...@d-silva.org; Andrew Morton ;
> David Hildenbrand ; Oscar Salvador
> ; Michal Hocko ; Pavel Tatashin
> ; Wei Yang ;
> Dan Williams ; Qian Cai ; Jason
> Gunthorpe ; Logan Gunthorpe ; Ira
> Weiny ; linux...@kvack.org; linux-
> ker...@vger.kernel.org
> Subject: Re: [PATCH 1/2] memory_hotplug: Add a bounds check to
> check_hotplug_memory_range()
> 
> On Tue, Sep 10, 2019 at 12:52:20PM +1000, Alastair D'Silva wrote:
> > From: Alastair D'Silva 
> >
> > On PowerPC, the address ranges allocated to OpenCAPI LPC memory are
> > allocated from firmware. These address ranges may be higher than what
> > older kernels permit, as we increased the maximum permissable address
> > in commit 4ffe713b7587
> > ("powerpc/mm: Increase the max addressable memory to 2PB"). It is
> > possible that the addressable range may change again in the future.
> >
> > In this scenario, we end up with a bogus section returned from
> > __section_nr (see the discussion on the thread "mm: Trigger bug on if
> > a section is not found in __section_nr").
> >
> > Adding a check here means that we fail early and have an opportunity
> > to handle the error gracefully, rather than rumbling on and
> > potentially accessing an incorrect section.
> >
> > Further discussion is also on the thread ("powerpc: Perform a bounds
> > check in arch_add_memory").
> >
> > Signed-off-by: Alastair D'Silva 
> > ---
> >  include/linux/memory_hotplug.h |  1 +
> >  mm/memory_hotplug.c| 19 ++-
> >  2 files changed, 19 insertions(+), 1 deletion(-)
> >
> > diff --git a/include/linux/memory_hotplug.h
> > b/include/linux/memory_hotplug.h index f46ea71b4ffd..bc477e98a310
> > 100644
> > --- a/include/linux/memory_hotplug.h
> > +++ b/include/linux/memory_hotplug.h
> > @@ -110,6 +110,7 @@ extern void
> > __online_page_increment_counters(struct page *page);  extern void
> > __online_page_free(struct page *page);
> >
> >  extern int try_online_node(int nid);
> > +int check_hotplug_memory_addressable(u64 start, u64 size);
> >
> >  extern int arch_add_memory(int nid, u64 start, u64 size,
> > struct mhp_restrictions *restrictions); diff --git
> > a/mm/memory_hotplug.c b/mm/memory_hotplug.c index
> > c73f09913165..3c5428b014f9 100644
> > --- a/mm/memory_hotplug.c
> > +++ b/mm/memory_hotplug.c
> > @@ -1030,6 +1030,23 @@ int try_online_node(int nid)
> > return ret;
> >  }
> >
> > +#ifndef MAX_POSSIBLE_PHYSMEM_BITS
> > +#ifdef MAX_PHYSMEM_BITS
> > +#define MAX_POSSIBLE_PHYSMEM_BITS MAX_PHYSMEM_BITS #endif
> #endif
> > +
> > +int check_hotplug_memory_addressable(u64 start, u64 size) { #ifdef
> > +MAX_POSSIBLE_PHYSMEM_BITS
> 
> How can it be not defined? You've defined it 6 lines above.
> 

It's only conditionally defined.

I'll be following David H's advice and just using MAX_PHYSMEM_BITS in the
next spin anyway.

> > +   if ((start + size - 1) >> MAX_POSSIBLE_PHYSMEM_BITS)
> > +   return -E2BIG;
> > +#endif
> > +
> > +   return 0;
> > +}
> > +EXPORT_SYMBOL_GPL(check_hotplug_memory_addressable);
> > +
> >  static int check_hotplug_memory_range(u64 start, u64 size)  {
> > /* memory range must be block size aligned */ @@ -1040,7 +1057,7
> @@
> > static int check_hotplug_memory_range(u64 start, u64 size)
> > return -EINVAL;
> > }
> >
> > -   return 0;
> > +   return check_hotplug_memory_addressable(start, size);
> >  }
> >
> >  static int online_memory_block(struct memory_block *mem, void *arg)
> > --
> > 2.21.0
> >
> 
> --
>  Kirill A. Shutemov
> 


-- 
Alastair D'Silva   mob: 0423 762 819
skype: alastair_dsilva msn: alast...@d-silva.org
blog: http://alastair.d-silva.org    Twitter: @EvilDeece




RE: [PATCH 1/2] memory_hotplug: Add a bounds check to check_hotplug_memory_range()

2019-09-10 Thread Alastair D'Silva
> -Original Message-
> From: David Hildenbrand 
> Sent: Tuesday, 10 September 2019 5:46 PM
> To: Alastair D'Silva ; alast...@d-silva.org
> Cc: Andrew Morton ; Oscar Salvador
> ; Michal Hocko ; Pavel Tatashin
> ; Wei Yang ;
> Dan Williams ; Qian Cai ; Jason
> Gunthorpe ; Logan Gunthorpe ; Ira
> Weiny ; linux...@kvack.org; linux-
> ker...@vger.kernel.org
> Subject: Re: [PATCH 1/2] memory_hotplug: Add a bounds check to
> check_hotplug_memory_range()
> 
> On 10.09.19 04:52, Alastair D'Silva wrote:
> > From: Alastair D'Silva 
> >
> > On PowerPC, the address ranges allocated to OpenCAPI LPC memory are
> > allocated from firmware. These address ranges may be higher than what
> > older kernels permit, as we increased the maximum permissable address
> > in commit 4ffe713b7587
> > ("powerpc/mm: Increase the max addressable memory to 2PB"). It is
> > possible that the addressable range may change again in the future.
> >
> > In this scenario, we end up with a bogus section returned from
> > __section_nr (see the discussion on the thread "mm: Trigger bug on if
> > a section is not found in __section_nr").
> >
> > Adding a check here means that we fail early and have an opportunity
> > to handle the error gracefully, rather than rumbling on and
> > potentially accessing an incorrect section.
> >
> > Further discussion is also on the thread ("powerpc: Perform a bounds
> > check in arch_add_memory").
> >
> > Signed-off-by: Alastair D'Silva 
> > ---
> >  include/linux/memory_hotplug.h |  1 +
> >  mm/memory_hotplug.c| 19 ++-
> >  2 files changed, 19 insertions(+), 1 deletion(-)
> >
> > diff --git a/include/linux/memory_hotplug.h
> > b/include/linux/memory_hotplug.h index f46ea71b4ffd..bc477e98a310
> > 100644
> > --- a/include/linux/memory_hotplug.h
> > +++ b/include/linux/memory_hotplug.h
> > @@ -110,6 +110,7 @@ extern void
> > __online_page_increment_counters(struct page *page);  extern void
> > __online_page_free(struct page *page);
> >
> >  extern int try_online_node(int nid);
> > +int check_hotplug_memory_addressable(u64 start, u64 size);
> >
> >  extern int arch_add_memory(int nid, u64 start, u64 size,
> > struct mhp_restrictions *restrictions); diff --git
> > a/mm/memory_hotplug.c b/mm/memory_hotplug.c index
> > c73f09913165..3c5428b014f9 100644
> > --- a/mm/memory_hotplug.c
> > +++ b/mm/memory_hotplug.c
> > @@ -1030,6 +1030,23 @@ int try_online_node(int nid)
> > return ret;
> >  }
> >
> > +#ifndef MAX_POSSIBLE_PHYSMEM_BITS
> > +#ifdef MAX_PHYSMEM_BITS
> > +#define MAX_POSSIBLE_PHYSMEM_BITS MAX_PHYSMEM_BITS #endif
> #endif
> > +
> 
> I think using MAX_POSSIBLE_PHYSMEM_BITS bits is wrong. You should use
> MAX_PHYSMEM_BITS.
> 
> E.g. on x86_64, MAX_POSSIBLE_PHYSMEM_BITS is 52, while
> MAX_PHYSMEM_BITS is (pgtable_l5_enabled() ? 52 : 46) - so
> MAX_PHYSMEM_BITS depends on the actual HW.
> 

Thanks, I was following the pattern from zsmalloc.c, but what you say makes 
sense.

> > +int check_hotplug_memory_addressable(u64 start, u64 size) { #ifdef
> > +MAX_POSSIBLE_PHYSMEM_BITS
> > +   if ((start + size - 1) >> MAX_POSSIBLE_PHYSMEM_BITS)
> > +   return -E2BIG;
> > +#endif
> > +
> > +   return 0;
> > +}
> > +EXPORT_SYMBOL_GPL(check_hotplug_memory_addressable);
> > +
> >  static int check_hotplug_memory_range(u64 start, u64 size)  {
> > /* memory range must be block size aligned */ @@ -1040,7 +1057,7
> @@
> > static int check_hotplug_memory_range(u64 start, u64 size)
> > return -EINVAL;
> > }
> >
> > -   return 0;
> > +   return check_hotplug_memory_addressable(start, size);
> >  }
> >
> >  static int online_memory_block(struct memory_block *mem, void *arg)
> >
> 
> 
> --
> 
> Thanks,
> 
> David / dhildenb
> 


-- 
Alastair D'Silva   mob: 0423 762 819
skype: alastair_dsilva msn: alast...@d-silva.org
blog: http://alastair.d-silva.org    Twitter: @EvilDeece



RE: [PATCH 2/2] mm: Add a bounds check in devm_memremap_pages()

2019-09-10 Thread Alastair D'Silva
> -Original Message-
> From: David Hildenbrand 
> Sent: Tuesday, 10 September 2019 5:39 PM
> To: Alastair D'Silva ; alast...@d-silva.org
> Cc: Andrew Morton ; Oscar Salvador
> ; Michal Hocko ; Pavel Tatashin
> ; Dan Williams ;
> Wei Yang ; Qian Cai ; Jason
> Gunthorpe ; Logan Gunthorpe ; Ira
> Weiny ; linux...@kvack.org; linux-
> ker...@vger.kernel.org
> Subject: Re: [PATCH 2/2] mm: Add a bounds check in
> devm_memremap_pages()
> 
> On 10.09.19 04:52, Alastair D'Silva wrote:
> > From: Alastair D'Silva 
> >
> > The call to check_hotplug_memory_addressable() validates that the
> > memory is fully addressable.
> >
> > Without this call, it is possible that we may remap pages that is not
> > physically addressable, resulting in bogus section numbers being
> > returned from __section_nr().
> >
> > Signed-off-by: Alastair D'Silva 
> > ---
> >  mm/memremap.c | 8 
> >  1 file changed, 8 insertions(+)
> >
> > diff --git a/mm/memremap.c b/mm/memremap.c index
> > 86432650f829..fd00993caa3e 100644
> > --- a/mm/memremap.c
> > +++ b/mm/memremap.c
> > @@ -269,6 +269,13 @@ void *devm_memremap_pages(struct device
> *dev,
> > struct dev_pagemap *pgmap)
> >
> > mem_hotplug_begin();
> >
> > +   error = check_hotplug_memory_addressable(res->start,
> > +resource_size(res));
> > +   if (error) {
> > +   mem_hotplug_done();
> > +   goto err_checkrange;
> > +   }
> > +
> 
> No need to check under the memory hotplug lock.
> 

Thanks, I'll adjust it.

> > /*
> >  * For device private memory we call add_pages() as we only need to
> >  * allocate and initialize struct page for the device memory. More-
> > @@ -324,6 +331,7 @@ void *devm_memremap_pages(struct device *dev,
> > struct dev_pagemap *pgmap)
> >
> >   err_add_memory:
> > kasan_remove_zero_shadow(__va(res->start), resource_size(res));
> > + err_checkrange:
> >   err_kasan:
> > untrack_pfn(NULL, PHYS_PFN(res->start), resource_size(res));
> >   err_pfn_remap:
> >
> 
> 
> --
> 
> Thanks,
> 
> David / dhildenb
> 

-- 
Alastair D'Silva   mob: 0423 762 819
skype: alastair_dsilva msn: alast...@d-silva.org
blog: http://alastair.d-silva.org    Twitter: @EvilDeece



[PATCH 0/2] Add bounds check for Hotplugged memory

2019-09-09 Thread Alastair D'Silva
From: Alastair D'Silva 

This series adds bounds checks for hotplugged memory, ensuring that
it is within the physically addressable range (for platforms that
define MAX_(POSSIBLE_)PHYSMEM_BITS).

This allows for early failure, rather than attempting to access
bogus section numbers.

Alastair D'Silva (2):
  memory_hotplug: Add a bounds check to check_hotplug_memory_range()
  mm: Add a bounds check in devm_memremap_pages()

 include/linux/memory_hotplug.h |  1 +
 mm/memory_hotplug.c| 19 ++-
 mm/memremap.c  |  8 
 3 files changed, 27 insertions(+), 1 deletion(-)

-- 
2.21.0



[PATCH 2/2] mm: Add a bounds check in devm_memremap_pages()

2019-09-09 Thread Alastair D'Silva
From: Alastair D'Silva 

The call to check_hotplug_memory_addressable() validates that the memory
is fully addressable.

Without this call, it is possible that we may remap pages that are
not physically addressable, resulting in bogus section numbers
being returned from __section_nr().

Signed-off-by: Alastair D'Silva 
---
 mm/memremap.c | 8 
 1 file changed, 8 insertions(+)

diff --git a/mm/memremap.c b/mm/memremap.c
index 86432650f829..fd00993caa3e 100644
--- a/mm/memremap.c
+++ b/mm/memremap.c
@@ -269,6 +269,13 @@ void *devm_memremap_pages(struct device *dev, struct 
dev_pagemap *pgmap)
 
mem_hotplug_begin();
 
+   error = check_hotplug_memory_addressable(res->start,
+resource_size(res));
+   if (error) {
+   mem_hotplug_done();
+   goto err_checkrange;
+   }
+
/*
 * For device private memory we call add_pages() as we only need to
 * allocate and initialize struct page for the device memory. More-
@@ -324,6 +331,7 @@ void *devm_memremap_pages(struct device *dev, struct 
dev_pagemap *pgmap)
 
  err_add_memory:
kasan_remove_zero_shadow(__va(res->start), resource_size(res));
+ err_checkrange:
  err_kasan:
untrack_pfn(NULL, PHYS_PFN(res->start), resource_size(res));
  err_pfn_remap:
-- 
2.21.0



[PATCH 1/2] memory_hotplug: Add a bounds check to check_hotplug_memory_range()

2019-09-09 Thread Alastair D'Silva
From: Alastair D'Silva 

On PowerPC, the address ranges allocated to OpenCAPI LPC memory
are allocated from firmware. These address ranges may be higher
than what older kernels permit, as we increased the maximum
permissible address in commit 4ffe713b7587
("powerpc/mm: Increase the max addressable memory to 2PB"). It is
possible that the addressable range may change again in the
future.

In this scenario, we end up with a bogus section returned from
__section_nr (see the discussion on the thread "mm: Trigger bug on
if a section is not found in __section_nr").

Adding a check here means that we fail early and have an
opportunity to handle the error gracefully, rather than rumbling
on and potentially accessing an incorrect section.

Further discussion is also on the thread ("powerpc: Perform a bounds
check in arch_add_memory").

Signed-off-by: Alastair D'Silva 
---
 include/linux/memory_hotplug.h |  1 +
 mm/memory_hotplug.c| 19 ++-
 2 files changed, 19 insertions(+), 1 deletion(-)

diff --git a/include/linux/memory_hotplug.h b/include/linux/memory_hotplug.h
index f46ea71b4ffd..bc477e98a310 100644
--- a/include/linux/memory_hotplug.h
+++ b/include/linux/memory_hotplug.h
@@ -110,6 +110,7 @@ extern void __online_page_increment_counters(struct page 
*page);
 extern void __online_page_free(struct page *page);
 
 extern int try_online_node(int nid);
+int check_hotplug_memory_addressable(u64 start, u64 size);
 
 extern int arch_add_memory(int nid, u64 start, u64 size,
struct mhp_restrictions *restrictions);
diff --git a/mm/memory_hotplug.c b/mm/memory_hotplug.c
index c73f09913165..3c5428b014f9 100644
--- a/mm/memory_hotplug.c
+++ b/mm/memory_hotplug.c
@@ -1030,6 +1030,23 @@ int try_online_node(int nid)
return ret;
 }
 
+#ifndef MAX_POSSIBLE_PHYSMEM_BITS
+#ifdef MAX_PHYSMEM_BITS
+#define MAX_POSSIBLE_PHYSMEM_BITS MAX_PHYSMEM_BITS
+#endif
+#endif
+
+int check_hotplug_memory_addressable(u64 start, u64 size)
+{
+#ifdef MAX_POSSIBLE_PHYSMEM_BITS
+   if ((start + size - 1) >> MAX_POSSIBLE_PHYSMEM_BITS)
+   return -E2BIG;
+#endif
+
+   return 0;
+}
+EXPORT_SYMBOL_GPL(check_hotplug_memory_addressable);
+
 static int check_hotplug_memory_range(u64 start, u64 size)
 {
/* memory range must be block size aligned */
@@ -1040,7 +1057,7 @@ static int check_hotplug_memory_range(u64 start, u64 size)
return -EINVAL;
}
 
-   return 0;
+   return check_hotplug_memory_addressable(start, size);
 }
 
 static int online_memory_block(struct memory_block *mem, void *arg)
-- 
2.21.0



Re: [PATCH] powerpc: Perform a bounds check in arch_add_memory

2019-09-03 Thread Alastair D'Silva
On Mon, 2019-09-02 at 09:28 +0200, David Hildenbrand wrote:
> On 02.09.19 01:54, Alastair D'Silva wrote:
> > On Tue, 2019-08-27 at 09:13 +0200, David Hildenbrand wrote:
> > > On 27.08.19 08:39, Alastair D'Silva wrote:
> > > > On Tue, 2019-08-27 at 08:28 +0200, Michal Hocko wrote:
> > > > > On Tue 27-08-19 15:20:46, Alastair D'Silva wrote:
> > > > > > From: Alastair D'Silva 
> > > > > > 
> > > > > > It is possible for firmware to allocate memory ranges
> > > > > > outside
> > > > > > the range of physical memory that we support
> > > > > > (MAX_PHYSMEM_BITS).
> > > > > 
> > > > > Doesn't that count as a FW bug? Do you have any evidence of
> > > > > that
> > > > > in
> > > > > the
> > > > > field? Just wondering...
> > > > > 
> > > > 
> > > > Not outside our lab, but OpenCAPI attached LPC memory is
> > > > assigned
> > > > addresses based on the slot/NPU it is connected to. These
> > > > addresses
> > > > prior to:
> > > > 4ffe713b7587 ("powerpc/mm: Increase the max addressable memory
> > > > to
> > > > 2PB")
> > > > were inaccessible and resulted in bogus sections - see our
> > > > discussion
> > > > on 'mm: Trigger bug on if a section is not found in
> > > > __section_nr'.
> > > > Doing this check here was your suggestion :)
> > > > 
> > > > It's entirely possible that a similar problem will occur in the
> > > > future,
> > > > and it's cheap to guard against, which is why I've added this.
> > > > 
> > > 
> > > If you keep it here, I guess this should be wrapped by a
> > > WARN_ON_ONCE().
> > > 
> > > If we move it to common code (e.g., __add_pages() or
> > > add_memory()),
> > > then
> > > probably not. I can see that s390x allows to configure
> > > MAX_PHYSMEM_BITS,
> > > so the check could actually make sense.
> > > 
> > 
> > I couldn't see a nice platform indepedent way to determine the
> > allowable address range, but if there is, then I'll move this to
> > the
> > generic code instead.
> > 
> 
> At least on the !ZONE_DEVICE path we have
> 
> __add_memory() -> register_memory_resource() ...
> 
> return ERR_PTR(-E2BIG);
> 
> 
> I was thinking about something like
> 
> int add_pages()
> {
>   if ((start + size - 1) >> MAX_PHYSMEM_BITS)
>   return -E2BIG;  
> 
>   return arch_add_memory(...)
> }
> 
> And switching users of arch_add_memory() to add_pages(). However, x86
> already has an add_pages() function, so that would need some more
> thought.
> 
> Maybe simply renaming the existing add_pages() to arch_add_pages().
> 
> add_pages(): Create virtual mapping
> __add_pages(): Don't create virtual mapping
> 
> arch_add_memory(): Arch backend for add_pages()
> arch_add_pages(): Arch backend for __add_pages()
> 
> It would be even more consistent if we would have arch_add_pages()
> vs.
> __arch_add_pages().

Looking a bit further, I think a good course of action would be to add
the check to memory_hotplug.c:check_hotplug_memory_range().

This would be the least invasive, and could check both
MAX_POSSIBLE_PHYSMEM_BITS and MAX_PHYSMEM_BITS.

With that in mind, we can drop this patch.

-- 
Alastair D'Silva
Open Source Developer
Linux Technology Centre, IBM Australia
mob: 0423 762 819



RE: [PATCH v2 4/6] powerpc: Chunk calls to flush_dcache_range in arch_*_memory

2019-09-03 Thread Alastair D'Silva
On Tue, 2019-09-03 at 08:51 +0200, Christophe Leroy wrote:


> > > This piece of code looks pretty similar to the one before. Can we
> > > refactor into a small helper ?
> > > 
> > 
> > Not much point, it's removed in a subsequent patch.
> > 
> 
> But you tell me that you leave to people the opportunity to not
> apply 
> that subsequent patch, and that's the reason you didn't put that
> patch 
> before this one. In that case adding a helper is worth it.
> 
> Christophe

I factored it out anyway, since it made the code much nicer to read.
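
For the record, a minimal sketch of what such a helper could look like
(the name and signature here are just an illustration, not necessarily
what the next spin uses):

	/*
	 * Sketch only: flush a large range in 'chunk'-sized pieces,
	 * yielding to the scheduler between pieces so that a very large
	 * hotplug range doesn't trigger RCU stalls.
	 */
	static void flush_dcache_range_chunked(unsigned long start,
					       unsigned long stop,
					       unsigned long chunk)
	{
		unsigned long i;

		for (i = start; i < stop; i += chunk) {
			flush_dcache_range(i, min(stop, i + chunk));
			cond_resched();
		}
	}

Both arch_add_memory() and arch_remove_memory() would then call
flush_dcache_range_chunked(start, start + size, FLUSH_CHUNK_SIZE)
instead of open-coding the loop.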

-- 
Alastair D'Silva
Open Source Developer
Linux Technology Centre, IBM Australia
mob: 0423 762 819



RE: [PATCH v2 3/6] powerpc: Convert flush_icache_range & friends to C

2019-09-03 Thread Alastair D'Silva
On Tue, 2019-09-03 at 22:11 +0200, Gabriel Paubert wrote:
> On Tue, Sep 03, 2019 at 01:31:57PM -0500, Segher Boessenkool wrote:
> > On Tue, Sep 03, 2019 at 07:05:19PM +0200, Christophe Leroy wrote:
> > > Le 03/09/2019 à 18:04, Segher Boessenkool a écrit :
> > > > (Why are they separate though?  It could just be one loop var).
> > > 
> > > Yes it could just be a single loop var, but in that case it would
> > > have 
> > > to be reset at the start of the second loop, which means we would
> > > have 
> > > to pass 'addr' for resetting the loop anyway,
> > 
> > Right, I noticed that after hitting send, as usual.
> > 
> > > so I opted to do it 
> > > outside the inline asm by using to separate loop vars set to
> > > their 
> > > starting value outside the inline asm.
> > 
> > The thing is, the way it is written now, it will get separate
> > registers
> > for each loop (with proper earlyclobbers added).  Not that that
> > really
> > matters of course, it just feels wrong :-)
> 
> After "mtmsr %3", it is always possible to copy %0 to %3 and use it
> as
> an address register for the second loop. One register less to
> allocate
> for the compiler. Constraints of course have to be adjusted.
> 
> 

Given that we're dealing with registers holding data that has been
named outside the assembler, this feels dirty. We'd be using the
register passed in as 'msr' to hold the address instead.

Since we're not short on registers, I don't see this as a good change.

-- 
Alastair D'Silva
Open Source Developer
Linux Technology Centre, IBM Australia
mob: 0423 762 819



RE: [PATCH v2 3/6] powerpc: Convert flush_icache_range & friends to C

2019-09-03 Thread Alastair D'Silva
On Tue, 2019-09-03 at 11:04 -0500, Segher Boessenkool wrote:
> On Tue, Sep 03, 2019 at 04:28:09PM +0200, Christophe Leroy wrote:
> > Le 03/09/2019 à 15:04, Segher Boessenkool a écrit :
> > > On Tue, Sep 03, 2019 at 03:23:57PM +1000, Alastair D'Silva wrote:
> > > > +   asm volatile(
> > > > +   "   mtctr %2;"
> > > > +   "   mtmsr %3;"
> > > > +   "   isync;"
> > > > +   "0: dcbst   0, %0;"
> > > > +   "   addi%0, %0, %4;"
> > > > +   "   bdnz0b;"
> > > > +   "   sync;"
> > > > +   "   mtctr %2;"
> > > > +   "1: icbi0, %1;"
> > > > +   "   addi%1, %1, %4;"
> > > > +   "   bdnz1b;"
> > > > +   "   sync;"
> > > > +   "   mtmsr %5;"
> > > > +   "   isync;"
> > > > +   : "+r" (loop1), "+r" (loop2)
> > > > +   : "r" (nb), "r" (msr), "i" (bytes), "r" (msr0)
> > > > +   : "ctr", "memory");
> > > 
> > > This outputs as one huge assembler statement, all on one
> > > line.  That's
> > > going to be fun to read or debug.
> > 
> > Do you mean \n has to be added after the ; ?
> 
> Something like that.  There is no really satisfying way for doing
> huge
> inline asm, and maybe that is a good thing ;-)
> 
> Often people write \n\t at the end of each line of inline asm.  This
> works
> pretty well (but then there are labels, oh joy).
> 
> > > loop1 and/or loop2 can be assigned the same register as msr0 or
> > > nb.  They
> > > need to be made earlyclobbers.  (msr is fine, all of its reads
> > > are before
> > > any writes to loop1 or loop2; and bytes is fine, it's not a
> > > register).
> > 
> > Can you explicit please ? Doesn't '+r' means that they are input
> > and 
> > output at the same time ?
> 
> That is what + means, yes -- that this output is an input as
> well.  It is
> the same to write
> 
>   asm("mov %1,%0 ; mov %0,42" : "+r"(x), "=r"(y));
> or to write
>   asm("mov %1,%0 ; mov %0,42" : "=r"(x), "=r"(y) : "0"(x));
> 
> (So not "at the same time" as in "in the same machine instruction",
> but
> more loosely, as in "in the same inline asm statement").
> 
> > "to be made earlyclobbers", what does this means exactly ? How to
> > do that ?
> 
> You write &, like "+" in this case.  It means the machine code
> writes
> to this register before it has consumed all asm inputs (remember, GCC
> does not understand (or even parse!) the assembler string).
> 
> So just
> 
>   : "+" (loop1), "+" (loop2)
> 
> will do.  (Why are they separate though?  It could just be one loop
> var).
> 
> 

Thanks, I've updated these.
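
For anyone following along, a rough sketch of the two changes being
discussed, trimmed down from the real loop (the operand names and the
use of L1_CACHE_BYTES are hypothetical, not the actual patch): '\n'
terminating each instruction keeps the generated assembler readable,
and '&' makes the read-write output an earlyclobber so it can never
share a register with an input:

	/* Sketch with hypothetical operands -- not the actual kernel code. */
	static inline void icbi_range(void *addr, unsigned long nb)
	{
		asm volatile(
			"	mtctr	%1\n"
			"1:	icbi	0, %0\n"
			"	addi	%0, %0, %2\n"
			"	bdnz	1b\n"
			"	sync\n"
			"	isync\n"
			/* '&' marks the read-write output as earlyclobber */
			: "+&r" (addr)
			: "r" (nb), "i" (L1_CACHE_BYTES)
			: "ctr", "memory");
	}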

-- 
Alastair D'Silva
Open Source Developer
Linux Technology Centre, IBM Australia
mob: 0423 762 819



RE: [PATCH v2 3/6] powerpc: Convert flush_icache_range & friends to C

2019-09-03 Thread Alastair D'Silva
On Tue, 2019-09-03 at 08:08 +0200, Christophe Leroy wrote:
> 
> Le 03/09/2019 à 07:23, Alastair D'Silva a écrit :
> > From: Alastair D'Silva 
> > 
> > Similar to commit 22e9c88d486a
> > ("powerpc/64: reuse PPC32 static inline flush_dcache_range()")
> > this patch converts the following ASM symbols to C:
> >  flush_icache_range()
> >  __flush_dcache_icache()
> >  __flush_dcache_icache_phys()
> > 
> > This was done as we discovered a long-standing bug where the length
> > of the
> > range was truncated due to using a 32 bit shift instead of a 64 bit
> > one.
> > 
> > By converting these functions to C, it becomes easier to maintain.
> > 
> > flush_dcache_icache_phys() retains a critical assembler section as
> > we must
> > ensure there are no memory accesses while the data MMU is disabled
> > (authored by Christophe Leroy). Since this has no external callers,
> > it has
> > also been made static, allowing the compiler to inline it within
> > flush_dcache_icache_page().
> > 
> > Signed-off-by: Alastair D'Silva 
> > Signed-off-by: Christophe Leroy 
> > ---
> >   arch/powerpc/include/asm/cache.h  |  26 ++---
> >   arch/powerpc/include/asm/cacheflush.h |  24 ++--
> >   arch/powerpc/kernel/misc_32.S | 117 
> >   arch/powerpc/kernel/misc_64.S | 102 -
> >   arch/powerpc/mm/mem.c | 152
> > +-
> >   5 files changed, 173 insertions(+), 248 deletions(-)
> > 
> > diff --git a/arch/powerpc/include/asm/cache.h
> > b/arch/powerpc/include/asm/cache.h
> > index f852d5cd746c..91c808c6738b 100644
> > --- a/arch/powerpc/include/asm/cache.h
> > +++ b/arch/powerpc/include/asm/cache.h
> > @@ -98,20 +98,7 @@ static inline u32 l1_icache_bytes(void)
> >   #endif
> >   #endif /* ! __ASSEMBLY__ */
> >   
> > -#if defined(__ASSEMBLY__)
> > -/*
> > - * For a snooping icache, we still need a dummy icbi to purge all
> > the
> > - * prefetched instructions from the ifetch buffers. We also need a
> > sync
> > - * before the icbi to order the the actual stores to memory that
> > might
> > - * have modified instructions with the icbi.
> > - */
> > -#define PURGE_PREFETCHED_INS   \
> > -   sync;   \
> > -   icbi0,r3;   \
> > -   sync;   \
> > -   isync
> > -
> > -#else
> > +#if !defined(__ASSEMBLY__)
> >   #define __read_mostly
> > __attribute__((__section__(".data..read_mostly")))
> >   
> >   #ifdef CONFIG_PPC_BOOK3S_32
> > @@ -145,6 +132,17 @@ static inline void dcbst(void *addr)
> >   {
> > __asm__ __volatile__ ("dcbst %y0" : : "Z"(*(u8 *)addr) :
> > "memory");
> >   }
> > +
> > +static inline void icbi(void *addr)
> > +{
> > +   __asm__ __volatile__ ("icbi 0, %0" : : "r"(addr) : "memory");
> 
> I think "__asm__ __volatile__" is deprecated. Use "asm volatile"
> instead.
> 

Ok.

> > +}
> > +
> > +static inline void iccci(void *addr)
> > +{
> > +   __asm__ __volatile__ ("iccci 0, %0" : : "r"(addr) : "memory");
> > +}
> > +
> 
> Same
> 
> >   #endif /* !__ASSEMBLY__ */
> >   #endif /* __KERNEL__ */
> >   #endif /* _ASM_POWERPC_CACHE_H */
> > diff --git a/arch/powerpc/include/asm/cacheflush.h
> > b/arch/powerpc/include/asm/cacheflush.h
> > index ed57843ef452..4a1c9f0200e1 100644
> > --- a/arch/powerpc/include/asm/cacheflush.h
> > +++ b/arch/powerpc/include/asm/cacheflush.h
> > @@ -42,24 +42,20 @@ extern void flush_dcache_page(struct page
> > *page);
> >   #define flush_dcache_mmap_lock(mapping)   do { } while
> > (0)
> >   #define flush_dcache_mmap_unlock(mapping) do { } while (0)
> >   
> > -extern void flush_icache_range(unsigned long, unsigned long);
> > +void flush_icache_range(unsigned long start, unsigned long stop);
> >   extern void flush_icache_user_range(struct vm_area_struct *vma,
> > struct page *page, unsigned long
> > addr,
> > int len);
> > -extern void __flush_dcache_icache(void *page_va);
> >   extern void flush_dcache_icache_page(struct page *page);
> > -#if defined(CONFIG_PPC32) && !defined(CONFIG_BOOKE)
> > -extern void __flush_dcache_icache_phys(unsigned long physaddr);
> > -#else
> > -static 

Re: [PATCH v2 6/6] powerpc: Don't flush caches when adding memory

2019-09-03 Thread Alastair D'Silva
On Tue, 2019-09-03 at 08:23 +0200, Christophe Leroy wrote:
> 
> Le 03/09/2019 à 07:24, Alastair D'Silva a écrit :
> > From: Alastair D'Silva 
> > 
> > This operation takes a significant amount of time when hotplugging
> > large amounts of memory (~50 seconds with 890GB of persistent
> > memory).
> > 
> > This was orignally in commit fb5924fddf9e
> > ("powerpc/mm: Flush cache on memory hot(un)plug") to support
> > memtrace,
> > but the flush on add is not needed as it is flushed on remove.
> > 
> > Signed-off-by: Alastair D'Silva 
> > ---
> >   arch/powerpc/mm/mem.c | 7 ---
> >   1 file changed, 7 deletions(-)
> > 
> > diff --git a/arch/powerpc/mm/mem.c b/arch/powerpc/mm/mem.c
> > index 854aaea2c6ae..2a14b5b93e19 100644
> > --- a/arch/powerpc/mm/mem.c
> > +++ b/arch/powerpc/mm/mem.c
> > @@ -111,7 +111,6 @@ int __ref arch_add_memory(int nid, u64 start,
> > u64 size,
> >   {
> > unsigned long start_pfn = start >> PAGE_SHIFT;
> > unsigned long nr_pages = size >> PAGE_SHIFT;
> > -   u64 i;
> > int rc;
> >   
> > resize_hpt_for_hotplug(memblock_phys_mem_size());
> > @@ -124,12 +123,6 @@ int __ref arch_add_memory(int nid, u64 start,
> > u64 size,
> > return -EFAULT;
> > }
> >   
> > -   for (i = 0; i < size; i += FLUSH_CHUNK_SIZE) {
> > -   flush_dcache_range(start + i,
> > -  min(start + size, start + i +
> > FLUSH_CHUNK_SIZE));
> > -   cond_resched();
> > -   }
> > -
> 
> So you are removing the code you added in patch 4. Why not move this
> one 
> before patch 4 ?
> 

I put them in this order so that if someone did want the flushes in
arch_add_memory, they could drop the later patch, but not trigger RCU
stalls.

-- 
Alastair D'Silva
Open Source Developer
Linux Technology Centre, IBM Australia
mob: 0423 762 819



Re: [PATCH v2 4/6] powerpc: Chunk calls to flush_dcache_range in arch_*_memory

2019-09-03 Thread Alastair D'Silva
On Tue, 2019-09-03 at 08:19 +0200, Christophe Leroy wrote:
> 
> Le 03/09/2019 à 07:23, Alastair D'Silva a écrit :
> > From: Alastair D'Silva 
> > 
> > When presented with large amounts of memory being hotplugged
> > (in my test case, ~890GB), the call to flush_dcache_range takes
> > a while (~50 seconds), triggering RCU stalls.
> > 
> > This patch breaks up the call into 1GB chunks, calling
> > cond_resched() inbetween to allow the scheduler to run.
> > 
> > Signed-off-by: Alastair D'Silva 
> > ---
> >   arch/powerpc/mm/mem.c | 18 --
> >   1 file changed, 16 insertions(+), 2 deletions(-)
> > 
> > diff --git a/arch/powerpc/mm/mem.c b/arch/powerpc/mm/mem.c
> > index cd540123874d..854aaea2c6ae 100644
> > --- a/arch/powerpc/mm/mem.c
> > +++ b/arch/powerpc/mm/mem.c
> > @@ -104,11 +104,14 @@ int __weak remove_section_mapping(unsigned
> > long start, unsigned long end)
> > return -ENODEV;
> >   }
> >   
> > +#define FLUSH_CHUNK_SIZE SZ_1G
> 
> Maybe the name is a bit long for a local define. See if we could
> reduce 
> code line splits below by shortening this name.
> 
> > +
> >   int __ref arch_add_memory(int nid, u64 start, u64 size,
> > struct mhp_restrictions *restrictions)
> >   {
> > unsigned long start_pfn = start >> PAGE_SHIFT;
> > unsigned long nr_pages = size >> PAGE_SHIFT;
> > +   u64 i;
> > int rc;
> >   
> > resize_hpt_for_hotplug(memblock_phys_mem_size());
> > @@ -120,7 +123,12 @@ int __ref arch_add_memory(int nid, u64 start,
> > u64 size,
> > start, start + size, rc);
> > return -EFAULT;
> > }
> > -   flush_dcache_range(start, start + size);
> > +
> > +   for (i = 0; i < size; i += FLUSH_CHUNK_SIZE) {
> > +   flush_dcache_range(start + i,
> > +  min(start + size, start + i +
> > FLUSH_CHUNK_SIZE));
> 
> My eyes don't like it.
> 
> What about
>   for (; i < size; i += FLUSH_CHUNK_SIZE) {
>   int len = min(size - i, FLUSH_CHUNK_SIZE);
> 
>   flush_dcache_range(start + i, start + i + len);
>   cond_resched();
>   }
> 
> or
> 
>   end = start + size;
>   for (; start < end; start += FLUSH_CHUNK_SIZE, size -=
> FLUSH_CHUNK_SIZE) {
>   int len = min(size, FLUSH_CHUNK_SIZE);
> 
>   flush_dcache_range(start, start + len);
>   cond_resched();
>   }
> 
> > +   cond_resched();
> > +   }
> >   
> > return __add_pages(nid, start_pfn, nr_pages, restrictions);
> >   }
> > @@ -131,13 +139,19 @@ void __ref arch_remove_memory(int nid, u64
> > start, u64 size,
> > unsigned long start_pfn = start >> PAGE_SHIFT;
> > unsigned long nr_pages = size >> PAGE_SHIFT;
> > struct page *page = pfn_to_page(start_pfn) +
> > vmem_altmap_offset(altmap);
> > +   u64 i;
> > int ret;
> >   
> > __remove_pages(page_zone(page), start_pfn, nr_pages, altmap);
> >   
> > /* Remove htab bolted mappings for this section of memory */
> > start = (unsigned long)__va(start);
> > -   flush_dcache_range(start, start + size);
> > +   for (i = 0; i < size; i += FLUSH_CHUNK_SIZE) {
> > +   flush_dcache_range(start + i,
> > +  min(start + size, start + i +
> > FLUSH_CHUNK_SIZE));
> > +   cond_resched();
> > +   }
> > +
> 
> This piece of code looks pretty similar to the one before. Can we 
> refactor into a small helper ?
> 

Not much point, it's removed in a subsequent patch.

-- 
Alastair D'Silva
Open Source Developer
Linux Technology Centre, IBM Australia
mob: 0423 762 819



[PATCH v2 3/6] powerpc: Convert flush_icache_range & friends to C

2019-09-02 Thread Alastair D'Silva
From: Alastair D'Silva 

Similar to commit 22e9c88d486a
("powerpc/64: reuse PPC32 static inline flush_dcache_range()")
this patch converts the following ASM symbols to C:
flush_icache_range()
__flush_dcache_icache()
__flush_dcache_icache_phys()

This was done as we discovered a long-standing bug where the length of the
range was truncated due to using a 32 bit shift instead of a 64 bit one.

By converting these functions to C, it becomes easier to maintain.

flush_dcache_icache_phys() retains a critical assembler section as we must
ensure there are no memory accesses while the data MMU is disabled
(authored by Christophe Leroy). Since this has no external callers, it has
also been made static, allowing the compiler to inline it within
flush_dcache_icache_page().

Signed-off-by: Alastair D'Silva 
Signed-off-by: Christophe Leroy 
---
 arch/powerpc/include/asm/cache.h  |  26 ++---
 arch/powerpc/include/asm/cacheflush.h |  24 ++--
 arch/powerpc/kernel/misc_32.S | 117 
 arch/powerpc/kernel/misc_64.S | 102 -
 arch/powerpc/mm/mem.c | 152 +-
 5 files changed, 173 insertions(+), 248 deletions(-)

diff --git a/arch/powerpc/include/asm/cache.h b/arch/powerpc/include/asm/cache.h
index f852d5cd746c..91c808c6738b 100644
--- a/arch/powerpc/include/asm/cache.h
+++ b/arch/powerpc/include/asm/cache.h
@@ -98,20 +98,7 @@ static inline u32 l1_icache_bytes(void)
 #endif
 #endif /* ! __ASSEMBLY__ */
 
-#if defined(__ASSEMBLY__)
-/*
- * For a snooping icache, we still need a dummy icbi to purge all the
- * prefetched instructions from the ifetch buffers. We also need a sync
- * before the icbi to order the the actual stores to memory that might
- * have modified instructions with the icbi.
- */
-#define PURGE_PREFETCHED_INS   \
-   sync;   \
-   icbi0,r3;   \
-   sync;   \
-   isync
-
-#else
+#if !defined(__ASSEMBLY__)
 #define __read_mostly __attribute__((__section__(".data..read_mostly")))
 
 #ifdef CONFIG_PPC_BOOK3S_32
@@ -145,6 +132,17 @@ static inline void dcbst(void *addr)
 {
__asm__ __volatile__ ("dcbst %y0" : : "Z"(*(u8 *)addr) : "memory");
 }
+
+static inline void icbi(void *addr)
+{
+   __asm__ __volatile__ ("icbi 0, %0" : : "r"(addr) : "memory");
+}
+
+static inline void iccci(void *addr)
+{
+   __asm__ __volatile__ ("iccci 0, %0" : : "r"(addr) : "memory");
+}
+
 #endif /* !__ASSEMBLY__ */
 #endif /* __KERNEL__ */
 #endif /* _ASM_POWERPC_CACHE_H */
diff --git a/arch/powerpc/include/asm/cacheflush.h 
b/arch/powerpc/include/asm/cacheflush.h
index ed57843ef452..4a1c9f0200e1 100644
--- a/arch/powerpc/include/asm/cacheflush.h
+++ b/arch/powerpc/include/asm/cacheflush.h
@@ -42,24 +42,20 @@ extern void flush_dcache_page(struct page *page);
 #define flush_dcache_mmap_lock(mapping)do { } while (0)
 #define flush_dcache_mmap_unlock(mapping)  do { } while (0)
 
-extern void flush_icache_range(unsigned long, unsigned long);
+void flush_icache_range(unsigned long start, unsigned long stop);
 extern void flush_icache_user_range(struct vm_area_struct *vma,
struct page *page, unsigned long addr,
int len);
-extern void __flush_dcache_icache(void *page_va);
 extern void flush_dcache_icache_page(struct page *page);
-#if defined(CONFIG_PPC32) && !defined(CONFIG_BOOKE)
-extern void __flush_dcache_icache_phys(unsigned long physaddr);
-#else
-static inline void __flush_dcache_icache_phys(unsigned long physaddr)
-{
-   BUG();
-}
-#endif
-
-/*
- * Write any modified data cache blocks out to memory and invalidate them.
- * Does not invalidate the corresponding instruction cache blocks.
+void __flush_dcache_icache(void *page);
+
+/**
+ * flush_dcache_range(): Write any modified data cache blocks out to memory and
+ * invalidate them. Does not invalidate the corresponding instruction cache
+ * blocks.
+ *
+ * @start: the start address
+ * @stop: the stop address (exclusive)
  */
 static inline void flush_dcache_range(unsigned long start, unsigned long stop)
 {
diff --git a/arch/powerpc/kernel/misc_32.S b/arch/powerpc/kernel/misc_32.S
index fe4bd321730e..12b95e6799d4 100644
--- a/arch/powerpc/kernel/misc_32.S
+++ b/arch/powerpc/kernel/misc_32.S
@@ -318,123 +318,6 @@ END_FTR_SECTION_IFSET(CPU_FTR_UNIFIED_ID_CACHE)
 EXPORT_SYMBOL(flush_instruction_cache)
 #endif /* CONFIG_PPC_8xx */
 
-/*
- * Write any modified data cache blocks out to memory
- * and invalidate the corresponding instruction cache blocks.
- * This is a no-op on the 601.
- *
- * flush_icache_range(unsigned long start, unsigned long stop)
- */
-_GLOBAL(flush_icache_range)
-BEGIN_FTR_SECTION
-   PURGE_PREFETCHED_INS
-   blr /* for 601, do nothing */

[PATCH v2 6/6] powerpc: Don't flush caches when adding memory

2019-09-02 Thread Alastair D'Silva
From: Alastair D'Silva 

This operation takes a significant amount of time when hotplugging
large amounts of memory (~50 seconds with 890GB of persistent memory).

This was originally in commit fb5924fddf9e
("powerpc/mm: Flush cache on memory hot(un)plug") to support memtrace,
but the flush on add is not needed as it is flushed on remove.

Signed-off-by: Alastair D'Silva 
---
 arch/powerpc/mm/mem.c | 7 ---
 1 file changed, 7 deletions(-)

diff --git a/arch/powerpc/mm/mem.c b/arch/powerpc/mm/mem.c
index 854aaea2c6ae..2a14b5b93e19 100644
--- a/arch/powerpc/mm/mem.c
+++ b/arch/powerpc/mm/mem.c
@@ -111,7 +111,6 @@ int __ref arch_add_memory(int nid, u64 start, u64 size,
 {
unsigned long start_pfn = start >> PAGE_SHIFT;
unsigned long nr_pages = size >> PAGE_SHIFT;
-   u64 i;
int rc;
 
resize_hpt_for_hotplug(memblock_phys_mem_size());
@@ -124,12 +123,6 @@ int __ref arch_add_memory(int nid, u64 start, u64 size,
return -EFAULT;
}
 
-   for (i = 0; i < size; i += FLUSH_CHUNK_SIZE) {
-   flush_dcache_range(start + i,
-  min(start + size, start + i + 
FLUSH_CHUNK_SIZE));
-   cond_resched();
-   }
-
return __add_pages(nid, start_pfn, nr_pages, restrictions);
 }
 
-- 
2.21.0



[PATCH v2 5/6] powerpc: Remove 'extern' from func prototypes in cache headers

2019-09-02 Thread Alastair D'Silva
From: Alastair D'Silva 

The 'extern' keyword adds no value for function prototypes.

Signed-off-by: Alastair D'Silva 
---
 arch/powerpc/include/asm/cache.h  | 8 
 arch/powerpc/include/asm/cacheflush.h | 6 +++---
 2 files changed, 7 insertions(+), 7 deletions(-)

diff --git a/arch/powerpc/include/asm/cache.h b/arch/powerpc/include/asm/cache.h
index 91c808c6738b..54fffdf5a6ec 100644
--- a/arch/powerpc/include/asm/cache.h
+++ b/arch/powerpc/include/asm/cache.h
@@ -102,10 +102,10 @@ static inline u32 l1_icache_bytes(void)
 #define __read_mostly __attribute__((__section__(".data..read_mostly")))
 
 #ifdef CONFIG_PPC_BOOK3S_32
-extern long _get_L2CR(void);
-extern long _get_L3CR(void);
-extern void _set_L2CR(unsigned long);
-extern void _set_L3CR(unsigned long);
+long _get_L2CR(void);
+long _get_L3CR(void);
+void _set_L2CR(unsigned long val);
+void _set_L3CR(unsigned long val);
 #else
 #define _get_L2CR()0L
 #define _get_L3CR()0L
diff --git a/arch/powerpc/include/asm/cacheflush.h 
b/arch/powerpc/include/asm/cacheflush.h
index 4a1c9f0200e1..fa10dc19206c 100644
--- a/arch/powerpc/include/asm/cacheflush.h
+++ b/arch/powerpc/include/asm/cacheflush.h
@@ -38,15 +38,15 @@ static inline void flush_cache_vmap(unsigned long start, 
unsigned long end) { }
 #endif
 
 #define ARCH_IMPLEMENTS_FLUSH_DCACHE_PAGE 1
-extern void flush_dcache_page(struct page *page);
+void flush_dcache_page(struct page *page);
 #define flush_dcache_mmap_lock(mapping)do { } while (0)
 #define flush_dcache_mmap_unlock(mapping)  do { } while (0)
 
 void flush_icache_range(unsigned long start, unsigned long stop);
-extern void flush_icache_user_range(struct vm_area_struct *vma,
+void flush_icache_user_range(struct vm_area_struct *vma,
struct page *page, unsigned long addr,
int len);
-extern void flush_dcache_icache_page(struct page *page);
+void flush_dcache_icache_page(struct page *page);
 void __flush_dcache_icache(void *page);
 
 /**
-- 
2.21.0



[PATCH v2 4/6] powerpc: Chunk calls to flush_dcache_range in arch_*_memory

2019-09-02 Thread Alastair D'Silva
From: Alastair D'Silva 

When presented with large amounts of memory being hotplugged
(in my test case, ~890GB), the call to flush_dcache_range takes
a while (~50 seconds), triggering RCU stalls.

This patch breaks up the call into 1GB chunks, calling
cond_resched() in between to allow the scheduler to run.

Signed-off-by: Alastair D'Silva 
---
 arch/powerpc/mm/mem.c | 18 --
 1 file changed, 16 insertions(+), 2 deletions(-)

diff --git a/arch/powerpc/mm/mem.c b/arch/powerpc/mm/mem.c
index cd540123874d..854aaea2c6ae 100644
--- a/arch/powerpc/mm/mem.c
+++ b/arch/powerpc/mm/mem.c
@@ -104,11 +104,14 @@ int __weak remove_section_mapping(unsigned long start, 
unsigned long end)
return -ENODEV;
 }
 
+#define FLUSH_CHUNK_SIZE SZ_1G
+
 int __ref arch_add_memory(int nid, u64 start, u64 size,
struct mhp_restrictions *restrictions)
 {
unsigned long start_pfn = start >> PAGE_SHIFT;
unsigned long nr_pages = size >> PAGE_SHIFT;
+   u64 i;
int rc;
 
resize_hpt_for_hotplug(memblock_phys_mem_size());
@@ -120,7 +123,12 @@ int __ref arch_add_memory(int nid, u64 start, u64 size,
start, start + size, rc);
return -EFAULT;
}
-   flush_dcache_range(start, start + size);
+
+   for (i = 0; i < size; i += FLUSH_CHUNK_SIZE) {
+   flush_dcache_range(start + i,
+  min(start + size, start + i + 
FLUSH_CHUNK_SIZE));
+   cond_resched();
+   }
 
return __add_pages(nid, start_pfn, nr_pages, restrictions);
 }
@@ -131,13 +139,19 @@ void __ref arch_remove_memory(int nid, u64 start, u64 
size,
unsigned long start_pfn = start >> PAGE_SHIFT;
unsigned long nr_pages = size >> PAGE_SHIFT;
struct page *page = pfn_to_page(start_pfn) + vmem_altmap_offset(altmap);
+   u64 i;
int ret;
 
__remove_pages(page_zone(page), start_pfn, nr_pages, altmap);
 
/* Remove htab bolted mappings for this section of memory */
start = (unsigned long)__va(start);
-   flush_dcache_range(start, start + size);
+   for (i = 0; i < size; i += FLUSH_CHUNK_SIZE) {
+   flush_dcache_range(start + i,
+  min(start + size, start + i + 
FLUSH_CHUNK_SIZE));
+   cond_resched();
+   }
+
ret = remove_section_mapping(start, start + size);
WARN_ON_ONCE(ret);
 
-- 
2.21.0



[PATCH v2 2/6] powerpc: define helpers to get L1 icache sizes

2019-09-02 Thread Alastair D'Silva
From: Alastair D'Silva 

This patch adds helpers to retrieve icache sizes, and renames the existing
helpers to make it clear that they are for dcache.

Signed-off-by: Alastair D'Silva 
---
 arch/powerpc/include/asm/cache.h  | 29 +++
 arch/powerpc/include/asm/cacheflush.h | 12 +--
 2 files changed, 31 insertions(+), 10 deletions(-)

diff --git a/arch/powerpc/include/asm/cache.h b/arch/powerpc/include/asm/cache.h
index b3388d95f451..f852d5cd746c 100644
--- a/arch/powerpc/include/asm/cache.h
+++ b/arch/powerpc/include/asm/cache.h
@@ -55,25 +55,46 @@ struct ppc64_caches {
 
 extern struct ppc64_caches ppc64_caches;
 
-static inline u32 l1_cache_shift(void)
+static inline u32 l1_dcache_shift(void)
 {
return ppc64_caches.l1d.log_block_size;
 }
 
-static inline u32 l1_cache_bytes(void)
+static inline u32 l1_dcache_bytes(void)
 {
return ppc64_caches.l1d.block_size;
 }
+
+static inline u32 l1_icache_shift(void)
+{
+   return ppc64_caches.l1i.log_block_size;
+}
+
+static inline u32 l1_icache_bytes(void)
+{
+   return ppc64_caches.l1i.block_size;
+}
 #else
-static inline u32 l1_cache_shift(void)
+static inline u32 l1_dcache_shift(void)
 {
return L1_CACHE_SHIFT;
 }
 
-static inline u32 l1_cache_bytes(void)
+static inline u32 l1_dcache_bytes(void)
 {
return L1_CACHE_BYTES;
 }
+
+static inline u32 l1_icache_shift(void)
+{
+   return L1_CACHE_SHIFT;
+}
+
+static inline u32 l1_icache_bytes(void)
+{
+   return L1_CACHE_BYTES;
+}
+
 #endif
 #endif /* ! __ASSEMBLY__ */
 
diff --git a/arch/powerpc/include/asm/cacheflush.h 
b/arch/powerpc/include/asm/cacheflush.h
index eef388f2659f..ed57843ef452 100644
--- a/arch/powerpc/include/asm/cacheflush.h
+++ b/arch/powerpc/include/asm/cacheflush.h
@@ -63,8 +63,8 @@ static inline void __flush_dcache_icache_phys(unsigned long 
physaddr)
  */
 static inline void flush_dcache_range(unsigned long start, unsigned long stop)
 {
-   unsigned long shift = l1_cache_shift();
-   unsigned long bytes = l1_cache_bytes();
+   unsigned long shift = l1_dcache_shift();
+   unsigned long bytes = l1_dcache_bytes();
void *addr = (void *)(start & ~(bytes - 1));
unsigned long size = stop - (unsigned long)addr + (bytes - 1);
unsigned long i;
@@ -89,8 +89,8 @@ static inline void flush_dcache_range(unsigned long start, 
unsigned long stop)
  */
 static inline void clean_dcache_range(unsigned long start, unsigned long stop)
 {
-   unsigned long shift = l1_cache_shift();
-   unsigned long bytes = l1_cache_bytes();
+   unsigned long shift = l1_dcache_shift();
+   unsigned long bytes = l1_dcache_bytes();
void *addr = (void *)(start & ~(bytes - 1));
unsigned long size = stop - (unsigned long)addr + (bytes - 1);
unsigned long i;
@@ -108,8 +108,8 @@ static inline void clean_dcache_range(unsigned long start, 
unsigned long stop)
 static inline void invalidate_dcache_range(unsigned long start,
   unsigned long stop)
 {
-   unsigned long shift = l1_cache_shift();
-   unsigned long bytes = l1_cache_bytes();
+   unsigned long shift = l1_dcache_shift();
+   unsigned long bytes = l1_dcache_bytes();
void *addr = (void *)(start & ~(bytes - 1));
unsigned long size = stop - (unsigned long)addr + (bytes - 1);
unsigned long i;
-- 
2.21.0



[PATCH v2 0/6] powerpc: convert cache asm to C

2019-09-02 Thread Alastair D'Silva
From: Alastair D'Silva 

This series addresses a few issues discovered in how we flush caches:
1. Flushes were truncated at 4GB, so larger flushes were incorrect.
2. Flushing the dcache in arch_add_memory was unnecessary

This series also converts much of the cache assembler to C, with the
aim of making it easier to maintain.

Alastair D'Silva (6):
  powerpc: Allow flush_icache_range to work across ranges >4GB
  powerpc: define helpers to get L1 icache sizes
  powerpc: Convert flush_icache_range & friends to C
  powerpc: Chunk calls to flush_dcache_range in arch_*_memory
  powerpc: Remove 'extern' from func prototypes in cache headers
  powerpc: Don't flush caches when adding memory

Changelog:
 V2:
 - Replace C implementation of flush_dcache_icache_phys() with
   inline assembler authored by Christophe Leroy
 - Add memory clobbers for iccci implementation
 - Give __flush_dcache_icache a real implementation, it can't
   just be a wrapper around flush_icache_range()
 - Remove PPC64_CACHES from misc_64.S
 - Replace code duplicating clean_dcache_range() in
   flush_icache_range() with a call to clean_dcache_range()
 - Replace #ifdef CONFIG_44x with IS_ENABLED(...) in
   flush_icache_range()
 - Use 1GB chunks instead of 16GB in arch_*_memory


 arch/powerpc/include/asm/cache.h  |  63 ++
 arch/powerpc/include/asm/cacheflush.h |  37 +++---
 arch/powerpc/kernel/misc_32.S | 117 ---
 arch/powerpc/kernel/misc_64.S | 102 -
 arch/powerpc/mm/mem.c | 159 +-
 5 files changed, 213 insertions(+), 265 deletions(-)

-- 
2.21.0



[PATCH v2 1/6] powerpc: Allow flush_icache_range to work across ranges >4GB

2019-09-02 Thread Alastair D'Silva
From: Alastair D'Silva 

When calling flush_icache_range with a size >4GB, we were masking
off the upper 32 bits, so we would incorrectly flush a range smaller
than intended.

This patch replaces the 32 bit shifts with 64 bit ones, so that
the full size is accounted for.

Signed-off-by: Alastair D'Silva 
Cc: sta...@vger.kernel.org
---
 arch/powerpc/kernel/misc_64.S | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/arch/powerpc/kernel/misc_64.S b/arch/powerpc/kernel/misc_64.S
index b55a7b4cb543..9bc0aa9aeb65 100644
--- a/arch/powerpc/kernel/misc_64.S
+++ b/arch/powerpc/kernel/misc_64.S
@@ -82,7 +82,7 @@ END_FTR_SECTION_IFSET(CPU_FTR_COHERENT_ICACHE)
subfr8,r6,r4/* compute length */
add r8,r8,r5/* ensure we get enough */
lwz r9,DCACHEL1LOGBLOCKSIZE(r10)/* Get log-2 of cache block 
size */
-   srw.r8,r8,r9/* compute line count */
+   srd.r8,r8,r9/* compute line count */
beqlr   /* nothing to do? */
mtctr   r8
 1: dcbst   0,r6
@@ -98,7 +98,7 @@ END_FTR_SECTION_IFSET(CPU_FTR_COHERENT_ICACHE)
subfr8,r6,r4/* compute length */
add r8,r8,r5
lwz r9,ICACHEL1LOGBLOCKSIZE(r10)/* Get log-2 of Icache block 
size */
-   srw.r8,r8,r9/* compute line count */
+   srd.r8,r8,r9/* compute line count */
beqlr   /* nothing to do? */
mtctr   r8
 2: icbi0,r6
-- 
2.21.0



RE: [PATCH] powerpc: Perform a bounds check in arch_add_memory

2019-09-01 Thread Alastair D'Silva
On Tue, 2019-08-27 at 09:13 +0200, David Hildenbrand wrote:
> On 27.08.19 08:39, Alastair D'Silva wrote:
> > On Tue, 2019-08-27 at 08:28 +0200, Michal Hocko wrote:
> > > On Tue 27-08-19 15:20:46, Alastair D'Silva wrote:
> > > > From: Alastair D'Silva 
> > > > 
> > > > It is possible for firmware to allocate memory ranges outside
> > > > the range of physical memory that we support
> > > > (MAX_PHYSMEM_BITS).
> > > 
> > > Doesn't that count as a FW bug? Do you have any evidence of that
> > > in
> > > the
> > > field? Just wondering...
> > > 
> > 
> > Not outside our lab, but OpenCAPI attached LPC memory is assigned
> > addresses based on the slot/NPU it is connected to. These addresses
> > prior to:
> > 4ffe713b7587 ("powerpc/mm: Increase the max addressable memory to
> > 2PB")
> > were inaccessible and resulted in bogus sections - see our
> > discussion
> > on 'mm: Trigger bug on if a section is not found in __section_nr'.
> > Doing this check here was your suggestion :)
> > 
> > It's entirely possible that a similar problem will occur in the
> > future,
> > and it's cheap to guard against, which is why I've added this.
> > 
> 
> If you keep it here, I guess this should be wrapped by a
> WARN_ON_ONCE().
> 
> If we move it to common code (e.g., __add_pages() or add_memory()),
> then
> probably not. I can see that s390x allows to configure
> MAX_PHYSMEM_BITS,
> so the check could actually make sense.
> 

I couldn't see a nice platform independent way to determine the
allowable address range, but if there is, then I'll move this to the
generic code instead.

-- 
Alastair D'Silva
Open Source Developer
Linux Technology Centre, IBM Australia
mob: 0423 762 819



[PATCH] mm: Remove NULL check in clear_hwpoisoned_pages()

2019-08-28 Thread Alastair D'Silva
There is no possibility for memmap to be NULL in the current
codebase.

This check was added in commit 95a4774d055c ("memory-hotplug:
update mce_bad_pages when removing the memory")
where memmap was originally initialised to NULL, and only conditionally
given a value.

The code that could have passed a NULL has been removed, so there
is no longer a possibility that memmap can be NULL.

Signed-off-by: Alastair D'Silva 
---
 mm/sparse.c | 3 ---
 1 file changed, 3 deletions(-)

diff --git a/mm/sparse.c b/mm/sparse.c
index 78979c142b7d..9f7e3682cdcb 100644
--- a/mm/sparse.c
+++ b/mm/sparse.c
@@ -754,9 +754,6 @@ static void clear_hwpoisoned_pages(struct page *memmap, int 
nr_pages)
 {
int i;
 
-   if (!memmap)
-   return;
-
/*
 * A further optimization is to have per section refcounted
 * num_poisoned_pages.  But that would need more space per memmap, so
-- 
2.21.0



Re: [PATCH 2/2] mm: don't hide potentially null memmap pointer in sparse_remove_section

2019-08-27 Thread Alastair D'Silva
On Tue, 2019-08-27 at 08:24 +0200, Michal Hocko wrote:
> On Tue 27-08-19 15:36:55, Alastair D'Silva wrote:
> > From: Alastair D'Silva 
> > 
> > By adding offset to memmap before passing it in to
> > clear_hwpoisoned_pages,
> > we hide a theoretically null memmap from the null check inside
> > clear_hwpoisoned_pages.
> 
> Isn't that other way around? Calculating the offset struct page
> pointer
> will actually make the null check effective. Besides that I cannot
> really see how pfn_to_page would return NULL. I have to confess that
> I
> cannot really see how offset could lead to a NULL struct page either
> and
> I strongly suspect that the NULL check is not really needed. Maybe it
> used to be in the past.
> 

You're probably right, but I didn't feel confident in removing the NULL
check. 

While the NULL check remains though, I can't see how adding the offset
would turn a non-NULL pointer into a NULL unless the pointer is invalid
in the first place, and if this is the case, we should have a comment
explaining this.

The NULL check was added in commit:
95a4774d055c ("memory-hotplug: update mce_bad_pages when removing the
memory")
where memmap was originally initialised to NULL, and only conditionally
given a value.

With this in mind, since that situation is no longer true, I think we
could instead drop the NULL check.

-- 
Alastair D'Silva   mob: 0423 762 819
skype: alastair_dsilva
Twitter: @EvilDeece
blog: http://alastair.d-silva.org




Re: [PATCH] powerpc: Perform a bounds check in arch_add_memory

2019-08-27 Thread Alastair D'Silva
On Tue, 2019-08-27 at 08:28 +0200, Michal Hocko wrote:
> On Tue 27-08-19 15:20:46, Alastair D'Silva wrote:
> > From: Alastair D'Silva 
> > 
> > It is possible for firmware to allocate memory ranges outside
> > the range of physical memory that we support (MAX_PHYSMEM_BITS).
> 
> Doesn't that count as a FW bug? Do you have any evidence of that in
> the
> field? Just wondering...
> 

Not outside our lab, but OpenCAPI attached LPC memory is assigned
addresses based on the slot/NPU it is connected to. These addresses
prior to:
4ffe713b7587 ("powerpc/mm: Increase the max addressable memory to 2PB")
were inaccessible and resulted in bogus sections - see our discussion
on 'mm: Trigger bug on if a section is not found in __section_nr'.
Doing this check here was your suggestion :)

It's entirely possible that a similar problem will occur in the future,
and it's cheap to guard against, which is why I've added this.

-- 
Alastair D'Silva
Open Source Developer
Linux Technology Centre, IBM Australia
mob: 0423 762 819



[PATCH 1/2] mm: Don't manually decrement num_poisoned_pages

2019-08-26 Thread Alastair D'Silva
From: Alastair D'Silva 

Use the function written to do it instead.

Signed-off-by: Alastair D'Silva 
---
 mm/sparse.c | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/mm/sparse.c b/mm/sparse.c
index 72f010d9bff5..e41917a7e844 100644
--- a/mm/sparse.c
+++ b/mm/sparse.c
@@ -11,6 +11,8 @@
 #include 
 #include 
 #include 
+#include 
+#include 
 
 #include "internal.h"
 #include 
@@ -898,7 +900,7 @@ static void clear_hwpoisoned_pages(struct page *memmap, int 
nr_pages)
 
for (i = 0; i < nr_pages; i++) {
if (PageHWPoison(&memmap[i])) {
-   atomic_long_sub(1, &num_poisoned_pages);
+   num_poisoned_pages_dec();
ClearPageHWPoison(&memmap[i]);
}
}
-- 
2.21.0



[PATCH 2/2] mm: don't hide potentially null memmap pointer in sparse_remove_section

2019-08-26 Thread Alastair D'Silva
From: Alastair D'Silva 

By adding offset to memmap before passing it in to clear_hwpoisoned_pages,
we hide a theoretically null memmap from the null check inside
clear_hwpoisoned_pages.

This patch passes the offset to clear_hwpoisoned_pages instead, so the
NULL check on memmap inside clear_hwpoisoned_pages remains effective.

Signed-off-by: Alastair D'Silva 
---
 mm/sparse.c | 9 +
 1 file changed, 5 insertions(+), 4 deletions(-)

diff --git a/mm/sparse.c b/mm/sparse.c
index e41917a7e844..3ff84e627e58 100644
--- a/mm/sparse.c
+++ b/mm/sparse.c
@@ -882,7 +882,7 @@ int __meminit sparse_add_section(int nid, unsigned long 
start_pfn,
 }
 
 #ifdef CONFIG_MEMORY_FAILURE
-static void clear_hwpoisoned_pages(struct page *memmap, int nr_pages)
+static void clear_hwpoisoned_pages(struct page *memmap, int start, int count)
 {
int i;
 
@@ -898,7 +898,7 @@ static void clear_hwpoisoned_pages(struct page *memmap, int 
nr_pages)
if (atomic_long_read(&num_poisoned_pages) == 0)
return;
 
-   for (i = 0; i < nr_pages; i++) {
+   for (i = start; i < start + count; i++) {
if (PageHWPoison(&memmap[i])) {
num_poisoned_pages_dec();
ClearPageHWPoison(&memmap[i]);
@@ -906,7 +906,8 @@ static void clear_hwpoisoned_pages(struct page *memmap, int 
nr_pages)
}
 }
 #else
-static inline void clear_hwpoisoned_pages(struct page *memmap, int nr_pages)
+static inline void clear_hwpoisoned_pages(struct page *memmap, int start,
+   int count)
 {
 }
 #endif
@@ -915,7 +916,7 @@ void sparse_remove_section(struct mem_section *ms, unsigned 
long pfn,
unsigned long nr_pages, unsigned long map_offset,
struct vmem_altmap *altmap)
 {
-   clear_hwpoisoned_pages(pfn_to_page(pfn) + map_offset,
+   clear_hwpoisoned_pages(pfn_to_page(pfn), map_offset,
nr_pages - map_offset);
section_deactivate(pfn, nr_pages, altmap);
 }
-- 
2.21.0



[PATCH 0/2] mm: Minor cleanup

2019-08-26 Thread Alastair D'Silva
From: Alastair D'Silva 

This series addresses some minor issues and obsoletes the earlier series:
mm: Cleanup & allow modules to hotplug memory

Alastair D'Silva (2):
  mm: Don't manually decrement num_poisoned_pages
  mm: don't hide potentially null memmap pointer in
sparse_remove_section

 mm/sparse.c | 13 -
 1 file changed, 8 insertions(+), 5 deletions(-)

-- 
2.21.0



[PATCH] powerpc: Perform a bounds check in arch_add_memory

2019-08-26 Thread Alastair D'Silva
From: Alastair D'Silva 

It is possible for firmware to allocate memory ranges outside
the range of physical memory that we support (MAX_PHYSMEM_BITS).

This patch adds a bounds check to ensure that any hotplugged
memory is addressable.

Signed-off-by: Alastair D'Silva 
---
 arch/powerpc/mm/mem.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/arch/powerpc/mm/mem.c b/arch/powerpc/mm/mem.c
index 9191a66b3bc5..de18fb73de30 100644
--- a/arch/powerpc/mm/mem.c
+++ b/arch/powerpc/mm/mem.c
@@ -111,6 +111,9 @@ int __ref arch_add_memory(int nid, u64 start, u64 size,
unsigned long nr_pages = size >> PAGE_SHIFT;
int rc;
 
+   if ((start + size - 1) >> MAX_PHYSMEM_BITS)
+   return -EINVAL;
+
resize_hpt_for_hotplug(memblock_phys_mem_size());
 
start = (unsigned long)__va(start);
-- 
2.21.0



RE: [RFC PATCH] powerpc: Convert ____flush_dcache_icache_phys() to C

2019-08-21 Thread Alastair D'Silva
On Thu, 2019-08-22 at 07:06 +0200, Christophe Leroy wrote:
> 
> Le 22/08/2019 à 02:27, Alastair D'Silva a écrit :
> > On Wed, 2019-08-21 at 22:27 +0200, Christophe Leroy wrote:
> > > Le 20/08/2019 à 06:36, Alastair D'Silva a écrit :
> > > > On Fri, 2019-08-16 at 15:52 +, Christophe Leroy wrote:
> > > 
> > > [...]
> > > 
> > > > Thanks Christophe,
> > > > 
> > > > I'm trying a somewhat different approach that requires less
> > > > knowledge
> > > > of assembler. Handling of CPU_FTR_COHERENT_ICACHE is outside
> > > > this
> > > > function. The code below is not a patch as my tree is a bit
> > > > messy,
> > > > sorry:
> > > 
> > > Can we be 100% sure that GCC won't add any code accessing some
> > > global
> > > data or stack while the Data MMU is OFF ?
> > > 
> > > Christophe
> > > 
> > 
> > +mpe
> > 
> > I'm not sure how we would go about making such a guarantee, but
> > I've
> > tied every variable used to a register and addr is passed in a
> > register, so there is no stack usage, and every call in there only
> > operates on its operands.
> > 
> > The calls to the inline cache helpers (for the PPC32 case) are all
> > constants, so I can't see a reasonable scenario where a function
> > call would be introduced and reordered to after the DR bit is
> > turned off, but I
> > guess if we want to be paranoid, we could always add an mb() call
> > before the DR bit is manipulated to prevent the compiler from
> > reordering across the section where the data MMU is disabled.
> > 
> > 
> 
> Anyway, I think the benefit of converting that function to C is
> pretty 
> small. flush_dcache_range() and friends were converted to C mainly
> in 
> order to inline them. But this __flush_dcache_icache_phys() is too
> big 
> to be worth inlining, yet small and stable enough to remain in
> assembly 
> for the time being.
> 
I disagree on this point: after converting it to C and building with
44x/currituck.defconfig, the compiler definitely inlines it (noting
that there is only one caller):

0134 :
 134:   94 21 ff f0 stwur1,-16(r1)
 138:   3d 20 00 00 lis r9,0
 13c:   81 29 00 00 lwz r9,0(r9)
 140:   7c 08 02 a6 mflrr0
 144:   38 81 00 0c addir4,r1,12
 148:   90 01 00 14 stw r0,20(r1)
 14c:   91 21 00 0c stw r9,12(r1)
 150:   48 00 00 01 bl  150 
 154:   39 00 00 20 li  r8,32
 158:   39 43 10 00 addir10,r3,4096
 15c:   7c 69 1b 78 mr  r9,r3
 160:   7d 09 03 a6 mtctr   r8
 164:   7c 00 48 6c dcbst   0,r9
 168:   39 29 00 80 addir9,r9,128
 16c:   42 00 ff f8 bdnz164 
 170:   7c 00 04 ac hwsync
 174:   7c 69 1b 78 mr  r9,r3
 178:   7c 00 4f ac icbi0,r9
 17c:   39 29 00 80 addir9,r9,128
 180:   7f 8a 48 40 cmplw   cr7,r10,r9
 184:   40 9e ff f4 bne cr7,178 
 188:   7c 00 04 ac hwsync
 18c:   4c 00 01 2c isync
 190:   80 01 00 14 lwz r0,20(r1)
 194:   38 21 00 10 addir1,r1,16
 198:   7c 08 03 a6 mtlrr0
 19c:   48 00 00 00 b   19c 


> So I suggest you keep it aside your series for now, just move 
> PURGE_PREFETCHED_INS inside it directly as it will be the only
> remaining 
> user of it.
> 
> Christophe

-- 
Alastair D'Silva
Open Source Developer
Linux Technology Centre, IBM Australia
mob: 0423 762 819



RE: [RFC PATCH] powerpc: Convert ____flush_dcache_icache_phys() to C

2019-08-21 Thread Alastair D'Silva
On Wed, 2019-08-21 at 22:27 +0200, Christophe Leroy wrote:
> 
> Le 20/08/2019 à 06:36, Alastair D'Silva a écrit :
> > On Fri, 2019-08-16 at 15:52 +, Christophe Leroy wrote:
> 
> [...]
> 
> > 
> > Thanks Christophe,
> > 
> > I'm trying a somewhat different approach that requires less
> > knowledge
> > of assembler. Handling of CPU_FTR_COHERENT_ICACHE is outside this
> > function. The code below is not a patch as my tree is a bit messy,
> > sorry:
> 
> Can we be 100% sure that GCC won't add any code accessing some
> global 
> data or stack while the Data MMU is OFF ?
> 
> Christophe
> 

+mpe

I'm not sure how we would go about making such a guarantee, but I've
tied every variable used to a register and addr is passed in a
register, so there is no stack usage, and every call in there only
operates on its operands.

The calls to the inline cache helpers (for the PPC32 case) are all
constants, so I can't see a reasonable scenario where a function call
would be introduced and reordered to after the DR bit is turned off, but I
guess if we want to be paranoid, we could always add an mb() call
before the DR bit is manipulated to prevent the compiler from
reordering across the section where the data MMU is disabled.
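
For example, something like this (sketch only, untested; the function name
is mine, and it assumes the dcbst()/icbi() helpers and l1_*cache_*()
accessors from this series):

#include <asm/barrier.h>
#include <asm/cache.h>
#include <asm/page.h>
#include <asm/reg.h>
#include <linux/compiler.h>

static void flush_dcache_icache_phys_sketch(unsigned long addr)
{
	unsigned long dbytes = l1_dcache_bytes();
	unsigned long ibytes = l1_icache_bytes();
	unsigned long msr = mfmsr();
	unsigned long a;

	barrier();			/* nothing hoisted into the DR-off window */
	mtmsr(msr & ~MSR_DR);		/* operate on physical addresses */

	/* Write out the data cache */
	for (a = addr; a < addr + PAGE_SIZE; a += dbytes)
		dcbst((void *)a);

	mb();				/* order the stores before the invalidate */

	/* Invalidate the instruction cache */
	for (a = addr; a < addr + PAGE_SIZE; a += ibytes)
		icbi((void *)a);

	mtmsr(msr);			/* data translation back on */
	barrier();			/* nothing sunk out of the window */
}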


> 
> > /**
> >   * flush_dcache_icache_phys() - Flush a page by its physical
> > address
> >   * @addr: the physical address of the page
> >   */
> > static void flush_dcache_icache_phys(unsigned long addr)
> > {
> > register unsigned long msr;
> > register unsigned long dlines = PAGE_SIZE >> l1_dcache_shift();
> > register unsigned long dbytes = l1_dcache_bytes();
> > register unsigned long ilines = PAGE_SIZE >> l1_icache_shift();
> > register unsigned long ibytes = l1_icache_bytes();
> > register unsigned long i;
> > register unsigned long address = addr;
> > 
> > /*
> >  * Clear the DR bit so that we operate on physical
> >  * rather than virtual addresses
> >  */
> > msr = mfmsr();
> > mtmsr(msr & ~(MSR_DR));
> > 
> > /* Write out the data cache */
> > for (i = 0; i < dlines; i++, address += dbytes)
> > dcbst((void *)address);
> > 
> > /* Invalidate the instruction cache */
> > address = addr;
> > for (i = 0; i < ilines; i++, address += ibytes)
> > icbi((void *)address);
> > 
> > mtmsr(msr);
> > }
> > 
> > void test_flush_phys(unsigned long addr)
> > {
> > flush_dcache_icache_phys(addr);
> > }
> > 
> > 
> > This gives the following assembler (using pmac32_defconfig):
> > 03cc :
> >   3cc:   94 21 ff f0 stwur1,-16(r1)
> >   3d0:   7d 00 00 a6 mfmsr   r8
> >   3d4:   55 09 07 34 rlwinm  r9,r8,0,28,26
> >   3d8:   7d 20 01 24 mtmsr   r9
> >   3dc:   39 20 00 80 li  r9,128
> >   3e0:   7d 29 03 a6 mtctr   r9
> >   3e4:   39 43 10 00 addir10,r3,4096
> >   3e8:   7c 69 1b 78 mr  r9,r3
> >   3ec:   7c 00 48 6c dcbst   0,r9
> >   3f0:   39 29 00 20 addir9,r9,32
> >   3f4:   42 00 ff f8 bdnz3ec 
> >   3f8:   7c 00 1f ac icbi0,r3
> >   3fc:   38 63 00 20 addir3,r3,32
> >   400:   7f 8a 18 40 cmplw   cr7,r10,r3
> >   404:   40 9e ff f4 bne cr7,3f8 
> >   408:   7d 00 01 24 mtmsr   r8
> >   40c:   38 21 00 10 addir1,r1,16
> >   410:   4e 80 00 20 blr
> > 
> > 
-- 
Alastair D'Silva
Open Source Developer
Linux Technology Centre, IBM Australia
mob: 0423 762 819



[PATCH v2] powerpc: Allow flush_(inval_)dcache_range to work across ranges >4GB

2019-08-20 Thread Alastair D'Silva
From: Alastair D'Silva 

The upstream commit:
22e9c88d486a ("powerpc/64: reuse PPC32 static inline flush_dcache_range()")
has a similar effect, but since it is a rewrite of the assembler to C, it is
too invasive for stable. This patch is a minimal fix to address the issue in
assembler.

This patch applies cleanly to v5.2, v4.19 & v4.14.

When calling flush_(inval_)dcache_range with a size >4GB, we were masking
off the upper 32 bits, so we would incorrectly flush a range smaller
than intended.

This patch replaces the 32 bit shifts with 64 bit ones, so that
the full size is accounted for.

Changelog:
v2
  - Add related upstream commit

Signed-off-by: Alastair D'Silva 
---
 arch/powerpc/kernel/misc_64.S | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/arch/powerpc/kernel/misc_64.S b/arch/powerpc/kernel/misc_64.S
index 1ad4089dd110..d4d096f80f4b 100644
--- a/arch/powerpc/kernel/misc_64.S
+++ b/arch/powerpc/kernel/misc_64.S
@@ -130,7 +130,7 @@ _GLOBAL_TOC(flush_dcache_range)
subfr8,r6,r4/* compute length */
add r8,r8,r5/* ensure we get enough */
lwz r9,DCACHEL1LOGBLOCKSIZE(r10)/* Get log-2 of dcache block 
size */
-   srw.r8,r8,r9/* compute line count */
+   srd.r8,r8,r9/* compute line count */
beqlr   /* nothing to do? */
mtctr   r8
 0: dcbst   0,r6
@@ -148,7 +148,7 @@ _GLOBAL(flush_inval_dcache_range)
subfr8,r6,r4/* compute length */
add r8,r8,r5/* ensure we get enough */
lwz r9,DCACHEL1LOGBLOCKSIZE(r10)/* Get log-2 of dcache block size */
-   srw.r8,r8,r9/* compute line count */
+   srd.r8,r8,r9/* compute line count */
beqlr   /* nothing to do? */
sync
isync
-- 
2.21.0



Re: [RFC PATCH] powerpc: Convert ____flush_dcache_icache_phys() to C

2019-08-19 Thread Alastair D'Silva
);
> -
> - /* Clear the DR bit so that we operate on physical
> -  * rather than virtual addresses
> -  */
> - mtmsr(msr & ~(MSR_DR));
> -
> - __flush_dcache_icache((void *)physaddr);
> -
> - mtmsr(msr);
> + __flush_dcache_icache_phys(page_to_pfn(page) <<
> PAGE_SHIFT);
>   }
>  #endif
>  }


Thanks Christophe,

I'm trying a somewhat different approach that requires less knowledge
of assembler. Handling of CPU_FTR_COHERENT_ICACHE is outside this
function. The code below is not a patch as my tree is a bit messy,
sorry:

/**
 * flush_dcache_icache_phys() - Flush a page by its physical address
 * @addr: the physical address of the page
 */
static void flush_dcache_icache_phys(unsigned long addr)
{
register unsigned long msr;
register unsigned long dlines = PAGE_SIZE >> l1_dcache_shift();
register unsigned long dbytes = l1_dcache_bytes();
register unsigned long ilines = PAGE_SIZE >> l1_icache_shift();
register unsigned long ibytes = l1_icache_bytes();
register unsigned long i;
register unsigned long address = addr;

/*
 * Clear the DR bit so that we operate on physical
 * rather than virtual addresses
 */
msr = mfmsr();
mtmsr(msr & ~(MSR_DR));

/* Write out the data cache */
for (i = 0; i < dlines; i++, address += dbytes)
dcbst((void *)address);

/* Invalidate the instruction cache */
address = addr;
for (i = 0; i < ilines; i++, address += ibytes)
icbi((void *)address);

mtmsr(msr);
}

void test_flush_phys(unsigned long addr)
{
flush_dcache_icache_phys(addr);
}


This gives the following assembler (using pmac32_defconfig):
000003cc :
 3cc:   94 21 ff f0 stwur1,-16(r1)
 3d0:   7d 00 00 a6 mfmsr   r8
 3d4:   55 09 07 34 rlwinm  r9,r8,0,28,26
 3d8:   7d 20 01 24 mtmsr   r9
 3dc:   39 20 00 80 li  r9,128
 3e0:   7d 29 03 a6 mtctr   r9
 3e4:   39 43 10 00 addir10,r3,4096
 3e8:   7c 69 1b 78 mr  r9,r3
 3ec:   7c 00 48 6c dcbst   0,r9
 3f0:   39 29 00 20 addir9,r9,32
 3f4:   42 00 ff f8 bdnz3ec 
 3f8:   7c 00 1f ac icbi0,r3
 3fc:   38 63 00 20 addir3,r3,32
 400:   7f 8a 18 40 cmplw   cr7,r10,r3
 404:   40 9e ff f4 bne cr7,3f8 
 408:   7d 00 01 24 mtmsr   r8
 40c:   38 21 00 10 addir1,r1,16
 410:   4e 80 00 20 blr


-- 
Alastair D'Silva
Open Source Developer
Linux Technology Centre, IBM Australia
mob: 0423 762 819



RE: [PATCH 4/6] powerpc: Chunk calls to flush_dcache_range in arch_*_memory

2019-08-15 Thread Alastair D'Silva
On Thu, 2019-08-15 at 09:36 +0200, christophe leroy wrote:
> 
> Le 15/08/2019 à 06:10, Alastair D'Silva a écrit :
> > From: Alastair D'Silva 
> > 
> > When presented with large amounts of memory being hotplugged
> > (in my test case, ~890GB), the call to flush_dcache_range takes
> > a while (~50 seconds), triggering RCU stalls.
> > 
> > This patch breaks up the call into 16GB chunks, calling
> > cond_resched() in between to allow the scheduler to run.
> 
> Is 16GB small enough ? If 890GB takes 50s, 16GB still takes about 1s.
> I'd use 1GB chunks to remain below 100ms.
> 
> > Signed-off-by: Alastair D'Silva 
> > ---
> >   arch/powerpc/mm/mem.c | 16 ++--
> >   1 file changed, 14 insertions(+), 2 deletions(-)
> > 
> > diff --git a/arch/powerpc/mm/mem.c b/arch/powerpc/mm/mem.c
> > index 5400da87a804..fb0d5e9aa11b 100644
> > --- a/arch/powerpc/mm/mem.c
> > +++ b/arch/powerpc/mm/mem.c
> > @@ -104,11 +104,14 @@ int __weak remove_section_mapping(unsigned
> > long start, unsigned long end)
> > return -ENODEV;
> >   }
> >   
> > +#define FLUSH_CHUNK_SIZE (16ull * 1024ull * 1024ull * 1024ull)
> 
> Can we use SZ_16GB ?
Sure, I'll go with 1GB as you recommended above
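
Roughly what v2 will do (sketch, untested; the helper name is mine, so it
can be shared by arch_add_memory() and arch_remove_memory()):

#include <linux/kernel.h>
#include <linux/sched.h>
#include <linux/sizes.h>
#include <asm/cacheflush.h>

#define FLUSH_CHUNK_SIZE SZ_1G

static void flush_dcache_range_chunked(unsigned long start, unsigned long stop)
{
	unsigned long i;

	for (i = start; i < stop; i += FLUSH_CHUNK_SIZE) {
		/* flush at most 1GB at a time, then let the scheduler run */
		flush_dcache_range(i, min(stop, i + FLUSH_CHUNK_SIZE));
		cond_resched();
	}
}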

> > +
> >   int __ref arch_add_memory(int nid, u64 start, u64 size,
> > struct mhp_restrictions *restrictions)
> >   {
> > unsigned long start_pfn = start >> PAGE_SHIFT;
> > unsigned long nr_pages = size >> PAGE_SHIFT;
> > +   unsigned long i;
> > int rc;
> >   
> > resize_hpt_for_hotplug(memblock_phys_mem_size());
> > @@ -120,7 +123,11 @@ int __ref arch_add_memory(int nid, u64 start,
> > u64 size,
> > start, start + size, rc);
> > return -EFAULT;
> > }
> > -   flush_dcache_range(start, start + size);
> > +
> > +   for (i = 0; i < size; i += FLUSH_CHUNK_SIZE) {
> > +   flush_dcache_range(start + i, min(start + size, start +
> > i + FLUSH_CHUNK_SIZE));
> 
> Isn't the line a bit long (I have not checked).
> 
> > +   cond_resched();
> > +   }
> >   
> > return __add_pages(nid, start_pfn, nr_pages, restrictions);
> >   }
> > @@ -131,13 +138,18 @@ void __ref arch_remove_memory(int nid, u64
> > start, u64 size,
> > unsigned long start_pfn = start >> PAGE_SHIFT;
> > unsigned long nr_pages = size >> PAGE_SHIFT;
> > struct page *page = pfn_to_page(start_pfn) +
> > vmem_altmap_offset(altmap);
> > +   unsigned long i;
> > int ret;
> >   
> > __remove_pages(page_zone(page), start_pfn, nr_pages, altmap);
> >   
> > /* Remove htab bolted mappings for this section of memory */
> > start = (unsigned long)__va(start);
> > -   flush_dcache_range(start, start + size);
> > +   for (i = 0; i < size; i += FLUSH_CHUNK_SIZE) {
> > +   flush_dcache_range(start + i, min(start + size, start +
> > i + FLUSH_CHUNK_SIZE));
> > +   cond_resched();
> > +   }
> > +
> > ret = remove_section_mapping(start, start + size);
> > WARN_ON_ONCE(ret);
> >   
> > 
> 
> Christophe
> 
-- 
Alastair D'Silva
Open Source Developer
Linux Technology Centre, IBM Australia
mob: 0423 762 819



Re: [PATCH 3/6] powerpc: Convert flush_icache_range & friends to C

2019-08-15 Thread Alastair D'Silva
On Thu, 2019-08-15 at 09:29 +0200, christophe leroy wrote:
> 
> Le 15/08/2019 à 06:10, Alastair D'Silva a écrit :
> > From: Alastair D'Silva 
> > 
> > Similar to commit 22e9c88d486a
> > ("powerpc/64: reuse PPC32 static inline flush_dcache_range()")
> > this patch converts flush_icache_range() to C, and reimplements the
> > following functions as wrappers around it:
> > __flush_dcache_icache
> > __flush_dcache_icache_phys
> 
> Not sure you can do that for __flush_dcache_icache_phys(), see
> detailed 
> comments below
> 
> > This was done as we discovered a long-standing bug where the length
> > of the
> > range was truncated due to using a 32 bit shift instead of a 64 bit
> > one.
> > 
> > By converting these functions to C, it becomes easier to maintain.
> > 
> > Signed-off-by: Alastair D'Silva 
> > ---
> >   arch/powerpc/include/asm/cache.h  |  26 +++---
> >   arch/powerpc/include/asm/cacheflush.h |  32 ---
> >   arch/powerpc/kernel/misc_32.S | 117 ---
> > ---
> >   arch/powerpc/kernel/misc_64.S |  97 -
> >   arch/powerpc/mm/mem.c |  71 +++-
> >   5 files changed, 102 insertions(+), 241 deletions(-)
> > 
> > diff --git a/arch/powerpc/include/asm/cache.h
> > b/arch/powerpc/include/asm/cache.h
> > index f852d5cd746c..728f154204db 100644
> > --- a/arch/powerpc/include/asm/cache.h
> > +++ b/arch/powerpc/include/asm/cache.h
> > @@ -98,20 +98,7 @@ static inline u32 l1_icache_bytes(void)
> >   #endif
> >   #endif /* ! __ASSEMBLY__ */
> >   
> > -#if defined(__ASSEMBLY__)
> > -/*
> > - * For a snooping icache, we still need a dummy icbi to purge all
> > the
> > - * prefetched instructions from the ifetch buffers. We also need a
> > sync
> > - * before the icbi to order the the actual stores to memory that
> > might
> > - * have modified instructions with the icbi.
> > - */
> > -#define PURGE_PREFETCHED_INS   \
> > -   sync;   \
> > -   icbi0,r3;   \
> > -   sync;   \
> > -   isync
> 
> Is this still used anywhere now ?
No, this patch removes all users of it.

> > -
> > -#else
> > +#if !defined(__ASSEMBLY__)
> >   #define __read_mostly
> > __attribute__((__section__(".data..read_mostly")))
> >   
> >   #ifdef CONFIG_PPC_BOOK3S_32
> > @@ -145,6 +132,17 @@ static inline void dcbst(void *addr)
> >   {
> > __asm__ __volatile__ ("dcbst %y0" : : "Z"(*(u8 *)addr) :
> > "memory");
> >   }
> > +
> > +static inline void icbi(void *addr)
> > +{
> > +   __asm__ __volatile__ ("icbi 0, %0" : : "r"(addr) : "memory");
> > +}
> > +
> > +static inline void iccci(void)
> > +{
> > +   __asm__ __volatile__ ("iccci 0, r0");
> 
> I think you need the "memory" clobber too here.
> 
Thanks, I'll add that.
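
i.e. something like this for v2 (sketch of the minimal change):

static inline void iccci(void)
{
	/* memory clobber so the compiler can't reorder stores across it */
	__asm__ __volatile__ ("iccci 0, r0" : : : "memory");
}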

> > +}
> > +
> >   #endif /* !__ASSEMBLY__ */
> >   #endif /* __KERNEL__ */
> >   #endif /* _ASM_POWERPC_CACHE_H */
> > diff --git a/arch/powerpc/include/asm/cacheflush.h
> > b/arch/powerpc/include/asm/cacheflush.h
> > index ed57843ef452..4c3377aff8ed 100644
> > --- a/arch/powerpc/include/asm/cacheflush.h
> > +++ b/arch/powerpc/include/asm/cacheflush.h
> > @@ -42,24 +42,18 @@ extern void flush_dcache_page(struct page
> > *page);
> >   #define flush_dcache_mmap_lock(mapping)   do { } while
> > (0)
> >   #define flush_dcache_mmap_unlock(mapping) do { } while (0)
> >   
> > -extern void flush_icache_range(unsigned long, unsigned long);
> > +void flush_icache_range(unsigned long start, unsigned long stop);
> >   extern void flush_icache_user_range(struct vm_area_struct *vma,
> > struct page *page, unsigned long
> > addr,
> > int len);
> > -extern void __flush_dcache_icache(void *page_va);
> >   extern void flush_dcache_icache_page(struct page *page);
> > -#if defined(CONFIG_PPC32) && !defined(CONFIG_BOOKE)
> > -extern void __flush_dcache_icache_phys(unsigned long physaddr);
> > -#else
> > -static inline void __flush_dcache_icache_phys(unsigned long
> > physaddr)
> > -{
> > -   BUG();
> > -}
> > -#endif
> >   
> > -/*
> > - * Write any modified data cache blocks out to memory and
> > invalidate them.
> > +/**
> > + * flush_dcache_r

[PATCH] powerpc: Allow flush_(inval_)dcache_range to work across ranges >4GB

2019-08-14 Thread Alastair D'Silva
From: Alastair D'Silva 

Heads Up: This patch cannot be submitted to Linus's tree, as the affected
assembler functions have already been converted to C.

When calling flush_(inval_)dcache_range with a size >4GB, we were masking
off the upper 32 bits, so we would incorrectly flush a range smaller
than intended.

This patch replaces the 32 bit shifts with 64 bit ones, so that
the full size is accounted for.

Signed-off-by: Alastair D'Silva 
---
 arch/powerpc/kernel/misc_64.S | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/arch/powerpc/kernel/misc_64.S b/arch/powerpc/kernel/misc_64.S
index 1ad4089dd110..d4d096f80f4b 100644
--- a/arch/powerpc/kernel/misc_64.S
+++ b/arch/powerpc/kernel/misc_64.S
@@ -130,7 +130,7 @@ _GLOBAL_TOC(flush_dcache_range)
subfr8,r6,r4/* compute length */
add r8,r8,r5/* ensure we get enough */
lwz r9,DCACHEL1LOGBLOCKSIZE(r10)/* Get log-2 of dcache block 
size */
-   srw.r8,r8,r9/* compute line count */
+   srd.r8,r8,r9/* compute line count */
beqlr   /* nothing to do? */
mtctr   r8
 0: dcbst   0,r6
@@ -148,7 +148,7 @@ _GLOBAL(flush_inval_dcache_range)
subfr8,r6,r4/* compute length */
add r8,r8,r5/* ensure we get enough */
lwz r9,DCACHEL1LOGBLOCKSIZE(r10)/* Get log-2 of dcache block size */
-   srw.r8,r8,r9/* compute line count */
+   srd.r8,r8,r9/* compute line count */
beqlr   /* nothing to do? */
sync
isync
-- 
2.21.0



[PATCH 0/6] powerpc: convert cache asm to C

2019-08-14 Thread Alastair D'Silva
From: Alastair D'Silva 

This series addresses a few issues discovered in how we flush caches:
1. Flushes were truncated at 4GB, so larger flushes were incorrect.
2. Flushing the dcache in arch_add_memory was unnecessary

This series also converts much of the cache assembler to C, with the
aim of making it easier to maintain.

Alastair D'Silva (6):
  powerpc: Allow flush_icache_range to work across ranges >4GB
  powerpc: define helpers to get L1 icache sizes
  powerpc: Convert flush_icache_range & friends to C
  powerpc: Chunk calls to flush_dcache_range in arch_*_memory
  powerpc: Remove 'extern' from func prototypes in cache headers
  powerpc: Don't flush caches when adding memory

 arch/powerpc/include/asm/cache.h  |  63 +-
 arch/powerpc/include/asm/cacheflush.h |  49 ++-
 arch/powerpc/kernel/misc_32.S | 117 --
 arch/powerpc/kernel/misc_64.S |  97 -
 arch/powerpc/mm/mem.c |  80 +-
 5 files changed, 146 insertions(+), 260 deletions(-)

-- 
2.21.0



[PATCH 6/6] powerpc: Don't flush caches when adding memory

2019-08-14 Thread Alastair D'Silva
From: Alastair D'Silva 

This operation takes a significant amount of time when hotplugging
large amounts of memory (~50 seconds with 890GB of persistent memory).

This was originally in commit fb5924fddf9e
("powerpc/mm: Flush cache on memory hot(un)plug") to support memtrace,
but the flush on add is not needed as it is flushed on remove.

Signed-off-by: Alastair D'Silva 
---
 arch/powerpc/mm/mem.c | 6 --
 1 file changed, 6 deletions(-)

diff --git a/arch/powerpc/mm/mem.c b/arch/powerpc/mm/mem.c
index fb0d5e9aa11b..43be99de7c9a 100644
--- a/arch/powerpc/mm/mem.c
+++ b/arch/powerpc/mm/mem.c
@@ -111,7 +111,6 @@ int __ref arch_add_memory(int nid, u64 start, u64 size,
 {
unsigned long start_pfn = start >> PAGE_SHIFT;
unsigned long nr_pages = size >> PAGE_SHIFT;
-   unsigned long i;
int rc;
 
resize_hpt_for_hotplug(memblock_phys_mem_size());
@@ -124,11 +123,6 @@ int __ref arch_add_memory(int nid, u64 start, u64 size,
return -EFAULT;
}
 
-   for (i = 0; i < size; i += FLUSH_CHUNK_SIZE) {
-   flush_dcache_range(start + i, min(start + size, start + i + 
FLUSH_CHUNK_SIZE));
-   cond_resched();
-   }
-
return __add_pages(nid, start_pfn, nr_pages, restrictions);
 }
 
-- 
2.21.0



[PATCH 5/6] powerpc: Remove 'extern' from func prototypes in cache headers

2019-08-14 Thread Alastair D'Silva
From: Alastair D'Silva 

The 'extern' keyword adds no value to function prototypes.

Signed-off-by: Alastair D'Silva 
---
 arch/powerpc/include/asm/cache.h  | 8 
 arch/powerpc/include/asm/cacheflush.h | 6 +++---
 2 files changed, 7 insertions(+), 7 deletions(-)

diff --git a/arch/powerpc/include/asm/cache.h b/arch/powerpc/include/asm/cache.h
index 728f154204db..c5c096e968e0 100644
--- a/arch/powerpc/include/asm/cache.h
+++ b/arch/powerpc/include/asm/cache.h
@@ -102,10 +102,10 @@ static inline u32 l1_icache_bytes(void)
 #define __read_mostly __attribute__((__section__(".data..read_mostly")))
 
 #ifdef CONFIG_PPC_BOOK3S_32
-extern long _get_L2CR(void);
-extern long _get_L3CR(void);
-extern void _set_L2CR(unsigned long);
-extern void _set_L3CR(unsigned long);
+long _get_L2CR(void);
+long _get_L3CR(void);
+void _set_L2CR(unsigned long val);
+void _set_L3CR(unsigned long val);
 #else
 #define _get_L2CR()0L
 #define _get_L3CR()0L
diff --git a/arch/powerpc/include/asm/cacheflush.h 
b/arch/powerpc/include/asm/cacheflush.h
index 4c3377aff8ed..1826bf2cc137 100644
--- a/arch/powerpc/include/asm/cacheflush.h
+++ b/arch/powerpc/include/asm/cacheflush.h
@@ -38,15 +38,15 @@ static inline void flush_cache_vmap(unsigned long start, 
unsigned long end) { }
 #endif
 
 #define ARCH_IMPLEMENTS_FLUSH_DCACHE_PAGE 1
-extern void flush_dcache_page(struct page *page);
+void flush_dcache_page(struct page *page);
 #define flush_dcache_mmap_lock(mapping)do { } while (0)
 #define flush_dcache_mmap_unlock(mapping)  do { } while (0)
 
 void flush_icache_range(unsigned long start, unsigned long stop);
-extern void flush_icache_user_range(struct vm_area_struct *vma,
+void flush_icache_user_range(struct vm_area_struct *vma,
struct page *page, unsigned long addr,
int len);
-extern void flush_dcache_icache_page(struct page *page);
+void flush_dcache_icache_page(struct page *page);
 
 /**
  * flush_dcache_range(): Write any modified data cache blocks out to memory 
and invalidate them.
-- 
2.21.0



[PATCH 3/6] powerpc: Convert flush_icache_range & friends to C

2019-08-14 Thread Alastair D'Silva
From: Alastair D'Silva 

Similar to commit 22e9c88d486a
("powerpc/64: reuse PPC32 static inline flush_dcache_range()")
this patch converts flush_icache_range() to C, and reimplements the
following functions as wrappers around it:
__flush_dcache_icache
__flush_dcache_icache_phys

This was done as we discovered a long-standing bug where the length of the
range was truncated due to using a 32 bit shift instead of a 64 bit one.

By converting these functions to C, it becomes easier to maintain.

Signed-off-by: Alastair D'Silva 
---
 arch/powerpc/include/asm/cache.h  |  26 +++---
 arch/powerpc/include/asm/cacheflush.h |  32 ---
 arch/powerpc/kernel/misc_32.S | 117 --
 arch/powerpc/kernel/misc_64.S |  97 -
 arch/powerpc/mm/mem.c |  71 +++-
 5 files changed, 102 insertions(+), 241 deletions(-)

diff --git a/arch/powerpc/include/asm/cache.h b/arch/powerpc/include/asm/cache.h
index f852d5cd746c..728f154204db 100644
--- a/arch/powerpc/include/asm/cache.h
+++ b/arch/powerpc/include/asm/cache.h
@@ -98,20 +98,7 @@ static inline u32 l1_icache_bytes(void)
 #endif
 #endif /* ! __ASSEMBLY__ */
 
-#if defined(__ASSEMBLY__)
-/*
- * For a snooping icache, we still need a dummy icbi to purge all the
- * prefetched instructions from the ifetch buffers. We also need a sync
- * before the icbi to order the the actual stores to memory that might
- * have modified instructions with the icbi.
- */
-#define PURGE_PREFETCHED_INS   \
-   sync;   \
-   icbi0,r3;   \
-   sync;   \
-   isync
-
-#else
+#if !defined(__ASSEMBLY__)
 #define __read_mostly __attribute__((__section__(".data..read_mostly")))
 
 #ifdef CONFIG_PPC_BOOK3S_32
@@ -145,6 +132,17 @@ static inline void dcbst(void *addr)
 {
__asm__ __volatile__ ("dcbst %y0" : : "Z"(*(u8 *)addr) : "memory");
 }
+
+static inline void icbi(void *addr)
+{
+   __asm__ __volatile__ ("icbi 0, %0" : : "r"(addr) : "memory");
+}
+
+static inline void iccci(void)
+{
+   __asm__ __volatile__ ("iccci 0, r0");
+}
+
 #endif /* !__ASSEMBLY__ */
 #endif /* __KERNEL__ */
 #endif /* _ASM_POWERPC_CACHE_H */
diff --git a/arch/powerpc/include/asm/cacheflush.h 
b/arch/powerpc/include/asm/cacheflush.h
index ed57843ef452..4c3377aff8ed 100644
--- a/arch/powerpc/include/asm/cacheflush.h
+++ b/arch/powerpc/include/asm/cacheflush.h
@@ -42,24 +42,18 @@ extern void flush_dcache_page(struct page *page);
 #define flush_dcache_mmap_lock(mapping)do { } while (0)
 #define flush_dcache_mmap_unlock(mapping)  do { } while (0)
 
-extern void flush_icache_range(unsigned long, unsigned long);
+void flush_icache_range(unsigned long start, unsigned long stop);
 extern void flush_icache_user_range(struct vm_area_struct *vma,
struct page *page, unsigned long addr,
int len);
-extern void __flush_dcache_icache(void *page_va);
 extern void flush_dcache_icache_page(struct page *page);
-#if defined(CONFIG_PPC32) && !defined(CONFIG_BOOKE)
-extern void __flush_dcache_icache_phys(unsigned long physaddr);
-#else
-static inline void __flush_dcache_icache_phys(unsigned long physaddr)
-{
-   BUG();
-}
-#endif
 
-/*
- * Write any modified data cache blocks out to memory and invalidate them.
+/**
+ * flush_dcache_range(): Write any modified data cache blocks out to memory 
and invalidate them.
  * Does not invalidate the corresponding instruction cache blocks.
+ *
+ * @start: the start address
+ * @stop: the stop address (exclusive)
  */
 static inline void flush_dcache_range(unsigned long start, unsigned long stop)
 {
@@ -82,6 +76,20 @@ static inline void flush_dcache_range(unsigned long start, 
unsigned long stop)
isync();
 }
 
+/**
+ * __flush_dcache_icache(): Flush a particular page from the data cache to RAM.
+ * Note: this is necessary because the instruction cache does *not*
+ * snoop from the data cache.
+ *
+ * @page: the address of the page to flush
+ */
+static inline void __flush_dcache_icache(void *page)
+{
+   unsigned long page_addr = (unsigned long)page;
+
+   flush_icache_range(page_addr, page_addr + PAGE_SIZE);
+}
+
 /*
  * Write any modified data cache blocks out to memory.
  * Does not invalidate the corresponding cache lines (especially for
diff --git a/arch/powerpc/kernel/misc_32.S b/arch/powerpc/kernel/misc_32.S
index fe4bd321730e..12b95e6799d4 100644
--- a/arch/powerpc/kernel/misc_32.S
+++ b/arch/powerpc/kernel/misc_32.S
@@ -318,123 +318,6 @@ END_FTR_SECTION_IFSET(CPU_FTR_UNIFIED_ID_CACHE)
 EXPORT_SYMBOL(flush_instruction_cache)
 #endif /* CONFIG_PPC_8xx */
 
-/*
- * Write any modified data cache blocks out to memory
- * and invalidate the corresponding instruction cache blocks.
- * This is a no-op on the 601.
- *
- * flush_icache_r

[PATCH 4/6] powerpc: Chunk calls to flush_dcache_range in arch_*_memory

2019-08-14 Thread Alastair D'Silva
From: Alastair D'Silva 

When presented with large amounts of memory being hotplugged
(in my test case, ~890GB), the call to flush_dcache_range takes
a while (~50 seconds), triggering RCU stalls.

This patch breaks up the call into 16GB chunks, calling
cond_resched() in between to allow the scheduler to run.

Signed-off-by: Alastair D'Silva 
---
 arch/powerpc/mm/mem.c | 16 ++--
 1 file changed, 14 insertions(+), 2 deletions(-)

diff --git a/arch/powerpc/mm/mem.c b/arch/powerpc/mm/mem.c
index 5400da87a804..fb0d5e9aa11b 100644
--- a/arch/powerpc/mm/mem.c
+++ b/arch/powerpc/mm/mem.c
@@ -104,11 +104,14 @@ int __weak remove_section_mapping(unsigned long start, 
unsigned long end)
return -ENODEV;
 }
 
+#define FLUSH_CHUNK_SIZE (16ull * 1024ull * 1024ull * 1024ull)
+
 int __ref arch_add_memory(int nid, u64 start, u64 size,
struct mhp_restrictions *restrictions)
 {
unsigned long start_pfn = start >> PAGE_SHIFT;
unsigned long nr_pages = size >> PAGE_SHIFT;
+   unsigned long i;
int rc;
 
resize_hpt_for_hotplug(memblock_phys_mem_size());
@@ -120,7 +123,11 @@ int __ref arch_add_memory(int nid, u64 start, u64 size,
start, start + size, rc);
return -EFAULT;
}
-   flush_dcache_range(start, start + size);
+
+   for (i = 0; i < size; i += FLUSH_CHUNK_SIZE) {
+   flush_dcache_range(start + i, min(start + size, start + i + 
FLUSH_CHUNK_SIZE));
+   cond_resched();
+   }
 
return __add_pages(nid, start_pfn, nr_pages, restrictions);
 }
@@ -131,13 +138,18 @@ void __ref arch_remove_memory(int nid, u64 start, u64 
size,
unsigned long start_pfn = start >> PAGE_SHIFT;
unsigned long nr_pages = size >> PAGE_SHIFT;
struct page *page = pfn_to_page(start_pfn) + vmem_altmap_offset(altmap);
+   unsigned long i;
int ret;
 
__remove_pages(page_zone(page), start_pfn, nr_pages, altmap);
 
/* Remove htab bolted mappings for this section of memory */
start = (unsigned long)__va(start);
-   flush_dcache_range(start, start + size);
+   for (i = 0; i < size; i += FLUSH_CHUNK_SIZE) {
+   flush_dcache_range(start + i, min(start + size, start + i + 
FLUSH_CHUNK_SIZE));
+   cond_resched();
+   }
+
ret = remove_section_mapping(start, start + size);
WARN_ON_ONCE(ret);
 
-- 
2.21.0



[PATCH 2/6] powerpc: define helpers to get L1 icache sizes

2019-08-14 Thread Alastair D'Silva
From: Alastair D'Silva 

This patch adds helpers to retrieve icache sizes, and renames the existing
helpers to make it clear that they are for dcache.

Signed-off-by: Alastair D'Silva 
---
 arch/powerpc/include/asm/cache.h  | 29 +++
 arch/powerpc/include/asm/cacheflush.h | 12 +--
 2 files changed, 31 insertions(+), 10 deletions(-)

diff --git a/arch/powerpc/include/asm/cache.h b/arch/powerpc/include/asm/cache.h
index b3388d95f451..f852d5cd746c 100644
--- a/arch/powerpc/include/asm/cache.h
+++ b/arch/powerpc/include/asm/cache.h
@@ -55,25 +55,46 @@ struct ppc64_caches {
 
 extern struct ppc64_caches ppc64_caches;
 
-static inline u32 l1_cache_shift(void)
+static inline u32 l1_dcache_shift(void)
 {
return ppc64_caches.l1d.log_block_size;
 }
 
-static inline u32 l1_cache_bytes(void)
+static inline u32 l1_dcache_bytes(void)
 {
return ppc64_caches.l1d.block_size;
 }
+
+static inline u32 l1_icache_shift(void)
+{
+   return ppc64_caches.l1i.log_block_size;
+}
+
+static inline u32 l1_icache_bytes(void)
+{
+   return ppc64_caches.l1i.block_size;
+}
 #else
-static inline u32 l1_cache_shift(void)
+static inline u32 l1_dcache_shift(void)
 {
return L1_CACHE_SHIFT;
 }
 
-static inline u32 l1_cache_bytes(void)
+static inline u32 l1_dcache_bytes(void)
 {
return L1_CACHE_BYTES;
 }
+
+static inline u32 l1_icache_shift(void)
+{
+   return L1_CACHE_SHIFT;
+}
+
+static inline u32 l1_icache_bytes(void)
+{
+   return L1_CACHE_BYTES;
+}
+
 #endif
 #endif /* ! __ASSEMBLY__ */
 
diff --git a/arch/powerpc/include/asm/cacheflush.h 
b/arch/powerpc/include/asm/cacheflush.h
index eef388f2659f..ed57843ef452 100644
--- a/arch/powerpc/include/asm/cacheflush.h
+++ b/arch/powerpc/include/asm/cacheflush.h
@@ -63,8 +63,8 @@ static inline void __flush_dcache_icache_phys(unsigned long 
physaddr)
  */
 static inline void flush_dcache_range(unsigned long start, unsigned long stop)
 {
-   unsigned long shift = l1_cache_shift();
-   unsigned long bytes = l1_cache_bytes();
+   unsigned long shift = l1_dcache_shift();
+   unsigned long bytes = l1_dcache_bytes();
void *addr = (void *)(start & ~(bytes - 1));
unsigned long size = stop - (unsigned long)addr + (bytes - 1);
unsigned long i;
@@ -89,8 +89,8 @@ static inline void flush_dcache_range(unsigned long start, 
unsigned long stop)
  */
 static inline void clean_dcache_range(unsigned long start, unsigned long stop)
 {
-   unsigned long shift = l1_cache_shift();
-   unsigned long bytes = l1_cache_bytes();
+   unsigned long shift = l1_dcache_shift();
+   unsigned long bytes = l1_dcache_bytes();
void *addr = (void *)(start & ~(bytes - 1));
unsigned long size = stop - (unsigned long)addr + (bytes - 1);
unsigned long i;
@@ -108,8 +108,8 @@ static inline void clean_dcache_range(unsigned long start, 
unsigned long stop)
 static inline void invalidate_dcache_range(unsigned long start,
   unsigned long stop)
 {
-   unsigned long shift = l1_cache_shift();
-   unsigned long bytes = l1_cache_bytes();
+   unsigned long shift = l1_dcache_shift();
+   unsigned long bytes = l1_dcache_bytes();
void *addr = (void *)(start & ~(bytes - 1));
unsigned long size = stop - (unsigned long)addr + (bytes - 1);
unsigned long i;
-- 
2.21.0



[PATCH 1/6] powerpc: Allow flush_icache_range to work across ranges >4GB

2019-08-14 Thread Alastair D'Silva
From: Alastair D'Silva 

When calling flush_icache_range with a size >4GB, we were masking
off the upper 32 bits, so we would incorrectly flush a range smaller
than intended.

This patch replaces the 32 bit shifts with 64 bit ones, so that
the full size is accounted for.

Signed-off-by: Alastair D'Silva 
Cc: sta...@vger.kernel.org
---
 arch/powerpc/kernel/misc_64.S | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/arch/powerpc/kernel/misc_64.S b/arch/powerpc/kernel/misc_64.S
index b55a7b4cb543..9bc0aa9aeb65 100644
--- a/arch/powerpc/kernel/misc_64.S
+++ b/arch/powerpc/kernel/misc_64.S
@@ -82,7 +82,7 @@ END_FTR_SECTION_IFSET(CPU_FTR_COHERENT_ICACHE)
subfr8,r6,r4/* compute length */
add r8,r8,r5/* ensure we get enough */
lwz r9,DCACHEL1LOGBLOCKSIZE(r10)/* Get log-2 of cache block 
size */
-   srw.r8,r8,r9/* compute line count */
+   srd.r8,r8,r9/* compute line count */
beqlr   /* nothing to do? */
mtctr   r8
 1: dcbst   0,r6
@@ -98,7 +98,7 @@ END_FTR_SECTION_IFSET(CPU_FTR_COHERENT_ICACHE)
subfr8,r6,r4/* compute length */
add r8,r8,r5
lwz r9,ICACHEL1LOGBLOCKSIZE(r10)/* Get log-2 of Icache block 
size */
-   srw.r8,r8,r9/* compute line count */
+   srd.r8,r8,r9/* compute line count */
beqlr   /* nothing to do? */
mtctr   r8
 2: icbi0,r6
-- 
2.21.0



Re: [PATCH 1/2] powerpc: Allow flush_icache_range to work across ranges >4GB

2019-08-11 Thread Alastair D'Silva
On Fri, 2019-08-09 at 10:59 +0200, Christophe Leroy wrote:
> 
> Le 09/08/2019 à 02:45, Alastair D'Silva a écrit :
> > From: Alastair D'Silva 
> > 
> > When calling flush_icache_range with a size >4GB, we were masking
> > off the upper 32 bits, so we would incorrectly flush a range
> > smaller
> > than intended.
> > 
> > This patch replaces the 32 bit shifts with 64 bit ones, so that
> > the full size is accounted for.
> > 
> > Heads-up for backporters: the old version of flush_dcache_range is
> > subject to a similar bug (this has since been replaced with a C
> > implementation).
> 
> Can you submit a patch to stable, explaining this ?
> 

This patch was sent to stable too - or did you mean to send another patch
for the stable asm version of flush_dcache_range?

-- 
Alastair D'Silva
Open Source Developer
Linux Technology Centre, IBM Australia
mob: 0423 762 819



[PATCH 2/2] powerpc: Convert flush_icache_range to C

2019-08-08 Thread Alastair D'Silva
From: Alastair D'Silva 

Similar to commit 22e9c88d486a
("powerpc/64: reuse PPC32 static inline flush_dcache_range()")
this patch converts flush_icache_range to C.

This was done as we discovered a long-standing bug where the
length of the range was truncated due to using a 32 bit shift
instead of a 64 bit one.

By converting this function to C, it becomes easier to maintain.

Signed-off-by: Alastair D'Silva 
---
 arch/powerpc/include/asm/cache.h  | 35 +--
 arch/powerpc/include/asm/cacheflush.h | 63 +++
 arch/powerpc/kernel/misc_64.S | 55 ---
 3 files changed, 85 insertions(+), 68 deletions(-)

diff --git a/arch/powerpc/include/asm/cache.h b/arch/powerpc/include/asm/cache.h
index b3388d95f451..d3d7077b75e2 100644
--- a/arch/powerpc/include/asm/cache.h
+++ b/arch/powerpc/include/asm/cache.h
@@ -55,25 +55,46 @@ struct ppc64_caches {
 
 extern struct ppc64_caches ppc64_caches;
 
-static inline u32 l1_cache_shift(void)
+static inline u32 l1_dcache_shift(void)
 {
return ppc64_caches.l1d.log_block_size;
 }
 
-static inline u32 l1_cache_bytes(void)
+static inline u32 l1_dcache_bytes(void)
 {
return ppc64_caches.l1d.block_size;
 }
+
+static inline u32 l1_icache_shift(void)
+{
+   return ppc64_caches.l1i.log_block_size;
+}
+
+static inline u32 l1_icache_bytes(void)
+{
+   return ppc64_caches.l1i.block_size;
+}
 #else
-static inline u32 l1_cache_shift(void)
+static inline u32 l1_dcache_shift(void)
 {
return L1_CACHE_SHIFT;
 }
 
-static inline u32 l1_cache_bytes(void)
+static inline u32 l1_dcache_bytes(void)
 {
return L1_CACHE_BYTES;
 }
+
+static inline u32 l1_icache_shift(void)
+{
+   return L1_CACHE_SHIFT;
+}
+
+static inline u32 l1_icache_bytes(void)
+{
+   return L1_CACHE_BYTES;
+}
+
 #endif
 #endif /* ! __ASSEMBLY__ */
 
@@ -124,6 +145,12 @@ static inline void dcbst(void *addr)
 {
__asm__ __volatile__ ("dcbst %y0" : : "Z"(*(u8 *)addr) : "memory");
 }
+
+static inline void icbi(void *addr)
+{
+   __asm__ __volatile__ ("icbi %y0" : : "Z"(*(u8 *)addr) : "memory");
+}
+
 #endif /* !__ASSEMBLY__ */
 #endif /* __KERNEL__ */
 #endif /* _ASM_POWERPC_CACHE_H */
diff --git a/arch/powerpc/include/asm/cacheflush.h 
b/arch/powerpc/include/asm/cacheflush.h
index eef388f2659f..f68e75a6dc4b 100644
--- a/arch/powerpc/include/asm/cacheflush.h
+++ b/arch/powerpc/include/asm/cacheflush.h
@@ -42,7 +42,6 @@ extern void flush_dcache_page(struct page *page);
 #define flush_dcache_mmap_lock(mapping)do { } while (0)
 #define flush_dcache_mmap_unlock(mapping)  do { } while (0)
 
-extern void flush_icache_range(unsigned long, unsigned long);
 extern void flush_icache_user_range(struct vm_area_struct *vma,
struct page *page, unsigned long addr,
int len);
@@ -57,14 +56,17 @@ static inline void __flush_dcache_icache_phys(unsigned long 
physaddr)
 }
 #endif
 
-/*
- * Write any modified data cache blocks out to memory and invalidate them.
+/**
+ * flush_dcache_range: Write any modified data cache blocks out to memory and 
invalidate them.
  * Does not invalidate the corresponding instruction cache blocks.
+ *
+ * @start: the start address
+ * @stop: the stop address (exclusive)
  */
 static inline void flush_dcache_range(unsigned long start, unsigned long stop)
 {
-   unsigned long shift = l1_cache_shift();
-   unsigned long bytes = l1_cache_bytes();
+   unsigned long shift = l1_dcache_shift();
+   unsigned long bytes = l1_dcache_bytes();
void *addr = (void *)(start & ~(bytes - 1));
unsigned long size = stop - (unsigned long)addr + (bytes - 1);
unsigned long i;
@@ -82,6 +84,49 @@ static inline void flush_dcache_range(unsigned long start, 
unsigned long stop)
isync();
 }
 
+/**
+ * flush_icache_range: Write any modified data cache blocks out to memory
+ * and invalidate the corresponding blocks in the instruction cache
+ *
+ * Generic code will call this after writing memory, before executing from it.
+ *
+ * @start: the start address
+ * @stop: the stop address (exclusive)
+ */
+static inline void flush_icache_range(unsigned long start, unsigned long stop)
+{
+   unsigned long shift = l1_dcache_shift();
+   unsigned long bytes = l1_dcache_bytes();
+   void *addr = (void *)(start & ~(bytes - 1));
+   unsigned long size = stop - (unsigned long)addr + (bytes - 1);
+   unsigned long i;
+
+   if (cpu_has_feature(CPU_FTR_COHERENT_ICACHE)) {
+   mb(); /* sync */
+   icbi((void *)start);
+   mb(); /* sync */
+   isync();
+   return;
+   }
+
+   /* Flush the data cache to memory */
+   for (i = 0; i < size >> shift; i++, addr += bytes)
+   dcbst(addr);
+
+   mb();   /* sync */
+
+  

[PATCH 1/2] powerpc: Allow flush_icache_range to work across ranges >4GB

2019-08-08 Thread Alastair D'Silva
From: Alastair D'Silva 

When calling flush_icache_range with a size >4GB, we were masking
off the upper 32 bits, so we would incorrectly flush a range smaller
than intended.

This patch replaces the 32 bit shifts with 64 bit ones, so that
the full size is accounted for.

Heads-up for backporters: the old version of flush_dcache_range is
subject to a similar bug (this has since been replaced with a C
implementation).

Signed-off-by: Alastair D'Silva 
---
 arch/powerpc/kernel/misc_64.S | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/arch/powerpc/kernel/misc_64.S b/arch/powerpc/kernel/misc_64.S
index b55a7b4cb543..9bc0aa9aeb65 100644
--- a/arch/powerpc/kernel/misc_64.S
+++ b/arch/powerpc/kernel/misc_64.S
@@ -82,7 +82,7 @@ END_FTR_SECTION_IFSET(CPU_FTR_COHERENT_ICACHE)
subfr8,r6,r4/* compute length */
add r8,r8,r5/* ensure we get enough */
lwz r9,DCACHEL1LOGBLOCKSIZE(r10)/* Get log-2 of cache block 
size */
-   srw.r8,r8,r9/* compute line count */
+   srd.r8,r8,r9/* compute line count */
beqlr   /* nothing to do? */
mtctr   r8
 1: dcbst   0,r6
@@ -98,7 +98,7 @@ END_FTR_SECTION_IFSET(CPU_FTR_COHERENT_ICACHE)
subfr8,r6,r4/* compute length */
add r8,r8,r5
lwz r9,ICACHEL1LOGBLOCKSIZE(r10)/* Get log-2 of Icache block 
size */
-   srw.r8,r8,r9/* compute line count */
+   srd.r8,r8,r9/* compute line count */
beqlr   /* nothing to do? */
mtctr   r8
 2: icbi0,r6
-- 
2.21.0



Re: [PATCH v2 1/3] mm: Trigger bug on if a section is not found in __section_nr

2019-07-02 Thread Alastair D'Silva
On Tue, 2019-07-02 at 08:13 +0200, Michal Hocko wrote:
> On Tue 02-07-19 14:13:25, Alastair D'Silva wrote:
> > On Mon, 2019-07-01 at 12:46 +0200, Michal Hocko wrote:
> > > On Fri 28-06-19 10:46:28, Alastair D'Silva wrote:
> > > [...]
> > > > Given that there is already a VM_BUG_ON in the code, how do you
> > > > feel
> > > > about broadening the scope from 'VM_BUG_ON(!root)' to
> > > > 'VM_BUG_ON(!root
> > > > || (root_nr == NR_SECTION_ROOTS))'?
> > > 
> > > As far as I understand the existing VM_BUG_ON will hit when the
> > > mem_section tree gets corrupted. This is a different situation to
> > > an
> > > incorrect section given so I wouldn't really mix those two. And I
> > > still
> > > do not see much point to protect from unexpected input parameter
> > > as
> > > this
> > > is internal function as already pointed out.
> > > 
> > 
> > Hi Michal,
> > 
> > I was able to hit this problem as the system firmware had assigned
> > the
> > prototype pmem device an address range above the 128TB limit that
> > we
> > originally supported. This has since been lifted to 2PB with patch
> > 4ffe713b7587b14695c9bec26a000fc88ef54895.
> > 
> > As it stands, we cannot move this range lower as the high bits are
> > dictated by the location the card is connected.
> > 
> > Since the physical address of the memory is not controlled by the
> > kernel, I believe we should catch (or at least make it easy to
> > debug)
> > the situation where external firmware allocates physical addresses
> > beyond that which the kernel supports.
> 
> Just make it clear, I am not against a sanitization. I am objecting
> to
> put it into __section_nr because this is way too late. As already
> explained, you already must have a bogus mem_section object in hand.
> Why cannot you add a sanity check right there when the memory is
> added?
> Either when the section is registered or even sooner in
> arch_add_memory.
> 

Good point, I was thinking of a libnvdimm enhancement to check that the
end address is in range, but a more generic solution is better.

-- 
Alastair D'Silva   mob: 0423 762 819
skype: alastair_dsilva
Twitter: @EvilDeece
blog: http://alastair.d-silva.org




Re: [PATCH v2 1/3] mm: Trigger bug on if a section is not found in __section_nr

2019-07-01 Thread Alastair D'Silva
On Mon, 2019-07-01 at 12:46 +0200, Michal Hocko wrote:
> On Fri 28-06-19 10:46:28, Alastair D'Silva wrote:
> [...]
> > Given that there is already a VM_BUG_ON in the code, how do you
> > feel
> > about broadening the scope from 'VM_BUG_ON(!root)' to
> > 'VM_BUG_ON(!root
> > || (root_nr == NR_SECTION_ROOTS))'?
> 
> As far as I understand the existing VM_BUG_ON will hit when the
> mem_section tree gets corrupted. This is a different situation to an
> incorrect section given so I wouldn't really mix those two. And I
> still
> do not see much point to protect from unexpected input parameter as
> this
> is internal function as already pointed out.
> 

Hi Michal,

I was able to hit this problem as the system firmware had assigned the
prototype pmem device an address range above the 128TB limit that we
originally supported. This has since been lifted to 2PB with patch
4ffe713b7587b14695c9bec26a000fc88ef54895.

As it stands, we cannot move this range lower as the high bits are
dictated by the location the card is connected.

Since the physical address of the memory is not controlled by the
kernel, I believe we should catch (or at least make it easy to debug)
the situation where external firmware allocates physical addresses
beyond that which the kernel supports.
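
For the record, a back-of-the-envelope userspace sketch of how such an
address yields a bogus section number; the constants and the example
address below are assumptions (roughly the pre-4ffe713b7587 powerpc
values), not taken from the real platform:

#include <stdio.h>

#define SECTION_SIZE_BITS	24	/* assumed: 16MB sections */
#define MAX_PHYSMEM_BITS	47	/* assumed: the old 128TB limit */
#define NR_MEM_SECTIONS		(1UL << (MAX_PHYSMEM_BITS - SECTION_SIZE_BITS))

int main(void)
{
	unsigned long long addr = 4ULL << 47;		/* hypothetical 512TB window */
	unsigned long long nr = addr >> SECTION_SIZE_BITS;

	printf("section %llu, but only %lu sections exist -> %s\n",
	       nr, NR_MEM_SECTIONS,
	       nr >= NR_MEM_SECTIONS ? "bogus" : "ok");
	return 0;
}

Nothing in the lookup path notices that the computed section number is
past the end, which is how the bogus values leak out of __section_nr().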

-- 
Alastair D'Silva   mob: 0423 762 819
skype: alastair_dsilva
Twitter: @EvilDeece
blog: http://alastair.d-silva.org




Re: [PATCH v2 1/3] mm: Trigger bug on if a section is not found in __section_nr

2019-06-27 Thread Alastair D'Silva
On Thu, 2019-06-27 at 10:10 +0200, Michal Hocko wrote:
> On Thu 27-06-19 10:50:57, Alastair D'Silva wrote:
> > On Wed, 2019-06-26 at 08:57 +0200, Michal Hocko wrote:
> > > On Wed 26-06-19 16:27:30, Alastair D'Silva wrote:
> > > > On Wed, 2019-06-26 at 08:21 +0200, Michal Hocko wrote:
> > > > > On Wed 26-06-19 16:11:21, Alastair D'Silva wrote:
> > > > > > From: Alastair D'Silva 
> > > > > > 
> > > > > > If a memory section comes in where the physical address is
> > > > > > greater
> > > > > > than
> > > > > > that which is managed by the kernel, this function would
> > > > > > not
> > > > > > trigger the
> > > > > > bug and instead return a bogus section number.
> > > > > > 
> > > > > > This patch tracks whether the section was actually found,
> > > > > > and
> > > > > > triggers the
> > > > > > bug if not.
> > > > > 
> > > > > Why do we want/need that? In other words the changelog should
> > > > > contain
> > > > > WHY and WHAT. This one contains only the latter.
> > > > >  
> > > > 
> > > > Thanks, I'll update the comment.
> > > > 
> > > > During driver development, I tried adding persistent memory at a
> > > > memory
> > > > address that exceeded the maximum permissible address for the
> > > > platform.
> > > > 
> > > > This caused __section_nr to silently return bogus section
> > > > numbers,
> > > > rather than complaining.
> > > 
> > > OK, I see, but is an additional code worth it for the non-
> > > development
> > > case? I mean why should we be testing for something that
> > > shouldn't
> > > happen normally? Is it too easy to get things wrong or what is
> > > the
> > > underlying reason to change it now?
> > > 
> > 
> > It took me a while to identify what the problem was - having the
> > BUG_ON
> > would have saved me a few hours.
> > 
> > I'm happy to just have the BUG_ON and drop the new error return (I
> > added that in response to Mike Rapoport's comment that the original
> > patch would still return a bogus section number).
> 
> Well, BUG_ON is about the worst way to handle an incorrect input. You
> really do not want to put a production environment down just because
> there is a bug in a driver, right? There are still many {VM_}BUG_ONs
> in the tree and there is a general trend to get rid of many of them
> rather than adding new ones.
> 
> Now back to your patch. You are adding error handling where we
> simply
> do not expect invalid data. This is often the case for the core
> kernel
> functionality because we do expect consumers of the code to do the
> right
> thing. E.g. __section_nr already takes a pointer to struct section
> which
> assumes that this core data structure is already valid. Adding a
> check
> here adds unnecessary overhead, so this doesn't really sound like a
> good idea to me.


Thanks for providing a good explanation.

In this situation, since we return a bogus section number, we get
crashes elsewhere in the kernel if the code rumbles on.

Given that there is already a VM_BUG_ON in the code, how do you feel
about broadening the scope from 'VM_BUG_ON(!root)' to 'VM_BUG_ON(!root
|| (root_nr == NR_SECTION_ROOTS))'?
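
For concreteness, a sketch of how that broadened check would sit in
__section_nr() (the function body below is reconstructed from the code
of that era, so treat it as illustrative rather than authoritative):

int __section_nr(struct mem_section *ms)
{
	unsigned long root_nr;
	struct mem_section *root = NULL;

	for (root_nr = 0; root_nr < NR_SECTION_ROOTS; root_nr++) {
		root = __nr_to_section(root_nr * SECTIONS_PER_ROOT);
		if (!root)
			continue;

		if ((ms >= root) && (ms < (root + SECTIONS_PER_ROOT)))
			break;
	}

	/* Broadened check: also catch the case where no root contained
	 * the section, instead of returning a bogus section number. */
	VM_BUG_ON(!root || (root_nr == NR_SECTION_ROOTS));

	return (root_nr * SECTIONS_PER_ROOT) + (ms - root);
}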

-- 
Alastair D'Silva   mob: 0423 762 819
skype: alastair_dsilva
Twitter: @EvilDeece
blog: http://alastair.d-silva.org




Re: [PATCH v2 0/3] mm: Cleanup & allow modules to hotplug memory

2019-06-26 Thread Alastair D'Silva
On Wed, 2019-06-26 at 00:57 -0700, Christoph Hellwig wrote:
> On Wed, Jun 26, 2019 at 04:11:20PM +1000, Alastair D'Silva wrote:
> >   - Drop mm/hotplug: export try_online_node
> > (not necessary)
> 
> With this the subject line of the cover letter seems incorrect now :)
> 

Indeed :)

I left it as I was unsure whether changing the series title would make
it harder to track revisions.


-- 
Alastair D'Silva   mob: 0423 762 819
skype: alastair_dsilva
Twitter: @EvilDeece
blog: http://alastair.d-silva.org




Re: [PATCH v2 1/3] mm: Trigger bug on if a section is not found in __section_nr

2019-06-26 Thread Alastair D'Silva
On Wed, 2019-06-26 at 08:57 +0200, Michal Hocko wrote:
> On Wed 26-06-19 16:27:30, Alastair D'Silva wrote:
> > On Wed, 2019-06-26 at 08:21 +0200, Michal Hocko wrote:
> > > On Wed 26-06-19 16:11:21, Alastair D'Silva wrote:
> > > > From: Alastair D'Silva 
> > > > 
> > > > If a memory section comes in where the physical address is
> > > > greater
> > > > than
> > > > that which is managed by the kernel, this function would not
> > > > trigger the
> > > > bug and instead return a bogus section number.
> > > > 
> > > > This patch tracks whether the section was actually found, and
> > > > triggers the
> > > > bug if not.
> > > 
> > > Why do we want/need that? In other words the changelog should
> > > contain
> > > WHY and WHAT. This one contains only the latter one.
> > >  
> > 
> > Thanks, I'll update the comment.
> > 
> > During driver development, I tried adding persistent memory at a
> > memory
> > address that exceeded the maximum permissible address for the
> > platform.
> > 
> > This caused __section_nr to silently return bogus section numbers,
> > rather than complaining.
> 
> OK, I see, but is the additional code worth it for the non-development
> case? I mean why should we be testing for something that shouldn't
> happen normally? Is it too easy to get things wrong or what is the
> underlying reason to change it now?
> 

It took me a while to identify what the problem was - having the BUG_ON
would have saved me a few hours.

I'm happy to just have the BUG_ON and drop the new error return (I
added that in response to Mike Rapoport's comment that the original
patch would still return a bogus section number).


-- 
Alastair D'Silva   mob: 0423 762 819
skype: alastair_dsilva
Twitter: @EvilDeece
blog: http://alastair.d-silva.org




Re: [PATCH v2 2/3] mm: don't hide potentially null memmap pointer in sparse_remove_one_section

2019-06-26 Thread Alastair D'Silva
On Wed, 2019-06-26 at 08:23 +0200, Michal Hocko wrote:
> On Wed 26-06-19 16:11:22, Alastair D'Silva wrote:
> > From: Alastair D'Silva 
> > 
> > By adding offset to memmap before passing it in to
> > clear_hwpoisoned_pages,
> > we hide a potentially null memmap from the null check inside
> > clear_hwpoisoned_pages.
> > 
> > This patch passes the offset to clear_hwpoisoned_pages instead,
> > allowing
> > memmap to successfully perform its null check.
> 
> Same issue with the changelog as the previous patch (missing WHY).
> 

The first paragraph explains what the problem is with the existing code
(same applies to 1/3 too).
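
To make the mechanism concrete, a sketch of the call site in
sparse_remove_one_section() before and after the change (reconstructed
from my reading of the series, so the exact names are illustrative):

	/* Before: the offset is folded into the pointer, so a NULL
	 * memmap with a non-zero offset no longer looks NULL inside
	 * clear_hwpoisoned_pages(). */
	clear_hwpoisoned_pages(memmap + map_offset,
			       PAGES_PER_SECTION - map_offset);

	/* After: pass the offset separately, so the callee can still
	 * test memmap itself for NULL before touching memmap[start..]. */
	clear_hwpoisoned_pages(memmap, map_offset,
			       PAGES_PER_SECTION - map_offset);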

-- 
Alastair D'Silva   mob: 0423 762 819
skype: alastair_dsilva
Twitter: @EvilDeece
blog: http://alastair.d-silva.org




Re: [PATCH v2 1/3] mm: Trigger bug on if a section is not found in __section_nr

2019-06-26 Thread Alastair D'Silva
On Wed, 2019-06-26 at 08:21 +0200, Michal Hocko wrote:
> On Wed 26-06-19 16:11:21, Alastair D'Silva wrote:
> > From: Alastair D'Silva 
> > 
> > If a memory section comes in where the physical address is greater
> > than
> > that which is managed by the kernel, this function would not
> > trigger the
> > bug and instead return a bogus section number.
> > 
> > This patch tracks whether the section was actually found, and
> > triggers the
> > bug if not.
> 
> Why do we want/need that? In other words the changelog should contina
> WHY and WHAT. This one contains only the later one.
>  

Thanks, I'll update the comment.

During driver development, I tried adding peristent memory at a memory
address that exceeded the maximum permissable address for the platform.

This caused __section_nr to silently return bogus section numbers,
rather than complaining.

-- 
Alastair D'Silva   mob: 0423 762 819
skype: alastair_dsilva
Twitter: @EvilDeece
blog: http://alastair.d-silva.org




[PATCH v2 3/3] mm: Don't manually decrement num_poisoned_pages

2019-06-26 Thread Alastair D'Silva
From: Alastair D'Silva 

Use the existing helper, num_poisoned_pages_dec(), instead of
open-coding the decrement.

Signed-off-by: Alastair D'Silva 
---
 mm/sparse.c | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/mm/sparse.c b/mm/sparse.c
index 1ec32aef5590..d9b3625bfdf0 100644
--- a/mm/sparse.c
+++ b/mm/sparse.c
@@ -11,6 +11,8 @@
 #include <linux/export.h>
 #include <linux/spinlock.h>
 #include <linux/vmalloc.h>
+#include <linux/swap.h>
+#include <linux/swapops.h>
 
 #include "internal.h"
 #include <asm/dma.h>
@@ -772,7 +774,7 @@ static void clear_hwpoisoned_pages(struct page *memmap,
 
 	for (i = start; i < start + count; i++) {
 		if (PageHWPoison(&memmap[i])) {
-			atomic_long_sub(1, &num_poisoned_pages);
+			num_poisoned_pages_dec();
 			ClearPageHWPoison(&memmap[i]);
 		}
 	}
-- 
2.21.0
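
For reference, the helper being switched to is a one-line wrapper in
include/linux/swap.h (shape recalled from kernels of that era, shown
here only for context):

static inline void num_poisoned_pages_dec(void)
{
	atomic_long_dec(&num_poisoned_pages);
}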


